METHODS FOR EXPLORATORY ASSESSMENT OF CONSENT-TO-LINK IN A HOUSEHOLD SURVEY
DANIEL YANG
SCOTT FRICKER†
JOHN ELTINGE
There is increasing interest in linking survey data to administrative records to reduce respondent burden and enhance the amount and quality of information available on sample respondents. In many cases, legal constraints or societal norms require survey organizations to obtain informed consent from sample units before linking survey responses with administrative data. Guiding such efforts is a growing empirical literature examining factors that impact respondents' consent decisions and the success of linkage attempts, as well as evaluations of potential differences between consenting and non-consenting respondents. This paper outlines a range of options that statistical organizations can consider for evaluation and testing of linked datasets. We apply methods for assessing consent propensity and consent bias to data from the US Consumer Expenditure Survey and investigate the impacts of demographic, socio-economic, and attitudinal variables on respondents' consent-to-link propensities. We then analyze potential consent-to-link biases in mean and quantile estimates of several economic variables by comparing different propensity-adjusted and unadjusted estimates and by comparing estimates from consenting and non-consenting respondents. We contrast several estimation approaches and discuss implications of our findings for consent-propensity assessments and for approaches to minimize risks of consent-to-link bias.
KEYWORDS: Administrative Record Data; Burden Reduction; Incomplete Data; Informed Consent; Propensity Model.
Address correspondence to Daniel Yang, US Bureau of Labor Statistics, Office of Survey Methods Research, Washington, DC, USA; E-mail: yang.daniel@bls.gov.
DANIEL YANG is a Research Mathematical Statistician, Office of Survey Methods Research, US Bureau of Labor Statistics, Washington, DC, USA. SCOTT FRICKER is the Senior Research Psychologist, Office of Survey Methods Research, US Bureau of Labor Statistics, Washington, DC, USA. JOHN ELTINGE is the Assistant Director for Research and Methodology, US Census Bureau, Suitland, MD, USA.
The authors thank Steve Henderson, Brandon Kopp, and Jay Ryan for many helpful discussions of the Consumer Expenditure Survey data, and Joe Sakshaug for valuable comments on an earlier version of this paper, which was presented at the 2015 Joint Statistical Meetings. Initial design and placement of the "consent-to-link" question analyzed in this paper was carried out by Davis et al. (2013). The views expressed in this paper are those of the authors and do not necessarily reflect the policies of the US Bureau of Labor Statistics or the US Census Bureau.
†Deceased.
doi: 10.1093/jssam/smx031. Advance access publication 7 December 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research 2017. This work is written by US Government employees and is in the public domain in the US.
Journal of Survey Statistics and Methodology (2019) 7, 118–155
Downloaded from https://academic.oup.com/jssam/article-abstract/7/1/118/4689177 by guest on 19 February 2019
1 INTRODUCTION
Survey organizations face many challenges in their efforts to produce high-quality survey data. The costs of data collection and the demand for data products are greater than ever, and survey budgets often are under serious strain to meet these demands. Declining survey response rates further complicate cost and data-quality considerations. Given these challenges, survey organizations increasingly are exploring the possibility of linking survey data to administrative records. Combining survey and administrative data on the same sample unit has the potential to reduce the cost, length, and perceived burden of a survey; enrich our understanding of the underlying substantive phenomena; and offer a mechanism for targeted assessments of survey error components.
Linking survey data to administrative data sources on the same individual or household requires matching records from one dataset to the other. The efficiency and success of this matching process depend on the variables and linkage strategy used to establish the link. Exact matching techniques are most successful when unique identifying information, such as a social security number (SSN), is available, but these techniques can also be effective in the absence of unique identifiers when combinations of other personal variables are compared (e.g., last name, date of birth, street name) (see Herzog, Scheuren, and Winkler 2007 for a review of statistical linkage techniques and related data-cleaning issues). Before any linkage attempt can be made, however, most countries require that survey respondents give their informed consent to link, and consent rates can vary considerably, from as low as 19% to as high as 96% (Sakshaug, Couper, Ofstedal, and Weir 2012). Lower consent rates are potentially a major challenge to wider adoption of record linkage in statistical agencies because they increase the risk of bias in estimates derived from combined data (to the extent that there are systematic differences in key outcome measures between those who consent to link and those who do not).
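To make the idea of exact matching on a combination of personal variables concrete, the following is a minimal sketch in Python. It is not the procedure of any particular agency; the field names, the normalization rule, and the toy records are all assumptions introduced for illustration. Keys are compared only after light normalization, and a link is accepted only when exactly one administrative record shares the key.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and keep only letters and digits, so that formatting
    differences (case, spaces, punctuation) do not block an exact match."""
    return "".join(ch for ch in str(value).lower() if ch.isalnum())


def exact_match(survey_rows, admin_rows, key_fields):
    """Link records whose normalized key fields agree exactly.

    Returns {survey_id: admin_id} for survey records that match exactly one
    administrative record; unmatched or ambiguous records are left unlinked.
    """
    index = defaultdict(list)
    for row in admin_rows:
        key = tuple(normalize(row[field]) for field in key_fields)
        index[key].append(row["admin_id"])

    links = {}
    for row in survey_rows:
        key = tuple(normalize(row[field]) for field in key_fields)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # accept only unambiguous matches
            links[row["survey_id"]] = candidates[0]
    return links


# Hypothetical records; all names and values are invented for illustration.
survey = [
    {"survey_id": "S1", "last_name": "O'Neil", "dob": "1970-01-02", "street": "Main St."},
    {"survey_id": "S2", "last_name": "Garcia", "dob": "1985-06-30", "street": "Oak Ave"},
]
admin = [
    {"admin_id": "A1", "last_name": "ONEIL", "dob": "19700102", "street": "main st"},
    {"admin_id": "A2", "last_name": "Garcia", "dob": "1985-07-30", "street": "Oak Ave"},
]
links = exact_match(survey, admin, ["last_name", "dob", "street"])
print(links)  # S1 links to A1; S2 stays unlinked because the birth dates differ
```

Requiring a unique candidate is a deliberately conservative choice for this sketch; probabilistic linkage methods of the kind reviewed by Herzog et al. (2007) relax the exact-agreement requirement.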
As interest in and adoption of record-linkage methods have increased, so too have investigations into factors associated with respondents' consent decisions and their potential impact on consent bias. In general, consent-to-link phenomena can be viewed as a type of incomplete-data problem, and thus can make use of the broad spectrum of conceptual and methodological tools that have been developed for work with incomplete data. Examples include assessment of cognitive and social processes that lead to survey response (or consent to link); modeling and diagnostic tools for estimation and evaluation of related propensity models; and empirical assessment of biases resulting from the incomplete-data pattern.
To date, direct examinations of consent bias in estimates derived from combined data are extremely rare in the literature because they require researcher access to administrative records for both consenters and non-consenters (cf. Sakshaug and Kreuter 2012; Kreuter, Sakshaug, and Tourangeau 2016). Most studies of linkage consent therefore have used data from survey respondents, which are available for both consenters and non-consenters, to identify characteristics correlated with consent propensity. These studies provide an indirect means of assessing the potential risk of consent bias if the underlying consent propensities are related to differences in respondents' administrative record profiles. Findings from this literature indicate that linkage consent is often associated with respondent demographics (e.g., age, education, income), indicators of survey reluctance (e.g., prior nonresponse in a panel survey), and features of the survey (e.g., wording, placement, and timing of consent requests), but the magnitude and direction of these effects vary across studies (Bates 2005; Dahlhamer and Cox 2007; Sala, Burton, and Knies 2012; Sakshaug, Tutz, and Kreuter 2013). Only very recently have researchers started to develop theory-based hypotheses about the mechanisms of consent decisions, and to incorporate more sophisticated analytic approaches to test these hypotheses (Sala et al. 2012; Sakshaug et al. 2012; Mostafa 2015).
Finally, empirical assessment of consent-to-link propensity patterns and related potential consent biases naturally involves a complex set of trade-offs among (a) the degree to which a given set of test conditions is relevant to current or prospective production conditions; (b) the ability to control applicable design factors, and to measure relevant covariates, within the context of those current production conditions; (c) the ability to measure and model specific portions of the complex processes that lead to respondent consent and cooperation in a given setting; and (d) constraints on resources, including both the direct costs of testing consent-to-link options and the indirect costs arising from the potential impact of testing on current survey production. The remainder of this paper considers some aspects of issues (a)–(d), with emphasis on exploratory analyses for one case study that was embedded within a current survey production process.
In the next section, we review the literature on consent decisions and consent bias. In section 3, we describe in detail a specific consent-to-link case involving the US Consumer Expenditure Survey (CEQ), and then present consent propensity models and our evaluation methodology. The results of our descriptive and multivariate examinations of consent propensity and assessments of consent bias are presented in section 4. We summarize and discuss the implications of our findings in section 5. Appendix A provides detailed descriptions of the analytic variables. Appendix B presents technical results on the variability of weights used in the CEQ dataset. Appendix C discusses some features of the goodness-of-fit tests used for the models considered in section 4. In addition, an online Appendix D (Supplementary Materials) presents some related conceptual material and numerical results for hypothesis testing that led to the modeling results summarized in section 4.
2 LITERATURE REVIEW
2.1 Factors Affecting Linkage Consent
The earliest investigations of consent-to-linkage effects were mostly conducted in epidemiology and health studies that requested patients' consent to access their medical records (Woolf, Rothemich, Johnson, and Marsland 2000; Dunn, Jordan, Lacey, Shapley, and Jinks 2004; Kho et al. 2009), but more recent studies have assessed consent in general population surveys with linkage requests to an array of administrative data sources (Knies, Burton, and Sala 2012; Sala et al. 2012; Sakshaug et al. 2012, 2013). In this section, we summarize findings from this research on the factors that affect consent to data linkage.
2.1.1 Respondent demographics. Linkage consent studies largely have focused on respondents' sociodemographic characteristics, most often age, gender, ethnicity, education, and income (Kho et al. 2009; Fulton 2012). These variables are widely available across surveys and, although they are unlikely to have direct causal impact on most consent decisions, they provide indirect measures of psychosociological factors that may influence those choices. Demographic differences between consenters and non-consenters are common, but the patterns of findings differ across studies. For example, older individuals frequently have been found to be less likely to consent to record linkage than younger people (Dunn et al. 2004; Bates 2005; Dahlhamer and Cox 2007; Huang, Shih, Chang, and Chou 2007; Pascale 2011; Sala et al. 2012; Al Baghal, Knies, and Burton 2014), but some studies have found the opposite effect (Woolf et al. 2000; Beebe et al. 2011) or no age effect (Kho et al. 2009). Males often consent at higher rates than females (Dunn et al. 2004; Bates 2005; Woolf et al. 2000; Sala et al. 2012; Al Baghal et al. 2014), but some studies find no gender effect (Pascale 2011; Sakshaug et al. 2012). Consent propensities for ethnic minorities and non-citizens tend to be lower than for majority groups and citizens (Woolf et al. 2000; Beebe et al. 2011; Al Baghal et al. 2014; Mostafa 2015), although not all studies show these effects (Bates 2005; Kho et al. 2009). Similar inconsistencies are evident across studies for the effects of education and income on consent (Kho et al. 2009; Fulton 2012; Sakshaug et al. 2012; Sala et al. 2012). Other respondent demographic and household characteristics have been examined less frequently (e.g., marital status, employment status, household size, and owner/renter), again with mixed findings (Olson 1999; Jenkins, Cappellari, Lynn, Jackle, and Sala 2006; Al Baghal et al. 2014; Mostafa 2015).
2.1.2 Respondent attitudes. Attitudes can have a powerful impact on thought and behavior, and there is a long history of survey researchers attempting to measure respondents' attitudes and their impact on various survey outcomes (e.g., Goyder 1986). Particular attention has been given to respondents' attitudes about privacy and confidentiality. Research conducted by the US Census Bureau going back to the 1990s demonstrates that concerns about personal privacy and data confidentiality have increased in the general public, and that these attitudes are associated with lower participation rates in the decennial census (Singer 1993, 2003) and more negative attitudes toward the use of administrative records (Singer, Bates, and Van Hoewyk 2011). Privacy and confidentiality concerns can influence record linkage consent as well. Both direct measures of privacy concerns (respondent self-reports) and indirect indicators (item refusals on financial questions) have been shown to be negatively associated with consent (Sakshaug et al. 2012; Sala, Knies, and Burton 2014; Mostafa 2015). Similarly, Sakshaug et al. (2012) demonstrated that the more confidentiality-related concerns respondents expressed to interviewers in a previous survey wave, the less likely they were to subsequently consent to data linkage. And Sala et al. (2014) found that concern about data confidentiality was the most frequent reason given by respondents who declined a linkage request. There also is evidence that trust (in other people, in government) and civic engagement (volunteering, political involvement) are positively related to consent (Sala et al. 2012; Al Baghal et al. 2014).
2.1.3 Saliency. Respondents' interest in topics related to the record request, or their experiences with organizations that house those records, can also affect consent decisions. For example, a number of studies have found that respondents have a higher propensity to accept medical consent requests when they are in poorer health or have symptoms germane to the survey subject (Woolf et al. 2000; Dunn et al. 2004; Dahlhamer and Cox 2007; Beebe et al. 2011). One explanation for this finding is that consent requests on topics salient to respondents enhance the perceived benefits of record linkage (e.g., more comprehensive medical evaluation or the general advancement of knowledge about a disease relevant to the respondent) or reduce the perceived risks (e.g., by inducing more extensive cognitive processing of the request) (Groves, Singer, and Corning 2000). In addition to topic saliency, respondents' existing relationships with government agencies also can play a role in their consent decisions. Studies by Sala et al. (2012), Sakshaug et al. (2012), and Mostafa (2015), for example, found that individuals who received government benefits (e.g., welfare, food stamps, veterans' benefits) were more likely to consent to economic data linkage than those who did not. These results again suggest that the salience of (and attitudes toward) service-providing government agencies may make some respondents more amenable to linkage requests involving those agencies.
2.1.4 Socio-environmental features. The respondents' environments help to shape the context in which consent decisions are made, and a handful of studies have examined associations between area characteristics and attitudes toward use of administrative records and consent decisions. For example, Singer et al. (2011) found that individuals living in the South and Mid-Atlantic regions of the country had more favorable attitudes about administrative record use by the US Census Bureau than those living in other regions of the country. Studies of actual consent and linkage rates have demonstrated regional variations as well, with higher rates in the South and Midwest and lower rates in parts of the Northeast (e.g., Olson 1999; Dahlhamer and Cox 2007). Consent rates also can vary by urban status. Consistent with urbanicity effects seen in the literature on survey participation and pro-social behavior (Groves and Couper 1998; Mattis, Hammond, Grayman, Bonacci, Brennan, et al. 2009), respondents living in urban areas have been found to be less likely to consent than those living in non-urban areas (cf. Jenkins et al. 2006; Dahlhamer and Cox 2007; Al Baghal et al. 2014, who show a marginally significant positive effect for urbanicity). Together, such area effects may indicate the influence of underlying ecological factors within those communities (e.g., differences in population density, crime, social engagement), but may also reflect differences in survey operations (e.g., in staff protocol and training) clustered within those geographic areas. A recent study by Mostafa (2015) found area characteristics by themselves added little explanatory power to models of consent propensity, suggesting that respondent and interview characteristics may be more important factors.
2.1.5 Interviewer characteristics. Interviewer attributes and behaviors can have significant impact on survey participation and data quality (O'Muircheartaigh and Campanelli 1999; West, Kreuter, and Jaenichen 2013), including linkage consent decisions. Studies investigating the impact of interviewer demographics generally find that they are unrelated to the consent outcome (Sakshaug et al. 2012; Sala et al. 2012), although there is some evidence of a positive effect of interviewer age on consent (Korbmacher and Schroeder 2013; Al Baghal et al. 2014). Interviewer experience has shown mixed effects. The amount of time spent working as an interviewer overall (i.e., job tenure) is either unrelated to consent (Sakshaug et al. 2012; Sala et al. 2012) or can actually have a small negative impact (Sakshaug et al. 2013; Al Baghal et al. 2014). Interviewers' survey-specific experience, as measured by the number of interviews already completed prior to the current consent request, shows similar effects (Sakshaug et al. 2012; Sala et al. 2012). One aspect of interviewer experience that is positively related to consent in these studies is past performance in gaining respondent consent. Sala et al. (2012) and Sakshaug et al. (2012) found that the likelihood of consent increased with the number of consents obtained earlier in the field period. These authors also attempted to identify interviewer personality traits and attitudes that could affect respondent consent decisions but largely failed to find significant effects. The one exception was that respondent consent was positively related to interviewers' own willingness to consent to linkage (Sakshaug et al. 2012).
2.1.6 Survey design features. The way in which the consent requests are administered can impact linkage consent. Consent rates appear to be higher in face-to-face surveys than in phone surveys, though there are relatively few mode studies that have examined this phenomenon (Fulton 2012). Consent questions that ask respondents to provide personal identifiers (e.g., full or partial SSNs) as matching variables produce lower consent rates than those that do not (Bates 2005). This finding, and advancements in statistical matching techniques, prompted the Census Bureau to change its approach to gaining linkage consent in 2006, and it has since adopted a passive, opt-out consent procedure in which respondents are informed of the intent to link and consent is assumed unless respondents explicitly object (McNabb, Timmons, Song, and Puckett 2009). These implicit consent procedures (as they are sometimes called) result in higher consent rates than opt-in approaches, where respondents must affirmatively state their consent (Bates 2005; Pascale 2011). See also Das and Couper (2014).
Since most surveys employ opt-in formats, researchers have focused on the potential effects of the wording or framing of these questions. Consent framing experiments vary factors mentioned in the request that are thought to be persuasive to respondents, for example, highlighting the quality benefits associated with linkage, the reduction in survey collection costs, or the time savings for respondents. Evidence of framing effects in these studies is surprisingly weak, however. Bates, Wroblewski, and Pascale (2012) found that respondents reported more positive attitudes toward record linkage under cost- and time-savings frames, but the study did not measure actual linkage consent propensities. In Sakshaug and Kreuter (2014), a time-savings frame produced higher consent rates for web survey respondents than a neutrally worded consent question, but this is the only study in the literature to find significant question-framing effects (Pascale 2011; Sakshaug et al. 2013).
The timing of consent requests appears to have some influence on likelihood of consent. Although it is common practice to delay asking the most sensitive items, like linkage-consent requests, until near the end of the questionnaire, recent empirical evidence indicates that this may not be optimal. Sakshaug et al. (2013) found that respondents were more amenable to consent requests administered at the beginning of the survey than at the end, and suggest that the proximity of the survey and linkage requests may reinforce respondents' desire or inclination to be consistent (i.e., agree to both). This result could inform potentially promising adaptive design interventions (e.g., asking for consent early in the interview, then skipping subsequent burdensome questions for consenters). Sala et al. (2014) obtained higher consent rates when the request was asked immediately following a series of questions on a related topic rather than waiting until the end of the survey. The authors reasoned that contextual placement of the linkage question increases the salience of the request and induces more careful consideration by respondents. Both explanations find some support in the broader psychological literature on compliance, cognitive dissonance, and context effects, but further research is needed to evaluate these and other mechanisms (e.g., survey fatigue) underlying consent placement effects (Sala et al. 2014).
2.2 Analytic Approaches to Assessing Consent Bias
Early linkage consent studies simply looked for evidence of sample bias (i.e., differences in sample composition for consenters and non-consenters). Most recent studies use logistic regression models to identify factors that influence consent and infer potential consent bias (e.g., differential consent to medical-records linkage by respondent health status). Several studies have employed multi-level models to assess the impact of interviewers on consent propensity (e.g., Fulton 2012; Sala et al. 2012; Mostafa 2015), and others have jointly modeled respondents' consent propensities on multiple consent items in a given survey (e.g., Mostafa 2015). Studies that have examined direct estimates of consent bias using administrative records available for both consenters and non-consenters are much less common in the literature. Recent studies by Sakshaug and his colleagues are notable exceptions (Sakshaug and Kreuter 2012; Sakshaug and Huber 2016). Using administrative data linked to German panel survey data, they have compared the magnitude of consent biases to bias estimates for other error sources (nonresponse, measurement) and the longitudinal changes in these biases.
Given evidence of potential consent bias (e.g., differential consent by specific demographic groups), one promising approach that has not yet been explored in this literature is to adopt propensity weighting methods that are widely used in nonresponse adjustment. Traditionally, propensity-weighted nonresponse adjustments are accomplished by modeling response propensities using logistic regression and auxiliary data available for both respondents and nonrespondents, and then using the inverse of the modeled propensity as a weight adjustment factor (Little 1986). If the predicted propensity is unbiased, this adjustment method may reduce the potential bias due to nonresponse. Of course, bias reduction is predicated on correct model specification, and this may be particularly challenging in consent propensity applications given the absence of well-developed consent theories and inconsistency in the effects of many of the predictors of consent.
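The inverse-propensity adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single binary auxiliary variable, equal base weights, invented data, and a hand-written Newton-Raphson fit rather than any survey-software routine. The function names `fit_logistic` and `propensity` are hypothetical.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson (one predictor)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^(-1) times the score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def propensity(b0, b1, xi):
    """Modeled probability of responding (or consenting) given x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


# Hypothetical sample: x is a binary auxiliary variable known for everyone;
# y = 1 for units that respond (or consent), 0 otherwise.
x = [0] * 10 + [1] * 10
y = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5
base_weights = [1.0] * 20

b0, b1 = fit_logistic(x, y)
# Adjusted weight = base weight times the inverse of the modeled propensity,
# computed only for units with y = 1.
adjusted = [
    w / propensity(b0, b1, xi)
    for w, xi, yi in zip(base_weights, x, y)
    if yi == 1
]
```

In this toy setting the fitted propensities equal the observed rates in the two groups (0.8 and 0.5), so the adjusted weights of the 13 responding units sum back to the full sample size of 20. With real data the model would include many predictors and the base weights would be the survey design weights.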
Application of these ideas in the analysis of "consent-to-link" patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition
$$p_L(x; \beta_L) = p_C(x; \beta_C)\, p_{L|C}(x; \beta_{L|C}) \qquad (2.1)$$
where $p_L(x; \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x; \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x; \beta_{L|C})$ is the probability of successful linkage, conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that, to some degree, the first factor $p_C(x; \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x; \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x; \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
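As a purely numerical illustration of the decomposition above, using hypothetical values that are not estimates from any survey, suppose that a unit with predictor $x$ has a consent probability of 0.80 and a conditional linkage probability of 0.90. Then:

```latex
\[
p_L(x; \beta_L) \;=\; p_C(x; \beta_C)\, p_{L|C}(x; \beta_{L|C})
\;=\; 0.80 \times 0.90 \;=\; 0.72 .
\]
```

Under these assumed values, about 28 percent of such units would end up unlinked even though only 20 percent declined consent, which is why the two factors may need to be modeled separately.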
2.3 Options for Exploratory Analyses of Consent-to-Link
Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1), and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and "data cleaning" for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options.
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with "real world" production conditions (per the preceding literature results on interviewer characteristics), nor full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L|C}(x; \beta_{L|C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
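The kind of comparison described here can be sketched as follows. Because the survey variable is observed for every respondent, whether or not they consent, a full-sample estimate can serve as a benchmark for a consenter-only estimate and for a propensity-adjusted consenter estimate. The data, the propensities, and the function names below are hypothetical and chosen only so that the arithmetic is easy to follow; this is not the estimation code used in the paper.

```python
def weighted_mean(values, weights):
    """Weighted mean of values; weights must not sum to zero."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def compare_estimates(y, base_weights, consent, propensities):
    """Contrast full-sample, consenter-only, and propensity-adjusted means."""
    full = weighted_mean(y, base_weights)
    keep = [i for i, c in enumerate(consent) if c == 1]
    y_c = [y[i] for i in keep]
    unadjusted = weighted_mean(y_c, [base_weights[i] for i in keep])
    adjusted = weighted_mean(
        y_c, [base_weights[i] / propensities[i] for i in keep]
    )
    return {
        "full_sample": full,
        "consenters_unadjusted": unadjusted,
        "consenters_adjusted": adjusted,
        "bias_unadjusted": unadjusted - full,
        "bias_adjusted": adjusted - full,
    }


# Hypothetical data: low spenders (y = 100) consent at rate 0.8,
# high spenders (y = 200) consent at rate 0.4.
y = [100.0] * 5 + [200.0] * 5
consent = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
propensities = [0.8] * 5 + [0.4] * 5
base_weights = [1.0] * 10

estimates = compare_estimates(y, base_weights, consent, propensities)
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

In this contrived example the unadjusted consenter mean understates the full-sample mean (about 133.33 versus 150), while the adjusted mean recovers it, because the assumed propensities are exactly right. With estimated propensities the adjustment would generally remove only part of the difference.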
3 DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the US Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure survey, the US Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers' expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year's worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals (see footnote 1).
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5% (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011 BLS conducted research to explore the feasibility andpotential impacts of integrating administrative data with CEQ surveyresponses CEQ respondents who completed their final interview were askedwhether they would object to combining their survey answers with data fromother government agencies (Davis Elkin McBride and To 2013 Section III)Nearly 80 of respondents had no objection to linkage Although no actualdata linkage occurred we use this 2011 data to explore the extent to which pro-spective replacement of survey responses with administrative data could im-pact the quality of production estimates To do this we develop and compareconsent propensity models that incorporate demographic household environ-mental and attitudinal predictors suggested by the literature on linkage con-sent attitudes towards administrative record use and survey nonresponse We
1. In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
128 Yang Fricker and Eltinge
Downloaded from https://academic.oup.com/jssam/article-abstract/7/1/118/4689177 by guest on 19 February 2019
then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: “We’d like to produce additional statistical data, without taking up your time with more questions, by combining your survey answers with data from other government agencies. Do you have any objections?” Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a “don’t know” response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members’ propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
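For readers who prefer the model in symbols, a generic form of a consent propensity model is sketched below. The notation is ours and is not taken from the paper: $c_i$ is the 0/1 consent indicator for sample unit $i$, $\mathbf{x}_i$ is a vector of predictors, and $\boldsymbol{\beta}$ is the coefficient vector estimated with the survey weights.

```latex
\Pr(c_i = 1 \mid \mathbf{x}_i) = p_i, \qquad
\log\frac{p_i}{1 - p_i} = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \qquad
\hat{p}_i = \frac{1}{1 + \exp\!\left(-\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}\right)}
```

The fitted values $\hat{p}_i$ play the role of the estimated consent propensity scores used in the weighting adjustments.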
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS surveylogistic procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
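As a rough illustration of how a Fay-adjusted BRR standard error is assembled from replicate estimates, the sketch below applies the usual textbook formula, in which squared deviations of the replicate estimates from the full-sample estimate are summed and divided by R(1 - K)^2. The function names are hypothetical, and centering on the full-sample estimate is an assumption on our part; the exact convention used by the SAS procedure should be checked against its documentation.

```python
from math import sqrt


def fay_brr_variance(theta_full, theta_reps, fay_k=0.5):
    """Fay-adjusted BRR variance: sum((theta_r - theta)^2) / (R * (1 - K)^2).

    theta_full: estimate computed with the full-sample weights.
    theta_reps: estimates recomputed with each of the R replicate weights.
    fay_k:      Fay factor K (0.5 in the paper); K = 0 gives ordinary BRR.
    """
    r = len(theta_reps)
    if r == 0:
        raise ValueError("need at least one replicate estimate")
    if not 0.0 <= fay_k < 1.0:
        raise ValueError("Fay factor must lie in [0, 1)")
    sum_sq = sum((t - theta_full) ** 2 for t in theta_reps)
    return sum_sq / (r * (1.0 - fay_k) ** 2)


def fay_brr_se(theta_full, theta_reps, fay_k=0.5):
    """Standard error as the square root of the Fay BRR variance."""
    return sqrt(fay_brr_variance(theta_full, theta_reps, fay_k))
```

With R = 44 replicates and K = 0.5, the divisor is 44 x 0.25 = 11, so the variance is the sum of squared deviations divided by 11.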
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as “truth” and compare estimates from the full sample to those derived from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only, to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
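The rationale for inverse-propensity weighting can be sketched in one line. The notation is ours rather than the authors': $s$ is the set of respondents, $w_i$ the design weight, $c_i$ the consent indicator, and $p_i$ the true consent propensity. The key assumption, which the paper does not assert, is that consent is independent of $y_i$ given the predictors $\mathbf{x}_i$ in the propensity model.

```latex
\mathbb{E}_{c}\!\left[\,\sum_{i \in s} \frac{w_i\, c_i\, y_i}{p_i}\,\right]
  = \sum_{i \in s} \frac{w_i\, y_i}{p_i}\,\Pr(c_i = 1 \mid \mathbf{x}_i)
  = \sum_{i \in s} w_i\, y_i
```

Applying the same argument with $y_i$ replaced by 1 shows that the weighted consenter mean is a ratio of two terms whose expectations match the numerator and denominator of the full-sample weighted mean. In practice $p_i$ is replaced by a model-based estimate, so any bias reduction depends on how well the propensity model captures the consent mechanism.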
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180 versus 17.4% of consenters) but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older versus 19.9% for consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1 except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
4.2 Analysis of Economic Variables Subject to “Informed Consent” Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                                    All respondents    Consenters         Non-consenters
                                             (n = 4,893)        (n = 3,951)        (n = 942)
Privacy/anti-government concerns             6.9 (0.8)          4.2 (0.6)          18.1 (2.2)
Reluctance
  Converted refusal                          10.8 (0.7)         9.6 (0.8)          16.0 (1.5)
  Income imputation required                 44.7 (1.0)         41.8 (1.1)         57.0 (2.1)
  Too busy                                   8.1 (0.5)          6.9 (0.6)          12.7 (1.3)
  Respondent effort
    A lot of effort                          34.9 (0.9)         36.7 (1.0)         27.3 (1.9)
    Moderate effort                          37.2 (1.1)         37.4 (1.3)         36.4 (1.8)
    Bare minimum effort                      27.9 (1.1)         25.9 (1.2)         36.3 (2.3)
  Respondent cooperation
    Very cooperative                         50.8 (1.0)         53.5 (1.0)         39.4 (2.6)
    Somewhat cooperative                     33.1 (0.8)         33.1 (0.9)         33.1 (1.3)
    Neither cooperative nor uncooperative    5.4 (0.5)          4.5 (0.4)          9.2 (1.1)
    Somewhat uncooperative                   4.5 (0.3)          3.3 (0.3)          9.5 (1.1)
    Very uncooperative                       6.3 (0.4)          5.7 (0.5)          8.8 (0.9)
Rapport
  Face-to-face                               65.0 (1.4)         66.2 (1.5)         59.9 (1.9)
  Phone                                      35.0 (1.4)         33.8 (1.5)         40.1 (1.9)
Burden
  Total interview time (minutes)             64.9 (1.0)         65.3 (1.1)         63.3 (1.4)
  Household size                             2.42 (0.03)        2.42 (0.03)        2.42 (0.07)
  Total expenditures                         $11,454 ($148)     $11,511 ($145)     $11,218 ($384)
  Family income
    LT $8,180                                19.8 (0.8)         17.4 (0.9)         30.8 (1.9)
    $8,180–$24,000                           20.2 (0.6)         20.8 (0.9)         16.8 (1.4)
    $24,001–$46,000                          20.8 (0.6)         20.5 (0.7)         18.4 (1.3)
    $46,001–$85,855                          19.8 (0.6)         20.7 (0.7)         16.7 (1.3)
    GT $85,585                               19.4 (0.6)         20.6 (0.9)         17.3 (1.8)
  Owner                                      65.4 (0.8)         64.0 (0.9)         71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).
survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic                 All respondents    Consenters         Non-consenters
                               (n = 4,893)        (n = 3,951)        (n = 942)
Age group
  18–32                        20.0 (0.9)         21.5 (1.1)         13.8 (1.0)
  32–65                        59.2 (0.9)         58.6 (1.2)         61.6 (1.5)
  65+                          20.8 (0.7)         19.9 (0.8)         24.6 (1.5)
Gender
  Male                         46.3 (0.7)         46.8 (0.8)         44.3 (1.7)
  Female                       53.7 (0.7)         53.2 (0.8)         55.7 (1.7)
Race
  White                        83.1 (0.9)         83.0 (1.0)         83.1 (1.1)
  Non-white                    16.9 (0.9)         17.0 (1.0)         16.9 (1.1)
Education group
  Less than HS                 13.5 (0.6)         13.4 (0.7)         14.0 (1.6)
  HS graduate                  24.7 (0.8)         24.5 (0.9)         25.3 (1.8)
  Some college                 21.4 (0.7)         21.2 (0.9)         22.2 (1.9)
  Associate’s degree           10.2 (0.4)         10.0 (0.5)         10.9 (1.1)
  Bachelor’s degree            19.1 (0.5)         19.4 (0.6)         17.9 (1.4)
  Advanced degree              11.1 (0.5)         11.5 (0.6)         9.7 (0.9)
Spanish language interview
  Yes                          4.1 (0.5)          3.7 (0.5)          5.4 (1.5)
  No                           95.9 (0.5)         96.3 (0.5)         94.6 (1.5)
Region
  Northeast                    18.5 (0.6)         17.6 (0.7)         22.2 (1.7)
  Midwest                      23.5 (1.3)         24.5 (1.4)         19.2 (2.3)
  South                        35.0 (1.3)         35.9 (1.5)         31.5 (2.7)
  West                         23.0 (1.5)         22.0 (1.7)         27.1 (1.8)
Urbanicity
  Urban                        80.5 (0.9)         80.5 (1.2)         80.7 (2.0)
  Rural                        19.5 (0.9)         19.5 (1.2)         19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

                                   Model 1                Model 2
Variable                           Estimate    SE         Estimate    SE
Demographic characteristics
Age group (32–65)
  18–32                            0.3435      0.0633     0.3389      0.0634
  65+                              0.2692      0.0645     0.2532      0.0656
Gender (Male)
  Female                           0.0161      0.0426     0.0229      0.0421
Race (White)
  Non-white                        0.0949      0.1264     0.0612      0.1260
Spanish interview (No)
  Yes                              0.4234      0.2923     0.4452      0.2790
Education group (HS grad)
  Less than HS                     0.3097      0.1879     0.2926      0.1835
  Some college                     0.4016      0.1423     0.3739      0.1389
  Associate’s degree               0.3321      0.2041     0.3150      0.1993
  Bachelor’s degree                0.0654      0.2112     0.0663      0.2082
  Advanced degree                  0.1747      0.2762     0.1512      0.2773
Home owner (Renter)
  Owner                            0.2767†     0.1453     0.3233      0.1436
Total expenditures (Log)           0.0605      0.0799     0.0083      0.0800
Income group
  Less than $8,181                 0.2155      0.0604
Income imputed (No)
  Yes                              0.1487      0.0513
Race × Gender                      0.1874†     0.0956     0.1953      0.0923
Owner × Education
  Less than HS                     0.4931      0.2066     0.4632      0.1996
  Some college                     0.6575      0.1567     0.6370      0.1613
  Associate’s degree               0.3407†     0.2013     0.3402†     0.1986
  Bachelor’s degree                0.1385      0.2402     0.1398      0.2368
  Advanced degree                  0.4713      0.2847     0.4226      0.2870
Environmental features
Region (Northeast)
  Midwest                          0.2097†     0.1234     0.2111†     0.1163
  South                            0.2451      0.1190     0.2408      0.1187
  West                             0.2670      0.1055     0.2557      0.1066
Urbanicity (Rural)
  Urban                            0.0268      0.0829     0.0234      0.0824
R attitude proxies
Converted refusal (No)
  Yes                              0.0721      0.0756     0.0838      0.0763

Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted; see footnote 2).
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / \hat{p}_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $\hat{p}_{\mathrm{decile}}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / \hat{p}_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $\hat{p}_{\mathrm{quintile}}$ is the unweighted propensity of “consent” units in the specified quintile group).
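The estimators defined above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' production code: the function names are hypothetical, the inputs are simple lists, and the class-level propensity is taken to be the unweighted share of consenters within each propensity class, which is one reading of the definition given for the decile and quintile adjustments. Setting n_classes to 10 mimics CUPDA and setting it to 5 mimics CUPQA.

```python
def weighted_mean(y, w):
    """Weighted mean sum(w * y) / sum(w)."""
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(wi * yi for wi, yi in zip(w, y)) / total


def propensity_classes(p_hat, n_classes):
    """Rank units by estimated propensity and cut them into near-equal groups."""
    order = sorted(range(len(p_hat)), key=lambda i: p_hat[i])
    classes = [0] * len(p_hat)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(p_hat)
    return classes


def consent_estimators(y, w, consent, p_hat, n_classes=10):
    """Return FS, CUNA, CUPA and a class-adjusted mean for one outcome variable.

    y: outcome values; w: design weights; consent: 0/1 indicators;
    p_hat: estimated consent propensities (all > 0).
    """
    fs = weighted_mean(y, w)                      # all respondents
    cons = [i for i, c in enumerate(consent) if c]
    y_c = [y[i] for i in cons]
    cuna = weighted_mean(y_c, [w[i] for i in cons])             # no adjustment
    cupa = weighted_mean(y_c, [w[i] / p_hat[i] for i in cons])  # unit-level 1/p

    # Assumption: class propensity = unweighted consent rate within the class.
    classes = propensity_classes(p_hat, n_classes)
    size, n_consent = {}, {}
    for g, c in zip(classes, consent):
        size[g] = size.get(g, 0) + 1
        n_consent[g] = n_consent.get(g, 0) + (1 if c else 0)
    class_adj = weighted_mean(
        y_c, [w[i] * size[classes[i]] / n_consent[classes[i]] for i in cons]
    )
    return {"FS": fs, "CUNA": cuna, "CUPA": cupa, "CLASS_ADJ": class_adj}
```

Class-based adjustment trades some bias reduction for stability: extreme unit-level propensities are averaged within a class, so no single consenter receives a very large weight.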
Table 3. Continued

                                   Model 1                Model 2
Variable                           Estimate    SE         Estimate    SE
Effort (Moderate)
  A lot of effort                  0.5454      0.2081     0.6142      0.2065
  Bare minimum effort              0.4879      0.1365     0.5721      0.1371
Doorstep concerns (None)
  Too busy                         0.2207      0.1485     0.2137      0.1466
  Privacy/gov’t concerns           1.1895      0.1667     1.2241      0.1622
  Other                            1.0547      0.3261     1.0255      0.3255
Doorstep concerns × effort
  Privacy × a lot of effort        0.5884      0.2789     0.5312†     0.2728
  Privacy × minimum effort         0.3183      0.1918     0.2601      0.1924
  Busy × a lot of effort           0.0888      0.2736     0.0890      0.2749
  Busy × minimum effort            0.2238      0.2096     0.2277      0.2095
  Other × a lot of effort          1.0794†     0.6224     1.0477      0.6182
  Other × minimum effort           0.9002      0.3610     0.8777      0.3593

NOTE: † p < 0.10; p < 0.05; p < 0.01; p < 0.001.
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link, $\hat{p}_i$, for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
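To make the phrase "relative to the corresponding standard errors" concrete, the short sketch below divides each estimated difference by its standard error for the family income before taxes row. The dollar figures are the values as we read them from table 4, and the helper name is hypothetical. The ratio is only a rough screening device: a formal test would also need the reference distribution implied by the BRR design, which is not reproduced here.

```python
def difference_ratios(diffs, ses):
    """Return difference / SE for each labelled comparison."""
    return {label: diffs[label] / ses[label] for label in diffs}


# Family income before taxes (dollars), as read from table 4, columns 7-10.
income_diff = {
    "CUNA-FS": 1930.00,
    "CUPA-FS": 1178.00,
    "CUPDA-FS": 1330.00,
    "CUPQA-FS": 1466.00,
}
income_se = {
    "CUNA-FS": 580.98,
    "CUPA-FS": 619.09,
    "CUPDA-FS": 602.64,
    "CUPQA-FS": 606.76,
}

ratios = difference_ratios(income_diff, income_se)
for label, value in ratios.items():
    print(f"{label}: {value:.2f}")
```

The unadjusted comparison yields a ratio of about 3.3, while the propensity-adjusted comparison yields about 1.9, which matches the description of the adjusted difference as reduced and only marginally significant.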
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

Column numbers are shown in brackets; column 1 is the variable name. FS = full-sample mean; CUNA = consenting units, no adjustment; CUPA = consenting units, propensity-adjusted; CUPDA = consenting units, propensity-decile adjusted; CUPQA = consenting units, propensity-quintile adjusted.

Means (SE)
Variable [1]                         FS [2]          CUNA [3]        CUPA [4]        CUPDA [5]       CUPQA [6]
Family income before taxes           $50,939.00      $52,869.00      $52,117.00      $52,269.00      $52,405.00
                                     ($1,227.51)     ($1,535.04)     ($1,523.85)     ($1,520.74)     ($1,514.30)
Family income before taxes           $61,207.00      $61,405.00      $61,228.00      $61,337.00      $61,382.00
  with imputed values                ($1,193.34)     ($1,510.55)     ($1,454.39)     ($1,463.34)     ($1,466.25)
Vehicle purchase cost                $599.59         $619.14         $607.80         $607.63         $611.75
                                     ($33.22)        ($37.05)        ($36.63)        ($36.67)        ($37.61)
Property taxes                       $454.15         $429.12         $434.74         $434.88         $436.75
                                     ($10.41)        ($10.76)        ($11.39)        ($11.37)        ($11.30)
Property value                       $232,226.00     $227,734.00     $227,938.00     $227,935.00     $228,640.00
                                     ($5,055.68)     ($5,730.38)     ($5,592.27)     ($5,532.64)     ($5,519.90)
Rental value                         $1,296.52       $1,269.65       $1,272.40       $1,272.91       $1,274.67
                                     ($15.58)        ($17.05)        ($17.81)        ($17.86)        ($17.68)

Point estimate differences (SE diff)
Variable [1]                         CUNA–FS [7]     CUPA–FS [8]     CUPDA–FS [9]    CUPQA–FS [10]
Family income before taxes           $1,930.00       $1,178.00†      $1,330.00       $1,466.00
                                     ($580.98)       ($619.09)       ($602.64)       ($606.76)
Family income before taxes           $198.00         $21.00          $130.00         $175.00
  with imputed values                ($573.46)       ($563.54)       ($557.10)       ($552.20)
Vehicle purchase cost                $19.55†         $8.21           $8.04           $12.17
                                     ($10.83)        ($15.34)        ($15.23)        ($16.13)
Property taxes                       −$25.02         −$19.41         −$19.27         −$17.40
                                     ($6.08)         ($6.48)         ($6.05)         ($6.23)
Property value                       −$4,492.00      −$4,288.00†     −$4,291.00      −$3,586.00
                                     ($2,118.39)     ($2,165.93)     ($2,132.40)     ($2,063.66)
Rental value                         −$26.87         −$24.12         −$23.61         −$21.85
                                     ($7.43)         ($8.18)         ($8.01)         ($8.02)

NOTE: † p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
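The quantile-quantile comparison described above can be sketched with the Python standard library. This is a simplified illustration, not the authors' procedure: the function names are hypothetical, and the quantiles here are unweighted, whereas survey-weighted quantiles would be the natural choice for production estimates.

```python
from statistics import quantiles


def qq_points(scores_object, scores_consent):
    """Pair the 0.01, 0.02, ..., 0.99 quantiles of two sets of propensity scores.

    Returns 99 (x, y) pairs: x from the "object" group (horizontal axis),
    y from the "consent" group (vertical axis).
    """
    qx = quantiles(scores_object, n=100, method="inclusive")
    qy = quantiles(scores_consent, n=100, method="inclusive")
    return list(zip(qx, qy))


def max_gap_from_diagonal(points):
    """Largest vertical distance between a plotted point and the line y = x."""
    return max(abs(y - x) for x, y in points)
```

Points lying on the diagonal (a gap of zero) indicate that the two groups share the same score distribution, which is the "no discriminating power" case described in the text.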
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

                   Estimated propensity of consent
Percentile (%)     Consent subpopulation     Object subpopulation
70                 0.84                      0.88
80                 0.86                      0.89
90                 0.88                      0.92
99                 0.93                      0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
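A minimal sketch of the quantile-difference calculation follows. It assumes a simple step-function definition of the weighted quantile (the smallest value whose cumulative weight share reaches q), which may differ from the estimator used for the published figures; the function names are hypothetical, and the pointwise confidence intervals, which would require replicate weights, are not computed.

```python
def weighted_quantile(y, w, q):
    """Smallest y whose cumulative share of total weight is at least q."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    pairs = sorted(zip(y, w))
    total = sum(wi for _, wi in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    target = q * total
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= target:
            return yi
    return pairs[-1][0]  # guards against floating-point shortfall


def quantile_differences(y_adj, w_adj, y_full, w_full, qs):
    """Adjusted-consenter quantile minus full-sample quantile at each q.

    y_adj, w_adj:   outcomes and propensity-adjusted weights for consenters.
    y_full, w_full: outcomes and design weights for all respondents.
    """
    return [
        (q, weighted_quantile(y_adj, w_adj, q) - weighted_quantile(y_full, w_full, q))
        for q in qs
    ]
```

Calling quantile_differences with qs = [k / 100 for k in range(1, 100)] gives the 99 differences corresponding to the 0.01 to 0.99 grid used in the text.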
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require that sampled survey units provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent) as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A.1 Predictor Variables Examined

| Variable | Description |
| --- | --- |
| Respondent sociodemographics | |
| Age | 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65 |
| Gender | 1 if male (reference group); 2 if female |
| Race | 0 if white (reference group); 1 if non-white |
| Education | 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s |
| Language | 1 if interview language was Spanish; 0 if English (reference group) |
| Household size | 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH |
| Household type | 0 if renter (reference group); 1 if owner |
| Family income | 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180 |
| Total spending (log) | Log of total expenditures reported for all major expenditure categories |
| Survey features | |
| Mode | 1 if telephone; 0 if personal visit (reference group) |
| Area characteristics | |
| Region | 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West |
| Urban status | 1 if urban; 2 if rural (reference group) |
| Respondent attitude and effort | |
| Converted refusal status | 1 if ever a converted refusal during previous interviews; 2 if not (reference group) |
| Interview duration | 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group) |
| Cooperation | 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative” |
| Effort | 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum” |
| Effort change | 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure |
| Use of information booklet | 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group) |
| Doorstep concerns | 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other” |
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

| Respondent “doorstep” concerns | Assigned category |
| --- | --- |
| Privacy concerns | Privacy/government concerns |
| Anti-government concerns | Privacy/government concerns |
| Too busy | Too busy/logistical concerns |
| Interview takes too much time | Too busy/logistical concerns |
| Breaks appointments (puts off FR indefinitely) | Too busy/logistical concerns |
| Not interested/Does not want to be bothered | Too busy/logistical concerns |
| Too many interviews | Too busy/logistical concerns |
| Scheduling difficulties | Too busy/logistical concerns |
| Family issues | Too busy/logistical concerns |
| Last interview took too long | Too busy/logistical concerns |
| Survey is voluntary | Other reluctance |
| Does not understand/Asks questions about survey | Other reluctance |
| Survey content does not apply | Other reluctance |
| Hang-up/slams door on FR | Other reluctance |
| Hostile or threatens FR | Other reluctance |
| Other HH members tell R not to participate | Other reluctance |
| Talk only to specific household member | Other reluctance |
| Respondent requests same FR as last time | Other reluctance |
| Gave that information last time | Other reluctance |
| Asked too many personal questions last time | Other reluctance |
| Intends to quit survey | Other reluctance |
| Other (specify) | Other reluctance |
| No concerns | No concerns |

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / \hat{p}_i$ for the sample units with consent, where $\hat{p}_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV_{wt}^2$, where $CV_{wt}^2$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV_{wt}^2$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

| Statistic | Weights from full sample | Unadjusted weights for sample units with consent | Propensity-adjusted weights for sample units with consent | Propensity-decile adjusted weights for sample units with consent | Propensity-quintile adjusted weights for sample units with consent |
| --- | --- | --- | --- | --- | --- |
| $n$ | 4893 | 3951 | 3915 | 3915 | 3915 |
| $1 + CV_{wt}^2$ | 1.13 | 1.13 | 1.16 | 1.16 | 1.15 |
| $IQR / q_{0.5}$ | 0.875 | 0.877 | 0.858 | 0.853 | 0.851 |
| $q_{0.75} / q_{0.25}$ | 1.357 | 1.359 | 1.448 | 1.463 | 1.468 |
| $q_{0.90} / q_{0.10}$ | 1.970 | 1.966 | 2.148 | 2.178 | 2.124 |
| $q_{0.95} / q_{0.05}$ | 2.699 | 2.665 | 3.028 | 2.961 | 2.898 |
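The summaries in table 6 can be sketched in a few lines of code. The following is a minimal illustration only, not the authors' implementation: the function names are hypothetical, the quantiles are unweighted and use linear interpolation, and the squared coefficient of variation uses the population variance. All three choices are assumptions, since the paper does not specify them.

```python
from statistics import fmean, pvariance


def quantile(sorted_values, p):
    """Quantile of an ascending list by linear interpolation, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def propensity_adjusted_weights(design_weights, consent_propensities):
    """Return w_i / p_i for each consenting unit (hypothetical helper)."""
    return [w / p for w, p in zip(design_weights, consent_propensities)]


def weight_variability(weights):
    """Summaries of weight heterogeneity in the spirit of table 6."""
    w = sorted(weights)
    mean = fmean(w)
    # Assumption: population variance is used for the squared CV.
    cv_squared = pvariance(w, mu=mean) / mean ** 2
    q = {p: quantile(w, p) for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)}
    return {
        "n": len(w),
        "one_plus_cv2": 1.0 + cv_squared,
        "iqr_over_median": (q[0.75] - q[0.25]) / q[0.50],
        "q75_over_q25": q[0.75] / q[0.25],
        "q90_over_q10": q[0.90] / q[0.10],
        "q95_over_q05": q[0.95] / q[0.05],
    }
```

Under these assumptions, one would call `weight_variability` once on the design weights and once on the output of `propensity_adjusted_weights`, then compare the two sets of summaries column by column, as table 6 does.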
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
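The variance estimator above translates directly into code. The sketch below is illustrative only: the function names are hypothetical, and it assumes the full-sample and replicate estimates have already been computed elsewhere (for example, by refitting the model under each set of replicate weights).

```python
def fay_brr_covariance(full_estimate, replicate_estimates, fay_factor=0.5):
    """Fay-adjusted BRR covariance matrix of a coefficient vector.

    full_estimate: list of p coefficients from the full-sample weights.
    replicate_estimates: list of R lists, one per set of replicate weights.
    """
    n_replicates = len(replicate_estimates)
    n_coef = len(full_estimate)
    scale = 1.0 / (n_replicates * (1.0 - fay_factor) ** 2)
    covariance = [[0.0] * n_coef for _ in range(n_coef)]
    for replicate in replicate_estimates:
        # Deviation of the replicate estimate from the full-sample estimate.
        deviation = [b_r - b for b_r, b in zip(replicate, full_estimate)]
        for i in range(n_coef):
            for j in range(n_coef):
                covariance[i][j] += scale * deviation[i] * deviation[j]
    return covariance


def standard_errors(covariance):
    """Square roots of the diagonal of a covariance matrix."""
    return [covariance[i][i] ** 0.5 for i in range(len(covariance))]
```

With $R = 44$ and $K = 0.5$ the scale factor is $1 / (44 \times 0.25) = 1/11$. Setting `fay_factor=0.0` gives the scale factor $1/R$, that is, the same expression without the Fay adjustment.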
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
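As a quick check on the reference distribution, substituting the stated values of $f$ and $g$ into the degrees-of-freedom expressions gives the following (a worked substitution only; no additional assumptions are introduced):

$$g - 1 = 10 - 1 = 9, \qquad f - g + 2 = 43 - 10 + 2 = 35,$$

so the approximate reference distribution is a central $F(9, 35)$, which agrees with the degrees of freedom shown in the header of table 9.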
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

| Model | F-adjusted goodness-of-fit test p-value |
| --- | --- |
| 1 | 0.292 |
| 2 | 0.810 |
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table 8 body: Estimate and SE for each of Model 1, Model 2, and Models A through E; rows listed below, with reference categories in parentheses.]

Demographic characteristics
- Age group (32–65): 18–32; 65+
- Gender (Male): Female
- Race (White): Non-white
- Spanish interview (No): Yes
- Education group (HS grad): Less than HS; Some college; Associate’s degree; Bachelor’s degree; Advance degree
- Homeowner (Renter): Owner
- Total expenditures (Log)
- Income group: Less than $8,181
- Income imputed (No): Yes
- Race × gender
- Owner × education: Less than HS; Some college; Associate’s degree; Bachelor’s degree; Advance degree

Environmental features
- Region (Northeast): Midwest; South; West
- Urbanicity (Rural): Urban

R attitude proxies
- Converted refusal (No): Yes
- Effort (Moderate): A lot of effort; Bare minimum effort
- Doorstep concerns (None): Too busy; Privacy/gov’t concerns; Other
- Doorstep concerns × effort: Privacy × a lot of effort; Privacy × minimum effort; Busy × a lot of effort; Busy × minimum effort; Other × a lot of effort; Other × minimum effort
- Rapport (Phone): Personal Visit

NOTE: † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic: Model of Consent/Object Comparison

| Model (of consent), 2nd revision | F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35)) |
| --- | --- |
| 1 | 0.292 (1.262) |
| 2 | 0.810 (0.573) |
| A | 0.336 (1.183) |
| B | 0.929 (0.396) |
| C | 0.061 (2.061) |
| D | 0.022 (2.576) |
| E | 0.968 (0.305) |
estimates from consenting and non-consenting respondents. We contrast several estimation approaches and discuss implications of our findings for consent-propensity assessments and for approaches to minimize risks of consent-to-link bias.
KEYWORDS: Administrative Record Data; Burden Reduction; Incomplete Data; Informed Consent; Propensity Model.
1. INTRODUCTION

Survey organizations face many challenges in their efforts to produce high-quality survey data. The costs of data collection and the demand for data products are greater than ever, and survey budgets often are under serious strain to meet these demands. Declining survey response rates further complicate cost and data-quality considerations. Given these challenges, survey organizations increasingly are exploring the possibility of linking survey data to administrative records. Combining survey and administrative data on the same sample unit has the potential to reduce the cost, length, and perceived burden of a survey; enrich our understanding of the underlying substantive phenomena; and offer a mechanism for targeted assessments of survey error components.

Linking survey data to administrative data sources on the same individual or household requires matching records from one dataset to the other. The efficiency and success of this matching process depend on the variables and linkage strategy used to establish the link. Exact matching techniques are most successful when unique identifying information, such as a social security number (SSN), is available, but these techniques can also be effective in the absence of unique identifiers when combinations of other personal variables are compared (e.g., last name, date of birth, street name) (see Herzog, Scheuren, and Winkler 2007 for a review of statistical linkage techniques and related data-cleaning issues). Before any linkage attempt can be made, however, most countries require that survey respondents give their informed consent to link, and consent rates can vary considerably, from as low as 19% to as high as 96% (Sakshaug, Couper, Ofstedal, and Weir 2012). Lower consent rates are potentially a major challenge to wider adoption of record linkage in statistical agencies because they increase the risk of bias in estimates derived from combined data (to the extent that there are systematic differences in key outcome measures between those who consent to link and those who do not).
As interest in and adoption of record-linkage methods have increased, so too have investigations into factors associated with respondents’ consent decisions and their potential impact on consent bias. In general, consent-to-link phenomena can be viewed as a type of incomplete-data problem, and thus can make use of the broad spectrum of conceptual and methodological tools that have been developed for work with incomplete data. Examples include assessment of cognitive and social processes that lead to survey response (or consent to link); modeling and diagnostic tools for estimation and evaluation of related propensity models; and empirical assessment of biases resulting from the incomplete-data pattern.
To date, direct examinations of consent bias in estimates derived from combined data are extremely rare in the literature because they require researcher access to administrative records for both consenters and non-consenters (cf. Sakshaug and Kreuter 2012; Kreuter, Sakshaug, and Tourangeau 2016). Most studies of linkage consent therefore have used data from survey respondents (available for both consenters and non-consenters) to identify characteristics correlated with consent propensity. These studies provide an indirect means of assessing the potential risk of consent bias if the underlying consent propensities are related to differences in respondents’ administrative record profiles. Findings from this literature indicate that linkage consent is often associated with respondent demographics (e.g., age, education, income), indicators of survey reluctance (e.g., prior nonresponse in a panel survey), and features of the survey (e.g., wording, placement, and timing of consent requests), but the magnitude and direction of these effects vary across studies (Bates 2005; Dahlhamer and Cox 2007; Sala, Burton, and Knies 2012; Sakshaug, Tutz, and Kreuter 2013). Only very recently have researchers started to develop theory-based hypotheses about the mechanisms of consent decisions, and to incorporate more sophisticated analytic approaches to test these hypotheses (Sala et al. 2012; Sakshaug et al. 2012; Mostafa 2015).

Finally, empirical assessment of consent-to-link propensity patterns and related potential consent biases naturally involves a complex set of trade-offs involving (a) the degree to which a given set of test conditions is relevant to current or prospective production conditions; (b) the ability to control applicable design factors and to measure relevant covariates within the context of those current production conditions; (c) the ability to measure and model specific portions of the complex processes that lead to respondent consent and cooperation in a given setting; and (d) constraints on resources, including both the direct costs of testing consent-to-link options and the indirect costs arising from the potential impact of testing on current survey production. The remainder of this paper considers some aspects of issues (a)–(d), with emphasis on exploratory analyses for one case study that was embedded within a current survey production process.
In the next section, we review the literature on consent decisions and consent bias. In section 3, we describe in detail a specific consent-to-link case involving the U.S. Consumer Expenditure Survey (CEQ), and then present consent propensity models and our evaluation methodology. The results of our descriptive and multivariate examinations of consent propensity and assessments of consent bias are presented in section 4. We summarize and discuss the implications of our findings in section 5. Appendix A provides detailed descriptions of the analytic variables. Appendix B presents technical results on the variability of weights used in the CEQ dataset. Appendix C discusses some features of the goodness-of-fit tests used for the models considered in section 4. In addition, an online Appendix D (Supplementary Materials) presents some related conceptual material and numerical results for hypothesis testing that led to the modeling results summarized in section 4.
2. LITERATURE REVIEW
2.1 Factors Affecting Linkage Consent
The earliest investigations of consent-to-linkage effects were mostly conducted in epidemiology and health studies that requested patients' consent to access their medical records (Woolf, Rothemich, Johnson, and Marsland 2000; Dunn, Jordan, Lacey, Shapley, and Jinks 2004; Kho et al. 2009), but more recent studies have assessed consent in general population surveys with linkage requests to an array of administrative data sources (Knies, Burton, and Sala 2012; Sala et al. 2012; Sakshaug et al. 2012, 2013). In this section, we summarize findings from this research on the factors that affect consent to data linkage.
2.1.1 Respondent demographics. Linkage consent studies largely have focused on respondents' sociodemographic characteristics, most often age, gender, ethnicity, education, and income (Kho et al. 2009; Fulton 2012). These variables are widely available across surveys, and although they are unlikely to have direct causal impact on most consent decisions, they provide indirect measures of psychosociological factors that may influence those choices. Demographic differences between consenters and non-consenters are common, but the patterns of findings differ across studies. For example, older individuals frequently have been found to be less likely to consent to record linkage than younger people (Dunn et al. 2004; Bates 2005; Dahlhamer and Cox 2007; Huang, Shih, Chang, and Chou 2007; Pascale 2011; Sala et al. 2012; Al Baghal, Knies, and Burton 2014). But some studies have found the opposite effect (Woolf et al. 2000; Beebe et al. 2011) or no age effect (Kho et al. 2009). Males often consent at higher rates than females (Dunn et al. 2004; Bates 2005; Woolf et al. 2000; Sala et al. 2012; Al Baghal et al. 2014), but some studies find no gender effect (Pascale 2011; Sakshaug et al. 2012). Consent propensities for ethnic minorities and non-citizens tend to be lower than for majority groups and citizens (Woolf et al. 2000; Beebe et al. 2011; Al Baghal et al. 2014; Mostafa 2015), although not all studies show these effects (Bates 2005; Kho et al. 2009). Similar inconsistencies are evident across studies for the effects of education and income on consent (Kho et al. 2009; Fulton 2012; Sakshaug et al. 2012; Sala et al. 2012). Other respondent demographic and household characteristics have been examined less frequently (e.g., marital status, employment status, household size, and owner/renter), again with mixed findings (Olson 1999; Jenkins, Cappellari, Lynn, Jackle, and Sala 2006; Al Baghal et al. 2014; Mostafa 2015).
2.1.2 Respondent attitudes. Attitudes can have a powerful impact on thought and behavior, and there is a long history of survey researchers attempting to measure respondents' attitudes and their impact on various survey outcomes (e.g., Goyder 1986). Particular attention has been given to respondents' attitudes about privacy and confidentiality. Research conducted by the US Census Bureau going back to the 1990s demonstrates that concerns about personal privacy and data confidentiality have increased in the general public and that these attitudes are associated with lower participation rates in the decennial census (Singer 1993, 2003) and more negative attitudes toward the use of administrative records (Singer, Bates, and Van Hoewyk 2011). Privacy and confidentiality concerns can influence record linkage consent as well. Both direct measures of privacy concerns (respondent self-reports) and indirect indicators (item refusals on financial questions) have been shown to be negatively associated with consent (Sakshaug et al. 2012; Sala, Knies, and Burton 2014; Mostafa 2015). Similarly, Sakshaug et al. (2012) demonstrated that the more confidentiality-related concerns respondents expressed to interviewers in a previous survey wave, the less likely they were to subsequently consent to data linkage. And Sala et al. (2014) found that concern about data confidentiality was the most frequent reason given by respondents who declined a linkage request. There also is evidence that trust (in other people, in government) and civic engagement (volunteering, political involvement) are positively related to consent (Sala et al. 2012; Al Baghal et al. 2014).
2.1.3 Saliency. Respondents' interest in topics related to the record request, or their experiences with organizations that house those records, can also affect consent decisions. For example, a number of studies have found that respondents have a higher propensity to accept medical consent requests when they are in poorer health or have symptoms germane to the survey subject (Woolf et al. 2000; Dunn et al. 2004; Dahlhamer and Cox 2007; Beebe et al. 2011). One explanation for this finding is that consent requests on topics salient to respondents enhance the perceived benefits of record linkage (e.g., more comprehensive medical evaluation or the general advancement of knowledge about a disease relevant to the respondent) or reduce the perceived risks (e.g., by inducing more extensive cognitive processing of the request) (Groves, Singer, and Corning 2000). In addition to topic saliency, respondents' existing relationships with government agencies also can play a role in their consent decisions. Studies by Sala et al. (2012), Sakshaug et al. (2012), and Mostafa (2015), for example, found that individuals who received government benefits (e.g., welfare, food stamps, veterans' benefits) were more likely to consent to economic data linkage than those who did not. These results again suggest that the salience of (and attitudes toward) service-providing government agencies may make some respondents more amenable to linkage requests involving those agencies.
2.1.4 Socio-environmental features. The respondents' environments help to shape the context in which consent decisions are made, and a handful of studies have examined associations between area characteristics and attitudes toward use of administrative records and consent decisions. For example, Singer et al. (2011) found that individuals living in the South and Mid-Atlantic regions of the country had more favorable attitudes about administrative record use by the US Census Bureau than those living in other regions of the country. Studies of actual consent and linkage rates have demonstrated regional variations as well, with higher rates in the South and Midwest and lower rates in parts of the Northeast (e.g., Olson 1999; Dahlhamer and Cox 2007). Consent rates also can vary by urban status. Consistent with urbanicity effects seen in the literature on survey participation and pro-social behavior (Groves and Couper 1998; Mattis, Hammond, Grayman, Bonacci, Brennan, et al. 2009), respondents living in urban areas have been found to be less likely to consent than those living in non-urban areas (cf. Jenkins et al. 2006; Dahlhamer and Cox 2007; Al Baghal et al. 2014, who show a marginally significant positive effect for urbanicity). Together, such area effects may indicate the influence of underlying ecological factors within those communities (e.g., differences in population density, crime, social engagement) but may also reflect differences in survey operations (e.g., in staff, protocol, and training) clustered within those geographic areas. A recent study by Mostafa (2015) found that area characteristics by themselves added little explanatory power to models of consent propensity, suggesting that respondent and interview characteristics may be more important factors.
2.1.5 Interviewer characteristics. Interviewer attributes and behaviors can have significant impact on survey participation and data quality (O'Muircheartaigh and Campanelli 1999; West, Kreuter, and Jaenichen 2013), including linkage consent decisions. Studies investigating the impact of interviewer demographics generally find that they are unrelated to the consent outcome (Sakshaug et al. 2012; Sala et al. 2012), although there is some evidence of a positive effect of interviewer age on consent (Krobmacher and Schroeder 2013; Al Baghal et al. 2014). Interviewer experience has shown mixed effects. The amount of time spent working as an interviewer overall (i.e., job tenure) is either unrelated to consent (Sakshaug et al. 2012; Sala et al. 2012) or can actually have a small negative impact (Sakshaug et al. 2013; Al Baghal et al. 2014). Interviewers' survey-specific experience, as measured by the number of interviews already completed prior to the current consent request, shows similar effects (Sakshaug et al. 2012; Sala et al. 2012). One aspect of interviewer experience that is positively related to consent in these studies is past performance in gaining respondent consent. Sala et al. (2012) and Sakshaug et al. (2012) found that the likelihood of consent increased with the number of consents obtained earlier in the field period. These authors also attempted to identify interviewer personality traits and attitudes that could affect respondent consent decisions but largely failed to find significant effects. The one exception was that respondent consent was positively related to interviewers' own willingness to consent to linkage (Sakshaug et al. 2012).
2.1.6 Survey design features. The way in which the consent requests are administered can impact linkage consent. Consent rates appear to be higher in face-to-face surveys than in phone surveys, though there are relatively few mode studies that have examined this phenomenon (Fulton 2012). Consent questions that ask respondents to provide personal identifiers (e.g., full or partial SSNs) as matching variables produce lower consent rates than those that do not (Bates 2005). This finding and advancements in statistical matching techniques prompted the Census Bureau to change its approach to gaining linkage consent in 2006, and it has since adopted a passive, opt-out consent procedure in which respondents are informed of the intent to link and consent is assumed unless respondents explicitly object (McNabb, Timmons, Song, and Puckett 2009). These implicit consent procedures (as they are sometimes called) result in higher consent rates than opt-in approaches, where respondents must affirmatively state their consent (Bates 2005; Pascale 2011). See also Das and Couper (2014).
Since most surveys employ opt-in formats, researchers have focused on the potential effects of the wording or framing of these questions. Consent framing experiments vary factors mentioned in the request that are thought to be persuasive to respondents, for example, highlighting the quality benefits associated with linkage, the reduction in survey collection costs, or the time savings for respondents. Evidence of framing effects in these studies is surprisingly weak, however. Bates, Wroblewski, and Pascale (2012) found that respondents reported more positive attitudes toward record linkage under cost- and time-savings frames, but the study did not measure actual linkage consent propensities. In Sakshaug and Kreuter (2014), a time-savings frame produced higher consent rates for web survey respondents than a neutrally worded consent question, but this is the only study in the literature to find significant question-framing effects (Pascale 2011; Sakshaug et al. 2013).
The timing of consent requests appears to have some influence on likelihood of consent. Although it is common practice to delay asking the most sensitive items, like linkage-consent requests, until near the end of the questionnaire, recent empirical evidence indicates that this may not be optimal. Sakshaug et al. (2013) found that respondents were more amenable to consent requests administered at the beginning of the survey than at the end and suggest that the proximity of the survey and linkage requests may reinforce respondents' desire or inclination to be consistent (i.e., agree to both). This result could inform potentially promising adaptive design interventions (e.g., asking for consent early in the interview, then skipping subsequent burdensome questions for consenters). Sala et al. (2014) obtained higher consent rates when the request was asked immediately following a series of questions on a related topic rather than waiting until the end of the survey. The authors reasoned that contextual placement of the linkage question increases the salience of the request and induces more careful consideration by respondents. Both explanations find some support in the broader psychological literature on compliance, cognitive dissonance, and context effects, but further research is needed to evaluate these and other mechanisms (e.g., survey fatigue) underlying consent placement effects (Sala et al. 2014).
2.2 Analytic Approaches to Assessing Consent Bias
Early linkage consent studies simply looked for evidence of sample bias (i.e., differences in sample composition for consenters and non-consenters). Most recent studies use logistic regression models to identify factors that influence consent and infer potential consent bias (e.g., differential consent to medical-records linkage by respondent health status). Several studies have employed multi-level models to assess the impact of interviewers on consent propensity (e.g., Fulton 2012; Sala et al. 2012; Mostafa 2015), and others have jointly modeled respondents' consent propensities on multiple consent items in a given survey (e.g., Mostafa 2015). Studies that have examined direct estimates of consent bias using administrative records available for both consenters and non-consenters are much less common in the literature. Recent studies by Sakshaug and his colleagues are notable exceptions (Sakshaug and Kreuter 2012; Sakshaug and Huber 2016). Using administrative data linked to German panel survey data, they have compared the magnitude of consent biases to bias estimates for other error sources (nonresponse, measurement) and the longitudinal changes in these biases.
Given evidence of potential consent bias (e.g., differential consent by specific demographic groups), one promising approach that has not yet been explored in this literature is to adopt propensity weighting methods that are widely used in nonresponse adjustment. Traditionally, propensity-weighted nonresponse adjustments are accomplished by modeling response propensities using logistic regression and auxiliary data available for both respondents and nonrespondents, and then using the inverse of the modeled propensity as a weight adjustment factor (Little 1986). If the predicted propensity is unbiased, this adjustment method may reduce the potential bias due to nonresponse. Of course, bias reduction is predicated on correct model specification, and this may be particularly challenging in consent propensity applications given the absence of well-developed consent theories and inconsistency in the effects of many of its predictors.
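As a rough illustration of the adjustment just described, one common form of inverse-propensity weighting can be written out as follows. The notation is introduced here only for illustration and is an assumption, not taken from Little (1986) or from later sections of this paper: $R$ denotes the set of responding units, $w_i$ the base weight, $x_i$ the auxiliary variables, $\hat{\beta}$ the fitted logistic regression coefficients, and $y_i$ a survey variable of interest.

```latex
\hat{p}_i = \frac{\exp(x_i^{\top}\hat{\beta})}{1 + \exp(x_i^{\top}\hat{\beta})},
\qquad
w_i^{\mathrm{adj}} = \frac{w_i}{\hat{p}_i},
\qquad
\hat{\bar{Y}}_{\mathrm{adj}} =
  \frac{\sum_{i \in R} w_i^{\mathrm{adj}}\, y_i}{\sum_{i \in R} w_i^{\mathrm{adj}}}
```

Under this form, units with low estimated propensity receive larger adjusted weights, so that respondents stand in for similar units that did not respond. The adjustment can only remove bias associated with the variables included in $x_i$.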
Application of these ideas in the analysis of "consent-to-link" patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition,
$$p_L(x, \beta_L) = p_C(x, \beta_C)\, p_{L|C}(x, \beta_{L|C}) \qquad (2.1)$$

where $p_L(x, \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x, \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x, \beta_{L|C})$ is the probability of successful linkage conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that, to some degree, the first factor $p_C(x, \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x, \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x, \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
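To make the decomposition concrete, the following minimal sketch multiplies a consent probability by a conditional linkage probability for a single unit. The function name and the numerical values are hypothetical and chosen only for illustration; they are not estimates from the CEQ or any other survey.

```python
def linkage_probability(p_consent: float, p_link_given_consent: float) -> float:
    """Overall probability of a successful link, following decomposition (2.1)."""
    for p in (p_consent, p_link_given_consent):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_consent * p_link_given_consent


# Hypothetical values for one unit (assumptions, not survey estimates).
p_c = 0.80          # probability of nominal consent, p_C
p_l_given_c = 0.90  # probability of a successful link given consent, p_{L|C}
p_l = linkage_probability(p_c, p_l_given_c)
print(f"{p_l:.2f}")
```

Even with a fairly high consent rate, imperfect conditional linkage lowers the share of units that end up with usable linked records, which is why the two factors are worth modeling separately.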
2.3 Options for Exploratory Analyses of Consent-to-Link
Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1) and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and "data cleaning" for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options.
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with "real world" production conditions (per the preceding literature results on interviewer characteristics) nor full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L|C}(x, \beta_{L|C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the US Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure survey, the US Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers' expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year's worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals (see footnote 1).
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample based on a stratified, clustered design, and approximately 7,100 usable interviews are conducted each calendar quarter. In 2011, the CEQ achieved a response rate of 71.5 percent (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80 percent of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.

1. In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: "We'd like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?" Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a "don't know" response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members' propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
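The general shape of a weighted consent propensity model can be sketched as below. This is a minimal illustration under stated assumptions, not the models reported in this paper: it uses a single hypothetical binary predictor (whether the respondent voiced a privacy concern), invented toy data and weights, and a hand-written Newton-Raphson fit of the weighted log-likelihood in plain Python. It produces point estimates only and does not attempt design-based standard errors.

```python
import math


def fit_weighted_logit(x, y, w, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson on the weighted log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)        # weighted score contribution
            v = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data: x = 1 if a privacy concern was voiced, y = 1 if consent given.
x = [0, 0, 0, 0, 0, 1, 1]
y = [1, 1, 1, 1, 0, 1, 0]
w = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0]  # illustrative survey weights

b0, b1 = fit_weighted_logit(x, y, w)
p_no_concern = 1.0 / (1.0 + math.exp(-b0))
p_concern = 1.0 / (1.0 + math.exp(-(b0 + b1)))
print(round(p_no_concern, 3), round(p_concern, 3))
```

Because this toy model is saturated (one binary predictor plus an intercept), the fitted propensities reproduce the weighted consent rates in each group, which gives a simple check on the fitting routine. A negative slope corresponds to lower consent propensity among respondents who voiced a concern.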
To account for the complex stratified sampling design of the CEQ, the analyses were conducted with the SAS surveylogistic procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
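The replication variance calculation can be sketched as follows, assuming that K is the usual Fay perturbation coefficient, so that the sum of squared deviations of the replicate estimates from the full-sample estimate is divided by R(1 - K)^2 for R replicates. The toy data below (two strata with two units each and four replicates rather than 44) and all function names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values under the given weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def fay_brr_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance: squared deviations summed, divided by R * (1 - K)^2."""
    r = len(replicate_estimates)
    squared_devs = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return squared_devs / (r * (1.0 - fay_k) ** 2)


# Hypothetical toy sample: units 0-1 form one stratum, units 2-3 another.
y = [10.0, 20.0, 30.0, 40.0]
full_weights = [1.0, 1.0, 1.0, 1.0]

# With K = 0.5, each replicate scales one unit per stratum by 1.5 and the other by 0.5.
replicate_weights = [
    [1.5, 0.5, 1.5, 0.5],
    [0.5, 1.5, 1.5, 0.5],
    [1.5, 0.5, 0.5, 1.5],
    [0.5, 1.5, 0.5, 1.5],
]

full_mean = weighted_mean(y, full_weights)
replicate_means = [weighted_mean(y, rw) for rw in replicate_weights]
variance = fay_brr_variance(full_mean, replicate_means, fay_k=0.5)
standard_error = math.sqrt(variance)
print(full_mean, variance)
```

In this small balanced example the replication variance agrees with the textbook paired-difference variance for a design with two units per stratum, which is a useful sanity check before applying the same function to a full set of replicate weights.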
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as "truth" and compare estimates from the full sample to those derived from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters-only to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work, we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
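The comparison described above can be sketched with three weighted means: one over the full sample, one over consenters only, and one over consenters with weights divided by the estimated consent propensity. The records, propensities, and variable names below are invented for illustration, and the toy data are constructed so that the propensities equal the true consent rates within two groups; real data would not behave this neatly.

```python
def weighted_mean(values, weights):
    """Weighted mean of values under the given weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical records: (reported value, base weight, consent indicator, estimated propensity).
records = [
    (8.0, 1.0, 1, 0.5), (12.0, 1.0, 1, 0.5), (9.0, 1.0, 0, 0.5), (11.0, 1.0, 0, 0.5),
    (28.0, 1.0, 1, 0.8), (32.0, 1.0, 1, 0.8), (30.0, 1.0, 1, 0.8),
    (30.0, 1.0, 1, 0.8), (30.0, 1.0, 0, 0.8),
]

# Full-sample estimate: all respondents, base weights.
full = weighted_mean([r[0] for r in records], [r[1] for r in records])

# Consenters-only estimate: base weights, no adjustment.
consenters = [r for r in records if r[2] == 1]
consent_only = weighted_mean([r[0] for r in consenters], [r[1] for r in consenters])

# Propensity-adjusted estimate: consenters weighted by base weight / propensity.
adjusted = weighted_mean(
    [r[0] for r in consenters],
    [r[1] / r[3] for r in consenters],
)

print(round(full, 2), round(consent_only, 2), round(adjusted, 2))
```

Here the unadjusted consenters-only mean overstates the full-sample mean because the low-value group consents less often, and the adjustment removes that gap by construction. With a misspecified propensity model, or with consenters who differ from non-consenters within propensity groups, some bias would remain.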
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1 percent versus 4.2 percent), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0 percent versus 9.6 percent) and income imputation (57.0 percent versus 41.8 percent), to express concerns about the time required by the survey (12.7 percent versus 6.9 percent), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1 percent versus 33.8 percent), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8 percent of non-consenters had family income under $8,180, versus 17.4 percent of consenters) but also had a higher proportion of home ownership than consenters (71.3 percent versus 64.0 percent).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6 percent of non-consenters were 65 or older, versus 19.9 percent for consenters) and were more likely than consenters to live in the Northeast (22.2 percent versus 17.6 percent) and in the West (27.1 percent versus 22.0 percent).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1, except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
4.2 Analysis of Economic Variables Subject to "Informed Consent" Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                                 All respondents    Consenters         Non-consenters
                                          (n = 4,893)        (n = 3,951)        (n = 942)
Privacy/anti-government concerns          6.9 (0.8)          4.2 (0.6)          18.1 (2.2)
Reluctance
  Converted refusal                       10.8 (0.7)         9.6 (0.8)          16.0 (1.5)
  Income imputation required              44.7 (1.0)         41.8 (1.1)         57.0 (2.1)
  Too busy                                8.1 (0.5)          6.9 (0.6)          12.7 (1.3)
Respondent effort
  A lot of effort                         34.9 (0.9)         36.7 (1.0)         27.3 (1.9)
  Moderate effort                         37.2 (1.1)         37.4 (1.3)         36.4 (1.8)
  Bare minimum effort                     27.9 (1.1)         25.9 (1.2)         36.3 (2.3)
Respondent cooperation
  Very cooperative                        50.8 (1.0)         53.5 (1.0)         39.4 (2.6)
  Somewhat cooperative                    33.1 (0.8)         33.1 (0.9)         33.1 (1.3)
  Neither cooperative nor uncooperative   5.4 (0.5)          4.5 (0.4)          9.2 (1.1)
  Somewhat uncooperative                  4.5 (0.3)          3.3 (0.3)          9.5 (1.1)
  Very uncooperative                      6.3 (0.4)          5.7 (0.5)          8.8 (0.9)
Rapport
  Face-to-face                            65.0 (1.4)         66.2 (1.5)         59.9 (1.9)
  Phone                                   35.0 (1.4)         33.8 (1.5)         40.1 (1.9)
Burden
  Total interview time                    64.9 (1.0)         65.3 (1.1)         63.3 (1.4)
  Household size                          2.42 (0.03)        2.42 (0.03)        2.42 (0.07)
  Total expenditures                      $11,454 ($148)     $11,511 ($145)     $11,218 ($384)
  Family income
    LT $8,180                             19.8 (0.8)         17.4 (0.9)         30.8 (1.9)
    $8,180–$24,000                        20.2 (0.6)         20.8 (0.9)         16.8 (1.4)
    $24,001–$46,000                       20.8 (0.6)         20.5 (0.7)         18.4 (1.3)
    $46,001–$85,855                       19.8 (0.6)         20.7 (0.7)         16.7 (1.3)
    GT $85,585                            19.4 (0.6)         20.6 (0.9)         17.3 (1.8)
  Owner                                   65.4 (0.8)         64.0 (0.9)         71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).
survey questions. For each of these variables, we computed three prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last-visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic                 All respondents    Consenters       Non-consenters
                               (n = 4,893)        (n = 3,951)      (n = 942)
Age group
  18–32                        20.0 (0.9)         21.5 (1.1)       13.8 (1.0)
  32–65                        59.2 (0.9)         58.6 (1.2)       61.6 (1.5)
  65+                          20.8 (0.7)         19.9 (0.8)       24.6 (1.5)
Gender
  Male                         46.3 (0.7)         46.8 (0.8)       44.3 (1.7)
  Female                       53.7 (0.7)         53.2 (0.8)       55.7 (1.7)
Race
  White                        83.1 (0.9)         83.0 (1.0)       83.1 (1.1)
  Non-white                    16.9 (0.9)         17.0 (1.0)       16.9 (1.1)
Education group
  Less than HS                 13.5 (0.6)         13.4 (0.7)       14.0 (1.6)
  HS graduate                  24.7 (0.8)         24.5 (0.9)       25.3 (1.8)
  Some college                 21.4 (0.7)         21.2 (0.9)       22.2 (1.9)
  Associates degree            10.2 (0.4)         10.0 (0.5)       10.9 (1.1)
  Bachelor's degree            19.1 (0.5)         19.4 (0.6)       17.9 (1.4)
  Advance degree               11.1 (0.5)         11.5 (0.6)       9.7 (0.9)
Spanish language interview
  Yes                          4.1 (0.5)          3.7 (0.5)        5.4 (1.5)
  No                           95.9 (0.5)         96.3 (0.5)       94.6 (1.5)
Region
  Northeast                    18.5 (0.6)         17.6 (0.7)       22.2 (1.7)
  Midwest                      23.5 (1.3)         24.5 (1.4)       19.2 (2.3)
  South                        35.0 (1.3)         35.9 (1.5)       31.5 (2.7)
  West                         23.0 (1.5)         22.0 (1.7)       27.1 (1.8)
Urbanicity
  Urban                        80.5 (0.9)         80.5 (1.2)       80.7 (2.0)
  Rural                        19.5 (0.9)         19.5 (1.2)       19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Variable                              Model 1                 Model 2
                                      Estimate     SE         Estimate     SE
Demographic characteristics
  Age group (32–65)
    18–32                             0.3435       0.0633     0.3389       0.0634
    65+                               0.2692       0.0645     0.2532       0.0656
  Gender (Male)
    Female                            0.0161       0.0426     0.0229       0.0421
  Race (White)
    Non-white                         0.0949       0.1264     0.0612       0.1260
  Spanish interview (No)
    Yes                               0.4234       0.2923     0.4452       0.2790
  Education group (HS grad)
    Less than HS                      0.3097       0.1879     0.2926       0.1835
    Some college                      0.4016       0.1423     0.3739       0.1389
    Associate's degree                0.3321       0.2041     0.3150       0.1993
    Bachelor's degree                 0.0654       0.2112     0.0663       0.2082
    Advance degree                    0.1747       0.2762     0.1512       0.2773
  Home owner (Renter)
    Owner                             0.2767†      0.1453     0.3233       0.1436
  Total expenditures (Log)            0.0605       0.0799     0.0083       0.0800
  Income group
    Less than $8,181                  0.2155       0.0604
  Income imputed (No)
    Yes                               0.1487       0.0513
  Race × Gender                       0.1874†      0.0956     0.1953       0.0923
  Owner × Education
    Less than HS                      0.4931       0.2066     0.4632       0.1996
    Some college                      0.6575       0.1567     0.6370       0.1613
    Associate's degree                0.3407†      0.2013     0.3402†      0.1986
    Bachelor's degree                 0.1385       0.2402     0.1398       0.2368
    Advance degree                    0.4713       0.2847     0.4226       0.2870
Environmental features
  Region (Northeast)
    Midwest                           0.2097†      0.1234     0.2111†      0.1163
    South                             0.2451       0.1190     0.2408       0.1187
    West                              0.2670       0.1055     0.2557       0.1066
  Urbanicity (Rural)
    Urban                             0.0268       0.0829     0.0234       0.0824
R attitude proxies
  Converted refusal (No)
    Yes                               0.0721       0.0756     0.0838       0.0763

(Continued)
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted).²
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / \hat{p}_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $\hat{p}_{\mathrm{decile}}$ is the unweighted propensity of "consent" units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / \hat{p}_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $\hat{p}_{\mathrm{quintile}}$ is the unweighted propensity of "consent" units in the specified quintile group).
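The estimators defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the class-level propensity is taken to be the unweighted mean of the estimated propensities among consenting units in each class, and classes are formed as equal-sized groups of the full sample ranked on the estimated propensity. The toy inputs are invented values, not CEQ data.

```python
from typing import Dict, List, Sequence


def weighted_mean(y: Sequence[float], w: Sequence[float]) -> float:
    """Ratio-type weighted mean: sum(w_i * y_i) / sum(w_i)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def class_propensity(p_hat: Sequence[float], consent: Sequence[bool],
                     n_classes: int) -> List[float]:
    """Give each unit the unweighted mean of p_hat among the consenting
    units in its propensity class (10 classes = deciles, 5 = quintiles).

    Assumption: classes are equal-sized groups of all sample units,
    formed by ranking the units on p_hat.
    """
    n = len(p_hat)
    order = sorted(range(n), key=lambda i: p_hat[i])
    class_of = [0] * n
    for rank, i in enumerate(order):
        class_of[i] = rank * n_classes // n
    total = [0.0] * n_classes
    count = [0] * n_classes
    for i in range(n):
        if consent[i]:
            total[class_of[i]] += p_hat[i]
            count[class_of[i]] += 1
    # Classes with no consenters get a placeholder of 1.0; such classes
    # never enter a consenter-only mean.
    class_mean = [t / c if c else 1.0 for t, c in zip(total, count)]
    return [class_mean[class_of[i]] for i in range(n)]


def consent_estimators(y: Sequence[float], w: Sequence[float],
                       p_hat: Sequence[float],
                       consent: Sequence[bool]) -> Dict[str, float]:
    """Return the FS, CUNA, CUPA, CUPDA, and CUPQA mean estimates."""
    idx = [i for i, c in enumerate(consent) if c]  # consenting units
    y_c = [y[i] for i in idx]
    p_dec = class_propensity(p_hat, consent, 10)
    p_qui = class_propensity(p_hat, consent, 5)
    return {
        "FS": weighted_mean(y, w),
        "CUNA": weighted_mean(y_c, [w[i] for i in idx]),
        "CUPA": weighted_mean(y_c, [w[i] / p_hat[i] for i in idx]),
        "CUPDA": weighted_mean(y_c, [w[i] / p_dec[i] for i in idx]),
        "CUPQA": weighted_mean(y_c, [w[i] / p_qui[i] for i in idx]),
    }


# Toy data (hypothetical values, not CEQ data).
est = consent_estimators(
    y=[10.0, 20.0, 30.0, 40.0],
    w=[1.0, 1.0, 1.0, 1.0],
    p_hat=[0.5, 0.5, 1.0, 1.0],
    consent=[True, False, True, True],
)
```

Dividing the design weight by an estimated propensity gives more weight to consenting units that resemble likely objectors; the decile and quintile versions replace unit-level propensities with class-level values, which tends to limit the influence of extreme individual propensity estimates.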
Table 3. Continued

Variable                              Model 1                 Model 2
                                      Estimate     SE         Estimate     SE
  Effort (Moderate)
    A lot of effort                   0.5454       0.2081     0.6142       0.2065
    Bare minimum effort               0.4879       0.1365     0.5721       0.1371
  Doorstep concerns (None)
    Too busy                          0.2207       0.1485     0.2137       0.1466
    Privacy/gov't concerns            1.1895       0.1667     1.2241       0.1622
    Other                             1.0547       0.3261     1.0255       0.3255
  Doorstep concerns × effort
    Privacy × a lot of effort         0.5884       0.2789     0.5312†      0.2728
    Privacy × minimum effort          0.3183       0.1918     0.2601       0.1924
    Busy × a lot of effort            0.0888       0.2736     0.0890       0.2749
    Busy × minimum effort             0.2238       0.2096     0.2277       0.2095
    Other × a lot of effort           1.0794†      0.6224     1.0477       0.6182
    Other × minimum effort            0.9002       0.3610     0.8777       0.3593

NOTE: †p < 0.10; p < 0.05; p < 0.01; p < 0.001.
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link, $\hat{p}_i$, for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that, even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
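To make "relative to the corresponding standard errors" concrete, the sketch below computes the ratio of a reported difference to its standard error for the family-income row of table 4 (CUNA–FS: $1,930.00 with SE $580.98; CUPA–FS: $1,178.00 with SE $619.09). The two-sided p-value uses a standard normal reference distribution, which is an assumption made here for illustration; the paper's tests may rely on design-based degrees of freedom, so the values are indicative only. The function name is hypothetical.

```python
import math
from typing import Tuple


def ratio_and_p(diff: float, se_diff: float) -> Tuple[float, float]:
    """Return diff/SE and a two-sided p-value under a normal approximation.

    Assumption: a standard normal reference distribution; a design-based
    t distribution would give somewhat larger p-values.
    """
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Family income before taxes, values as reported in table 4.
z_cuna, p_cuna = ratio_and_p(1930.00, 580.98)  # unadjusted consenters vs. full sample
z_cupa, p_cupa = ratio_and_p(1178.00, 619.09)  # propensity-adjusted vs. full sample

print(f"CUNA-FS: ratio = {z_cuna:.2f}, approximate p = {p_cuna:.4f}")
print(f"CUPA-FS: ratio = {z_cupa:.2f}, approximate p = {p_cupa:.4f}")
```

Under this approximation the unadjusted difference is more than three standard errors from zero, while the propensity-adjusted difference is a little under two, in line with the description of the adjusted income difference as reduced and only marginally significant.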
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the "objection" subpopulation (horizontal axis) and the "consent" subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the "object" and "consent" subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

Columns: (1) variable; (2) full-sample (FS) mean (SE); (3) consenting units, no adjustment (CUNA) mean (SE); (4) consenting units, propensity-adjusted (CUPA) mean (SE); (5) consenting units, propensity-decile adjusted (CUPDA) mean (SE); (6) consenting units, propensity-quintile adjusted (CUPQA) mean (SE); (7)–(10) point estimate differences (SE diff) for CUNA–FS, CUPA–FS, CUPDA–FS, and CUPQA–FS.

Means (SE), columns 2–6

Variable (1)                      FS (2)          CUNA (3)        CUPA (4)        CUPDA (5)       CUPQA (6)
Family income before taxes        $50,939.00      $52,869.00      $52,117.00      $52,269.00      $52,405.00
                                  ($1,227.51)     ($1,535.04)     ($1,523.85)     ($1,520.74)     ($1,514.30)
Family income before taxes        $61,207.00      $61,405.00      $61,228.00      $61,337.00      $61,382.00
  with imputed values             ($1,193.34)     ($1,510.55)     ($1,454.39)     ($1,463.34)     ($1,466.25)
Vehicle purchase cost             $599.59         $619.14         $607.80         $607.63         $611.75
                                  ($33.22)        ($37.05)        ($36.63)        ($36.67)        ($37.61)
Property taxes                    $454.15         $429.12         $434.74         $434.88         $436.75
                                  ($10.41)        ($10.76)        ($11.39)        ($11.37)        ($11.30)
Property value                    $232,226.00     $227,734.00     $227,938.00     $227,935.00     $228,640.00
                                  ($5,055.68)     ($5,730.38)     ($5,592.27)     ($5,532.64)     ($5,519.90)
Rental value                      $1,296.52       $1,269.65       $1,272.40       $1,272.91       $1,274.67
                                  ($15.58)        ($17.05)        ($17.81)        ($17.86)        ($17.68)

Point estimate differences (SE diff), columns 7–10

Variable (1)                      CUNA–FS (7)     CUPA–FS (8)     CUPDA–FS (9)    CUPQA–FS (10)
Family income before taxes        $1,930.00       $1,178.00†      $1,330.00       $1,466.00
                                  ($580.98)       ($619.09)       ($602.64)       ($606.76)
Family income before taxes        $198.00         $21.00          $130.00         $175.00
  with imputed values             ($573.46)       ($563.54)       ($557.10)       ($552.20)
Vehicle purchase cost             $19.55†         $8.21           $8.04           $12.17
                                  ($10.83)        ($15.34)        ($15.23)        ($16.13)
Property taxes                    -$25.02         -$19.41         -$19.27         -$17.40
                                  ($6.08)         ($6.48)         ($6.05)         ($6.23)
Property value                    -$4,492.00      -$4,288.00†     -$4,291.00      -$3,586.00
                                  ($2,118.39)     ($2,165.93)     ($2,132.40)     ($2,063.66)
Rental value                      -$26.87         -$24.12         -$23.61         -$21.85
                                  ($7.43)         ($8.18)         ($8.01)         ($8.02)

NOTE: †p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the "objection" and "consent" subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the "consent" and "object" subpopulations.
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the "Consent" Subpopulation (Vertical Axis) and "Object" Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
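The 99 plotted points of such a quantile-quantile comparison can be computed as in the following sketch. It assumes simple unweighted sample quantiles with linear interpolation, which may differ from the survey-weighted quantile estimates behind figure 1; the function names and the toy propensity scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def empirical_quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated sample quantile of pre-sorted values, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def qq_points(p_object: Sequence[float],
              p_consent: Sequence[float]) -> List[Tuple[float, float]]:
    """(x, y) pairs for the 0.01, 0.02, ..., 0.99 quantiles.

    x is the quantile of the "object" subpopulation (horizontal axis) and
    y the matching quantile of the "consent" subpopulation (vertical axis).
    """
    xs = sorted(p_object)
    ys = sorted(p_consent)
    return [(empirical_quantile(xs, k / 100), empirical_quantile(ys, k / 100))
            for k in range(1, 100)]


# Toy propensity scores (hypothetical values, not estimates from the paper).
obj = [0.60, 0.70, 0.75, 0.80, 0.85]
con = [0.70, 0.80, 0.85, 0.90, 0.95]
points = qq_points(obj, con)
```

Points lying on the 45-degree line indicate identical quantiles in the two subpopulations; the farther the points fall from that line, the more the two propensity distributions differ.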
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the "Consent" and "Object" Subpopulations

Percentile (%)     Estimated propensity of consent
                   Consent subpopulation     Object subpopulation
70                 0.84                      0.88
80                 0.86                      0.89
90                 0.88                      0.92
99                 0.93                      0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the "consent" subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
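The quantile differences displayed in figure 4 can be sketched as follows, using a weighted inverse-distribution-function quantile. This is an illustration under assumptions, not the authors' code: the quantile definition (smallest value whose cumulative weight share reaches q) is one common choice, the function names are hypothetical, the pointwise confidence intervals are not computed, and the toy data are invented.

```python
from typing import List, Sequence, Tuple


def weighted_quantile(y: Sequence[float], w: Sequence[float], q: float) -> float:
    """Smallest y whose cumulative weight share reaches q
    (inverse of the weighted empirical distribution function)."""
    pairs = sorted(zip(y, w))
    target = q * sum(w)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= target:
            return yi
    return pairs[-1][0]


def quantile_differences(y_full: Sequence[float], w_full: Sequence[float],
                         y_cons: Sequence[float],
                         w_adj: Sequence[float]) -> List[Tuple[float, float]]:
    """For q = 0.01, ..., 0.99, return (q, consenter quantile - full-sample quantile).

    w_adj holds the propensity-adjusted weights of the consenting units
    (for example w_i divided by the estimated consent propensity).
    """
    out = []
    for k in range(1, 100):
        q = k / 100
        diff = weighted_quantile(y_cons, w_adj, q) - weighted_quantile(y_full, w_full, q)
        out.append((q, diff))
    return out


# Toy data (hypothetical values, not CEQ data).
diffs = quantile_differences(
    y_full=[10.0, 20.0, 30.0, 40.0], w_full=[1.0, 1.0, 1.0, 1.0],
    y_cons=[10.0, 30.0, 40.0], w_adj=[2.0, 1.0, 1.0],
)
```

A plot of these differences against q that stays near zero at every quantile points to little consent bias across the distribution, whereas a systematic departure confined to large q would point to a right-tail effect.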
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple "consent-to-link" question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the "Consent" Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods, joint modeling of consent-to-link and item response probabilities, approximate optimization of design decisions related to potentially burdensome questions, and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the "consent" and "refusal" cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), "Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions," Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), "Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data," The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), "Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design," Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), "Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records," Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), "Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?" Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), "HIPAA Authorization and Survey Nonresponse Bias," Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), "Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation," Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), "Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey," Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), "Optimizing Opt-Out Consent for Record Linkage," Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), "2011 Research Section Analysis," Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), "Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts," Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), "Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders," American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), "Respondent Consent to Use Administrative Data," PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), "Surveys on Surveys: Limitations and Potentialities," Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), "Understanding the Decision to Participate in a Survey," Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), "Leverage-Salience Theory of Survey Participation: Description and an Illustration," Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), "Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias," Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), "Record Linkage Research and Informed Consent: Who Consents?" BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), "Patterns of Consent: Evidence from a General Household Survey," Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), "Fay's Method for Variance Estimation," Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), "Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review," British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), "Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population," BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), "Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics," The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), "The Framing of the Record Linkage Consent Question," International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), "Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer," Survey Research Methods, 7, 115–131.
Little, R. (1986), "Survey Nonresponse Adjustments for Estimates of Means," International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), "The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community," American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), "Uses of Administrative Data at the Social Security Administration," Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), "Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study," International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), "Linkages with Data from Social Security Administrative Records in the Health and Retirement Study," Social Security Bulletin, 62, 73–85.
O'Muircheartaigh, C., and P. Campanelli (1999), "A Multilevel Exploration of the Role of Interviewers in Survey Non-Response," Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), "Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP)," Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), "Bowling Alone: America's Declining Social Capital," Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), "Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults," Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), "Linking Survey and Administrative Data: Mechanisms of Consent," Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), "An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany," Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), "Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data," Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), "The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey," Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), "Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data," Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), "Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics," Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA, http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

Variable: Description

Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
  Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Respondent “doorstep” concerns                      Assigned category
Privacy concerns                                    Privacy/government concerns
Anti-government concerns
Too busy                                            Too busy/logistical concerns
Interview takes too much time
Breaks appointments (puts off FR indefinitely)
Not interested
Does not want to be bothered
Too many interviews
Scheduling difficulties
Family issues
Last interview took too long
Survey is voluntary                                 Other reluctance
Does not understand
Asks questions about survey
Survey content does not apply
Hang-up/slams door on FR
Hostile or threatens FR
Other HH members tell R not to participate
Talk only to specific household member
Respondent requests same FR as last time
Gave that information last time
Asked too many personal questions last time
Intends to quit survey
Other (specify)
No concerns                                         No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
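The doorstep-concern categories of section A.2 can be treated as a simple lookup from recorded concern to category code. The sketch below is illustrative only: the dictionary covers a handful of the listed concerns, the names `CONCERN_CODE` and `doorstep_code` are hypothetical, and the precedence rule for respondents with several recorded concerns is an assumption, since the appendix does not say how such cases were resolved.

```python
# Category codes follow the "Doorstep concerns" predictor in A.1:
# 0 = no concerns, 1 = privacy/government, 2 = too busy/logistical, 3 = other.
# Only a few of the listed concerns are included here, for illustration.
CONCERN_CODE = {
    "No concerns": 0,
    "Privacy concerns": 1,
    "Anti-government concerns": 1,
    "Too busy": 2,
    "Interview takes too much time": 2,
    "Scheduling difficulties": 2,
    "Survey is voluntary": 3,
    "Does not understand": 3,
    "Intends to quit survey": 3,
}

# ASSUMPTION: when several concerns are recorded, privacy takes precedence
# over "too busy", which takes precedence over "other".
PRECEDENCE = (1, 2, 3)


def doorstep_code(concerns):
    """Collapse a list of recorded concerns into a single category code."""
    codes = {CONCERN_CODE[c] for c in concerns}
    for code in PRECEDENCE:
        if code in codes:
            return code
    return 0  # no concerns recorded
```

Under these assumptions, a respondent recorded with both “Too busy” and “Privacy concerns” would be coded 1.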
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Columns:
(1) Weights from full sample
(2) Unadjusted weights for sample units with consent
(3) Propensity-adjusted weights for sample units with consent
(4) Propensity-decile adjusted weights for sample units with consent
(5) Propensity-quintile adjusted weights for sample units with consent

                        (1)      (2)      (3)      (4)      (5)
n                      4893     3951     3915     3915     3915
1 + CV^2_wt            1.13     1.13     1.16     1.16     1.15
IQR / q_0.5           0.875    0.877    0.858    0.853    0.851
q_0.75 / q_0.25       1.357    1.359    1.448    1.463    1.468
q_0.90 / q_0.10       1.970    1.966    2.148    2.178    2.124
q_0.95 / q_0.05       2.699    2.665    3.028    2.961    2.898
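The weight diagnostics described in this appendix can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the function names are hypothetical, the quantiles are unweighted quantiles of the weights themselves with linear interpolation (an assumed convention that the paper does not specify), and $1 + CV^2_{wt}$ is computed as $n \sum w_i^2 / (\sum w_i)^2$, which is algebraically equal to one plus the squared coefficient of variation when the variance uses divisor $n$.

```python
def quantile(sorted_w, p):
    """Linear-interpolation quantile of pre-sorted values, 0 <= p <= 1."""
    pos = p * (len(sorted_w) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_w) - 1)
    return sorted_w[lo] + (pos - lo) * (sorted_w[hi] - sorted_w[lo])


def weight_summary(weights):
    """Summaries of weight heterogeneity in the spirit of table 6."""
    w = sorted(weights)
    n = len(w)
    # 1 + CV^2 = n * sum(w^2) / (sum w)^2  (variance with divisor n)
    one_plus_cv2 = n * sum(x * x for x in w) / sum(w) ** 2
    q = {p: quantile(w, p) for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)}
    return {
        "1+CV2": one_plus_cv2,
        "IQR/q0.50": (q[0.75] - q[0.25]) / q[0.50],
        "q0.75/q0.25": q[0.75] / q[0.25],
        "q0.90/q0.10": q[0.90] / q[0.10],
        "q0.95/q0.05": q[0.95] / q[0.05],
    }


def propensity_adjusted(weights, propensities):
    """Propensity-adjusted weights w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]
```

Applying `weight_summary` to the design weights and then to the output of `propensity_adjusted` gives a side-by-side comparison of the kind reported above; values of `1+CV2` close to one indicate little variance inflation from unequal weighting.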
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
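The variance estimator above translates directly into code. The following sketch assumes that the full-sample estimate and the $R$ replicate estimates have already been computed elsewhere; the function name and argument names are hypothetical, and the toy inputs in the usage note are not CEQ values.

```python
def fay_brr_variance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    full_est       : list of p coefficients from the full-sample weights
    replicate_ests : list of R lists, one per set of replicate weights
    fay_k          : Fay factor K (0.5 in the paper)
    """
    r = len(replicate_ests)
    p = len(full_est)
    scale = 1.0 / (r * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for b_r in replicate_ests:
        # deviation of the r-th replicate estimate from the full-sample one
        d = [b_r[j] - full_est[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += scale * d[j] * d[k]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [cov[j][j] ** 0.5 for j in range(len(cov))]
```

With $R = 44$ and $K = 0.5$, the scale factor is $1/(44 \times 0.25) = 1/11$, so each squared replicate deviation is weighted four times as heavily as it would be under unadjusted balanced repeated replication ($K = 0$).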
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2 based on the F-adjusted mean residual test statistic $Q_{F_{adj}}$ presented in those papers. A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F_{adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table body not recoverable from the source extraction. The table reports an Estimate and SE for each of Model 1, Model 2, Model A, Model B, Model C, Model D, and Model E; reference categories are shown in parentheses. Row labels are listed below.]

Demographic characteristics
  Age group (32–65): 18–32; 65+
  Gender (Male): Female
  Race (White): Non-white
  Spanish interview (No): Yes
  Education group (HS grad): Less than HS; Some college; Associate’s degree; Bachelor’s degree; Advance degree
  Homeowner (Renter): Owner
  Total expenditures (Log)
  Income group: Less than $8,181
  Income imputed (No): Yes
  Race × gender
  Owner × education: Less than HS; Some college; Associate’s degree; Bachelor’s degree; Advance degree
Environmental features
  Region (Northeast): Midwest; South; West
  Urbanicity (Rural): Urban
R attitude proxies
  Converted refusal (No): Yes
  Effort (Moderate): A lot of effort; Bare minimum effort
  Doorstep concerns (None): Too busy; Privacy/gov’t concerns; Other
  Doorstep concerns × effort: Privacy × a lot of effort; Privacy × minimum effort; Busy × a lot of effort; Busy × minimum effort; Other × a lot of effort; Other × minimum effort
Rapport (Phone): Personal visit

NOTE. †p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F_{adj}}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent), 2nd revision    p-value    Test statistic, F(9, 35)
1                                   0.292      1.262
2                                   0.810      0.573
A                                   0.336      1.183
B                                   0.929      0.396
C                                   0.061      2.061
D                                   0.022      2.576
E                                   0.968      0.305
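The reference distribution used in these goodness-of-fit tests follows from $f = 43$ and $g = 10$: the degrees of freedom are $g - 1 = 9$ and $f - g + 2 = 35$. The sketch below shows how an upper-tail p-value could be obtained from a reported test statistic using only the standard library. It is not the svylogitgof routine: the function names are hypothetical, and the tail probability is approximated by Simpson's rule applied to the central F density, which assumes a numerator degrees of freedom greater than 2.

```python
import math


def gof_degrees_of_freedom(f, g):
    """Degrees of freedom (g - 1, f - g + 2) of the reference F distribution."""
    return g - 1, f - g + 2


def f_pdf(x, d1, d2):
    """Density of a central F(d1, d2) variable; assumes d1 > 2 so f(0) = 0."""
    if x <= 0.0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)


def f_upper_tail(x, d1, d2, steps=2000):
    """Approximate P(F > x) as 1 minus a Simpson's-rule integral on [0, x]."""
    if x <= 0.0:
        return 1.0
    h = x / steps  # steps must be even for Simpson's rule
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3.0


d1, d2 = gof_degrees_of_freedom(f=43, g=10)   # (9, 35)
p_model_d = f_upper_tail(2.576, d1, d2)       # statistic reported for model D
```

A small p-value from `f_upper_tail` would indicate lack of fit at the corresponding significance level.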
link), modeling and diagnostic tools for estimation and evaluation of related propensity models, and empirical assessment of biases resulting from the incomplete-data pattern.
To date, direct examinations of consent bias in estimates derived from combined data are extremely rare in the literature because they require researcher access to administrative records for both consenters and non-consenters (cf. Sakshaug and Kreuter 2012; Kreuter, Sakshaug, and Tourangeau 2016). Most studies of linkage consent therefore have used data from survey respondents, which are available for both consenters and non-consenters, to identify characteristics correlated with consent propensity. These studies provide an indirect means of assessing the potential risk of consent bias if the underlying consent propensities are related to differences in respondents’ administrative record profiles. Findings from this literature indicate that linkage consent is often associated with respondent demographics (e.g., age, education, income), indicators of survey reluctance (e.g., prior nonresponse in a panel survey), and features of the survey (e.g., wording, placement, and timing of consent requests), but the magnitude and direction of these effects vary across studies (Bates 2005; Dahlhamer and Cox 2007; Sala, Burton, and Knies 2012; Sakshaug, Tutz, and Kreuter 2013). Only very recently have researchers started to develop theory-based hypotheses about the mechanisms of consent decisions and to incorporate more sophisticated analytic approaches to test these hypotheses (Sala et al. 2012; Sakshaug et al. 2012; Mostafa 2015).
Finally, empirical assessment of consent-to-link propensity patterns and related potential consent biases naturally involves a complex set of trade-offs involving (a) the degree to which a given set of test conditions is relevant to current or prospective production conditions; (b) the ability to control applicable design factors and to measure relevant covariates within the context of those current production conditions; (c) the ability to measure and model specific portions of the complex processes that lead to respondent consent and cooperation in a given setting; and (d) constraints on resources, including both the direct costs of testing consent-to-link options and the indirect costs arising from the potential impact of testing on current survey production. The remainder of this paper considers some aspects of issues (a)–(d), with emphasis on exploratory analyses for one case study that was embedded within a current survey production process.
In the next section, we review the literature on consent decisions and consent bias. In section 3, we describe in detail a specific consent-to-link case involving the U.S. Consumer Expenditure Survey (CEQ) and then present consent propensity models and our evaluation methodology. The results of our descriptive and multivariate examinations of consent propensity and assessments of consent bias are presented in section 4. We summarize and discuss the implications of our findings in section 5. Appendix A provides detailed descriptions of the analytic variables. Appendix B presents technical results on the variability of weights used in the CEQ dataset. Appendix C discusses some features of the goodness-of-fit tests used for the models considered in section 4. In addition, an online Appendix D (Supplementary Materials) presents some related conceptual material and numerical results for hypothesis testing that led to the modeling results summarized in section 4.
2. LITERATURE REVIEW

2.1 Factors Affecting Linkage Consent

The earliest investigations of consent-to-linkage effects were mostly conducted in epidemiology and health studies that requested patients’ consent to access their medical records (Woolf, Rothemich, Johnson, and Marsland 2000; Dunn, Jordan, Lacey, Shapley, and Jinks 2004; Kho et al. 2009), but more recent studies have assessed consent in general population surveys with linkage requests to an array of administrative data sources (Knies, Burton, and Sala 2012; Sala et al. 2012; Sakshaug et al. 2012, 2013). In this section, we summarize findings from this research on the factors that affect consent to data linkage.
2.1.1 Respondent demographics. Linkage consent studies largely have focused on respondents’ sociodemographic characteristics, most often age, gender, ethnicity, education, and income (Kho et al. 2009; Fulton 2012). These variables are widely available across surveys, and although they are unlikely to have direct causal impact on most consent decisions, they provide indirect measures of psychosociological factors that may influence those choices. Demographic differences between consenters and non-consenters are common, but the patterns of findings differ across studies. For example, older individuals frequently have been found to be less likely to consent to record linkage than younger people (Dunn et al. 2004; Bates 2005; Dahlhamer and Cox 2007; Huang, Shih, Chang, and Chou 2007; Pascale 2011; Sala et al. 2012; Al Baghal, Knies, and Burton 2014). But some studies have found the opposite effect (Woolf et al. 2000; Beebe et al. 2011) or no age effect (Kho et al. 2009). Males often consent at higher rates than females (Dunn et al. 2004; Bates 2005; Woolf et al. 2000; Sala et al. 2012; Al Baghal et al. 2014), but some studies find no gender effect (Pascale 2011; Sakshaug et al. 2012). Consent propensities for ethnic minorities and non-citizens tend to be lower than for majority groups and citizens (Woolf et al. 2000; Beebe et al. 2011; Al Baghal et al. 2014; Mostafa 2015), although not all studies show these effects (Bates 2005; Kho et al. 2009). Similar inconsistencies are evident across studies for the effects of education and income on consent (Kho et al. 2009; Fulton 2012; Sakshaug et al. 2012; Sala et al. 2012). Other respondent demographic and household characteristics have been examined less frequently (e.g., marital status, employment status, household size, and owner/renter), again with mixed findings (Olson 1999; Jenkins, Cappellari, Lynn, Jackle, and Sala 2006; Al Baghal et al. 2014; Mostafa 2015).
2.1.2 Respondent attitudes. Attitudes can have a powerful impact on thought and behavior, and there is a long history of survey researchers attempting to measure respondents’ attitudes and their impact on various survey outcomes (e.g., Goyder 1986). Particular attention has been given to respondents’ attitudes about privacy and confidentiality. Research conducted by the U.S. Census Bureau going back to the 1990s demonstrates that concerns about personal privacy and data confidentiality have increased in the general public and that these attitudes are associated with lower participation rates in the decennial census (Singer 1993, 2003) and more negative attitudes toward the use of administrative records (Singer, Bates, and Van Hoewyk 2011). Privacy and confidentiality concerns can influence record linkage consent as well. Both direct measures of privacy concerns (respondent self-reports) and indirect indicators (item refusals on financial questions) have been shown to be negatively associated with consent (Sakshaug et al. 2012; Sala, Knies, and Burton 2014; Mostafa 2015). Similarly, Sakshaug et al. (2012) demonstrated that the more confidentiality-related concerns respondents expressed to interviewers in a previous survey wave, the less likely they were to subsequently consent to data linkage. And Sala et al. (2014) found that concern about data confidentiality was the most frequent reason given by respondents who declined a linkage request. There also is evidence that trust (in other people, in government) and civic engagement (volunteering, political involvement) are positively related to consent (Sala et al. 2012; Al Baghal et al. 2014).
2.1.3 Saliency. Respondents’ interest in topics related to the record request, or their experiences with organizations that house those records, can also affect consent decisions. For example, a number of studies have found that respondents have a higher propensity to accept medical consent requests when they are in poorer health or have symptoms germane to the survey subject (Woolf et al. 2000; Dunn et al. 2004; Dahlhamer and Cox 2007; Beebe et al. 2011). One explanation for this finding is that consent requests on topics salient to respondents enhance the perceived benefits of record linkage (e.g., more comprehensive medical evaluation, or the general advancement of knowledge about a disease relevant to the respondent) or reduce the perceived risks (e.g., by inducing more extensive cognitive processing of the request) (Groves, Singer, and Corning 2000). In addition to topic saliency, respondents’ existing relationships with government agencies also can play a role in their consent decisions. Studies by Sala et al. (2012), Sakshaug et al. (2012), and Mostafa (2015), for example, found that individuals who received government benefits (e.g., welfare, food stamps, veterans’ benefits) were more likely to consent to economic data linkage than those who did not. These results again suggest that the salience of (and attitudes toward) service-providing government agencies may make some respondents more amenable to linkage requests involving those agencies.
2.1.4 Socio-environmental features. The respondents’ environments help to shape the context in which consent decisions are made, and a handful of studies have examined associations between area characteristics and attitudes toward use of administrative records and consent decisions. For example, Singer et al. (2011) found that individuals living in the South and Mid-Atlantic regions of the country had more favorable attitudes about administrative record use by the U.S. Census Bureau than those living in other regions of the country. Studies of actual consent and linkage rates have demonstrated regional variations as well, with higher rates in the South and Midwest and lower rates in parts of the Northeast (e.g., Olson 1999; Dahlhamer and Cox 2007). Consent rates also can vary by urban status. Consistent with urbanicity effects seen in the literature on survey participation and pro-social behavior (Groves and Couper 1998; Mattis, Hammond, Grayman, Bonacci, Brennan, et al. 2009), respondents living in urban areas have been found to be less likely to consent than those living in non-urban areas (cf. Jenkins et al. 2006; Dahlhamer and Cox 2007; Al Baghal et al. 2014, who show a marginally significant positive effect for urbanicity). Together, such area effects may indicate the influence of underlying ecological factors within those communities (e.g., differences in population density, crime, social engagement) but may also reflect differences in survey operations (e.g., in staff protocol and training) clustered within those geographic areas. A recent study by Mostafa (2015) found that area characteristics by themselves added little explanatory power to models of consent propensity, suggesting that respondent and interview characteristics may be more important factors.
2.1.5 Interviewer characteristics. Interviewer attributes and behaviors can have significant impact on survey participation and data quality (O’Muircheartaigh and Campanelli 1999; West, Kreuter, and Jaenichen 2013), including linkage consent decisions. Studies investigating the impact of interviewer demographics generally find that they are unrelated to the consent outcome (Sakshaug et al. 2012; Sala et al. 2012), although there is some evidence of a positive effect of interviewer age on consent (Krobmacher and Schroeder 2013; Al Baghal et al. 2014). Interviewer experience has shown mixed effects. The amount of time spent working as an interviewer overall (i.e., job tenure) is either unrelated to consent (Sakshaug et al. 2012; Sala et al. 2012) or can actually have a small negative impact (Sakshaug et al. 2013; Al Baghal et al. 2014). Interviewers’ survey-specific experience, as measured by the number of interviews already completed prior to the current consent request, shows similar effects (Sakshaug et al. 2012; Sala et al. 2012). One aspect of interviewer experience that is positively related to consent in these studies is past performance in gaining respondent consent. Sala et al. (2012) and Sakshaug et al. (2012) found that the likelihood of consent increased with the number of consents obtained earlier in the field period. These authors also attempted to identify interviewer personality traits and attitudes that could affect respondent consent decisions but largely failed to find significant effects. The one exception was that respondent consent was positively related to interviewers’ own willingness to consent to linkage (Sakshaug et al. 2012).
2.1.6 Survey design features. The way in which the consent requests are administered can impact linkage consent. Consent rates appear to be higher in face-to-face surveys than in phone surveys, though there are relatively few mode studies that have examined this phenomenon (Fulton 2012). Consent questions that ask respondents to provide personal identifiers (e.g., full or partial SSNs) as matching variables produce lower consent rates than those that do not (Bates 2005). This finding and advancements in statistical matching techniques prompted the Census Bureau to change its approach to gaining linkage consent in 2006, and it has since adopted a passive, opt-out consent procedure in which respondents are informed of the intent to link, and consent is assumed unless respondents explicitly object (McNabb, Timmons, Song, and Puckett 2009). These implicit consent procedures (as they are sometimes called) result in higher consent rates than opt-in approaches, where respondents must affirmatively state their consent (Bates 2005; Pascale 2011). See also Das and Couper (2014).
Since most surveys employ opt-in formats, researchers have focused on the potential effects of the wording or framing of these questions. Consent framing experiments vary factors mentioned in the request that are thought to be persuasive to respondents, for example, highlighting the quality benefits associated with linkage, the reduction in survey collection costs, or the time savings for respondents. Evidence of framing effects in these studies is surprisingly weak, however. Bates, Wroblewski, and Pascale (2012) found that respondents reported more positive attitudes toward record linkage under cost- and time-savings frames, but the study did not measure actual linkage consent propensities. In Sakshaug and Kreuter (2014), a time-savings frame produced higher consent rates for web survey respondents than a neutrally worded consent question, but this is the only study in the literature to find significant question-framing effects (Pascale 2011; Sakshaug et al. 2013).
The timing of consent requests appears to have some influence on the likelihood of consent. Although it is common practice to delay asking the most sensitive items, like linkage-consent requests, until near the end of the questionnaire, recent empirical evidence indicates that this may not be optimal. Sakshaug et al. (2013) found that respondents were more amenable to consent requests administered at the beginning of the survey than at the end, and suggest that the
proximity of the survey and linkage requests may reinforce respondents’ desire or inclination to be consistent (i.e., agree to both). This result could inform potentially promising adaptive design interventions (e.g., asking for consent early in the interview, then skipping subsequent burdensome questions for consenters). Sala et al. (2014) obtained higher consent rates when the request was asked immediately following a series of questions on a related topic, rather than waiting until the end of the survey. The authors reasoned that contextual placement of the linkage question increases the salience of the request and induces more careful consideration by respondents. Both explanations find some support in the broader psychological literature on compliance, cognitive dissonance, and context effects, but further research is needed to evaluate these and other mechanisms (e.g., survey fatigue) underlying consent placement effects (Sala et al. 2014).
2.2 Analytic Approaches to Assessing Consent Bias
Early linkage consent studies simply looked for evidence of sample bias (i.e., differences in sample composition for consenters and non-consenters). Most recent studies use logistic regression models to identify factors that influence consent and infer potential consent bias (e.g., differential consent to medical-records linkage by respondent health status). Several studies have employed multi-level models to assess the impact of interviewers on consent propensity (e.g., Fulton 2012; Sala et al. 2012; Mostafa 2015), and others have jointly modeled respondents’ consent propensities on multiple consent items in a given survey (e.g., Mostafa 2015). Studies that have examined direct estimates of consent bias using administrative records available for both consenters and non-consenters are much less common in the literature. Recent studies by Sakshaug and his colleagues are notable exceptions (Sakshaug and Kreuter 2012; Sakshaug and Huber 2016). Using administrative data linked to German panel survey data, they have compared the magnitude of consent biases to bias estimates for other error sources (nonresponse, measurement) and the longitudinal changes in these biases.
Given evidence of potential consent bias (e.g., differential consent by specific demographic groups), one promising approach that has not yet been explored in this literature is to adopt propensity weighting methods that are widely used in nonresponse adjustment. Traditionally, propensity-weighted nonresponse adjustments are accomplished by modeling response propensities using logistic regression and auxiliary data available for both respondents and nonrespondents, and then using the inverse of the modeled propensity as a weight adjustment factor (Little 1986). If the predicted propensity is unbiased, this adjustment method may reduce the potential bias due to nonresponse. Of course, bias reduction is predicated on correct model specification, and this may be particularly challenging in consent propensity applications, given the
absence of well-developed consent theories and inconsistency in the effects of many of its predictors.
Application of these ideas in the analysis of “consent-to-link” patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition
$$p_L(x; \beta_L) = p_C(x; \beta_C)\, p_{L|C}(x; \beta_{L|C}) \qquad (2.1)$$

where $p_L(x; \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x; \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x; \beta_{L|C})$ is the probability of successful linkage conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that, to some degree, the first factor $p_C(x; \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x; \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x; \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
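As a small numerical illustration of the decomposition above, the following Python sketch multiplies a consent probability by a conditional linkage probability. The function name and the input values are hypothetical and chosen only for illustration; they are not estimates from the CEQ.

```python
def linkage_probability(p_consent: float, p_link_given_consent: float) -> float:
    """Probability of a successful linkage as the product of
    P(consent) and P(successful linkage | consent)."""
    for p in (p_consent, p_link_given_consent):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_consent * p_link_given_consent


# Hypothetical inputs: 80% nominal consent, 90% linkage success among consenters
p_link = linkage_probability(0.8, 0.9)
```

Under these assumed inputs, roughly 72% of units would end up linked, which shows how a conditional linkage probability below 1 lowers the usable share even when nominal consent is high.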
2.3 Options for Exploratory Analyses of Consent-to-Link
Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1) and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and “data cleaning” for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options.
(i) Simple lab studies. This step has the advantages of not disrupting production and of relatively low costs. However, results may not align with “real world” production conditions (per the preceding literature results on interviewer characteristics), nor with full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which the conditional linkage probabilities $p_{L|C}(x; \beta_{L|C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and on evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the US Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure
survey, the US Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers’ expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year’s worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.1
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5% (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80% of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We
1. In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: “We’d like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?” Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a “don’t know” response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members’ propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
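To make concrete how a fitted logistic model yields a consent propensity for a given respondent, the following Python sketch applies the inverse-logit link to a linear predictor. The function name, the predictor names, and the coefficient values are hypothetical; they are not the fitted CEQ estimates.

```python
import math


def consent_propensity(intercept: float, coefs: dict, x: dict) -> float:
    """Return P(consent = 1 | x) under a logistic model."""
    # Linear predictor: intercept plus coefficient-weighted predictors
    eta = intercept + sum(coefs[name] * x.get(name, 0.0) for name in coefs)
    # Numerically stable evaluation of the logistic function
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients for two 0/1 indicators (illustration only)
coefs = {"privacy_concern": -1.0, "face_to_face": 0.3}
p = consent_propensity(1.2, coefs, {"privacy_concern": 1, "face_to_face": 0})
```

In practice the coefficients would come from a design-based fit that accounts for the survey weights and replicate structure; the sketch only covers the final scoring step.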
To account for the complex stratified sampling design of the CEQ, the analyses were conducted with the SAS surveylogistic procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
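The text does not spell out the replication variance formula. As a minimal sketch, assuming the standard Fay-adjusted BRR estimator, $v(\hat\theta) = \frac{1}{R(1-K)^2}\sum_{r=1}^{R}(\hat\theta_r - \hat\theta)^2$, a standard error can be computed from a full-sample estimate and its replicate estimates as follows. The function name and the toy replicate values are hypothetical.

```python
import math


def fay_brr_se(full_estimate: float, replicate_estimates: list, fay_k: float = 0.5) -> float:
    """Standard error from Fay-adjusted balanced repeated replication."""
    if not 0.0 <= fay_k < 1.0:
        raise ValueError("Fay factor must lie in [0, 1)")
    r = len(replicate_estimates)
    # Sum of squared deviations of replicate estimates from the full-sample estimate
    ss = sum((theta_r - full_estimate) ** 2 for theta_r in replicate_estimates)
    variance = ss / (r * (1.0 - fay_k) ** 2)
    return math.sqrt(variance)


# Toy example: 44 replicate estimates, each one unit away from the full-sample value
se = fay_brr_se(10.0, [9.0, 11.0] * 22, fay_k=0.5)
```

With R = 44 and K = 0.5, the multiplier on the sum of squares is 1/11; each replicate estimate would itself be computed by re-running the estimator with the corresponding replicate weights.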
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as “truth” and compare estimates from the full sample to those derived
from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only, to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income
imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180, versus 17.4% of consenters), but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older, versus 19.9% of consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1, except for exclusion of the low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
4.2 Analysis of Economic Variables Subject to “Informed Consent” Access
We examine consent bias for six CEQ variables for which there potentially isinformation available in administrative data sources that could be used to de-rive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                                  All respondents      Consenters           Non-consenters
                                           (n = 4893)           (n = 3951)           (n = 942)
Privacy/anti-government concerns           6.9 (0.8)            4.2 (0.6)            18.1 (2.2)
Reluctance
  Converted refusal                        10.8 (0.7)           9.6 (0.8)            16.0 (1.5)
  Income imputation required               44.7 (1.0)           41.8 (1.1)           57.0 (2.1)
  Too busy                                 8.1 (0.5)            6.9 (0.6)            12.7 (1.3)
  Respondent effort
    A lot of effort                        34.9 (0.9)           36.7 (1.0)           27.3 (1.9)
    Moderate effort                        37.2 (1.1)           37.4 (1.3)           36.4 (1.8)
    Bare minimum effort                    27.9 (1.1)           25.9 (1.2)           36.3 (2.3)
  Respondent cooperation
    Very cooperative                       50.8 (1.0)           53.5 (1.0)           39.4 (2.6)
    Somewhat cooperative                   33.1 (0.8)           33.1 (0.9)           33.1 (1.3)
    Neither cooperative nor uncooperative  5.4 (0.5)            4.5 (0.4)            9.2 (1.1)
    Somewhat uncooperative                 4.5 (0.3)            3.3 (0.3)            9.5 (1.1)
    Very uncooperative                     6.3 (0.4)            5.7 (0.5)            8.8 (0.9)
Rapport
  Face-to-face                             65.0 (1.4)           66.2 (1.5)           59.9 (1.9)
  Phone                                    35.0 (1.4)           33.8 (1.5)           40.1 (1.9)
Burden
  Total interview time                     64.9 (1.0)           65.3 (1.1)           63.3 (1.4)
  Household size                           2.42 (0.03)          2.42 (0.03)          2.42 (0.07)
  Total expenditures                       $11,454 ($148)       $11,511 ($145)       $11,218 ($384)
  Family income
    LT $8,180                              19.8 (0.8)           17.4 (0.9)           30.8 (1.9)
    $8,180–$24,000                         20.2 (0.6)           20.8 (0.9)           16.8 (1.4)
    $24,001–$46,000                        20.8 (0.6)           20.5 (0.7)           18.4 (1.3)
    $46,001–$85,855                        19.8 (0.6)           20.7 (0.7)           16.7 (1.3)
    GT $85,585                             19.4 (0.6)           20.6 (0.9)           17.3 (1.8)
  Owner                                    65.4 (0.8)           64.0 (0.9)           71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).
survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters) with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic                 All respondents      Consenters           Non-consenters
                               (n = 4893)           (n = 3951)           (n = 942)
Age group
  18–32                        20.0 (0.9)           21.5 (1.1)           13.8 (1.0)
  32–65                        59.2 (0.9)           58.6 (1.2)           61.6 (1.5)
  65+                          20.8 (0.7)           19.9 (0.8)           24.6 (1.5)
Gender
  Male                         46.3 (0.7)           46.8 (0.8)           44.3 (1.7)
  Female                       53.7 (0.7)           53.2 (0.8)           55.7 (1.7)
Race
  White                        83.1 (0.9)           83.0 (1.0)           83.1 (1.1)
  Non-white                    16.9 (0.9)           17.0 (1.0)           16.9 (1.1)
Education group
  Less than HS                 13.5 (0.6)           13.4 (0.7)           14.0 (1.6)
  HS graduate                  24.7 (0.8)           24.5 (0.9)           25.3 (1.8)
  Some college                 21.4 (0.7)           21.2 (0.9)           22.2 (1.9)
  Associates degree            10.2 (0.4)           10.0 (0.5)           10.9 (1.1)
  Bachelor’s degree            19.1 (0.5)           19.4 (0.6)           17.9 (1.4)
  Advance degree               11.1 (0.5)           11.5 (0.6)           9.7 (0.9)
Spanish language interview
  Yes                          4.1 (0.5)            3.7 (0.5)            5.4 (1.5)
  No                           95.9 (0.5)           96.3 (0.5)           94.6 (1.5)
Region
  Northeast                    18.5 (0.6)           17.6 (0.7)           22.2 (1.7)
  Midwest                      23.5 (1.3)           24.5 (1.4)           19.2 (2.3)
  South                        35.0 (1.3)           35.9 (1.5)           31.5 (2.7)
  West                         23.0 (1.5)           22.0 (1.7)           27.1 (1.8)
Urbanicity
  Urban                        80.5 (0.9)           80.5 (1.2)           80.7 (2.0)
  Rural                        19.5 (0.9)           19.5 (1.2)           19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

                                    Model 1                  Model 2
Variable                            Estimate     SE          Estimate     SE
Demographic characteristics
  Age group (32–65)
    18–32                           0.3435       0.0633      0.3389       0.0634
    65+                             0.2692       0.0645      0.2532       0.0656
  Gender (Male)
    Female                          0.0161       0.0426      0.0229       0.0421
  Race (White)
    Non-white                       0.0949       0.1264      0.0612       0.1260
  Spanish interview (No)
    Yes                             0.4234       0.2923      0.4452       0.2790
  Education group (HS grad)
    Less than HS                    0.3097       0.1879      0.2926       0.1835
    Some college                    0.4016       0.1423      0.3739       0.1389
    Associate’s degree              0.3321       0.2041      0.3150       0.1993
    Bachelor’s degree               0.0654       0.2112      0.0663       0.2082
    Advance degree                  0.1747       0.2762      0.1512       0.2773
  Home owner (Renter)
    Owner                           0.2767†      0.1453      0.3233       0.1436
  Total expenditures (Log)          0.0605       0.0799      0.0083       0.0800
  Income group
    Less than $8,181                0.2155       0.0604
  Income imputed (No)
    Yes                             0.1487       0.0513
  Race × Gender                     0.1874†      0.0956      0.1953       0.0923
  Owner × Education
    Less than HS                    0.4931       0.2066      0.4632       0.1996
    Some college                    0.6575       0.1567      0.6370       0.1613
    Associate’s degree              0.3407†      0.2013      0.3402†      0.1986
    Bachelor’s degree               0.1385       0.2402      0.1398       0.2368
    Advance degree                  0.4713       0.2847      0.4226       0.2870
Environmental features
  Region (Northeast)
    Midwest                         0.2097†      0.1234      0.2111†      0.1163
    South                           0.2451       0.1190      0.2408       0.1187
    West                            0.2670       0.1055      0.2557       0.1066
  Urbanicity (Rural)
    Urban                           0.0268       0.0829      0.0234       0.0824
R attitude proxies
  Converted refusal (No)
    Yes                             0.0721       0.0756      0.0838       0.0763

Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / p_i$ (Consenting Units, Propensity-Adjusted).2
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / p_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{\mathrm{decile}}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / p_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{\mathrm{quintile}}$ is the unweighted propensity of “consent” units in the specified quintile group).
Table 3. Continued

                                    Model 1                  Model 2
Variable                            Estimate     SE          Estimate     SE
  Effort (Moderate)
    A lot of effort                 0.5454       0.2081      0.6142       0.2065
    Bare minimum effort             0.4879       0.1365      0.5721       0.1371
  Doorstep concerns (None)
    Too busy                        0.2207       0.1485      0.2137       0.1466
    Privacy/gov’t concerns          1.1895       0.1667      1.2241       0.1622
    Other                           1.0547       0.3261      1.0255       0.3255
  Doorstep concerns × effort
    Privacy × a lot of effort       0.5884       0.2789      0.5312†      0.2728
    Privacy × minimum effort        0.3183       0.1918      0.2601       0.1924
    Busy × a lot of effort          0.0888       0.2736      0.0890       0.2749
    Busy × minimum effort           0.2238       0.2096      0.2277       0.2095
    Other × a lot of effort         1.0794†      0.6224      1.0477       0.6182
    Other × minimum effort          0.9002       0.3610      0.8777       0.3593

NOTE: †p < 0.10, p < 0.05, p < 0.01, p < 0.001.
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link, $p_i$, for all sample units $i$. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.
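The estimators defined above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production code used for the CEQ analyses: the function names are hypothetical, propensity classes are formed as equal-count groups by rank of the estimated propensity, and the class-level adjustment is taken to be the unweighted consent rate within each class, which is one reading of the definition of the decile and quintile adjustments.

```python
def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def class_consent_rates(p, consent, n_classes):
    """Unweighted consent rate of each unit's propensity class.

    Classes are equal-count groups formed by ranking units on p
    (n_classes = 10 for deciles, 5 for quintiles).
    """
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])
    rates = [0.0] * n
    for c in range(n_classes):
        members = order[c * n // n_classes:(c + 1) * n // n_classes]
        if not members:
            continue
        rate = sum(consent[i] for i in members) / len(members)
        for i in members:
            rates[i] = rate
    return rates


def consent_estimators(y, w, p, consent, n_classes=5):
    """Return the FS, CUNA, CUPA, and class-adjusted means."""
    idx = [i for i, c in enumerate(consent) if c == 1]  # consenting units
    y_c = [y[i] for i in idx]
    rates = class_consent_rates(p, consent, n_classes)
    return {
        "FS": weighted_mean(y, w),
        "CUNA": weighted_mean(y_c, [w[i] for i in idx]),
        "CUPA": weighted_mean(y_c, [w[i] / p[i] for i in idx]),
        "CLASS": weighted_mean(y_c, [w[i] / rates[i] for i in idx]),
    }


# Toy data (hypothetical): four units, one objector
est = consent_estimators(
    y=[10.0, 20.0, 30.0, 40.0], w=[1.0, 1.0, 1.0, 1.0],
    p=[0.5, 0.5, 1.0, 1.0], consent=[1, 0, 1, 1], n_classes=2,
)
```

Calling the function with `n_classes=10` or `n_classes=5` corresponds to the decile and quintile variants. A class rate is never zero for a consenting unit, because that unit itself belongs to its class.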
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
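One simple way to read the difference columns is to express each difference relative to its reported standard error. The Python sketch below does this for the family income row, using the FS mean, the CUNA mean, and the reported SE of their difference from table 4. The function name is hypothetical, and the ratio is shown only as a descriptive aid, not as the test procedure used for the table.

```python
def difference_and_ratio(estimate: float, full_sample: float, se_diff: float):
    """Return (estimate - full_sample) and that difference divided by its SE."""
    diff = estimate - full_sample
    return diff, diff / se_diff


# Family income before taxes: CUNA mean, FS mean, SE of the difference (table 4)
diff, ratio = difference_and_ratio(52869.00, 50939.00, 580.98)
```

The SE of the difference has to be taken from the table rather than derived from the two separate SEs, because the consenter sample is a subset of the full sample and the two estimates are therefore correlated.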
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted consenter group. In general, the propensity adjustments improve estimates (i.e., they move them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that, even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median the
Table 4. Comparison of Three Estimation Methods

Column 1: variable. Columns 2–6: mean (SE) for the full sample (FS); consenting units, no adjustment (CUNA); consenting units, propensity-adjusted (CUPA); consenting units, propensity-decile adjusted (CUPDA); and consenting units, propensity-quintile adjusted (CUPQA). Columns 7–10: point estimate differences (SE diff).

Means (SE)
Column 1                        2              3              4              5              6
Variable                        FS             CUNA           CUPA           CUPDA          CUPQA
Family income before taxes      $50,939.00     $52,869.00     $52,117.00     $52,269.00     $52,405.00
                                ($1,227.51)    ($1,535.04)    ($1,523.85)    ($1,520.74)    ($1,514.30)
Family income before taxes      $61,207.00     $61,405.00     $61,228.00     $61,337.00     $61,382.00
  with imputed values           ($1,193.34)    ($1,510.55)    ($1,454.39)    ($1,463.34)    ($1,466.25)
Vehicle purchase cost           $599.59        $619.14        $607.80        $607.63        $611.75
                                ($33.22)       ($37.05)       ($36.63)       ($36.67)       ($37.61)
Property taxes                  $454.15        $429.12        $434.74        $434.88        $436.75
                                ($10.41)       ($10.76)       ($11.39)       ($11.37)       ($11.30)
Property value                  $232,226.00    $227,734.00    $227,938.00    $227,935.00    $228,640.00
                                ($5,055.68)    ($5,730.38)    ($5,592.27)    ($5,532.64)    ($5,519.90)
Rental value                    $1,296.52      $1,269.65      $1,272.40      $1,272.91      $1,274.67
                                ($15.58)       ($17.05)       ($17.81)       ($17.86)       ($17.68)

Point estimate differences (SE diff)
Column 1                        7              8              9              10
Variable                        CUNA–FS        CUPA–FS        CUPDA–FS       CUPQA–FS
Family income before taxes      $1,930.00      $1,178.00†     $1,330.00      $1,466.00
                                ($580.98)      ($619.09)      ($602.64)      ($606.76)
Family income before taxes      $198.00        $21.00         $130.00        $175.00
  with imputed values           ($573.46)      ($563.54)      ($557.10)      ($552.20)
Vehicle purchase cost           $19.55†        $8.21          $8.04          $12.17
                                ($10.83)       ($15.34)       ($15.23)       ($16.13)
Property taxes                  -$25.02        -$19.41        -$19.27        -$17.40
                                ($6.08)        ($6.48)        ($6.05)        ($6.23)
Property value                  -$4,492.00     -$4,288.00†    -$4,291.00     -$3,586.00
                                ($2,118.39)    ($2,165.93)    ($2,132.40)    ($2,063.66)
Rental value                    -$26.87        -$24.12        -$23.61        -$21.85
                                ($7.43)        ($8.18)        ($8.01)        ($8.02)

NOTE: †p < 0.10, p < 0.05, p < 0.01 (bold), p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
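The quantile-quantile comparison described here can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each unit has an estimated consent propensity and a survey weight, and it uses a simple weighted empirical-distribution definition of a quantile (an assumption, since the text does not state which quantile estimator was used). The function names `weighted_quantile` and `qq_points` and all toy values are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q < 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        # Small tolerance guards against floating-point rounding at the boundary.
        if cum >= q * total - 1e-12:
            return v
    return pairs[-1][0]


def qq_points(p_object, w_object, p_consent, w_consent, n_points=99):
    """Return (x, y) pairs: x from the 'object' group, y from the 'consent' group."""
    qs = [k / (n_points + 1) for k in range(1, n_points + 1)]
    return [
        (weighted_quantile(p_object, w_object, q),
         weighted_quantile(p_consent, w_consent, q))
        for q in qs
    ]


# Toy propensities and weights (hypothetical values, for illustration only).
p_obj = [0.70, 0.78, 0.82, 0.86, 0.90]
w_obj = [1.0, 2.0, 1.0, 1.5, 1.0]
p_con = [0.72, 0.80, 0.85, 0.88, 0.93]
w_con = [1.0, 1.0, 2.0, 1.0, 1.0]

points = qq_points(p_obj, w_obj, p_con, w_con)
# Points near the 45-degree line (x close to y) indicate little discrimination.
max_gap = max(abs(y - x) for x, y in points)
print(f"{len(points)} points, largest |y - x| = {max_gap:.3f}")
```

Plotting `points` against a reference line with slope 1 and intercept 0 would give a display of the same general kind as figure 1; a different quantile convention could shift individual points slightly.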
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

                  Estimated propensity of consent
Percentile (%)    Consent subpopulation    Object subpopulation
70                0.84                     0.88
80                0.86                     0.89
90                0.88                     0.92
99                0.93                     0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions, and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

Variable                      Variable description

Respondent sociodemographics
  Age                         1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender                      1 if male (reference group); 2 if female
  Race                        0 if white (reference group); 1 if non-white
  Education                   0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
  Language                    1 if interview language was Spanish; 0 if English (reference group)
  Household size              1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type              0 if renter (reference group); 1 if owner
  Family income               1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log)        Log of total expenditures reported for all major expenditure categories

Survey features
  Mode                        1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region                      1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status                1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status    1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration          1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation                 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
  Effort                      1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
  Effort change               1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet  1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
  Doorstep concerns           0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Privacy/government concerns
  Privacy concerns
  Anti-government concerns

Assigned category: Too busy/logistical concerns
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested/Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Assigned category: Other reluctance
  Survey is voluntary
  Does not understand/Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other (specify)

Assigned category: No concerns
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit i will consent to record linkage, based on model 2 in table 3.
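As a concrete reading of the propensity-adjusted weight (each consenting unit's design weight divided by its estimated consent propensity), the sketch below computes the adjusted weights and the resulting weighted mean. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names and toy values are hypothetical, and the propensities are taken as given rather than fitted from a logistic model.

```python
def propensity_adjusted_weights(weights, propensities):
    """Divide each design weight by the unit's estimated consent propensity."""
    if any(p <= 0.0 or p > 1.0 for p in propensities):
        raise ValueError("propensities must lie in (0, 1]")
    return [w / p for w, p in zip(weights, propensities)]


def weighted_mean(values, weights):
    """Survey-weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Toy consenting units (hypothetical): income, design weight, estimated propensity.
income = [30000.0, 52000.0, 75000.0, 120000.0]
design_w = [1200.0, 900.0, 1500.0, 1000.0]
p_consent = [0.90, 0.85, 0.80, 0.70]

adj_w = propensity_adjusted_weights(design_w, p_consent)
print("unadjusted mean:", round(weighted_mean(income, design_w), 2))
print("propensity-adjusted mean:", round(weighted_mean(income, adj_w), 2))
```

Units with a lower estimated propensity receive a larger adjusted weight, so in this toy example the adjusted mean moves toward the values of the units least likely to consent.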
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + \mathrm{CV}^2_{wt}$, where $\mathrm{CV}^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + \mathrm{CV}^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
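The weight-heterogeneity summaries described in this appendix can be computed directly from a list of weights. The sketch below is illustrative only: it assumes the coefficient of variation uses the divide-by-n variance, so that 1 + CV^2 equals n times the sum of squared weights divided by the squared sum of weights, and it uses linear interpolation between order statistics for percentiles. The text does not state either convention, and the function names and toy weights are hypothetical.

```python
def one_plus_cv2(weights):
    """Kish-type factor 1 + CV^2, computed as n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def percentile(values, q):
    """Percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def weight_summary(weights):
    """Heterogeneity summaries analogous to the rows of table 6."""
    def q(level):
        return percentile(weights, level)

    return {
        "1 + CV^2": one_plus_cv2(weights),
        "IQR / q0.5": (q(0.75) - q(0.25)) / q(0.50),
        "q0.75 / q0.25": q(0.75) / q(0.25),
        "q0.90 / q0.10": q(0.90) / q(0.10),
        "q0.95 / q0.05": q(0.95) / q(0.05),
    }


# Toy weights (hypothetical values, for illustration only).
toy_weights = [800.0, 950.0, 1000.0, 1100.0, 1250.0, 1400.0, 2100.0]
for name, value in weight_summary(toy_weights).items():
    print(f"{name}: {value:.3f}")
```

Running the same summary on the unadjusted and the propensity-adjusted weights, side by side, gives the kind of comparison the paragraph above draws from table 6.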
Table 6. Variability of Weights

                                 Weights for sample units with consent
                  Weights from   Unadjusted   Propensity-   Propensity-decile   Propensity-quintile
                  full sample    weights      adjusted      adjusted            adjusted
n                 4893           3951         3915          3915                3915
1 + CV^2_wt       1.13           1.13         1.16          1.16                1.15
IQR / q0.5        0.875          0.877        0.858         0.853               0.851
q0.75 / q0.25     1.357          1.359        1.448         1.463               1.468
q0.90 / q0.10     1.970          1.966        2.148         2.178               2.124
q0.95 / q0.05     2.699          2.665        3.028         2.961               2.898
Appendix C. Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor K = 0.5. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the r-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where R = 44 and K = 0.5. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
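The replicate-based variance estimator for this section, read here as the sum of outer products of replicate deviations divided by R(1 - K)^2, translates directly into code. The following sketch is a plain-Python illustration under that reading, not the production software used for the CEQ: the function names are hypothetical, the replicate estimates are made-up numbers, and only four replicates are used to keep the example short (the paper states R = 44 and K = 0.5).

```python
def fay_brr_covariance(theta_full, theta_reps, fay_k=0.5):
    """Fay-adjusted BRR covariance: sum_r (t_r - t)(t_r - t)' / (R * (1 - K)^2)."""
    if not 0.0 <= fay_k < 1.0:
        raise ValueError("Fay factor K must satisfy 0 <= K < 1")
    n_reps = len(theta_reps)
    dim = len(theta_full)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * dim for _ in range(dim)]
    for rep in theta_reps:
        # Deviation of this replicate estimate from the full-sample estimate.
        dev = [rep[j] - theta_full[j] for j in range(dim)]
        for j in range(dim):
            for k in range(dim):
                cov[j][k] += scale * dev[j] * dev[k]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a covariance matrix."""
    return [cov[j][j] ** 0.5 for j in range(len(cov))]


# Toy example: two coefficients and R = 4 replicate estimates (hypothetical values).
beta_hat = [0.30, -0.10]
beta_reps = [[0.32, -0.12], [0.28, -0.08], [0.31, -0.09], [0.29, -0.11]]

cov = fay_brr_covariance(beta_hat, beta_reps, fay_k=0.5)
print("standard errors:", standard_errors(cov))
```

With K = 0.5 the divisor (1 - K)^2 equals 0.25, so the replicate deviations are scaled up by a factor of four relative to unperturbed BRR; that rescaling compensates for the smaller perturbation of the replicate weights under Fay's method.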
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1,\, f-g+2)$ random variable, where f = 43 and g = 10.
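Reading the reference distribution in this paragraph as a central F with g - 1 and f - g + 2 degrees of freedom, the stated values f = 43 and g = 10 can be substituted directly. The following worked arithmetic does only that substitution and adds nothing beyond it:

```latex
\[
g - 1 = 10 - 1 = 9, \qquad f - g + 2 = 43 - 10 + 2 = 35,
\qquad\text{so}\qquad
Q_{F\text{-adj}} \;\overset{\text{approx.}}{\sim}\; F(9,\,35).
\]
```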
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Entries are Estimate (SE). Reference categories are shown in parentheses after the variable name. † p < 0.10. Signs of the estimates and significance markers other than † are not legible in this copy.

Variable                      Model 1           Model 2           Model A           Model B           Model C           Model D           Model E
Demographic characteristics
Age group (32–65)
  18–32                       0.3435 (0.0633)   0.3389 (0.0634)   0.3044 (0.0717)   0.3319 (0.0742)   0.3282 (0.0728)   0.3425 (0.0640)   0.3368 (0.0639)
  65+                         0.2692 (0.0645)   0.2532 (0.0656)   0.1971 (0.0613)   0.2387 (0.0608)   0.2374 (0.0630)   0.2694 (0.0644)   0.2538 (0.0653)
Gender (Male)
  Female                      0.0161 (0.0426)   0.0229 (0.0421)   0.0396 (0.0359)   0.0388 (0.0363)   0.0020 (0.0427)   0.0152 (0.0425)   0.0207 (0.0419)
Race (White)
  Non-white                   0.0949 (0.1264)   0.0612 (0.1260)   0.0781 (0.1059)   0.0192 (0.1056)   0.0322 (0.1155)   0.0938 (0.1269)   0.0593 (0.1264)
Spanish interview (No)
  Yes                         0.4234 (0.2923)   0.4452 (0.2790)   0.4044 (0.2930)   0.4271 (0.3120)   0.4801 (0.3118)   0.4287 (0.2973)   0.4576 (0.2846)
Education group (HS grad)
  Less than HS                0.3097 (0.1879)   0.2926 (0.1835)   0.0142 (0.1349)   0.0186 (0.1383)   0.3317† (0.1838)  0.3081 (0.1885)   0.2894 (0.1844)
  Some college                0.4016 (0.1423)   0.3739 (0.1389)   0.0986 (0.1087)   0.0815 (0.1129)   0.4066 (0.1389)   0.3996 (0.1442)   0.3695 (0.1409)
  Associate’s degree          0.3321 (0.2041)   0.3150 (0.1993)   0.1109 (0.1100)   0.1249 (0.1145)   0.2858 (0.2160)   0.3326 (0.2036)   0.3160 (0.1990)
  Bachelor’s degree           0.0654 (0.2112)   0.0663 (0.2082)   0.0383 (0.0719)   0.0425 (0.0742)   0.0230 (0.2037)   0.0618 (0.2127)   0.0581 (0.2095)
  Advanced degree             0.1747 (0.2762)   0.1512 (0.2773)   0.1899 (0.1215)   0.1941 (0.1235)   0.2738 (0.2482)   0.1720 (0.2773)   0.1452 (0.2796)
Homeowner (Renter)
  Owner                       0.2767† (0.1453)  0.3233 (0.1436)   0.3612 (0.1277)   0.3589 (0.1314)   0.2703† (0.1397)  0.2754† (0.1448)  0.3198 (0.1433)
Total expenditures (Log)      0.0605 (0.0799)   0.0083 (0.0800)   0.1025 (0.0698)   0.0372 (0.0754)   0.0419 (0.0746)   0.0594 (0.0802)   0.0065 (0.0801)

Rows that appear in only some models (values are listed in the order extracted; the model columns they belong to are not recoverable from this copy):
  Income group, Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
  Income imputed (No), Yes: 0.1487 (0.0513); 0.1481 (0.0510)
  Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
  Owner × education, Less than HS: 0.4931 (0.2066); 0.4632
019
96
047
50
020
45
049
29
020
67
046
37
020
02S
ome
colle
ge
065
75
0
1567
063
70
0
1613
0
6764
014
38
065
48
0
1578
063
09
0
1633
Ass
ocia
tersquos
degr
ee0
3407
dagger0
2013
034
02dagger
019
860
2616
020
900
3401
dagger0
2007
033
87dagger
019
78
Bac
helo
rrsquos
degr
ee0
1385
024
020
1398
023
680
1057
022
960
1358
024
090
1335
023
76
Adv
ance
degr
ee0
4713
028
470
4226
028
700
5867
0
2583
046
910
2854
041
830
2890
152 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Env
iron
men
tal
feat
ures
Reg
ion
(Nor
thea
st)
Mid
wes
t0
2097
dagger0
1234
021
11dagger
011
630
2602
0
1100
024
57
011
960
2352
dagger0
1208
020
98dagger
012
320
2111
dagger0
1159
Sou
th0
2451
0
1190
024
08
011
870
1841
011
410
2017
dagger0
1149
021
29dagger
011
520
2446
0
1189
023
99
011
87W
est
0
2670
0
1055
025
57
010
66
022
48
009
43
024
02
010
01
024
41
010
19
026
78
010
61
025
74
010
71U
rban
icity
(Rur
al)
Urb
an
002
680
0829
002
340
0824
002
530
0818
002
260
0833
002
490
0821
002
680
0829
002
330
0823
Rat
titud
epr
oxie
sC
onve
rted
refu
sal
(No) Y
es
007
210
0756
008
380
0763
0
0714
007
53
008
180
0760
Eff
ort(
Mod
erat
e)A
loto
fef
fort
054
54
020
810
6142
0
2065
054
18
021
010
6051
0
2082
Bar
e min
imum
effo
rt
0
4879
013
65
057
21
0
1371
0
4842
0
1387
056
26
0
1389
Doo
rste
pco
ncer
ns(N
one)
Too
busy
0
2207
014
85
021
370
1466
0
2192
014
91
021
060
1477
Pri
vacy
gov
rsquotco
ncer
ns
118
95
0
1667
122
41
0
1622
1
1891
016
66
122
31
0
1621
Oth
er1
0547
0
3261
102
55
032
551
0549
0
3259
102
67
032
52
Con
tinue
d
Exploratory Assessment of Consent-to-Link 153
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Tab
le8
Con
tinue
d
Var
iabl
eM
odel
1M
odel
2M
odel
AM
odel
BM
odel
CM
odel
DM
odel
E
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
E
Doo
rste
pco
ncer
ns
effo
rtP
riva
cy
alo
tofe
ffor
t
058
84
027
89
053
12dagger
027
28
058
85
027
89
053
19dagger
027
25
Pri
vacy
min
imum
effo
rt
031
830
1918
026
010
1924
031
720
1915
025
810
1925
Bus
y
alo
tof
effo
rt
008
880
2736
008
900
2749
0
0841
027
58
007
850
2765
Bus
y
min
imum
effo
rt
022
380
2096
022
770
2095
022
190
2114
022
340
2112
Oth
er
alo
tof
effo
rt1
0794
dagger0
6224
104
770
6182
107
46dagger
062
441
0370
061
94
Oth
er
min
imum
effo
rt
0
9002
0
3610
087
77
035
93
089
64
036
17
086
93
035
97
Rap
port
(Pho
ne)
Per
sona
lV
isit
001
700
0428
003
950
0412
NO
TEmdash
daggerplt
010
plt
005
plt
001
plt
000
1
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent), 2nd revision    p-value    Test statistic, F(9, 35)
1                                   0.292      1.262
2                                   0.810      0.573
A                                   0.336      1.183
B                                   0.929      0.396
C                                   0.061      2.061
D                                   0.022      2.576
E                                   0.968      0.305
features of the goodness-of-fit tests used for the models considered in section 4. In addition, an online Appendix D (Supplementary Materials) presents some related conceptual material and numerical results for hypothesis testing that led to the modeling results summarized in section 4.
2. LITERATURE REVIEW
2.1 Factors Affecting Linkage Consent
The earliest investigations of consent-to-linkage effects were mostly conducted in epidemiology and health studies that requested patients' consent to access their medical records (Woolf, Rothemich, Johnson, and Marsland 2000; Dunn, Jordan, Lacey, Shapley, and Jinks 2004; Kho et al. 2009), but more recent studies have assessed consent in general population surveys with linkage requests to an array of administrative data sources (Knies, Burton, and Sala 2012; Sala et al. 2012; Sakshaug et al. 2012, 2013). In this section, we summarize findings from this research on the factors that affect consent to data linkage.
2.1.1 Respondent demographics. Linkage consent studies largely have focused on respondents' sociodemographic characteristics, most often age, gender, ethnicity, education, and income (Kho et al. 2009; Fulton 2012). These variables are widely available across surveys, and although they are unlikely to have direct causal impact on most consent decisions, they provide indirect measures of psychosociological factors that may influence those choices. Demographic differences between consenters and non-consenters are common, but the patterns of findings differ across studies. For example, older individuals frequently have been found to be less likely to consent to record linkage than younger people (Dunn et al. 2004; Bates 2005; Dahlhamer and Cox 2007; Huang, Shih, Chang, and Chou 2007; Pascale 2011; Sala et al. 2012; Al Baghal, Knies, and Burton 2014). However, some studies have found the opposite effect (Woolf et al. 2000; Beebe et al. 2011) or no age effect (Kho et al. 2009). Males often consent at higher rates than females (Dunn et al. 2004; Bates 2005; Woolf et al. 2000; Sala et al. 2012; Al Baghal et al. 2014), but some studies find no gender effect (Pascale 2011; Sakshaug et al. 2012). Consent propensities for ethnic minorities and non-citizens tend to be lower than for majority groups and citizens (Woolf et al. 2000; Beebe et al. 2011; Al Baghal et al. 2014; Mostafa 2015), although not all studies show these effects (Bates 2005; Kho et al. 2009). Similar inconsistencies are evident across studies for the effects of education and income on consent (Kho et al. 2009; Fulton 2012; Sakshaug et al. 2012; Sala et al. 2012). Other respondent demographic and household characteristics have been examined less frequently (e.g., marital status, employment status, household size, and owner/renter), again with mixed findings (Olson 1999; Jenkins, Cappellari, Lynn, Jackle, and Sala 2006; Al Baghal et al. 2014; Mostafa 2015).
2.1.2 Respondent attitudes. Attitudes can have a powerful impact on thought and behavior, and there is a long history of survey researchers attempting to measure respondents' attitudes and their impact on various survey outcomes (e.g., Goyder 1986). Particular attention has been given to respondents' attitudes about privacy and confidentiality. Research conducted by the U.S. Census Bureau going back to the 1990s demonstrates that concerns about personal privacy and data confidentiality have increased in the general public, and that these attitudes are associated with lower participation rates in the decennial census (Singer 1993, 2003) and more negative attitudes toward the use of administrative records (Singer, Bates, and Van Hoewyk 2011). Privacy and confidentiality concerns can influence record linkage consent as well. Both direct measures of privacy concerns (respondent self-reports) and indirect indicators (item refusals on financial questions) have been shown to be negatively associated with consent (Sakshaug et al. 2012; Sala, Knies, and Burton 2014; Mostafa 2015). Similarly, Sakshaug et al. (2012) demonstrated that the more confidentiality-related concerns respondents expressed to interviewers in a previous survey wave, the less likely they were to subsequently consent to data linkage. In addition, Sala et al. (2014) found that concern about data confidentiality was the most frequent reason given by respondents who declined a linkage request. There also is evidence that trust (in other people, in government) and civic engagement (volunteering, political involvement) are positively related to consent (Sala et al. 2012; Al Baghal et al. 2014).
2.1.3 Saliency. Respondents' interest in topics related to the record request, or their experiences with organizations that house those records, can also affect consent decisions. For example, a number of studies have found that respondents have a higher propensity to accept medical consent requests when they are in poorer health or have symptoms germane to the survey subject (Woolf et al. 2000; Dunn et al. 2004; Dahlhamer and Cox 2007; Beebe et al. 2011). One explanation for this finding is that consent requests on topics salient to respondents enhance the perceived benefits of record linkage (e.g., more comprehensive medical evaluation, or the general advancement of knowledge about a disease relevant to the respondent) or reduce the perceived risks (e.g., by inducing more extensive cognitive processing of the request) (Groves, Singer, and Corning 2000). In addition to topic saliency, respondents' existing relationships with government agencies also can play a role in their consent decisions. Studies by Sala et al. (2012), Sakshaug et al. (2012), and Mostafa (2015), for example, found that individuals who received government benefits (e.g., welfare, food stamps, veterans' benefits) were more likely to consent to economic data linkage than those who did not. These results again suggest that the salience of (and attitudes toward) service-providing government agencies may make some respondents more amenable to linkage requests involving those agencies.
2.1.4 Socio-environmental features. The respondents' environments help to shape the context in which consent decisions are made, and a handful of studies have examined associations between area characteristics and attitudes toward use of administrative records and consent decisions. For example, Singer et al. (2011) found that individuals living in the South and Mid-Atlantic regions of the country had more favorable attitudes about administrative record use by the U.S. Census Bureau than those living in other regions of the country. Studies of actual consent and linkage rates have demonstrated regional variations as well, with higher rates in the South and Midwest and lower rates in parts of the Northeast (e.g., Olson 1999; Dahlhamer and Cox 2007). Consent rates also can vary by urban status. Consistent with urbanicity effects seen in the literature on survey participation and pro-social behavior (Groves and Couper 1998; Mattis, Hammond, Grayman, Bonacci, Brennan, et al. 2009), respondents living in urban areas have been found to be less likely to consent than those living in non-urban areas (cf. Jenkins et al. 2006; Dahlhamer and Cox 2007; Al Baghal et al. 2014, who show a marginally significant positive effect for urbanicity). Together, such area effects may indicate the influence of underlying ecological factors within those communities (e.g., differences in population density, crime, social engagement), but may also reflect differences in survey operations (e.g., in staff protocol and training) clustered within those geographic areas. A recent study by Mostafa (2015) found that area characteristics by themselves added little explanatory power to models of consent propensity, suggesting that respondent and interview characteristics may be more important factors.
2.1.5 Interviewer characteristics. Interviewer attributes and behaviors can have significant impact on survey participation and data quality (O'Muircheartaigh and Campanelli 1999; West, Kreuter, and Jaenichen 2013), including linkage consent decisions. Studies investigating the impact of interviewer demographics generally find that they are unrelated to the consent outcome (Sakshaug et al. 2012; Sala et al. 2012), although there is some evidence of a positive effect of interviewer age on consent (Korbmacher and Schroeder 2013; Al Baghal et al. 2014). Interviewer experience has shown mixed effects. The amount of time spent working as an interviewer overall (i.e., job tenure) is either unrelated to consent (Sakshaug et al. 2012; Sala et al. 2012) or can actually have a small negative impact (Sakshaug et al. 2013; Al Baghal et al. 2014). Interviewers' survey-specific experience, as measured by the number of interviews already completed prior to the current consent request, shows similar effects (Sakshaug et al. 2012; Sala et al. 2012). One aspect of interviewer experience that is positively related to consent in these studies is past performance in gaining respondent consent. Sala et al. (2012) and Sakshaug et al. (2012) found that the likelihood of consent increased with the number of consents obtained earlier in the field period. These authors also attempted to identify interviewer personality traits and attitudes that could affect respondent consent decisions, but largely failed to find significant effects. The one exception was that respondent consent was positively related to interviewers' own willingness to consent to linkage (Sakshaug et al. 2012).
2.1.6 Survey design features. The way in which the consent requests are administered can impact linkage consent. Consent rates appear to be higher in face-to-face surveys than in phone surveys, though there are relatively few mode studies that have examined this phenomenon (Fulton 2012). Consent questions that ask respondents to provide personal identifiers (e.g., full or partial SSNs) as matching variables produce lower consent rates than those that do not (Bates 2005). This finding, and advancements in statistical matching techniques, prompted the Census Bureau to change its approach to gaining linkage consent in 2006, and it has since adopted a passive, opt-out consent procedure in which respondents are informed of the intent to link, and consent is assumed unless respondents explicitly object (McNabb, Timmons, Song, and Puckett 2009). These implicit consent procedures (as they are sometimes called) result in higher consent rates than opt-in approaches, where respondents must affirmatively state their consent (Bates 2005; Pascale 2011). See also Das and Couper (2014).
Since most surveys employ opt-in formats, researchers have focused on the potential effects of the wording or framing of these questions. Consent framing experiments vary factors mentioned in the request that are thought to be persuasive to respondents, for example, highlighting the quality benefits associated with linkage, the reduction in survey collection costs, or the time savings for respondents. Evidence of framing effects in these studies is surprisingly weak, however. Bates, Wroblewski, and Pascale (2012) found that respondents reported more positive attitudes toward record linkage under cost- and time-savings frames, but the study did not measure actual linkage consent propensities. In Sakshaug and Kreuter (2014), a time-savings frame produced higher consent rates for web survey respondents than a neutrally worded consent question, but this is the only study in the literature to find significant question-framing effects (Pascale 2011; Sakshaug et al. 2013).
The timing of consent requests appears to have some influence on likelihood of consent. Although it is common practice to delay asking the most sensitive items, like linkage-consent requests, until near the end of the questionnaire, recent empirical evidence indicates that this may not be optimal. Sakshaug et al. (2013) found that respondents were more amenable to consent requests administered at the beginning of the survey than at the end, and suggest that the proximity of the survey and linkage requests may reinforce respondents' desire or inclination to be consistent (i.e., agree to both). This result could inform potentially promising adaptive design interventions (e.g., asking for consent early in the interview, then skipping subsequent burdensome questions for consenters). Sala et al. (2014) obtained higher consent rates when the request was asked immediately following a series of questions on a related topic rather than waiting until the end of the survey. The authors reasoned that contextual placement of the linkage question increases the salience of the request and induces more careful consideration by respondents. Both explanations find some support in the broader psychological literature on compliance, cognitive dissonance, and context effects, but further research is needed to evaluate these and other mechanisms (e.g., survey fatigue) underlying consent placement effects (Sala et al. 2014).
2.2 Analytic Approaches to Assessing Consent Bias
Early linkage consent studies simply looked for evidence of sample bias (i.e., differences in sample composition for consenters and non-consenters). Most recent studies use logistic regression models to identify factors that influence consent and infer potential consent bias (e.g., differential consent to medical-records linkage by respondent health status). Several studies have employed multi-level models to assess the impact of interviewers on consent propensity (e.g., Fulton 2012; Sala et al. 2012; Mostafa 2015), and others have jointly modeled respondents' consent propensities on multiple consent items in a given survey (e.g., Mostafa 2015). Studies that have examined direct estimates of consent bias using administrative records available for both consenters and non-consenters are much less common in the literature. Recent studies by Sakshaug and his colleagues are notable exceptions (Sakshaug and Kreuter 2012; Sakshaug and Huber 2016). Using administrative data linked to German panel survey data, they have compared the magnitude of consent biases to bias estimates for other error sources (nonresponse, measurement) and the longitudinal changes in these biases.
Given evidence of potential consent bias (e.g., differential consent by specific demographic groups), one promising approach that has not yet been explored in this literature is to adopt propensity weighting methods that are widely used in nonresponse adjustment. Traditionally, propensity-weighted nonresponse adjustments are accomplished by modeling response propensities using logistic regression and auxiliary data available for both respondents and nonrespondents, and then using the inverse of the modeled propensity as a weight adjustment factor (Little 1986). If the predicted propensity is unbiased, this adjustment method may reduce the potential bias due to nonresponse. Of course, bias reduction is predicated on correct model specification, and this may be particularly challenging in consent propensity applications, given the absence of well-developed consent theories and inconsistency in the effects of many of its predictors.
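The weight adjustment described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, assuming that a modeled propensity is already available for every unit; the function name and the toy inputs are hypothetical and are not taken from any survey production system.

```python
def propensity_adjusted_weights(base_weights, propensities, responded):
    """Inverse-propensity weight adjustment (illustrative sketch).

    Each responding unit's base weight is multiplied by 1 / p_i, where p_i
    is its modeled response (or consent) propensity. Nonresponding units
    receive weight 0, so that responders "stand in" for them.
    """
    adjusted = []
    for w, p, r in zip(base_weights, propensities, responded):
        if not 0.0 < p <= 1.0:
            raise ValueError("propensity must lie in (0, 1]")
        adjusted.append(w / p if r else 0.0)
    return adjusted


# Toy inputs (hypothetical): three sample units
weights = [100.0, 100.0, 200.0]
p_hat = [0.5, 0.8, 0.25]
resp = [True, False, True]
adj = propensity_adjusted_weights(weights, p_hat, resp)
```

Units with low modeled propensities receive large adjustment factors, which is one reason misspecification of the propensity model can matter so much in practice.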
Application of these ideas in the analysis of "consent-to-link" patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition

$$p_L(x; \beta_L) = p_C(x; \beta_C)\, p_{L|C}(x; \beta_{L|C}) \qquad (2.1)$$

where $p_L(x; \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x; \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x; \beta_{L|C})$ is the probability of successful linkage, conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that, to some degree, the first factor $p_C(x; \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x; \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x; \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
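To make the decomposition concrete, the sketch below evaluates the product of a consent probability and a conditional linkage probability for a single unit. It assumes, purely for illustration, that both factors follow logistic models; the paper does not specify a functional form for the conditional linkage probability, and the function names are hypothetical.

```python
import math


def logistic_probability(x, beta):
    """Logistic probability 1 / (1 + exp(-x'beta)) for one unit."""
    eta = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


def linkage_probability(x, beta_c, beta_l_given_c):
    """Overall linkage probability as the product of two factors:
    the probability of consent and the probability of a successful
    link conditional on consent."""
    p_c = logistic_probability(x, beta_c)
    p_l_given_c = logistic_probability(x, beta_l_given_c)
    return p_c * p_l_given_c


# Toy inputs (hypothetical): intercept plus one predictor
x_unit = [1.0, 0.0]
p_link = linkage_probability(x_unit, beta_c=[0.0, 0.5], beta_l_given_c=[0.0, -0.2])
```

Because the second factor is a probability, the overall linkage probability can never exceed the consent probability, so consent rates give only an upper bound on achievable linkage rates.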
2.3 Options for Exploratory Analyses of Consent-to-Link
Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1) and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and "data cleaning" for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options:
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with "real world" production conditions (per the preceding literature results on interviewer characteristics) or reflect full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L|C}(x; \beta_{L|C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the U.S. Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure survey, the U.S. Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers' expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year's worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.¹
The CEQ is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5 percent (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80 percent of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.

¹ In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: "We'd like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?" Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a "don't know" response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members' propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS surveylogistic procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor $K = 0.5$. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
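For readers who want to see what a survey-weighted logistic fit involves, the sketch below maximizes a weighted pseudo-log-likelihood by Newton-Raphson for an intercept and a single predictor. It is a minimal illustration in plain Python, not a reproduction of the SAS surveylogistic procedure; the function name, the restriction to one predictor, and the toy data are all assumptions made for the example.

```python
import math


def fit_weighted_logit(x, y, w, n_iter=25, tol=1e-10):
    """Weighted logistic regression of y on (1, x) by Newton-Raphson.

    Maximizes sum_i w_i * [y_i * eta_i - log(1 + exp(eta_i))],
    with eta_i = b0 + b1 * x_i, where y_i is 1 (consent) or 0 (object)
    and w_i is the unit's survey weight.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # weighted score vector
        h00 = h01 = h11 = 0.0    # weighted information matrix
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += wi * (yi - p)
            g1 += wi * (yi - p) * xi
            v = wi * p * (1.0 - p)
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system (information) * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data (hypothetical): binary predictor, consent indicator, weights
x_toy = [0, 0, 0, 1, 1, 1]
y_toy = [1, 0, 0, 1, 1, 0]
w_toy = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
b0_hat, b1_hat = fit_weighted_logit(x_toy, y_toy, w_toy)
```

Replicate-based standard errors would then come from repeating the same fit once per set of replicate weights and summarizing the spread of the resulting coefficient estimates.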
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey, and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as "truth" and compare estimates from the full sample to those derived from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work, we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration) as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180 versus 17.4% of consenters) but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older versus 19.9% for consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1 except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
4.2 Analysis of Economic Variables Subject to “Informed Consent” Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                                   All respondents   Consenters        Non-consenters
                                            (n = 4,893)       (n = 3,951)       (n = 942)
Privacy/anti-government concerns            6.9 (0.8)         4.2 (0.6)         18.1 (2.2)
Reluctance
  Converted refusal                         10.8 (0.7)        9.6 (0.8)         16.0 (1.5)
  Income imputation required                44.7 (1.0)        41.8 (1.1)        57.0 (2.1)
  Too busy                                  8.1 (0.5)         6.9 (0.6)         12.7 (1.3)
  Respondent effort
    A lot of effort                         34.9 (0.9)        36.7 (1.0)        27.3 (1.9)
    Moderate effort                         37.2 (1.1)        37.4 (1.3)        36.4 (1.8)
    Bare minimum effort                     27.9 (1.1)        25.9 (1.2)        36.3 (2.3)
  Respondent cooperation
    Very cooperative                        50.8 (1.0)        53.5 (1.0)        39.4 (2.6)
    Somewhat cooperative                    33.1 (0.8)        33.1 (0.9)        33.1 (1.3)
    Neither cooperative nor uncooperative   5.4 (0.5)         4.5 (0.4)         9.2 (1.1)
    Somewhat uncooperative                  4.5 (0.3)         3.3 (0.3)         9.5 (1.1)
    Very uncooperative                      6.3 (0.4)         5.7 (0.5)         8.8 (0.9)
Rapport
  Face-to-face                              65.0 (1.4)        66.2 (1.5)        59.9 (1.9)
  Phone                                     35.0 (1.4)        33.8 (1.5)        40.1 (1.9)
Burden
  Total interview time                      64.9 (1.0)        65.3 (1.1)        63.3 (1.4)
  Household size                            2.42 (0.03)       2.42 (0.03)       2.42 (0.07)
  Total expenditures                        $11,454 ($148)    $11,511 ($145)    $11,218 ($384)
  Family income
    LT $8,180                               19.8 (0.8)        17.4 (0.9)        30.8 (1.9)
    $8,180–$24,000                          20.2 (0.6)        20.8 (0.9)        16.8 (1.4)
    $24,001–$46,000                         20.8 (0.6)        20.5 (0.7)        18.4 (1.3)
    $46,001–$85,855                         19.8 (0.6)        20.7 (0.7)        16.7 (1.3)
    GT $85,585                              19.4 (0.6)        20.6 (0.9)        17.3 (1.8)
  Owner                                     65.4 (0.8)        64.0 (0.9)        71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).
survey questions. For each of these variables, we computed the following prospective estimators of the population mean \(\bar{Y}\):
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight \(w_i\) used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic                All respondents   Consenters        Non-consenters
                              (n = 4,893)       (n = 3,951)       (n = 942)
Age group
  18–32                       20.0 (0.9)        21.5 (1.1)        13.8 (1.0)
  32–65                       59.2 (0.9)        58.6 (1.2)        61.6 (1.5)
  65+                         20.8 (0.7)        19.9 (0.8)        24.6 (1.5)
Gender
  Male                        46.3 (0.7)        46.8 (0.8)        44.3 (1.7)
  Female                      53.7 (0.7)        53.2 (0.8)        55.7 (1.7)
Race
  White                       83.1 (0.9)        83.0 (1.0)        83.1 (1.1)
  Non-white                   16.9 (0.9)        17.0 (1.0)        16.9 (1.1)
Education group
  Less than HS                13.5 (0.6)        13.4 (0.7)        14.0 (1.6)
  HS graduate                 24.7 (0.8)        24.5 (0.9)        25.3 (1.8)
  Some college                21.4 (0.7)        21.2 (0.9)        22.2 (1.9)
  Associate’s degree          10.2 (0.4)        10.0 (0.5)        10.9 (1.1)
  Bachelor’s degree           19.1 (0.5)        19.4 (0.6)        17.9 (1.4)
  Advanced degree             11.1 (0.5)        11.5 (0.6)        9.7 (0.9)
Spanish language interview
  Yes                         4.1 (0.5)         3.7 (0.5)         5.4 (1.5)
  No                          95.9 (0.5)        96.3 (0.5)        94.6 (1.5)
Region
  Northeast                   18.5 (0.6)        17.6 (0.7)        22.2 (1.7)
  Midwest                     23.5 (1.3)        24.5 (1.4)        19.2 (2.3)
  South                       35.0 (1.3)        35.9 (1.5)        31.5 (2.7)
  West                        23.0 (1.5)        22.0 (1.7)        27.1 (1.8)
Urbanicity
  Urban                       80.5 (0.9)        80.5 (1.2)        80.7 (2.0)
  Rural                       19.5 (0.9)        19.5 (1.2)        19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Variable                          Model 1                 Model 2
                                  Estimate    SE          Estimate    SE
Demographic characteristics
Age group (32–65)
  18–32                           0.3435      0.0633      0.3389      0.0634
  65+                             0.2692      0.0645      0.2532      0.0656
Gender (Male)
  Female                          0.0161      0.0426      0.0229      0.0421
Race (White)
  Non-white                       0.0949      0.1264      0.0612      0.1260
Spanish interview (No)
  Yes                             0.4234      0.2923      0.4452      0.2790
Education group (HS grad)
  Less than HS                    0.3097      0.1879      0.2926      0.1835
  Some college                    0.4016      0.1423      0.3739      0.1389
  Associate’s degree              0.3321      0.2041      0.3150      0.1993
  Bachelor’s degree               0.0654      0.2112      0.0663      0.2082
  Advanced degree                 0.1747      0.2762      0.1512      0.2773
Home owner (Renter)
  Owner                           0.2767†     0.1453      0.3233      0.1436
Total expenditures (Log)          0.0605      0.0799      0.0083      0.0800
Income group
  Less than $8,181                0.2155      0.0604
Income imputed (No)
  Yes                             0.1487      0.0513
Race × Gender                     0.1874†     0.0956      0.1953      0.0923
Owner × Education
  Less than HS                    0.4931      0.2066      0.4632      0.1996
  Some college                    0.6575      0.1567      0.6370      0.1613
  Associate’s degree              0.3407†     0.2013      0.3402†     0.1986
  Bachelor’s degree               0.1385      0.2402      0.1398      0.2368
  Advanced degree                 0.4713      0.2847      0.4226      0.2870
Environmental features
Region (Northeast)
  Midwest                         0.2097†     0.1234      0.2111†     0.1163
  South                           0.2451      0.1190      0.2408      0.1187
  West                            0.2670      0.1055      0.2557      0.1066
Urbanicity (Rural)
  Urban                           0.0268      0.0829      0.0234      0.0824
R attitude proxies
Converted refusal (No)
  Yes                             0.0721      0.0756      0.0838      0.0763

(Continued)
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights \(w_i\) as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to \(w_{p,i} = w_i / \hat{p}_i\) (Consenting Units, Propensity-Adjusted).[2]
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to \(w_{p,\mathrm{decile}} = w_i / \hat{p}_{\mathrm{decile}}\) (Consenting Units, Propensity-Decile Adjusted, where \(\hat{p}_{\mathrm{decile}}\) is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to \(w_{p,\mathrm{quintile}} = w_i / \hat{p}_{\mathrm{quintile}}\) (Consenting Units, Propensity-Quintile Adjusted, where \(\hat{p}_{\mathrm{quintile}}\) is the unweighted propensity of “consent” units in the specified quintile group).
Table 3. Continued

Variable                          Model 1                 Model 2
                                  Estimate    SE          Estimate    SE
Effort (Moderate)
  A lot of effort                 0.5454      0.2081      0.6142      0.2065
  Bare minimum effort             0.4879      0.1365      0.5721      0.1371
Doorstep concerns (None)
  Too busy                        0.2207      0.1485      0.2137      0.1466
  Privacy/gov’t concerns          1.1895      0.1667      1.2241      0.1622
  Other                           1.0547      0.3261      1.0255      0.3255
Doorstep concerns × effort
  Privacy × a lot of effort       0.5884      0.2789      0.5312†     0.2728
  Privacy × minimum effort        0.3183      0.1918      0.2601      0.1924
  Busy × a lot of effort          0.0888      0.2736      0.0890      0.2749
  Busy × minimum effort           0.2238      0.2096      0.2277      0.2095
  Other × a lot of effort         1.0794†     0.6224      1.0477      0.6182
  Other × minimum effort          0.9002      0.3610      0.8777      0.3593

NOTE: †p < 0.10, p < 0.05, p < 0.01, p < 0.001.
[2] Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link \(\hat{p}_i\) for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.
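The five estimators (FS, CUNA, CUPA, CUPDA, and CUPQA) can be sketched in a few lines. This is an illustration under assumptions rather than the production code: the function names are hypothetical, propensity classes are formed as equal-count groups of all units ranked by estimated propensity, and the class-level propensity is taken to be the unweighted mean of the estimated propensities of the consenting units in that class, which is one reading of the definitions given in the text.

```python
from statistics import mean


def weighted_mean(y, w):
    """Ratio-type weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def class_propensities(p_hat, consent, n_classes):
    """Give each unit the unweighted mean p_hat of the consenters in its class.

    Classes are equal-count groups of all units ranked by p_hat (an
    assumption; the text does not spell out how boundaries were formed).
    Units in a class with no consenters get None, which never affects the
    estimators below because those use consenting units only.
    """
    n = len(p_hat)
    order = sorted(range(n), key=lambda i: p_hat[i])
    class_of = [0] * n
    for rank, i in enumerate(order):
        class_of[i] = rank * n_classes // n
    class_mean = {}
    for c in set(class_of):
        members = [p_hat[i] for i in range(n) if class_of[i] == c and consent[i]]
        if members:
            class_mean[c] = mean(members)
    return [class_mean.get(class_of[i]) for i in range(n)]


def consent_estimators(y, w, p_hat, consent):
    """Return the FS, CUNA, CUPA, CUPDA, and CUPQA mean estimates."""
    idx = [i for i, c in enumerate(consent) if c]
    y_c = [y[i] for i in idx]

    def adjusted_mean(p):
        # Consenters only, design weight divided by the propensity value.
        return weighted_mean(y_c, [w[i] / p[i] for i in idx])

    return {
        "FS": weighted_mean(y, w),
        "CUNA": weighted_mean(y_c, [w[i] for i in idx]),
        "CUPA": adjusted_mean(p_hat),
        "CUPDA": adjusted_mean(class_propensities(p_hat, consent, 10)),
        "CUPQA": adjusted_mean(class_propensities(p_hat, consent, 5)),
    }
```

Subclassification trades some of the bias reduction of unit-level inverse-propensity weights for stability: units with very small estimated propensities no longer receive extreme individual weights.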
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
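To make the comparisons in table 4 concrete, the sketch below recomputes a few of the reported differences and forms the ratio of each difference to its standard error. The means and standard errors are those read from table 4; the normal-approximation p-value is an assumption added here for illustration only and is not necessarily the reference distribution used for the significance markers in the table.

```python
import math

# (variable, FS mean, CUNA mean, SE of CUNA-FS, CUPA mean, SE of CUPA-FS),
# values as read from table 4.
ROWS = [
    ("Family income before taxes", 50939.00, 52869.00, 580.98, 52117.00, 619.09),
    ("Property taxes", 454.15, 429.12, 6.08, 434.74, 6.48),
    ("Rental value", 1296.52, 1269.65, 7.43, 1272.40, 8.18),
]


def two_sided_p_normal(z):
    """Two-sided p-value under a standard normal reference (an assumption)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def summarize(estimate, fs_mean, se_diff):
    """Return (difference, difference / SE, approximate two-sided p-value)."""
    diff = estimate - fs_mean
    z = diff / se_diff
    return diff, z, two_sided_p_normal(z)


results = {}
for name, fs, cuna, se_cuna, cupa, se_cupa in ROWS:
    results[name] = {
        "CUNA-FS": summarize(cuna, fs, se_cuna),
        "CUPA-FS": summarize(cupa, fs, se_cupa),
    }
```

For family income, the unadjusted difference of $1,930 is more than three standard errors from zero, while the propensity-adjusted difference of $1,178 is a little under two, in line with the description of that difference as reduced and only marginally significant.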
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

Columns 2–6 report means with standard errors (SE) in parentheses: full-sample (FS); consenting units, no adjustment (CUNA); consenting units, propensity-adjusted (CUPA); consenting units, propensity-decile adjusted (CUPDA); consenting units, propensity-quintile adjusted (CUPQA). Columns 7–10 report point estimate differences with SE(diff) in parentheses.

Variable (column 1)                             FS (2)        CUNA (3)      CUPA (4)      CUPDA (5)     CUPQA (6)
Family income before taxes                      $50,939.00    $52,869.00    $52,117.00    $52,269.00    $52,405.00
                                                ($1,227.51)   ($1,535.04)   ($1,523.85)   ($1,520.74)   ($1,514.30)
Family income before taxes with imputed values  $61,207.00    $61,405.00    $61,228.00    $61,337.00    $61,382.00
                                                ($1,193.34)   ($1,510.55)   ($1,454.39)   ($1,463.34)   ($1,466.25)
Vehicle purchase cost                           $599.59       $619.14       $607.80       $607.63       $611.75
                                                ($33.22)      ($37.05)      ($36.63)      ($36.67)      ($37.61)
Property taxes                                  $454.15       $429.12       $434.74       $434.88       $436.75
                                                ($10.41)      ($10.76)      ($11.39)      ($11.37)      ($11.30)
Property value                                  $232,226.00   $227,734.00   $227,938.00   $227,935.00   $228,640.00
                                                ($5,055.68)   ($5,730.38)   ($5,592.27)   ($5,532.64)   ($5,519.90)
Rental value                                    $1,296.52     $1,269.65     $1,272.40     $1,272.91     $1,274.67
                                                ($15.58)      ($17.05)      ($17.81)      ($17.86)      ($17.68)

Variable (column 1)                             CUNA–FS (7)   CUPA–FS (8)   CUPDA–FS (9)  CUPQA–FS (10)
Family income before taxes                      $1,930.00     $1,178.00†    $1,330.00     $1,466.00
                                                ($580.98)     ($619.09)     ($602.64)     ($606.76)
Family income before taxes with imputed values  $198.00       $21.00        $130.00       $175.00
                                                ($573.46)     ($563.54)     ($557.10)     ($552.20)
Vehicle purchase cost                           $19.55†       $8.21         $8.04         $12.17
                                                ($10.83)      ($15.34)      ($15.23)      ($16.13)
Property taxes                                  -$25.02       -$19.41       -$19.27       -$17.40
                                                ($6.08)       ($6.48)       ($6.05)       ($6.23)
Property value                                  -$4,492.00    -$4,288.00†   -$4,291.00    -$3,586.00
                                                ($2,118.39)   ($2,165.93)   ($2,132.40)   ($2,063.66)
Rental value                                    -$26.87       -$24.12       -$23.61       -$21.85
                                                ($7.43)       ($8.18)       ($8.01)       ($8.02)

NOTE: †p < 0.10, p < 0.05, p < 0.01 (bold), p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
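The points of a quantile-quantile comparison like the one in figure 1 can be computed as sketched below. The sketch is illustrative only: it uses unweighted, linearly interpolated empirical quantiles, whereas the published figure may rest on survey-weighted estimates, and the function names are hypothetical.

```python
def quantile(sorted_values, q):
    """Linearly interpolated empirical quantile of an ascending-sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def qq_points(p_object, p_consent):
    """Pairs (x, y) for the 0.01 to 0.99 quantiles.

    x is the quantile of the estimated propensity among objectors
    (horizontal axis); y is the same quantile among consenters
    (vertical axis).
    """
    xs = sorted(p_object)
    ys = sorted(p_consent)
    return [(quantile(xs, k / 100), quantile(ys, k / 100)) for k in range(1, 100)]


def max_gap_from_diagonal(points):
    """Largest vertical distance of any plotted point from the line y = x."""
    return max(abs(y - x) for x, y in points)
```

Points lying on the line with slope 1 and intercept 0 give a maximum gap of zero, which corresponds to the case described above in which the scores provide no practical discriminating power.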
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX) separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

Percentile (%)    Estimated propensity of consent
                  Consent subpopulation    Object subpopulation
70                0.84                     0.88
80                0.86                     0.89
90                0.88                     0.92
99                0.93                     0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
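The quantile differences described above can be sketched with a simple weighted quantile function. This is a minimal illustration under assumptions: quantiles are taken as the inverse of the weighted empirical distribution function without interpolation, the function names are hypothetical, and the pointwise confidence intervals (which would require replicate weights) are omitted.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]  # guard against floating-point shortfall at q near 1


def quantile_differences(y, w, p_hat, consent, qs=None):
    """Propensity-adjusted consenter quantile minus full-sample quantile.

    Consenting units carry weight w_i / p_hat_i; the full sample uses w_i.
    Returns a list of (q, difference) pairs.
    """
    if qs is None:
        qs = [k / 100 for k in range(1, 100)]
    idx = [i for i, c in enumerate(consent) if c]
    y_c = [y[i] for i in idx]
    w_c = [w[i] / p_hat[i] for i in idx]
    return [(q, weighted_quantile(y_c, w_c, q) - weighted_quantile(y, w, q))
            for q in qs]
```

Plotting the differences against q shows whether a gap in means is concentrated in one tail or spread across the distribution.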
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require that sampled survey units provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods, joint modeling of consent-to-link and item response probabilities, approximate optimization of design decisions related to potentially burdensome questions, and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal T G Knies and J Burton (2014) ldquoLinking Administrative Records to SurveysDifferences in the Correlates to Consent Decisionsrdquo Understanding Society Working PaperSeries Institute for Social and Economic Research No 2014-09
Archer K J and S Lemeshow (2006) ldquoGoodness-of-Fit Test For a Logistic Regression ModelFitted Using Survey Sample Datardquo The Stata Journal 6 97ndash105
Archer K J S Lemeshow and D W Hosmer (2007) ldquoGoodness-of-Fit Tests For LogisticRegression Models When Data Are Collected Using a Complex Sampling DesignrdquoComputational Statistics and Data Analysis 51 4450ndash4464
Exploratory Assessment of Consent-to-Link 143
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Bates, N. (2005), "Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records," Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), "Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?" Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), "HIPAA Authorization and Survey Nonresponse Bias," Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), "Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation," Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), "Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey," Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), "Optimizing Opt-Out Consent for Record Linkage," Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), "2011 Research Section Analysis," Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), "Rapport in Conflict Resolution: Accounting for How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts," Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), "Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders," American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), "Respondent Consent to Use Administrative Data," PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), "Surveys on Surveys: Limitations and Potentialities," Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), "Understanding the Decision to Participate in a Survey," Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), "Leverage-Salience Theory of Survey Participation: Description and an Illustration," Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), "Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias," Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), "Record Linkage Research and Informed Consent: Who Consents?" BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), "Patterns of Consent: Evidence from a General Household Survey," Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), "Fay's Method for Variance Estimation," Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), "Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review," British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), "Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population," BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), "Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics," The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), "The Framing of the Record Linkage Consent Question," International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), "Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer," Survey Research Methods, 7, 115–131.
Little, R. (1986), "Survey Nonresponse Adjustments for Estimates of Means," International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), "The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community," American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), "Uses of Administrative Data at the Social Security Administration," Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), "Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study," International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), "Linkages with Data from Social Security Administrative Records in the Health and Retirement Study," Social Security Bulletin, 62, 73–85.
O'Muircheartaigh, C., and P. Campanelli (1999), "A Multilevel Exploration of the Role of Interviewers in Survey Non-Response," Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), "Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP)," Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), "Bowling Alone: America's Declining Social Capital," Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), "Is the Consumer Expenditure Survey Representative by Income?" Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), "Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults," Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), "Linking Survey and Administrative Data: Mechanisms of Consent," Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), "An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany," Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), "Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data," Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), "The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey," Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), "Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data," Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), "Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics," Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), "Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel," International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), "Informed Consent and Survey Response: A Summary of the Empirical Literature," Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), "Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits," Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), "Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census," Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), "An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey," Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), "Consumer Expenditures and Income," in Handbook of Methods, Chapter 16, Washington, DC, USA. Available at http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), "'Interviewer' Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?" Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), "Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research," Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), "Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight," paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), "Propensity Score Adjustments for Covariates in Observational Studies," paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate's degree; 4 if Bachelor's degree; 5 if more than Bachelor's
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if "very cooperative"; 2 if "somewhat cooperative"; 3 if "neither cooperative nor uncooperative"; 4 if "somewhat uncooperative"; 5 if "very uncooperative"
  Effort: 1 if "a lot of effort"; 2 if "a moderate amount" (reference group); 3 if "a bare minimum"
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet "almost always"; 2 if "most of the time"; 3 if "occasionally"; 4 if "never or almost never"; 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if "other"
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Privacy/government concerns
  Respondent "doorstep" concerns: Privacy concerns; Anti-government concerns

Assigned category: Too busy/logistical concerns
  Respondent "doorstep" concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested; Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long

Assigned category: Other reluctance
  Respondent "doorstep" concerns: Survey is voluntary; Does not understand/Asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other (specify)

Assigned category: No concerns
  Respondent "doorstep" concerns: No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights to the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Columns:
  (1) Weights from full sample
  (2) Unadjusted weights for sample units with consent
  (3) Propensity-adjusted weights for sample units with consent
  (4) Propensity-decile adjusted weights for sample units with consent
  (5) Propensity-quintile adjusted weights for sample units with consent

Statistic          (1)      (2)      (3)      (4)      (5)
n                  4893     3951     3915     3915     3915
1 + CV^2_wt        1.13     1.13     1.16     1.16     1.15
IQR / q0.5         0.875    0.877    0.858    0.853    0.851
q0.75 / q0.25      1.357    1.359    1.448    1.463    1.468
q0.90 / q0.10      1.970    1.966    2.148    2.178    2.124
q0.95 / q0.05      2.699    2.665    3.028    2.961    2.898
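The summaries in table 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the quantiles are unweighted with linear interpolation, and the squared coefficient of variation uses the population (divide-by-n) variance. The paper does not say which conventions were used.

```python
import statistics


def quantile(sorted_vals, q):
    """Quantile by linear interpolation on an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def propensity_adjusted(weights, propensities):
    """Return w_pi = w_i / p_i for each consenting unit (each p_i in (0, 1])."""
    return [w / p for w, p in zip(weights, propensities)]


def weight_variability(weights):
    """Summaries of weight heterogeneity in the spirit of table 6."""
    w = sorted(weights)
    mean = statistics.fmean(w)
    # Assumption: population variance (divide by n) in the squared CV.
    cv2 = statistics.pvariance(w, mu=mean) / mean ** 2
    q = {p: quantile(w, p) for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)}
    return {
        "n": len(w),
        "1+CV^2": 1.0 + cv2,
        "IQR/q0.5": (q[0.75] - q[0.25]) / q[0.50],
        "q0.75/q0.25": q[0.75] / q[0.25],
        "q0.90/q0.10": q[0.90] / q[0.10],
        "q0.95/q0.05": q[0.95] / q[0.05],
    }


if __name__ == "__main__":
    # Toy design weights and consent propensities (made-up values).
    base = [100.0, 120.0, 150.0, 180.0, 200.0]
    p_hat = [0.9, 0.8, 0.85, 0.7, 0.75]
    print(weight_variability(base))
    print(weight_variability(propensity_adjusted(base, p_hat)))
```

Comparing the two printed dictionaries mirrors the comparison in the text: if the adjusted weights show only slightly larger values of 1 + CV^2 and of the quantile ratios, the adjustment adds little weight heterogeneity.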
Appendix C. Variance Estimators and Goodness-of-Fit Tests

C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that use 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
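The variance estimator above can be written out directly. The following is a minimal pure-Python sketch, assuming the full-sample and replicate coefficient estimates have already been computed elsewhere; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def fay_brr_covariance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR covariance matrix.

    Implements var(b) = 1 / (R (1 - K)^2) * sum_r (b_r - b)(b_r - b)',
    where b is the full-sample estimate and b_r the r-th replicate estimate.
    """
    n_rep = len(replicate_ests)
    n_par = len(full_est)
    scale = 1.0 / (n_rep * (1.0 - fay_k) ** 2)
    cov = [[0.0] * n_par for _ in range(n_par)]
    for rep in replicate_ests:
        dev = [b_r - b for b_r, b in zip(rep, full_est)]
        for i in range(n_par):
            for j in range(n_par):
                cov[i][j] += scale * dev[i] * dev[j]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a covariance matrix."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


if __name__ == "__main__":
    # Toy example with R = 4 replicates (the paper uses R = 44, K = 0.5).
    b_full = [1.0, 2.0]
    b_reps = [[1.5, 2.0], [0.5, 2.0], [1.0, 2.5], [1.0, 1.5]]
    cov = fay_brr_covariance(b_full, b_reps, fay_k=0.5)
    print(cov)
    print(standard_errors(cov))
```

With K = 0.5 the factor 1/(1 - K)^2 equals 4, which compensates for the fact that Fay replicate weights perturb the full-sample weights only halfway toward the classical half-sample weights.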
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{Fadj}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table body not recoverable from the rotated page layout. Columns: Estimate and SE for each of Models 1, 2, A, B, C, D, and E. Rows, with reference categories in parentheses:
Demographic characteristics: Age group (32–65): 18–32, 65+; Gender (Male): Female; Race (White): Non-white; Spanish interview (No): Yes; Education group (HS grad): Less than HS, Some college, Associate's degree, Bachelor's degree, Advance degree; Homeowner (Renter): Owner; Total expenditures (Log); Income group: Less than $8,181; Income imputed (No): Yes; Race × gender; Owner × education: Less than HS, Some college, Associate's degree, Bachelor's degree, Advance degree.
Environmental features: Region (Northeast): Midwest, South, West; Urbanicity (Rural): Urban.
R attitude proxies: Converted refusal (No): Yes; Effort (Moderate): A lot of effort, Bare minimum effort; Doorstep concerns (None): Too busy, Privacy/gov't concerns, Other; Doorstep concerns × effort: Privacy × a lot of effort, Privacy × minimum effort, Busy × a lot of effort, Busy × minimum effort, Other × a lot of effort, Other × minimum effort; Rapport (Phone): Personal Visit.
NOTE: †p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001.]
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
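The link between the test statistics and the p-values in table 9 can be checked numerically. The sketch below is an illustration only: it takes the reference distribution F(g - 1, f - g + 2) with f = 43 and g = 10 from Appendix C.2, and evaluates the upper-tail probability by Simpson's rule on the F density using only the Python standard library. The function names are hypothetical, and the integration assumes a numerator degrees of freedom above 2 so that the density is bounded near zero.

```python
import math


def f_pdf(x, d1, d2):
    """Density of a central F(d1, d2) random variable at x."""
    if x <= 0.0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (0.5 * d1 * math.log(d1 / d2)
               + (0.5 * d1 - 1.0) * math.log(x)
               - 0.5 * (d1 + d2) * math.log1p(d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)


def f_upper_tail(x, d1, d2, steps=2000):
    """P(F > x) via Simpson's rule on [0, x]; steps must be even, d1 > 2."""
    if x <= 0.0:
        return 1.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3.0


if __name__ == "__main__":
    f, g = 43, 10                  # values stated in Appendix C.2
    d1, d2 = g - 1, f - g + 2      # gives F(9, 35)
    for model, stat in [("1", 1.262), ("2", 0.573), ("C", 2.061), ("D", 2.576)]:
        print(model, round(f_upper_tail(stat, d1, d2), 3))
```

In practice a statistics library routine for the F survival function would replace the hand-rolled integration; the standard-library version is used here only to keep the example self-contained.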
findings (Olson 1999; Jenkins, Cappellari, Lynn, Jackle, and Sala 2006; Al Baghal et al. 2014; Mostafa 2015).
2.1.2 Respondent attitudes. Attitudes can have a powerful impact on thought and behavior, and there is a long history of survey researchers attempting to measure respondents' attitudes and their impact on various survey outcomes (e.g., Goyder 1986). Particular attention has been given to respondents' attitudes about privacy and confidentiality. Research conducted by the U.S. Census Bureau going back to the 1990s demonstrates that concerns about personal privacy and data confidentiality have increased in the general public, and that these attitudes are associated with lower participation rates in the decennial census (Singer 1993, 2003) and more negative attitudes toward the use of administrative records (Singer, Bates, and Van Hoewyk 2011). Privacy and confidentiality concerns can influence record linkage consent as well. Both direct measures of privacy concerns (respondent self-reports) and indirect indicators (item refusals on financial questions) have been shown to be negatively associated with consent (Sakshaug et al. 2012; Sala, Knies, and Burton 2014; Mostafa 2015). Similarly, Sakshaug et al. (2012) demonstrated that the more confidentiality-related concerns respondents expressed to interviewers in a previous survey wave, the less likely they were to subsequently consent to data linkage. And Sala et al. (2014) found that concern about data confidentiality was the most frequent reason given by respondents who declined a linkage request. There also is evidence that trust (in other people, in government) and civic engagement (volunteering, political involvement) are positively related to consent (Sala et al. 2012; Al Baghal et al. 2014).
2.1.3 Saliency. Respondents' interest in topics related to the record request, or their experiences with organizations that house those records, can also affect consent decisions. For example, a number of studies have found that respondents have a higher propensity to accept medical consent requests when they are in poorer health or have symptoms germane to the survey subject (Woolf et al. 2000; Dunn et al. 2004; Dahlhamer and Cox 2007; Beebe et al. 2011). One explanation for this finding is that consent requests on topics salient to respondents enhance the perceived benefits of record linkage (e.g., more comprehensive medical evaluation, or the general advancement of knowledge about a disease relevant to the respondent) or reduce the perceived risks (e.g., by inducing more extensive cognitive processing of the request) (Groves, Singer, and Corning 2000). In addition to topic saliency, respondents' existing relationships with government agencies also can play a role in their consent decisions. Studies by Sala et al. (2012), Sakshaug et al. (2012), and Mostafa (2015), for example, found that individuals who received government benefits (e.g., welfare, food stamps, veterans' benefits) were more likely to consent to economic data linkage than those who did not. These results again suggest that
the salience of (and attitudes toward) service-providing government agencies may make some respondents more amenable to linkage requests involving those agencies.
2.1.4 Socio-environmental features. The respondents' environments help to shape the context in which consent decisions are made, and a handful of studies have examined associations between area characteristics and attitudes toward use of administrative records and consent decisions. For example, Singer et al. (2011) found that individuals living in the South and Mid-Atlantic regions of the country had more favorable attitudes about administrative record use by the U.S. Census Bureau than those living in other regions of the country. Studies of actual consent and linkage rates have demonstrated regional variations as well, with higher rates in the South and Midwest and lower rates in parts of the Northeast (e.g., Olson 1999; Dahlhamer and Cox 2007). Consent rates also can vary by urban status. Consistent with urbanicity effects seen in the literature on survey participation and pro-social behavior (Groves and Couper 1998; Mattis, Hammond, Grayman, Bonacci, Brennan, et al. 2009), respondents living in urban areas have been found to be less likely to consent than those living in non-urban areas (cf. Jenkins et al. 2006; Dahlhamer and Cox 2007; Al Baghal et al. 2014, who show a marginally significant positive effect for urbanicity). Together, such area effects may indicate the influence of underlying ecological factors within those communities (e.g., differences in population density, crime, social engagement), but may also reflect differences in survey operations (e.g., in staff, protocol, and training) clustered within those geographic areas. A recent study by Mostafa (2015) found that area characteristics by themselves added little explanatory power to models of consent propensity, suggesting that respondent and interview characteristics may be more important factors.
2.1.5 Interviewer characteristics. Interviewer attributes and behaviors can have significant impact on survey participation and data quality (O'Muircheartaigh and Campanelli 1999; West, Kreuter, and Jaenichen 2013), including linkage consent decisions. Studies investigating the impact of interviewer demographics generally find that they are unrelated to the consent outcome (Sakshaug et al. 2012; Sala et al. 2012), although there is some evidence of a positive effect of interviewer age on consent (Krobmacher and Schroeder 2013; Al Baghal et al. 2014). Interviewer experience has shown mixed effects. The amount of time spent working as an interviewer overall (i.e., job tenure) is either unrelated to consent (Sakshaug et al. 2012; Sala et al. 2012) or can actually have a small negative impact (Sakshaug et al. 2013; Al Baghal et al. 2014). Interviewers' survey-specific experience, as measured by the number of interviews already completed prior to the current consent request, shows similar effects (Sakshaug et al. 2012; Sala et al. 2012). One aspect of interviewer
experience that is positively related to consent in these studies is past performance in gaining respondent consent. Sala et al. (2012) and Sakshaug et al. (2012) found that the likelihood of consent increased with the number of consents obtained earlier in the field period. These authors also attempted to identify interviewer personality traits and attitudes that could affect respondent consent decisions, but largely failed to find significant effects. The one exception was that respondent consent was positively related to interviewers' own willingness to consent to linkage (Sakshaug et al. 2012).
2.1.6 Survey design features. The way in which the consent requests are administered can impact linkage consent. Consent rates appear to be higher in face-to-face surveys than in phone surveys, though there are relatively few mode studies that have examined this phenomenon (Fulton 2012). Consent questions that ask respondents to provide personal identifiers (e.g., full or partial SSNs) as matching variables produce lower consent rates than those that do not (Bates 2005). This finding, and advancements in statistical matching techniques, prompted the Census Bureau to change its approach to gaining linkage consent in 2006, and it has since adopted a passive, opt-out consent procedure in which respondents are informed of the intent to link, and consent is assumed unless respondents explicitly object (McNabb, Timmons, Song, and Puckett 2009). These implicit consent procedures (as they are sometimes called) result in higher consent rates than opt-in approaches, where respondents must affirmatively state their consent (Bates 2005; Pascale 2011). See also Das and Couper (2014).
Since most surveys employ opt-in formats, researchers have focused on the potential effects of the wording or framing of these questions. Consent framing experiments vary factors mentioned in the request that are thought to be persuasive to respondents: for example, highlighting the quality benefits associated with linkage, the reduction in survey collection costs, or the time savings for respondents. Evidence of framing effects in these studies is surprisingly weak, however. Bates, Wroblewski, and Pascale (2012) found that respondents reported more positive attitudes toward record linkage under cost- and time-savings frames, but the study did not measure actual linkage consent propensities. In Sakshaug and Kreuter (2014), a time-savings frame produced higher consent rates for web survey respondents than a neutrally worded consent question, but this is the only study in the literature to find significant question-framing effects (Pascale 2011; Sakshaug et al. 2013).
The timing of consent requests appears to have some influence on likelihood of consent. Although it is common practice to delay asking the most sensitive items, like linkage-consent requests, until near the end of the questionnaire, recent empirical evidence indicates that this may not be optimal. Sakshaug et al. (2013) found that respondents were more amenable to consent requests administered at the beginning of the survey than at the end, and suggest that the
124 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
proximity of the survey and linkage requests may reinforce respondentsrsquo desireor inclination to be consistent (ie agree to both) This result could inform po-tentially promising adaptive design interventions (eg asking for consent earlyin the interview then skipping subsequent burdensome questions for consent-ers) Sala et al (2014) obtained higher consent rates when the request wasasked immediately following a series of questions on a related topic rather thanwaiting until the end of the survey The authors reasoned that contextual place-ment of the linkage question increases the salience of the request and inducesmore careful consideration by respondents Both explanations find some sup-port in the broader psychological literature on compliance cognitive disso-nance and context effects but further research is needed to evaluate these andother mechanisms (eg survey fatigue) underlying consent placement effects(Sala et al 2014)
2.2 Analytic Approaches to Assessing Consent Bias
Early linkage consent studies simply looked for evidence of sample bias (i.e., differences in sample composition for consenters and non-consenters). Most recent studies use logistic regression models to identify factors that influence consent and infer potential consent bias (e.g., differential consent to medical-records linkage by respondent health status). Several studies have employed multi-level models to assess the impact of interviewers on consent propensity (e.g., Fulton 2012; Sala et al. 2012; Mostafa 2015), and others have jointly modeled respondents' consent propensities on multiple consent items in a given survey (e.g., Mostafa 2015). Studies that have examined direct estimates of consent bias using administrative records available for both consenters and non-consenters are much less common in the literature. Recent studies by Sakshaug and his colleagues are notable exceptions (Sakshaug and Kreuter 2012; Sakshaug and Huber 2016). Using administrative data linked to German panel survey data, they have compared the magnitude of consent biases to bias estimates for other error sources (nonresponse, measurement) and the longitudinal changes in these biases.
Given evidence of potential consent bias (e.g., differential consent by specific demographic groups), one promising approach that has not yet been explored in this literature is to adopt propensity weighting methods that are widely used in nonresponse adjustment. Traditionally, propensity-weighted nonresponse adjustments are accomplished by modeling response propensities using logistic regression and auxiliary data available for both respondents and nonrespondents, and then using the inverse of the modeled propensity as a weight adjustment factor (Little 1986). If the predicted propensity is unbiased, this adjustment method may reduce the potential bias due to nonresponse. Of course, bias reduction is predicated on correct model specification, and this may be particularly challenging in consent propensity applications, given the absence of well-developed consent theories and inconsistency in the effects of many of its predictors.
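As an illustration, the inverse-propensity adjustment just described can be written in a generic ratio form. The notation is assumed for this sketch rather than taken from Little (1986): $R$ is the set of responding units, $w_i$ a base weight, $y_i$ the survey variable, and $\hat{p}_i$ the modeled response propensity for unit $i$.

$$\hat{\bar{Y}}_{\mathrm{adj}} = \frac{\sum_{i \in R} (w_i / \hat{p}_i)\, y_i}{\sum_{i \in R} (w_i / \hat{p}_i)}$$

Units with a low modeled propensity receive proportionally larger adjusted weights, so they stand in for similar units that did not respond. The adjustment can only remove bias to the extent that the propensity model is well specified.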
Application of these ideas in the analysis of "consent-to-link" patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition
$$p_L(x; \beta_L) = p_C(x; \beta_C)\, p_{L|C}(x; \beta_{L|C}) \qquad (2.1)$$

where $p_L(x; \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x; \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x; \beta_{L|C})$ is the probability of successful linkage conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that, to some degree, the first factor $p_C(x; \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x; \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x; \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
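A small worked example may help fix ideas about decomposition (2.1). The two input probabilities below are hypothetical values chosen only for illustration; they are not estimates from the CEQ data.

$$p_C(x; \beta_C) = 0.80, \quad p_{L|C}(x; \beta_{L|C}) = 0.90 \;\Rightarrow\; p_L(x; \beta_L) = 0.80 \times 0.90 = 0.72$$

Under these assumed values, a unit with an 80 percent chance of consenting would have only a 72 percent chance of being successfully linked, so consent rates alone can overstate the share of units for which linked data would actually be available.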
2.3 Options for Exploratory Analyses of Consent-to-Link
Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1) and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and "data cleaning" for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options:
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with "real world" production conditions (per the preceding literature results on interviewer characteristics), nor with full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L|C}(x; \beta_{L|C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the US Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure survey, the US Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers' expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year's worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.1
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5 percent (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80 percent of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.

1 In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: "We'd like to produce additional statistical data, without taking up your time with more questions, by combining your survey answers with data from other government agencies. Do you have any objections?" Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a "don't know" response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members' propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
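To make the modeling step concrete, the following is a minimal pure-Python sketch of a weighted logistic regression for a 0/1 consent indicator, fitted by Newton-Raphson. It is not the procedure used for the analyses in this paper: the function names and the toy data are hypothetical, the weights enter only through a weighted (pseudo-) likelihood, and no design-based standard errors are produced.

```python
import math

def solve_linear(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weighted_logit(rows, y, w, n_iter=50, tol=1e-10):
    """Fit P(consent = 1 | x) by weighted Newton-Raphson.

    rows: list of predictor lists (no intercept; one is added here).
    y: list of 0/1 indicators (1 = consented, 0 = objected).
    w: list of weights. Returns coefficients, intercept first.
    """
    xs = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(xs, y, w):
            p = inv_logit(sum(b * v for b, v in zip(beta, xi)))
            for a in range(k):
                score[a] += wi * (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def predict_propensity(rows, beta):
    """Estimated consent propensity for each predictor row."""
    return [inv_logit(beta[0] + sum(b * float(v) for b, v in zip(beta[1:], r)))
            for r in rows]

# Hypothetical toy data: one binary predictor, eight units, unit weights.
rows = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [1, 1, 1, 0, 1, 0, 0, 0]
beta = fit_weighted_logit(rows, y, [1.0] * 8)
print(predict_propensity(rows, beta))
```

With a single binary predictor the model is saturated, so the fitted propensities reproduce the weighted consent proportions in each group, which gives a quick check that the fitting loop behaves as intended.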
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS SURVEYLOGISTIC procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
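For readers less familiar with replication variance estimation, the sketch below shows the usual Fay-adjusted BRR variance formula, in which squared deviations of the replicate estimates from the full-sample estimate are summed and divided by R(1 - K)^2. The function name and the numbers in the usage lines are hypothetical; the replicate weights themselves are assumed to be supplied with the survey data.

```python
import math

def fay_brr_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance of an estimate.

    full_estimate: estimate computed with the full-sample weights.
    replicate_estimates: the same estimate recomputed with each set of
        replicate weights (44 sets in the design described above).
    fay_k: Fay factor K; K = 0.5 gives a multiplier of 1 / (R * 0.25).
    """
    r = len(replicate_estimates)
    ss = sum((t - full_estimate) ** 2 for t in replicate_estimates)
    return ss / (r * (1.0 - fay_k) ** 2)

# Hypothetical illustration: 44 replicate estimates, each 1 unit above
# the full-sample estimate of 10.
variance = fay_brr_variance(10.0, [11.0] * 44)
print(variance, math.sqrt(variance))
```

With R = 44 and K = 0.5, the multiplier on the sum of squared deviations is 1/11.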
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey, and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as "truth" and compare estimates from the full sample to those derived from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters-only to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample, and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180, versus 17.4% of consenters), but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older, versus 19.9% for consenters), and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1, except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                                 All respondents    Consenters         Non-consenters
                                          (n = 4893)         (n = 3951)         (n = 942)
Privacy/anti-government concerns          6.9 (0.8)          4.2 (0.6)          18.1 (2.2)
Reluctance
  Converted refusal                       10.8 (0.7)         9.6 (0.8)          16.0 (1.5)
  Income imputation required              44.7 (1.0)         41.8 (1.1)         57.0 (2.1)
  Too busy                                8.1 (0.5)          6.9 (0.6)          12.7 (1.3)
  Respondent effort
    A lot of effort                       34.9 (0.9)         36.7 (1.0)         27.3 (1.9)
    Moderate effort                       37.2 (1.1)         37.4 (1.3)         36.4 (1.8)
    Bare minimum effort                   27.9 (1.1)         25.9 (1.2)         36.3 (2.3)
  Respondent cooperation
    Very cooperative                      50.8 (1.0)         53.5 (1.0)         39.4 (2.6)
    Somewhat cooperative                  33.1 (0.8)         33.1 (0.9)         33.1 (1.3)
    Neither cooperative nor uncooperative 5.4 (0.5)          4.5 (0.4)          9.2 (1.1)
    Somewhat uncooperative                4.5 (0.3)          3.3 (0.3)          9.5 (1.1)
    Very uncooperative                    6.3 (0.4)          5.7 (0.5)          8.8 (0.9)
Rapport
  Face-to-face                            65.0 (1.4)         66.2 (1.5)         59.9 (1.9)
  Phone                                   35.0 (1.4)         33.8 (1.5)         40.1 (1.9)
Burden
  Total interview time (minutes)          64.9 (1.0)         65.3 (1.1)         63.3 (1.4)
  Household size                          2.42 (0.03)        2.42 (0.03)        2.42 (0.07)
  Total expenditures                      $11,454 ($148)     $11,511 ($145)     $11,218 ($384)
  Family income
    LT $8,180                             19.8 (0.8)         17.4 (0.9)         30.8 (1.9)
    $8,180–$24,000                        20.2 (0.6)         20.8 (0.9)         16.8 (1.4)
    $24,001–$46,000                       20.8 (0.6)         20.5 (0.7)         18.4 (1.3)
    $46,001–$85,855                       19.8 (0.6)         20.7 (0.7)         16.7 (1.3)
    GT $85,585                            19.4 (0.6)         20.6 (0.9)         17.3 (1.8)
  Owner                                   65.4 (0.8)         64.0 (0.9)         71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).

4.2 Analysis of Economic Variables Subject to "Informed Consent" Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome survey questions. For each of these variables, we computed five prospective estimators of the population mean Y:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units, and using weights equal to $w_{pi} = w_i / p_i$ (Consenting Units, Propensity-Adjusted).2
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / p_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{\mathrm{decile}}$ is the unweighted propensity of "consent" units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / p_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{\mathrm{quintile}}$ is the unweighted propensity of "consent" units in the specified quintile group).

2 Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $p_i$ for all sample units $i$. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).

Table 2. Characteristics of Sample Respondents

Characteristic                 All respondents    Consenters       Non-consenters
                               (n = 4893)         (n = 3951)       (n = 942)
Age group
  18–32                        20.0 (0.9)         21.5 (1.1)       13.8 (1.0)
  32–65                        59.2 (0.9)         58.6 (1.2)       61.6 (1.5)
  65+                          20.8 (0.7)         19.9 (0.8)       24.6 (1.5)
Gender
  Male                         46.3 (0.7)         46.8 (0.8)       44.3 (1.7)
  Female                       53.7 (0.7)         53.2 (0.8)       55.7 (1.7)
Race
  White                        83.1 (0.9)         83.0 (1.0)       83.1 (1.1)
  Non-white                    16.9 (0.9)         17.0 (1.0)       16.9 (1.1)
Education group
  Less than HS                 13.5 (0.6)         13.4 (0.7)       14.0 (1.6)
  HS graduate                  24.7 (0.8)         24.5 (0.9)       25.3 (1.8)
  Some college                 21.4 (0.7)         21.2 (0.9)       22.2 (1.9)
  Associate's degree           10.2 (0.4)         10.0 (0.5)       10.9 (1.1)
  Bachelor's degree            19.1 (0.5)         19.4 (0.6)       17.9 (1.4)
  Advanced degree              11.1 (0.5)         11.5 (0.6)       9.7 (0.9)
Spanish language interview
  Yes                          4.1 (0.5)          3.7 (0.5)        5.4 (1.5)
  No                           95.9 (0.5)         96.3 (0.5)       94.6 (1.5)
Region
  Northeast                    18.5 (0.6)         17.6 (0.7)       22.2 (1.7)
  Midwest                      23.5 (1.3)         24.5 (1.4)       19.2 (2.3)
  South                        35.0 (1.3)         35.9 (1.5)       31.5 (2.7)
  West                         23.0 (1.5)         22.0 (1.7)       27.1 (1.8)
Urbanicity
  Urban                        80.5 (0.9)         80.5 (1.2)       80.7 (2.0)
  Rural                        19.5 (0.9)         19.5 (1.2)       19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).

Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Variable                               Model 1                Model 2
                                       Estimate    SE         Estimate    SE
Demographic characteristics
  Age group (32–65)
    18–32                              0.3435      0.0633     0.3389      0.0634
    65+                                0.2692      0.0645     0.2532      0.0656
  Gender (Male)
    Female                             0.0161      0.0426     0.0229      0.0421
  Race (White)
    Non-white                          0.0949      0.1264     0.0612      0.1260
  Spanish interview (No)
    Yes                                0.4234      0.2923     0.4452      0.2790
  Education group (HS grad)
    Less than HS                       0.3097      0.1879     0.2926      0.1835
    Some college                       0.4016      0.1423     0.3739      0.1389
    Associate's degree                 0.3321      0.2041     0.3150      0.1993
    Bachelor's degree                  0.0654      0.2112     0.0663      0.2082
    Advanced degree                    0.1747      0.2762     0.1512      0.2773
  Home owner (Renter)
    Owner                              0.2767†     0.1453     0.3233      0.1436
  Total expenditures (Log)             0.0605      0.0799     0.0083      0.0800
  Income group
    Less than $8,181                   0.2155      0.0604
  Income imputed (No)
    Yes                                0.1487      0.0513
  Race × Gender                        0.1874†     0.0956     0.1953      0.0923
  Owner × Education
    Less than HS                       0.4931      0.2066     0.4632      0.1996
    Some college                       0.6575      0.1567     0.6370      0.1613
    Associate's degree                 0.3407†     0.2013     0.3402†     0.1986
    Bachelor's degree                  0.1385      0.2402     0.1398      0.2368
    Advanced degree                    0.4713      0.2847     0.4226      0.2870
Environmental features
  Region (Northeast)
    Midwest                            0.2097†     0.1234     0.2111†     0.1163
    South                              0.2451      0.1190     0.2408      0.1187
    West                               0.2670      0.1055     0.2557      0.1066
  Urbanicity (Rural)
    Urban                              0.0268      0.0829     0.0234      0.0824
R attitude proxies
  Converted refusal (No)
    Yes                                0.0721      0.0756     0.0838      0.0763
  Effort (Moderate)
    A lot of effort                    0.5454      0.2081     0.6142      0.2065
    Bare minimum effort                0.4879      0.1365     0.5721      0.1371
  Doorstep concerns (None)
    Too busy                           0.2207      0.1485     0.2137      0.1466
    Privacy/gov't concerns             1.1895      0.1667     1.2241      0.1622
    Other                              1.0547      0.3261     1.0255      0.3255
  Doorstep concerns × effort
    Privacy × a lot of effort          0.5884      0.2789     0.5312†     0.2728
    Privacy × minimum effort           0.3183      0.1918     0.2601      0.1924
    Busy × a lot of effort             0.0888      0.2736     0.0890      0.2749
    Busy × minimum effort              0.2238      0.2096     0.2277      0.2095
    Other × a lot of effort            1.0794†     0.6224     1.0477      0.6182
    Other × minimum effort             0.9002      0.3610     0.8777      0.3593

NOTE: †p < 0.10, p < 0.05, p < 0.01, p < 0.001.
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.
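The five estimators defined above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the paper: the function names are hypothetical, propensity classes are formed here by ranking all sample units on their estimated propensity and cutting them into equal-size groups, and the class-level propensity is taken to be the unweighted mean estimated propensity of the consenting units in each class.

```python
def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)

def propensity_groups(p, n_groups):
    """Assign each unit to one of n_groups equal-size classes by rank of p."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    groups = [0] * len(p)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(p)
    return groups

def consent_estimators(y, w, p, consent):
    """Return the FS, CUNA, CUPA, CUPDA and CUPQA means.

    y: outcome values; w: design weights; p: estimated consent
    propensities; consent: 1 for consenting units, 0 otherwise.
    """
    cons = [i for i in range(len(y)) if consent[i] == 1]
    yc = [y[i] for i in cons]
    out = {
        "FS": weighted_mean(y, w),
        "CUNA": weighted_mean(yc, [w[i] for i in cons]),
        "CUPA": weighted_mean(yc, [w[i] / p[i] for i in cons]),
    }
    for name, n_groups in (("CUPDA", 10), ("CUPQA", 5)):
        g = propensity_groups(p, n_groups)
        adjusted = []
        for i in cons:
            # Unweighted mean propensity of consenters in the same class
            # (an assumed reading of the class-level propensity).
            members = [p[j] for j in cons if g[j] == g[i]]
            adjusted.append(w[i] / (sum(members) / len(members)))
        out[name] = weighted_mean(yc, adjusted)
    return out

# Hypothetical toy data: low-valued units consent half as often.
y = [10, 10, 10, 10, 20, 20]
w = [1.0] * 6
p = [0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
consent = [1, 1, 0, 0, 1, 1]
print(consent_estimators(y, w, p, consent))
```

In the toy data, the unadjusted consenter mean overstates the full-sample mean, while the propensity-adjusted mean recovers it because the assumed propensities are exactly right; with estimated propensities, the correction would generally be only partial.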
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20 percent of respondents who objected to linkage and were omitted from these consenter-based estimates.
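As a rough check on the size of these differences, one can divide each CUNA − FS difference by its reported standard error. The short sketch below does this for three rows, using values as read from table 4; the function name is hypothetical, and the ratio is offered only as an informal gauge, not as the test used for the significance markers in the table.

```python
# (FS mean, CUNA mean, SE of the CUNA - FS difference), as read from table 4.
TABLE4_ROWS = {
    "Family income before taxes": (50939.00, 52869.00, 580.98),
    "Property taxes": (454.15, 429.12, 6.08),
    "Rental value": (1296.52, 1269.65, 7.43),
}

def difference_and_ratio(fs_mean, cuna_mean, se_diff):
    """Return the CUNA - FS difference and its ratio to the reported SE."""
    diff = cuna_mean - fs_mean
    return diff, diff / se_diff

for name, row in TABLE4_ROWS.items():
    diff, ratio = difference_and_ratio(*row)
    print(f"{name}: difference = {diff:,.2f}, ratio to SE = {ratio:.2f}")
```

For all three rows the difference exceeds three standard errors in absolute value, which is in line with the description of these differences as fairly substantial.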
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., move them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA − FS and CUPQA − FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA − FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
To complement the results reported in tables 3 and 4 it is useful to explore tworelated issues centered on broader distributional patterns First the logistic regres-sion results in table 3 describe the complex pattern of main effects and interactionsestimated for the model for the propensity of a consumer unit to consent to linkageTo provide a summary comparison of these results figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for respectively theldquoobjectionrdquo subpopulation (horizontal axis) and the ldquoconsentrdquo subpopulation (ver-tical axis) The 99 plotted points are for the 001 to 099 quantiles (with 001 incre-ment) The diagonal reference line has a slope equal to 1 and an intercept equal to0 If the plotted points all fell on that line then the ldquoobjectrdquo and ldquoconsentrdquo subpo-pulations would have essentially the same distributions of propensity-to-consentscores and one would conclude that the propensity scores estimated from thespecified model have provided essentially no practical discriminating powerConversely if all of the plotted points had a horizontal axis value close to 1 and avertical axis value close to 0 then one would conclude that the propensity scoreswere providing very strong discriminating power For the propensity scores pro-duced from model 2 note especially that for quantiles below the median the
136 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Table 4. Comparison of Three Estimation Methods

| (1) Variable | (2) Full-sample (FS) mean (SE) | (3) Consenting units, no adjustment (CUNA) mean (SE) | (4) Consenting units, propensity-adjusted (CUPA) mean (SE) | (5) Consenting units, propensity-decile adjusted (CUPDA) mean (SE) | (6) Consenting units, propensity-quintile adjusted (CUPQA) mean (SE) | (7) Point estimate difference CUNA–FS (SE diff) | (8) Point estimate difference CUPA–FS (SE diff) | (9) Point estimate difference CUPDA–FS (SE diff) | (10) Point estimate difference CUPQA–FS (SE diff) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Family income before taxes | $50,939.00 ($1,227.51) | $52,869.00 ($1,535.04) | $52,117.00 ($1,523.85) | $52,269.00 ($1,520.74) | $52,405.00 ($1,514.30) | $1,930.00 ($580.98) | $1,178.00† ($619.09) | $1,330.00 ($602.64) | $1,466.00 ($606.76) |
| Family income before taxes with imputed values | $61,207.00 ($1,193.34) | $61,405.00 ($1,510.55) | $61,228.00 ($1,454.39) | $61,337.00 ($1,463.34) | $61,382.00 ($1,466.25) | $198.00 ($573.46) | $21.00 ($563.54) | $130.00 ($557.10) | $175.00 ($552.20) |
| Vehicle purchase cost | $599.59 ($33.22) | $619.14 ($37.05) | $607.80 ($36.63) | $607.63 ($36.67) | $611.75 ($37.61) | $19.55† ($10.83) | $8.21 ($15.34) | $8.04 ($15.23) | $12.17 ($16.13) |
| Property taxes | $454.15 ($10.41) | $429.12 ($10.76) | $434.74 ($11.39) | $434.88 ($11.37) | $436.75 ($11.30) | -$25.02 ($6.08) | -$19.41 ($6.48) | -$19.27 ($6.05) | -$17.40 ($6.23) |
| Property value | $232,226.00 ($5,055.68) | $227,734.00 ($5,730.38) | $227,938.00 ($5,592.27) | $227,935.00 ($5,532.64) | $228,640.00 ($5,519.90) | -$4,492.00 ($2,118.39) | -$4,288.00† ($2,165.93) | -$4,291.00 ($2,132.40) | -$3,586.00 ($2,063.66) |
| Rental value | $1,296.52 ($15.58) | $1,269.65 ($17.05) | $1,272.40 ($17.81) | $1,272.91 ($17.86) | $1,274.67 ($17.68) | -$26.87 ($7.43) | -$24.12 ($8.18) | -$23.61 ($8.01) | -$21.85 ($8.02) |

NOTE: †p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
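The points of a quantile-quantile comparison of this kind could be computed along the following lines. This is a minimal sketch, not the authors' code: the function names, the toy propensities and weights, and the choice of an inverse-empirical-CDF definition of the weighted quantile are all assumptions made for illustration.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q (0 < q < 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        # Small tolerance guards against floating-point round-off at exact ties.
        if cum / total >= q - 1e-12:
            return v
    return pairs[-1][0]


def qq_points(p_object, w_object, p_consent, w_consent):
    """Return 99 (x, y) pairs for the 0.01, 0.02, ..., 0.99 quantiles.

    x: quantile of the estimated propensity in the "object" group,
    y: the same quantile in the "consent" group.
    """
    qs = [k / 100 for k in range(1, 100)]
    return [
        (weighted_quantile(p_object, w_object, q),
         weighted_quantile(p_consent, w_consent, q))
        for q in qs
    ]


# Hypothetical toy data: estimated propensities with equal survey weights.
p_obj = [0.70, 0.78, 0.80, 0.85, 0.90]
p_con = [0.72, 0.80, 0.84, 0.88, 0.93]
points = qq_points(p_obj, [1.0] * 5, p_con, [1.0] * 5)
```

Points lying close to the line with slope 1 and intercept 0 would indicate similar propensity distributions in the two groups at that quantile.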
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

| Percentile (%) | Estimated propensity of consent: Consent subpopulation | Estimated propensity of consent: Object subpopulation |
| --- | --- | --- |
| 70 | 0.84 | 0.88 |
| 80 | 0.86 | 0.89 |
| 90 | 0.88 | 0.92 |
| 99 | 0.93 | 0.96 |
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore trade-offs in approximate measures of cost and burden that may arise through integrated use of (i) direct collection of low-burden items from all sample units; (ii) direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and (iii) use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Med Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

| Variable | Variable description |
| --- | --- |
| Respondent sociodemographics | |
| Age | 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65 |
| Gender | 1 if male (reference group); 2 if female |
| Race | 0 if white (reference group); 1 if non-white |
| Education | 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s |
| Language | 1 if interview language was Spanish; 0 if English (reference group) |
| Household size | 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH |
| Household type | 0 if renter (reference group); 1 if owner |
| Family income | 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180 |
| Total spending (log) | Log of total expenditures reported for all major expenditure categories |
| Survey features | |
| Mode | 1 if telephone; 0 if personal visit (reference group) |
| Area characteristics | |
| Region | 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West |
| Urban status | 1 if urban; 2 if rural (reference group) |
| Respondent attitude and effort | |
| Converted refusal status | 1 if ever a converted refusal during previous interviews; 2 if not (reference group) |
| Interview duration | 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group) |
| Cooperation | 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative” |
| Effort | 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum” |
| Effort change | 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure |
| Use of information booklet | 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group) |
| Doorstep concerns | 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other” |
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

| Respondent “doorstep” concerns | Assigned category |
| --- | --- |
| Privacy concerns | Privacy/government concerns |
| Anti-government concerns | Privacy/government concerns |
| Too busy | Too busy/logistical concerns |
| Interview takes too much time | Too busy/logistical concerns |
| Breaks appointments (puts off FR indefinitely) | Too busy/logistical concerns |
| Not interested | Too busy/logistical concerns |
| Does not want to be bothered | Too busy/logistical concerns |
| Too many interviews | Too busy/logistical concerns |
| Scheduling difficulties | Too busy/logistical concerns |
| Family issues | Too busy/logistical concerns |
| Last interview took too long | Too busy/logistical concerns |
| Survey is voluntary | Other reluctance |
| Does not understand/asks questions about survey | Other reluctance |
| Survey content does not apply | Other reluctance |
| Hang-up/slams door on FR | Other reluctance |
| Hostile or threatens FR | Other reluctance |
| Other HH members tell R not to participate | Other reluctance |
| Talk only to specific household member | Other reluctance |
| Respondent requests same FR as last time | Other reluctance |
| Gave that information last time | Other reluctance |
| Asked too many personal questions last time | Other reluctance |
| Intends to quit survey | Other reluctance |
| Other: specify | Other reluctance |
| No concerns | No concerns |

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV_{wt}^2$, where $CV_{wt}^2$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV_{wt}^2$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
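The weight-heterogeneity summaries described above could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names and toy inputs are hypothetical, the coefficient of variation uses the population (divide-by-n) variance, and the percentiles are plain unweighted percentiles of the weights. The paper does not state which conventions were used.

```python
import statistics


def propensity_adjusted(weights, propensities):
    """Propensity-adjusted weights w_pi = w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]


def weight_summary(weights):
    """Return 1 + CV^2 and percentile-ratio summaries for a list of weights."""
    mean = statistics.fmean(weights)
    var = statistics.pvariance(weights, mu=mean)  # assumed: population variance
    # cuts[k - 1] is the k-th percentile (k = 1, ..., 99).
    cuts = statistics.quantiles(weights, n=100, method="inclusive")

    def pct(k):
        return cuts[k - 1]

    return {
        "1+CV2": 1.0 + var / mean ** 2,
        "IQR/median": (pct(75) - pct(25)) / pct(50),
        "q75/q25": pct(75) / pct(25),
        "q90/q10": pct(90) / pct(10),
        "q95/q05": pct(95) / pct(5),
    }


# Hypothetical design weights and estimated consent propensities.
w = [100.0, 120.0, 150.0, 200.0, 90.0]
p = [0.80, 0.85, 0.75, 0.90, 0.70]
unadjusted = weight_summary(w)
adjusted = weight_summary(propensity_adjusted(w, p))
```

Comparing `adjusted` with `unadjusted` entry by entry mirrors the column-to-column comparison made in table 6.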
Table 6. Variability of Weights

| Statistic | Weights from full sample | Unadjusted weights for sample units with consent | Propensity-adjusted weights for sample units with consent | Propensity-decile adjusted weights for sample units with consent | Propensity-quintile adjusted weights for sample units with consent |
| --- | --- | --- | --- | --- | --- |
| $n$ | 4893 | 3951 | 3915 | 3915 | 3915 |
| $1 + CV_{wt}^2$ | 1.13 | 1.13 | 1.16 | 1.16 | 1.15 |
| $IQR / q_{0.5}$ | 0.875 | 0.877 | 0.858 | 0.853 | 0.851 |
| $q_{0.75} / q_{0.25}$ | 1.357 | 1.359 | 1.448 | 1.463 | 1.468 |
| $q_{0.90} / q_{0.10}$ | 1.970 | 1.966 | 2.148 | 2.178 | 2.124 |
| $q_{0.95} / q_{0.05}$ | 2.699 | 2.665 | 3.028 | 2.961 | 2.898 |
Appendix C. Variance Estimators and Goodness-of-Fit Tests

C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
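The variance estimator above can be sketched in a few lines. This is an illustrative sketch only: the function name, the argument names, and the toy replicate estimates (with R = 4 rather than 44) are assumptions, and producing the replicate estimates themselves (refitting the model with each set of replicate weights) is outside the scope of the example.

```python
def fay_brr_variance(theta_full, theta_reps, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    theta_full: full-sample estimate, a list of p numbers.
    theta_reps: list of R replicate estimates, each a list of p numbers.
    Returns a p x p matrix (list of lists) equal to
    1 / (R * (1 - K)^2) * sum_r (theta_r - theta)(theta_r - theta)'.
    """
    r = len(theta_reps)
    p = len(theta_full)
    scale = 1.0 / (r * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in theta_reps:
        dev = [rep[j] - theta_full[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * dev[a] * dev[b]
    return cov


# Hypothetical example: R = 4 replicates of a two-element coefficient vector.
beta_hat = [1.0, 2.0]
beta_reps = [[1.1, 2.0], [0.9, 2.0], [1.0, 2.2], [1.0, 1.8]]
v = fay_brr_variance(beta_hat, beta_reps, fay_k=0.5)
std_errors = [v[j][j] ** 0.5 for j in range(len(beta_hat))]
```

With K = 0.5 the factor 1 / (1 - K)^2 equals 4, which compensates for the damped perturbation of the replicate weights under Fay's method.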
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
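As a worked check of the reference distribution, substituting the stated values of $f$ and $g$ into the degrees of freedom gives the following. This is simple arithmetic on the quantities defined above, not an additional result.

```latex
\[
g - 1 = 10 - 1 = 9, \qquad f - g + 2 = 43 - 10 + 2 = 35,
\]
\[
Q_{F\text{-adj}} \sim F(9,\ 35) \quad \text{(approximately, under the null hypothesis of no lack of fit)}.
\]
```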
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

| Model | F-adjusted goodness-of-fit test p-values |
| --- | --- |
| 1 | 0.292 |
| 2 | 0.810 |
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Column headings: Variable; Model 1, Model 2, Model A, Model B, Model C, Model D, Model E (Estimate and SE reported for each model).
Dem
ogra
phic
char
acte
rist
ics
Age
grou
p(3
2ndash65
)18
ndash32
034
35
0
0633
033
89
0
0634
030
44
0
0717
033
19
0
0742
032
82
0
0728
034
25
0
0640
033
68
0
0639
65thorn
0
2692
006
45
025
32
0
0656
019
71
006
13
023
87
0
0608
023
74
0
0630
026
94
0
0644
025
38
0
0653
Gen
der
(Mal
e)F
emal
e
001
610
0426
002
290
0421
003
960
0359
003
880
0363
000
200
0427
001
520
0425
002
070
0419
Rac
e(W
hite
)N
on-w
hite
009
490
1264
006
120
1260
007
810
1059
001
920
1056
003
220
1155
009
380
1269
005
930
1264
Spa
nish
inte
rvie
w(N
o) Yes
0
4234
029
23
044
520
2790
040
440
2930
042
710
3120
048
010
3118
042
870
2973
045
760
2846
Edu
catio
ngr
oup
(HS
grad
)L
ess
than
HS
030
970
1879
029
260
1835
001
420
1349
001
860
1383
033
17dagger
018
380
3081
018
850
2894
018
44S
ome
colle
ge0
4016
0
1423
037
39
013
89
009
860
1087
008
150
1129
040
66
013
890
3996
0
1442
036
95
014
09A
ssoc
iate
rsquosde
gree
0
3321
020
41
031
500
1993
011
090
1100
012
490
1145
028
580
2160
033
260
2036
031
600
1990
Bac
helo
rrsquos
degr
ee
006
540
2112
006
630
2082
003
830
0719
004
250
0742
002
300
2037
006
180
2127
005
810
2095
Con
tinue
d
Exploratory Assessment of Consent-to-Link 151
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Tab
le8
Con
tinue
d
Var
iabl
eM
odel
1M
odel
2M
odel
AM
odel
BM
odel
CM
odel
DM
odel
E
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
E
Adv
ance
degr
ee
017
470
2762
015
120
2773
018
990
1215
019
410
1235
027
380
2482
017
200
2773
014
520
2796
Hom
eow
ner
(Ren
ter)
Ow
ner
0
2767
dagger0
1453
032
33
014
36
036
12
012
77
035
89
013
14
027
03dagger
013
97
027
54dagger
014
48
031
98
014
33T
otal
expe
nditu
res
(Log
)
006
050
0799
000
830
0800
010
250
0698
003
720
0754
004
190
0746
005
940
0802
000
650
0801
Inco
me
grou
pL
ess
than
$81
81
021
55
0
0604
0
4182
005
61
042
47
0
0577
021
42
006
09
Inco
me
impu
ted
(No) Y
es
014
87
005
13
014
81
005
10R
ace
gend
er
018
74dagger
009
56
019
53
009
23
022
68
009
30
018
73dagger
009
57
019
46
009
23O
wne
r
educ
atio
nL
ess
than
HS
049
31
020
66
046
32
019
96
047
50
020
45
049
29
020
67
046
37
020
02S
ome
colle
ge
065
75
0
1567
063
70
0
1613
0
6764
014
38
065
48
0
1578
063
09
0
1633
Ass
ocia
tersquos
degr
ee0
3407
dagger0
2013
034
02dagger
019
860
2616
020
900
3401
dagger0
2007
033
87dagger
019
78
Bac
helo
rrsquos
degr
ee0
1385
024
020
1398
023
680
1057
022
960
1358
024
090
1335
023
76
Adv
ance
degr
ee0
4713
028
470
4226
028
700
5867
0
2583
046
910
2854
041
830
2890
152 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Env
iron
men
tal
feat
ures
Reg
ion
(Nor
thea
st)
Mid
wes
t0
2097
dagger0
1234
021
11dagger
011
630
2602
0
1100
024
57
011
960
2352
dagger0
1208
020
98dagger
012
320
2111
dagger0
1159
Sou
th0
2451
0
1190
024
08
011
870
1841
011
410
2017
dagger0
1149
021
29dagger
011
520
2446
0
1189
023
99
011
87W
est
0
2670
0
1055
025
57
010
66
022
48
009
43
024
02
010
01
024
41
010
19
026
78
010
61
025
74
010
71U
rban
icity
(Rur
al)
Urb
an
002
680
0829
002
340
0824
002
530
0818
002
260
0833
002
490
0821
002
680
0829
002
330
0823
Rat
titud
epr
oxie
sC
onve
rted
refu
sal
(No) Y
es
007
210
0756
008
380
0763
0
0714
007
53
008
180
0760
Eff
ort(
Mod
erat
e)A
loto
fef
fort
054
54
020
810
6142
0
2065
054
18
021
010
6051
0
2082
Bar
e min
imum
effo
rt
0
4879
013
65
057
21
0
1371
0
4842
0
1387
056
26
0
1389
Doo
rste
pco
ncer
ns(N
one)
Too
busy
0
2207
014
85
021
370
1466
0
2192
014
91
021
060
1477
Pri
vacy
gov
rsquotco
ncer
ns
118
95
0
1667
122
41
0
1622
1
1891
016
66
122
31
0
1621
Oth
er1
0547
0
3261
102
55
032
551
0549
0
3259
102
67
032
52
Con
tinue
d
Exploratory Assessment of Consent-to-Link 153
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Tab
le8
Con
tinue
d
Var
iabl
eM
odel
1M
odel
2M
odel
AM
odel
BM
odel
CM
odel
DM
odel
E
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
E
Doo
rste
pco
ncer
ns
effo
rtP
riva
cy
alo
tofe
ffor
t
058
84
027
89
053
12dagger
027
28
058
85
027
89
053
19dagger
027
25
Pri
vacy
min
imum
effo
rt
031
830
1918
026
010
1924
031
720
1915
025
810
1925
Bus
y
alo
tof
effo
rt
008
880
2736
008
900
2749
0
0841
027
58
007
850
2765
Bus
y
min
imum
effo
rt
022
380
2096
022
770
2095
022
190
2114
022
340
2112
Oth
er
alo
tof
effo
rt1
0794
dagger0
6224
104
770
6182
107
46dagger
062
441
0370
061
94
Oth
er
min
imum
effo
rt
0
9002
0
3610
087
77
035
93
089
64
036
17
086
93
035
97
Rap
port
(Pho
ne)
Per
sona
lV
isit
001
700
0428
003
950
0412
NO
TEmdash
daggerplt
010
plt
005
plt
001
plt
000
1
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
the salience of (and attitudes toward) service-providing government agencies may make some respondents more amenable to linkage requests involving those agencies.
2.1.4 Socio-environmental features. The respondents' environments help to shape the context in which consent decisions are made, and a handful of studies have examined associations between area characteristics and attitudes toward use of administrative records and consent decisions. For example, Singer et al. (2011) found that individuals living in the South and Mid-Atlantic regions of the country had more favorable attitudes about administrative record use by the US Census Bureau than those living in other regions of the country. Studies of actual consent and linkage rates have demonstrated regional variations as well, with higher rates in the South and Midwest and lower rates in parts of the Northeast (e.g., Olson 1999; Dahlhamer and Cox 2007). Consent rates also can vary by urban status. Consistent with urbanicity effects seen in the literature on survey participation and pro-social behavior (Groves and Couper 1998; Mattis, Hammond, Grayman, Bonacci, Brennan, et al. 2009), respondents living in urban areas have been found to be less likely to consent than those living in non-urban areas (cf. Jenkins et al. 2006; Dahlhamer and Cox 2007; Al Baghal et al. 2014, who show a marginally significant positive effect for urbanicity). Together, such area effects may indicate the influence of underlying ecological factors within those communities (e.g., differences in population density, crime, social engagement), but may also reflect differences in survey operations (e.g., in staff protocol and training) clustered within those geographic areas. A recent study by Mostafa (2015) found that area characteristics by themselves added little explanatory power to models of consent propensity, suggesting that respondent and interview characteristics may be more important factors.
2.1.5 Interviewer characteristics. Interviewer attributes and behaviors can have significant impact on survey participation and data quality (O'Muircheartaigh and Campanelli 1999; West, Kreuter, and Jaenichen 2013), including linkage consent decisions. Studies investigating the impact of interviewer demographics generally find that they are unrelated to the consent outcome (Sakshaug et al. 2012; Sala et al. 2012), although there is some evidence of a positive effect of interviewer age on consent (Korbmacher and Schroeder 2013; Al Baghal et al. 2014). Interviewer experience has shown mixed effects. The amount of time spent working as an interviewer overall (i.e., job tenure) is either unrelated to consent (Sakshaug et al. 2012; Sala et al. 2012) or can actually have a small negative impact (Sakshaug et al. 2013; Al Baghal et al. 2014). Interviewers' survey-specific experience, as measured by the number of interviews already completed prior to the current consent request, shows similar effects (Sakshaug et al. 2012; Sala et al. 2012). One aspect of interviewer experience that is positively related to consent in these studies is past performance in gaining respondent consent. Sala et al. (2012) and Sakshaug et al. (2012) found that the likelihood of consent increased with the number of consents obtained earlier in the field period. These authors also attempted to identify interviewer personality traits and attitudes that could affect respondent consent decisions, but largely failed to find significant effects. The one exception was that respondent consent was positively related to interviewers' own willingness to consent to linkage (Sakshaug et al. 2012).
2.1.6 Survey design features. The way in which the consent requests are administered can impact linkage consent. Consent rates appear to be higher in face-to-face surveys than in phone surveys, though there are relatively few mode studies that have examined this phenomenon (Fulton 2012). Consent questions that ask respondents to provide personal identifiers (e.g., full or partial SSNs) as matching variables produce lower consent rates than those that do not (Bates 2005). This finding, and advancements in statistical matching techniques, prompted the Census Bureau to change its approach to gaining linkage consent in 2006, and it has since adopted a passive, opt-out consent procedure in which respondents are informed of the intent to link, and consent is assumed unless respondents explicitly object (McNabb, Timmons, Song, and Puckett 2009). These implicit consent procedures (as they are sometimes called) result in higher consent rates than opt-in approaches, where respondents must affirmatively state their consent (Bates 2005; Pascale 2011). See also Das and Couper (2014).
Since most surveys employ opt-in formats, researchers have focused on the potential effects of the wording or framing of these questions. Consent framing experiments vary factors mentioned in the request that are thought to be persuasive to respondents, for example, highlighting the quality benefits associated with linkage, the reduction in survey collection costs, or the time savings for respondents. Evidence of framing effects in these studies is surprisingly weak, however. Bates, Wroblewski, and Pascale (2012) found that respondents reported more positive attitudes toward record linkage under cost- and time-savings frames, but the study did not measure actual linkage consent propensities. In Sakshaug and Kreuter (2014), a time-savings frame produced higher consent rates for web survey respondents than a neutrally worded consent question, but this is the only study in the literature to find significant question-framing effects (Pascale 2011; Sakshaug et al. 2013).
The timing of consent requests appears to have some influence on likelihood of consent. Although it is common practice to delay asking the most sensitive items, like linkage-consent requests, until near the end of the questionnaire, recent empirical evidence indicates that this may not be optimal. Sakshaug et al. (2013) found that respondents were more amenable to consent requests administered at the beginning of the survey than at the end, and suggest that the proximity of the survey and linkage requests may reinforce respondents' desire or inclination to be consistent (i.e., agree to both). This result could inform potentially promising adaptive design interventions (e.g., asking for consent early in the interview, then skipping subsequent burdensome questions for consenters). Sala et al. (2014) obtained higher consent rates when the request was asked immediately following a series of questions on a related topic, rather than waiting until the end of the survey. The authors reasoned that contextual placement of the linkage question increases the salience of the request and induces more careful consideration by respondents. Both explanations find some support in the broader psychological literature on compliance, cognitive dissonance, and context effects, but further research is needed to evaluate these and other mechanisms (e.g., survey fatigue) underlying consent placement effects (Sala et al. 2014).
2.2 Analytic Approaches to Assessing Consent Bias
Early linkage consent studies simply looked for evidence of sample bias (i.e., differences in sample composition for consenters and non-consenters). Most recent studies use logistic regression models to identify factors that influence consent and infer potential consent bias (e.g., differential consent to medical-records linkage by respondent health status). Several studies have employed multi-level models to assess the impact of interviewers on consent propensity (e.g., Fulton 2012; Sala et al. 2012; Mostafa 2015), and others have jointly modeled respondents' consent propensities on multiple consent items in a given survey (e.g., Mostafa 2015). Studies that have examined direct estimates of consent bias, using administrative records available for both consenters and non-consenters, are much less common in the literature. Recent studies by Sakshaug and his colleagues are notable exceptions (Sakshaug and Kreuter 2012; Sakshaug and Huber 2016). Using administrative data linked to German panel survey data, they have compared the magnitude of consent biases to bias estimates for other error sources (nonresponse, measurement) and the longitudinal changes in these biases.
Given evidence of potential consent bias (e.g., differential consent by specific demographic groups), one promising approach that has not yet been explored in this literature is to adopt propensity weighting methods that are widely used in nonresponse adjustment. Traditionally, propensity-weighted nonresponse adjustments are accomplished by modeling response propensities using logistic regression and auxiliary data available for both respondents and nonrespondents, and then using the inverse of the modeled propensity as a weight adjustment factor (Little 1986). If the predicted propensity is unbiased, this adjustment method may reduce the potential bias due to nonresponse. Of course, bias reduction is predicated on correct model specification, and this may be particularly challenging in consent propensity applications, given the absence of well-developed consent theories and inconsistency in the effects of many of its predictors.
Application of these ideas in the analysis of "consent-to-link" patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition
$$p_L(x; \beta_L) = p_C(x; \beta_C)\, p_{L|C}(x; \beta_{L|C}) \qquad (2.1)$$

where $p_L(x; \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x; \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x; \beta_{L|C})$ is the probability of successful linkage, conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that to some degree the first factor $p_C(x; \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x; \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x; \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
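To make decomposition (2.1) concrete, the sketch below evaluates both factors and their product for a single unit. It assumes, purely for illustration, that both the consent model and the conditional linkage model are logistic in the same predictor vector; the function names and coefficient values are hypothetical and do not come from the paper.

```python
import math


def logistic(z):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def linkage_probabilities(x, beta_c, beta_l_given_c):
    """Return (p_C, p_L|C, p_L) for one unit, following decomposition (2.1).

    The logistic form of both factors is an assumption of this sketch.
    """
    if not (len(x) == len(beta_c) == len(beta_l_given_c)):
        raise ValueError("x and both coefficient vectors must have equal length")
    p_consent = logistic(sum(xi * b for xi, b in zip(x, beta_c)))
    p_link_given_consent = logistic(sum(xi * b for xi, b in zip(x, beta_l_given_c)))
    return p_consent, p_link_given_consent, p_consent * p_link_given_consent


# Hypothetical unit: an intercept term plus one binary predictor.
x_unit = [1.0, 1.0]
p_c, p_l_c, p_l = linkage_probabilities(x_unit, [1.0, 0.3], [2.0, -0.5])
```

Because the conditional factor is at most 1, the overall linkage probability can never exceed the consent probability, which is why consent rates alone give only an upper bound on achievable linkage rates.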
2.3 Options for Exploratory Analyses of Consent-to-Link
Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1) and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and "data cleaning" for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options.
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with "real world" production conditions (per the preceding literature results on interviewer characteristics), nor full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L|C}(x; \beta_{L|C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables, when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the US Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure survey, the US Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers' expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year's worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.¹
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5% (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80% of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.

¹ In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: "We'd like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?" Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a "don't know" response or were nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members' propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS surveylogistic procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
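For readers who want to see the mechanics of a design-weighted consent propensity model, the following sketch fits a weighted logistic regression with an intercept and a single predictor by Newton-Raphson. It is not the SAS surveylogistic procedure used in the paper and it produces point estimates only; design-based standard errors would still require the BRR replicate weights. The function name and the toy data (a binary predictor standing in for something like interview mode) are hypothetical.

```python
import math


def fit_weighted_logit(x, y, w, max_iter=50, tol=1e-10):
    """Weighted logistic regression (intercept b0, slope b1) via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = wi * (yi - p)          # weighted score contribution
            info = wi * p * (1.0 - p)      # weighted information contribution
            g0 += resid
            g1 += resid * xi
            h00 += info
            h01 += info * xi
            h11 += info * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("singular information matrix")
        # Solve the 2x2 system (information matrix) * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


# Toy data (hypothetical): weighted consent rate 0.75 when x = 0 and 0.80 when x = 1.
x_toy = [0, 0, 1, 1]
y_toy = [1, 0, 1, 0]
w_toy = [3.0, 1.0, 4.0, 1.0]
intercept, slope = fit_weighted_logit(x_toy, y_toy, w_toy)
```

With a single binary predictor the model is saturated, so the fitted intercept equals the log odds of consent in the x = 0 group (log 3) and the slope equals the log odds ratio between groups (log 4/3), which gives a simple check on the fit.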
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey, and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as "truth" and compare estimates from the full sample to those derived from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only, to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2, for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
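The comparison described above can be sketched in a few lines of Python. The example computes a full-sample weighted mean, an unadjusted consenters-only mean, and a consenters-only mean with design weights divided by the consent propensity, and reports each consenter estimate's difference from the full-sample value. All names and data are hypothetical, and the toy propensities are taken as known, whereas in practice they would be estimated from a model such as model 2.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def consent_bias_summary(values, design_weights, consented, propensities):
    """Full-sample, consenters-only, and propensity-adjusted means with bias estimates."""
    full = weighted_mean(values, design_weights)
    rows = [(v, w, p) for v, w, c, p
            in zip(values, design_weights, consented, propensities) if c]
    c_values = [v for v, _, _ in rows]
    unadjusted = weighted_mean(c_values, [w for _, w, _ in rows])
    # Inverse-propensity adjustment: each consenter's design weight is divided by p.
    adjusted = weighted_mean(c_values, [w / p for _, w, p in rows])
    return {
        "full_sample": full,
        "consenters_only": unadjusted,
        "propensity_adjusted": adjusted,
        "bias_unadjusted": unadjusted - full,
        "bias_adjusted": adjusted - full,
    }


# Toy data (hypothetical): a low-value group that consents at rate 0.5
# and a high-value group that consents at rate 0.8, all with design weight 1.
values = [20.0] * 4 + [60.0] * 5
weights = [1.0] * 9
consented = [True, True, False, False, True, True, True, True, False]
propensities = [0.5] * 4 + [0.8] * 5
summary = consent_bias_summary(values, weights, consented, propensities)
```

In this constructed example the unadjusted consenters-only mean overstates the full-sample mean because the high-value group consents more often, and the adjustment removes the difference exactly only because the propensities match the realized consent rates within each group.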
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample, and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income imputation (57.0% versus 41.8%), express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180, versus 17.4% of consenters), but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older, versus 19.9% for consenters), and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1, except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
4.2 Analysis of Economic Variables Subject to "Informed Consent" Access
We examine consent bias for six CEQ variables for which there potentially isinformation available in administrative data sources that could be used to de-rive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

| Indicator | All respondents (n = 4,893) | Consenters (n = 3,951) | Non-consenters (n = 942) |
|---|---|---|---|
| Privacy/anti-government concerns | 6.9 (0.8) | 4.2 (0.6) | 18.1 (2.2) |
| Reluctance | | | |
| Converted refusal | 10.8 (0.7) | 9.6 (0.8) | 16.0 (1.5) |
| Income imputation required | 44.7 (1.0) | 41.8 (1.1) | 57.0 (2.1) |
| Too busy | 8.1 (0.5) | 6.9 (0.6) | 12.7 (1.3) |
| Respondent effort | | | |
| A lot of effort | 34.9 (0.9) | 36.7 (1.0) | 27.3 (1.9) |
| Moderate effort | 37.2 (1.1) | 37.4 (1.3) | 36.4 (1.8) |
| Bare minimum effort | 27.9 (1.1) | 25.9 (1.2) | 36.3 (2.3) |
| Respondent cooperation | | | |
| Very cooperative | 50.8 (1.0) | 53.5 (1.0) | 39.4 (2.6) |
| Somewhat cooperative | 33.1 (0.8) | 33.1 (0.9) | 33.1 (1.3) |
| Neither cooperative nor uncooperative | 5.4 (0.5) | 4.5 (0.4) | 9.2 (1.1) |
| Somewhat uncooperative | 4.5 (0.3) | 3.3 (0.3) | 9.5 (1.1) |
| Very uncooperative | 6.3 (0.4) | 5.7 (0.5) | 8.8 (0.9) |
| Rapport | | | |
| Face-to-face | 65.0 (1.4) | 66.2 (1.5) | 59.9 (1.9) |
| Phone | 35.0 (1.4) | 33.8 (1.5) | 40.1 (1.9) |
| Burden | | | |
| Total interview time | 64.9 (1.0) | 65.3 (1.1) | 63.3 (1.4) |
| Household size | 2.42 (0.03) | 2.42 (0.03) | 2.42 (0.07) |
| Total expenditures | $11,454 ($148) | $11,511 ($145) | $11,218 ($384) |
| Family income | | | |
| LT $8,180 | 19.8 (0.8) | 17.4 (0.9) | 30.8 (1.9) |
| $8,180–$24,000 | 20.2 (0.6) | 20.8 (0.9) | 16.8 (1.4) |
| $24,001–$46,000 | 20.8 (0.6) | 20.5 (0.7) | 18.4 (1.3) |
| $46,001–$85,855 | 19.8 (0.6) | 20.7 (0.7) | 16.7 (1.3) |
| GT $85,585 | 19.4 (0.6) | 20.6 (0.9) | 17.3 (1.8) |
| Owner | 65.4 (0.8) | 64.0 (0.9) | 71.3 (1.6) |

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01; p < 0.0001).
survey questions. For each of these variables, we computed three prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

| Characteristic | All respondents (n = 4,893) | Consenters (n = 3,951) | Non-consenters (n = 942) |
|---|---|---|---|
| Age group | | | |
| 18–32 | 20.0 (0.9) | 21.5 (1.1) | 13.8 (1.0) |
| 32–65 | 59.2 (0.9) | 58.6 (1.2) | 61.6 (1.5) |
| 65+ | 20.8 (0.7) | 19.9 (0.8) | 24.6 (1.5) |
| Gender | | | |
| Male | 46.3 (0.7) | 46.8 (0.8) | 44.3 (1.7) |
| Female | 53.7 (0.7) | 53.2 (0.8) | 55.7 (1.7) |
| Race | | | |
| White | 83.1 (0.9) | 83.0 (1.0) | 83.1 (1.1) |
| Non-white | 16.9 (0.9) | 17.0 (1.0) | 16.9 (1.1) |
| Education group | | | |
| Less than HS | 13.5 (0.6) | 13.4 (0.7) | 14.0 (1.6) |
| HS graduate | 24.7 (0.8) | 24.5 (0.9) | 25.3 (1.8) |
| Some college | 21.4 (0.7) | 21.2 (0.9) | 22.2 (1.9) |
| Associate’s degree | 10.2 (0.4) | 10.0 (0.5) | 10.9 (1.1) |
| Bachelor’s degree | 19.1 (0.5) | 19.4 (0.6) | 17.9 (1.4) |
| Advanced degree | 11.1 (0.5) | 11.5 (0.6) | 9.7 (0.9) |
| Spanish language interview | | | |
| Yes | 4.1 (0.5) | 3.7 (0.5) | 5.4 (1.5) |
| No | 95.9 (0.5) | 96.3 (0.5) | 94.6 (1.5) |
| Region | | | |
| Northeast | 18.5 (0.6) | 17.6 (0.7) | 22.2 (1.7) |
| Midwest | 23.5 (1.3) | 24.5 (1.4) | 19.2 (2.3) |
| South | 35.0 (1.3) | 35.9 (1.5) | 31.5 (2.7) |
| West | 23.0 (1.5) | 22.0 (1.7) | 27.1 (1.8) |
| Urbanicity | | | |
| Urban | 80.5 (0.9) | 80.5 (1.2) | 80.7 (2.0) |
| Rural | 19.5 (0.9) | 19.5 (1.2) | 19.3 (2.0) |

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01; p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

| Variable | Model 1 Estimate | Model 1 SE | Model 2 Estimate | Model 2 SE |
|---|---|---|---|---|
| Demographic characteristics | | | | |
| Age group (32–65) | | | | |
| 18–32 | 0.3435 | 0.0633 | 0.3389 | 0.0634 |
| 65+ | 0.2692 | 0.0645 | 0.2532 | 0.0656 |
| Gender (Male) | | | | |
| Female | 0.0161 | 0.0426 | 0.0229 | 0.0421 |
| Race (White) | | | | |
| Non-white | 0.0949 | 0.1264 | 0.0612 | 0.1260 |
| Spanish interview (No) | | | | |
| Yes | 0.4234 | 0.2923 | 0.4452 | 0.2790 |
| Education group (HS grad) | | | | |
| Less than HS | 0.3097 | 0.1879 | 0.2926 | 0.1835 |
| Some college | 0.4016 | 0.1423 | 0.3739 | 0.1389 |
| Associate’s degree | 0.3321 | 0.2041 | 0.3150 | 0.1993 |
| Bachelor’s degree | 0.0654 | 0.2112 | 0.0663 | 0.2082 |
| Advanced degree | 0.1747 | 0.2762 | 0.1512 | 0.2773 |
| Home owner (Renter) | | | | |
| Owner | 0.2767† | 0.1453 | 0.3233 | 0.1436 |
| Total expenditures (Log) | 0.0605 | 0.0799 | 0.0083 | 0.0800 |
| Income group | | | | |
| Less than $8,181 | 0.2155 | 0.0604 | | |
| Income imputed (No) | | | | |
| Yes | 0.1487 | 0.0513 | | |
| Race × Gender | 0.1874† | 0.0956 | 0.1953 | 0.0923 |
| Owner × Education | | | | |
| Less than HS | 0.4931 | 0.2066 | 0.4632 | 0.1996 |
| Some college | 0.6575 | 0.1567 | 0.6370 | 0.1613 |
| Associate’s degree | 0.3407† | 0.2013 | 0.3402† | 0.1986 |
| Bachelor’s degree | 0.1385 | 0.2402 | 0.1398 | 0.2368 |
| Advanced degree | 0.4713 | 0.2847 | 0.4226 | 0.2870 |
| Environmental features | | | | |
| Region (Northeast) | | | | |
| Midwest | 0.2097† | 0.1234 | 0.2111† | 0.1163 |
| South | 0.2451 | 0.1190 | 0.2408 | 0.1187 |
| West | 0.2670 | 0.1055 | 0.2557 | 0.1066 |
| Urbanicity (Rural) | | | | |
| Urban | 0.0268 | 0.0829 | 0.0234 | 0.0824 |
| R attitude proxies | | | | |
| Converted refusal (No) | | | | |
| Yes | 0.0721 | 0.0756 | 0.0838 | 0.0763 |

Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{p,i} = w_i / p_i$ (Consenting Units, Propensity-Adjusted).²
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / p_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{\mathrm{decile}}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / p_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{\mathrm{quintile}}$ is the unweighted propensity of “consent” units in the specified quintile group).
Table 3. Continued

| Variable | Model 1 Estimate | Model 1 SE | Model 2 Estimate | Model 2 SE |
|---|---|---|---|---|
| Effort (Moderate) | | | | |
| A lot of effort | 0.5454 | 0.2081 | 0.6142 | 0.2065 |
| Bare minimum effort | 0.4879 | 0.1365 | 0.5721 | 0.1371 |
| Doorstep concerns (None) | | | | |
| Too busy | 0.2207 | 0.1485 | 0.2137 | 0.1466 |
| Privacy/gov’t concerns | 1.1895 | 0.1667 | 1.2241 | 0.1622 |
| Other | 1.0547 | 0.3261 | 1.0255 | 0.3255 |
| Doorstep concerns × effort | | | | |
| Privacy × a lot of effort | 0.5884 | 0.2789 | 0.5312† | 0.2728 |
| Privacy × minimum effort | 0.3183 | 0.1918 | 0.2601 | 0.1924 |
| Busy × a lot of effort | 0.0888 | 0.2736 | 0.0890 | 0.2749 |
| Busy × minimum effort | 0.2238 | 0.2096 | 0.2277 | 0.2095 |
| Other × a lot of effort | 1.0794† | 0.6224 | 1.0477 | 0.6182 |
| Other × minimum effort | 0.9002 | 0.3610 | 0.8777 | 0.3593 |

NOTE: †p < 0.10; p < 0.05; p < 0.01; p < 0.001.
² Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $p_i$ for all sample units $i$. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
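The five estimators defined above (FS, CUNA, CUPA, CUPDA, and CUPQA) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the production code used for the CEQ analyses: the data are synthetic, the function names are hypothetical, and the class-level propensity is taken to be the unweighted mean of the estimated propensities of the consenting units in each decile or quintile group, which is one reading of the definitions of $p_{\mathrm{decile}}$ and $p_{\mathrm{quintile}}$.

```python
import random

def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def class_propensities(p, consent, n_classes):
    """Assign each unit the unweighted mean propensity of the consenting
    units in its propensity class (assumed reading of p_decile, p_quintile).
    Assumes len(p) >= n_classes so that no class is empty."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    out = [0.0] * len(p)
    for k in range(n_classes):
        members = order[k * len(p) // n_classes:(k + 1) * len(p) // n_classes]
        pool = [p[i] for i in members if consent[i] == 1]
        if not pool:  # no consenters in this class: fall back to all members
            pool = [p[i] for i in members]
        rate = sum(pool) / len(pool)
        for i in members:
            out[i] = rate
    return out

def consent_estimators(y, w, p, consent):
    """Return FS, CUNA, CUPA, CUPDA, and CUPQA mean estimates."""
    cu = [i for i in range(len(y)) if consent[i] == 1]
    y_cu = [y[i] for i in cu]

    def adjusted_mean(prop):
        # Consenting units only, with weights w_i / propensity.
        return weighted_mean(y_cu, [w[i] / prop[i] for i in cu])

    return {
        "FS": weighted_mean(y, w),
        "CUNA": weighted_mean(y_cu, [w[i] for i in cu]),
        "CUPA": adjusted_mean(p),
        "CUPDA": adjusted_mean(class_propensities(p, consent, 10)),
        "CUPQA": adjusted_mean(class_propensities(p, consent, 5)),
    }

# Synthetic illustration (all values hypothetical).
rng = random.Random(0)
n = 500
p = [rng.uniform(0.6, 0.95) for _ in range(n)]
consent = [1 if rng.random() < pi else 0 for pi in p]
w = [rng.uniform(0.5, 2.0) for _ in range(n)]
y = [40000 + 30000 * (1 - pi) + rng.gauss(0, 5000) for pi in p]

estimates = consent_estimators(y, w, p, consent)
for name, value in estimates.items():
    print(name, round(value, 2))
```

When the propensity is constant across units, the adjusted weights are proportional to $w_i$ and the three adjusted estimators coincide with CUNA, which is a useful sanity check on any implementation.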
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 5) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
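To make the phrase "relative to the corresponding standard errors" concrete, the short Python sketch below divides each reported difference for family income before taxes (table 4, columns 7–10) by its reported standard error and converts the ratio to a two-sided p-value. The use of a standard normal reference distribution is an assumption made here for illustration; the paper does not state the reference distribution used for its tests, and a design-based t distribution would give somewhat different p-values.

```python
import math

def z_and_p(diff, se_diff):
    """Ratio of a difference to its standard error, with a two-sided
    p-value under an assumed standard normal reference distribution."""
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Differences and SEs for family income before taxes, as reported in table 4.
income_rows = {
    "CUNA-FS": (1930.00, 580.98),
    "CUPA-FS": (1178.00, 619.09),
    "CUPDA-FS": (1330.00, 602.64),
    "CUPQA-FS": (1466.00, 606.76),
}

results = {name: z_and_p(d, se) for name, (d, se) in income_rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, two-sided p = {p:.4f}")
```

Under this normal approximation, the unadjusted difference (CUNA–FS) is more than three standard errors from zero, while the propensity-adjusted difference (CUPA–FS) is a little under two, consistent with the description of the adjusted difference as only marginally significant.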
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

| | Full-sample (FS) mean (SE) | Consenting units, no adjustment (CUNA) mean (SE) | Consenting units, propensity-adjusted (CUPA) mean (SE) | Consenting units, propensity-decile adjusted (CUPDA) mean (SE) | Consenting units, propensity-quintile adjusted (CUPQA) mean (SE) | Point estimate differences: CUNA–FS (SE diff) | CUPA–FS (SE diff) | CUPDA–FS (SE diff) | CUPQA–FS (SE diff) |
|---|---|---|---|---|---|---|---|---|---|
| Column 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| Family income before taxes | $50,939.00 ($1,227.51) | $52,869.00 ($1,535.04) | $52,117.00 ($1,523.85) | $52,269.00 ($1,520.74) | $52,405.00 ($1,514.30) | $1,930.00 ($580.98) | $1,178.00† ($619.09) | $1,330.00 ($602.64) | $1,466.00 ($606.76) |
| Family income before taxes with imputed values | $61,207.00 ($1,193.34) | $61,405.00 ($1,510.55) | $61,228.00 ($1,454.39) | $61,337.00 ($1,463.34) | $61,382.00 ($1,466.25) | $198.00 ($573.46) | $21.00 ($563.54) | $130.00 ($557.10) | $175.00 ($552.20) |
| Vehicle purchase cost | $599.59 ($33.22) | $619.14 ($37.05) | $607.80 ($36.63) | $607.63 ($36.67) | $611.75 ($37.61) | $19.55† ($10.83) | $8.21 ($15.34) | $8.04 ($15.23) | $12.17 ($16.13) |
| Property taxes | $454.15 ($10.41) | $429.12 ($10.76) | $434.74 ($11.39) | $434.88 ($11.37) | $436.75 ($11.30) | −$25.02 ($6.08) | −$19.41 ($6.48) | −$19.27 ($6.05) | −$17.40 ($6.23) |
| Property value | $232,226.00 ($5,055.68) | $227,734.00 ($5,730.38) | $227,938.00 ($5,592.27) | $227,935.00 ($5,532.64) | $228,640.00 ($5,519.90) | −$4,492.00 ($2,118.39) | −$4,288.00† ($2,165.93) | −$4,291.00 ($2,132.40) | −$3,586.00 ($2,063.66) |
| Rental value | $1,296.52 ($15.58) | $1,269.65 ($17.05) | $1,272.40 ($17.81) | $1,272.91 ($17.86) | $1,274.67 ($17.68) | −$26.87 ($7.43) | −$24.12 ($8.18) | −$23.61 ($8.01) | −$21.85 ($8.02) |

NOTE: †p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
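The construction of the 99 quantile-quantile points described for figure 1 can be sketched as follows. The code is a simplified illustration: it uses unweighted, linearly interpolated sample quantiles, whereas the published figure may be based on survey-weighted quantile estimates, and the function names and the toy propensity scores are hypothetical.

```python
import random

def quantile(sorted_vals, q):
    """Linearly interpolated sample quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def qq_points(p_object, p_consent):
    """Pairs (object quantile, consent quantile) for q = 0.01, ..., 0.99.
    The first coordinate is the horizontal axis, the second the vertical."""
    xs, ys = sorted(p_object), sorted(p_consent)
    return [(quantile(xs, k / 100), quantile(ys, k / 100)) for k in range(1, 100)]

# Toy propensity scores (hypothetical) for the two subpopulations.
rng = random.Random(1)
p_object = [rng.uniform(0.55, 0.95) for _ in range(300)]
p_consent = [rng.uniform(0.60, 0.98) for _ in range(1200)]

points = qq_points(p_object, p_consent)
# Points lying near the line y = x indicate little discriminating power.
max_gap = max(abs(y - x) for x, y in points)
print(len(points), round(max_gap, 3))
```

Plotting these pairs against the line with slope 1 and intercept 0 gives the type of display described for figure 1.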
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

| Percentile (%) | Estimated propensity of consent: Consent subpopulation | Estimated propensity of consent: Object subpopulation |
|---|---|---|
| 70 | 0.84 | 0.88 |
| 80 | 0.86 | 0.89 |
| 90 | 0.88 | 0.92 |
| 99 | 0.93 | 0.96 |
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
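The quantile differences plotted in figure 4 can be sketched with a weighted empirical distribution function. In the Python sketch below, a weighted quantile is taken as the smallest observed value at which the weighted empirical distribution function reaches q; this inversion rule, the function names, and the synthetic data are assumptions for illustration, and the pointwise confidence intervals shown in the figure (which require design-based variance estimation) are not reproduced.

```python
import random

def weighted_quantile(values, weights, q):
    """Smallest value whose weighted empirical CDF is at least q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]

def quantile_differences(y_full, w_full, y_cons, w_cons_adj):
    """For q = 0.01, ..., 0.99, return (q, adjusted-consenter quantile
    minus full-sample quantile)."""
    out = []
    for k in range(1, 100):
        q = k / 100
        diff = (weighted_quantile(y_cons, w_cons_adj, q)
                - weighted_quantile(y_full, w_full, q))
        out.append((q, diff))
    return out

# Synthetic right-skewed "income" data (hypothetical values).
rng = random.Random(2)
n = 400
y_full = [rng.lognormvariate(10.5, 0.8) for _ in range(n)]
w_full = [rng.uniform(0.5, 2.0) for _ in range(n)]
p = [rng.uniform(0.6, 0.95) for _ in range(n)]
consent = [rng.random() < pi for pi in p]
y_cons = [y for y, c in zip(y_full, consent) if c]
w_cons_adj = [w / pi for w, pi, c in zip(w_full, p, consent) if c]  # w_i / p_i

diffs = quantile_differences(y_full, w_full, y_cons, w_cons_adj)
print(diffs[49])  # difference at the median (q = 0.50)
```

A plot of these differences against q, with a horizontal reference line at zero, corresponds to the center line of a display such as figure 4.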
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following: trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Med Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

Variable: Variable description

Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
  Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Respondent “doorstep” concerns

Privacy/government concerns:
  Privacy concerns
  Anti-government concerns

Too busy/logistical concerns:
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested
  Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Other reluctance:
  Survey is voluntary
  Does not understand
  Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other (specify)

No concerns:
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / \hat{p}_i$ for the sample units with consent, where $\hat{p}_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
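The summaries described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `weight_variability` and the dictionary keys are hypothetical, the quantiles are simple unweighted sample quantiles of the weights, and the variance uses a divide-by-n convention; the paper does not state which conventions were used for table 6.

```python
import statistics


def weight_variability(weights):
    """Heterogeneity summaries for a list of survey weights (illustrative only)."""
    n = len(weights)
    mean_w = sum(weights) / n
    # Divide-by-n variance of the weights (an assumed convention).
    var_w = sum((w - mean_w) ** 2 for w in weights) / n
    cv2 = var_w / mean_w ** 2  # squared coefficient of variation
    # With n=20, the 19 cut points are the 5th, 10th, ..., 95th percentiles.
    q = statistics.quantiles(weights, n=20, method="inclusive")
    p05, p10, p25, p50 = q[0], q[1], q[4], q[9]
    p75, p90, p95 = q[14], q[17], q[18]
    return {
        "one_plus_cv2": 1.0 + cv2,
        "iqr_over_median": (p75 - p25) / p50,
        "q75_over_q25": p75 / p25,
        "q90_over_q10": p90 / p10,
        "q95_over_q05": p95 / p05,
    }
```

Applying the same function to the full-sample weights, the consenter weights, and the propensity-adjusted consenter weights would give one column of summaries per weight set, which is the comparison the appendix describes.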
Table 6. Variability of Weights

Columns:
(1) Weights from full sample
(2) Unadjusted weights for sample units with consent
(3) Propensity-adjusted weights for sample units with consent
(4) Propensity-decile adjusted weights for sample units with consent
(5) Propensity-quintile adjusted weights for sample units with consent

Statistic            (1)      (2)      (3)      (4)      (5)
n                    4893     3951     3915     3915     3915
1 + CV^2_wt          1.13     1.13     1.16     1.16     1.15
IQR / q0.5           0.875    0.877    0.858    0.853    0.851
q0.75 / q0.25        1.357    1.359    1.448    1.463    1.468
q0.90 / q0.10        1.970    1.966    2.148    2.178    2.124
q0.95 / q0.05        2.699    2.665    3.028    2.961    2.898
Appendix C. Variance Estimators and Goodness-of-Fit Tests

C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$ presented in those papers. A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table body not reproduced. Columns: coefficient estimate and standard error (SE) for each of Models 1, 2, A, B, C, D, and E. Reference categories are shown in parentheses. Rows, by block:

Demographic characteristics: Age group (32–65): 18–32, 65+; Gender (Male): Female; Race (White): Non-white; Spanish interview (No): Yes; Education group (HS grad): Less than HS, Some college, Associate’s degree, Bachelor’s degree, Advance degree; Homeowner (Renter): Owner; Total expenditures (Log); Income group: Less than $8,181; Income imputed (No): Yes; Race × gender; Owner × education: Less than HS, Some college, Associate’s degree, Bachelor’s degree, Advance degree.

Environmental features: Region (Northeast): Midwest, South, West; Urbanicity (Rural): Urban.

R attitude proxies: Converted refusal (No): Yes; Effort (Moderate): A lot of effort, Bare minimum effort; Doorstep concerns (None): Too busy, Privacy/gov’t concerns, Other; Doorstep concerns × effort: Privacy × a lot of effort, Privacy × minimum effort, Busy × a lot of effort, Busy × minimum effort, Other × a lot of effort, Other × minimum effort; Rapport (Phone): Personal Visit.

NOTE: Significance is flagged at p < 0.10 (†), p < 0.05, p < 0.01, and p < 0.001.]
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.

Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic, Model of Consent/Object Comparison

Model (of consent),    F-adjusted goodness-of-fit test p-value
2nd revision           (test statistic, F(9, 35))
1                      0.292 (1.262)
2                      0.810 (0.573)
A                      0.336 (1.183)
B                      0.929 (0.396)
C                      0.061 (2.061)
D                      0.022 (2.576)
E                      0.968 (0.305)
experience that is positively related to consent in these studies is past performance in gaining respondent consent. Sala et al. (2012) and Sakshaug et al. (2012) found that the likelihood of consent increased with the number of consents obtained earlier in the field period. These authors also attempted to identify interviewer personality traits and attitudes that could affect respondent consent decisions but largely failed to find significant effects. The one exception was that respondent consent was positively related to interviewers’ own willingness to consent to linkage (Sakshaug et al. 2012).

2.1.6 Survey design features. The way in which consent requests are administered can affect linkage consent. Consent rates appear to be higher in face-to-face surveys than in phone surveys, though relatively few mode studies have examined this phenomenon (Fulton 2012). Consent questions that ask respondents to provide personal identifiers (e.g., full or partial SSNs) as matching variables produce lower consent rates than those that do not (Bates 2005). This finding, and advancements in statistical matching techniques, prompted the Census Bureau to change its approach to gaining linkage consent in 2006, and it has since adopted a passive opt-out consent procedure in which respondents are informed of the intent to link and consent is assumed unless respondents explicitly object (McNabb, Timmons, Song, and Puckett 2009). These implicit consent procedures (as they are sometimes called) result in higher consent rates than opt-in approaches, where respondents must affirmatively state their consent (Bates 2005; Pascale 2011). See also Das and Couper (2014).

Since most surveys employ opt-in formats, researchers have focused on the potential effects of the wording or framing of these questions. Consent framing experiments vary factors mentioned in the request that are thought to be persuasive to respondents, for example, highlighting the quality benefits associated with linkage, the reduction in survey collection costs, or the time savings for respondents. Evidence of framing effects in these studies is surprisingly weak, however. Bates, Wroblewski, and Pascale (2012) found that respondents reported more positive attitudes toward record linkage under cost- and time-savings frames, but the study did not measure actual linkage consent propensities. In Sakshaug and Kreuter (2014), a time-savings frame produced higher consent rates for web survey respondents than a neutrally worded consent question, but this is the only study in the literature to find significant question-framing effects (Pascale 2011; Sakshaug et al. 2013).
The timing of consent requests appears to have some influence on the likelihood of consent. Although it is common practice to delay asking the most sensitive items, like linkage-consent requests, until near the end of the questionnaire, recent empirical evidence indicates that this may not be optimal. Sakshaug et al. (2013) found that respondents were more amenable to consent requests administered at the beginning of the survey than at the end, and suggest that the proximity of the survey and linkage requests may reinforce respondents’ desire or inclination to be consistent (i.e., agree to both). This result could inform potentially promising adaptive design interventions (e.g., asking for consent early in the interview, then skipping subsequent burdensome questions for consenters). Sala et al. (2014) obtained higher consent rates when the request was asked immediately following a series of questions on a related topic, rather than waiting until the end of the survey. The authors reasoned that contextual placement of the linkage question increases the salience of the request and induces more careful consideration by respondents. Both explanations find some support in the broader psychological literature on compliance, cognitive dissonance, and context effects, but further research is needed to evaluate these and other mechanisms (e.g., survey fatigue) underlying consent placement effects (Sala et al. 2014).
2.2 Analytic Approaches to Assessing Consent Bias

Early linkage consent studies simply looked for evidence of sample bias (i.e., differences in sample composition for consenters and non-consenters). Most recent studies use logistic regression models to identify factors that influence consent and infer potential consent bias (e.g., differential consent to medical-records linkage by respondent health status). Several studies have employed multi-level models to assess the impact of interviewers on consent propensity (e.g., Fulton 2012; Sala et al. 2012; Mostafa 2015), and others have jointly modeled respondents’ consent propensities on multiple consent items in a given survey (e.g., Mostafa 2015). Studies that have examined direct estimates of consent bias using administrative records available for both consenters and non-consenters are much less common in the literature. Recent studies by Sakshaug and his colleagues are notable exceptions (Sakshaug and Kreuter 2012; Sakshaug and Huber 2016). Using administrative data linked to German panel survey data, they have compared the magnitude of consent biases to bias estimates for other error sources (nonresponse, measurement) and the longitudinal changes in these biases.
Given evidence of potential consent bias (e.g., differential consent by specific demographic groups), one promising approach that has not yet been explored in this literature is to adopt propensity weighting methods that are widely used in nonresponse adjustment. Traditionally, propensity-weighted nonresponse adjustments are accomplished by modeling response propensities using logistic regression and auxiliary data available for both respondents and nonrespondents, and then using the inverse of the modeled propensity as a weight adjustment factor (Little 1986). If the predicted propensity is unbiased, this adjustment method may reduce the potential bias due to nonresponse. Of course, bias reduction is predicated on correct model specification, and this may be particularly challenging in consent propensity applications, given the absence of well-developed consent theories and inconsistency in the effects of many of its predictors.
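The inverse-propensity adjustment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `propensity_adjusted_weights` is hypothetical, and the estimated propensities are assumed to come from a separately fitted model.

```python
def propensity_adjusted_weights(base_weights, propensities, consented):
    """Return w_i / p_i for consenting units only (illustrative sketch).

    base_weights: design weights w_i for every sample unit.
    propensities: estimated consent propensities p_i from a fitted model.
    consented: booleans, True where the unit consented to linkage.
    """
    adjusted = []
    for w, p, c in zip(base_weights, propensities, consented):
        if not c:
            continue  # non-consenters carry no weight in the adjusted analysis
        if not 0.0 < p <= 1.0:
            raise ValueError("estimated propensity must lie in (0, 1]")
        adjusted.append(w / p)
    return adjusted
```

For example, a consenting unit with design weight 300 and estimated propensity 0.25 would receive an adjusted weight of 1200, so that it also stands in for similar units that declined.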
Application of these ideas in the analysis of “consent-to-link” patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition

$$p_L(x; \beta_L) = p_C(x; \beta_C)\, p_{L|C}(x; \beta_{L|C}) \qquad (2.1)$$

where $p_L(x; \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x; \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x; \beta_{L|C})$ is the probability of successful linkage, conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that, to some degree, the first factor $p_C(x; \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x; \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x; \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
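To make decomposition (2.1) concrete, the sketch below multiplies a consent probability by a conditional linkage probability. It assumes, purely for illustration, that both factors follow logistic models; the text does not specify a functional form for either factor, and the function names and coefficient values are hypothetical.

```python
import math


def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def linkage_probability(x, beta_c, beta_l_given_c):
    """Return (p_L, p_C, p_L|C) for predictor vector x.

    Assumes logistic forms for both factors (an illustrative assumption).
    """
    eta_c = sum(b * v for b, v in zip(beta_c, x))
    eta_lc = sum(b * v for b, v in zip(beta_l_given_c, x))
    p_c = logistic(eta_c)    # probability of nominal consent
    p_lc = logistic(eta_lc)  # probability of successful link, given consent
    return p_c * p_lc, p_c, p_lc


# Hypothetical unit: intercept plus one binary predictor.
p_l, p_c, p_lc = linkage_probability([1.0, 1.0], [1.2, -0.3], [2.0, 0.5])
```

Because both factors are at most one, the overall linkage probability can never exceed the consent probability, which is why consent rates alone give an upper bound on achievable linkage rates.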
2.3 Options for Exploratory Analyses of Consent-to-Link

Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1) and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and “data cleaning” for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options:
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with “real world” production conditions (per the preceding literature results on interviewer characteristics) nor reflect full population coverage (per the comments on respondent demographics and socio-environmental features).

(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.

(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L|C}(x; \beta_{L|C})$ are less than 1).

(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.

The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS

3.1 Possible Linkage of Government Records with Sample Units from the US Consumer Expenditure Survey

In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure survey, the US Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers’ expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year’s worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.¹
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE but at present may be somewhat secondary relative to reduction of current burden levels.

The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5 percent (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.

Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80 percent of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could affect the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.

¹ In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: “We’d like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?” Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a “don’t know” response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members’ propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS SURVEYLOGISTIC procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
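The replication variance described above can be illustrated with a minimal sketch. It assumes the usual Fay-type BRR formula, in which the squared deviations of the replicate estimates from the full-sample estimate are averaged and divided by (1 - K)^2. The function names and the toy data are hypothetical and are not taken from the CEQ files or from the SAS procedure.

```python
from math import sqrt

def weighted_mean(y, w):
    """Survey-weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def fay_brr_se(y, full_weights, replicate_weights, fay_k=0.5):
    """Standard error of a weighted mean under Fay-type BRR.

    replicate_weights holds one weight vector per replicate
    (the paper uses 44 replicates with K = 0.5).
    """
    theta_full = weighted_mean(y, full_weights)
    theta_reps = [weighted_mean(y, w_r) for w_r in replicate_weights]
    n_reps = len(theta_reps)
    squared_devs = sum((t - theta_full) ** 2 for t in theta_reps)
    variance = squared_devs / (n_reps * (1.0 - fay_k) ** 2)
    return sqrt(variance)

# Toy data (made up): a 0/1 consent indicator for four units, two replicates.
# With K = 0.5, replicate weights are 1.5 or 0.5 times the full weight.
y = [1.0, 0.0, 1.0, 1.0]
w = [10.0, 10.0, 10.0, 10.0]
reps = [
    [15.0, 5.0, 15.0, 5.0],
    [5.0, 15.0, 5.0, 15.0],
]
se = fay_brr_se(y, w, reps)
print(se)
```

With 44 replicates and K = 0.5, the divisor in this formula is 44 x 0.25 = 11.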
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as “truth” and compare estimates from the full sample to those derived
from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work, we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
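The inverse-propensity weighting idea in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the production code: the helper names are hypothetical, the four-unit data set is invented, and the propensities are supplied directly rather than estimated from a fitted model.

```python
def weighted_mean(y, w):
    """Survey-weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def consenter_weights(design_w, consent, propensity=None):
    """Weights for consenters-only estimation; non-consenters get weight 0.

    If propensity is given, each consenter's design weight is divided by
    that unit's estimated consent propensity.
    """
    weights = []
    for i, (wi, ci) in enumerate(zip(design_w, consent)):
        if not ci:
            weights.append(0.0)
        elif propensity is None:
            weights.append(wi)
        else:
            weights.append(wi / propensity[i])
    return weights

# Toy data (made up): the third unit objects to linkage.
y = [10.0, 20.0, 30.0, 40.0]
design_w = [1.0, 1.0, 1.0, 1.0]
consent = [1, 1, 0, 1]
propensity = [0.5, 1.0, 0.5, 0.5]

full_sample = weighted_mean(y, design_w)
unadjusted = weighted_mean(y, consenter_weights(design_w, consent))
adjusted = weighted_mean(y, consenter_weights(design_w, consent, propensity))
print(full_sample, unadjusted, adjusted)
```

In this toy case the adjusted consenter mean lies closer to the full-sample mean than the unadjusted one does, which is the pattern the comparison in section 4.2 is designed to detect. Nothing guarantees that outcome in real data.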
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income
imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), and to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration) as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180 versus 17.4% of consenters) but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older versus 19.9% for consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1 except for exclusion of the low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
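For readers less familiar with logistic propensity models, the following sketch shows how a fitted model of this kind converts a respondent's predictor values into an estimated consent propensity through the inverse logit. The intercept and coefficients below are hypothetical placeholders chosen for illustration; they are not the table 3 estimates.

```python
from math import exp

def consent_propensity(intercept, coefficients, x):
    """Inverse-logit transform of the linear predictor.

    coefficients and x are parallel lists, one entry per predictor.
    """
    eta = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + exp(-eta))

# Hypothetical model with two 0/1 predictors:
# x[0] = privacy concern expressed, x[1] = respondent aged 18-32.
intercept = 1.5
coefficients = [-1.2, 0.3]

p_reference = consent_propensity(intercept, coefficients, [0, 0])
p_privacy = consent_propensity(intercept, coefficients, [1, 0])
print(round(p_reference, 3), round(p_privacy, 3))
```

A negative coefficient on a 0/1 indicator lowers the estimated propensity relative to the reference category, as the two printed values show.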
4.2 Analysis of Economic Variables Subject to “Informed Consent” Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms
Indicator                                  All respondents    Consenters         Non-consenters
                                           (n = 4,893)        (n = 3,951)        (n = 942)
Privacy/anti-government concerns           6.9 (0.8)          4.2 (0.6)          18.1 (2.2)
Reluctance
  Converted refusal                        10.8 (0.7)         9.6 (0.8)          16.0 (1.5)
  Income imputation required               44.7 (1.0)         41.8 (1.1)         57.0 (2.1)
  Too busy                                 8.1 (0.5)          6.9 (0.6)          12.7 (1.3)
  Respondent effort
    A lot of effort                        34.9 (0.9)         36.7 (1.0)         27.3 (1.9)
    Moderate effort                        37.2 (1.1)         37.4 (1.3)         36.4 (1.8)
    Bare minimum effort                    27.9 (1.1)         25.9 (1.2)         36.3 (2.3)
  Respondent cooperation
    Very cooperative                       50.8 (1.0)         53.5 (1.0)         39.4 (2.6)
    Somewhat cooperative                   33.1 (0.8)         33.1 (0.9)         33.1 (1.3)
    Neither cooperative nor uncooperative  5.4 (0.5)          4.5 (0.4)          9.2 (1.1)
    Somewhat uncooperative                 4.5 (0.3)          3.3 (0.3)          9.5 (1.1)
    Very uncooperative                     6.3 (0.4)          5.7 (0.5)          8.8 (0.9)
Rapport
  Face-to-face                             65.0 (1.4)         66.2 (1.5)         59.9 (1.9)
  Phone                                    35.0 (1.4)         33.8 (1.5)         40.1 (1.9)
Burden
  Total interview time                     64.9 (1.0)         65.3 (1.1)         63.3 (1.4)
  Household size                           2.42 (0.03)        2.42 (0.03)        2.42 (0.07)
  Total expenditures                       $11,454 ($148)     $11,511 ($145)     $11,218 ($384)
  Family income
    LT $8,180                              19.8 (0.8)         17.4 (0.9)         30.8 (1.9)
    $8,180–$24,000                         20.2 (0.6)         20.8 (0.9)         16.8 (1.4)
    $24,001–$46,000                        20.8 (0.6)         20.5 (0.7)         18.4 (1.3)
    $46,001–$85,855                        19.8 (0.6)         20.7 (0.7)         16.7 (1.3)
    GT $85,585                             19.4 (0.6)         20.6 (0.9)         17.3 (1.8)
  Owner                                    65.4 (0.8)         64.0 (0.9)         71.3 (1.6)
NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).
survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters) with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last-visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents
Characteristic                 All respondents    Consenters         Non-consenters
                               (n = 4,893)        (n = 3,951)        (n = 942)
Age group
  18–32                        20.0 (0.9)         21.5 (1.1)         13.8 (1.0)
  32–65                        59.2 (0.9)         58.6 (1.2)         61.6 (1.5)
  65+                          20.8 (0.7)         19.9 (0.8)         24.6 (1.5)
Gender
  Male                         46.3 (0.7)         46.8 (0.8)         44.3 (1.7)
  Female                       53.7 (0.7)         53.2 (0.8)         55.7 (1.7)
Race
  White                        83.1 (0.9)         83.0 (1.0)         83.1 (1.1)
  Non-white                    16.9 (0.9)         17.0 (1.0)         16.9 (1.1)
Education group
  Less than HS                 13.5 (0.6)         13.4 (0.7)         14.0 (1.6)
  HS graduate                  24.7 (0.8)         24.5 (0.9)         25.3 (1.8)
  Some college                 21.4 (0.7)         21.2 (0.9)         22.2 (1.9)
  Associate’s degree           10.2 (0.4)         10.0 (0.5)         10.9 (1.1)
  Bachelor’s degree            19.1 (0.5)         19.4 (0.6)         17.9 (1.4)
  Advanced degree              11.1 (0.5)         11.5 (0.6)         9.7 (0.9)
Spanish language interview
  Yes                          4.1 (0.5)          3.7 (0.5)          5.4 (1.5)
  No                           95.9 (0.5)         96.3 (0.5)         94.6 (1.5)
Region
  Northeast                    18.5 (0.6)         17.6 (0.7)         22.2 (1.7)
  Midwest                      23.5 (1.3)         24.5 (1.4)         19.2 (2.3)
  South                        35.0 (1.3)         35.9 (1.5)         31.5 (2.7)
  West                         23.0 (1.5)         22.0 (1.7)         27.1 (1.8)
Urbanicity
  Urban                        80.5 (0.9)         80.5 (1.2)         80.7 (2.0)
  Rural                        19.5 (0.9)         19.5 (1.2)         19.3 (2.0)
NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Variable                           Model 1                  Model 2
                                   Estimate     SE          Estimate     SE
Demographic characteristics
  Age group (32–65)
    18–32                          0.3435       0.0633      0.3389       0.0634
    65+                            0.2692       0.0645      0.2532       0.0656
  Gender (Male)
    Female                         0.0161       0.0426      0.0229       0.0421
  Race (White)
    Non-white                      0.0949       0.1264      0.0612       0.1260
  Spanish interview (No)
    Yes                            0.4234       0.2923      0.4452       0.2790
  Education group (HS grad)
    Less than HS                   0.3097       0.1879      0.2926       0.1835
    Some college                   0.4016       0.1423      0.3739       0.1389
    Associate’s degree             0.3321       0.2041      0.3150       0.1993
    Bachelor’s degree              0.0654       0.2112      0.0663       0.2082
    Advanced degree                0.1747       0.2762      0.1512       0.2773
  Home owner (Renter)
    Owner                          0.2767†      0.1453      0.3233       0.1436
  Total expenditures (Log)         0.0605       0.0799      0.0083       0.0800
  Income group
    Less than $8,181               0.2155       0.0604
  Income imputed (No)
    Yes                            0.1487       0.0513
  Race × Gender                    0.1874†      0.0956      0.1953       0.0923
  Owner × Education
    Less than HS                   0.4931       0.2066      0.4632       0.1996
    Some college                   0.6575       0.1567      0.6370       0.1613
    Associate’s degree             0.3407†      0.2013      0.3402†      0.1986
    Bachelor’s degree              0.1385       0.2402      0.1398       0.2368
    Advanced degree                0.4713       0.2847      0.4226       0.2870
Environmental features
  Region (Northeast)
    Midwest                        0.2097†      0.1234      0.2111†      0.1163
    South                          0.2451       0.1190      0.2408       0.1187
    West                           0.2670       0.1055      0.2557       0.1066
  Urbanicity (Rural)
    Urban                          0.0268       0.0829      0.0234       0.0824
R attitude proxies
  Converted refusal (No)
    Yes                            0.0721       0.0756      0.0838       0.0763
Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted).²
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / p_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{\mathrm{decile}}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / p_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{\mathrm{quintile}}$ is the unweighted propensity of “consent” units in the specified quintile group).
Table 3. Continued
Variable                           Model 1                  Model 2
                                   Estimate     SE          Estimate     SE
  Effort (Moderate)
    A lot of effort                0.5454       0.2081      0.6142       0.2065
    Bare minimum effort            0.4879       0.1365      0.5721       0.1371
  Doorstep concerns (None)
    Too busy                       0.2207       0.1485      0.2137       0.1466
    Privacy/gov’t concerns         1.1895       0.1667      1.2241       0.1622
    Other                          1.0547       0.3261      1.0255       0.3255
  Doorstep concerns × effort
    Privacy × a lot of effort      0.5884       0.2789      0.5312†      0.2728
    Privacy × minimum effort       0.3183       0.1918      0.2601       0.1924
    Busy × a lot of effort         0.0888       0.2736      0.0890       0.2749
    Busy × minimum effort          0.2238       0.2096      0.2277       0.2095
    Other × a lot of effort        1.0794†      0.6224      1.0477       0.6182
    Other × minimum effort         0.9002       0.3610      0.8777       0.3593
NOTE: †p < 0.10, p < 0.05, p < 0.01, p < 0.001.
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link, $\hat{p}_i$, for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.
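The decile- and quintile-based estimators can be illustrated with a short sketch of propensity subclassification. Two assumptions are made here and should be read as such: units are grouped into equal-sized classes by their estimated propensity, and the class-level adjustment factor is taken to be the unweighted share of consenting units in the class. The paper's exact definition of the class-level propensity may differ, and the function name and toy data are hypothetical.

```python
def class_adjusted_weights(design_w, consent, propensity, n_classes=5):
    """Return w_i / p_class for consenters and 0.0 for non-consenters.

    Units are sorted by estimated propensity and split into n_classes
    groups of (nearly) equal size; p_class is the unweighted consent
    rate within the group (an assumed definition).
    """
    n = len(propensity)
    order = sorted(range(n), key=lambda i: propensity[i])
    adjusted = [0.0] * n
    for k in range(n_classes):
        members = order[k * n // n_classes:(k + 1) * n // n_classes]
        if not members:
            continue
        rate = sum(consent[i] for i in members) / len(members)
        for i in members:
            if consent[i]:
                adjusted[i] = design_w[i] / rate
    return adjusted

# Toy data (made up): four units split into two propensity classes.
design_w = [1.0, 1.0, 1.0, 1.0]
consent = [1, 0, 1, 1]
propensity = [0.2, 0.4, 0.6, 0.8]
adj = class_adjusted_weights(design_w, consent, propensity, n_classes=2)
print(adj)
```

Setting n_classes to 10 or 5 gives decile- or quintile-style classes. In the toy case the adjusted consenter weights sum to the full-sample weight total, because each class's consenters are weighted up to represent the whole class.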
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., they move them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA-FS and CUPQA-FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA-FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
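As a rough check on how such differences relate to their standard errors, the sketch below forms the ratio of a difference to its standard error and converts it to a two-sided p-value. It assumes a standard normal reference distribution. The paper may have used a different reference distribution (for example, a t distribution with replication-based degrees of freedom), so the p-values here are only approximate. The inputs are the family-income differences and standard errors reported in table 4.

```python
from math import erfc, sqrt

def z_and_p(diff, se_diff):
    """z ratio and two-sided normal-approximation p-value for a difference."""
    z = diff / se_diff
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Family income before taxes (table 4): difference and SE of the difference.
z_cuna, p_cuna = z_and_p(1930.00, 580.98)   # CUNA-FS
z_cupa, p_cupa = z_and_p(1178.00, 619.09)   # CUPA-FS
print(round(z_cuna, 2), round(z_cupa, 2))
```

Under this approximation the unadjusted difference is clearly significant, while the propensity-adjusted difference falls between the 0.05 and 0.10 levels, in line with the description of it as marginally significant.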
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods
Means, with standard errors in parentheses on the following row (columns 2–6)
Column 1                                         2              3              4              5              6
Variable                                         FS             CUNA           CUPA           CUPDA          CUPQA
Family income before taxes                       $50,939.00     $52,869.00     $52,117.00     $52,269.00     $52,405.00
                                                 ($1,227.51)    ($1,535.04)    ($1,523.85)    ($1,520.74)    ($1,514.30)
Family income before taxes with imputed values   $61,207.00     $61,405.00     $61,228.00     $61,337.00     $61,382.00
                                                 ($1,193.34)    ($1,510.55)    ($1,454.39)    ($1,463.34)    ($1,466.25)
Vehicle purchase cost                            $599.59        $619.14        $607.80        $607.63        $611.75
                                                 ($33.22)       ($37.05)       ($36.63)       ($36.67)       ($37.61)
Property taxes                                   $454.15        $429.12        $434.74        $434.88        $436.75
                                                 ($10.41)       ($10.76)       ($11.39)       ($11.37)       ($11.30)
Property value                                   $232,226.00    $227,734.00    $227,938.00    $227,935.00    $228,640.00
                                                 ($5,055.68)    ($5,730.38)    ($5,592.27)    ($5,532.64)    ($5,519.90)
Rental value                                     $1,296.52      $1,269.65      $1,272.40      $1,272.91      $1,274.67
                                                 ($15.58)       ($17.05)       ($17.81)       ($17.86)       ($17.68)
Point estimate differences, with SE of the difference in parentheses on the following row (columns 7–10)
Column 1                                         7              8              9              10
Variable                                         CUNA-FS        CUPA-FS        CUPDA-FS       CUPQA-FS
Family income before taxes                       $1,930.00      $1,178.00†     $1,330.00      $1,466.00
                                                 ($580.98)      ($619.09)      ($602.64)      ($606.76)
Family income before taxes with imputed values   $198.00        $21.00         $130.00        $175.00
                                                 ($573.46)      ($563.54)      ($557.10)      ($552.20)
Vehicle purchase cost                            $19.55†        $8.21          $8.04          $12.17
                                                 ($10.83)       ($15.34)       ($15.23)       ($16.13)
Property taxes                                   -$25.02        -$19.41        -$19.27        -$17.40
                                                 ($6.08)        ($6.48)        ($6.05)        ($6.23)
Property value                                   -$4,492.00     -$4,288.00†    -$4,291.00     -$3,586.00
                                                 ($2,118.39)    ($2,165.93)    ($2,132.40)    ($2,063.66)
Rental value                                     -$26.87        -$24.12        -$23.61        -$21.85
                                                 ($7.43)        ($8.18)        ($8.01)        ($8.02)
NOTE: FS = full sample; CUNA = consenting units, no adjustment; CUPA = consenting units, propensity-adjusted; CUPDA = consenting units, propensity-decile adjusted; CUPQA = consenting units, propensity-quintile adjusted. †p < 0.10, p < 0.05, p < 0.01 (bold), p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
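The construction of a quantile-quantile comparison of this kind can be sketched with the Python standard library. The sketch is a simplification under stated assumptions: it uses unweighted quantiles, whereas the paper's estimates are survey-weighted, and it uses the "inclusive" interpolation rule of `statistics.quantiles`, which is one of several possible quantile definitions. The scores below are invented for illustration.

```python
from statistics import quantiles

def qq_points(scores_object, scores_consent):
    """Pairs (x, y) of the 0.01 to 0.99 quantiles for the two groups.

    x comes from the "object" group (horizontal axis) and y from the
    "consent" group (vertical axis).
    """
    x = quantiles(scores_object, n=100, method="inclusive")
    y = quantiles(scores_consent, n=100, method="inclusive")
    return list(zip(x, y))

def max_gap_from_diagonal(points):
    """Largest vertical distance between a plotted point and the line y = x."""
    return max(abs(y - x) for x, y in points)

# Toy propensity scores (made up): the consent group is shifted up by 0.05.
scores_object = [0.60, 0.70, 0.80, 0.90]
scores_consent = [0.65, 0.75, 0.85, 0.95]
points = qq_points(scores_object, scores_consent)
print(len(points), round(max_gap_from_diagonal(points), 3))
```

Points that all lie on the diagonal (a gap of zero) correspond to the "no discriminating power" case described in the text.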
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX) separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations
Percentile (%)     Estimated propensity of consent
                   Consent subpopulation     Object subpopulation
70                 0.84                      0.88
80                 0.86                      0.89
90                 0.88                      0.92
99                 0.93                      0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
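The quantile-difference comparison can be sketched as follows. The sketch assumes a simple weighted-quantile definition (the smallest observed value whose cumulative weight share reaches q), which need not match the estimator used for figure 4, and it omits the pointwise confidence intervals, which would require replicate weights. The function names and toy data are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share is at least q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]

def quantile_differences(y_full, w_full, y_consent, w_consent_adjusted):
    """Differences (adjusted consenters minus full sample), q = 0.01..0.99."""
    diffs = []
    for k in range(1, 100):
        q = k / 100
        q_consent = weighted_quantile(y_consent, w_consent_adjusted, q)
        q_full = weighted_quantile(y_full, w_full, q)
        diffs.append(q_consent - q_full)
    return diffs

# Toy data (made up): the unit with y = 30 objects to linkage, and the
# consenters carry propensity-adjusted weights.
y_full = [10.0, 20.0, 30.0, 40.0]
w_full = [1.0, 1.0, 1.0, 1.0]
y_consent = [10.0, 20.0, 40.0]
w_consent_adjusted = [2.0, 1.0, 2.0]
diffs = quantile_differences(y_full, w_full, y_consent, w_consent_adjusted)
print(diffs[49], diffs[54], diffs[64])
```

Plotting these 99 differences against q, together with interval bounds, gives a display of the same general form as figure 4. Differences that stay near zero across q, with no trend, correspond to the pattern described in the text.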
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require that sampled survey units provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods, joint modeling of consent-to-link and item response probabilities, approximate optimization of design decisions related to potentially burdensome questions, and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the trade-offs in approximate measures of cost and burden that may arise through integrated use of the following: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests for Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Med Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
  Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Respondent “doorstep” concerns, listed under their assigned category

Privacy/government concerns
  Privacy concerns
  Anti-government concerns

Too busy/logistical concerns
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested
  Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Other reluctance
  Survey is voluntary
  Does not understand
  Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other (specify)

No concerns
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / \hat{p}_i$ for the sample units with consent, where $\hat{p}_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Columns: (1) weights from full sample; (2) unadjusted weights for sample units with consent; (3) propensity-adjusted weights for sample units with consent; (4) propensity-decile adjusted weights for sample units with consent; (5) propensity-quintile adjusted weights for sample units with consent.

Statistic            (1)      (2)      (3)      (4)      (5)
n                    4893     3951     3915     3915     3915
1 + CV^2_wt          1.13     1.13     1.16     1.16     1.15
IQR / q_0.5          0.875    0.877    0.858    0.853    0.851
q_0.75 / q_0.25      1.357    1.359    1.448    1.463    1.468
q_0.90 / q_0.10      1.970    1.966    2.148    2.178    2.124
q_0.95 / q_0.05      2.699    2.665    3.028    2.961    2.898
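
The weight-heterogeneity summaries of the kind reported in table 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the toy weights are hypothetical; $1 + CV^2_{wt}$ is computed with the divide-by-n variance, so that it equals $n \sum w_i^2 / (\sum w_i)^2$; and percentiles use the "inclusive" interpolation rule of Python's `statistics.quantiles`, since the paper does not state which percentile definition was used.

```python
from statistics import quantiles


def one_plus_cv2(weights):
    """Return 1 + CV^2 of the weights, i.e. n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def weight_spread(weights):
    """Nonparametric spread summaries of the kind listed in table 6."""
    # n=20 returns the 5th, 10th, ..., 95th percentiles (19 cut points).
    cuts = quantiles(weights, n=20, method="inclusive")
    q05, q10, q25, q50, q75, q90, q95 = (
        cuts[i] for i in (0, 1, 4, 9, 14, 17, 18)
    )
    return {
        "IQR/q0.5": (q75 - q25) / q50,
        "q0.75/q0.25": q75 / q25,
        "q0.90/q0.10": q90 / q10,
        "q0.95/q0.05": q95 / q05,
    }


# Hypothetical toy weights, for illustration only (not CEQ weights).
toy_weights = [float(v) for v in range(1, 22)]
inflation = one_plus_cv2(toy_weights)
summary = weight_spread(toy_weights)
```

Applying the same two functions to the design weights and to the propensity-adjusted weights $w_i / \hat{p}_i$ of the consenting units would give a side-by-side comparison analogous to the columns of table 6.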
Appendix C. Variance Estimators and Goodness-of-Fit Tests

C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1 - K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
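
As a concrete illustration of the variance estimator above, the following minimal Python sketch computes the Fay-adjusted balanced repeated replication variance-covariance matrix from a full-sample estimate and a set of replicate estimates. The function name and the toy numbers are hypothetical; the toy example uses only four replicates for readability, whereas the paper uses $R = 44$ with $K = 0.5$.

```python
def fay_brr_covariance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix of a coefficient vector.

    Implements (1 / (R * (1 - K)^2)) * sum_r (b_r - b)(b_r - b)',
    where b is the full-sample estimate and b_r the r-th replicate estimate.
    """
    n_reps = len(replicate_estimates)
    dim = len(full_estimate)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * dim for _ in range(dim)]
    for rep in replicate_estimates:
        # Deviation of this replicate estimate from the full-sample estimate.
        dev = [b_r - b for b_r, b in zip(rep, full_estimate)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += scale * dev[i] * dev[j]
    return cov


# Hypothetical toy inputs: a two-coefficient vector and four replicate estimates.
beta_hat = [1.0, 2.0]
beta_reps = [[1.1, 2.0], [0.9, 2.0], [1.0, 2.2], [1.0, 1.8]]

cov = fay_brr_covariance(beta_hat, beta_reps, fay_k=0.5)
std_errors = [cov[i][i] ** 0.5 for i in range(len(beta_hat))]
```

Standard errors are the square roots of the diagonal entries; with $K = 0.5$ the factor $1/(1 - K)^2$ equals 4, which compensates for the reduced perturbation of the replicate weights under Fay's method.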
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{\mathrm{Fadj}}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{\mathrm{Fadj}}$ is distributed approximately as a central $F(g - 1, f - g + 2)$ random variable, where $f = 43$ and $g = 10$.
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table layout. Columns: Estimate and SE for each of Models 1, 2, A, B, C, D, and E. Rows, with reference groups in parentheses:
Demographic characteristics: Age group (32–65): 18–32, 65+; Gender (Male): Female; Race (White): Non-white; Spanish interview (No): Yes; Education group (HS grad): Less than HS, Some college, Associate’s degree, Bachelor’s degree, Advance degree; Homeowner (Renter): Owner; Total expenditures (Log); Income group: Less than $8,181; Income imputed (No): Yes; Race × gender; Owner × education: Less than HS, Some college, Associate’s degree, Bachelor’s degree, Advance degree.
Environmental features: Region (Northeast): Midwest, South, West; Urbanicity (Rural): Urban.
R attitude proxies: Converted refusal (No): Yes; Effort (Moderate): A lot of effort, Bare minimum effort; Doorstep concerns (None): Too busy, Privacy/gov’t concerns, Other; Doorstep concerns × effort: Privacy × a lot of effort, Privacy × minimum effort, Busy × a lot of effort, Busy × minimum effort, Other × a lot of effort, Other × minimum effort; Rapport (Phone): Personal visit.]

NOTE: † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{\mathrm{Fadj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent),    F-adjusted goodness-of-fit test p-value
2nd revision           (test statistic, F(9, 35))
1                      0.292 (1.262)
2                      0.810 (0.573)
A                      0.336 (1.183)
B                      0.929 (0.396)
C                      0.061 (2.061)
D                      0.022 (2.576)
E                      0.968 (0.305)
proximity of the survey and linkage requests may reinforce respondents’ desire or inclination to be consistent (i.e., agree to both). This result could inform potentially promising adaptive design interventions (e.g., asking for consent early in the interview, then skipping subsequent burdensome questions for consenters). Sala et al. (2014) obtained higher consent rates when the request was asked immediately following a series of questions on a related topic, rather than waiting until the end of the survey. The authors reasoned that contextual placement of the linkage question increases the salience of the request and induces more careful consideration by respondents. Both explanations find some support in the broader psychological literature on compliance, cognitive dissonance, and context effects, but further research is needed to evaluate these and other mechanisms (e.g., survey fatigue) underlying consent placement effects (Sala et al. 2014).
2.2 Analytic Approaches to Assessing Consent Bias
Early linkage consent studies simply looked for evidence of sample bias (i.e., differences in sample composition for consenters and non-consenters). Most recent studies use logistic regression models to identify factors that influence consent and infer potential consent bias (e.g., differential consent to medical-records linkage by respondent health status). Several studies have employed multi-level models to assess the impact of interviewers on consent propensity (e.g., Fulton 2012; Sala et al. 2012; Mostafa 2015), and others have jointly modeled respondents’ consent propensities on multiple consent items in a given survey (e.g., Mostafa 2015). Studies that have examined direct estimates of consent bias using administrative records available for both consenters and non-consenters are much less common in the literature. Recent studies by Sakshaug and his colleagues are notable exceptions (Sakshaug and Kreuter 2012; Sakshaug and Huber 2016). Using administrative data linked to German panel survey data, they have compared the magnitude of consent biases to bias estimates for other error sources (nonresponse, measurement) and the longitudinal changes in these biases.
Given evidence of potential consent bias (e.g., differential consent by specific demographic groups), one promising approach that has not yet been explored in this literature is to adopt propensity weighting methods that are widely used in nonresponse adjustment. Traditionally, propensity-weighted nonresponse adjustments are accomplished by modeling response propensities using logistic regression and auxiliary data available for both respondents and nonrespondents, and then using the inverse of the modeled propensity as a weight adjustment factor (Little 1986). If the predicted propensity is unbiased, this adjustment method may reduce the potential bias due to nonresponse. Of course, bias reduction is predicated on correct model specification, and this may be particularly challenging in consent propensity applications, given the absence of well-developed consent theories and inconsistency in the effects of many of its predictors.
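
The inverse-propensity adjustment described above can be illustrated with a small Python sketch. It is a toy example under stated assumptions, not the authors' implementation: the function names and data are hypothetical, and the propensities are taken as already estimated (in practice they would come from a fitted logistic regression model).

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def consenter_means(y, design_weights, propensities, consented):
    """Unadjusted and inverse-propensity-adjusted means over consenting units."""
    rows = [
        (yi, wi, pi)
        for yi, wi, pi, ci in zip(y, design_weights, propensities, consented)
        if ci
    ]
    values = [yi for yi, _, _ in rows]
    unadjusted = weighted_mean(values, [wi for _, wi, _ in rows])
    # Each consenting unit's weight is divided by its estimated propensity.
    adjusted = weighted_mean(values, [wi / pi for _, wi, pi in rows])
    return unadjusted, adjusted


# Hypothetical toy data: units with y = 10 consent with propensity 0.5,
# units with y = 20 always consent.
y = [10.0, 10.0, 20.0, 20.0]
design_weights = [1.0, 1.0, 1.0, 1.0]
propensities = [0.5, 0.5, 1.0, 1.0]
consented = [True, False, True, True]

full_sample_mean = weighted_mean(y, design_weights)
unadjusted, adjusted = consenter_means(y, design_weights, propensities, consented)
```

In this toy setting the unadjusted consenter mean overstates the full-sample mean because low-value units consent less often, while the adjusted mean recovers it. That recovery depends on the propensities being correct, which mirrors the model-specification caveat in the text.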
Application of these ideas in the analysis of “consent-to-link” patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition,

$$p_L(x; \beta_L) = p_C(x; \beta_C) \, p_{L|C}(x; \beta_{L|C}), \qquad (2.1)$$

where $p_L(x; \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x; \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x; \beta_{L|C})$ is the probability of successful linkage, conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that, to some degree, the first factor $p_C(x; \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x; \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x; \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
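
A small numerical sketch may help fix ideas about the decomposition of linkage probability into a consent factor and a conditional linkage factor. The Python code below assumes, purely for illustration, that both factors follow logistic models in $x$; the paper does not specify a functional form for the conditional linkage model, and the function names, coefficient values, and predictor vector are all hypothetical.

```python
import math


def logistic(linear_predictor):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def linkage_probability(x, beta_c, beta_l_given_c):
    """Return (p_C, p_{L|C}, p_L) with p_L = p_C * p_{L|C}.

    Both factors are modeled as logistic in x (an illustrative assumption).
    """
    p_consent = logistic(sum(b * xi for b, xi in zip(beta_c, x)))
    p_link_given_consent = logistic(
        sum(b * xi for b, xi in zip(beta_l_given_c, x))
    )
    return p_consent, p_link_given_consent, p_consent * p_link_given_consent


# Hypothetical unit: intercept term plus one binary predictor.
x = [1.0, 1.0]
beta_c = [1.5, -0.4]          # hypothetical consent-model coefficients
beta_l_given_c = [2.0, 0.3]   # hypothetical conditional-linkage coefficients

p_c, p_l_given_c, p_l = linkage_probability(x, beta_c, beta_l_given_c)
```

Because $p_{L|C} \le 1$, the overall linkage probability can never exceed the consent probability; a consent-only study therefore gives an upper bound on the achievable linkage rate for a given unit.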
2.3 Options for Exploratory Analyses of Consent-to-Link
Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1), and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and “data cleaning” for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options:
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with “real world” production conditions (per the preceding literature results on interviewer characteristics), nor with full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L|C}(x; \beta_{L|C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the U.S. Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure survey, the U.S. Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers' expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year's worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.¹
The CEQ is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5% (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80% of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.
1 In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: "We'd like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?" Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a "don't know" response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members' propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
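As a rough illustration of the point-estimation step only, the following sketch fits a weighted logistic regression by Newton-Raphson in plain Python. It is not the SAS procedure used for the paper; the function names, the toy data, and the single binary predictor are all hypothetical and were chosen so the fitted propensities can be checked by hand (with one binary predictor, the fitted propensities equal the weighted consent rates in each group).

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weighted_logit(X, y, w, n_iter=25):
    """Weighted (pseudo-)maximum-likelihood logistic regression via Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(X, y, w):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += wi * (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def consent_propensity(beta, x):
    """Estimated probability of consent for a covariate vector x."""
    eta = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical toy data: columns are (intercept, binary predictor); y = 1 means consent.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [1, 0, 1, 0]
w = [30.0, 10.0, 20.0, 20.0]  # illustrative design weights
beta = fit_weighted_logit(X, y, w)
```

The design weights enter only through the weighted score equations; design-based standard errors would still need a replication or linearization method.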
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS SURVEYLOGISTIC procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
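The replication variance step can be sketched with the standard textbook form of Fay's BRR estimator, in which the sum of squared deviations of the replicate estimates from the full-sample estimate is divided by R(1 - K)^2. The function and argument names below are hypothetical, and the sketch assumes the replicate estimates have already been computed by re-running the estimator with each replicate weight.

```python
import math


def fay_brr_variance(theta_full, theta_reps, fay_k=0.5):
    """Fay's BRR variance: sum((theta_r - theta)^2) / (R * (1 - K)^2)."""
    r = len(theta_reps)
    squared_devs = sum((t - theta_full) ** 2 for t in theta_reps)
    return squared_devs / (r * (1.0 - fay_k) ** 2)


def fay_brr_se(theta_full, theta_reps, fay_k=0.5):
    """Standard error as the square root of the Fay BRR variance."""
    return math.sqrt(fay_brr_variance(theta_full, theta_reps, fay_k))
```

In the setting described above, `theta_reps` would hold 44 replicate estimates and `fay_k` would be 0.5, so the divisor is 44 x 0.25 = 11.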
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as "truth" and compare estimates from the full sample to those derived from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180 versus 17.4% of consenters) but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older versus 19.9% for consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1 except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                                   All respondents     Consenters          Non-consenters
                                            (n = 4,893)         (n = 3,951)         (n = 942)
Privacy/anti-government concerns            6.9 (0.8)           4.2 (0.6)           18.1 (2.2)
Reluctance
  Converted refusal                         10.8 (0.7)          9.6 (0.8)           16.0 (1.5)
  Income imputation required                44.7 (1.0)          41.8 (1.1)          57.0 (2.1)
  Too busy                                  8.1 (0.5)           6.9 (0.6)           12.7 (1.3)
  Respondent effort
    A lot of effort                         34.9 (0.9)          36.7 (1.0)          27.3 (1.9)
    Moderate effort                         37.2 (1.1)          37.4 (1.3)          36.4 (1.8)
    Bare minimum effort                     27.9 (1.1)          25.9 (1.2)          36.3 (2.3)
  Respondent cooperation
    Very cooperative                        50.8 (1.0)          53.5 (1.0)          39.4 (2.6)
    Somewhat cooperative                    33.1 (0.8)          33.1 (0.9)          33.1 (1.3)
    Neither cooperative nor uncooperative   5.4 (0.5)           4.5 (0.4)           9.2 (1.1)
    Somewhat uncooperative                  4.5 (0.3)           3.3 (0.3)           9.5 (1.1)
    Very uncooperative                      6.3 (0.4)           5.7 (0.5)           8.8 (0.9)
Rapport
  Face-to-face                              65.0 (1.4)          66.2 (1.5)          59.9 (1.9)
  Phone                                     35.0 (1.4)          33.8 (1.5)          40.1 (1.9)
Burden
  Total interview time (minutes)            64.9 (1.0)          65.3 (1.1)          63.3 (1.4)
  Household size                            2.42 (0.03)         2.42 (0.03)         2.42 (0.07)
  Total expenditures                        $11,454 ($148)      $11,511 ($145)      $11,218 ($384)
  Family income
    LT $8,180                               19.8 (0.8)          17.4 (0.9)          30.8 (1.9)
    $8,180–$24,000                          20.2 (0.6)          20.8 (0.9)          16.8 (1.4)
    $24,001–$46,000                         20.8 (0.6)          20.5 (0.7)          18.4 (1.3)
    $46,001–$85,855                         19.8 (0.6)          20.7 (0.7)          16.7 (1.3)
    GT $85,585                              19.4 (0.6)          20.6 (0.9)          17.3 (1.8)
  Owner                                     65.4 (0.8)          64.0 (0.9)          71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).

Table 2. Characteristics of Sample Respondents

Characteristic                              All respondents     Consenters          Non-consenters
                                            (n = 4,893)         (n = 3,951)         (n = 942)
Age group
  18–32                                     20.0 (0.9)          21.5 (1.1)          13.8 (1.0)
  32–65                                     59.2 (0.9)          58.6 (1.2)          61.6 (1.5)
  65+                                       20.8 (0.7)          19.9 (0.8)          24.6 (1.5)
Gender
  Male                                      46.3 (0.7)          46.8 (0.8)          44.3 (1.7)
  Female                                    53.7 (0.7)          53.2 (0.8)          55.7 (1.7)
Race
  White                                     83.1 (0.9)          83.0 (1.0)          83.1 (1.1)
  Non-white                                 16.9 (0.9)          17.0 (1.0)          16.9 (1.1)
Education group
  Less than HS                              13.5 (0.6)          13.4 (0.7)          14.0 (1.6)
  HS graduate                               24.7 (0.8)          24.5 (0.9)          25.3 (1.8)
  Some college                              21.4 (0.7)          21.2 (0.9)          22.2 (1.9)
  Associate's degree                        10.2 (0.4)          10.0 (0.5)          10.9 (1.1)
  Bachelor's degree                         19.1 (0.5)          19.4 (0.6)          17.9 (1.4)
  Advanced degree                           11.1 (0.5)          11.5 (0.6)          9.7 (0.9)
Spanish language interview
  Yes                                       4.1 (0.5)           3.7 (0.5)           5.4 (1.5)
  No                                        95.9 (0.5)          96.3 (0.5)          94.6 (1.5)
Region
  Northeast                                 18.5 (0.6)          17.6 (0.7)          22.2 (1.7)
  Midwest                                   23.5 (1.3)          24.5 (1.4)          19.2 (2.3)
  South                                     35.0 (1.3)          35.9 (1.5)          31.5 (2.7)
  West                                      23.0 (1.5)          22.0 (1.7)          27.1 (1.8)
Urbanicity
  Urban                                     80.5 (0.9)          80.5 (1.2)          80.7 (2.0)
  Rural                                     19.5 (0.9)          19.5 (1.2)          19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).

Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Variable                                    Model 1                 Model 2
                                            Estimate    SE          Estimate    SE
Demographic characteristics
  Age group (32–65)
    18–32                                   0.3435      0.0633      0.3389      0.0634
    65+                                     0.2692      0.0645      0.2532      0.0656
  Gender (Male)
    Female                                  0.0161      0.0426      0.0229      0.0421
  Race (White)
    Non-white                               0.0949      0.1264      0.0612      0.1260
  Spanish interview (No)
    Yes                                     0.4234      0.2923      0.4452      0.2790
  Education group (HS grad)
    Less than HS                            0.3097      0.1879      0.2926      0.1835
    Some college                            0.4016      0.1423      0.3739      0.1389
    Associate's degree                      0.3321      0.2041      0.3150      0.1993
    Bachelor's degree                       0.0654      0.2112      0.0663      0.2082
    Advanced degree                         0.1747      0.2762      0.1512      0.2773
  Home owner (Renter)
    Owner                                   0.2767†     0.1453      0.3233      0.1436
  Total expenditures (Log)                  0.0605      0.0799      0.0083      0.0800
  Income group
    Less than $8,181                        0.2155      0.0604
  Income imputed (No)
    Yes                                     0.1487      0.0513
  Race × Gender                             0.1874†     0.0956      0.1953      0.0923
  Owner × Education
    Less than HS                            0.4931      0.2066      0.4632      0.1996
    Some college                            0.6575      0.1567      0.6370      0.1613
    Associate's degree                      0.3407†     0.2013      0.3402†     0.1986
    Bachelor's degree                       0.1385      0.2402      0.1398      0.2368
    Advanced degree                         0.4713      0.2847      0.4226      0.2870
Environmental features
  Region (Northeast)
    Midwest                                 0.2097†     0.1234      0.2111†     0.1163
    South                                   0.2451      0.1190      0.2408      0.1187
    West                                    0.2670      0.1055      0.2557      0.1066
  Urbanicity (Rural)
    Urban                                   0.0268      0.0829      0.0234      0.0824
R attitude proxies
  Converted refusal (No)
    Yes                                     0.0721      0.0756      0.0838      0.0763
  Effort (Moderate)
    A lot of effort                         0.5454      0.2081      0.6142      0.2065
    Bare minimum effort                     0.4879      0.1365      0.5721      0.1371
  Doorstep concerns (None)
    Too busy                                0.2207      0.1485      0.2137      0.1466
    Privacy/gov't concerns                  1.1895      0.1667      1.2241      0.1622
    Other                                   1.0547      0.3261      1.0255      0.3255
  Doorstep concerns × effort
    Privacy × a lot of effort               0.5884      0.2789      0.5312†     0.2728
    Privacy × minimum effort                0.3183      0.1918      0.2601      0.1924
    Busy × a lot of effort                  0.0888      0.2736      0.0890      0.2749
    Busy × minimum effort                   0.2238      0.2096      0.2277      0.2095
    Other × a lot of effort                 1.0794†     0.6224      1.0477      0.6182
    Other × minimum effort                  0.9002      0.3610      0.8777      0.3593

NOTE: †p < 0.10, p < 0.05, p < 0.01, p < 0.001.

4.2 Analysis of Economic Variables Subject to "Informed Consent" Access

We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:

FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last-visit non-response weight, subsampling weight, and non-response weight) (Full Sample).

CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).

CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted).²

CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / \hat{p}_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $\hat{p}_{\mathrm{decile}}$ is the unweighted propensity of "consent" units in the specified decile group).

CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / \hat{p}_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $\hat{p}_{\mathrm{quintile}}$ is the unweighted propensity of "consent" units in the specified quintile group).

2 Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $\hat{p}_i$ for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).

For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
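The five estimators defined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' production code: all function names are hypothetical, propensity classes are formed as equal-count groups over all sample units ranked by estimated propensity, and the class-level propensity is read here as the unweighted mean of the estimated propensities among consenting units in the class. Other readings of the class-level propensity (for example, the observed consent rate in the class) would change only `class_propensities`.

```python
def weighted_mean(y, w):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def class_propensities(p_hat, consent, n_classes):
    """Give each unit the mean estimated propensity of consenters in its propensity class."""
    n = len(p_hat)
    order = sorted(range(n), key=lambda i: p_hat[i])
    class_of = [0] * n
    for rank, i in enumerate(order):
        class_of[i] = rank * n_classes // n  # equal-count classes by rank
    p_class = [None] * n  # stays None only for classes with no consenters
    for g in range(n_classes):
        members = [i for i in range(n) if class_of[i] == g]
        scores = [p_hat[i] for i in members if consent[i]]
        if scores:
            for i in members:
                p_class[i] = sum(scores) / len(scores)
    return p_class


def consenter_mean(y, w, divisor, consent):
    """Weighted mean over consenting units with adjusted weights w_i / divisor_i."""
    num = den = 0.0
    for yi, wi, di, ci in zip(y, w, divisor, consent):
        if ci:
            num += (wi / di) * yi
            den += wi / di
    return num / den


def consent_estimators(y, w, p_hat, consent):
    """Return the FS, CUNA, CUPA, CUPDA, and CUPQA estimates of the mean."""
    ones = [1.0] * len(y)
    return {
        "FS": weighted_mean(y, w),
        "CUNA": consenter_mean(y, w, ones, consent),
        "CUPA": consenter_mean(y, w, p_hat, consent),
        "CUPDA": consenter_mean(y, w, class_propensities(p_hat, consent, 10), consent),
        "CUPQA": consenter_mean(y, w, class_propensities(p_hat, consent, 5), consent),
    }
```

Because all four consenter-based estimators are ratio-type weighted means, any constant rescaling of the adjusted weights leaves the estimates unchanged.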
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA-FS and CUPQA-FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA-FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
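One simple way to read the differences "relative to the corresponding standard errors" is to form the ratio of each point difference to its reported standard error. The sketch below does this for the differences and standard errors transcribed from table 4; the dictionary layout and function name are hypothetical, and no reference distribution or p-value is computed, since that choice depends on the replication-based degrees of freedom.

```python
# (difference, SE of difference) pairs transcribed from table 4, columns 7-10.
TABLE4_DIFFS = {
    "Family income before taxes": {
        "CUNA-FS": (1930.00, 580.98), "CUPA-FS": (1178.00, 619.09),
        "CUPDA-FS": (1330.00, 602.64), "CUPQA-FS": (1466.00, 606.76)},
    "Family income before taxes with imputed values": {
        "CUNA-FS": (198.00, 573.46), "CUPA-FS": (21.00, 563.54),
        "CUPDA-FS": (130.00, 557.10), "CUPQA-FS": (175.00, 552.20)},
    "Vehicle purchase cost": {
        "CUNA-FS": (19.55, 10.83), "CUPA-FS": (8.21, 15.34),
        "CUPDA-FS": (8.04, 15.23), "CUPQA-FS": (12.17, 16.13)},
    "Property taxes": {
        "CUNA-FS": (-25.02, 6.08), "CUPA-FS": (-19.41, 6.48),
        "CUPDA-FS": (-19.27, 6.05), "CUPQA-FS": (-17.40, 6.23)},
    "Property value": {
        "CUNA-FS": (-4492.00, 2118.39), "CUPA-FS": (-4288.00, 2165.93),
        "CUPDA-FS": (-4291.00, 2132.40), "CUPQA-FS": (-3586.00, 2063.66)},
    "Rental value": {
        "CUNA-FS": (-26.87, 7.43), "CUPA-FS": (-24.12, 8.18),
        "CUPDA-FS": (-23.61, 8.01), "CUPQA-FS": (-21.85, 8.02)},
}


def difference_ratios(table):
    """Ratio of each point difference to its standard error."""
    return {
        variable: {name: diff / se for name, (diff, se) in columns.items()}
        for variable, columns in table.items()
    }


ratios = difference_ratios(TABLE4_DIFFS)
```

For family income before taxes, for example, the ratio falls from about 3.3 for CUNA-FS to about 1.9 for CUPA-FS, which matches the description of the adjusted difference as reduced and only marginally significant.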
Table 4. Comparison of Three Estimation Methods

Column key: (1) variable; (2) full-sample (FS) mean; (3) consenting units, no adjustment (CUNA) mean; (4) consenting units, propensity-adjusted (CUPA) mean; (5) consenting units, propensity-decile adjusted (CUPDA) mean; (6) consenting units, propensity-quintile adjusted (CUPQA) mean; (7)–(10) point estimate differences CUNA-FS, CUPA-FS, CUPDA-FS, and CUPQA-FS. Standard errors (SE, or SE of the difference for columns 7–10) are shown in parentheses.

Means (SE), columns 2–6
Variable (1)                                      FS (2)                     CUNA (3)                   CUPA (4)                   CUPDA (5)                  CUPQA (6)
Family income before taxes                        $50,939.00 ($1,227.51)     $52,869.00 ($1,535.04)     $52,117.00 ($1,523.85)     $52,269.00 ($1,520.74)     $52,405.00 ($1,514.30)
Family income before taxes with imputed values    $61,207.00 ($1,193.34)     $61,405.00 ($1,510.55)     $61,228.00 ($1,454.39)     $61,337.00 ($1,463.34)     $61,382.00 ($1,466.25)
Vehicle purchase cost                             $599.59 ($33.22)           $619.14 ($37.05)           $607.80 ($36.63)           $607.63 ($36.67)           $611.75 ($37.61)
Property taxes                                    $454.15 ($10.41)           $429.12 ($10.76)           $434.74 ($11.39)           $434.88 ($11.37)           $436.75 ($11.30)
Property value                                    $232,226.00 ($5,055.68)    $227,734.00 ($5,730.38)    $227,938.00 ($5,592.27)    $227,935.00 ($5,532.64)    $228,640.00 ($5,519.90)
Rental value                                      $1,296.52 ($15.58)         $1,269.65 ($17.05)         $1,272.40 ($17.81)         $1,272.91 ($17.86)         $1,274.67 ($17.68)

Point estimate differences (SE of difference), columns 7–10
Variable (1)                                      CUNA-FS (7)                CUPA-FS (8)                CUPDA-FS (9)               CUPQA-FS (10)
Family income before taxes                        $1,930.00 ($580.98)        $1,178.00† ($619.09)       $1,330.00 ($602.64)        $1,466.00 ($606.76)
Family income before taxes with imputed values    $198.00 ($573.46)          $21.00 ($563.54)           $130.00 ($557.10)          $175.00 ($552.20)
Vehicle purchase cost                             $19.55† ($10.83)           $8.21 ($15.34)             $8.04 ($15.23)             $12.17 ($16.13)
Property taxes                                    -$25.02 ($6.08)            -$19.41 ($6.48)            -$19.27 ($6.05)            -$17.40 ($6.23)
Property value                                    -$4,492.00 ($2,118.39)     -$4,288.00† ($2,165.93)    -$4,291.00 ($2,132.40)     -$3,586.00 ($2,063.66)
Rental value                                      -$26.87 ($7.43)            -$24.12 ($8.18)            -$23.61 ($8.01)            -$21.85 ($8.02)

NOTE: †p < 0.10, p < 0.05, p < 0.01 (bold), p < 0.001 (bold).

To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the "objection" subpopulation (horizontal axis) and the "consent" subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the "object" and "consent" subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the values for the "objection" and "consent" subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the "consent" and "object" subpopulations.
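The construction of the 99 plotted points can be sketched with the Python standard library. This is a simplified, unweighted illustration: the quantile estimates behind figure 1 may well be survey-weighted, and the function names and the departure summary are hypothetical additions for illustration.

```python
import statistics


def qq_points(object_scores, consent_scores):
    """Pair the 0.01, ..., 0.99 quantiles of two sets of propensity scores.

    Returns 99 (horizontal, vertical) points: "object" group on the
    horizontal axis, "consent" group on the vertical axis.
    """
    x = statistics.quantiles(object_scores, n=100, method="inclusive")
    y = statistics.quantiles(consent_scores, n=100, method="inclusive")
    return list(zip(x, y))


def max_departure(points):
    """Largest vertical distance of the plotted points from the 45-degree line."""
    return max(abs(v - h) for h, v in points)
```

Points lying on the diagonal (a maximum departure near zero) would correspond to the case of essentially no discriminating power described above.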
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the "Consent" Subpopulation (Vertical Axis) and "Object" Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the "Consent" and "Object" Subpopulations

                    Estimated propensity of consent
Percentile (%)      Consent subpopulation     Object subpopulation
70                  0.84                      0.88
80                  0.86                      0.89
90                  0.88                      0.92
99                  0.93                      0.96

Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.

Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX) separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the "consent" subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
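The quantile comparison described above can be sketched by inverting a weighted empirical distribution function for each group and differencing the results. The function names are hypothetical, the quantile definition (smallest value whose cumulative weight share reaches q) is one common convention and is assumed here, and the pointwise confidence intervals are not shown; they would come from repeating the calculation with each replicate weight.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share is at least q."""
    pairs = sorted(zip(values, weights))
    target = q * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall at q near 1


def quantile_differences(y_full, w_full, y_cons, w_cons_adj, qs):
    """Differences: adjusted-consenter quantile minus full-sample quantile, for each q."""
    return [
        weighted_quantile(y_cons, w_cons_adj, q) - weighted_quantile(y_full, w_full, q)
        for q in qs
    ]


# Grid of quantile levels 0.01, 0.02, ..., 0.99
QUANTILE_GRID = [k / 100 for k in range(1, 100)]
```

Here `w_cons_adj` stands for the propensity-adjusted weights of the consenting units, and `w_full` for the design weights of the full sample.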
Figure 3. Boxplots of Values of Before-Tax Family Income Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.

Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the "Consent" Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.

5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require that sampled survey units provide informed consent before a survey organization may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple "consent-to-link" question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the "consent" and "refusal" cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), "Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions," Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), "Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data," The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), "Goodness-of-Fit Tests for Logistic Regression Models When Data Are Collected Using a Complex Sampling Design," Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), "Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records," Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), "Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?" Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), "HIPAA Authorization and Survey Nonresponse Bias," Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), "Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation," Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), "Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey," Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf (last accessed December 22, 2016).
Das, M., and M. P. Couper (2014), "Optimizing Opt-Out Consent for Record Linkage," Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), "2011 Research Section Analysis," Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), "Rapport in Conflict Resolution: Accounting for How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts," Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), "Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders," American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), "Respondent Consent to Use Administrative Data," PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), "Surveys on Surveys: Limitations and Potentialities," Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), "Understanding the Decision to Participate in a Survey," Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), "Leverage-Salience Theory of Survey Participation: Description and an Illustration," Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), "Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias," Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), "Record Linkage Research and Informed Consent: Who Consents?" BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), "Patterns of Consent: Evidence from a General Household Survey," Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), "Fay's Method for Variance Estimation," Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), "Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review," British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), "Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population," BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), "Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics," The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), "The Framing of the Record Linkage Consent Question," International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), "Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer," Survey Research Methods, 7, 115–131.
Little, R. (1986), "Survey Nonresponse Adjustments for Estimates of Means," International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), "The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community," American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), "Uses of Administrative Data at the Social Security Administration," Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), "Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study," International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), "Linkages with Data from Social Security Administrative Records in the Health and Retirement Study," Social Security Bulletin, 62, 73–85.
O'Muircheartaigh, C., and P. Campanelli (1999), "A Multilevel Exploration of the Role of Interviewers in Survey Non-Response," Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), "Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP)," Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), "Bowling Alone: America's Declining Social Capital," Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), "Is the Consumer Expenditure Survey Representative by Income?" in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), "Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults," Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), "Linking Survey and Administrative Data: Mechanisms of Consent," Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), "An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany," Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), "Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data," Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), "The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey," Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), "Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data," Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), "Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics," Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), "Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel," International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), "Informed Consent and Survey Response: A Summary of the Empirical Literature," Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), "Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits," Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), "Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census," Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), "An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey," Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), "Consumer Expenditures and Income," in Handbook of Methods, Chapter 16, Washington, DC, USA. Available at http://www.bls.gov/opub/hom/pdf/homch16.pdf (last accessed December 19, 2016).
West, B., F. Kreuter, and U. Jaenichen (2013), "'Interviewer' Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?" Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), "Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research," Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), "Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight," paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), "Propensity Score Adjustments for Covariates in Observational Studies," paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions
A.1 Predictor Variables Examined

Variable: Variable description

Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate's degree; 4 if Bachelor's degree; 5 if more than Bachelor's
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if "very cooperative"; 2 if "somewhat cooperative"; 3 if "neither cooperative nor uncooperative"; 4 if "somewhat uncooperative"; 5 if "very uncooperative"
  Effort: 1 if "a lot of effort"; 2 if "a moderate amount" (reference group); 3 if "a bare minimum"
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet "almost always"; 2 if "most of the time"; 3 if "occasionally"; 4 if "never or almost never"; 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if "other"
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Respondent "doorstep" concerns

Privacy/government concerns:
  Privacy concerns
  Anti-government concerns
Too busy/logistical concerns:
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested
  Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long
Other reluctance:
  Survey is voluntary
  Does not understand/Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other (specify)
No concerns:
  No concerns

A.3 Post-Survey Interviewer Questions
A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)
A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / \hat{p}_i$ for the sample units with consent, where $\hat{p}_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + \mathrm{CV}^2_{wt}$, where $\mathrm{CV}^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + \mathrm{CV}^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
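The weight-heterogeneity summaries described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the variance divisor (n rather than n - 1) and the percentile interpolation rule are assumptions, and the example weights and propensities are made up.

```python
import statistics


def weight_summary(weights):
    """Return 1 + CV^2 and quantile-ratio summaries for a list of weights."""
    mean_w = statistics.fmean(weights)
    # Assumption: population variance (divisor n); the paper does not state the divisor.
    var_w = statistics.pvariance(weights, mu=mean_w)
    cv2 = var_w / mean_w ** 2

    # 99 percentile cut points; the interpolation rule is an assumption.
    cuts = statistics.quantiles(weights, n=100, method="inclusive")

    def q(p):
        """p-th percentile, for p = 1, ..., 99."""
        return cuts[p - 1]

    return {
        "one_plus_cv2": 1.0 + cv2,
        "iqr_over_median": (q(75) - q(25)) / q(50),
        "q75_over_q25": q(75) / q(25),
        "q90_over_q10": q(90) / q(10),
        "q95_over_q05": q(95) / q(5),
    }


def propensity_adjust(weights, propensities):
    """Divide each design weight by the unit's estimated consent propensity."""
    return [w / p for w, p in zip(weights, propensities)]


if __name__ == "__main__":
    design_weights = [1200.0, 1500.0, 900.0, 2000.0, 1100.0]  # made-up values
    consent_propensities = [0.8, 0.7, 0.9, 0.6, 0.85]         # made-up values
    adjusted = propensity_adjust(design_weights, consent_propensities)
    print(weight_summary(design_weights))
    print(weight_summary(adjusted))
```

With the divisor-n variance used here, $1 + \mathrm{CV}^2_{wt}$ equals $n \sum w_i^2 / (\sum w_i)^2$, so it can also be computed directly from the sums of the weights and their squares.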
Table 6. Variability of Weights

Columns:
  (1) Weights from full sample
  (2) Unadjusted weights for sample units with consent
  (3) Propensity-adjusted weights for sample units with consent
  (4) Propensity-decile adjusted weights for sample units with consent
  (5) Propensity-quintile adjusted weights for sample units with consent

Statistic            (1)      (2)      (3)      (4)      (5)
n                    4893     3951     3915     3915     3915
1 + CV^2_wt          1.13     1.13     1.16     1.16     1.15
IQR/q0.5             0.875    0.877    0.858    0.853    0.851
q0.75/q0.25          1.357    1.359    1.448    1.463    1.468
q0.90/q0.10          1.970    1.966    2.148    2.178    2.124
q0.95/q0.05          2.699    2.665    3.028    2.961    2.898
Appendix C. Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that use 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
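For readers who want to see the variance estimator above in computational form, the following Python sketch accumulates the scaled outer products of the replicate deviations. It is an illustration only: the function names are hypothetical, the replicate estimates are assumed to have been computed elsewhere (one coefficient vector per set of replicate weights), and the toy inputs use 4 replicates rather than the 44 used in the paper.

```python
def fay_brr_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix for a coefficient vector.

    full_estimate: coefficients estimated with the full-sample weights.
    replicate_estimates: one coefficient vector per set of replicate weights.
    fay_k: Fay factor K (0.5 in the paper).
    """
    r_count = len(replicate_estimates)
    dim = len(full_estimate)
    scale = 1.0 / (r_count * (1.0 - fay_k) ** 2)
    cov = [[0.0] * dim for _ in range(dim)]
    for rep in replicate_estimates:
        # Deviation of the r-th replicate estimate from the full-sample estimate
        dev = [b_r - b for b_r, b in zip(rep, full_estimate)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += scale * dev[i] * dev[j]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [cov[i][i] ** 0.5 for i in range(len(cov))]


if __name__ == "__main__":
    beta_hat = [1.0, 2.0]                      # made-up full-sample estimate
    beta_reps = [[1.5, 2.0], [0.5, 2.0],
                 [1.5, 2.0], [0.5, 2.0]]       # made-up replicate estimates
    v = fay_brr_variance(beta_hat, beta_reps, fay_k=0.5)
    print(standard_errors(v))
```

The division by $(1-K)^2$ compensates for the fact that Fay replicate weights perturb the design weights by only a fraction of the full half-sample perturbation.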
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$, i.e., with 9 and 35 degrees of freedom.

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table columns: Variable, followed by Estimate and SE for each of Model 1, Model 2, Model A, Model B, Model C, Model D, and Model E. Reference categories are shown in parentheses.
Demographic characteristics: Age group (32–65): 18–32, 65+; Gender (Male): Female; Race (White): Non-white; Spanish interview (No): Yes; Education group (HS grad): Less than HS, Some college, Associate's degree, Bachelor's degree, Advanced degree; Homeowner (Renter): Owner; Total expenditures (Log); Income group: Less than $8,181; Income imputed (No): Yes; Race x gender; Owner x education: Less than HS, Some college, Associate's degree, Bachelor's degree, Advanced degree.
Environmental features: Region (Northeast): Midwest, South, West; Urbanicity (Rural): Urban.
R attitude proxies: Converted refusal (No): Yes; Effort (Moderate): A lot of effort, Bare minimum effort; Doorstep concerns (None): Too busy, Privacy/gov't concerns, Other; Doorstep concerns x effort: Privacy x a lot of effort, Privacy x minimum effort, Busy x a lot of effort, Busy x minimum effort, Other x a lot of effort, Other x minimum effort; Rapport (Phone): Personal visit.]

NOTE: † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
absence of well-developed consent theories and inconsistency in the effects of many of its predictors.
Application of these ideas in the analysis of "consent-to-link" patterns is complicated by two issues. First, the decision of a respondent to provide formal consent to link does not guarantee the successful linkage of that unit to a given external data source. For example, a nominally consenting respondent may fail to provide specific forms of information (e.g., account numbers) required to perform the linkage to the external source; the external source itself may be subject to incomplete-data problems; or there may be other problems with imperfect linkage, as outlined in Herzog et al. (2007). Consequently, it can be useful to consider a decomposition

$$p_L(x; \beta_L) = p_C(x; \beta_C)\, p_{L|C}(x; \beta_{L|C}) \qquad (2.1)$$

where $p_L(x; \beta_L)$ is the probability that a unit with predictor variable $x$ will ultimately have a successful linkage to a given data source; $p_C(x; \beta_C)$ is the probability that this unit will provide nominal informed consent to link; $p_{L|C}(x; \beta_{L|C})$ is the probability of successful linkage, conditional on the unit providing consent; and $\beta_L$, $\beta_C$, and $\beta_{L|C}$ are the parameter vectors for the three respective probability models. Note that, to some degree, the first factor $p_C(x; \beta_C)$ is analogous to the probability of unit response in a standard survey setting, and the second factor $p_{L|C}(x; \beta_{L|C})$ is analogous to the probability of section or item response. In addition, note that the conditional probabilities $p_{L|C}(x; \beta_{L|C})$ may depend on a wide range of factors, including perceived sensitivity of a given set of linkage variables that the respondent may need to provide.
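As a rough illustration of decomposition (2.1), the following Python sketch multiplies a consent propensity by a conditional linkage probability. The logistic form for both factors, the function names, and the coefficient values are assumptions made for illustration only; the decomposition itself requires only the product structure and does not prescribe a functional form for either factor.

```python
import math


def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(x, beta):
    """Inner product of the predictor vector x and a coefficient vector beta."""
    return sum(x_k * b_k for x_k, b_k in zip(x, beta))


def consent_probability(x, beta_c):
    """p_C(x; beta_C): probability of nominal consent (logistic form assumed)."""
    return logistic(linear_predictor(x, beta_c))


def linkage_probability(x, beta_c, beta_l_given_c):
    """p_L = p_C * p_{L|C}, following the product structure of (2.1)."""
    p_c = consent_probability(x, beta_c)
    # Assumption: the conditional linkage model is also logistic in x.
    p_l_given_c = logistic(linear_predictor(x, beta_l_given_c))
    return p_c * p_l_given_c


if __name__ == "__main__":
    x = [1.0, 0.0, 1.0]                 # intercept plus two made-up indicators
    beta_c = [1.2, -0.4, 0.3]           # hypothetical consent coefficients
    beta_l_given_c = [2.0, 0.1, -0.5]   # hypothetical conditional-link coefficients
    print(consent_probability(x, beta_c))
    print(linkage_probability(x, beta_c, beta_l_given_c))
```

Because the conditional factor is at most 1, the overall linkage probability can never exceed the consent probability, which is one way to see why consent rates alone give an optimistic picture of achievable linkage rates.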
2.3 Options for Exploratory Analyses of Consent-to-Link
Ideally, one would explore informed-consent issues by estimating all of the parameters of model (2.1) and by evaluating potential non-consent-based biases for estimators of a large number of population parameters of interest. However, in-depth empirical work with consent for record linkage imposes a substantial burden on field data collection. In addition, large-scale implementation of record linkage incurs substantial costs related to production systems and "data cleaning" for the variables on which we intend to link, and also incurs a risk of disruption of the ongoing survey production process. Consequently, it is important to identify alternative approaches that allow initial exploration of some aspects of model (2.1) with considerably lower costs and risks. For example, one could consider the following sequence of exploratory options:
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with "real world" production conditions (per the preceding literature results on interviewer characteristics), nor with full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L \mid C}(x, \beta_{L \mid C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the U.S. Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure survey, the U.S. Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers' expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year's worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.¹
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5% (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80% of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.
¹ In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: "We'd like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?" Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a "don't know" response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members' propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
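As a rough illustration of the point-estimation step only, the following sketch fits a design-weighted logistic regression by Newton-Raphson in plain Python. It is not the SAS procedure used in the paper; the function names (`solve_linear`, `fit_weighted_logit`, `consent_propensity`) and the toy data are hypothetical, and no design-based standard errors are computed here.

```python
import math

def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_weighted_logit(x_rows, y, w, n_iter=25):
    """Maximize the weighted log-likelihood of a logistic model with intercept.

    x_rows: list of covariate rows, y: 0/1 consent indicator,
    w: positive survey weights (hypothetical stand-ins for design weights).
    """
    rows = [[1.0] + list(r) for r in x_rows]   # prepend the intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(rows, y, w):
            eta = sum(b * x for b, x in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += wi * (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        step = solve_linear(hess, grad)          # Newton step
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def consent_propensity(beta, x_row):
    """Estimated P(consent) for one unit with covariates x_row."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], x_row))
    return 1.0 / (1.0 + math.exp(-eta))

# Toy data: one binary covariate, four weighted observations.
beta_hat = fit_weighted_logit([[0.0], [0.0], [1.0], [1.0]],
                              [1, 0, 1, 0], [3.0, 1.0, 9.0, 1.0])
print(beta_hat, consent_propensity(beta_hat, [1.0]))
```

With a single binary covariate the model is saturated, so the fitted propensities simply reproduce the weighted consent rates in each covariate group, which gives a quick sanity check on the fit.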
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS SURVEYLOGISTIC procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
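For readers less familiar with replication variance estimation, the sketch below shows the standard Fay-adjusted BRR variance formula, v = sum_r (theta_r - theta)^2 / (R (1 - K)^2), applied to a set of replicate estimates. The function names and the toy replicate values are hypothetical; in the paper's setting R would be 44 and K would be 0.5.

```python
import math

def fay_brr_variance(theta_full, theta_reps, fay_k=0.5):
    """Fay-adjusted BRR variance of an estimate.

    theta_full: estimate computed with the full-sample weights.
    theta_reps: the same estimate recomputed with each replicate weight set.
    fay_k: Fay factor K (0 gives ordinary BRR).
    """
    r = len(theta_reps)
    sum_sq = sum((t - theta_full) ** 2 for t in theta_reps)
    return sum_sq / (r * (1.0 - fay_k) ** 2)

def fay_brr_se(theta_full, theta_reps, fay_k=0.5):
    """Standard error as the square root of the Fay-BRR variance."""
    return math.sqrt(fay_brr_variance(theta_full, theta_reps, fay_k))

# Toy example with four hypothetical replicate estimates.
print(fay_brr_se(10.0, [9.0, 11.0, 10.0, 10.0]))
```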
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as "truth" and compare estimates from the full sample to those derived from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work, we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180 versus 17.4% of consenters), but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older versus 19.9% for consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1, except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

| Indicator | All respondents (n = 4,893) | Consenters (n = 3,951) | Non-consenters (n = 942) |
|---|---|---|---|
| Privacy/anti-government concerns | 6.9 (0.8) | 4.2 (0.6) | 18.1 (2.2) |
| Reluctance | | | |
| Converted refusal | 10.8 (0.7) | 9.6 (0.8) | 16.0 (1.5) |
| Income imputation required | 44.7 (1.0) | 41.8 (1.1) | 57.0 (2.1) |
| Too busy | 8.1 (0.5) | 6.9 (0.6) | 12.7 (1.3) |
| Respondent effort: a lot of effort | 34.9 (0.9) | 36.7 (1.0) | 27.3 (1.9) |
| Respondent effort: moderate effort | 37.2 (1.1) | 37.4 (1.3) | 36.4 (1.8) |
| Respondent effort: bare minimum effort | 27.9 (1.1) | 25.9 (1.2) | 36.3 (2.3) |
| Respondent cooperation: very cooperative | 50.8 (1.0) | 53.5 (1.0) | 39.4 (2.6) |
| Respondent cooperation: somewhat cooperative | 33.1 (0.8) | 33.1 (0.9) | 33.1 (1.3) |
| Respondent cooperation: neither cooperative nor uncooperative | 5.4 (0.5) | 4.5 (0.4) | 9.2 (1.1) |
| Respondent cooperation: somewhat uncooperative | 4.5 (0.3) | 3.3 (0.3) | 9.5 (1.1) |
| Respondent cooperation: very uncooperative | 6.3 (0.4) | 5.7 (0.5) | 8.8 (0.9) |
| Rapport | | | |
| Face-to-face | 65.0 (1.4) | 66.2 (1.5) | 59.9 (1.9) |
| Phone | 35.0 (1.4) | 33.8 (1.5) | 40.1 (1.9) |
| Burden | | | |
| Total interview time (minutes) | 64.9 (1.0) | 65.3 (1.1) | 63.3 (1.4) |
| Household size | 2.42 (0.03) | 2.42 (0.03) | 2.42 (0.07) |
| Total expenditures | $11,454 ($148) | $11,511 ($145) | $11,218 ($384) |
| Family income: LT $8,180 | 19.8 (0.8) | 17.4 (0.9) | 30.8 (1.9) |
| Family income: $8,180–$24,000 | 20.2 (0.6) | 20.8 (0.9) | 16.8 (1.4) |
| Family income: $24,001–$46,000 | 20.8 (0.6) | 20.5 (0.7) | 18.4 (1.3) |
| Family income: $46,001–$85,855 | 19.8 (0.6) | 20.7 (0.7) | 16.7 (1.3) |
| Family income: GT $85,585 | 19.4 (0.6) | 20.6 (0.9) | 17.3 (1.8) |
| Owner | 65.4 (0.8) | 64.0 (0.9) | 71.3 (1.6) |

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).

Table 2. Characteristics of Sample Respondents

| Characteristic | All respondents (n = 4,893) | Consenters (n = 3,951) | Non-consenters (n = 942) |
|---|---|---|---|
| Age group | | | |
| 18–32 | 20.0 (0.9) | 21.5 (1.1) | 13.8 (1.0) |
| 32–65 | 59.2 (0.9) | 58.6 (1.2) | 61.6 (1.5) |
| 65+ | 20.8 (0.7) | 19.9 (0.8) | 24.6 (1.5) |
| Gender | | | |
| Male | 46.3 (0.7) | 46.8 (0.8) | 44.3 (1.7) |
| Female | 53.7 (0.7) | 53.2 (0.8) | 55.7 (1.7) |
| Race | | | |
| White | 83.1 (0.9) | 83.0 (1.0) | 83.1 (1.1) |
| Non-white | 16.9 (0.9) | 17.0 (1.0) | 16.9 (1.1) |
| Education group | | | |
| Less than HS | 13.5 (0.6) | 13.4 (0.7) | 14.0 (1.6) |
| HS graduate | 24.7 (0.8) | 24.5 (0.9) | 25.3 (1.8) |
| Some college | 21.4 (0.7) | 21.2 (0.9) | 22.2 (1.9) |
| Associate's degree | 10.2 (0.4) | 10.0 (0.5) | 10.9 (1.1) |
| Bachelor's degree | 19.1 (0.5) | 19.4 (0.6) | 17.9 (1.4) |
| Advanced degree | 11.1 (0.5) | 11.5 (0.6) | 9.7 (0.9) |
| Spanish language interview | | | |
| Yes | 4.1 (0.5) | 3.7 (0.5) | 5.4 (1.5) |
| No | 95.9 (0.5) | 96.3 (0.5) | 94.6 (1.5) |
| Region | | | |
| Northeast | 18.5 (0.6) | 17.6 (0.7) | 22.2 (1.7) |
| Midwest | 23.5 (1.3) | 24.5 (1.4) | 19.2 (2.3) |
| South | 35.0 (1.3) | 35.9 (1.5) | 31.5 (2.7) |
| West | 23.0 (1.5) | 22.0 (1.7) | 27.1 (1.8) |
| Urbanicity | | | |
| Urban | 80.5 (0.9) | 80.5 (1.2) | 80.7 (2.0) |
| Rural | 19.5 (0.9) | 19.5 (1.2) | 19.3 (2.0) |

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).

Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

| Variable (reference category) | Model 1 Estimate | Model 1 SE | Model 2 Estimate | Model 2 SE |
|---|---|---|---|---|
| Demographic characteristics | | | | |
| Age group (32–65): 18–32 | 0.3435 | 0.0633 | 0.3389 | 0.0634 |
| Age group (32–65): 65+ | 0.2692 | 0.0645 | 0.2532 | 0.0656 |
| Gender (Male): Female | 0.0161 | 0.0426 | 0.0229 | 0.0421 |
| Race (White): Non-white | 0.0949 | 0.1264 | 0.0612 | 0.1260 |
| Spanish interview (No): Yes | 0.4234 | 0.2923 | 0.4452 | 0.2790 |
| Education group (HS grad): Less than HS | 0.3097 | 0.1879 | 0.2926 | 0.1835 |
| Education group (HS grad): Some college | 0.4016 | 0.1423 | 0.3739 | 0.1389 |
| Education group (HS grad): Associate's degree | 0.3321 | 0.2041 | 0.3150 | 0.1993 |
| Education group (HS grad): Bachelor's degree | 0.0654 | 0.2112 | 0.0663 | 0.2082 |
| Education group (HS grad): Advanced degree | 0.1747 | 0.2762 | 0.1512 | 0.2773 |
| Home owner (Renter): Owner | 0.2767† | 0.1453 | 0.3233 | 0.1436 |
| Total expenditures (Log) | 0.0605 | 0.0799 | 0.0083 | 0.0800 |
| Income group: Less than $8,181 | 0.2155 | 0.0604 | | |
| Income imputed (No): Yes | 0.1487 | 0.0513 | | |
| Race × Gender | 0.1874† | 0.0956 | 0.1953 | 0.0923 |
| Owner × Education: Less than HS | 0.4931 | 0.2066 | 0.4632 | 0.1996 |
| Owner × Education: Some college | 0.6575 | 0.1567 | 0.6370 | 0.1613 |
| Owner × Education: Associate's degree | 0.3407† | 0.2013 | 0.3402† | 0.1986 |
| Owner × Education: Bachelor's degree | 0.1385 | 0.2402 | 0.1398 | 0.2368 |
| Owner × Education: Advanced degree | 0.4713 | 0.2847 | 0.4226 | 0.2870 |
| Environmental features | | | | |
| Region (Northeast): Midwest | 0.2097† | 0.1234 | 0.2111† | 0.1163 |
| Region (Northeast): South | 0.2451 | 0.1190 | 0.2408 | 0.1187 |
| Region (Northeast): West | 0.2670 | 0.1055 | 0.2557 | 0.1066 |
| Urbanicity (Rural): Urban | 0.0268 | 0.0829 | 0.0234 | 0.0824 |
| R attitude proxies | | | | |
| Converted refusal (No): Yes | 0.0721 | 0.0756 | 0.0838 | 0.0763 |
| Effort (Moderate): A lot of effort | 0.5454 | 0.2081 | 0.6142 | 0.2065 |
| Effort (Moderate): Bare minimum effort | 0.4879 | 0.1365 | 0.5721 | 0.1371 |
| Doorstep concerns (None): Too busy | 0.2207 | 0.1485 | 0.2137 | 0.1466 |
| Doorstep concerns (None): Privacy/gov't concerns | 1.1895 | 0.1667 | 1.2241 | 0.1622 |
| Doorstep concerns (None): Other | 1.0547 | 0.3261 | 1.0255 | 0.3255 |
| Doorstep concerns × effort: Privacy × a lot of effort | 0.5884 | 0.2789 | 0.5312† | 0.2728 |
| Doorstep concerns × effort: Privacy × minimum effort | 0.3183 | 0.1918 | 0.2601 | 0.1924 |
| Doorstep concerns × effort: Busy × a lot of effort | 0.0888 | 0.2736 | 0.0890 | 0.2749 |
| Doorstep concerns × effort: Busy × minimum effort | 0.2238 | 0.2096 | 0.2277 | 0.2095 |
| Doorstep concerns × effort: Other × a lot of effort | 1.0794† | 0.6224 | 1.0477 | 0.6182 |
| Doorstep concerns × effort: Other × minimum effort | 0.9002 | 0.3610 | 0.8777 | 0.3593 |

NOTE: † p < 0.10; p < 0.05; p < 0.01; p < 0.001.

4.2 Analysis of Economic Variables Subject to "Informed Consent" Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units, and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted).²
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / \hat{p}_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $\hat{p}_{\mathrm{decile}}$ is the unweighted propensity of "consent" units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / \hat{p}_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $\hat{p}_{\mathrm{quintile}}$ is the unweighted propensity of "consent" units in the specified quintile group).
² Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $\hat{p}_i$ for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.
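The estimators defined in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' production code: the function names are hypothetical, propensity classes are formed by sorting all units on the estimated propensity and cutting them into equal-size groups, and the class-level propensity is read as the unweighted mean of the estimated propensities of the consenting units in each class (one plausible reading of the definition above). Setting `n_classes=10` corresponds to the decile version and `n_classes=5` to the quintile version.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def class_propensities(p, consent, n_classes):
    """Assign each unit the class-level propensity of its propensity class."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])
    cls = [0] * n
    for rank, i in enumerate(order):
        cls[i] = rank * n_classes // n        # equal-size classes by rank
    p_class = {}
    for c in set(cls):
        members = [p[i] for i in range(n) if cls[i] == c and consent[i]]
        if members:                            # classes with no consenters are skipped
            p_class[c] = sum(members) / len(members)
    return [p_class.get(cls[i]) for i in range(n)]

def consent_estimators(y, w, consent, p, n_classes=10):
    """Return FS, CUNA, CUPA and class-adjusted means of y.

    y: outcome values, w: design weights, consent: 0/1 indicators,
    p: estimated consent propensities for all units.
    """
    idx = [i for i in range(len(y)) if consent[i]]
    y_c = [y[i] for i in idx]
    p_cls = class_propensities(p, consent, n_classes)
    return {
        "FS": weighted_mean(y, w),
        "CUNA": weighted_mean(y_c, [w[i] for i in idx]),
        "CUPA": weighted_mean(y_c, [w[i] / p[i] for i in idx]),
        "CLASS": weighted_mean(y_c, [w[i] / p_cls[i] for i in idx]),
    }

# Toy data (hypothetical): four units, one of which objects to linkage.
print(consent_estimators([10, 20, 30, 40], [1, 1, 1, 1],
                         [1, 0, 1, 1], [0.5, 0.5, 1.0, 1.0], n_classes=2))
```

In the toy data the unit that objects shares a low propensity with one consenting unit, so the propensity-adjusted versions up-weight that consenter and pull the estimate toward the lower outcome values.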
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
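To make the comparison of columns 7 and 8 concrete, the sketch below converts a difference and its standard error into a z-ratio and a two-sided normal-approximation p-value. The inputs are the family-income-before-taxes entries as read from table 4 (CUNA–FS: $1,930.00, SE $580.98; CUPA–FS: $1,178.00, SE $619.09). The normal reference distribution is an assumption made here for illustration; the paper's reported significance levels may rest on a different reference distribution tied to the BRR degrees of freedom.

```python
import math

def z_and_p(diff, se_diff):
    """Return the z-ratio and a two-sided normal-approximation p-value."""
    z = diff / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Family income before taxes, differences relative to the full sample.
z_cuna, p_cuna = z_and_p(1930.00, 580.98)   # unadjusted consenters
z_cupa, p_cupa = z_and_p(1178.00, 619.09)   # propensity-adjusted consenters
print(round(z_cuna, 2), p_cuna)
print(round(z_cupa, 2), p_cupa)
```

Under this approximation the unadjusted difference is clearly significant, while the propensity-adjusted difference falls between the 0.05 and 0.10 levels, in line with the "marginally significant" description in the text.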
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
Table 4. Comparison of Three Estimation Methods

| Variable (column 1) | Full-sample (FS) mean (SE) (2) | Consenting units, no adjustment (CUNA) mean (SE) (3) | Consenting units, propensity-adjusted (CUPA) mean (SE) (4) | Consenting units, propensity-decile adjusted (CUPDA) mean (SE) (5) | Consenting units, propensity-quintile adjusted (CUPQA) mean (SE) (6) | CUNA–FS (SE diff) (7) | CUPA–FS (SE diff) (8) | CUPDA–FS (SE diff) (9) | CUPQA–FS (SE diff) (10) |
|---|---|---|---|---|---|---|---|---|---|
| Family income before taxes | $50,939.00 ($1,227.51) | $52,869.00 ($1,535.04) | $52,117.00 ($1,523.85) | $52,269.00 ($1,520.74) | $52,405.00 ($1,514.30) | $1,930.00 ($580.98) | $1,178.00† ($619.09) | $1,330.00 ($602.64) | $1,466.00 ($606.76) |
| Family income before taxes with imputed values | $61,207.00 ($1,193.34) | $61,405.00 ($1,510.55) | $61,228.00 ($1,454.39) | $61,337.00 ($1,463.34) | $61,382.00 ($1,466.25) | $198.00 ($573.46) | $21.00 ($563.54) | $130.00 ($557.10) | $175.00 ($552.20) |
| Vehicle purchase cost | $599.59 ($33.22) | $619.14 ($37.05) | $607.80 ($36.63) | $607.63 ($36.67) | $611.75 ($37.61) | $19.55† ($10.83) | $8.21 ($15.34) | $8.04 ($15.23) | $12.17 ($16.13) |
| Property taxes | $454.15 ($10.41) | $429.12 ($10.76) | $434.74 ($11.39) | $434.88 ($11.37) | $436.75 ($11.30) | −$25.02 ($6.08) | −$19.41 ($6.48) | −$19.27 ($6.05) | −$17.40 ($6.23) |
| Property value | $232,226.00 ($5,055.68) | $227,734.00 ($5,730.38) | $227,938.00 ($5,592.27) | $227,935.00 ($5,532.64) | $228,640.00 ($5,519.90) | −$4,492.00 ($2,118.39) | −$4,288.00† ($2,165.93) | −$4,291.00 ($2,132.40) | −$3,586.00 ($2,063.66) |
| Rental value | $1,296.52 ($15.58) | $1,269.65 ($17.05) | $1,272.40 ($17.81) | $1,272.91 ($17.86) | $1,274.67 ($17.68) | −$26.87 ($7.43) | −$24.12 ($8.18) | −$23.61 ($8.01) | −$21.85 ($8.02) |

NOTE: Columns 7–10 are point estimate differences. † p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).

To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the "objection" subpopulation (horizontal axis) and the "consent" subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the "object" and "consent" subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the values for the "objection" and "consent" subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the "consent" and "object" subpopulations.
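The quantile-quantile comparison described above can be sketched as follows. This is an illustrative reading rather than the authors' code: the helper names are hypothetical, quantiles are taken from the weighted empirical distribution function (the smallest value whose cumulative weight share reaches q), and the use of survey weights in this step is an assumption.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share is at least q."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]

def qq_points(p_object, w_object, p_consent, w_consent):
    """Pairs (objector quantile, consenter quantile) for q = 0.01, ..., 0.99.

    The first coordinate goes on the horizontal axis and the second on the
    vertical axis, matching the layout described for figure 1.
    """
    grid = [k / 100 for k in range(1, 100)]
    return [
        (weighted_quantile(p_object, w_object, q),
         weighted_quantile(p_consent, w_consent, q))
        for q in grid
    ]

# Toy propensity scores (hypothetical) for two small groups.
points = qq_points([0.2, 0.4], [1, 1], [0.6, 0.8], [1, 1])
print(len(points), points[0], points[-1])
```

Points lying close to the 45-degree line indicate similar propensity distributions in the two groups, which is the "little discriminating power" case discussed in the text.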
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the "Consent" Subpopulation (Vertical Axis) and "Object" Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the "consent" subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.

Table 5. Selected Percentiles of the Estimated Propensity to Consent for the "Consent" and "Object" Subpopulations

| Percentile (%) | Estimated propensity of consent: Consent subpopulation | Estimated propensity of consent: Object subpopulation |
|---|---|---|
| 70 | 0.84 | 0.88 |
| 80 | 0.86 | 0.89 |
| 90 | 0.88 | 0.92 |
| 99 | 0.93 | 0.96 |
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require that sampled survey units provide informed consent before a survey organization may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
To address this problem, the current paper has considered an approach based on inclusion of a simple "consent-to-link" question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the "Consent" Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
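One way to write the joint propensity mentioned above is the factorization below. The notation is introduced here only for illustration and is not taken from the paper: $C_i$ indicates consent to linkage, $R_i$ indicates response to the item set of interest, and $x_i$ denotes observed covariates for unit $i$.

```latex
\[
P(C_i = 1,\, R_i = 1 \mid x_i)
  \;=\; P(C_i = 1 \mid x_i)\; P(R_i = 1 \mid C_i = 1,\, x_i)
\]
```

If consent and item response were conditionally independent given $x_i$, the second factor would reduce to $P(R_i = 1 \mid x_i)$. The shared-attribute concern raised in the text (for example, lack of trust driving both outcomes) is a reason to doubt that simplification.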
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of the propensity to consent; the propensity to obtain a successful link, conditional on consent; and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?,” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?,” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?,” Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?,” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A.1 Predictor Variables Examined
Each entry below gives the variable name followed by its description.
Respondent sociodemographics
Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender: 1 if male (reference group); 2 if female
Race: 0 if white (reference group); 1 if non-white
Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
Language: 1 if interview language was Spanish; 0 if English (reference group)
Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type: 0 if renter (reference group); 1 if owner
Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
Total spending (log): Log of total expenditures reported for all major expenditure categories
Survey features
Mode: 1 if telephone; 0 if personal visit (reference group)
Area characteristics
Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status: 1 if urban; 2 if rural (reference group)
Respondent attitude and effort
Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category
Each entry below gives the assigned category followed by the respondent “doorstep” concerns recorded under it.
Privacy/government concerns: Privacy concerns; Anti-government concerns
Too busy/logistical concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested; Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long
Other reluctance: Survey is voluntary; Does not understand; Asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other (specify)
No concerns: No concerns
A.3 Post-Survey Interviewer Questions
A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)
A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / \hat{p}_i$ for the sample units with consent, where $\hat{p}_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV_{wt}^2$, where $CV_{wt}^2$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV_{wt}^2$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights
Columns: (1) weights from full sample; (2) unadjusted weights for sample units with consent; (3) propensity-adjusted weights for sample units with consent; (4) propensity-decile adjusted weights for sample units with consent; (5) propensity-quintile adjusted weights for sample units with consent.
Statistic            (1)      (2)      (3)      (4)      (5)
n                    4893     3951     3915     3915     3915
1 + CV_wt^2          1.13     1.13     1.16     1.16     1.15
IQR / q_0.5          0.875    0.877    0.858    0.853    0.851
q_0.75 / q_0.25      1.357    1.359    1.448    1.463    1.468
q_0.90 / q_0.10      1.970    1.966    2.148    2.178    2.124
q_0.95 / q_0.05      2.699    2.665    3.028    2.961    2.898
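The summary measures reported in table 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the quantile rule (linear interpolation between order statistics) is one of several common definitions and may differ from the one used for the published table, and $1 + CV_{wt}^2$ is computed with the population form of the variance, which equals $n \sum w_i^2 / (\sum w_i)^2$.

```python
def kish_factor(weights):
    """Return 1 + CV^2 of the weights, i.e. n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed rule)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def weight_summary(weights):
    """Heterogeneity summaries analogous to the rows of table 6."""
    def q(p):
        return quantile(weights, p)

    return {
        "n": len(weights),
        "1+CV2": kish_factor(weights),
        "IQR/q0.5": (q(0.75) - q(0.25)) / q(0.5),
        "q0.75/q0.25": q(0.75) / q(0.25),
        "q0.90/q0.10": q(0.90) / q(0.10),
        "q0.95/q0.05": q(0.95) / q(0.05),
    }
```

Applying `weight_summary` once to the design weights and once to the propensity-adjusted weights of the consenting units gives a side-by-side comparison of the kind shown in the table.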
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication using 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator
\[
\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'
\]
where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
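The replication variance estimator above can be written out directly. The following Python sketch is illustrative only: the function name and the list-of-lists representation are assumptions, and it presumes that the full-sample estimate and the replicate estimates have already been computed with the appropriate full-sample and replicate weights.

```python
def fay_brr_covariance(beta_full, beta_replicates, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    beta_full       : full-sample estimate, a list of p coefficients
    beta_replicates : list of R replicate estimates, each a list of p coefficients
    fay_k           : Fay factor K (0.5 in the paper)
    """
    n_reps = len(beta_replicates)
    p = len(beta_full)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)

    cov = [[0.0] * p for _ in range(p)]
    for beta_r in beta_replicates:
        # Deviation of the r-th replicate estimate from the full-sample estimate.
        dev = [beta_r[j] - beta_full[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * dev[a] * dev[b]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [cov[j][j] ** 0.5 for j in range(len(cov))]
```

With $R = 44$ and $K = 0.5$, the scale factor is $1 / (44 \times 0.25) = 1/11$, so each squared replicate deviation contributes one eleventh of its value to the variance.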
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{Fadj}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1,\, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
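As a quick check of the reference distribution quoted above, substituting the stated values $f = 43$ and $g = 10$ gives the degrees of freedom below. These agree with the F(9, 35) label attached to the test statistics in table 9.

```latex
\[
g - 1 = 10 - 1 = 9, \qquad f - g + 2 = 43 - 10 + 2 = 35 .
\]
```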
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test
Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Entries are Estimate (SE). Rows with seven entries follow the column order Model 1, Model 2, Model A, Model B, Model C, Model D, Model E. Rows with fewer entries apply only to the subset of models that include the term and are listed in the same left-to-right order.
Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate’s degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor’s degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advance degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate’s degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor’s degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advance degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)
Environmental features
Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)
R attitude proxies
Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov’t concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal Visit: 0.0170 (0.0428); 0.0395 (0.0412)
NOTE: † p < 0.10. The table note also lists the p < 0.05, p < 0.01, and p < 0.001 levels; the markers for those levels, and any minus signs on the estimates, are not preserved in this copy of the table.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison
Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1    0.292 (1.262)
2    0.810 (0.573)
A    0.336 (1.183)
B    0.929 (0.396)
C    0.061 (2.061)
D    0.022 (2.576)
E    0.968 (0.305)
(i) Simple lab studies. This step has the advantages of not disrupting production and relatively low costs. However, results may not align with “real world” production conditions (per the preceding literature results on interviewer characteristics), nor full population coverage (per the comments on respondent demographics and socio-environmental features).
(ii) Addition of simple consent-to-link questions to standard production instruments. This approach may incur a relatively low risk of disruption of production and relatively low incremental costs, and has the advantage of being naturally embedded in production conditions.
(iii) Same as (ii), but with actual linkage to administrative records. This approach incurs some additional cost (to carry out record linkage and related data-management and cleaning processes). In addition, this option may incur some additional respondent burden arising from collection of information required to enhance the probability of a successful link. On the other hand, option (iii) allows assessment of additional linkage-related issues (e.g., cases in which conditional linkage probabilities $p_{L|C}(x, \beta_{L|C})$ are less than 1).
(iv) Full-scale field tests of consent-to-link. This option will incur higher costs and higher risks of disruption of production processes, but will generally be considered necessary before an organization makes a final decision to implement record linkage in production processes. Also, in some cases, interviewer attitudes and behaviors may differ between cases (ii) and (iv).
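To make the role of the conditional linkage probability in option (iii) concrete, the sketch below multiplies a consent propensity by a linkage probability conditional on consent. It is a hypothetical illustration: the logistic form for both components, the function names, and the coefficient vectors are assumptions made here and are not specified in the text.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(x, beta):
    """Inner product of covariates and coefficients."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    return sum(xi * bi for xi, bi in zip(x, beta))


def consent_and_link_probability(x, beta_c, beta_l_given_c):
    """P(consent and successful link | x) = P(consent | x) * P(link | consent, x).

    beta_c         : hypothetical coefficients of the consent model
    beta_l_given_c : hypothetical coefficients of the linkage model given consent
    """
    p_consent = logistic(linear_predictor(x, beta_c))
    p_link_given_consent = logistic(linear_predictor(x, beta_l_given_c))
    return p_consent * p_link_given_consent
```

When the conditional linkage probability is close to 1, the product is driven almost entirely by the consent propensity, which is the situation that option (ii) can assess without carrying out actual linkage.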
Due to the balance of potential costs, risks, and benefits arising from options (i) through (iv), it may be useful to focus initial exploratory attention on options (i) and (ii), and then consider use of options (iii) and (iv) for cases in which the initial results indicate reasonable prospects for successful implementation.
The current paper presents a case study of option (ii) for the Consumer Expenditure Survey, with principal emphasis on evaluation of the extent to which variability in the propensity to consent may lead to biases in unadjusted estimators of some commonly studied economic variables when restricted to consenting sample units, and evaluation of the extent to which simple propensity-based adjustments may reduce those potential biases.
3. DATA AND METHODS
3.1 Possible Linkage of Government Records with Sample Units from the US Consumer Expenditure Survey
In this paper, we extend the survey-based approach to assessing consent propensity and consent bias in the context of a large household expenditure
survey, the US Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers' expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year's worth of expenditure data from sample units through five interviews: the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.1
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5% (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80% of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We
1 In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: "We'd like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?" Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a "don't know" response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members' propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
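As a rough illustration of the kind of weighted logistic fit described here, the following Python sketch maximizes a design-weighted pseudo-likelihood by Newton-Raphson. It is not the SAS procedure used for the reported analyses; the function names, the toy data, and the single binary predictor (a hypothetical privacy-concern flag) are assumptions made only for illustration.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def predict_propensity(beta, xi):
    """Estimated probability of consent for one covariate row."""
    eta = sum(b * v for b, v in zip(beta, xi))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_weighted_logit(X, y, w, n_iter=25):
    """Newton-Raphson for the weighted logistic pseudo-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(X, y, w):
            p = predict_propensity(beta, xi)
            for a in range(k):
                grad[a] += wi * (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Toy data (hypothetical): intercept plus a binary privacy-concern flag.
X = [[1.0, 0.0]] * 3 + [[1.0, 1.0]] * 3
y = [1, 1, 0, 1, 0, 0]                 # 1 = consent, 0 = objection
w = [2.0, 1.0, 1.0, 1.0, 1.0, 2.0]     # design weights
beta = fit_weighted_logit(X, y, w)
p_no_concern = predict_propensity(beta, [1.0, 0.0])
p_concern = predict_propensity(beta, [1.0, 1.0])
```

Because this toy model is saturated, the fitted propensities reproduce the weighted consent shares in each group, which gives a simple check that the weights enter the fit as intended.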
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS surveylogistic procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as "truth" and compare estimates from the full sample to those derived
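For readers less familiar with Fay's variant of BRR, the sketch below shows the standard variance formula, in which the sum of squared deviations of the replicate estimates from the full-sample estimate is divided by R(1 - K)^2 for R replicates. The two-stratum toy design, the four replicates, and all names are assumptions for illustration; the actual CEQ replicate weights are supplied with the survey data and are not reconstructed here.

```python
import math

def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def fay_brr_se(y, full_w, replicate_ws, fay_k=0.5):
    """Full-sample estimate and its Fay-BRR standard error."""
    theta = weighted_mean(y, full_w)
    reps = [weighted_mean(y, rw) for rw in replicate_ws]
    r = len(reps)
    var = sum((t - theta) ** 2 for t in reps) / (r * (1.0 - fay_k) ** 2)
    return theta, math.sqrt(var)

# Toy design (hypothetical): 2 strata with 2 PSUs each, one unit per PSU.
y = [10.0, 20.0, 30.0, 40.0]
full_w = [1.0, 1.0, 1.0, 1.0]
K = 0.5
# Balanced sign pattern for the two strata (two columns of a Hadamard matrix).
signs = [(+1, +1), (-1, +1), (+1, -1), (-1, -1)]
replicate_ws = []
for stratum_signs in signs:
    factors = []
    for s in stratum_signs:
        # The favored PSU gets factor 2 - K, the other gets K.
        factors += [2 - K, K] if s > 0 else [K, 2 - K]
    replicate_ws.append([f * wi for f, wi in zip(factors, full_w)])

theta, se = fay_brr_se(y, full_w, replicate_ws, fay_k=K)
```

In this toy case the Fay-BRR variance coincides with the usual paired-PSU stratified variance of the mean, which is the property that balancing is meant to deliver for linear estimators.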
from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample, and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income
imputation (57.0% versus 41.8%), express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180, versus 17.4% of consenters), but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older, versus 19.9% for consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1 except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
4.2 Analysis of Economic Variables Subject to "Informed Consent" Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

| Indicator | All respondents (n = 4,893) | Consenters (n = 3,951) | Non-consenters (n = 942) |
|---|---|---|---|
| Privacy/anti-government concerns | 6.9 (0.8) | 4.2 (0.6) | 18.1 (2.2) |
| Reluctance | | | |
| Converted refusal | 10.8 (0.7) | 9.6 (0.8) | 16.0 (1.5) |
| Income imputation required | 44.7 (1.0) | 41.8 (1.1) | 57.0 (2.1) |
| Too busy | 8.1 (0.5) | 6.9 (0.6) | 12.7 (1.3) |
| Respondent effort | | | |
| A lot of effort | 34.9 (0.9) | 36.7 (1.0) | 27.3 (1.9) |
| Moderate effort | 37.2 (1.1) | 37.4 (1.3) | 36.4 (1.8) |
| Bare minimum effort | 27.9 (1.1) | 25.9 (1.2) | 36.3 (2.3) |
| Respondent cooperation | | | |
| Very cooperative | 50.8 (1.0) | 53.5 (1.0) | 39.4 (2.6) |
| Somewhat cooperative | 33.1 (0.8) | 33.1 (0.9) | 33.1 (1.3) |
| Neither cooperative nor uncooperative | 5.4 (0.5) | 4.5 (0.4) | 9.2 (1.1) |
| Somewhat uncooperative | 4.5 (0.3) | 3.3 (0.3) | 9.5 (1.1) |
| Very uncooperative | 6.3 (0.4) | 5.7 (0.5) | 8.8 (0.9) |
| Rapport | | | |
| Face-to-face | 65.0 (1.4) | 66.2 (1.5) | 59.9 (1.9) |
| Phone | 35.0 (1.4) | 33.8 (1.5) | 40.1 (1.9) |
| Burden | | | |
| Total interview time | 64.9 (1.0) | 65.3 (1.1) | 63.3 (1.4) |
| Household size | 2.42 (0.03) | 2.42 (0.03) | 2.42 (0.07) |
| Total expenditures | $11,454 ($148) | $11,511 ($145) | $11,218 ($384) |
| Family income: LT $8,180 | 19.8 (0.8) | 17.4 (0.9) | 30.8 (1.9) |
| Family income: $8,180–$24,000 | 20.2 (0.6) | 20.8 (0.9) | 16.8 (1.4) |
| Family income: $24,001–$46,000 | 20.8 (0.6) | 20.5 (0.7) | 18.4 (1.3) |
| Family income: $46,001–$85,855 | 19.8 (0.6) | 20.7 (0.7) | 16.7 (1.3) |
| Family income: GT $85,585 | 19.4 (0.6) | 20.6 (0.9) | 17.3 (1.8) |
| Owner | 65.4 (0.8) | 64.0 (0.9) | 71.3 (1.6) |

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).
survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

| Characteristic | All respondents (n = 4,893) | Consenters (n = 3,951) | Non-consenters (n = 942) |
|---|---|---|---|
| Age group | | | |
| 18–32 | 20.0 (0.9) | 21.5 (1.1) | 13.8 (1.0) |
| 32–65 | 59.2 (0.9) | 58.6 (1.2) | 61.6 (1.5) |
| 65+ | 20.8 (0.7) | 19.9 (0.8) | 24.6 (1.5) |
| Gender | | | |
| Male | 46.3 (0.7) | 46.8 (0.8) | 44.3 (1.7) |
| Female | 53.7 (0.7) | 53.2 (0.8) | 55.7 (1.7) |
| Race | | | |
| White | 83.1 (0.9) | 83.0 (1.0) | 83.1 (1.1) |
| Non-white | 16.9 (0.9) | 17.0 (1.0) | 16.9 (1.1) |
| Education group | | | |
| Less than HS | 13.5 (0.6) | 13.4 (0.7) | 14.0 (1.6) |
| HS graduate | 24.7 (0.8) | 24.5 (0.9) | 25.3 (1.8) |
| Some college | 21.4 (0.7) | 21.2 (0.9) | 22.2 (1.9) |
| Associate's degree | 10.2 (0.4) | 10.0 (0.5) | 10.9 (1.1) |
| Bachelor's degree | 19.1 (0.5) | 19.4 (0.6) | 17.9 (1.4) |
| Advanced degree | 11.1 (0.5) | 11.5 (0.6) | 9.7 (0.9) |
| Spanish language interview | | | |
| Yes | 4.1 (0.5) | 3.7 (0.5) | 5.4 (1.5) |
| No | 95.9 (0.5) | 96.3 (0.5) | 94.6 (1.5) |
| Region | | | |
| Northeast | 18.5 (0.6) | 17.6 (0.7) | 22.2 (1.7) |
| Midwest | 23.5 (1.3) | 24.5 (1.4) | 19.2 (2.3) |
| South | 35.0 (1.3) | 35.9 (1.5) | 31.5 (2.7) |
| West | 23.0 (1.5) | 22.0 (1.7) | 27.1 (1.8) |
| Urbanicity | | | |
| Urban | 80.5 (0.9) | 80.5 (1.2) | 80.7 (2.0) |
| Rural | 19.5 (0.9) | 19.5 (1.2) | 19.3 (2.0) |

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
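Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

| Variable | Model 1 Estimate | Model 1 SE | Model 2 Estimate | Model 2 SE |
|---|---|---|---|---|
| Demographic characteristics | | | | |
| Age group (32–65): 18–32 | 0.3435 | 0.0633 | 0.3389 | 0.0634 |
| Age group (32–65): 65+ | 0.2692 | 0.0645 | 0.2532 | 0.0656 |
| Gender (Male): Female | 0.0161 | 0.0426 | 0.0229 | 0.0421 |
| Race (White): Non-white | 0.0949 | 0.1264 | 0.0612 | 0.1260 |
| Spanish interview (No): Yes | 0.4234 | 0.2923 | 0.4452 | 0.2790 |
| Education group (HS grad): Less than HS | 0.3097 | 0.1879 | 0.2926 | 0.1835 |
| Education group (HS grad): Some college | 0.4016 | 0.1423 | 0.3739 | 0.1389 |
| Education group (HS grad): Associate's degree | 0.3321 | 0.2041 | 0.3150 | 0.1993 |
| Education group (HS grad): Bachelor's degree | 0.0654 | 0.2112 | 0.0663 | 0.2082 |
| Education group (HS grad): Advanced degree | 0.1747 | 0.2762 | 0.1512 | 0.2773 |
| Home owner (Renter): Owner | 0.2767† | 0.1453 | 0.3233 | 0.1436 |
| Total expenditures (Log) | 0.0605 | 0.0799 | 0.0083 | 0.0800 |
| Income group: Less than $8,181 | 0.2155 | 0.0604 | | |
| Income imputed (No): Yes | 0.1487 | 0.0513 | | |
| Race × Gender | 0.1874† | 0.0956 | 0.1953 | 0.0923 |
| Owner × Education: Less than HS | 0.4931 | 0.2066 | 0.4632 | 0.1996 |
| Owner × Education: Some college | 0.6575 | 0.1567 | 0.6370 | 0.1613 |
| Owner × Education: Associate's degree | 0.3407† | 0.2013 | 0.3402† | 0.1986 |
| Owner × Education: Bachelor's degree | 0.1385 | 0.2402 | 0.1398 | 0.2368 |
| Owner × Education: Advanced degree | 0.4713 | 0.2847 | 0.4226 | 0.2870 |
| Environmental features | | | | |
| Region (Northeast): Midwest | 0.2097† | 0.1234 | 0.2111† | 0.1163 |
| Region (Northeast): South | 0.2451 | 0.1190 | 0.2408 | 0.1187 |
| Region (Northeast): West | 0.2670 | 0.1055 | 0.2557 | 0.1066 |
| Urbanicity (Rural): Urban | 0.0268 | 0.0829 | 0.0234 | 0.0824 |
| R attitude proxies | | | | |
| Converted refusal (No): Yes | 0.0721 | 0.0756 | 0.0838 | 0.0763 |

Continued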
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units, and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted).2
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / \hat{p}_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $\hat{p}_{\mathrm{decile}}$ is the unweighted propensity of "consent" units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / \hat{p}_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $\hat{p}_{\mathrm{quintile}}$ is the unweighted propensity of "consent" units in the specified quintile group).
Table 3. Continued

| Variable | Model 1 Estimate | Model 1 SE | Model 2 Estimate | Model 2 SE |
|---|---|---|---|---|
| Effort (Moderate): A lot of effort | 0.5454 | 0.2081 | 0.6142 | 0.2065 |
| Effort (Moderate): Bare minimum effort | 0.4879 | 0.1365 | 0.5721 | 0.1371 |
| Doorstep concerns (None): Too busy | 0.2207 | 0.1485 | 0.2137 | 0.1466 |
| Doorstep concerns (None): Privacy/gov't concerns | 1.1895 | 0.1667 | 1.2241 | 0.1622 |
| Doorstep concerns (None): Other | 1.0547 | 0.3261 | 1.0255 | 0.3255 |
| Doorstep concerns × effort: Privacy × a lot of effort | 0.5884 | 0.2789 | 0.5312† | 0.2728 |
| Doorstep concerns × effort: Privacy × minimum effort | 0.3183 | 0.1918 | 0.2601 | 0.1924 |
| Doorstep concerns × effort: Busy × a lot of effort | 0.0888 | 0.2736 | 0.0890 | 0.2749 |
| Doorstep concerns × effort: Busy × minimum effort | 0.2238 | 0.2096 | 0.2277 | 0.2095 |
| Doorstep concerns × effort: Other × a lot of effort | 1.0794† | 0.6224 | 1.0477 | 0.6182 |
| Doorstep concerns × effort: Other × minimum effort | 0.9002 | 0.3610 | 0.8777 | 0.3593 |

NOTE: † p < 0.10; p < 0.05; p < 0.01; p < 0.001.

2 Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $\hat{p}_i$ for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
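The estimators defined in this section can be sketched in a few lines of Python. The example below is only illustrative: the data are invented, the helper names are hypothetical, and the class-based adjustment assumes that the class-level propensity is the unweighted share of consenting units within each propensity class, with classes formed by sorting all sample units on the estimated propensity. Passing `n_classes=10` or `n_classes=5` on a realistically sized sample would correspond to the decile and quintile versions; the toy sample has only ten units, so only five classes are shown.

```python
def mean_over(y, weights):
    """Weighted mean over the units whose weight is not None."""
    pairs = [(yi, wi) for yi, wi in zip(y, weights) if wi is not None]
    return sum(yi * wi for yi, wi in pairs) / sum(wi for _, wi in pairs)

def class_adjusted_weights(w, p_hat, consent, n_classes):
    """Divide each consenter's weight by the consent rate of its propensity class."""
    n = len(w)
    order = sorted(range(n), key=lambda i: p_hat[i])
    adjusted = [None] * n
    for c in range(n_classes):
        members = order[c * n // n_classes:(c + 1) * n // n_classes]
        n_consent = sum(consent[i] for i in members)
        if n_consent == 0:
            continue  # no consenters in this class, so nothing to reweight
        rate = n_consent / len(members)
        for i in members:
            if consent[i]:
                adjusted[i] = w[i] / rate
    return adjusted

# Invented toy sample of ten units, already ordered by estimated propensity.
y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
w = [1.0] * 10
p_hat = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
consent = [1, 0, 1, 0, 1, 1, 1, 0, 1, 1]

fs = mean_over(y, w)                                            # full sample
cuna = mean_over(y, [wi if c else None for wi, c in zip(w, consent)])
cupa = mean_over(y, [wi / pi if c else None
                     for wi, pi, c in zip(w, p_hat, consent)])
cupqa = mean_over(y, class_adjusted_weights(w, p_hat, consent, 5))
```

Comparing `cuna`, `cupa`, and `cupqa` with `fs` mirrors the comparisons made in table 4, although the toy numbers carry no substantive meaning.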
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA−FS and CUPQA−FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA−FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the "objection" subpopulation (horizontal axis) and the "consent" subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the "object" and "consent" subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
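One informal way to read the difference columns of table 4 is to divide each difference by its reported standard error. The short sketch below does this for three of the reported entries and attaches a two-sided p-value from a normal reference distribution. The normal reference is an assumption made here for simplicity; the significance markers in table 4 may rest on a different reference distribution (for example, one reflecting the limited number of replicates), so the p-values are indicative only.

```python
import math

# (difference, SE of difference) pairs as reported in table 4.
table4_differences = {
    ("Family income before taxes", "CUNA-FS"): (1930.00, 580.98),
    ("Family income before taxes", "CUPA-FS"): (1178.00, 619.09),
    ("Property taxes", "CUPA-FS"): (-19.41, 6.48),
}

def ratio_and_normal_p(diff, se_diff):
    """Standardized difference and two-sided p-value under a normal reference."""
    z = diff / se_diff
    return z, math.erfc(abs(z) / math.sqrt(2.0))

for (variable, contrast), (diff, se_diff) in table4_differences.items():
    z, p = ratio_and_normal_p(diff, se_diff)
    print(f"{variable:28s} {contrast}: z = {z:6.2f}, p = {p:.4f}")
```

Under this rough reading, the unadjusted income difference is several standard errors from zero, the propensity-adjusted income difference is borderline, and the adjusted property-tax difference remains clearly nonzero, which agrees with the qualitative description in the text.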
Table 4. Comparison of Three Estimation Methods

Means (SE), columns 2–6:

| Variable (column 1) | Full-sample (FS) mean (SE) (2) | Consenting units, no adjustment (CUNA) mean (SE) (3) | Consenting units, propensity-adjusted (CUPA) mean (SE) (4) | Consenting units, propensity-decile adjusted (CUPDA) mean (SE) (5) | Consenting units, propensity-quintile adjusted (CUPQA) mean (SE) (6) |
|---|---|---|---|---|---|
| Family income before taxes | $50,939.00 ($1,227.51) | $52,869.00 ($1,535.04) | $52,117.00 ($1,523.85) | $52,269.00 ($1,520.74) | $52,405.00 ($1,514.30) |
| Family income before taxes with imputed values | $61,207.00 ($1,193.34) | $61,405.00 ($1,510.55) | $61,228.00 ($1,454.39) | $61,337.00 ($1,463.34) | $61,382.00 ($1,466.25) |
| Vehicle purchase cost | $599.59 ($33.22) | $619.14 ($37.05) | $607.80 ($36.63) | $607.63 ($36.67) | $611.75 ($37.61) |
| Property taxes | $454.15 ($10.41) | $429.12 ($10.76) | $434.74 ($11.39) | $434.88 ($11.37) | $436.75 ($11.30) |
| Property value | $232,226.00 ($5,055.68) | $227,734.00 ($5,730.38) | $227,938.00 ($5,592.27) | $227,935.00 ($5,532.64) | $228,640.00 ($5,519.90) |
| Rental value | $1,296.52 ($15.58) | $1,269.65 ($17.05) | $1,272.40 ($17.81) | $1,272.91 ($17.86) | $1,274.67 ($17.68) |

Point estimate differences (SE diff), columns 7–10:

| Variable (column 1) | CUNA−FS (SE diff) (7) | CUPA−FS (SE diff) (8) | CUPDA−FS (SE diff) (9) | CUPQA−FS (SE diff) (10) |
|---|---|---|---|---|
| Family income before taxes | $1,930.00 ($580.98) | $1,178.00† ($619.09) | $1,330.00 ($602.64) | $1,466.00 ($606.76) |
| Family income before taxes with imputed values | $198.00 ($573.46) | $21.00 ($563.54) | $130.00 ($557.10) | $175.00 ($552.20) |
| Vehicle purchase cost | $19.55† ($10.83) | $8.21 ($15.34) | $8.04 ($15.23) | $12.17 ($16.13) |
| Property taxes | −$25.02 ($6.08) | −$19.41 ($6.48) | −$19.27 ($6.05) | −$17.40 ($6.23) |
| Property value | −$4,492.00 ($2,118.39) | −$4,288.00† ($2,165.93) | −$4,291.00 ($2,132.40) | −$3,586.00 ($2,063.66) |
| Rental value | −$26.87 ($7.43) | −$24.12 ($8.18) | −$23.61 ($8.01) | −$21.85 ($8.02) |

NOTE: † p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the "objection" and "consent" subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the "consent" and "object" subpopulations.
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the "Consent" Subpopulation (Vertical Axis) and "Object" Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
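The quantile-quantile comparison described above can be sketched as follows. The code computes the 99 paired quantiles of estimated propensities for two groups using a simple weighted empirical quantile (the smallest value whose cumulative weight share reaches q). The quantile rule, the function names, and the toy propensity values are assumptions for illustration and need not match the estimator used to draw figure 1.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    target = q * sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= target:
            return value
    return pairs[-1][0]

def qq_points(p_object, w_object, p_consent, w_consent):
    """Paired (object, consent) quantiles for q = 0.01, ..., 0.99."""
    qs = [k / 100 for k in range(1, 100)]
    return [(weighted_quantile(p_object, w_object, q),
             weighted_quantile(p_consent, w_consent, q)) for q in qs]

# Invented propensity scores and weights for the two subpopulations.
p_object = [0.55, 0.62, 0.70, 0.74, 0.81]
w_object = [1.0, 2.0, 1.0, 1.5, 1.0]
p_consent = [0.60, 0.72, 0.80, 0.85, 0.90, 0.93]
w_consent = [1.0, 1.0, 2.0, 1.0, 1.5, 1.0]

points = qq_points(p_object, w_object, p_consent, w_consent)
# Points with equal coordinates lie on the slope-1, intercept-0 reference line.
on_diagonal = sum(1 for x, y in points if x == y)
```

Plotting `points` against the diagonal reference line would give a display of the same type as figure 1.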
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the "Consent" and "Object" Subpopulations

| Percentile (%) | Estimated propensity of consent: Consent subpopulation | Estimated propensity of consent: Object subpopulation |
|---|---|---|
| 70 | 0.84 | 0.88 |
| 80 | 0.86 | 0.89 |
| 90 | 0.88 | 0.92 |
| 99 | 0.93 | 0.96 |
similar, and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the "consent" subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Here again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions, and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
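The quantile-difference diagnostic described above can be sketched in the same spirit. For each q, the code below subtracts the full-sample weighted quantile from the quantile computed on consenting units only, with weights divided by the estimated consent propensity. The helper names, the quantile rule, and the toy data are assumptions for illustration; the pointwise confidence intervals shown in figure 4 would additionally require replicate-based variance estimation, which is omitted here.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    target = q * sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= target:
            return value
    return pairs[-1][0]

def quantile_differences(y, w, consent, p_hat, qs):
    """Propensity-adjusted consenter quantile minus full-sample quantile."""
    kept = [(yi, wi / pi) for yi, wi, pi, c in zip(y, w, p_hat, consent) if c]
    cons_y = [v for v, _ in kept]
    cons_w = [a for _, a in kept]
    return [weighted_quantile(cons_y, cons_w, q) - weighted_quantile(y, w, q)
            for q in qs]

# Invented example: the two lowest-valued units object to linkage.
y = list(range(1, 11))
w = [1.0] * 10
consent = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
p_hat = [0.8] * 10   # a constant propensity cannot correct this imbalance
qs = [0.25, 0.50, 0.75]

diffs = quantile_differences(y, w, consent, p_hat, qs)
```

A difference curve that stays near zero except in one tail would point to a tail-driven mean difference, whereas a roughly uniform offset across q would point to a broader distributional shift.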
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple "consent-to-link" question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the "Consent" Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the "consent" and "refusal" cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions
A.1 Predictor Variables Examined
Respondent sociodemographics (variable: description)
Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender: 1 if male (reference group); 2 if female
Race: 0 if white (reference group); 1 if non-white
Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
Language: 1 if interview language was Spanish; 0 if English (reference group)
Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type: 0 if renter (reference group); 1 if owner
Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category
Respondent “doorstep” concerns, listed under their assigned category
Privacy/government concerns: Privacy concerns; Anti-government concerns
Too busy/logistical concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested; Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long
Other reluctance: Survey is voluntary; Does not understand/Asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other (specify)
No concerns: No concerns

A.3 Post-Survey Interviewer Questions
A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)?
A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey (a lot of effort, a moderate amount of effort, or a bare minimum of effort)?
Appendix B. Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + \mathrm{CV}^2_{wt}$, where $\mathrm{CV}^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + \mathrm{CV}^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
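The weight-heterogeneity summary described above can be illustrated with a minimal sketch. It assumes the squared coefficient of variation is computed with the population variance of the weights, and the weights and propensities below are hypothetical illustration values, not CEQ data; the function names are likewise introduced only for this example.

```python
from statistics import mean, pvariance


def one_plus_cv2(weights):
    """Return 1 + CV^2, with CV the coefficient of variation of the weights.

    Assumption: the population variance is used for the squared CV.
    """
    m = mean(weights)
    return 1.0 + pvariance(weights, mu=m) / (m * m)


def propensity_adjusted(weights, propensities):
    """Divide each design weight w_i by the estimated consent propensity p_i."""
    return [w / p for w, p in zip(weights, propensities)]


# Hypothetical design weights and estimated consent propensities.
w = [100.0, 120.0, 80.0, 150.0, 110.0]
p = [0.80, 0.75, 0.90, 0.70, 0.85]

w_adj = propensity_adjusted(w, p)
print(round(one_plus_cv2(w), 3), round(one_plus_cv2(w_adj), 3))
```

A value close to one suggests that the weights add little variance inflation; comparing the statistic before and after the propensity adjustment mirrors the comparison across the columns of table 6.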
Table 6. Variability of Weights

Columns: (1) weights from full sample; (2) unadjusted weights for sample units with consent; (3) propensity-adjusted weights for sample units with consent; (4) propensity-decile adjusted weights for sample units with consent; (5) propensity-quintile adjusted weights for sample units with consent.

Statistic          (1)      (2)      (3)      (4)      (5)
n                  4893     3951     3915     3915     3915
1 + CV^2_wt        1.13     1.13     1.16     1.16     1.15
IQR/q0.5           0.875    0.877    0.858    0.853    0.851
q0.75/q0.25        1.357    1.359    1.448    1.463    1.468
q0.90/q0.10        1.970    1.966    2.148    2.178    2.124
q0.95/q0.05        2.699    2.665    3.028    2.961    2.898
Appendix C. Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
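The replication variance estimator above can be sketched in a few lines. This is an illustrative implementation only, not the production code used for the CEQ; the function name and the small set of replicate estimates are hypothetical, and the example uses four replicates rather than 44 to keep the arithmetic easy to follow.

```python
def fay_brr_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix for a coefficient vector.

    Computes (1 / (R * (1 - K)^2)) * sum_r (b_r - b)(b_r - b)',
    where b is the full-sample estimate and b_r the r-th replicate estimate.
    """
    r = len(replicate_estimates)
    dim = len(full_estimate)
    scale = 1.0 / (r * (1.0 - fay_k) ** 2)
    cov = [[0.0] * dim for _ in range(dim)]
    for rep in replicate_estimates:
        diff = [rep[j] - full_estimate[j] for j in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += scale * diff[a] * diff[b]
    return cov


# Hypothetical full-sample estimate and replicate estimates (R = 4 here).
beta_hat = [1.0, 2.0]
beta_reps = [[1.1, 2.0], [0.9, 2.0], [1.0, 2.2], [1.0, 1.8]]

cov = fay_brr_variance(beta_hat, beta_reps, fay_k=0.5)
std_errors = [cov[j][j] ** 0.5 for j in range(len(beta_hat))]
print(cov, std_errors)
```

Standard errors are the square roots of the diagonal entries; with a scalar estimate (a mean or proportion) the same function applies to one-element lists.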
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\mathrm{adj}}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F\mathrm{adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
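As a quick arithmetic check of the reference distribution, substituting the stated values of $f$ and $g$ gives the degrees of freedom shown below. This is a worked substitution only; it does not add any assumption beyond the values given in the text.

```latex
\[
g - 1 = 10 - 1 = 9, \qquad f - g + 2 = 43 - 10 + 2 = 35,
\]
\[
\text{so that } Q_{F\mathrm{adj}} \text{ is referred to a central } F(9,\, 35) \text{ distribution.}
\]
```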
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Entries are Estimate (SE), listed in the column order Model 1, Model 2, Model A, Model B, Model C, Model D, Model E. Rows with fewer than seven entries correspond to variables that were not included in every model.

Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate’s degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor’s degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advance degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate’s degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor’s degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advance degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)

Environmental features
Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies
Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov’t concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal Visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: †p < 0.10; p < 0.05; p < 0.01; p < 0.0001
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\mathrm{adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
survey, the US Consumer Expenditure Quarterly Interview Survey (CEQ), sponsored by the Bureau of Labor Statistics (BLS). The CEQ is an ongoing, nationally representative panel survey that collects comprehensive information on a wide range of consumers’ expenditures and incomes, as well as the characteristics of those consumers. It is designed to collect one year’s worth of expenditure data from sample units through five interviews; the first interview is for bounding purposes only, and the remaining four interviews are conducted at three-month intervals.1
It is a rather long and burdensome survey, and the ability to link CEQ data to relevant administrative data sources (e.g., IRS data for income/assets) could eliminate the need to ask respondents to report some of these data themselves. In principle, linkage with administrative records also could allow one to capture information that would be difficult or impossible to collect through a survey instrument. This latter motivation is of some potential interest for CE, but at present may be somewhat secondary relative to reduction of current burden levels.
The CEQ employs a complex sample using a stratified-clustered design, and each calendar quarter approximately 7,100 usable interviews are conducted. In 2011, the CEQ achieved a response rate of 71.5 percent (BLS 2014). The survey is administered by computer-assisted personal interviewing (CAPI), either by personal visit or by telephone. Mode selection is determined jointly by the interviewer and respondent, though personal visits are encouraged, particularly in the second and fifth interviews, when more detailed financial information is collected. Telephone interviewing is conducted by the same CE interviewer assigned to the case, using the same CAPI instrument as used in the personal visit interviews.
Beginning in 2011, BLS conducted research to explore the feasibility and potential impacts of integrating administrative data with CEQ survey responses. CEQ respondents who completed their final interview were asked whether they would object to combining their survey answers with data from other government agencies (Davis, Elkin, McBride, and To 2013, Section III). Nearly 80 percent of respondents had no objection to linkage. Although no actual data linkage occurred, we use these 2011 data to explore the extent to which prospective replacement of survey responses with administrative data could impact the quality of production estimates. To do this, we develop and compare consent propensity models that incorporate demographic, household, environmental, and attitudinal predictors suggested by the literature on linkage consent, attitudes towards administrative record use, and survey nonresponse. We
1. In January 2015, the CEQ dropped its initial bounding interview and currently administers only four quarterly interviews. The data used in our analyses were collected prior to this design change. However, the fundamental issues addressed in this paper remain relevant under the new CEQ survey design.
then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: "We'd like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?" Of the 5,037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a "don't know" response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4,893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models
We develop logistic regression models that estimate sample members' propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
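As a rough illustration of the kind of model described above, the following sketch fits a design-weighted (pseudo-maximum-likelihood) logistic regression by Newton-Raphson and returns estimated consent propensities. It is not the SAS procedure used in the paper; the function names `fit_weighted_logit` and `predict_propensity` and the array layout are assumptions made for this example, and the sketch produces point estimates only, with no design-based standard errors.

```python
import numpy as np

def fit_weighted_logit(X, y, w, n_iter=25, tol=1e-8):
    """Weighted logistic regression via Newton-Raphson (hypothetical helper).

    X: (n, k) predictor matrix without intercept; y: 0/1 consent indicator;
    w: survey design weights. Returns coefficients, intercept first.
    """
    Xd = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))   # current fitted propensities
        grad = Xd.T @ (w * (y - p))              # weighted score vector
        hess = (Xd * (w * p * (1.0 - p))[:, None]).T @ Xd  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict_propensity(X, beta):
    """Estimated P(consent) for each row of X under the fitted coefficients."""
    Xd = np.column_stack([np.ones(X.shape[0]), X])
    return 1.0 / (1.0 + np.exp(-(Xd @ beta)))
```

With a single binary predictor the model is saturated, so the fitted propensities reproduce the weighted consent rates in each group, which gives a simple check on the implementation.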
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS SURVEYLOGISTIC procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
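The replication variance calculation can be sketched as follows. This is a minimal example of the standard Fay-adjusted BRR formula, in which the sum of squared deviations of the replicate estimates from the full-sample estimate is divided by R(1 - K)^2; with R = 44 and K = 0.5 that divisor equals 11. The function names and the layout of the replicate-weight array are assumptions for illustration, not a description of the CEQ production files.

```python
import numpy as np

def weighted_mean(y, w):
    """Survey-weighted mean of y."""
    return np.sum(w * y) / np.sum(w)

def fay_brr_se(y, full_w, rep_w, fay_k=0.5):
    """Fay-adjusted BRR standard error of a weighted mean.

    rep_w has shape (R, n): one row of replicate weights per replicate.
    """
    theta = weighted_mean(y, full_w)                      # full-sample estimate
    reps = np.array([weighted_mean(y, w_r) for w_r in rep_w])
    n_rep = rep_w.shape[0]
    var = np.sum((reps - theta) ** 2) / (n_rep * (1.0 - fay_k) ** 2)
    return np.sqrt(var)
```

The same pattern applies to any estimator that can be recomputed under each set of replicate weights, including regression coefficients and the differences between estimators examined in section 4.2.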
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias
To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as "truth" and compare estimates from the full sample to those derived
from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only, to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys, such as nonresponse and coverage (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis we used propensity model 2, for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1 percent versus 4.2 percent), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0 percent versus 9.6 percent) and income
imputation (57.0 percent versus 41.8 percent), to express concerns about the time required by the survey (12.7 percent versus 6.9 percent), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1 percent versus 33.8 percent), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8 percent of non-consenters had family income under $8,180, versus 17.4 percent of consenters) but also had a higher proportion of home ownership than consenters (71.3 percent versus 64.0 percent).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6 percent of non-consenters were 65 or older, versus 19.9 percent of consenters) and were more likely than consenters to live in the Northeast (22.2 percent versus 17.6 percent) and in the West (27.1 percent versus 22.0 percent).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1 except for exclusion of the low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
4.2 Analysis of Economic Variables Subject to "Informed Consent" Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                                All respondents    Consenters         Non-consenters
                                         (n = 4893)         (n = 3951)         (n = 942)
Privacy/anti-government concerns         6.9 (0.8)          4.2 (0.6)          18.1 (2.2)
Reluctance
  Converted refusal                      10.8 (0.7)         9.6 (0.8)          16.0 (1.5)
  Income imputation required             44.7 (1.0)         41.8 (1.1)         57.0 (2.1)
  Too busy                               8.1 (0.5)          6.9 (0.6)          12.7 (1.3)
  Respondent effort
    A lot of effort                      34.9 (0.9)         36.7 (1.0)         27.3 (1.9)
    Moderate effort                      37.2 (1.1)         37.4 (1.3)         36.4 (1.8)
    Bare minimum effort                  27.9 (1.1)         25.9 (1.2)         36.3 (2.3)
  Respondent cooperation
    Very cooperative                     50.8 (1.0)         53.5 (1.0)         39.4 (2.6)
    Somewhat cooperative                 33.1 (0.8)         33.1 (0.9)         33.1 (1.3)
    Neither cooperative nor
      uncooperative                      5.4 (0.5)          4.5 (0.4)          9.2 (1.1)
    Somewhat uncooperative               4.5 (0.3)          3.3 (0.3)          9.5 (1.1)
    Very uncooperative                   6.3 (0.4)          5.7 (0.5)          8.8 (0.9)
Rapport
  Face-to-face                           65.0 (1.4)         66.2 (1.5)         59.9 (1.9)
  Phone                                  35.0 (1.4)         33.8 (1.5)         40.1 (1.9)
Burden
  Total interview time                   64.9 (1.0)         65.3 (1.1)         63.3 (1.4)
  Household size                         2.42 (0.03)        2.42 (0.03)        2.42 (0.07)
  Total expenditures                     $11,454 ($148)     $11,511 ($145)     $11,218 ($384)
  Family income
    LT $8,180                            19.8 (0.8)         17.4 (0.9)         30.8 (1.9)
    $8,180–$24,000                       20.2 (0.6)         20.8 (0.9)         16.8 (1.4)
    $24,001–$46,000                      20.8 (0.6)         20.5 (0.7)         18.4 (1.3)
    $46,001–$85,855                      19.8 (0.6)         20.7 (0.7)         16.7 (1.3)
    GT $85,585                           19.4 (0.6)         20.6 (0.9)         17.3 (1.8)
  Owner                                  65.4 (0.8)         64.0 (0.9)         71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01; p < 0.0001).
survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last-visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic                 All respondents    Consenters       Non-consenters
                               (n = 4893)         (n = 3951)       (n = 942)
Age group
  18–32                        20.0 (0.9)         21.5 (1.1)       13.8 (1.0)
  32–65                        59.2 (0.9)         58.6 (1.2)       61.6 (1.5)
  65+                          20.8 (0.7)         19.9 (0.8)       24.6 (1.5)
Gender
  Male                         46.3 (0.7)         46.8 (0.8)       44.3 (1.7)
  Female                       53.7 (0.7)         53.2 (0.8)       55.7 (1.7)
Race
  White                        83.1 (0.9)         83.0 (1.0)       83.1 (1.1)
  Non-white                    16.9 (0.9)         17.0 (1.0)       16.9 (1.1)
Education group
  Less than HS                 13.5 (0.6)         13.4 (0.7)       14.0 (1.6)
  HS graduate                  24.7 (0.8)         24.5 (0.9)       25.3 (1.8)
  Some college                 21.4 (0.7)         21.2 (0.9)       22.2 (1.9)
  Associate's degree           10.2 (0.4)         10.0 (0.5)       10.9 (1.1)
  Bachelor's degree            19.1 (0.5)         19.4 (0.6)       17.9 (1.4)
  Advanced degree              11.1 (0.5)         11.5 (0.6)       9.7 (0.9)
Spanish language interview
  Yes                          4.1 (0.5)          3.7 (0.5)        5.4 (1.5)
  No                           95.9 (0.5)         96.3 (0.5)       94.6 (1.5)
Region
  Northeast                    18.5 (0.6)         17.6 (0.7)       22.2 (1.7)
  Midwest                      23.5 (1.3)         24.5 (1.4)       19.2 (2.3)
  South                        35.0 (1.3)         35.9 (1.5)       31.5 (2.7)
  West                         23.0 (1.5)         22.0 (1.7)       27.1 (1.8)
Urbanicity
  Urban                        80.5 (0.9)         80.5 (1.2)       80.7 (2.0)
  Rural                        19.5 (0.9)         19.5 (1.2)       19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01; p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Variable                              Model 1                 Model 2
                                      Estimate    SE          Estimate    SE
Demographic characteristics
  Age group (32–65)
    18–32                             0.3435      0.0633      0.3389      0.0634
    65+                               0.2692      0.0645      0.2532      0.0656
  Gender (Male)
    Female                            0.0161      0.0426      0.0229      0.0421
  Race (White)
    Non-white                         0.0949      0.1264      0.0612      0.1260
  Spanish interview (No)
    Yes                               0.4234      0.2923      0.4452      0.2790
  Education group (HS grad)
    Less than HS                      0.3097      0.1879      0.2926      0.1835
    Some college                      0.4016      0.1423      0.3739      0.1389
    Associate's degree                0.3321      0.2041      0.3150      0.1993
    Bachelor's degree                 0.0654      0.2112      0.0663      0.2082
    Advanced degree                   0.1747      0.2762      0.1512      0.2773
  Home owner (Renter)
    Owner                             0.2767†     0.1453      0.3233      0.1436
  Total expenditures (Log)            0.0605      0.0799      0.0083      0.0800
  Income group
    Less than $8,181                  0.2155      0.0604
  Income imputed (No)
    Yes                               0.1487      0.0513
  Race × Gender                       0.1874†     0.0956      0.1953      0.0923
  Owner × Education
    Less than HS                      0.4931      0.2066      0.4632      0.1996
    Some college                      0.6575      0.1567      0.6370      0.1613
    Associate's degree                0.3407†     0.2013      0.3402†     0.1986
    Bachelor's degree                 0.1385      0.2402      0.1398      0.2368
    Advanced degree                   0.4713      0.2847      0.4226      0.2870
Environmental features
  Region (Northeast)
    Midwest                           0.2097†     0.1234      0.2111†     0.1163
    South                             0.2451      0.1190      0.2408      0.1187
    West                              0.2670      0.1055      0.2557      0.1066
  Urbanicity (Rural)
    Urban                             0.0268      0.0829      0.0234      0.0824
R attitude proxies
  Converted refusal (No)
    Yes                               0.0721      0.0756      0.0838      0.0763

(Continued)
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted).²
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / \hat{p}_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $\hat{p}_{\mathrm{decile}}$ is the unweighted propensity of "consent" units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / \hat{p}_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $\hat{p}_{\mathrm{quintile}}$ is the unweighted propensity of "consent" units in the specified quintile group).
Table 3. Continued

Variable                              Model 1                 Model 2
                                      Estimate    SE          Estimate    SE
  Effort (Moderate)
    A lot of effort                   0.5454      0.2081      0.6142      0.2065
    Bare minimum effort               0.4879      0.1365      0.5721      0.1371
  Doorstep concerns (None)
    Too busy                          0.2207      0.1485      0.2137      0.1466
    Privacy/gov't concerns            1.1895      0.1667      1.2241      0.1622
    Other                             1.0547      0.3261      1.0255      0.3255
  Doorstep concerns × effort
    Privacy × a lot of effort         0.5884      0.2789      0.5312†     0.2728
    Privacy × minimum effort          0.3183      0.1918      0.2601      0.1924
    Busy × a lot of effort            0.0888      0.2736      0.0890      0.2749
    Busy × minimum effort             0.2238      0.2096      0.2277      0.2095
    Other × a lot of effort           1.0794†     0.6224      1.0477      0.6182
    Other × minimum effort            0.9002      0.3610      0.8777      0.3593

NOTE: †p < 0.10, p < 0.05, p < 0.01, p < 0.001.
2 Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $\hat{p}_i$ for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.
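The five estimators defined above (FS, CUNA, CUPA, CUPDA, and CUPQA) can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: function and variable names are invented for the example, decile and quintile classes are formed from unweighted quantiles of the estimated propensities over all sample units, and the class-level propensity is taken to be the unweighted mean of the estimated propensities among consenting units in the class, which is one reading of the definition in the text.

```python
import numpy as np

def wmean(y, w):
    """Weighted mean of y with weights w."""
    return np.sum(w * y) / np.sum(w)

def class_adjusted_weights(w, consent, p_hat, n_classes):
    """Divide design weights by a class-level consent propensity.

    Assumption: classes are cut at unweighted quantiles of p_hat, and the
    class propensity is the unweighted mean of p_hat among consenters.
    """
    edges = np.quantile(p_hat, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    cls = np.searchsorted(edges, p_hat, side="right")   # class label per unit
    adj = np.full(len(w), np.nan)
    for g in np.unique(cls[consent]):                   # classes with consenters
        in_g = cls == g
        p_g = p_hat[in_g & consent].mean()
        adj[in_g] = w[in_g] / p_g
    return adj

def consent_estimators(y, w, consent, p_hat):
    """Return the FS, CUNA, CUPA, CUPDA and CUPQA means as a dict.

    consent must be a boolean array; p_hat holds estimated P(consent).
    """
    c = consent
    w_dec = class_adjusted_weights(w, c, p_hat, 10)
    w_qui = class_adjusted_weights(w, c, p_hat, 5)
    return {
        "FS": wmean(y, w),                        # all units, design weights
        "CUNA": wmean(y[c], w[c]),                # consenters, no adjustment
        "CUPA": wmean(y[c], w[c] / p_hat[c]),     # inverse-propensity weights
        "CUPDA": wmean(y[c], w_dec[c]),           # decile-class adjustment
        "CUPQA": wmean(y[c], w_qui[c]),           # quintile-class adjustment
    }
```

Standard errors for each estimator, and for differences such as CUPA minus FS, would be obtained by repeating the whole calculation under each set of replicate weights.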
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20 percent of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted consenter group. In general, the propensity adjustments improve estimates (i.e., they move them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
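To make the comparison "relative to the corresponding standard errors" concrete, the sketch below computes the ratio of an estimated difference to its standard error for the first row of table 4 (family income before taxes), using the reported CUNA–FS difference of $1,930.00 (SE $580.98) and CUPA–FS difference of $1,178.00 (SE $619.09). The two-sided p-value from a standard normal reference distribution is an assumption made for illustration; the paper does not state the reference distribution it used.

```python
import math

def z_ratio(diff, se_diff):
    """Ratio of an estimated difference to its standard error."""
    return diff / se_diff

def two_sided_p_normal(z):
    """Two-sided p-value under a standard normal reference (an assumption)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Family income before taxes, differences and SEs as reported in table 4.
z_cuna = z_ratio(1930.00, 580.98)   # unadjusted consenters minus full sample
z_cupa = z_ratio(1178.00, 619.09)   # propensity-adjusted consenters minus full sample

for label, z in [("CUNA-FS", z_cuna), ("CUPA-FS", z_cupa)]:
    print(label, round(z, 2), round(two_sided_p_normal(z), 4))
```

The ratios are about 3.32 and 1.90, which agrees with the text's description of the adjusted difference as reduced and only marginally significant.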
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the "objection" subpopulation (horizontal axis) and the "consent" subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the "object" and "consent" subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

Column 1 is the variable name. Columns 2–6 give means (SE) for FS = full sample; CUNA = consenting units, no adjustment; CUPA = consenting units, propensity-adjusted; CUPDA = consenting units, propensity-decile adjusted; CUPQA = consenting units, propensity-quintile adjusted. Columns 7–10 give point estimate differences (SE diff).

Means (SE)

Variable (1)                      FS (2)          CUNA (3)        CUPA (4)        CUPDA (5)       CUPQA (6)
Family income before taxes        $50,939.00      $52,869.00      $52,117.00      $52,269.00      $52,405.00
                                  ($1,227.51)     ($1,535.04)     ($1,523.85)     ($1,520.74)     ($1,514.30)
Family income before taxes        $61,207.00      $61,405.00      $61,228.00      $61,337.00      $61,382.00
  with imputed values             ($1,193.34)     ($1,510.55)     ($1,454.39)     ($1,463.34)     ($1,466.25)
Vehicle purchase cost             $599.59         $619.14         $607.80         $607.63         $611.75
                                  ($33.22)        ($37.05)        ($36.63)        ($36.67)        ($37.61)
Property taxes                    $454.15         $429.12         $434.74         $434.88         $436.75
                                  ($10.41)        ($10.76)        ($11.39)        ($11.37)        ($11.30)
Property value                    $232,226.00     $227,734.00     $227,938.00     $227,935.00     $228,640.00
                                  ($5,055.68)     ($5,730.38)     ($5,592.27)     ($5,532.64)     ($5,519.90)
Rental value                      $1,296.52       $1,269.65       $1,272.40       $1,272.91       $1,274.67
                                  ($15.58)        ($17.05)        ($17.81)        ($17.86)        ($17.68)

Point estimate differences (SE diff)

Variable (1)                      CUNA–FS (7)     CUPA–FS (8)     CUPDA–FS (9)    CUPQA–FS (10)
Family income before taxes        $1,930.00       $1,178.00†      $1,330.00       $1,466.00
                                  ($580.98)       ($619.09)       ($602.64)       ($606.76)
Family income before taxes        $198.00         $21.00          $130.00         $175.00
  with imputed values             ($573.46)       ($563.54)       ($557.10)       ($552.20)
Vehicle purchase cost             $19.55†         $8.21           $8.04           $12.17
                                  ($10.83)        ($15.34)        ($15.23)        ($16.13)
Property taxes                    −$25.02         −$19.41         −$19.27         −$17.40
                                  ($6.08)         ($6.48)         ($6.05)         ($6.23)
Property value                    −$4,492.00      −$4,288.00†     −$4,291.00      −$3,586.00
                                  ($2,118.39)     ($2,165.93)     ($2,132.40)     ($2,063.66)
Rental value                      −$26.87         −$24.12         −$23.61         −$21.85
                                  ($7.43)         ($8.18)         ($8.01)         ($8.02)

NOTE: †p < 0.10, p < 0.05, p < 0.01 (bold), p < 0.001 (bold).
values for the "objection" and "consent" subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the "consent" and "object" subpopulations.
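The points in a quantile-quantile comparison of this kind can be computed as in the sketch below. It assumes that the quantiles are survey-weighted and uses the inverse of the weighted empirical distribution function as the quantile definition; both choices, and the function names, are assumptions for illustration, since the paper does not spell out its quantile estimator.

```python
import numpy as np

def weighted_quantile(x, w, q):
    """Quantiles of x from the weighted empirical CDF (inverse-CDF rule)."""
    order = np.argsort(x)
    x_sorted, w_sorted = x[order], w[order]
    cdf = np.cumsum(w_sorted) / np.sum(w_sorted)
    idx = np.searchsorted(cdf, q, side="left")      # first point with cdf >= q
    idx = np.minimum(idx, len(x_sorted) - 1)        # guard against rounding at q near 1
    return x_sorted[idx]

def qq_points(p_hat, w, consent):
    """Return (q, objector quantiles, consenter quantiles) for q = 0.01..0.99.

    consent is a boolean array; objector quantiles go on the horizontal axis
    and consenter quantiles on the vertical axis, as in figure 1.
    """
    q = np.arange(1, 100) / 100.0
    x_obj = weighted_quantile(p_hat[~consent], w[~consent], q)
    y_con = weighted_quantile(p_hat[consent], w[consent], q)
    return q, x_obj, y_con
```

Plotting `y_con` against `x_obj` together with the line of slope 1 and intercept 0 reproduces the layout described for figure 1; points close to that line indicate little separation between the two subpopulations.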
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the "Consent" Subpopulation (Vertical Axis) and "Object" Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95 percent confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95 Percent Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the "Consent" and "Object" Subpopulations

Percentile (%)     Estimated propensity of consent
                   Consent subpopulation     Object subpopulation
70                 0.84                      0.88
80                 0.86                      0.89
90                 0.88                      0.92
99                 0.93                      0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the "consent" subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95 percent confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple "consent-to-link" question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the "Consent" Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods, joint modeling of consent-to-link and item response probabilities, approximate optimization of design decisions related to potentially burdensome questions, and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers, which found that consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the "consent" and "refusal" cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, the propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), "Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions," Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), "Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data," The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), "Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design," Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25 000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
———— (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
———— (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA, http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
———— (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A.1 Predictor Variables Examined

Each entry gives the variable followed by its description.

Respondent sociodemographics
Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender: 1 if male (reference group); 2 if female
Race: 0 if white (reference group); 1 if non-white
Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
Language: 1 if interview language was Spanish; 0 if English (reference group)
Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type: 0 if renter (reference group); 1 if owner
Family income: 1 if total family income before taxes is less than or equal to $8180; 2 if it exceeds $8180
Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Each assigned category is followed by the respondent “doorstep” concerns grouped under it.

Assigned category: Privacy/government concerns
Privacy concerns
Anti-government concerns

Assigned category: Too busy/logistical concerns
Too busy
Interview takes too much time
Breaks appointments (puts off FR indefinitely)
Not interested
Does not want to be bothered
Too many interviews
Scheduling difficulties
Family issues
Last interview took too long

Assigned category: Other reluctance
Survey is voluntary
Does not understand
Asks questions about survey
Survey content does not apply
Hang-up/slams door on FR
Hostile or threatens FR
Other HH members tell R not to participate
Talk only to specific household member
Respondent requests same FR as last time
Gave that information last time
Asked too many personal questions last time
Intends to quit survey
Other: specify

Assigned category: No Concerns
No concerns

A.3 Post-Survey Interviewer Questions
A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)
A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Columns: (1) weights from full sample; (2) unadjusted weights for sample units with consent; (3) propensity-adjusted weights for sample units with consent; (4) propensity-decile adjusted weights for sample units with consent; (5) propensity-quintile adjusted weights for sample units with consent.

                    (1)      (2)      (3)      (4)      (5)
n                   4893     3951     3915     3915     3915
1 + CV^2_wt         1.13     1.13     1.16     1.16     1.15
IQR/q0.5            0.875    0.877    0.858    0.853    0.851
q0.75/q0.25         1.357    1.359    1.448    1.463    1.468
q0.90/q0.10         1.970    1.966    2.148    2.178    2.124
q0.95/q0.05         2.699    2.665    3.028    2.961    2.898
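As a rough illustration of the summaries reported in table 6, the following Python sketch computes $1 + CV^2_{wt}$ and the quantile ratios for an arbitrary list of weights, and forms propensity-adjusted weights $w_i / p_i$. The function names, the linear-interpolation quantile rule, and the use of the population variance in $CV^2_{wt}$ are assumptions made for this sketch; the conventions actually used for table 6 are not stated, and the toy inputs are hypothetical rather than CEQ data.

```python
import statistics


def quantile(sorted_vals, q):
    """Quantile by linear interpolation between order statistics (assumed convention)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def propensity_adjusted(weights, propensities):
    """w_pi = w_i / p_i for consenting units, with p_i an estimated consent propensity."""
    return [w / p for w, p in zip(weights, propensities)]


def weight_variability(weights):
    """Summaries analogous to the rows of table 6."""
    w = sorted(weights)
    mean = statistics.fmean(w)
    # Squared coefficient of variation using the population variance, so that
    # 1 + CV^2 equals n * sum(w^2) / (sum(w))^2.
    cv2 = statistics.pvariance(w, mu=mean) / mean ** 2
    q = {p: quantile(w, p) for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)}
    return {
        "n": len(w),
        "1+CV2": 1.0 + cv2,
        "IQR/q0.5": (q[0.75] - q[0.25]) / q[0.50],
        "q0.75/q0.25": q[0.75] / q[0.25],
        "q0.90/q0.10": q[0.90] / q[0.10],
        "q0.95/q0.05": q[0.95] / q[0.05],
    }


# Hypothetical toy weights and propensities, not CEQ data.
base = [1.0, 2.0, 3.0, 4.0]
summary = weight_variability(base)
adjusted = propensity_adjusted(base, [0.5, 0.8, 0.8, 1.0])
adjusted_summary = weight_variability(adjusted)
```

Comparing `summary` with `adjusted_summary` mirrors the comparison between the unadjusted and propensity-adjusted columns of the table: if the adjusted ratios are only moderately larger, the adjustment adds little weight heterogeneity.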
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $b$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $b$ considered in table 3, let $\hat{b}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{b}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{b}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{b}_r - \hat{b})(\hat{b}_r - \hat{b})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
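The variance estimator above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `fay_brr_variance` and the toy replicate estimates are hypothetical, and the sketch takes the full-sample and replicate estimates as given rather than recomputing them from replicate weights.

```python
def fay_brr_variance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    full_est: list of p full-sample estimates.
    replicate_ests: list of R lists, each holding the p replicate estimates.
    Returns a p x p matrix equal to
    1 / (R * (1 - K)^2) * sum_r (b_r - b)(b_r - b)'.
    """
    n_reps = len(replicate_ests)
    p = len(full_est)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_ests:
        dev = [rep[j] - full_est[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * dev[a] * dev[b]
    return cov


# Hypothetical toy inputs with R = 4 replicates (the paper uses R = 44).
full = [1.0, 2.0]
reps = [[1.5, 2.0], [0.5, 2.0], [1.0, 3.0], [1.0, 1.0]]
cov = fay_brr_variance(full, reps, fay_k=0.5)
std_errors = [cov[j][j] ** 0.5 for j in range(len(full))]
```

With $R = 44$ and $K = 0.5$ the leading factor is $1 / (44 \times 0.25) = 1/11$; standard errors are the square roots of the diagonal entries.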
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2 based on the F-adjusted mean residual test statistic $Q_{Fadj}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$ (that is, $F(9, 35)$).

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Entries are Estimate (SE), listed in column order Model 1, Model 2, Model A, Model B, Model C, Model D, Model E; a row with fewer than seven entries covers only the models that include that predictor. Reference categories are shown in parentheses.

Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate’s degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor’s degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advance degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate’s degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor’s degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advance degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)

Environmental features
Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies
Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov’t concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal Visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: †p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
then explore an approach for examining potential consent bias by comparing full-sample, consent-only, and propensity-weight-adjusted expenditure estimates.
This study used CEQ data from April 2011 through March 2012. During that period, respondents who completed their fifth and final CEQ interview were administered the following data-linkage consent item: “We’d like to produce additional statistical data without taking up your time with more questions by combining your survey answers with data from other government agencies. Do you have any objections?” Of the 5037 respondents who were asked this question, 78.4 percent had no objections, 18.7 percent objected, and another 2.8 percent gave a “don’t know” response or were item nonrespondents on this item. We restrict our analyses to a dichotomous outcome indicator for the 4893 respondents who consented or explicitly objected to the linkage request.
3.2 Consent Propensity Models

We develop logistic regression models that estimate sample members’ propensity to consent to the CEQ linkage request; the dependent variable takes a value of 1 if respondents consent to linkage and a value of 0 if they do not consent. Model specifications were developed through fairly extensive exploratory analyses (e.g., examinations of descriptive statistics, theory-based bivariate logistic regressions, and stepwise logistic regressions). Some results from these exploratory analyses are provided in the online Appendix D.
To account for the complex stratified sampling design of CEQ, the analyses were conducted with the SAS surveylogistic procedure. All point estimates reported in this paper are based on standard complex design weights, and all standard errors are based on balanced repeated replication (BRR) using 44 replicate weights with a Fay factor K = 0.5. In addition, we used F-adjusted Wald statistics to evaluate goodness-of-fit (GOF) for our models (table 7 in Appendix C).
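To make the point-estimation step concrete, the sketch below fits a design-weighted logistic regression by Newton-Raphson in plain Python. It is an illustration only, not the SAS surveylogistic procedure: the function names, the fixed iteration count, and the toy data are assumptions, and the sketch produces point estimates only (standard errors would come from the BRR replicates described above).

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_weighted_logit(X, y, w, n_iter=25):
    """Weighted logistic regression; each row of X starts with 1.0 for the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi, wi in zip(X, y, w):
            eta = sum(b * x for b, x in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for a in range(p):
                grad[a] += wi * (yi - mu) * xi[a]
                for b in range(p):
                    hess[a][b] += wi * mu * (1.0 - mu) * xi[a] * xi[b]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def consent_propensity(beta, xi):
    """Estimated probability of consent for one covariate row."""
    eta = sum(b * x for b, x in zip(beta, xi))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical toy data: y = 1 if the respondent consents, one binary predictor.
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
y = [1, 1, 1, 0, 1, 0, 0, 0]
w = [1.0] * 8
beta = fit_weighted_logit(X, y, w)
p0 = consent_propensity(beta, [1.0, 0.0])
p1 = consent_propensity(beta, [1.0, 1.0])
```

With a single binary predictor the fitted propensities reproduce the weighted consent rates within each group, which gives a simple check on the fitting routine.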
3.3 Using Propensity Adjustments to Explore Reductions in Consent Bias

To examine potential consent bias in our data, we focus on six CEQ variables that (a) are from sensitive or burdensome sections of the survey and (b) could potentially be substituted with information available in administrative records. These variables were before-tax income, before-tax income with imputed values, property tax, vehicle purchase amount, property value, and rental value. For these exploratory analyses, we treat the reported CEQ values as “truth” and compare estimates from the full sample to those derived from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only, to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
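The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy values are hypothetical, the estimand is a weighted mean, and the consent propensities are taken as already estimated (for example, from a model such as model 2).

```python
def weighted_mean(values, weights):
    """Survey-weighted mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def consent_bias_comparison(y, w, consented, p_hat):
    """Compare full-sample, consenter-only, and propensity-adjusted means.

    y: reported values (treated as "truth"); w: design weights;
    consented: booleans; p_hat: estimated consent propensities.
    """
    full = weighted_mean(y, w)
    idx = [i for i, c in enumerate(consented) if c]
    y_c = [y[i] for i in idx]
    unadjusted = weighted_mean(y_c, [w[i] for i in idx])
    # Inverse-propensity adjustment: each consenter is weighted by w_i / p_i.
    adjusted = weighted_mean(y_c, [w[i] / p_hat[i] for i in idx])
    return {
        "full_sample": full,
        "bias_unadjusted": unadjusted - full,
        "bias_adjusted": adjusted - full,
    }


# Hypothetical toy data, not CEQ values.
result = consent_bias_comparison(
    y=[10.0, 10.0, 20.0, 20.0],
    w=[1.0, 1.0, 1.0, 1.0],
    consented=[True, True, True, False],
    p_hat=[1.0, 1.0, 0.5, 0.5],
)
```

In this toy case the consenter-only mean understates the full-sample mean, and the inverse-propensity weights remove that difference because the propensities match the true consent rates within each group. With estimated propensities the reduction in bias would generally be partial.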
4. RESULTS

4.1 Consent-to-Linkage Predictors

We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample, and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11218 versus $11511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8180 versus 17.4% of consenters) but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older versus 19.9% for consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally table 3 presents results from two logistic regression models forthe propensity to consent to linkage Based on results from the exploratoryanalyses reported in the online Appendix D model 1 includes several clas-ses of predictors including consumer-unit demographics proxies for re-spondent attitudes and several related two-factor interactions Model 2 isidentical to model 1 except for exclusion of low-income and imputed-income indicator variables Consequently model 2 should be consideredfor sensitivity analyses of consent-propensity-adjusted income estimatorsin section 42 below Note especially that relative to the correspondingstandard errors the estimated coefficients for models 1 and 2 are verysimilar
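To make the idea of a weighted consent-propensity model concrete, the following is a minimal sketch in plain Python, not the software or specification used for table 3. It fits a logistic model by Newton-Raphson on the weighted pseudo-log-likelihood. The function names (`solve`, `fit_weighted_logit`, `propensity`), the single phone-interview indicator, and the toy data are all illustrative assumptions; design-based standard errors, which the reported models require, are omitted.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def propensity(beta, xi):
    """Estimated consent propensity for one covariate row xi (first entry is the intercept)."""
    return 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))


def fit_weighted_logit(X, y, w, iterations=25):
    """Newton-Raphson on the weighted pseudo-log-likelihood of a logistic model."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(X, y, w):
            p = propensity(beta, xi)
            for a in range(k):
                grad[a] += wi * (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


if __name__ == "__main__":
    # Toy data (hypothetical): columns are [intercept, phone interview indicator].
    X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 2
    y = [1, 1, 1, 0, 1, 0]                  # 1 = consented to linkage
    w = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]      # survey weights
    beta = fit_weighted_logit(X, y, w)
    print("in-person:", round(propensity(beta, [1.0, 0.0]), 3))
    print("phone:    ", round(propensity(beta, [1.0, 1.0]), 3))
```

With an intercept and one binary predictor the model is saturated, so the fitted propensities reproduce the weighted consent rates in each group, which gives a simple check on the fitting routine.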
4.2 Analysis of Economic Variables Subject to “Informed Consent” Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                               All respondents    Consenters         Non-consenters
                                        (n = 4,893)        (n = 3,951)        (n = 942)
Privacy/anti-government concerns        6.9 (0.8)          4.2 (0.6)          18.1 (2.2)
Reluctance
  Converted refusal                     10.8 (0.7)         9.6 (0.8)          16.0 (1.5)
  Income imputation required            44.7 (1.0)         41.8 (1.1)         57.0 (2.1)
  Too busy                              8.1 (0.5)          6.9 (0.6)          12.7 (1.3)
Respondent effort
  A lot of effort                       34.9 (0.9)         36.7 (1.0)         27.3 (1.9)
  Moderate effort                       37.2 (1.1)         37.4 (1.3)         36.4 (1.8)
  Bare minimum effort                   27.9 (1.1)         25.9 (1.2)         36.3 (2.3)
Respondent cooperation
  Very cooperative                      50.8 (1.0)         53.5 (1.0)         39.4 (2.6)
  Somewhat cooperative                  33.1 (0.8)         33.1 (0.9)         33.1 (1.3)
  Neither cooperative nor uncooperative 5.4 (0.5)          4.5 (0.4)          9.2 (1.1)
  Somewhat uncooperative                4.5 (0.3)          3.3 (0.3)          9.5 (1.1)
  Very uncooperative                    6.3 (0.4)          5.7 (0.5)          8.8 (0.9)
Rapport
  Face-to-face                          65.0 (1.4)         66.2 (1.5)         59.9 (1.9)
  Phone                                 35.0 (1.4)         33.8 (1.5)         40.1 (1.9)
Burden
  Total interview time                  64.9 (1.0)         65.3 (1.1)         63.3 (1.4)
  Household size                        2.42 (0.03)        2.42 (0.03)        2.42 (0.07)
  Total expenditures                    $11,454 ($148)     $11,511 ($145)     $11,218 ($384)
  Family income
    LT $8,180                           19.8 (0.8)         17.4 (0.9)         30.8 (1.9)
    $8,180–$24,000                      20.2 (0.6)         20.8 (0.9)         16.8 (1.4)
    $24,001–$46,000                     20.8 (0.6)         20.5 (0.7)         18.4 (1.3)
    $46,001–$85,855                     19.8 (0.6)         20.7 (0.7)         16.7 (1.3)
    GT $85,585                          19.4 (0.6)         20.6 (0.9)         17.3 (1.8)
  Owner                                 65.4 (0.8)         64.0 (0.9)         71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).
survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic                All respondents    Consenters       Non-consenters
                              (n = 4,893)        (n = 3,951)      (n = 942)
Age group
  18–32                       20.0 (0.9)         21.5 (1.1)       13.8 (1.0)
  32–65                       59.2 (0.9)         58.6 (1.2)       61.6 (1.5)
  65+                         20.8 (0.7)         19.9 (0.8)       24.6 (1.5)
Gender
  Male                        46.3 (0.7)         46.8 (0.8)       44.3 (1.7)
  Female                      53.7 (0.7)         53.2 (0.8)       55.7 (1.7)
Race
  White                       83.1 (0.9)         83.0 (1.0)       83.1 (1.1)
  Non-white                   16.9 (0.9)         17.0 (1.0)       16.9 (1.1)
Education group
  Less than HS                13.5 (0.6)         13.4 (0.7)       14.0 (1.6)
  HS graduate                 24.7 (0.8)         24.5 (0.9)       25.3 (1.8)
  Some college                21.4 (0.7)         21.2 (0.9)       22.2 (1.9)
  Associates degree           10.2 (0.4)         10.0 (0.5)       10.9 (1.1)
  Bachelor’s degree           19.1 (0.5)         19.4 (0.6)       17.9 (1.4)
  Advance degree              11.1 (0.5)         11.5 (0.6)       9.7 (0.9)
Spanish language interview
  Yes                         4.1 (0.5)          3.7 (0.5)        5.4 (1.5)
  No                          95.9 (0.5)         96.3 (0.5)       94.6 (1.5)
Region
  Northeast                   18.5 (0.6)         17.6 (0.7)       22.2 (1.7)
  Midwest                     23.5 (1.3)         24.5 (1.4)       19.2 (2.3)
  South                       35.0 (1.3)         35.9 (1.5)       31.5 (2.7)
  West                        23.0 (1.5)         22.0 (1.7)       27.1 (1.8)
Urbanicity
  Urban                       80.5 (0.9)         80.5 (1.2)       80.7 (2.0)
  Rural                       19.5 (0.9)         19.5 (1.2)       19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

                                     Model 1                Model 2
Variable                             Estimate    SE         Estimate    SE
Demographic characteristics
  Age group (32–65)
    18–32                            0.3435      0.0633     0.3389      0.0634
    65+                              0.2692      0.0645     0.2532      0.0656
  Gender (Male)
    Female                           0.0161      0.0426     0.0229      0.0421
  Race (White)
    Non-white                        0.0949      0.1264     0.0612      0.1260
  Spanish interview (No)
    Yes                              0.4234      0.2923     0.4452      0.2790
  Education group (HS grad)
    Less than HS                     0.3097      0.1879     0.2926      0.1835
    Some college                     0.4016      0.1423     0.3739      0.1389
    Associate’s degree               0.3321      0.2041     0.3150      0.1993
    Bachelor’s degree                0.0654      0.2112     0.0663      0.2082
    Advance degree                   0.1747      0.2762     0.1512      0.2773
  Home owner (Renter)
    Owner                            0.2767†     0.1453     0.3233      0.1436
  Total expenditures (Log)           0.0605      0.0799     0.0083      0.0800
  Income group
    Less than $8,181                 0.2155      0.0604
  Income imputed (No)
    Yes                              0.1487      0.0513
  Race × Gender                      0.1874†     0.0956     0.1953      0.0923
  Owner × Education
    Less than HS                     0.4931      0.2066     0.4632      0.1996
    Some college                     0.6575      0.1567     0.6370      0.1613
    Associate’s degree               0.3407†     0.2013     0.3402†     0.1986
    Bachelor’s degree                0.1385      0.2402     0.1398      0.2368
    Advance degree                   0.4713      0.2847     0.4226      0.2870
Environmental features
  Region (Northeast)
    Midwest                          0.2097†     0.1234     0.2111†     0.1163
    South                            0.2451      0.1190     0.2408      0.1187
    West                             0.2670      0.1055     0.2557      0.1066
  Urbanicity (Rural)
    Urban                            0.0268      0.0829     0.0234      0.0824
R attitude proxies
  Converted refusal (No)
    Yes                              0.0721      0.0756     0.0838      0.0763

Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted).²
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / p_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{\mathrm{decile}}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / p_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{\mathrm{quintile}}$ is the unweighted propensity of “consent” units in the specified quintile group).
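The estimator definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' production code: the function names are hypothetical, the data in the demonstration are simulated, decile and quintile classes are formed by sorting all sample units on their estimated propensity, and the class-level propensity is read as the unweighted observed consent rate within the class (one plausible reading of the definition). Variance estimation is not covered.

```python
import random


def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def class_adjusted_weights(w, p, consent, n_classes):
    """Divide each consenter's weight by the unweighted consent rate of its propensity class.

    Classes are equal-sized groups of units sorted by estimated propensity (an assumption).
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    adjusted = [0.0] * len(p)
    for k in range(n_classes):
        members = order[k * len(order) // n_classes:(k + 1) * len(order) // n_classes]
        if not members:
            continue  # can happen only when there are fewer units than classes
        rate = sum(consent[i] for i in members) / len(members)
        for i in members:
            if consent[i]:
                adjusted[i] = w[i] / rate
    return adjusted


def consent_estimators(y, w, p, consent):
    """Return the FS, CUNA, CUPA, CUPDA, and CUPQA mean estimates."""
    idx = [i for i in range(len(y)) if consent[i]]

    def consenter_mean(weights):
        return weighted_mean([y[i] for i in idx], [weights[i] for i in idx])

    return {
        "FS": weighted_mean(y, w),
        "CUNA": consenter_mean(w),
        "CUPA": consenter_mean([wi / pi for wi, pi in zip(w, p)]),
        "CUPDA": consenter_mean(class_adjusted_weights(w, p, consent, 10)),
        "CUPQA": consenter_mean(class_adjusted_weights(w, p, consent, 5)),
    }


if __name__ == "__main__":
    # Simulated data: consent and the outcome both depend on a covariate x.
    random.seed(1)
    n = 2000
    x = [random.random() for _ in range(n)]
    p = [0.6 + 0.35 * xi for xi in x]  # assumed consent propensities
    consent = [1 if random.random() < pi else 0 for pi in p]
    y = [30000 + 40000 * xi + random.gauss(0, 5000) for xi in x]
    w = [random.uniform(0.5, 1.5) for _ in range(n)]
    for name, value in consent_estimators(y, w, p, consent).items():
        print(f"{name}: {value:,.0f}")
```

Using class-level rates instead of unit-level propensities trades some bias reduction for protection against extreme weights from very small estimated propensities, which is the usual motivation for subclassification.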
Table 3. Continued

                                     Model 1                Model 2
Variable                             Estimate    SE         Estimate    SE
  Effort (Moderate)
    A lot of effort                  0.5454      0.2081     0.6142      0.2065
    Bare minimum effort              0.4879      0.1365     0.5721      0.1371
  Doorstep concerns (None)
    Too busy                         0.2207      0.1485     0.2137      0.1466
    Privacy/gov’t concerns           1.1895      0.1667     1.2241      0.1622
    Other                            1.0547      0.3261     1.0255      0.3255
  Doorstep concerns × effort
    Privacy × a lot of effort        0.5884      0.2789     0.5312†     0.2728
    Privacy × minimum effort         0.3183      0.1918     0.2601      0.1924
    Busy × a lot of effort           0.0888      0.2736     0.0890      0.2749
    Busy × minimum effort            0.2238      0.2096     0.2277      0.2095
    Other × a lot of effort          1.0794†     0.6224     1.0477      0.6182
    Other × minimum effort           0.9002      0.3610     0.8777      0.3593

NOTE: †p < 0.10; p < 0.05; p < 0.01; p < 0.001.
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $\hat{p}_i$ for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
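As a rough gauge of the comparisons just described, one can divide each reported difference by its standard error. The sketch below does this for the family income before taxes row, using the differences and standard errors printed in table 4. The normal reference distribution used for the two-sided p-value is an assumption made here for illustration; the reported significance markers may rest on design-based degrees of freedom, and the function name is hypothetical.

```python
import math

# (difference, SE of difference) for family income before taxes, transcribed from table 4.
FAMILY_INCOME_DIFFS = {
    "CUNA-FS": (1930.00, 580.98),
    "CUPA-FS": (1178.00, 619.09),
    "CUPDA-FS": (1330.00, 602.64),
    "CUPQA-FS": (1466.00, 606.76),
}


def ratio_and_p(diff, se):
    """Return the difference-to-SE ratio and a two-sided normal-approximation p-value."""
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    for label, (diff, se) in FAMILY_INCOME_DIFFS.items():
        z, p = ratio_and_p(diff, se)
        print(f"{label}: ratio = {z:.2f}, approx. p = {p:.4f}")
```

Under this approximation the unadjusted difference (CUNA–FS) is more than three standard errors from zero, while the propensity-adjusted difference (CUPA–FS) falls just under two, in line with the description of the adjusted difference as only marginally significant.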
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

Means (SE)

Column 1                          2               3               4               5               6
                                  Full-sample     Consenting      Consenting      Consenting      Consenting
                                  (FS)            units, no       units,          units,          units,
                                                  adjustment      propensity-     propensity-     propensity-
                                                  (CUNA)          adjusted        decile          quintile
                                                                  (CUPA)          adjusted        adjusted
                                                                                  (CUPDA)         (CUPQA)
Family income before taxes        $50,939.00      $52,869.00      $52,117.00      $52,269.00      $52,405.00
                                  ($1,227.51)     ($1,535.04)     ($1,523.85)     ($1,520.74)     ($1,514.30)
Family income before taxes        $61,207.00      $61,405.00      $61,228.00      $61,337.00      $61,382.00
  with imputed values             ($1,193.34)     ($1,510.55)     ($1,454.39)     ($1,463.34)     ($1,466.25)
Vehicle purchase cost             $599.59         $619.14         $607.80         $607.63         $611.75
                                  ($33.22)        ($37.05)        ($36.63)        ($36.67)        ($37.61)
Property taxes                    $454.15         $429.12         $434.74         $434.88         $436.75
                                  ($10.41)        ($10.76)        ($11.39)        ($11.37)        ($11.30)
Property value                    $232,226.00     $227,734.00     $227,938.00     $227,935.00     $228,640.00
                                  ($5,055.68)     ($5,730.38)     ($5,592.27)     ($5,532.64)     ($5,519.90)
Rental value                      $1,296.52       $1,269.65       $1,272.40       $1,272.91       $1,274.67
                                  ($15.58)        ($17.05)        ($17.81)        ($17.86)        ($17.68)

Point estimate differences (SE diff)

Column 1                          7               8               9               10
                                  CUNA–FS         CUPA–FS         CUPDA–FS        CUPQA–FS
Family income before taxes        $1,930.00       $1,178.00†      $1,330.00       $1,466.00
                                  ($580.98)       ($619.09)       ($602.64)       ($606.76)
Family income before taxes        $198.00         $21.00          $130.00         $175.00
  with imputed values             ($573.46)       ($563.54)       ($557.10)       ($552.20)
Vehicle purchase cost             $19.55†         $8.21           $8.04           $12.17
                                  ($10.83)        ($15.34)        ($15.23)        ($16.13)
Property taxes                    −$25.02         −$19.41         −$19.27         −$17.40
                                  ($6.08)         ($6.48)         ($6.05)         ($6.23)
Property value                    −$4,492.00      −$4,288.00†     −$4,291.00      −$3,586.00
                                  ($2,118.39)     ($2,165.93)     ($2,132.40)     ($2,063.66)
Rental value                      −$26.87         −$24.12         −$23.61         −$21.85
                                  ($7.43)         ($8.18)         ($8.01)         ($8.02)

NOTE: †p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
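The quantile-quantile comparison described above can be sketched with the Python standard library. This is a simplified illustration: the quantiles here are unweighted, whereas the plotted estimates are presumably survey-weighted, and the propensity scores in the demonstration are simulated rather than taken from model 2. The function names are hypothetical.

```python
import random
from statistics import quantiles


def qq_points(object_scores, consent_scores):
    """Return 99 (x, y) pairs: the 0.01 to 0.99 quantiles of each group's propensity scores."""
    xs = quantiles(object_scores, n=100)   # horizontal axis: "object" subpopulation
    ys = quantiles(consent_scores, n=100)  # vertical axis: "consent" subpopulation
    return list(zip(xs, ys))


def max_departure(points):
    """Largest vertical distance of any plotted point from the 45-degree reference line."""
    return max(abs(y - x) for x, y in points)


if __name__ == "__main__":
    random.seed(7)
    consent_scores = [random.betavariate(8, 2) for _ in range(3000)]  # simulated scores
    object_scores = [random.betavariate(7, 2) for _ in range(800)]    # simulated scores
    pts = qq_points(object_scores, consent_scores)
    print(len(pts), "points; max departure from diagonal:", round(max_departure(pts), 3))
```

Points lying on the diagonal correspond to identical score distributions in the two groups, so a departure near zero signals little discriminating power.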
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.

Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

                    Estimated propensity of consent
Percentile (%)      Consent subpopulation     Object subpopulation
70                  0.84                      0.88
80                  0.86                      0.89
90                  0.88                      0.92
99                  0.93                      0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
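A minimal sketch of the quantile-difference calculation follows, assuming a simple weighted empirical quantile (the smallest value whose cumulative weight share reaches q) and propensity-adjusted weights of the form w_i / p_i for consenting units. The function names and simulated data are illustrative assumptions, and the pointwise confidence intervals, which would require a design-based variance estimator, are not computed here.

```python
import random


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative share of total weight is at least q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]


def quantile_differences(y, w, p, consent):
    """Differences (adjusted consenters minus full sample) for q = 0.01, ..., 0.99."""
    cy = [yi for yi, c in zip(y, consent) if c]
    cw = [wi / pi for wi, pi, c in zip(w, p, consent) if c]  # propensity-adjusted weights
    out = []
    for k in range(1, 100):
        q = k / 100
        out.append((q, weighted_quantile(cy, cw, q) - weighted_quantile(y, w, q)))
    return out


if __name__ == "__main__":
    # Simulated right-skewed "income" values; consent is unrelated to income here.
    random.seed(3)
    n = 5000
    y = [random.lognormvariate(10.5, 0.8) for _ in range(n)]
    w = [random.uniform(0.5, 1.5) for _ in range(n)]
    p = [0.8] * n
    consent = [1 if random.random() < pi else 0 for pi in p]
    diffs = quantile_differences(y, w, p, consent)
    for q, d in (diffs[9], diffs[49], diffs[89], diffs[98]):
        print(f"q = {q:.2f}: difference = {d:,.0f}")
```

In simulated right-skewed data of this kind, differences in the upper quantiles tend to be noisier than those near the median, which mirrors the widening confidence intervals described above.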
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent) as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Exploratory Assessment of Consent-to-Link 145
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions
A.1 Predictor Variables Examined

Variable | Description

Respondent sociodemographics
Age | 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender | 1 if male (reference group); 2 if female
Race | 0 if white (reference group); 1 if non-white
Education | 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s
Language | 1 if interview language was Spanish; 0 if English (reference group)
Household size | 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type | 0 if renter (reference group); 1 if owner
Family income | 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
Total spending (log) | Log of total expenditures reported for all major expenditure categories

Survey features
Mode | 1 if telephone; 0 if personal visit (reference group)

Area characteristics
Region | 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status | 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
Converted refusal status | 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration | 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation | 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort | 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change | 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet | 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group)
Doorstep concerns | 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Respondent “doorstep” concerns | Assigned category
Privacy concerns | Privacy/government concerns
Anti-government concerns | Privacy/government concerns
Too busy | Too busy/logistical concerns
Interview takes too much time | Too busy/logistical concerns
Breaks appointments (puts off FR indefinitely) | Too busy/logistical concerns
Not interested | Too busy/logistical concerns
Does not want to be bothered | Too busy/logistical concerns
Too many interviews | Too busy/logistical concerns
Scheduling difficulties | Too busy/logistical concerns
Family issues | Too busy/logistical concerns
Last interview took too long | Too busy/logistical concerns
Survey is voluntary | Other reluctance
Does not understand | Other reluctance
Asks questions about survey | Other reluctance
Survey content does not apply | Other reluctance
Hang-up/slams door on FR | Other reluctance
Hostile or threatens FR | Other reluctance
Other HH members tell R not to participate | Other reluctance
Talk only to specific household member | Other reluctance
Respondent requests same FR as last time | Other reluctance
Gave that information last time | Other reluctance
Asked too many personal questions last time | Other reluctance
Intends to quit survey | Other reluctance
Other (specify) | Other reluctance
No concerns | No concerns

A.3 Post-Survey Interviewer Questions
A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)
A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Statistic | Weights from full sample | Unadjusted weights for sample units with consent | Propensity-adjusted weights for sample units with consent | Propensity-decile adjusted weights for sample units with consent | Propensity-quintile adjusted weights for sample units with consent
n | 4893 | 3951 | 3915 | 3915 | 3915
$1 + CV^2_{wt}$ | 1.13 | 1.13 | 1.16 | 1.16 | 1.15
$IQR / q_{0.5}$ | 0.875 | 0.877 | 0.858 | 0.853 | 0.851
$q_{0.75} / q_{0.25}$ | 1.357 | 1.359 | 1.448 | 1.463 | 1.468
$q_{0.90} / q_{0.10}$ | 1.970 | 1.966 | 2.148 | 2.178 | 2.124
$q_{0.95} / q_{0.05}$ | 2.699 | 2.665 | 3.028 | 2.961 | 2.898
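The summaries in table 6 can be illustrated with a minimal Python sketch. It assumes the weights are available as a plain list of positive numbers; the function name `weight_heterogeneity`, the use of a divide-by-n variance, and the "inclusive" percentile rule are illustrative choices, not details taken from the paper.

```python
import statistics


def weight_heterogeneity(weights):
    """Return 1 + CV^2 and quantile-ratio summaries for positive weights."""
    n = len(weights)
    mean = sum(weights) / n
    # Assumption: variance with divisor n; the paper does not state the divisor.
    var = sum((w - mean) ** 2 for w in weights) / n
    # statistics.quantiles(..., n=100) returns the 99 cut points
    # for the 1st through 99th percentiles.
    pct = statistics.quantiles(weights, n=100, method="inclusive")

    def q(p):
        return pct[p - 1]

    return {
        "1+CV^2": 1.0 + var / mean ** 2,
        "IQR/median": (q(75) - q(25)) / q(50),
        "q75/q25": q(75) / q(25),
        "q90/q10": q(90) / q(10),
        "q95/q05": q(95) / q(5),
    }


def propensity_adjusted(weights, propensities):
    """Propensity-adjusted weights w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]
```

Comparing `weight_heterogeneity(w)` with `weight_heterogeneity(propensity_adjusted(w, p))` for the consenting units gives the kind of side-by-side comparison reported in the columns of table 6.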
Appendix C. Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
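The variance estimator above can be sketched in a few lines of Python. This is an illustration only: it assumes the full-sample estimate and the replicate estimates have already been computed elsewhere and are passed in as plain lists, and the function name `fay_brr_variance` is hypothetical.

```python
def fay_brr_variance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    full_est:       list of p coefficients from the full-sample weights
    replicate_ests: list of R lists, one per set of replicate weights
    fay_k:          Fay factor K (0.5 in the paper)
    """
    R = len(replicate_ests)
    p = len(full_est)
    scale = 1.0 / (R * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_ests:
        # Deviation of the replicate estimate from the full-sample estimate.
        d = [rep[j] - full_est[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * d[a] * d[b]
    return cov
```

Standard errors are the square roots of the diagonal entries. With $K = 0.5$ the scale factor reduces to $4/R$, which is why the Fay adjustment appears as a simple rescaling of the summed outer products.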
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model | F-adjusted goodness-of-fit test p-value
1 | 0.292
2 | 0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
[Rotated table; the cell layout is not recoverable. Columns: Variable, followed by Estimate and SE for each of Models 1, 2, A, B, C, D, and E. Row groups: demographic characteristics (age group, gender, race, Spanish interview, education group, home owner, total expenditures (log), income group, income imputed, race × gender, owner × education); environmental features (region, urbanicity); R attitude proxies (converted refusal, effort, doorstep concerns, doorstep concerns × effort); rapport (personal visit versus phone). The Model 1 and Model 2 columns repeat the estimates shown in table 3.]
NOTE: †p < 0.10, p < 0.05, p < 0.01, p < 0.001
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic, Model of Consent/Object Comparison

Model (of consent), 2nd revision | F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1 | 0.292 (1.262)
2 | 0.810 (0.573)
A | 0.336 (1.183)
B | 0.929 (0.396)
C | 0.061 (2.061)
D | 0.022 (2.576)
E | 0.968 (0.305)
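As a rough cross-check of how a test statistic in table 9 maps to a p-value under the $F(g-1, f-g+2)$ reference distribution, the following Python sketch evaluates the upper tail of a central F distribution by numerical integration. It is not the svylogitgof implementation; the function names are hypothetical, and the Simpson-rule integration assumes both degrees of freedom exceed 2 (true for $F(9, 35)$).

```python
import math


def gof_degrees_of_freedom(f, g):
    """Degrees of freedom (g - 1, f - g + 2) of the reference F distribution."""
    return g - 1, f - g + 2


def f_upper_tail(x, d1, d2, steps=2000):
    """Approximate P(F > x) for a central F(d1, d2) variable, d1, d2 > 2.

    Uses P(F > x) = I_z(d2/2, d1/2) with z = d2 / (d2 + d1 * x), where I_z is
    the regularized incomplete beta function, integrated by Simpson's rule.
    """
    if x <= 0:
        return 1.0
    a, b = d2 / 2.0, d1 / 2.0
    z = d2 / (d2 + d1 * x)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        # Beta(a, b) density; zero at the endpoints because a, b > 1.
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    h = z / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0
```

For example, `f_upper_tail(1.262, *gof_degrees_of_freedom(43, 10))` gives the approximate p-value for the model 1 statistic, which can be compared with the corresponding entry in table 9.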
from consenters only. Despite the limitations of this approach (e.g., there is almost certainly some amount of measurement error in reported values), it provides a means of examining consent bias indirectly, without incurring the costs and production disruptions of fully matching survey responses with administrative data.
We apply propensity adjustments to estimates for these variables based on the estimated consent propensity scores calculated for each respondent (weighting each respondent by the inverse of their consent propensity; see additional details in section 4.2). We then compare full-sample estimates to those from the weight-adjusted consenters only, to explore the extent to which adjustment techniques can reduce consent bias. This procedure is analogous to propensity-adjustment weighting methods commonly used to reduce other sources of bias in sample surveys (nonresponse, coverage) (e.g., Groves, Dillman, Eltinge, and Little 2002). For this analysis, we used propensity model 2 for two reasons. First, in keeping with the general approach to propensity modeling for incomplete data (e.g., Rosenbaum and Rubin 1983), we especially wished to condition on predictor variables that may potentially be associated with both the consent decision and the underlying economic variables of interest. Because one generally would seek to use the same propensity model for adjusted estimation for each economic variable of interest, and different economic variables may display different patterns of association with the candidate predictors, one is inclined to be relatively inclusive in the choice of predictor variables in the propensity model. Second, an important exception to this general approach is that one would not wish to include predictor variables that depend directly on reported income variables; consequently, for the propensity-weighting work, we used model 2 (which excludes the low-income and imputed-income indicators used in model 1).
4. RESULTS
4.1 Consent-to-Linkage Predictors
We begin by examining indicators of the proposed mechanisms of consent. Table 1 shows the weighted percentages (means) and standard errors for each indicator for the full sample, and separately for consenters and non-consenters.
We find evidence that privacy concerns are more prevalent among respondents who objected to the linkage request than among those who gave their consent (18.1% versus 4.2%), consistent with our hypothesis. There is also support at the bivariate level for a reluctance mechanism: non-consenters were significantly more likely than those who consented to the record linkage request to require refusal conversions (16.0% versus 9.6%) and income imputation (57.0% versus 41.8%), to express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180, versus 17.4% of consenters), but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older, versus 19.9% for consenters) and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1, except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
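To make the idea of a weighted consent-propensity model concrete, the following Python sketch fits a logistic regression with an intercept and a single covariate by weighted Newton-Raphson. It is a deliberately reduced illustration, not the models in table 3: the paper's models use many predictors, interactions, and replicate-based standard errors, and the function names here are hypothetical.

```python
import math


def fit_weighted_logit(x, y, w, iters=50):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by weighted Newton-Raphson.

    x: covariate values, y: 1 for consent and 0 for objection, w: survey weights.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)          # weighted score contribution
            v = wi * p * (1.0 - p)     # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict_propensity(b0, b1, xi):
    """Estimated consent propensity for covariate value xi."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
```

With a single binary covariate, the fitted propensities reproduce the weighted consent rates within each group, which is a convenient sanity check before moving to a richer predictor set.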
4.2 Analysis of Economic Variables Subject to “Informed Consent” Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator | All respondents (n = 4,893) | Consenters (n = 3,951) | Non-consenters (n = 942)
Privacy/anti-government concerns | 6.9 (0.8) | 4.2 (0.6) | 18.1 (2.2)
Reluctance
  Converted refusal | 10.8 (0.7) | 9.6 (0.8) | 16.0 (1.5)
  Income imputation required | 44.7 (1.0) | 41.8 (1.1) | 57.0 (2.1)
  Too busy | 8.1 (0.5) | 6.9 (0.6) | 12.7 (1.3)
  Respondent effort
    A lot of effort | 34.9 (0.9) | 36.7 (1.0) | 27.3 (1.9)
    Moderate effort | 37.2 (1.1) | 37.4 (1.3) | 36.4 (1.8)
    Bare minimum effort | 27.9 (1.1) | 25.9 (1.2) | 36.3 (2.3)
  Respondent cooperation
    Very cooperative | 50.8 (1.0) | 53.5 (1.0) | 39.4 (2.6)
    Somewhat cooperative | 33.1 (0.8) | 33.1 (0.9) | 33.1 (1.3)
    Neither cooperative nor uncooperative | 5.4 (0.5) | 4.5 (0.4) | 9.2 (1.1)
    Somewhat uncooperative | 4.5 (0.3) | 3.3 (0.3) | 9.5 (1.1)
    Very uncooperative | 6.3 (0.4) | 5.7 (0.5) | 8.8 (0.9)
Rapport
  Face-to-face | 65.0 (1.4) | 66.2 (1.5) | 59.9 (1.9)
  Phone | 35.0 (1.4) | 33.8 (1.5) | 40.1 (1.9)
Burden
  Total interview time (minutes) | 64.9 (1.0) | 65.3 (1.1) | 63.3 (1.4)
  Household size | 2.42 (0.03) | 2.42 (0.03) | 2.42 (0.07)
  Total expenditures | $11,454 ($148) | $11,511 ($145) | $11,218 ($384)
  Family income
    LT $8,180 | 19.8 (0.8) | 17.4 (0.9) | 30.8 (1.9)
    $8,180–$24,000 | 20.2 (0.6) | 20.8 (0.9) | 16.8 (1.4)
    $24,001–$46,000 | 20.8 (0.6) | 20.5 (0.7) | 18.4 (1.3)
    $46,001–$85,855 | 19.8 (0.6) | 20.7 (0.7) | 16.7 (1.3)
    GT $85,585 | 19.4 (0.6) | 20.6 (0.9) | 17.3 (1.8)
  Owner | 65.4 (0.8) | 64.0 (0.9) | 71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).
survey questions. For each of these variables, we computed three prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic | All respondents (n = 4,893) | Consenters (n = 3,951) | Non-consenters (n = 942)
Age group
  18–32 | 20.0 (0.9) | 21.5 (1.1) | 13.8 (1.0)
  32–65 | 59.2 (0.9) | 58.6 (1.2) | 61.6 (1.5)
  65+ | 20.8 (0.7) | 19.9 (0.8) | 24.6 (1.5)
Gender
  Male | 46.3 (0.7) | 46.8 (0.8) | 44.3 (1.7)
  Female | 53.7 (0.7) | 53.2 (0.8) | 55.7 (1.7)
Race
  White | 83.1 (0.9) | 83.0 (1.0) | 83.1 (1.1)
  Non-white | 16.9 (0.9) | 17.0 (1.0) | 16.9 (1.1)
Education group
  Less than HS | 13.5 (0.6) | 13.4 (0.7) | 14.0 (1.6)
  HS graduate | 24.7 (0.8) | 24.5 (0.9) | 25.3 (1.8)
  Some college | 21.4 (0.7) | 21.2 (0.9) | 22.2 (1.9)
  Associate’s degree | 10.2 (0.4) | 10.0 (0.5) | 10.9 (1.1)
  Bachelor’s degree | 19.1 (0.5) | 19.4 (0.6) | 17.9 (1.4)
  Advance degree | 11.1 (0.5) | 11.5 (0.6) | 9.7 (0.9)
Spanish language interview
  Yes | 4.1 (0.5) | 3.7 (0.5) | 5.4 (1.5)
  No | 95.9 (0.5) | 96.3 (0.5) | 94.6 (1.5)
Region
  Northeast | 18.5 (0.6) | 17.6 (0.7) | 22.2 (1.7)
  Midwest | 23.5 (1.3) | 24.5 (1.4) | 19.2 (2.3)
  South | 35.0 (1.3) | 35.9 (1.5) | 31.5 (2.7)
  West | 23.0 (1.5) | 22.0 (1.7) | 27.1 (1.8)
Urbanicity
  Urban | 80.5 (0.9) | 80.5 (1.2) | 80.7 (2.0)
  Rural | 19.5 (0.9) | 19.5 (1.2) | 19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Variable | Model 1 Estimate | Model 1 SE | Model 2 Estimate | Model 2 SE
Demographic characteristics
Age group (32–65)
  18–32 | 0.3435 | 0.0633 | 0.3389 | 0.0634
  65+ | 0.2692 | 0.0645 | 0.2532 | 0.0656
Gender (Male)
  Female | 0.0161 | 0.0426 | 0.0229 | 0.0421
Race (White)
  Non-white | 0.0949 | 0.1264 | 0.0612 | 0.1260
Spanish interview (No)
  Yes | 0.4234 | 0.2923 | 0.4452 | 0.2790
Education group (HS grad)
  Less than HS | 0.3097 | 0.1879 | 0.2926 | 0.1835
  Some college | 0.4016 | 0.1423 | 0.3739 | 0.1389
  Associate’s degree | 0.3321 | 0.2041 | 0.3150 | 0.1993
  Bachelor’s degree | 0.0654 | 0.2112 | 0.0663 | 0.2082
  Advance degree | 0.1747 | 0.2762 | 0.1512 | 0.2773
Home owner (Renter)
  Owner | 0.2767† | 0.1453 | 0.3233 | 0.1436
Total expenditures (Log) | 0.0605 | 0.0799 | 0.0083 | 0.0800
Income group
  Less than $8181 | 0.2155 | 0.0604 | |
Income imputed (No)
  Yes | 0.1487 | 0.0513 | |
Race × Gender | 0.1874† | 0.0956 | 0.1953 | 0.0923
Owner × Education
  Less than HS | 0.4931 | 0.2066 | 0.4632 | 0.1996
  Some college | 0.6575 | 0.1567 | 0.6370 | 0.1613
  Associate’s degree | 0.3407† | 0.2013 | 0.3402† | 0.1986
  Bachelor’s degree | 0.1385 | 0.2402 | 0.1398 | 0.2368
  Advance degree | 0.4713 | 0.2847 | 0.4226 | 0.2870
Environmental features
Region (Northeast)
  Midwest | 0.2097† | 0.1234 | 0.2111† | 0.1163
  South | 0.2451 | 0.1190 | 0.2408 | 0.1187
  West | 0.2670 | 0.1055 | 0.2557 | 0.1066
Urbanicity (Rural)
  Urban | 0.0268 | 0.0829 | 0.0234 | 0.0824
R attitude proxies
Converted refusal (No)
  Yes | 0.0721 | 0.0756 | 0.0838 | 0.0763

Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / p_i$ (Consenting Units, Propensity-Adjusted; see footnote 2).
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / p_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{\mathrm{decile}}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / p_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{\mathrm{quintile}}$ is the unweighted propensity of “consent” units in the specified quintile group).
Table 3. Continued

Variable | Model 1 Estimate | Model 1 SE | Model 2 Estimate | Model 2 SE
Effort (Moderate)
  A lot of effort | 0.5454 | 0.2081 | 0.6142 | 0.2065
  Bare minimum effort | 0.4879 | 0.1365 | 0.5721 | 0.1371
Doorstep concerns (None)
  Too busy | 0.2207 | 0.1485 | 0.2137 | 0.1466
  Privacy/gov’t concerns | 1.1895 | 0.1667 | 1.2241 | 0.1622
  Other | 1.0547 | 0.3261 | 1.0255 | 0.3255
Doorstep concerns × effort
  Privacy × a lot of effort | 0.5884 | 0.2789 | 0.5312† | 0.2728
  Privacy × minimum effort | 0.3183 | 0.1918 | 0.2601 | 0.1924
  Busy × a lot of effort | 0.0888 | 0.2736 | 0.0890 | 0.2749
  Busy × minimum effort | 0.2238 | 0.2096 | 0.2277 | 0.2095
  Other × a lot of effort | 1.0794† | 0.6224 | 1.0477 | 0.6182
  Other × minimum effort | 0.9002 | 0.3610 | 0.8777 | 0.3593

NOTE: †p < 0.10, p < 0.05, p < 0.01, p < 0.001
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $p_i$ for all sample units $i$. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
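The five estimators (FS, CUNA, CUPA, CUPDA, CUPQA) can be sketched in Python as follows. This is an illustration under stated assumptions: the survey-reported outcome is available for every unit, the estimated propensities are supplied as inputs, classes are formed as equal-frequency groups of the sorted propensities, and the class-level adjustment factor is taken to be the unweighted consent rate within the class. The function names are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean of values with the given weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def class_rates(propensity, consent, n_classes):
    """Per-unit unweighted consent rate of the unit's propensity class.

    Units are sorted by estimated propensity and split into n_classes
    groups of (nearly) equal size.
    """
    n = len(propensity)
    order = sorted(range(n), key=lambda i: propensity[i])
    rate = [0.0] * n
    for k in range(n_classes):
        members = order[k * n // n_classes:(k + 1) * n // n_classes]
        if not members:
            continue
        r = sum(1 for i in members if consent[i]) / len(members)
        for i in members:
            rate[i] = r
    return rate


def five_estimators(y, w, consent, propensity):
    """Return FS, CUNA, CUPA, CUPDA, and CUPQA mean estimates."""
    idx = [i for i, c in enumerate(consent) if c]  # consenting units only
    yc = [y[i] for i in idx]
    dec = class_rates(propensity, consent, 10)
    qui = class_rates(propensity, consent, 5)
    return {
        "FS": weighted_mean(y, w),
        "CUNA": weighted_mean(yc, [w[i] for i in idx]),
        "CUPA": weighted_mean(yc, [w[i] / propensity[i] for i in idx]),
        "CUPDA": weighted_mean(yc, [w[i] / dec[i] for i in idx]),
        "CUPQA": weighted_mean(yc, [w[i] / qui[i] for i in idx]),
    }
```

The differences CUNA − FS, CUPA − FS, CUPDA − FS, and CUPQA − FS computed from this output correspond to the comparisons discussed for table 4. Every consenting unit belongs to a class with a positive consent rate, so the class-level weights are always defined.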
Table 4 presents mean estimates for the six CEQ outcome variables underthese five approaches (columns 2ndash6) The seventh column of the table exam-ines the difference between mean estimates based on the full-sample (FS) andthe unadjusted consenter group (CUNA) These results show fairly substantialdifferences between the two approaches for the mean estimates of incomeproperty-tax and rental value variables This suggests it would be important tocarry out adjustments to account for the approximately 20 of respondents whoobjected to linkage and were omitted from these consenter-based estimates
The eighth column of table 4 examines the impact of consent propensity weightadjustments on mean estimates comparing the full-sample to the adjusted-consenter group In general the propensity adjustments improve estimates (iemoving them closer to the full-sample means) For example the significant differ-ence we observed in mean family income between the full-sample and the unad-justed consenters (column 5) is now reduced and only marginally significantNonetheless it is evident that even after adjustment for the propensity to consentto link there still are strong indications of bias for estimation of the property-taxproperty-value and rental-value variables Consequently we would need to studyalternative approaches before considering this type of linkage in practice
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA-FS and CUPQA-FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA-FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
Table 4. Comparison of Three Estimation Methods

Columns 2–6 report means, with standard errors (SE) in parentheses on the following line. Columns 7–10 report point estimate differences, with the SE of the difference in parentheses.
  Column 2: Full sample (FS)
  Column 3: Consenting units, no adjustment (CUNA)
  Column 4: Consenting units, propensity-adjusted (CUPA)
  Column 5: Consenting units, propensity-decile adjusted (CUPDA)
  Column 6: Consenting units, propensity-quintile adjusted (CUPQA)
  Column 7: CUNA-FS
  Column 8: CUPA-FS
  Column 9: CUPDA-FS
  Column 10: CUPQA-FS

Variable (column 1)
  FS (2)         CUNA (3)       CUPA (4)       CUPDA (5)      CUPQA (6)      CUNA-FS (7)    CUPA-FS (8)    CUPDA-FS (9)   CUPQA-FS (10)

Family income before taxes
  $50,939.00     $52,869.00     $52,117.00     $52,269.00     $52,405.00     $1,930.00      $1,178.00†     $1,330.00      $1,466.00
  ($1,227.51)    ($1,535.04)    ($1,523.85)    ($1,520.74)    ($1,514.30)    ($580.98)      ($619.09)      ($602.64)      ($606.76)

Family income before taxes with imputed values
  $61,207.00     $61,405.00     $61,228.00     $61,337.00     $61,382.00     $198.00        $21.00         $130.00        $175.00
  ($1,193.34)    ($1,510.55)    ($1,454.39)    ($1,463.34)    ($1,466.25)    ($573.46)      ($563.54)      ($557.10)      ($552.20)

Vehicle purchase cost
  $599.59        $619.14        $607.80        $607.63        $611.75        $19.55†        $8.21          $8.04          $12.17
  ($33.22)       ($37.05)       ($36.63)       ($36.67)       ($37.61)       ($10.83)       ($15.34)       ($15.23)       ($16.13)

Property taxes
  $454.15        $429.12        $434.74        $434.88        $436.75        -$25.02        -$19.41        -$19.27        -$17.40
  ($10.41)       ($10.76)       ($11.39)       ($11.37)       ($11.30)       ($6.08)        ($6.48)        ($6.05)        ($6.23)

Property value
  $232,226.00    $227,734.00    $227,938.00    $227,935.00    $228,640.00    -$4,492.00     -$4,288.00†    -$4,291.00     -$3,586.00
  ($5,055.68)    ($5,730.38)    ($5,592.27)    ($5,532.64)    ($5,519.90)    ($2,118.39)    ($2,165.93)    ($2,132.40)    ($2,063.66)

Rental value
  $1,296.52      $1,269.65      $1,272.40      $1,272.91      $1,274.67      -$26.87        -$24.12        -$23.61        -$21.85
  ($15.58)       ($17.05)       ($17.81)       ($17.86)       ($17.68)       ($7.43)        ($8.18)        ($8.01)        ($8.02)

NOTE: † p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).

To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
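The quantile-quantile comparison described above can be sketched as follows. This is an illustrative sketch only: the step-function weighted quantile rule, the function names, and the toy propensity scores are assumptions made for the example, and the estimator used for figure 1 may differ in detail.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (step-function rule)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def qq_points(x_scores, x_weights, y_scores, y_weights):
    """Return 99 (x, y) pairs for the 0.01, 0.02, ..., 0.99 quantiles."""
    quantile_levels = [k / 100 for k in range(1, 100)]
    return [(weighted_quantile(x_scores, x_weights, q),
             weighted_quantile(y_scores, y_weights, q))
            for q in quantile_levels]


# Toy propensity scores and weights (hypothetical values, not CEQ data).
object_scores = [0.60 + 0.003 * k for k in range(100)]
consent_scores = [0.62 + 0.003 * k for k in range(100)]
toy_weights = [1.0 + (k % 2) for k in range(100)]

points = qq_points(object_scores, toy_weights, consent_scores, toy_weights)

# Points on the 45-degree line (x == y) indicate no discriminating power.
distance_from_diagonal = [y - x for x, y in points]
```

Plotting `points` against a reference line with slope 1 and intercept 0 gives the kind of display described for figure 1; the vertical distances in `distance_from_diagonal` summarize how far the two subpopulations' score distributions are apart at each quantile level.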
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.

Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

                  Estimated propensity of consent
Percentile (%)    Consent subpopulation    Object subpopulation
70                0.84                     0.88
80                0.86                     0.89
90                0.88                     0.92
99                0.93                     0.96

Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions, and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
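The widening of the intervals in the right tail can be motivated by a standard large-sample approximation. As a sketch under simple random sampling of size n (an assumption made here only for illustration; the analyses in this paper use a complex design with replicate-based standard errors), the sample q-th quantile $\hat{\xi}_q$ of a distribution with density f has approximate variance

```latex
\[
  \operatorname{Var}\bigl(\hat{\xi}_q\bigr) \;\approx\; \frac{q\,(1-q)}{n\,\bigl[f(\xi_q)\bigr]^{2}} .
\]
```

For a right-skewed variable such as income, the density $f(\xi_q)$ at upper quantiles is typically small, so the squared density in the denominator tends to fall faster than the numerator $q(1-q)$, and the approximate variance grows as q approaches 1.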
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.

Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.

5. DISCUSSION

5.1 Summary of Results

As noted in the introduction, legal and social environments often require that sampled survey units provide informed consent before a survey organization may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.

To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would integrate the abovementioned data sources, based on modeling of the propensity to consent, the propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests for Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting for How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

Each entry gives the variable name followed by its description.

Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
  Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Each assigned category is followed by the respondent “doorstep” concerns mapped to it.

Assigned category: Privacy/government concerns
  Privacy concerns
  Anti-government concerns

Assigned category: Too busy/logistical concerns
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested
  Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Assigned category: Other reluctance
  Survey is voluntary
  Does not understand
  Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other (specify)

Assigned category: No concerns
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit i will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Columns:
  (1) Weights from full sample
  (2) Unadjusted weights for sample units with consent
  (3) Propensity-adjusted weights for sample units with consent
  (4) Propensity-decile adjusted weights for sample units with consent
  (5) Propensity-quintile adjusted weights for sample units with consent

Statistic         (1)     (2)     (3)     (4)     (5)
n                 4893    3951    3915    3915    3915
1 + CV^2_wt       1.13    1.13    1.16    1.16    1.15
IQR / q0.5        0.875   0.877   0.858   0.853   0.851
q0.75 / q0.25     1.357   1.359   1.448   1.463   1.468
q0.90 / q0.10     1.970   1.966   2.148   2.178   2.124
q0.95 / q0.05     2.699   2.665   3.028   2.961   2.898
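The weight-heterogeneity summaries reported in table 6 can be computed with a few standard-library calls. The following is a sketch under stated assumptions: the function names and toy weights are hypothetical, the percentiles are unweighted sample percentiles of the weights, and the "inclusive" interpolation rule of `statistics.quantiles` is a choice made for the example rather than the rule used to produce table 6.

```python
import statistics


def one_plus_cv2(weights):
    """Kish-type factor 1 + CV^2, equal to n * sum(w^2) / (sum(w))^2."""
    mean = statistics.fmean(weights)
    variance = statistics.pvariance(weights, mu=mean)  # population variance
    return 1.0 + variance / mean ** 2


def percentile(weights, k):
    """k-th percentile (k = 1..99) using the 'inclusive' interpolation rule."""
    return statistics.quantiles(weights, n=100, method="inclusive")[k - 1]


def weight_summary(weights):
    """Return the table 6 style summaries for one set of weights."""
    return {
        "n": len(weights),
        "1 + CV^2": one_plus_cv2(weights),
        "IQR / q0.5": (percentile(weights, 75) - percentile(weights, 25))
        / percentile(weights, 50),
        "q0.75 / q0.25": percentile(weights, 75) / percentile(weights, 25),
        "q0.90 / q0.10": percentile(weights, 90) / percentile(weights, 10),
        "q0.95 / q0.05": percentile(weights, 95) / percentile(weights, 5),
    }


# Toy design weights and estimated consent propensities (hypothetical values).
design_weights = [1.0, 2.0, 1.5, 1.0, 2.0, 1.0, 1.5, 2.0]
propensities = [0.70, 0.85, 0.90, 0.75, 0.80, 0.95, 0.65, 0.88]
adjusted_weights = [w / p for w, p in zip(design_weights, propensities)]

unadjusted_summary = weight_summary(design_weights)
adjusted_summary = weight_summary(adjusted_weights)
```

Comparing `unadjusted_summary` with `adjusted_summary` row by row mirrors the comparison across columns of table 6.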
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication with 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator
$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$
where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
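The replication variance estimator above can be sketched as follows. This is an illustrative pure-Python version, assuming the full-sample and replicate estimates have already been computed elsewhere; the function name `fay_brr_variance` and its argument names are hypothetical.

```python
def fay_brr_variance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    full_est:        list of p full-sample coefficient estimates
    replicate_ests:  list of R lists, each holding the p replicate estimates
    fay_k:           Fay factor K (0.5 in the paper)
    """
    n_rep = len(replicate_ests)
    p = len(full_est)
    scale = 1.0 / (n_rep * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_ests:
        # deviation of this replicate estimate from the full-sample estimate
        d = [rep[j] - full_est[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * d[a] * d[b]
    return cov
```

Standard errors are the square roots of the diagonal entries of the returned matrix; with the paper's settings one would pass 44 replicate estimates and `fay_k=0.5`.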
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{Fadj}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1,\, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
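As a quick worked substitution of the values quoted above (a check of the arithmetic only, not an additional result), the reference distribution has the following degrees of freedom:

```latex
g - 1 = 10 - 1 = 9, \qquad f - g + 2 = 43 - 10 + 2 = 35,
\qquad \text{so } Q_{Fadj} \text{ is referred to an } F(9,\,35) \text{ distribution.}
```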
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test
Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Entries are Estimate (SE). Reference categories are shown in parentheses after each variable name.

Terms included in all seven models
Variable                      Model 1           Model 2           Model A           Model B           Model C           Model D           Model E
Demographic characteristics
Age group (32–65)
  18–32                       0.3435 (0.0633)   0.3389 (0.0634)   0.3044 (0.0717)   0.3319 (0.0742)   0.3282 (0.0728)   0.3425 (0.0640)   0.3368 (0.0639)
  65+                         0.2692 (0.0645)   0.2532 (0.0656)   0.1971 (0.0613)   0.2387 (0.0608)   0.2374 (0.0630)   0.2694 (0.0644)   0.2538 (0.0653)
Gender (Male)
  Female                      0.0161 (0.0426)   0.0229 (0.0421)   0.0396 (0.0359)   0.0388 (0.0363)   0.0020 (0.0427)   0.0152 (0.0425)   0.0207 (0.0419)
Race (White)
  Non-white                   0.0949 (0.1264)   0.0612 (0.1260)   0.0781 (0.1059)   0.0192 (0.1056)   0.0322 (0.1155)   0.0938 (0.1269)   0.0593 (0.1264)
Spanish interview (No)
  Yes                         0.4234 (0.2923)   0.4452 (0.2790)   0.4044 (0.2930)   0.4271 (0.3120)   0.4801 (0.3118)   0.4287 (0.2973)   0.4576 (0.2846)
Education group (HS grad)
  Less than HS                0.3097 (0.1879)   0.2926 (0.1835)   0.0142 (0.1349)   0.0186 (0.1383)   0.3317† (0.1838)  0.3081 (0.1885)   0.2894 (0.1844)
  Some college                0.4016 (0.1423)   0.3739 (0.1389)   0.0986 (0.1087)   0.0815 (0.1129)   0.4066 (0.1389)   0.3996 (0.1442)   0.3695 (0.1409)
  Associate's degree          0.3321 (0.2041)   0.3150 (0.1993)   0.1109 (0.1100)   0.1249 (0.1145)   0.2858 (0.2160)   0.3326 (0.2036)   0.3160 (0.1990)
  Bachelor's degree           0.0654 (0.2112)   0.0663 (0.2082)   0.0383 (0.0719)   0.0425 (0.0742)   0.0230 (0.2037)   0.0618 (0.2127)   0.0581 (0.2095)
  Advance degree              0.1747 (0.2762)   0.1512 (0.2773)   0.1899 (0.1215)   0.1941 (0.1235)   0.2738 (0.2482)   0.1720 (0.2773)   0.1452 (0.2796)
Home owner (Renter)
  Owner                       0.2767† (0.1453)  0.3233 (0.1436)   0.3612 (0.1277)   0.3589 (0.1314)   0.2703† (0.1397)  0.2754† (0.1448)  0.3198 (0.1433)
Total expenditures (Log)      0.0605 (0.0799)   0.0083 (0.0800)   0.1025 (0.0698)   0.0372 (0.0754)   0.0419 (0.0746)   0.0594 (0.0802)   0.0065 (0.0801)
Environmental features
Region (Northeast)
  Midwest                     0.2097† (0.1234)  0.2111† (0.1163)  0.2602 (0.1100)   0.2457 (0.1196)   0.2352† (0.1208)  0.2098† (0.1232)  0.2111† (0.1159)
  South                       0.2451 (0.1190)   0.2408 (0.1187)   0.1841 (0.1141)   0.2017† (0.1149)  0.2129† (0.1152)  0.2446 (0.1189)   0.2399 (0.1187)
  West                        0.2670 (0.1055)   0.2557 (0.1066)   0.2248 (0.0943)   0.2402 (0.1001)   0.2441 (0.1019)   0.2678 (0.1061)   0.2574 (0.1071)
Urbanicity (Rural)
  Urban                       0.0268 (0.0829)   0.0234 (0.0824)   0.0253 (0.0818)   0.0226 (0.0833)   0.0249 (0.0821)   0.0268 (0.0829)   0.0233 (0.0823)

Terms included in only some of the models
The Model 1 and Model 2 entries agree with table 3. The entries for the remaining models (A–E) are listed in left-to-right order as printed.
Variable                      Model 1           Model 2           Remaining models (A–E)
Income group
  Less than $8181             0.2155 (0.0604)   not included      0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes                         0.1487 (0.0513)   not included      0.1481 (0.0510)
Race × Gender                 0.1874† (0.0956)  0.1953 (0.0923)   0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × Education
  Less than HS                0.4931 (0.2066)   0.4632 (0.1996)   0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college                0.6575 (0.1567)   0.6370 (0.1613)   0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate's degree          0.3407† (0.2013)  0.3402† (0.1986)  0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor's degree           0.1385 (0.2402)   0.1398 (0.2368)   0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advance degree              0.4713 (0.2847)   0.4226 (0.2870)   0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)
R attitude proxies
Converted refusal (No)
  Yes                         0.0721 (0.0756)   0.0838 (0.0763)   0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort             0.5454 (0.2081)   0.6142 (0.2065)   0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort         0.4879 (0.1365)   0.5721 (0.1371)   0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy                    0.2207 (0.1485)   0.2137 (0.1466)   0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov't concerns      1.1895 (0.1667)   1.2241 (0.1622)   1.1891 (0.1666); 1.2231 (0.1621)
  Other                       1.0547 (0.3261)   1.0255 (0.3255)   1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort   0.5884 (0.2789)   0.5312† (0.2728)  0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort    0.3183 (0.1918)   0.2601 (0.1924)   0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort      0.0888 (0.2736)   0.0890 (0.2749)   0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort       0.2238 (0.2096)   0.2277 (0.2095)   0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort     1.0794† (0.6224)  1.0477 (0.6182)   1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort      0.9002 (0.3610)   0.8777 (0.3593)   0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal visit              not included      not included      0.0170 (0.0428); 0.0395 (0.0412)

NOTE: †p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics, Model of Consent/Object Comparison
Model (of consent), 2nd revision    p-value    Test statistic F(9, 35)
1                                   0.292      1.262
2                                   0.810      0.573
A                                   0.336      1.183
B                                   0.929      0.396
C                                   0.061      2.061
D                                   0.022      2.576
E                                   0.968      0.305
imputation (57.0% versus 41.8%), express concerns about the time required by the survey (12.7% versus 6.9%), or to be rated as less effortful and cooperative by CEQ interviewers. Additionally, non-consenting respondents had a higher proportion of phone interviews (relative to in-person interviews) than those who consented to data linkage (40.1% versus 33.8%), consistent with a rapport hypothesis (i.e., the difficulty of achieving and maintaining rapport over the phone reduces consent propensity relative to in-person interviews).
Evidence for the role of burden in table 1 is mixed. We examined the metric most commonly used to measure burden in the literature (interview duration), as well as several other variables that typically increase the length and/or difficulty of the CEQ interview (household size, total expenditures, family income, and home ownership). Contrary to expectations, non-consenters and consenters did not differ significantly in household size (2.26 versus 2.26), interview duration (63.3 minutes versus 65.3 minutes), or in total household spending ($11,218 versus $11,511), though the direction of the latter two findings is consistent with a burden hypothesis (consent would be highest for those most burdened). There were significant differences between consenters and non-consenters in family income and home ownership status, but the effects were in opposite directions. Non-consenters were more highly concentrated in the lowest income group (30.8% of non-consenters had family income under $8,180, versus 17.4% of consenters), but also had a higher proportion of home ownership than consenters (71.3% versus 64.0%).
Table 2 presents the weighted percentages and standard errors for the basic respondent demographics and area characteristics. Overall, there were few differences in sample composition between consenters and non-consenters. There were no significant consent rate differences by respondent gender, race, education, language of the interview, or urban status. Non-consenters did skew significantly older than those who provided consent (e.g., 24.6% of non-consenters were 65 or older, versus 19.9% for consenters), and were more likely than consenters to live in the Northeast (22.2% versus 17.6%) and in the West (27.1% versus 22.0%).
Finally, table 3 presents results from two logistic regression models for the propensity to consent to linkage. Based on results from the exploratory analyses reported in the online Appendix D, model 1 includes several classes of predictors, including consumer-unit demographics, proxies for respondent attitudes, and several related two-factor interactions. Model 2 is identical to model 1, except for exclusion of low-income and imputed-income indicator variables. Consequently, model 2 should be considered for sensitivity analyses of consent-propensity-adjusted income estimators in section 4.2 below. Note especially that, relative to the corresponding standard errors, the estimated coefficients for models 1 and 2 are very similar.
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms
Indicator                                   All respondents    Consenters         Non-consenters
                                            (n = 4893)         (n = 3951)         (n = 942)
Privacy/anti-government concerns            6.9 (0.8)          4.2 (0.6)          18.1 (2.2)
Reluctance
  Converted refusal                         10.8 (0.7)         9.6 (0.8)          16.0 (1.5)
  Income imputation required                44.7 (1.0)         41.8 (1.1)         57.0 (2.1)
  Too busy                                  8.1 (0.5)          6.9 (0.6)          12.7 (1.3)
Respondent effort
  A lot of effort                           34.9 (0.9)         36.7 (1.0)         27.3 (1.9)
  Moderate effort                           37.2 (1.1)         37.4 (1.3)         36.4 (1.8)
  Bare minimum effort                       27.9 (1.1)         25.9 (1.2)         36.3 (2.3)
Respondent cooperation
  Very cooperative                          50.8 (1.0)         53.5 (1.0)         39.4 (2.6)
  Somewhat cooperative                      33.1 (0.8)         33.1 (0.9)         33.1 (1.3)
  Neither cooperative nor uncooperative     5.4 (0.5)          4.5 (0.4)          9.2 (1.1)
  Somewhat uncooperative                    4.5 (0.3)          3.3 (0.3)          9.5 (1.1)
  Very uncooperative                        6.3 (0.4)          5.7 (0.5)          8.8 (0.9)
Rapport
  Face-to-face                              65.0 (1.4)         66.2 (1.5)         59.9 (1.9)
  Phone                                     35.0 (1.4)         33.8 (1.5)         40.1 (1.9)
Burden
  Total interview time                      64.9 (1.0)         65.3 (1.1)         63.3 (1.4)
  Household size                            2.42 (0.03)        2.42 (0.03)        2.42 (0.07)
  Total expenditures                        $11,454 ($148)     $11,511 ($145)     $11,218 ($384)
  Family income
    LT $8,180                               19.8 (0.8)         17.4 (0.9)         30.8 (1.9)
    $8,180–$24,000                          20.2 (0.6)         20.8 (0.9)         16.8 (1.4)
    $24,001–$46,000                         20.8 (0.6)         20.5 (0.7)         18.4 (1.3)
    $46,001–$85,855                         19.8 (0.6)         20.7 (0.7)         16.7 (1.3)
    GT $85,585                              19.4 (0.6)         20.6 (0.9)         17.3 (1.8)
  Owner                                     65.4 (0.8)         64.0 (0.9)         71.3 (1.6)
NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (*p < 0.01, **p < 0.0001).

4.2 Analysis of Economic Variables Subject to “Informed Consent” Access
We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents
Characteristic                 All respondents    Consenters       Non-consenters
                               (n = 4893)         (n = 3951)       (n = 942)
Age group
  18–32                        20.0 (0.9)         21.5 (1.1)       13.8 (1.0)
  32–65                        59.2 (0.9)         58.6 (1.2)       61.6 (1.5)
  65+                          20.8 (0.7)         19.9 (0.8)       24.6 (1.5)
Gender
  Male                         46.3 (0.7)         46.8 (0.8)       44.3 (1.7)
  Female                       53.7 (0.7)         53.2 (0.8)       55.7 (1.7)
Race
  White                        83.1 (0.9)         83.0 (1.0)       83.1 (1.1)
  Non-white                    16.9 (0.9)         17.0 (1.0)       16.9 (1.1)
Education group
  Less than HS                 13.5 (0.6)         13.4 (0.7)       14.0 (1.6)
  HS graduate                  24.7 (0.8)         24.5 (0.9)       25.3 (1.8)
  Some college                 21.4 (0.7)         21.2 (0.9)       22.2 (1.9)
  Associate's degree           10.2 (0.4)         10.0 (0.5)       10.9 (1.1)
  Bachelor's degree            19.1 (0.5)         19.4 (0.6)       17.9 (1.4)
  Advance degree               11.1 (0.5)         11.5 (0.6)       9.7 (0.9)
Spanish language interview
  Yes                          4.1 (0.5)          3.7 (0.5)        5.4 (1.5)
  No                           95.9 (0.5)         96.3 (0.5)       94.6 (1.5)
Region
  Northeast                    18.5 (0.6)         17.6 (0.7)       22.2 (1.7)
  Midwest                      23.5 (1.3)         24.5 (1.4)       19.2 (2.3)
  South                        35.0 (1.3)         35.9 (1.5)       31.5 (2.7)
  West                         23.0 (1.5)         22.0 (1.7)       27.1 (1.8)
Urbanicity
  Urban                        80.5 (0.9)         80.5 (1.2)       80.7 (2.0)
  Rural                        19.5 (0.9)         19.5 (1.2)       19.3 (2.0)
NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (*p < 0.01, **p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Variable (reference category)     Model 1                 Model 2
                                  Estimate    SE          Estimate    SE
Demographic characteristics
Age group (32–65)
  18–32                           0.3435      0.0633      0.3389      0.0634
  65+                             0.2692      0.0645      0.2532      0.0656
Gender (Male)
  Female                          0.0161      0.0426      0.0229      0.0421
Race (White)
  Non-white                       0.0949      0.1264      0.0612      0.1260
Spanish interview (No)
  Yes                             0.4234      0.2923      0.4452      0.2790
Education group (HS grad)
  Less than HS                    0.3097      0.1879      0.2926      0.1835
  Some college                    0.4016      0.1423      0.3739      0.1389
  Associate's degree              0.3321      0.2041      0.3150      0.1993
  Bachelor's degree               0.0654      0.2112      0.0663      0.2082
  Advance degree                  0.1747      0.2762      0.1512      0.2773
Home owner (Renter)
  Owner                           0.2767†     0.1453      0.3233      0.1436
Total expenditures (Log)          0.0605      0.0799      0.0083      0.0800
Income group
  Less than $8181                 0.2155      0.0604
Income imputed (No)
  Yes                             0.1487      0.0513
Race × Gender                     0.1874†     0.0956      0.1953      0.0923
Owner × Education
  Less than HS                    0.4931      0.2066      0.4632      0.1996
  Some college                    0.6575      0.1567      0.6370      0.1613
  Associate's degree              0.3407†     0.2013      0.3402†     0.1986
  Bachelor's degree               0.1385      0.2402      0.1398      0.2368
  Advance degree                  0.4713      0.2847      0.4226      0.2870
Environmental features
Region (Northeast)
  Midwest                         0.2097†     0.1234      0.2111†     0.1163
  South                           0.2451      0.1190      0.2408      0.1187
  West                            0.2670      0.1055      0.2557      0.1066
Urbanicity (Rural)
  Urban                           0.0268      0.0829      0.0234      0.0824
R attitude proxies
Converted refusal (No)
  Yes                             0.0721      0.0756      0.0838      0.0763
Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units, and using weights equal to $w_{pi} = w_i / p_i$ (Consenting Units, Propensity-Adjusted).²
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{decile}} = w_i / p_{\mathrm{decile}}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{\mathrm{decile}}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,\mathrm{quintile}} = w_i / p_{\mathrm{quintile}}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{\mathrm{quintile}}$ is the unweighted propensity of “consent” units in the specified quintile group).
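The consenter-based estimators listed above can be sketched in Python. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the inputs are assumed to cover consenting units only, and the propensity classes are formed as equal-count groups of consenting units with the class propensity taken as the unweighted mean of $p_i$ in each group. The paper's exact class construction may differ.

```python
def weighted_mean(y, w):
    """Weighted mean of y with weights w (CUNA when w are design weights)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def cupa_mean(y, w, p):
    """Propensity-adjusted mean: weights w_i / p_i for consenting units."""
    return weighted_mean(y, [wi / pi for wi, pi in zip(w, p)])


def class_adjusted_mean(y, w, p, n_classes):
    """Propensity-class mean (n_classes=10 for deciles, 5 for quintiles).

    Assumption: classes are equal-count groups of the consenting units
    sorted by p, and each unit's p_i is replaced by its class mean.
    """
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])
    p_class = [0.0] * n
    for c in range(n_classes):
        members = order[c * n // n_classes:(c + 1) * n // n_classes]
        if not members:
            continue  # more classes than units: skip empty classes
        avg = sum(p[i] for i in members) / len(members)
        for i in members:
            p_class[i] = avg
    return weighted_mean(y, [wi / pc for wi, pc in zip(w, p_class)])
```

Using class-level propensities rather than unit-level values trades some bias reduction for weights that are less sensitive to individual extreme propensity estimates.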
Table 3. Continued
Variable (reference category)     Model 1                 Model 2
                                  Estimate    SE          Estimate    SE
Effort (Moderate)
  A lot of effort                 0.5454      0.2081      0.6142      0.2065
  Bare minimum effort             0.4879      0.1365      0.5721      0.1371
Doorstep concerns (None)
  Too busy                        0.2207      0.1485      0.2137      0.1466
  Privacy/gov't concerns          1.1895      0.1667      1.2241      0.1622
  Other                           1.0547      0.3261      1.0255      0.3255
Doorstep concerns × effort
  Privacy × a lot of effort       0.5884      0.2789      0.5312†     0.2728
  Privacy × minimum effort        0.3183      0.1918      0.2601      0.1924
  Busy × a lot of effort          0.0888      0.2736      0.0890      0.2749
  Busy × minimum effort           0.2238      0.2096      0.2277      0.2095
  Other × a lot of effort         1.0794†     0.6224      1.0477      0.6182
  Other × minimum effort          0.9002      0.3610      0.8777      0.3593
NOTE: †p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001.

2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $p_i$ for all sample units $i$. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA-FS and CUPQA-FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA-FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
Table 4. Comparison of Three Estimation Methods
Mean (SE), columns 2–6
Variable (column 1)                              (2) Full-sample (FS)       (3) Consenting units,      (4) Consenting units,      (5) Consenting units,      (6) Consenting units,
                                                                            no adjustment (CUNA)       propensity-adjusted        propensity-decile          propensity-quintile
                                                                                                       (CUPA)                     adjusted (CUPDA)           adjusted (CUPQA)
Family income before taxes                       $50,939.00 ($1,227.51)     $52,869.00 ($1,535.04)     $52,117.00 ($1,523.85)     $52,269.00 ($1,520.74)     $52,405.00 ($1,514.30)
Family income before taxes with imputed values   $61,207.00 ($1,193.34)     $61,405.00 ($1,510.55)     $61,228.00 ($1,454.39)     $61,337.00 ($1,463.34)     $61,382.00 ($1,466.25)
Vehicle purchase cost                            $599.59 ($33.22)           $619.14 ($37.05)           $607.80 ($36.63)           $607.63 ($36.67)           $611.75 ($37.61)
Property taxes                                   $454.15 ($10.41)           $429.12 ($10.76)           $434.74 ($11.39)           $434.88 ($11.37)           $436.75 ($11.30)
Property value                                   $232,226.00 ($5,055.68)    $227,734.00 ($5,730.38)    $227,938.00 ($5,592.27)    $227,935.00 ($5,532.64)    $228,640.00 ($5,519.90)
Rental value                                     $1,296.52 ($15.58)         $1,269.65 ($17.05)         $1,272.40 ($17.81)         $1,272.91 ($17.86)         $1,274.67 ($17.68)

Point estimate differences (SE of difference), columns 7–10
Variable (column 1)                              (7) CUNA–FS                (8) CUPA–FS                (9) CUPDA–FS               (10) CUPQA–FS
Family income before taxes                       $1,930.00 ($580.98)        $1,178.00† ($619.09)       $1,330.00 ($602.64)        $1,466.00 ($606.76)
Family income before taxes with imputed values   $198.00 ($573.46)          $21.00 ($563.54)           $130.00 ($557.10)          $175.00 ($552.20)
Vehicle purchase cost                            $19.55† ($10.83)           $8.21 ($15.34)             $8.04 ($15.23)             $12.17 ($16.13)
Property taxes                                   −$25.02 ($6.08)            −$19.41 ($6.48)            −$19.27 ($6.05)            −$17.40 ($6.23)
Property value                                   −$4,492.00 ($2,118.39)     −$4,288.00† ($2,165.93)    −$4,291.00 ($2,132.40)     −$3,586.00 ($2,063.66)
Rental value                                     −$26.87 ($7.43)            −$24.12 ($8.18)            −$23.61 ($8.01)            −$21.85 ($8.02)
NOTE: †p < 0.10, *p < 0.05, **p < 0.01 (bold), ***p < 0.001 (bold).

To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (in increments of 0.01). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
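The quantile-quantile comparison described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes a simple weighted empirical quantile (the smallest value whose cumulative weight share reaches q), which may differ from the interpolation rule used in the paper's figures.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q < 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]  # guard against floating-point shortfall


def qq_points(p_object, w_object, p_consent, w_consent, n_points=99):
    """(object quantile, consent quantile) pairs for q = 0.01, ..., 0.99."""
    qs = [k / (n_points + 1) for k in range(1, n_points + 1)]
    return [
        (weighted_quantile(p_object, w_object, q),
         weighted_quantile(p_consent, w_consent, q))
        for q in qs
    ]
```

Points that fall on the 45-degree line indicate that the two subpopulations share the same propensity quantile; systematic departures from the line indicate discriminating power at that part of the distribution.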
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (in increments of 0.01). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (in increments of 0.01) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are similar, and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (in increments of 0.01), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions, and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations
Percentile (%)    Estimated propensity of consent
                  Consent subpopulation    Object subpopulation
70                0.84                     0.88
80                0.86                     0.89
90                0.88                     0.92
99                0.93                     0.96
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (in increments of 0.01) and associated pointwise 95 percent confidence bounds.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf; last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf; last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A.1 Predictor Variables Examined

Variable: Variable description

Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
  Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Respondent “doorstep” concerns

Privacy/government concerns:
  Privacy concerns
  Anti-government concerns

Too busy/logistical concerns:
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested/Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Other reluctance:
  Survey is voluntary
  Does not understand/Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other (specify)

No concerns:
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
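The summary measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the coefficient of variation is computed from the unweighted population variance of the weights, and percentiles use the "inclusive" interpolation rule of the standard library, which may differ from the rule used to produce table 6.

```python
from statistics import fmean, pvariance, quantiles


def kish_factor(weights):
    """Return 1 + CV^2 of the weights (approximate variance inflation)."""
    mean_w = fmean(weights)
    # Assumption: population (divide-by-n) variance of the weights.
    return 1.0 + pvariance(weights, mu=mean_w) / mean_w ** 2


def quantile_ratio(weights, upper, lower):
    """Ratio of the `upper` to the `lower` percentile (integers 1..99)."""
    cuts = quantiles(weights, n=100, method="inclusive")
    # cuts[k - 1] is the k-th percentile.
    return cuts[upper - 1] / cuts[lower - 1]


def iqr_over_median(weights):
    """Interquartile range of the weights divided by the median weight."""
    q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
    return (q3 - q1) / q2
```

For example, `quantile_ratio(w, 90, 10)` corresponds to the row comparing the 90th and 10th percentiles, and `quantile_ratio(w, 75, 25)` to the row comparing the third and first quartiles.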
Table 6. Variability of Weights

Columns:
(1) Weights from full sample
(2) Unadjusted weights for sample units with consent
(3) Propensity-adjusted weights for sample units with consent
(4) Propensity-decile adjusted weights for sample units with consent
(5) Propensity-quintile adjusted weights for sample units with consent

                     (1)      (2)      (3)      (4)      (5)
n                    4893     3951     3915     3915     3915
1 + CV^2_wt          1.13     1.13     1.16     1.16     1.15
IQR/q_0.5            0.875    0.877    0.858    0.853    0.851
q_0.75/q_0.25        1.357    1.359    1.448    1.463    1.468
q_0.90/q_0.10        1.970    1.966    2.148    2.178    2.124
q_0.95/q_0.05        2.699    2.665    3.028    2.961    2.898
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
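As a minimal sketch of the replication variance estimator above, the following Python function accumulates the scaled outer products of the replicate deviations. The function names are hypothetical, the estimates are passed as plain lists, and the replicate estimates themselves are assumed to have been computed elsewhere from the replicate weights.

```python
from math import sqrt


def fay_brr_covariance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix for a coefficient vector.

    full_estimate: list of floats (estimate from the full-sample weights)
    replicate_estimates: list of R lists (one estimate per replicate)
    """
    n_reps = len(replicate_estimates)
    dim = len(full_estimate)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * dim for _ in range(dim)]
    for rep in replicate_estimates:
        # Deviation of this replicate estimate from the full-sample estimate.
        dev = [b_r - b for b_r, b in zip(rep, full_estimate)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += scale * dev[i] * dev[j]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [sqrt(cov[i][i]) for i in range(len(cov))]
```

With $R = 44$ and $K = 0.5$, the scale factor is $1/(44 \times 0.25) = 1/11$.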
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{Fadj}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
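Substituting the stated values into the degrees of freedom gives a quick check of the reference distribution:

$$g - 1 = 10 - 1 = 9, \qquad f - g + 2 = 43 - 10 + 2 = 35,$$

so $Q_{Fadj}$ is compared with an $F(9, 35)$ distribution, which agrees with the $F(9, 35)$ label attached to the test statistics in table 9.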
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

                                Model 1           Model 2           Model A           Model B           Model C           Model D           Model E
Variable                        Estimate SE       Estimate SE       Estimate SE       Estimate SE       Estimate SE       Estimate SE       Estimate SE

Demographic characteristics
  Age group (32–65)
    18–32                       0.3435   0.0633   0.3389   0.0634   0.3044   0.0717   0.3319   0.0742   0.3282   0.0728   0.3425   0.0640   0.3368   0.0639
    65+                         0.2692   0.0645   0.2532   0.0656   0.1971   0.0613   0.2387   0.0608   0.2374   0.0630   0.2694   0.0644   0.2538   0.0653
  Gender (Male)
    Female                      0.0161   0.0426   0.0229   0.0421   0.0396   0.0359   0.0388   0.0363   0.0020   0.0427   0.0152   0.0425   0.0207   0.0419
  Race (White)
    Non-white                   0.0949   0.1264   0.0612   0.1260   0.0781   0.1059   0.0192   0.1056   0.0322   0.1155   0.0938   0.1269   0.0593   0.1264
  Spanish interview (No)
    Yes                         0.4234   0.2923   0.4452   0.2790   0.4044   0.2930   0.4271   0.3120   0.4801   0.3118   0.4287   0.2973   0.4576   0.2846
  Education group (HS grad)
    Less than HS                0.3097   0.1879   0.2926   0.1835   0.0142   0.1349   0.0186   0.1383   0.3317†  0.1838   0.3081   0.1885   0.2894   0.1844
    Some college                0.4016   0.1423   0.3739   0.1389   0.0986   0.1087   0.0815   0.1129   0.4066   0.1389   0.3996   0.1442   0.3695   0.1409
    Associate’s degree          0.3321   0.2041   0.3150   0.1993   0.1109   0.1100   0.1249   0.1145   0.2858   0.2160   0.3326   0.2036   0.3160   0.1990
    Bachelor’s degree           0.0654   0.2112   0.0663   0.2082   0.0383   0.0719   0.0425   0.0742   0.0230   0.2037   0.0618   0.2127   0.0581   0.2095
    Advance degree              0.1747   0.2762   0.1512   0.2773   0.1899   0.1215   0.1941   0.1235   0.2738   0.2482   0.1720   0.2773   0.1452   0.2796
  Home owner (Renter)
    Owner                       0.2767†  0.1453   0.3233   0.1436   0.3612   0.1277   0.3589   0.1314   0.2703†  0.1397   0.2754†  0.1448   0.3198   0.1433
  Total expenditures (Log)      0.0605   0.0799   0.0083   0.0800   0.1025   0.0698   0.0372   0.0754   0.0419   0.0746   0.0594   0.0802   0.0065   0.0801
  Income group
    Less than $8181                               0.2155   0.0604                     0.4182   0.0561   0.4247   0.0577                     0.2142   0.0609
  Income imputed (No)
    Yes                                           0.1487   0.0513                                                                           0.1481   0.0510
  Race × gender                 0.1874†  0.0956   0.1953   0.0923                                       0.2268   0.0930   0.1873†  0.0957   0.1946   0.0923
  Owner × education
    Less than HS                0.4931   0.2066   0.4632   0.1996                                       0.4750   0.2045   0.4929   0.2067   0.4637   0.2002
    Some college                0.6575   0.1567   0.6370   0.1613                                       0.6764   0.1438   0.6548   0.1578   0.6309   0.1633
    Associate’s degree          0.3407†  0.2013   0.3402†  0.1986                                       0.2616   0.2090   0.3401†  0.2007   0.3387†  0.1978
    Bachelor’s degree           0.1385   0.2402   0.1398   0.2368                                       0.1057   0.2296   0.1358   0.2409   0.1335   0.2376
    Advance degree              0.4713   0.2847   0.4226   0.2870                                       0.5867   0.2583   0.4691   0.2854   0.4183   0.2890

Environmental features
  Region (Northeast)
    Midwest                     0.2097†  0.1234   0.2111†  0.1163   0.2602   0.1100   0.2457   0.1196   0.2352†  0.1208   0.2098†  0.1232   0.2111†  0.1159
    South                       0.2451   0.1190   0.2408   0.1187   0.1841   0.1141   0.2017†  0.1149   0.2129†  0.1152   0.2446   0.1189   0.2399   0.1187
    West                        0.2670   0.1055   0.2557   0.1066   0.2248   0.0943   0.2402   0.1001   0.2441   0.1019   0.2678   0.1061   0.2574   0.1071
  Urbanicity (Rural)
    Urban                       0.0268   0.0829   0.0234   0.0824   0.0253   0.0818   0.0226   0.0833   0.0249   0.0821   0.0268   0.0829   0.0233   0.0823

R attitude proxies
  Converted refusal (No)
    Yes                         0.0721   0.0756   0.0838   0.0763                                                         0.0714   0.0753   0.0818   0.0760
  Effort (Moderate)
    A lot of effort             0.5454   0.2081   0.6142   0.2065                                                         0.5418   0.2101   0.6051   0.2082
    Bare minimum effort         0.4879   0.1365   0.5721   0.1371                                                         0.4842   0.1387   0.5626   0.1389
  Doorstep concerns (None)
    Too busy                    0.2207   0.1485   0.2137   0.1466                                                         0.2192   0.1491   0.2106   0.1477
    Privacy/gov’t concerns      1.1895   0.1667   1.2241   0.1622                                                         1.1891   0.1666   1.2231   0.1621
    Other                       1.0547   0.3261   1.0255   0.3255                                                         1.0549   0.3259   1.0267   0.3252
  Doorstep concerns × effort
    Privacy × a lot of effort   0.5884   0.2789   0.5312†  0.2728                                                         0.5885   0.2789   0.5319†  0.2725
    Privacy × minimum effort    0.3183   0.1918   0.2601   0.1924                                                         0.3172   0.1915   0.2581   0.1925
    Busy × a lot of effort      0.0888   0.2736   0.0890   0.2749                                                         0.0841   0.2758   0.0785   0.2765
    Busy × minimum effort       0.2238   0.2096   0.2277   0.2095                                                         0.2219   0.2114   0.2234   0.2112
    Other × a lot of effort     1.0794†  0.6224   1.0477   0.6182                                                         1.0746†  0.6244   1.0370   0.6194
    Other × minimum effort      0.9002   0.3610   0.8777   0.3593                                                         0.8964   0.3617   0.8693   0.3597
  Rapport (Phone)
    Personal visit                                                                                                        0.0170   0.0428   0.0395   0.0412

NOTE: † p < 0.10, p < 0.05, p < 0.01, p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
Table 1. Weighted Estimates of Indicators of Hypothesized Consent Mechanisms

Indicator                               All respondents     Consenters          Non-consenters
                                        (n = 4893)          (n = 3951)          (n = 942)

Privacy/anti-government concerns        6.9 (0.8)           4.2 (0.6)           18.1 (2.2)
Reluctance
  Converted refusal                     10.8 (0.7)          9.6 (0.8)           16.0 (1.5)
  Income imputation required            44.7 (1.0)          41.8 (1.1)          57.0 (2.1)
  Too busy                              8.1 (0.5)           6.9 (0.6)           12.7 (1.3)
Respondent effort
  A lot of effort                       34.9 (0.9)          36.7 (1.0)          27.3 (1.9)
  Moderate effort                       37.2 (1.1)          37.4 (1.3)          36.4 (1.8)
  Bare minimum effort                   27.9 (1.1)          25.9 (1.2)          36.3 (2.3)
Respondent cooperation
  Very cooperative                      50.8 (1.0)          53.5 (1.0)          39.4 (2.6)
  Somewhat cooperative                  33.1 (0.8)          33.1 (0.9)          33.1 (1.3)
  Neither cooperative nor uncooperative 5.4 (0.5)           4.5 (0.4)           9.2 (1.1)
  Somewhat uncooperative                4.5 (0.3)           3.3 (0.3)           9.5 (1.1)
  Very uncooperative                    6.3 (0.4)           5.7 (0.5)           8.8 (0.9)
Rapport
  Face-to-face                          65.0 (1.4)          66.2 (1.5)          59.9 (1.9)
  Phone                                 35.0 (1.4)          33.8 (1.5)          40.1 (1.9)
Burden
  Total interview time                  64.9 (1.0)          65.3 (1.1)          63.3 (1.4)
  Household size                        2.42 (0.03)         2.42 (0.03)         2.42 (0.07)
  Total expenditures                    $11,454 ($148)      $11,511 ($145)      $11,218 ($384)
Family income
  LT $8,180                             19.8 (0.8)          17.4 (0.9)          30.8 (1.9)
  $8,180–$24,000                        20.2 (0.6)          20.8 (0.9)          16.8 (1.4)
  $24,001–$46,000                       20.8 (0.6)          20.5 (0.7)          18.4 (1.3)
  $46,001–$85,855                       19.8 (0.6)          20.7 (0.7)          16.7 (1.3)
  GT $85,585                            19.4 (0.6)          20.6 (0.9)          17.3 (1.8)
Owner                                   65.4 (0.8)          64.0 (0.9)          71.3 (1.6)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.0001).

4.2 Analysis of Economic Variables Subject to “Informed Consent” Access

We examine consent bias for six CEQ variables for which there potentially is information available in administrative data sources that could be used to derive publishable estimates and obviate the need to ask respondents burdensome survey questions. For each of these variables, we computed five prospective estimators of the population mean $\bar{Y}$:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic                  All respondents     Consenters          Non-consenters
                                (n = 4893)          (n = 3951)          (n = 942)

Age group
  18–32                         20.0 (0.9)          21.5 (1.1)          13.8 (1.0)
  32–65                         59.2 (0.9)          58.6 (1.2)          61.6 (1.5)
  65+                           20.8 (0.7)          19.9 (0.8)          24.6 (1.5)
Gender
  Male                          46.3 (0.7)          46.8 (0.8)          44.3 (1.7)
  Female                        53.7 (0.7)          53.2 (0.8)          55.7 (1.7)
Race
  White                         83.1 (0.9)          83.0 (1.0)          83.1 (1.1)
  Non-white                     16.9 (0.9)          17.0 (1.0)          16.9 (1.1)
Education group
  Less than HS                  13.5 (0.6)          13.4 (0.7)          14.0 (1.6)
  HS graduate                   24.7 (0.8)          24.5 (0.9)          25.3 (1.8)
  Some college                  21.4 (0.7)          21.2 (0.9)          22.2 (1.9)
  Associate’s degree            10.2 (0.4)          10.0 (0.5)          10.9 (1.1)
  Bachelor’s degree             19.1 (0.5)          19.4 (0.6)          17.9 (1.4)
  Advance degree                11.1 (0.5)          11.5 (0.6)          9.7 (0.9)
Spanish language interview
  Yes                           4.1 (0.5)           3.7 (0.5)           5.4 (1.5)
  No                            95.9 (0.5)          96.3 (0.5)          94.6 (1.5)
Region
  Northeast                     18.5 (0.6)          17.6 (0.7)          22.2 (1.7)
  Midwest                       23.5 (1.3)          24.5 (1.4)          19.2 (2.3)
  South                         35.0 (1.3)          35.9 (1.5)          31.5 (2.7)
  West                          23.0 (1.5)          22.0 (1.7)          27.1 (1.8)
Urbanicity
  Urban                         80.5 (0.9)          80.5 (1.2)          80.7 (2.0)
  Rural                         19.5 (0.9)          19.5 (1.2)          19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

                                Model 1           Model 2
Variable                        Estimate SE       Estimate SE

Demographic characteristics
  Age group (32–65)
    18–32                       0.3435   0.0633   0.3389   0.0634
    65+                         0.2692   0.0645   0.2532   0.0656
  Gender (Male)
    Female                      0.0161   0.0426   0.0229   0.0421
  Race (White)
    Non-white                   0.0949   0.1264   0.0612   0.1260
  Spanish interview (No)
    Yes                         0.4234   0.2923   0.4452   0.2790
  Education group (HS grad)
    Less than HS                0.3097   0.1879   0.2926   0.1835
    Some college                0.4016   0.1423   0.3739   0.1389
    Associate’s degree          0.3321   0.2041   0.3150   0.1993
    Bachelor’s degree           0.0654   0.2112   0.0663   0.2082
    Advance degree              0.1747   0.2762   0.1512   0.2773
  Home owner (Renter)
    Owner                       0.2767†  0.1453   0.3233   0.1436
  Total expenditures (Log)      0.0605   0.0799   0.0083   0.0800
  Income group
    Less than $8181                               0.2155   0.0604
  Income imputed (No)
    Yes                                           0.1487   0.0513
  Race × Gender                 0.1874†  0.0956   0.1953   0.0923
  Owner × Education
    Less than HS                0.4931   0.2066   0.4632   0.1996
    Some college                0.6575   0.1567   0.6370   0.1613
    Associate’s degree          0.3407†  0.2013   0.3402†  0.1986
    Bachelor’s degree           0.1385   0.2402   0.1398   0.2368
    Advance degree              0.4713   0.2847   0.4226   0.2870

Environmental features
  Region (Northeast)
    Midwest                     0.2097†  0.1234   0.2111†  0.1163
    South                       0.2451   0.1190   0.2408   0.1187
    West                        0.2670   0.1055   0.2557   0.1066
  Urbanicity (Rural)
    Urban                       0.0268   0.0829   0.0234   0.0824

R attitude proxies
  Converted refusal (No)
    Yes                         0.0721   0.0756   0.0838   0.0763

Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).

CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / p_i$ (Consenting Units, Propensity-Adjusted; see footnote 2).

CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,decile} = w_i / p_{decile}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{decile}$ is the unweighted propensity of "consent" units in the specified decile group).

CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,quintile} = w_i / p_{quintile}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{quintile}$ is the unweighted propensity of "consent" units in the specified quintile group).
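The estimators defined above can be illustrated with a minimal Python sketch. It is not the authors' production code. The function names are hypothetical, and two points are assumptions made only for illustration: the propensity classes are formed from quantiles of the estimated propensities, and the class-level propensity is taken to be the unweighted observed consent rate within each class. The text leaves both details open, and other readings (for example, the mean estimated propensity among consenting units in the class) are equally plausible.

```python
import numpy as np


def weighted_mean(y, w):
    """Weighted sample mean: sum(w * y) / sum(w)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


def class_propensity(p, consent, n_classes):
    """Replace each estimated propensity by a class-level value.

    Assumption: classes are cut at quantiles of p (10 classes for deciles,
    5 for quintiles), and the class value is the unweighted consent rate.
    """
    p = np.asarray(p, dtype=float)
    consent = np.asarray(consent, dtype=bool)
    # Interior cut points, e.g. the 0.1, ..., 0.9 quantiles for deciles.
    cuts = np.quantile(p, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    labels = np.searchsorted(cuts, p, side="right")
    p_class = np.empty_like(p)
    for k in np.unique(labels):
        members = labels == k
        p_class[members] = consent[members].mean()
    return p_class


def consent_estimators(y, w, p, consent):
    """Return the FS, CUNA, CUPA, CUPDA and CUPQA mean estimates."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    p = np.asarray(p, dtype=float)
    c = np.asarray(consent, dtype=bool)
    p_decile = class_propensity(p, c, 10)
    p_quintile = class_propensity(p, c, 5)
    return {
        "FS": weighted_mean(y, w),                       # all sample units
        "CUNA": weighted_mean(y[c], w[c]),               # consenters, w_i
        "CUPA": weighted_mean(y[c], w[c] / p[c]),        # w_i / p_i
        "CUPDA": weighted_mean(y[c], w[c] / p_decile[c]),
        "CUPQA": weighted_mean(y[c], w[c] / p_quintile[c]),
    }
```

In this sketch, units in a class with no consenters never enter the consenter-based sums, so a class-level rate of zero does not cause a division by zero for any unit that is actually used.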
Table 3. Continued

                                     Model 1                 Model 2
Variable (reference category)        Estimate    SE          Estimate    SE

  Effort (Moderate)
    A lot of effort                  0.5454      0.2081      0.6142      0.2065
    Bare minimum effort              0.4879      0.1365      0.5721      0.1371
  Doorstep concerns (None)
    Too busy                         0.2207      0.1485      0.2137      0.1466
    Privacy/gov't concerns           1.1895      0.1667      1.2241      0.1622
    Other                            1.0547      0.3261      1.0255      0.3255
  Doorstep concerns × effort
    Privacy × a lot of effort        0.5884      0.2789      0.5312†     0.2728
    Privacy × minimum effort         0.3183      0.1918      0.2601      0.1924
    Busy × a lot of effort           0.0888      0.2736      0.0890      0.2749
    Busy × minimum effort            0.2238      0.2096      0.2277      0.2095
    Other × a lot of effort          1.0794†     0.6224      1.0477      0.6182
    Other × minimum effort           0.9002      0.3610      0.8777      0.3593

NOTE. † p < 0.10; p < 0.05; p < 0.01; p < 0.001.
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link, $p_i$, for all sample units i. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent-propensity weight adjustments on mean estimates, comparing the full sample to the adjusted consenter group. In general, the propensity adjustments improve estimates (i.e., they move them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
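As a small worked illustration of reading the differences "relative to the corresponding standard errors," the sketch below divides each reported difference for family income before taxes by its reported standard error. The numbers are the table 4 entries for that variable; the function name is hypothetical, and no particular reference distribution or p-value calculation is assumed here, since the text above does not specify one.

```python
# (difference, SE of difference) for family income before taxes,
# taken from columns 7-10 of table 4.
income_differences = {
    "CUNA-FS": (1930.00, 580.98),
    "CUPA-FS": (1178.00, 619.09),
    "CUPDA-FS": (1330.00, 602.64),
    "CUPQA-FS": (1466.00, 606.76),
}


def standardized_differences(differences):
    """Divide each estimated difference by its standard error."""
    return {name: diff / se for name, (diff, se) in differences.items()}


ratios = standardized_differences(income_differences)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The ratio for the unadjusted comparison (CUNA–FS) is clearly larger than the ratios for the three propensity-adjusted comparisons, which are similar to one another. This matches the qualitative reading given in the text.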
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the "objection" subpopulation (horizontal axis) and the "consent" subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the "object" and "consent" subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

Column key: (1) variable; (2) full-sample (FS) mean; (3) consenting units, no adjustment (CUNA) mean; (4) consenting units, propensity-adjusted (CUPA) mean; (5) consenting units, propensity-decile adjusted (CUPDA) mean; (6) consenting units, propensity-quintile adjusted (CUPQA) mean; (7)–(10) point estimate differences CUNA–FS, CUPA–FS, CUPDA–FS, and CUPQA–FS. Standard errors are shown in parentheses beneath each estimate.

Means (SE), columns 2–6

Variable (1)                      FS (2)          CUNA (3)        CUPA (4)        CUPDA (5)       CUPQA (6)
Family income before taxes        $50,939.00      $52,869.00      $52,117.00      $52,269.00      $52,405.00
                                  ($1,227.51)     ($1,535.04)     ($1,523.85)     ($1,520.74)     ($1,514.30)
Family income before taxes        $61,207.00      $61,405.00      $61,228.00      $61,337.00      $61,382.00
  with imputed values             ($1,193.34)     ($1,510.55)     ($1,454.39)     ($1,463.34)     ($1,466.25)
Vehicle purchase cost             $599.59         $619.14         $607.80         $607.63         $611.75
                                  ($33.22)        ($37.05)        ($36.63)        ($36.67)        ($37.61)
Property taxes                    $454.15         $429.12         $434.74         $434.88         $436.75
                                  ($10.41)        ($10.76)        ($11.39)        ($11.37)        ($11.30)
Property value                    $232,226.00     $227,734.00     $227,938.00     $227,935.00     $228,640.00
                                  ($5,055.68)     ($5,730.38)     ($5,592.27)     ($5,532.64)     ($5,519.90)
Rental value                      $1,296.52       $1,269.65       $1,272.40       $1,272.91       $1,274.67
                                  ($15.58)        ($17.05)        ($17.81)        ($17.86)        ($17.68)

Point estimate differences (SE of difference), columns 7–10

Variable (1)                      CUNA–FS (7)     CUPA–FS (8)     CUPDA–FS (9)    CUPQA–FS (10)
Family income before taxes        $1,930.00       $1,178.00†      $1,330.00       $1,466.00
                                  ($580.98)       ($619.09)       ($602.64)       ($606.76)
Family income before taxes        $198.00         $21.00          $130.00         $175.00
  with imputed values             ($573.46)       ($563.54)       ($557.10)       ($552.20)
Vehicle purchase cost             $19.55†         $8.21           $8.04           $12.17
                                  ($10.83)        ($15.34)        ($15.23)        ($16.13)
Property taxes                    -$25.02         -$19.41         -$19.27         -$17.40
                                  ($6.08)         ($6.48)         ($6.05)         ($6.23)
Property value                    -$4,492.00      -$4,288.00†     -$4,291.00      -$3,586.00
                                  ($2,118.39)     ($2,165.93)     ($2,132.40)     ($2,063.66)
Rental value                      -$26.87         -$24.12         -$23.61         -$21.85
                                  ($7.43)         ($8.18)         ($8.01)         ($8.02)

NOTE. † p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the "objection" and "consent" subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the "consent" and "object" subpopulations.
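The quantile-quantile comparison described for figure 1 can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used to produce the figure. The function names are hypothetical, and the quantiles here are plain unweighted sample quantiles, which is an assumption; the published estimates may incorporate the survey weights.

```python
import numpy as np


def propensity_qq_points(p, consent):
    """Return the 0.01, ..., 0.99 quantiles of the estimated propensity
    for the "object" group (horizontal axis) and the "consent" group
    (vertical axis)."""
    p = np.asarray(p, dtype=float)
    consent = np.asarray(consent, dtype=bool)
    probs = np.arange(1, 100) / 100.0
    object_q = np.quantile(p[~consent], probs)
    consent_q = np.quantile(p[consent], probs)
    return probs, object_q, consent_q


def max_gap_from_diagonal(object_q, consent_q):
    """Largest vertical distance between the plotted points and the
    reference line with slope 1 and intercept 0."""
    return float(np.max(np.abs(consent_q - object_q)))
```

A gap near zero corresponds to points lying on the diagonal, that is, to propensity scores with little discriminating power between the two groups. Larger gaps indicate stronger separation.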
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the "Consent" Subpopulation (Vertical Axis) and "Object" Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the "Consent" and "Object" Subpopulations

                   Estimated propensity of consent
Percentile (%)     Consent subpopulation     Object subpopulation
70                 0.84                      0.88
80                 0.86                      0.89
90                 0.88                      0.92
99                 0.93                      0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the "consent" subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
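The quantile differences plotted in figure 4 can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation. The function names are hypothetical, and the weighted quantile is defined here as the smallest observed value whose cumulative weight share reaches q, which is one common convention among several. The pointwise confidence intervals are not reproduced, because they require a design-based variance estimator that is outside the scope of this sketch.

```python
import numpy as np


def weighted_quantile(y, w, q):
    """Smallest y whose cumulative weight share is at least q.

    q may be a scalar or an array of probabilities in (0, 1).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y, kind="stable")
    y_sorted, w_sorted = y[order], w[order]
    cdf = np.cumsum(w_sorted) / np.sum(w_sorted)
    idx = np.searchsorted(cdf, q, side="left")
    # Guard against rounding in the last cumulative share.
    idx = np.minimum(idx, len(y_sorted) - 1)
    return y_sorted[idx]


def quantile_differences(y, w, p, consent, probs):
    """Propensity-adjusted consenter quantiles minus full-sample quantiles."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    p = np.asarray(p, dtype=float)
    c = np.asarray(consent, dtype=bool)
    adjusted = weighted_quantile(y[c], w[c] / p[c], probs)
    full = weighted_quantile(y, w, probs)
    return adjusted - full
```

A typical call would pass `probs = np.arange(1, 100) / 100.0` to obtain the 99 differences for q = 0.01 to 0.99. Differences that stay near zero across q, rather than growing in one tail, correspond to the pattern described in the text.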
5. DISCUSSION

5.1 Summary of Results
As noted in the introduction, legal and social environments often require that sampled survey units provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple "consent-to-link" question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the "Consent" Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.

5.2 Prospective Extensions

The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.

5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.

5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore trade-offs in approximate measures of cost and burden that may arise through integrated use of the following: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the "consent" and "refusal" cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, the propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.

5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.

Supplementary Materials

Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), "Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions," Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), "Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data," The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), "Goodness-of-Fit Tests for Logistic Regression Models When Data Are Collected Using a Complex Sampling Design," Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), "Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records," Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), "Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?" Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), "HIPAA Authorization and Survey Nonresponse Bias," Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), "Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation," Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), "Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey," Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), "Optimizing Opt-Out Consent for Record Linkage," Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), "2011 Research Section Analysis," Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), "Rapport in Conflict Resolution: Accounting for How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts," Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), "Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders," American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), "Respondent Consent to Use Administrative Data," PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), "Surveys on Surveys: Limitations and Potentialities," Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), "Understanding the Decision to Participate in a Survey," Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), "Leverage-Salience Theory of Survey Participation: Description and an Illustration," Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), "Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias," Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), "Record Linkage Research and Informed Consent: Who Consents?" BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), "Patterns of Consent: Evidence from a General Household Survey," Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), "Fay's Method for Variance Estimation," Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), "Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review," British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), "Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population," BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), "Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics," The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), "The Framing of the Record Linkage Consent Question," International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), "Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer," Survey Research Methods, 7, 115–131.
Little, R. (1986), "Survey Nonresponse Adjustments for Estimates of Means," International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), "The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community," American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), "Uses of Administrative Data at the Social Security Administration," Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), "Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study," International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), "Linkages with Data from Social Security Administrative Records in the Health and Retirement Study," Social Security Bulletin, 62, 73–85.
O'Muircheartaigh, C., and P. Campanelli (1999), "A Multilevel Exploration of the Role of Interviewers in Survey Non-Response," Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), "Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP)," Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), "Bowling Alone: America's Declining Social Capital," Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), "Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults," Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), "Linking Survey and Administrative Data: Mechanisms of Consent," Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), "An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany," Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), "Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data," Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), "The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey," Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), "Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data," Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), "Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics," Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), "Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel," International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), "Informed Consent and Survey Response: A Summary of the Empirical Literature," Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), "Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits," Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), "Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census," Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), "An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey," Consumer Expenditure Survey Anthology 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), "Consumer Expenditures and Income," in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), "'Interviewer' Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?" Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), "Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research," Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

Variable                      Variable description

Respondent sociodemographics
  Age                         1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender                      1 if male (reference group); 2 if female
  Race                        0 if white (reference group); 1 if non-white
  Education                   0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate's degree; 4 if Bachelor's degree; and 5 if more than Bachelor's
  Language                    1 if interview language was Spanish; 0 if English (reference group)
  Household size              1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type              0 if renter (reference group); 1 if owner
  Family income               1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log)        Log of total expenditures reported for all major expenditure categories

Survey features
  Mode                        1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region                      1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status                1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status    1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration          1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation                 1 if "very cooperative"; 2 if "somewhat cooperative"; 3 if "neither cooperative nor uncooperative"; 4 if "somewhat uncooperative"; 5 if "very uncooperative"
  Effort                      1 if "a lot of effort"; 2 if "a moderate amount" (reference group); 3 if "a bare minimum"
  Effort change               1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet  1 if respondent used booklet "almost always"; 2 if "most of the time"; 3 if "occasionally"; 4 if "never or almost never"; and 5 if did not have access to booklet (reference group)
  Doorstep concerns           0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if "other"
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category
Assigned category: Respondent “doorstep” concerns
Privacy/government concerns: Privacy concerns; Anti-government concerns
Too busy/logistical concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested; Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long
Other reluctance: Survey is voluntary; Does not understand; Asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other (specify)
No concerns: No concerns
A.3 Post-Survey Interviewer Questions
A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)
A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
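The weight-heterogeneity summaries described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the squared coefficient of variation is assumed to use the divide-by-n variance (so that $1 + CV^2_{wt} = n \sum w_i^2 / (\sum w_i)^2$), and the percentiles are plain unweighted sample percentiles of the weights.

```python
import statistics


def weight_heterogeneity(weights):
    """Summaries of weight variability in the spirit of table 6 (hypothetical helper)."""
    n = len(weights)
    total = sum(weights)
    # 1 + CV^2 with the divide-by-n variance equals n * sum(w^2) / (sum w)^2 (assumption).
    one_plus_cv2 = n * sum(w * w for w in weights) / total ** 2

    # With n=100, statistics.quantiles returns the 1st..99th percentile cut points.
    pct = statistics.quantiles(weights, n=100, method="inclusive")

    def q(p):
        return pct[p - 1]

    return {
        "1+CV2": one_plus_cv2,
        "IQR/q50": (q(75) - q(25)) / q(50),
        "q75/q25": q(75) / q(25),
        "q90/q10": q(90) / q(10),
        "q95/q05": q(95) / q(5),
    }


def propensity_adjust(weights, propensities):
    """Propensity-adjusted weights w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]
```

Applying `weight_heterogeneity` to the unadjusted and to the propensity-adjusted weights gives two sets of ratios that can be compared row by row, which is the comparison the appendix draws from table 6.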
Table 6. Variability of Weights
Columns: (1) weights from full sample; (2) unadjusted weights for sample units with consent; (3) propensity-adjusted weights for sample units with consent; (4) propensity-decile adjusted weights for sample units with consent; (5) propensity-quintile adjusted weights for sample units with consent

Statistic            (1)      (2)      (3)      (4)      (5)
n                    4893     3951     3915     3915     3915
1 + CV^2_wt          1.13     1.13     1.16     1.16     1.15
IQR/q0.5             0.875    0.877    0.858    0.853    0.851
q0.75/q0.25          1.357    1.359    1.448    1.463    1.468
q0.90/q0.10          1.970    1.966    2.148    2.178    2.124
q0.95/q0.05          2.699    2.665    3.028    2.961    2.898
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator
$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$
where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
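As an illustration of the variance estimator above, the following Python sketch accumulates the outer products of the replicate deviations and applies the factor $1/(R(1-K)^2)$. It assumes the full-sample and replicate estimates have already been computed elsewhere (for example, by refitting the model with each set of replicate weights); the function names are hypothetical and are not taken from the paper or from any survey package.

```python
def fay_brr_covariance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR covariance: (1 / (R (1 - K)^2)) * sum_r (b_r - b)(b_r - b)'.

    full_est: list of p full-sample estimates.
    replicate_ests: list of R lists, each holding p replicate estimates.
    """
    n_reps = len(replicate_ests)
    p = len(full_est)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_ests:
        dev = [rep[j] - full_est[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * dev[a] * dev[b]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a covariance matrix."""
    return [cov[j][j] ** 0.5 for j in range(len(cov))]
```

With the paper's settings, `replicate_ests` would hold 44 vectors and `fay_k` would stay at 0.5, so each squared deviation is scaled by 1/(44 × 0.25) = 1/11.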
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{Fadj}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test
Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Entries are Estimate (SE), listed in column order: Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. Rows with fewer than seven entries list, in the same order, only the models that include the term.

Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate’s degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor’s degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advance degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Home owner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate’s degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor’s degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advance degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)

Environmental features
Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies
Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov’t concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: †p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison
Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
survey questions. For each of these variables, we computed three prospective estimators of the population mean Y:
FS: The full-sample mean (i.e., data from consenters and non-consenters), with weights equal to the standard complex design weight $w_i$ used for analyses of CE variables (a joint product of sample selection weight, last visit non-response weight, subsampling weight, and non-response weight) (Full Sample).
Table 2. Characteristics of Sample Respondents

Characteristic                All respondents    Consenters     Non-consenters
                              (n = 4893)         (n = 3951)     (n = 942)
Age group
  18–32                       20.0 (0.9)         21.5 (1.1)     13.8 (1.0)
  32–65                       59.2 (0.9)         58.6 (1.2)     61.6 (1.5)
  65+                         20.8 (0.7)         19.9 (0.8)     24.6 (1.5)
Gender
  Male                        46.3 (0.7)         46.8 (0.8)     44.3 (1.7)
  Female                      53.7 (0.7)         53.2 (0.8)     55.7 (1.7)
Race
  White                       83.1 (0.9)         83.0 (1.0)     83.1 (1.1)
  Non-white                   16.9 (0.9)         17.0 (1.0)     16.9 (1.1)
Education group
  Less than HS                13.5 (0.6)         13.4 (0.7)     14.0 (1.6)
  HS graduate                 24.7 (0.8)         24.5 (0.9)     25.3 (1.8)
  Some college                21.4 (0.7)         21.2 (0.9)     22.2 (1.9)
  Associate’s degree          10.2 (0.4)         10.0 (0.5)     10.9 (1.1)
  Bachelor’s degree           19.1 (0.5)         19.4 (0.6)     17.9 (1.4)
  Advance degree              11.1 (0.5)         11.5 (0.6)      9.7 (0.9)
Spanish language interview
  Yes                          4.1 (0.5)          3.7 (0.5)      5.4 (1.5)
  No                          95.9 (0.5)         96.3 (0.5)     94.6 (1.5)
Region
  Northeast                   18.5 (0.6)         17.6 (0.7)     22.2 (1.7)
  Midwest                     23.5 (1.3)         24.5 (1.4)     19.2 (2.3)
  South                       35.0 (1.3)         35.9 (1.5)     31.5 (2.7)
  West                        23.0 (1.5)         22.0 (1.7)     27.1 (1.8)
Urbanicity
  Urban                       80.5 (0.9)         80.5 (1.2)     80.7 (2.0)
  Rural                       19.5 (0.9)         19.5 (1.2)     19.3 (2.0)

NOTE: Standard errors shown in parentheses. Asterisks denote statistically significant differences (p < 0.01, p < 0.001).
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Variable                      Model 1                Model 2
                              Estimate    SE         Estimate    SE
Demographic characteristics
Age group (32–65)
  18–32                       0.3435      0.0633     0.3389      0.0634
  65+                         0.2692      0.0645     0.2532      0.0656
Gender (Male)
  Female                      0.0161      0.0426     0.0229      0.0421
Race (White)
  Non-white                   0.0949      0.1264     0.0612      0.1260
Spanish interview (No)
  Yes                         0.4234      0.2923     0.4452      0.2790
Education group (HS grad)
  Less than HS                0.3097      0.1879     0.2926      0.1835
  Some college                0.4016      0.1423     0.3739      0.1389
  Associate’s degree          0.3321      0.2041     0.3150      0.1993
  Bachelor’s degree           0.0654      0.2112     0.0663      0.2082
  Advance degree              0.1747      0.2762     0.1512      0.2773
Home owner (Renter)
  Owner                       0.2767†     0.1453     0.3233      0.1436
Total expenditures (Log)      0.0605      0.0799     0.0083      0.0800
Income group
  Less than $8,181            0.2155      0.0604
Income imputed (No)
  Yes                         0.1487      0.0513
Race × Gender                 0.1874†     0.0956     0.1953      0.0923
Owner × Education
  Less than HS                0.4931      0.2066     0.4632      0.1996
  Some college                0.6575      0.1567     0.6370      0.1613
  Associate’s degree          0.3407†     0.2013     0.3402†     0.1986
  Bachelor’s degree           0.1385      0.2402     0.1398      0.2368
  Advance degree              0.4713      0.2847     0.4226      0.2870
Environmental features
Region (Northeast)
  Midwest                     0.2097†     0.1234     0.2111†     0.1163
  South                       0.2451      0.1190     0.2408      0.1187
  West                        0.2670      0.1055     0.2557      0.1066
Urbanicity (Rural)
  Urban                       0.0268      0.0829     0.0234      0.0824
R attitude proxies
Converted refusal (No)
  Yes                         0.0721      0.0756     0.0838      0.0763
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units, and using weights equal to $w_{pi} = w_i / p_i$ (Consenting Units, Propensity-Adjusted; see footnote 2).
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,decile} = w_i / p_{decile}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{decile}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{p,quintile} = w_i / p_{quintile}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{quintile}$ is the unweighted propensity of “consent” units in the specified quintile group).
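The estimators listed above can be illustrated with a short Python sketch. It is a simplified reading of the definitions, not the authors' implementation: all names are hypothetical, the propensity classes are formed by ranking the estimated propensities into equal-sized groups, and the class-level propensity is assumed to be the unweighted share of consenting units within each class.

```python
def weighted_mean(y, w):
    """Weighted mean sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def propensity_classes(p, n_classes):
    """Assign each unit to a class 0..n_classes-1 by rank of its propensity."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    cls = [0] * len(p)
    for rank, i in enumerate(order):
        cls[i] = rank * n_classes // len(p)
    return cls


def consent_estimators(y, w, consent, p, n_classes=10):
    """Return FS, CUNA, CUPA and a class-adjusted mean (CUPDA-like for 10 classes,
    CUPQA-like for 5 classes). consent is a list of booleans; p holds estimated
    consent propensities for all units."""
    cons = [i for i, c in enumerate(consent) if c]
    y_c = [y[i] for i in cons]

    fs = weighted_mean(y, w)
    cuna = weighted_mean(y_c, [w[i] for i in cons])
    cupa = weighted_mean(y_c, [w[i] / p[i] for i in cons])

    # Assumption: class propensity = unweighted consent rate within the class.
    cls = propensity_classes(p, n_classes)
    rate = {}
    for c in set(cls):
        members = [i for i in range(len(p)) if cls[i] == c]
        rate[c] = sum(1 for i in members if consent[i]) / len(members)
    class_adj = weighted_mean(y_c, [w[i] / rate[cls[i]] for i in cons])

    return {"FS": fs, "CUNA": cuna, "CUPA": cupa, "class_adjusted": class_adj}
```

Comparing each consenter-based value with `FS` reproduces the kind of differences (CUNA–FS, CUPA–FS, and so on) that the bias analysis examines; standard errors for those differences would additionally require the replicate weights.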
Table 3. Continued

Variable                      Model 1                Model 2
                              Estimate    SE         Estimate    SE
Effort (Moderate)
  A lot of effort             0.5454      0.2081     0.6142      0.2065
  Bare minimum effort         0.4879      0.1365     0.5721      0.1371
Doorstep concerns (None)
  Too busy                    0.2207      0.1485     0.2137      0.1466
  Privacy/gov’t concerns      1.1895      0.1667     1.2241      0.1622
  Other                       1.0547      0.3261     1.0255      0.3255
Doorstep concerns × effort
  Privacy × a lot of effort   0.5884      0.2789     0.5312†     0.2728
  Privacy × minimum effort    0.3183      0.1918     0.2601      0.1924
  Busy × a lot of effort      0.0888      0.2736     0.0890      0.2749
  Busy × minimum effort       0.2238      0.2096     0.2277      0.2095
  Other × a lot of effort     1.0794†     0.6224     1.0477      0.6182
  Other × minimum effort      0.9002      0.3610     0.8777      0.3593

NOTE: †p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $p_i$ for all sample units $i$. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., moving them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA–FS and CUPQA–FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA–FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
Table 4. Comparison of Three Estimation Methods
Columns 2–6 give the mean (SE) under each method, in the order: (2) full-sample (FS); (3) consenting units, no adjustment (CUNA); (4) consenting units, propensity-adjusted (CUPA); (5) consenting units, propensity-decile adjusted (CUPDA); (6) consenting units, propensity-quintile adjusted (CUPQA). Columns 7–10 give the point estimate differences (SE diff), in the order: (7) CUNA–FS; (8) CUPA–FS; (9) CUPDA–FS; (10) CUPQA–FS.

Family income before taxes
  Mean (SE), columns 2–6: $50,939.00 ($1,227.51); $52,869.00 ($1,535.04); $52,117.00 ($1,523.85); $52,269.00 ($1,520.74); $52,405.00 ($1,514.30)
  Difference (SE diff), columns 7–10: $1,930.00 ($580.98); $1,178.00† ($619.09); $1,330.00 ($602.64); $1,466.00 ($606.76)
Family income before taxes with imputed values
  Mean (SE), columns 2–6: $61,207.00 ($1,193.34); $61,405.00 ($1,510.55); $61,228.00 ($1,454.39); $61,337.00 ($1,463.34); $61,382.00 ($1,466.25)
  Difference (SE diff), columns 7–10: $198.00 ($573.46); $21.00 ($563.54); $130.00 ($557.10); $175.00 ($552.20)
Vehicle purchase cost
  Mean (SE), columns 2–6: $599.59 ($33.22); $619.14 ($37.05); $607.80 ($36.63); $607.63 ($36.67); $611.75 ($37.61)
  Difference (SE diff), columns 7–10: $19.55† ($10.83); $8.21 ($15.34); $8.04 ($15.23); $12.17 ($16.13)
Property taxes
  Mean (SE), columns 2–6: $454.15 ($10.41); $429.12 ($10.76); $434.74 ($11.39); $434.88 ($11.37); $436.75 ($11.30)
  Difference (SE diff), columns 7–10: −$25.02 ($6.08); −$19.41 ($6.48); −$19.27 ($6.05); −$17.40 ($6.23)
Property value
  Mean (SE), columns 2–6: $232,226.00 ($5,055.68); $227,734.00 ($5,730.38); $227,938.00 ($5,592.27); $227,935.00 ($5,532.64); $228,640.00 ($5,519.90)
  Difference (SE diff), columns 7–10: −$4,492.00 ($2,118.39); −$4,288.00† ($2,165.93); −$4,291.00 ($2,132.40); −$3,586.00 ($2,063.66)
Rental value
  Mean (SE), columns 2–6: $1,296.52 ($15.58); $1,269.65 ($17.05); $1,272.40 ($17.81); $1,272.91 ($17.86); $1,274.67 ($17.68)
  Difference (SE diff), columns 7–10: −$26.87 ($7.43); −$24.12 ($8.18); −$23.61 ($8.01); −$21.85 ($8.02)

NOTE: †p < 0.10, *p < 0.05, **p < 0.01 (bold), ***p < 0.001 (bold)

To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
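The quantile-quantile comparison described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it uses plain unweighted sample quantiles, which is an assumption made here for brevity, since a survey-weighted version would replace the quantile calculation.

```python
import statistics


def qq_points(p_object, p_consent):
    """Return 99 (x, y) pairs: the 0.01..0.99 quantiles of the estimated consent
    propensity for the "object" group (x) and the "consent" group (y)."""
    qx = statistics.quantiles(p_object, n=100, method="inclusive")
    qy = statistics.quantiles(p_consent, n=100, method="inclusive")
    return list(zip(qx, qy))


def mean_vertical_gap(points):
    """Average distance of the points above the diagonal line y = x."""
    return sum(y - x for x, y in points) / len(points)
```

Points lying on the line y = x (a mean gap near zero) correspond to the case in which the two groups have essentially the same propensity distribution, so the scores offer little discriminating power.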
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

Percentile (%)    Estimated propensity of consent
                  Consent subpopulation    Object subpopulation
70                0.84                     0.88
80                0.86                     0.89
90                0.88                     0.92
99                0.93                     0.96

Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
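The quantile differences discussed above can be illustrated with a small Python sketch. It assumes one simple definition of a weighted quantile (the smallest observed value whose cumulative weight share reaches q) and hypothetical function names; it produces point estimates only, and the pointwise confidence intervals would additionally require replicate-weight variance estimates.

```python
def weighted_quantile(y, w, q):
    """Smallest y value whose cumulative weight share is at least q (0 < q < 1)."""
    pairs = sorted(zip(y, w))
    total = sum(w)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= q * total:
            return yi
    return pairs[-1][0]


def quantile_differences(y_cons, w_cons_adj, y_full, w_full, qs):
    """Differences between the q-th quantiles for the propensity-adjusted consenting
    units and for the full sample, for each q in qs."""
    return [
        weighted_quantile(y_cons, w_cons_adj, q) - weighted_quantile(y_full, w_full, q)
        for q in qs
    ]
```

Calling `quantile_differences` with `qs = [k / 100 for k in range(1, 100)]` would give the 99 differences for the 0.01 to 0.99 quantiles; values that hover around a constant across q would be consistent with the absence of a strong tail-specific pattern.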
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
52 Prospective Extensions
The conceptual development and numerical results considered in this papercould be extended in several directions Of special interest would be use of al-ternative propensity-based weighting methods joint modeling of consent-to-link and item response probabilities approximate optimization of design deci-sions related to potentially burdensome questions and efficient integration oftest procedures into production-oriented settings
521 Joint modeling of consent-to-link and item-missingness propensities Itwould be useful to extend the analyses in table 4 to cover a wide range ofsurvey variables with varying rates of item missingness This would allowexploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher lev-els of item nonresponse on previous interview waves particularly refusalson income or wealth questions (Jenkins et al 2006 Olson 1999 Woolfet al 2000 Bates 2005 Dahlhamer and Cox 2007 Pascale 2011) Alsomissing survey items and refusal to consent to record linkage may both beassociated with the same underlying unit attributes eg lack of trust orlack of interest in the survey topic For those cases versions of the table 4analyses would require in-depth analyses of the joint propensity of a sam-ple unit to provide consent to linkage and to provide a response for a givenset of items
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), “Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight,” paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), “Propensity Score Adjustments for Covariates in Observational Studies,” paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A1 Predictor Variables Examined
Variable: Variable description

Respondent sociodemographics
Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender: 1 if male (reference group); 2 if female
Race: 0 if white (reference group); 1 if non-white
Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
Language: 1 if interview language was Spanish; 0 if English (reference group)
Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type: 0 if renter (reference group); 1 if owner
Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Respondent “doorstep” concerns

Privacy/government concerns: Privacy concerns; Anti-government concerns
Too busy/logistical concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested; Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long
Other reluctance: Survey is voluntary; Does not understand/Asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other: specify
No concerns: No concerns

A3 Post-Survey Interviewer Questions

A3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / \hat{p}_i$ for the sample units with consent, where $\hat{p}_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
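The weight-heterogeneity summaries described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function name `weight_heterogeneity` is hypothetical, the coefficient of variation is assumed to use the divide-by-n variance, and the percentiles are assumed to be unweighted and linearly interpolated, which may differ from the conventions used to produce table 6.

```python
from statistics import quantiles


def weight_heterogeneity(weights):
    """Summaries of weight variability in the spirit of table 6 (hypothetical helper)."""
    n = len(weights)
    total = sum(weights)
    # 1 + CV^2 with the divide-by-n variance equals n * sum(w^2) / (sum(w))^2.
    one_plus_cv2 = n * sum(w * w for w in weights) / total ** 2
    # 99 percentile cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(weights, n=100, method="inclusive")

    def q(k):
        return cuts[k - 1]

    return {
        "1+CV2": one_plus_cv2,
        "IQR/q0.5": (q(75) - q(25)) / q(50),
        "q0.75/q0.25": q(75) / q(25),
        "q0.90/q0.10": q(90) / q(10),
        "q0.95/q0.05": q(95) / q(5),
    }
```

Applying the function separately to the full-sample weights, the consenters' unadjusted weights, and the consenters' propensity-adjusted weights would give one column of summaries per weight set.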
Table 6. Variability of Weights

Columns: (1) weights from full sample; (2) unadjusted weights for sample units with consent; (3) propensity-adjusted weights for sample units with consent; (4) propensity-decile adjusted weights for sample units with consent; (5) propensity-quintile adjusted weights for sample units with consent.

Statistic             (1)      (2)      (3)      (4)      (5)
n                     4893     3951     3915     3915     3915
1 + CV^2_wt           1.13     1.13     1.16     1.16     1.15
IQR / q0.5            0.875    0.877    0.858    0.853    0.851
q0.75 / q0.25         1.357    1.359    1.448    1.463    1.468
q0.90 / q0.10         1.970    1.966    2.148    2.178    2.124
q0.95 / q0.05         2.699    2.665    3.028    2.961    2.898
Appendix C Variance Estimators and Goodness-of-Fit Tests
C1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})',$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
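The replication variance estimator above translates directly into code. The following is a minimal sketch under stated assumptions: the function name `fay_brr_cov` and the plain-list data layout are hypothetical, and the replicate estimates are assumed to have been computed already by refitting the model with each set of replicate weights.

```python
def fay_brr_cov(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix (illustrative sketch).

    full_estimate: list of p coefficients from the full-sample weights.
    replicate_estimates: list of R lists, each with p coefficients from one
        set of replicate weights (R = 44 in the paper).
    fay_k: Fay factor K (0.5 in the paper).
    """
    n_reps = len(replicate_estimates)
    p = len(full_estimate)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_estimates:
        # Deviation of this replicate estimate from the full-sample estimate.
        dev = [b_r - b for b_r, b in zip(rep, full_estimate)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += scale * dev[j] * dev[k]
    return cov
```

Standard errors are then the square roots of the diagonal entries of the returned matrix; a scalar estimate such as a mean is handled by passing one-element lists.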
C2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
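As a quick arithmetic check on the reference distribution, substituting the stated values of f and g gives the degrees of freedom below, which agree with the F(9, 35) label in the header of table 9.

```latex
\[
g - 1 = 10 - 1 = 9, \qquad
f - g + 2 = 43 - 10 + 2 = 35, \qquad
\text{so } Q_{F\text{-adj}} \text{ is referred to } F(9,\, 35).
\]
```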
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Entries are Estimate (SE), listed in column order for Models 1, 2, A, B, C, D, and E. Reference categories are shown in parentheses. Rows with fewer than seven entries cover only the models that include that term.

Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate’s degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor’s degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advance degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Home owner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate’s degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor’s degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advance degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)

Environmental features
Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies
Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov’t concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: † p < 0.10, * p < 0.05, ** p < 0.01, *** p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent), 2nd revision    p-value    Test statistic, F(9, 35)
1                                   0.292      1.262
2                                   0.810      0.573
A                                   0.336      1.183
B                                   0.929      0.396
C                                   0.061      2.061
D                                   0.022      2.576
E                                   0.968      0.305
Table 3. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

                                  Model 1               Model 2
Variable                          Estimate   SE         Estimate   SE

Demographic characteristics
Age group (32–65)
  18–32                           0.3435     0.0633     0.3389     0.0634
  65+                             0.2692     0.0645     0.2532     0.0656
Gender (Male)
  Female                          0.0161     0.0426     0.0229     0.0421
Race (White)
  Non-white                       0.0949     0.1264     0.0612     0.1260
Spanish interview (No)
  Yes                             0.4234     0.2923     0.4452     0.2790
Education group (HS grad)
  Less than HS                    0.3097     0.1879     0.2926     0.1835
  Some college                    0.4016     0.1423     0.3739     0.1389
  Associate’s degree              0.3321     0.2041     0.3150     0.1993
  Bachelor’s degree               0.0654     0.2112     0.0663     0.2082
  Advance degree                  0.1747     0.2762     0.1512     0.2773
Home owner (Renter)
  Owner                           0.2767†    0.1453     0.3233     0.1436
Total expenditures (Log)          0.0605     0.0799     0.0083     0.0800
Income group
  Less than $8,181                0.2155     0.0604
Income imputed (No)
  Yes                             0.1487     0.0513
Race × Gender                     0.1874†    0.0956     0.1953     0.0923
Owner × Education
  Less than HS                    0.4931     0.2066     0.4632     0.1996
  Some college                    0.6575     0.1567     0.6370     0.1613
  Associate’s degree              0.3407†    0.2013     0.3402†    0.1986
  Bachelor’s degree               0.1385     0.2402     0.1398     0.2368
  Advance degree                  0.4713     0.2847     0.4226     0.2870

Environmental features
Region (Northeast)
  Midwest                         0.2097†    0.1234     0.2111†    0.1163
  South                           0.2451     0.1190     0.2408     0.1187
  West                            0.2670     0.1055     0.2557     0.1066
Urbanicity (Rural)
  Urban                           0.0268     0.0829     0.0234     0.0824

R attitude proxies
Converted refusal (No)
  Yes                             0.0721     0.0756     0.0838     0.0763

Continued
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).

CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / \hat{p}_i$ (Consenting Units, Propensity-Adjusted).²

CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{pdecile} = w_i / \hat{p}_{decile}$ (Consenting Units, Propensity-Decile Adjusted, where $\hat{p}_{decile}$ is the unweighted propensity of “consent” units in the specified decile group).

CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{pquintile} = w_i / \hat{p}_{quintile}$ (Consenting Units, Propensity-Quintile Adjusted, where $\hat{p}_{quintile}$ is the unweighted propensity of “consent” units in the specified quintile group).
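The estimators listed above can be sketched as follows. This is an illustrative sketch only, with several assumptions not taken from the paper: the function names are hypothetical; propensity classes are formed as equal-frequency groups by rank of the estimated propensity over all units; and the class-level propensity is taken as the unweighted mean of the estimated propensities among consenting units in that class. Setting `n_classes=10` corresponds to a CUPDA-style estimate and `n_classes=5` to a CUPQA-style estimate.

```python
from statistics import mean


def weighted_mean(y, w):
    """Weighted sample mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def propensity_classes(p, n_classes):
    """Assign each unit to an equal-frequency class by rank of its propensity."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    classes = [0] * len(p)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(p)
    return classes


def consent_estimates(y, w, p, consent, n_classes=10):
    """Full-sample mean plus unadjusted and propensity-adjusted consenter means."""
    idx = [i for i, c in enumerate(consent) if c]  # consenting units only
    y_c = [y[i] for i in idx]
    classes = propensity_classes(p, n_classes)
    # Assumed class-level propensity: unweighted mean among consenters in the class.
    class_p = {k: mean(p[i] for i in idx if classes[i] == k)
               for k in {classes[i] for i in idx}}
    return {
        "FS": weighted_mean(y, w),
        "CUNA": weighted_mean(y_c, [w[i] for i in idx]),
        "CUPA": weighted_mean(y_c, [w[i] / p[i] for i in idx]),
        "CUP_CLASS": weighted_mean(y_c, [w[i] / class_p[classes[i]] for i in idx]),
    }
```

Differences such as CUNA minus FS or CUPA minus FS then give the kind of comparison examined in the bias analyses, with standard errors obtained by repeating the calculation under each set of replicate weights.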
Table 3. Continued

                                  Model 1               Model 2
Variable                          Estimate   SE         Estimate   SE

Effort (Moderate)
  A lot of effort                 0.5454     0.2081     0.6142     0.2065
  Bare minimum effort             0.4879     0.1365     0.5721     0.1371
Doorstep concerns (None)
  Too busy                        0.2207     0.1485     0.2137     0.1466
  Privacy/gov’t concerns          1.1895     0.1667     1.2241     0.1622
  Other                           1.0547     0.3261     1.0255     0.3255
Doorstep concerns × effort
  Privacy × a lot of effort       0.5884     0.2789     0.5312†    0.2728
  Privacy × minimum effort        0.3183     0.1918     0.2601     0.1924
  Busy × a lot of effort          0.0888     0.2736     0.0890     0.2749
  Busy × minimum effort           0.2238     0.2096     0.2277     0.2095
  Other × a lot of effort         1.0794†    0.6224     1.0477     0.6182
  Other × minimum effort          0.9002     0.3610     0.8777     0.3593

NOTE: † p < 0.10, * p < 0.05, ** p < 0.01, *** p < 0.001.
2. Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link, $\hat{p}_i$, for all sample units $i$. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methodsdeveloped for survey nonresponse analyses see Little (1986) Yang LesserGitelman and Birkes (2010 2012) and references cited therein
Table 4 presents mean estimates for the six CEQ outcome variables underthese five approaches (columns 2ndash6) The seventh column of the table exam-ines the difference between mean estimates based on the full-sample (FS) andthe unadjusted consenter group (CUNA) These results show fairly substantialdifferences between the two approaches for the mean estimates of incomeproperty-tax and rental value variables This suggests it would be important tocarry out adjustments to account for the approximately 20 of respondents whoobjected to linkage and were omitted from these consenter-based estimates
The eighth column of table 4 examines the impact of consent-propensity weight adjustments on mean estimates, comparing the full sample to the adjusted consenter group. In general, the propensity adjustments improve the estimates (i.e., they move them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA-FS and CUPQA-FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA-FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
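The estimators compared in table 4 can be sketched in a few lines. The following is a minimal illustration and not the authors' implementation: the function names, the toy data, and the particular form of the class adjustment (equal-count propensity classes, with consenter weights divided by the weighted consent rate in each class) are assumptions made for this example.

```python
def weighted_mean(values, weights):
    """Survey-weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def compare_estimators(y, w, consent, p_hat, n_classes=5):
    """Return a dict with FS, CUNA, CUPA, and class-adjusted means.

    y: outcome values; w: design weights; consent: booleans;
    p_hat: estimated consent propensities (assumed to come from a
    fitted propensity model; they are not estimated here).
    """
    n = len(y)
    consenters = [i for i in range(n) if consent[i]]
    y_c = [y[i] for i in consenters]

    # FS uses all respondents; CUNA uses consenters with unadjusted weights.
    fs = weighted_mean(y, w)
    cuna = weighted_mean(y_c, [w[i] for i in consenters])

    # CUPA: each consenter's weight is divided by its own propensity.
    cupa = weighted_mean(y_c, [w[i] / p_hat[i] for i in consenters])

    # Class adjustment (assumed form): rank units by propensity, split
    # them into n_classes equal-count groups, and divide consenter
    # weights by the weighted consent rate observed in their group.
    order = sorted(range(n), key=lambda i: p_hat[i])
    group = {i: min(rank * n_classes // n, n_classes - 1)
             for rank, i in enumerate(order)}
    total = [0.0] * n_classes
    agreed = [0.0] * n_classes
    for i in range(n):
        total[group[i]] += w[i]
        if consent[i]:
            agreed[group[i]] += w[i]
    class_adjusted = weighted_mean(
        y_c, [w[i] * total[group[i]] / agreed[group[i]] for i in consenters]
    )
    return {"FS": fs, "CUNA": cuna, "CUPA": cupa, "CLASS": class_adjusted}


# Toy data (hypothetical): four units, one of which objects to linkage.
result = compare_estimators(
    y=[10.0, 20.0, 30.0, 40.0],
    w=[1.0, 1.0, 1.0, 1.0],
    consent=[True, False, True, True],
    p_hat=[0.5, 0.5, 1.0, 1.0],
    n_classes=2,
)
```

Setting `n_classes` to 10 or 5 would correspond to decile-style or quintile-style classes; the differences reported in columns 7 through 10 would then be each consenter-based mean minus `result["FS"]`.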
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the "objection" subpopulation (horizontal axis) and the "consent" subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the "object" and "consent" subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

Columns 2–6 report means with standard errors in parentheses; columns 7–10 report point estimate differences with the standard error of the difference in parentheses.

| (1) Variable | (2) Full-sample (FS) mean (SE) | (3) Consenting units, no adjustment (CUNA) mean (SE) | (4) Consenting units, propensity-adjusted (CUPA) mean (SE) | (5) Consenting units, propensity-decile adjusted (CUPDA) mean (SE) | (6) Consenting units, propensity-quintile adjusted (CUPQA) mean (SE) | (7) CUNA-FS (SE diff) | (8) CUPA-FS (SE diff) | (9) CUPDA-FS (SE diff) | (10) CUPQA-FS (SE diff) |
|---|---|---|---|---|---|---|---|---|---|
| Family income before taxes | $50,939.00 ($1,227.51) | $52,869.00 ($1,535.04) | $52,117.00 ($1,523.85) | $52,269.00 ($1,520.74) | $52,405.00 ($1,514.30) | $1,930.00 ($580.98) | $1,178.00† ($619.09) | $1,330.00 ($602.64) | $1,466.00 ($606.76) |
| Family income before taxes with imputed values | $61,207.00 ($1,193.34) | $61,405.00 ($1,510.55) | $61,228.00 ($1,454.39) | $61,337.00 ($1,463.34) | $61,382.00 ($1,466.25) | $198.00 ($573.46) | $21.00 ($563.54) | $130.00 ($557.10) | $175.00 ($552.20) |
| Vehicle purchase cost | $599.59 ($33.22) | $619.14 ($37.05) | $607.80 ($36.63) | $607.63 ($36.67) | $611.75 ($37.61) | $19.55† ($10.83) | $8.21 ($15.34) | $8.04 ($15.23) | $12.17 ($16.13) |
| Property taxes | $454.15 ($10.41) | $429.12 ($10.76) | $434.74 ($11.39) | $434.88 ($11.37) | $436.75 ($11.30) | -$25.02 ($6.08) | -$19.41 ($6.48) | -$19.27 ($6.05) | -$17.40 ($6.23) |
| Property value | $232,226.00 ($5,055.68) | $227,734.00 ($5,730.38) | $227,938.00 ($5,592.27) | $227,935.00 ($5,532.64) | $228,640.00 ($5,519.90) | -$4,492.00 ($2,118.39) | -$4,288.00† ($2,165.93) | -$4,291.00 ($2,132.40) | -$3,586.00 ($2,063.66) |
| Rental value | $1,296.52 ($15.58) | $1,269.65 ($17.05) | $1,272.40 ($17.81) | $1,272.91 ($17.86) | $1,274.67 ($17.68) | -$26.87 ($7.43) | -$24.12 ($8.18) | -$23.61 ($8.01) | -$21.85 ($8.02) |

NOTE: † p < 0.10; * p < 0.05; ** p < 0.01 (bold); *** p < 0.001 (bold).
values for the "objection" and "consent" subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the "consent" and "object" subpopulations.
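The quantile-quantile comparison described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the quantile rule used here (the smallest value whose cumulative weight share reaches q) is one simple choice among several; the paper does not specify which survey-weighted quantile estimator was used.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def qq_points(p_object, w_object, p_consent, w_consent):
    """Return 99 (x, y) pairs for q = 0.01, ..., 0.99.

    x is the q-th weighted quantile of the estimated propensities in the
    "object" group and y is the same quantile in the "consent" group.
    """
    grid = [k / 100 for k in range(1, 100)]
    return [
        (weighted_quantile(p_object, w_object, q),
         weighted_quantile(p_consent, w_consent, q))
        for q in grid
    ]


# Toy check (hypothetical data): identical groups put every point on the
# 45-degree reference line, i.e. no discriminating power.
points = qq_points([0.2, 0.4, 0.6, 0.8], [1, 1, 1, 1],
                   [0.2, 0.4, 0.6, 0.8], [1, 1, 1, 1])
```

Points that drift away from the diagonal at a given quantile indicate that the two subpopulations differ in their propensity distributions at that quantile.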
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the "Consent" Subpopulation (Vertical Axis) and "Object" Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the "Consent" and "Object" Subpopulations

| Percentile (%) | Estimated propensity of consent: Consent subpopulation | Estimated propensity of consent: Object subpopulation |
|---|---|---|
| 70 | 0.84 | 0.88 |
| 80 | 0.86 | 0.89 |
| 90 | 0.88 | 0.92 |
| 99 | 0.93 | 0.96 |
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the "consent" subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
5. DISCUSSION

5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple "consent-to-link" question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the "Consent" Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods, joint modeling of consent-to-link and item response probabilities, approximate optimization of design decisions related to potentially burdensome questions, and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the "consent" and "refusal" cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), "Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions," Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), "Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data," The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), "Goodness-of-Fit Tests for Logistic Regression Models When Data Are Collected Using a Complex Sampling Design," Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), "Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records," Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), "Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?" Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), "HIPAA Authorization and Survey Nonresponse Bias," Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), "Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation," Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), "Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey," Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), "Optimizing Opt-Out Consent for Record Linkage," Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), "2011 Research Section Analysis," Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), "Rapport in Conflict Resolution: Accounting for How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts," Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), "Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders," American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), "Respondent Consent to Use Administrative Data," PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), "Surveys on Surveys: Limitations and Potentialities," Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), "Understanding the Decision to Participate in a Survey," Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), "Leverage-Salience Theory of Survey Participation: Description and an Illustration," Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), "Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias," Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), "Record Linkage Research and Informed Consent: Who Consents?" BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), "Patterns of Consent: Evidence from a General Household Survey," Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), "Fay's Method for Variance Estimation," Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), "Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review," British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), "Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population," BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), "Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics," The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), "The Framing of the Record Linkage Consent Question," International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), "Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer," Survey Research Methods, 7, 115–131.
Little, R. (1986), "Survey Nonresponse Adjustments for Estimates of Means," International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), "The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community," American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), "Uses of Administrative Data at the Social Security Administration," Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), "Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study," International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), "Linkages with Data from Social Security Administrative Records in the Health and Retirement Study," Social Security Bulletin, 62, 73–85.
O'Muircheartaigh, C., and P. Campanelli (1999), "A Multilevel Exploration of the Role of Interviewers in Survey Non-Response," Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), "Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP)," Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), "Bowling Alone: America's Declining Social Capital," Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), "Is the Consumer Expenditure Survey Representative by Income?" in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), "Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults," Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), "Linking Survey and Administrative Data: Mechanisms of Consent," Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), "An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany," Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), "Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data," Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), "The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey," Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), "Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data," Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), "Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics," Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), "Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel," International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), "Informed Consent and Survey Response: A Summary of the Empirical Literature," Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), "Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits," Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), "Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census," Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), "An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey," Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), "Consumer Expenditures and Income," in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), "'Interviewer' Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?" Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), "Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research," Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), "Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight," paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), "Propensity Score Adjustments for Covariates in Observational Studies," paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

| Variable | Variable description |
|---|---|
| Respondent sociodemographics | |
| Age | 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65 |
| Gender | 1 if male (reference group); 2 if female |
| Race | 0 if white (reference group); 1 if non-white |
| Education | 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate's degree; 4 if Bachelor's degree; and 5 if more than Bachelor's |
| Language | 1 if interview language was Spanish; 0 if English (reference group) |
| Household size | 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH |
| Household type | 0 if renter (reference group); 1 if owner |
| Family income | 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180 |
| Total spending (log) | Log of total expenditures reported for all major expenditure categories |
| Survey features | |
| Mode | 1 if telephone; 0 if personal visit (reference group) |
| Area characteristics | |
| Region | 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West |
| Urban status | 1 if urban; 2 if rural (reference group) |
| Respondent attitude and effort | |
| Converted refusal status | 1 if ever a converted refusal during previous interviews; 2 if not (reference group) |
| Interview duration | 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group) |
| Cooperation | 1 if "very cooperative"; 2 if "somewhat cooperative"; 3 if "neither cooperative nor uncooperative"; 4 if "somewhat uncooperative"; 5 if "very uncooperative" |
| Effort | 1 if "a lot of effort"; 2 if "a moderate amount" (reference group); 3 if "a bare minimum" |
| Effort change | 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure |
| Use of information booklet | 1 if respondent used booklet "almost always"; 2 if "most of the time"; 3 if "occasionally"; 4 if "never or almost never"; and 5 if did not have access to booklet (reference group) |
| Doorstep concerns | 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if "other" |
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

| Respondent "doorstep" concerns | Assigned category |
|---|---|
| Privacy concerns | Privacy/government concerns |
| Anti-government concerns | Privacy/government concerns |
| Too busy | Too busy/logistical concerns |
| Interview takes too much time | Too busy/logistical concerns |
| Breaks appointments (puts off FR indefinitely) | Too busy/logistical concerns |
| Not interested | Too busy/logistical concerns |
| Does not want to be bothered | Too busy/logistical concerns |
| Too many interviews | Too busy/logistical concerns |
| Scheduling difficulties | Too busy/logistical concerns |
| Family issues | Too busy/logistical concerns |
| Last interview took too long | Too busy/logistical concerns |
| Survey is voluntary | Other reluctance |
| Does not understand/asks questions about survey | Other reluctance |
| Survey content does not apply | Other reluctance |
| Hang-up/slams door on FR | Other reluctance |
| Hostile or threatens FR | Other reluctance |
| Other HH members tell R not to participate | Other reluctance |
| Talk only to specific household member | Other reluctance |
| Respondent requests same FR as last time | Other reluctance |
| Gave that information last time | Other reluctance |
| Asked too many personal questions last time | Other reluctance |
| Intends to quit survey | Other reluctance |
| Other (specify) | Other reluctance |
| No concerns | No concerns |

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights
Columns: (1) weights from full sample; (2) unadjusted weights for sample units with consent; (3) propensity-adjusted weights for sample units with consent; (4) propensity-decile adjusted weights for sample units with consent; (5) propensity-quintile adjusted weights for sample units with consent.

Statistic          (1)      (2)      (3)      (4)      (5)
n                  4893     3951     3915     3915     3915
1 + CV^2_wt        1.13     1.13     1.16     1.16     1.15
IQR/q0.5           0.875    0.877    0.858    0.853    0.851
q0.75/q0.25        1.357    1.359    1.448    1.463    1.468
q0.90/q0.10        1.970    1.966    2.148    2.178    2.124
q0.95/q0.05        2.699    2.665    3.028    2.961    2.898
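The summaries reported in table 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `weight_heterogeneity` and `propensity_adjust` are hypothetical, the variance uses a population-style divisor, and the percentile rule is the standard library's "inclusive" method, neither of which is stated in the paper.

```python
import statistics


def weight_heterogeneity(weights):
    """Return summaries of weight variability similar to the rows of table 6."""
    mean = statistics.fmean(weights)
    # Squared coefficient of variation; the population-variance divisor is an assumption.
    cv2 = statistics.pvariance(weights, mu=mean) / mean ** 2
    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99.
    pct = statistics.quantiles(weights, n=100, method="inclusive")

    def q(k):
        return pct[k - 1]

    return {
        "one_plus_cv2": 1.0 + cv2,
        "iqr_over_median": (q(75) - q(25)) / q(50),
        "q75_over_q25": q(75) / q(25),
        "q90_over_q10": q(90) / q(10),
        "q95_over_q05": q(95) / q(5),
    }


def propensity_adjust(weights, propensities):
    """Propensity-adjusted weights w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]
```

Comparing `weight_heterogeneity(w)` with `weight_heterogeneity(propensity_adjust(w, p))` gives the kind of before-and-after contrast that the appendix discusses.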
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator
$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$
where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
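As a minimal sketch of the variance estimator above, the following Python function accumulates the scaled outer products of the replicate deviations. The function name `fay_brr_variance` and the list-based inputs are illustrative assumptions, not part of the CEQ production system; the replicate estimates themselves would come from refitting the model with each set of replicate weights.

```python
def fay_brr_variance(theta_full, theta_reps, fay_k=0.5):
    """Fay-adjusted BRR covariance matrix for a coefficient vector.

    theta_full: list of p full-sample estimates.
    theta_reps: list of R lists, each holding the p estimates from one replicate.
    fay_k:      Fay factor K (0.5 in the analyses described above).
    """
    r = len(theta_reps)
    p = len(theta_full)
    scale = 1.0 / (r * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in theta_reps:
        # Deviation of this replicate estimate from the full-sample estimate.
        dev = [rep[j] - theta_full[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * dev[a] * dev[b]
    return cov
```

Standard errors are the square roots of the diagonal entries. With $R = 44$ and $K = 0.5$, the scale factor is $1 / (44 \times 0.25) = 1/11$.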
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2 based on the F-adjusted mean residual test statistic $Q_{Fadj}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Entries are Estimate (SE), listed in column order: Model 1; Model 2; Model A; Model B; Model C; Model D; Model E.

Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate's degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor's degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
(Continued)
Table 8. Continued
Entries are Estimate (SE), listed in column order: Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. Rows with fewer than seven entries list, in column order, only the models that report the term.

  Advance degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate's degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor's degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advance degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)
Environmental features
Entries are Estimate (SE), listed in column order: Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. Rows with fewer than seven entries list, in column order, only the models that report the term.

Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies
Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov't concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
(Continued)
Table 8. Continued
Entries are Estimate (SE), listed in column order: Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. Rows with fewer than seven entries list, in column order, only the models that report the term.

Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal visit: 0.0170 (0.0428); 0.0395 (0.0412)
NOTE: † p < 0.10; p < 0.05; p < 0.01; p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test: p-values and Test Statistics, Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
CUNA: The weighted sample mean based only on the Y variables observed for the consenting units, using the same weights $w_i$ as employed for the FS estimator (Consenting Units, No Adjustment).
CUPA: A weighted sample mean based only on the Y variables observed for the consenting units and using weights equal to $w_{pi} = w_i / p_i$ (Consenting Units, Propensity-Adjusted).²
CUPDA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{pdecile} = w_i / p_{decile}$ (Consenting Units, Propensity-Decile Adjusted, where $p_{decile}$ is the unweighted propensity of “consent” units in the specified decile group).
CUPQA: A weighted sample mean based only on the Y variables observed for the consenting units, using weights equal to $w_{pquintile} = w_i / p_{quintile}$ (Consenting Units, Propensity-Quintile Adjusted, where $p_{quintile}$ is the unweighted propensity of “consent” units in the specified quintile group).
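The estimators defined above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `consent_adjusted_means` and its arguments are hypothetical, and the class-based adjustment assumes that the "unweighted propensity" within a decile or quintile group means the unweighted consent rate among all units in that group. It also assumes at least as many units as classes.

```python
def consent_adjusted_means(y, w, p, consent, n_classes=10):
    """Return FS, CUNA, CUPA, and class-adjusted means for one outcome.

    y, w, p:  outcome, design weight, and estimated consent propensity per unit.
    consent:  1 if the unit consented to linkage, 0 if it objected.
    n_classes: 10 for a decile-type adjustment, 5 for a quintile-type adjustment.
    """
    n = len(y)
    # Rank units by estimated propensity and cut into equal-sized classes.
    order = sorted(range(n), key=lambda i: p[i])
    cls = [0] * n
    for rank, i in enumerate(order):
        cls[i] = rank * n_classes // n
    members = [[] for _ in range(n_classes)]
    for i in range(n):
        members[cls[i]].append(i)
    # Assumed reading of p_decile / p_quintile: unweighted consent rate per class.
    rate = [sum(consent[i] for i in m) / len(m) if m else None for m in members]
    cons = [i for i in range(n) if consent[i]]

    def mean_over(units, weight):
        return sum(weight(i) * y[i] for i in units) / sum(weight(i) for i in units)

    return {
        "FS": mean_over(range(n), lambda i: w[i]),
        "CUNA": mean_over(cons, lambda i: w[i]),
        "CUPA": mean_over(cons, lambda i: w[i] / p[i]),
        "CUPCA": mean_over(cons, lambda i: w[i] / rate[cls[i]]),
    }
```

Calling the function with `n_classes=10` gives a CUPDA-style estimate and `n_classes=5` a CUPQA-style estimate under these assumptions; differences such as CUPA minus FS then correspond to the bias comparisons described in the text.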
Table 3. Continued

Variable                           Model 1                Model 2
                                   Estimate    SE         Estimate    SE
Effort (Moderate)
  A lot of effort                  0.5454      0.2081     0.6142      0.2065
  Bare minimum effort              0.4879      0.1365     0.5721      0.1371
Doorstep concerns (None)
  Too busy                         0.2207      0.1485     0.2137      0.1466
  Privacy/gov't concerns           1.1895      0.1667     1.2241      0.1622
  Other                            1.0547      0.3261     1.0255      0.3255
Doorstep concerns × effort
  Privacy × a lot of effort        0.5884      0.2789     0.5312†     0.2728
  Privacy × minimum effort         0.3183      0.1918     0.2601      0.1924
  Busy × a lot of effort           0.0888      0.2736     0.0890      0.2749
  Busy × minimum effort            0.2238      0.2096     0.2277      0.2095
  Other × a lot of effort          1.0794†     0.6224     1.0477      0.6182
  Other × minimum effort           0.9002      0.3610     0.8777      0.3593
NOTE: † p < 0.10; p < 0.05; p < 0.01; p < 0.001.

2 Since one of the key outcome variables in our bias analyses is family income before taxes, in these analyses we remove the family income group variable and imputation indicator from model 1 to estimate the survey-weighted propensity of consent-to-link $p_i$ for all sample units $i$. To be consistent, we used the same propensity model for all prospective economic variables in these analyses (model 2 in table 3).
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986); Yang, Lesser, Gitelman, and Birkes (2010, 2012); and references cited therein.
Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20% of respondents who objected to linkage and were omitted from these consenter-based estimates.
The eighth column of table 4 examines the impact of consent propensity weight adjustments on mean estimates, comparing the full sample to the adjusted-consenter group. In general, the propensity adjustments improve estimates (i.e., they move them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.
Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA-FS and CUPQA-FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA-FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods
Column 1 is the variable. Columns 2–6 report means (SE): full-sample (FS); consenting units, no adjustment (CUNA); consenting units, propensity-adjusted (CUPA); consenting units, propensity-decile adjusted (CUPDA); consenting units, propensity-quintile adjusted (CUPQA). Columns 7–10 report point-estimate differences (SE diff).

Means (SE)
Variable (1)                                     FS (2)          CUNA (3)        CUPA (4)        CUPDA (5)       CUPQA (6)
Family income before taxes                       $50,939.00      $52,869.00      $52,117.00      $52,269.00      $52,405.00
                                                 ($1,227.51)     ($1,535.04)     ($1,523.85)     ($1,520.74)     ($1,514.30)
Family income before taxes with imputed values   $61,207.00      $61,405.00      $61,228.00      $61,337.00      $61,382.00
                                                 ($1,193.34)     ($1,510.55)     ($1,454.39)     ($1,463.34)     ($1,466.25)
Vehicle purchase cost                            $599.59         $619.14         $607.80         $607.63         $611.75
                                                 ($33.22)        ($37.05)        ($36.63)        ($36.67)        ($37.61)
Property taxes                                   $454.15         $429.12         $434.74         $434.88         $436.75
                                                 ($10.41)        ($10.76)        ($11.39)        ($11.37)        ($11.30)
Property value                                   $232,226.00     $227,734.00     $227,938.00     $227,935.00     $228,640.00
                                                 ($5,055.68)     ($5,730.38)     ($5,592.27)     ($5,532.64)     ($5,519.90)
Rental value                                     $1,296.52       $1,269.65       $1,272.40       $1,272.91       $1,274.67
                                                 ($15.58)        ($17.05)        ($17.81)        ($17.86)        ($17.68)

Point-estimate differences (SE diff)
Variable (1)                                     CUNA–FS (7)     CUPA–FS (8)     CUPDA–FS (9)    CUPQA–FS (10)
Family income before taxes                       $1,930.00       $1,178.00†      $1,330.00       $1,466.00
                                                 ($580.98)       ($619.09)       ($602.64)       ($606.76)
Family income before taxes with imputed values   $198.00         $21.00          $130.00         $175.00
                                                 ($573.46)       ($563.54)       ($557.10)       ($552.20)
Vehicle purchase cost                            $19.55†         $8.21           $8.04           $12.17
                                                 ($10.83)        ($15.34)        ($15.23)        ($16.13)
Property taxes                                   -$25.02         -$19.41         -$19.27         -$17.40
                                                 ($6.08)         ($6.48)         ($6.05)         ($6.23)
Property value                                   -$4,492.00      -$4,288.00†     -$4,291.00      -$3,586.00
                                                 ($2,118.39)     ($2,165.93)     ($2,132.40)     ($2,063.66)
Rental value                                     -$26.87         -$24.12         -$23.61         -$21.85
                                                 ($7.43)         ($8.18)         ($8.01)         ($8.02)
NOTE: † p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
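The coordinates of a quantile-quantile comparison like the one in figure 1 can be sketched as below. This is an illustration under assumptions: the function name `qq_points` is hypothetical, and the quantiles are unweighted with the standard library's "inclusive" rule, whereas the paper does not state whether survey weights entered the quantile estimates.

```python
import statistics


def qq_points(p_object, p_consent):
    """Pair the 0.01..0.99 quantiles of two sets of estimated propensities.

    Returns 99 (x, y) pairs: x from the "object" group, y from the "consent" group.
    """
    qx = statistics.quantiles(p_object, n=100, method="inclusive")
    qy = statistics.quantiles(p_consent, n=100, method="inclusive")
    return list(zip(qx, qy))
```

Points lying on the line with slope 1 and intercept 0 indicate that the two groups have the same propensity distribution at that quantile.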
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

Percentile (%)    Estimated propensity of consent
                  Consent subpopulation    Object subpopulation
70                0.84                     0.88
80                0.86                     0.89
90                0.88                     0.92
99                0.93                     0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
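The quantile differences described above can be sketched with a simple weighted quantile rule. The helper names `weighted_quantile` and `quantile_differences` are hypothetical, and the rule used (the smallest value whose cumulative weight share reaches q) is one common choice assumed here, not necessarily the estimator used for figure 4. Pointwise confidence intervals would additionally require repeating the calculation with each set of replicate weights, which this sketch omits.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share is at least q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]


def quantile_differences(y_consent, w_adjusted, y_full, w_full):
    """Differences of the 0.01..0.99 quantiles: adjusted consenters minus full sample."""
    qs = [k / 100 for k in range(1, 100)]
    return [
        weighted_quantile(y_consent, w_adjusted, q)
        - weighted_quantile(y_full, w_full, q)
        for q in qs
    ]
```

Here `w_adjusted` would hold propensity-adjusted weights such as $w_i / p_i$ for the consenting units, and `w_full` the unadjusted weights for all units.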
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods, joint modeling of consent-to-link and item response probabilities, approximate optimization of design decisions related to potentially burdensome questions, and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Med Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay's Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Downloaded from https://academic.oup.com/jssam/article-abstract/7/1/118/4689177 by guest on 19 February 2019
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A.1 Predictor Variables Examined

| Variable | Variable description |
|---|---|
| Respondent sociodemographics | |
| Age | 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65 |
| Gender | 1 if male (reference group); 2 if female |
| Race | 0 if white (reference group); 1 if non-white |
| Education | 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s |
| Language | 1 if interview language was Spanish; 0 if English (reference group) |
| Household size | 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH |
| Household type | 0 if renter (reference group); 1 if owner |
| Family income | 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180 |
| Total spending (log) | Log of total expenditures reported for all major expenditure categories |
| Survey features | |
| Mode | 1 if telephone; 0 if personal visit (reference group) |
| Area characteristics | |
| Region | 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West |
| Urban status | 1 if urban; 2 if rural (reference group) |
| Respondent attitude and effort | |
| Converted refusal status | 1 if ever a converted refusal during previous interviews; 2 if not (reference group) |
| Interview duration | 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group) |
| Cooperation | 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative” |
| Effort | 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum” |
| Effort change | 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure |
| Use of information booklet | 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group) |
| Doorstep concerns | 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other” |
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

| Respondent “doorstep” concerns | Assigned category |
|---|---|
| Privacy concerns | Privacy/government concerns |
| Anti-government concerns | Privacy/government concerns |
| Too busy | Too busy/logistical concerns |
| Interview takes too much time | Too busy/logistical concerns |
| Breaks appointments (puts off FR indefinitely) | Too busy/logistical concerns |
| Not interested | Too busy/logistical concerns |
| Does not want to be bothered | Too busy/logistical concerns |
| Too many interviews | Too busy/logistical concerns |
| Scheduling difficulties | Too busy/logistical concerns |
| Family issues | Too busy/logistical concerns |
| Last interview took too long | Too busy/logistical concerns |
| Survey is voluntary | Other reluctance |
| Does not understand/Asks questions about survey | Other reluctance |
| Survey content does not apply | Other reluctance |
| Hang-up/slams door on FR | Other reluctance |
| Hostile or threatens FR | Other reluctance |
| Other HH members tell R not to participate | Other reluctance |
| Talk only to specific household member | Other reluctance |
| Respondent requests same FR as last time | Other reluctance |
| Gave that information last time | Other reluctance |
| Asked too many personal questions last time | Other reluctance |
| Intends to quit survey | Other reluctance |
| Other (specify) | Other reluctance |
| No concerns | No concerns |

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

| | Weights from full sample | Unadjusted weights for sample units with consent | Propensity-adjusted weights for sample units with consent | Propensity-decile adjusted weights for sample units with consent | Propensity-quintile adjusted weights for sample units with consent |
|---|---|---|---|---|---|
| n | 4893 | 3951 | 3915 | 3915 | 3915 |
| $1 + CV^2_{wt}$ | 1.13 | 1.13 | 1.16 | 1.16 | 1.15 |
| $IQR / q_{0.5}$ | 0.875 | 0.877 | 0.858 | 0.853 | 0.851 |
| $q_{0.75} / q_{0.25}$ | 1.357 | 1.359 | 1.448 | 1.463 | 1.468 |
| $q_{0.90} / q_{0.10}$ | 1.970 | 1.966 | 2.148 | 2.178 | 2.124 |
| $q_{0.95} / q_{0.05}$ | 2.699 | 2.665 | 3.028 | 2.961 | 2.898 |
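The weight-heterogeneity summaries described in this appendix can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the paper: the function names, the linear-interpolation quantile rule, and the use of the population variance in the coefficient of variation are all assumptions made here for concreteness.

```python
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (assumed rule)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def weight_variability(weights):
    """Summaries of weight heterogeneity analogous to the rows of table 6."""
    w = sorted(weights)
    mean = statistics.fmean(w)
    # Squared coefficient of variation, using the population variance (assumption).
    cv2 = statistics.pvariance(w, mu=mean) / mean ** 2
    return {
        "n": len(w),
        "1+CV2": 1 + cv2,
        "IQR/q0.5": (quantile(w, 0.75) - quantile(w, 0.25)) / quantile(w, 0.5),
        "q0.75/q0.25": quantile(w, 0.75) / quantile(w, 0.25),
        "q0.90/q0.10": quantile(w, 0.90) / quantile(w, 0.10),
        "q0.95/q0.05": quantile(w, 0.95) / quantile(w, 0.05),
    }

def propensity_adjusted(weights, propensities):
    """Propensity-adjusted weights w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]
```

Applying `weight_variability` to the design weights and then to the output of `propensity_adjusted` gives a side-by-side comparison of the kind the table reports.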
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
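As an illustration of the variance estimator above, the following Python sketch computes the Fay-adjusted balanced repeated replication covariance matrix from a full-sample estimate and a list of replicate estimates. The function names and the plain-list data layout are hypothetical choices for this sketch; producing the replicate estimates themselves (refitting the model under each set of replicate weights) is assumed to happen elsewhere.

```python
def fay_brr_covariance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR covariance matrix for a vector estimate.

    full_estimate: list of p floats (estimate under full-sample weights).
    replicate_estimates: list of R lists, each of length p.
    """
    n_reps = len(replicate_estimates)
    p = len(full_estimate)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_estimates:
        # Deviation of the r-th replicate estimate from the full-sample estimate.
        dev = [b_r - b for b_r, b in zip(rep, full_estimate)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += scale * dev[i] * dev[j]
    return cov

def standard_errors(cov):
    """Square roots of the diagonal of a covariance matrix."""
    return [cov[i][i] ** 0.5 for i in range(len(cov))]
```

With $R = 44$ and $K = 0.5$, the scale factor is $1/(44 \times 0.25) = 1/11$, so each squared deviation is weighted four times more heavily than in unadjusted balanced repeated replication.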
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2 based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$, that is, $F(9, 35)$.

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

| Model | F-adjusted goodness-of-fit test p-values |
|---|---|
| 1 | 0.292 |
| 2 | 0.810 |
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table 8: coefficient estimates and standard errors (Estimate, SE) for Model 1, Model 2, Model A, Model B, Model C, Model D, and Model E. Row groups, with reference categories in parentheses: Demographic characteristics: age group (32–65): 18–32, 65+; gender (male): female; race (white): non-white; Spanish interview (no): yes; education group (HS grad): less than HS, some college, Associate’s degree, Bachelor’s degree, advanced degree; homeowner (renter): owner; total expenditures (log); income group: less than $8,181; income imputed (no): yes; race by gender; owner by education (less than HS, some college, Associate’s degree, Bachelor’s degree, advanced degree). Environmental features: region (Northeast): Midwest, South, West; urbanicity (rural): urban. Respondent attitude proxies: converted refusal (no): yes; effort (moderate): a lot of effort, bare minimum effort; doorstep concerns (none): too busy, privacy/government concerns, other; doorstep concerns by effort (privacy, busy, and other, each crossed with a lot of effort and minimum effort). Rapport (phone): personal visit. Cell values not recoverable.]

NOTE: † p < 0.10; p < 0.05; p < 0.01; p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic: Model of Consent/Object Comparison

| Model (of consent), 2nd revision | F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35)) |
|---|---|
| 1 | 0.292 (1.262) |
| 2 | 0.810 (0.573) |
| A | 0.336 (1.183) |
| B | 0.929 (0.396) |
| C | 0.061 (2.061) |
| D | 0.022 (2.576) |
| E | 0.968 (0.305) |
For some general background on propensity-score subclassification methods developed for survey nonresponse analyses, see Little (1986), Yang, Lesser, Gitelman, and Birkes (2010, 2012), and references cited therein.

Table 4 presents mean estimates for the six CEQ outcome variables under these five approaches (columns 2–6). The seventh column of the table examines the difference between mean estimates based on the full sample (FS) and the unadjusted consenter group (CUNA). These results show fairly substantial differences between the two approaches for the mean estimates of the income, property-tax, and rental-value variables. This suggests it would be important to carry out adjustments to account for the approximately 20 percent of respondents who objected to linkage and were omitted from these consenter-based estimates.

The eighth column of table 4 examines the impact of consent-propensity weight adjustments on mean estimates, comparing the full sample to the adjusted consenter group. In general, the propensity adjustments improve estimates (i.e., move them closer to the full-sample means). For example, the significant difference we observed in mean family income between the full sample and the unadjusted consenters (column 7) is now reduced and only marginally significant. Nonetheless, it is evident that even after adjustment for the propensity to consent to link, there still are strong indications of bias for estimation of the property-tax, property-value, and rental-value variables. Consequently, we would need to study alternative approaches before considering this type of linkage in practice.

Finally, the ninth and tenth columns of table 4 report related results for the differences CUPDA−FS and CUPQA−FS, respectively. Relative to the corresponding standard errors, the differences in these columns are qualitatively similar to those reported for CUPA−FS. Consequently, we do not see strong indications of sensitivity of results to the specific methods of propensity-based weighting adjustment employed.
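The estimators compared in table 4 can be sketched as follows. This is an illustrative Python outline, not the authors' implementation: the function names are hypothetical, and the subclassification step assumes one common variant (equal-frequency classes formed on the estimated propensity of all respondents, with each consenter's weight multiplied by the inverse of the weighted consent rate in its class), which may differ from the decile and quintile adjustments actually used.

```python
def weighted_mean(y, w):
    """Survey-weighted mean, as for the full-sample (FS) column."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def propensity_adjusted_mean(y, w, p, consent):
    """Mean over consenting units with weights w_i / p_i (CUPA-style)."""
    num = den = 0.0
    for yi, wi, pi, ci in zip(y, w, p, consent):
        if ci:
            num += wi / pi * yi
            den += wi / pi
    return num / den

def subclass_adjusted_mean(y, w, p, consent, n_classes=5):
    """Mean over consenters after a class-level adjustment (assumed variant).

    n_classes=10 mimics a decile adjustment; n_classes=5 a quintile adjustment.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    num = den = 0.0
    for k in range(n_classes):
        members = order[k * n // n_classes:(k + 1) * n // n_classes]
        w_all = sum(w[i] for i in members)
        w_con = sum(w[i] for i in members if consent[i])
        if w_con == 0:
            raise ValueError("a propensity class contains no consenting units")
        factor = w_all / w_con  # inverse of the weighted consent rate in the class
        for i in members:
            if consent[i]:
                num += factor * w[i] * y[i]
                den += factor * w[i]
    return num / den
```

The unadjusted consenter mean (CUNA) is `weighted_mean` applied to the consenting units only; differences such as CUPA−FS are then simple subtractions, with standard errors obtained from the replicate weights.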
To complement the results reported in tables 3 and 4, it is useful to explore two related issues centered on broader distributional patterns. First, the logistic regression results in table 3 describe the complex pattern of main effects and interactions estimated for the model for the propensity of a consumer unit to consent to linkage. To provide a summary comparison of these results, figure 1 gives a quantile-quantile plot of the estimated consent-to-linkage propensity for, respectively, the “objection” subpopulation (horizontal axis) and the “consent” subpopulation (vertical axis). The 99 plotted points are for the 0.01 to 0.99 quantiles (with 0.01 increment). The diagonal reference line has a slope equal to 1 and an intercept equal to 0. If the plotted points all fell on that line, then the “object” and “consent” subpopulations would have essentially the same distributions of propensity-to-consent scores, and one would conclude that the propensity scores estimated from the specified model have provided essentially no practical discriminating power. Conversely, if all of the plotted points had a horizontal axis value close to 1 and a vertical axis value close to 0, then one would conclude that the propensity scores were providing very strong discriminating power. For the propensity scores produced from model 2, note especially that for quantiles below the median, the
Table 4. Comparison of Three Estimation Methods

| Variable | Full-sample (FS) mean (SE) | Consenting units, no adjustment (CUNA) mean (SE) | Consenting units, propensity-adjusted (CUPA) mean (SE) | Consenting units, propensity-decile adjusted (CUPDA) mean (SE) | Consenting units, propensity-quintile adjusted (CUPQA) mean (SE) | Point estimate difference CUNA−FS (SE diff) | Point estimate difference CUPA−FS (SE diff) | Point estimate difference CUPDA−FS (SE diff) | Point estimate difference CUPQA−FS (SE diff) |
|---|---|---|---|---|---|---|---|---|---|
| Column 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| Family income before taxes | $50,939.00 ($1,227.51) | $52,869.00 ($1,535.04) | $52,117.00 ($1,523.85) | $52,269.00 ($1,520.74) | $52,405.00 ($1,514.30) | $1,930.00 ($580.98) | $1,178.00† ($619.09) | $1,330.00 ($602.64) | $1,466.00 ($606.76) |
| Family income before taxes with imputed values | $61,207.00 ($1,193.34) | $61,405.00 ($1,510.55) | $61,228.00 ($1,454.39) | $61,337.00 ($1,463.34) | $61,382.00 ($1,466.25) | $198.00 ($573.46) | $21.00 ($563.54) | $130.00 ($557.10) | $175.00 ($552.20) |
| Vehicle purchase cost | $599.59 ($33.22) | $619.14 ($37.05) | $607.80 ($36.63) | $607.63 ($36.67) | $611.75 ($37.61) | $19.55† ($10.83) | $8.21 ($15.34) | $8.04 ($15.23) | $12.17 ($16.13) |
| Property taxes | $454.15 ($10.41) | $429.12 ($10.76) | $434.74 ($11.39) | $434.88 ($11.37) | $436.75 ($11.30) | -$25.02 ($6.08) | -$19.41 ($6.48) | -$19.27 ($6.05) | -$17.40 ($6.23) |
| Property value | $232,226.00 ($5,055.68) | $227,734.00 ($5,730.38) | $227,938.00 ($5,592.27) | $227,935.00 ($5,532.64) | $228,640.00 ($5,519.90) | -$4,492.00 ($2,118.39) | -$4,288.00† ($2,165.93) | -$4,291.00 ($2,132.40) | -$3,586.00 ($2,063.66) |
| Rental value | $1,296.52 ($15.58) | $1,269.65 ($17.05) | $1,272.40 ($17.81) | $1,272.91 ($17.86) | $1,274.67 ($17.68) | -$26.87 ($7.43) | -$24.12 ($8.18) | -$23.61 ($8.01) | -$21.85 ($8.02) |

NOTE: † p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
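The quantile-quantile comparison described above can be outlined in Python as follows. The sketch is an assumption-laden illustration: the function names are hypothetical, the quantiles are unweighted with linear interpolation (the plotted estimates in the paper may be survey-weighted), and the maximum distance from the diagonal is offered only as one possible numeric summary of discriminating power.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (assumed rule)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def qq_points(p_object, p_consent):
    """99 points (q, x, y): x from the 'object' group, y from the 'consent' group."""
    xs = sorted(p_object)
    ys = sorted(p_consent)
    probs = [k / 100 for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    return [(q, quantile(xs, q), quantile(ys, q)) for q in probs]

def max_gap_from_diagonal(points):
    """Largest |y - x| over the plotted points; 0 means no discrimination."""
    return max(abs(y - x) for _, x, y in points)
```

Points lying on the slope-1, intercept-0 line give a gap of zero, which corresponds to the "no practical discriminating power" case in the text.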
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).

Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95 percent confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are

Figure 2. Quantile Estimates and Associated Pointwise 95 Percent Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.

Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

| Percentile (%) | Estimated propensity of consent: Consent subpopulation | Estimated propensity of consent: Object subpopulation |
|---|---|---|
| 70 | 0.84 | 0.88 |
| 80 | 0.86 | 0.89 |
| 90 | 0.88 | 0.92 |
| 99 | 0.93 | 0.96 |
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95 percent confidence interval. Again, here the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
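A minimal Python sketch of the quantile-difference calculation follows. The names are hypothetical, and the weighted quantile is taken as the inverse of the weighted empirical distribution function, which is one of several possible definitions and may not match the estimator behind the figures. The pointwise confidence intervals are not computed here; they would require repeating the calculation under each set of replicate weights.

```python
def weighted_quantile(y, w, q):
    """Smallest y whose cumulative weight share is at least q."""
    pairs = sorted(zip(y, w))
    total = sum(wi for _, wi in pairs)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum >= q * total:
            return yi
    return pairs[-1][0]

def quantile_differences(y_consent, w_consent, y_full, w_full):
    """Differences (consenters minus full sample) at q = 0.01, ..., 0.99.

    w_consent is expected to hold propensity-adjusted weights (w_i / p_i).
    """
    probs = [k / 100 for k in range(1, 100)]
    return [
        (q, weighted_quantile(y_consent, w_consent, q)
            - weighted_quantile(y_full, w_full, q))
        for q in probs
    ]
```

Differences that stay close to zero in the central quantiles and grow only in the upper quantiles would point to a tail phenomenon; roughly constant differences across q would point to a broader shift.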
5. DISCUSSION

5.1 Summary of Results

As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization

Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records For thesecases survey organizations must assess a complex range of factors including(a) the general willingness of a respondent to consent to linkage (b) the proba-bility of successful linkage with a given record source conditional on consentin (a) (c) the quality of the linked source and (d) the impact of (a)ndash(c) on theproperties of the resulting estimators that combine survey and linked-sourcedata Rigorous assessment of (b) (c) and (d) in a production environment canbe quite expensive and time-consuming which in turn appears to have limitedthe extent and pace of exploration of record linkage to supplement standardsample survey data collection
To address this problem the current paper has considered an approach basedon inclusion of a simple ldquoconsent-to-linkrdquo question in a standard survey instru-ment followed by analyses to address issue (a) The resulting models for thepropensity to object to linkage identified significant factors in standard demo-graphic characteristics proxy variables related to respondent attitudes and re-lated two-factor interactions In addition follow-up analyses of severaleconomic variables (directly reported income property tax property valueand rental value) identified substantial differences between respectively the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods, joint modeling of consent-to-link and item response probabilities, approximate optimization of design decisions related to potentially burdensome questions, and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests for Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting for How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), “Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight,” paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), “Propensity Score Adjustments for Covariates in Observational Studies,” paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A1 Predictor Variables Examined
Variable: Variable description

Respondent sociodemographics
Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender: 1 if male (reference group); 2 if female
Race: 0 if white (reference group); 1 if non-white
Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
Language: 1 if interview language was Spanish; 0 if English (reference group)
Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type: 0 if renter (reference group); 1 if owner
Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category
Assigned category: Respondent “doorstep” concerns

Privacy/government concerns: Privacy concerns; Anti-government concerns
Too busy/logistical concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested; Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long
Other reluctance: Survey is voluntary; Does not understand/Asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other (specify)
No concerns: No concerns

A3 Post-Survey Interviewer Questions
A3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)
A3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
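The weight-heterogeneity summaries described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the squared coefficient of variation uses the population (divide-by-n) variance, and percentiles use the "inclusive" interpolation rule of Python's `statistics.quantiles`; the paper does not specify either convention.

```python
from statistics import fmean, pvariance, quantiles


def propensity_adjusted(weights, propensities):
    """Return w_i / p_i for each consenting unit (p_i = estimated consent propensity)."""
    return [w / p for w, p in zip(weights, propensities)]


def weight_heterogeneity(weights):
    """Summaries of weight variability of the kind reported in table 6."""
    mean_w = fmean(weights)
    cv2 = pvariance(weights, mu=mean_w) / mean_w ** 2  # squared CV of the weights
    # 99 cut points; element k - 1 is the k-th percentile.
    pct = quantiles(weights, n=100, method="inclusive")

    def q(k):
        return pct[k - 1]

    return {
        "1+CV2": 1.0 + cv2,
        "IQR/q0.5": (q(75) - q(25)) / q(50),
        "q0.75/q0.25": q(75) / q(25),
        "q0.90/q0.10": q(90) / q(10),
        "q0.95/q0.05": q(95) / q(5),
    }
```

Applying `weight_heterogeneity` to the unadjusted weights and then to `propensity_adjusted(weights, propensities)` gives two sets of summaries that can be compared row by row, in the spirit of the comparison made in the text.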
Table 6. Variability of Weights

Statistic | Weights from full sample | Unadjusted weights for sample units with consent | Propensity-adjusted weights for sample units with consent | Propensity-decile adjusted weights for sample units with consent | Propensity-quintile adjusted weights for sample units with consent
n | 4,893 | 3,951 | 3,915 | 3,915 | 3,915
$1 + CV^2_{wt}$ | 1.13 | 1.13 | 1.16 | 1.16 | 1.15
$IQR / q_{0.5}$ | 0.875 | 0.877 | 0.858 | 0.853 | 0.851
$q_{0.75} / q_{0.25}$ | 1.357 | 1.359 | 1.448 | 1.463 | 1.468
$q_{0.90} / q_{0.10}$ | 1.970 | 1.966 | 2.148 | 2.178 | 2.124
$q_{0.95} / q_{0.05}$ | 2.699 | 2.665 | 3.028 | 2.961 | 2.898
Appendix C Variance Estimators and Goodness-of-Fit Tests
C1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication with 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\mathrm{var}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
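The replication variance estimator above can be sketched directly from its definition. The code below is a minimal, dependency-free illustration, not the production CEQ implementation; the function name `fay_brr_covariance` is hypothetical, and the replicate estimates are assumed to have been computed already, one coefficient vector per set of replicate weights.

```python
def fay_brr_covariance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR covariance matrix for a coefficient vector.

    full_estimate: list of p coefficients from the full-sample weights.
    replicate_estimates: list of R lists, each holding the p coefficients
        obtained from one set of replicate weights.
    fay_k: Fay factor K (0.5 in the paper, with R = 44 replicates).
    """
    n_rep = len(replicate_estimates)
    p = len(full_estimate)
    scale = 1.0 / (n_rep * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_estimates:
        # Deviation of the replicate estimate from the full-sample estimate.
        dev = [b_r - b for b_r, b in zip(rep, full_estimate)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += scale * dev[i] * dev[j]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a covariance matrix."""
    return [cov[i][i] ** 0.5 for i in range(len(cov))]
```

With $K = 0.5$ the scale factor is $4/R$, so each squared deviation is inflated fourfold relative to a plain average over replicates.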
C2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F\text{adj}}$ is distributed approximately as a central $F(g-1,\, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
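As a quick check of the reference distribution, the degrees of freedom can be worked out from the stated values of $f$ and $g$. The arithmetic below uses only the numbers given in the text and agrees with the $F(9, 35)$ distribution quoted in the heading of table 9.

```latex
\begin{align*}
  \text{numerator df}   &= g - 1 = 10 - 1 = 9, \\
  \text{denominator df} &= f - g + 2 = 43 - 10 + 2 = 35, \\
  Q_{F\text{adj}} &\;\overset{\text{approx.}}{\sim}\; F(9,\, 35)
  \quad \text{under the null hypothesis of no lack of fit.}
\end{align*}
```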
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model | F-adjusted goodness-of-fit test p-value
1 | 0.292
2 | 0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Each entry is Estimate (SE). Entries in each row are listed in column order: Model 1; Model 2; Model A; Model B; Model C; Model D; Model E.

Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate’s degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor’s degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
Continued
Table 8. Continued
Each entry is Estimate (SE). Entries in each row are listed in column order: Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. Rows with fewer than seven entries list the reported entries from left to right.

  Advance degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate’s degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor’s degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advance degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)
Table 8. Continued
Each entry is Estimate (SE). Entries in each row are listed in column order: Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. Rows with fewer than seven entries list the reported entries from left to right.

Environmental features
Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies
Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov’t concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Continued
Table 8. Continued
Each entry is Estimate (SE). Rows list the reported entries from left to right; the full column order is Model 1; Model 2; Model A; Model B; Model C; Model D; Model E.

Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal Visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: †p < 0.10; p < 0.05; p < 0.01; p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic: Model of Consent/Object Comparison

Model (of consent), 2nd revision | F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1 | 0.292 (1.262)
2 | 0.810 (0.573)
A | 0.336 (1.183)
B | 0.929 (0.396)
C | 0.061 (2.061)
D | 0.022 (2.576)
E | 0.968 (0.305)
Table 4. Comparison of Three Estimation Methods
Each cell gives the estimate with its standard error in parentheses. The last four columns are point estimate differences, with the standard error of the difference in parentheses.

Variable | Full-sample (FS) mean (SE) | Consenting units, no adjustment (CUNA) mean (SE) | Consenting units, propensity-adjusted (CUPA) mean (SE) | Consenting units, propensity-decile adjusted (CUPDA) mean (SE) | Consenting units, propensity-quintile adjusted (CUPQA) mean (SE) | CUNA–FS (SE diff) | CUPA–FS (SE diff) | CUPDA–FS (SE diff) | CUPQA–FS (SE diff)
Column numbers: 1 2 3 4 5 6 7 8 9 10
Family income before taxes | $50,939.00 ($1,227.51) | $52,869.00 ($1,535.04) | $52,117.00 ($1,523.85) | $52,269.00 ($1,520.74) | $52,405.00 ($1,514.30) | $1,930.00 ($580.98) | $1,178.00† ($619.09) | $1,330.00 ($602.64) | $1,466.00 ($606.76)
Family income before taxes with imputed values | $61,207.00 ($1,193.34) | $61,405.00 ($1,510.55) | $61,228.00 ($1,454.39) | $61,337.00 ($1,463.34) | $61,382.00 ($1,466.25) | $198.00 ($573.46) | $21.00 ($563.54) | $130.00 ($557.10) | $175.00 ($552.20)
Vehicle purchase cost | $599.59 ($33.22) | $619.14 ($37.05) | $607.80 ($36.63) | $607.63 ($36.67) | $611.75 ($37.61) | $19.55† ($10.83) | $8.21 ($15.34) | $8.04 ($15.23) | $12.17 ($16.13)
Property taxes | $454.15 ($10.41) | $429.12 ($10.76) | $434.74 ($11.39) | $434.88 ($11.37) | $436.75 ($11.30) | -$25.02 ($6.08) | -$19.41 ($6.48) | -$19.27 ($6.05) | -$17.40 ($6.23)
Property value | $232,226.00 ($5,055.68) | $227,734.00 ($5,730.38) | $227,938.00 ($5,592.27) | $227,935.00 ($5,532.64) | $228,640.00 ($5,519.90) | -$4,492.00 ($2,118.39) | -$4,288.00† ($2,165.93) | -$4,291.00 ($2,132.40) | -$3,586.00 ($2,063.66)
Rental value | $1,296.52 ($15.58) | $1,269.65 ($17.05) | $1,272.40 ($17.81) | $1,272.91 ($17.86) | $1,274.67 ($17.68) | -$26.87 ($7.43) | -$24.12 ($8.18) | -$23.61 ($8.01) | -$21.85 ($8.02)

NOTE: †p < 0.10; p < 0.05; p < 0.01 (bold); p < 0.001 (bold).
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
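
The comparison of a propensity-adjusted mean for consenting units with a full-sample mean can be sketched as follows. This is a minimal illustration only, assuming a ratio-type weighted mean and adjusted weights of the form w_i / p_i; the function names and the four-unit toy data are hypothetical and are not taken from the CEQ.

```python
def weighted_mean(values, weights):
    """Survey-weighted ratio mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


def propensity_adjusted_mean(values, weights, consented, propensities):
    """Mean over consenting units only, using adjusted weights w_i / p_i."""
    kept_values, kept_weights = [], []
    for y, w, c, p in zip(values, weights, consented, propensities):
        if c:  # non-consenting units are dropped entirely
            kept_values.append(y)
            kept_weights.append(w / p)
    return weighted_mean(kept_values, kept_weights)


# Hypothetical toy data (assumed for illustration, not survey values).
income = [20000.0, 40000.0, 60000.0, 80000.0]
design_w = [1.0, 1.0, 1.0, 1.0]
consent = [True, True, False, True]
p_consent = [0.5, 1.0, 0.5, 1.0]  # assumed estimated consent propensities

full_sample = weighted_mean(income, design_w)
adjusted = propensity_adjusted_mean(income, design_w, consent, p_consent)
difference = adjusted - full_sample  # analogue of a CUPA minus FS entry
```

Standard errors for such differences would come from the replication methods described in Appendix C, which this sketch does not attempt.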
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations
                  Estimated propensity of consent
Percentile (%)    Consent subpopulation    Object subpopulation
70                0.84                     0.88
80                0.86                     0.89
90                0.88                     0.92
99                0.93                     0.96
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are similar, and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also, note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions, and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
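
The quantile differences described above can be illustrated with a short sketch. It assumes one common definition of a survey-weighted quantile (the smallest observed value at which the weighted empirical distribution function reaches q); the paper does not state which quantile definition was used, and the function names and toy data here are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest y at which the weighted empirical distribution function is >= q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for y, w in pairs:
        cumulative += w
        if cumulative / total >= q - 1e-12:  # small tolerance for rounding
            return y
    return pairs[-1][0]


def quantile_differences(full, consent, qs):
    """Each argument is a (values, weights) pair; returns consent minus full."""
    return [
        weighted_quantile(consent[0], consent[1], q)
        - weighted_quantile(full[0], full[1], q)
        for q in qs
    ]


# Hypothetical toy data: full sample, and consenting units with w_i / p_i weights.
full = ([10.0, 20.0, 30.0, 40.0], [1.0, 1.0, 1.0, 1.0])
consent = ([10.0, 20.0, 40.0], [2.0, 1.0, 1.0])

diffs = quantile_differences(full, consent, [0.5, 0.75])

# The grid used in the figures would correspond to q = 0.01, 0.02, ..., 0.99.
grid = [k / 100 for k in range(1, 100)]
```

The pointwise confidence intervals shown in the figures require replicate-based standard errors and are outside the scope of this sketch.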
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods, joint modeling of consent-to-link and item response probabilities, approximate optimization of design decisions related to potentially burdensome questions, and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of the propensity to consent, the propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests for Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting for How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), “Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight,” paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), “Propensity Score Adjustments for Covariates in Observational Studies,” paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions
A.1 Predictor Variables Examined
Variable: Variable description
Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): Log of total expenditures reported for all major expenditure categories
Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)
Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)
Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
  Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category
Assigned category: Respondent “doorstep” concerns
  Privacy/government concerns: Privacy concerns; Anti-government concerns
  Too busy/logistical concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested; Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long
  Other reluctance: Survey is voluntary; Does not understand/asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other (specify)
  No concerns: No concerns
A.3 Post-Survey Interviewer Questions
A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)
A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit i will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights
Columns: (a) weights from full sample; (b) unadjusted weights for sample units with consent; (c) propensity-adjusted weights for sample units with consent; (d) propensity-decile adjusted weights for sample units with consent; (e) propensity-quintile adjusted weights for sample units with consent.
                     (a)      (b)      (c)      (d)      (e)
n                    4893     3951     3915     3915     3915
1 + CV^2_wt          1.13     1.13     1.16     1.16     1.15
IQR/q_0.5            0.875    0.877    0.858    0.853    0.851
q_0.75/q_0.25        1.357    1.359    1.448    1.463    1.468
q_0.90/q_0.10        1.970    1.966    2.148    2.178    2.124
q_0.95/q_0.05        2.699    2.665    3.028    2.961    2.898
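
The weight-heterogeneity summaries in this appendix can be sketched as follows. The sketch assumes the population form of the coefficient of variation (dividing by n), so that 1 + CV^2 equals n·Σw²/(Σw)², and uses unweighted quartiles of the weights from the Python standard library; the paper does not specify either convention, and the toy weights and propensities are hypothetical.

```python
import statistics


def one_plus_cv2(weights):
    """Return 1 + CV^2 of the weights, computed as n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def quartile_ratio(weights):
    """Ratio of the third quartile to the first quartile of the weights."""
    q1, _, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    return q3 / q1


# Hypothetical design weights and consent propensities (illustration only).
design_w = [1.0, 1.0, 2.0, 2.0]
p_consent = [0.5, 1.0, 0.5, 1.0]
adjusted_w = [w / p for w, p in zip(design_w, p_consent)]  # w_i / p_i

unadjusted_summary = one_plus_cv2(design_w)
adjusted_summary = one_plus_cv2(adjusted_w)
```

Comparing `adjusted_summary` with `unadjusted_summary` mirrors the comparison between columns of table 6: a value close to the unadjusted one suggests little additional variance inflation from the propensity adjustment.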
Appendix C. Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication using 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the r-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator
$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$
where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
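
The variance estimator above can be written out directly. The following is a minimal sketch using plain Python lists; the function name and the four toy replicate estimates are hypothetical, whereas the analyses in this paper use R = 44 replicates.

```python
def fay_brr_covariance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix:
    (1 / (R * (1 - K)^2)) * sum_r (b_r - b)(b_r - b)'.
    """
    num_reps = len(replicate_estimates)
    dim = len(full_estimate)
    scale = 1.0 / (num_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * dim for _ in range(dim)]
    for rep in replicate_estimates:
        # Deviation of the replicate estimate from the full-sample estimate.
        dev = [b_r - b for b_r, b in zip(rep, full_estimate)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += scale * dev[i] * dev[j]
    return cov


# Hypothetical two-coefficient example with four replicates (illustration only).
beta_hat = [1.0, 2.0]
beta_reps = [[1.5, 2.0], [0.5, 2.0], [1.0, 2.5], [1.0, 1.5]]

cov = fay_brr_covariance(beta_hat, beta_reps, fay_k=0.5)
standard_errors = [cov[i][i] ** 0.5 for i in range(len(beta_hat))]
```

With K = 0.5, the factor 1 / (1 - K)^2 equals 4, which compensates for the smaller perturbation of the replicate weights under Fay's method relative to standard balanced repeated replication.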
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
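
As a worked substitution of the stated values $f = 43$ and $g = 10$ into the degrees of freedom (an arithmetic check only, not an additional result), the reference distribution is:

```latex
\[
g - 1 = 10 - 1 = 9, \qquad
f - g + 2 = 43 - 10 + 2 = 35, \qquad
Q_{F\mathrm{-adj}} \sim F(9,\, 35) \quad \mbox{(approximately, under the null hypothesis)}.
\]
```

These degrees of freedom agree with the F(9, 35) reference distribution named in the header of table 9.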
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test
Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)
Entries are Estimate (SE), listed in the column order Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. Reference categories are shown in parentheses after each variable name. Rows with fewer than seven entries correspond to terms that do not appear in every model.
Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate’s degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor’s degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advanced degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate’s degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor’s degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advanced degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)
152 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Env
iron
men
tal
feat
ures
Reg
ion
(Nor
thea
st)
Mid
wes
t0
2097
dagger0
1234
021
11dagger
011
630
2602
0
1100
024
57
011
960
2352
dagger0
1208
020
98dagger
012
320
2111
dagger0
1159
Sou
th0
2451
0
1190
024
08
011
870
1841
011
410
2017
dagger0
1149
021
29dagger
011
520
2446
0
1189
023
99
011
87W
est
0
2670
0
1055
025
57
010
66
022
48
009
43
024
02
010
01
024
41
010
19
026
78
010
61
025
74
010
71U
rban
icity
(Rur
al)
Urb
an
002
680
0829
002
340
0824
002
530
0818
002
260
0833
002
490
0821
002
680
0829
002
330
0823
Rat
titud
epr
oxie
sC
onve
rted
refu
sal
(No) Y
es
007
210
0756
008
380
0763
0
0714
007
53
008
180
0760
Eff
ort(
Mod
erat
e)A
loto
fef
fort
054
54
020
810
6142
0
2065
054
18
021
010
6051
0
2082
Bar
e min
imum
effo
rt
0
4879
013
65
057
21
0
1371
0
4842
0
1387
056
26
0
1389
Doo
rste
pco
ncer
ns(N
one)
Too
busy
0
2207
014
85
021
370
1466
0
2192
014
91
021
060
1477
Pri
vacy
gov
rsquotco
ncer
ns
118
95
0
1667
122
41
0
1622
1
1891
016
66
122
31
0
1621
Oth
er1
0547
0
3261
102
55
032
551
0549
0
3259
102
67
032
52
Con
tinue
d
Exploratory Assessment of Consent-to-Link 153
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Tab
le8
Con
tinue
d
Var
iabl
eM
odel
1M
odel
2M
odel
AM
odel
BM
odel
CM
odel
DM
odel
E
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
E
Doo
rste
pco
ncer
ns
effo
rtP
riva
cy
alo
tofe
ffor
t
058
84
027
89
053
12dagger
027
28
058
85
027
89
053
19dagger
027
25
Pri
vacy
min
imum
effo
rt
031
830
1918
026
010
1924
031
720
1915
025
810
1925
Bus
y
alo
tof
effo
rt
008
880
2736
008
900
2749
0
0841
027
58
007
850
2765
Bus
y
min
imum
effo
rt
022
380
2096
022
770
2095
022
190
2114
022
340
2112
Oth
er
alo
tof
effo
rt1
0794
dagger0
6224
104
770
6182
107
46dagger
062
441
0370
061
94
Oth
er
min
imum
effo
rt
0
9002
0
3610
087
77
035
93
089
64
036
17
086
93
035
97
Rap
port
(Pho
ne)
Per
sona
lV
isit
001
700
0428
003
950
0412
NO
TEmdash
daggerplt
010
plt
005
plt
001
plt
000
1
154 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics, Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
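As a rough check on how the table 9 p-values relate to the reported statistics, the following Python sketch evaluates the upper tail of a central F distribution with degrees of freedom (g − 1, f − g + 2). It is an illustration only, not the svylogitgof implementation: the function name is hypothetical, and the tail probability is obtained by simple numerical integration of the equivalent incomplete beta integral.

```python
import math


def f_upper_tail(x, d1, d2, steps=2000):
    """Approximate P(F > x) for a central F(d1, d2) variable with x > 0.

    Uses P(F > x) = I_z(d2/2, d1/2) with z = d2 / (d2 + d1 * x), where I_z is
    the regularized incomplete beta function, evaluated by Simpson's rule.
    Assumes d2 >= 2 so that the integrand is finite at zero; steps must be even.
    """
    a, b = d2 / 2.0, d1 / 2.0
    z = d2 / (d2 + d1 * x)
    h = z / steps

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    integral = total * h / 3.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


# Degrees of freedom from the text: f = 43 and g = 10 give F(9, 35).
f_df, g_groups = 43, 10
d1, d2 = g_groups - 1, f_df - g_groups + 2

# Test statistics as reported in table 9 for models 1 and 2.
for model, stat in [("1", 1.262), ("2", 0.573)]:
    print(model, round(f_upper_tail(stat, d1, d2), 3))
```

Large p-values from this calculation correspond to no evidence of lack of fit under the stated approximation.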
values for the “objection” and “consent” subpopulations are relatively close, indicating that the lower values of the propensity scores provide relatively little discriminating power. Larger propensity score values (e.g., above the 0.70 quantile) show somewhat stronger discrimination. Table 5 elaborates on this result by presenting the estimated seventieth, eightieth, ninetieth, and ninety-ninth percentiles of the propensity-score values from the “consent” and “object” subpopulations.
Second, the numerical results in table 4 identified substantial differences in the means of the consenting units (after propensity adjustment) and the full sample for the variables defined by unadjusted income, property tax, property value, and rental value. Consequently, it is of interest to explore the extent to which the reported mean differences are associated primarily with strong differences between distributional tails, or with broader-based differences between the respective distributions. To explore this, figures 2–4 present plots of specified functions of the estimated distributions of the underlying populations of the unadjusted income variable (FINCBTAX).
Figure 1. Plot of the Estimated Quantiles of P(Consent) for the “Consent” Subpopulation (Vertical Axis) and “Object” Subpopulation (Horizontal Axis). Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment). Diagonal reference line has slope = 1 and intercept = 0.
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

                  Estimated propensity of consent
Percentile (%)    Consent subpopulation    Object subpopulation
70                0.84                     0.88
80                0.86                     0.89
90                0.88                     0.92
99                0.93                     0.96
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
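The quantile comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: quantiles are taken as the inverse of the weighted empirical distribution function without interpolation (the estimator used for the figures may differ), the function names are hypothetical, and the replicate-based confidence intervals are not shown.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def quantile_differences(full_values, full_weights,
                         consent_values, consent_adj_weights, qs):
    """Consent-subpopulation quantile minus full-sample quantile, for each q.

    consent_adj_weights are assumed to be propensity-adjusted weights w_i / p_i.
    """
    return [
        weighted_quantile(consent_values, consent_adj_weights, q)
        - weighted_quantile(full_values, full_weights, q)
        for q in qs
    ]


# Grid of quantiles used in the figures: 0.01 to 0.99 with 0.01 increment.
qs = [k / 100 for k in range(1, 100)]

# Hypothetical toy incomes and weights, for illustration only.
full_income = [12000, 25000, 40000, 58000, 150000]
full_w = [1.0, 1.2, 0.9, 1.1, 0.8]
consent_income = [12000, 40000, 58000, 150000]
consent_w = [1.3, 1.1, 1.4, 1.0]
diffs = quantile_differences(full_income, full_w, consent_income, consent_w, qs)
print(len(diffs))
```

Differences that stay near zero across the whole grid would point to broad agreement between the two distributions, whereas differences concentrated at large q would point to a right-tail effect.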
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income, Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following: trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf (last accessed December 22, 2016).
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf (last accessed December 19, 2016).
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions
A.1 Predictor Variables Examined
Variable                      Description

Respondent sociodemographics
Age                           1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender                        1 if male (reference group); 2 if female
Race                          0 if white (reference group); 1 if non-white
Education                     0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s
Language                      1 if interview language was Spanish; 0 if English (reference group)
Household size                1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type                0 if renter (reference group); 1 if owner
Family income                 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
Total spending (log)          Log of total expenditures reported for all major expenditure categories

Survey features
Mode                          1 if telephone; 0 if personal visit (reference group)

Area characteristics
Region                        1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status                  1 if urban; 2 if rural (reference group)

Respondent attitude and effort
Converted refusal status      1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration            1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation                   1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort                        1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change                 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet    1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group)
Doorstep concerns             0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Privacy/government concerns
  Privacy concerns
  Anti-government concerns

Assigned category: Too busy/logistical concerns
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested/Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Assigned category: Other reluctance
  Survey is voluntary
  Does not understand/Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other - specify

Assigned category: No concerns
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
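The category assignment in A.2 amounts to a lookup from each interviewer-recorded concern to one of four groups, which section A.1 codes as 0 to 3. The Python sketch below illustrates that recoding. The dictionary keys are a transcription of the concern labels listed above, and the names `DOORSTEP_CATEGORY`, `CATEGORY_CODE`, and `doorstep_code` are hypothetical; the sketch does not address how units reporting several concerns were handled.

```python
# Concern label -> assigned category (transcribed from section A.2).
DOORSTEP_CATEGORY = {
    "Privacy concerns": "Privacy/government concerns",
    "Anti-government concerns": "Privacy/government concerns",
    "Too busy": "Too busy/logistical concerns",
    "Interview takes too much time": "Too busy/logistical concerns",
    "Breaks appointments (puts off FR indefinitely)": "Too busy/logistical concerns",
    "Not interested/Does not want to be bothered": "Too busy/logistical concerns",
    "Too many interviews": "Too busy/logistical concerns",
    "Scheduling difficulties": "Too busy/logistical concerns",
    "Family issues": "Too busy/logistical concerns",
    "Last interview took too long": "Too busy/logistical concerns",
    "Survey is voluntary": "Other reluctance",
    "Does not understand/Asks questions about survey": "Other reluctance",
    "Survey content does not apply": "Other reluctance",
    "Hang-up/slams door on FR": "Other reluctance",
    "Hostile or threatens FR": "Other reluctance",
    "Other HH members tell R not to participate": "Other reluctance",
    "Talk only to specific household member": "Other reluctance",
    "Respondent requests same FR as last time": "Other reluctance",
    "Gave that information last time": "Other reluctance",
    "Asked too many personal questions last time": "Other reluctance",
    "Intends to quit survey": "Other reluctance",
    "Other - specify": "Other reluctance",
    "No concerns": "No concerns",
}

# Assigned category -> numeric code, following the coding in section A.1.
CATEGORY_CODE = {
    "No concerns": 0,  # reference group
    "Privacy/government concerns": 1,
    "Too busy/logistical concerns": 2,
    "Other reluctance": 3,
}


def doorstep_code(concern):
    """Numeric doorstep-concern code for a single recorded concern label."""
    return CATEGORY_CODE[DOORSTEP_CATEGORY[concern]]


print(doorstep_code("Scheduling difficulties"))
```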
Appendix B. Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit i will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with vari-ance inflation (eg Kish 1965 Section 117) the second row of table 6 reportsthe values 1thorn CV2
wt where CV2wt is the squared coefficient of variation of the
weights under consideration For all three cases 1thorn CV2wt is only moderately
larger than one The remaining rows of table 6 provide some related non-parametric summary indications of the heterogeneity of weights based onrespectively the ratio of the interquartile range of the weights divided bythe median weight the ratio of the third and first quartiles of the weightsthe ratio of the 90th and 10th percentiles of the weights and the ratio ofthe 95th and 5th percentiles of the weights In each case the ratios for thepropensity-adjusted weights are only moderately larger than the corre-sponding ratios of the unadjusted weights Thus the propensity adjust-ments do not lead to substantial inflation in the heterogeneity of weightsfor the CEQ analyses considered in this paper
Table 6. Variability of Weights

| Statistic | Weights from full sample | Unadjusted weights for sample units with consent | Propensity-adjusted weights for sample units with consent | Propensity-decile adjusted weights for sample units with consent | Propensity-quintile adjusted weights for sample units with consent |
|---|---|---|---|---|---|
| n | 4893 | 3951 | 3915 | 3915 | 3915 |
| $1 + CV^2_{wt}$ | 1.13 | 1.13 | 1.16 | 1.16 | 1.15 |
| $IQR / q_{0.5}$ | 0.875 | 0.877 | 0.858 | 0.853 | 0.851 |
| $q_{0.75} / q_{0.25}$ | 1.357 | 1.359 | 1.448 | 1.463 | 1.468 |
| $q_{0.90} / q_{0.10}$ | 1.970 | 1.966 | 2.148 | 2.178 | 2.124 |
| $q_{0.95} / q_{0.05}$ | 2.699 | 2.665 | 3.028 | 2.961 | 2.898 |
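The summaries reported in table 6 can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the paper: the function names are hypothetical, the squared coefficient of variation uses the population (divide-by-n) variance, and the percentile interpolation rule is the standard library's "inclusive" method, which may differ from the rule used to produce the published values.

```python
import statistics

def propensity_adjust(weights, propensities):
    """Propensity-adjusted weights w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]

def weight_summary(weights):
    """Heterogeneity summaries of a set of weights, in the spirit of table 6."""
    n = len(weights)
    mean_w = sum(weights) / n
    # Squared coefficient of variation (population variance; an assumption).
    var_w = sum((w - mean_w) ** 2 for w in weights) / n
    cv2 = var_w / mean_w ** 2
    # 99 percentile cut points; index k - 1 holds the k-th percentile.
    pct = statistics.quantiles(weights, n=100, method="inclusive")

    def q(k):
        return pct[k - 1]

    return {
        "n": n,
        "1+CV2": 1.0 + cv2,
        "IQR/q0.5": (q(75) - q(25)) / q(50),
        "q0.75/q0.25": q(75) / q(25),
        "q0.90/q0.10": q(90) / q(10),
        "q0.95/q0.05": q(95) / q(5),
    }
```

Calling `weight_summary(propensity_adjust(w, p))` on the consenting units and comparing the result with `weight_summary(w)` gives the kind of side-by-side comparison that table 6 presents.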
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

\[
\operatorname{var}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'
\]

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
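The variance estimator above can be written out directly. The sketch below is illustrative only: the function names are hypothetical, the inputs are plain Python lists, and it assumes the full-sample and replicate estimates have already been computed elsewhere with the appropriate weights.

```python
def fay_brr_covariance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR covariance: sum_r (b_r - b)(b_r - b)' / (R * (1 - K)^2)."""
    n_rep = len(replicate_ests)
    p = len(full_est)
    scale = 1.0 / (n_rep * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for b_r in replicate_ests:
        # Deviation of the replicate estimate from the full-sample estimate.
        d = [b_r[j] - full_est[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += scale * d[j] * d[k]
    return cov

def standard_errors(cov):
    """Square roots of the diagonal of a covariance matrix."""
    return [cov[j][j] ** 0.5 for j in range(len(cov))]
```

With 44 replicate estimates and `fay_k=0.5`, the scale factor is 1 / (44 × 0.25) = 1/11, matching the values of R and K stated above.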
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2 based on the F-adjusted mean residual test statistic $Q_{Fadj}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.
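As a worked check of the degrees of freedom, substituting the stated values of f and g into the reference distribution gives the following. The result agrees with the F(9, 35) label attached to the test statistics in table 9.

```latex
\[
g - 1 = 10 - 1 = 9, \qquad f - g + 2 = 43 - 10 + 2 = 35,
\qquad \text{so} \qquad Q_{Fadj} \ \text{is referred to a central } F(9,\, 35) \text{ distribution.}
\]
```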
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

| Model | F-adjusted goodness-of-fit test p-value |
|---|---|
| 1 | 0.292 |
| 2 | 0.810 |
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Cell entries are Estimate (SE). Reference categories are shown in parentheses after the variable name.

| Variable | Model 1 | Model 2 | Model A | Model B | Model C | Model D | Model E |
|---|---|---|---|---|---|---|---|
| Demographic characteristics | | | | | | | |
| Age group (32–65) | | | | | | | |
| 18–32 | 0.3435 (0.0633) | 0.3389 (0.0634) | 0.3044 (0.0717) | 0.3319 (0.0742) | 0.3282 (0.0728) | 0.3425 (0.0640) | 0.3368 (0.0639) |
| 65+ | 0.2692 (0.0645) | 0.2532 (0.0656) | 0.1971 (0.0613) | 0.2387 (0.0608) | 0.2374 (0.0630) | 0.2694 (0.0644) | 0.2538 (0.0653) |
| Gender (Male): Female | 0.0161 (0.0426) | 0.0229 (0.0421) | 0.0396 (0.0359) | 0.0388 (0.0363) | 0.0020 (0.0427) | 0.0152 (0.0425) | 0.0207 (0.0419) |
| Race (White): Non-white | 0.0949 (0.1264) | 0.0612 (0.1260) | 0.0781 (0.1059) | 0.0192 (0.1056) | 0.0322 (0.1155) | 0.0938 (0.1269) | 0.0593 (0.1264) |
| Spanish interview (No): Yes | 0.4234 (0.2923) | 0.4452 (0.2790) | 0.4044 (0.2930) | 0.4271 (0.3120) | 0.4801 (0.3118) | 0.4287 (0.2973) | 0.4576 (0.2846) |
| Education group (HS grad) | | | | | | | |
| Less than HS | 0.3097 (0.1879) | 0.2926 (0.1835) | 0.0142 (0.1349) | 0.0186 (0.1383) | 0.3317† (0.1838) | 0.3081 (0.1885) | 0.2894 (0.1844) |
| Some college | 0.4016 (0.1423) | 0.3739 (0.1389) | 0.0986 (0.1087) | 0.0815 (0.1129) | 0.4066 (0.1389) | 0.3996 (0.1442) | 0.3695 (0.1409) |
| Associate’s degree | 0.3321 (0.2041) | 0.3150 (0.1993) | 0.1109 (0.1100) | 0.1249 (0.1145) | 0.2858 (0.2160) | 0.3326 (0.2036) | 0.3160 (0.1990) |
| Bachelor’s degree | 0.0654 (0.2112) | 0.0663 (0.2082) | 0.0383 (0.0719) | 0.0425 (0.0742) | 0.0230 (0.2037) | 0.0618 (0.2127) | 0.0581 (0.2095) |
| Advance degree | 0.1747 (0.2762) | 0.1512 (0.2773) | 0.1899 (0.1215) | 0.1941 (0.1235) | 0.2738 (0.2482) | 0.1720 (0.2773) | 0.1452 (0.2796) |
| Homeowner (Renter): Owner | 0.2767† (0.1453) | 0.3233 (0.1436) | 0.3612 (0.1277) | 0.3589 (0.1314) | 0.2703† (0.1397) | 0.2754† (0.1448) | 0.3198 (0.1433) |
| Total expenditures (Log) | 0.0605 (0.0799) | 0.0083 (0.0800) | 0.1025 (0.0698) | 0.0372 (0.0754) | 0.0419 (0.0746) | 0.0594 (0.0802) | 0.0065 (0.0801) |
| Income group: Less than $8,181 | 0.2155 (0.0604) | | | 0.4182 (0.0561) | 0.4247 (0.0577) | 0.2142 (0.0609) | |
| Income imputed (No): Yes | | 0.1487 (0.0513) | | | | | 0.1481 (0.0510) |
| Race × gender | 0.1874† (0.0956) | 0.1953 (0.0923) | | | 0.2268 (0.0930) | 0.1873† (0.0957) | 0.1946 (0.0923) |
| Owner × education | | | | | | | |
| Less than HS | 0.4931 (0.2066) | 0.4632 (0.1996) | | | 0.4750 (0.2045) | 0.4929 (0.2067) | 0.4637 (0.2002) |
| Some college | 0.6575 (0.1567) | 0.6370 (0.1613) | | | 0.6764 (0.1438) | 0.6548 (0.1578) | 0.6309 (0.1633) |
| Associate’s degree | 0.3407† (0.2013) | 0.3402† (0.1986) | | | 0.2616 (0.2090) | 0.3401† (0.2007) | 0.3387† (0.1978) |
| Bachelor’s degree | 0.1385 (0.2402) | 0.1398 (0.2368) | | | 0.1057 (0.2296) | 0.1358 (0.2409) | 0.1335 (0.2376) |
| Advance degree | 0.4713 (0.2847) | 0.4226 (0.2870) | | | 0.5867 (0.2583) | 0.4691 (0.2854) | 0.4183 (0.2890) |
| Environmental features | | | | | | | |
| Region (Northeast) | | | | | | | |
| Midwest | 0.2097† (0.1234) | 0.2111† (0.1163) | 0.2602 (0.1100) | 0.2457 (0.1196) | 0.2352† (0.1208) | 0.2098† (0.1232) | 0.2111† (0.1159) |
| South | 0.2451 (0.1190) | 0.2408 (0.1187) | 0.1841 (0.1141) | 0.2017† (0.1149) | 0.2129† (0.1152) | 0.2446 (0.1189) | 0.2399 (0.1187) |
| West | 0.2670 (0.1055) | 0.2557 (0.1066) | 0.2248 (0.0943) | 0.2402 (0.1001) | 0.2441 (0.1019) | 0.2678 (0.1061) | 0.2574 (0.1071) |
| Urbanicity (Rural): Urban | 0.0268 (0.0829) | 0.0234 (0.0824) | 0.0253 (0.0818) | 0.0226 (0.0833) | 0.0249 (0.0821) | 0.0268 (0.0829) | 0.0233 (0.0823) |
| R attitude proxies | | | | | | | |
| Converted refusal (No): Yes | 0.0721 (0.0756) | 0.0838 (0.0763) | | | | 0.0714 (0.0753) | 0.0818 (0.0760) |
| Effort (Moderate) | | | | | | | |
| A lot of effort | 0.5454 (0.2081) | 0.6142 (0.2065) | | | | 0.5418 (0.2101) | 0.6051 (0.2082) |
| Bare minimum effort | 0.4879 (0.1365) | 0.5721 (0.1371) | | | | 0.4842 (0.1387) | 0.5626 (0.1389) |
| Doorstep concerns (None) | | | | | | | |
| Too busy | 0.2207 (0.1485) | 0.2137 (0.1466) | | | | 0.2192 (0.1491) | 0.2106 (0.1477) |
| Privacy/gov’t concerns | 1.1895 (0.1667) | 1.2241 (0.1622) | | | | 1.1891 (0.1666) | 1.2231 (0.1621) |
| Other | 1.0547 (0.3261) | 1.0255 (0.3255) | | | | 1.0549 (0.3259) | 1.0267 (0.3252) |
| Doorstep concerns × effort | | | | | | | |
| Privacy × a lot of effort | 0.5884 (0.2789) | 0.5312† (0.2728) | | | | 0.5885 (0.2789) | 0.5319† (0.2725) |
| Privacy × minimum effort | 0.3183 (0.1918) | 0.2601 (0.1924) | | | | 0.3172 (0.1915) | 0.2581 (0.1925) |
| Busy × a lot of effort | 0.0888 (0.2736) | 0.0890 (0.2749) | | | | 0.0841 (0.2758) | 0.0785 (0.2765) |
| Busy × minimum effort | 0.2238 (0.2096) | 0.2277 (0.2095) | | | | 0.2219 (0.2114) | 0.2234 (0.2112) |
| Other × a lot of effort | 1.0794† (0.6224) | 1.0477 (0.6182) | | | | 1.0746† (0.6244) | 1.0370 (0.6194) |
| Other × minimum effort | 0.9002 (0.3610) | 0.8777 (0.3593) | | | | 0.8964 (0.3617) | 0.8693 (0.3597) |
| Rapport (Phone): Personal Visit | | | | | | 0.0170 (0.0428) | 0.0395 (0.0412) |

NOTE: †p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001.
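To show how an estimate and its standard error in table 8 relate to the significance markers in the table note, the following sketch computes a Wald z statistic and a two-sided p-value. It rests on assumptions not stated in the paper: it uses a standard normal reference distribution rather than a design-based t distribution, the function names are hypothetical, and the asterisk markers are the conventional ones for the thresholds listed in the note.

```python
import math

def wald_summary(estimate, se):
    """Wald z statistic and two-sided p-value under a standard normal reference."""
    z = estimate / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal cdf Phi.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

def significance_marker(p):
    """Map a p-value to the marker for the smallest threshold it falls below."""
    for cutoff, mark in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.10, "†")):
        if p < cutoff:
            return mark
    return ""
```

For example, the Model 1 homeowner entry (estimate 0.2767, SE 0.1453) gives a p-value between 0.05 and 0.10 under this approximation, which is consistent with the dagger attached to that entry.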
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

| Model (of consent), 2nd revision | F-adjusted goodness-of-fit test p-value | Test statistic, F(9, 35) |
|---|---|---|
| 1 | 0.292 | 1.262 |
| 2 | 0.810 | 0.573 |
| A | 0.336 | 1.183 |
| B | 0.929 | 0.396 |
| C | 0.061 | 2.061 |
| D | 0.022 | 2.576 |
| E | 0.968 | 0.305 |
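The relationship between a test statistic and its p-value in table 9 can be cross-checked without a statistics package by integrating the central F density numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the integration uses composite Simpson's rule, and the numerator degrees of freedom are restricted to values of at least 2 so that the density is finite at zero.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the central F(d1, d2) distribution, for d1 >= 2."""
    if x < 0.0:
        return 0.0
    if x == 0.0:
        return 1.0 if d1 == 2 else 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)

def f_upper_tail(f_stat, d1, d2, steps=4000):
    """P(F > f_stat), from Simpson integration of the density on [0, f_stat]."""
    if d1 < 2:
        raise ValueError("this sketch assumes d1 >= 2")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    if f_stat <= 0.0:
        return 1.0
    h = f_stat / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_stat, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3.0
```

A call such as `f_upper_tail(1.262, 9, 35)` can then be compared with the p-value reported for model 1, and similarly for the other rows.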
Figure 2 presents a plot of the 0.01 to 0.99 quantiles (with 0.01 increment) of income, accompanied by pointwise 95% confidence intervals, for the full sample, based on a standard survey-weighted estimator of the corresponding distribution function. The curvature pattern is consistent with a heavily right-skewed distribution, as one generally would expect for an income variable. Figure 3 presents side-by-side boxplots of the values of the unadjusted income variable (FINCBTAX), separately for each of the ten subpopulations defined by the deciles of the P(Consent) propensity values. With the exception of one extreme positive outlier in the seventh decile group, the ten boxplots are
Figure 2. Quantile Estimates and Associated Pointwise 95% Confidence Bounds for Before-Tax Family Income, Based on the Sample from the Full Population.
Table 5. Selected Percentiles of the Estimated Propensity to Consent for the “Consent” and “Object” Subpopulations

| Percentile (%) | Estimated propensity of consent: Consent subpopulation | Estimated propensity of consent: Object subpopulation |
|---|---|---|
| 70 | 0.84 | 0.88 |
| 80 | 0.86 | 0.89 |
| 90 | 0.88 | 0.92 |
| 99 | 0.93 | 0.96 |
similar, and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (with 0.01 increment in quantiles), figure 4 displays the differences between the q-th quantiles of estimated income for the “consent” subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95% confidence interval. Here again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions, and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
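The quantile comparison described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: the function names are hypothetical, the quantile is taken as the smallest observed value at which the weighted empirical distribution function reaches q (no interpolation), and the confidence intervals, which in the paper rely on replicate weights, are not computed here.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose weighted empirical distribution function is >= q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for y, w in pairs:
        cum += w
        if cum >= q * total:
            return y
    return pairs[-1][0]

def quantile_differences(y_consent, w_adjusted, y_full, w_full, qs):
    """Consent-subpopulation quantiles (adjusted weights) minus full-sample quantiles."""
    return [
        weighted_quantile(y_consent, w_adjusted, q) - weighted_quantile(y_full, w_full, q)
        for q in qs
    ]

# Grid of quantile levels 0.01, 0.02, ..., 0.99, as used for figures 2 and 4.
QUANTILE_GRID = [i / 100 for i in range(1, 100)]
```

Here `w_adjusted` stands for the propensity-adjusted weights of the consenting units and `w_full` for the unadjusted weights of the full sample.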
5. DISCUSSION
5.1 Summary of Results
As noted in the introduction, legal and social environments often require sampled survey units to provide informed consent before a survey organization
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the
Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.
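The mean comparison summarized above can be illustrated with a short sketch. It assumes a ratio-type weighted mean and inverse-propensity adjustment of the consenters' weights; the function names and argument layout are hypothetical, and the standard errors used in the paper's bias analyses are not reproduced.

```python
def weighted_mean(y, w):
    """Ratio-type survey-weighted mean: sum(w * y) / sum(w)."""
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)

def consent_mean_difference(y, w, consent, propensity):
    """Propensity-adjusted mean of consenting units minus the full-sample mean."""
    full_mean = weighted_mean(y, w)
    y_consent = [yi for yi, c in zip(y, consent) if c]
    # Consenting units carry the adjusted weight w_i / p_i.
    w_adjusted = [wi / pi for wi, c, pi in zip(w, consent, propensity) if c]
    return weighted_mean(y_consent, w_adjusted) - full_mean
```

A difference near zero would suggest that the propensity adjustment largely recovers the full-sample mean for that variable, while a large difference points to residual consent-to-link bias.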
5.2 Prospective Extensions
The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.
5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of
the propensity to consent, the propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures, and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” Ph.D. dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), “Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight,” paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), “Propensity Score Adjustments for Covariates in Observational Studies,” paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A.1 Predictor Variables Examined

| Variable | Variable description |
|---|---|
| Respondent sociodemographics | |
| Age | 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65 |
| Gender | 1 if male (reference group); 2 if female |
| Race | 0 if white (reference group); 1 if non-white |
| Education | 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s |
| Language | 1 if interview language was Spanish; 0 if English (reference group) |
| Household size | 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH |
| Household type | 0 if renter (reference group); 1 if owner |
| Family income | 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180 |
| Total spending (log) | Log of total expenditures reported for all major expenditure categories |
| Survey features | |
| Mode | 1 if telephone; 0 if personal visit (reference group) |
| Area characteristics | |
| Region | 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West |
| Urban status | 1 if urban; 2 if rural (reference group) |
| Respondent attitude and effort | |
| Converted refusal status | 1 if ever a converted refusal during previous interviews; 2 if not (reference group) |
| Interview duration | 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group) |
| Cooperation | 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative” |
| Effort | 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum” |
| Effort change | 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure |
| Use of information booklet | 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group) |
| Doorstep concerns | 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other” |
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Respondent "doorstep" concerns

Privacy/government concerns: Privacy concerns; Anti-government concerns
Too busy/logistical concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested; Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long
Other reluctance: Survey is voluntary; Does not understand; Asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other (specify)
No concerns: No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
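The weight-heterogeneity summaries described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the quantiles are unweighted with linear interpolation (the paper does not state which quantile definition was used), and the squared coefficient of variation uses the population variance of the weights.

```python
import statistics


def one_plus_cv2(weights):
    """Return 1 + CV^2, where CV is the coefficient of variation of the weights.

    Assumption: population (not sample) variance is used, so the result
    equals n * sum(w^2) / (sum(w))^2.
    """
    mean_w = statistics.fmean(weights)
    return 1.0 + statistics.pvariance(weights, mu=mean_w) / mean_w ** 2


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def propensity_adjusted(weights, propensities):
    """Divide each design weight w_i by its estimated consent propensity p_i."""
    return [w / p for w, p in zip(weights, propensities)]


def weight_summary(weights):
    """Summaries analogous to the rows of table 6 for one set of weights."""
    s = sorted(weights)
    return {
        "n": len(s),
        "1+CV2": one_plus_cv2(s),
        "IQR/q0.5": (quantile(s, 0.75) - quantile(s, 0.25)) / quantile(s, 0.5),
        "q0.75/q0.25": quantile(s, 0.75) / quantile(s, 0.25),
        "q0.90/q0.10": quantile(s, 0.90) / quantile(s, 0.10),
        "q0.95/q0.05": quantile(s, 0.95) / quantile(s, 0.05),
    }


if __name__ == "__main__":
    # Hypothetical design weights and consent propensities, for illustration only.
    w = [1200.0, 950.0, 1800.0, 1500.0, 1100.0]
    p = [0.85, 0.70, 0.90, 0.80, 0.75]
    print(weight_summary(w))
    print(weight_summary(propensity_adjusted(w, p)))
```

Comparing the two printed summaries mirrors the comparison in the text: if the ratios for the adjusted weights are only slightly larger, the propensity adjustment adds little heterogeneity.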
Table 6. Variability of Weights

Columns:
(1) Weights from full sample
(2) Unadjusted weights for sample units with consent
(3) Propensity-adjusted weights for sample units with consent
(4) Propensity-decile adjusted weights for sample units with consent
(5) Propensity-quintile adjusted weights for sample units with consent

Statistic            (1)      (2)      (3)      (4)      (5)
n                    4893     3951     3915     3915     3915
1 + CV^2_wt          1.13     1.13     1.16     1.16     1.15
IQR/q0.5             0.875    0.877    0.858    0.853    0.851
q0.75/q0.25          1.357    1.359    1.448    1.463    1.468
q0.90/q0.10          1.970    1.966    2.148    2.178    2.124
q0.95/q0.05          2.699    2.665    3.028    2.961    2.898
Appendix C. Variance Estimators and Goodness-of-Fit Tests

C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
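As a rough illustration of the variance estimator above, the following Python sketch computes the Fay-adjusted balanced repeated replication covariance matrix from a full-sample estimate and a list of replicate estimates. The function names and the toy numbers are hypothetical; producing the replicate estimates themselves (refitting the model with each set of replicate weights) is assumed to have been done elsewhere.

```python
def fay_brr_covariance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR covariance of a coefficient vector.

    full_est:       list of p full-sample coefficient estimates
    replicate_ests: list of R lists, each holding the p estimates obtained
                    with one set of replicate weights
    fay_k:          Fay factor K (0.5 in the paper)
    """
    r = len(replicate_ests)
    p = len(full_est)
    scale = 1.0 / (r * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_ests:
        # Deviation of this replicate estimate from the full-sample estimate.
        dev = [b_r - b for b_r, b in zip(rep, full_est)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += scale * dev[j] * dev[k]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a covariance matrix."""
    return [cov[j][j] ** 0.5 for j in range(len(cov))]


if __name__ == "__main__":
    # Toy example with R = 4 replicates and p = 2 coefficients (made-up values).
    beta_hat = [1.0, 2.0]
    beta_reps = [[1.5, 2.0], [0.5, 2.0], [1.0, 2.5], [1.0, 1.5]]
    cov = fay_brr_covariance(beta_hat, beta_reps, fay_k=0.5)
    print(cov)
    print(standard_errors(cov))
```

With $R = 44$ replicates and $K = 0.5$, the scale factor is $1/(44 \times 0.25) = 1/11$, so each squared deviation contributes more than it would under unadjusted BRR ($1/44$); this compensates for the smaller perturbation of the replicate weights under Fay's method.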
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Columns: Variable; Estimate and SE for each of Model 1, Model 2, Model A, Model B, Model C, Model D, and Model E.

Rows (reference groups in parentheses):

Demographic characteristics
  Age group (32–65): 18–32; 65+
  Gender (Male): Female
  Race (White): Non-white
  Spanish interview (No): Yes
  Education group (HS grad): Less than HS; Some college; Associate's degree; Bachelor's degree; Advance degree
  Homeowner (Renter): Owner
  Total expenditures (Log)
  Income group: Less than $8181
  Income imputed (No): Yes
  Race × gender
  Owner × education: Less than HS; Some college; Associate's degree; Bachelor's degree; Advance degree

Environmental features
  Region (Northeast): Midwest; South; West
  Urbanicity (Rural): Urban

R attitude proxies
  Converted refusal (No): Yes
  Effort (Moderate): A lot of effort; Bare minimum effort
  Doorstep concerns (None): Too busy; Privacy/gov't concerns; Other
  Doorstep concerns × effort: Privacy × a lot of effort; Privacy × minimum effort; Busy × a lot of effort; Busy × minimum effort; Other × a lot of effort; Other × minimum effort
  Rapport (Phone): Personal Visit

[Table body: coefficient estimates and standard errors for the seven models are not legible in this copy.]

NOTE: † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent), 2nd revision    p-value    Test statistic, F(9, 35)
1                                   0.292      1.262
2                                   0.810      0.573
A                                   0.336      1.183
B                                   0.929      0.396
C                                   0.061      2.061
D                                   0.022      2.576
E                                   0.968      0.305
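To show how the p-values relate to the test statistics and the reference distribution $F(g-1, f-g+2) = F(9, 35)$, the sketch below evaluates the upper-tail probability of an F distribution in pure Python. It is not the svylogitgof.ado implementation: the function names are hypothetical, and the regularized incomplete beta function is evaluated with a standard continued-fraction expansion chosen here only to avoid external dependencies.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(stat, df1, df2):
    """P(F > stat) for a central F(df1, df2) random variable."""
    if stat <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * stat))


if __name__ == "__main__":
    f, g = 43, 10  # values given in Appendix C.2
    df1, df2 = g - 1, f - g + 2
    for model, stat in [("1", 1.262), ("2", 0.573), ("D", 2.576)]:
        print(model, round(f_upper_tail(stat, df1, df2), 3))
```

The printed values can be compared with the p-value column of table 9; small discrepancies would reflect rounding of the reported test statistics.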
similar and thus do not provide strong evidence of differences in income distributions across the ten propensity groups. To explore this further, for each value q = 0.01 to 0.99 (in increments of 0.01), figure 4 displays the differences between the q-th quantiles of estimated income for the "consent" subpopulation (after propensity score adjustment) and the full-sample-based estimate. Accompanying each difference is a pointwise 95 percent confidence interval. Again, the plot does not display strong trends across quantile groups as the value of q increases. Consequently, the mean differences noted for income in table 4 cannot be attributed to differences in one tail of the income distribution. Also note that the widths of the pointwise confidence intervals for the quantile differences increase substantially as q increases. This phenomenon arises commonly in quantile analyses for right-skewed distributions and results from the declining density of observations in the right tail of the distribution. In quantile plots not included here, qualitatively similar patterns of centrality and dispersion were observed for the property tax, property value, and rental value variables.
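The quantile comparison described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' procedure: the function names are hypothetical, the weighted quantile is taken as the smallest value whose cumulative weight share reaches q, and the pointwise confidence intervals (which would require replicate weights, as in Appendix C) are not computed.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share is at least q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]


def quantile_differences(consent_vals, consent_wts, full_vals, full_wts, qs):
    """Difference (consent minus full sample) in weighted quantiles for each q.

    consent_wts are assumed to be propensity-adjusted weights w_i / p_i for
    consenting units; full_wts are the unadjusted weights for all units.
    """
    return [
        (q,
         weighted_quantile(consent_vals, consent_wts, q)
         - weighted_quantile(full_vals, full_wts, q))
        for q in qs
    ]


# Quantile grid 0.01, 0.02, ..., 0.99, matching the grid described in the text.
QS = [k / 100 for k in range(1, 100)]

if __name__ == "__main__":
    # Made-up incomes and weights, for illustration only.
    full_income = [12000, 25000, 40000, 58000, 90000, 150000]
    full_w = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
    consent_income = [12000, 40000, 58000, 150000]
    consent_w = [1.3, 1.1, 1.4, 1.0]
    diffs = quantile_differences(consent_income, consent_w,
                                 full_income, full_w, QS)
    print(diffs[49])  # entry for q = 0.50
```

Plotting the second element of each pair against q would give a display in the spirit of figure 4; a flat pattern around zero would indicate no systematic tail effect.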
Figure 3. Boxplots of Values of Before-Tax Family Income, Separately for the Ten Groups Defined by Deciles of P(Consent), as Estimated by Model 2 (Table 3). Group labels on the horizontal axis equal the midpoints of the respective decile groups.

5. DISCUSSION

5.1 Summary of Results

As noted in the introduction, legal and social environments often require that sampled survey units provide informed consent before a survey organization may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple "consent-to-link" question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between the full-population means and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.

Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the "Consent" Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
5.2 Prospective Extensions

The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.

5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization, linkage, and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore trade-offs in approximate measures of cost and burden that may arise through integrated use of the following: direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the "consent" and "refusal" cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would integrate the abovementioned data sources, based on modeling of the propensity to consent, the propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures; and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.

5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials

Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES

Al Baghal, T., G. Knies, and J. Burton (2014), "Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions," Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), "Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data," The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), "Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design," Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), "Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records," Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), "Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?" Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), "HIPAA Authorization and Survey Nonresponse Bias," Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), "Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation," Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), "Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey," Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), "Optimizing Opt-Out Consent for Record Linkage," Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), "2011 Research Section Analysis," Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), "Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts," Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), "Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders," American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), "Respondent Consent to Use Administrative Data," PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), "Surveys on Surveys: Limitations and Potentialities," Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), "Understanding the Decision to Participate in a Survey," Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), "Leverage-Salience Theory of Survey Participation: Description and an Illustration," Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), "Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias," Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), "Record Linkage Research and Informed Consent: Who Consents?" BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), "Patterns of Consent: Evidence from a General Household Survey," Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), "Fay's Method for Variance Estimation," Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duffett, D. Willison, D. Cook, and M. Brouwers (2009), "Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review," British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), "Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population," BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), "Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics," The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), "The Framing of the Record Linkage Consent Question," International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), "Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer," Survey Research Methods, 7, 115–131.
Little, R. (1986), "Survey Nonresponse Adjustments for Estimates of Means," International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), "The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community," American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), "Uses of Administrative Data at the Social Security Administration," Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), "Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study," International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), "Linkages with Data from Social Security Administrative Records in the Health and Retirement Study," Social Security Bulletin, 62, 73–85.
O'Muircheartaigh, C., and P. Campanelli (1999), "A Multilevel Exploration of the Role of Interviewers in Survey Non-Response," Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), "Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP)," Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), "Bowling Alone: America's Declining Social Capital," Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), "Is the Consumer Expenditure Survey Representative by Income?" in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), "Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults," Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), "Linking Survey and Administrative Data: Mechanisms of Consent," Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), "An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany," Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), "Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data," Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), "The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey," Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), "Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data," Survey Research Methods, 7, 33–144.
Sala, E., J. Burton, and G. Knies (2012), "Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics," Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), "Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel," International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), "Informed Consent and Survey Response: A Summary of the Empirical Literature," Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), "Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits," Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), "Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census," Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), "An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey," Consumer Expenditure Survey Anthology, 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), "Consumer Expenditures and Income," in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), "'Interviewer' Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?" Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), "Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research," Archives of Family Medicine, 9, 1111–1118.
Yang D V M Lesser A I Gitelman and D S Birkes (2010) Improving the Propensity ScoreEqual Frequency Adjustment Estimator Using an Alternative Weight paper presented at theAmerican Statistical Association (ASA) Joint Statistical Meetings (JSM) Vancouver BritishColumbia August 2010
mdashmdashmdashmdash (2012) Propensity Score Adjustments for Covariates in Observational Studies paper pre-sented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM) SanDiego California August 2012
Appendix A. Variable Descriptions
A.1 Predictor Variables Examined

Respondent sociodemographics
Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender: 1 if male (reference group); 2 if female
Race: 0 if white (reference group); 1 if non-white
Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s
Language: 1 if interview language was Spanish; 0 if English (reference group)
Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type: 0 if renter (reference group); 1 if owner
Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group)
Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category: Privacy/government concerns
Respondent “doorstep” concerns: Privacy concerns; Anti-government concerns

Assigned category: Too busy/logistical concerns
Respondent “doorstep” concerns: Too busy; Interview takes too much time; Breaks appointments (puts off FR indefinitely); Not interested/Does not want to be bothered; Too many interviews; Scheduling difficulties; Family issues; Last interview took too long

Assigned category: Other reluctance
Respondent “doorstep” concerns: Survey is voluntary; Does not understand/Asks questions about survey; Survey content does not apply; Hang-up/slams door on FR; Hostile or threatens FR; Other HH members tell R not to participate; Talk only to specific household member; Respondent requests same FR as last time; Gave that information last time; Asked too many personal questions last time; Intends to quit survey; Other-specify

Assigned category: No concerns
Respondent “doorstep” concerns: No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Columns:
(1) Weights from full sample
(2) Unadjusted weights for sample units with consent
(3) Propensity-adjusted weights for sample units with consent
(4) Propensity-decile adjusted weights for sample units with consent
(5) Propensity-quintile adjusted weights for sample units with consent

Statistic          (1)      (2)      (3)      (4)      (5)
n                  4893     3951     3915     3915     3915
1 + CV^2_wt        1.13     1.13     1.16     1.16     1.15
IQR/q0.5           0.875    0.877    0.858    0.853    0.851
q0.75/q0.25        1.357    1.359    1.448    1.463    1.468
q0.90/q0.10        1.970    1.966    2.148    2.178    2.124
q0.95/q0.05        2.699    2.665    3.028    2.961    2.898
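The summary measures reported in table 6 can be illustrated with a minimal sketch. It assumes a divide-by-n variance in the coefficient of variation (so that $1 + CV^2_{wt} = n \sum w_i^2 / (\sum w_i)^2$) and a linear-interpolation quantile rule; the paper does not state which conventions were used, and the function names and toy weights below are hypothetical.

```python
from statistics import fmean


def kish_factor(weights):
    """Return 1 + CV^2 of the weights (approximate variance inflation)."""
    mean_w = fmean(weights)
    # Assumption: population (divide-by-n) variance of the weights.
    var_w = fmean((w - mean_w) ** 2 for w in weights)
    return 1.0 + var_w / mean_w ** 2


def quantile(values, q):
    """Linear-interpolation quantile (one of several common definitions)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def weight_summary(weights):
    """Compute the heterogeneity measures named in the rows of table 6."""
    q = lambda p: quantile(weights, p)
    return {
        "1+CV^2": kish_factor(weights),
        "IQR/q0.5": (q(0.75) - q(0.25)) / q(0.5),
        "q0.75/q0.25": q(0.75) / q(0.25),
        "q0.90/q0.10": q(0.90) / q(0.10),
        "q0.95/q0.05": q(0.95) / q(0.05),
    }


def propensity_adjust(weights, propensities):
    """Propensity-adjusted weights w_pi = w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]


# Hypothetical toy data: four consenting units and their estimated propensities.
base = [100.0, 120.0, 150.0, 200.0]
p_hat = [0.9, 0.8, 0.75, 0.8]
print(weight_summary(base))
print(weight_summary(propensity_adjust(base, p_hat)))
```

Comparing the two printed summaries side by side mirrors the comparison between the unadjusted and propensity-adjusted columns of table 6.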
Appendix C. Variance Estimators and Goodness-of-Fit Tests

C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
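As a rough illustration of the variance estimator above, the following sketch computes the Fay-adjusted balanced repeated replication covariance matrix from a full-sample estimate and a list of replicate estimates. The function name and the toy inputs are hypothetical; in practice the replicate estimates would come from refitting the model with each of the 44 sets of replicate weights.

```python
import math


def fay_brr_covariance(full_estimate, replicate_estimates, fay_k=0.5):
    """Sum of outer products of replicate deviations, scaled by 1 / (R (1 - K)^2)."""
    n_reps = len(replicate_estimates)
    dim = len(full_estimate)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * dim for _ in range(dim)]
    for rep in replicate_estimates:
        # Deviation of the r-th replicate estimate from the full-sample estimate.
        dev = [b_r - b for b_r, b in zip(rep, full_estimate)]
        for j in range(dim):
            for k in range(dim):
                cov[j][k] += scale * dev[j] * dev[k]
    return cov


# Hypothetical toy example: two coefficients, four replicates (not CEQ data).
beta_hat = [1.0, 2.0]
beta_reps = [[1.5, 2.0], [0.5, 2.0], [1.0, 2.5], [1.0, 1.5]]
cov = fay_brr_covariance(beta_hat, beta_reps, fay_k=0.5)
std_errors = [math.sqrt(cov[j][j]) for j in range(len(beta_hat))]
print(cov)
print(std_errors)
```

Standard errors are the square roots of the diagonal entries; with $K = 0.5$ the scaling factor is $4/R$.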
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$, so that the reference distribution is $F(9, 35)$.

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table 8 reports an Estimate and SE for each of Models 1, 2, A, B, C, D, and E. The cell values are not legible in this copy. Row structure, with reference groups in parentheses:
Demographic characteristics: Age group (32–65): 18–32, 65+. Gender (Male): Female. Race (White): Non-white. Spanish interview (No): Yes. Education group (HS grad): Less than HS, Some college, Associate’s degree, Bachelor’s degree, Advance degree. Homeowner (Renter): Owner. Total expenditures (Log). Income group: Less than $8,181. Income imputed (No): Yes. Race × gender. Owner × education: Less than HS, Some college, Associate’s degree, Bachelor’s degree, Advance degree.
Environmental features: Region (Northeast): Midwest, South, West. Urbanicity (Rural): Urban.
R attitude proxies: Converted refusal (No): Yes. Effort (Moderate): A lot of effort, Bare minimum effort. Doorstep concerns (None): Too busy, Privacy/gov’t concerns, Other. Doorstep concerns × effort: Privacy × a lot of effort, Privacy × minimum effort, Busy × a lot of effort, Busy × minimum effort, Other × a lot of effort, Other × minimum effort.
Rapport (Phone): Personal Visit.]

NOTE: †p < 0.10; the remaining significance markers denote p < 0.05, p < 0.01, and p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
may link their responses with administrative or commercial records. For these cases, survey organizations must assess a complex range of factors, including (a) the general willingness of a respondent to consent to linkage; (b) the probability of successful linkage with a given record source, conditional on consent in (a); (c) the quality of the linked source; and (d) the impact of (a)–(c) on the properties of the resulting estimators that combine survey and linked-source data. Rigorous assessment of (b), (c), and (d) in a production environment can be quite expensive and time-consuming, which in turn appears to have limited the extent and pace of exploration of record linkage to supplement standard sample survey data collection.
To address this problem, the current paper has considered an approach based on inclusion of a simple “consent-to-link” question in a standard survey instrument, followed by analyses to address issue (a). The resulting models for the propensity to object to linkage identified significant factors in standard demographic characteristics, proxy variables related to respondent attitudes, and related two-factor interactions. In addition, follow-up analyses of several economic variables (directly reported income, property tax, property value, and rental value) identified substantial differences between, respectively, the full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.

Figure 4. Differences Between Estimated Quantiles for Before-Tax Family Income Based on the “Consent” Subpopulation with Propensity Score Adjustment and the Full Population, Respectively. Reported estimates for the 0.01 to 0.99 quantiles (with 0.01 increment) and associated pointwise 95 percent confidence bounds.
5.2 Prospective Extensions

The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.

5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.
Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.

5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?,” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?,” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?,” in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA, http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?,” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

| Category | Variable | Description |
|---|---|---|
| Respondent sociodemographics | Age | 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65 |
| | Gender | 1 if male (reference group); 2 if female |
| | Race | 0 if white (reference group); 1 if non-white |
| | Education | 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; and 5 if more than Bachelor’s |
| | Language | 1 if interview language was Spanish; 0 if English (reference group) |
| | Household size | 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH |
| | Household type | 0 if renter (reference group); 1 if owner |
| | Family income | 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180 |
| | Total spending (log) | Log of total expenditures reported for all major expenditure categories |
| Survey features | Mode | 1 if telephone; 0 if personal visit (reference group) |
| Area characteristics | Region | 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West |
| | Urban status | 1 if urban; 2 if rural (reference group) |
| Respondent attitude and effort | Converted refusal status | 1 if ever a converted refusal during previous interviews; 2 if not (reference group) |
| | Interview duration | 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group) |
| | Cooperation | 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative” |
| | Effort | 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum” |
| | Effort change | 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure |
| | Use of information booklet | 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; and 5 if did not have access to booklet (reference group) |
| | Doorstep concerns | 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other” |
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

| Respondent “doorstep” concerns | Assigned category |
|---|---|
| Privacy concerns | Privacy/government concerns |
| Anti-government concerns | Privacy/government concerns |
| Too busy | Too busy/logistical concerns |
| Interview takes too much time | Too busy/logistical concerns |
| Breaks appointments (puts off FR indefinitely) | Too busy/logistical concerns |
| Not interested/Does not want to be bothered | Too busy/logistical concerns |
| Too many interviews | Too busy/logistical concerns |
| Scheduling difficulties | Too busy/logistical concerns |
| Family issues | Too busy/logistical concerns |
| Last interview took too long | Too busy/logistical concerns |
| Survey is voluntary | Other reluctance |
| Does not understand/Asks questions about survey | Other reluctance |
| Survey content does not apply | Other reluctance |
| Hang-up/slams door on FR | Other reluctance |
| Hostile or threatens FR | Other reluctance |
| Other HH members tell R not to participate | Other reluctance |
| Talk only to specific household member | Other reluctance |
| Respondent requests same FR as last time | Other reluctance |
| Gave that information last time | Other reluctance |
| Asked too many personal questions last time | Other reluctance |
| Intends to quit survey | Other reluctance |
| Other (specify) | Other reluctance |
| No concerns | No concerns |

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / \hat{p}_i$ for the sample units with consent, where $\hat{p}_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights based on, respectively, the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

| | Weights from full sample | Unadjusted weights for sample units with consent | Propensity-adjusted weights for sample units with consent | Propensity-decile adjusted weights for sample units with consent | Propensity-quintile adjusted weights for sample units with consent |
|---|---|---|---|---|---|
| $n$ | 4,893 | 3,951 | 3,915 | 3,915 | 3,915 |
| $1 + CV^2_{wt}$ | 1.13 | 1.13 | 1.16 | 1.16 | 1.15 |
| $IQR / q_{0.5}$ | 0.875 | 0.877 | 0.858 | 0.853 | 0.851 |
| $q_{0.75} / q_{0.25}$ | 1.357 | 1.359 | 1.448 | 1.463 | 1.468 |
| $q_{0.90} / q_{0.10}$ | 1.970 | 1.966 | 2.148 | 2.178 | 2.124 |
| $q_{0.95} / q_{0.05}$ | 2.699 | 2.665 | 3.028 | 2.961 | 2.898 |
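The weight-heterogeneity summaries described in this appendix can be sketched in a few lines. The following is a minimal illustration, not the authors' production code: the function names, the toy weights, and the toy propensities are all hypothetical, the coefficient of variation uses the population variance of the weights, and the quantiles use simple linear interpolation (one of several common definitions, so values may differ slightly from those of a given statistical package).

```python
import statistics


def adjusted_weights(weights, propensities):
    """Propensity-adjusted weights w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]


def one_plus_cv2(weights):
    """Kish-type variance inflation indicator 1 + CV^2 of the weights."""
    mean = statistics.fmean(weights)
    # Population variance of the weights (an assumption of this sketch).
    var = statistics.pvariance(weights, mu=mean)
    return 1.0 + var / mean ** 2


def quantile(sorted_w, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_w) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_w) - 1)
    return sorted_w[lo] + (pos - lo) * (sorted_w[hi] - sorted_w[lo])


def heterogeneity_summary(weights):
    """Summaries analogous to the rows of a weight-variability table."""
    s = sorted(weights)

    def q(p):
        return quantile(s, p)

    return {
        "1+CV2": one_plus_cv2(weights),
        "IQR/q0.5": (q(0.75) - q(0.25)) / q(0.5),
        "q0.75/q0.25": q(0.75) / q(0.25),
        "q0.90/q0.10": q(0.90) / q(0.10),
        "q0.95/q0.05": q(0.95) / q(0.05),
    }


# Hypothetical toy design weights and estimated consent propensities.
w = [1000.0, 1200.0, 1500.0, 900.0, 1100.0]
p_hat = [0.80, 0.90, 0.75, 0.85, 0.80]
summary_unadjusted = heterogeneity_summary(w)
summary_adjusted = heterogeneity_summary(adjusted_weights(w, p_hat))
```

Comparing `summary_unadjusted` with `summary_adjusted` mirrors the comparison across columns of the table: if the adjusted values are only slightly larger, the propensity adjustment adds little weight heterogeneity.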
Appendix C. Variance Estimators and Goodness-of-Fit Tests

C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication using 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{var}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
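The variance estimator above can be sketched directly from a full-sample estimate and a list of replicate estimates. This is an illustrative sketch only: the function names and the two-replicate toy input are hypothetical, and in the setting described here there would be $R = 44$ replicate estimates, each obtained by refitting the model with one set of replicate weights.

```python
def fay_brr_variance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix as nested lists.

    full_est:       full-sample coefficient vector (list of floats)
    replicate_ests: list of R replicate coefficient vectors
    fay_k:          Fay factor K (0.5 in the text above)
    """
    r = len(replicate_ests)
    p = len(full_est)
    scale = 1.0 / (r * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_ests:
        # Deviation of this replicate estimate from the full-sample estimate.
        dev = [rep[j] - full_est[j] for j in range(p)]
        # Accumulate the scaled outer product dev * dev'.
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * dev[a] * dev[b]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [cov[j][j] ** 0.5 for j in range(len(cov))]


# Hypothetical toy input with R = 2 replicates and two coefficients.
beta_hat = [1.0, 2.0]
beta_reps = [[1.5, 2.0], [0.5, 2.0]]
cov_hat = fay_brr_variance(beta_hat, beta_reps, fay_k=0.5)
se_hat = standard_errors(cov_hat)
```

With $K = 0.5$ the factor $1/(1-K)^2$ equals 4, which compensates for the fact that Fay replicate weights perturb the full-sample weights by only half as much as standard BRR replicate weights.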
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2 based on the F-adjusted mean residual test statistic $Q_{Fadj}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

| Model | F-adjusted goodness-of-fit test p-value |
|---|---|
| 1 | 0.292 |
| 2 | 0.810 |
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table body not recoverable from this copy. The table reports an estimate and standard error (SE) for each of Models 1, 2, A, B, C, D, and E. The row structure, with reference categories in parentheses, is listed below.]

Demographic characteristics
- Age group (32–65): 18–32; 65+
- Gender (Male): Female
- Race (White): Non-white
- Spanish interview (No): Yes
- Education group (HS grad): Less than HS; Some college; Associate’s degree; Bachelor’s degree; Advanced degree
- Homeowner (Renter): Owner
- Total expenditures (Log)
- Income group: Less than $8,181
- Income imputed (No): Yes
- Race × gender
- Owner × education: Less than HS; Some college; Associate’s degree; Bachelor’s degree; Advanced degree

Environmental features
- Region (Northeast): Midwest; South; West
- Urbanicity (Rural): Urban

R attitude proxies
- Converted refusal (No): Yes
- Effort (Moderate): A lot of effort; Bare minimum effort
- Doorstep concerns (None): Too busy; Privacy/gov’t concerns; Other
- Doorstep concerns × effort: Privacy × a lot of effort; Privacy × minimum effort; Busy × a lot of effort; Busy × minimum effort; Other × a lot of effort; Other × minimum effort
- Rapport (Phone): Personal visit

NOTE: † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

| Model (of consent), 2nd revision | F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35)) |
|---|---|
| 1 | 0.292 (1.262) |
| 2 | 0.810 (0.573) |
| A | 0.336 (1.183) |
| B | 0.929 (0.396) |
| C | 0.061 (2.061) |
| D | 0.022 (2.576) |
| E | 0.968 (0.305) |
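The link between a reported test statistic and its p-value can be illustrated with a small standard-library sketch. It assumes, as stated in section C.2, a central $F(g-1, f-g+2) = F(9, 35)$ reference distribution with $f = 43$ and $g = 10$; the function names are hypothetical, and the upper-tail probability is computed through the regularized incomplete beta function with a standard continued-fraction evaluation rather than through the Stata module used in the paper.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(stat, df1, df2):
    """P(F > stat) for a central F distribution with (df1, df2) d.f."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * stat))


f, g = 43, 10                      # values stated in section C.2
df1, df2 = g - 1, f - g + 2        # (9, 35)
p_model_1 = f_upper_tail(1.262, df1, df2)
p_model_d = f_upper_tail(2.576, df1, df2)
```

Applying `f_upper_tail` to the other test statistics in the table is a quick consistency check on the reported p-values under the stated reference distribution.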
full population and the propensity-adjusted means of the consenting subpopulation. Further analyses of the estimated quantiles of these economic variables did not indicate that these mean differences were attributable to simple tail-quantile phenomena.

5.2 Prospective Extensions

The conceptual development and numerical results considered in this paper could be extended in several directions. Of special interest would be use of alternative propensity-based weighting methods; joint modeling of consent-to-link and item response probabilities; approximate optimization of design decisions related to potentially burdensome questions; and efficient integration of test procedures into production-oriented settings.

5.2.1 Joint modeling of consent-to-link and item-missingness propensities. It would be useful to extend the analyses in table 4 to cover a wide range of survey variables with varying rates of item missingness. This would allow exploration of issues identified in previous papers that found consent-to-linkage rates were significantly lower for respondents who had higher levels of item nonresponse on previous interview waves, particularly refusals on income or wealth questions (Jenkins et al. 2006; Olson 1999; Woolf et al. 2000; Bates 2005; Dahlhamer and Cox 2007; Pascale 2011). Also, missing survey items and refusal to consent to record linkage may both be associated with the same underlying unit attributes, e.g., lack of trust or lack of interest in the survey topic. For those cases, versions of the table 4 analyses would require in-depth analyses of the joint propensity of a sample unit to provide consent to linkage and to provide a response for a given set of items.
5.2.2 Design optimization: linkage and assignment of potentially burdensome questions. In addition, as noted in the introduction, statistical agencies are interested in increasing the use of record linkage in ways that may reduce data collection costs and respondent burden. To implement this idea, it would be of interest to explore the following trade-offs in approximate measures of cost and burden that may arise through integrated use of direct collection of low-burden items from all sample units; direct collection of higher-burden items from some units, with the selection of those units determined through subsampling of the “consent” and “refusal” cases at different rates, while accounting for the estimated probability that a given unit will have item nonresponse for this item; and use of administrative records at the unit level for some or all of the consenting units. The resulting estimation and inference methods would be based on integration of the abovementioned data sources, based on modeling of the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.

Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.
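As one hedged reading of the integration step described above (an illustrative assumption, not a formula given in this paper), suppose consent and successful linkage are modeled sequentially, with $\hat{p}^{C}_i$ denoting the estimated consent propensity of unit $i$ and $\hat{p}^{L \mid C}_i$ the estimated propensity of a successful link given consent. Both symbols are hypothetical names introduced here. An adjusted weight for a consenting, successfully linked unit might then take the form:

```latex
w^{*}_i = \frac{w_i}{\hat{p}^{C}_i \, \hat{p}^{L \mid C}_i}
```

This extends the propensity-adjusted weight of Appendix B, $w_i / \hat{p}_i$, by one further factor; any calibration to administrative aggregates would be a separate, subsequent step.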
5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions, and modeling the resulting propensities, at different stages of the survey lifecycle.

Supplementary Materials

Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test For a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests For Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the US Census: Does Question Frame Matter?” Center for Survey Measurement, US Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, US Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA, http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A. Variable Descriptions

A.1 Predictor Variables Examined

| Variable | Description |
|---|---|
| Respondent sociodemographics | |
| Age | 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65 |
| Gender | 1 if male (reference group); 2 if female |
| Race | 0 if white (reference group); 1 if non-white |
| Education | 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s |
| Language | 1 if interview language was Spanish; 0 if English (reference group) |
| Household size | 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH |
| Household type | 0 if renter (reference group); 1 if owner |
| Family income | 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180 |
| Total spending (log) | Log of total expenditures reported for all major expenditure categories |
| Survey features | |
| Mode | 1 if telephone; 0 if personal visit (reference group) |
| Area characteristics | |
| Region | 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West |
| Urban status | 1 if urban; 2 if rural (reference group) |
| Respondent attitude and effort | |
| Converted refusal status | 1 if ever a converted refusal during previous interviews; 2 if not (reference group) |
| Interview duration | 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group) |
| Cooperation | 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative” |
| Effort | 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum” |
| Effort change | 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure |
| Use of information booklet | 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group) |
| Doorstep concerns | 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other” |
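A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

| Respondent “doorstep” concerns | Assigned category |
|---|---|
| Privacy concerns | Privacy/government concerns |
| Anti-government concerns | Privacy/government concerns |
| Too busy | Too busy/logistical concerns |
| Interview takes too much time | Too busy/logistical concerns |
| Breaks appointments (puts off FR indefinitely) | Too busy/logistical concerns |
| Not interested | Too busy/logistical concerns |
| Does not want to be bothered | Too busy/logistical concerns |
| Too many interviews | Too busy/logistical concerns |
| Scheduling difficulties | Too busy/logistical concerns |
| Family issues | Too busy/logistical concerns |
| Last interview took too long | Too busy/logistical concerns |
| Survey is voluntary | Other reluctance |
| Does not understand | Other reluctance |
| Asks questions about survey | Other reluctance |
| Survey content does not apply | Other reluctance |
| Hang-up/slams door on FR | Other reluctance |
| Hostile or threatens FR | Other reluctance |
| Other HH members tell R not to participate | Other reluctance |
| Talk only to specific household member | Other reluctance |
| Respondent requests same FR as last time | Other reluctance |
| Gave that information last time | Other reluctance |
| Asked too many personal questions last time | Other reluctance |
| Intends to quit survey | Other reluctance |
| Other: specify | Other reluctance |
| No concerns | No concerns |

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)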
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

| | Weights from full sample | Unadjusted weights for sample units with consent | Propensity-adjusted weights for sample units with consent | Propensity-decile adjusted weights for sample units with consent | Propensity-quintile adjusted weights for sample units with consent |
|---|---|---|---|---|---|
| n | 4893 | 3951 | 3915 | 3915 | 3915 |
| $1 + CV^2_{wt}$ | 1.13 | 1.13 | 1.16 | 1.16 | 1.15 |
| $IQR / q_{0.5}$ | 0.875 | 0.877 | 0.858 | 0.853 | 0.851 |
| $q_{0.75} / q_{0.25}$ | 1.357 | 1.359 | 1.448 | 1.463 | 1.468 |
| $q_{0.90} / q_{0.10}$ | 1.970 | 1.966 | 2.148 | 2.178 | 2.124 |
| $q_{0.95} / q_{0.05}$ | 2.699 | 2.665 | 3.028 | 2.961 | 2.898 |
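As a rough illustration of the variance-inflation summary used in table 6, the following sketch computes $1 + CV^2_{wt}$ for a set of weights and for the same weights after division by estimated consent propensities. The function names and the toy numbers are hypothetical and are not CEQ values; the squared coefficient of variation is taken here with divisor n, which is an assumption about the convention rather than a statement of the authors' exact computation.

```python
def kish_factor(weights):
    """Return 1 + CV^2 of the weights, using the divisor-n variance.

    Algebraically this equals n * sum(w^2) / (sum(w))^2.
    """
    n = len(weights)
    mean_w = sum(weights) / n
    var_w = sum((w - mean_w) ** 2 for w in weights) / n
    return 1.0 + var_w / mean_w ** 2


def propensity_adjusted(weights, propensities):
    """Divide each design weight w_i by its estimated consent propensity p_i."""
    return [w / p for w, p in zip(weights, propensities)]


# Hypothetical toy inputs for illustration only.
design_weights = [100.0, 120.0, 80.0, 100.0]
consent_propensity = [0.8, 0.9, 0.7, 0.8]

base = kish_factor(design_weights)
adjusted = kish_factor(propensity_adjusted(design_weights, consent_propensity))
print(round(base, 4), round(adjusted, 4))
```

Comparing the two factors for the same set of consenting units gives a quick indication of how much additional weight heterogeneity the propensity adjustment introduces.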
Appendix C. Variance Estimators and Goodness-of-Fit Tests

C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication using 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1 - K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
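The replication variance estimator above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `fay_brr_variance` and the toy replicate estimates are hypothetical, and the replicate estimates are assumed to have been computed elsewhere from the replicate weights.

```python
def fay_brr_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted balanced repeated replication covariance matrix.

    full_estimate: list of p coefficients from the full-sample weights.
    replicate_estimates: list of R lists, one per set of replicate weights.
    Returns a p x p nested list: sum of outer products of the deviations
    (replicate minus full-sample), scaled by 1 / (R * (1 - K)^2).
    """
    n_reps = len(replicate_estimates)
    p = len(full_estimate)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in replicate_estimates:
        dev = [rep[j] - full_estimate[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += scale * dev[a] * dev[b]
    return cov


# Hypothetical toy values: R = 4 replicates, two coefficients.
beta_full = [1.0, 2.0]
beta_reps = [[1.1, 2.0], [0.9, 2.0], [1.0, 2.2], [1.0, 1.8]]
cov = fay_brr_variance(beta_full, beta_reps, fay_k=0.5)
std_errors = [cov[j][j] ** 0.5 for j in range(len(beta_full))]
print(std_errors)
```

With $K = 0.5$ the factor $1/(1 - K)^2$ equals 4, which compensates for the smaller perturbation of the replicate weights under Fay's method.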
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{Fadj}$ presented in those papers. A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{Fadj}$ is distributed approximately as a central $F(g - 1, f - g + 2)$ random variable, where $f = 43$ and $g = 10$.
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

| Model | F-adjusted goodness-of-fit test p-value |
|---|---|
| 1 | 0.292 |
| 2 | 0.810 |
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

[Table 8 reports an Estimate and SE for each of Model 1, Model 2, and Models A through E. Row variables, with reference groups in parentheses: Demographic characteristics: Age group (32–65): 18–32, 65+; Gender (Male): Female; Race (White): Non-white; Spanish interview (No): Yes; Education group (HS grad): Less than HS, Some college, Associate’s degree, Bachelor’s degree, Advance degree; Homeowner (Renter): Owner; Total expenditures (Log); Income group: Less than $8,181; Income imputed (No): Yes; Race × gender; Owner × education (Less than HS, Some college, Associate’s degree, Bachelor’s degree, Advance degree). Environmental features: Region (Northeast): Midwest, South, West; Urbanicity (Rural): Urban. R attitude proxies: Converted refusal (No): Yes; Effort (Moderate): A lot of effort, Bare minimum effort; Doorstep concerns (None): Too busy, Privacy/gov’t concerns, Other; Doorstep concerns × effort (Privacy, Busy, and Other, each crossed with a lot of effort and minimum effort); Rapport (Phone): Personal visit. The cell entries, their signs, and their significance markers are not recoverable here.]

NOTE: † p < 0.10; the remaining markers denote p < 0.05, p < 0.01, and p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{Fadj}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic, Model of Consent/Object Comparison

| Model (of consent), 2nd revision | F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35)) |
|---|---|
| 1 | 0.292 (1.262) |
| 2 | 0.810 (0.573) |
| A | 0.336 (1.183) |
| B | 0.929 (0.396) |
| C | 0.061 (2.061) |
| D | 0.022 (2.576) |
| E | 0.968 (0.305) |
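To make the link between the reported test statistics and p-values concrete, the following sketch evaluates the upper-tail probability of a central F distribution with $(g - 1, f - g + 2) = (9, 35)$ degrees of freedom. It is an illustrative stand-in only: the analyses in this paper used a Stata module, whereas this sketch integrates the F density numerically with Simpson's rule, and the function names are hypothetical. It assumes a numerator degrees of freedom greater than 2, so that the density is finite at zero.

```python
import math


def f_pdf(x, d1, d2):
    """Density of a central F(d1, d2) random variable at x."""
    if x <= 0.0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2.0) + math.lgamma(d2 / 2.0)
                - math.lgamma((d1 + d2) / 2.0))
    log_pdf = (0.5 * d1 * math.log(d1) + 0.5 * d2 * math.log(d2)
               + (0.5 * d1 - 1.0) * math.log(x)
               - 0.5 * (d1 + d2) * math.log(d2 + d1 * x)
               - log_beta)
    return math.exp(log_pdf)


def f_upper_tail(stat, d1, d2, steps=2000):
    """Approximate P(F > stat) via composite Simpson integration on [0, stat].

    Assumes d1 > 2 and an even number of steps.
    """
    if stat <= 0.0:
        return 1.0
    h = stat / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(stat, d1, d2)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3.0


f_design, g_groups = 43, 10            # f and g as given in section C.2
d1, d2 = g_groups - 1, f_design - g_groups + 2
p_model_d = f_upper_tail(2.576, d1, d2)  # test statistic reported for model D
print(d1, d2, round(p_model_d, 3))
```

A small p-value from this calculation, as for models C and D, points toward possible lack of fit, while values such as those for models 1 and 2 do not.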
the propensity to consent, propensity to obtain a successful link conditional on consent, and possibly the use of aggregates from the administrative record variable for all units to provide calibration weights.

Within this framework, efforts to produce approximate design optimization center on determination of subsampling probabilities, conditional on observable X variables. This would require estimates of the joint propensities to provide consent-to-link and to provide survey item responses. Specifics of the optimization process would depend on the inferential goals, e.g., (a) minimizing the variance for mean estimators for specified variables like income or expenditures and (b) minimizing selected measures of cost or burden of special importance to the statistical organization.

5.2.3 Analysis of consent decisions linked with the survey lifecycle. Finally, as noted in section 1.1, efficient integration of test procedures into production-oriented settings generally involves trade-offs among relevance, resource constraints, and potential confounding effects. To illustrate, a primary goal of the current study was to assess record-linkage options for reduction of respondent burden, and we sought to identify factors associated with linkage consent within the production environment, with little or no disruption of current production work. This naturally led to limitations of the study. For example, the current study has consent-to-link information only for respondents who completed the fifth and final wave of CEQ. This raises possible confounding of consent effects with wave-level attrition and general cooperation effects. Thus, it would be of interest to extend the current study by measuring respondent consent-to-link decisions and modeling the resulting propensities at different stages of the survey lifecycle.
Supplementary Materials
Supplementary materials are available online at academic.oup.com/jssam.
REFERENCES
Al Baghal, T., G. Knies, and J. Burton (2014), “Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions,” Understanding Society Working Paper Series, Institute for Social and Economic Research, No. 2014-09.
Archer, K. J., and S. Lemeshow (2006), “Goodness-of-Fit Test for a Logistic Regression Model Fitted Using Survey Sample Data,” The Stata Journal, 6, 97–105.
Archer, K. J., S. Lemeshow, and D. W. Hosmer (2007), “Goodness-of-Fit Tests for Logistic Regression Models When Data Are Collected Using a Complex Sampling Design,” Computational Statistics and Data Analysis, 51, 4450–4464.
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting for How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 33–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A1 Predictor Variables Examined
Respondent sociodemographics Variable description
Age 1 if age 18 to 32 2 if 32 to 65 (reference group) 3if older than 65
Gender 1 if male (reference group) 2 if femaleRace 0 if white (reference group) 1 if non-whiteEducation 0 if less than high school 1 if high school graduate
(reference group) 2 if some college 3 ifAssociates degree 4 if Bachelorrsquos degree and 5if more than Bachelorrsquos
Language 1 if interview language was Spanish 0 if English(reference group)
Continued
146 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Continued
Respondent sociodemographics Variable description
Household size 1 if single-person HH 2 if 2-person HH (referencegroup) 3 if 3- or 4-person HH 4 if more than 4-person HH
Household type 0 if renter (reference group) 1 if ownerFamily income 1 if total family income before taxes is less than or
equal to $8180 2 if it exceeds $8180Total spending (log) Log of total expenditures reported for all major
expenditure categories
Survey featuresMode 1 if telephone 0 if personal visit (reference group)
Area characteristicsRegion 1 if Northeast (reference group) 2 if Midwest 3 if
South 4 if WestUrban status 1 if urban 2 if rural (reference group)
Respondent attitude and effortConverted refusal status 1 if ever a converted refusal during previous inter-
views 2 if not (reference group)Interview duration 1 if total interview time was less than 52 minutes 2
if greater than 52 minutes (reference group)Cooperation 1 if ldquovery cooperativerdquo 2 if ldquosomewhat coopera-
tiverdquo 3 if ldquoneither cooperative nor unco-operativerdquo 4 if somewhat uncooperativerdquo 5 ifldquovery uncooperativerdquo
Effort 1 if ldquoa lot of effortrdquo 2 if ldquoa moderate amountrdquo(reference group) 3 if ldquoa bare minimumrdquo
Effort change 1 if effort increased during interview 2 if decreased3 if stayed the same (reference group) 4 if notsure
Use of information booklet 1 if respondent used booklet ldquoalmost alwaysrdquo 2 ifldquomost of the timerdquo 3 if occasionallyrdquo 4 if ldquoneveror almost neverrdquo and 5 if did not have access tobooklet (reference group)
Doorstep concerns 0 if no concerns (reference group) 1 if privacyanti-government concerns 2 if too busy or logisticalconcerns 3 if ldquootherrdquo
Exploratory Assessment of Consent-to-Link 147
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
A2 Interviewer-Captured Respondent Doorstep Concerns and TheirAssigned Concern Category
A3 Post-Survey Interviewer Questions
A31 Respondent Cooperation How cooperative was this respondent dur-ing this interview (very cooperative somewhat cooperative neither coopera-tive nor uncooperative somewhat cooperative or very uncooperative)
A32 Respondent Effort How much effort would you say this respondentput into answering the expenditure questions during this survey (a lot of ef-fort a moderate amount of effort or a bare minimum of effort)
Respondent ldquodoorsteprdquo concerns Assigned category
Privacy concerns Privacygovernment concernsAnti-government concerns
Too busy Too busylogistical concernsInterview takes too much timeBreaks appointments (puts off FR indefinitely)Not interestedDoes not want to be botheredToo many interviewsScheduling difficultiesFamily issuesLast interview took too longSurvey is voluntary Other reluctanceDoes not understandAsks questions about survey
Survey content does not applyHang-upslams door on FRHostile or threatens FROther HH members tell R not to participateTalk only to specific household memberRespondent requests same FR as last timeGave that information last timeAsked too many personal questions last timeIntends to quit surveyOthermdashspecify
No concerns No Concerns
148 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns it is useful to assess poten-tial variance inflation that may arise from the heterogeneity of the weights thatare used For the current consent-to-link case table 6 presents summary statis-tics for three sets of weights the unadjusted weights wi (standard complex de-sign weights) for all applicable units in the full sample the same weights forsample units with consent (ie the units that did not object to record linkage)and propensity-adjusted weights wpi frac14 wi=pi for the sample units with con-sent where pi is the estimated propensity that unit i will consent to recordlinkage based on model 2 in table 3
In keeping with standard analyses that link heterogeneity of weights with vari-ance inflation (eg Kish 1965 Section 117) the second row of table 6 reportsthe values 1thorn CV2
wt where CV2wt is the squared coefficient of variation of the
weights under consideration For all three cases 1thorn CV2wt is only moderately
larger than one The remaining rows of table 6 provide some related non-parametric summary indications of the heterogeneity of weights based onrespectively the ratio of the interquartile range of the weights divided bythe median weight the ratio of the third and first quartiles of the weightsthe ratio of the 90th and 10th percentiles of the weights and the ratio ofthe 95th and 5th percentiles of the weights In each case the ratios for thepropensity-adjusted weights are only moderately larger than the corre-sponding ratios of the unadjusted weights Thus the propensity adjust-ments do not lead to substantial inflation in the heterogeneity of weightsfor the CEQ analyses considered in this paper
Table 6 Variability of Weights
Weights fromfull sample
Unadjustedweights
for sampleunits
with consent
Propensity-adjusted
weights forsample
units withconsent
Propensity-decile
adjustedweights
for sampleunits withconsent
Propensity-quintile adjusted
weights forsample
units withconsent
n 4893 3951 3915 3915 39151thorn CV2
wt 113 113 116 116 115IQR=q05 0875 0877 0858 0853 0851q075=q025 1357 1359 1448 1463 1468q090=q010 1970 1966 2148 2178 2124q095=q005 2699 2665 3028 2961 2898
Exploratory Assessment of Consent-to-Link 149
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Appendix C. Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator
$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$
where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
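As a minimal sketch of the replication variance estimator above, the following Python function accumulates the outer products of the replicate deviations and scales them by $1/\{R(1-K)^2\}$. The function names and the toy replicate estimates are hypothetical; in practice the replicate estimates would come from refitting the model with each of the 44 sets of replicate weights.

```python
import math


def fay_brr_variance(beta_full, beta_replicates, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    beta_full: full-sample coefficient estimates (list of floats).
    beta_replicates: list of R replicate estimates, each the same length as beta_full.
    Returns a p x p matrix (list of lists): the sum over replicates of
    (b_r - b)(b_r - b)' divided by R * (1 - K)^2.
    """
    n_reps = len(beta_replicates)
    p = len(beta_full)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for beta_r in beta_replicates:
        deviation = [beta_r[j] - beta_full[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += scale * deviation[j] * deviation[k]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [math.sqrt(cov[j][j]) for j in range(len(cov))]


# Hypothetical example with two coefficients and four replicates
# (the paper uses R = 44 replicates and K = 0.5).
full = [0.30, -0.10]
reps = [[0.32, -0.12], [0.28, -0.08], [0.31, -0.11], [0.29, -0.09]]
print(standard_errors(fay_brr_variance(full, reps, fay_k=0.5)))
```

With $K = 0.5$ the factor $(1-K)^2 = 0.25$ inflates the raw mean squared deviation by four, which compensates for the fact that Fay replicate weights perturb the full-sample weights only halfway.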
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2 based on the F-adjusted mean residual test statistic $Q_{F\text{adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F\text{adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$, so that the reference distribution is $F(9, 35)$.
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Columns in the source: Variable; then Estimate and SE for each of Model 1, Model 2, Model A, Model B, Model C, Model D, and Model E. Entries below are Estimate (SE), listed in left-to-right column order. Rows with seven entries correspond to Models 1, 2, A, B, C, D, E in that order. Rows with fewer entries belong to terms that enter only some of the models; the column positions of those entries are not recoverable from this copy.

Demographic characteristics

Age group (reference: 32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (reference: Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (reference: White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (reference: No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (reference: HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate's degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor's degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advanced degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (reference: Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (reference: No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate's degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor's degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advanced degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)

Environmental features

Region (reference: Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (reference: Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies

Converted refusal (reference: No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (reference: Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (reference: None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov't concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (reference: Phone)
  Personal visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: † p < 0.10. The source note also lists the thresholds p < 0.05, p < 0.01, and p < 0.001, but their marker symbols, the matching markers on individual estimates, and any minus signs are not legible in this copy; the entries above reproduce the printed digits only.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics, Model of Consent/Object Comparison

Model (of consent), 2nd revision    p-value    Test statistic, F(9, 35)
1                                   0.292      1.262
2                                   0.810      0.573
A                                   0.336      1.183
B                                   0.929      0.396
C                                   0.061      2.061
D                                   0.022      2.576
E                                   0.968      0.305
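The link between the test statistics and p-values in table 9 can be checked with a short sketch that evaluates the upper tail of an F distribution with $g - 1 = 9$ and $f - g + 2 = 35$ degrees of freedom (from $f = 43$ and $g = 10$). This is not the svylogitgof implementation; the function names are hypothetical, and the tail area is obtained by simple numerical integration of the F density (composite Simpson's rule) so that only the Python standard library is needed.

```python
import math


def f_density(x, d1, d2):
    """Density of the central F(d1, d2) distribution at x."""
    if x <= 0.0:
        # For d1 > 2, as here, the density is zero at the origin.
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)


def f_upper_tail(statistic, d1, d2, steps=2000):
    """P(F > statistic), using Simpson's rule on [0, statistic]; steps must be even."""
    if statistic <= 0.0:
        return 1.0
    h = statistic / steps
    total = f_density(0.0, d1, d2) + f_density(statistic, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_density(i * h, d1, d2)
    return 1.0 - total * h / 3.0


# Degrees of freedom implied by f = 43 and g = 10.
f_df, g_groups = 43, 10
d1, d2 = g_groups - 1, f_df - g_groups + 2

# Test statistics as listed in table 9.
for model, statistic in [("1", 1.262), ("2", 0.573), ("D", 2.576)]:
    print(model, round(f_upper_tail(statistic, d1, d2), 3))
```

If the $F(9, 35)$ reference distribution is the one used for table 9, the printed tail areas should fall close to the tabulated p-values; a dedicated statistics library would be the more usual choice for this calculation in practice.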
Bates, N. (2005), “Development and Testing of Informed Consent Questions to Link Survey Data with Administrative Records,” Proceedings of the AAPOR-ASA Section on Survey Methods Research, pp. 3786–3792, Washington, DC: American Statistical Association.
Bates, N., M. Wroblewski, and J. Pascale (2012), “Public Attitudes Toward the Use of Administrative Records in the U.S. Census: Does Question Frame Matter?” Center for Survey Measurement, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2012-04.
Beebe, T., J. Ziegenfuss, J. St. Sauver, S. Jenkins, L. Haas, M. Davern, and J. Tally (2011), “HIPAA Authorization and Survey Nonresponse Bias,” Medical Care, 49, 365–370.
Couper, M., E. Singer, F. Conrad, and R. Groves (2008), “Risk of Disclosure, Perceptions of Risk, and Concerns About Privacy and Confidentiality as Factors in Survey Participation,” Journal of Official Statistics, 24, 255–275.
Dahlhamer, J., and S. C. Cox (2007), “Respondent Consent to Link Survey Data with Administrative Records: Results from a Split-Ballot Field Test with the 2007 National Health Interview Survey,” Proceedings of the Federal Committee on Statistical Methodology Research Meeting, Washington, DC. Available at https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/2007FCSM_Dahlhamer-IV-B.pdf, last accessed December 22, 2016.
Das, M., and M. P. Couper (2014), “Optimizing Opt-Out Consent for Record Linkage,” Journal of Official Statistics, 30, 479–497.
Davis, J., I. Elkin, B. McBride, and N. To (2013), “2011 Research Section Analysis,” Technical Report, Division of Consumer Expenditure Surveys, U.S. Bureau of Labor Statistics.
Drolet, A., and M. Morris (2000), “Rapport in Conflict Resolution: Accounting For How Face-to-Face Contact Fosters Mutual Cooperation in Mixed-Motive Conflicts,” Journal of Experimental Social Psychology, 36, 26–50.
Dunn, K., K. Jordan, R. Lacey, M. Shapley, and C. Jinks (2004), “Patterns of Consent in Epidemiologic Research: Evidence from over 25,000 Responders,” American Journal of Epidemiology, 159, 1087–1094.
Fulton, J. (2012), “Respondent Consent to Use Administrative Data,” PhD dissertation, University of Maryland, Joint Program in Survey Methodology, College Park, MD.
Goyder, J. (1986), “Surveys on Surveys: Limitations and Potentialities,” Public Opinion Quarterly, 50, 27–41.
Groves, R., R. Cialdini, and M. Couper (1992), “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly, 56, 475–495.
Groves, R., and M. Couper (1998), Nonresponse in Household Interview Surveys, New York: Wiley.
Groves, R., D. Dillman, J. Eltinge, and R. Little (2002), Survey Nonresponse, New York: Wiley.
Groves, R., E. Singer, and A. Corning (2000), “Leverage-Salience Theory of Survey Participation: Description and an Illustration,” Public Opinion Quarterly, 64, 299–308.
Herzog, T., F. Scheuren, and W. Winkler (2007), Data Quality and Record Linkage Techniques, New York: Springer.
Holbrook, A., M. Green, and J. Krosnick (2003), “Telephone vs. Face-to-Face Interviewing of National Probability Samples with Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias,” Public Opinion Quarterly, 67, 79–125.
Huang, N., S. Shih, H. Chang, and Y. Chou (2007), “Record Linkage Research and Informed Consent: Who Consents?” BMC Health Services Research, 7, 18–23.
Jenkins, S., L. Cappellari, P. Lynn, A. Jackle, and E. Sala (2006), “Patterns of Consent: Evidence from a General Household Survey,” Journal of the Royal Statistical Society, Series A, 169, 701–722.
Judkins, D. R. (1990), “Fay’s Method for Variance Estimation,” Journal of Official Statistics, 6, 223–239.
Kho, M., M. Duggett, D. Willison, D. Cook, and M. Brouwers (2009), “Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review,” British Medical Journal, 338, b866.
Kish, L. (1965), Survey Sampling, New York: Wiley.
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Krobmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, U.S. Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), “Is the Consumer Expenditure Survey Representative by Income?” in Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–122.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of the American Statistical Association, Section on Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology, 2011, 8–16.
U.S. Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A.1 Predictor Variables Examined

Respondent sociodemographics    Variable description

Age          1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
Gender       1 if male (reference group); 2 if female
Race         0 if white (reference group); 1 if non-white
Education    0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate's degree; 4 if Bachelor's degree; 5 if more than Bachelor's
Language     1 if interview language was Spanish; 0 if English (reference group)
Household size          1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
Household type          0 if renter (reference group); 1 if owner
Family income           1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
Total spending (log)    Log of total expenditures reported for all major expenditure categories

Survey features
Mode                    1 if telephone; 0 if personal visit (reference group)

Area characteristics
Region                  1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
Urban status            1 if urban; 2 if rural (reference group)

Respondent attitude and effort
Converted refusal status      1 if ever a converted refusal during previous interviews; 2 if not (reference group)
Interview duration            1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
Cooperation                   1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
Effort                        1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
Effort change                 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
Use of information booklet    1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group)
Doorstep concerns             0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Respondent “doorstep” concerns    Assigned category

Privacy/government concerns:
  Privacy concerns
  Anti-government concerns

Too busy/logistical concerns:
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested
  Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Other reluctance:
  Survey is voluntary
  Does not understand/Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other (specify)

No concerns:
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (Very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative.)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (A lot of effort, a moderate amount of effort, or a bare minimum of effort.)
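The assignment of interviewer-recorded doorstep concerns to concern categories, together with the 0 to 3 coding of the doorstep concerns predictor in the variable list, can be sketched as a simple lookup. The dictionary and function names below are hypothetical, and the rule for respondents with concerns in more than one category is not described in this appendix, so the sketch handles a single recorded concern only.

```python
# Concern labels follow the doorstep concern table; names are illustrative only.
CONCERN_CATEGORY = {
    "Privacy concerns": "Privacy/government concerns",
    "Anti-government concerns": "Privacy/government concerns",
    "Too busy": "Too busy/logistical concerns",
    "Interview takes too much time": "Too busy/logistical concerns",
    "Breaks appointments (puts off FR indefinitely)": "Too busy/logistical concerns",
    "Not interested": "Too busy/logistical concerns",
    "Does not want to be bothered": "Too busy/logistical concerns",
    "Too many interviews": "Too busy/logistical concerns",
    "Scheduling difficulties": "Too busy/logistical concerns",
    "Family issues": "Too busy/logistical concerns",
    "Last interview took too long": "Too busy/logistical concerns",
    "Survey is voluntary": "Other reluctance",
    "Does not understand/Asks questions about survey": "Other reluctance",
    "Survey content does not apply": "Other reluctance",
    "Hang-up/slams door on FR": "Other reluctance",
    "Hostile or threatens FR": "Other reluctance",
    "Other HH members tell R not to participate": "Other reluctance",
    "Talk only to specific household member": "Other reluctance",
    "Respondent requests same FR as last time": "Other reluctance",
    "Gave that information last time": "Other reluctance",
    "Asked too many personal questions last time": "Other reluctance",
    "Intends to quit survey": "Other reluctance",
    "Other (specify)": "Other reluctance",
    "No concerns": "No concerns",
}

# Numeric coding of the doorstep concerns predictor (0 is the reference group).
CATEGORY_CODE = {
    "No concerns": 0,
    "Privacy/government concerns": 1,
    "Too busy/logistical concerns": 2,
    "Other reluctance": 3,
}


def doorstep_code(concern):
    """Return the 0-3 predictor code for one recorded doorstep concern."""
    return CATEGORY_CODE[CONCERN_CATEGORY[concern]]


print(doorstep_code("Scheduling difficulties"))
```

Keeping the label-to-category map separate from the category-to-code map makes it easy to audit each step against the appendix tables independently.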
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns it is useful to assess poten-tial variance inflation that may arise from the heterogeneity of the weights thatare used For the current consent-to-link case table 6 presents summary statis-tics for three sets of weights the unadjusted weights wi (standard complex de-sign weights) for all applicable units in the full sample the same weights forsample units with consent (ie the units that did not object to record linkage)and propensity-adjusted weights wpi frac14 wi=pi for the sample units with con-sent where pi is the estimated propensity that unit i will consent to recordlinkage based on model 2 in table 3
In keeping with standard analyses that link heterogeneity of weights with vari-ance inflation (eg Kish 1965 Section 117) the second row of table 6 reportsthe values 1thorn CV2
wt where CV2wt is the squared coefficient of variation of the
weights under consideration For all three cases 1thorn CV2wt is only moderately
larger than one The remaining rows of table 6 provide some related non-parametric summary indications of the heterogeneity of weights based onrespectively the ratio of the interquartile range of the weights divided bythe median weight the ratio of the third and first quartiles of the weightsthe ratio of the 90th and 10th percentiles of the weights and the ratio ofthe 95th and 5th percentiles of the weights In each case the ratios for thepropensity-adjusted weights are only moderately larger than the corre-sponding ratios of the unadjusted weights Thus the propensity adjust-ments do not lead to substantial inflation in the heterogeneity of weightsfor the CEQ analyses considered in this paper
Table 6 Variability of Weights
Weights fromfull sample
Unadjustedweights
for sampleunits
with consent
Propensity-adjusted
weights forsample
units withconsent
Propensity-decile
adjustedweights
for sampleunits withconsent
Propensity-quintile adjusted
weights forsample
units withconsent
n 4893 3951 3915 3915 39151thorn CV2
wt 113 113 116 116 115IQR=q05 0875 0877 0858 0853 0851q075=q025 1357 1359 1448 1463 1468q090=q010 1970 1966 2148 2178 2124q095=q005 2699 2665 3028 2961 2898
Exploratory Assessment of Consent-to-Link 149
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Appendix C Variance Estimators and Goodness-of-Fit Tests
C1 Notation and Variance Estimators
In this paper all inferential work for means proportions and logistic regres-sion coefficient vectors b are based on estimated variance-covariance matri-ces computed through balanced repeated replication that use 44 replicateweights and a Fay factor Kfrac14 05 Judkins (1990) provides general back-ground on balanced repeated replication with Fay factors Bureau of LaborStatistics (2014) provides additional background on balanced repeated repli-cation variance estimation for the CEQ To illustrate for a coefficient vectorb considered in table 3 let b be the survey-weighted estimator based on thefull-sample weights and let br be the corresponding estimator based on ther-th set of replicate weights Then the standard errors reported in table 3 arebased on the variance estimator
varethbTHORN frac14 1
Reth1 KTHORN2XR
rfrac141
ethbr bTHORNethbr bTHORN0
where Rfrac14 44 and Kfrac14 05 Similar comments apply to the standard errorsfor the mean and proportion estimates reported in tables 1 and 2 for thebias analyses reported in table 4 and for the standard errors used in theconstruction of the pointwise confidence intervals presented in figures 3and 4
C2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer Lemeshow andHosmer (2007) we also considered goodness-of-fit tests for the logistic re-gression models 1 and 2 based on the F-adjusted mean residual test statisticQFadj as presented by Archer and Lemeshow (2006) and Archer et al(2007) A direct extension of the reasoning considered in Archer andLemeshow (2006) and Archer et al (2007) indicates that under the nullhypothesis of no lack of fit and additional conditions QFadjis distributedapproximately as a central Fethg1 fgthorn2THORN random variable where f frac14 43 andg frac14 10
Table 7 F-adjusted Mean Residual Goodness-of-Fit Test
Model F-adjusted goodness-of-fit test p-values
1 02922 0810
150 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Tab
le8
Mul
tiva
riat
eL
ogis
tic
Mod
els
Pre
dict
ing
Con
sent
-to-
Lin
k(W
eigh
ted)
Var
iabl
eM
odel
1M
odel
2M
odel
AM
odel
BM
odel
CM
odel
DM
odel
E
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
E
Dem
ogra
phic
char
acte
rist
ics
Age
grou
p(3
2ndash65
)18
ndash32
034
35
0
0633
033
89
0
0634
030
44
0
0717
033
19
0
0742
032
82
0
0728
034
25
0
0640
033
68
0
0639
65thorn
0
2692
006
45
025
32
0
0656
019
71
006
13
023
87
0
0608
023
74
0
0630
026
94
0
0644
025
38
0
0653
Gen
der
(Mal
e)F
emal
e
001
610
0426
002
290
0421
003
960
0359
003
880
0363
000
200
0427
001
520
0425
002
070
0419
Rac
e(W
hite
)N
on-w
hite
009
490
1264
006
120
1260
007
810
1059
001
920
1056
003
220
1155
009
380
1269
005
930
1264
Spa
nish
inte
rvie
w(N
o) Yes
0
4234
029
23
044
520
2790
040
440
2930
042
710
3120
048
010
3118
042
870
2973
045
760
2846
Edu
catio
ngr
oup
(HS
grad
)L
ess
than
HS
030
970
1879
029
260
1835
001
420
1349
001
860
1383
033
17dagger
018
380
3081
018
850
2894
018
44S
ome
colle
ge0
4016
0
1423
037
39
013
89
009
860
1087
008
150
1129
040
66
013
890
3996
0
1442
036
95
014
09A
ssoc
iate
rsquosde
gree
0
3321
020
41
031
500
1993
011
090
1100
012
490
1145
028
580
2160
033
260
2036
031
600
1990
Bac
helo
rrsquos
degr
ee
006
540
2112
006
630
2082
003
830
0719
004
250
0742
002
300
2037
006
180
2127
005
810
2095
Con
tinue
d
Exploratory Assessment of Consent-to-Link 151
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Tab
le8
Con
tinue
d
Var
iabl
eM
odel
1M
odel
2M
odel
AM
odel
BM
odel
CM
odel
DM
odel
E
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
EE
stim
ate
SE
Est
imat
eS
E
Adv
ance
degr
ee
017
470
2762
015
120
2773
018
990
1215
019
410
1235
027
380
2482
017
200
2773
014
520
2796
Hom
eow
ner
(Ren
ter)
Ow
ner
0
2767
dagger0
1453
032
33
014
36
036
12
012
77
035
89
013
14
027
03dagger
013
97
027
54dagger
014
48
031
98
014
33T
otal
expe
nditu
res
(Log
)
006
050
0799
000
830
0800
010
250
0698
003
720
0754
004
190
0746
005
940
0802
000
650
0801
Inco
me
grou
pL
ess
than
$81
81
021
55
0
0604
0
4182
005
61
042
47
0
0577
021
42
006
09
Inco
me
impu
ted
(No) Y
es
014
87
005
13
014
81
005
10R
ace
gend
er
018
74dagger
009
56
019
53
009
23
022
68
009
30
018
73dagger
009
57
019
46
009
23O
wne
r
educ
atio
nL
ess
than
HS
049
31
020
66
046
32
019
96
047
50
020
45
049
29
020
67
046
37
020
02S
ome
colle
ge
065
75
0
1567
063
70
0
1613
0
6764
014
38
065
48
0
1578
063
09
0
1633
Ass
ocia
tersquos
degr
ee0
3407
dagger0
2013
034
02dagger
019
860
2616
020
900
3401
dagger0
2007
033
87dagger
019
78
Bac
helo
rrsquos
degr
ee0
1385
024
020
1398
023
680
1057
022
960
1358
024
090
1335
023
76
Adv
ance
degr
ee0
4713
028
470
4226
028
700
5867
0
2583
046
910
2854
041
830
2890
152 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Env
iron
men
tal
feat
ures
Reg
ion
(Nor
thea
st)
Mid
wes
t0
2097
dagger0
1234
021
11dagger
011
630
2602
0
1100
024
57
011
960
2352
dagger0
1208
020
98dagger
012
320
2111
dagger0
1159
Sou
th0
2451
0
1190
024
08
011
870
1841
011
410
2017
dagger0
1149
021
29dagger
011
520
2446
0
1189
023
99
011
87W
est
0
2670
0
1055
025
57
010
66
022
48
009
43
024
02
010
01
024
41
010
19
026
78
Knies, G., J. Burton, and E. Sala (2012), “Consenting to Health-Record Linkage: Evidence from a Multi-Purpose Longitudinal Survey of a General Population,” BMC Health Services Research, 12, 1–6.
Korn, E. L., and B. I. Graubard (1990), “Simultaneous Testing of Regression Coefficients with Complex Survey Data: Use of Bonferroni t Statistics,” The American Statistician, 44, 270–276.
Kreuter, F., J. W. Sakshaug, and R. Tourangeau (2016), “The Framing of the Record Linkage Consent Question,” International Journal of Public Opinion Research, 28, 142–152.
Korbmacher, J., and M. Schroeder (2013), “Consent When Linking Survey Data with Administrative Records: The Role of the Interviewer,” Survey Research Methods, 7, 115–131.
Little, R. (1986), “Survey Nonresponse Adjustments for Estimates of Means,” International Statistical Review, 54, 139–157.
Mattis, J. S., W. P. Hammond, N. Grayman, M. Bonacci, W. Brennan, S.-A. Cowie, L. Ladyzhenskaya, and S. So (2009), “The Social Production of Altruism: Motivations for Caring Action in a Low-Income Urban Community,” American Journal of Community Psychology, 43, 71–84.
McNabb, J., D. Timmons, J. Song, and C. Puckett (2009), “Uses of Administrative Data at the Social Security Administration,” Social Security Bulletin, 69, 75–84.
Mostafa, T. (2015), “Variation Within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson, J. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
O’Muircheartaigh, C., and P. Campanelli (1999), “A Multilevel Exploration of the Role of Interviewers in Survey Non-Response,” Journal of the Royal Statistical Society, Series A, 162, 437–446.
Pascale, J. (2011), “Requesting Consent to Link Survey Data to Administrative Records: Results from a Split-Ballot Experiment in the Survey of Health Insurance and Program Participation (SHIPP),” Center for Survey Measurement, Research and Methodology Directorate, US Census Bureau, Washington, DC, Survey Methodology Study Series 2011-03.
Putnam, R. (1995), “Bowling Alone: America’s Declining Social Capital,” Journal of Democracy, 6, 65–78.
Rosenbaum, P. R., and D. B. Rubin (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55.
Sabelhaus, J., D. Johnson, S. Ash, D. Swanson, T. Garner, J. Greenlees, and S. Henderson (2014), Is the Consumer Expenditure Survey Representative by Income? Improving the Measurement of Consumer Expenditures, National Bureau of Economic Research, Studies in Income and Wealth, Chicago: University of Chicago Press.
Sakshaug, J., M. Couper, and M. Ofstedal (2010), “Characteristics of Physical Measurement Consent in a Population-Based Survey of Older Adults,” Medical Care, 48, 64–71.
Sakshaug, J., M. Couper, M. Ofstedal, and D. Weir (2012), “Linking Survey and Administrative Data: Mechanisms of Consent,” Sociological Methods and Research, 41, 535–569.
Sakshaug, J. W., and M. Huber (2016), “An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany,” Journal of Survey Statistics and Methodology, 4, 71–93.
Sakshaug, J., and F. Kreuter (2012), “Assessing the Magnitude of Non-Consent Biases in Linked Survey and Administrative Data,” Survey Research Methods, 6, 113–22.
Sakshaug, J., and F. Kreuter (2014), “The Effect of Benefit Wording on Consent to Link Survey and Administrative Records in a Web Survey,” Public Opinion Quarterly, 78, 166–176.
Sakshaug, J., V. Tutz, and F. Kreuter (2013), “Placement, Wording, and Interviewers: Identifying Correlates of Consent to Link Survey and Administrative Data,” Survey Research Methods, 7, 133–144.
Sala, E., J. Burton, and G. Knies (2012), “Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics,” Sociological Methods and Research, 41, 414–439.
Sala, E., G. Knies, and J. Burton (2014), “Propensity to Consent to Data Linkage: Experimental Evidence on the Role of Three Survey Design Features in a UK Longitudinal Panel,” International Journal of Social Research Methodology, 17, 455–473.
Singer, E. (1993), “Informed Consent and Survey Response: A Summary of the Empirical Literature,” Journal of Official Statistics, 9, 361–375.
Singer, E. (2003), “Exploring the Meaning of Consent: Participation in Research and Beliefs About Risks and Benefits,” Journal of Official Statistics, 19, 273–285.
Singer, E., N. Bates, and J. Van Hoewyk (2011), “Concerns About Privacy, Trust in Government, and Willingness to Use Administrative Records to Improve the Decennial Census,” Proceedings of American Statistical Association, Section of the Survey Research Methods.
Tan, L. (2011), “An Introduction to the Contact History Instrument (CHI) for the Consumer Expenditure Survey,” Consumer Expenditure Survey Anthology 2011, 8–16.
US Department of Labor [Bureau of Labor Statistics] (2014), “Consumer Expenditures and Income,” in Handbook of Methods, Chapter 16, Washington, DC, USA. http://www.bls.gov/opub/hom/pdf/homch16.pdf, last accessed December 19, 2016.
West, B., F. Kreuter, and U. Jaenichen (2013), “‘Interviewer’ Effects in Face-to-Face Surveys: A Function of Sampling, Measurement Error, or Nonresponse?” Journal of Official Statistics, 29, 277–297.
Woolf, S., S. Rothemich, R. Johnson, and D. Marsland (2000), “Selection Bias from Requiring Patients to Give Consent to Examine Data for Health Services Research,” Archives of Family Medicine, 9, 1111–1118.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2010), Improving the Propensity Score Equal Frequency Adjustment Estimator Using an Alternative Weight, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), Vancouver, British Columbia, August 2010.
Yang, D., V. M. Lesser, A. I. Gitelman, and D. S. Birkes (2012), Propensity Score Adjustments for Covariates in Observational Studies, paper presented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM), San Diego, California, August 2012.
Appendix A Variable descriptions
A.1 Predictor Variables Examined

Variable: Variable description

Respondent sociodemographics
  Age: 1 if age 18 to 32; 2 if 32 to 65 (reference group); 3 if older than 65
  Gender: 1 if male (reference group); 2 if female
  Race: 0 if white (reference group); 1 if non-white
  Education: 0 if less than high school; 1 if high school graduate (reference group); 2 if some college; 3 if Associate’s degree; 4 if Bachelor’s degree; 5 if more than Bachelor’s
  Language: 1 if interview language was Spanish; 0 if English (reference group)
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8,180; 2 if it exceeds $8,180
  Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if “very cooperative”; 2 if “somewhat cooperative”; 3 if “neither cooperative nor uncooperative”; 4 if “somewhat uncooperative”; 5 if “very uncooperative”
  Effort: 1 if “a lot of effort”; 2 if “a moderate amount” (reference group); 3 if “a bare minimum”
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet “almost always”; 2 if “most of the time”; 3 if “occasionally”; 4 if “never or almost never”; 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if “other”
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Assigned category, followed by the respondent “doorstep” concerns it covers:

Privacy/government concerns
  Privacy concerns
  Anti-government concerns

Too busy/logistical concerns
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested
  Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Other reluctance
  Survey is voluntary
  Does not understand/Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other - specify

No concerns
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / \hat{p}_i$ for the sample units with consent, where $\hat{p}_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Columns:
(1) Weights from full sample
(2) Unadjusted weights for sample units with consent
(3) Propensity-adjusted weights for sample units with consent
(4) Propensity-decile adjusted weights for sample units with consent
(5) Propensity-quintile adjusted weights for sample units with consent

                       (1)      (2)      (3)      (4)      (5)
n                      4893     3951     3915     3915     3915
1 + CV^2_wt            1.13     1.13     1.16     1.16     1.15
IQR / q0.5             0.875    0.877    0.858    0.853    0.851
q0.75 / q0.25          1.357    1.359    1.448    1.463    1.468
q0.90 / q0.10          1.970    1.966    2.148    2.178    2.124
q0.95 / q0.05          2.699    2.665    3.028    2.961    2.898
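
The summaries in table 6 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names (`propensity_adjust`, `weight_summary`, `quantile`) are hypothetical, the quantile rule (linear interpolation, unweighted) is an assumption because the paper does not state which rule was used, and $1 + CV^2_{wt}$ is computed with the population variance so that it equals Kish's $n \sum w_i^2 / (\sum w_i)^2$.

```python
import statistics


def propensity_adjust(weights, propensities):
    """Return w_i / p_i for each consenting unit (p_i = estimated consent propensity)."""
    return [w / p for w, p in zip(weights, propensities)]


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1.

    Assumption: the paper does not say which quantile definition it uses.
    """
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def weight_summary(weights):
    """Heterogeneity summaries of a set of weights, mirroring the rows of table 6."""
    w = sorted(weights)
    mean = statistics.fmean(w)
    # Population variance, so 1 + CV^2 = n * sum(w^2) / (sum(w))^2.
    cv2 = statistics.pvariance(w) / mean ** 2
    q = {p: quantile(w, p) for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)}
    return {
        "n": len(w),
        "one_plus_cv2": 1 + cv2,
        "iqr_over_median": (q[0.75] - q[0.25]) / q[0.50],
        "q75_over_q25": q[0.75] / q[0.25],
        "q90_over_q10": q[0.90] / q[0.10],
        "q95_over_q05": q[0.95] / q[0.05],
    }


if __name__ == "__main__":
    # Made-up design weights and propensities, for illustration only.
    design_w = [100.0, 120.0, 90.0, 150.0, 110.0]
    p_hat = [0.80, 0.75, 0.90, 0.70, 0.85]
    print(weight_summary(design_w))
    print(weight_summary(propensity_adjust(design_w, p_hat)))
```

Comparing the two printed summaries shows directly how much the division by $\hat{p}_i$ spreads the weights; values of `one_plus_cv2` close to one suggest little variance inflation from the adjustment.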
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators

In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights, and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
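
The variance estimator above translates directly into code. The following is a minimal sketch, assuming the full-sample estimate and the $R$ replicate estimates have already been computed elsewhere (for example, by refitting the weighted logistic regression once per set of replicate weights); the function names are hypothetical and plain Python lists stand in for vectors and matrices.

```python
def fay_brr_covariance(full_est, replicate_ests, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    full_est       : coefficient vector from the full-sample weights
    replicate_ests : list of R coefficient vectors, one per replicate weight set
    fay_k          : Fay factor K (0.5 in the paper)

    Implements (1 / (R * (1 - K)^2)) * sum_r (b_r - b)(b_r - b)'.
    """
    n_reps = len(replicate_ests)
    dim = len(full_est)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * dim for _ in range(dim)]
    for rep in replicate_ests:
        dev = [b_r - b for b_r, b in zip(rep, full_est)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += scale * dev[i] * dev[j]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [cov[i][i] ** 0.5 for i in range(len(cov))]


if __name__ == "__main__":
    # Made-up numbers with R = 4 replicates, for illustration only.
    beta_hat = [1.0, 2.0]
    beta_reps = [[1.1, 2.0], [0.9, 2.0], [1.0, 2.2], [1.0, 1.8]]
    cov = fay_brr_covariance(beta_hat, beta_reps, fay_k=0.5)
    print(cov)
    print(standard_errors(cov))
```

With $K = 0.5$ the factor $1/(1-K)^2$ equals 4, which compensates for the fact that Fay replicates perturb the weights by only half as much as standard BRR half-samples.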
C.2 Goodness-of-Fit Test

In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2 based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$, as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that under the null hypothesis of no lack of fit, and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g-1, f-g+2)$ random variable, where $f = 43$ and $g = 10$.

Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
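
With $f = 43$ and $g = 10$, the reference distribution has $g - 1 = 9$ and $f - g + 2 = 35$ degrees of freedom. The sketch below shows how a p-value could be obtained from an observed value of the test statistic. It is an illustration only, not the svylogitgof implementation: the function names are hypothetical, and the upper tail of the F distribution is computed from the regularized incomplete beta function with a standard continued-fraction expansion so that the example needs only the Python standard library.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(stat, df1, df2):
    """P(F(df1, df2) > stat), the p-value for an observed F-type statistic."""
    if stat <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * stat))


if __name__ == "__main__":
    f, g = 43, 10                   # values given in the text
    df1, df2 = g - 1, f - g + 2     # 9 and 35
    for stat in (1.262, 0.573):     # example values of the test statistic
        print(stat, f_upper_tail(stat, df1, df2))
```

A large p-value gives no indication of lack of fit under this approximation; where SciPy is available, `scipy.stats.f.sf(stat, df1, df2)` computes the same upper-tail probability.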
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Entries are Estimate (SE), listed in the column order Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. Reference categories are in parentheses. Rows with fewer than seven entries are for predictors that do not enter every model; their entries keep the left-to-right column order.

Demographic characteristics
  Age group (32–65)
    18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
    65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
  Gender (Male)
    Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
  Race (White)
    Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
  Spanish interview (No)
    Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
  Education group (HS grad)
    Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
    Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
    Associate’s degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
    Bachelor’s degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
    Advance degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
  Homeowner (Renter)
    Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
  Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
  Income group
    Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
  Income imputed (No)
    Yes: 0.1487 (0.0513); 0.1481 (0.0510)
  Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
  Owner × education
    Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
    Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
    Associate’s degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
    Bachelor’s degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
    Advance degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)

Environmental features
  Region (Northeast)
    Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
    South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
    West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
  Urbanicity (Rural)
    Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies
  Converted refusal (No)
    Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
  Effort (Moderate)
    A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
    Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
  Doorstep concerns (None)
    Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
    Privacy/gov’t concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
    Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
  Doorstep concerns × effort
    Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
    Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
    Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
    Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
    Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
    Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
  Rapport (Phone)
    Personal Visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: † p < 0.10. The table also flags p < 0.05, p < 0.01, and p < 0.001, but the markers for those levels and any minus signs are not legible in this copy, so entries show magnitudes only.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and test-statistic: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1                                   0.292 (1.262)
2                                   0.810 (0.573)
A                                   0.336 (1.183)
B                                   0.929 (0.396)
C                                   0.061 (2.061)
D                                   0.022 (2.576)
E                                   0.968 (0.305)
Exploratory Assessment of Consent-to-Link 155
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Sala E G Knies and J Burton (2014) ldquoPropensity to Consent to Data Linkage ExperimentalEvidence on the Role of Three Survey Design Features in a UK Longitudinal PanelrdquoInternational Journal of Social Research Methodology 17 455ndash473
Singer E (1993) ldquoInformed Consent and Survey Response A Summary of the EmpiricalLiteraturerdquo Journal of Official Statistics 9 361ndash375
mdashmdashmdashmdash (2003) ldquoExploring the Meaning of Consent Participation in Research and Beliefs AboutRisks and Benefitsrdquo Journal of Official Statistics 19 273ndash285
Singer E N Bates and J Van Hoewyk (2011) ldquoConcerns About Privacy Trust in Governmentand Willingness to Use Administrative Records to Improve the Decennial Censusrdquo Proceedingsof American Statistical Association Section of the Survey Research Methods
Tan L (2011) ldquoAn Introduction to the Contact History Instrument (CHI) for the ConsumerExpenditure Surveyrdquo Consumer Expenditure Survey Anthology 2011 8ndash16
US Department of Labor [Bureau of Labor Statistics] (2014) ldquoConsumer Expenditures andIncomerdquo in Handbook of Methods Chapter 16 Washington DC USA httpwwwblsgovopubhompdfhomch16pdf last accessed December 19 2016
West B F Kreuter and U Jaenichen (2013) ldquoInterviewerrsquo Effects in Face-to-Face Surveys AFunction of Sampling Measurement Error or Nonresponserdquo Journal of Official Statistics 29277ndash297
Woolf S S Rothemich R Johnson and D Marsland (2000) ldquoSelection Bias from RequiringPatients to Give Consent to Examine Data for Health Services Researchrdquo Archives of FamilyMedicine 9 1111ndash1118
Yang D V M Lesser A I Gitelman and D S Birkes (2010) Improving the Propensity ScoreEqual Frequency Adjustment Estimator Using an Alternative Weight paper presented at theAmerican Statistical Association (ASA) Joint Statistical Meetings (JSM) Vancouver BritishColumbia August 2010
mdashmdashmdashmdash (2012) Propensity Score Adjustments for Covariates in Observational Studies paper pre-sented at the American Statistical Association (ASA) Joint Statistical Meetings (JSM) SanDiego California August 2012
Appendix A Variable descriptions
A1 Predictor Variables Examined
Respondent sociodemographics Variable description
Age 1 if age 18 to 32 2 if 32 to 65 (reference group) 3if older than 65
Gender 1 if male (reference group) 2 if femaleRace 0 if white (reference group) 1 if non-whiteEducation 0 if less than high school 1 if high school graduate
(reference group) 2 if some college 3 ifAssociates degree 4 if Bachelorrsquos degree and 5if more than Bachelorrsquos
Language 1 if interview language was Spanish 0 if English(reference group)
Continued
146 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Continued
Respondent sociodemographics Variable description
Household size 1 if single-person HH 2 if 2-person HH (referencegroup) 3 if 3- or 4-person HH 4 if more than 4-person HH
Household type 0 if renter (reference group) 1 if ownerFamily income 1 if total family income before taxes is less than or
equal to $8180 2 if it exceeds $8180Total spending (log) Log of total expenditures reported for all major
expenditure categories
Survey featuresMode 1 if telephone 0 if personal visit (reference group)
Area characteristicsRegion 1 if Northeast (reference group) 2 if Midwest 3 if
South 4 if WestUrban status 1 if urban 2 if rural (reference group)
Respondent attitude and effortConverted refusal status 1 if ever a converted refusal during previous inter-
views 2 if not (reference group)Interview duration 1 if total interview time was less than 52 minutes 2
if greater than 52 minutes (reference group)Cooperation 1 if ldquovery cooperativerdquo 2 if ldquosomewhat coopera-
tiverdquo 3 if ldquoneither cooperative nor unco-operativerdquo 4 if somewhat uncooperativerdquo 5 ifldquovery uncooperativerdquo
Effort 1 if ldquoa lot of effortrdquo 2 if ldquoa moderate amountrdquo(reference group) 3 if ldquoa bare minimumrdquo
Effort change 1 if effort increased during interview 2 if decreased3 if stayed the same (reference group) 4 if notsure
Use of information booklet 1 if respondent used booklet ldquoalmost alwaysrdquo 2 ifldquomost of the timerdquo 3 if occasionallyrdquo 4 if ldquoneveror almost neverrdquo and 5 if did not have access tobooklet (reference group)
Doorstep concerns 0 if no concerns (reference group) 1 if privacyanti-government concerns 2 if too busy or logisticalconcerns 3 if ldquootherrdquo
Exploratory Assessment of Consent-to-Link 147
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
A2 Interviewer-Captured Respondent Doorstep Concerns and TheirAssigned Concern Category
A3 Post-Survey Interviewer Questions
A31 Respondent Cooperation How cooperative was this respondent dur-ing this interview (very cooperative somewhat cooperative neither coopera-tive nor uncooperative somewhat cooperative or very uncooperative)
A32 Respondent Effort How much effort would you say this respondentput into answering the expenditure questions during this survey (a lot of ef-fort a moderate amount of effort or a bare minimum of effort)
Respondent ldquodoorsteprdquo concerns Assigned category
Privacy concerns Privacygovernment concernsAnti-government concerns
Too busy Too busylogistical concernsInterview takes too much timeBreaks appointments (puts off FR indefinitely)Not interestedDoes not want to be botheredToo many interviewsScheduling difficultiesFamily issuesLast interview took too longSurvey is voluntary Other reluctanceDoes not understandAsks questions about survey
Survey content does not applyHang-upslams door on FRHostile or threatens FROther HH members tell R not to participateTalk only to specific household memberRespondent requests same FR as last timeGave that information last timeAsked too many personal questions last timeIntends to quit surveyOthermdashspecify
No concerns No Concerns
148 Yang Fricker and Eltinge
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns it is useful to assess poten-tial variance inflation that may arise from the heterogeneity of the weights thatare used For the current consent-to-link case table 6 presents summary statis-tics for three sets of weights the unadjusted weights wi (standard complex de-sign weights) for all applicable units in the full sample the same weights forsample units with consent (ie the units that did not object to record linkage)and propensity-adjusted weights wpi frac14 wi=pi for the sample units with con-sent where pi is the estimated propensity that unit i will consent to recordlinkage based on model 2 in table 3
In keeping with standard analyses that link heterogeneity of weights with vari-ance inflation (eg Kish 1965 Section 117) the second row of table 6 reportsthe values 1thorn CV2
wt where CV2wt is the squared coefficient of variation of the
weights under consideration For all three cases 1thorn CV2wt is only moderately
larger than one The remaining rows of table 6 provide some related non-parametric summary indications of the heterogeneity of weights based onrespectively the ratio of the interquartile range of the weights divided bythe median weight the ratio of the third and first quartiles of the weightsthe ratio of the 90th and 10th percentiles of the weights and the ratio ofthe 95th and 5th percentiles of the weights In each case the ratios for thepropensity-adjusted weights are only moderately larger than the corre-sponding ratios of the unadjusted weights Thus the propensity adjust-ments do not lead to substantial inflation in the heterogeneity of weightsfor the CEQ analyses considered in this paper
Table 6 Variability of Weights
Weights fromfull sample
Unadjustedweights
for sampleunits
with consent
Propensity-adjusted
weights forsample
units withconsent
Propensity-decile
adjustedweights
for sampleunits withconsent
Propensity-quintile adjusted
weights forsample
units withconsent
n 4893 3951 3915 3915 39151thorn CV2
wt 113 113 116 116 115IQR=q05 0875 0877 0858 0853 0851q075=q025 1357 1359 1448 1463 1468q090=q010 1970 1966 2148 2178 2124q095=q005 2699 2665 3028 2961 2898
Exploratory Assessment of Consent-to-Link 149
Dow
nloaded from httpsacadem
icoupcomjssam
article-abstract711184689177 by guest on 19 February 2019
Appendix C Variance Estimators and Goodness-of-Fit Tests
C1 Notation and Variance Estimators
In this paper all inferential work for means proportions and logistic regres-sion coefficient vectors b are based on estimated variance-covariance matri-ces computed through balanced repeated replication that use 44 replicateweights and a Fay factor Kfrac14 05 Judkins (1990) provides general back-ground on balanced repeated replication with Fay factors Bureau of LaborStatistics (2014) provides additional background on balanced repeated repli-cation variance estimation for the CEQ To illustrate for a coefficient vectorb considered in table 3 let b be the survey-weighted estimator based on thefull-sample weights and let br be the corresponding estimator based on ther-th set of replicate weights Then the standard errors reported in table 3 arebased on the variance estimator
varethbTHORN frac14 1
Reth1 KTHORN2XR
rfrac141
ethbr bTHORNethbr bTHORN0
where Rfrac14 44 and Kfrac14 05 Similar comments apply to the standard errorsfor the mean and proportion estimates reported in tables 1 and 2 for thebias analyses reported in table 4 and for the standard errors used in theconstruction of the pointwise confidence intervals presented in figures 3and 4
C2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer Lemeshow andHosmer (2007) we also considered goodness-of-fit tests for the logistic re-gression models 1 and 2 based on the F-adjusted mean residual test statisticQFadj as presented by Archer and Lemeshow (2006) and Archer et al(2007) A direct extension of the reasoning considered in Archer andLemeshow (2006) and Archer et al (2007) indicates that under the nullhypothesis of no lack of fit and additional conditions QFadjis distributedapproximately as a central Fethg1 fgthorn2THORN random variable where f frac14 43 andg frac14 10
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Entries are Estimate (SE). Rows with seven entries follow the column order Model 1, Model 2, Model A, Model B, Model C, Model D, Model E. Rows with fewer entries list, in the same left-to-right order, only the models that report the term. Reference categories are shown in parentheses.

Demographic characteristics

Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate's degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor's degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advanced degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate's degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor's degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advanced degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)

Environmental features

Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies

Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov't concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: †p < 0.10. The further significance thresholds listed are p < 0.05, p < 0.01, and p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$, computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).

For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.

Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistics: Model of Consent/Object Comparison

Model (of consent), 2nd revision    F-adjusted goodness-of-fit test p-value (test statistic, F(9, 35))
1    0.292 (1.262)
2    0.810 (0.573)
A    0.336 (1.183)
B    0.929 (0.396)
C    0.061 (2.061)
D    0.022 (2.576)
E    0.968 (0.305)
Continued

Respondent sociodemographics: Variable description
  Household size: 1 if single-person HH; 2 if 2-person HH (reference group); 3 if 3- or 4-person HH; 4 if more than 4-person HH
  Household type: 0 if renter (reference group); 1 if owner
  Family income: 1 if total family income before taxes is less than or equal to $8180; 2 if it exceeds $8180
  Total spending (log): Log of total expenditures reported for all major expenditure categories

Survey features
  Mode: 1 if telephone; 0 if personal visit (reference group)

Area characteristics
  Region: 1 if Northeast (reference group); 2 if Midwest; 3 if South; 4 if West
  Urban status: 1 if urban; 2 if rural (reference group)

Respondent attitude and effort
  Converted refusal status: 1 if ever a converted refusal during previous interviews; 2 if not (reference group)
  Interview duration: 1 if total interview time was less than 52 minutes; 2 if greater than 52 minutes (reference group)
  Cooperation: 1 if "very cooperative"; 2 if "somewhat cooperative"; 3 if "neither cooperative nor uncooperative"; 4 if "somewhat uncooperative"; 5 if "very uncooperative"
  Effort: 1 if "a lot of effort"; 2 if "a moderate amount" (reference group); 3 if "a bare minimum"
  Effort change: 1 if effort increased during interview; 2 if decreased; 3 if stayed the same (reference group); 4 if not sure
  Use of information booklet: 1 if respondent used booklet "almost always"; 2 if "most of the time"; 3 if "occasionally"; 4 if "never or almost never"; and 5 if did not have access to booklet (reference group)
  Doorstep concerns: 0 if no concerns (reference group); 1 if privacy/anti-government concerns; 2 if too busy or logistical concerns; 3 if "other"
A.2 Interviewer-Captured Respondent Doorstep Concerns and Their Assigned Concern Category

Respondent "doorstep" concerns, grouped by assigned category:

Privacy/government concerns
  Privacy concerns
  Anti-government concerns

Too busy/logistical concerns
  Too busy
  Interview takes too much time
  Breaks appointments (puts off FR indefinitely)
  Not interested/Does not want to be bothered
  Too many interviews
  Scheduling difficulties
  Family issues
  Last interview took too long

Other reluctance
  Survey is voluntary
  Does not understand/Asks questions about survey
  Survey content does not apply
  Hang-up/slams door on FR
  Hostile or threatens FR
  Other HH members tell R not to participate
  Talk only to specific household member
  Respondent requests same FR as last time
  Gave that information last time
  Asked too many personal questions last time
  Intends to quit survey
  Other (specify)

No concerns
  No concerns

A.3 Post-Survey Interviewer Questions

A.3.1 Respondent Cooperation. How cooperative was this respondent during this interview? (very cooperative, somewhat cooperative, neither cooperative nor uncooperative, somewhat uncooperative, or very uncooperative)

A.3.2 Respondent Effort. How much effort would you say this respondent put into answering the expenditure questions during this survey? (a lot of effort, a moderate amount of effort, or a bare minimum of effort)
Appendix B. Variability of Weights

In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.

In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.

Table 6. Variability of Weights

Columns, left to right:
  (1) Weights from full sample
  (2) Unadjusted weights for sample units with consent
  (3) Propensity-adjusted weights for sample units with consent
  (4) Propensity-decile adjusted weights for sample units with consent
  (5) Propensity-quintile adjusted weights for sample units with consent

Statistic            (1)      (2)      (3)      (4)      (5)
n                    4893     3951     3915     3915     3915
1 + CV^2_wt          1.13     1.13     1.16     1.16     1.15
IQR/q_0.5            0.875    0.877    0.858    0.853    0.851
q_0.75/q_0.25        1.357    1.359    1.448    1.463    1.468
q_0.90/q_0.10        1.970    1.966    2.148    2.178    2.124
q_0.95/q_0.05        2.699    2.665    3.028    2.961    2.898
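The summary measures in table 6 can be sketched as follows. This is an illustrative standard-library Python example, not the authors' code. The function names, the unweighted linear-interpolation quantile rule, the population-variance form of the coefficient of variation, and the toy inputs are all assumptions made for the example, so it will not necessarily reproduce the table's values.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def propensity_adjusted(weights, propensities):
    """Propensity-adjusted weights w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]


def weight_variability(weights):
    """Heterogeneity summaries of a set of weights, in the spirit of table 6."""
    n = len(weights)
    total = sum(weights)
    # 1 + CV^2 with the population variance equals n * sum(w^2) / (sum w)^2.
    one_plus_cv2 = n * sum(w * w for w in weights) / total ** 2
    s = sorted(weights)
    q05, q10, q25, q50, q75, q90, q95 = (
        quantile(s, p) for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)
    )
    return {
        "n": n,
        "one_plus_cv2": one_plus_cv2,
        "iqr_over_median": (q75 - q25) / q50,
        "q75_over_q25": q75 / q25,
        "q90_over_q10": q90 / q10,
        "q95_over_q05": q95 / q05,
    }


# Toy example (hypothetical weights and consent propensities).
base_weights = [1.0, 2.0, 3.0, 4.0, 5.0]
propensities = [0.9, 0.8, 0.85, 0.7, 0.75]
print(weight_variability(base_weights))
print(weight_variability(propensity_adjusted(base_weights, propensities)))
```

Comparing the two printed summaries mirrors the comparison in the text: if the adjusted weights give ratios only slightly larger than the unadjusted ones, the propensity adjustment adds little heterogeneity.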
Appendix B Variability of Weights
In weighted analyses of incomplete-data patterns, it is useful to assess potential variance inflation that may arise from the heterogeneity of the weights that are used. For the current consent-to-link case, table 6 presents summary statistics for three sets of weights: the unadjusted weights $w_i$ (standard complex design weights) for all applicable units in the full sample; the same weights for sample units with consent (i.e., the units that did not object to record linkage); and propensity-adjusted weights $w_{pi} = w_i / p_i$ for the sample units with consent, where $p_i$ is the estimated propensity that unit $i$ will consent to record linkage, based on model 2 in table 3.
In keeping with standard analyses that link heterogeneity of weights with variance inflation (e.g., Kish 1965, Section 11.7), the second row of table 6 reports the values $1 + CV^2_{wt}$, where $CV^2_{wt}$ is the squared coefficient of variation of the weights under consideration. For all three cases, $1 + CV^2_{wt}$ is only moderately larger than one. The remaining rows of table 6 provide some related nonparametric summary indications of the heterogeneity of weights, based on, respectively, the ratio of the interquartile range of the weights divided by the median weight; the ratio of the third and first quartiles of the weights; the ratio of the 90th and 10th percentiles of the weights; and the ratio of the 95th and 5th percentiles of the weights. In each case, the ratios for the propensity-adjusted weights are only moderately larger than the corresponding ratios of the unadjusted weights. Thus, the propensity adjustments do not lead to substantial inflation in the heterogeneity of weights for the CEQ analyses considered in this paper.
Table 6. Variability of Weights

Columns:
(1) Weights from full sample
(2) Unadjusted weights for sample units with consent
(3) Propensity-adjusted weights for sample units with consent
(4) Propensity-decile adjusted weights for sample units with consent
(5) Propensity-quintile adjusted weights for sample units with consent

Statistic                 (1)      (2)      (3)      (4)      (5)
n                         4893     3951     3915     3915     3915
$1 + CV^2_{wt}$           1.13     1.13     1.16     1.16     1.15
IQR / $q_{0.5}$           0.875    0.877    0.858    0.853    0.851
$q_{0.75} / q_{0.25}$     1.357    1.359    1.448    1.463    1.468
$q_{0.90} / q_{0.10}$     1.970    1.966    2.148    2.178    2.124
$q_{0.95} / q_{0.05}$     2.699    2.665    3.028    2.961    2.898
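The summary measures in table 6 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the linear-interpolation quantile rule, and the example weights and propensities are all assumptions introduced here, and the quantile definition used for the published table may differ.

```python
from statistics import fmean


def quantile(sorted_values, q):
    """Quantile by linear interpolation on an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def weight_variability(weights):
    """Return the table 6 style summaries for one set of weights."""
    w = sorted(float(x) for x in weights)
    # With the variance computed using divisor n, 1 + CV^2 = mean(w^2) / mean(w)^2.
    one_plus_cv2 = fmean(x * x for x in w) / fmean(w) ** 2
    q = {p: quantile(w, p) for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)}
    return {
        "n": len(w),
        "1+CV2": one_plus_cv2,
        "IQR/q0.5": (q[0.75] - q[0.25]) / q[0.50],
        "q0.75/q0.25": q[0.75] / q[0.25],
        "q0.90/q0.10": q[0.90] / q[0.10],
        "q0.95/q0.05": q[0.95] / q[0.05],
    }


def propensity_adjusted(weights, propensities):
    """Propensity-adjusted weights w_pi = w_i / p_i for consenting units."""
    return [w / p for w, p in zip(weights, propensities)]


if __name__ == "__main__":
    # Hypothetical design weights and estimated consent propensities.
    design_w = [1200.0, 950.0, 1800.0, 1100.0, 1500.0]
    consent_p = [0.90, 0.70, 0.80, 0.85, 0.75]
    print(weight_variability(design_w))
    print(weight_variability(propensity_adjusted(design_w, consent_p)))
```

Comparing the two printed summaries mirrors the comparison made in the text: if the adjusted ratios are only slightly larger than the unadjusted ones, the propensity adjustment adds little weight heterogeneity.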
Appendix C Variance Estimators and Goodness-of-Fit Tests
C.1 Notation and Variance Estimators
In this paper, all inferential work for means, proportions, and logistic regression coefficient vectors $\beta$ is based on estimated variance-covariance matrices computed through balanced repeated replication that uses 44 replicate weights and a Fay factor $K = 0.5$. Judkins (1990) provides general background on balanced repeated replication with Fay factors. Bureau of Labor Statistics (2014) provides additional background on balanced repeated replication variance estimation for the CEQ. To illustrate, for a coefficient vector $\beta$ considered in table 3, let $\hat{\beta}$ be the survey-weighted estimator based on the full-sample weights and let $\hat{\beta}_r$ be the corresponding estimator based on the $r$-th set of replicate weights. Then the standard errors reported in table 3 are based on the variance estimator

$$\widehat{\mathrm{var}}(\hat{\beta}) = \frac{1}{R(1 - K)^2} \sum_{r=1}^{R} (\hat{\beta}_r - \hat{\beta})(\hat{\beta}_r - \hat{\beta})'$$

where $R = 44$ and $K = 0.5$. Similar comments apply to the standard errors for the mean and proportion estimates reported in tables 1 and 2, for the bias analyses reported in table 4, and for the standard errors used in the construction of the pointwise confidence intervals presented in figures 3 and 4.
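The Fay-adjusted balanced repeated replication estimator above can be sketched as follows. This is an illustration under stated assumptions rather than the production code used for the CEQ: the function name `fay_brr_variance` and the small set of replicate estimates are hypothetical, and in practice each replicate estimate would come from refitting the model with one of the 44 replicate weight sets.

```python
import math


def fay_brr_variance(theta_full, theta_reps, fay_k=0.5):
    """Fay-adjusted BRR variance-covariance matrix.

    theta_full: list of p full-sample estimates.
    theta_reps: list of R lists, one vector of p estimates per replicate.
    fay_k:      Fay factor K (0.5 in the paper).
    """
    n_reps = len(theta_reps)
    p = len(theta_full)
    scale = 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    cov = [[0.0] * p for _ in range(p)]
    for rep in theta_reps:
        # Deviation of this replicate estimate from the full-sample estimate.
        dev = [rep[j] - theta_full[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                cov[j][k] += scale * dev[j] * dev[k]
    return cov


def standard_errors(cov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    return [math.sqrt(cov[j][j]) for j in range(len(cov))]


if __name__ == "__main__":
    # Hypothetical two-coefficient example with four replicates.
    beta_hat = [1.0, 2.0]
    beta_reps = [[1.1, 2.0], [0.9, 2.2], [1.0, 1.8], [1.0, 2.0]]
    cov = fay_brr_variance(beta_hat, beta_reps, fay_k=0.5)
    print(cov, standard_errors(cov))
```

With $K = 0.5$ the scale factor is $1 / (0.25 R)$, so each squared deviation counts four times as much as it would under unperturbed BRR. This offsets the smaller perturbation that Fay's method applies to the replicate weights.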
C.2 Goodness-of-Fit Test
In keeping with Archer and Lemeshow (2006) and Archer, Lemeshow, and Hosmer (2007), we also considered goodness-of-fit tests for the logistic regression models 1 and 2, based on the F-adjusted mean residual test statistic $Q_{F\text{-adj}}$ as presented by Archer and Lemeshow (2006) and Archer et al. (2007). A direct extension of the reasoning considered in Archer and Lemeshow (2006) and Archer et al. (2007) indicates that, under the null hypothesis of no lack of fit and additional conditions, $Q_{F\text{-adj}}$ is distributed approximately as a central $F(g - 1, f - g + 2)$ random variable, where $f = 43$ and $g = 10$.
Table 7. F-adjusted Mean Residual Goodness-of-Fit Test

Model    F-adjusted goodness-of-fit test p-value
1        0.292
2        0.810
Table 8. Multivariate Logistic Models Predicting Consent-to-Link (Weighted)

Each entry is Estimate (SE). Entries in a row follow the column order Model 1; Model 2; Model A; Model B; Model C; Model D; Model E. A row with fewer than seven entries covers only the models that include that variable.

Demographic characteristics
Age group (32–65)
  18–32: 0.3435 (0.0633); 0.3389 (0.0634); 0.3044 (0.0717); 0.3319 (0.0742); 0.3282 (0.0728); 0.3425 (0.0640); 0.3368 (0.0639)
  65+: 0.2692 (0.0645); 0.2532 (0.0656); 0.1971 (0.0613); 0.2387 (0.0608); 0.2374 (0.0630); 0.2694 (0.0644); 0.2538 (0.0653)
Gender (Male)
  Female: 0.0161 (0.0426); 0.0229 (0.0421); 0.0396 (0.0359); 0.0388 (0.0363); 0.0020 (0.0427); 0.0152 (0.0425); 0.0207 (0.0419)
Race (White)
  Non-white: 0.0949 (0.1264); 0.0612 (0.1260); 0.0781 (0.1059); 0.0192 (0.1056); 0.0322 (0.1155); 0.0938 (0.1269); 0.0593 (0.1264)
Spanish interview (No)
  Yes: 0.4234 (0.2923); 0.4452 (0.2790); 0.4044 (0.2930); 0.4271 (0.3120); 0.4801 (0.3118); 0.4287 (0.2973); 0.4576 (0.2846)
Education group (HS grad)
  Less than HS: 0.3097 (0.1879); 0.2926 (0.1835); 0.0142 (0.1349); 0.0186 (0.1383); 0.3317† (0.1838); 0.3081 (0.1885); 0.2894 (0.1844)
  Some college: 0.4016 (0.1423); 0.3739 (0.1389); 0.0986 (0.1087); 0.0815 (0.1129); 0.4066 (0.1389); 0.3996 (0.1442); 0.3695 (0.1409)
  Associate's degree: 0.3321 (0.2041); 0.3150 (0.1993); 0.1109 (0.1100); 0.1249 (0.1145); 0.2858 (0.2160); 0.3326 (0.2036); 0.3160 (0.1990)
  Bachelor's degree: 0.0654 (0.2112); 0.0663 (0.2082); 0.0383 (0.0719); 0.0425 (0.0742); 0.0230 (0.2037); 0.0618 (0.2127); 0.0581 (0.2095)
  Advanced degree: 0.1747 (0.2762); 0.1512 (0.2773); 0.1899 (0.1215); 0.1941 (0.1235); 0.2738 (0.2482); 0.1720 (0.2773); 0.1452 (0.2796)
Homeowner (Renter)
  Owner: 0.2767† (0.1453); 0.3233 (0.1436); 0.3612 (0.1277); 0.3589 (0.1314); 0.2703† (0.1397); 0.2754† (0.1448); 0.3198 (0.1433)
Total expenditures (Log): 0.0605 (0.0799); 0.0083 (0.0800); 0.1025 (0.0698); 0.0372 (0.0754); 0.0419 (0.0746); 0.0594 (0.0802); 0.0065 (0.0801)
Income group
  Less than $8,181: 0.2155 (0.0604); 0.4182 (0.0561); 0.4247 (0.0577); 0.2142 (0.0609)
Income imputed (No)
  Yes: 0.1487 (0.0513); 0.1481 (0.0510)
Race × gender: 0.1874† (0.0956); 0.1953 (0.0923); 0.2268 (0.0930); 0.1873† (0.0957); 0.1946 (0.0923)
Owner × education
  Less than HS: 0.4931 (0.2066); 0.4632 (0.1996); 0.4750 (0.2045); 0.4929 (0.2067); 0.4637 (0.2002)
  Some college: 0.6575 (0.1567); 0.6370 (0.1613); 0.6764 (0.1438); 0.6548 (0.1578); 0.6309 (0.1633)
  Associate's degree: 0.3407† (0.2013); 0.3402† (0.1986); 0.2616 (0.2090); 0.3401† (0.2007); 0.3387† (0.1978)
  Bachelor's degree: 0.1385 (0.2402); 0.1398 (0.2368); 0.1057 (0.2296); 0.1358 (0.2409); 0.1335 (0.2376)
  Advanced degree: 0.4713 (0.2847); 0.4226 (0.2870); 0.5867 (0.2583); 0.4691 (0.2854); 0.4183 (0.2890)

Environmental features
Region (Northeast)
  Midwest: 0.2097† (0.1234); 0.2111† (0.1163); 0.2602 (0.1100); 0.2457 (0.1196); 0.2352† (0.1208); 0.2098† (0.1232); 0.2111† (0.1159)
  South: 0.2451 (0.1190); 0.2408 (0.1187); 0.1841 (0.1141); 0.2017† (0.1149); 0.2129† (0.1152); 0.2446 (0.1189); 0.2399 (0.1187)
  West: 0.2670 (0.1055); 0.2557 (0.1066); 0.2248 (0.0943); 0.2402 (0.1001); 0.2441 (0.1019); 0.2678 (0.1061); 0.2574 (0.1071)
Urbanicity (Rural)
  Urban: 0.0268 (0.0829); 0.0234 (0.0824); 0.0253 (0.0818); 0.0226 (0.0833); 0.0249 (0.0821); 0.0268 (0.0829); 0.0233 (0.0823)

R attitude proxies
Converted refusal (No)
  Yes: 0.0721 (0.0756); 0.0838 (0.0763); 0.0714 (0.0753); 0.0818 (0.0760)
Effort (Moderate)
  A lot of effort: 0.5454 (0.2081); 0.6142 (0.2065); 0.5418 (0.2101); 0.6051 (0.2082)
  Bare minimum effort: 0.4879 (0.1365); 0.5721 (0.1371); 0.4842 (0.1387); 0.5626 (0.1389)
Doorstep concerns (None)
  Too busy: 0.2207 (0.1485); 0.2137 (0.1466); 0.2192 (0.1491); 0.2106 (0.1477)
  Privacy/gov't concerns: 1.1895 (0.1667); 1.2241 (0.1622); 1.1891 (0.1666); 1.2231 (0.1621)
  Other: 1.0547 (0.3261); 1.0255 (0.3255); 1.0549 (0.3259); 1.0267 (0.3252)
Doorstep concerns × effort
  Privacy × a lot of effort: 0.5884 (0.2789); 0.5312† (0.2728); 0.5885 (0.2789); 0.5319† (0.2725)
  Privacy × minimum effort: 0.3183 (0.1918); 0.2601 (0.1924); 0.3172 (0.1915); 0.2581 (0.1925)
  Busy × a lot of effort: 0.0888 (0.2736); 0.0890 (0.2749); 0.0841 (0.2758); 0.0785 (0.2765)
  Busy × minimum effort: 0.2238 (0.2096); 0.2277 (0.2095); 0.2219 (0.2114); 0.2234 (0.2112)
  Other × a lot of effort: 1.0794† (0.6224); 1.0477 (0.6182); 1.0746† (0.6244); 1.0370 (0.6194)
  Other × minimum effort: 0.9002 (0.3610); 0.8777 (0.3593); 0.8964 (0.3617); 0.8693 (0.3597)
Rapport (Phone)
  Personal Visit: 0.0170 (0.0428); 0.0395 (0.0412)

NOTE: †p < 0.10; p < 0.05; p < 0.01; p < 0.001.
Based on the approach outlined above, table 7 reports the p-values associated with $Q_{F\text{-adj}}$ computed for each of models 1 and 2 through use of the svylogitgof.ado module for the Stata package (http://www.people.vcu.edu/~kjarcher/Research/documents/svylogitgof.ado).
For both models, the p-values do not provide any indication of lack of fit. See also Korn and Graubard (1990) for further discussion of distributional approximations for variance-covariance matrices estimated from complex survey data, and for related quadratic-form test statistics. Additional study of the properties of these test statistics would be of interest, but is beyond the scope of the current paper.
Table 9. F-adjusted Mean Residual Goodness-of-Fit Test p-values and Test Statistic: Model of Consent/Object Comparison

Model (of consent),    F-adjusted goodness-of-fit test p-value
2nd revision           (test statistic, F(9, 35))
1                      0.292 (1.262)
2                      0.810 (0.573)
A                      0.336 (1.183)
B                      0.929 (0.396)
C                      0.061 (2.061)
D                      0.022 (2.576)
E                      0.968 (0.305)
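The link between the test statistics and p-values in table 9 can be sketched with a small, dependency-free computation of the upper tail of an $F(g - 1, f - g + 2)$ distribution, which gives $F(9, 35)$ for $f = 43$ and $g = 10$. This is a sketch only and does not reproduce the svylogitgof module: the helper names are hypothetical, and the tail probability is obtained from the regularized incomplete beta function evaluated by a standard continued-fraction expansion.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(stat, df1, df2):
    """P(F > stat) for a central F(df1, df2) random variable."""
    if stat <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * stat))


def gof_degrees_of_freedom(f, g):
    """Degrees of freedom (g - 1, f - g + 2) of the F-adjusted mean residual test."""
    return g - 1, f - g + 2


if __name__ == "__main__":
    df1, df2 = gof_degrees_of_freedom(f=43, g=10)
    # Test statistics as reported in table 9.
    for model, stat in [("1", 1.262), ("2", 0.573), ("C", 2.061), ("D", 2.576)]:
        print(model, round(f_upper_tail(stat, df1, df2), 3))
```

Running the loop prints an approximate p-value for each listed model, which can be compared with the corresponding entry in table 9.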