Running head: ATTRIBUTING NONCOMPLIANCE WITH THE SSC

Copy of authors’ final peer-reviewed manuscript as accepted for publication (1 October 2013).

Nepusz, T., Petróczi, A., Naughton, D., Epton, T., & Norman, P. (2014). Estimating the prevalence of socially sensitive behaviours: Attributing guilty and innocent noncompliance with the single sample count method. Psychological Methods, 19, 334-355. doi: 10.1037/a0034961

This article may not exactly replicate the final version published in the APA journal. It is not the copy of record.

Estimating the Prevalence of Socially Sensitive Behaviours: Attributing Guilty and Innocent Noncompliance with the Single Sample Count Method

Tamás Nepusz

Kingston University

Sixdegrees Ltd

Andrea Petróczi

Kingston University

University of Sheffield

Declan P. Naughton

Kingston University

Tracy Epton

University of Sheffield

Paul Norman

University of Sheffield

Author Note

Tamás Nepusz, Six Degrees Ltd. and Faculty of Science, Engineering and Computing, Kingston University; Andrea Petróczi, Faculty of Science, Engineering and Computing, Kingston University and Department of Psychology, University of Sheffield; Declan P. Naughton, Faculty of Science, Engineering and Computing, Kingston University; Tracy Epton, Department of Psychology, University of Sheffield; Paul Norman, Department of Psychology, University of Sheffield.

The study which included the recreational drug question was funded by the UK National Prevention Research Initiative (NPRI) Phase 4 (grant number: MR/J0004501/1). The NPRI includes the following Funding Partners (in alphabetical order): Alzheimer's Research Trust, Alzheimer's Society, Biotechnology and Biological Sciences Research Council, British Heart Foundation, Cancer Research UK, Chief Scientist Office, Scottish Government Health Directorate, Department of Health, Diabetes UK, Economic and Social Research Council, Health and Social Care Research and Development Division of the Public Health Agency (HSC & R&D Division), Medical Research Council, The Stroke Association, Wellcome Trust, Welsh Assembly Government and World Cancer Research Fund.

Correspondence concerning this article should be addressed to Andrea Petróczi, Faculty of Science, Engineering and Computing, Kingston University, Penrhyn Road, Kingston upon Thames, Surrey, KT1 2EE. [email protected]

Abstract

Prevalence estimation models, using randomised or fuzzy responses, provide respondents with protection against exposure beyond anonymity, and represent a useful research tool in socially sensitive situations. However, both guilty and innocent noncompliance can have a profound impact on prevalence estimations derived from these models. In this paper we introduce the maximum-likelihood extension of the Single Sample Count (SSC-MLE) estimation model to detect and attribute noncompliance by testing five competing hypotheses on possible ways of noncompliance. We demonstrate the ability of the SSC-MLE to estimate and attribute noncompliance with a single sample using the observed distribution of affirmative answers on recent recreational drug use from a sample of university students (N = 1,441). Based on the survey answers, the drug use prevalence was estimated at 17.62% (±6.75%), which is in line with relevant drug use statistics. Only 2.51% (±1.54%) were noncompliant, of which 0.55% (±0.44%) was attributed to guilty noncompliance (i.e., having used drugs but not admitting it) and 2.17% (±1.44%) to innocent noncompliance with no drug use in the past three months to hide. The SSC-MLE indirect estimation method represents an important tool for estimating the prevalence of a broad range of socially sensitive behaviours. Subsequent applications of the SSC-MLE to a range of transgressive behaviours with varying sensitivity will contribute to establishing the SSC-MLE's performance properties, along with obtaining empirical evidence to test the underlying assumption of independence of noncompliance from involvement. Freely downloadable, user-friendly software to facilitate applications of the SSC-MLE model is provided.

Keywords: illicit drug, prevalence estimation, birthday distribution, cheating detection, randomized response

Estimating the Prevalence of Socially Sensitive Behaviours: Attributing Guilty and Innocent Noncompliance with the Single Sample Count Method

Owing to their ease of use and cost effectiveness, many prevalence studies rely on self-reports. However, obtaining truthful information on socially sensitive, potentially embarrassing and/or transgressive behaviours via direct self-reports is a challenging task because the reliability of any sensitive information gleaned from direct questioning is limited by the degree of respondents’ willingness to answer honestly. Generally it has been acknowledged that people’s willingness to disclose sensitive information is a function of the intrusiveness, associated risks and social desirability of the sensitive question (Tourangeau & Yan, 2007). To counterbalance the resistance to reveal sensitive information, even under anonymous conditions, indirect approaches to prevalence estimation of a sensitive behaviour have been widely utilised (Lensvelt-Mulders, Hox & van der Heijden, 2005a).

Indirect estimation models abate the intrusiveness of direct questioning, creating a transparently safe survey situation at the time of data collection by making it impossible to relate answers to individuals. This safe survey situation provides protection for respondents and researchers alike by removing any chance of exposure for the respondent and any potential legal obligation for the researcher to take action if an illegal activity becomes known. The use of estimation models has been shown to help to alleviate, to a degree, distortion from socially desirable responding (Lensvelt-Mulders et al., 2005a; Peeters, Lensvelt-Mulders & Lasthuizen, 2010). It is expected that indirect estimation models will continue to attract interest in sociology (Lee & Lee, 2012), with an increasing presence in epidemiology and social psychology.

Although indirect estimation models have been widely used in epidemiological studies investigating social and health-related questions (Lensvelt-Mulders et al., 2005a), most developmental work has focussed on mathematical and methodological aspects. Over the past three decades, efforts have been made to improve the efficiency of the method (Lensvelt-Mulders, Hox, van der Heijden & Maas, 2005b; Peeters et al., 2010) and to provide practical guidelines for model selection (Nayak & Adeshiyan, 2009; Ulrich, Schröter, Striegel & Simon, 2012). With the development of a modified logistic regression model to accommodate variables from randomized responses (van den Hout, van der Heijden & Gilchrist, 2007) or item counts (Imai, 2011), along with the extension of the randomized response procedure to produce multiple-item scales to measure psychological constructs (Himmelfarb, 2008; Moshagen & Musch, 2011), the potential application of indirect methods to psychological research into sensitive constructs and concealed behavioural choices has increased significantly. However, whilst the literature provides ample evidence that indirect estimation models help to reduce evasiveness compared to direct self-reports (Lensvelt-Mulders et al., 2005a), a significant amount of evasive responding still remains unexplained.

The aim of this study is to help to close this gap by addressing noncompliance with a 'noncompliance detection' extension of the previously validated Single Sample Count (SSC) estimation model (Petroczi et al., 2011), in order to develop a better understanding of the magnitude and nature of noncompliance in anonymous random response surveys. In real life, respondents do not always follow survey instructions, for various reasons. Here, we collectively refer to this phenomenon as noncompliance, which is not equivalent to deliberate cheating but rather can be a manifestation of a lack of motivation to participate properly, a lack of understanding of the instructions and/or self-protective lying. Response bias distortion arising from noncompliance is likely to be a combination of guilty and innocent noncompliance. Therefore, for the purpose of this study, noncompliance is defined as 'not following the survey instructions', independently of the motivation for doing so. Within the term 'noncompliance', guilty noncompliance refers to those who possess the sensitive attribute and are noncompliant (e.g., due to lying or lack of understanding), whereas innocent noncompliance refers to those who do not have the sensitive attribute yet do not answer as instructed (e.g., due to lying or lack of understanding).

Indirect Estimation Models

Randomised response models. Following Warner's (1965) original development of the technique, whereby respondents can answer sensitive questions without risking any chance of exposure, a plethora of randomised response models have been developed (Lensvelt-Mulders et al., 2005a). The common feature of these models is that they instruct respondents how they should respond, based on the outcome of some randomisation device such as cards, dice or a spinner, or according to an unrelated question with known probability, such as someone's birthday or the last digit of a phone number. Although numerous variations of indirect estimation models have been developed in the decades since, the Forced Response (Boruch, 1971) model is one of the most frequently used methods. This model is described below, but interested readers may refer to the literature (e.g., Lensvelt-Mulders et al., 2005a; 2005b) for information on other randomised response models.

In the Forced Response (FR) model (Boruch, 1971), respondents use an independent device to determine whether they should answer honestly or simply say 'yes' or 'no', irrespective of the truth, to the sensitive question (e.g., "Have you used drugs in the last 3 months?"). For example, in a two-dice version, respondents are instructed to answer honestly if the summed score from the two dice is between 5 and 10 (inclusive), but to simply say 'no' for sums of 2, 3 and 4 or 'yes' for sums of 11 or 12. Naturally the method works with one die, other randomisation devices or altered probabilities. The key condition is that the outcome of the randomisation is known only to the respondents; thus a simple 'yes' or 'no' answer at the individual level is meaningless to the researcher, which protects both the respondent and the researcher. Because the probability of obtaining each sum from two six-sided dice is known, we can take the forced 'no' (16.7%) and 'yes' (8.3%) responses into account and thus calculate the proportion of true 'yes' answers to the sensitive question. The prevalence rate for the sensitive question is then calculated as:

$$\hat{d} = \frac{\hat{\lambda} - \pi_1}{\pi_2}, \qquad \hat{\lambda} = \frac{\text{number of observed 'yes' answers}}{n}$$

where n is the sample size, π1 is the probability that the respondent is forced to say 'yes', π2 is the probability that the respondent is asked to answer the sensitive question honestly and λ̂ is the observed proportion of 'yes' answers in the sample.
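For illustration, a minimal sketch of this calculation (the function name and example figures are hypothetical; the default probabilities follow the two-dice example above):

```python
def forced_response_estimate(n_yes, n, p_forced_yes=3/36, p_honest=27/36):
    """Forced Response point estimate: correct the observed 'yes' rate
    for forced 'yes' answers, then rescale by the honest-answer probability."""
    lam = n_yes / n  # observed proportion of 'yes' answers
    return (lam - p_forced_yes) / p_honest

# e.g., 240 'yes' answers out of 1,000 respondents gives ~0.209
print(forced_response_estimate(240, 1000))
```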

Non-randomised models. The common characteristics of randomised response models, namely that respondents are required to answer the sensitive question directly and to give a response that sounds incriminating (e.g., a forced 'yes'), have attracted criticism and have been linked to increased noncompliance (Ostapczuk, Moshagen, Zhao & Musch, 2009). In contrast, in non-randomised response models respondents are not required to answer the sensitive question directly; rather, they are instructed to answer a set of questions with a single Yes/No or with a summative response, giving only the total number of affirmative answers without revealing their specific answers to each question individually. These models rely on deliberately fuzzy responses to provide protection: the researcher is unable to deduce the specific answer to the sensitive question from the single answer, and hence the respondent is fully protected against exposure. For reviews of the non-randomised models, readers should consult James, Naughton and Petroczi (2012), Lensvelt-Mulders et al. (2005a, 2005b), Petroczi et al. (2011) and Tan, Tian and Tang (2009).

Item count models. The characteristic feature of the family of estimation models that use variations of item counts, as introduced by Raghavarao and Federer (1979), is that respondents answer the survey by indicating the total number of affirmative answers without giving specifics on which items solicited the 'yes' answers (Chaudhuri & Christofides, 2007; Dalton et al., 1994; Droitcour, Caspar, Hubbard, Parsley, Visscher & Ezzati, 1991; Hussain et al., 2012). In the item count models, the estimation of the proportion of 'yes' answers to the sensitive target question is derived experimentally by simultaneously administering two questionnaire versions to random halves of the same target population. These questionnaires may contain any number of innocuous questions (e.g., being abroad on holiday, having a driving licence, liking a certain type of music or sport, etc.) but only one version contains the sensitive target question (e.g., "I have used drugs in the last 3 months"). Assuming that the two random groups do not differ in the probability of affirmative answers to the innocuous questions, the difference in the average number of affirmative answers then indicates the proportion of 'yes' answers to the sensitive target question.

An important negative consequence of this setup is the need for two independent samples to be drawn from the target population, with the estimation of the probability of an affirmative answer to the sensitive target question based on only half of the sample (the half that received the 4+1 questions). The other half (those who received the survey without the sensitive question) serves no other purpose than to establish a probability for the innocuous questions for the given sample. For example, in the Unmatched Count set-up (Dalton et al., 1994), the items of the control questionnaire are: "I have moved town in the past", "I have a pet", "I like to go dancing" and "I have never been abroad"; whereas the experimental questionnaire contains these four statements and the sensitive target question of "I have used drugs in the past 3 months". The general advantage of the item count technique is that respondents are not forced to give false (and potentially compromising) answers or required to answer the sensitive question directly. The major disadvantage is the need for dual sampling to establish prevalence rates for the innocuous questions. In contrast, the SSC (Petroczi et al., 2011) offers a facile solution to this problem by using innocuous questions with an already known probability of 50/50 (e.g., birthdays, last digits of house or phone numbers). As such, it is a streamlined variation of the Unmatched Count model (Dalton et al., 1994) that affords the same level of protection without the need for dual sampling or for additional information on an innocuous unrelated question (Hussain et al., 2012).

The baseline in the SSC model (Petroczi et al., 2011) is established with k innocuous questions with 50/50 probability (e.g., birthdays) instead of the control group of the original Unmatched Count model. Thus the baseline is distributed as B(k*n, 0.5), where k is the number of innocuous questions in the model and n is the sample size. This setup eliminates the need for a control sample, and hence it is more resource effective and has better face validity. An empirical dataset using the SSC method then contains n individual responses where each question in the model is 'answered' with zero or one but only the sum is recorded, which ranges between zero and k+1.

The prevalence rate estimation for the sample is calculated as the difference between the observed average of affirmative responses across the SSC model questions and the expected mean from the k innocuous questions:

$$\hat{d} = \frac{1}{n}\sum_{i=1}^{n} x_i - \frac{k}{2} \tag{3}$$

where d is the unknown prevalence, n is the sample size, x_i is the total number of affirmative answers reported by respondent i, and k is the number of innocuous questions with 50/50 probability. The 95% confidence interval is calculated using the normal approximation as:

$$\hat{d} \pm Z_{0.95}\sqrt{\frac{k/4 + \hat{d}(1-\hat{d})}{n}} \tag{4}$$

where d is the estimated proportion of 'yes' answers given to the sensitive target question, n is the sample size and Z(0.95) is 1.96.

The observed number of 'affirmative' answers is derived from the sum of two random variables, the birthday questions and the sensitive target question, with distributions B(k*n, 0.5) and B(n, d), where d is the population prevalence of the sensitive target attribute and n is the number of respondents in the sample. The distribution of this sum has no simple standard form, but we can use the normal approximation of a binomial distribution, with mean = np and variance = n*p*(1-p), where n and p are the parameters of the respective binomial distribution (Petroczi et al., 2011).

For example, assume that the sensitive target question is embedded in four birthday questions with 0.5 probability each. The survey then consists of five binary questions or statements: "My birthday falls in the first half of the year", "My mother's birthday is on an odd day", "My birthday is in the second half of the month", "My father's birthday falls in an even numbered month" and "I have used drugs in the past three months". Respondents answer with the total number of statements that are true for them (0, 1, 2, 3, 4 or 5), and these totals can be added together for the entire sample. Let us say that we have 1,000 completed surveys where the sum of 'yes' answers is 2,200, giving a mean of 2.2. We are interested in the unknown probability of the sensitive target question, and we know that the expected mean from the four innocuous questions is 2.0 (= 4 * 0.5). Thus, we can calculate the probability of affirmative answers to the sensitive question by taking the difference between the observed mean of 'yes' answers (2.2) and the expected mean from the innocuous questions with 0.5 probability (2.0), which gives us 0.2 (or 20%) for the sensitive target question.
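The same arithmetic as a short sketch, with the confidence interval from the normal approximation in Eq. 4 (figures taken from the example above):

```python
import math

k, n = 4, 1000    # innocuous questions; completed surveys
total_yes = 2200  # sum of reported 'yes' counts across the sample

d_hat = total_yes / n - k * 0.5  # 2.2 - 2.0 = 0.2, i.e., 20%
half_width = 1.96 * math.sqrt((k / 4 + d_hat * (1 - d_hat)) / n)
print(f"{d_hat:.3f} +/- {half_width:.3f}")  # 0.200 +/- 0.067
```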

However, having four 'yes' answers to the birthday questions is expected for 6.25% of any large sample, thus a fifth 'yes' in this subgroup would automatically reveal an affirmative answer to the sensitive question. In order to avoid potential exposure, respondents should have the option to protect themselves by 'hiding' the five affirmative answers. One way to do so is to be instructed to select 'zero' instead of 'five'; the alternative is to select any other answer option randomly (Petroczi et al., 2011). Through a series of simulations, we have shown that, from the numerical point of view, there is no difference between the 'select zero' and the 'select any number' variations. The results of this simulation are depicted in Figure 1. The advantage of the shared '0 or 5' option is the quick and straightforward detection of noncompliance, which would not be so easily available if respondents were instructed to select any response option randomly instead of the '5' (see the 'Detecting noncompliance' section below). However, the recorded sum of affirmative answers with the '0 or 5' option ranges between zero and k instead of k+1, and this renders the simple calculation method presented in Eq. 3 unsuitable. The alternative maximum-likelihood estimation method required for the exposure-free SSC model is presented later under Hypothesis 1.

Noncompliance Effect

Previous studies have estimated a considerable presence of noncompliance, up to 57%, in self-reports about sensitive behaviours such as social security benefit fraud, doping, invalid insurance claims, dental hygiene, medication non-adherence and attitudes toward mental and physical disability (Böckenholt & van der Heijden, 2007; Böckenholt et al., 2009; Moshagen, Musch, Ostapczuk, & Zhao, 2010; Ostapczuk & Musch, 2011; Ostapczuk, Musch & Moshagen, 2011; Pitsch & Emrich, 2012; van den Hout, Böckenholt & van der Heijden, 2010). Although these studies estimated the proportion of noncompliance, they were not able to accurately assign a proportion, or all, of the noncompliance to guilty noncompliance aiming to hide the truth about the sensitive attribute. Consequently, some of these studies have opted for either reporting noncompliance without making an attempt to adjust the overall prevalence rates (e.g., Pitsch & Emrich, 2012) and/or using it as a 'worst case scenario' upper bound (Böckenholt et al., 2009; Ostapczuk et al., 2011; Ostapczuk & Musch, 2011). Others have made the unsubstantiated assumption that noncompliance is equivalent to denial, and hence have combined the rate of honestly admitted behaviour with the rate of noncompliance to derive the 'accurate' estimation of the behaviour in question (e.g., Moshagen et al., 2010; Ostapczuk & Musch, 2011). On one hand, reporting noncompliance without adjusting the overall prevalence rate is safe but less informative. On the other hand, the assumption that noncompliance is equivalent to guilty noncompliance (i.e., that only those who possess the target attribute would be deliberately noncompliant) most likely produces an inflated (hence inaccurate) prevalence figure.

Owing to the complexity of the instructions of the estimation models, the likelihood of innocent noncompliance could be larger than it is with direct questioning, potentially having a distorting effect on the model outcomes. The susceptibility of the randomised models to noncompliance has also been highlighted (Boeije & Lensvelt-Mulders, 2002; James, Nepusz, Naughton & Petroczi, 2013; Lensvelt-Mulders & Boeije, 2007; Ostapczuk & Musch, 2011). Notably, one study with a known prevalence rate not only revealed that approximately half of the respondents lied on the survey regarding their behaviour, but also that lying was likely to be driven by the fear of repercussions (van der Heijden, van Gils, Bouts & Hox, 2000). In particular, all respondents surveyed in the study were known to have committed social security fraud, thus the true prevalence rate for the sensitive target question was 100%. However, this a priori knowledge was not revealed to the interviewers or the respondents when they were asked about their fraudulent behaviour. Attempts to address this shortcoming have used various statistical techniques to estimate the proportion of noncompliance (e.g., Böckenholt, Barlas & van der Heijden, 2009; Böckenholt & van der Heijden, 2007; Cruyff, van den Hout, van der Heijden & Böckenholt, 2007; Cruyff, Böckenholt, van den Hout, & van der Heijden, 2008; Moshagen & Musch, 2010; Moshagen, Musch & Erdfelder, 2012; Moshagen et al., 2010; Ostapczuk et al., 2011; van den Hout & Klugkist, 2009), which usually require a dual sampling design that puts additional strain on data collection.

Qualitative approaches to reduce noncompliance have emphasized the importance of building trust (Böckenholt & van der Heijden, 2007; De Schrijver, 2012; Landsheer, van der Heijden & van Gils, 1999; Moshagen et al., 2012) and avoiding forced responses (Boeije & Lensvelt-Mulders, 2002). Simple and straightforward instructions that foster a clear understanding of what is required from the participant can also help to reduce non-deliberate noncompliance (Böckenholt & van der Heijden, 2007; De Schrijver, 2012), as can paying attention to the language used in formulating the statements. People generally respond more truthfully if the sensitive statement makes allowance for their rationalisation of the incriminating behaviour, or is more in line with their motives for that behaviour (Lensvelt-Mulders & Boeije, 2007). When multiple questions are used, counterbalancing sensitive statements phrased affirmatively (I did) and negatively (I did not), so that a 'yes' answer does not always mean incrimination, also helps to gain trust. However, mixing affirmative and negative statements within the same model places a larger cognitive load on respondents and thus increases the chance of random responding. In addition, comparative studies have shown that providing a greater degree of protection, by selecting the model best suited to the sensitivity of the question and participant characteristics such as educational level (Böckenholt et al., 2009), results in a higher attained prevalence rate of the same behaviour compared to direct questioning or alternative models, hence suggesting a reduction in evasive responding. Using symmetric models, where neither the 'yes' nor the 'no' answer can imply the respondent's true status with regard to the sensitive attribute, is highly recommended (Ostapczuk et al., 2011). This is particularly important for models using forced responses and can be ensured by counterbalancing the questions as explained above or by manipulating the probabilities to create the impression that there is an equal chance of being asked to answer honestly or being forced to say 'yes' or 'no'.

Although it is widely assumed that noncompliance arises from self-protection, thus lowering the prevalence estimates, empirical studies acknowledge that whilst it is possible to estimate the proportion of noncompliant respondents, both the reasons behind noncompliance and what the true answer would have been can only be speculated upon (Ostapczuk et al., 2011; Ostapczuk & Musch, 2011; Pitsch & Emrich, 2012). Automatically equating noncompliance with deliberate cheating is an incorrect way of dealing with noncompliance and can easily lead to inflated prevalence rates. A recent advance using a 'stochastic lie detector' (Moshagen et al., 2012), as an extension to Mangat's (1994) version of the randomised response model, directly estimates the extent of truthful responding rather than the magnitude of non-adherence to the model instructions. In Mangat's (1994) model, individuals who possess the target attribute are requested to answer honestly whereas those who do not should answer according to the outcome of the randomisation process. Moshagen et al.'s (2012) stochastic lie detector advances this model by introducing an additional variable for compliance/noncompliance. However, estimating the two unknown parameters, namely the proportion of those who carry the target attribute and the proportion of these who are noncompliant, requires two independent samples to be drawn randomly from the population and the administration of two surveys containing different randomisation probabilities (Clark & Desharnais, 1998).

In the stochastic lie detector design, the sample is randomly split in half and respondents are assigned either to a 'high randomisation probability' condition (based on the respondent's birthday falling between January and October; 83.3%) or to a 'low randomisation probability' condition (based on the respondent's birthday falling in November or December; 16.7%). Respondents in both groups are presented with both the affirmative sensitive statement ("I have intentionally hurt my partner physically") and its negation ("I have never intentionally hurt my partner physically"). Under the high probability condition, which is vital for estimating the sensitive attribute and the magnitude of noncompliance because the difference between the two conditions serves as the basis for the stochastic lie detector, those with the sensitive attribute are instructed to respond to the affirmative sensitive statement if their birthday is between January and October (inclusive). Respondents in the low probability condition are asked to respond to the sensitive affirmative statement if they were born in November or December. The respective counterparts (people with the sensitive attribute and a birthday in November or December in the high probability condition, and people without the sensitive attribute and a birthday between January and October in the low probability condition) are instructed to answer the negated versions. However, this model, like its predecessors, assumes that noncompliance only occurs among those who are guilty of the target behaviour. Consequently, it ignores the proportion of noncompliance among the innocent, potentially arising from misunderstanding or a lack of motivation to answer truthfully. The latter is an important element in devising an effective and efficient tool. Further limitations of the stochastic lie detection method are the complexity of the instructions and the need to use negative statements to which respondents may have to respond affirmatively.

In line with the stochastic lie detector (Moshagen et al., 2012), the noncompliance-corrected SSC model introduced in this paper also contains two unknown variables: the true prevalence of the target attribute and the rate of compliance/noncompliance, thus the model has two ‘real’ parameters (attribute d and noncompliance c), and an additional df from the hypothesized ways of noncompliance. Contrary to the single df in binary response models used with cheating detection (e.g., Clark & Desharnais, 1998; Moshagen et al., 2012), data from the SSC model has 4 dfs. This increase in dfs allows us to work with several hypothesized models of plausible noncompliance and thus differentiate between guilty and innocent noncompliance. The values of the two unknown parameters are estimated with the method of maximum likelihood (ML). Simply put, the ML method finds the parameter combination of the model for which the probability of observing the exact empirical data at hand is the highest. One key advantage of the ML method, which is utilised in the work presented here, is that likelihood functions can be used to test competing hypotheses about models and parameters such as the presence and ways of noncompliance.

Maximum Likelihood Estimation of Prevalence and Noncompliance in SSC

The SSC model with four innocuous but personal questions affords more flexibility in creating a combination of personal information that is feasible, accessible and ensures the desired level of confidence in respondents (James et al., 2013). The probability of an affirmative answer to each of the 4 innocuous questions is 0.5, and these are assumed to be independent of the sensitive target question. Estimates of the prevalence d of the target socially sensitive attribute (e.g., drug use) and the noncompliance probability c are determined using the ML principle, which sets the estimates of d and c to the values under which it is most likely that we would have observed exactly the result seen in the empirical dataset. That is, with the ML function we test all possible combinations of d and c (assuming that both are present in the data but occur independently of each other) against the observed distribution of the empirical data and select the best fitting scenario. In order to apply the ML principle, one must calculate the probability of observing a given sample if the underlying drug use and noncompliance probabilities are known (this is called the likelihood of the particular parameter configuration), then take the derivatives of the likelihood with respect to c and d and set them equal to zero in order to determine where the local extrema lie in the d-c space.

Hypothetical Models of Noncompliance

Detecting noncompliance in the SSC with the '0 or 5' response option is straightforward. The probability of a true '0' response is .0625 * (1 - d) and the probability of a true '5' response is .0625 * d; hence the probability of the shared '0 or 5' option is .0625 * (1 - d) + .0625 * d = .0625 (1/16), independently of d. A significant difference between the observed proportion of '0 or 5' responses and the expected p = .0625 is therefore evidence of noncompliance.
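A sketch of this check as a two-sided exact binomial test (scipy is assumed to be available; the count of 124 is back-computed from the observed proportion of .086 reported later for N = 1,441):

```python
from scipy.stats import binomtest

n_respondents = 1441
n_zero_or_five = 124  # observed '0 or 5' responses (~.086 of the sample)

# H0: p('0 or 5') = 1/16; a small p value signals noncompliance
result = binomtest(n_zero_or_five, n_respondents, p=1/16)
print(result.pvalue)
```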

In order to take 'noncompliance' into account, we need to have an idea about how it may happen and what the relationship is between noncompliance and the sensitive question. The following hypotheses outline the theoretical possibilities, with the null hypothesis (H0) representing the scenario where there is no drug use and everyone is compliant, whereas H1 represents the default assumption that drug use is present but everyone follows the instructions and answers honestly. The five alternative hypotheses (H2-H6), all assuming that at least some people in the sample have used drugs in the last three months, consider the ways noncompliance might happen. In the following section we discuss these scenarios individually, and present the mathematical representations used for the estimation of drug use and guilty/innocent noncompliance in each. In all cases, we will work with the logarithm of the likelihood instead of the actual likelihood, because the derivatives of the log-likelihood are easier to treat analytically, and both the likelihood and the log-likelihood attain their maxima at the same d and c.

Null hypothesis: No one takes drugs and everyone responds honestly. Let x denote the vector of observed responses, i.e., let x0 denote the number of observed '0 or 5' responses, x1 denote the number of observed '1' responses, x2 denote the number of observed '2' responses and so on. Furthermore, let pi denote the probability of the event that i out of the 4 non-sensitive questions are true for a given respondent. Because everyone responds honestly, the log-likelihood is given as follows (note that it does not depend on c or d as we assumed a priori that both c and d are zeros in the null hypothesis):

$$\log L_0 = \sum_{i=0}^{4} x_i \log p_i$$

Note that pi is known because every non-sensitive question has a 0.5 probability for the ‘yes’ answer, hence the number of yes responses for the non-sensitive questions follows a binomial distribution with n = 4 and p = .5.

The exact values of pi are as follows:

$$p_0 = p_4 = \binom{4}{0} 0.5^4 = \frac{1}{16}, \qquad p_1 = p_3 = \binom{4}{1} 0.5^4 = \frac{4}{16}, \qquad p_2 = \binom{4}{2} 0.5^4 = \frac{6}{16}$$

Also note that the above formula for the log-likelihood assumes that we consider the responses as an ordered sequence; in other words, we have assumed that we know that the response of the 1st respondent was z1, the response of the 2nd respondent was z2 and so on, therefore calculating the probability of seeing this response set simply means that we multiply the probabilities of the individual responses together (or sum the log-probabilities for the log-likelihood, as above). Alternatively, we could have assumed that we only know how many of the responses were '0 or 5', '1', '2' and so on, without knowing their order. This distinction makes no difference in the end, as discussed later.

The null hypothesis is interesting for one particular reason: it gives us a baseline with which we can compare the likelihoods of our 'alternative' hypotheses. Dividing the likelihood of a hypothesis by the likelihood of the null hypothesis gives us the likelihood ratio of the two hypotheses. If the likelihood ratio is larger than 1, the 'alternative' hypothesis is more likely than the null hypothesis (and the ratio tells us how many times more likely it is). Similarly, if it is smaller than 1, the null hypothesis is more likely than the 'alternative' hypothesis. Note that the likelihood ratio can also be calculated as exp(logL1 - logL0), where logL0 and logL1 are the log-likelihoods of the null and the 'alternative' hypotheses, respectively.
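As a sketch, the null log-likelihood and the likelihood ratio follow directly from the response counts (the counts below are illustrative, not the study data):

```python
import math

p = [1/16, 4/16, 6/16, 4/16, 1/16]  # p_i for '0 or 5', '1', '2', '3', '4'
x = [124, 310, 520, 362, 125]       # illustrative response counts

log_l0 = sum(xi * math.log(pi) for xi, pi in zip(x, p))

def likelihood_ratio(log_l1, log_l0=log_l0):
    """exp(logL1 - logL0): values above 1 favour the alternative."""
    return math.exp(log_l1 - log_l0)
```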

Hypothesis 1: Drug use prevalence is d but otherwise everyone responds honestly. Because there is a non-zero drug use prevalence in this case, the probability that a respondent actually gives response x to the SSC question is no longer equal to px; e.g., the probability of pressing '0 or 5' is now the sum of the probability of pressing '0 or 5' if the respondent did not take drugs and the probability of pressing '0 or 5' if the respondent used drugs in the last three months.

Let qi denote the probability of pressing response button i. Because a drug user's total is shifted up by one, and a drug user with four affirmative innocuous answers presses the shared '0 or 5' button, qi is given as follows:

$$q_0 = (1-d)\,p_0 + d\,p_4, \qquad q_i = (1-d)\,p_i + d\,p_{i-1} \quad (i = 1, \ldots, 4)$$

This results in the following values for qi:

$$q_0 = \frac{1}{16}, \quad q_1 = \frac{4-3d}{16}, \quad q_2 = \frac{6-2d}{16}, \quad q_3 = \frac{4+2d}{16}, \quad q_4 = \frac{1+3d}{16}$$

Note: q0 denotes the probability of pressing the ‘0 or 5’ response button throughout.

The log-likelihood is as follows:

$$\log L_1 = \sum_{i=0}^{4} x_i \log q_i$$

Taking the derivative with respect to d and setting it equal to zero yields:

$$\frac{\partial \log L_1}{\partial d} = -\frac{3 x_1}{4-3d} - \frac{2 x_2}{6-2d} + \frac{2 x_3}{4+2d} + \frac{3 x_4}{1+3d} = 0$$

which is a cubic equation in d and can be solved easily for d after trivial algebraic manipulations. As a cubic equation has three roots, we must always take the one that is real and falls between 0 and 1, because these are the natural constraints for d. Should none of the real roots fall between 0 and 1, the extrema for d lie outside the 0-1 interval; it is then enough to calculate the log-likelihood for d = 0 and d = 1 and choose the d with the higher log-likelihood.

To confirm that the extremum is indeed a local maximum and not a local minimum, we must also investigate the second derivative of the log-likelihood, which must be negative:

$$\frac{\partial^2 \log L_1}{\partial d^2} = -\frac{9 x_1}{(4-3d)^2} - \frac{4 x_2}{(6-2d)^2} - \frac{4 x_3}{(4+2d)^2} - \frac{9 x_4}{(1+3d)^2} < 0$$

This holds for every value of d because all the numerators and denominators are positive.
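Numerically, the H1 estimate can also be obtained without expanding the cubic by hand, by maximising the log-likelihood directly on [0, 1]. A sketch (scipy assumed; counts illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([124, 310, 520, 362, 125])  # illustrative response counts

def neg_log_l(d):
    # q_i under H1; q_0 is constant at 1/16 and carries no information on d
    q = np.array([1, 4 - 3 * d, 6 - 2 * d, 4 + 2 * d, 1 + 3 * d]) / 16
    return -np.sum(x * np.log(q))

res = minimize_scalar(neg_log_l, bounds=(0, 1), method="bounded")
print(res.x)  # ML estimate of d; concavity guarantees a unique maximum
```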

Hypothesis 2: Drug use prevalence is d and noncompliant respondents give a random answer with probability c. In this hypothesis, each respondent may give an honest answer with probability 1-c or decide to give a completely random answer with probability c. The probability of noncompliance is completely independent of whether the respondent uses drugs or not. The general form of the log-likelihood is the same as above:

$$\log L_2 = \sum_{i=0}^{4} x_i \log q_i$$

However, note that the probabilities qi (pressing a given response button) are now different as they must take into account both c and d; a completely random answer lands on each of the five response buttons with probability 1/5:

$$q_0 = (1-c)\frac{1}{16} + \frac{c}{5}, \quad q_1 = (1-c)\frac{4-3d}{16} + \frac{c}{5}, \quad q_2 = (1-c)\frac{6-2d}{16} + \frac{c}{5}, \quad q_3 = (1-c)\frac{4+2d}{16} + \frac{c}{5}, \quad q_4 = (1-c)\frac{1+3d}{16} + \frac{c}{5}$$

The procedure is then the same as in the case of H1, and likewise for the subsequent hypotheses: taking the derivatives with respect to d and c leads to two cubic equations (one cubic in d when c is assumed to be constant, the other cubic in c when d is assumed to be constant), which must be solved together as an equation system in order to find the optimal combination of c and d.

Hypothesis 3: Drug use prevalence is d and noncompliant respondents choose '0 or 5' with probability c. In this scenario, we work with the possibility that noncompliance happens by selecting '0 or 5'. This is the closest SSC equivalent of 'self-protective no-saying' in binary models, which respondents may opt for in order to avoid any possible connection with the sensitive attribute. (Note that selecting zero alone would offer better protection, but in the exposure-free SSC model this option is not available.) 'Self-protective no-saying' refers to the notion of ignoring the instructions and simply saying 'no', as that answer can never be interpreted as an admission.

The derivations are again the same as in H1, but with different qi, reflecting that instead of choosing randomly, noncompliant respondents all select the '0 or 5' option:

$$q_0 = (1-c)\frac{1}{16} + c, \quad q_1 = (1-c)\frac{4-3d}{16}, \quad q_2 = (1-c)\frac{6-2d}{16}, \quad q_3 = (1-c)\frac{4+2d}{16}, \quad q_4 = (1-c)\frac{1+3d}{16}$$

Hypothesis 4: Drug use prevalence is d and noncompliant respondents choose randomly from '0 or 5', '1' or '2' with probability c. This hypothesis assumes that noncompliant respondents prefer to respond by selecting an option from the lower half of the scale. Intuitively, the logic behind it is similar to 'self-protective no-saying'. This option may be preferred if a respondent's true answer (even without an affirmative answer to the sensitive target question) would be '3' or '4', as the high numbers could be read as indicating a higher likelihood of having the sensitive attribute.

The derivations are again the same as above, but with different qi reflecting that noncompliant respondents select the '0 or 5', '1' or '2' option (each with probability 1/3) instead of their true answer:

$$q_0 = (1-c)\frac{1}{16} + \frac{c}{3}, \quad q_1 = (1-c)\frac{4-3d}{16} + \frac{c}{3}, \quad q_2 = (1-c)\frac{6-2d}{16} + \frac{c}{3}, \quad q_3 = (1-c)\frac{4+2d}{16}, \quad q_4 = (1-c)\frac{1+3d}{16}$$

Hypothesis 5: Drug use prevalence is d and noncompliant respondents choose '3' or '4' with probability c. This scenario is included for the sake of completeness. Given that a high number of affirmative answers can be interpreted as likely to include a 'yes' to the sensitive question, it seems unlikely that a noncompliant respondent would deliberately create a potentially more compromising situation.

The derivations are again the same as above, but with different qi reflecting the changes in the probability of selecting each response button under noncompliance (a noncompliant respondent selects '3' or '4' with probability 1/2 each):

$$q_0 = (1-c)\frac{1}{16}, \quad q_1 = (1-c)\frac{4-3d}{16}, \quad q_2 = (1-c)\frac{6-2d}{16}, \quad q_3 = (1-c)\frac{4+2d}{16} + \frac{c}{2}, \quad q_4 = (1-c)\frac{1+3d}{16} + \frac{c}{2}$$

Hypothesis 6: Drug use prevalence is d and noncompliant respondents say one less than the truth if they are using drugs (i.e., not admitting the use of drugs) with probability c. Similarly to H4, this method of noncompliance may be used if the number of true answers (even without an affirmative answer to the sensitive target question) would be at the high end of the scale. In order to avoid the high number being read as indicating a higher likelihood of having the sensitive attribute, participants may reduce the total number of affirmative answers in their response.

The derivations are again the same as above, but with different qi reflecting the changes in the probability of selecting each response button (noncompliant drug users respond as if they had not used drugs, so the effective drug use probability becomes d(1-c)):

$$q_0 = \frac{1}{16}, \quad q_1 = \frac{4-3d(1-c)}{16}, \quad q_2 = \frac{6-2d(1-c)}{16}, \quad q_3 = \frac{4+2d(1-c)}{16}, \quad q_4 = \frac{1+3d(1-c)}{16}$$

Note that this hypothesis is practically equivalent to H1 if the parameters are chosen appropriately. Assume that the drug use probability in H1 is denoted by d1 and the parameters in this hypothesis are denoted by d6 and c6. By choosing d6(1-c6) = d1, one can make the two hypotheses completely equivalent. This, coupled with the one extra parameter in H6, means that its maximum log-likelihood never exceeds that of H1, whereas its fit indices (e.g., Akaike's Information Criterion and Bayesian Information Criterion) will be higher owing to the increased complexity. Consequently, this model cannot outperform H1 and is therefore not considered further.
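Under the derivations above, H2-H5 differ only in where a noncompliant answer lands among the five buttons, so a single numerical sketch covers all four (scipy assumed; counts illustrative):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([124, 310, 520, 362, 125])  # '0 or 5', '1', ..., '4' counts
P = np.array([1, 4, 6, 4, 1]) / 16       # p_i, Binomial(4, 0.5)

# where a noncompliant answer lands, per hypothesis
NONCOMPLIANT = {
    "H2": np.full(5, 1 / 5),                # any button at random
    "H3": np.array([1.0, 0, 0, 0, 0]),      # always '0 or 5'
    "H4": np.array([1.0, 1, 1, 0, 0]) / 3,  # '0 or 5', '1' or '2'
    "H5": np.array([0, 0, 0, 1.0, 1]) / 2,  # '3' or '4'
}

def q(d, c, noncompliant):
    # np.roll shifts drug users' counts up by one; p_4 wraps onto the
    # shared '0 or 5' button, matching q_0 in the derivations above
    honest = (1 - d) * P + d * np.roll(P, 1)
    return (1 - c) * honest + c * noncompliant

def fit(h):
    nll = lambda v: -np.sum(x * np.log(q(v[0], v[1], NONCOMPLIANT[h])))
    # keep c strictly below 1 so no button probability collapses to zero
    res = minimize(nll, x0=[0.1, 0.05], bounds=[(0, 1), (0, 0.999)])
    return res.x[0], res.x[1], -res.fun  # d, c, maximised log-likelihood

for h in NONCOMPLIANT:
    print(h, fit(h))
```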

Model Fit and Selection

The prediction model has two 'real' parameters (d, the drug use prevalence, and c, the probability of noncompliance), and at least one additional degree of freedom from the hypothesized method of noncompliance, resulting in a minimum of three degrees of freedom in our hypothesized model. The input model (SSC) has only four degrees of freedom because the fifth fraction is determined by the other four to add up to 100%. Therefore, with the degrees of freedom of the prediction model (df = 3) close to those of the input model (df = 4), there is a danger of overfitting the model.

To counterbalance this to a degree, along with the likelihood values, we used Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to select the best model. Although the AIC and BIC are based on different theoretical approaches and goals, from the application point of view both aim to identify good models whilst penalising complexity. The difference between the two lies in their respective definitions of a 'good model'. From the computational point of view, both include penalties for the number of parameters; however, the AIC (Akaike, 1973, 1974) does not take the sample size into account, whereas the BIC (Akaike, 1978; Schwarz, 1978) does, and therefore penalises complexity more heavily in larger samples. The best fitting model is selected based on having a high log-likelihood value coupled with low AIC/BIC values.
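As a sketch, both criteria are simple functions of the maximised log-likelihood (these are the standard definitions; the function names are our own):

```python
import math

def aic(log_l, n_params):
    """Akaike's Information Criterion: 2k - 2*logL."""
    return 2 * n_params - 2 * log_l

def bic(log_l, n_params, n):
    """Bayesian Information Criterion: k*ln(n) - 2*logL."""
    return n_params * math.log(n) - 2 * log_l
```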

Deriving Confidence Intervals for the Maximum Likelihood Method

In order to calculate confidence intervals (CIs) for the above models, first we temporarily assume that we have only one parameter, d, for which we would like to determine a 95% CI. A 95% CI means that the real value of the parameter is expected to be within this interval with a 95% probability, given the data we have observed. Now, recall that the likelihood of a given parameter value for d is the probability of observing the data we have at hand if the true value of the parameter was the one we entered into the likelihood function. Therefore, the 95% CI can easily be determined by finding an interval [a, b] centred on our ML estimate of the parameter value such that the integral of the likelihood function (normalised so that its total integral over the parameter range is 1) along this interval is 0.95. When we have more than one parameter (e.g., d and c), we usually wish to find confidence intervals such that the value of d is within d's confidence interval with probability 0.95 and the value of c is within c's confidence interval with probability 0.95, given the data. Because the likelihood function depends on both d and c, it is no longer a single one-dimensional function but a surface in 3D space, where the x and y axes represent the values of d and c, and the z axis is the value of the likelihood function. Here one can either try to find a 'confidence area' on the d-c plane such that the integral of the likelihood function (which is the volume under the surface) is 0.95, or take the projections of the likelihood surface to the x and y axes, respectively, obtaining two one-dimensional functions (one for d and one for c) for which independent confidence intervals can then be derived. Here, the projected one-dimensional functions represent the marginal probabilities.
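A grid-based sketch of the one-parameter case (log_l is any log-likelihood function of d, e.g., lambda d: -neg_log_l(d) from the H1 sketch; growing the interval outwards from the maximum is one simple way to realise the interval described above):

```python
import numpy as np

def likelihood_ci(log_l, level=0.95, grid_size=10001):
    """Normalise the likelihood on a grid over [0, 1], then grow an
    interval outwards from the ML estimate until it covers `level`."""
    d = np.linspace(0, 1, grid_size)
    ll = np.array([log_l(v) for v in d])
    w = np.exp(ll - ll.max())  # rescale before exponentiating (underflow)
    w /= w.sum()               # discretised, normalised likelihood
    lo = hi = int(np.argmax(w))
    mass = w[lo]
    while mass < level:
        # extend towards the side with the higher remaining likelihood
        if lo > 0 and (hi == grid_size - 1 or w[lo - 1] >= w[hi + 1]):
            lo -= 1
            mass += w[lo]
        else:
            hi += 1
            mass += w[hi]
    return d[lo], d[hi]
```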

Example Application of the SSC-MLE to Student Drug Use

Methods

Sample. The data collection was conducted in the Fall semester of the 2012/13 academic year at a leading, research-intensive UK university with a global reputation, located in the north of England, with outstanding records in both teaching and research. It accommodates some 25,000 undergraduate (~70%) and postgraduate (~30%) students.

The SSC model. The target sensitive question "I have used recreational drugs in the last three months" was embedded in four questions about the birthday of a person of the respondent's choice, each with an assumed 50/50 probability. These questions were selected based on 31,159,563 live birth records in England and Wales from 1993 to 2009, obtained from the Office for National Statistics (Table 1), and previously reported statistics (Petroczi et al., 2011). Odd vs. even birth year, which also features a 50/50 distribution, was not used owing to the expected narrow age range (incoming university students) in the current sample. The five statements were presented with bullet points, not numbers, to avoid confusion between the position of the sensitive question and the number of 'yes' responses. To mitigate potential exposure, the SSC model was set up so that '5' and '0' shared the same response button.

The survey, containing the SSC, was administered to incoming university students as part of a larger research project (Epton et al., 2013). The questions and response options of the SSC are presented in Appendix A. The survey was administered online using SurveyGizmo (2012). The link to the survey was emailed to all incoming new undergraduate students (4,611) two weeks before entering university, of whom 1,445 completed the survey. For the drug use question section, respondents were instructed to indicate the total number of 'yes' answers without revealing which ones were affirmative. A total of 1,441 respondents (41.6% male) provided a valid answer on the SSC. The mean age of the sample was 18.90 ± 2.49 years. In order to facilitate comparison with national statistics, the dataset was also filtered for UK students (n = 1,083), as well as broken down by gender and age for detailed prevalence estimations. For age groups, the sample was split into those who were likely to enter university directly from secondary education (18 years of age and under, n = 890, M = 17.97 ± 0.17) and older students (19 years and above, n = 554, M = 20.39 ± 3.53). Gender and age group were independent (Fisher's exact test p = .821).

Data analysis. Data were analysed using standalone software developed in-house. The software includes the five hypotheses presented in this paper, along with a re-sampling option for checking the reliability of the estimates (see Appendix B for an illustration). The software can be downloaded free of charge for non-commercial use from http://staffnet.kingston.ac.uk/~ku36087/ssc-mle/ or requested from the corresponding author.

Prevalence estimates from the raw data are obtained using the covariance matrix adaptation evolution strategy (CMA-ES) optimization algorithm (Hansen, Müller & Koumoutsakos, 2003) to find local optima of the log-likelihood. The program restarts the search 100 times from random positions and the best solution, with the largest log-likelihood value, is kept. This CMA-ES-optimized approach is a time-effective alternative to the exhaustive 'brute force' search strategy of evaluating the log-likelihood for every possible combination of the presence of the target attribute and noncompliance, independently for each scenario outlined in H2-H5. The combination of the target attribute (d) and noncompliance (c) in these hypotheses leads to two cubic equations (one cubic in d when c is assumed to be constant, the other cubic in c when d is assumed to be constant), which must be solved together as an equation system in order to find the optimal combination of c and d. Because this is not tractable analytically, the 'brute force' search strategy evaluates the log-likelihood for every possible pair of d and c with a step size of 10^-4 for each hypothesis and chooses the pair that yields the largest log-likelihood. When we compared the CMA-ES optimization algorithm to the exhaustive 'brute force' search strategy, the CMA-ES returned the same results up to four decimal digits, but much faster, providing evidence that the CMA-ES approach is very effective. Specifically, the average run time with the CMA-ES optimization algorithm on a 2.83 GHz MacBook Pro was 2.057 seconds on our data. In comparison, the exhaustive brute force search with a 10^-4 grid (i.e., the result is accurate to four decimal digits) took more than 50 times longer at 105.235 seconds.
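For reference, a sketch of the brute-force baseline (log_l is a two-parameter log-likelihood such as those sketched earlier; CMA-ES itself is available in the third-party cma package on PyPI):

```python
import numpy as np

def brute_force(log_l, step=1e-4):
    """Evaluate log_l on a regular (d, c) grid; exact to the grid
    resolution but far slower than restarted CMA-ES."""
    grid = np.arange(0.0, 1.0 + step, step)
    best = (-np.inf, 0.0, 0.0)
    for d in grid:
        for c in grid:
            ll = log_l(d, c)
            if ll > best[0]:
                best = (ll, d, c)
    return best  # (log-likelihood, d, c)
```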

Thus, using the CMA-ES-optimized algorithm, the corresponding confidence intervals for d and c were calculated as:

$$\hat{d} \pm \frac{1.96}{\sqrt{-L''(\hat{d})}}, \qquad \hat{c} \pm \frac{1.96}{\sqrt{-L''(\hat{c})}}$$

where 1.96 is the approximate value of the 97.5 percentile point of the standard normal distribution and L'' is the second derivative of the log-likelihood at the maximum.

This approach yields the same CIs up to two decimal points as the significantly more time-consuming likelihood-based approximation of confidence intervals for d and c described in the previous section.
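A sketch of the curvature-based interval with a central-difference second derivative (log_l and the ML estimate d_hat are assumed to come from an earlier fit):

```python
def wald_ci(log_l, d_hat, h=1e-5, z=1.96):
    """95% CI from the second derivative of the log-likelihood
    at its maximum: d_hat +/- z / sqrt(-L'')."""
    second = (log_l(d_hat + h) - 2 * log_l(d_hat) + log_l(d_hat - h)) / h**2
    half_width = z / (-second) ** 0.5
    return d_hat - half_width, d_hat + half_width
```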

Prevalence Estimations for Recreational Drug Use

Valid responses to the SSC question were provided by 1,441 students. The frequency counts for the full sample, along with gender and age-group breakdowns, are provided in Appendix C. The observed p of ‘0 or 5’ was .086, which is higher than the expected .0625 if everybody is compliant (p < .001), thus indicating the presence of noncompliance to some degree. Therefore, the ML method was used to estimate the proportion of drug use (d) and noncompliance (c).

Based on the log-likelihood and AIC/BIC criteria, the models were ranked as: H3 > H4 > H2 > H1 = H5 > H0. Estimated proportions of 'drug use' by 'noncompliance' under the competing models are depicted in Figure 2. Log-likelihood and AIC/BIC values, along with estimated values for d and c under each assumption, are presented in Table 2. Notably, three of the hypotheses yielded the same estimated prevalence for drug use. Of these three (H1, H3 and H5), the log-likelihood values and AIC/BIC indices suggest that the H3 scenario is a more likely explanation for the observed variance in our dataset than either of the other two hypotheses (H1 or H5), even though, in this particular case, this does not result in a different prevalence estimation. Furthermore, we have evidence from the number of answers on the '0 or 5' option to reject H1 (no noncompliance) at p < .001.

Despite the same outcome from our present data, H1 is not necessarily equivalent to H5. Note that hypotheses H2-H5 all include H1 as a special case when c = 0. In our particular dataset, the noncompliance mechanism assumed for H5 was so far from the truth that assuming zero noncompliance still yielded a higher log-likelihood value for H5 than assuming any other prevalence of noncompliance. However, this is not always the case. Hypothetically, in a sample where everyone is noncompliant and chooses '3' or '4' with equal probability, H5 would yield a high log-likelihood with c = 1 and any d, while all the other hypotheses would result in a lower log-likelihood. Therefore, H1 and H5 cannot be considered completely equivalent. In realistic samples, H5 would likely result in zero estimated noncompliance because the noncompliance mechanism in H5 is very unlikely in the real world.

Of the 1,441 respondents, model H3 estimates that 17.62% (±6.75%) admitted using recreational drugs in the past three months and only 2.51% (±1.54%) were noncompliant, of whom 0.55% (±0.44%) were guilty noncompliers (i.e., had used drugs but did not admit it) and 2.17% (±1.44%) were innocent noncompliers with no drug use in the past three months to hide. For detailed calculation and explanation, see Appendix D. When the analysis was limited to UK students only, who constituted 75% of the sample, the prevalence rate of drug use in the last three months was 18.72% (±7.85%), with slightly reduced noncompliance rates of 0.42% (±0.44%) guilty and 1.37% (±1.5%) innocent noncompliance. This prevalence rate is in line with recent statistics, based on structured anonymous questionnaires, reporting that among 16-24 year olds 20% used drugs at least once within a year, and 11.6% within the last month (UK Focal Point on Drugs, 2011).

Admitted recent drug use estimation by gender and age group (18 years and under vs. 19 years and older) revealed differences in both drug use prevalence and noncompliance (Figure 3). Estimations and model parameters for all five hypotheses are provided in Table 3. The estimated noncompliance was slightly higher in the female sub-sample ('0 or 5' = 73/839, p = .0031) compared to the males ('0 or 5' = 51/600, p = .0173), and for the 19 year olds and older ('0 or 5' = 62/551, p < .001) compared to the 18 year olds ('0 or 5' = 62/889, p = .2034), where the p values test H0: p('0 or 5') = .0625. A slightly higher rate of noncompliance among females has been observed previously (Moshagen et al., 2010), but the opposite trend was noted in doping research (Pitsch & Emrich, 2012). The best fitting model was H3 in all sub-samples except for the younger group, where no evidence for noncompliance was detected (therefore H1 and H3 led to the same estimated prevalence).

The gender difference in recent drug use was expected from literature precedent. The UK Focal Point on Drugs (2011) noted a gender difference in all age groups, with the average reported drug use being twice as high for males as for females. In our sample, the estimated drug use prevalence admitted by males was, as expected, higher at 21.3% (±10.9%), compared to 14.7% (±8.6%) for females. The estimated noncompliance rates were similar in the two gender groups, at 2.4% (±2.4%) and 2.6% (±2.03%), respectively. In the 'younger age group', 97.5% (and 60.1% of the full sample) were 18 years old, hence it can be assumed that most entered university directly from secondary education. The estimated prevalence rate for drug use was lower for this group than for their slightly older counterparts (13.96% ± 8.26% and 24.37% ± 11.62%, respectively).

Reliability of the Estimated Parameters

The reliability of the estimated parameters for each hypothesis was assessed by a re-sampling procedure. We generated 1,000 artificial response sets from the true response set by randomly selecting, without replacement, 80% of the true responses from the full sample (N = 1,441) for each artificial response set. The hypothesis parameters were then estimated for each artificial sample and the estimates were averaged across the samples. We observed that the mean values of the parameter estimates for d and c in the artificial samples matched the estimates obtained from the true response set with a maximum difference of 0.006, confirming the reliability of our approach at this sample size. Detailed results from the re-sampling simulation are presented in Appendix E (Table E1).
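A sketch of the procedure (responses is the vector of individual answers coded 0-4, with 0 standing for '0 or 5'; fit_fn is any estimator such as those sketched earlier, returning parameter estimates from a vector of counts):

```python
import numpy as np

def resampled_estimates(responses, fit_fn, n_rounds=1000, frac=0.8, seed=42):
    """Average parameter estimates over repeated 80% subsamples
    drawn without replacement from the full response set."""
    rng = np.random.default_rng(seed)
    m = int(len(responses) * frac)
    estimates = []
    for _ in range(n_rounds):
        sub = rng.choice(responses, size=m, replace=False)
        counts = np.bincount(sub, minlength=5)  # '0 or 5', '1', ..., '4'
        estimates.append(fit_fn(counts))
    return np.mean(estimates, axis=0)
```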

Discussion

In the empirical data, the estimated noncompliance was relatively low, ranging from 1% to 10%. This is probably due to a combination of various factors. Firstly, the SSC provided complete protection. Previous studies have shown that intuitively understanding the indirect mechanism is necessary for trust in the method (Böckenholt & van der Heijden, 2007; De Schrijver, 2012; Landsheer et al., 1999; Moshagen et al., 2012). The sample population consisted of well-educated individuals who gained admission to undergraduate programmes at a world top-100 university, hence it is fair to assume that they had the necessary understanding of, and thus trust in, the method. Secondly, using recreational drugs in this age group is no longer perceived as an immensely sensitive issue, as 'sensible' recreational drug use has been increasingly accepted as part of young adults' social life (Parker, Williams & Aldridge, 2002). Consequently, in the present sample, noncompliance made only a small difference to the prevalence estimation. However, the extended SSC-MLE model can also be employed in situations investigating more contentious issues.

Strengths and Limitations

The 4+1 exposure-free SSC model embeds the sensitive target question among four innocuous questions, and respondents are asked to indicate only the total number of ‘yes’ answers by selecting one response option from ‘0 or 5’, ‘1’, ‘2’, ‘3’ or ‘4’, without revealing which questions solicited affirmative answers. The shared ‘0 or 5’ option offers a very quick and straightforward way to detect the presence and magnitude of noncompliance. This key characteristic of item count models and the SSC leaves several possible ways of responding inaccurately. Paradoxically, this feature is also an advantage of the SSC over other estimation models with binary outcomes when noncompliance is to be considered. In contrast to the one df arising from binary models, the SSC model has four dfs, which in turn afford increased flexibility for modelling various forms of noncompliance.
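
To make the detection mechanism concrete: under full compliance, the probability of the ‘0 or 5’ option equals B(0)(1 − d) + B(4)d = 1/16 = .0625 for every d, where B(j) is the Binomial(4, 0.5) probability of j affirmative innocuous answers; any excess mass on this option therefore signals noncompliance. The sketch below (using scipy.stats.binomtest as one possible test) checks the female sub-sample count reported above against this baseline.

```python
import math
from scipy.stats import binomtest

B = [math.comb(4, j) / 16 for j in range(5)]  # Binomial(4, 0.5) pmf

def category_probs(d):
    """Five SSC response probabilities under full compliance."""
    q = [B[0] * (1 - d) + B[4] * d]  # '0 or 5': counts 0 and 5 merged
    q += [B[k] * (1 - d) + B[k - 1] * d for k in range(1, 5)]
    return q

# The '0 or 5' probability does not depend on d under full compliance:
assert all(abs(category_probs(d)[0] - 0.0625) < 1e-12
           for d in (0.0, 0.18, 0.5, 1.0))

# Exact binomial test of the observed '0 or 5' count (female sub-sample):
print(binomtest(73, 839, 0.0625).pvalue)  # ~.003, cf. p = .0031 in the text
```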

The proposed SSC-MLE makes a unique contribution to methodological research aiming to improve the item count technique. As Imai (2011) noted, such development has been limited to a handful of studies despite the growing application of the technique in various fields. Advances to date have focussed on improving efficiency (e.g., Chaudhuri & Christofides, 2007) and on the applicability of the method to research where linking participant characteristics to the sensitive question is important (Imai, 2011). The SSC-MLE extends the item count method on two counts: (a) it eliminates the potential exposure from having a full set of affirmative answers (i.e., affirmative answers to all innocuous questions and the target question) and, at the same time, (b) it addresses noncompliance.

The key limitation of the SSC-MLE method, shared with other item count models based on difference-in-means estimators, is a relative loss of efficiency: its confidence intervals are wider than those of estimation models with a binary outcome, which needs to be compensated for with a larger sample size. Furthermore, it is not currently possible to use responses from the SSC-MLE model in traditional correlational research designs or in regression models, because the ‘0 or 5’ response carries no information on the sensitive attribute; that is, the probability of the ‘0 or 5’ response is independent of the model parameter d (see our derivations under ‘Detecting noncompliance’). The ‘selecting randomly instead of 5’ variant could offer a feasible alternative, but this question has to be addressed in future research. Finally, when attributing noncompliance to the guilty and innocent proportions of the sample, the estimates rest on the assumption that the sensitive attribute and noncompliance are statistically independent. Further research is needed to explore the effect of alternative assumptions about the relationship between possession of the sensitive attribute and noncompliance.

The Assumption of Independence of Noncompliance and Involvement

In order to deal with noncompliance, one has to make an assumption about its nature. One assumption, which dominates the field of prevalence estimation models, is that only those who have something to hide, and wish to hide it, will be noncompliant. Alternatively, it could be assumed that people can be noncompliant even if they have nothing to hide, potentially leading to under- as well as over-estimation of prevalence (James et al., 2013). Given that the underlying assumption about noncompliance is key to developing noncompliance-detection extensions, the consideration given to this issue has been surprisingly scarce, and empirical evidence is mostly lacking. The present study also falls short on this aspect, as we have no empirical evidence for the assumption of independence of noncompliance and involvement. Although additional parameters could be built into the model reflecting the probability of noncompliance if involved (c1) and if not involved (c2), along with the probability of involvement (d), this could easily result in serious overfitting. Increasing the number of innocuous questions to create a 5+1 or a 6+1 setup (Petroczi et al., 2011), and thereby the available dfs, somewhat mitigates this concern, but the increased number of questions also increases the cognitive demand on respondents; unfortunately, no ideal ratio of known to unknown parameters has been established for ML methods.

Furthermore, we investigated whether introducing dependence between involvement (d) and noncompliance (c) under any hypothesis about noncompliance can yield an estimation model with a higher log-likelihood, and found that the log-likelihood for a d – c1 – c2 model (where d denotes involvement, c1 denotes noncompliance among the involved, i.e., guilty noncompliance, and c2 denotes noncompliance among the uninvolved, i.e., innocent noncompliance) can only match, but never exceed, the one obtained for the independent d - c solution. This is because for every d – c1 – c2 combination in the non-independent model there is a corresponding d - c solution. Equal log-likelihood values for d - c and its corresponding d – c1 – c2 mean that there are two equally plausible solutions, but with different parameters for involvement and the various forms of noncompliance. Following Occam’s razor, the independent version (d - c) should be favoured for its relative simplicity.
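
To illustrate the equivalence under one concrete hypothesis, suppose noncompliers default to the ‘0 or 5’ option irrespective of their status (an assumption of this sketch; the algebra for other noncompliance hypotheses is analogous). With Y denoting the number of affirmative innocuous answers, the probabilities of the non-‘0 or 5’ categories satisfy:

```latex
% Sketch: every (d, c1, c2) model has an observationally equivalent
% independent (d', c') counterpart under this hypothesis.
\begin{align*}
  P(k) &= d(1-c_1)\,P(Y = k-1) + (1-d)(1-c_2)\,P(Y = k),
          \qquad k = 1, \dots, 4,\\
  1 - c' &= d(1-c_1) + (1-d)(1-c_2),
          \qquad d' = \frac{d(1-c_1)}{1 - c'},
\end{align*}
```

from which d′(1 − c′) = d(1 − c1) and (1 − d′)(1 − c′) = (1 − d)(1 − c2); the independent model with parameters (d′, c′) therefore assigns the same probability to every response category (the ‘0 or 5’ probability being one minus the rest) and attains the same likelihood.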

A more desirable approach to resolving the noncompliance conundrum is through experiments and tightly controlled field studies, potentially involving objective verification of the presence or absence of the discriminating behaviour. It is generally accepted that the protection guaranteed by anonymity increases respondents’ willingness to answer sensitive questions honestly (Tourangeau & Yan, 2007), and it is therefore widely used in survey research to reduce social desirability bias and obtain more accurate information on sensitive or transgressive behaviour. However, anonymity does not necessarily increase people’s motivation to participate (Peeters et al., 2010), nor does it prevent them from claiming behaviour that is actually absent, whether arising from uniqueness bias (Monin & Norton, 2003) or driven by strategic considerations (Petroczi & Haugen, 2012). Recent research suggests that complete anonymity may even have the opposite effect owing to a reduced sense of accountability, leading respondents to complete the survey less attentively or accurately (Lelkes, Krosnick, Marx, Judd & Park, 2012).

Survey satisficing is a well-known notion (Krosnick, 1999) whereby less motivated respondents take shortcuts around the cognitive demands of giving the task the expected full consideration and attention: they answer randomly, take the first option offered, opt for a noncommittal response, and so on. Indirect prevalence estimation models have more complex instructions than direct questioning, and are thus particularly prone to response bias arising from survey satisficing. This type of response bias cannot be directly linked to a motivation to hide compromising facts, because such distortion can easily arise from a lack of effort to follow survey instructions carefully, irrespective of the respondent’s true status regarding the sensitive attribute. As the key characteristic of indirect estimation models (answers cannot be linked to individuals) may reduce the sense of accountability even further than anonymity alone (and correspondingly increase the survey satisficing effect), it is timely to consider noncompliance as a behaviour independent of involvement.

Independence of the Model Questions

The key assumption of the SSC-MLE model is that the sensitive target question is independent of the added innocuous questions. It is reasonable to assume that a person’s behavioural choice is neither associated with, nor can be explained by, the distribution of an arbitrarily selected person’s unrelated characteristics, such as their birthday, phone number or house number. In the present study, this means that whether or not a respondent uses recreational drugs bears no relation to the birthday of the person whose details the respondent chooses to ‘use’ when answering the SSC innocuous questions.

The model also assumes independence between the innocuous questions when the binomial distribution is used. If there are associations between the innocuous (here: birthday) questions, the pi values in Eq. 7 cannot be calculated from a binomial distribution; instead, the probability of having 0, 1, 2, 3 or 4 affirmative answers among the associated innocuous questions has to be calculated first, taking the association(s) into account. However, as shown in Appendix E (Table E2), associations between any two pairs of birthday data do not affect the SSC model outcome appreciably: the recalculated estimates taking the associations into account differ by less than 0.7% in the best fitting (H3) model.
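
A minimal sketch of this adjustment: rather than using the binomial pmf, the count distribution is computed directly from the joint distribution of the four innocuous answers. The joint probabilities below are hypothetical, chosen only to induce a positive association between the first two birthday questions.

```python
from itertools import product

def innocuous_count_pmf(joint):
    """P(0..4 'yes' answers) from an arbitrary joint distribution over
    the four innocuous questions, replacing the binomial assumption."""
    pmf = [0.0] * 5
    for outcome, p in joint.items():
        pmf[sum(outcome)] += p
    return pmf

def make_joint(p11=0.30, p10=0.20, p01=0.20, p00=0.30):
    """Hypothetical joint pmf: questions 1 and 2 positively associated
    (P(both yes) = 0.30 > 0.25), questions 3 and 4 independent fair coins."""
    pair12 = {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}
    joint = {}
    for (a, b), p in pair12.items():
        for c, d in product((0, 1), repeat=2):
            joint[(a, b, c, d)] = p * 0.25
    return joint

pmf = innocuous_count_pmf(make_joint())
print(pmf)  # no longer the Binomial(4, 1/2) probabilities [1, 4, 6, 4, 1]/16
```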

Ordered Versus Unordered Response Sets in the SSC

When we derived the log-likelihood formula, we assumed that the observed response set is an ordered sequence; i.e., we know that the response of the first respondent was z1, the response of the second respondent was z2, and so on. When calculating the likelihood, we simply multiply the probabilities corresponding to the observed responses of each respondent (or sum the log-probabilities when calculating the log-likelihood). One could argue that this is not necessarily the case: in practice we are unlikely to know the exact order of the responses. Owing to the method, we only know that there were x0 people who chose ‘0 or 5’, x1 people who chose ‘1’, and so on. Here we show that although this assumption changes the exact value of the (log-)likelihood, it does not change the values of c and d at which the maximum is attained.

For the sake of simplicity, let us assume that there are three possible responses in an imaginary survey: A, B or C. A is chosen with a probability of 0.4, B with a probability of 0.3 and C with a probability of 0.3. Let us also assume that there are three respondents, and suppose that, in the end, all the responses turned out to be A. We know immediately that the only way to observe three As is when respondents #1, #2 and #3 all chose A. Now suppose instead that we have observed two As and one B. This is a completely different situation, as there are three possible outcomes:

1. Respondent #1 chose B and the other two chose A.

2. Respondent #2 chose B and the other two chose A.

3. Respondent #3 chose B and the other two chose A.

Therefore, the probability of seeing two As and one B is equal to the probability of the ordered A-A-B response sequence, multiplied by the number of possible outcomes in which two respondents chose A and one chose B. In general, for the SSC test, if the order of the responses is not known, the probability of seeing a particular outcome vector x = [x0, x1, x2, x3, x4] among N respondents is equal to the probability from our original, ordered model (that was used in the section about the null hypothesis and below), multiplied by a quantity that describes how many possible ways there are to divide a group of N people into five groups of sizes x0, x1, x2, x3 and x4. The latter quantity is the multinomial coefficient:

$$\binom{N}{x_0, x_1, x_2, x_3, x_4} = \frac{N!}{x_0!\, x_1!\, x_2!\, x_3!\, x_4!}.$$

It can easily be seen that this quantity depends solely on the observed outcome vector x, and not on the probabilities pi or qi used in the model. Incorporating this term into the likelihood (or its logarithm into the log-likelihood) would therefore only multiply the likelihood by a positive constant and would not change where the maximum occurs.
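
In symbols, with x = [x0, …, x4] denoting the observed counts:

```latex
% The unordered likelihood differs from the ordered one only by a
% multiplicative constant that depends on the counts x, not on (c, d):
\log L_{\mathrm{unordered}}(c, d)
  \;=\; \log \binom{N}{x_0, x_1, x_2, x_3, x_4}
  \;+\; \log L_{\mathrm{ordered}}(c, d),
```

so both formulations attain their maximum at the same values of c and d.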

Treatment of Random Measurement Error

Similar to other indirect estimation models, the SSC (and its MLE extension) was primarily developed to obtain a truthful, factual answer about an autobiographical event that is assumed to be fully accessible in memory. In such applications, random measurement error in the target question is not expected, as long as careful and precise wording leaves no room for misinterpretation or indistinct answering. However, indirect estimation models such as the SSC-MLE can also be used to create safe survey situations that counterbalance socially desirable answering when measuring attitudes, for example when asking about negative attitudes toward people with disability (Ostapczuk & Musch, 2011). In such cases, it is reasonable to assume that random measurement error will be present. The SSC-MLE model addresses this through the binomial error model (Lord & Novick, 1968, chap. 23), using the normal approximation. One practical limitation is that the normal approximation to the binomial distribution is applicable only when np and n(1 - p) are both sufficiently large, which in our setting corresponds to 0.021 < d < 0.979. In practical terms, the normal approximation is suitable when the population prevalence for the target question is expected to fall between 2 and 98 percent, which should cover most socially sensitive behaviours.

Confidence Intervals

Calculating confidence intervals (CIs) is not straightforward with the SSC model. Let us assume that the possible interplays between involvement and noncompliance are fully covered by the following four scenarios: d * (1-c), d * c, (1-d) * c and (1-d) * (1-c), where d denotes the probability of the sensitive behaviour and c the probability of noncompliance. Then d * (1-c) gives the proportion who admit drug use, d * c the proportion who use drugs and are noncompliant, (1-d) * c the proportion who are noncompliant but do not use drugs, and (1-d) * (1-c) the proportion of compliant non-users.
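
As a quick check, these four scenarios form an exact partition of the sample, so the four estimated proportions necessarily sum to one:

```latex
% The four involvement-by-noncompliance scenarios partition the sample:
d(1-c) \;+\; dc \;+\; (1-d)c \;+\; (1-d)(1-c) \;=\; d + (1-d) \;=\; 1.
```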

However, both d and c have their own 95% CIs, so the question is how to determine the CI of their product. To illustrate the problem, let us use the noncompliant drug users as an example. Assuming that d and c are independent, and following the d * c estimation, we can simply multiply the two CIs:

$$\mathrm{CI}_{d \cdot c} = [\,d_{\mathrm{lower}} \cdot c_{\mathrm{lower}},\; d_{\mathrm{upper}} \cdot c_{\mathrm{upper}}\,],$$

which, however, only gives a 90% CI for d * c (because 0.95 * 0.95 = 0.9025). Widening both CIs independently to 97.4% would yield a combined 95% CI (because 0.974 * 0.974 ≈ 0.949). However, the interval in which any combination of d and c occurs with 95% probability cannot simply be derived from 95%CId and 95%CIc, because, in addition to the CIs, the uncertainties also multiply.

Further complications arise because CIs are usually taken to be symmetric, but this does not necessarily hold after multiplication. Alternatively, we can calculate the combined CIs by multiplying the lower and upper bounds separately, then calculating d * (1-c), d * c, (1-d) * c and (1-d) * (1-c) by taking the midpoint between the lower and upper CI values for d and the lower and upper CI values for c. CIs for total d and total c are calculated separately by adding the lower bounds together and the upper bounds together: the total estimated prevalence is (d * (1-c)) + (d * c), whereas the total estimated proportion of noncompliance is (d * c) + ((1-d) * c). From a practical point of view, the difference between the two approaches is relatively small in the present case, as illustrated in Table 4. In situations where the prevalence of the sensitive attribute is higher or the magnitude of noncompliance is larger, the difference may be more noticeable. Researchers are therefore advised to cater for the asymmetric nature of the combined CIs and to calculate these values as described above. Equations and a numerical worked example are given in Appendix D. This slightly more complicated approach is necessary because CIs are conventionally symmetric, yet here the midpoint is not where the probability density is largest: the probability distribution on which the CIs are based is not symmetric around the median.
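
The following sketch implements the bound-multiplication approach with standard interval arithmetic; the interval endpoints are hypothetical and serve only to show the asymmetry of the combined intervals around their midpoints.

```python
def combined_cis(d_ci, c_ci):
    """Combine CIs for d and c into CIs for the four scenarios by
    multiplying lower and upper bounds separately (interval arithmetic).
    Each argument and each returned value is a (lower, upper) tuple."""
    d_lo, d_hi = d_ci
    c_lo, c_hi = c_ci
    return {
        'admitted users, d*(1-c)':        (d_lo * (1 - c_hi), d_hi * (1 - c_lo)),
        'guilty noncompliers, d*c':       (d_lo * c_lo,       d_hi * c_hi),
        'innocent noncompliers, (1-d)*c': ((1 - d_hi) * c_lo, (1 - d_lo) * c_hi),
        'compliant non-users':            ((1 - d_hi) * (1 - c_hi),
                                           (1 - d_lo) * (1 - c_lo)),
    }

# Hypothetical bounds for d and c; note the asymmetry of the results.
for name, (lo, hi) in combined_cis((0.11, 0.24), (0.01, 0.04)).items():
    print(f'{name}: [{lo:.4f}, {hi:.4f}]')
```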

In order to avoid the total prevalence estimate exceeding 100%, Figures 2 and 3 are presented with asymmetric CIs. For ease of understanding, Tables 2 and 3 contain prevalence estimates with symmetric CIs. All calculations are based on multiplying the CI bounds of d and c rather than multiplying d and c directly. The software output is generated using the same approach, with CIs set to be symmetric around the calculated midpoints.

Efficiency and Sample Size

Although a randomised response model, with its built-in ‘noise’ (i.e., respondents do not have to reveal their position on the targeted sensitive behaviour directly), can offer a buffer against social desirability, some models provide better protection than others. In general, estimation models are, on average, 50% less efficient than direct questioning (Lensvelt-Mulders et al., 2005b). In practice, this reduced efficiency translates into the need to double the sample size for estimation models to achieve the same level of reliability as direct questioning. Efficiency increases with the prevalence of the sensitive attribute in the population, and is generally higher if the prevalence of the innocuous questions is known (Lensvelt-Mulders et al., 2005b). Although direct comparison between the different classes of estimation models is not possible, the efficiency of item count models is generally lower than that of their randomised response counterparts (Ulrich et al., 2012), with the wider confidence intervals yielding less precise estimates (James et al., 2013; Petroczi et al., 2011). This is an inevitable trade-off for the added noise that provides protection, which in turn affords more dfs and added flexibility in dealing with guilty and innocent noncompliance, but has to be compensated for with a larger sample size to achieve the same level of power as the other models (Ulrich et al., 2012).

The recommended minimum sample size for the SSC model has been established as N > 300 (Petroczi et al., 2011). Whilst power can be increased by adding innocuous questions, doing so places a larger cognitive load on respondents, making it more difficult to keep the information ‘in their heads’ when answering the SSC question. This is not a problem if the test is completed in privacy (e.g., respondents can use their fingers to keep track of the number of ‘yes’ answers as they process the SSC statements), but it could be cumbersome if completion takes place in front of the data collector or other participants. The ideal number of innocuous questions for the SSC model was extensively discussed during the validation of the SSC (Petroczi et al., 2011), which concluded that the 4+1 model is a good compromise between power, protection and complexity. However, empirical testing of alternative models with an increasing number of innocuous questions, with affirmative-answer probabilities of 0.5 or less, would benefit future refinement of the SSC model.

Confidence intervals can also be reduced if the probability of an affirmative answer to an innocuous question is less than 0.5. For example, instead of setting the birthday question at 50/50, we can use the probability of the birthday falling in the first 10 days of the month, or in one of the last four months (September – December), where p = 1/3 rather than 1/2. In such a set-up, the baseline for the SSC model is calculated as (2 * 1/2) + (2 * 1/3), so the baseline average number of affirmative answers is approximately 1.67 instead of 2.0, which may help in cases where the probability of the sensitive target question is relatively low. Correspondingly, our stochastic simulations indicate that a mix of two innocuous questions with 1/2 and two with 1/3 probabilities of an affirmative answer reduces the confidence intervals by about 1% (7% vs. 6% on each side), assuming prevalence > 10%, N > 1000 and full compliance.
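
A stochastic simulation in the spirit of the one referred to above can be sketched as follows. The Poisson-binomial enumeration accommodates unequal innocuous probabilities, the grid-search estimator stands in for the full SSC-MLE routine, and the parameter values (d = 0.15, N = 1,000, 300 replications) are illustrative.

```python
import math
import random
from itertools import product

def innocuous_pmf(ps):
    """Poisson-binomial pmf: P(0..4 'yes' answers) for four independent
    innocuous questions with possibly unequal probabilities."""
    pmf = [0.0] * (len(ps) + 1)
    for bits in product((0, 1), repeat=len(ps)):
        pr = 1.0
        for b, p in zip(bits, ps):
            pr *= p if b else 1 - p
        pmf[sum(bits)] += pr
    return pmf

def category_probs(d, B):
    """SSC category probabilities ('0 or 5', 1-4) under full compliance."""
    q = [B[0] * (1 - d) + B[4] * d]
    q += [B[k] * (1 - d) + B[k - 1] * d for k in range(1, 5)]
    return q

def fit_d(counts, B):
    """Grid-search ML estimate of d; a stand-in for the SSC-MLE routine."""
    def loglik(d):
        return sum(n * math.log(p)
                   for n, p in zip(counts, category_probs(d, B)) if n)
    return max((i / 1000 for i in range(1, 1000)), key=loglik)

def ci_halfwidth(ps, d=0.15, N=1000, reps=300, seed=7):
    """Empirical 95% half-width of the estimator for a given design."""
    rng, B = random.Random(seed), innocuous_pmf(ps)
    q, ests = category_probs(d, B), []
    for _ in range(reps):
        draws = rng.choices(range(5), weights=q, k=N)
        ests.append(fit_d([draws.count(cat) for cat in range(5)], B))
    mean = sum(ests) / reps
    return 1.96 * (sum((e - mean) ** 2 for e in ests) / reps) ** 0.5

print(ci_halfwidth([1/2] * 4))             # all innocuous questions at 1/2
print(ci_halfwidth([1/2, 1/2, 1/3, 1/3]))  # mixed design described above
```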

Although efficiency can be increased by careful manipulation of the model parameters, it is important to maintain an optimal balance between efficiency and protection, as the two are inversely related (Chaudhuri & Mukerjee, 1987). Unfortunately, the options for setting such a balance for the SSC from known probabilities are limited. The probability of a ‘yes’ answer to the innocuous questions and the chance of exposure are also inversely related: reducing the probability of ‘yes’ answers among the innocuous questions incrementally pushes the sense of exposure toward that of direct questioning. As an extreme example, if p = 0 holds for all innocuous questions, then there is no difference between the SSC approach (which now provides no protection) and direct questioning, yet the SSC remains much more complex for no gain. The decision is therefore a balance between the gain from narrower confidence intervals and the real or perceived risk of exposure. The additional computational demands required to deal with different probabilities also need to be taken into consideration.

Implications for Practice and Future Directions

In addition to making a non-trivial advance in the field of prevalence estimation models, the method presented in this paper provides the scientific community with a sophisticated prevalence estimation tool that can estimate and separate guilty and innocent noncompliance, and adjust prevalence estimates accordingly. Obtaining reliable information about socially sensitive, potentially embarrassing and/or transgressive behaviour, such as habitual excessive drinking, illicit drug use, domestic violence, mistreatment in healthcare settings, tax evasion or fraudulent business activity, is critically important for evaluating the need for intervention policies. Furthermore, assessing the effectiveness of such interventions within an outcome-based evaluation framework demands evidence that changes in the target behaviour have occurred in the desired direction (i.e., a reduced post-intervention prevalence rate). The improved SSC-MLE offers a sophisticated yet flexible, user-friendly and cost-effective tool for researchers and policy makers alike to assess and re-assess prevalence rates of sensitive social issues. These issues may encompass autobiographical events or beliefs and opinions, as long as they can be expressed as a single statement with a dichotomous response format.

Like other indirect estimation methods, the SSC builds on respondents’ full understanding of how it creates a safe survey situation when soliciting truthful information on sensitive, transgressive or embarrassing issues. The SSC ‘survey trick’ can thus be equally useful in cross-sectional studies, where most respondents are likely to be new to the method, and in longitudinal investigations with repeated administration. To maintain the level of protection in repeated administrations of the SSC with the same participant sample, it is recommended that researchers change or negate at least half of the innocuous questions. The rapidly developing field of efficient estimation models would greatly benefit from empirical work, both experimental and in ecological settings, towards a better understanding of the nature of noncompliance. Empirical evidence on the fundamental question of whether noncompliance is dependent on or independent of the targeted sensitive behaviour will allow estimation models to be built on accurate assumptions about noncompliance.

Further applications to a wide range of transgressive behaviours under various conditions will contribute to establishing the SSC’s performance properties with regard to effectiveness, malleability, reliability and incremental validity across the range of sensitive information. In general, indirect models are better suited to prevalence estimation for sensitive issues than for nonsensitive information, which can be obtained with direct questions: because indirect estimation models introduce random error for added protection, applying them to straightforward questions would burden the estimates unnecessarily.

The sensitivity of the model in detecting and attributing the two types of noncompliance can be tested by asking a combination of mildly sensitive, sensitive and very sensitive questions in the same setting (e.g., binge drinking/recreational drug use vs. cheating in exams/plagiarising vs. abuse/sexual assault among university students), or by contrasting groups in which the target behaviour is known and expected to be present (e.g., students with a record of academic misconduct as the known group). Experimental work investigating the reasons behind noncompliance, and how respondents manipulate their responses on the SSC survey when they have a discriminating behaviour to hide as opposed to simply not following the instructions, will not only benefit the SSC but also advance the field of prevalence estimation models.

Conclusion

In this paper, we worked with the assumption that the sensitive attribute and noncompliance are statistically independent; thus noncompliance can, in theory, occur with equal probability among guilty and innocent participants. The maximum-likelihood extension of the SSC estimates the probability of noncompliance and attributes a proportion to each type in the observed distribution of affirmative answers. Unlike many existing randomised response models, the SSC setup does not force respondents to answer in a way that can be misinterpreted or that contradicts their true answers (i.e., the forced response model), nor does it require a two-step process (i.e., an unrelated question) or dual sampling for estimating noncompliance. The shared ‘0 or 5’ response option meets the criteria of an acceptable level of exposure, trust, feasibility, accessibility and resistance to manipulation for a valid instrument.

References

Boeije, H., & Lensvelt-Mulders, G. (2002). Honest by chance: a qualitative interview study to clarify respondents’ (non-)compliance with computer-assisted randomized response. Bulletin de Méthodologie Sociologique, 75, 24-39.

Boruch, R. F. (1971). Assuring confidentiality of responses in social research: A note on strategies. American Sociologist, 6, 308–311.

Böckenholt, U., & van der Heijden, P.G.M. (2007). Item randomized-response models for measuring noncompliance: risk return perceptions, social influences, and self-protective responses. Psychometrika, 72, 245-262.

Böckenholt, U., Barlas, S., & van der Heijden, P.G.M. (2009). Do randomized-response designs eliminate response biases? An empirical study of non-compliance behaviour. Journal of Applied Econometrics, 24, 377–392.

Chaudhuri, A. & Christofides, T. C. (2007). Item count technique in estimating the proportion of people with a sensitive feature. Journal of Statistical Planning and Inference, 137, 589–593.

Chaudhuri, A., & Mukerjee, R. (1987). Randomized response techniques: A review. Statistica Neerlandica, 41, 27-44.

Clark, S. J., & Desharnais, R. A. (1998). Honest answers to embarrassing questions: detecting cheating in the randomized response model. Psychological Methods, 3, 160-168.

Cruyff, M.J.L.F., Böckenholt, U., van den Hout, A., & van der Heijden, P.G.M. (2008). Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model. The Annals of Applied Statistics, 2, 316-331.

Cruyff, M.J.L.F., van den Hout, A., van der Heijden, P.G.M., & Böckenholt, U. (2007). Log-linear randomized-response models taking self-protective response behavior into account. Sociological Methods and Research, 36, 266-282.

Dalton, D. R., Wimbush, J. C., & Daily, C. M. (1994). Using the unmatched count technique (UCT) to estimate base rates for sensitive behavior. Personnel Psychology, 47, 817-829.

De Schrijver, A. (2012). Sample survey on sensitive topics: Investigating respondents’ understanding and trust in alternative versions of the randomized response technique. Journal of Research Practice, 8. Retrieved from http://jrp.icaap.org/index.php/jrp/article/view/277

Droitcour, J. A., Caspar, R. A., Hubbard, M. L., Parsley, T. L., Visscher, W., & Ezzati, T. M. (1991). The item count technique as a method of indirect questioning: A review of its development and a case study application. In P. P. Biemer, R. M. Groves, L. E. Lyberg, N. Mathiowetz & S. Sudman (Eds.), Measurement errors in surveys. New York: Wiley.

Epton, T., Norman, P., Sheeran, P., Harris, P.R., Webb, T.L., Ciravegna, F., Meier, P., Brennan, A., Julious, S.A., Naughton, D., Petroczi, A., Dadzie, A-S., & Kruger, J. (2013). A theory-based online health behaviour intervention for new university students: Study protocol. BMC Public Health, 13, 107.

Hansen, N., Müller, S. D., & Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11, 1-18.

Himmelfarb, S. (2008). The multi-item randomized response technique. Sociological Methods & Research, 36, 495-514.

Hussain, Z., Shah, E. A., & Shabbir, J. (2012). An alternative item count technique in sensitive surveys. Revista Colombiana de Estadística, 35, 39-54.

Imai, K. (2011). Multivariate regression analysis for the item count technique. Journal of the American Statistical Association, 106, 407-416.

James, R.A., Nepusz, T., Naughton, D.P., & Petroczi, A. (2013). A potential inflating effect in estimation models: Cautionary evidence from comparing performance enhancing drug and herbal hormonal supplement use estimates. Psychology of Sport & Exercise, 14, 84-96.

Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537-567.

Landsheer, J.A., van der Heijden, P., & van Gils, G. (1999). Trust and understanding, two psychological aspects of randomized response. Quality & Quantity, 33, 1-12.

Lee, Y-O., & Lee, R.M. (2012). Methodological research on “sensitive” topics: a decade review. Bulletin of Sociological Methodology/ Bulletin de Méthodologie Sociologique, 114, 35-49.

Lelkes, Y., Krosnick, J.A., Marx, D.M., Judd, C.M., & Park, B. (2012). Complete anonymity compromises the accuracy of self-reports. Journal of Experimental Social Psychology, 48, 1291–1299.

Lensvelt-Mulders, G.J.L.M., & Boeije, H.R. (2007). Evaluating compliance with a computer assisted randomized response technique: A qualitative study into the origins of lying and cheating. Computers in Human Behavior, 23, 591-608.

Lensvelt-Mulders, G.J.L.M., Hox, J.J., & van der Heijden, P.G.M. (2005a). Meta-analysis of randomized response research. Thirty-five years of validation. Sociological Methods and Research, 33, 319-347.

Lensvelt-Mulders, G.J.L.M., Hox, J.J., van der Heijden P.G.M., & Maas, C.J.M. (2005b). How to improve efficiency of randomised response designs. Quality & Quantity, 39, 253-265.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Mangat, N. S. (1994). An improved randomized response strategy. Journal of the Royal Statistical Society, Series B, 56, 93-95.

Tan, M. T., Tian, G. L., & Tang, M. L. (2009). Sample surveys with sensitive questions: A nonrandomized response approach. The American Statistician, 63, 9-16.

