
Adaptive Correspondence Experiments

Hadar Avivi¹, Patrick Kline¹, Evan Rose², and Christopher Walters¹

¹UC Berkeley

²Microsoft Research

January 5, 2021

Motivation

- There is growing interest in adopting algorithmic predictions to inform decision making.
- This talk: detection of discriminatory jobs.
- A potential tool for regulators such as the Equal Employment Opportunity Commission (EEOC), which is charged with preventing and remedying discrimination by individual employers.
- Kline and Walters (forthcoming) show that correspondence experiments sending multiple applications to each job can be used to detect discrimination by individual employers.
- Correspondence experiments can be seen as ensembles of mini-experiments.
- Using these ensembles, we can learn the distribution of discrimination across jobs and use Empirical Bayes (EB) methods to predict the probability that a job is discriminating.
- Only a few applications are required because discriminatory behavior is highly variable across jobs.


Motivation (cont.)

- Obstacle: these experiments are costly.
- They typically send a fixed number of applications per job.
- More applications increase the likelihood of detection.
- Some jobs have a very low callback probability.
- Potential solution: adaptive correspondence experiments.
  - Similar to dynamic treatment regimes for patients in the medical sciences (Chakraborty and Murphy, 2014).
  - Inspired by research in econometrics that updates estimators, decision rules, and experimental designs in response to realized data (Kasy and Sautmann, forthcoming; Tabord-Meehan, 2020).
- Adaptive methods can be useful in other domains where discrimination is a concern, such as healthcare (Alsan et al., 2019; Obermeyer et al., 2019) and criminal justice (Arnold et al., 2020; Rose, forthcoming).


This paper

- Consider a hypothetical regulator seeking to detect discriminatory jobs (e.g., the EEOC, which is in charge of enforcing anti-discrimination law).
- The auditor draws new vacancies from a known distribution and sends fictitious applications in an attempt to infer each job's type.
- Unlike a static audit experiment, at each step the auditor can decide whether to keep sending applications, initiate an investigation, or give up.
- Key result: the number of applications is cut by more than half without reducing the accuracy of detection, by
  - giving up early on jobs with very low callback rates, or those that call black applicants, and
  - choosing application characteristics optimally.


Model


A model for callbacks

Following Kline and Walters (forthcoming):

- Callbacks are modeled as iid Bernoulli trials.
- The callback probability of job j for applications of race r ∈ {b, w} with characteristics x is

  pjr(x) = Λ(αj − βj·1{r = b} + x′γ),  where Λ(z) ≡ [1 + exp(−z)]⁻¹.

- (αj, βj) are random coefficients: βj = max{0, β̃j}, with

  (αj, β̃j)′ ~iid N( (α0, β0)′, [σ²α  ρ; ρ  σ²β] ).

- The model allows for continuous heterogeneity in callback rates and discrimination severity, and a mass point at βj = 0 (a simulation sketch follows below).
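To make the specification concrete, here is a minimal simulation sketch of this callback model in Python. The parameter values are illustrative placeholders, not the Maximum Simulated Likelihood estimates reported later.

```python
# Minimal sketch of the random-coefficients logit callback model above.
# The parameter values are illustrative placeholders, not the paper's estimates.
import numpy as np

rng = np.random.default_rng(0)

def draw_job_types(n_jobs, alpha0=-3.0, beta0=-1.0, sigma_a=2.0, sigma_b=2.0, rho=0.0):
    """Draw (alpha_j, beta_j) with beta_j = max{0, beta~_j}, giving a mass point at 0."""
    cov = [[sigma_a**2, rho], [rho, sigma_b**2]]
    draws = rng.multivariate_normal([alpha0, beta0], cov, size=n_jobs)
    return draws[:, 0], np.maximum(0.0, draws[:, 1])

def callback_prob(alpha_j, beta_j, black, x_gamma):
    """p_jr(x) = Lambda(alpha_j - beta_j * 1{r = b} + x'gamma)."""
    z = alpha_j - beta_j * black + x_gamma
    return 1.0 / (1.0 + np.exp(-z))

alpha, beta = draw_job_types(5)
p_white = callback_prob(alpha, beta, black=0, x_gamma=0.0)
p_black = callback_prob(alpha, beta, black=1, x_gamma=0.0)
# one simulated application of average quality per race for each job
callbacks_white = rng.random(5) < p_white
callbacks_black = rng.random(5) < p_black
print(p_white, p_black, callbacks_white, callbacks_black)
```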

Fitting the model: Nunley et al. (2015) data

- We estimate the model using data from the Nunley et al. (2015) (NPRS) audit experiment.
- The NPRS experiment submitted fictitious applications with racially distinctive names to 2,305 entry-level jobs for college graduates in the US.
- 4 applications per job, typically 2 white and 2 black.
- We view this as a pilot study, e.g., one commissioned by the EEOC.

Maximum Simulated Likelihood estimates

[Figure: estimated parameters of the callback model, with three takeaways overlaid:]

- No correlation between white callback rates and discrimination severity.
- Most jobs don't call anyone: Pr(pjw < 0.01) ≈ 0.53.
- Severe discrimination among a minority of jobs: Pr(βj = 0) ≈ 0.79 and E[βj | βj > 0] ≈ 3.6.
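These summary statistics follow mechanically from the censored-normal specification of βj. The identities below are standard censored/truncated-normal results (not shown on the slides) indicating how the reported moments map back to the model parameters.

```latex
% Since beta_j = max{0, beta~_j} with beta~_j ~ N(beta_0, sigma_beta^2):
\Pr(\beta_j = 0) = \Phi\!\left(-\tfrac{\beta_0}{\sigma_\beta}\right),
\qquad
\mathbb{E}\left[\beta_j \mid \beta_j > 0\right]
  = \beta_0 + \sigma_\beta\,\frac{\phi(\beta_0/\sigma_\beta)}{\Phi(\beta_0/\sigma_\beta)}.
% And since p_{jw}(x) = \Lambda(\alpha_j + x'\gamma), the event p_{jw}(x) < 0.01
% is equivalent to \alpha_j + x'\gamma < \Lambda^{-1}(0.01) \approx -4.6.
```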

The auditor's problem

- Consider an auditor who knows the parameters of the model.
- The auditor's goal is to find discriminators by sending additional fictitious applications.
- The auditor can send up to 8 applications per job.
- Simplify to two quality levels q ∈ {h, l}, corresponding to x′γ one SD above and below its mean.
- At every step, based on the observed callbacks, the auditor can decide to send another application, initiate an investigation, or give up.

The auditor's problem

- Hn is the auditing history after sending n applications. It includes counts of applications and callbacks by race and quality (one possible representation is sketched below).
- For example:

  H4 = { sent: (Wl, Bl, Wh, Bh) = (1, 0, 2, 1);  CB: (Wl, Bl, Wh, Bh) = (0, 0, 2, 0) }
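One way to encode Hn as a data structure is sketched below; the class and method names are illustrative, not from the paper.

```python
# Minimal sketch of a data structure for the auditing history H_n.
# The arm ordering (W_l, B_l, W_h, B_h) follows the slide; names are illustrative.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class History:
    sent: Tuple[int, int, int, int] = (0, 0, 0, 0)       # apps sent per arm
    callbacks: Tuple[int, int, int, int] = (0, 0, 0, 0)  # callbacks per arm

    @property
    def n(self) -> int:
        """Number of applications sent so far."""
        return sum(self.sent)

    def update(self, arm: int, got_callback: bool) -> "History":
        """History after one more application to `arm` (0=Wl, 1=Bl, 2=Wh, 3=Bh)."""
        sent = tuple(s + (1 if i == arm else 0) for i, s in enumerate(self.sent))
        cb = tuple(c + (1 if i == arm and got_callback else 0)
                   for i, c in enumerate(self.callbacks))
        return History(sent, cb)

# The H_4 example from the slide: apps sent (1, 0, 2, 1), callbacks (0, 0, 2, 0).
H4 = History(sent=(1, 0, 2, 1), callbacks=(0, 0, 2, 0))
print(H4.n)  # 4
```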

The auditor's payoff

- Once an investigation is initiated, the job's true type is revealed, yielding payoff

  Sj − κ,  where Sj ≡ (1/2) Σ_{q∈{h,l}} [pjw(q) − pjb(q)]

  is the severity of discrimination, κ is the cost of an investigation, and q ∈ {h, l} indexes quality.

- The auditor cares about the expected number of black callbacks lost relative to white applicants.

The auditor's value function

  V(Hn) = max{ max_{r,q} vrq(Hn),  vI(Hn),  0 }   if n < 8,
  V(Hn) = max{ vI(Hn),  0 }                        if n = 8,

where the three terms are the values of sending a new application, initiating an investigation, and giving up.

- Value of sending a new application: vrq(Hn) = −c + E[V(Hn+1) | Hn].
- Value of an investigation: vI(Hn) = E[Sj | Hn] − κ.
- Expectations are evaluated via Bayes' rule, starting from the population distribution as the prior (a backward-induction sketch follows below).
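The recursion can be solved by backward induction over histories, with posterior expectations approximated by reweighting draws from the population prior. Below is a minimal self-contained Python sketch under illustrative placeholder parameters; it is not the authors' code, and the parameter values are not their estimates.

```python
# Minimal backward-induction sketch of the auditor's value function V(H_n).
# Posterior expectations are approximated by importance-weighting Monte Carlo
# draws from the population prior. All parameter values are placeholders.
from functools import lru_cache
import numpy as np

rng = np.random.default_rng(0)

# Placeholder population parameters and auditor costs
ALPHA0, BETA0 = -3.0, -1.0
SIGMA_A, SIGMA_B, RHO = 2.0, 2.0, 0.0
X_GAMMA = {"h": 1.0, "l": -1.0}          # quality shifter: +/- one SD of x'gamma
KAPPA, COST, MAX_APPS = 0.13, 1e-4, 8

# Draws of (alpha_j, beta_j) from the prior, beta_j censored at zero
N_DRAWS = 10_000
cov = [[SIGMA_A**2, RHO], [RHO, SIGMA_B**2]]
draws = rng.multivariate_normal([ALPHA0, BETA0], cov, size=N_DRAWS)
alpha, beta = draws[:, 0], np.maximum(0.0, draws[:, 1])

ARMS = [("w", "l"), ("b", "l"), ("w", "h"), ("b", "h")]   # (Wl, Bl, Wh, Bh)
# Callback probability of each draw under each arm, shape (N_DRAWS, 4)
P = np.column_stack([
    1.0 / (1.0 + np.exp(-(alpha - beta * (r == "b") + X_GAMMA[q]))) for r, q in ARMS
])
# Severity S_j = (1/2) * sum over quality of the white-black callback gap
SEVERITY = 0.5 * ((P[:, 0] - P[:, 1]) + (P[:, 2] - P[:, 3]))

def posterior_weights(history):
    """Bernoulli likelihood of the history for each prior draw (Bayes' rule weights)."""
    sent, cb = history
    w = np.ones(N_DRAWS)
    for k in range(4):
        w *= P[:, k] ** cb[k] * (1.0 - P[:, k]) ** (sent[k] - cb[k])
    return w

@lru_cache(maxsize=None)
def value(history):
    """V(H_n) = max over giving up (0), investigating, and sending another app."""
    sent, cb = history
    w = posterior_weights(history)
    v_investigate = (w * SEVERITY).sum() / w.sum() - KAPPA
    best = max(0.0, v_investigate)
    if sum(sent) < MAX_APPS:
        for k in range(4):
            p_cb = (w * P[:, k]).sum() / w.sum()   # posterior predictive callback prob
            sent_next = tuple(s + (1 if i == k else 0) for i, s in enumerate(sent))
            cb_next = tuple(c + (1 if i == k else 0) for i, c in enumerate(cb))
            continuation = (p_cb * value((sent_next, cb_next))
                            + (1.0 - p_cb) * value((sent_next, cb)))
            best = max(best, -COST + continuation)
    return best

if __name__ == "__main__":
    empty = ((0, 0, 0, 0), (0, 0, 0, 0))
    print("Value of auditing a fresh vacancy:", value(empty))
```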

Simulation Results


Expected value and optimal strategy after sending one application (κ = 0.13, c = 10⁻⁴)

[Figure: value of each action (give up, investigate, send a low- or high-quality white or black application) for the two possible histories after one high-quality white application: Sent (0, 0, 1, 0) with CB (0, 0, 1, 0), and Sent (0, 0, 1, 0) with CB (0, 0, 0, 0).]

Expected value and optimal strategy after sending three applications (κ = 0.13, c = 10⁻⁴)

Roughly 72% of jobs have the history Sent (0, 0, 3, 0) with no callbacks. If the number of jobs is 100, the auditor saves 0.72 × 5 × 100 = 360 applications on average.


Expected value and optimal strategy after sending four applications (κ = 0.13, c = 10⁻⁴)

Roughly 12% of jobs have one of the last two histories. If the number of jobs is 100, the auditor saves 0.12 × 4 × 100 = 48 applications on average.

Apps sent vs. sensitivity (investigation probability fixed in [0.055, 0.06])

Apps sent vs. specificity (sensitivity fixed in [0.14, 0.145])

Adaptive auditing catches the worst discriminators (κ = 0.13, c = 10⁻⁴)

Discussion

- Adaptive correspondence experiments have the potential to detect discrimination more efficiently than static experiments:
  - a substantial reduction in the number of applications sent,
  - while achieving the same levels of sensitivity and specificity.
- These methods can contribute to detecting discrimination efficiently in other settings (e.g., criminal justice, healthcare, policing, and education).
- Potential drawbacks:
  - Requires full knowledge of the distribution of callbacks (a pilot study).
  - Assumes stable callback parameters.
  - Dynamic programming is computationally expensive, especially as the dimension of the action space grows.
- Potential extensions based on reinforcement learning, e.g., Kasy and Sautmann (forthcoming).


Thank You!


Appendix figures

- Expected value and optimal strategy after sending one through seven applications (κ = 0.13, c = 10⁻⁴).
- Expected value after sending eight applications (κ = 0.13, c = 10⁻⁴).

References

Alsan, M., Garrick, O., and Graziani, G. (2019). Does diversity matter for health? Experimental evidence from Oakland. American Economic Review, 109(12):4071–4111.

Arnold, D., Dobbie, W. S., and Hull, P. (2020). Measuring racial discrimination in bail decisions. Technical report, National Bureau of Economic Research.

Chakraborty, B. and Murphy, S. A. (2014). Dynamic treatment regimes. Annual Review of Statistics and Its Application, 1:447–464.

Kasy, M. and Sautmann, A. (forthcoming). Adaptive treatment assignment in experiments for policy choice. Econometrica.

Kline, P. M. and Walters, C. R. (forthcoming). Reasonable doubt: Experimental detection of job-level employment discrimination. Econometrica.

Nunley, J. M., Pugh, A., Romero, N., and Seals, R. A. (2015). Racial discrimination in the labor market for recent college graduates: Evidence from a field experiment. The B.E. Journal of Economic Analysis & Policy, 15(3):1093–1125.

Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447–453.

Rose, E. K. (forthcoming). Who gets a second chance? Effectiveness and equity in supervision of criminal offenders. Technical report.

Tabord-Meehan, M. (2020). Stratification trees for adaptive randomization in randomized controlled trials.