
Adaptive Correspondence Experiments

Hadar Avivi (1), Patrick Kline (1), Evan Rose (2), and Christopher Walters (1)

(1) UC Berkeley

(2) Microsoft Research

January 5, 2021


Motivation

- There is growing interest in adopting algorithmic predictions to advise decision making

- This talk: detection of discriminatory jobs

- Potential tool for regulators such as the Equal Employment Opportunity Commission (EEOC), which are charged with preventing and remedying discrimination by individual employers

- Kline and Walters (forthcoming) show that correspondence experiments sending multiple applications to each job can be used to detect discrimination by individual employers

- Correspondence experiments can be seen as ensembles of mini-experiments

- Using these ensembles, we can learn the distribution of discrimination across jobs and use Empirical Bayes (EB) methods to predict the probability that a job is discriminating

- Only a few apps are required because discriminatory behavior is highly variable across jobs


Motivation - cont

- Obstacle: these experiments are costly

- Typically send a fixed number of apps per job

- More apps increase the likelihood of detection

- Some jobs have a very low callback probability

- Potential solution: adaptive correspondence experiments
  - Similar to dynamic treatment regimes for patients in the medical sciences (Chakraborty and Murphy, 2014)
  - Inspired by research in econometrics that updates estimators, decision rules, and experimental designs in response to realized data (Kasy and Sautmann, forthcoming; Tabord-Meehan, 2020)

- Adaptive methods can be useful in other domains where discrimination is a concern, such as healthcare (Alsan et al., 2019; Obermeyer et al., 2019) and criminal justice (Arnold et al., 2020; Rose, forthcoming)


This paper

- Consider a hypothetical regulator seeking to detect discriminatory jobs (e.g. the EEOC, which is in charge of enforcing anti-discrimination law)

- The auditor draws new vacancies from a known distribution and sends fictitious applications in an attempt to infer the job's type

- Unlike in a static audit experiment, at each step the auditor can decide whether to keep sending applications, initiate an investigation, or give up

- Key result: the number of apps is cut by more than half without reducing the accuracy of detection
  - Giving up early on jobs with very low callback rates, or those that call back black applicants
  - Choosing application characteristics optimally


Model


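A model for callbacks

Following Kline and Walters (forthcoming):

- Callbacks are modeled as iid Bernoulli trials

- Callback probability of job $j$ to applications of race $r \in \{b, w\}$ with characteristics $x$:

\[
p_{jr}(x) = \Lambda(\alpha_j - \beta_j 1\{r = b\} + x'\gamma),
\]

  where $\Lambda(z) \equiv [1 + \exp(-z)]^{-1}$.

- $(\alpha_j, \beta_j)$ are random coefficients: $\beta_j = \max\{0, \tilde{\beta}_j\}$, with

\[
\begin{pmatrix} \alpha_j \\ \tilde{\beta}_j \end{pmatrix}
\overset{iid}{\sim}
N\left( \begin{pmatrix} \alpha_0 \\ \beta_0 \end{pmatrix},
\begin{bmatrix} \sigma_\alpha^2 & \rho \\ \rho & \sigma_\beta^2 \end{bmatrix} \right)
\]

- Model allows for continuous heterogeneity in callback rates and discrimination severity, and a mass point at $\beta_j = 0$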
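
The callback model can be simulated in a few lines. The sketch below is illustrative only: the function names are made up for this example, and the parameter values in the demo are placeholders, not the estimates from the slides (the estimates table did not survive extraction). Following the matrix above, `rho` is treated as the off-diagonal covariance term.

```python
import numpy as np


def logistic(z):
    """Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def draw_job_types(n_jobs, alpha0, beta0, sigma_alpha, sigma_beta, rho, seed=0):
    """Draw (alpha_j, beta_j) with beta_j = max(0, beta_tilde_j)."""
    rng = np.random.default_rng(seed)
    mean = np.array([alpha0, beta0])
    cov = np.array([[sigma_alpha**2, rho],
                    [rho, sigma_beta**2]])
    draws = rng.multivariate_normal(mean, cov, size=n_jobs)
    alpha = draws[:, 0]
    beta = np.maximum(0.0, draws[:, 1])  # censoring creates the mass point at 0
    return alpha, beta


def callback_prob(alpha, beta, is_black, x_gamma=0.0):
    """p_jr(x) = Lambda(alpha_j - beta_j * 1{r = b} + x'gamma)."""
    return logistic(alpha - beta * is_black + x_gamma)


# Placeholder parameter values, chosen only to make the demo run.
alpha, beta = draw_job_types(10_000, alpha0=-3.0, beta0=-2.0,
                             sigma_alpha=2.0, sigma_beta=3.0, rho=0.0)
p_white = callback_prob(alpha, beta, is_black=0)
p_black = callback_prob(alpha, beta, is_black=1)
print("share of jobs with beta_j = 0:", np.mean(beta == 0.0))
print("mean white/black callback rate:", p_white.mean(), p_black.mean())
```

Because $\tilde{\beta}_j$ is censored at zero, a positive share of simulated jobs has $\beta_j = 0$ exactly, and black callback probabilities never exceed white ones for the same job.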


Fitting the model - Nunley et al. (2015) data

- We estimate the model using data from the audit experiment of Nunley et al. (2015) (NPRS)

- The NPRS experiment submitted fictitious applications with racially distinctive names to 2,305 entry-level jobs for college graduates in the US

- 4 applications per job, typically 2 white and 2 black

- View this as a pilot study, e.g. commissioned by the EEOC


Maximum Simulated Likelihood estimates


No correlation between white CB and discrimination severity


Most jobs don’t call anyone

$\Pr(p_{jw} < 0.01) \approx 0.53$


Severe discrimination among a minority of jobs

$\Pr(\beta_j = 0) \approx 0.79$, $E[\beta_j \mid \beta_j > 0] \approx 3.6$


The auditor’s problem

- Consider an auditor who knows the parameters of the model

- The auditor's goal is to find discriminators by sending additional fictitious apps

- Can send up to 8 apps per job

- Simplify to two quality levels $q \in \{h, l\}$, corresponding to $x'\gamma$ one SD above and below its mean

- At every step, based on the observed callbacks, the auditor can decide to send another application, initiate an investigation, or give up


The auditor’s problem

- $H_n$ is the auditing history after sending $n$ apps. It includes counts of apps and callbacks by race and quality

- For example:

\[
H_4 = \begin{cases}
\text{sent: } (W_l, B_l, W_h, B_h) = (1, 0, 2, 1) \\
\text{CB: } (W_l, B_l, W_h, B_h) = (0, 0, 2, 0)
\end{cases}
\]
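
One possible way to represent such a history in code is an immutable pair of count tuples. The class and method names below are hypothetical, chosen for this sketch; the cell order (W_l, B_l, W_h, B_h) follows the slide.

```python
from dataclasses import dataclass

# Cell order follows the slide: white/black crossed with low/high quality.
CELLS = ("W_l", "B_l", "W_h", "B_h")


@dataclass(frozen=True)
class History:
    """Counts of apps sent and callbacks received, by race-quality cell."""
    sent: tuple = (0, 0, 0, 0)
    callbacks: tuple = (0, 0, 0, 0)

    @property
    def n(self):
        """Total number of apps sent so far."""
        return sum(self.sent)

    def update(self, cell, called_back):
        """Return the history after sending one more app in `cell`."""
        k = CELLS.index(cell)
        sent = tuple(s + (i == k) for i, s in enumerate(self.sent))
        cbs = tuple(c + (i == k and called_back)
                    for i, c in enumerate(self.callbacks))
        return History(sent, cbs)


# Rebuild the H_4 example: one low-quality white app (no callback),
# two high-quality white apps (both called back), one high-quality
# black app (no callback).
h = History()
h = h.update("W_l", False)
h = h.update("W_h", True)
h = h.update("W_h", True)
h = h.update("B_h", False)
print(h.n, h.sent, h.callbacks)
```

Because callbacks are iid Bernoulli trials within a cell, the counts are all that matters; the order in which apps were sent carries no extra information about the job's type.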


The auditor’s payoff

- Once an investigation is initiated, the job's true type is revealed, yielding payoff:

\[
\underbrace{\frac{1}{2} \sum_{q \in \{h, l\}} \left[ p_{jw}(q) - p_{jb}(q) \right]}_{\equiv S_j} - \kappa,
\]

  where $S_j$ is the severity of discrimination, $\kappa$ is the cost of investigation, and $q \in \{h, l\}$ indexes quality

- The auditor cares about the expected number of black callbacks lost relative to white applicants
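
As a rough numerical illustration of this payoff, the sketch below computes $S_j$ and the investigation payoff for a single job. It assumes that $x'\gamma$ is centered, so that the two quality levels enter as `+q_shift` and `-q_shift`; the value of `q_shift` and the function names are placeholders, while κ = .13 is the value used in the simulation slides.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def severity(alpha, beta, q_shift):
    """S_j: average white-minus-black callback gap over the two quality levels."""
    total = 0.0
    for x_gamma in (+q_shift, -q_shift):  # high and low quality (assumed centered)
        p_white = logistic(alpha + x_gamma)
        p_black = logistic(alpha - beta + x_gamma)
        total += p_white - p_black
    return 0.5 * total


def investigation_payoff(alpha, beta, q_shift, kappa=0.13):
    """Payoff from investigating a job whose true type is (alpha, beta)."""
    return severity(alpha, beta, q_shift) - kappa


# A non-discriminating job (beta = 0) costs the auditor kappa.
print(investigation_payoff(alpha=0.0, beta=0.0, q_shift=1.0))
# A job with beta near the estimated E[beta | beta > 0] of about 3.6.
print(investigation_payoff(alpha=0.0, beta=3.6, q_shift=1.0))
```

Investigating a job with $\beta_j = 0$ always yields $-\kappa$, which is why the auditor wants enough evidence before paying the investigation cost.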


The auditor’s value function

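\[
V(H_n) =
\begin{cases}
\max\Big\{ \underbrace{\max_{r,q} v_{rq}(H_n)}_{\text{send new app}},\ \underbrace{v_I(H_n)}_{\text{investigate}},\ 0 \Big\} & \text{if } n < 8, \\[1ex]
\max\Big\{ \underbrace{v_I(H_n)}_{\text{investigate}},\ 0 \Big\} & \text{if } n = 8.
\end{cases}
\]

- Value of sending new app: $v_{rq}(H_n) = -c + E[V(H_{n+1}) \mid H_n]$

- Value of investigation: $v_I(H_n) = E[S_j \mid H_n] - \kappa$

- Expectations are evaluated via Bayes' rule starting with the population distribution as prior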
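
The recursion above can be sketched with memoized backward induction. This is not the authors' implementation: it approximates the population prior with a finite set of simulated job types (an assumption made here to keep Bayes' rule simple), and the prior draws and `q_shift` in the demo are placeholders. The defaults κ = .13 and c = 10^{-4} are the values from the simulation slides.

```python
from functools import lru_cache

import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_solver(alpha, beta, q_shift, kappa=0.13, c=1e-4, n_max=8):
    """Return value(sent, cb) -> (V(H_n), best action) for a simulated prior."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Callback probability of each simulated type in each cell,
    # ordered (W_l, B_l, W_h, B_h) as in the slides.
    p = np.stack([
        logistic(alpha - q_shift),         # white, low quality
        logistic(alpha - beta - q_shift),  # black, low quality
        logistic(alpha + q_shift),         # white, high quality
        logistic(alpha - beta + q_shift),  # black, high quality
    ])
    severity = 0.5 * ((p[0] - p[1]) + (p[2] - p[3]))  # S_j for each type

    def posterior(sent, cb):
        # Prior draws are equally weighted, so posterior weights are
        # proportional to the likelihood of the observed history.
        like = np.ones_like(alpha)
        for k in range(4):
            like *= p[k] ** cb[k] * (1.0 - p[k]) ** (sent[k] - cb[k])
        return like / like.sum()

    @lru_cache(maxsize=None)
    def value(sent, cb):
        w = posterior(sent, cb)
        v_investigate = float(w @ severity) - kappa
        best, action = max((v_investigate, "investigate"), (0.0, "give up"))
        if sum(sent) < n_max:
            for k in range(4):
                pk = float(w @ p[k])  # predictive callback probability
                s_next = tuple(s + (i == k) for i, s in enumerate(sent))
                cb_yes = tuple(x + (i == k) for i, x in enumerate(cb))
                v_send = (-c + pk * value(s_next, cb_yes)[0]
                          + (1.0 - pk) * value(s_next, cb)[0])
                if v_send > best:
                    best, action = v_send, f"send cell {k}"
        return best, action

    return value


# Demo with placeholder prior draws and a short horizon to keep it fast.
rng = np.random.default_rng(0)
alpha = rng.normal(-3.0, 2.0, size=500)
beta = np.maximum(0.0, rng.normal(-2.0, 3.0, size=500))
solve = make_solver(alpha, beta, q_shift=0.5, n_max=4)
empty = (0, 0, 0, 0)
print(solve(empty, empty))
```

Memoizing on the count tuples means each distinct history is solved once. The number of histories still grows quickly with the horizon and the number of cells, which is the computational cost the Discussion slide mentions.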

11 / 20

Page 20: Adaptive Correspondence Experiments · 2021. 2. 10. · Adaptive Correspondence Experiments Hadar Avivi,1 Patrick Kline,1 Evan Rose2 and Christopher Walters1 1UC Berkeley 2Microsoft

Simulation Results


Expected value and optimal strategy after sending one application (κ = .13, c = 10^{-4})

[Figure: value of each action (give up, investigate, send LQ white, send LQ black, send HQ white, send HQ black) for the histories Sent: (0,0,1,0), CB: (0,0,1,0) and Sent: (0,0,1,0), CB: (0,0,0,0). x-axis: Histories; y-axis: Value.]


Expected value and optimal strategy after sending three applications (κ = .13, c = 10^{-4})

About 72% of jobs have history (0, 0, 3, 0) and no CBs. If the number of jobs is 100, the auditor saves 0.72 × 5 × 100 = 360 apps on average


Expected value and optimal strategy after sending four applications (κ = .13, c = 10^{-4})

About 12% of jobs have one of the last two histories. If the number of jobs is 100, the auditor saves 0.12 × 4 × 100 = 48 apps on average
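
The savings figures on these slides follow a simple pattern, sketched below. The reading that the factors 5 and 4 are the apps remaining out of the maximum of 8 is an inference from the slides, and the function name is made up for this example.

```python
def expected_apps_saved(share_stopped, apps_sent, n_jobs, max_apps=8):
    """Apps saved when a share of jobs is dropped after `apps_sent` apps.

    Assumes the benchmark is a design that would otherwise send
    `max_apps` apps to every job.
    """
    remaining = max_apps - apps_sent
    return share_stopped * remaining * n_jobs


# Giving up after three apps with no callbacks (about 72% of jobs).
print(expected_apps_saved(0.72, apps_sent=3, n_jobs=100))
# Giving up after four apps (about 12% of jobs).
print(expected_apps_saved(0.12, apps_sent=4, n_jobs=100))
```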


Apps sent vs. sensitivity (investigation probability fixed ∈ [.055, .06])


Apps sent vs. specificity (sensitivity fixed ∈ [.14, .145])


Adaptive auditing catches the worst discriminators

κ = .13, c = 10^{-4}


Discussion

- Adaptive correspondence experiments have the potential to detect discrimination more efficiently than static experiments
  - Substantial reduction in the number of apps sent
  - Achieve the same levels of sensitivity and specificity

- These methods can contribute to other settings (e.g. criminal justice, healthcare, policing, and education) to detect discrimination efficiently

- Potential drawbacks:
  - Requires full knowledge of the distribution of callbacks (pilot study)
  - Assumes stable callback parameters
  - Dynamic programming is computationally expensive, especially as the dimension of the action space grows

- Potential extensions based on reinforcement learning, e.g. Kasy and Sautmann (forthcoming)


Thank You!


Expected value and optimal strategy after sending one app (κ = .13, c = 10^{-4})

Expected value and optimal strategy after sending two apps (κ = .13, c = 10^{-4})

Expected value and optimal strategy after sending three apps (κ = .13, c = 10^{-4})

Expected value and optimal strategy after sending four apps (κ = .13, c = 10^{-4})

Expected value and optimal strategy after sending five apps (κ = .13, c = 10^{-4})

Expected value and optimal strategy after sending six apps (κ = .13, c = 10^{-4})

Expected value and optimal strategy after sending seven apps (κ = .13, c = 10^{-4})

Expected value after sending eight apps (κ = .13, c = 10^{-4})


References I

Alsan, M., Garrick, O., and Graziani, G. (2019). Does diversity matter for health? Experimental evidence from Oakland. American Economic Review, 109(12):4071–4111.

Arnold, D., Dobbie, W. S., and Hull, P. (2020). Measuring racial discrimination in bail decisions. Technical report, National Bureau of Economic Research.

Chakraborty, B. and Murphy, S. A. (2014). Dynamic treatment regimes. Annual Review of Statistics and Its Application, 1:447–464.

Kasy, M. and Sautmann, A. (forthcoming). Adaptive treatment assignment in experiments for policy choice. Econometrica.

Kline, P. M. and Walters, C. R. (forthcoming). Reasonable doubt: Experimental detection of job-level employment discrimination. Econometrica.

Nunley, J. M., Pugh, A., Romero, N., and Seals, R. A. (2015). Racial discrimination in the labor market for recent college graduates: Evidence from a field experiment. The BE Journal of Economic Analysis & Policy, 15(3):1093–1125.

Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447–453.


References II

Rose, E. K. (forthcoming). Who gets a second chance? Effectiveness and equity in supervision of criminal offenders. Technical report.

Tabord-Meehan, M. (2020). Stratification trees for adaptive randomization in randomized controlled trials.
