
Auditing search engines for differential satisfaction across demographics

Date post: 15-Apr-2017
Upload: amit-sharma
Auditing Search Engines for Differential Satisfaction across Demographics Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz Microsoft Research New York
Transcript
Page 1: Auditing search engines for differential satisfaction across demographics

Auditing Search Engines for Differential Satisfaction across Demographics

Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz

Microsoft Research New York

Page 2: Auditing search engines for differential satisfaction across demographics

From public libraries to search engines

Page 3: Auditing search engines for differential satisfaction across demographics

Motivation for auditing

• Ethical
  • Equal access to everyone

• Practical
  • Equal access helps attract a large and diverse population of users
  • Service providers are scrutinized for seemingly unfair behavior [1,2,3]

• We offer methods for auditing a system’s performance for detection of differences in user satisfaction across demographics

[1] N. Diakopoulos. Algorithmic accountability. Digital Journalism, 3(3):398–415, 2015.
[2] S. Barocas and A. D. Selbst. Big data’s disparate impact. California Law Review, 104, 2016.
[3] C. Munoz, M. Smith, and D. Patel. Big data: A report on algorithmic systems, opportunity, and civil rights. Technical report, Executive Office of the President of the United States, May 2016.

Page 4: Auditing search engines for differential satisfaction across demographics

Tricky: straightforward optimization can lead to differential performance

• Search engine uses a standard metric: time spent on the clicked result page as an indicator of satisfaction.

• Goal: estimate the difference in user satisfaction between two demographic groups (older and younger users).

• Suppose older users issue more “retirement planning” queries.

[Figure: two user groups, “Age: >50 years” and “Age: <30 years”, annotated “80% users” and “10% users”]

Page 5: Auditing search engines for differential satisfaction across demographics

1. Overall metrics can hide differential satisfaction

• Average user satisfaction for “retirement planning” may be high.

But,
• Average satisfaction for younger users = 0.7
• Average satisfaction for older users = 0.2
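A small worked example of how an overall average can mask a gap between groups. The satisfaction values 0.7 and 0.2 come from the slide; the impression counts are hypothetical and chosen only for illustration.

```python
# Per-group average satisfaction is taken from the slide.
# The impression counts are hypothetical (not from the talk).
groups = {
    "younger": {"impressions": 900, "avg_satisfaction": 0.7},
    "older": {"impressions": 100, "avg_satisfaction": 0.2},
}


def overall_average(groups):
    """Impression-weighted mean satisfaction across all groups."""
    total = sum(g["impressions"] for g in groups.values())
    weighted = sum(g["impressions"] * g["avg_satisfaction"] for g in groups.values())
    return weighted / total


overall = overall_average(groups)
gap = groups["younger"]["avg_satisfaction"] - groups["older"]["avg_satisfaction"]
print(f"overall={overall:.2f}, gap between groups={gap:.2f}")
```

Under these assumed counts the overall figure stays close to the majority group's value, even though the two groups differ by half a point.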

Page 6: Auditing search engines for differential satisfaction across demographics

2. Query-level metrics can hide differential satisfaction

[Figure: query mixes for two panels, “Younger users” and “Older users”; each panel shows a mix of “retirement planning” and other queries (<query>)]

• Same user satisfaction for “retirement planning” for both older and younger users = 0.7

• What if average satisfaction for <query> = 0.9?

• Older users are still receiving more lower-quality results than younger users.
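The point of this slide can be checked with a little arithmetic. The per-query satisfaction values (0.7 and 0.9) are from the slide; the query mixes per group are hypothetical, assuming only that older users issue proportionally more “retirement planning” queries.

```python
# Satisfaction per query is identical for both groups (values from the slide).
satisfaction = {"retirement planning": 0.7, "<query>": 0.9}

# Hypothetical query mixes: share of each group's impressions per query.
query_mix = {
    "younger": {"retirement planning": 0.2, "<query>": 0.8},
    "older": {"retirement planning": 0.8, "<query>": 0.2},
}


def expected_satisfaction(mix, satisfaction):
    """Average satisfaction a group experiences, weighted by its query mix."""
    return sum(share * satisfaction[query] for query, share in mix.items())


per_group = {
    group: expected_satisfaction(mix, satisfaction)
    for group, mix in query_mix.items()
}
print(per_group)
```

Although every query has the same satisfaction for both groups, the group that issues more of the lower-scoring query ends up with the lower average.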

Page 7: Auditing search engines for differential satisfaction across demographics

3. More critically, even individual-level metrics can hide differential satisfaction

Reading time for the same webpage result can differ across age groups, even at the same level of user satisfaction.

[Figure: time spent on a webpage, younger users vs. older users]

Page 8: Auditing search engines for differential satisfaction across demographics

We must control for natural demographic variation to meaningfully audit for differential satisfaction.

Page 9: Auditing search engines for differential satisfaction across demographics

Data: Demographic characteristics of search engine users

• Internal logs from Bing.com for two weeks

• 4 M users | 32 M impressions | 17 M sessions

• Demographics: Age & Gender

• Age:
  • Post-Millennial: <18
  • Millennial: 18-34
  • Generation X: 35-54
  • Baby Boomer: 55-74

Page 10: Auditing search engines for differential satisfaction across demographics

Demographic distribution of user activity

[Figure: distribution of user activity by age group]

Page 11: Auditing search engines for differential satisfaction across demographics

Overall metrics across Demographics

Four metrics:
• Graded Utility (GU)
• Reformulation Rate (RR)
• Successful Click Count (SCC)
• Page Click Count (PCC)

Page 12: Auditing search engines for differential satisfaction across demographics

Pitfalls with Overall Metrics

• Conflate two separate effects:
  • Natural demographic variation caused by the differing traits among the different demographic groups, e.g.
    • Different queries issued
    • Different information need for the same query
    • Even for the same satisfaction, demographic A tends to click more than demographic B
  • Systemic difference in user satisfaction due to the search engine

Page 13: Auditing search engines for differential satisfaction across demographics

Utilize work from causal inference

[Diagram: causal graph with nodes Demographics, Information Need, Query, Search Results, User satisfaction, Metric]

Page 14: Auditing search engines for differential satisfaction across demographics

I. Context Matching: selecting for activity with near-identical context

[Diagram: causal graph with nodes Demographics, Information Need, Query, Search Results, User satisfaction, Metric; Context highlighted]

Page 15: Auditing search engines for differential satisfaction across demographics

[Diagram: causal graph with nodes Demographics, Information Need, Query, Search Results, User satisfaction, Metric; Context highlighted]

For any two users from different demographics:
1. Same query
2. Same information need:
   1. Control for user intent: same final SAT click
   2. Only consider navigational queries
3. Identical top-8 search results

1.2 M impressions, 19K unique queries, 617K users
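The matching criteria above can be sketched as a grouping step. This is a minimal illustration, not the authors' pipeline: the record fields (`query`, `is_navigational`, `final_sat_click`, `results`, `age_group`, `metric`) and the sample rows are hypothetical.

```python
from collections import defaultdict


def match_contexts(impressions):
    """Group impressions by identical context; keep contexts seen by 2+ age groups.

    Context = (query, final SAT click, top-8 result list), navigational queries only.
    """
    by_context = defaultdict(lambda: defaultdict(list))
    for imp in impressions:
        if not imp["is_navigational"]:
            continue
        context = (imp["query"], imp["final_sat_click"], tuple(imp["results"][:8]))
        by_context[context][imp["age_group"]].append(imp["metric"])
    return {c: dict(g) for c, g in by_context.items() if len(g) >= 2}


def matched_difference(matched, group_a, group_b):
    """Mean within-context difference in the metric (group_a minus group_b)."""
    diffs = [
        sum(g[group_a]) / len(g[group_a]) - sum(g[group_b]) / len(g[group_b])
        for g in matched.values()
        if group_a in g and group_b in g
    ]
    return sum(diffs) / len(diffs) if diffs else None


# Hypothetical sample log rows.
impressions = [
    {"query": "facebook", "is_navigational": True, "final_sat_click": "facebook.com",
     "results": ["facebook.com", "fb.com"], "age_group": "18-34", "metric": 1.0},
    {"query": "facebook", "is_navigational": True, "final_sat_click": "facebook.com",
     "results": ["facebook.com", "fb.com"], "age_group": "55-74", "metric": 0.8},
    {"query": "retirement planning", "is_navigational": False, "final_sat_click": "a.example",
     "results": ["a.example"], "age_group": "55-74", "metric": 0.5},
    {"query": "bing maps", "is_navigational": True, "final_sat_click": "bing.com/maps",
     "results": ["bing.com/maps"], "age_group": "18-34", "metric": 0.9},
]

matched = match_contexts(impressions)
diff = matched_difference(matched, "18-34", "55-74")
print(len(matched), diff)
```

Differences are averaged within each matched context rather than pooled, so a context that is more common in one group does not dominate the comparison. Only one of the three sample queries survives matching, which mirrors the low coverage discussed on the next slide.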

Page 16: Auditing search engines for differential satisfaction across demographics

Age-wise differences in metrics disappear

• General auditing tool: robust

• Very low coverage across queries
  • Did we control for too much?

Page 17: Auditing search engines for differential satisfaction across demographics

II. Query-level hierarchical model: Differential satisfaction for the same query

[Diagram: causal graph with nodes Demographics, Information Need, Query, Search Results, User satisfaction, Metric]

Page 18: Auditing search engines for differential satisfaction across demographics

• Simply fitting different models for each query will not work for less popular queries.

• We formulate a hierarchical model that borrows strength from more popular queries:
  • Consider the metric for each impression (query and user) as a deviation from the overall metric based on:
    • Query topic
    • User demographics
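One way to picture "deviation from the overall metric" with borrowed strength is an additive model with shrinkage. The sketch below is not the model from the talk: it swaps in a simple ridge-penalized additive fit (alternating updates), and the `shrinkage` value, topic names, and data are all hypothetical.

```python
from collections import defaultdict


def fit_additive_effects(rows, shrinkage=5.0, n_iters=50):
    """Fit metric ~ overall + topic effect + demographic effect with shrinkage.

    rows: list of (topic, demographic, metric) tuples.
    Each effect is a ridge-penalized deviation from the overall mean, so sparse
    topics or groups are pulled toward zero (that is, toward the overall metric).
    """
    overall = sum(y for _, _, y in rows) / len(rows)
    topic_eff = defaultdict(float)
    demo_eff = defaultdict(float)
    for _ in range(n_iters):
        # Update topic effects while holding demographic effects fixed.
        sums, counts = defaultdict(float), defaultdict(int)
        for t, d, y in rows:
            sums[t] += y - overall - demo_eff[d]
            counts[t] += 1
        for t in sums:
            topic_eff[t] = sums[t] / (counts[t] + shrinkage)
        # Update demographic effects while holding topic effects fixed.
        sums, counts = defaultdict(float), defaultdict(int)
        for t, d, y in rows:
            sums[d] += y - overall - topic_eff[t]
            counts[d] += 1
        for d in sums:
            demo_eff[d] = sums[d] / (counts[d] + shrinkage)
    return overall, dict(topic_eff), dict(demo_eff)


# Hypothetical data: (topic, age group, metric); one cell has only two impressions.
rows = (
    [("news", "younger", 0.8)] * 20
    + [("news", "older", 0.9)] * 20
    + [("finance", "younger", 0.5)] * 2
    + [("finance", "older", 0.6)] * 20
)
overall, topic_eff, demo_eff = fit_additive_effects(rows)
print(overall, topic_eff, demo_eff)
```

The demographic effect is estimated from all topics at once, so a rarely seen topic and group combination still gets a usable estimate instead of relying on its own two impressions.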

Page 19: Auditing search engines for differential satisfaction across demographics

Age-wise differences appear again: bigger differences for harder queries

Page 20: Auditing search engines for differential satisfaction across demographics

III. Query-level pairwise model: Estimating satisfaction directly by considering pairs of users

[Diagram: causal graph with nodes Demographics, Information Need, Query, Search Results, User satisfaction, Metric]

Page 21: Auditing search engines for differential satisfaction across demographics

Estimating absolute satisfaction is non-trivial

• Instead, estimate relative satisfaction by considering pairs of users for the same query
• Conservative proxy for pairwise satisfaction by only considering “big” differences in observed metric for the same query
• Logistic regression model for estimating probability of impression i being more satisfied than impression j:
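The slide's equation is not reproduced in this transcript, so the following is only a generic sketch of the pairwise idea, not the authors' model. It assumes a single feature (whether impression i comes from an older user and impression j from a younger one), a hypothetical `min_gap` threshold for a "big" difference, and a one-weight logistic regression fitted by gradient descent.

```python
import math
from itertools import combinations


def build_pairs(impressions, min_gap=0.3):
    """Pairs of impressions for the same query whose metric differs by a big margin.

    Returns (x, label) tuples: x is +1 if i is older and j younger, -1 for the
    reverse, 0 if both are in the same group; label is 1 if i has the higher metric.
    """
    pairs = []
    for a, b in combinations(impressions, 2):
        if a["query"] != b["query"] or abs(a["metric"] - b["metric"]) < min_gap:
            continue
        x = (a["age_group"] == "older") - (b["age_group"] == "older")
        pairs.append((x, 1 if a["metric"] > b["metric"] else 0))
    return pairs


def fit_logistic(pairs, lr=0.5, n_iters=500):
    """One-weight logistic regression: P(i more satisfied than j) = sigmoid(w * x)."""
    w = 0.0
    for _ in range(n_iters):
        grad = sum((1 / (1 + math.exp(-w * x)) - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad
    return w


# Hypothetical impressions.
impressions = [
    {"query": "q1", "age_group": "older", "metric": 0.9},
    {"query": "q1", "age_group": "younger", "metric": 0.4},
    {"query": "q1", "age_group": "younger", "metric": 0.95},
    {"query": "q2", "age_group": "older", "metric": 0.8},
    {"query": "q2", "age_group": "younger", "metric": 0.3},
]

pairs = build_pairs(impressions)
w = fit_logistic(pairs)
print(pairs, w)
```

A positive weight means that, among the retained pairs, the older user's impression tends to be the more satisfied one. Dropping pairs with small metric gaps is what makes the proxy conservative: near-ties, which may reflect noise rather than satisfaction, never become training labels.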

Page 22: Auditing search engines for differential satisfaction across demographics

Again, we see a small age-wise difference in satisfaction

Page 23: Auditing search engines for differential satisfaction across demographics

Discussion

• Auditing is more nuanced than merely measuring metrics on demographically-binned traffic.

• We find a slight trend toward older users being more satisfied.

• General framework for auditing systems
  • Plug in different metrics
  • Plug in different demographics/user groups

• Suggests recalibration of metrics based on demographics

Page 24: Auditing search engines for differential satisfaction across demographics

Thank You!

Amit Sharma
Postdoctoral Researcher
http://www.amitsharma.in
@[email protected]

• Auditing is more nuanced than merely measuring metrics on demographically-binned traffic.

• General framework for auditing systems
  • Plug-in different metrics
  • Plug-in different demographics/user groups

Paper: http://datworkshop.org/papers/dat16-final41.pdf

