The Bigot in the Machine: Data Bias - Insurance Ireland · Accessibility - video description to the...

Date post: 10-Jul-2020
The Bigot in the Machine: Data Bias Prof Alan F. Smeaton Dublin City University [email protected]
Transcript
Page 1: The Bigot in the Machine: Data Bias - Insurance Ireland · Accessibility - video description to the blind 15. Insight and Adapt collaboration 16 LSTM Sequence to Sequence - Video

The Bigot in the Machine: Data Bias

Prof Alan F. Smeaton
Dublin City University
[email protected]

Insight in 4 years

• 450+ Academics, Researchers, Engineers & PhDs

• 300+ Research Awards won over 4 years

• 83+ Industry Funding Partners

• 60+ H2020 consortia

• 580+ collaborations in 40 Countries

• 8 Spinouts

• 57 License agreements

• Built on 14 years of research in Data Analytics and AI

• 4 Co-Lead Universities

• 1,395+ Scientific Papers

• 100+ PhD/Masters graduated, 250+ by 2019

• 4 Taught Masters Programs, 330 Graduates per year

Funding Partners

Multi-nationals (Examples)

Irish & SME (Examples)

Impact Assessment Matrix with Industry

* Based on Insight’s Top 40 Companies

Broadcast

Security Autonomy

Archives

Simple message … data bias can hurt you !

But before that … computer vision … the home from which deep learning grew

• For years, we were happy

• Image tagging was our objective

And in 2012, this happened !

• Krizhevsky, Sutskever and Hinton at U Toronto “won” the ImageNet Large Scale Visual Recognition Challenge with a “convolutional neural network” (deep learning)

• 6 months later, they all work at Google

From John’s presentation …

Neural networks are much more complex …

Neural Nets can have many layers

Neural Nets can vary layer dimensions

Configuring Neural Networks …

• … while there are platforms like GPUs and TensorFlow, optimising hyperparameters is a black art !

• Let's look at a contemporary computer vision example ... automatically captioning video

• Many real world applications

✓ Video summarisation

✓ Supporting search and browsing

✓ Accessibility - video description to the blind

Insight and Adapt collaboration

Sequence to Sequence - Video to Text (S2VT)

[Diagram labels: CNN, LSTM, “Video caption”]

• Required training on 00,000’s of video-caption pairs – never enough !

• Video features generated with a CNN, passed to a 2x LSTM stack

• LSTMs encode the features and decode into natural language descriptions

Some Insight – Adapt automatic captions …

#990: a baseball player holding a bat on a field

#1599: a white cat sitting on top of a table

#603: a green truck is parked on a street

#1695: a person riding a bike down a street

How does it perform ?

• Took part in an evaluation benchmark – dozens of groups worldwide – run by US National Institute of Standards and Technology

• Human assessment scores [0..100] for each caption from each group on each video – micro-averaged per caption, then averaged overall, and standardised for variation in each human assessor’s mean and standard deviation

• Human captions are c.85% satisfactory, ours are c.50% satisfactory

• For many videos ours are as good as human, for others we’re poor – why ?

1. Our training data has bias – not broad enough to cover all test videos

2. Some of the videos are really difficult to caption anyway
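
A minimal sketch of the scoring described above. The tuple layout, the assessor and system names, and the choice to standardise each assessor's scores before averaging are assumptions for illustration; the benchmark's actual procedure may differ, and the ratings are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up ratings: (assessor, system, video, score in 0..100).
# Assessor "A" uses a wide range of scores, assessor "B" a narrow one.
ratings = [
    ("A", "human", "v1", 80), ("A", "ours", "v1", 40),
    ("B", "human", "v1", 55), ("B", "ours", "v1", 45),
]

def standardise_by_assessor(ratings):
    """Turn each raw score into a z-score using that assessor's own mean and std deviation."""
    by_assessor = defaultdict(list)
    for assessor, _system, _video, score in ratings:
        by_assessor[assessor].append(score)
    stats = {a: (mean(s), pstdev(s)) for a, s in by_assessor.items()}
    standardised = []
    for assessor, system, video, score in ratings:
        mu, sigma = stats[assessor]
        # An assessor who gives every caption the same score carries no signal.
        z = 0.0 if sigma == 0 else (score - mu) / sigma
        standardised.append((system, video, z))
    return standardised

def system_averages(standardised):
    """Average z-scores per caption (system, video), then average the captions per system."""
    per_caption = defaultdict(list)
    for system, video, z in standardised:
        per_caption[(system, video)].append(z)
    per_system = defaultdict(list)
    for (system, _video), zs in per_caption.items():
        per_system[system].append(mean(zs))
    return {system: mean(values) for system, values in per_system.items()}

scores = system_averages(standardise_by_assessor(ratings))
print(scores)
```

In this toy data the two assessors differ a lot in raw scores (80 vs 55 for the same caption), but after standardisation both place the human caption one standard deviation above their own mean and ours one below, so their judgements can be averaged on a common scale.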

Bias in Training Video Dataset

• We have many videos of men playing soccer

• … all manually captioned accordingly, used as training data, but ...

... or ...
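
One rough way to spot this kind of skew before training is to count who appears in the captions for a given activity. The sketch below assumes captions are plain strings; the example captions and the word-to-group table are invented for illustration and are not taken from the training set used in the talk.

```python
from collections import Counter

# Invented captions, for illustration only.
captions = [
    "a man kicking a soccer ball on a field",
    "two men playing soccer on a field",
    "a man running with a soccer ball",
    "a group of men playing soccer",
    "a woman kicking a soccer ball",
]

# Hypothetical lookup from subject words to a coarse group label.
SUBJECT_GROUPS = {
    "man": "male", "men": "male", "boy": "male", "boys": "male",
    "woman": "female", "women": "female", "girl": "female", "girls": "female",
}

def subject_counts(captions, keyword):
    """Count captions mentioning `keyword`, broken down by subject group."""
    counts = Counter()
    for caption in captions:
        words = caption.lower().split()
        if keyword not in words:
            continue
        # Count each group at most once per caption.
        for group in {SUBJECT_GROUPS[w] for w in words if w in SUBJECT_GROUPS}:
            counts[group] += 1
    return counts

counts = subject_counts(captions, "soccer")
print(counts)
```

A heavily lopsided count for one activity suggests the model may learn to attach that activity to one group regardless of what the video actually shows.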

Easy, and difficult, videos to caption

#1002: a woman sitting in a chair with a laptop

#1457: a woman wearing a pink shirt and tie

#1249: a man holding a fork and a cat

#1734: a man in a suit and tie standing at a table

Observations

• Video captioning is hard + need large, diverse set of video-caption pairs, with good coverage, no bias

• History of machine learning has many examples of data bias

  • 2015 Google Photos app tagged two African American users as gorillas

  • 2015 Google AdWords shown to advertise more lower-paid jobs to women and minorities

  • 2016 Google Image search for CEOs found almost all men

  • 2017 Russian developers of FaceApp to transform (beautify ?) faces in photos, automatically lightened skin tones

TRECVID 2017

Takeaway Message …

• Importantly, we use ML to inform decisions based on large datasets in personal finance, healthcare, job applications, legal system

• ML exacerbates disparities baked into data sets; algorithmic bias results from using machine learning even where no discrimination is intended - there is a bigot in the machine.

• But volume is your friend: when there’s enough data volume, biases disappear … or so we thought !

• Data-driven approaches assume all data points are created equally; they are not

• Mistaken belief that correlation between data sets equals causation

• University of Edinburgh study found significant correlation linking higher chocolate consumption per capita to serial killer activity per capita
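
The correlation point can be illustrated with a small simulation. This is a sketch on synthetic data, not the data from the study mentioned above: two quantities `a` and `b` never influence each other, but both depend on a hidden third factor, here called `confounder` (a hypothetical name).

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: a and b are each "confounder plus independent noise".
confounder = [rng.uniform(0, 10) for _ in range(500)]
noise_a = [rng.gauss(0, 1) for _ in range(500)]
noise_b = [rng.gauss(0, 1) for _ in range(500)]
a = [z + e for z, e in zip(confounder, noise_a)]
b = [z + e for z, e in zip(confounder, noise_b)]

r_raw = pearson(a, b)                # strong, driven entirely by the confounder
r_noise = pearson(noise_a, noise_b)  # what is left once the confounder is removed

print(f"correlation of a and b: {r_raw:.2f}")
print(f"correlation with the confounder removed: {r_noise:.2f}")
```

The first figure comes out high even though neither variable causes the other; once the shared factor is taken away, the relationship largely vanishes.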

Insight and Aviva

• Several projects, including propensity modelling, multi-policy pricing models, CLV modelling … all using machine learning, some deep learning ...

• … all done with real customer data, in Aviva

• We have to stop and ask ourselves ...

• What baked-in biases exist in this data we’re aware of ?

• What baked-in biases exist in this data we’re not aware of ?

• Do those biases matter to the application using the data ?
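
A first, crude check for the biases we are aware of is to compare outcome rates across groups in the data. The sketch below is an assumption-laden illustration: the records, the group labels "A" and "B", and the outcome flag are all hypothetical and have nothing to do with real customer data.

```python
from collections import defaultdict

# Hypothetical records: (group, received_favourable_outcome).
records = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

def rate_by_group(records):
    """Fraction of favourable outcomes within each group."""
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        favourable[group] += int(outcome)
    return {group: favourable[group] / totals[group] for group in totals}

rates = rate_by_group(records)
gap = max(rates.values()) - min(rates.values())

print(rates)
print(f"largest gap between groups: {gap:.2f}")
```

A large gap does not prove unfair treatment on its own, but it flags where to ask the third question: does this difference matter to the application using the data?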

Thank You

[email protected]
