450+ Academics, Researchers,
Engineers & PhDs
300+ Research Awards won over 4 years
83+ Industry Funding Partners
60+ H2020 consortia, 580+
collaborations in 40 countries
8 Spinouts
57 License agreements
Built on 14 years of research in
Data Analytics and AI
4 Co-Lead Universities
Insight in 4 years
1,395+ Scientific Papers
100+ PhD/Masters
graduated, 250+ by 2019
4 Taught Masters Programs, 330
graduates per year
Funding Partners
Multi-nationals (Examples)
Irish & SME (Examples)
* Based on Insight’s Top 40 Companies
Impact Assessment Matrix with Industry
Broadcast
Security Autonomy
Archives
Simple message … data bias can hurt you!
But before that … computer vision … the home from which deep learning grew
• For years, we were happy
• Image tagging was our objective
And in 2012, this happened!
• Krizhevsky, Sutskever and Hinton at U Toronto “won” the ImageNet Large Scale Visual Recognition Challenge with a “convolutional neural network” (deep learning)
• 6 months later, they were all working at Google
From John’s presentation …
Neural networks are much more complex …
Neural Nets can have many layers
Neural Nets can vary layer dimensions
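To make “many layers” and “varying layer dimensions” concrete, here is a minimal sketch that counts the weights and biases in a plain fully connected network. The layer widths are hypothetical and chosen only for illustration; convolutional and LSTM layers share weights and are counted differently.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first, e.g. [784, 128, 10].
    A layer with n_in inputs and n_out outputs has an n_in x n_out weight
    matrix plus n_out biases.
    """
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input and an output layer")
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Two hypothetical networks with the same input and output sizes.
shallow = [784, 128, 10]           # one hidden layer
deep = [784, 512, 256, 128, 10]    # more layers, with varying widths
print(dense_param_count(shallow))  # 101770
print(dense_param_count(deep))     # 567434
```

Adding layers and widening them multiplies the number of parameters to train, which is one reason deeper networks need more data and compute.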
Configuring Neural Networks …
• … while there are hardware and software platforms like GPUs and TensorFlow, optimising hyperparameters is a black art!
• Let’s look at a contemporary computer vision example: automatically captioning video
• Many real world applications
✓ Video summarisation
✓ Supporting search and browsing
✓ Accessibility: video description for the blind
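One common way to make hyperparameter tuning less of a black art is random search: sample configurations from chosen ranges and keep the best. The sketch below assumes a hypothetical `toy_validation_loss` function that stands in for “train the network and measure validation loss”; the hyperparameter names and ranges are illustrative assumptions, not the settings used in the work described here.

```python
import math
import random


def toy_validation_loss(learning_rate, num_layers, dropout):
    """Hypothetical stand-in for training a model and measuring validation loss.

    A made-up smooth function whose minimum sits near learning_rate=1e-3,
    num_layers=4, dropout=0.3.
    """
    return ((math.log10(learning_rate) + 3) ** 2
            + 0.1 * (num_layers - 4) ** 2
            + (dropout - 0.3) ** 2)


def random_search(objective, n_trials, seed=0):
    """Sample hyperparameters at random and keep the best-scoring configuration."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = {
            # learning rate sampled log-uniformly between 1e-5 and 1e-1
            "learning_rate": 10 ** rng.uniform(-5, -1),
            "num_layers": rng.randint(1, 8),
            "dropout": rng.uniform(0.0, 0.6),
        }
        loss = objective(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best, loss = random_search(toy_validation_loss, n_trials=200)
print(best, round(loss, 4))
```

The learning rate is sampled on a log scale because useful values typically span several orders of magnitude; in a real setting each call to the objective is a full training run, so the trial budget is the main cost.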
Insight and Adapt collaboration
Sequence to Sequence - Video to Text (S2VT)
[Diagram: CNN → LSTM → “Video caption”]
● Requires training on 00,000’s of video-caption pairs – never enough!
● Video features are generated with a CNN and passed to a two-layer LSTM stack
● The LSTMs encode the features and decode them into natural-language descriptions
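The encode-then-decode idea can be illustrated with a framework-free sketch of what each layer of a two-layer LSTM stack sees at each time step in an S2VT-style model. It does not implement LSTMs; it only lays out which layer receives frame features, padding or words. The `<pad>`, `<BOS>` and `<EOS>` tokens, the function name and the dictionary keys are illustrative assumptions.

```python
PAD = "<pad>"  # stands in for the zero vector / null word fed to an idle input
BOS = "<BOS>"  # beginning-of-sentence tag that starts decoding
EOS = "<EOS>"  # end-of-sentence tag the model learns to emit last


def s2vt_schedule(frames, caption):
    """List, per time step, the inputs to the two stacked LSTMs and the
    training target, following an S2VT-style encode-then-decode layout.

    frames  -- per-frame CNN feature vectors (placeholders here)
    caption -- caption words used as the training target
    At every step the second LSTM also receives the first LSTM's hidden
    state; that connection is implied rather than shown.
    """
    steps = []
    # Encoding: the first LSTM reads one frame feature per step; the second
    # LSTM gets a padded word and no output is scored.
    for frame in frames:
        steps.append({"layer1_in": frame, "layer2_word_in": PAD, "target": None})
    # Decoding: the first LSTM gets padding; the second LSTM gets the previous
    # word (the ground-truth word during training, starting with <BOS>) and
    # must predict the next word, finishing with <EOS>.
    for prev_word, target in zip([BOS] + caption, caption + [EOS]):
        steps.append({"layer1_in": PAD, "layer2_word_in": prev_word, "target": target})
    return steps


schedule = s2vt_schedule(["frame1", "frame2", "frame3"], ["a", "cat", "sits"])
for t, step in enumerate(schedule):
    print(t, step)
```

Because the caption is only generated after every frame has been read, the model can describe the clip as a whole rather than frame by frame.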
#990
a baseball player holding a bat on a field
#1599
a white cat sitting on top of a table
#603
a green truck is parked on a street
#1695
a person riding a bike down a street
Some Insight – Adapt automatic captions …
How does it perform?
• We took part in an evaluation benchmark run by the US National Institute of Standards and Technology (NIST), alongside dozens of groups worldwide
• Human assessors scored each caption from each group on each video on a 0–100 scale; scores were standardised using each assessor’s mean and standard deviation to correct for variation across assessors, micro-averaged per caption, and then averaged overall
• Human captions are c. 85% satisfactory; ours are c. 50% satisfactory
• For many videos ours are as good as the human captions, but for others we’re poor. Why?
1. Our training data has bias: it is not broad enough to cover all test videos
2. Some of the videos are really difficult to caption anyway
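The standardised scoring described above can be sketched as follows. This is a simplified illustration, assuming each judgement is an (assessor, system, caption, score) tuple; it is not NIST’s official scoring code and may omit details of the official procedure.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardised_scores(judgements):
    """Average standardised human score per system.

    judgements: iterable of (assessor, system, caption_id, raw_score) with
    raw_score on a 0-100 scale.
    """
    judgements = list(judgements)

    # 1. Each assessor's own mean and standard deviation over everything they scored.
    by_assessor = defaultdict(list)
    for assessor, _, _, score in judgements:
        by_assessor[assessor].append(score)
    stats = {a: (mean(s), pstdev(s)) for a, s in by_assessor.items()}

    # 2. Convert raw scores to z-scores, so harsh and generous assessors become
    #    comparable, and group them per (system, caption).
    by_caption = defaultdict(list)
    for assessor, system, caption_id, score in judgements:
        mu, sigma = stats[assessor]
        z = (score - mu) / sigma if sigma > 0 else 0.0
        by_caption[(system, caption_id)].append(z)

    # 3. Mean z-score per caption, then the mean over all captions of a system.
    per_system = defaultdict(list)
    for (system, _), zs in by_caption.items():
        per_system[system].append(mean(zs))
    return {system: mean(means) for system, means in per_system.items()}


# Made-up judgements from one harsh and one generous assessor.
judgements = [
    ("harsh", "human", "v1", 60), ("harsh", "ours", "v1", 20),
    ("generous", "human", "v2", 100), ("generous", "ours", "v2", 70),
]
print(standardised_scores(judgements))
```

In this toy data the harsh assessor’s 60 and the generous assessor’s 100 both sit one standard deviation above that assessor’s own mean, so after standardisation they count equally.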
Bias in Training Video Dataset
• We have many videos of men playing soccer
• … all manually captioned accordingly and used as training data, but ...
... or ...
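A cheap first check for the kind of skew in the soccer example is to count how often the training captions mention different groups. This is a minimal sketch, assuming hypothetical word lists and made-up captions; it only catches explicit mentions in the caption text, not imbalance in what the videos themselves show.

```python
import re
from collections import Counter

# Hypothetical word groups to audit; a real audit would choose these carefully.
GROUPS = {
    "male": {"man", "men", "boy", "boys", "he", "his"},
    "female": {"woman", "women", "girl", "girls", "she", "her"},
}


def group_mentions(captions, groups=GROUPS):
    """Count how many captions mention each group of words at least once."""
    counts = Counter()
    for caption in captions:
        # Compare whole words, so "woman" is not counted as "man".
        words = set(re.findall(r"[a-z']+", caption.lower()))
        for name, vocab in groups.items():
            if words & vocab:
                counts[name] += 1
    return counts


training_captions = [  # made-up examples, not real training data
    "a man is playing soccer on a field",
    "men are playing soccer",
    "a man kicks a ball",
    "a woman is playing soccer",
]
print(group_mentions(training_captions))
```

Here three soccer captions mention men and only one mentions a woman; a captioner trained on many such pairs is likely to say “a man” for any soccer clip.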
Easy and difficult videos to caption
#1002
a woman sitting in a chair with a laptop
#1457
a woman wearing a pink shirt and tie
#1249
a man holding a fork and a cat
#1734
a man in a suit and tie standing at a table
Observations
• Video captioning is hard and needs a large, diverse set of video-caption pairs with good coverage and no bias
• The history of machine learning has many examples of data bias
• 2015: the Google Photos app tagged two African American users as gorillas
• 2015: Google AdWords was shown to advertise more lower-paid jobs to women and minorities
• 2016: a Google Image search for CEOs found almost all men
• 2017: FaceApp, from Russian developers, transformed (beautified?) faces in photos and automatically lightened skin tones
TRECVID 2017
Takeaway Message …
• Importantly, we use ML to inform decisions based on large datasets in personal finance, healthcare, job applications and the legal system
• ML exacerbates disparities baked into datasets; algorithmic bias results from using machine learning even where no discrimination is intended: there is a bigot in the machine
• But volume is your friend: when there’s enough data, biases disappear … or so we thought!
• Data-driven approaches assume all data points are created equal; they are not
• There is a mistaken belief that correlation between datasets equals causation
• A University of Edinburgh study found a significant correlation linking higher chocolate consumption per capita to serial killer activity per capita
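Two small simulations, both with made-up parameters, illustrate the last points: collecting more data from a biased process converges on the biased answer rather than the true one, and two quantities driven by a shared hidden factor correlate strongly with no causal link between them. Neither models the data from the study mentioned above; the function names and parameters are hypothetical.

```python
import math
import random


def biased_sample_share(n, true_share=0.5, keep_prob_a=1.0, keep_prob_b=0.4, seed=0):
    """Estimate group A's share of a population from a biased collection process.

    Each of n people is in group A with probability true_share. People in
    group A are recorded with probability keep_prob_a, everyone else with
    keep_prob_b, so the other group is under-sampled.
    """
    rng = random.Random(seed)
    kept_a = kept_total = 0
    for _ in range(n):
        in_a = rng.random() < true_share
        keep_prob = keep_prob_a if in_a else keep_prob_b
        if rng.random() < keep_prob:
            kept_total += 1
            kept_a += in_a
    return kept_a / kept_total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def confounded_correlation(n, seed=0):
    """Correlate two quantities that never influence each other but both
    depend on a hidden common factor."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    xs = [h + rng.gauss(0, 0.5) for h in hidden]  # e.g. a "per capita" quantity
    ys = [h + rng.gauss(0, 0.5) for h in hidden]  # e.g. another, unrelated one
    return pearson(xs, ys)


# More data from the same biased process settles near 0.71, not the true 0.5.
for n in (100, 10_000, 100_000):
    print(n, round(biased_sample_share(n)))

# Strong correlation (0.8 in expectation) with no causal link between xs and ys.
print(round(confounded_correlation(10_000), 2))
```

In the first simulation the recorded share tends to 0.5 / (0.5 + 0.5 × 0.4) ≈ 0.71 however large n becomes: volume shrinks the random error but not the bias built into how the data was collected.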
Insight and Aviva
• Several projects, including propensity modelling, multi-policy pricing models and CLV (customer lifetime value) modelling … all using machine learning, some deep learning ...
• … all done with real customer data, in Aviva
• We have to stop and ask ourselves ...
• What baked-in biases exist in this data that we’re aware of?
• What baked-in biases exist in this data that we’re not aware of?
• Do those biases matter to the application using the data?
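One concrete way to start on the last question is to compare how often a model makes a favourable decision for different customer groups. This is a minimal sketch: the group labels, decisions and the 0.8 threshold (the “four-fifths” rule of thumb from US employment guidance) are assumptions, it only works when the group attribute is actually recorded, and passing the check does not show that a model is unbiased.

```python
from collections import defaultdict


def positive_rates(decisions, groups):
    """Fraction of favourable decisions (1/True) per group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(bool(decision))
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(decisions, groups):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    rates = positive_rates(decisions, groups)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Made-up model decisions (1 = favourable) for two hypothetical customer groups.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["group_a"] * 4 + ["group_b"] * 4

ratio = disparate_impact_ratio(decisions, groups)
print(positive_rates(decisions, groups), round(ratio, 2))
if ratio < 0.8:  # assumed threshold, borrowed from the "four-fifths" rule of thumb
    print("rates differ substantially between groups: investigate before deploying")
```

Whether such a gap matters depends on the application: a difference in who receives a marketing offer is not the same as a difference in who is offered cover or at what price.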