Red Teaming in the AI world
Abuses and misuses of AI: prevention vs reaction
...with Manipulated Media as an example
Cristian Canton Ferrer, Research Manager (AI Red Team @ Facebook)
Outline
• Introduction
• Abuses
• Misuses
• Prevention
• Reaction and Mitigation
Introduction
What is the current situation of AI?
Credits: Nicolas Carlini for the graph (https://nicholas.carlini.com/)
Research on adversarial attacks has grown since the advent of DNNs
Adversarial attack ⇏ GAN
Input image: "Panda" (57.7% confidence) + adversarial noise = attacked image: "Gibbon" (99.3% confidence)
Credit: Goodfellow et al. "Explaining and harnessing adversarial examples", ICLR 2015.
Abuse of an AI system to force it to make a calculated mistake
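The panda/gibbon attack above follows the fast gradient sign method (FGSM) of Goodfellow et al. A minimal sketch of the idea, on a toy linear classifier rather than a deep network (the model, data, and numbers below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A fixed "classifier": score = w . x + b, class 1 if the score is positive.
w = rng.normal(size=100)
b = 0.0

# An input the model assigns to class 1 with high confidence.
x = 0.1 * np.sign(w)          # aligned with w, so w . x is large and positive
p_clean = sigmoid(w @ x + b)

# FGSM: step the input against the gradient of the class-1 score.
# For a linear model that gradient is just w, so the perturbation is
# -eps * sign(w): a small, bounded change to every coordinate.
eps = 0.2
x_adv = x - eps * np.sign(w)
p_adv = sigmoid(w @ x_adv + b)

print(f"clean P(class 1) = {p_clean:.3f}")   # confident class 1
print(f"adv   P(class 1) = {p_adv:.3f}")     # flipped to class 0
```

The per-pixel change is bounded by `eps`, which is why the perturbed image can look unchanged to a human while the model's prediction flips.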
What is a Red Team?
"A Red Team is a group that helps organizations to improve themselves by providing opposition to the
point of view of the organization that they are helping."
Wikipedia T
What is a Red Team?
At the origin, everything started with the "Advocatus Diaboli" (the Devil's Advocate), an office formalized under Pope Sixtus V (1521-1590).
What is a Red Team?
The advent of Red Teaming in the modern era:The Yom Kippur War and the 10th Man Rule
Bryce G. Hoffman, "Red Teaming", 2017. Micah Zenko, "Red Team", 2015.
What does an AI Red Team do?
• Bring the "loyal" adversarial mentality into the AI world, especially for systems in production
• Understand the risk landscape of your company
• Identify, evaluate and prioritize risks and feasible attacks
• Conceive worst-case scenarios derived from abuses and misuses of AI
• Form a group of experts across all involved aspects of a real system
• Convince stakeholders of the importance and potential impact of a worst-case scenario and ideate solutions: preventions or mitigations
• Define iterative and periodic interactions with stakeholders
• Defenses? No: that's for the blue team!
Red Queen Dynamics
"...it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
Lewis Carroll, Through the Looking-Glass
Risk estimation
AI Risk = Severity x Likelihood
Severity factors:
• Core metrics for your company
• Financial
• Data leakage, privacy
• PR
• Human
• Mitigation cost, response time
• ...
Likelihood factors:
• Discoverability
• Implementation cost / feasibility
• Motivation
• ...
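To make the formula concrete, a risk register can be as simple as a scored table ranked by Severity x Likelihood. The scenarios and scores below are invented placeholders, not an actual risk model:

```python
# Hypothetical risk register: rank scenarios by Risk = Severity x Likelihood.
# Scenario names and 1-5 scores are made up for illustration.
scenarios = {
    "adversarial noise evades content moderation": (5, 3),  # (severity, likelihood)
    "training-data poisoning": (4, 2),
    "synthetic profile pictures at scale": (3, 4),
}

# Sort by descending risk so the team addresses the top of the list first.
ranked = sorted(
    ((sev * lik, name) for name, (sev, lik) in scenarios.items()),
    reverse=True,
)
for risk, name in ranked:
    print(f"risk={risk:2d}  {name}")
```

Even a crude 1-5 scale like this forces the prioritization discussion with stakeholders that the slides describe.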
A first (real) example
Before: this is "objectionable content" (99% confidence)
After the attack: this is safe content (95% confidence)
Abuses
Maximum speed 60 MPH
Eykholt et al. "Robust Physical-World Attacks on Deep Learning Visual Classification", 2018.
Tabassi et al. "A Taxonomy and Terminology of Adversarial Machine Learning", 2019.
Sitawarin et al., "DARTS: Deceiving Autonomous Cars with Toxic Signs", 2018.
Wu et al., "Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors", 2020.
Original vs. poisoned training data
Alberti et al. "Are You Tampering With My Data?", 2018.
Attacking dataset biases
Geographical distribution of classification accuracy
De Vries et al. "Does Object Recognition Work for Everyone?", 2019.
Misuses
Example case: Synthetic people
Karras et al. "A Style-Based Generator Architecture for Generative Adversarial Networks", 2019.
Karras et al. "Analyzing and Improving the Image Quality of StyleGAN", 2020.
StyleGAN
Disclaimer: None of these individuals exist!
Example case: Synthetic people
Plenty of potential good uses:
• Creative purposes
• Virtual characters
• Semantic face editing
Smile editing
Shen et al. "Interpreting the Latent Space of GANs for Semantic Face Editing", 2020.
Example case: Synthetic people
Potentially "easy" to spot:
• Generator residuals (in the image)
• Patterns in the frequency domain
Wang et al. "CNN-generated images are surprisingly easy to spot... for now", 2020.
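The frequency-domain cue can be sketched numerically: upsampling layers in many generators leave periodic "checkerboard" artifacts that appear as off-center peaks in the 2D Fourier spectrum. The images below are synthetic stand-ins (a blurred noise "photo" vs. the same image plus a period-2 checkerboard), not real GAN outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_spectrum(img):
    # Shifted 2D FFT so low frequencies sit at the center of the array.
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))

n = 64
smooth = rng.normal(size=(n, n))
# Light blur to mimic the spectrum of a natural photo (little Nyquist energy).
smooth = (smooth + np.roll(smooth, 1, 0) + np.roll(smooth, 1, 1)) / 3

# A period-2 checkerboard, the classic residual of naive upsampling.
yy, xx = np.mgrid[0:n, 0:n]
ganlike = smooth + 0.5 * np.cos(np.pi * xx) * np.cos(np.pi * yy)

def nyquist_energy(img):
    # After fftshift, the highest spatial frequency lands at index (0, 0).
    return log_spectrum(img)[0, 0]

print("natural-like Nyquist energy:", nyquist_energy(smooth))
print("GAN-like Nyquist energy:   ", nyquist_energy(ganlike))
```

A real detector of this kind averages spectra over many images; this sketch only shows why the peak exists.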
Example case: Synthetic people
Andrew Waltz, Katie Jones, Matilda Romero
"Real" profile pictures from fake social media users
Example case: Synthetic people
87% Fake + adversarial noise (magnified x1000) = 1% Fake
Carlini and Farid "Evading Deepfake-Image Detectors with White- and Black-Box Attacks", 2020.
Example case: DeepFakes
• Pairwise: swap the faces of two individuals; the face of person A is put on the body of person B. Requires many photos of persons A and B.
• Identity-free: with a few reference photos of person A, put this face onto any other person. Many methods use GANs.
Prevention
Ask the experts
Example: the DFDC (DeepFake Detection Challenge) competition and its dataset
Domain gap + Distribution shift
Diagram labels: the test distribution you constructed to validate your algorithm; the real distribution; your algorithm's goal.
Domain gap + Distribution shift
Dolhansky et al. "The DeepFake Detection Challenge Dataset", https://arxiv.org/abs/2006.07397
(and know your metrics!)
In general, classification metrics cannot tell the whole story for detection problems.
Detecting DeepFakes from a large pool of real videos is a problem with extreme class imbalance.
Even with an extremely small false positive rate (which accuracy does not really account for), the detector will flag far more real videos as fake than the number of actual DeepFakes it catches.
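The arithmetic behind this point is worth spelling out. The pool sizes and rates below are illustrative only (not DFDC figures):

```python
# Back-of-the-envelope illustration of extreme class imbalance.
# All numbers are invented for illustration.
real_videos = 1_000_000
fake_videos = 1_000          # only 0.1% of the pool is fake

tpr = 0.90                   # the detector catches 90% of fakes...
fpr = 0.01                   # ...and wrongly flags only 1% of real videos

true_positives = tpr * fake_videos      # 900 fakes caught
false_positives = fpr * real_videos     # 10,000 real videos flagged

precision = true_positives / (true_positives + false_positives)
accuracy = (true_positives + (1 - fpr) * real_videos) / (real_videos + fake_videos)

print(f"flagged videos that are actually fake: {precision:.1%}")  # ~8.3%
print(f"overall accuracy: {accuracy:.1%}")                        # ~99.0%
```

A model can report 99% accuracy while fewer than one in ten of the videos it flags are actual DeepFakes, which is why precision at a fixed false positive rate is the more honest metric here.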
A practical case: Risk-a-thons
• What is a risk-a-thon? Why is it necessary?
• For DeepFakes detection:
  • Generalization attacks
  • Adversarial noise
  • Sub-population attacks (burns, vitiligo, skin conditions, ...)
  • Make-up, scarves, hats, etc.
Open vs closed sourcing
• Pros: only as good as how well you can keep it secret
• Cons: underestimation of the adversarial agent
Open-source DeepFake detectors: XceptionNet and MesoNet
Neekhara et al. "Adversarial Deepfakes: Evaluating Vulnerability of Deepfake Detectors to Adversarial Examples", 2020.
Reaction
Duct tape fix on Apollo 17 mission
Mitigation
• Sometimes, being preventive about every potential adversity is unfeasible!
• Define mitigations for the riskiest (unaddressed) scenarios
• Build defensive systems that can rapidly incorporate new adversarial samples, even if there are few of them
• Define coordination strategies (if possible) to mitigate potential AI-centric attacks across multiple surfaces
Yang et al. "One-Shot Domain Adaptation For Face Generation", 2020.
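As a toy illustration of "rapidly incorporating new adversarial samples", here is a hypothetical sketch that fine-tunes an already-trained linear detector with a few gradient steps on a handful of newly collected fakes (the data, model, and hyperparameters are all invented; real systems would fine-tune a deep detector):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 20
w = rng.normal(size=d)                  # stand-in for "pretrained" detector weights

# Five new fake samples (label 1) from a direction the detector may miss.
direction = rng.normal(size=d)
new_fakes = direction + 0.1 * rng.normal(size=(5, d))

def detect(w, X):
    return sigmoid(X @ w)               # score close to 1 means "fake"

before = detect(w, new_fakes).mean()

# Few-shot update: a handful of gradient steps on the new samples only,
# minimizing the logistic loss with all labels equal to 1.
lr = 0.1
for _ in range(50):
    p = detect(w, new_fakes)
    grad = new_fakes.T @ (p - 1.0) / len(new_fakes)
    w -= lr * grad

after = detect(w, new_fakes).mean()
print(f"mean fake score before update: {before:.2f}")
print(f"mean fake score after update:  {after:.2f}")
```

In practice this is where few-shot and one-shot adaptation methods (e.g. the Yang et al. work cited above, on the generation side) come in: the mitigation goal is to shrink the window between a new attack appearing and the detector responding to it.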
Conclusions
• Assume an adversarial mindset when developing systems built on top of AI.
• Understand your risk manifold, quantify it, and make informed decisions to prioritize defenses and mitigation strategies.
• The scope of an AI Red Team is very broad; focus on the areas relevant to your industry.
• Stress test mercilessly. Develop a strategy to convince stakeholders of the value of being ready for a worst-case scenario.
• The more you sweat in training, the less you bleed in battle.
Cristian Canton (@cristiancanton) Research Manager (AI Red Team), Facebook AI
Thanks! Q&A