Certified Robustness to Adversarial Examples with Differential Privacy
Mathias Lécuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana
Columbia University
Code: https://github.com/columbia/pixeldp
Contact: [email protected]
Transcript
Page 1:

Certified Robustness to Adversarial Examples with Differential Privacy

Mathias Lécuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana

Columbia University

Code: https://github.com/columbia/pixeldp
Contact: [email protected]

Page 2:

Deep Learning

• Deep Neural Networks (DNNs) deliver remarkable performance on many tasks.

• DNNs are increasingly deployed, including in attack-prone contexts:

Taylor Swift Said to Use Facial Recognition to Identify Stalkers
By Sopan Deb and Natasha Singer, Dec. 13, 2018

Page 3:

Example

[Figure: a DNN (input x → layer 1 → layer 2 → layer 3 → softmax) assigns scores (0.1, 0.2, 0.1, 0.6) to the classes "no ticket", "ticket 1", "ticket 2", "ticket 3", shown as a bar chart.]

Page 4:

Example

[Figure: the argmax over the softmax scores (0.1, 0.2, 0.1, 0.6) selects "no ticket".]

But DNNs are vulnerable to adversarial example attacks.

Page 5:

Example

[Figure: adding a small adversarial perturbation to the input shifts the softmax scores from (0.1, 0.2, 0.1, 0.6) to (0.1, 0.7, 0.1, 0.1), so the argmax flips from "no ticket" to "ticket 2".]

But DNNs are vulnerable to adversarial example attacks.

Page 6:

Accuracy under attack

Inception-v3 DNN on the ImageNet dataset.

[Figure: top-1 accuracy (0 to 1) versus attack size α in 2-norm (0 to 3). Example images: a teddy bear is classified as "giant panda" at ||α||_2 = 0.52 and as "teapot" at ||α||_2 = 1.06.]

Page 7:

Best-effort approaches

1. Evaluate accuracy under attack:
   • Launch an attack on examples in a test set.
   • Compute accuracy on the attacked examples.

2. Improve accuracy under attack:
   • Many approaches, e.g. train on adversarial examples.
     (e.g. Goodfellow+ '15; Papernot+ '16; Buckman+ '18; Guo+ '18)

Problem: both steps are attack-specific, leading to an arms race that attackers are winning.
(e.g. Carlini-Wagner '17; Athalye+ '18)

Page 8:

Key questions

• Guaranteed accuracy: what is my minimum accuracy under any attack?

• Prediction robustness: given a prediction, can any attack change it?

Page 9:

Key questions

• A few recent approaches offer provable guarantees.
  (e.g. Wong-Kolter '18; Raghunathan+ '18; Wang+ '18)

• But they scale poorly in terms of:
  • Input dimension (e.g. number of pixels).
  • DNN size.
  • Size of the training data.

Page 10:

Key questions

• My defense, PixelDP, answers both questions for norm-bounded attacks.

• Key idea: a novel use of differential privacy theory at prediction time.

• It is the most scalable approach: the first provable guarantees for large models on ImageNet!

Page 11:

PixelDP outline

Motivation

Design

Evaluation


Page 12:

Key idea

• Problem: small input perturbations create large score changes.

[Figure: a small adversarial perturbation shifts the softmax scores from (0.1, 0.6, 0.1, 0.2) to (0.1, 0.7, 0.1, 0.1), and the argmax flips to "ticket 2".]

Page 13:

Key idea

• Problem: small input perturbations create large score changes.
• Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

[Figure: the same perturbed-input DNN diagram as on the previous slide.]

Page 14:

Differential Privacy

• Differential Privacy (DP): a technique to randomize a computation over a database, such that changing one data point can only lead to bounded changes in the distribution over possible outputs.

• For an (ε, δ)-DP randomized computation A_f, e.g. the Gaussian mechanism A_f(d) = f(d) + N(0, σ²), and for any neighboring inputs d, d′ and any set of outputs S:

    P(A_f(d) ∈ S) ≤ e^ε · P(A_f(d′) ∈ S) + δ

[Background image: the first page of the PixelDP paper (title, abstract, and the beginning of the introduction).]

• We prove the Expected Output Stability Bound: for any (ε, δ)-DP mechanism A with bounded outputs in [0, 1], and any neighboring inputs d, d′:

    E[A(d)] ≤ e^ε · E[A(d′)] + δ

Page 15:

Key idea

• Problem: small input perturbations create large score changes.
• Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

Make the prediction DP.

[Figure: the DNN diagram (input x → layers → softmax → argmax) with scores (0.1, 0.2, 0.1, 0.6) and the class-score bar chart.]

Page 16:

Key idea

• Problem: small input perturbations create large score changes.
• Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).

Make the prediction DP.

[Figure: the same DNN diagram; the class-score bar chart now shows stability bounds around each score, and the argmax is taken over the bounded scores.]


Page 18:

PixelDP architecture

1. Add a new noise layer to make the DNN DP.
2. Estimate the DP DNN's mean scores.
3. Add the estimation error into the stability bounds.
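To make step 1 concrete, here is a minimal sketch (not the authors' released code) of a Gaussian noise layer whose scale follows the σ formula shown on the later "Further challenges" slide; the function name and arguments are illustrative assumptions:

    import numpy as np

    def gaussian_noise_layer(pre_noise, epsilon, delta, sensitivity, attack_bound_L):
        """Add Gaussian noise so the computation up to this layer is
        (epsilon, delta)-DP for input changes of 2-norm at most attack_bound_L."""
        # Gaussian mechanism calibration: sigma = sqrt(2 ln(1.25/delta)) * Delta * L / epsilon.
        sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity * attack_bound_L / epsilon
        return pre_noise + np.random.normal(0.0, sigma, size=pre_noise.shape)

Because every layer after the noise is post-processing of a DP mechanism (see Page 21), the rest of the network and the softmax inherit the same (ε, δ) guarantee.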

Page 19:

PixelDP architecture

1. Add a new noise layer to make the DNN DP.
2. Estimate the DP DNN's mean scores.
3. Add the estimation error into the stability bounds.

[Figure: the DNN with the noise layer inserted after layer 1 (input x → layer 1 → noise layer (+) → layer 2 → layer 3 → softmax), producing scores (0.2, 0.1, 0.1, 0.6).]

Page 20:

PixelDP architecture

1. Add a new noise layer to make the DNN DP.
2. Estimate the DP DNN's mean scores.
3. Add the estimation error into the stability bounds.

[Figure: the same DNN-with-noise-layer diagram, annotated to show that the computation through the noise layer is (ε, δ)-DP.]

Page 21:

PixelDP architecture

1. Add a new noise layer to make the DNN DP.
2. Estimate the DP DNN's mean scores.
3. Add the estimation error into the stability bounds.

Resilience to post-processing: any computation on the output of an (ε, δ)-DP mechanism is still (ε, δ)-DP.

[Figure: the DNN-with-noise-layer diagram; the layers after the noise layer are post-processing, so the whole prediction remains (ε, δ)-DP.]

Page 22:

PixelDP architecture

1. Add a new noise layer to make the DNN DP.
2. Estimate the DP DNN's mean scores.
3. Add the estimation error into the stability bounds.

Compute the empirical mean scores with a standard Monte Carlo estimate.

[Figure: multiple noisy forward passes produce different score vectors, e.g. (0.1, 0.2, 0.1, 0.6) and (0.2, 0.1, 0.1, 0.6), which are averaged into the estimated mean scores.]
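A minimal sketch of step 2, assuming a callable dp_dnn_forward(x) that re-samples the noise layer on every call (the name and the number of draws are illustrative, not from the slides):

    import numpy as np

    def monte_carlo_scores(dp_dnn_forward, x, n_draws=300):
        """Estimate the DP DNN's expected softmax scores by averaging over noise draws."""
        draws = np.stack([dp_dnn_forward(x) for _ in range(n_draws)])  # shape: (n_draws, k)
        mean_scores = draws.mean(axis=0)
        return mean_scores, int(np.argmax(mean_scores))

The notes reproduced on Page 27 work out how many draws are needed to separate the top label from the runner-up with high probability.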

Page 23:

PixelDP architecture

1. Add a new noise layer to make the DNN DP.
2. Estimate the DP DNN's mean scores.
3. Add the estimation error into the stability bounds.

[Figure: a bar chart of the estimated mean scores for the classes "harmless", "stalker 1", "stalker 2", "stalker 3", with η-confidence intervals around each score and the resulting stability bounds.]
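As a sketch of how the confidence intervals and stability bounds combine into a robustness certificate, assuming the robustness condition from the PixelDP paper (the lower bound on the top expected score must exceed e^{2ε} times the upper bound on every other score, plus (1 + e^ε)δ); the function and its arguments are illustrative, not the released implementation:

    import numpy as np

    def is_robust_prediction(score_lower, score_upper, epsilon, delta):
        """score_lower/score_upper: per-class bounds on the expected scores
        (Monte Carlo mean plus/minus the eta-confidence interval)."""
        k = int(np.argmax(score_lower))
        others_ub = np.delete(score_upper, k)
        # Robust if no attack within the certified norm bound can make another class win.
        return bool(score_lower[k] > np.exp(2 * epsilon) * others_ub.max()
                    + (1 + np.exp(epsilon)) * delta)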


Page 25:

Further challenges

• Train the DP DNN with noise.

• Control the pre-noise sensitivity during training.

• Support various attack norms (L0, L1, L2, L∞).

• Scale to large DNNs and datasets.

[Formulas shown on the slide: A(d) = f(d) + N(0, σ²); σ = √(2 ln(1.25/δ)) · Δ_{p,2} · L / ε; P(A(d) ∈ S) ≤ e^ε P(A(d′) ∈ S) + δ; p_y(x) ≤ e^ε p_y(x′) + δ; P(f(x) = i) ≤ e^ε P(f(x′) = i) + δ.]
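As a worked example of the Gaussian noise calibration above (the numbers are illustrative, not taken from the slides): with ε = 1, δ = 0.05, pre-noise sensitivity Δ_{p,2} = 1, and construction attack bound L = 0.1,

    σ = √(2 ln(1.25/δ)) · Δ_{p,2} · L / ε = √(2 ln 25) · 1 · 0.1 / 1 ≈ 0.25.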

Page 26:

Scaling to Inception on ImageNet

Inception-v3:
• Large dataset: image resolution is 299x299x3.
• Large model:
  • 48 layers deep.
  • 23 million parameters.
  • Released pre-trained by Google on ImageNet.

Page 27:

Scaling to Inception on ImageNet

PixelDP auto-encoder

[Figure: a small PixelDP auto-encoder (input x → noise layer (+) → reconstructed input x), to be placed in front of the unmodified network.]

Notes (Mathias Lecuyer, June 5, 2018)

1. Draws for prediction

Given an image to classify, we want to detect the highest-probability label.

1.1 Fixed bounds

We first ask how many draws we need to distinguish the highest probability with probability at least 1 − δ. We start from Hoeffding's inequality applied to a Bernoulli variable. We call X̄ the empirical mean, X̄ = (1/n) Σ_{i=1}^{n} X_i:

    P(|X̄ − p| ≥ ε) ≤ 2 e^{−2ε²n}

We now denote by ε the difference between the highest and second-highest label probabilities, and rewrite the variable so that the bounds do not overlap (ε ← ε/2). Finally, we apply a union bound over the k possible labels and end up with:

    P(|X̄ − p| ≥ ε) ≤ 2k e^{−ε²n/2} ≜ δ ≪ 1  ⇒  n ≥ (2/ε²) ln(2k/δ)

For instance, in datasets with k = 10, distinguishing the top label with probability at least 0.99 when it is bigger than the second one by 0.1 requires n ≈ 1500 draws.
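A quick check of the notes' closing example, as a small sketch (the helper name is mine, not from the slides):

    import math

    def draws_needed(k, eps, delta):
        """n >= (2 / eps^2) * ln(2k / delta), from the Hoeffding + union-bound argument above."""
        return math.ceil(2.0 / eps**2 * math.log(2 * k / delta))

    print(draws_needed(k=10, eps=0.1, delta=0.01))  # 1521, i.e. n ≈ 1500 draws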

Page 28:

Scaling to Inception on ImageNet

[Figure: the PixelDP auto-encoder (with its noise layer) is stacked in front of the pre-trained Inception-v3; everything after the noise layer, including Inception-v3 itself, is post-processing of the DP mechanism.]

Page 29:

PixelDP Outline

Motivation

Design

Evaluation


Page 30:

Evaluation:

1. Guaranteed accuracy on large DNNs/datasets

2. Are robust predictions harder to attack in practice?

3. Comparison with other defenses against state-of-the-art attacks.


Page 31:

Methodology

Five datasets:

  Dataset     Image size   Number of classes
  ImageNet    299x299x3    1000
  CIFAR-100   32x32x3      100
  CIFAR-10    32x32x3      10
  SVHN        32x32x3      10
  MNIST       28x28x1      10

Three models:

  Model          Number of layers   Number of parameters
  Inception-v3   48                 23M
  Wide ResNet    28                 36M
  CNN            3                  3M

Metrics:
• Guaranteed accuracy.
• Accuracy under attack.

Attack methodology:
• State-of-the-art attack [Carlini and Wagner, S&P '17].
• Strengthened against our defense by averaging gradients over multiple noise draws.
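A minimal sketch of what "averaging gradients over multiple noise draws" can look like for a gradient-based attack step (an expectation-over-randomness style estimate; loss_and_grad and the draw count are illustrative assumptions, not the exact attack used in the evaluation):

    import numpy as np

    def averaged_gradient(loss_and_grad, x, label, n_draws=20):
        """Average input gradients over independent noise draws of the randomized DNN."""
        grads = [loss_and_grad(x, label)[1] for _ in range(n_draws)]  # each call re-samples the noise
        return np.mean(grads, axis=0)  # a less noisy attack direction than a single draw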

Page 32:

Guaranteed accuracy on ImageNet with Inception-v3

Meaningful guaranteed accuracy for ImageNet!

  Model             Accuracy (%)   Guaranteed accuracy (%) at attack size
                                   0.05      0.1       0.2
  Baseline          78             -         -         -
  PixelDP: L=0.25   68             63        0         0
  PixelDP: L=0.75   58             53        49        40

  (Larger L means more DP noise.)

Page 33:

Accuracy on robust predictions

What if we only act on robust predictions? (e.g. if a prediction is not robust, check the ticket)

Dataset: CIFAR-10.

[Figure: top-1 accuracy (0 to 1) versus attack size in 2-norm (0 to 1.4), showing the baseline and the precision and recall of robust predictions at threshold 0.05.]

Page 34:

Accuracy on robust predictions

Dataset: CIFAR-10. Comparison: Madry+ '17.

If we increase the robustness threshold: better accuracy, but fewer predictions.

[Figure: top-1 accuracy versus attack size in 2-norm (0 to 1.4), showing the baseline, Madry+ '17, and the precision and recall of robust predictions at threshold 0.1.]

Page 35:

Comparison with other provable defenses

PixelDP scales to larger models, yielding better accuracy and robustness.

Dataset: SVHN. Comparison: Wong-Kolter '18.

[Figure: top-1 accuracy versus attack size in 2-norm (0 to 1.4) for a ResNet with PixelDP (L = 0.1) and the CNN of Wong-Kolter '18.]

Page 36:

PixelDP summary

• PixelDP is the first defense that:
  • Gives attack-independent guarantees against norm-bounded adversarial attacks.
  • And scales to the largest models and datasets.

• Already extensions by others!
  • Improve the bounds at a given noise level (Li+ '18; Cohen+ '19).
  • Use other noise distributions (Pinot+ '19).
  • Adapt the optimization (Rakin+ '18).

Page 37:

Page 38:

Appendix

Page 39:

Comparison with best-effort techniques

PixelDP is empirically competitive with the state-of-the-art best-effort defense.

Dataset: CIFAR-10. Comparison: best-effort defense by Madry+ '17.

[Figure: top-1 accuracy versus attack size in 2-norm (0 to 1.4) for the baseline, PixelDP (L = 0.1), and Madry+ '17.]

Page 40:

Related work

Best effort:
+ Scale:
  • Run a best-effort attack per gradient step [Goodfellow+ '15, Madry+ '17].
  • Preprocess inputs [Buckman+ '18, Guo+ '18].
  • Train a second model based on the first one [Papernot+ '16].
+ Flexible:
  • Support most architectures.
- No robustness guarantees:
  • Often broken soon after release [Athalye+ '18].

Certified:
+ Provable guarantees:
  • Per prediction [Wong-Kolter '18, Wong+ '18, Raghunathan+ '18, Wang+ '18].
  • In expectation [Sinha+ '17].
- Hard to scale:
  • Require orders of magnitude more computation [Wong-Kolter '18, Wong+ '18, Wang+ '18].
  • Support only one hidden layer [Raghunathan+ '18].
- Often not flexible:
  • No ReLU, MaxPool, or accuracy guarantees [Sinha+ '17].
  • Only ReLU, no BatchNorm [Wong-Kolter '18].

PixelDP is the first certified defense that achieves provable guarantees of robustness, scales, and is broadly applicable to arbitrary networks.

Page 41:

Results - CIFAR-10

[Figure: CIFAR-10 results plot under attack (contents not recoverable from the transcript).]

Page 42:

Results - SVHN

[Figure: SVHN results plot under attack (contents not recoverable from the transcript).]

Page 43:

Certification on ImageNet / Inception-v3

[Figure: certified accuracy (0 to 1) versus attack size (0 to 0.5) for the baseline and PixelDP with L = 0.1, 0.3, and 1.0.]

Page 44:

Certification on CIFAR-10

[Figure: certified accuracy (0 to 1) versus attack size (0 to 0.5) for the baseline and PixelDP with L = 0.1 and 0.3.]

Page 45:

Comparison with Best Effort Techniques

[Figure: the teddy bear example revisited. Labels shown: "teddy bear", "giant panda", "teddy bear"; perturbation sizes shown: ||α||_2 = 0.52 and, against the undefended model, ||α||_2 = 3.41.]

Page 46:

Full references

• [Goodfellow+ '15] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. ICLR 2015.

• [Papernot+ '16] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. S&P 2016.

• [Buckman+ '18] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. ICLR 2018.

• [Guo+ '18] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. ICLR 2018.

• [Madry+ '17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv 2017.


Page 47:

Full references

• [Carlini-Wagner '17] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. S&P 2017.

• [Athalye+ '18] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML 2018.

• [Wong-Kolter '18] E. Wong and Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML 2018.

• [Raghunathan+ '18] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. arXiv 2018.

• [Wang+ '18] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Efficient formal safety analysis of neural networks. NeurIPS 2018.

• [Li+ '18] B. Li, C. Chen, W. Wang, and L. Carin. Second-Order Adversarial Attack and Certifiable Robustness. arXiv 2018.


Page 48:

Full references

• [Rakin+ '18] A.S. Rakin, Z. He, and D. Fan. Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack. arXiv 2018.

• [Cohen+ '19] J. Cohen, E. Rosenfeld, and Z. Kolter. Certified Adversarial Robustness via Randomized Smoothing. arXiv 2019.

• [Pinot+ '19] R. Pinot, L. Meunier, A. Araujo, H. Kashima, F. Yger, C. Gouy-Pailler, and J. Atif. Theoretical evidence for adversarial robustness through randomization: the case of the Exponential family. arXiv 2019.
