Certified Robustness to Adversarial Examples with Differential Privacy
Mathias Lécuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana
Columbia University
Code: https://github.com/columbia/pixeldp
Contact: [email protected]
Deep Learning
• Deep Neural Networks (DNNs) deliver remarkable performance on many tasks.
• DNNs are increasingly deployed, including in attack-prone contexts:
"Taylor Swift Said to Use Facial Recognition to Identify Stalkers" (Sopan Deb and Natasha Singer, Dec. 13, 2018)
Example
[Figure: a DNN (input x → layer 1 → layer 2 → layer 3 → softmax) outputs scores 0.1, 0.2, 0.1, 0.6 over the classes no ticket, ticket 1, ticket 2, ticket 3.]
Example
But DNNs are vulnerable to adversarial example attacks.
[Figure: on a clean input x the DNN predicts "no ticket" (softmax scores 0.1, 0.2, 0.1, 0.6); adding a small perturbation α shifts the scores to 0.1, 0.7, 0.1, 0.1 and the argmax flips to "ticket 2".]
Accuracy under attack
Inception-v3 DNN on the ImageNet dataset.
[Plot: top-1 accuracy (0 to 1) vs. size of attack α (2-norm), 0 to 3.]
[Example images with labels: ||α||₂ = 0.52: teddy bear / giant panda; ||α||₂ = 1.06: teapot.]
Best-effort approaches
1. Evaluate accuracy under attack:
  • Launch an attack on examples in a test set.
  • Compute accuracy on the attacked examples.
2. Improve accuracy under attack:
  • Many approaches: e.g., train on adversarial examples.
    (e.g., Goodfellow+ '15; Papernot+ '16; Buckman+ '18; Guo+ '18)
Problem: both steps are attack-specific, leading to an arms race that attackers are winning.
(e.g., Carlini-Wagner '17; Athalye+ '18)
Key questions
• Guaranteed accuracy: what is my minimum accuracy under any attack?
• Prediction robustness: given a prediction, can any attack change it?
• A few recent approaches with provable guarantees (e.g., Wong-Kolter '18; Raghunathan+ '18; Wang+ '18).
• Poor scalability in terms of:
  • Input dimension (e.g., number of pixels).
  • DNN size.
  • Size of training data.
Key questions
• Guaranteed accuracy: what is my minimum accuracy under any attack?
• Prediction robustness: given a prediction, can any attack change it?

• My defense, PixelDP, gives answers for norm-bounded attacks.
• Key idea: a novel use of differential privacy theory at prediction time.
• The most scalable approach: first provable guarantees for large models on ImageNet!
PixelDP outline
Motivation
Design
Evaluation
Key idea
• Problem: small input perturbations create large score changes.
• Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).
[Figure: on a perturbed input x + α, the DNN's softmax scores change by a large amount and the prediction becomes "ticket 2".]
Differential Privacy
• Differential Privacy (DP): a technique for randomizing a computation over a database, such that changing one data point can only lead to bounded changes in the distribution over possible outputs.
• For an (ε, δ)-DP randomized computation A_f and any inputs d, d' differing in one data point:
A_f(d) = f(d) + N(0, σ²)   (e.g., Gaussian noise added to the output of f)
P(A_f(d) = n) ≤ e^ε · P(A_f(d') = n) + δ
P(A_f(d) ∈ S) ≤ e^ε · P(A_f(d') ∈ S) + δ   for any set of outputs S
• We prove the Expected Output Stability Bound: for any (ε, δ)-DP mechanism A with outputs bounded in [0, 1],
  E[A(d)] ≤ e^ε · E[A(d')] + δ.
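A compact way to see why this holds (a proof sketch, using the set-version DP inequality above and the identity E[X] = ∫₀¹ P(X > t) dt for X ∈ [0, 1]):

```latex
\mathbb{E}[A(d)] = \int_0^1 \Pr[A(d) > t]\,dt
  \le \int_0^1 \left( e^{\epsilon}\,\Pr[A(d') > t] + \delta \right) dt
  = e^{\epsilon}\,\mathbb{E}[A(d')] + \delta .
```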
Key idea: make the prediction DP
• Problem: small input perturbations create large score changes.
• Idea: design a DNN with bounded maximum score changes (leveraging Differential Privacy theory).
[Figure: the DNN's prediction is randomized so that its expected softmax scores carry stability bounds.]
PixelDP architecture
1. Add a new noise layer to make the DNN DP.
2. Estimate the DP DNN's mean scores.
3. Add the estimation error in the stability bounds.
Step 1: add a noise layer to make the DNN DP.
[Figure: input x → layer 1 → noise layer (+) → layer 2 → layer 3 → softmax → scores 0.2, 0.1, 0.1, 0.6.]
1. Add a new noise layer to make the DNN DP.
[Figure: the same network; the added noise layer makes the computation after it (ε, δ)-DP with respect to the input.]
PixelDP architecture
Resilience to post-processing: any computation on the output of an (ε, δ)-DP mechanism is still (ε, δ)-DP.
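To make step 1 concrete, here is a minimal NumPy sketch of the noise-layer idea (illustrative only, not the released PixelDP code; `gaussian_sigma`, `noise_layer`, and the toy one-layer model are hypothetical names). It calibrates Gaussian noise to the pre-noise sensitivity and a construction attack bound L, following the σ = √(2 ln(1.25/δ)) · Δ · L / ε calibration that appears later in the deck; by the post-processing property, everything computed after the noise layer stays (ε, δ)-DP.

```python
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity, L):
    """Classic Gaussian-mechanism noise scale for (epsilon, delta)-DP against
    input changes of norm at most L (the analysis assumes epsilon <= 1)."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity * L / epsilon

def noise_layer(pre_noise, epsilon, delta, sensitivity, L, rng):
    """One draw of the DP noise layer. `sensitivity` must upper-bound how much
    `pre_noise` can move (in 2-norm) when the input moves by at most L."""
    sigma = gaussian_sigma(epsilon, delta, sensitivity, L)
    return pre_noise + rng.normal(0.0, sigma, size=pre_noise.shape)

# Toy usage: one linear "pre-noise" layer followed by the noise layer.
rng = np.random.default_rng(0)
x = rng.random(16)
W = rng.normal(size=(8, 16)) * 0.1
pre = W @ x
noisy = noise_layer(pre, epsilon=1.0, delta=0.05,
                    sensitivity=np.linalg.norm(W, 2), L=0.1, rng=rng)
# Everything computed from `noisy` (layers 2, 3, softmax) is post-processing
# of an (epsilon, delta)-DP value, so it stays (epsilon, delta)-DP.
```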
How many noise draws do we need for a prediction?
Given an image to classify, we want to detect the highest-probability label. How many draws are needed to distinguish the highest probability with probability at least 1 − δ? Start from Hoeffding's inequality applied to a Bernoulli variable, with X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ the empirical mean:
P(|X̄ − p| ≥ ε) ≤ 2 e^(−2ε²n)
Let ε be the difference between the highest and second-highest label probabilities; rewriting so that the two bounds do not overlap (ε ← ε/2) and taking a union bound over the k possible labels:
P(|X̄ − p| ≥ ε/2) ≤ 2k e^(−ε²n/2) ≤ δ  ⇒  n ≥ (2/ε²) ln(2k/δ)
For instance, with k = 10 classes, distinguishing the top label with probability at least 0.99 when it is larger than the second by 0.1 requires n ≈ 1500 draws.
PixelDP architecture (step 2)
2. Estimate the DP DNN's mean scores: compute the empirical mean with a standard Monte Carlo estimate over multiple noise draws.
[Figure: the noised network is run several times on x and the per-draw softmax scores are averaged into estimated mean scores.]
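A hedged sketch of step 2: average the randomized network's softmax scores over several noise draws, with the number of draws picked from the Hoeffding and union-bound derivation above. `dp_forward` stands in for one randomized forward pass and is an assumed callable, not the actual API.

```python
import numpy as np

def estimate_mean_scores(dp_forward, x, n_draws, rng):
    """Monte Carlo estimate of the DP network's expected softmax scores:
    average the scores over n_draws independent noise draws."""
    draws = np.stack([dp_forward(x, rng) for _ in range(n_draws)])
    return draws.mean(axis=0)

def draws_needed(k, gap, failure_prob):
    """From the Hoeffding + union-bound derivation above:
    n >= (2 / gap^2) * ln(2k / failure_prob)."""
    return int(np.ceil(2.0 / gap**2 * np.log(2.0 * k / failure_prob)))

print(draws_needed(k=10, gap=0.1, failure_prob=0.01))  # ~1500 draws, as above
```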
PixelDP architecture (step 3)
3. Add the estimation error in the stability bounds: widen the bounds using η-confidence intervals around the estimated mean scores.
[Figure: bar chart of the estimated scores for the classes harmless, stalker 1, stalker 2, stalker 3, with stability bounds and η-confidence intervals.]
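A sketch of the resulting robustness check: certify a prediction if even the worst case allowed by the Expected Output Stability Bound, applied to η-confidence intervals around the estimated scores, cannot change the argmax. The helper names are hypothetical and the test below is derived from E[A(x)] ≤ e^ε · E[A(x + α)] + δ; treat it as an illustration rather than the paper's exact procedure.

```python
import numpy as np

def hoeffding_intervals(mean_scores, n_draws, k, eta):
    """Two-sided Hoeffding bounds holding simultaneously for all k labels
    with probability >= 1 - eta (scores lie in [0, 1])."""
    half_width = np.sqrt(np.log(2.0 * k / eta) / (2.0 * n_draws))
    lower = np.clip(mean_scores - half_width, 0.0, 1.0)
    upper = np.clip(mean_scores + half_width, 0.0, 1.0)
    return lower, upper

def is_certified_robust(mean_scores, n_draws, epsilon, delta, eta):
    """Check whether the argmax prediction resists any attack within the
    size L used to calibrate the (epsilon, delta)-DP noise layer.

    Applying E[A(x)] <= e^eps * E[A(x + alpha)] + delta in both directions:
      worst-case top score    >= (lb_top - delta) * exp(-epsilon)
      worst-case other scores <= exp(epsilon) * ub_other + delta
    The prediction is certified if the first still beats the second."""
    scores = np.asarray(mean_scores)
    lower, upper = hoeffding_intervals(scores, n_draws, len(scores), eta)
    top = int(np.argmax(scores))
    runner_up = max(u for i, u in enumerate(upper) if i != top)
    return (lower[top] - delta) * np.exp(-epsilon) > np.exp(epsilon) * runner_up + delta

# Toy usage with made-up estimated scores.
print(is_certified_robust([0.05, 0.85, 0.05, 0.05],
                          n_draws=1500, epsilon=0.2, delta=0.01, eta=0.05))
```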
Further challenges
• Train the DP DNN with noise.
• Control pre-noise sensitivity during training.
• Support various attack norms (L0, L1, L2, L∞).
• Scale to large DNNs and datasets.
Noise calibration: σ = √(2 ln(1.25/δ)) · Δ_{p,2} · L / ε
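Because σ grows with the pre-noise sensitivity Δ_{p,2}, PixelDP controls that sensitivity during training. One standard way to bound a linear pre-noise layer's 2→2 sensitivity is to rescale its weights by their spectral norm, estimated with power iteration; the sketch below illustrates that idea under this assumption, and the paper's exact normalization scheme may differ.

```python
import numpy as np

def spectral_norm(W, n_iters=50):
    """Largest singular value of W, estimated via power iteration."""
    v = np.random.default_rng(0).normal(size=W.shape[1])
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ (W @ v))

def normalize_to_unit_sensitivity(W):
    """Rescale W so that ||W x - W x'||_2 <= ||x - x'||_2, i.e. the layer's
    2->2 sensitivity is at most 1 (leaves already-contractive layers alone)."""
    return W / max(spectral_norm(W), 1.0)

W = np.random.default_rng(1).normal(size=(8, 16))
W_hat = normalize_to_unit_sensitivity(W)
assert spectral_norm(W_hat) <= 1.0 + 1e-6
```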
Scaling to Inception on ImageNet
Inception-v3
• Large dataset: image resolution is 299x299x3.
• Large model:
  • 48 layers deep.
  • 23 million parameters.
  • Released pre-trained by Google on ImageNet.
[Figure: input x and a noise layer inside the network.]
Scaling to Inception on ImageNet
PixelDP auto-encoder
[Figure: input x → PixelDP auto-encoder (with the noise layer) → unmodified Inception-v3; the classifier only post-processes the DP auto-encoder output, so the end-to-end pipeline stays (ε, δ)-DP.]
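A hedged sketch of this trick (hypothetical toy functions and weights, not the released code): the noise layer lives inside a small auto-encoder, and the unmodified, pretrained classifier only post-processes the auto-encoder's (ε, δ)-DP output, so the averaged end-to-end scores inherit the DP guarantee and the same certification machinery applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a small auto-encoder and a "pretrained" classifier.
E = rng.normal(size=(32, 64)) * 0.05   # encoder weights
D = rng.normal(size=(64, 32)) * 0.05   # decoder weights
C = rng.normal(size=(5, 64)) * 0.05    # frozen classifier weights (5 classes)

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def dp_autoencoder(x, epsilon, delta, sensitivity, L):
    """Auto-encoder with a Gaussian noise layer on the encoding; the noise
    makes its output (epsilon, delta)-DP w.r.t. inputs that differ by at
    most L in 2-norm, given the encoder's sensitivity bound."""
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity * L / epsilon
    enc = E @ x + rng.normal(0.0, sigma, size=32)
    return D @ enc

def pixeldp_pipeline(x, n_draws=25, **dp_args):
    """Unmodified classifier applied to the DP auto-encoder's output; the
    averaged scores keep the DP guarantee by post-processing."""
    scores = [softmax(C @ dp_autoencoder(x, **dp_args)) for _ in range(n_draws)]
    return np.mean(scores, axis=0)

x = rng.random(64)
print(pixeldp_pipeline(x, epsilon=1.0, delta=0.05,
                       sensitivity=np.linalg.norm(E, 2), L=0.1))
```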
PixelDP Outline
Motivation
Design
Evaluation
Evaluation:
1. Guaranteed accuracy on large DNNs/datasets
2. Are robust predictions harder to attack in practice?
3. Comparison with other defenses against state-of-the-art attacks.
Methodology

Five datasets:
Dataset      Image size   Number of classes
ImageNet     299x299x3    1000
CIFAR-100    32x32x3      100
CIFAR-10     32x32x3      10
SVHN         32x32x3      10
MNIST        28x28x1      10

Three models:
Model         Number of layers   Number of parameters
Inception-v3  48                 23M
Wide ResNet   28                 36M
CNN           3                  3M
Metrics:
• Guaranteed accuracy.
• Accuracy under attack.

Attack methodology:
• State-of-the-art attack [Carlini and Wagner S&P'17].
• Strengthened against our defense by averaging gradients over multiple noise draws.
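The gradient-averaging strengthening is, in spirit, Expectation-over-Transformation applied to the defense's own noise: average the loss gradient over several noise draws before each attack step. The toy sketch below uses a linear-softmax model with an analytic gradient and is illustrative only; the actual evaluation uses the Carlini-Wagner attack.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 4
W = rng.normal(size=(k, d))
x, y = rng.random(d), 2                      # input and its true label
sigma, step, n_draws, n_steps = 0.5, 0.05, 32, 10

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def loss_grad_wrt_input(x, y):
    """d/dx of the cross-entropy loss of a linear-softmax model W(x + noise),
    for one draw of the defense's Gaussian noise."""
    noise = rng.normal(0.0, sigma, size=x.shape)
    p = softmax(W @ (x + noise))
    p_minus_onehot = p.copy()
    p_minus_onehot[y] -= 1.0
    return W.T @ p_minus_onehot              # analytic gradient

x_adv = x.copy()
for _ in range(n_steps):
    # Average gradients over several noise draws, then take one attack step.
    g = np.mean([loss_grad_wrt_input(x_adv, y) for _ in range(n_draws)], axis=0)
    x_adv = x_adv + step * g / (np.linalg.norm(g) + 1e-12)

print("perturbation 2-norm:", np.linalg.norm(x_adv - x))
```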
Guaranteed accuracy on ImageNet with Inception-v3

Model              Accuracy (%)   Guaranteed accuracy (%) at attack size 0.05 / 0.1 / 0.2
Baseline           78             -     -     -
PixelDP: L = 0.25  68             63    0     0
PixelDP: L = 0.75  58             53    49    40
(More DP noise as L increases.)

Meaningful guaranteed accuracy for ImageNet!
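For reference, a tiny sketch of how a guaranteed-accuracy number can be read off per-example certificates, assuming the certification check returns a robust size for each test example (the `records` format here is hypothetical): an example counts at attack size T only if it is correctly classified and certified robust up to at least T.

```python
def guaranteed_accuracy(records, attack_size):
    """records: list of (is_correct, certified_radius) per test example.
    An example counts only if it is correctly classified AND certified
    robust for all attacks of 2-norm <= attack_size."""
    ok = [correct and (radius >= attack_size) for correct, radius in records]
    return sum(ok) / len(records)

# Toy usage: three examples with certificates of different sizes.
records = [(True, 0.12), (True, 0.04), (False, 0.30)]
print(guaranteed_accuracy(records, attack_size=0.05))   # -> 1/3
```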
What if we only act on robust predictions? (e.g., if not robust, check the ticket)
Dataset: CIFAR-10
[Plot: top-1 accuracy vs. attack size (2-norm), 0 to 1.4; curves: Baseline, Precision (threshold 0.05), Recall (threshold 0.05).]
Accuracy on robust predictions
Dataset: CIFAR-10; comparison: Madry+ '17
[Plot: top-1 accuracy vs. attack size (2-norm), 0 to 1.4; curves: Baseline, Madry+ '17, Precision (threshold 0.1), Recall (threshold 0.1).]
If we increase the robustness threshold: better accuracy, fewer predictions.
Comparison with other provable defenses
Dataset: SVHN; comparison: Wong-Kolter '18
[Plot: top-1 accuracy vs. attack size (2-norm), 0 to 1.4; curves: ResNet with PixelDP (L = 0.1), CNN with Wong-Kolter '18.]
PixelDP scales to larger models, yielding better accuracy and robustness.
PixelDP summary
• PixelDP is the first defense that:
  • Gives attack-independent guarantees against norm-bounded adversarial attacks.
  • And scales to the largest models and datasets.
• Already extensions by others!
  • Improve the bounds at a given noise level (Li+ '18; Cohen+ '19).
  • Use other noise distributions (Pinot+ '19).
  • Adapt optimization (Rakin+ '18).
Appendix
Comparison with best-effort techniques
Dataset: CIFAR-10; comparison: best-effort defense by Madry+ '17
[Plot: top-1 accuracy vs. attack size (2-norm), 0 to 1.4; curves: Baseline, PixelDP (L = 0.1), Madry+ '17.]
PixelDP is empirically competitive with the state-of-the-art best-effort defense.
Related work
Best effort:
+ Scale:
  • Run a best-effort attack per gradient step [Goodfellow+ '15, Madry+ '17].
  • Preprocess inputs [Buckman+ '18, Guo+ '18].
  • Train a second model based on the first one [Papernot+ '16].
+ Flexible:
  • Support most architectures.
- No robustness guarantees:
  • Often broken soon after release [Athalye+ '18].

Certified:
+ Provable guarantees:
  • Per prediction [Wong-Kolter '18, Wong+ '18, Raghunathan+ '18, Wang+ '18].
  • In expectation [Sinha+ '17].
- Hard to scale:
  • Requires orders of magnitude more computation [Wong-Kolter '18, Wong+ '18, Wang+ '18].
  • Support only 1 hidden layer [Raghunathan+ '18].
- Often not flexible:
  • No ReLU, MaxPool, or accuracy guarantees [Sinha+ '17].
  • Only ReLU, no BatchNorm [Wong-Kolter '18].

PixelDP is the first certified defense that provides provable guarantees of robustness, scales, and applies broadly to arbitrary networks.
Results - CIFAR-10
Attack: x + α
[Results plot.]
Results - SVHN
Attack: x + α
[Results plot.]
Certification on ImageNet / Inception-v3
[Plot: certified accuracy vs. attack size, 0 to 0.5; curves: Baseline, PixelDP (L = 0.1), PixelDP (L = 0.3), PixelDP (L = 1.0).]
Certification on CIFAR-10
[Plot: certified accuracy vs. attack size, 0 to 0.5; curves: Baseline, PixelDP (L = 0.1), PixelDP (L = 0.3).]
Comparison with Best Effort Techniques
[Images: adversarial examples on a teddy bear image; labels "teddy bear" and "giant panda"; perturbation sizes ||α||₂ = 0.52 (undefended) and ||α||₂ = 3.41.]
Full references
• [Goodfellow+ '15] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. ICLR 2015.
• [Papernot+ '16] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. S&P 2016.
• [Buckman+ '18] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. ICLR 2018.
• [Guo+ '18] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. ICLR 2018.
• [Madry+ '17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv 2017.
Full references
• [Carlini-Wagner '17] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. S&P 2017.
• [Athalye+ '18] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML 2018.
• [Wong-Kolter '18] E. Wong and Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML 2018.
• [Raghunathan+ '18] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. arXiv 2018.
• [Wang+ '18] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Efficient formal safety analysis of neural networks. NeurIPS 2018.
• [Li+ '18] B. Li, C. Chen, W. Wang, and L. Carin. Second-Order Adversarial Attack and Certifiable Robustness. arXiv 2018.
Full references
• [Rakin+ '18] A.S. Rakin, Z. He, and D. Fan. Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack. arXiv 2018.
• [Cohen+ '19] J. Cohen, E. Rosenfeld, and Z. Kolter. Certified Adversarial Robustness via Randomized Smoothing. arXiv 2019.
• [Pinot+ '19] R. Pinot, L. Meunier, A. Araujo, H. Kashima, F. Yger, C. Gouy-Pailler, and J. Atif. Theoretical evidence for adversarial robustness through randomization: the case of the Exponential family. arXiv 2019.