
Adversarial Learning Rupam Bhattacharya

A Seminar on Adversarial and Secure Machine Learning

Summer Semester 2015

What is Adversarial Learning?

• What is spam filtering?
• What is intrusion detection?
• What is terrorism detection?
• What would these have to do with classification in particular?

Defining Adversarial Learning

• Adversarial machine learning is a research field that lies at the intersection of machine learning and computer security. It aims to enable the safe adoption of machine learning techniques in adversarial settings like spam filtering, malware detection and biometric recognition.

Motivation for the presentation

• Previous work makes an unrealistic assumption: the attacker has perfect knowledge of the classifier.

• Introduction of the adversarial classifier reverse engineering (ACRE) learning problem

• Presentation of efficient algorithms for reverse engineering linear classifiers and the role of active experimentation in adversarial attacks.

Exploring the Problems - I

• As classifiers become more widely deployed, adversaries are actively modifying their behavior to avoid detection.

• For example: senders of junk email

Exploring the Problems – II

• Dalvi et al. – anticipation of attacks by computing the adversary’s optimal strategy, assuming the adversary has perfect knowledge of the classifier.

• Unfortunately, this is rarely true in practice. Adversaries must learn about the classifier using prior knowledge, observation, and experimentation.

The Underlying Solution

• Exploring the role of active experimentation in adversarial attacks.

• Identification by adversaries, using a reasonable (polynomial) number of queries, of high-quality instances that are not labeled malicious.

• Treating the problem as the adversarial classifier reverse engineering (ACRE) learning problem.

Introducing the Learning Problems- I

• Active Learning Problem: semi-supervised machine learning in which a learning algorithm is able to interactively query the user to obtain the desired outputs at new data points.

For example: YOU

• Setup: given existing knowledge, we want to choose where to collect more data (a query-loop sketch follows this list)
– Access to cheap unlabelled points
– Make a query to obtain an expensive label
– Want to find labels that are “informative”

• Output: Classifier / predictor trained on less labeled data
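A minimal sketch of this query loop, assuming a scikit-learn logistic regression as the classifier, a hypothetical oracle_label function as the expensive labeler, and uncertainty sampling as the notion of “informative”:

```python
# Sketch: pool-based active learning with uncertainty sampling (binary task).
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_pool, oracle_label, X_init, y_init, budget=20):
    """X_pool: cheap unlabeled candidate points; oracle_label(x): expensive
    query returning the true label of a single point x."""
    X_train, y_train = list(X_init), list(y_init)
    pool = list(X_pool)
    model = LogisticRegression()
    for _ in range(budget):
        if not pool:
            break
        model.fit(np.array(X_train), np.array(y_train))
        probs = model.predict_proba(np.array(pool))
        # The most "informative" point: the one the current model is least sure about.
        idx = int(np.argmin(np.abs(probs[:, 1] - 0.5)))
        x = pool.pop(idx)
        X_train.append(x)
        y_train.append(oracle_label(x))          # the expensive label query
    model.fit(np.array(X_train), np.array(y_train))
    return model
```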

Introducing the Learning Problems- II

• PAC Model: successful learning of an unknown target concept should entail obtaining, with high probability, a hypothesis that is a good approximation of it.

Algorithm:
• Alg is given a sample S = {(x, y)} presumed to be drawn from some distribution D over the instance space X, labeled by some target function f.
• Alg does optimization over S to produce some hypothesis h.
• Goal is for h to be close to f over D.
• Allow failure with small probability δ, to allow for the chance that S is not representative (formal statement below).
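The usual formal statement of this guarantee, in standard notation (err_D(h) is the error of hypothesis h under distribution D):

```latex
% PAC guarantee: with probability at least 1 - \delta over a sample S ~ D^m,
% the returned hypothesis h is epsilon-close to the target f under D.
\Pr_{S \sim D^m}\Big[\; \operatorname{err}_D(h) \;=\; \Pr_{x \sim D}\big[\, h(x) \neq f(x) \,\big] \;\le\; \epsilon \;\Big] \;\ge\; 1 - \delta
```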

Introducing the Learning Problems- III

• ACRE Problem: Task of learning sufficient information about a classifier to construct adversarial attacks.

• The corresponding algorithms are discussed in the following slides.

Defining the Problem

(Figure: two-feature illustrations, over X1 and X2, of the instance space, the classifier, and the adversarial cost function.)

• Instance space: X = {X1, X2, …, Xn}, where each Xi is a feature (e.g., a word); instances x ∈ X (e.g., emails).

• Classifier: c(x): X → {+, −}, with c ∈ C, the concept class (e.g., linear classifiers).

• Adversarial cost function: a(x): X → R, with a ∈ A (e.g., more legible spam is better).
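A small code sketch of the same setup; the bag-of-words representation and the names Instance, Classifier, and example_cost are illustrative assumptions:

```python
# Illustrative types for the setup above (names are hypothetical).
from typing import Callable, Dict

Instance = Dict[str, int]               # x: feature name -> value (e.g., word counts)
Classifier = Callable[[Instance], str]  # c(x): black-box oracle returning "+" or "-"

def example_cost(x: Instance, ideal: Instance) -> float:
    """a(x): a simple linear adversarial cost -- total deviation from the
    adversary's preferred instance (e.g., the most legible version of the spam)."""
    keys = set(x) | set(ideal)
    return float(sum(abs(x.get(k, 0) - ideal.get(k, 0)) for k in keys))
```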

Adversarial Classification Reverse Engineering (ACRE)

• Task: minimize a(x) subject to c(x) = −, within a factor of k.

• Given:
– Full knowledge of a(x)
– One positive and one negative instance, x+ and x−
– A polynomial number of membership queries
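Stated formally, with MAC(c, a) denoting the minimal adversarial cost achievable against c with full knowledge of the classifier (this notation is introduced here for convenience):

```latex
% Minimal adversarial cost of classifier c under cost function a:
\operatorname{MAC}(c, a) \;=\; \min_{x \,:\, c(x) = -} a(x)

% ACRE k-learnability: using only the givens above (a polynomial number of
% membership queries), the adversary must find some x^* with
c(x^*) = {-} \qquad \text{and} \qquad a(x^*) \;\le\; k \cdot \operatorname{MAC}(c, a)
```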

Adversarial Classification Reverse Engineering (ACRE)

Adversary’s Task: minimize a(x) subject to c(x) = −.

Problem: the adversary doesn’t know c(x)!

How is ACRE different?

• The ACRE learning problem differs significantly from both the probably approximately correct (PAC) model of learning and active learning: the goal is not to learn the entire decision surface, there is no assumed distribution governing the instances, and success is measured relative to a cost model for the adversary.

What we assume

The adversary:
• Can issue membership queries to the classifier for arbitrary instances.
• Has access to an adversarial cost function a(x) that maps instances to non-negative real numbers.
• Is provided with one positive instance, x+, and one negative instance, x−.

Linear Classifiers with Continuous Features

• ACRE learnable within a factor of (1+ε) under linear cost functions.

Proof sketch:
• Only need to change the feature with the highest weight-to-cost ratio.
• We can efficiently find this feature using line searches in each dimension.

Linear Classifiers with Boolean Features

• Harder problem: can’t do line searches.
• ACRE learnable within a factor of 2 if the adversary has unit cost per change.

Continuous Features – Theorem I

• Let c be a continuous linear classifier with weight vector w such that the magnitude of the ratio between any two non-zero weights is never less than δ (a lower bound). Given positive and negative instances x+ and x−, we can find each weight within a factor of (1+ε) using a polynomial number of queries.

Continuous Features – Algorithm I
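A minimal sketch of the per-dimension line search, assuming a hypothetical black-box oracle classify(x) that returns "+" or "-" for arbitrary continuous instances and that "+" corresponds to a positive score w·x + b; the weights are recovered only up to a common positive scale, which is all the attack needs:

```python
# Sketch: estimate the relative weights of a black-box linear classifier by a
# line search along each feature axis, starting from the negative instance.
import numpy as np

def boundary_distance(classify, x_neg, i, t_max=1e6, eps=1e-6):
    """Signed distance t along axis i from x_neg to the decision boundary,
    found by exponential bracketing plus bisection.  Returns None if the
    label never flips within |t| <= t_max (weight effectively zero)."""
    for sign in (+1.0, -1.0):
        t = 1.0
        while t <= t_max:
            x = np.array(x_neg, dtype=float)
            x[i] += sign * t
            if classify(x) == "+":
                lo, hi = 0.0, sign * t          # label is "-" at offset lo, "+" at hi
                while abs(hi - lo) > eps * max(1.0, abs(hi)):
                    mid = (lo + hi) / 2.0
                    x[i] = x_neg[i] + mid
                    if classify(x) == "-":
                        lo = mid
                    else:
                        hi = mid
                return hi
            t *= 2.0
    return None

def estimate_relative_weights(classify, x_neg, n_features):
    """For c(x) = sign(w.x + b) with c(x_neg) = "-", the flip distance along
    axis i is t_i = -(w.x_neg + b) / w_i, so each w_i is proportional to 1/t_i."""
    dists = [boundary_distance(classify, x_neg, i) for i in range(n_features)]
    return np.array([0.0 if t is None else 1.0 / t for t in dists])
```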

Continuous Features – Theorem II

• Linear classifiers with continuous features are ACRE (1 + ε)-learnable under linear cost functions.

Continuous Features – Algorithm II
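A minimal sketch of how such weight estimates could then be used under a linear cost function: following the earlier proof sketch, change only the feature with the best weight-to-cost ratio, by the smallest amount that flips the classification. The names classify, w_hat, and cost are illustrative, and "+" is again assumed to correspond to a positive score:

```python
# Sketch: turn the adversary's ideal (positive) instance x_adv into a cheap
# negative instance by changing a single feature.
import numpy as np

def cheap_negative_instance(classify, x_adv, w_hat, cost, t_max=1e6, eps=1e-6):
    # Feature with the largest effect on the score per unit of cost.
    i = int(np.argmax(np.abs(w_hat) / np.asarray(cost, dtype=float)))
    direction = -float(np.sign(w_hat[i]))      # push the score toward "-"
    x = np.array(x_adv, dtype=float)
    t = 1.0
    while t <= t_max:                          # exponential search for any flip
        x[i] = x_adv[i] + direction * t
        if classify(x) == "-":
            break
        t *= 2.0
    else:
        raise ValueError("no single-feature change flips the classifier")
    lo, hi = 0.0, t                            # bisect down to the minimal change
    while hi - lo > eps * max(1.0, hi):
        mid = (lo + hi) / 2.0
        x[i] = x_adv[i] + direction * mid
        if classify(x) == "-":
            hi = mid
        else:
            lo = mid
    x[i] = x_adv[i] + direction * hi           # cheapest single change still labeled "-"
    return x
```

Changing a single coordinate suffices here because, for a linear classifier under a linear cost, the best attack direction is the feature with the highest weight-to-cost ratio, as in the proof sketch above.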

Boolean Features – Theorems

• In a linear classifier with Boolean features, determining if a sign witness exists for a given feature is NP-complete.

• Boolean linear classifiers are ACRE 2-learnable under uniform linear cost functions.

Boolean Features – Algorithm III

Adaptation of ACRE Algorithm - I

Classifier Configuration:
Two linear classifiers:
• A naïve Bayes model
• A maximum entropy (maxent) model

Adversary Configuration:
• The adversary’s feature lists were drawn from English dictionary words and grouped as Dict, Freq, and Rand.

Adaptation of ACRE Algorithm - II

Iteratively reduce the cost in two ways (sketched in code below):
1. Remove any unnecessary change: O(n)
2. Replace any two changes with one: O(n³)
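A minimal sketch of these two reduction steps for 0/1 features with unit cost per change, assuming a hypothetical black-box oracle classify(x); it illustrates the loop described above rather than reproducing the paper’s exact pseudocode:

```python
def _swap_two_for_one(classify, x_adv, y):
    """Try to replace two existing changes (relative to x_adv) with a single new
    change; return the improved instance, or None if no swap stays negative."""
    n = len(y)
    changed = [i for i in range(n) if y[i] != x_adv[i]]
    unchanged = [j for j in range(n) if y[j] == x_adv[j]]
    for a in range(len(changed)):
        for b in range(a + 1, len(changed)):
            for j in unchanged:
                trial = list(y)
                trial[changed[a]] = x_adv[changed[a]]
                trial[changed[b]] = x_adv[changed[b]]
                trial[j] = 1 - x_adv[j]          # one different change instead of two
                if classify(trial) == "-":
                    return trial
    return None

def improve_negative_instance(classify, x_adv, y):
    """x_adv: the adversary's ideal instance; y: a known negative instance.
    Both are sequences of 0/1 feature values; classify(x) returns "+" or "-"."""
    y, n = list(y), len(y)
    improved = True
    while improved:
        improved = False
        # Step 1 (O(n)): drop any single change that is not needed to stay negative.
        for i in range(n):
            if y[i] != x_adv[i]:
                trial = list(y)
                trial[i] = x_adv[i]
                if classify(trial) == "-":
                    y, improved = trial, True
        # Step 2 (O(n^3)): trade two changes for one, if the result stays negative.
        swapped = _swap_two_for_one(classify, x_adv, y)
        if swapped is not None:
            y, improved = swapped, True
    return y
```

Both steps only keep candidates that the oracle labels "-", so the loop always holds a valid negative instance while monotonically reducing the number of changed features.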

Experimental Results

Conclusion

ACRE Learning:
• Determines whether an adversary can efficiently learn enough about a classifier to minimize the cost of defeating it.

• Algorithm performed quite well in spam filtering, easily exceeding the worst-case bounds.

Future Work - I
• There is the possibility of adding different types of classifiers, cost functions, and even learning scenarios, and of understanding which scenarios can be hard.

• Under what conditions is ACRE learning robust to noisy classifiers?

• What can be learned from passive observation alone, for domains where issuing any test queries would be prohibitively expensive?

• If the adversary does not know which features make up the instance space, when can they be inferred?

Future Work - II
• Can a similar framework be applied to relational problems, e.g. to reverse engineering collective classification?

• Moving beyond classification, under what circumstances can adversaries reverse engineer regression functions, such as car insurance rates?

• How do such techniques fare against a changing classifier, such as a frequently retrained spam filter?

• Will the knowledge to defeat a classifier today be of any use tomorrow?

