Date post: 27-Mar-2022
FALL OF GIANTS: HOW POPULAR TEXT-BASED MLAAS FALL AGAINST A SIMPLE EVASION ATTACK Authors: Luca Pajola, Mauro Conti

OUTLINE

1. Motivations

2. Zero-Width Attack (ZeW)

3. Results

• Controlled Environment

• Into the "wild"

4. Discussions

Fall of Giants. Pajola and Conti 2 / 19

MOTIVATIONS

1. Machine Learning (ML) is here

• A wide set of ML-based applications is already deployed

2. Several commercial uses

3. Great performance, but what about security?


• Where should we focus?


[Figure: pipeline with three stages: data, preprocessing, ML model]


• Most attacks are designed to exploit weaknesses of ML models

• But preprocessing algorithms play a fundamental role in the pipeline

• They are the "foundations" of our applications

• If an attacker affects these techniques ...



[Figure: two panels, "What you see" and "What your model actually sees"]

• Example of an image scaling attack [1]

• The attack affects image scaling techniques applied during the preprocessing

• What about NLP?

ZERO-WIDTH ATTACK

ZEW – THE IDEA

• Steganography leverages "unnoticeable" characters

• Among these we find non-printable characters

• If inserted into text, they might affect preprocessing techniques in several ways
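As a minimal sketch of the idea, the snippet below shows that a string containing a zero-width character can render like the original while being a different character sequence. The use of U+200B (ZERO WIDTH SPACE) is an assumption for illustration; the slides do not list the exact code points used.

```python
import unicodedata

# Assumption: U+200B (ZERO WIDTH SPACE) stands in for a generic ZeW character.
ZWSP = "\u200b"

clean = "I love you"
poisoned = "I lo" + ZWSP + "ve you"

def describe(text):
    """Return (character, Unicode name) pairs so hidden characters become visible."""
    return [(ch, unicodedata.name(ch, "UNNAMED")) for ch in text]

# The two strings look alike in most viewers but are different sequences.
print(clean == poisoned)           # False
print(len(clean), len(poisoned))   # 10 11
print(describe(poisoned)[4][1])    # ZERO WIDTH SPACE
```

Inspecting code points or Unicode names, rather than the rendered text, is what makes the injected character visible.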


ZEW – NLP CHALLENGES

• NLP challenges compared to computer vision (CV)

1. Input domain

Different types of perturbation

e.g., in CV we add RGB masks; what is the equivalent in NLP?

2. Human perception

Perturbations are easier to spot

3. Semantics

The perturbations should not alter the sentence meaning

e.g., I hate you -> I ate you


ZEW – EFFECT

• Word-based models

• Words with ZeW characters become unknown

• And may be discarded

• E.g., "I lo$ve you" ($ marks a ZeW character)

• With unk: "I UNK you"

• Without unk: "I you"

• Character-based models (more resistant)

• ZeW characters become unknown

• With unk: "I loUNKve you"

• Without unk: "I love you"
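The effect on both representations can be reproduced with a toy example. This is a sketch only: the vocabularies, the `word_tokens` and `char_tokens` helpers, and the choice of U+200B as the ZeW character are hypothetical and are not taken from the slides.

```python
# Assumption: U+200B (ZERO WIDTH SPACE) is the injected ZeW character.
ZEW = "\u200b"

# Hypothetical toy vocabularies, for illustration only.
WORD_VOCAB = {"i", "love", "you"}
CHAR_VOCAB = set("abcdefghijklmnopqrstuvwxyz ")

def word_tokens(text, use_unk):
    """Word-level lookup: out-of-vocabulary words map to UNK or are dropped."""
    out = []
    for word in text.lower().split():
        if word in WORD_VOCAB:
            out.append(word)
        elif use_unk:
            out.append("UNK")
    return out

def char_tokens(text, use_unk):
    """Character-level lookup: out-of-vocabulary characters map to UNK or are dropped."""
    out = []
    for ch in text.lower():
        if ch in CHAR_VOCAB:
            out.append(ch)
        elif use_unk:
            out.append("UNK")
    return out

poisoned = "I lo" + ZEW + "ve you"

print(word_tokens(poisoned, use_unk=True))            # ['i', 'UNK', 'you']
print(word_tokens(poisoned, use_unk=False))           # ['i', 'you']
print("".join(char_tokens(poisoned, use_unk=False)))  # i love you
```

In the word-level case the whole word is lost, while the character-level model without UNK recovers the original sentence, which matches the "more resistant" remark on the slide.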


RESULTS

RESULTS – ALGORITHM

• Case Study: Hate Speech Evasion

• Algorithm

• Identification of negative words in a given sentence

• Add ZeW characters inside the words

• Two injection strategies

• Mask1: insertion in the middle of the word

• Hate -> ha$te

• Mask2: insertion around every character of the word

• Hate -> $h$a$t$e$
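The two injection strategies can be sketched as follows. The function names, the `evade` helper, and the one-word negative lexicon are hypothetical; how negative words were actually identified is not specified on this slide. The default ZeW character (U+200B) is likewise an assumption, and the demo passes "$" so the output stays readable, as on the slide.

```python
# Assumption: U+200B (ZERO WIDTH SPACE) is the default injected character.
ZEW = "\u200b"

def mask1(word, zew=ZEW):
    """Mask1: insert one ZeW character in the middle of the word."""
    mid = len(word) // 2
    return word[:mid] + zew + word[mid:]

def mask2(word, zew=ZEW):
    """Mask2: insert a ZeW character before, between, and after every character."""
    return zew + zew.join(word) + zew

def evade(sentence, negative_words, mask):
    """Apply `mask` only to words found in the (hypothetical) negative lexicon."""
    return " ".join(
        mask(w) if w.lower() in negative_words else w
        for w in sentence.split()
    )

NEGATIVE = {"hate"}  # toy lexicon, for illustration only

print(evade("I hate you", NEGATIVE, lambda w: mask1(w, "$")))  # I ha$te you
print(evade("I hate you", NEGATIVE, lambda w: mask2(w, "$")))  # I $h$a$t$e$ you
```

Calling `mask1` or `mask2` without the second argument injects the invisible character instead of "$".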


RESULTS – CONTROLLED ENVIRONMENT

RNN model: GRU

Representation type: char and word

With and without UNK tokens

Dataset: Sentiment140 [3]

Goal: make negative sentences evade detection


RESULTS – INTO THE WILD

• Tested 12 APIs

• Developed by Amazon, Google, Microsoft, and IBM

• Different types of services (e.g., translators, sentiment analyzers)

• Goal: manipulate outcomes of hate-speech analyses


DISCUSSIONS

• A simple sanitization technique might prevent ZeW

• First rule in cybersecurity: don't trust the input!

• Unicode contains a lot of characters

• Preprocessing techniques are perfect attack vectors

• ML applications do not only contain ML models!

• The attack works in real-life applications

• We should be more careful about what we deploy
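One possible form of the sanitization step mentioned in the first bullet is sketched below. It is an assumption, not the authors' stated countermeasure: it drops every code point in Unicode general category Cf (format), which covers the common zero-width characters. The `strip_format_chars` name is hypothetical.

```python
import unicodedata

def strip_format_chars(text):
    """Remove all code points in Unicode category Cf (format characters).

    Cf includes U+200B ZERO WIDTH SPACE, U+200C ZERO WIDTH NON-JOINER,
    U+200D ZERO WIDTH JOINER, U+2060 WORD JOINER and U+FEFF.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

poisoned = "I ha\u200bte y\u200do\u2060u"
print(strip_format_chars(poisoned))  # I hate you
```

Removing the whole Cf category is a blunt filter: some of these characters are legitimate in certain scripts and in emoji sequences, so a deployed filter may need an allow-list suited to the expected input languages.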


THANK YOU

REFERENCES

[1] Xiao, Qixue, et al. "Seeing is not believing: Camouflage attacks on image scaling algorithms." USENIX Security (2019).

[2] Hutto, Clayton, and Eric Gilbert. "Vader: A parsimonious rule-based model for sentiment analysis of social media text." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 8. No. 1. 2014.

[3] Go, A., R. Bhayani, and L. Huang. "Twitter sentiment classification using distant supervision." (2009).
