PhD Thesis Diogo Mónica

Defeating malicious attacks. A contribution to safe network operations.

Candidate - Diogo Mónica

Advisor - Carlos Ribeiro

User empowerment.

Outline

‣ History-based password validation

‣ Detection of Evil Twin attacks

‣ An emerging C&C architecture for Botnets

‣ IDS for browser hijacking

‣ Defense against Sybil attacks in wireless ad-hoc networks

History-based password validation

The Problem

Password validation heuristics are not working. Leaks show that:

‣ Users are bad at choosing passwords

‣ There are common patterns used to circumvent these heuristics

password 1111111 aaaaa qwerty

p@ssw0rd password1 passw0rd p4ssw0rd

2222222 3333333 4444444 5555555

asdfg zxcvb qw3rty wertyu ...

bbbbb ccccc sssss ddddd ...

Desired Solution

A popularity-based classification scheme that:

‣ Resists common password variations (generalization capability)

‣ Allows for offline operation (no centralized authority)

‣ Has a small total size

‣ Makes testing candidate passwords easy and inexpensive

Proposed Solution

‣ Compute a non-invertible, compressed summary of all prior passwords

‣ Distributing this data set allows for user-side, offline validation of passwords

[Diagram: on the server side, the Password List (password, 1111111, aaaa, 123456) goes through Compression, Generalization and Hashing to produce the Classification Database; the user side downloads it and runs Classification on Candidate Passwords (password, 1111111, aaaa, 123456)]

Implementation

Compression and Generalization

Dissimilarity Measure: Hamming distance; Euclidean distance between ASCII codes

Generalization Capability: popularity label of node (x,y)

[Equations on this slide were not recoverable from the extraction]
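The two distances named on this slide can be sketched in a few lines. This is only an illustration of the building blocks: how the thesis combines them into a single dissimilarity measure is not recoverable from the slide, and the function names and the NUL-padding of shorter passwords are assumptions made here.

```python
import math


def _pad(a: str, b: str):
    """Pad the shorter password with NUL characters (an assumption of this sketch)."""
    n = max(len(a), len(b))
    return a.ljust(n, "\0"), b.ljust(n, "\0")


def hamming_distance(a: str, b: str) -> int:
    """Number of character positions at which the two passwords differ."""
    a, b = _pad(a, b)
    return sum(ca != cb for ca, cb in zip(a, b))


def ascii_euclidean_distance(a: str, b: str) -> float:
    """Euclidean distance between the vectors of character codes."""
    a, b = _pad(a, b)
    return math.sqrt(sum((ord(ca) - ord(cb)) ** 2 for ca, cb in zip(a, b)))
```

For example, `password` and `passw0rd` differ in one position, so their Hamming distance is 1, while the Euclidean distance reflects how far apart `o` and `0` are in the ASCII table.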

Implementation

Hashing

‣ A linear vector projection that we understand

‣ We can remove the phase information, thus avoiding invertibility

[Diagram: Password, Spectrum (complex), discard phase, Spectrum (magnitude), Autocorrelation]
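A minimal sketch of the "discard phase" idea, assuming a password is represented as the sequence of its character codes (the slides do not say which encoding the thesis uses). Keeping only the magnitude of the discrete Fourier transform is many-to-one: the squared magnitude spectrum is the transform of the circular autocorrelation, so, for instance, every circular shift of a password yields the same magnitudes. The function names are hypothetical.

```python
import cmath


def dft(values):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def magnitude_spectrum(password):
    """Transform the character codes and keep only the magnitudes (phase dropped)."""
    codes = [float(ord(c)) for c in password]
    return [abs(x) for x in dft(codes)]


def same_spectrum(a, b, tol=1e-6):
    """True when two passwords are indistinguishable after the phase is discarded."""
    sa, sb = magnitude_spectrum(a), magnitude_spectrum(b)
    return len(sa) == len(sb) and all(abs(x - y) <= tol for x, y in zip(sa, sb))
```

Here `same_spectrum("abcd", "bcda")` holds because the second string is a circular shift of the first, which is exactly the kind of ambiguity that prevents recovering the original password from the stored magnitudes.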

Performance

Compression rate and statistical performance

[Plot: SOM vs CM Sketch. x-axis: popularity ranking order of chosen threshold (0 to 10 x 10^4). Series: SOM (140,200), CMSketch(3,28000), CMSketch(3,56000), CMSketch(3,84000), CMSketch(3,112000). Size annotations: ~834Kb, ~672Kb]

John the Ripper mutations testing

Passwords               Flagged as Dangerous
500 most probable       100%
All mutations           84%
Random passwords        11.3%
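For readers unfamiliar with the baseline in this comparison, here is a minimal Count-Min Sketch, the structure the SOM is compared against. It is a generic textbook sketch, not the implementation used in the thesis; reading a pair such as (3,28000) as (number of hash rows, counters per row) is an assumption, and the SHA-256-based row hashing is chosen here only for convenience.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter; estimates never fall below the true count."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One independent-looking hash per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum is the best estimate.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))
```

A sketch like this counts exact strings only: `password1` and `passw0rd` land in unrelated counters, so it has no built-in notion of the password variations that the generalization step is meant to catch.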

Detection of Evil Twin Attacks

The Problem

[Diagram: Internet, AP, User and Evil Twin AP during an evil twin attack]

Users are operating on insecure, untrusted networks:

‣ Administrators fail to properly configure networks

‣ Users don’t have the tools to self-assess their security

Desired Solution

A usable technique to detect Evil Twin attacks, ensuring:

‣ User-side operation

‣ Operation not detectable by the attacker

‣ Capable of operation in encrypted networks

‣ Non-disruptive operation

‣ Independent control of both the probability of detection and of false alarm

Proposed Solution

[Diagram: the watermark travelling between Internet, AP, Evil Twin AP and User]

‣ Detect a multi-hop setting between the user’s computer and the internet

‣ Assumes the rogue AP relays traffic using the legitimate AP

Implementation

1 - User sends the watermark

2 - Echo-server replies to the watermark (n times)

3 - User changes to the desired channel, and tries to detect the watermark

[Diagram: Internet, Echo Server, User, Evil Twin AP]

Probability of Detection

Probability of False Positives
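The formulas on these two slides did not survive extraction. As a rough illustration of how the number of echo replies n and a decision threshold can act as two separate knobs, the sketch below assumes a simple model that is not taken from the thesis: each of the n replies is matched independently with probability p when an evil twin is relaying, a spurious match occurs with probability q otherwise, and an attack is declared when at least k matches are seen. All names and parameters are hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def detection_probability(n, k, p_match):
    """Chance of declaring an attack when the watermark really is being relayed."""
    return binomial_tail(n, k, p_match)


def false_alarm_probability(n, k, q_spurious):
    """Chance of declaring an attack when only spurious matches occur."""
    return binomial_tail(n, k, q_spurious)
```

Under this model, raising k lowers the false alarm probability, and raising n afterwards restores the detection probability, which is one way the two quantities could be tuned separately.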

Performance

Traffic Profiles

Profile   DL Rate (Mbps)   UL Rate (Mbps)
Low       2                1
Medium    8                5
High      11.3             12

[Diagram: test setup. Labels: Campus Network, Legitimate AP, Evil Twin AP, Monitor, Wired, Wireless, FON 2201 (Firmware: DD-WRT v24), Backtrack Linux v4, OS X Snow Leopard]

WiFiHop Results

Profile   WiFiHop   Attacks Detected
Low       Open      100%
Low       Covert    100%
Medium    Open      100%
Medium    Covert    100%
High      Open      98.44%
High      Covert    98.05%

An emerging C&C architecture for Botnets

The Problem

Botnets continue to evolve:

‣ New strategies are being employed to avoid takedown and detection

‣ Forecasting the evolutionary adaptive changes in botnet operations is of paramount importance to allow timely development of appropriate countermeasures

Desired Solution

Attackers are researching botnet C&C architectures that:

‣ Avoid infiltration and size estimation

‣ Reduce the likelihood of detection of individual bots

‣ Maintain Botmaster anonymity

Potential Direction

‣ No active participation from actual bots in the C&C

‣ All bots passively listen for commands

‣ Commands are signed and pushed to all the listening bots

[Diagram: Botmaster, Vulnerable Web App, Web Users]

Implementation

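Bot Master (bm)

Dissemination Website (W)

Dissemination Layer Host

[Diagram key notation, as far as recoverable: $K_{bm}$, $K_w^{-1}$, $\{K_w\}_{K_{bm}^{-1}}$, $\{C\}_{K_{bm}^{-1}}$; $K_{bm}$; $\{K_w\}_{K_{bm}^{-1}}$, $\{M\}_{K^{-1}}$ (final subscript lost in extraction)]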

‣ Bots only answer to signed commands from “legitimate” intermediate nodes

[Diagram: Dissemination Layer Hosts; encrypted finger set by D1; encrypted finger set by D2]

‣ While propagating commands, an encrypted overlay of infected bots is created

‣ The botmaster uses a probabilistic mix-network for information retrieval

Performance

[Plot: x-axis: Number of Hosts (500 to 30,000); y-axis: f bots; series: 10 minutes, 15 minutes, 20 minutes, 20 minutes w/ cooperation]

An IDS for Browser Hijacking

The Problem

Malicious behavior that does not directly target the user’s browser:

‣ Unintended participation in botnet C&C

‣ Browser-based DDoS (GitHub attacks)

‣ JavaScript scanning (internal network)

‣ Bitcoin mining (malicious ad-networks)

[Diagram: Botmaster, Vulnerable Web App, Web Users]

Desired Solution

A browser-based IDS that:

‣ Does not limit the actions a browser can take

‣ Has low computational/memory requirements so it can be used on any device

‣ Possesses granular detection capability (per-tab)

Proposed Solution

‣ Real-time per-tab behavior monitoring

‣ Uses three indicators to detect whether a particular tab is showing malicious behavior

Implementation

Computational Effort

‣ The fractional computational load is integrated throughout the full segment

Periodicity

‣ Determine if inter-arrival times are random, with a mix of a Kolmogorov-Smirnov test and the ratio of mean/(standard deviation)

IP Address Sequences

‣ Determine if the sequence of destination addresses follows a scanning strategy profile

[Indicator scale labels: 1, 0, Bad, Good]
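The periodicity indicator above can be sketched as follows. The slides do not state which reference distribution the Kolmogorov-Smirnov test uses; this sketch assumes that "random" means exponentially distributed inter-arrival times (as for Poisson arrivals), for which mean/(standard deviation) is close to 1, while strongly periodic traffic drives that ratio up. Function names are hypothetical.

```python
import math
import statistics


def ks_statistic_exponential(intervals):
    """One-sample KS statistic against an exponential with the sample's own mean."""
    xs = sorted(intervals)
    n = len(xs)
    mean = sum(xs) / n
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x / mean)
        # Largest gap between the empirical and the reference CDF around x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def mean_over_std(intervals):
    """Ratio of mean to standard deviation; infinite for perfectly periodic input."""
    sd = statistics.pstdev(intervals)
    return math.inf if sd == 0 else statistics.fmean(intervals) / sd
```

Requests fired on a fixed timer produce a large KS statistic and a very large mean/(standard deviation), whereas human-driven browsing should sit much closer to the exponential reference. How the two numbers are mixed into one indicator is not specified in the slides.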

Performance

‣ 50 multi-tab browser sessions were logged.

‣ From these sessions, 450 five-second periods were extracted, to be used as the training set (D):

  ‣ 150 correspond to regular browser use;

  ‣ 150 to a simulated DoS attack;

  ‣ 150 to forced random scanning periods.

‣ 50 other periods were obtained, to be used as a test set.

‣ 450 periods were fed to the perceptron, for supervised training

‣ 100 iterations (epochs) were used in training

‣ Learning factor α = 0.1

‣ Perceptron weights w were randomly initialized.
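The training setup listed above (100 epochs, learning factor 0.1, randomly initialized weights) corresponds to a classic perceptron. The sketch below uses a tiny made-up data set of three indicator values per period; the numbers, the +1/-1 label encoding, the initialization range and the convention that higher indicator values mean more benign behavior are all assumptions for illustration, not data from the thesis.

```python
import random


def predict(w, x):
    """w[0] is the bias; returns +1 (regular use) or -1 (malicious)."""
    activation = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if activation > 0 else -1


def train_perceptron(samples, labels, alpha=0.1, epochs=100, seed=0):
    rng = random.Random(seed)
    # Small random initial weights (the range is an assumption of this sketch).
    w = [rng.uniform(-0.05, 0.05) for _ in range(len(samples[0]) + 1)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if predict(w, x) != y:
                # Perceptron rule: move the boundary toward the misclassified sample.
                w[0] += alpha * y
                for i, xi in enumerate(x):
                    w[i + 1] += alpha * y * xi
    return w


# Toy periods: (computational effort, periodicity, IP address sequence) indicators.
SAMPLES = [
    (0.9, 0.9, 0.9), (1.0, 0.8, 0.9), (0.8, 1.0, 1.0),  # regular use
    (0.1, 0.2, 0.1), (0.0, 0.1, 0.3), (0.2, 0.0, 0.1),  # malicious
]
LABELS = [1, 1, 1, -1, -1, -1]

weights = train_perceptron(SAMPLES, LABELS)
```

Because the toy classes are linearly separable with a wide margin, the perceptron settles well within the 100 epochs; real indicator data would need the held-out test set described above to judge performance.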

Defense against Sybil attacks in wireless ad-hoc networks

The Problem

The Sybil attack is a threat to the secure and dependable operation of wireless ad hoc networks:

‣ Malicious nodes may participate with multiple identities in a system

‣ There are no completely decentralized algorithms to bootstrap a quorum of trusted nodes in this setting

Desired Solution

A usable technique to mitigate Sybil attacks, ensuring:

‣ No node pre-configuration

‣ Byzantine-node tolerance

‣ Scalability

Proposed Solution

Algorithm that provides each correct node with a quorum with the following properties:

‣ Q-Size. Each delivered quorum has size q.

‣ Probabilistic Sybil-free. With a probability arbitrarily close to 1, in any quorum the number of identities that have been proposed by the f malicious nodes is no larger than f.

‣ Probabilistic Partial Consistency. With a probability arbitrarily close to 1, the intersection of the quorums delivered to all correct nodes has at least q-f identities from correct nodes.

Total Nodes: 6 Malicious Nodes: 1 Total identities: 8
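The quorum properties can be checked mechanically on a small example. The sketch below mirrors the figure's totals (6 nodes, 1 malicious, 8 identities, so the malicious node proposes 3 identities); the quorum size q = 6, the identity names and the two delivered quorums are made up for illustration.

```python
def sybil_free(quorum, malicious_identities, f):
    """At most f identities in the quorum were proposed by the f malicious nodes."""
    return len(set(quorum) & set(malicious_identities)) <= f


def partially_consistent(quorums, correct_identities, q, f):
    """The quorums of all correct nodes share at least q - f correct identities."""
    common = set.intersection(*(set(qr) for qr in quorums))
    return len(common & set(correct_identities)) >= q - f


CORRECT = {"c1", "c2", "c3", "c4", "c5"}   # one identity per correct node
MALICIOUS = {"m1", "m2", "m3"}            # three identities from one malicious node
F, Q = 1, 6

quorum_a = CORRECT | {"m1"}               # quorum delivered to one correct node
quorum_b = CORRECT | {"m2"}               # quorum delivered to another correct node
```

Both quorums have size q, each contains only one malicious identity, and their intersection still holds q - f = 5 correct identities, so both properties are satisfied even though the two nodes disagree on which Sybil identity slipped through.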

Implementation

‣ Three different phases

‣ Phase #1 creates a distributed nonce for initialization of Phase #2

‣ Phase #2 allows some Sybil identities through, but it’s fast

‣ Phase #3 completely removes Sybil identities, but is slow

[Diagram: Phase 1, Phase 2, Phase 3. Legend: Malicious Node, Correct Node, Nonce Generation, RRT - Radio Resource Test, CRT - Computational Resource Test]
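The slides name a Computational Resource Test without showing how it is built. One common way to build such a test is a hash puzzle, sketched below purely as an assumption: each identity must find a value whose SHA-256 digest, taken together with a shared nonce, falls below a target. A node backing several identities has to pay the cost once per identity, and a fresh nonce, such as the one Phase #1 produces, keeps solutions from being precomputed. The function names and the difficulty parameter are hypothetical.

```python
import hashlib
from itertools import count


def _digest_value(nonce: bytes, identity: str, solution: int) -> int:
    data = nonce + identity.encode("utf-8") + solution.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def solve_puzzle(nonce: bytes, identity: str, difficulty_bits: int) -> int:
    """Brute-force search; expected work doubles with every extra difficulty bit."""
    target = 1 << (256 - difficulty_bits)
    for solution in count():
        if _digest_value(nonce, identity, solution) < target:
            return solution


def verify_puzzle(nonce: bytes, identity: str, solution: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    return _digest_value(nonce, identity, solution) < (1 << (256 - difficulty_bits))
```

The asymmetry between solving and verifying is what makes this kind of test usable: every node can cheaply check its neighbours' answers, while producing answers for many identities at once is expensive.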

Performance

[Plots: x-axis: Number of nodes (n), 0 to 30; label: Optimized SST; remaining axis labels and legends were not recoverable from the extraction]

Summary

‣ We presented novel solutions to different threats that are presently affecting the safety of network users

‣ We implemented proof-of-concept prototypes of these solutions and analyzed their performance

‣ Hopefully we will have contributed to user empowerment and, hence, to safer network operations

Thank you

@diogomonica

Compression Ratio

Self-Organizing Maps

‣ Compression ratio is

‣ is the number of input passwords

‣ is the number of nodes in the SOM.

‣ Important to note that for any chosen compression ratio.

pmiss is the probability of wrongly classifying as safe a password whose occurrence is higher than the threshold

Why didn’t you do the DFT at the beginning?

‣ Our inability to come up with a similarity measure that has human meaning when working with hashes

‣ Having the hashes at the beginning would make the training computationally more expensive

‣ By doing them at the end, the hashing function can be improved without changing anything in the SOM training

‣ We verified via Monte Carlo simulations that there is practically no loss in terms of topological proximity when doing the hashes at the end, making this a non-issue

Why did you choose this similarity measure?

‣ We are able to easily calculate distances between models and input passwords

‣ We have the ability to do fractional approximation

‣ Our similarity measure captures a human “closeness” criterion

Why did you choose DFTs?

‣ They are very fast to compute

‣ They are informationally non-invertible (if we discard the phase component)

‣ They maintained the topological proximity of our SOM

‣ Something that we can reason about because we understand what it means

‣ The only hashing mechanism that we found that works
