
Battling Unknown Malware with Machine Learning

Date post: 12-Jan-2017
Upload: crowdstrike
BATTLING UNKNOWN MALWARE WITH MACHINE LEARNING DR. SVEN KRASSER CHIEF SCIENTIST @SVENKRASSER
Transcript
Page 1: Battling Unknown Malware with Machine Learning

BATTLING UNKNOWN MALWARE WITH MACHINE LEARNING

DR. SVEN KRASSER CHIEF SCIENTIST @SVENKRASSER

Page 2: Battling Unknown Malware with Machine Learning

FALCON ON VIRUSTOTAL

2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.

Page 3: Battling Unknown Malware with Machine Learning

SUBMITTING TO VIRUSTOTAL


Page 4: Battling Unknown Malware with Machine Learning

SCAN RESULTS


Page 5: Battling Unknown Malware with Machine Learning

SCAN RESULTS


Page 6: Battling Unknown Malware with Machine Learning

SCAN RESULTS


Page 7: Battling Unknown Malware with Machine Learning

MACHINE LEARNING PRIMER

More on this: watch http://tinyurl.com/MLcrowdcast


Page 8: Battling Unknown Malware with Machine Learning

Some Data to Get Started: 1988 ANTHROPOMETRIC SURVEY OF ARMY PERSONNEL

Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro

Page 9: Battling Unknown Malware with Machine Learning

Data

• Over 4000 soldiers surveyed

• Over 100 measurements

• Reported by gender

Page 10: Battling Unknown Malware with Machine Learning

FIRST LOOK

Plot axes: Height [mm] (x), Density (y)

• Difference in distribution

• Significant overlap

Page 11: Battling Unknown Malware with Machine Learning

SECOND DIMENSION

Plot axes: Height [mm] (x), Weight [10^-1 kg] (y)

• Correlation

• Overlap

Page 12: Battling Unknown Malware with Machine Learning

FEATURE SELECTION

Plot axes: “Buttock Circumference” [mm] (x), Weight [10^-1 kg] (y)

• Correlation

• Reduced overlap

• Selection of features matters

Page 13: Battling Unknown Malware with Machine Learning

LET’S CLASSIFY

Plot axes: “Buttock Circumference” [mm] (x), Weight [10^-1 kg] (y)

• Let’s assume we want to detect males (blue)

• I.e., “blue” is our positive class

• TP: classify blue as blue

• Note some misclassifications

• FP: classify red as blue

• FN: classify blue as red
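The TP/FP/FN definitions above can be made concrete with a small sketch. This is not the classifier from the talk: it assumes some model has already produced a score per sample, and the scores, labels, and threshold below are made-up illustration values, with `True` standing for the positive ("blue") class.

```python
from typing import Sequence, Tuple


def confusion_counts(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and is_positive:
            tp += 1  # blue classified as blue
        elif predicted_positive and not is_positive:
            fp += 1  # red classified as blue
        elif is_positive:
            fn += 1  # blue classified as red
        else:
            tn += 1  # red classified as red
    return tp, fp, fn, tn


# Hypothetical scores and ground truth, for illustration only.
EXAMPLE_SCORES = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
EXAMPLE_LABELS = [True, True, True, False, False, False]

if __name__ == "__main__":
    tp, fp, fn, tn = confusion_counts(EXAMPLE_SCORES, EXAMPLE_LABELS, threshold=0.5)
    print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

Lowering the threshold in this sketch turns some FNs into TPs, at the cost of turning some TNs into FPs.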

Page 14: Battling Unknown Malware with Machine Learning

LET’S CLASSIFY

Plot axes: “Buttock Circumference” [mm] (x), Weight [10^-1 kg] (y)

Page 15: Battling Unknown Malware with Machine Learning

LET’S CLASSIFY

Plot axes: “Buttock Circumference” [mm] (x), Weight [10^-1 kg] (y)

Page 16: Battling Unknown Malware with Machine Learning

LET’S CLASSIFY

Plot axes: “Buttock Circumference” [mm] (x), Weight [10^-1 kg] (y)

Page 17: Battling Unknown Malware with Machine Learning

LET’S CLASSIFY

Plot axes: “Buttock Circumference” [mm] (x), Weight [10^-1 kg] (y)

Page 18: Battling Unknown Malware with Machine Learning

LET’S CLASSIFY

Plot axes: “Buttock Circumference” [mm] (x), Weight [10^-1 kg] (y)

• Get more “blue” right (true positives)

• Get more “red” wrong (false positives)

Page 19: Battling Unknown Malware with Machine Learning

RECEIVER OPERATING CHARACTERISTICS CURVE

Plot axes: False Positive Rate (x), True Positive Rate (y)

Detect more by accepting more false positives
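A minimal sketch of how the points of such a curve can be computed: sweep the decision threshold from strict to lenient and record the false positive rate and true positive rate at each step. The scores and labels are made-up illustration values, not data from the talk, and `roc_points` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    # Lower the threshold one distinct score at a time.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores and ground truth, for illustration only.
EXAMPLE_SCORES = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
EXAMPLE_LABELS = [True, True, True, False, False, False]

if __name__ == "__main__":
    for fpr, tpr in roc_points(EXAMPLE_SCORES, EXAMPLE_LABELS):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each step down in threshold can only keep or raise both rates, which is the trade-off the slide describes.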

Page 20: Battling Unknown Malware with Machine Learning

MORE DIMENSIONS


Page 21: Battling Unknown Malware with Machine Learning

MISSION ACCOMPLISHED: WE JUST ADD MORE DIMENSIONS… RIGHT?

Page 22: Battling Unknown Malware with Machine Learning

CURSE OF DIMENSIONALITY

REDUCED predictive performance

INCREASED training time

SLOWER classification

LARGER memory footprint

Page 23: Battling Unknown Malware with Machine Learning

Source: https://commons.wikimedia.org/w/index.php?curid=2257082

Page 24: Battling Unknown Malware with Machine Learning

Source: https://commons.wikimedia.org/w/index.php?curid=2257082

Page 25: Battling Unknown Malware with Machine Learning
Page 26: Battling Unknown Malware with Machine Learning

DIMENSIONALITY AND SPARSENESS

Plot axes: Height (mm) (x), Weight [10^-1 kg] (y)

Page 27: Battling Unknown Malware with Machine Learning

DIMENSIONALITY AND SPARSENESS

Plot axes: Height (mm) (x), Weight [10^-1 kg] (y)
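One way to see the sparseness effect is to keep the number of samples fixed and count how much of the feature space they actually cover as dimensions are added. The sketch below is an illustration under loud assumptions: it draws uniformly random synthetic points (not the survey data), splits each dimension into 10 bins, and uses 4000 samples only to echo the size of the survey.

```python
import random


def occupied_fraction(
    n_samples: int, n_dims: int, bins_per_dim: int, seed: int = 0
) -> float:
    """Fraction of grid cells that contain at least one random sample."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    occupied = set()
    for _ in range(n_samples):
        # Each sample falls into one cell, identified by its bin index per dimension.
        cell = tuple(rng.randrange(bins_per_dim) for _ in range(n_dims))
        occupied.add(cell)
    total_cells = bins_per_dim ** n_dims
    return len(occupied) / total_cells


if __name__ == "__main__":
    for dims in (1, 2, 3, 4, 6):
        fraction = occupied_fraction(n_samples=4000, n_dims=dims, bins_per_dim=10)
        print(f"{dims} dimension(s): {fraction:.4%} of cells occupied")
```

With six dimensions there are a million cells, so 4000 samples can cover at most 0.4% of them; most of the space contains no training data at all.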

Page 28: Battling Unknown Malware with Machine Learning

LET’S APPLY THIS TO SECURITY


Page 29: Battling Unknown Malware with Machine Learning

FILE ANALYSIS AKA Static Analysis

• THE GOOD

– Relatively fast

– Scalable

– No need to detonate

– Platform independent, can be done at gateway

• THE BAD

– Limited insight due to narrow view

– Different file types require different techniques

– Different subtypes need special consideration: packed files, .NET, installers, EXEs vs. DLLs

– Obfuscations (yet good if detectable)

– Ineffective against exploitation and malware-less attacks

– Asymmetry: a fraction of a second to decide for the defender, months to craft for the attacker


Page 30: Battling Unknown Malware with Machine Learning


FILE CONTENT


Page 31: Battling Unknown Malware with Machine Learning

EXAMPLE FEATURES

32/64 BIT EXECUTABLE
GUI SUBSYSTEM
COMMAND LINE SUBSYSTEM
FILE SIZE
TIMESTAMP
DEBUG INFORMATION PRESENT
PACKER TYPE
FILE ENTROPY
NUMBER OF SECTIONS
NUMBER WRITABLE
NUMBER READABLE
NUMBER EXECUTABLE
DISTRIBUTION OF SECTION ENTROPY
IMPORTED DLL NAMES
IMPORTED FUNCTION NAMES
COMPILER ARTIFACTS
LINKER ARTIFACTS
RESOURCE DATA
EMBEDDED PROTOCOL STRINGS
EMBEDDED IPS/DOMAINS
EMBEDDED PATHS
EMBEDDED PRODUCT META DATA
DIGITAL SIGNATURE
ICON CONTENT
…
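As an illustration of one feature from the list, file entropy is commonly computed as the Shannon entropy of the byte distribution. The sketch below assumes that definition; the slides do not say how the feature is actually computed in the product, and `byte_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values in `data`, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    counts = Counter(data)  # occurrences of each byte value
    total = len(data)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # Constant padding has no uncertainty; evenly spread byte values have the most.
    print(byte_entropy(b"\x00" * 1024))
    print(byte_entropy(bytes(range(256)) * 4))
```

Compressed or encrypted content tends to score close to 8 bits per byte, which is one reason entropy is a useful hint for packed files. The same function could be applied per section to get a distribution of section entropy.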

Page 32: Battling Unknown Malware with Machine Learning

COMBINING FEATURES

Plot axes: String-based feature (x), Executable section size-based feature (y)

Page 33: Battling Unknown Malware with Machine Learning

COMBINING FEATURES

Plot axes: Subspace Projection A (x), Subspace Projection B (y)

Page 34: Battling Unknown Malware with Machine Learning

ARMY DATA ROC CURVE

Plot axes: False Positive Rate (x), True Positive Rate (y)

Detect more by accepting more false positives

Page 35: Battling Unknown Malware with Machine Learning

ML MALWARE DETECTION ROC CURVE

Plot axes: False Positive Rate (x), True Positive Rate (y)

Page 36: Battling Unknown Malware with Machine Learning

APTS & 99% OF MALWARE DETECTED…

Plot axes: Number of attempts (x), Chance of at least one success for adversary (y)

Plot annotations: 1% (single attempt), >99% (500 attempts)
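The curve on this slide follows from simple arithmetic if each attempt is assumed to be independent and to be detected with the same probability: the chance that at least one of n attempts gets through is 1 - d^n for detection rate d. The independence assumption and the function names below are illustrative, not taken from the talk.

```python
import math


def chance_of_at_least_one_success(detection_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries evades detection."""
    return 1.0 - detection_rate ** attempts


def attempts_needed(detection_rate: float, target_chance: float) -> int:
    """Smallest number of independent attempts whose success chance reaches the target."""
    if not (0.0 < detection_rate < 1.0 and 0.0 < target_chance < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_chance) / math.log(detection_rate))


if __name__ == "__main__":
    for n in (1, 10, 100, 500):
        chance = chance_of_at_least_one_success(0.99, n)
        print(f"{n:>3} attempts: {chance:.2%} chance of at least one success")
```

Under these assumptions a 99% detection rate leaves a single attempt with a 1% chance, while 500 attempts push the adversary's chance above 99%.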

Page 37: Battling Unknown Malware with Machine Learning

STOPPING MALWARE IS NOT ENOUGH

Diagram axes: THREAT SOPHISTICATION (LOW to HIGH), HARDER TO PREVENT & DETECT (LOW to HIGH)

MALWARE: 40%

Page 38: Battling Unknown Malware with Machine Learning

YOU NEED COMPLETE BREACH PREVENTION

Diagram axes: THREAT SOPHISTICATION (LOW to HIGH), HARDER TO PREVENT & DETECT (LOW to HIGH)

MALWARE: 40%

NON-MALWARE ATTACKS: 60%

NATION-STATES, ORGANIZED CRIMINAL GANGS, HACKTIVISTS/VIGILANTES, TERRORISTS, CYBER-CRIMINALS

Page 39: Battling Unknown Malware with Machine Learning

Next-Generation Endpoint Protection: Cloud Delivered. Enriched by Threat Intelligence.

MANAGED HUNTING

ENDPOINT DETECTION AND RESPONSE

NEXT-GEN ANTIVIRUS

Page 40: Battling Unknown Malware with Machine Learning

ML SETTINGS WITHIN FALCON HOST


Page 41: Battling Unknown Malware with Machine Learning

ML PREVENTION IN ACTION


Page 42: Battling Unknown Malware with Machine Learning

KEY POINTS

• Machine Learning is an effective tool against unknown malware

• Try it out on VirusTotal

• Trading off true positives and false positives

• Detecting 99% of malware still gives a persistent adversary (APT) a near-100% chance of getting malware into your environment, given enough attempts

• The majority of intrusions are not malware-based

• Avoid silent failure

• Use a comprehensive array of techniques


Page 43: Battling Unknown Malware with Machine Learning

www.crowdstrike.com


