
NEURLUX: Dynamic Malware Analysis using Neural Networks

Date post: 15-Oct-2021

CHANI JINDAL


STATIC vs DYNAMIC ANALYSIS
● Static Analysis
● Dynamic Analysis
● Why Sandbox? “Malware cannot avoid leaving a behavioural footprint” (Burnap et al., 2018)

Static Analysis
Pros
● Comprehensive signatures can be created
● Small error rate with byte-sequence matches
Cons
● Difficult to thoroughly understand behavior, especially if obfuscated or packed
● Network requests?
● Time to prevention is ultimately long (lifecycle)
● Cost (human / machine)

Dynamic Analysis
Pros
● All behavior
● Thorough understanding
Cons
● Needs an environment / resources
● Different execution environment


HISTORY and RELATED WORK
● Machine Learning (ML)
○ Raw bytecode as a greyscale image of the executable -> 2D array (2011)
○ Feature extraction and feature engineering necessary
○ Shallow learning techniques, not scalable

● Neural Networks and Image Classification (2018)
○ Convolutional network
○ Improvement on basic ML models
○ Adversarial model

● A multi-level deep learning system for malware detection (2019)
○ Deep learning architecture still focused on static PEs

● Dynamic-based NN (2018)
○ Focus on machine activity, e.g. read/write file counts
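The 2011 image-based approach listed above turns an executable into a 2D array by reading each byte as one greyscale pixel. A minimal sketch follows; the function name `bytes_to_greyscale`, the fixed row width and the zero-padding of the last row are assumptions of this sketch, not details taken from the slides.

```python
def bytes_to_greyscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape raw executable bytes into rows of 0-255 greyscale pixels.

    Each byte becomes one pixel. `width` is an arbitrary choice for this
    sketch, and the last row is zero-padded so every row has equal length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # bytes -> ints in 0..255
        row.extend([0] * (width - len(row)))   # pad the final row
        rows.append(row)
    return rows


# "MZ" header bytes plus a few more, 4 pixels per row for readability
print(bytes_to_greyscale(b"MZ\x90\x00\x03\x00", width=4))
# -> [[77, 90, 144, 0], [3, 0, 0, 0]]
```

The resulting 2D array is what an image classifier (shallow ML in 2011, a CNN in the 2018 work) consumes, which is also why such models inherit image-domain adversarial attacks.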


LIMITATIONS OF PREVIOUS WORK
● Image classification scrutinized
○ Adversarial attacks

● Focus only on a single dynamic feature, such as API sequence
○ Lack of feature coverage

● Are feature extraction and feature engineering necessary?
● Is there a precedent for dynamic deep learning?


DATASETS
● We have 2 datasets:

                               MALICIOUS   BENIGN
PRIVATE DATASET (real world)   13760       13760
EMBER DATASET                  21000       21000

● Generation of behavioral reports from these datasets


DATA COLLECTION - SANDBOXES
● Orchestrate Cuckoo, the open-source sandbox
○ 20+ headless VMs, all running Windows with full network access

● A potential answer to why this is an unexplored space


REPORT FORMAT


NATURAL LANGUAGE PROCESSING

● Bag of Words (BOW)
○ Loss of spatial locality
● N-grams
● Term Frequency - Inverse Document Frequency (TF-IDF)
● Feature Hashing ("the hashing trick")
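The NLP techniques on this slide can be combined into a small baseline featurizer. The sketch below is an illustration under stated assumptions, not the deck's baseline: reports are assumed to be pre-tokenized into lists of strings (here hypothetical API-call names), n-grams recover some of the local order that a plain bag of words loses, the hashing trick maps each feature into a fixed number of buckets, and TF-IDF uses the textbook `tf = count / length`, `idf = log(N / df)` variant. All function names are hypothetical.

```python
import hashlib
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Join each run of n consecutive tokens into one feature string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hashed_counts(features: list[str], n_buckets: int) -> list[int]:
    """Bag of words with the hashing trick: each feature lands in a fixed bucket.

    md5 is used only because it is stable across runs (Python's built-in
    hash() of str is randomized per process); bucket collisions are accepted.
    """
    vec = [0] * n_buckets
    for feat in features:
        bucket = int(hashlib.md5(feat.encode()).hexdigest(), 16) % n_buckets
        vec[bucket] += 1
    return vec


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook TF-IDF: tf = count / doc length, idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({term: (c / len(doc)) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return weights


report_a = ["NtCreateFile", "NtWriteFile", "RegSetValueExW"]
report_b = ["NtCreateFile", "NtReadFile"]
print(ngrams(report_a, 2))          # -> ['NtCreateFile NtWriteFile', 'NtWriteFile RegSetValueExW']
print(tf_idf([report_a, report_b])[0]["NtCreateFile"])  # -> 0.0 (appears in every report)
```

A term present in every report gets an IDF of zero, which is the intended effect: ubiquitous calls carry no signal for telling reports apart.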


BASELINE = STATE OF THE ART


Word Embeddings
Words that have the same meaning should have a similar representation.

● A way to learn to map a set of words or phrases in a vocabulary to vectors of numerical values.

● Computations with one-hot encoded vectors are inefficient because most values in a one-hot vector are 0 (sparse).

● Dense representation of words and their relative meanings
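The inefficiency point above can be made concrete: multiplying a one-hot vector by the embedding matrix touches every row but only ever selects one, so an embedding layer simply reads that row. In this sketch the three-token vocabulary is hypothetical, and the vector values are borrowed from the deck's figures with an illustrative token-to-row assignment.

```python
def one_hot(index: int, vocab_size: int) -> list[float]:
    """Sparse encoding: a single 1.0 and vocab_size - 1 zeros."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def one_hot_times_matrix(vec: list[float], matrix: list[list[float]]) -> list[float]:
    """vec @ matrix: visits every row, although all but one weight is 0."""
    dim = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(dim)]


def embedding_lookup(index: int, matrix: list[list[float]]) -> list[float]:
    """Dense alternative: read row `index` of the embedding matrix directly."""
    return matrix[index]


vocab = {"PeerDist": 0, "CacheMgr": 1, "Defender": 2}
embeddings = [[0.7, 0.4, 0.5],    # vocabulary_size x embedding_dimension
              [0.2, -0.1, 0.1],
              [-0.5, 0.4, 0.1]]
idx = vocab["CacheMgr"]
print(one_hot_times_matrix(one_hot(idx, len(vocab)), embeddings))  # -> [0.2, -0.1, 0.1]
print(embedding_lookup(idx, embeddings))                           # -> [0.2, -0.1, 0.1]
```

Both paths give the same vector; the lookup costs O(D) instead of O(V × D), which matters when the vocabulary of report tokens is large.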


Word Embeddings

[Figure: example 3-dimensional word embeddings, e.g. (0.7, 0.4, 0.5) and (0.2, -0.1, 0.1), for the report tokens PeerDist and CacheMgr]

Words that appear in similar contexts tend to have vectors pointing in similar directions

Embedding matrix size: vocabulary_size × embedding_dimension
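"Pointing in similar directions" is usually measured with cosine similarity. The sketch below applies it to three vectors taken from the deck's figures; which token each vector belongs to is not recoverable from the transcript, so they are labelled generically.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|): 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


a = [0.7, 0.4, 0.5]
b = [0.6, 0.3, 0.5]
c = [-0.5, 0.4, 0.1]
print(round(cosine_similarity(a, b), 3))  # -> 0.995 (nearly the same direction)
print(round(cosine_similarity(a, c), 3))  # -> -0.228 (different directions)
```

Cosine similarity ignores vector length, so two tokens can count as related even if one occurs far more often in reports than the other.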


Convolutional Neural Networks

S = length of input
D = dimension of word vector
Sentence matrix = S × D
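A sentence matrix of fixed size S × D can be built by truncating or padding the token sequence to S tokens and stacking their D-dimensional embeddings. This is a minimal sketch under assumptions not stated in the deck: id 0 is reserved for both padding and unknown tokens, its embedding row is all zeros, and the vocabulary and values are illustrative.

```python
def sentence_matrix(tokens: list[str], vocab: dict[str, int],
                    embeddings: list[list[float]], seq_len: int) -> list[list[float]]:
    """Build the S x D input matrix for a text CNN.

    Inputs longer than seq_len (S) are truncated; shorter ones are padded
    with id 0. Tokens missing from `vocab` also map to id 0, whose row in
    `embeddings` is assumed to be an all-zero vector.
    """
    ids = [vocab.get(tok, 0) for tok in tokens[:seq_len]]
    ids += [0] * (seq_len - len(ids))
    return [list(embeddings[i]) for i in ids]


vocab = {"<pad>": 0, "CurrVersion": 1, "PeerDist": 2, "CacheMgr": 3}
embeddings = [[0.0, 0.0, 0.0],     # padding / unknown
              [0.7, 0.4, 0.5],
              [0.2, -0.1, 0.1],
              [-0.5, 0.4, 0.1]]
m = sentence_matrix(["CurrVersion", "PeerDist", "CacheMgr"], vocab, embeddings, seq_len=5)
print(len(m), len(m[0]))  # -> 5 3  (S x D)
```

Sharing one id for padding and unknown tokens keeps the sketch short; a real pipeline would usually give unknown tokens their own trainable row.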


HOW DO THEY FIND SIMILARITIES?

[Figure: word embeddings for the tokens CurrVersion, PeerDist, CacheMgr, Defender, spynet and windows; a 2 × 3 convolutional filter (0.6 0.4 0.5 / 0.2 -0.1 0.2) slides over pairs of embedding rows, producing outputs such as 0.9 and 0.84]
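The filter in the figure works by taking the element-wise product with k consecutive word vectors and summing; sliding it down the sentence matrix gives a feature map, and max pooling keeps the strongest match. The sketch uses the figure's filter and three of its embedding rows (which tokens those rows belong to is not recoverable from the transcript). No bias or activation is applied, so the arithmetic can be compared with the figure: the first window gives 0.9.

```python
def conv1d_text(sentence: list[list[float]], kernel: list[list[float]]) -> list[float]:
    """Slide a k x D filter down an S x D sentence matrix (stride 1, no padding).

    Each output is the sum of element-wise products between the filter and
    k consecutive word vectors; no bias or activation is applied here.
    """
    k = len(kernel)
    outputs = []
    for start in range(len(sentence) - k + 1):
        window = sentence[start:start + k]
        outputs.append(sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        ))
    return outputs


def max_pool(feature_map: list[float]) -> float:
    """1-max pooling: keep the filter's strongest response."""
    return max(feature_map)


kernel = [[0.6, 0.4, 0.5],
          [0.2, -0.1, 0.2]]
sentence = [[0.7, 0.4, 0.5],
            [0.2, -0.1, 0.1],
            [-0.5, 0.4, 0.1]]
feature_map = conv1d_text(sentence, kernel)
print([round(v, 2) for v in feature_map])  # -> [0.9, 0.01]
print(round(max_pool(feature_map), 2))     # -> 0.9
```

A high output means the window's word vectors point the same way as the filter, which is how a learned filter "finds" a recurring pair of tokens wherever it appears in a report.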


LSTM
● Networks with loops in them, which allow information to persist.
● Cell states are like conveyor belts.


LSTM
The LSTM cell can remove or add information to the cell state, carefully regulated by structures called gates.

Gates are a way to optionally let information through
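The gates described on the two LSTM slides can be written out as one time step of the standard LSTM with a forget gate. This is a plain-Python sketch for illustration; the parameter layout and the helper `zero_params` are hypothetical, and a real model would use a framework's LSTM layer. A BiLSTM runs one such LSTM left-to-right and another right-to-left and concatenates their hidden states.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights: list[list[float]], bias: list[float], v: list[float]) -> list[float]:
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps each gate name to (W, b); W has H rows of length H + X and
    acts on the concatenation [h_prev, x].
    """
    z = h_prev + x                                               # [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["forget"], z)]       # what to erase
    i = [sigmoid(v) for v in affine(*params["input"], z)]        # what to write
    g = [math.tanh(v) for v in affine(*params["candidate"], z)]  # candidate values
    o = [sigmoid(v) for v in affine(*params["output"], z)]       # what to expose
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]  # conveyor belt
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def zero_params(hidden: int, inputs: int) -> dict:
    """Hypothetical helper: all-zero weights and biases for every gate."""
    return {gate: ([[0.0] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
            for gate in ("forget", "input", "candidate", "output")}


params = zero_params(hidden=2, inputs=1)
params["forget"] = (params["forget"][0], [100.0, 100.0])   # forget gate ~ 1 (keep)
params["input"] = (params["input"][0], [-100.0, -100.0])   # input gate ~ 0 (ignore)
h, c = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
print(c)  # -> [1.0, -2.0]: the cell state rides the "conveyor belt" unchanged
```

Setting the forget gate open and the input gate closed shows the slide's point directly: information in the cell state persists across steps unless a gate lets something change it.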


Input from BiLSTM: the meaning behind sequences, as a vector corresponding to each word

[Figure: word vectors combined into a sentence vector]
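This slide appears to describe an attention layer that turns the BiLSTM's per-word vectors into one sentence vector (the conclusion mentions CNN + LSTM + attention). The sketch assumes simple dot-product scoring against a context vector, which would be learned in a real model; some formulations first pass each word vector through a tanh projection, which is omitted here. Function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors: list[list[float]], context: list[float]):
    """Collapse per-word vectors into one sentence vector.

    Each word vector is scored by its dot product with `context`, the scores
    are softmax-normalised into weights, and the sentence vector is the
    weighted sum of the word vectors.
    """
    scores = [sum(h * u for h, u in zip(vec, context)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    sentence = [sum(w * vec[j] for w, vec in zip(weights, word_vectors)) for j in range(dim)]
    return sentence, weights


words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sentence, weights = attention_pool(words, context=[2.0, 0.0])
print([round(w, 3) for w in weights])  # the first word, best aligned with context, dominates
```

Unlike max pooling, the weights show which words the model attended to, which is useful when inspecting why a report was flagged.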


PROPOSED METHODS


FEATURE COUNTS

Features common to both reports, formatted in terms of parent-child processes
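One way to format counts "in terms of parent-child processes" is to key each count by the parent process, the child process and the feature. The sketch below assumes a Cuckoo-style report dict in which `report["behavior"]["processes"]` is a list of entries with `pid`, `ppid`, `process_name` and `calls` (each call carrying an `api` name); that layout, the feature-key format and the function names are assumptions, not the deck's exact pipeline.

```python
from collections import Counter


def build_process_tree(report: dict) -> dict:
    """Index processes by pid, with per-process API call counts.

    Assumes a Cuckoo-style layout: report["behavior"]["processes"] is a list
    of {"pid", "ppid", "process_name", "calls": [{"api": ...}, ...]}.
    """
    nodes = {}
    for proc in report.get("behavior", {}).get("processes", []):
        nodes[proc["pid"]] = {
            "name": proc.get("process_name", "?"),
            "ppid": proc.get("ppid"),
            "api_counts": Counter(call["api"] for call in proc.get("calls", [])),
        }
    return nodes


def parent_child_features(nodes: dict) -> Counter:
    """Flatten into 'parent>child:api' -> count features."""
    feats = Counter()
    for node in nodes.values():
        parent = nodes.get(node["ppid"])           # None if the parent was not traced
        parent_name = parent["name"] if parent else "<root>"
        for api, n in node["api_counts"].items():
            feats[f"{parent_name}>{node['name']}:{api}"] += n
    return feats


report = {"behavior": {"processes": [
    {"pid": 100, "ppid": 4, "process_name": "sample.exe",
     "calls": [{"api": "NtCreateFile"}, {"api": "CreateProcessInternalW"}]},
    {"pid": 200, "ppid": 100, "process_name": "cmd.exe",
     "calls": [{"api": "NtWriteFile"}, {"api": "NtWriteFile"}]},
]}}
feats = parent_child_features(build_process_tree(report))
print(feats["sample.exe>cmd.exe:NtWriteFile"])  # -> 2
```

Keying on the parent-child pair keeps apart, for example, writes made by a spawned shell and writes made by the sample itself.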


FEATURE BASED TEXT CLASSIFICATION


RAW - CNN

WHAT DOES THIS MEAN?

● We don’t encode any information about the file format into the model
● This can be adapted to new file formats
● Reduced time-to-solution: no feature engineering!

PROBLEMS?

● Reports are large
● Variable length
● Lots of information
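The "large, variable length" problem has to be handled before a raw report reaches a CNN. A minimal sketch, assuming byte-level encoding (the deck does not state the tokenization granularity): each UTF-8 byte maps to `byte + 1` so that 0 is free for padding, and the report is either truncated to a fixed length or split into fixed-size chunks so that no information is dropped. Both function names are hypothetical.

```python
def encode_raw_report(text: str, max_len: int) -> list[int]:
    """Fixed-length byte ids: truncate long reports, zero-pad short ones.

    Each UTF-8 byte b becomes id b + 1, reserving 0 for padding.
    """
    ids = [b + 1 for b in text.encode("utf-8")[:max_len]]
    return ids + [0] * (max_len - len(ids))


def chunk_raw_report(text: str, chunk_len: int) -> list[list[int]]:
    """Split a whole report into fixed-size id chunks (last one zero-padded)."""
    ids = [b + 1 for b in text.encode("utf-8")]
    chunks = []
    for start in range(0, len(ids), chunk_len):
        chunk = ids[start:start + chunk_len]
        chunks.append(chunk + [0] * (chunk_len - len(chunk)))
    return chunks


print(encode_raw_report('{"a"}', max_len=8))  # -> [124, 35, 98, 35, 126, 0, 0, 0]
print(chunk_raw_report("abcde", chunk_len=2))  # -> [[98, 99], [100, 101], [102, 0]]
```

Truncation is simple but may cut off late behavior; chunking keeps everything at the cost of needing a way to combine per-chunk predictions.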


Evaluation

● DATASET: best = individual features + raw data
● MALWARE FAMILY: best = raw data
● DATASET + REPORT: best = still running
● REPORT FORMAT: best = individual features
● UNKNOWN


Embedding Visualization


Results


CONCLUSION

1. Raw data performs best.
2. Feature-based models are worth a try.
3. The combination of CNN + LSTM + attention does better than CNN alone.
4. This is a novel approach!


Discussion
● Detect a previously unseen family
○ VirusTotal experiment
○ Correlations between malware families

● Adversarial learning
● Can you obfuscate runtime behavior?


Future Work
➔ Try adversarial attacks on our model
➔ Current training relies on accurate and broad data
◆ Resiliency to data
➔ Compare with image classification
➔ More models to try:
◆ Cleaning of reports, document classification on the entire report


ENSEMBLE MODEL

➔ Integrated Stacking Ensemble Model on all features.
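The slide does not say how the stacking ensemble would be built. One common form of stacking feeds each base model's predicted probability into a meta-learner; the sketch substitutes a logistic-regression meta-learner trained by batch gradient descent, whereas an "integrated" model might instead be trained jointly with the base networks. The data, learning rate and epoch count are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_probs, labels, lr=0.5, epochs=2000):
    """Stacking: logistic regression over the base models' probabilities.

    base_probs[i][m] is base model m's P(malicious) for sample i. These
    should be out-of-fold predictions so the meta-learner never sees outputs
    on data the base models were trained on.
    """
    n_models, n = len(base_probs[0]), len(labels)
    w, b = [0.0] * n_models, 0.0
    for _ in range(epochs):                        # batch gradient descent on log-loss
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(base_probs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical out-of-fold outputs of two base models (e.g. raw-data and
# feature-based); labels: 1 = malicious, 0 = benign.
X = [[0.9, 0.5], [0.8, 0.4], [0.2, 0.6], [0.1, 0.5], [0.95, 0.3], [0.05, 0.7]]
y = [1, 1, 0, 0, 1, 0]
w, b = fit_meta_learner(X, y)
print([round(v, 2) for v in w], round(b, 2))  # the more reliable first model gets the larger weight
```

The learned weights indicate how much the ensemble trusts each base model, which would show directly whether the raw-data and feature-based models are complementary.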


Refs
● L. Nataraj, S. Karthikeyan, G. Jacob, B. Manjunath, "Malware images: visualization and automatic classification," Proceedings of the 8th International Symposium on Visualization for Cyber Security, pp. 4, 2011. (image-based ML)
● M. Kalash, M. Rochan, N. Mohammed, N. D. B. Bruce, Y. Wang and F. Iqbal, "Malware Classification with Deep Convolutional Neural Networks," 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, 2018, pp. 1-5. (image-based NN)
● P. Burnap, R. French, F. Turner, K. Jones, "Malware classification using self organising feature maps and machine activity data," Computers & Security 73 (2018) 399–410. (source of the “behavioural footprint” quote)

