Deep Program Reidentification: A Graph Neural Network Solution · 2020-01-26

Post on 06-Aug-2020



Deep Program Reidentification: A Graph Neural Network Solution

Shen Wang et al.

University of Illinois at Chicago, NEC Labs America

To appear in SIAM International Conference on Data Mining (SDM’19)

Presenter: Weilin Xu, https://qdata.github.io/deep2Read

Shen Wang et al. (UIC, NEC Labs) Deep Program Reidentification: A Graph Neural Network SolutionPresenter: Weilin Xu https://qdata.github.io/deep2Read 1 / 30

Outline

1 Introduction: Problem; Proposed Solution

2 Method: Program ⇒ Graph; Node Feature Extraction; Graph Embedding; Channel-Aware Attention; Binary Classification

3 Experiments

4 Conclusion


Program Reidentification

Determine whether an unknown program is a variant of a known program.

Used to detect disguised malware or ransomware.


Digital Code Signing is Useful

Figure: Program Properties

Figure: Digital Signature


Digital Code Signing is Useful, but

Not always used, especially by open-source software (false positives).

Malware can hijack a signed program (false negatives).


Weaknesses of Previous Techniques

Digital code signing: not always used.

Anti-virus: malware-free attacks, evasive malware, etc.

Sophisticated program watermarking techniques: prohibitive computational costs.


Proposed Solution

Program ⇒ Graph

Graph ⇒ Embedding.

Embedding ⇒ Identity Classification.


Extract Graph from a Program

Possible choices:

Static analysis, e.g., call graph of code blocks: complicated, local.

Dynamic analysis, e.g., system interaction graph: simpler, global (used in this paper).


Extract Graphs from Dynamic Behavior

Figure: Extract three graphs from program execution.

Heterogeneous Graph

Three types of interactions, each linking a program to a different node type:

A program forks another program.

A program reads/writes a file.

A program accesses a network socket ⟨IPAddr:Port⟩.

Solution: separate the heterogeneous graph into three homogeneous graphs (meta-paths):

Program - Program.

Program - File.

Program - Socket.
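The split into three channels can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the event tuple format `(program, kind, object)`, the channel names `"fork"`, `"file"`, `"socket"`, the helper `build_channel_graphs`, and the choice to treat edges as undirected are all assumptions made for the example.

```python
import numpy as np

# Hypothetical event records: (program, kind, object), where kind is
# "fork" (object is a program), "file" (object is a file path), or
# "socket" (object is "IPAddr:Port").
CHANNELS = ("fork", "file", "socket")

def build_channel_graphs(events):
    """Split heterogeneous events into one homogeneous graph per channel.

    Returns {channel: (nodes, A)}, where A is a symmetric 0/1 adjacency
    matrix over `nodes` (edges treated as undirected, an assumption).
    """
    graphs = {}
    for ch in CHANNELS:
        edges = {(s, o) for s, kind, o in events if kind == ch}
        nodes = sorted({n for edge in edges for n in edge})
        index = {n: i for i, n in enumerate(nodes)}
        A = np.zeros((len(nodes), len(nodes)))
        for s, o in edges:
            A[index[s], index[o]] = A[index[o], index[s]] = 1.0
        graphs[ch] = (nodes, A)
    return graphs

events = [
    ("bash", "fork", "python"),
    ("python", "file", "/tmp/a.txt"),
    ("python", "file", "/tmp/b.txt"),
    ("python", "socket", "10.0.0.1:443"),
]
graphs = build_channel_graphs(events)
```

Each channel then gets its own adjacency matrix, so the same graph encoder can be applied per channel before the channels are combined.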


Attentional Multi-Channel Graph Neural Network

Figure: Attentional Multi-Channel Graph Neural Network.


Feature Extraction

For each node $v$ in graph $G$, we extract a feature vector from:

Connectivity features: $X^{con}_v = \{e_{v,1}, \dots, e_{v,|V|}\}$

Graph statistical features: $X^{stat}_v = \{X^{s_1}_v, X^{s_2}_v, X^{s_3}_v, X^{s_4}_v\}$, namely

Degree centrality
Closeness centrality
Betweenness centrality
Clustering coefficient

How should these be combined into $X_v$? Concatenation?
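Taking the concatenation option, the features for one channel can be sketched as below. Assumptions: the adjacency matrix is symmetric 0/1 (self-loops ignored for the statistics), the connectivity features $e_{v,\cdot}$ are simply row $v$ of $A$, and the four statistics use common textbook definitions (Brandes' algorithm for betweenness; closeness scaled by the reachable fraction on disconnected graphs). The helper names are hypothetical; the slides do not specify normalizations.

```python
from collections import deque
import numpy as np

def _neighbors(A):
    # Undirected neighbor sets, ignoring self-loops (assumes A is symmetric 0/1).
    return [{int(u) for u in np.flatnonzero(A[v]) if u != v} for v in range(A.shape[0])]

def _bfs_dist(adj, s):
    dist = [-1] * len(adj)
    dist[s] = 0
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def _betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph, normalized to [0, 1]."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        stack, pred = [], [[] for _ in range(n)]
        sigma, dist = [0] * n, [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # w is one step further along a shortest path
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = [0.0] * n
        while stack:                          # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair is counted from both endpoints, hence (n-1)(n-2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return [b * scale for b in bc]

def node_features(A):
    """Rows X_v = [A[v, :] | degree, closeness, betweenness, clustering]."""
    n = A.shape[0]
    adj = _neighbors(A)
    betw = _betweenness(adj)
    stats = np.zeros((n, 4))
    for v in range(n):
        k = len(adj[v])
        reach = [d for d in _bfs_dist(adj, v) if d > 0]
        close = (len(reach) / sum(reach)) * (len(reach) / (n - 1)) if reach else 0.0
        links = sum(1 for u in adj[v] for w in adj[v] if u < w and w in adj[u])
        clust = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        stats[v] = [k / (n - 1) if n > 1 else 0.0, close, betw[v], clust]
    return np.hstack([A, stats])   # shape |V| x (|V| + 4)

# Star graph: node 0 is the hub connected to nodes 1, 2, 3.
A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)
X = node_features(A)
```

With plain concatenation, each node's feature vector has length $|V| + 4$, which is one way to read the $|V| \times (|V|+4)$ feature-matrix size questioned on the next slide.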


Graph Embedding Function

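Given a homogeneous (single-channel) graph $G = (V, E, A)$, each node in $V$ is associated with a feature vector; stacked, these form $X$ (presumably of size $|V| \times (|V| + 4)$).

Goal: construct and learn a graph embedding function $f_G: G \to h_G$.

Proposed form: a three-layer Contextual Graph Encoder

$h_1 = \mathrm{ReLU}((PX)W_0), \quad h_2 = \mathrm{ReLU}((Ph_1)W_1), \quad h_3 = \mathrm{ReLU}((Ph_2)W_2)$

$h_G = h_{v_t} = h_3$ (the final-layer representation of the target node $v_t$)

Each layer: $\tilde{h}_l = \mathrm{PROP}(h_l) = P h_l$ (with $h_0 = X$), then $h_{l+1} = \mathrm{PERCE}(\tilde{h}_l) = \sigma(\tilde{h}_l W_l) = \mathrm{ReLU}(\tilde{h}_l W_l)$.

$W_l$: trainable weight matrix at layer $l$, shared by all entities.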
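A forward pass of the three-layer encoder can be sketched in NumPy as below. Assumptions: weights are random placeholders rather than trained, the layer widths (16, 16, 8) are hypothetical, rows of $P$ for isolated nodes are left as zeros, and `contextual_graph_encoder` is an illustrative name, not the authors' code.

```python
import numpy as np

def propagation_matrix(A):
    """P = D^{-1} A; rows of isolated nodes are left as zeros (an assumption)."""
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros(A.shape), where=deg > 0)

def relu(x):
    return np.maximum(x, 0.0)

def contextual_graph_encoder(A, X, weights, target):
    """h_{l+1} = ReLU((P h_l) W_l) with h_0 = X; returns the target node's row."""
    P = propagation_matrix(A)
    h = X
    for W in weights:              # W_0, W_1, W_2: one weight matrix per layer
        h = relu((P @ h) @ W)      # PROP (P h_l), then PERCE (ReLU(. W_l))
    return h[target]               # h_G = h_{v_t}

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)
n = A.shape[0]
X = rng.normal(size=(n, n + 4))           # |V| x (|V| + 4) node features
dims = [n + 4, 16, 16, 8]                 # hypothetical layer widths
weights = [rng.normal(scale=0.5, size=(dims[i], dims[i + 1])) for i in range(3)]
h_G = contextual_graph_encoder(A, X, weights, target=0)
```

Passing `np.eye(n)` as the adjacency matrix makes $P = I$, so the same function collapses to a node-wise MLP. This matches the MLP baseline described in the experimental setup.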


Propagation Function based on Random Walk

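$\tilde{h}_l = \mathrm{PROP}(h_l) = P h_l = D^{-1} A h_l = \mathrm{diag}(A\mathbf{1})^{-1} A h_l \qquad (1)$

$A$: adjacency matrix; $\mathbf{1}$: all-ones vector.
$D = \mathrm{diag}(A\mathbf{1})$: degree matrix of $A$.
$P = D^{-1}A$: propagation matrix, shared by all layers.

Implication: each node's new representation is a weighted sum of its contexts' (neighbors') current representations:
$\tilde{h}_{v_t} = \sum_{u \in N(v_t)} P_{v_t u}\, h_u$, with receptive field $\mathcal{F} = \{N(v_t)\}$.

$P \in \mathbb{R}^{N \times N}$: transition matrix of the random-walk Markov process; the $i$-th row gives the likelihood of diffusing from entity $i$ to each of its neighbors.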
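The two readings of Eq. (1), the matrix form $\mathrm{diag}(A\mathbf{1})^{-1} A h_l$ and the per-node weighted sum over $N(v_t)$, can be checked against each other numerically. This sketch assumes every node has at least one neighbor, and the 5-node graph is a hypothetical example.

```python
import numpy as np

def prop(A, h):
    """Matrix form of Eq. (1): diag(A 1)^{-1} A h (assumes every node has degree > 0)."""
    D_inv = np.diag(1.0 / (A @ np.ones(A.shape[0])))
    return D_inv @ A @ h

def prop_node(A, h, v):
    """Neighbor-sum form: sum over u in N(v) of P[v, u] * h[u]."""
    deg = A[v].sum()
    return sum(A[v, u] / deg * h[u] for u in np.flatnonzero(A[v]))

# Hypothetical 5-node ring with one extra chord between nodes 0 and 2.
A = np.zeros((5, 5))
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1.0
A[0, 2] = A[2, 0] = 1.0
h = np.arange(15, dtype=float).reshape(5, 3)   # current representations h_l
```

Because every row of $P$ sums to 1, each row is a probability distribution over neighbors, which is why $P$ can be read as a random-walk transition matrix.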


Propagation Matrix Example

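Example graph: nodes 1, 2, 3 with edges 1-2 and 1-3.

$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad D^{-1} = \begin{pmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

$P = D^{-1}A = \begin{pmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}$

Figure: Propagation matrix example.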
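The worked example can be reproduced in a few lines of NumPy. This assumes the example graph has edges 1-2 and 1-3, which is the adjacency consistent with the degree matrix $D = \mathrm{diag}(2, 1, 1)$ and the resulting $P$ on the slide; nodes 1, 2, 3 become indices 0, 1, 2.

```python
import numpy as np

# Nodes 1, 2, 3 are indices 0, 1, 2; edges 1-2 and 1-3.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
D = np.diag(A @ np.ones(3))   # diag(A 1) = diag(2, 1, 1)
D_inv = np.linalg.inv(D)      # diag(1/2, 1, 1)
P = D_inv @ A                 # [[0, 1/2, 1/2], [1, 0, 0], [1, 0, 0]]
```

Node 1 splits its mass evenly between its two neighbors, while nodes 2 and 3 each send everything back to node 1.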


Motivation

Treat the three channels differently:

Programs;

Files;

Sockets.

Example

Ransomware: active in files.

VPN: active in sockets.


Attention Weight

Attention weight $\mathrm{ATT}(h_{G_i})$ for channel $i$:

$\alpha_i = \dfrac{\exp\left(\sigma\left(a\,[W_a h_{G_i} \,\|\, W_a h_{G_k}]\right)\right)}{\sum_{k'=1}^{|C|} \exp\left(\sigma\left(a\,[W_a h_{G_i} \,\|\, W_a h_{G_{k'}}]\right)\right)}$

Each channel $i = 1, 2, \dots, |C|$.
$h_{G_i}$: graph embedding of the target channel.
$h_{G_k}$: graph embeddings of the other channels.
$a$: trainable attention vector.
$W_a$: trainable weight mapping (input features ⇒ hidden space).
$\|$: concatenation.
$\sigma$: nonlinear gating function.


Joint Representation of All Channels

Joint representation of all channels:

$h_{G_{Join}} = \sum_{i=1}^{|C|} \mathrm{ATT}(h_{G_i})\, h_{G_i}$
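Given one attention weight per channel, the joint representation is a weighted sum of the channel embeddings. This sketch takes the weights as input. The values `[0.2, 0.5, 0.3]` and the 2-dimensional embeddings are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def joint_representation(H, att):
    """h_{G_Join} = sum_i ATT(h_{G_i}) * h_{G_i}, with one weight per channel."""
    att = np.asarray(att, dtype=float)
    return att @ H   # (|C|,) @ (|C|, d) -> (d,)

H = np.array([[1.0, 0.0],    # program channel embedding
              [0.0, 1.0],    # file channel embedding
              [1.0, 1.0]])   # socket channel embedding
att = [0.2, 0.5, 0.3]        # hypothetical attention weights (sum to 1)
h_join = joint_representation(H, att)   # 0.2*[1,0] + 0.5*[0,1] + 0.3*[1,1] = [0.5, 0.8]
```

When the weights sum to 1, $h_{G_{Join}}$ is a convex combination of the channel embeddings. For example, a file-heavy program such as ransomware would be dominated by the file channel if that channel receives the largest weight.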


Program Reidentification

Train a binary classifier for each known program.

Input: event data from a program claiming a given identity.

Prediction: whether the program behaves like the claimed one.

Logistic regression classifier.

Binary cross entropy loss.

Adam optimizer.

Early stopping once good accuracy is reached.
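The classification head described above (logistic regression, binary cross-entropy, Adam, early stopping) can be sketched in plain NumPy. Assumptions: inputs are already-computed joint embeddings, Adam uses its common default hyperparameters, early stopping checks training accuracy against a hypothetical `target_acc` threshold (a held-out set would be more typical), and the synthetic clusters stand in for real embeddings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(p, y, eps=1e-12):
    """Mean binary cross-entropy."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def train_reid_classifier(H, y, lr=0.05, max_epochs=500, target_acc=0.99,
                          beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Logistic regression on embeddings H (n, d), labels y in {0, 1}, trained with Adam."""
    rng = np.random.default_rng(seed)
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])            # append a bias column
    theta = np.concatenate([rng.normal(scale=0.01, size=H.shape[1]), [0.0]])
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    acc = 0.0
    for t in range(1, max_epochs + 1):
        p = sigmoid(Hb @ theta)
        grad = Hb.T @ (p - y) / len(y)                        # gradient of mean BCE
        m = beta1 * m + (1 - beta1) * grad                    # Adam first moment
        v = beta2 * v + (1 - beta2) * grad ** 2               # Adam second moment
        m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
        acc = float(np.mean((sigmoid(Hb @ theta) >= 0.5) == (y == 1)))
        if acc >= target_acc:                                 # early stopping
            break
    return theta, t, acc, bce_loss(sigmoid(Hb @ theta), y)

rng = np.random.default_rng(1)
H = np.vstack([rng.normal(loc=2.0, size=(50, 8)),    # behaves like the claimed program (label 1)
               rng.normal(loc=-2.0, size=(50, 8))])  # other programs (label 0)
y = np.concatenate([np.ones(50), np.zeros(50)])
theta, epochs_run, acc, loss = train_reid_classifier(H, y)
```

In the per-program setup on this slide, one such classifier would be trained for each known program.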


Experimental Setup

Dataset: 3 terabytes of real-world system monitoring data.

87 machines over 20 weeks.

300M events, 2K processes, 600K files, 18K sockets.

One behavior graph per program per day.

Baselines:

LR, SVM, XGB, and MLP using raw features.

MLP: the special case in which PROP() uses the identity matrix ($P = I$).

Metrics: accuracy (ACC), F1 score, AUC, precision, and recall.
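For reference, the five metrics can be computed as below. This is a minimal sketch using standard definitions. The 0.5 decision threshold is an assumption, and AUC is computed as the probability that a random positive outranks a random negative, with ties counted as one half. The toy labels and scores are hypothetical.

```python
import numpy as np

def binary_metrics(y_true, scores, threshold=0.5):
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(scores, dtype=float)
    pred = (s >= threshold).astype(int)
    tp = int(np.sum((pred == 1) & (y == 1)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    tn = int(np.sum((pred == 0) & (y == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    acc = (tp + tn) / len(y)
    # AUC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2.
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return {"ACC": acc, "precision": precision, "recall": recall, "F1": f1, "AUC": auc}

metrics = binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
# ACC, precision, recall, and F1 are all 0.5 here; AUC is 0.75 (3 of 4 pairs ranked correctly).
```

Unlike the other four metrics, AUC does not depend on the threshold, which makes it useful when comparing classifiers that are calibrated differently.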


Result

Figure: Comparison with other classification methods.

Conclusion

DeepRe-ID, an attentional graph neural network method to verify a program's identity based on its behavior graph.

Can encode complex, heterogeneous dependencies.

Outperforms all baseline methods.


Discussions

Drawbacks:

No open dataset or open-source code.

Requires feature engineering: graph statistical features.

Requires the adjacency matrix.

Binary classification with many classes: one binary classifier per known program.

No interpretation of trained models.
