Post on 06-Aug-2020
Deep Program Reidentification: A Graph Neural Network Solution
Shen Wang et al.
University of Illinois at Chicago, NEC Labs America
To appear in SIAM International Conference on Data Mining (SDM’19)
Presenter: Weilin Xu
https://qdata.github.io/deep2Read
Shen Wang et al. (UIC, NEC Labs) Deep Program Reidentification: A Graph Neural Network SolutionPresenter: Weilin Xu https://qdata.github.io/deep2Read 1 / 30
Outline
1 Introduction
    Problem
    Proposed Solution
2 Method
    Program ⇒ Graph
    Node Feature Extraction
    Graph Embedding
    Channel-Aware Attention
    Binary Classification
3 Experiments
4 Conclusion
Program Reidentification
Determine whether an unknown program is a variant of a known program.
Used to detect disguised malware or ransomware.
Digital Code Signing is Useful
Figure: Program Properties
Figure: Digital Signature
Digital Code Signing is Useful, but
Not always used, especially by open source software. (False Positives)
Malware can hijack a signed program. (False Negatives)
Weakness of previous techniques
Digital code signing: Not always used.
Anti-virus: Malware-free attack, evasive malware, etc.
Sophisticated program watermarking techniques: Prohibitive computational costs.
Proposed Solution
Program ⇒ Graph
Graph ⇒ Embedding.
Embedding ⇒ Identity Classification.
Extract Graph from a Program
Possible choices:
Static analysis, e.g. call graph of code blocks. Complicated, local.
Dynamic analysis, e.g. system interaction graph. Simpler, global (this paper).
Extract Graphs from Dynamic Behavior
Figure: Extract three graphs from program execution.
Heterogeneous Graph
Three types of nodes, each reached through a different kind of event:
Program: fork another program.
File: read/write a file.
Socket: access a network socket <IPAddr:Port>.
Solution: separate into three homogeneous graphs (meta-path).
Program - Program.
Program - File.
Program - Socket.
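The separation into three homogeneous graphs can be sketched as below. This is only an illustration under assumptions: the event names (`fork`, `read`, `write`, `connect`), the `EVENT_CHANNEL` mapping, and the `split_channels` helper are hypothetical and not taken from the paper, which does not publish its event schema.

```python
from collections import defaultdict

# Hypothetical mapping from event type to channel; the event names are
# illustrative and not taken from the paper.
EVENT_CHANNEL = {
    "fork": "program-program",
    "read": "program-file",
    "write": "program-file",
    "connect": "program-socket",
}


def split_channels(events):
    """Group (program, event_type, target) records into per-channel edge lists."""
    channels = defaultdict(set)
    for program, event_type, target in events:
        channels[EVENT_CHANNEL[event_type]].add((program, target))
    # Sort the edges so the output is deterministic.
    return {name: sorted(edges) for name, edges in channels.items()}


# Toy event log (made-up values).
events = [
    ("bash", "fork", "ssh"),
    ("ssh", "read", "/etc/ssh/ssh_config"),
    ("ssh", "connect", "10.0.0.5:22"),
    ("ssh", "write", "/tmp/ssh.log"),
]
graphs = split_channels(events)
```

Each value in `graphs` is the edge list of one channel and can be turned into its own adjacency matrix.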
Attentional Multi-Channel Graph Neural Network
Figure: Attentional Multi-Channel Graph Neural Network.
Feature Extraction
For each node $v$ in graph $G$, we extract a feature vector from
Connectivity features: $X^{con}_v = \{e_{v,1}, \dots, e_{v,|V|}\}$
Graph statistical features: $X^{stat}_v = \{X^{s1}_v, X^{s2}_v, X^{s3}_v, X^{s4}_v\}$
    Degree centrality
    Closeness centrality
    Betweenness centrality
    Clustering coefficient
How to combine as Xv? Concatenation?
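One possible answer, sketched with NetworkX and NumPy, is plain concatenation. The slide leaves the combination open, so concatenation is an assumption here, as is reading the connectivity features as the node's adjacency row. The helper name `node_features` is hypothetical.

```python
import networkx as nx
import numpy as np


def node_features(G):
    """Return a |V| x (|V| + 4) matrix: adjacency row plus four graph statistics."""
    nodes = sorted(G.nodes())
    # Assumption: connectivity features e_{v,1..|V|} are the adjacency row of v.
    A = nx.to_numpy_array(G, nodelist=nodes)
    deg = nx.degree_centrality(G)
    clo = nx.closeness_centrality(G)
    btw = nx.betweenness_centrality(G)
    clu = nx.clustering(G)
    stats = np.array([[deg[v], clo[v], btw[v], clu[v]] for v in nodes])
    # Assumption: X_v is the concatenation of both feature groups.
    return np.hstack([A, stats])


G = nx.path_graph(3)  # toy graph: 0 - 1 - 2
X = node_features(G)
```

With concatenation, `X` has shape `(|V|, |V| + 4)`, which matches the dimension questioned on the graph embedding slide.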
Graph Embedding Function
Given a homogeneous graph (single channel) $G = (V, E, A)$, each node in $V$ is associated with a feature vector; the feature matrix is $X$ ($|V| \times (|V| + 4)$?)
Goal: to construct and learn a graph embedding function $f_G : G \to h_G$
Proposed form: a three-layer Contextual Graph Encoder
$h^1 = \mathrm{ReLU}((P X) W^0)$
$h^2 = \mathrm{ReLU}((P h^1) W^1)$
$h^3 = \mathrm{ReLU}((P h^2) W^2)$
$h_G = h_{v_t} = h^3$
Each layer:
$\tilde{h}^l = \mathrm{PROP}(h^l) = P h^l$ ($h^0 = X$)
$h^{l+1} = \mathrm{PERCE}(\tilde{h}^l) = \sigma(\tilde{h}^l W^l) = \mathrm{ReLU}(\tilde{h}^l W^l)$
$W^l$: shared trainable weight matrix for all entities at layer $l$.
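The three-layer encoder can be sketched in NumPy as a forward pass. The layer widths, the random weights, the toy propagation matrix, and the `target` index below are illustrative assumptions, not values from the paper.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def contextual_graph_encoder(X, P, weights):
    """Apply h^{l+1} = ReLU((P h^l) W^l) once per weight matrix, starting at h^0 = X."""
    h = X
    for W in weights:
        h = relu((P @ h) @ W)
    return h


rng = np.random.default_rng(0)
n = 3
# Toy propagation matrix for a 3-node graph with edges 1-2 and 1-3.
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
X = rng.normal(size=(n, n + 4))  # assumed |V| x (|V| + 4) feature matrix
# Hypothetical layer widths: (|V| + 4) -> 8 -> 8 -> 4.
weights = [rng.normal(size=s) for s in [(n + 4, 8), (8, 8), (8, 4)]]

h3 = contextual_graph_encoder(X, P, weights)
target = 0          # hypothetical index of the target program node v_t
h_G = h3[target]    # graph embedding taken from the target node's row
```

Passing the identity matrix as `P` removes all neighborhood mixing, which turns the encoder into a plain three-layer perceptron.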
Propagation Function based on Random Walk
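$$
\begin{aligned}
\tilde{h}^l &= \mathrm{PROP}(h^l) \\
&= P h^l \\
&= D^{-1} A h^l \\
&= \mathrm{diag}(A \mathbf{1})^{-1} A h^l
\end{aligned}
\tag{1}
$$
$A$: adjacency matrix; $\mathbf{1}$: all-one vector.
$D = \mathrm{diag}(A \mathbf{1})$: degree matrix of $A$.
$P = D^{-1} A$: propagation matrix shared in each layer.
Implication: weighted sum of the contexts' current representation.
$\tilde{h}^l_{v_t} = \sum_{u \in N(v_t)} P_{v_t u} h^l_u$, $F = \{N(v_t)\}$: receptive field
$P \in \mathbb{R}^{N \times N}$: row-stochastic transition matrix of the random-walk Markov process.
$i$-th row: likelihood of diffusion from entity $i$.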
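The propagation matrix is a row normalization of the adjacency matrix, which can be checked numerically. In this minimal sketch the helper name `propagation_matrix` and the toy graph are assumptions; isolated nodes (zero degree) are rejected rather than handled.

```python
import numpy as np


def propagation_matrix(A):
    """Return P = D^{-1} A with D = diag(A 1), i.e. each row of A divided by its sum."""
    A = np.asarray(A, dtype=float)
    degree = A.sum(axis=1)  # A @ 1
    if np.any(degree == 0):
        raise ValueError("isolated node: D is not invertible")
    return A / degree[:, None]


# Toy 3-node graph with edges 1-2 and 1-3.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
P = propagation_matrix(A)

# One propagation step: every node receives the average of its neighbors' rows.
h = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [4.0, 6.0]])
h_prop = P @ h
```

Every row of `P` sums to one, so `P @ h` is a weighted average over each node's receptive field.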
Propagation Matrix Example
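Graph with nodes 1, 2, 3.
$$
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad
D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
D^{-1} = \begin{pmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$
$$
P = D^{-1} A
= \begin{pmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}
$$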
Figure: Propagation matrix example.
Motivation
Treat three channels differently
Programs;
Files;
Sockets.
Example
Ransomware: active in files.
VPN: active in socket.
Attention Weight
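Attention weight $\mathrm{ATT}(h_{G_i})$ for channel $i$:
$$
\alpha_i = \frac{\exp\left(\sigma\left(a \left[W_a h_{G_i} \,\|\, W_a h_{G_k}\right]\right)\right)}{\sum_{k' \in |C|} \exp\left(\sigma\left(a \left[W_a h_{G_i} \,\|\, W_a h_{G_{k'}}\right]\right)\right)}
$$
Each channel $i = 1, 2, \dots, |C|$
$h_{G_i}$: graph embedding of a target channel
$h_{G_k}$: graph embedding of other channels.
$a$: trainable attention vector.
$W_a$: trainable weight mapping (input features ⇒ hidden space)
$\|$: concatenation
$\sigma$: nonlinear gating function.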
Joint Representation of All Channels
Joint representation of all channels:
$$
h_{G_{Join}} = \sum_{i=1}^{|C|} \mathrm{ATT}(h_{G_i}) \, h_{G_i}
$$
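A rough NumPy sketch of channel attention followed by the joint representation is given below. The slides do not say how the other channels' embeddings are reduced to one weight per channel, so this sketch makes two labeled assumptions: the context for channel $i$ is the mean of the other channels' projected embeddings, and $\sigma$ is a LeakyReLU. The function names and all shapes are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def channel_attention(H, W_a, a):
    """H: (C, d) channel embeddings, W_a: (d, k) projection, a: (2k,) attention vector."""
    Z = H @ W_a  # project every channel embedding into the hidden space
    C = H.shape[0]
    scores = np.empty(C)
    for i in range(C):
        # ASSUMPTION: summarize the other channels by the mean of their projections.
        context = np.delete(Z, i, axis=0).mean(axis=0)
        # ASSUMPTION: sigma is a LeakyReLU gate.
        scores[i] = leaky_relu(a @ np.concatenate([Z[i], context]))
    scores = scores - scores.max()  # numerically stable softmax over channels
    w = np.exp(scores)
    return w / w.sum()


def joint_representation(H, alpha):
    """Attention-weighted sum of the channel embeddings."""
    return alpha @ H


rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))    # three channels: program, file, socket
W_a = rng.normal(size=(4, 5))
a = rng.normal(size=10)
alpha = channel_attention(H, W_a, a)
h_joint = joint_representation(H, alpha)
```

The weights in `alpha` are positive and sum to one, so `h_joint` is a convex combination of the three channel embeddings.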
Program Reidentification
Train a binary classifier for each known program.
Input: event data of a program with a claimed identity.
Prediction: whether the program behaves like the claimed one.
Logistic regression classifier.
Binary cross entropy loss.
Adam optimizer.
Early stopping with good accuracy.
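The classifier stage (logistic regression, binary cross entropy, Adam) can be sketched in plain NumPy. In the sketch the inputs are random toy vectors standing in for joint graph embeddings, the hyperparameters are arbitrary, and early stopping is left out. None of these values come from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y, eps=1e-12):
    """Binary cross entropy averaged over the batch."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def train_logreg_adam(X, y, steps=200, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Fit logistic regression with Adam; return parameters and loss history."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append a bias column
    theta = np.zeros(d + 1)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    losses = []
    for t in range(1, steps + 1):
        p = sigmoid(Xb @ theta)
        losses.append(bce(p, y))
        grad = Xb.T @ (p - y) / n           # gradient of the BCE loss
        m = b1 * m + (1 - b1) * grad        # first-moment estimate
        v = b2 * v + (1 - b2) * grad ** 2   # second-moment estimate
        m_hat = m / (1 - b1 ** t)           # bias correction
        v_hat = v / (1 - b2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, losses


# Toy stand-ins for embeddings: label 1 = behaves like the claimed program.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=1.5, size=(100, 4)),
               rng.normal(loc=-1.5, size=(100, 4))])
y = np.concatenate([np.ones(100), np.zeros(100)])

theta, losses = train_logreg_adam(X, y)
pred = sigmoid(np.hstack([X, np.ones((200, 1))]) @ theta) > 0.5
accuracy = float(np.mean(pred == (y == 1)))
```

In the per-program setup described above, one such classifier would be trained for each known program.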
Experimental Setup
Dataset: Real-world system monitoring data of 3 Terabytes.
    87 machines over 20 weeks.
    300M events, 2K processes, 600K files, 18K sockets.
    Behavior graph per program per day.
Baselines.
    LR, SVM, XGB, MLP using raw features.
    MLP: special case in which PROP() is the identity matrix.
Metrics: ACC, F-1 score, AUC, precision and recall.
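For reference, the listed metrics can be computed from binary labels and scores as sketched below. The function names and the toy labels are assumptions. AUC is computed as the fraction of positive/negative pairs ranked correctly, with ties counting one half.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-1 for 0/1 labels and 0/1 predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"acc": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))


# Toy labels and classifier scores (made-up values).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
y_pred = [int(s > 0.5) for s in scores]
metrics = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```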
Result
Figure: Comparison with other classification methods.
Conclusion
DeepRe-ID: an attentional graph neural network method that verifies program identity based on the behavior graph.
Can encode complex heterogeneous dependencies.
Outperforms all baseline methods.
Discussions
Drawbacks:
No open dataset or open source code.
Requires feature engineering: graph statistical features.
Requires an adjacency matrix.
Binary classification with many classes.
No interpretation of trained models.