
Trojan Families Identification Using Dynamic Features and Low Complexity Classifiers

Jelena Milosevic, Alberto Ferrante and Miroslaw Malek

ALaRI, Faculty of Informatics, USI-Lugano, Switzerland

What Will Keep You Up During the Presentation?

Ø  Motivation

Ø  Related Work

Ø  Proposed Approach

Ø  Experimental Setup

Ø  Obtained Results

Ø  Conclusions and Future Work

Motivation/ Why Mobile?

Ø  Mobile devices contain more private (sensitive) data than PCs ever will

Ø  Mobile apps represent more than a half of Internet use today [1]

Ø  Nearly one billion Android devices have been sold since 2009 [2]

[1] Lookout blog, https://blog.lookout.com/blog/2016/09/29/chamber-of-commerce-mobile-security/

[2] Statistics and facts about Android, https://www.statista.com/topics/876/android/

Motivation/ Mobile Malware on the Rise


Number of detected malicious installation packages (Q3 2015 – Q2 2016)

Source: Kaspersky Labs Q2 2016. Statistics

Motivation/ Mobile Trojans on the Rise

Mobile Malware Statistics 2009

Source: Schmidt et al, Static Analysis of Executables for Collaborative Malware Detection on Android, IEEE International Conference on Communications, ICC-2009

Mobile Malware Statistics 2014

Source: Kaspersky Labs, http://www.kaspersky.co.in/images/mobile-cyber-threats-600-29841-252612.png

Mobile Malware Statistics 2015/2016

Source: Kaspersky Labs 2016 Q1 Report, https://securelist.com/analysis/quarterly-malware-reports/74640/it-threat-evolution-in-q1-2016/

Related Work/ Static Detection

Ø  (Mostly) offline investigation

Ø  Analysis of static features like manifest files, disassembled code, permissions, intent message passing, API calls

(+) No need to run the application

(-) Cannot cope with the increased number of malware samples and their variants

(-) On its own, no longer sufficient to identify malware [Moser et al]

Dynamic detection methods are promising candidates to overcome these limitations

Moser et al, Limits of Static Analysis for Malware Detection, Annual Computer Security Applications Conference ACSAC-2007

Related Work/ Dynamic Detection

Ø  Analysis of apps during their execution

Ø  Analysis of dynamic features like touch screen, keyboard, messaging, CPU, power, network, system calls

(+) Can cope with the increased number of malware samples and their variants

(+) Can detect malware at run-time

(-) The complexity is mostly incompatible with the limited resources of mobile devices

(-) Current approaches cannot discriminate between different Trojan families at run-time

Our Goals Were to Investigate:

1.  Whether Trojan families can be discriminated from benign applications by observing dynamic system parameters of program executions

2.  Whether different Trojan families have a distinctive impact on the observed system parameters

Furthermore, the approach is applicable to mobile devices if:

Ø  It uses a limited number of features (system parameters)

Ø  It uses classifiers of low complexity

The Proposed Approach

It consists of two steps:

Step 1

Ø  Offline development of the Trojans detection

Step 2

Ø  Runtime Trojans detection

Step 1: Offline Development of the Trojans Detection

Record-level detection is discussed in more detail in Milosevic et al, A Friend or a Foe? Detecting Malware Using Memory and CPU Features, 13th International Conference on Security and Cryptography, SECRYPT-2016

Step 2: Runtime Trojans Detection

Experimental Setup: Dataset Used

| Category | Name | Number of Samples | Description |
|---|---|---|---|
| Trojan Families | Droid Kung Fu | 667 | Once installed, attempts to gain control over the system using exploits that are stored in the malware package and encrypted with a key |
| Trojan Families | Fake Player | 6 | Pretends to be a movie player, but instead sends SMS messages |
| Trojan Families | Geinimi | 92 | Sends personal data (location coordinates, device identifiers, the list of installed apps) to remote servers |
| Trojan Families | Ginger Master | 339 | Harvests confidential information from devices without the user's knowledge or consent |
| Trojan Families | Kmin | 147 | Collects user and device data (User ID, Subscriber ID, current data) and sends it to a remote server |
| Benign Apps | - | 300 | Downloaded from Play Store (call and contacts, education, entertainment, GPS and travel, etc.) |
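The sample counts listed above can be tallied with a few lines of Python. This is only a bookkeeping sketch; the names `SAMPLES` and `BENIGN` are our own and not part of the study.

```python
# Sample counts per Trojan family, as listed in the dataset table.
SAMPLES = {
    "Droid Kung Fu": 667,
    "Fake Player": 6,
    "Geinimi": 92,
    "Ginger Master": 339,
    "Kmin": 147,
}
BENIGN = 300  # benign apps downloaded from the Play Store

trojan_total = sum(SAMPLES.values())
print(f"Trojan samples: {trojan_total}, benign samples: {BENIGN}")

# Share of each family among all Trojan samples.
for name, count in SAMPLES.items():
    share = 100.0 * count / trojan_total
    print(f"{name:14s} {count:4d}  ({share:.1f}% of Trojan samples)")
```

The families are strongly unbalanced: Droid Kung Fu accounts for more than half of the 1251 Trojan samples, while Fake Player contributes only 6.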

Execution Environment

Ø  Each application was run for ten minutes in the Android Emulator

•  On average, most of the applications expose their intentions within the first three minutes of execution [Milosevic et al]

Ø  Before running each application, the operating system (Android 4.0) was reinitialized

Ø  Monkey runner was used to activate different features of the apps

Ø  Each application's usage of memory, CPU and network resources was recorded

Ø  Monitoring period of two seconds

Milosevic et al, MalAware: Effective and Efficient Run-time Mobile Malware Detector, The 14th IEEE International Conference on Dependable, Autonomic and Secure Computing, DASC-2016
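A ten-minute run sampled every two seconds yields 300 records per application. A minimal Python sketch of such a sampling loop follows. The names `collect_records` and `fake_reader` are hypothetical, and the reader is a stand-in, since the slides do not say how the counters were read from the emulator.

```python
import time
from typing import Callable, Dict, List

RUN_DURATION_S = 600  # ten minutes per application (from the slides)
PERIOD_S = 2          # monitoring period (from the slides)


def collect_records(
    read_sample: Callable[[], Dict[str, float]],
    duration_s: int = RUN_DURATION_S,
    period_s: int = PERIOD_S,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, float]]:
    """Call read_sample once per monitoring period and return all records."""
    records = []
    for _ in range(duration_s // period_s):
        records.append(read_sample())
        sleep(period_s)
    return records


def fake_reader() -> Dict[str, float]:
    """Hypothetical reader returning constants; a real one would query the
    emulator for memory, CPU and network counters."""
    return {"cpu_user": 0.0, "total_heap_size": 0.0, "bps": 0.0}


# The sleep function is replaced by a no-op so the example finishes instantly.
records = collect_records(fake_reader, sleep=lambda seconds: None)
print(len(records))  # 300 records per application run
```

Passing the sleep function as a parameter is a design choice of this sketch: it keeps the loop testable without waiting ten minutes.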

Features Extraction

Extracted features (in total 73) are related to:

Ø  Memory

•  Virtual, native, Dalvik, Cursor, Android shared, memory-mapped native code, memory-mapped fonts, memory-mapped Dalvik code

Ø  CPU usage

•  Total, user, kernel

Ø  Network statistics

•  Transport and Internet layer (number of packets, packets size, network load, etc.)

In order to make the approach compatible with mobile devices, we aimed at:

Ø  Usage of a limited number of observed features (system parameters)

•  Correlation Feature Selection Subset Evaluation Method [Hall et al]

Ø  Usage of classifiers of low complexity

•  Naive Bayes, Logistic Regression, Support Vector Machines [Hall et al]

Hall et al, The WEKA Data Mining Software: An Update, ACM SIGKDD Explorations Newsletter 2009
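Correlation-based feature selection scores a feature subset by rewarding correlation with the class and penalizing correlation among the selected features. The Python sketch below computes the standard CFS merit, k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. It is a simplification under stated assumptions: it uses absolute Pearson correlation on numeric 0/1 labels, the toy values and feature names are made up, and it reproduces neither WEKA's implementation nor the search strategy used in the study.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cfs_merit(subset: List[str], data: Dict[str, Sequence[float]],
              labels: Sequence[float]) -> float:
    """CFS merit of a non-empty feature subset."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(data[f], labels)) for f in subset) / k
    # Mean absolute feature-feature correlation (0.0 for a single feature).
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(data[a], data[b])) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Toy data (made-up values): six records, label 0 = benign, 1 = Trojan.
LABELS = [0, 0, 0, 1, 1, 1]
DATA = {
    "udp_packets": [0, 1, 0, 5, 6, 5],      # strongly related to the label
    "max_packet_size": [1, 0, 2, 4, 3, 6],  # related, but partly redundant
    "noise": [3, 1, 2, 2, 3, 1],            # unrelated to the label
}

for subset in (["udp_packets"],
               ["udp_packets", "max_packet_size"],
               ["udp_packets", "noise"]):
    print(subset, round(cfs_merit(subset, DATA, LABELS), 3))
```

On this toy data, adding either the partly redundant feature or the uninformative one lowers the merit of the single best feature, which is the behavior that keeps the selected subsets small.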

Results

Goal 1: Can Trojan families be discriminated from benign applications by observing dynamic system parameters of program executions?

Yes, using five features and obtaining the following accuracies: Naive Bayes 84.5%, Logistic Regression 84.8%, Support Vector Machines 84.4%.

Benign vs Trojans Informative Features

| Feature Name | Description |
|---|---|
| .dex mmap private dirty | Private memory of the process, in the dirty state, being used for mapped Dalvik or ART code. The dirty state is due to fix-ups to the native code when it is loaded into its final address |
| Other mmap shared dirty | Memory shared among processes, in the dirty state, being used for non-classified purposes |
| Network load over last minute | Network load in bits/second registered in the last minute |
| Maximum packet size in bytes | Maximum network packet size in bytes observed in the 2 s monitoring period |
| Number of UDP packets | Number of UDP network packets received and transmitted in the 2 s monitoring period |
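With five features per record, a linear classifier needs only five multiply-add operations per decision. The Python sketch below shows how one monitoring record could be scored with logistic regression. The `Record` class, the weights and the bias are hypothetical and chosen only for illustration; they are not taken from the study, and real values would come from offline training.

```python
from dataclasses import astuple, dataclass
from math import exp


@dataclass
class Record:
    """One 2 s monitoring record holding the five informative features."""
    dex_mmap_private_dirty: float
    other_mmap_shared_dirty: float
    network_load_last_minute: float
    max_packet_size: float
    udp_packets: float


# Hypothetical weights and bias (illustration only, not learned from data).
WEIGHTS = (0.002, 0.001, 0.0001, 0.001, 0.05)
BIAS = -3.0


def trojan_probability(record: Record) -> float:
    """Logistic regression score: one multiply-add per feature."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, astuple(record)))
    return 1.0 / (1.0 + exp(-z))


# Two made-up records to exercise the function.
quiet = Record(100.0, 50.0, 0.0, 0.0, 0.0)
busy = Record(900.0, 400.0, 8000.0, 1500.0, 20.0)
print(round(trojan_probability(quiet), 3), round(trojan_probability(busy), 3))
```

The per-record cost grows linearly with the number of features, which is why a small feature set matters on a resource-constrained device.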

Results

Goal 2: Do different Trojan families have a distinctive impact on the observed system parameters?

Yes, using three to thirteen features per family, with the best classifier per family reaching accuracies from 76.5% to 99.8%.

Droid Kung Fu Informative Features (3)

| Feature Name | Description |
|---|---|
| .ttf mmap Pss | Memory usage for true type fonts, including pages shared with other processes |
| Maximum packet size in bytes | Maximum network packet size in bytes observed in the 2 s monitoring period |

Fake Player Informative Features (3)

| Feature Name | Description |
|---|---|
| .dex mmap Pss | Memory usage for Dalvik or ART code, including pages shared with other processes |
| Number of ARP packets | Number of ARP network packets received and transmitted in the 2 s monitoring period |


Geinimi Informative Features (13)

| Feature Name | Description |
|---|---|
| .jar mmap Pss | Memory usage for Java code, including pages shared among processes |
| Other mmap Pss | Memory usage for non-classified purposes, including pages shared among processes |
| Other mmap Shared Dirty | Memory shared among processes, in the dirty state, being used for non-classified purposes |
| Number of TCP packets | Number of TCP network packets received and transmitted in the 2 s monitoring period |
| Unknown Pss | Memory usage for unknown purposes, including pages shared among processes |
| Total Heap Size | Total heap size allocated for the process |
| CPU User | User-space CPU usage of the process |
| CPU kernel | Kernel-space CPU usage of the process |
| Minor faults | Virtual memory minor page faults caused by the process |
| bps | Network load in bits/second |
| Number of ICMP packets | Number of ICMP network packets received and transmitted |
| Size in byte standard deviation | Standard deviation of the network packet size in bytes |
| Number of bytes | Number of bytes transmitted and received |

Ginger Master Informative Features (6)

| Feature Name | Description |
|---|---|
| .so mmap Private Dirty | Private memory of the process, in the dirty state, being used for mapped native code |
| .ttf mmap Pss | Memory usage for true type fonts, including pages shared with other processes |
| minor faults | Virtual memory minor page faults caused by the process |
| bps | Network load in bits/second registered in the monitoring period |
| Number of UDP packets | Number of UDP network packets received and transmitted in the monitoring period |

Kmin Informative Features (11)

| Feature Name | Description |
|---|---|
| Ashmem Private Dirty | Private memory of the process, in the dirty state, being allocated as Android shared memory |
| .so mmap Pss | Memory usage for mapped native code, including pages shared with other processes |
| .so mmap Shared Dirty | Memory shared with other processes, in the dirty state, being used for mapped native code |
| .so mmap Private Dirty | Private memory of the process, in the dirty state, being used for mapped native code |
| .apk mmap Pss | Memory usage for Android application package files, including pages shared with other processes |
| .ttf mmap Pss | Memory usage for true type fonts, including pages shared with other processes |
| TOTAL Shared Dirty | Total memory of the process that is shared and marked as dirty |
| TOTAL Heap Size | Total heap size allocated for the process |
| minor faults | Virtual memory minor page faults caused by the process |
| bps | Network load in bits/second registered in the monitoring period |
| Number of UDP packets | Number of UDP network packets received and transmitted in the monitoring period |

Classification results obtained using selected features:

| Classifier | Droid Kung Fu (3 features) | Fake Player (3 features) | Geinimi (13 features) | Ginger Master (6 features) | Kmin (11 features) |
|---|---|---|---|---|---|
| Naive Bayes | 58% | 99.7% | 90% | 77.2% | 81.5% |
| Logistic Regression | 76.5% | 99.8% | 96.8% | 78.4% | 95.3% |
| Support Vector Machines | 74.2% | 99.7% | 97% | 78.7% | 95.4% |
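The reported accuracies can be used to pick the most accurate classifier for each family. The Python sketch below does only that, using the numbers from the results above; the variable names are our own.

```python
# Accuracy in percent per classifier and Trojan family (from the results above).
ACCURACY = {
    "Naive Bayes": {
        "Droid Kung Fu": 58.0, "Fake Player": 99.7, "Geinimi": 90.0,
        "Ginger Master": 77.2, "Kmin": 81.5,
    },
    "Logistic Regression": {
        "Droid Kung Fu": 76.5, "Fake Player": 99.8, "Geinimi": 96.8,
        "Ginger Master": 78.4, "Kmin": 95.3,
    },
    "Support Vector Machines": {
        "Droid Kung Fu": 74.2, "Fake Player": 99.7, "Geinimi": 97.0,
        "Ginger Master": 78.7, "Kmin": 95.4,
    },
}

families = list(ACCURACY["Naive Bayes"])

# For each family, keep the classifier with the highest accuracy.
best = {}
for family in families:
    name = max(ACCURACY, key=lambda clf: ACCURACY[clf][family])
    best[family] = (name, ACCURACY[name][family])

for family, (name, accuracy) in best.items():
    print(f"{family}: {name} ({accuracy}%)")
```

Logistic Regression comes out best for Droid Kung Fu and Fake Player, and Support Vector Machines for Geinimi, Ginger Master and Kmin, although the margins between these two classifiers are small for every family except Droid Kung Fu.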

Obtained Detection Accuracy

From the obtained results, we see that:

Ø  Diverse behaviors of the investigated Trojan families are reflected by different usage of memory, CPU, and network (behavioral signatures)

Ø  Once these family behavioral signatures are extracted, they can be used to recognize Trojans at run-time with good accuracy

Ø  High detection accuracy can be achieved using a small number of features and detection algorithms of low (linear) complexity, making the approach suitable for mobile devices
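The linear cost per decision can be illustrated with a Gaussian Naive Bayes classifier, where classifying one record takes one term per feature and class. The Python sketch below is an illustration under stated assumptions, not the study's implementation: the Gaussian model for numeric features, the function names `fit_gaussian_nb` and `predict`, and the toy records are all our own.

```python
from math import log, pi
from typing import Dict, List, Sequence, Tuple

# Per class: (prior, per-feature means, per-feature variances).
Model = Dict[str, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(rows: Sequence[Sequence[float]], labels: Sequence[str]) -> Model:
    """Estimate a prior plus per-feature mean and variance for each class."""
    model: Model = {}
    for cls in set(labels):
        members = [row for row, label in zip(rows, labels) if label == cls]
        n = len(members)
        means = [sum(column) / n for column in zip(*members)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in column) / n + 1e-9
                     for column, m in zip(zip(*members), means)]
        model[cls] = (n / len(rows), means, variances)
    return model


def predict(model: Model, row: Sequence[float]) -> str:
    """Return the class with the highest log-posterior; linear in len(row)."""
    def score(cls: str) -> float:
        prior, means, variances = model[cls]
        return log(prior) + sum(
            -0.5 * log(2 * pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=score)


# Toy records (made-up values): [number of UDP packets, network load in bps].
rows = [[0, 10], [1, 12], [0, 9], [5, 90], [6, 110], [4, 100]]
labels = ["benign"] * 3 + ["trojan"] * 3

model = fit_gaussian_nb(rows, labels)
print(predict(model, [0, 11]), predict(model, [5, 95]))
```

Training happens offline; only `predict` would need to run on the device, and its work grows with the number of selected features.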

Conclusions

Ø  We propose an approach to the identification of behavioral signatures for different Trojan families and their most appropriate detectors

Ø  By observing only a limited number of features per Trojan family (from 3 to 13 features) and by using detection algorithms of low complexity in the number of features (Naive Bayes, Logistic Regression or Support Vector Machines), execution records belonging to Trojans can be identified with a precision of up to 99.8%

Ø  The proposed method is suitable for efficient and effective run-time usage on resource-constrained devices

Future Work

Ø  Extension of the method to detect Trojanized applications

Ø  Increase the number of observed Trojan families and trusted applications

Ø  Use the method on real devices

•  So that the overhead can be estimated more precisely

•  More malicious samples can be activated

Ø  Validation in an industrial setting (data, data, more data)

Ø  Further optimizations (low power,...)
