
UNIVERSITY OF CALGARY

Towards cloud-based anti-malware protection for desktop and mobile platforms

by

Christopher Jarabek

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF COMPUTER SCIENCE

CALGARY, ALBERTA

April, 2012

© Christopher Jarabek 2012

UNIVERSITY OF CALGARY

FACULTY OF GRADUATE STUDIES

The undersigned certify that they have read, and recommend to the Faculty of Graduate Studies for acceptance, a thesis entitled “Towards cloud-based anti-malware protection for desktop and mobile platforms” submitted by Christopher Jarabek in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE.

Supervisor, Dr. John D. Aycock, Department of Computer Science

Internal Examiner, Dr. Michael E. Locasto, Department of Computer Science

External Examiner, Dr. Behrouz Far, Department of Electrical and Computer Engineering

Date

Abstract

Malware is a persistent and growing problem that threatens the privacy and property of computer users. In recent years, this threat has spread to mobile devices such as smartphones and tablet computers. At the same time, the main method for combating malware, anti-virus software, has grown in size and complexity to the point where the resource demands imposed by these security systems have become increasingly noticeable. In an effort to create a more transparent security system, it is possible to move the scanning of malware from the host computer to a scanning service in the cloud. This relocation could offer the security of conventional host-based scanning, without the resource demands involved with running a fully host-based anti-virus system.

This thesis shows that under the right circumstances, malware scanning services provided remotely are capable of replacing host-based anti-malware systems on desktop computers, although such a cloud-based security system is better suited to protecting smartphone users from malicious applications. To that end, a system was developed that provides anti-malware security for desktop computers by making use of pre-existing web-based file scanning services for malware detection. This system was evaluated and found to have variable performance ranging from acceptable to very poor. The desktop scanning system was then augmented and adapted to serve as a mechanism for identifying malicious applications on Android smartphones. The evaluation of this latter system showed favorable results, and the system is effective as a mechanism for combating the growing mobile malware threat.


Acknowledgements

“No man is an island”; as such, this body of research would not have been what it is without the help of several individuals. First and foremost, I would like to thank Dr. John Aycock for his guidance and advice during my studies. His open and approachable nature made him a pleasure to work with, and this research would not have reached its full potential without his direction.

I would like to express my gratitude to Dr. Michael Locasto and Dr. Behrouz Far for serving on my examination committee. I would also like to thank Dr. William Enck and Dave Barrera for their advice regarding Android development, as well as Erika Chin and Adrienne Porter Felt for their assistance with tools for data analysis.

Special thanks should also be given to my student colleagues, Daniel De Castro and Jonathan Gallagher, for offering up their company and enjoyable discussions.

Finally, I would like to thank my family: Chelsey Greene, and Patricia and Jim Jarabek. Words alone are insufficient to show the scale of my gratitude for the love, encouragement, and support they have shown me.


Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
1 Introduction
  1.1 Background
    1.1.1 The Malware Threat
    1.1.2 The Cloud
    1.1.3 Smartphones
      1.1.3.1 Mobile Malware
      1.1.3.2 Android
  1.2 Thesis Contributions
  1.3 Thesis Outline
2 Related Work
  2.1 Mobile Security and Malware
  2.2 Cloud Based Anti-Malware
  2.3 Device Based Mobile Anti-Malware
  2.4 Non-Device Based Mobile Anti-Malware
  2.5 Other Lightweight Anti-Virus Techniques
  2.6 Summary
3 System Architecture
  3.1 Scanning Services
    3.1.1 Kaspersky
    3.1.2 VirusChief
    3.1.3 VirusTotal
    3.1.4 Other Services
    3.1.5 Terms of Service
  3.2 Desktop Thin AV System
    3.2.1 DazukoFS
    3.2.2 File System Access Controller
    3.2.3 Standalone Runner
    3.2.4 Thin AV
    3.2.5 Scanning Modules
    3.2.6 System Circumvention
  3.3 Mobile Thin AV System
    3.3.1 Reuse of Existing Thin AV System
    3.3.2 Android Specific Scanner
    3.3.3 Safe Installer
    3.3.4 Killswitch
    3.3.5 System Circumvention
4 System Evaluation - Desktop Thin AV
  4.1 Scanning Service Performance
    4.1.1 Testing Protocol
    4.1.2 Results
    4.1.3 Discussion
  4.2 Actual System Overhead
    4.2.1 Testing Protocol
    4.2.2 Results
    4.2.3 Discussion
  4.3 Predicted System Overhead
    4.3.1 Testing Protocol
    4.3.2 Results
    4.3.3 Discussion
  4.4 Large Scale System Simulations
    4.4.1 Testing Protocol
    4.4.2 Results
    4.4.3 Discussion
5 System Evaluation - Mobile Thin AV
  5.1 Data Set
  5.2 Malware Detection
  5.3 Emulator Performance
  5.4 ComDroid Evaluation
    5.4.1 Testing Protocol
    5.4.2 Results
    5.4.3 Discussion
  5.5 Safe Installer Performance
  5.6 Killswitch Cost
    5.6.1 Testing Protocol
    5.6.2 Results
    5.6.3 Discussion
6 Discussion
  6.1 Thin AV Performance and Feasibility
  6.2 Ideal Deployment
    6.2.1 Desktop Deployment
    6.2.2 Mobile Deployment
  6.3 Privacy
    6.3.1 Desktop Privacy
    6.3.2 Mobile Privacy
7 Conclusion
Bibliography
A Appendix


List of Tables

3.1 Thin AV security policy matrix.
3.2 Speed comparison of the hashing functions available in Python.
4.1 Kaspersky file scanning performance statistics.
4.2 VirusChief file scanning performance statistics.
4.3 VirusTotal file scanning performance statistics.
4.4 VirusTotal file upload performance statistics.
4.5 Linear equations for the three scanning services.
4.6 Activities in the web and advanced workload scripts.
4.7 Scenarios examined for assessing Thin AV overhead.
4.8 General characteristics of testing workload scripts.
4.9 Time to complete the three workload testing scripts while using Thin AV.
4.10 Refined linear equations for each of the three scanning services.
4.11 Simulation results of the Kaspersky service for three different activity logs.
4.12 Simulation results of the VirusChief service for three different activity logs.
4.13 Comparison of running time and simulation results for Kaspersky service.
4.14 Comparison of running time and simulation results for VirusChief service.
5.1 General file size characteristics of the Android test data set.
5.2 Summary of malware found in the Google Market data set.
5.3 Android emulator versus hardware performance comparison.
5.4 Linear equation for the ComDroid scanning service.
5.5 Summary of exposed communication in Google Market data set.
5.6 Network speeds used for evaluating the mobile implementation of Thin AV.
5.7 Thin AV safe installer cached performance summary.
5.8 Thin AV safe installer uncached performance summary.
5.9 Linear equations for generating a system fingerprint.
5.10 Data consumption of Thin AV killswitch over different time periods.
5.11 Fingerprint generation time for different conditions.
5.12 Total upload sizes used for calculations of bulk scanning performance.
5.13 Thin AV killswitch app upload times.
5.14 Scan times for different numbers of apps.
A.1 Raw data from Figure 4.6.
A.2 Raw data from Figure 4.7.
A.3 Raw data from Figures 4.8 and 4.9.
A.4 Raw data from Figure 4.10.
A.5 Raw data from Figure 4.11.
A.6 File size characteristics of Android testing data set.


List of Figures

3.1 System architecture for Thin AV.
3.2 UML Class Diagram for Thin AV.
3.3 System architecture diagram for the mobile implementation of Thin AV.
3.4 User interfaces for the Android killswitch.
4.1 Scan response time for the Kaspersky scanning service.
4.2 Scan response time for the VirusChief scanning service.
4.3 Scan response time for the VirusTotal scanning service.
4.4 Upload response time for the VirusTotal scanning service.
4.5 Example CDF of simulated files by size.
4.6 Accesses which involved an uncached file versus Thin AV induced overhead.
4.7 Number of file system accesses versus Thin AV induced overhead.
4.8 File size in bytes versus Thin AV induced overhead.
4.9 File size versus the proportion of accesses scanned by each scanning service.
4.10 Proportion of file modifications versus Thin AV induced overhead.
4.11 Average time between file accesses versus Thin AV overhead.
5.1 Median file size of the Android test data set by category.
5.2 Response time of the ComDroid service as a function of package size.
5.3 Fingerprint generation time versus the number and size of packages.


List of Abbreviations

AIDL: Android Interface Definition Language
AJAX: Asynchronous JavaScript and XML
API: Application Programming Interface
APK: Android Application Package File
ARM: Advanced RISC Machine
ARP: Address Resolution Protocol
AV: Anti-Virus
CPU: Central Processing Unit
DLL: Dynamic-Link Library
DNS: Domain Name System
FFBF: Feed-Forward Bloom Filter
FSAC: File System Access Controller
HTML: HyperText Markup Language
HTTP(S): Hypertext Transfer Protocol (Secure)
IP: Internet Protocol
IPC: Inter-Process Communication
LOC: Lines of Code
OS: Operating System
RAM: Random Access Memory
RISC: Reduced Instruction Set Computer
VM: Virtual Machine
WEP: Wired Equivalent Privacy
XML: Extensible Markup Language


Chapter 1

Introduction

Computer malware (malicious software) is a persistent and evolving threat to the privacy and property of individuals and organizations. With software systems growing in complexity every year, the potential exploits of these systems are growing in kind. The most common technique for identifying and removing malware from computers is anti-virus software. However, anti-virus products that run on end-user computers have become increasingly bloated in recent years, as developers push to include features that will serve to differentiate their product in a competitive marketplace. This software bloat has a negative impact on the performance of computer systems and on users’ willingness to use anti-virus products to protect their computer systems. Recently, an idea has started developing which would see security offered as a cloud-based service. Although there are a variety of factors motivating the development of cloud-based security, from a customer’s perspective this shift towards cloud-based security ultimately means that the products that are currently used to ensure access, confidentiality, and integrity of both data and computer systems can be replaced with a cloud-based service. Such services are already being employed by security companies seeking to enhance their existing host-based anti-virus software with cloud-based features [46].

This thesis aims to show that under the right circumstances, malware scanning services provided remotely are capable of replacing host-based anti-malware systems on desktop computers, although such a cloud-based security system is better suited to protecting smartphone users from malicious applications. The evidence to support this thesis comes from the development and evaluation of Thin AV: a lightweight, cloud-based anti-malware system that was implemented for both Linux desktops and Android smartphones.

The remainder of this chapter is laid out as follows: Section 1.1 will broadly cover the background for the main concepts relevant to this thesis. Next, Section 1.2 will detail the exact contributions made by this thesis. Finally, Section 1.3 will describe the contents of the remainder of this document.

1.1 Background

This thesis ties together a variety of different topics, including malware (of both the mobile and non-mobile varieties), cloud computing, and smartphone security. Whereas Chapter 2 will discuss a wide range of academic research relating to these issues, this section is intended to serve as a general introduction to the relevant topics, and to provide the context for the rest of the work contained within this thesis. The remainder of this section is outlined as follows: Section 1.1.1 will discuss malware and the threat it poses; Section 1.1.2 will talk about the concept of cloud computing; and finally, Section 1.1.3 will discuss smartphones, with a special focus on mobile and smartphone malware as well as a discussion of the Android smartphone operating system.

1.1.1 The Malware Threat

Malware is, in the broadest sense, a computer program that is designed to compromise, damage, exploit, or harm a computer system or the data residing on it [31]. While the term “virus” has become somewhat synonymous with malware, this is incorrect, as computer viruses constitute only a single type of malware. Malware refers to all varieties of malicious computer programs, which are typically categorized based on the specific malicious properties the program exhibits. In recent years the creation of new malware has seen tremendous growth [50], and while malware is created for a variety of reasons, the most prevalent incentive is financial gain [53].


The most common approach to combating malware is through anti-virus programs. These are programs that examine the files on a computer and locate files that look or behave like known malware samples [31]. While there are numerous companies that sell anti-virus products, and even a few anti-virus products that are given away for free, most of these products are fairly comparable in their ability to detect malware [82], at least when it comes to detecting malware that is currently circulating in the wild [106]. This has led to a scenario where companies have to continually add new features to their anti-virus products in order to stand out in a crowded marketplace. And while these features may have some security benefit, there is almost always an associated performance cost [105].

1.1.2 The Cloud

New trends are emerging in computing that may offer a new direction in the fight against malware. Among these trends is the recent emergence of “cloud” computing. Cloud computing is not so much a new technology as it is a new business model for computing: the delivery of computation services as opposed to computation products. These services are typically delivered over a network such as the internet [78].¹

A major motivating factor behind the adoption of cloud computing is the potential for cost savings [63]. For example, rather than a company providing e-mail to their employees through their own local e-mail server, they could pay a subscription fee to a company that provides e-mail services over the internet. This service arrangement saves the company the cost of buying, maintaining, and administering their own e-mail server; the procurement, maintenance, and potentially the administration of these cloud e-mail servers is not the responsibility of the company, but rather the responsibility of the service provider. While the geographic location of the cloud servers is controlled by the service provider, the location is very much a point of interest to customers, as the location of a service provider’s cloud servers can significantly impact the performance of the service, as well as pose significant legal concerns for cloud customers [30].

¹ The term “cloud” came about because historically a cloud shape is used to represent the internet in network topology diagrams [101].

The concept of cloud computing has its roots in the mainframe computers of a previous generation, but the technology to actually implement cloud computing really began to take shape when grid computing and operating system virtualization started seeing widespread, successful applications. The success of these underlying technologies, coupled with a steady increase in the speed of internet connectivity around the globe, eventually allowed for computation services to be delivered over the internet [65, 52].

The notion of providing computation as a service can be broken down into a number of different service categories. Among the most common service offerings are Infrastructure-as-a-Service, where a company offers shared hardware resources; Platform-as-a-Service, for developing and deploying applications; and Application-as-a-Service, which is similar to the e-mail example above [78]. The notion of offering Security-as-a-Service is a relatively new concept [29], yet the security company McAfee is already offering a cloud-based enterprise security service that includes malware protection [26], though the details pertaining to the architecture of this proprietary system are not publicly available.

1.1.3 Smartphones

Smartphones are fundamentally just mobile phones with some sort of personal computing functionality. This functionality typically includes the ability to run custom software, or applications, on top of purpose-built operating systems. It is somewhat difficult to specify the point at which mobile phones started widely being referred to as smartphones, as their development was simply the result of continual product evolution. However, it is safe to say that the variety of touch screen devices ushered in by Apple’s iPhone and, later, Google’s Android devices can be classified as smartphones. The growth of smartphone sales has been extremely high, with smartphone sales reaching more than 115 million devices in the third quarter of 2011 [64].

1.1.3.1 Mobile Malware

Mobile malware is malware that has been written for a mobile device such as a tablet computer or a mobile phone. The problem of mobile malware has been around for more than a decade. Even in the pre-smartphone era there was considerable speculation as to when malware on mobile phones would become commonplace, and what the capabilities of said malware would be when it arrived [49]. As an emerging platform for malware, there were many factors that dictated when malware authors would be sufficiently motivated to begin writing mobile malware in earnest [85]. However, the tremendous increase in smartphone use [64], coupled with the fact that smartphones increasingly store large amounts of personal or private information, has been enough to push mobile malware from a curiosity to a full-fledged industry. In recent years the growth of mobile malware has been dramatic, with F-Secure reporting a nearly 400% increase in mobile malware between 2005 and 2007 [66], and McAfee Labs recording a doubling of mobile malware samples between the beginning of 2009 and the middle of 2011. Much like desktop malware, mobile malware ranges from mildly annoying to extremely insidious, and all major platforms have been affected [77, 71, 33, 32, 57, 100].

Combating malware is not trivial on high-resource desktop computers, and the resource constraints present on mobile devices only increase the challenge of this task. It is not simply that the processing and storage capacity of a mobile device is less than that of a contemporary desktop computer; it is the fact that the uptime of the device is limited by the available battery power. Thus, excessive computation caused either by malware or by anti-malware code running on the device will shorten the battery life and decrease the usefulness of the device [37].


1.1.3.2 Android

Given that a large portion of this research uses the Android operating system, it is worth discussing Android, as well as the Android security model and some of the issues around Android security. (For the remainder of this section, unless otherwise stated, please refer to [2] for details pertaining to the Android operating system.) The selection of Android as the platform for this study was based on a variety of factors. Among the top smartphone operating systems (Android, iOS, Windows Phone, Symbian, and BlackBerry OS), Android is the only mainstream operating system which is open-source, allowing for modification of the operating system. This, coupled with the rise of Android as a smartphone platform, made it the obvious choice [64].

Android is middleware developed by Google and built on top of Linux. It is targeted at mobile devices such as smartphones, tablets, and e-readers. Like many mobile operating systems, Android has been designed to provide developers with a rich environment in which to develop applications (or “apps”) that leverage the available physical hardware. Android apps are written in Java, but are not executed on a traditional Java Virtual Machine. Rather, Android includes a high-performance, mobile-specific VM called the Dalvik Virtual Machine that executes the compiled Android bytecode.

In order to create a secure operating environment, Android implements a high degree of process isolation between apps. When an app is launched, a new process is created for that app, owned by a user ID unique to that app. Within this process, a new Dalvik VM is launched, within which the desired app is run. This process isolation, in conjunction with Google’s design philosophy of “all apps are created equal”, is highly beneficial from a security perspective. It means that flaws or exploits in a given app cannot easily result in access to restricted data, processes, or services. For example, a successful buffer overflow attack on a particular app would only provide access to the files and process owned by the compromised app [58], as well as any other public files present on the file system.


Another key component of the Android security system is the permissions model which, broadly speaking, defines what portion of the Android API a given app has access to, and what actions an app can perform when interacting with another app on the device [54, 59]. For example, at install time, an application could declare that it requires access to the internet and the ability to receive SMS messages. Before proceeding with the installation, the user must approve these permission requests. However, an application could potentially request a set of permissions that would allow for malicious behavior, such as monitoring phone conversations or tracking a user’s location without the user’s knowledge [56].
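To make the install-time model concrete, the following toy Python sketch models all-or-nothing approval at install time and permission gating on API calls. It is hypothetical and greatly simplified (real Android enforces permissions inside the framework, not in app code), and all names in it are invented for illustration:

    # Toy model of Android-style install-time permissions (all names hypothetical).
    class PermissionDenied(Exception):
        pass

    class Device:
        def __init__(self):
            self.granted = {}  # app name -> set of permissions approved at install time

        def install(self, app_name, requested, user_approves):
            # Approval is all-or-nothing: the user accepts every requested
            # permission, or the installation does not proceed.
            if not user_approves(requested):
                raise PermissionDenied("user declined installation")
            self.granted[app_name] = set(requested)

        def api_call(self, app_name, required):
            # Every sensitive API call is gated on a permission granted at install time.
            if required not in self.granted.get(app_name, set()):
                raise PermissionDenied(f"{app_name} lacks {required}")
            return "ok"

    device = Device()
    device.install("weather", {"INTERNET"}, user_approves=lambda perms: True)
    print(device.api_call("weather", "INTERNET"))  # ok
    # device.api_call("weather", "RECEIVE_SMS")    # would raise: never requested or granted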

This permissions model is further complicated by the addition of the inter-process communication model, which provides a mechanism for passing messages and data between applications, or from the operating system to an app on the device. These messages are referred to as intents, and intents can be explicit (app A sends a message to app B and only app B) or implicit (app A sends a message to any app which supports the desired operation). Unfortunately, both explicit and implicit intents allow for a scenario where an app can spoof an intent in an attempt to gain information from the target app. Additionally, the latter case creates a scenario where an intent can be intercepted by a malicious app, bypassing its intended target [45].
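The difference between the two delivery modes, and the interception risk of implicit delivery, can be illustrated with a toy dispatcher written in plain Python. This is not the Android API; the bus, the app names, and the action string are all hypothetical:

    # Toy model of intent dispatch (hypothetical names; not the Android API).
    class App:
        def __init__(self, name):
            self.name = name
            self.received = []

        def handle(self, sender, action, data):
            self.received.append((sender, action, data))

    class IntentBus:
        def __init__(self):
            self.apps = {}     # app name -> App
            self.filters = {}  # action -> apps that declared a matching filter

        def install(self, app, actions=()):
            self.apps[app.name] = app
            for action in actions:
                self.filters.setdefault(action, []).append(app)

        def send_explicit(self, sender, target, action, data):
            # Explicit: delivered only to the named target app.
            self.apps[target].handle(sender.name, action, data)

        def send_implicit(self, sender, action, data):
            # Implicit: delivered to every app with a matching filter,
            # including a malicious app that registered the same action.
            for app in self.filters.get(action, []):
                app.handle(sender.name, action, data)

    bus = IntentBus()
    viewer, spy, bank = App("viewer"), App("spy"), App("bank")
    bus.install(viewer, actions=["VIEW_PDF"])
    bus.install(spy, actions=["VIEW_PDF"])  # malicious app claims the same action
    bus.install(bank)

    bus.send_implicit(bank, "VIEW_PDF", "statement.pdf")
    print(spy.received)  # [('bank', 'VIEW_PDF', 'statement.pdf')] -- intercepted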

In light of the process isolation enforced by Android, it is becoming increasingly likely that malware in the conventional sense is being eclipsed by the issue of malicious apps which are unwittingly installed by a user [55]. These can be applications that ask for a specific collection of permissions that could enable malicious behavior [56], or applications which abuse Android’s message passing system for malicious purposes [45].

Apps for Android can be distributed in a variety of ways. The most common way is via an application market. A market is simply an app that runs on a device and allows a user to find and install other apps. The feature that differentiates the Android Market² model from other market models (most notably, the Apple App Store) is the fact that developer submissions to the Android marketplace are relatively unregulated. Submissions do not go through any sort of rigorous quality control checks. Specifically, apps are not manually reviewed for quality and content prior to release, a hallmark of the Apple App Store [42, 97]. While on one hand Google’s marketplace model provides developers with the ability to quickly take an app from development to deployment, it also means that developers of malicious apps have fewer obstacles to overcome when trying to quickly publish their apps to a wide audience. In order to combat this, both the Google and Apple markets contain a remote “killswitch” that allows not only for the removal of an app from their market, but also the remote removal of the app from a user’s device [76]. Additionally, Google has potentially staked the reputation of their brand on their Market, and so has a vested interest in preventing it from becoming filled with malicious apps. It is therefore not surprising, given their more permissive market model, that Google has had to actually use their killswitch to remove malicious apps [40]. Furthermore, Google has very recently announced that, due to the spate of malware on their market, they have developed their own internal anti-malware scanning system called Bouncer, which performs automated scanning of apps submitted to the market [73].

² As of March 6, 2012, Google has grouped the Android Market together with a number of their other commercial services, creating a new service called Google Play [91]. Any future references to the Google Market or the Android Market refer specifically to the market for Android apps that is now part of Google Play.

Android’s market model is further complicated by the fact that a user does not need to use the Android Market to install apps. Android allows the installation of apps downloaded from the web, attached to an e-mail, transferred via USB from another computer, or downloaded from any number of the third-party app stores that are available for Android. McDaniel and Enck provide a brief discussion of some of the security challenges presented in such a multi-market environment [76], arguing that markets by themselves do not fail at security, because markets do not claim to provide security. Rather, the onus is on users to make informed decisions about what apps they install. To that end, it is suggested that what Android needs is a level of automated application certification in Android’s multi-market ecosystem. Thin AV, the system described in this thesis, is intended to be a step towards this goal.

While the official Android Market comes with a built-in killswitch for the removal of malicious apps, the only other high-profile Android market, the Amazon App Store, does not [43]. Then there are the numerous other, less well known Android application markets: some are targeted at specific geographic regions [10, 15], others at specific hardware platforms [4], and others still at individuals with more salacious tastes [14]. There is even a market under development that focuses specifically on providing apps that have been banned by the official Google Market [87].

As the number of third-party app stores increases, it is likely that some of these markets will be more interested in the quantity of apps available for download than in the quality of those applications. It is possible that these unofficial application markets will become significant vectors for malicious applications in the years to come. The mobile anti-malware system, Thin AV, which is described in Chapter 3, is a step towards combating this malware vector. By combining an install-time application check with a market-independent killswitch capable of notifying users of malicious apps regardless of their source, it is possible that these non-Google Market sources can be made safer for mobile users.

1.2 Thesis Contributions

The first main contribution of this research is the design and development of Thin AV, a system for providing anti-virus scanning for Linux-based desktop computers. Thin AV combines a set of pre-existing, third-party scanning services and offloads the scanning of files from the host computer to these services. The evaluation of Thin AV found that the performance of the system was highly dependent on the file system activity while the system was active, but that there were specific instances where the system performed well. The findings from this research can help to address the performance concerns involved in cloud-based malware scanning. This could result in a system that would be capable of performing nearly transparent anti-malware protection from the cloud.

The second contribution of this thesis is an extension of the desktop version of Thin AV, specifically targeted at smartphones and other mobile devices. The system was designed and developed for the Android operating system, and the evaluation of the system showed favorable performance, suggesting that cloud-based anti-malware scanning may be a very good fit for providing a level of security to mobile devices.

Finally, this research includes a comprehensive examination and summary of the current body of academic research pertaining to cloud-based security for both desktop computers and mobile devices, as well as research regarding low-impact anti-malware techniques which might also be suitable for mobile devices.

1.3 Thesis Outline

This thesis is divided into chapters as follows: Chapter 2 will examine the existing research in the related fields of mobile malware and cloud-based anti-malware, as well as research into other lightweight anti-malware systems. Chapter 3 will introduce Thin AV, the system at the centre of this thesis, with a significant focus on the design and implementation of both the desktop and mobile versions of Thin AV. Chapter 4 will focus on the evaluation of the desktop version of Thin AV, while Chapter 5 will deal with the evaluation of the mobile version. Chapter 6 will discuss the results of the evaluation as well as the areas in which Thin AV could be improved, giving specific attention to the privacy implications of Thin AV. Chapter 7 will conclude this thesis.


Chapter 2

Related Work

Most of the research related to Thin AV can be grouped into one of three categories: security and malware in mobile environments, which will be discussed in Section 2.1; cloud-based anti-virus systems, which will be discussed in Section 2.2; and mobile anti-virus systems, which are reviewed in Section 2.3. Section 2.4 contains a review of related research that can be found in the overlap between these research areas. Section 2.5 will discuss and critique work that is relevant to Thin AV but cannot be clearly classified into any of the previous areas. Finally, Section 2.6 will conclude this chapter.

2.1 Mobile Security and Malware

The problem of mobile malware has been around for more than a decade. In that time, the nature of the malware threat has shifted significantly. In the pre-smartphone era, most malware came in the form of viruses or Trojan horses [66], while in recent years most malware comes in the form of malicious applications [60]. However, Bickford et al. have shown the possibility of developing rootkits for a modern smartphone, though their work did not focus on a well known smartphone operating system [38]. Additionally, [95] showed that smartphones are susceptible to more traditional denial-of-service attacks due to their lack of firewalls. The same study also raised the possibility of using smartphones as offensive platforms, though this is less promising due to their limited power.

Porter Felt et al. conducted a survey of malware found in the wild on Android, iOS, and Symbian devices [60]. Their survey found that all instances of malware for Android devices used application packages as their vector, meaning that users were unknowingly installing the malware on their devices. Interestingly, the only instances of malware on the iPhone occurred through an SSH exploit in rooted (or “jailbroken”) devices. The study went on to examine the incentives behind each piece of malware, most of which were financially based, and outlined a series of practical changes to each of the mobile platforms to help curb those incentives.

Given the current glut of mobile malware, and the rate at which smartphones are being adopted, it is clear that mobile security has become a pressing issue. Oberheide et al. provide an overview of security issues in mobile environments [83]. They point out that previous approaches to mobile security are either overly entrenched in desktop security practices, or argue for entirely new paradigms; Oberheide et al. suggest the truth lies somewhere in between. They discuss five issues that cause security on mobile platforms to be subtly different than in non-mobile environments: resource constraints, different attack strategies, different hardware architectures, platform/network obscurity, and usability.

Enck et al. performed a review of Android application security by developing a tool for reverse engineering Java code from compiled Android bytecode, and then performing static analysis [55]. The top 1,100 apps from the Android Market were downloaded and analyzed for a host of security flaws and poor programming practices. Enck et al. found pervasive misuse of personally identifying information such as phone identifiers and location information, as well as evidence of poor programming practices such as the writing of sensitive data to Android’s public centralized log. Fortunately, no evidence was found of exploits in the Android framework, or of the presence of malware in the collection of analyzed apps. However, given that the apps selected for study were the top apps in the Android marketplace, this likely resulted in a bias towards higher quality code than might be found in a broader cross-section of apps.

Chin et al. performed a study of Android inter-process communication (IPC) that is complementary to the analysis of Enck et al. [45]. Using ComDroid, a custom static code analysis tool, one hundred of the top Android applications in the Android Marketplace were examined for vulnerabilities in how they sent and received IPC messages (intents). Numerous vulnerabilities were identified, as well as several instances of misuse of the Android framework. These findings motivated a collection of best-practice programming guidelines for Android programmers.

The same 1,100 apps from [55] were also studied by Barrera et al. with the goal of understanding how the Android permission model is used in practice [35]. The study found that the use of Android permissions shows a distinctly heavy-tailed distribution, with some permissions being employed in most apps (e.g., access to the internet) while most other permissions are comparatively rare. Ultimately, it was concluded that the Android permissions model could be improved by sub-dividing certain broad permissions (e.g., internet access) to provide a more expressive model, while at the same time rarely used permissions with related functionality could be grouped together (e.g., install/uninstall applications). The findings of Barrera et al. are also in keeping with those of Ongtang et al. [86], where various elements of the Android permissions model were enhanced and modified to accommodate a richer, more expressive set of permissions.

The Android permissions model was also examined by Felt et al. in their study of overprivilege in applications [59]. By mapping the Android API, it was possible to determine which API calls require which permissions. Using this permissions map they were able to build a tool, Stowaway, to examine several hundred Android applications, finding that almost a third of Android applications over-request permissions. Additionally, they found that the Android permissions model is severely under-documented, and in some cases incorrectly documented.


2.2 Cloud Based Anti-Malware

The notion of cloud-based malware scanning was first posited in [81] and addressed at length in [82], and it was a significant source of inspiration for the creation of Thin AV. The system described by Oberheide et al. is called CloudAV. It involves running a local cloud service consisting of twelve parallel VMs, ten of which run different anti-virus engines and two of which run behavioral detection engines. End hosts run a lightweight client (300 LOC in Linux, and 1,200 LOC in Windows) which tracks and suspends file access requests until the file has been scanned. The use of several heterogeneous scanning engines dramatically improved threat detection, with Oberheide et al. claiming a 98% detection rate when testing with the Ann Arbour Malware Library. Such a high detection rate does increase the risk of false positives. However, it was found that by requiring at least four of the scanning engines to flag a file as malware, false positives could be eliminated, while the overall detection rate dropped by only 4%.
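The false-positive trade-off can be sketched in a few lines of Python. This is a hypothetical illustration of the threshold voting described above, not CloudAV’s code:

    # Flag a file as malware only if at least `threshold` engines agree; a higher
    # threshold suppresses false positives at the cost of some detection rate.
    def aggregate_verdicts(verdicts, threshold=4):
        """verdicts: one boolean per scanning engine (True = engine flagged the file)."""
        return sum(verdicts) >= threshold

    print(aggregate_verdicts([True] * 3 + [False] * 7))  # False: below the threshold
    print(aggregate_verdicts([True] * 5 + [False] * 5))  # True: consensus reached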

Given that CloudAV was deployed with dedicated scanning servers in a LAN environment, the performance impediment from network latency and system load is minimal. This results in an average file scan time of just over one second. The process is sped up through the use of caching, which was shown to be highly effective, producing a 99.8% hit rate with a primed cache. The performance of Thin AV could give an indication as to how a remote scanning system like CloudAV would perform over a WAN, where network latency can be significant.

Following their success on the desktop, Oberheide et al. applied their strategy to a mobile environment [84]. Their results showed a marked reduction in power consumption and improved malware coverage. However, they failed to provide any information on how fast their solution operated in the lower-bandwidth, higher-latency mobile realm. Conversely, in their examination of the trade-offs between energy consumption and security, Bickford et al. showed that cloud-based anti-malware scanning is more energy intensive than host-based scanning when performed on a mobile device capable of running a VM hypervisor [37]. It should be noted, however, that the latter study was an examination of cloud-based rootkit detection, not virus detection, and the two implementations differed greatly. Therefore, the latter is not necessarily a refutation of the results of Oberheide et al.

A novel extension to cloud-based malware scanning was provided by Martignoni et al. [75]. They implemented a system wherein suspect executables are uploaded to a cloud-based analysis engine. The system executes the malware, intercepting the system calls generated by the execution and, where necessary, passing those system calls back to the original host. The rationale behind the approach is that most behavior-based malware detection engines rely on running malware samples in a highly synthetic environment, yet often the malicious characteristics of a piece of code are only triggered by a very specific processing environment on the target machine (e.g., visiting a specific banking web site). Like Thin AV, this approach reduces the user’s risk of infection, but it also provides the scanning service with a much more diverse set of computing environments in which to test potentially malicious code, thus improving coverage when seeking malicious behavior in a piece of code. Such a system, implemented in a VM, would make a compelling addition to other cloud-based anti-malware systems such as Thin AV or CloudAV.

Jakobsson and Juels described a strategy for malware scanning that also relies on external computing resources [70]. Their technique allows trusted servers to audit the activity logs of remote clients in an effort to establish the security posture of the clients. The trusted servers, in most cases, would be owned and operated by institutions susceptible to malware-based fraud, such as banks. A client-based agent would be responsible for logging activity on the client, such as file downloads and installations. This log file could then be sent to the trusted server, allowing the server to decide whether or not to proceed with a transaction with that particular client.

Jakobsson and Juels claim their technique is secure against log tampering because any event that could result in a malware infection occurs only after the event in question has been logged and the log has been locked. However, they do not address the case where their agent software is installed on an already compromised machine. Because only logs are being processed, this technique is well suited to low-powered mobile environments where bandwidth is limited. Additionally, because logs, and not entire files, are being transmitted, the privacy concerns are somewhat less than those presented by Thin AV, where whole files are transmitted.
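A toy version of this audit step might look as follows. This is a hypothetical sketch, not the protocol from [70], and the event names are invented: the client appends security-relevant events to a locked log, and the server approves a transaction only if the log is free of risky events.

    # Toy server-side log audit (all event names are hypothetical).
    RISKY_EVENTS = {"downloaded_executable", "installed_unsigned_app"}

    def approve_transaction(client_log):
        """client_log: list of (timestamp, event) tuples; per the scheme, events are
        logged and locked before they can take effect on the client."""
        return not any(event in RISKY_EVENTS for _, event in client_log)

    log = [(1, "opened_browser"), (2, "downloaded_executable")]
    print(approve_transaction(log))  # False: a risky event is present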

Clone Cloud [47] and MAUI [48] are both systems designed to enhance the processing capabilities of smartphones by offloading intensive processing to highly resourced cloud-based servers. The designers of Clone Cloud were the first to envision a system capable of offloading smartphone malware scanning onto more powerful cloud-based hardware, although the ability to perform intensive malware scans was posited as only one of many possible applications of their approach. It should be noted that the notion of moving intensive processing from mobile devices onto more powerful servers predates Clone Cloud by many years [94, 61]; yet Clone Cloud is the first system to apply this practice to modern smartphones, and the first to consider the potential security applications of such an approach.

Paranoid Android is an implementation of a cloud-based anti-malware system which follows very closely on the heels of Clone Cloud [89]. The technique involves replicating an entire mobile device in a virtualized server-based environment. System calls on the physical device are recorded and transmitted to the server, where the user’s behavior is replicated. This allows the server to maintain a faithful copy of the user’s device most of the time (barring network disruptions). This server-based replica can be scanned using traditional CPU-intensive techniques that would not be feasible on a mobile device. A major upside of this approach is that once a replica has been established on a server, the amount of traffic necessary to maintain a consistent state is quite small. The obvious downside, as with CloudAV and others, is the privacy concern involved in replicating a device which very likely contains personal information. However, such a solution would be ideal in a highly managed corporate environment where worker privacy on company-provided devices is not a given.

Finally, the private security company BitDefender has also developed a cloud-based anti-malware product [46]. In their solution, they suggest that only the signature-based scanning portion of the malware scan should be offloaded to the cloud. The reasoning behind this is that more than 90% of the size of BitDefender is composed of the static signature-based scanning engine. Therefore, if the less intensive operations such as heuristic scanning remain on the client, and signature-based scanning is done remotely, then network traffic can be kept to a minimum. For privacy reasons, they also opt to have users upload only cryptographic hashes of their files for analysis, uploading the whole file only in the event that a hash cannot be matched. This is very similar to the approach used by Thin AV, which will be discussed in Chapter 3.
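A minimal sketch of such a hash-first exchange is shown below, using a dictionary as a stand-in for the remote service’s cache. The names and the placeholder detection logic are hypothetical; this is neither BitDefender’s nor Thin AV’s actual code:

    # Hash-first scanning: send a digest first, upload the file body only on a miss.
    import hashlib

    class ScanService:
        """Stand-in for a remote scanning service."""
        def __init__(self):
            self.verdicts = {}  # sha256 hex digest -> cached verdict

        def lookup(self, digest):
            return self.verdicts.get(digest)  # None on a cache miss

        def scan(self, digest, data):
            verdict = b"MALWARE-MARKER" in data  # placeholder detection logic
            self.verdicts[digest] = verdict
            return verdict

    def scan_file(service, path):
        with open(path, "rb") as f:
            data = f.read()
        digest = hashlib.sha256(data).hexdigest()
        verdict = service.lookup(digest)  # cheap: only the hash crosses the network
        if verdict is None:
            verdict = service.scan(digest, data)  # expensive: full upload on a miss
        return verdict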

2.3 Device Based Mobile Anti-Malware

There are a host of anti-malware systems which are designed to run on resource-constrained mobile devices. VirusMeter is a proposed approach for general malware detection in a mobile environment [72]. The approach involves detecting malware by monitoring battery consumption. The assumption is that if the battery consumption of benign behavior can be adequately modeled, then deviations from that model will suggest the presence of unauthorized code. The key issue with this approach is that even the best case scenario has more than a 4% false-positive rate, which is high for a malware scanner. More importantly, the system was prototyped on a comparatively old mobile device, and it is unclear if the approach would work effectively on a modern smartphone, which typically runs a diverse collection of rich media applications capable of quickly draining a device’s battery.
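The underlying idea can be sketched as follows. This is a toy illustration with made-up coefficients, not VirusMeter’s actual power model: predict the benign battery drain from observed activity, and flag the device when the measured drain deviates too far from the prediction.

    # Toy battery-anomaly check in the spirit of VirusMeter (coefficients made up).
    def predicted_drain(call_min, sms_count, screen_min):
        """Linear model of benign battery drain (arbitrary units)."""
        return 0.5 * call_min + 0.1 * sms_count + 0.8 * screen_min

    def is_anomalous(measured, call_min, sms_count, screen_min, tolerance=0.15):
        expected = predicted_drain(call_min, sms_count, screen_min)
        return measured > expected * (1 + tolerance)

    print(is_anomalous(60.0, call_min=30, sms_count=20, screen_min=40))  # True: unexplained drain
    print(is_anomalous(49.0, call_min=30, sms_count=20, screen_min=40))  # False: within tolerance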

Heuristic-based anti-malware scanning is conducive to mobile platforms simply due to its reduced overhead. The approach in [102] identifies malware based on the pattern of DLL usage in a program. Venugopal et al. observed that many malware programs share similar behaviors, and these behaviors are accessed through DLLs. Furthermore, the spreading mechanisms and targeted exploits of viruses in the mobile domain are different from those in the desktop domain, so the heuristic methods from the latter domain cannot be applied to the former. By developing a heuristic system and training it on a collection of Symbian viruses, they were able to successfully identify 95% of other (non-training-set) Symbian malware, with no false positives. Much like VirusMeter, the most obvious problem with this solution is that it was developed in a pre-smartphone world. Smartphones now typically run a diverse, customized collection of mobile applications. In a software environment where new applications with novel functionality are being released on a daily basis, this raises questions about the efficacy of such a heuristic technique, or at the very least, about its rate of false positives.

A similar strategy for malware identification on Android-based mobile devices can be found in [98]. The strategy involves using Linux-based tools to analyze the low-level function calls of ELF files, then using various heuristic techniques to classify a file as malicious or clean depending on the functions being called. The authors also suggested a technique for combating infection by having co-located mobile devices collaborate to identify malware. Prior to their work on Android, Schmidt et al. developed a technique for instrumenting Symbian and Windows Mobile devices with the intention of recording user behavior for the purposes of remote anomaly detection [99].

Jakobsson et al. provide a novel technique for mobile malware detection called memory printing [68, 69]. This is done using a cryptographic function which fills the free RAM on a phone. The key property of this cryptographic function is that it takes dramatically longer to compute if the function is configured to use more space than actually exists. When scanning for malware, legitimate applications in RAM are swapped out to the flash disk. Therefore, when the function is executed, if there is less RAM available than should exist (due to a piece of malware), the memory-printing function will take much longer than it would if no malware were present. Jakobsson et al. assert that because malware can only exist in secondary storage or in RAM, any malware that is not detected in the RAM scan will be found when secondary storage is scanned. Unfortunately, in this approach, secondary storage is still scanned via white/black lists, signatures, or heuristics, which, as was pointed out in [83], are not efficient strategies in a mobile environment.

Finally, a more preventative approach to mobile malware can be seen in Kirin [56]. Kirin is a system developed from a security requirements analysis of the Android permissions system. It checks a collection of rules which, if violated, may indicate that an application being installed is capable of malicious activity. The study examined 311 Android applications and found ten positives, five of which were false. The key drawback of the system is that user intervention is still required to validate positive results. Unfortunately, Kirin only prevents the installation of malicious applications which are installed from third-party sources, and not from the official Google Market.

2.4 Non-Device Based Mobile Anti-Malware

Due to the processing and battery limitations of smartphones, there is a trend in research

towards anti-malware solutions for mobile devices that rely on a remote server, be it a


cloud service, a more conventional centralized server, or even a desktop / laptop.

SmartSiren is a centralized malware mitigation solution targeted at smartphones [44].

The system is the first example of decoupling security from smartphones. It is targeted

at the scenario where some smartphones do not have any AV software installed. Smart-

Siren consists of an agent that monitors phone behavior, and a proxy with whom the

agent communicates. The agent monitors general behaviors such as SMS and Bluetooth

traffic, as well as information about the phone, such as the cell towers to which it most

frequently connects. The proxy receives reports from participating agents and aggregates

these reports in an effort to find evidence of misbehaving smartphones. Their two key

techniques for detecting malware are statistical and anomaly monitoring. The former

looks to see if the phone’s capabilities are being used significantly more than would be

expected based on historical data. The latter attempts to identify phone numbers (which

would charge a fee to the phone’s owner) that are being contacted by the malware. As

malware using these phone numbers spreads, calls and messages to such numbers would

gradually rise in frequency. Alternatively, if the malware attempts to spread through the

smartphone’s contact list, fake contact list entries are used to determine if a device is

infected. Upon detection of an infection, the device owner is notified by SMS about the

infection, and how to deal with it. Additionally, individuals on the user’s contact list are

notified that they might be at risk of infection. Finally, all individuals who are signed

up with the SmartSiren service and frequent the same cell towers as the infected device

will be monitored.

One of the strengths of the work is its focus on the privacy of the individual. Re-

ports can be submitted by users, divulging as little personally identifiable information

as possible. There are some key drawbacks. First, it seems much more likely that most smartphones will lack AV software entirely, rather than the reverse. It is unclear how

well the system would behave when it is operating on very incomplete information. Fur-


thermore, the concept of statistical monitoring might falter in a high-traffic, high-noise

environment generated by today’s rich-media apps. Additionally, it seems like the system

is reliant on having a historical record of clean data against which it can compare current

data to ferret out abnormalities. There is no indication as to how this system would

cope when introduced into a potentially tainted environment. Finally, it is questionable

how well the third-party notification system would work in reality. Years of exposure to

adware has left users wary of seemingly inexplicable automated messages that tell them

their electronic devices are at risk of being infected with malware. It is quite possible

that such messages will be assumed to be spam and ignored. The worst possible sce-

nario would be if recipients believed these third-party notifications to indicate an infection on their own devices, leading to unnecessary attempts at malware removal.

Dixon and Mishra describe a system which uses a desktop or laptop as the anti-

malware analysis device [51]. Rather than relying on remote cloud-based services or

other network-intensive techniques, their system validates the contents of a mobile device

when it is connected to a user’s computer via USB. File hashes are used to identify files

that have not been analyzed for malware (be they modified or new files), and only the

files corresponding to novel hashes are sent to the desktop for analysis with standard

anti-malware software. In order to combat a sophisticated attacker who could send

false hashes to the validating system, a keyed hashing mechanism would be used, with

the key being provided by the external validating system.

A broad vision for hindering the spread of malicious apps can be found in Stratus,

a theoretical system proposed by Barrera et al. [34]. Stratus is a system comprising a

collection of information sources and services, such as developer registries, application

databases and remote application killswitches. The purpose of these entities is to help

provide some of the security guarantees that can be found in a single application market,

but in a multi-market environment. The proposed system was discussed with Android


in mind, both due to the increasing number of application markets available for An-

droid, and also due to the increased prevalence of malicious apps available for Android.

The backbone of the Stratus system is a universally unique application identifier that is

maintained by Stratus and is unique across all application markets.

Stratus is a well thought-out, but as of yet, unimplemented idea. The Stratus system

could potentially provide a framework in which Thin AV could operate. Specifically,

Thin AV could operate in conjunction with either the application databases, or the kill

switches. As such, the Stratus system is highly complementary to the goals of Thin AV

in the mobile setting.

2.5 Other Lightweight Anti-Virus Techniques

Miretskiy et al. [79] offer a highly integrated anti-virus solution in Avfs, which makes two

key contributions. In addition to modifying the open source anti-virus system ClamAV

to significantly increase the speed of scanning, they tied their modified ClamAV (called

Oyster) into a stacked file system such that AV scanning is part of the file system, a

technique adopted by Thin AV. This has several advantages. It allows the scanning of

files at the earliest possible moment, as opposed to simply scanning when files are opened

or closed, which introduces a window of vulnerability. Additionally, it speeds up scanning

because scanning is taking place at the kernel level, as opposed to intercepting system calls

or message passing. To this end, Miretskiy et al. claim that their system demonstrates

less than 15% overhead above a standard non-Avfs based system. Finally, by integrating

anti-virus scanning into the file system, scanning is completely transparent to the user

and any infections can be easily quarantined such that no system process can access the

files. Their file system also supports a mode for built-in post-infection forensics. The key

drawback of the system is its reliance on ClamAV, which compares somewhat unfavorably


to commercial AV products [82]. While it is obvious that ClamAV was chosen because it

is open-source, it would have been interesting to see how Avfs would operate when tied

to a proprietary AV system.

A method for improving the speed of conventional malware scanning is provided

by Cha et al. in the form of SplitScreen [41]. Both the number of signatures and the

number of files that need to be scanned are dramatically reduced by using SplitScreen’s

two-pass technique. This is done by performing a first-pass scan of all the files using

a feed-forward Bloom filter (FFBF) with a pre-calculated bit-vector hash of all known

malware samples. The defining feature of any Bloom filter is that it is very fast, it

may produce false positives, but it never produces false negatives. In this way, a system

can quickly be scanned for potential malware. If a possible candidate file is found (i.e.,

possibly infected with malware), the full anti-virus signature needed to positively identify

the malware can be downloaded from an external repository, and these candidate files

are rescanned using a conventional signature-based scanning approach. Because whole

files are not transferred, less user data is exposed and privacy concerns are somewhat

mitigated. Furthermore, Cha et al. claim their technique produces a doubling of scanning

speed and a halving of memory usage. They suggest that this inclines their technique to

being adopted on mobile devices. However, they do not address the worst-case runtime

of their solution, which appears to be worse than a standard AV scan. Furthermore, the

speed of SplitScreen is heavily dependent on cache-based optimization. The study results

show decreasing performance on CPUs with smaller L2 caches, yet the mobile devices

their system is targeted at are not currently endowed with very large caches.
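To make the first-pass idea concrete, the following Python sketch shows a minimal Bloom filter of the kind that underpins SplitScreen's pre-screening step. The bit-vector size, the derivation of bit positions by slicing a single MD5 digest, and the example signature are illustrative assumptions, not SplitScreen's actual parameters.

    import hashlib

    class BloomFilter(object):
        """Minimal Bloom filter: fast membership tests that may yield
        false positives but never false negatives."""

        def __init__(self, num_bits=2 ** 20, num_hashes=4):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8)

        def _positions(self, item):
            # Derive k bit positions by slicing one MD5 digest; a real
            # deployment would use independent hash functions.
            digest = hashlib.md5(item).digest()
            for i in range(self.num_hashes):
                chunk = digest[i * 4:(i + 1) * 4]
                yield int.from_bytes(chunk, 'big') % self.num_bits

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, item):
            # False means definitely absent; True means "possibly present",
            # so the candidate must be confirmed with a full signature scan.
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    # Pre-screening: populate with known-malware patterns, then test files.
    bf = BloomFilter()
    bf.add(b'example-malware-signature')
    assert bf.might_contain(b'example-malware-signature')

Only files for which might_contain returns True would proceed to the slower, exact second pass described above.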


2.6 Summary

It is clear that malware is a problem for users of desktop computers. However, this

problem has now spilled over into the potentially lucrative domain of smartphones. Given

that conventional anti-malware strategies have resulted in unwieldy and low-performance

software solutions, significant effort has been dedicated to non-conventional approaches

to malware detection. The notion of using cloud resources to aid in malware detection is

a relatively new concept. However, in reality many of these efforts make use of non-local

resources which do not necessarily conform to the computing-as-a-service model which

defines true cloud computing.

The resource limitations of mobile devices have necessitated new research efforts into

mobile anti-malware strategies, some of which are device-based, and some of which

rely on external computing resources. However, to date, no strategy for mobile malware

detection has been shown to be clearly superior. This is why Thin AV is a compelling

avenue of research. If it is possible to provide a reasonable level of protection for desktop

computers and mobile devices using pre-existing shared resources for malware detection,

it may be possible to significantly reduce the computing burden on these devices.


Chapter 3

System Architecture

This chapter presents an overview of the Thin AV anti-malware system, both on the

desktop and mobile platforms. The system was designed to have a modular architec-

ture, with separate modules for the individual scanning services. Section 3.1 will provide

background on the different scanning services that Thin AV uses. Section 3.2 details the desktop-based implementation of Thin AV. The mobile implemen-

tation of Thin AV is discussed in Section 3.3.

3.1 Scanning Services

The goal of Thin AV was to develop an anti-malware solution that offloads the chore of

scanning to third-party malware scanning services. At present, the system can scan files

with one of three different scanning services that are freely available online: Kaspersky,

VirusChief, and VirusTotal. These services all behave similarly insofar as a user can

upload any type of file (executable, data, etc.) through the website of the service and

receive a report as to any malware that might be contained in that file. Unfortunately,

these scanning services are based on proprietary anti-malware engines, and as such, the

exact details of the engines underpinning these services are very closely held trade secrets.

Therefore, the exact capabilities and limitations of these services with respect to threat

detection are not publicly known. Consumer testing of these anti-virus products can

provide some clue as to their capabilities [93], though in order to be in compliance with

the end-user license agreements (in the United States), these consumer tests must be

limited to black-box testing methodologies [24].


3.1.1 Kaspersky

Kaspersky Lab [13] offers a free service for scanning individual files [12] that are 1 MB

or smaller in size. The service scans files uploaded to the website using Kaspersky’s

proprietary anti-malware engine and returns a diagnosis to the user’s browser.

3.1.2 VirusChief

VirusChief [21] is a multi-engine based malware scanning service with a 10 MB file size

limit. Similar to Kaspersky, users upload files through their browser. Once received,

the file is scanned using up to 13 different scanning engines. Results from each scanning

engine are returned to the user’s browser via AJAX.

3.1.3 VirusTotal

VirusTotal [22] is a multi-engine based scanning service offered by Hispasec, which can

also scan files up to 20 MB in size.¹

VirusTotal scans files with 42 different scanning engines, including many of the same

engines found in VirusChief as well as the Kaspersky engine. VirusTotal is also unique in

that in addition to their website, they offer a semi-public API for accessing their service.

Individuals can apply for a key which can be used to call the API for the purpose of

uploading files and retrieving reports. VirusTotal also attempts to increase performance

by storing cryptographic hashes of all uploaded files. That way, if a file has been scanned

previously, the report generated from the previous scan can be returned as opposed to

completely re-scanning the file.

¹The file size limit for VirusTotal was raised to 32 MB some time during early 2012. However, all Thin AV evaluation relevant to VirusTotal was completed when the 20 MB limit was still in place.


3.1.4 Other Services

Two additional scanning services were examined for inclusion in Thin AV, but were

deemed to be inappropriate for Thin AV modules. VirScan [20] is a multi-engine web-

based file analyzer. However, the scanning process used by VirScan checks the uploaded

file sequentially, with each of 37 different scanning engines. Although VirScan is designed

in a way that could be used to power a Thin AV module, the sequential scanning results

in extremely poor performance, and for that reason it was not selected for inclusion in Thin

AV.

FileAdvisor [8] by Bit9 was another service considered as a candidate for Thin AV.

However, FileAdvisor is not actually a real-time scanning service. Rather, it is a large

database of malware scan results. Users can upload a file, or simply a cryptographic

hash of a target file, and FileAdvisor will return information regarding previous malware

scans of that file. However, this is contingent upon a matching hash being found in

the FileAdvisor database. This means that novel files will always fail to return results.

Furthermore, although FileAdvisor boasts a database of more than 7 billion files [8], the

database is heavily biased towards Windows files. Finally, this service offering has a limit

of 15 lookups per day, making FileAdvisor a very poor candidate for inclusion in Thin

AV.

3.1.5 Terms of Service

Of the three scanning services included in Thin AV, only VirusTotal provides a terms of

service agreement for their offering. The pertinent language in the agreement as it applies

to Thin AV is that users must “abstain from any activity that could damage, overload,

harm or impede the normal functioning of Hispasec websites” [23]. To a large extent, this

requirement is automatically enforced given that VirusTotal’s API only allows a limited

number of requests from each user over a given time period.


Given that Thin AV was developed as an experimental proof-of-concept, the workloads

produced were comparatively small. Even so, attempts were made, where possible, to

minimize traffic to these scanning services during testing and performance evaluation by

making use of a simulator (discussed in Section 4.3.1) as opposed to actually uploading

files for scanning. Given the configuration of Thin AV on the desktop, any sort of large

scale deployment would be in violation of VirusTotal’s terms of service, and would surely

draw the ire of Kaspersky Lab and VirusChief. However, the configuration of the mobile

version of Thin AV is considerably less taxing on the third-party services; as such, it

might be possible to run a production-scale deployment of the mobile version of Thin

AV without violating VirusTotal’s terms of service, or overly taxing Kaspersky Lab and

VirusChief.

3.2 Desktop Thin AV System

The desktop-based implementation of Thin AV was written in Python 2.7 and deployed

on Ubuntu Linux 11.04 running a modified Linux kernel. The kernel which was originally

packaged with this version of Ubuntu (2.6.39.0) was replaced with version 2.6.36.4 so as

to be compatible with the stackable file system discussed in Section 3.2.1. The hardware

platform for development and testing of Thin AV was a laptop with an Intel Core i5

M460 CPU (2.53GHz) and 4 GB of RAM. The native operating systems were Windows

7 Professional SP1 64-bit, and Ubuntu 11.10. The desktop version of Thin AV was

developed and tested on Ubuntu which was deployed in a virtual machine running in

VMWare Player in Windows 7.

The anti-malware system has three major components (Figure 3.1): the DazukoFS

[7] stacked file system, the file system access controller, and the Thin AV anti-malware

scanner. The former, DazukoFS, was originally developed by Avira Operations GmbH


& Co. KG [5], before being made freely available under a BSD license. The latter two

components are original developments. In addition to these components, a testing pro-

gram (standalone runner) was developed which provided access to Thin AV independent

of DazukoFS and the file system access controller. Each of these components will now

be described in detail.

Figure 3.1: System architecture for Thin AV


3.2.1 DazukoFS

DazukoFS is a stackable file system that allows user space programs to perform file access

control at the kernel level. For the purposes of this work, version 3.1.4 of DazukoFS was

used. This version of DazukoFS installs as a kernel module for version 2.6.36 of the

Linux kernel. In addition to the kernel module, DazukoFS also provides an API library

for interacting with the mounted file system.

DazukoFS can be mounted on top of any directory in the Linux file system with the

exception of the root directory. Once mounted, all file access requests that take place

in that directory (or any subdirectories) will be intercepted by DazukoFS. A user space

program (using the DazukoFS API library) is then responsible for permitting or denying

access to the requested file. In the case of Thin AV, if a file is deemed to contain malware,

the file access operation is terminated and the user is informed that the file access was

not permitted.

3.2.2 File System Access Controller

The file system access controller is a user space program that runs in the background

and is responsible for allowing or denying file access requests. The controller for Thin AV

was written in Python, with the CTypes module providing access to the DazukoFS API

library. The controller creates an instance of the thinAv class, which in turn is responsible

for telling the controller whether or not the file being accessed contains malware.
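The core of the controller can be sketched as the following loop. This is a simplified illustration: the get_access and return_access method names stand in for the actual DazukoFS API functions reached through CTypes, and the set of allowed statuses would be derived from the active security policy described below.

    def run_controller(group, thin_av, allowed_statuses):
        """Main loop of the file system access controller (sketch only).
        `group` stands in for the CTypes bindings to the DazukoFS API
        library; its method names here are hypothetical, not the real
        API's."""
        while True:
            event = group.get_access()    # blocks until a file is accessed
            status = thin_av.scan(event.filename, event.fd)
            event.allow = status in allowed_statuses
            group.return_access(event)    # deliver the allow/deny verdict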

There are currently five different infection statuses. All three services can specify that

a file is either clean or infected, meaning that the service returned a conclusive result.

However, the API powering the VirusTotal module demanded the inclusion of three

additional statuses: waiting, postponed, and questionable. “Waiting” indicates that a

file has been uploaded, but that the service must be checked later for the completed

report. “Postponed” means that an upload was denied because more than 20 files have

been uploaded via the API in the last 5 minutes. Finally, because VirusTotal scans with

more than 40 anti-malware engines, there is an increased risk of a file falsely being labeled

as infected. Therefore, a threshold value is used to alleviate some of the false positives.

If the number of scanning engines indicating a file is infected is less than four, but more

than zero, the file will be labeled as “questionable”. For the desktop implementation of

Thin AV, the threshold value is hard-coded.

Once the file infection status has been returned to the controller, a determination must

be made as to whether or not to allow access to the file. This determination is based on

the security policy of the access controller, which is set at run-time via command line

argument. There are three policies implemented in Thin AV: permissive, restrictive, and

passive. The permissive policy will allow access to any file that is not explicitly infected,

or questionable (possibly infected). The restrictive policy will only allow access to files

that are explicitly labeled as clean. Finally, the passive policy never prevents access to a

file; rather, it simply alerts the user to the presence of any malware via the terminal.

Table 3.1 outlines the various security policies. It should be noted that these policies apply only to the desktop version of Thin AV, and not to the mobile version. The reasons for this will be discussed in Section 3.3.3.

Scan Result          Permissive   Restrictive   Passive
Clean                    √             √            √
Infected                                            √
Questionable                                        √
Waiting                  √                          √
Error / Postponed        √                          √

Table 3.1: Thin AV security policy matrix. A check mark indicates whether file access is allowed under each security policy for a given scan result.
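The policy matrix above reduces to a small lookup, sketched below in Python; the status constants and policy names are illustrative stand-ins for the controller's actual identifiers.

    # Sketch of the security policy decision from Table 3.1; names are
    # illustrative, not the controller's own.
    CLEAN, INFECTED, QUESTIONABLE, WAITING, ERROR = range(5)

    ALLOWED = {
        'permissive': {CLEAN, WAITING, ERROR},   # anything not (possibly) infected
        'restrictive': {CLEAN},                  # only explicitly clean files
        'passive': {CLEAN, INFECTED, QUESTIONABLE, WAITING, ERROR},
    }

    def access_permitted(policy, scan_status):
        if policy == 'passive' and scan_status in (INFECTED, QUESTIONABLE):
            print('Warning: possible malware detected; access still allowed')
        return scan_status in ALLOWED[policy]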


3.2.3 Standalone Runner

A small user space Python script was developed which is capable of instantiating a thinAv

class and scanning a file, independent of the file system access controller or DazukoFS.

This was developed for the purpose of debugging and performance evaluation (Section

4.1) of the Thin AV system, as well as on-demand scanning of individual files.

3.2.4 Thin AV

The Thin AV Python program is the core of the anti-malware scanning system (Figure

3.2). The thinAv class possesses a scan function, which takes a filename, and optionally,

a file descriptor and the name of a specific scanning service, and returns a status code

indicating whether or not the named file is free of malware. This status code is the result

of either an online scan of the file, or a successful search through the local Thin AV cache.

The Thin AV local cache is simply a flat file of previous scan results, which is read

after a scanning module is instantiated. Prior to uploading a file to one of the scanning

services, the local cache is first checked. This way if a file has already been scanned, a

quick local lookup is all that is necessary to allow access to the file. The cache contains

an MD5 hash of the file being analyzed, the full path to the file, the number of times

Thin AV has been asked to analyze the file, the last time such an access has occurred, the

infection status of the file, a note for additional scan details, and the module that was

used when the file was analyzed. Because files are identified by their MD5 hash, Thin AV avoids

having to track and compare file modification times to determine if the file (in its current

form) has already been scanned. MD5 was chosen, in spite of its flaws [104], as the hash

function for file identification in Thin AV. This is primarily due to the speed of MD5

when compared with other hashing functions available in the Python hashing module

(Table 3.2). However, given that VirusTotal supports a variety of hashing functions,

changing Thin AV to use an alternative hashing function would be trivial.


                            MD5           SHA1          SHA256        SHA512
Time (seconds)              2.37 × 10^-3  9.12 × 10^-3  2.40 × 10^-2  5.04 × 10^-2
Overhead compared to MD5    N/A           384.66 %      1013.45 %     2125.71 %

Table 3.2: Speed comparison of the hashing functions available in the Python hashlib module. Speeds are based on the average time required to hash a 1 MB file of pseudo-randomly generated data with each hashing function. The average time is the result of 10 trials on the hardware described in Section 3.2.
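The measurements in Table 3.2 can be reproduced with a routine along these lines (a sketch: the trial count matches the table, while the use of os.urandom for the pseudo-random data is an assumption).

    import hashlib
    import os
    import time

    def average_hash_time(algorithm, data, trials=10):
        """Average wall-clock time to hash `data` with the named algorithm."""
        total = 0.0
        for _ in range(trials):
            start = time.time()
            hashlib.new(algorithm, data).hexdigest()
            total += time.time() - start
        return total / trials

    data = os.urandom(1024 * 1024)  # 1 MB of pseudo-random data
    for name in ('md5', 'sha1', 'sha256', 'sha512'):
        print(name, average_hash_time(name, data))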

In order to determine whether or not a file contains malware, thinAv must instantiate

a scanning module. At present, the desktop version of Thin AV has four scanning mod-

ules, one for each of the three scanning services described in Section 3.1, and a simulator

module which is used for performance evaluation purposes (Section 4.3.1), and does not

actually scan files for malware. All of the scanning modules inherit core functionality

from the thinAvParent class, which is responsible for providing functions for interacting

with the local cache, and uploading multipart/form-data via HTTP POST requests.

At present, the choice as to which module will be used when scanning is based on

the average amount of time each module takes to scan a file, with the fastest service

(Kaspersky) being selected first, followed by VirusChief and finally VirusTotal. If any

scanning module returns an error from an attempted online scan, then the next module in

the priority sequence is selected. If all three scanning modules fail, a general error code is

returned to the calling program (be it the file system access controller or the standalone

runner). The performance measurements which formed the basis of this decision are

discussed in Section 4.1.
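This selection strategy amounts to a simple fallback loop, sketched below; the ERROR sentinel and method names are illustrative rather than the system's actual identifiers.

    ERROR = -1  # assumed sentinel for a failed online scan

    def scan_with_fallback(filename, modules):
        """`modules` is ordered fastest-first: Kaspersky, then VirusChief,
        then VirusTotal. Returns the first conclusive status, or ERROR if
        all three services fail."""
        for module in modules:
            status = module.scan(filename)
            if status != ERROR:
                return status
        return ERROR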

3.2.5 Scanning Modules

Thin AV accesses the Kaspersky scanning service by simply constructing an HTTP POST

request with the appropriate fields, and searching the body of the HTTP response for

text strings which indicate whether Kaspersky has deemed the file to be clean or infected.
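The exchange can be sketched as follows using the third-party requests library; the endpoint URL, form field name, and response strings are placeholders rather than the service's actual interface.

    import requests

    SCAN_URL = 'https://example.com/kaspersky/scanfile'  # placeholder URL

    def kaspersky_scan(path):
        with open(path, 'rb') as f:
            response = requests.post(SCAN_URL, files={'file': f})
        body = response.text.lower()
        # The strings tested for here are illustrative placeholders.
        if 'infected' in body:
            return 'infected'
        if 'no malicious software' in body:
            return 'clean'
        return 'error'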


Using VirusChief for scanning in Thin AV is slightly more complicated than the

method used with Kaspersky. First, VirusChief checks for a cookie prior to scanning, which means that Thin AV must initiate an HTTP GET request to the service in order to

procure a valid session ID. The file of interest is then uploaded via HTTP POST, along with

the session ID, and a report ID is returned. Because results are returned asynchronously

via AJAX, Thin AV polls the service once every second to check for scan results. Once

at least four scan results have been returned, the results are parsed, and the scan result

is returned.

Because VirusTotal provides an API for interacting with their scanning service, the

corresponding Thin AV module is somewhat different from the other scanning modules.

VirusTotal caches scan results, so prior to uploading a file, the hash of that file can be

checked against the VirusTotal database. If a match is found, the full report for that

file will be returned. If a match is not found, then the file can be uploaded, in full,

via an HTTP POST request, returning a report ID which can be used to look up the

scan report once it has been completed. Unfortunately, scan requests from the API are

given the lowest priority by VirusTotal, making the response time for a file scan highly

unpredictable. Although it would be possible to simply have the Thin AV module wait,

and periodically poll VirusTotal for the report, this would result in impractically long

wait times. Therefore, if a report is not immediately available for a file, the module will

return a status code indicating it is waiting for a result. Finally, the VirusTotal API only

allows 20 uploads every 5 minutes. If Thin AV exceeds this maximum, the module will

return a status code indicating that VirusTotal is temporarily unavailable.
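The hash-first protocol used by this module follows the general shape sketched below. The endpoint URLs, parameter names, and response format are simplified placeholders, and the API key is assumed to have been issued in advance.

    import hashlib
    import requests

    API_KEY = 'YOUR-API-KEY'                          # issued on request
    REPORT_URL = 'https://example.com/vtapi/report'   # placeholder endpoints
    SCAN_URL = 'https://example.com/vtapi/scan'
    THRESHOLD = 4   # engines required before a file is labeled infected

    def virustotal_scan(path):
        with open(path, 'rb') as f:
            data = f.read()
        md5 = hashlib.md5(data).hexdigest()

        # Cheap path: ask for an existing report by hash alone.
        report = requests.get(REPORT_URL,
                              params={'apikey': API_KEY, 'resource': md5}).json()
        positives = report.get('positives')
        if positives is not None:
            if positives >= THRESHOLD:
                return 'infected'
            return 'questionable' if positives > 0 else 'clean'

        # Unknown file: upload it and report "waiting"; the caller polls
        # for the finished report later, since API scans have low priority.
        requests.post(SCAN_URL, params={'apikey': API_KEY},
                      files={'file': (path, data)})
        return 'waiting'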

3.2.6 System Circumvention

In the current implementation of Thin AV, there are a number of security holes that

would have to be addressed were a production scale system to be implemented. The two


Figure 3.2: UML Class Diagram for Thin AV.

most obvious avenues for attacking Thin AV are via a man-in-the-middle attack, and by

attacking the Thin AV local cache.

Of the three scanning services, only VirusTotal allows users to upload files via HTTPS.

This means that all traffic sent between Thin AV and both Kaspersky and VirusChief is

sent in the clear (unencrypted). It is possible that an attacker might be able to intercept

this traffic by taking advantage of weakly secured (WEP) or public Wi-Fi, or by using an

attack such as ARP cache poisoning. If successful, an attacker could modify the results


returned by the scanning services to indicate that the uploaded file is free of malware,

when this might not be the case. Unfortunately, the only solution to this attack is to

rely on communication via HTTPS, which is a decision that rests in the hands of the

scanning service providers.

Additionally, because the Thin AV local cache is implemented as an unencrypted text

file, it is a ripe target for any potential malware. If a piece of malware could edit the

local cache, then subverting Thin AV would be as trivial as flagging known malicious

files as clean. One possible solution to this problem would be to encrypt the Thin AV

local cache file when it is not being written or read. Any changes to the encrypted file

would then result in an inability to correctly read the cache file, which would indicate

the presence of malware on the system. Unfortunately, key management under such a

scenario would still be problematic.

One possible, though highly improbable, attack on Thin AV would be to take advan-

tage of the lack of collision resistance in the MD5 hashing algorithm used by Thin AV

to identify files. Given that it is possible to construct two distinct inputs which produce

the same MD5 hash [104], it is possible, though extremely unlikely, that an attacker

could construct a piece of malware that, when hashed, produces the same output as

a previously scanned benign file. Thin AV would then mistake this piece of malware

for a previously scanned file, and allow the execution of the file. As mentioned earlier,

changing the hash function used by Thin AV would be a trivial fix for this flaw.

A more problematic issue is that of file size. Because the largest file that can be

scanned by Thin AV is 20 MB (using the VirusTotal service), this means that any files

in excess of 20 MB will be ignored by Thin AV. As such, an attacker could simply write

a very large piece of malware in order to infect a system.

Finally, given that Thin AV is intended to operate only on files that a normal (non-root) user can edit (e.g., files in their home directory), any malware which makes use


of a privilege escalation exploit, such as a rootkit, could potentially circumvent Thin AV

in a variety of ways, from modifying the cache file (as mentioned above), to replacing the

Python interpreter with a corrupted executable, thereby subverting the very operation of Thin AV. However, given that this work is focused on the feasibility of a lightweight

cloud-based anti-malware system, defending against such attacks is beyond the scope of

this work.

3.3 Mobile Thin AV System

The mobile version of Thin AV was implemented on the Android platform. As mentioned

in Chapter 1, the decision to use Android was based both on the modifiability of the platform and on its widespread adoption.

Because of the application isolation created by the combination of unique user ID

and processes for each application, the key threat to the Android user comes not from

traditional vectors for malware such as drive-by-downloads and application exploits, but

rather from malicious apps that are unwittingly installed by a user [60]. In order to

combat this threat, Thin AV on Android is application-centric and not file-centric (as

on the desktop). Specifically, the mobile version of Thin AV is focused on non-system

applications, that is, applications which were installed on the device after it was released

by the manufacturer or carrier. This was done for two reasons: first, unless a user has

“root” access on a device, the uninstallation of system apps is not possible; second, it

seems unlikely that a manufacturer would intentionally install a malicious application on

their product.²

The implementation of Thin AV for Android is an extension of the desktop scanning

system. The top-level portion of Thin AV as outlined in Figure 3.1 was, with minimal

modification, re-tasked to act as a unified front-end and caching mechanism for a web-

²Recent events have shown this might not necessarily be the case [74].


based scanning service used by the Android implementation. This web-based scanning

service is then used in two different ways by the Android device: first, the “safe installer”

provides a way of verifying Android applications (APKs) during installation, and second,

the “killswitch” informs users when already installed applications have been found to be

malicious. Figure 3.3 shows the overall system architecture of Thin AV for Android. Each

of the key components of the mobile Thin AV system will be described in the following

subsections.

All Android development for Thin AV was done on the same hardware platform

described in Section 3.2 and deployed on a virtualized Android device running version

2.3.7 of Android, also referred to as Gingerbread. This version of Android was selected

for development because 2.3.7 was the most up-to-date version of Gingerbread before the

project was forked and Android 3.0 (Honeycomb) was developed specifically for tablets.

The platform changes in Honeycomb and Gingerbread were merged back into a unified

platform in version 4.0 (Ice Cream Sandwich) which, at the time of writing is the most

current version of Android. Ice Cream Sandwich was released in November of 2011, and

as such, deployment of this version is extremely limited, while Gingerbread constitutes

a large portion of the Android install base [92]. The availability of documentation,

examples, and the pervasiveness of Gingerbread, made it the ideal version for Android

development.

3.3.1 Reuse of Existing Thin AV System

The existing Thin AV system from the desktop implementation was modified to serve as

a unified scanning service for the mobile implementation of Thin AV. This was beneficial

because not only did it build upon the work which had already been completed for the

desktop version, but it also provided two key benefits to the system architecture. First, it

minimized the amount of code running on the Android device, and second, it put a layer


Figure 3.3: System architecture diagram for the mobile implementation of Thin AV.


of abstraction between the Android device and the third-party scanning services. This is

beneficial because any changes to the scanning services can then be handled by modifying

the Thin AV code running on the server, while the code running on the Android device

would not require an update.

A web-front end was created using Flask 0.8 [25], a web application micro-framework

for Python. The web application creates a simple HTML form capable of receiving both

HTTP GET and POST requests. A GET request is sent in one of two circumstances: first,

if the Android Safe Installer attempts to check the cryptographic hash of a package

prior to installation, the application will return a scan result if such a result exists (this

will be discussed in greater detail in Section 3.3.3). Second, if the Android killswitch

sends a system fingerprint, the web-application will return a list of cryptographic hashes

for applications which have been found to contain malware (this will be discussed in

greater detail in Section 3.3.4). A POST request is sent in the circumstance where the

cryptographic hash for an Android package was not found in the Thin AV cache, and the

whole package must be uploaded to Thin AV.
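A minimal sketch of this front end in Flask follows; the route name, parameter names, and the thin_av_lookup helper are illustrative assumptions rather than the deployed interface.

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def thin_av_lookup(md5):
        # Hypothetical stand-in for a lookup against the Thin AV cache;
        # would return 'clean', 'infected', or 'unknown'.
        return 'unknown'

    @app.route('/scan', methods=['GET', 'POST'])
    def scan():
        if request.method == 'GET':
            if 'md5' in request.args:
                # Safe installer: check a single package hash.
                return jsonify(status=thin_av_lookup(request.args['md5']))
            # Killswitch: the fingerprint is a list of package hashes;
            # return the subset known to be malicious.
            hashes = request.args.getlist('hash')
            return jsonify(malicious=[h for h in hashes
                                      if thin_av_lookup(h) == 'infected'])
        # POST: the hash was not cached, so the whole APK is uploaded and
        # handed off to the scanning modules (omitted in this sketch).
        apk = request.files['package']
        return jsonify(status='waiting')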

3.3.2 Android Specific Scanner

One of the major benefits of Thin AV is its modular and extensible architecture. In

order to further demonstrate this benefit, and to increase the functionality of Thin AV,

an Android specific scanning service, ComDroid, was added to Thin AV to complement

the existing third-party anti-virus scanners. However, because of the modular design of

Thin AV, theoretically, any type of analysis module could be used to evaluate the safety of

Android packages. An Android specific anti-virus scanning service, permissions analyzers

similar to Kirin [56], Stowaway [59], or a social reputation analyzer such as was described

in [34], are all compelling possibilities. In fact, ComDroid itself is not an anti-virus engine,

but rather it is a static code analysis tool which can identify potential vulnerabilities in


Android applications. The tool was developed by Chin et al. and described in detail in

[45].

ComDroid is publicly available as a web-based service hosted at the University of California at Berkeley. Because ComDroid has a web interface, building a Thin AV scanning

module to take advantage of ComDroid was relatively straightforward. Beyond develop-

ing a new scanning module called thinComDroid, which also inherits from thinAvParent

(see Figure 3.2), the only internal change that was necessary was the addition of a new

return status code. Because ComDroid identifies potentially exploitable apps and not

malicious apps, the ComDroid module can identify an Android package as being “at risk”

as opposed to being “infected”.

In the current deployment of Thin AV, a package will be prevented from being installed if ComDroid identifies vulnerable communication channels within the package. How-

ever, depending on the sensitivity of ComDroid, and the prevalence of potentially vulnera-

ble apps, a more permissive strategy might be warranted. If one of the existing scanning

modules identifies a malicious app, this status will supersede any status returned by

ComDroid. It should also be noted that an Android package is scanned with ComDroid

after scanning with the appropriate anti-virus scanner. The performance drawbacks of

this configuration are obvious, and a production scale deployment of Thin AV should

incorporate the ability to perform simultaneous scans in parallel.

3.3.3 Safe Installer

In order to protect a device from malicious applications, a mechanism must exist for preventing their installation, and the Safe Installer is such a mechanism.

All applications not installed via Google’s Market are installed using Android’s Pack-

age Installer system, and so this was the target for injecting the application check for Thin


AV. The technique used for hooking into the Package Installer was adapted from [56].

Unfortunately, Google’s Market does not use the Package Installer for installing apps.

This is because Google’s Market application is signed with the same certificate that is

used to sign the operating system; this gives the Market access to the highest level of

system permissions, a level that no third-party application is granted. Therefore,

the Google Market has the ability to directly install and uninstall applications, bypassing

the Package Installer. As mentioned in Section 1.1.3.2, Google has staked the reputation

of their brand to the success of the Android Market, and thus have a vested interest in

keeping it free of malware. Other application markets may not have such a prominent

brand name to maintain.

In order to modify the Package Installer, the Android operating system source code

was modified. The Package Installer is part of Android’s Java middle-ware which in-

cludes a broad selection of programs and libraries for use by application developers. The

PackageInstallerActivity class was modified to make use of ThinAvService, a new

service class which was added to the source code for the purpose of communicating with

the Thin AV web application described in the previous section. The service provides

a single public function checkAPK, accessed via an interface defined using the Android

Interface Definition Language (AIDL). When a package is to be installed by the Package

Installer, the APK must already reside on the file system, having been downloaded by a

third-party market or transferred via some other method. The checkAPK function takes

the file system path of the APK being installed, reads the file and creates an MD5 hash of

the bytes of the APK. This hash is then sent to the Thin AV web application, which re-

turns a scan report, if such a report exists. If no scan report exists, the APK is uploaded

to Thin AV where it is passed off to one of the third-party scanning services. When a

scan result is returned, that result is passed back to ThinAvService and checkAPK then

returns a Boolean as to whether or not the installation should be allowed to proceed. The


PackageInstallerActivity then allows or prevents the installation of the application,

displaying the appropriate information dialogs to the user, where necessary.
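For illustration, the logic of checkAPK is sketched below in Python; the actual implementation is a Java service inside the Android middleware, and the URL and response fields here are placeholders.

    import hashlib
    import requests

    THIN_AV_URL = 'https://example.com/scan'  # placeholder for the web app

    def check_apk(apk_path):
        """Python sketch of the Java checkAPK function: returns True if
        installation should be allowed to proceed."""
        with open(apk_path, 'rb') as f:
            apk_bytes = f.read()
        md5 = hashlib.md5(apk_bytes).hexdigest()

        # Cheap path first: hash lookup against the Thin AV cache.
        result = requests.get(THIN_AV_URL, params={'md5': md5}).json()
        if result['status'] == 'unknown':
            # No cached report: upload the whole APK for scanning.
            result = requests.post(THIN_AV_URL,
                                   files={'package': apk_bytes}).json()
        return result['status'] == 'clean'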

In some sense the safe installer acts similarly to the file system access controller in the

desktop version of Thin AV. Despite this, there is no concept of multiple security policies

in the safe installer. All package installs are subject to scanning, and package installation

will be terminated if Thin AV detects malware. Although different security policies could

be added to the mobile system, the faster performance of the mobile system compared

to the desktop system allowed for a single strict security policy without compromising

system performance (Section 5.5).

At this point, it should be noted that given that the Package Installer is part of the

Android source, the system just described cannot simply be installed on any Android

capable device. Replacing the operating system on a particular Android device requires

that the device be unlocked or “rooted”, as most devices are locked by the manufacturer

or service provider. Additionally, installing a new version of Android on a device voids

the device warranty and can have compatibility issues [16].

3.3.4 Killswitch

The safe installer described in the previous section can prevent the installation of appli-

cations known to be malicious. However, two other scenarios must also be addressed: one

where a malicious application has been installed on a device prior to the installation of

Thin AV, and one where an application was installed on a device but was not flagged as

malicious at the time of installation. A killswitch was developed that addresses these two

scenarios. It operates independently of any specific application installation mechanism,

making it ideal for the multi-market ecosystem available on Android devices.

Four different approaches were considered as potential alternatives for how to imple-

ment the killswitch. The first was to check for revocations at application launch time, by


modifying the app being launched. However, this was not realistic, because even though

tools exist for decoding Android packages [1], the lack of a main function in Android

applications would require that a hook be inserted into every single activity which could

be called by an intent. Furthermore, any code modifications would also invalidate the

certificate that is packaged with the application. The second alternative was to hook

into the application launcher. This would allow for the ability to interrupt launches

from the Android home screen, but not launches from the system application list, nor

would it catch launches caused by intents generated from other apps. The third option

was to modify the actual program execution code which resides at a much lower level

in the Android source code. This approach, while technically challenging, was feasible.

However, from a software architecture perspective, it would very likely create un-

desirable cross-cutting concerns within the Android source code. The final option, which

was ultimately selected, was to develop a scheduled service which periodically checks for

revocations.

The killswitch was developed as a standard Android application capable of communi-

cating with the Thin AV web application, similar to the safe installer. The killswitch has

three different functions available to the user. It can upload all applications to Thin AV

for analysis (if said applications are not in the Thin AV cache), it can manually check if

any non-system applications on the device have been flagged as malicious, and finally, it

can set up the killswitch to regularly check the device for malicious applications using a

scheduled event (Figure 3.4). In the current implementation the killswitch is scheduled

to run every fifteen minutes.

The feature to upload missing packages to Thin AV was left as a manual activity for the user. This decision was made because it is possible that many

or even most of the packages on a device may be missing from Thin AV. The upload

and scanning of these apps, while a one-time activity, would still consume a great deal


of time and bandwidth. Therefore, by leaving the decision to the user, they can opt to

perform the upload when the device is connected to a WiFi network (as opposed to being

charged for using their cellular data plan), or at another time when the upload process

would not be an inconvenience, such as when the device is charging.

When the killswitch is checking for malicious apps, it uses the PackageManager class

to locate all public Android packages installed on the device. These packages are stored

on the device and are read-only, making them ideal for analysis. The meta-data of each

of the packages is read, and if the package has not been previously seen by the killswitch,

the bytes of the package are hashed and a collection of all package hashes is sent to the

Thin AV web application via HTTP GET. Once a package has been hashed by the killswitch, the hash is stored in a file which is only accessible to the killswitch. This

hash can then be retrieved much more quickly than recomputing the hash every time

the device is fingerprinted. If any of the hashes sent to the Thin AV web application are

found to be from a malicious app, the hashes corresponding to the malicious applications

are returned to the killswitch. The user is then notified of the issue, and presented a

list of applications suspected to be malicious. The user can then choose to initiate the

removal of those applications.

It might be preferable to have a killswitch which was capable of removing or in

some way quarantining a malicious application, without the input or consent of a user.

However, this would have required significant changes to the Android PackageManager,

as well as possibly other lower level components. This would have also created a potential

security vulnerability insofar as it would have created a mechanism by which an ordinary

application could uninstall other applications. This could result in a scenario where a

malicious application could uninstall the anti-virus or security apps on a device.


Figure 3.4: User interfaces for the Android killswitch: (a) main screen, (b) prompt to upload missing packages, (c) notification of malware, (d) malicious application removal screen.


3.3.5 System Circumvention

The key drawback of Thin AV as it is designed and deployed for Android is the fact that

install-time checking of applications can only be achieved by rooting a device and in-

stalling a custom operating system. As a prototype this technique is adequate. However,

this design is impractical for wide-scale deployment. A preferable scenario would be one

in which the main source code trunk of Android was modified to create a generic hook in

the PackageInstallerActivity class which would give ordinary apps the ability to allow

or deny application installations. Unfortunately such a hook in the PackageInstaller

could very easily be abused by malicious apps looking to prohibit the installation of le-

gitimate applications. A potential solution to this issue would be for Google to allow

applications to use the hook only if the developer of the application is trusted or in some

way certified by Google.

Finally, the mobile prototype of Thin AV is vulnerable to circumvention in much

the same way as the desktop version, most notably, the lack of encryption on HTTP

communication, and the possibility of forged MD5 hashes.

One major improvement offered by the mobile version of Thin AV is a reduction in privacy concerns, due to the fact that only Android applications, and not personal files,

are being uploaded to Thin AV.


Chapter 4

System Evaluation - Desktop Thin AV

The evaluation of Thin AV was approached differently for the desktop-based version of Thin AV (Chapter 4) and the mobile-based implementation of the

system (Chapter 5). The evaluation of the desktop version of Thin AV focuses on the

system speed, and not detection rates. This is because the detection performance of Thin

AV is reliant upon third-party proprietary software systems, and the detection perfor-

mance of these systems is regularly evaluated by consumer product testing groups [93].

Additionally, [82] provides a detailed analysis of the detection capabilities of many con-

sumer anti-virus products. Finally, because Thin AV relies on remote scanning services,

the performance of the system would be a much larger barrier to adoption than the de-

tection performance which is generally high for all consumer-grade anti-virus products

[106].

The evaluation of the desktop version of Thin AV was performed in four phases. The

first phase in the evaluation was to assess the performance of the individual scanning

services. The goal of this test was to determine the relationship, if any, between the

size of files being uploaded, and the time required to receive a response from the various

scanning services. This phase will be discussed in Section 4.1.

The second phase of testing involved determining the actual overhead caused by Thin

AV. A series of workload scripts were used to generate file system activity while various

parts of Thin AV were active. This way the overhead incurred by each element of Thin

AV could be determined. This phase is elaborated upon in Section 4.2.

The third phase of testing involved using the timing results from the first phase

to produce a simulator which would predict the scanning time for a file of arbitrary


size for each scanning service. The formulæ powering this simulator were then used to

compute the predicted overhead of using Thin AV on a system while running the same

workload script from the previous phase. By comparing the predicted overheads to the

actual overheads measured in phase two, the formulæ for the scanning services could be

iteratively refined until they predicted the actual overhead of Thin AV with a very high

degree of accuracy. This phase will be detailed in Section 4.3.

Finally, with the response time formulæ refined, the simulator was improved which

made it possible to simulate file system access on an large scale and determine the

overhead of Thin AV under different file system access patterns. Simulation was chosen

as the method for large scale testing because it would allow for testing a variety of file

system access patterns, at very large scales, relatively quickly, and it would not draw the

wrath of the various scanning service providers. This phase will be detailed in Section

4.4.

For each of the phases discussed below, the precise testing protocol will be described,

followed by the results produced from the test, concluding with a brief discussion of the

results as they pertain to the testing phase in question.

4.1 Scanning Service Performance

Each of the three scanning services is hosted by a different organization, with different

hardware resources, and each service receives different loads. Therefore, it was first

critical to determine the response times of the different services based on the size of files

being uploaded.

4.1.1 Testing Protocol

In order to test the performance of the different scanning services, a small testing program

was developed which would scan a series of files using a specific scanning service, and


record the time necessary to complete the scanning operation. Consequently, these tests

did not examine any latency that might be introduced by DazukoFS (Section 3.2.1) or

the file system access controller (FSAC, Section 3.2.2). An unavoidable drawback of

this black-box approach is that it was not possible to determine what portion of the response time was due to file uploading and what portion was attributable to file

scanning.

Each execution of the testing program scanned 12 different files of the following sizes:

0 KB, 1 KB, 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB and

1023 KB. The Kaspersky scanning service will not scan files 1024 KB or larger, hence

1023 KB was chosen as the upper limit on file size. Additionally, files were skewed to the

small end of the size spectrum because results from [96], [90] and [28] suggest that small

files comprise the bulk of accesses during typical file system operation. Finally, the 0 KB

size file was included in the test because it shows the best case response time for each

service.

The test files were generated by a script which produces files of a specified size filled

with pseudo-random bits. The assumption was that a file generated in such a way would

have an extremely low probability of being flagged as malware by one of the scanning

services. Test files are uploaded in a pseudo-random order every time the test program

is run. This was done to overcome any penalty that might be incurred against the first

file being uploaded due to DNS lookups.
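The generator can be sketched as follows; the file naming scheme and the use of os.urandom for the pseudo-random bits are assumptions.

    import os

    SIZES_KB = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1023]

    def make_test_files(directory):
        """Write one file of pseudo-random bytes for each test size."""
        for size in SIZES_KB:
            path = os.path.join(directory, 'test_%04dkb.bin' % size)
            with open(path, 'wb') as f:
                f.write(os.urandom(size * 1024))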

The testing program was run up to 8 times a day for each of the three scanning

services (for a maximum of 24 runs per day). Testing took place on 8 different days:

28/08/11 to 2/9/11, and 8/9/11 to 10/9/11. Testing was done on several days in an

attempt to give a fair representation of average service performance, in the event that

one or more services were performing particularly poorly on any given day. The tests

for each scanner took place between the hours of 9:00AM and 5:30PM MDT, and were


spaced as close to one hour apart as possible, with Kaspersky tested first, followed by

VirusChief, and finally VirusTotal. Although it would have been preferable to have a

completely automated testing script which ran continuously for several days, the prospect

of local network outages and remote service failures meant that the tests had to be run

under human supervision. Testing was performed on the hardware platform described

in Section 3.2. The laptop was connected to the internet via the University of Calgary’s

AirUC 802.11 wireless network.

Unfortunately, due to the low priority placed on scan requests to VirusTotal, the

response times from this service were frequently excessive. As such, testing of all twelve

files could often not be completed in the (approximately) 55 minutes between testing

sessions. However, given that the test files were uploaded in a random order, it was

possible to collect some data for all of the different file sizes. Additionally, because

VirusTotal limits users to 20 API calls in any given 5 minute window, the testing program

had to be modified to periodically poll VirusTotal for results. After uploading a file, the

test program would sleep, first for 15, then 30, and finally 60 seconds, each time polling

VirusTotal for a result after sleeping. The test program would then continue to sleep for

60 seconds between polling attempts until a result was returned. It is therefore the case

that response time results for VirusTotal have up to a 60 second margin of error. Finally,

it should be noted that during the period of testing, there were occasional failures of the

VirusTotal service. These outages were always brief, each lasting less than 5 minutes.

In the event of a failure, the test was stopped and restarted once the service became

available.
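The polling schedule just described amounts to the following loop; get_report is a hypothetical stand-in for the VirusTotal report lookup.

    import time

    def poll_for_result(get_report, report_id):
        """Sleep 15, then 30, then repeated 60 second intervals, polling
        after each sleep; `get_report` returns None while the scan is
        still queued at VirusTotal."""
        for delay in [15, 30]:
            time.sleep(delay)
            result = get_report(report_id)
            if result is not None:
                return result
        while True:
            time.sleep(60)
            result = get_report(report_id)
            if result is not None:
                return result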

Finally, as Section 4.1.2 will show, the response time for VirusTotal was extremely

high, and not strongly correlated with the file size of the upload. In order to at least gain

an understanding of the file-size-dependent portion of the VirusTotal scanning process, a

test was run to determine the time required to upload a file of a given size to VirusTotal,


and receive a response, without waiting for a scan result. For this experiment, eight files at each of the file sizes listed earlier were uploaded to VirusTotal, and the

time required to receive a “waiting” response was measured.

4.1.2 Results

For each of the three scanning services, several hundred response time measurements were

recorded. Tables 4.1, 4.2, and 4.3 detail the key statistical measurements for each of the

three scanning services. Table 4.4 contains the measurements for the upload-only portion

of the VirusTotal scanning service. A cursory review of the data showed a handful of

extreme outliers for each service. As no standard technique exists for identifying outliers

[103], standard deviation was chosen as the means by which outliers were identified. As

such, any measurement beyond two standard deviations of the mean was classified as an

outlier. This threshold was chosen because it eliminated the most extreme results, while

retaining the vast majority of the data.
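As a sketch, the two-standard-deviation filter amounts to the following (in Python; the text does not state how the filtering was actually implemented):

```python
import statistics

def drop_outliers(samples, k=2.0):
    """Discard measurements more than k standard deviations from the mean."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return [x for x in samples if abs(x - mu) <= k * sigma]
```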

A comparison of the measurements from the three services shows a clear difference of

nearly an order-of-magnitude between the performance of VirusTotal and the other two

scanning services. The average response times from Kaspersky and VirusChief range from

1.54 – 14.49 seconds and 6.82 – 28.70 seconds respectively, while the response times from

VirusTotal range from 1.21 – 229.28 seconds (though the latter range becomes 148.92

– 229.28 seconds, when only non-zero file sizes are considered). The upload portion of

VirusTotal shows response times similar to Kaspersky, ranging from 1.94 – 11.74 seconds.

With the outliers removed, the response time data was plotted in an attempt to

determine what, if any, relationship exists between file size and service response time for

the three scanning services. Figures 4.1, 4.2, and 4.3 show the upload file size plotted

versus the response time for each of the three scanning services, and Figure 4.4 graphs the


Kaspersky
                     With Outliers                       Without Outliers
Upload size     n    µ (sec.)  x (sec.)  σ (sec.)    n    µ (sec.)  x (sec.)  σ (sec.)
0 KB           64      1.69      1.48      0.81     61      1.54      1.45      0.31
1 KB           69      1.96      1.48      2.42     67      1.61      1.48      0.43
2 KB           69      2.09      1.76      1.90     66      1.75      1.75      0.21
4 KB           70      2.00      1.73      0.90     68      1.87      1.71      0.47
8 KB           69      2.89      2.00      3.48     67      2.33      1.99      1.12
16 KB          70      2.49      2.01      2.45     69      2.22      2.00      0.76
32 KB          69      2.90      2.30      2.50     68      2.63      2.29      1.02
64 KB          70      3.07      2.62      1.13     63      2.74      2.58      0.54
128 KB         69      3.67      3.37      1.41     66      3.41      3.34      0.31
256 KB         70      5.58      4.70      3.50     68      5.03      4.67      1.28
512 KB         70      9.92      8.12      6.02     65      8.48      7.96      3.07
1023 KB        68     16.17     13.78      6.12     62     14.49     13.63      2.59

Table 4.1: Number of measurements (n), mean response time (µ), median response time (x), and standard deviation (σ) for each size of file uploaded to the Kaspersky scanning service.

VirusChief
                     With Outliers                       Without Outliers
Upload size     n    µ (sec.)  x (sec.)  σ (sec.)    n    µ (sec.)  x (sec.)  σ (sec.)
0 KB           63      7.38      6.46      3.72     60      6.82      6.40      2.75
1 KB           68     17.84     18.01      5.80     62     18.72     18.22      3.97
2 KB           68     18.04     18.79     10.36     62     16.94     18.69      6.10
4 KB           68     18.22     18.11     10.09     65     16.51     18.06      6.19
8 KB           67     18.46     18.57      8.76     65     17.46     18.45      6.16
16 KB          68     20.23     19.19     11.70     66     18.66     19.15      6.62
32 KB          68     19.48     20.18      6.99     63     20.24     21.13      5.58
64 KB          68     20.31     20.31      7.36     61     20.49     20.42      5.10
128 KB         68     19.24     19.85      5.97     60     20.17     20.07      3.28
256 KB         69     20.70     22.18      6.61     61     21.36     22.25      4.25
512 KB         68     23.99     24.72      6.93     61     25.20     24.91      4.25
1023 KB        67     28.53     28.71      7.96     60     28.70     28.73      5.76

Table 4.2: Number of measurements (n), mean response time (µ), median response time (x), and standard deviation (σ) for each size of file uploaded to the VirusChief scanning service.


VirusTotal
                     With Outliers                          Without Outliers
Upload size     n    µ (sec.)   x (sec.)   σ (sec.)    n    µ (sec.)   x (sec.)   σ (sec.)
0 KB           36      2.20       0.84       6.00     35      1.21       0.84       0.93
1 KB           34    201.65     116.27     246.64     32    148.92     116.08      91.64
2 KB           39    210.30     146.06     173.05     36    168.80     134.25      96.23
4 KB           37    283.89     170.51     289.37     34    212.95     150.50     152.34
8 KB           41    282.63     170.58     408.17     40    224.62     170.44     171.45
16 KB          36    166.83     143.48     102.82     35    152.68     139.54      58.87
32 KB          38    192.47     134.82     151.96     35    162.14     118.65     114.00
64 KB          37    213.12     135.46     197.75     35    173.57     135.42     106.86
128 KB         35    286.85     171.24     386.21     34    229.28     154.04     184.92
256 KB         40    219.71     148.96     228.65     37    160.77     136.61      68.63
512 KB         39    230.78     173.11     203.92     36    182.71     156.84     113.89
1023 KB        33    198.52     140.38     140.44     31    171.32     139.93      91.09

Table 4.3: Number of measurements (n), mean response time (µ), median response time (x), and standard deviation (σ) for each size of file uploaded to the VirusTotal scanning service, polling until a scan result is returned.

VirusTotal (Upload Only)
                     With Outliers                       Without Outliers
Upload size     n    µ (sec.)  x (sec.)  σ (sec.)    n    µ (sec.)  x (sec.)  σ (sec.)
1 KB            8      2.30      1.68      1.07      8      2.30      1.68      1.07
2 KB            8      2.77      1.69      3.02      7      1.71      1.59      0.23
4 KB            8      1.94      1.97      0.15      8      1.94      1.97      0.15
8 KB            8      2.55      2.20      0.88      7      2.28      2.02      0.50
16 KB           8      2.73      2.37      0.94      7      2.43      2.17      0.46
32 KB           8      2.42      2.36      0.22      8      2.42      2.36      0.22
64 KB           8      3.17      2.53      1.49      7      2.67      2.50      0.48
128 KB          8      3.10      2.84      0.53      8      3.10      2.84      0.53
256 KB          8      4.54      3.52      2.26      7      3.79      3.47      0.86
512 KB          8      5.28      5.10      0.82      7      5.00      5.10      0.27
1023 KB         8     11.74      7.43      7.97      8     11.74      7.43      7.97

Table 4.4: Number of measurements (n), mean response time (µ), median response time (x), and standard deviation (σ) for each size of file uploaded to the VirusTotal scanning service when no polling for scan results was performed.


Figure 4.1: Scan response time versus file upload size for the Kaspersky virus scanner service.

upload and response speed of VirusTotal. Kaspersky and VirusChief both show a similar

positive correlation between file size and response time, with the VirusChief data being

positively shifted on the y-axis by roughly fifteen seconds. VirusTotal, on the other

hand, shows little if any relationship between file size and response time. The trend of

the VirusTotal response time data is slightly negative and possessing a much larger y-

intercept than either of the other two scanning services. Conversely, the upload portion

of the VirusTotal scan shows a trend very similar to Kaspersky. With this data it was

possible to produce a set of linear equations that approximate the performance of the

scanning services as a function of the number of bytes in the file (Table 4.5). This was

done using a linear regression in Microsoft’s Excel spreadsheet program.
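An equivalent fit can be produced outside of a spreadsheet; for example, a least-squares line of the same form as the equations in Table 4.5 could be computed with NumPy (a sketch, not the procedure actually used):

```python
import numpy as np

def fit_response_time(sizes_bytes, times_sec):
    """Fit response_time = slope * size + intercept by least squares,
    matching the form of the equations in Table 4.5."""
    slope, intercept = np.polyfit(sizes_bytes, times_sec, 1)
    return slope, intercept
```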

4.1.3 Discussion

It is not surprising that Kaspersky, which only scans with a single anti-virus engine, returns the fastest results, and that VirusChief, which scans with six anti-virus engines, is roughly fifteen seconds slower than Kaspersky when scanning a similarly sized file. It is


Figure 4.2: Scan response time versus file upload size for the VirusChief virus scanner service.

Figure 4.3: Scan response time versus file upload size for the VirusTotal virus scanner service.

Kaspersky                  f(x) = 1×10^-5 · x + 1.891
VirusChief                 f(x) = 1×10^-5 · x + 17.133
VirusTotal                 f(x) = −9×10^-6 · x + 182.98
VirusTotal (Upload Only)   f(x) = 9×10^-6 · x + 1.947

Table 4.5: Linear equations for each of the three scanning services derived from Figures 4.1, 4.2, 4.3, and 4.4. The equations calculate the response time for each scanning service for a file x bytes in size.


Figure 4.4: Upload response time versus file upload size for the VirusTotal virus scanner service when uploading a file and not polling for a scan result.

the VirusTotal scanning service that behaves in a markedly different fashion. Again, this is not surprising: because VirusTotal assigns the lowest priority to requests sent via its formal API, the response time of the service depends not on the size of the uploaded file, but on how busy the VirusTotal scanning service is at any given moment. This distinction is made much clearer when comparing the time required to scan a file with VirusTotal with the time required to merely upload a file to VirusTotal.

Finally, it should be noted that because VirusTotal maintains a database of scan results, files whose hashes match entries in the database will return results much faster than files for which there is no matching hash. This is evident when looking at the 0 KB length files, which have a dramatically faster response time than files of any other size, even the 1 KB files. Because the empty files always produce the same hash value, the result will already be stored in VirusTotal's database, so a previous scan report can be returned immediately and the file is never uploaded for analysis.
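Such report-by-hash lookups require only a digest of the file contents. A minimal sketch (this example uses SHA-256; the exact digest algorithm used for the lookup is an implementation detail):

```python
import hashlib

def file_digest(path):
    """Compute the SHA-256 digest used to key a report-by-hash lookup."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```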

The results in this section describe the performance of the three malware scanning


services during a specific window in time. Every effort was made to collect numerous

measurements during that window in order to provide a fair and accurate representation

of the performance of each service. However, it is unreasonable to assume that the results

described herein will accurately reflect the performance of the scanning services in the

long term. Changes in service loads, hardware configuration, and even changes in the

service offerings themselves could alter the performance of the scanning services. Such possibilities are unavoidable, and by no means a reason to shy away from such avenues of research. Unless the changes in performance are drastic, they are unlikely to alter the conclusions drawn regarding the feasibility of Thin AV as a security apparatus.

4.2 Actual System Overhead

Having characterized the response times of the three different scanning services, the

second step of evaluating the desktop implementation of Thin AV was to assess the

performance impact of running Thin AV.

4.2.1 Testing Protocol

To calculate the actual overhead incurred by Thin AV, a series of workloads were used

to generate file system activity. The technique used to generate workloads was modified

from the approach used by Bickford et al. [37]. Here, the workload was created using a

bash script to launch Firefox and navigate to certain websites. The CPU utilization of

the Firefox process was monitored, and once it dropped below a specific threshold, the

browser was terminated, as it was assumed that content had been successfully loaded,

and the browser was idle. The same process was applied to launching and terminating the

Thunderbird e-mail client. The idea of using web-browsing and e-mail as a workload is

appealing because these activities are extremely common, and well understood by users.

However, testing of the workload script used by Bickford et al. showed some limitations.


Web                                    Advanced
1) Launch Firefox                      1) Compile GZip from source
2) Navigate the following websites:    2) Compile a five page LaTeX paper
   http://www.google.ca                3) Copy the directory containing the paper
   http://www.slashdot.org             4) Rename the copied directory
   http://www.reddit.com               5) Delete the copied directory
   http://www.youtube.org
   http://www.cbc.ca
3) Close Firefox
4) Open Thunderbird
5) Close Thunderbird

Table 4.6: Activities in the web and advanced workload scripts.

Most notably, the Firefox application would often be terminated before a page had been

completely loaded and rendered. Therefore, a different technique was used to generate

browser activity for Thin AV. Specifically, the automated testing tool Selenium was used

to drive Firefox [19].
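The workload script itself is not listed here; a minimal Python sketch of the Selenium-driven portion might look as follows, with the site list taken from Table 4.6. Unlike the CPU-threshold heuristic, driver.get() blocks until the page load completes:

```python
from selenium import webdriver

SITES = [
    "http://www.google.ca",
    "http://www.slashdot.org",
    "http://www.reddit.com",
    "http://www.youtube.org",
    "http://www.cbc.ca",
]

def run_web_workload():
    """Visit each site in turn, then close the browser."""
    driver = webdriver.Firefox()
    try:
        for url in SITES:
            driver.get(url)  # blocks until the page has loaded
    finally:
        driver.quit()
```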

Three bash scripts were produced, each corresponding to a different workload. These

workloads will be referred to as “web”, “advanced”, and “combined”, respectively. The

web script was directly inspired by Bickford et al., and was complemented by the ad-

vanced script, which was designed to capture a snapshot of more developer-oriented

behavior. Table 4.6 outlines the specific activities in the web and advanced scripts. The

combined script includes the activities of the web script followed by the activities in the

advanced script. In order to better characterize the activities in each of the workloads, a

small Python program was written which uses the Linux inotify subsystem [11]. Each of

the workloads was run once and the activities generated by the workload were recorded

in order to help contextualize the results from this experiment.
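The monitoring program is not reproduced here, and the specific inotify bindings used are not named; a minimal sketch using the pyinotify package could log access and modification events like so:

```python
import pyinotify

class EventLogger(pyinotify.ProcessEvent):
    def process_IN_ACCESS(self, event):
        print("access", event.pathname)

    def process_IN_MODIFY(self, event):
        print("modify", event.pathname)

wm = pyinotify.WatchManager()
mask = pyinotify.IN_ACCESS | pyinotify.IN_MODIFY
wm.add_watch("/home", mask, rec=True, auto_add=True)  # watch recursively
pyinotify.Notifier(wm, EventLogger()).loop()  # blocks until interrupted
```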

To assess the overhead of Thin AV, several different scenarios were examined in re-

lation to a base case scenario. In every scenario the three workload scripts were run

ten times, and the time required to complete the workload was recorded. The base case


Scenario                Description                                         Caches
Kaspersky (uncached)    Thin AV running with only Kaspersky                 Cleared
                        scanning files
VirusChief (uncached)   Thin AV running with only VirusChief                Cleared
                        scanning files
All scanners (cached)   Thin AV running in the typical configuration        Not cleared
                        with all three services scanning files of the
                        appropriate size
Dazuko Only             DazukoFS mounted without Thin AV running            N/A
FSAC and Dazuko         DazukoFS mounted with the file system access        N/A
                        controller approving all accesses without
                        checking files with Thin AV

Table 4.7: Scenarios examined for assessing Thin AV overhead. The Caches column specifies whether or not the Thin AV cache and browser cache were cleared between each of the ten runs of a given workload.

scenario was one in which no part of Thin AV was active while the workloads were being

executed. Table 4.7 outlines each of the examined scenarios. Kaspersky and VirusChief were tested individually, with the Thin AV and browser caches being cleared between each of the ten runs. VirusTotal was not tested by itself because the average response time of the VirusTotal service was so large that it would be completely impractical for use as a standalone scanning service. Additionally, because VirusTotal remotely caches

scan results, it would not have been possible to repeatedly test a workload on VirusTo-

tal without retrieving cached results from the first run of that workload (as opposed to

re-scanning files each time).

The “all scanners” scenario involved running Thin AV with all three scanning services

in concert when the Thin AV cache was primed. This is representative of how Thin AV

would normally operate. If a file cannot be found in the cache it will be scanned by

Kaspersky. If the file cannot be scanned by Kaspersky due to service unavailability or


file size, then Thin AV will attempt to scan with VirusChief, and finally VirusTotal. For

this test neither the Thin AV local cache, nor the Firefox browser cache were deleted

between runs. Because of the VirusTotal caching issue mentioned earlier it was not

possible to perform a test of all three scanners in concert with an empty Thin AV cache.

It should also be noted that because some of the content on the websites in the web

and combined workloads is dynamic, a primed Thin AV cache does not mean all files

encountered in the workloads were previously cached by Thin AV.
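The cache-then-fallback behavior just described can be sketched as follows; the service objects and their scan() methods are hypothetical stand-ins for the Thin AV modules, and the size limits are those reported elsewhere in this chapter (1 MB for Kaspersky, 10 MB for VirusChief, 20 MB for VirusTotal):

```python
MB = 1024 * 1024

def check_file(path, size, cache, kaspersky, viruschief, virustotal):
    """Return a scan verdict, trying the local cache first and then each
    service in priority order; None means the file went unscanned."""
    if path in cache:
        return cache[path]
    for service, limit in ((kaspersky, 1 * MB),
                           (viruschief, 10 * MB),
                           (virustotal, 20 * MB)):
        if size < limit:
            verdict = service.scan(path)
            if verdict is not None:  # None models service unavailability
                cache[path] = verdict
                return verdict
    return None  # too large for every service: access permitted unscanned
```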

For each of the scenarios where all or part of Thin AV was running, only the home

directory of the active user was monitored. This decision was based on the fact that,

as previously mentioned, the goal of Thin AV is to provide minimalist anti-malware

protection. As such, areas of the file system which would normally be beyond the reach

of a standard user will be ignored by Thin AV, in favor of monitoring areas of high user

activity (i.e. the /home directory).

4.2.2 Results

Table 4.8 summarizes the characteristics of a representative run of each of the three

workloads. Given that the combined workload contains all of the activities of the web

and advanced workloads, it is sufficient to say that the characteristics of the combined

workload are approximately the sum (events, accesses, modifications, modify / access

events, unique files), average (mean file size), or maximum values from the advanced and

web workloads. The only similarities between the web and advanced workloads are the

number of unique files being accessed, and the median size of those files. The advanced

workload contains more than three times as many file events as the web workload, and

those events are split fairly evenly between access events and modification events. The

web workload, on the other hand, has almost three times as many access events as

modification events. Based on the file size statistics, it is clear that all of the files in


                                     Web        Advanced    Combined
Events                               158         529          698
Accesses                             116         245          354
Mean time between accesses (s)         0.157       0.010        0.064
Modifications                         42         284          344
Modify / access events                 7          19           25
Unique files                          88          92          175
Mean file size (KB)                 1138.35       17.74       581.59
Median file size (KB)                  5.13        2.72         4.52
Maximum file size (KB)             53132.00      250.71     53132.00

Table 4.8: General characteristics for the three different workloads used for testing Thin AV.

                         Web                  Advanced               Combined
Base case                21.4                 1.9                    21.3
Kaspersky (uncached)     194.9  (910.75%)     186.2  (9800.00%)      376.3  (1766.67%)
VirusChief (uncached)    1786.3 (8347.20%)    1864.2 (98115.79%)     3587.6 (16843.31%)
All scanners (cached)    59.0   (275.70%)     36.9   (1942.11%)      100.9  (473.71%)
Dazuko Only              21.6   (100.93%)     1.8    (94.74%)        23.2   (108.92%)
FSAC and Dazuko          21.8   (101.87%)     2.8    (147.37%)       26.5   (124.41%)

Table 4.9: Average time (in seconds) to complete each of the three workloads while running different configurations of Thin AV. The average is based on ten runs of each workload for each Thin AV configuration. Percentage overhead is the result of dividing the average running time of each configuration by the average running time of the base case.

the advanced workload are relatively small (250 KB or less), whereas the web workload

contains a handful of much larger files.

The timing results in Table 4.9 show a high degree of similarity both between work-

loads and between scanning services. The Kaspersky service differs by less than ten

seconds for the web and advanced workloads (194.9 and 186.2 seconds respectively). The

results from VirusChief are roughly an order of magnitude larger than the Kaspersky

timing results, and the corresponding difference between the two workloads is less than

one hundred seconds. This similarity is in spite of the order of magnitude difference be-


tween the web and advanced workloads in the base case (21.4 seconds versus 1.9 seconds).

The running times for the workloads when Thin AV was running with a primed cache are

much smaller than the running times when active scanning was taking place. However,

the difference between the web and advanced workloads is somewhat larger (59.0 seconds

for the web workload, and 36.9 for the advanced workload). A marginal increase in run-

ning times of the web and combined workloads can be seen when only the Dazuko file

system is mounted, while the advanced workload actually shows a very small decrease.

The increase in running times is slightly more apparent when both DazukoFS and the

file system access controller are active, with web and advanced workloads showing 0.4

and 0.9 second increase over the baseline running times. Not surprisingly, in all cases,

the timing results for the combined workload are approximately the sum of the results

for the constituent workloads. Additionally, the percentage overheads follow a similar

trend to the absolute timing results.

4.2.3 Discussion

Not surprisingly, the fastest service, Kaspersky, showed the smallest overheads across the

three workloads, whereas VirusChief, which had a much slower response time, showed

the highest overheads. In general, it is clear that the range of possible overheads for Thin

AV is extreme. Interestingly, despite the highly distinct file system access characteristics

of the web and advanced workloads, the ultimate running time of these workloads with

both Kaspersky and VirusChief are somewhat similar. However, when the overhead is considered instead of simply the total running time, the advanced workload appears particularly ill-suited to being scanned by Thin AV.

It is clear that the vast majority of the overhead incurred by Thin AV is caused by the

uploading and scanning of files to the remote services. The overheads caused by Dazuko

and the file system access controller are negligible, though it is possible that this overhead


could be further reduced if Thin AV were implemented in a higher performance language

such as C/C++ as opposed to Python. The case where the advanced workload was shown to run faster than the baseline in the Dazuko-only scenario is likely the result of a measurement error due to the granularity of the timing measurements recorded by the workload script.

Based on the response time results from Section 4.1, one could have predicted that

VirusChief would not, by itself, make a practical anti-malware scanning tool. The timing

results from this experiment bear out that prediction. Furthermore, Kaspersky, while

dramatically faster than VirusChief, is still prohibitively slow when all the files being

encountered are novel. However, it was to be expected that Thin AV would perform

poorly when a cache of previously scanned files was not available.

The three workloads were intended to be indicative of general desktop usage for some

users. However, they are very short, and do not contain the wait time that would be

present if a user were manually executing the actions in the workloads. Therefore, it is

very likely that the length of the workloads had a major impact on the overhead of Thin

AV. Specifically, because none of the workloads involved a high degree of repetition, the

effectiveness of the Thin AV cache was somewhat lessened. Given that the workload run

times for the fully cached scenario are approaching a much more reasonable overhead, it

is quite probable that as Thin AV is active for longer periods of time, the cache will grow

in size, increasing the likelihood of a local cache-hit, and thus decreasing the overhead of

Thin AV. This scenario will be examined in greater detail in Section 4.4.

4.3 Predicted System Overhead

Given that it is the highly-cached scenario that provides the most compelling argument

for Thin AV, the performance of Thin AV in the long term (when a high degree of caching


is possible) must be assessed. This section details the development and refinement of a

simulator for Thin AV which was developed using the timing results from Sections 4.2

and 4.1. Once the simulator was developed and its results were shown to be consistent with the results from Section 4.2.2, larger scale simulations could be run to better understand

the performance impact of Thin AV.

4.3.1 Testing Protocol

In order to predict the overhead of Thin AV, two elements are necessary: a simulator

and a workload. In order to accurately calibrate the simulator, the workloads from the

previous section were used. Based on the file accesses in the workload, it is possible to

calculate how much time would be spent by Thin AV to scan the files being accessed. The overhead imposed by Thin AV can then be predicted by dividing the total duration of Thin AV scanning plus the time spent on non-Thin AV activities by the time spent on non-Thin AV activities.
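In symbols, writing T_AV for the simulated time spent waiting on Thin AV scans and T_other for the time spent on non-Thin AV activities, the prediction just described is:

```latex
% Overhead as described in the text above.
\[
  \text{overhead} \;=\; \frac{T_{\mathrm{AV}} + T_{\mathrm{other}}}{T_{\mathrm{other}}}
\]
```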

In order to calculate the overhead of Thin AV a simulator was developed. The sim-

ulator reads a list of file system events (e.g. access or modification of one or more files),

and based on the sizes of the different files, calculates the time that Thin AV would take

to run if it were actually scanning the files in the list. Using the workload scripts and the

inotify monitoring application discussed in Section 4.2.1, a log of file system activity was

created for each of the workload scripts which was then used to drive the simulator. In

addition to driving the simulator with a file system activity log, the simulator can also

be driven by a series of statistical parameters; this will be discussed in greater detail in

Section 4.4.1.

The function calculating the wait time for accessing a specific file can be based on the

size of the file being accessed, or not, depending on what kind of Thin AV activity is being

simulated. In the case of VirusChief and Kaspersky, the scanning time is calculated based


on a pair of linear equations. The functions in Table 4.5 provided the starting point. In

a series of successive manual iterations, the simulator was run on each of the workloads

and scanning was simulated. By simulating Thin AV running off of strictly Kaspersky or

VirusChief, each of the linear equations could be tuned until the running time predictions

produced by the simulator matched the actual Thin AV running time measured in Section

4.2.2.

Because the performance of VirusTotal is so poor, it is not a suitable candidate

for real-time file scanning. As such, when Thin AV is operating under the permissive

security policy (Table 3.1), the file is only uploaded to VirusTotal. Then, assuming

the file has not been previously scanned by VirusTotal, a “waiting” status is returned

and Thin AV then allows access to the file. In order to simulate this behavior, the

linear equation which describes the time required to upload a file to VirusTotal was

used (without modification) to determine the wait time for VirusTotal (Table 4.5). This

allowed the simulator to accurately predict the overhead of Thin AV when running with

all three scanning services under the permissive security policy.

When the simulator is processing the list of file system events, each time a unique

file is accessed for the first time, the wait time imposed by Thin AV is calculated, based

on the size of the file being accessed, and the specific scanning service being simulated.

Once a file is accessed, it is added to a list of known files. Any future accesses of a known

file only incur a wait time of 0.0002 seconds. This simulates the caching behavior of Thin

AV. The wait time for the cache was arrived at by measuring 10,000 successive cache hits

by Thin AV, and calculating the average access time. Finally, if a known file is modified,

it is removed from the list of known files, and a subsequent access of the modified file will once again incur a wait time for the simulated scanning of that file.
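Putting the pieces together, the simulator's core loop can be sketched as below. The events are (action, path, size) tuples from the activity log; wait_time is one of the linear wait-time functions (the tuned coefficients appear in Table 4.10), and the 0.0002 second constant is the measured mean cache-hit time described above. This is an illustrative reconstruction, not the simulator's actual code:

```python
CACHE_HIT_TIME = 0.0002  # seconds, measured over 10,000 cache hits

def simulate(events, wait_time):
    """Replay an activity log and total the wait time Thin AV would add."""
    known = set()       # files already scanned (the simulated cache)
    total_wait = 0.0
    for action, path, size in events:
        if action == "modify":
            known.discard(path)            # modified files must be re-scanned
        elif path in known:
            total_wait += CACHE_HIT_TIME
        else:
            total_wait += wait_time(size)  # e.g. the Kaspersky equation
            known.add(path)
    return total_wait
```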


Kaspersky    f(x) = 8.85×10^-5 · x + 1.34
VirusChief   f(x) = 8.31×10^-5 · x + 16.37

Table 4.10: Linear equations for the Kaspersky and VirusChief scanning services, manually tuned from the starting-point functions in Table 4.5.

4.3.2 Results

Table 4.10 contains the linear equations that were finally settled on for the simulation of

Kaspersky and VirusChief. These formulæ were arrived at after approximately twenty

iterations of running the simulation and adjusting the slope and y-intercept of each

equation.

Tables 4.11 and 4.12 show the results of simulating Thin AV using only Kaspersky or only VirusChief. The total running times of these simulations are

compared with the actual running times from Section 4.2.2 in Tables 4.13 and 4.14. As

can be seen in the latter two tables, the refinement of the wait-time functions was highly

successful, with the largest discrepancy between the actual and simulated run-times being

only 0.47%.

As was mentioned earlier, the web workload contains a handful of files that are significantly larger than the bulk of the files in either workload.

This can be quantified more precisely by examining the number of un-scanned accesses

in each workload. Un-scanned accesses are files that are too large for the scanner to

process (1 MB for Kaspersky, and 10 MB for VirusChief). Here we see that in the web

workload, eight files are too large to be scanned by Kaspersky, but only three are too big

to be scanned by VirusChief. In the advanced workload, all of the files being accessed

are small enough for scanning with both Kaspersky and VirusChief. This implies that

the vast majority of files are capable of being scanned by Thin AV, and that most of

these files would be scanned by Kaspersky, the fastest of the three scanning services.


Kaspersky
                                       Web        Advanced    Combined
Mean scanned file size (KB)            39.02       17.10       27.23
Median scanned file size (KB)           5.13        1.45        3.67
Maximum scanned file size (KB)        512.00      250.71      512.00
Un-scanned accesses                    8 (6.9%)    0 (0.0%)    8 (2.3%)
Total size of uploaded files (MB)       3.24        1.85        5.08
Cache hit rate                         21.30%      54.69%      44.80%
Time for non-AV activities (sec.)      18.20        2.33       22.53
Time for AV scanning (sec.)           176.47      184.57      354.06
Total time (sec.)                     194.67      186.90      376.59
Overhead from AV                      969.61%    7921.40%    1571.52%

Table 4.11: Simulation results of the Kaspersky service for three different activity logs.

The low cache-hit rates that were alluded to in Section 4.2.3 can be seen in the

simulation results. The highest cache hit rate occurs in the advanced workload and is

only 54%, while the lowest cache hit rate is just over 20% in the web workload. In

spite of this, the runtimes for the web and advanced workloads are quite similar for both

Kaspersky and VirusChief.

4.3.3 Discussion

The key finding from this experiment was that it was possible to predict, very accurately,

the running time of Thin AV when using Kaspersky and VirusChief. Although a simu-

lation could be run which predicts the running time of Thin AV using VirusTotal, it has

already been established that VirusTotal is impractical for real-time scanning. Addition-

ally, because no actual overhead measurements were made for VirusTotal, there would be

no way of verifying the accuracy of the VirusTotal overhead predicted by the simulator.

Furthermore, because the scanner priority for Thin AV during normal operation is Kaspersky, followed by VirusChief, then VirusTotal, it is somewhat less important to characterize the performance of VirusTotal: only files between ten and twenty megabytes will


VirusChief
                                       Web        Advanced    Combined
Mean file size scanned (KB)           125.08       17.10       67.05
Median file size scanned (KB)           5.13        1.45        3.99
Maximum file size scanned (KB)       2205.19      250.71     2205.19
Un-scanned accesses                    3 (2.6%)    0 (0%)      3 (0.8%)
Total size of uploaded files (MB)      10.99        1.85       12.83
Cache hit rate                         20.35%      54.69%      44.16%
Time for non-AV activities (sec.)      18.20        2.33       22.53
Time for AV scanning (sec.)          1764.20     1866.14     3548.12
Total time (sec.)                    1782.40     1868.47     3570.65
Overhead from AV                     9693.41%   80092.03%   15748.44%

Table 4.12: Simulation results of the VirusChief service for three different activity logs.

Kaspersky
Workload     Running Time (s)    Simulated Time (s)    Difference
Web              194.90               194.67            -0.12%
Advanced         186.20               186.90             0.38%
Combined         376.30               376.59             0.08%

Table 4.13: Comparison of running time and simulation results for each of the workloads for the Kaspersky service.

VirusChief
Workload     Running Time (s)    Simulated Time (s)    Difference
Web             1786.30              1782.40            -0.22%
Advanced        1864.20              1868.50             0.23%
Combined        3587.63              3570.70            -0.47%

Table 4.14: Comparison of running time and simulation results for each of the workloads for the VirusChief service.


be uploaded to VirusTotal. By examining the file system access characteristics (Table

4.8) and the simulation results (Tables 4.11 and 4.12), it is apparent that this is not a

common occurrence. This assertion is also corroborated by the file system traces from

[90], [96] and [28].

Although it may have been possible to further increase the accuracy of the simulator

either by more manual tuning of the wait-time functions, or through an optimization

algorithm, the differences between the simulated and actual results are sufficiently low as

to afford a strong degree of confidence in the results produced by the simulator. It should

also be noted that when tuning the wait-time functions, the goal was to approximate the

total running time, and not the system overhead. This is because the system overhead

can change dramatically as a result in minor fluctuations in the base case run times.

Because the overhead is calculated by dividing the total running time by the base case

running time, small changes in such a comparatively small divisor can result in a dramatic

change in the resulting overhead. The advanced workload is an obvious example of this.

The promising finding from these results is that the workloads studied up to this

point only produce a low to modest degree of caching in Thin AV, nowhere near the 98%

cache hit rates reported by [82]. This means it is quite possible that the overhead of Thin

AV can be reduced to a more acceptable level in the long term, when a greater degree of

caching is possible. This scenario will be examined in the upcoming and final phase of

the evaluation of the desktop implementation of Thin AV.

4.4 Large Scale System Simulations

Given that the overhead values produced by the Thin AV simulator accurately predict the

actual running time of Thin AV, it is possible to examine different patterns of file access

when using Thin AV, and see if there are conditions under which Thin AV particularly


excels, or falls short. Although it may have been desirable to collect a set of actual usage

patterns for exploring the behavior of Thin AV, using the simulator to study different

scenarios is preferable for three key reasons. First, it allows for the examination of a

wider range of longer running, high-activity file system access traces, in much less time

than it would take to actually perform such traces on the implementation of Thin AV.

Second, it allows for the examination of traces with very specific characteristics (file

sizes, number of unique files, number of modifications, etc.), and devising a series of

actual workloads which would generate this desired level of activity would be extremely

onerous. Finally, despite a concerted search in the areas of software engineering, human-

computer interaction, and hardware systems research, it was not possible to locate a

precedent for how to characterize “typical” user behavior, upon which to base an actual

large-scale workload. This is quite likely due to the fact that the term “typical” itself

lacks a consistent definition when examining a wide cross-section of users.

4.4.1 Testing Protocol

The large-scale testing of Thin AV was done using the simulator described in Section

4.2.1. However, instead of using file activity traces produced by inotify, the activity was

generated by a collection of probability distributions. A collection of key parameters

control the general characteristics of the file system activity generated by the simulator.

The number of file system events controls the total number of file accesses and modifi-

cations that will occur in the simulation. The proportion of modifications specifies what

proportion of events will be modifications versus accesses. The number of unique files

specifies how many different files will be accessed throughout the lifetime of the simulation; a unique file can be thought of as an absolute path (i.e., modifying a file does not make it a new unique file).

Other key simulation parameters, specifically file size and time between events, are


drawn from exponential distributions. This distribution was chosen for file size generation

because it closely fits the distribution of file sizes measured in [28] and [90]. Figure 4.5

provides an example of file sizes generated by this distribution. The distribution can be

shifted left or right (i.e., smaller or larger files) as needed, depending on the parameter

provided to the exponential number generator. The file sizes generated are bounded by

a constant minimum and maximum; this was necessary to prevent the rare occurrence

where an extremely large file size was generated that would overly skew the mean file

size in the simulation. The exponential distribution was also chosen for generating the

time between file events. This has the effect of producing activity that is “bursty” with

periods of high activity (small times between events) broken up by less frequent periods

of inactivity (long times between events).
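A sketch of such a generator, with illustrative parameter names, using Python's random.expovariate for both distributions:

```python
import random

def generate_log(n_events, n_unique, p_modify, mean_size, mean_gap,
                 min_size=1, max_size=64 * 1024 * 1024):
    """Produce (timestamp, action, path, size) tuples with exponentially
    distributed file sizes and inter-event gaps; sizes are clamped so a
    rare huge draw cannot skew the mean."""
    sizes = {}
    t = 0.0
    log = []
    for _ in range(n_events):
        t += random.expovariate(1.0 / mean_gap)  # bursty spacing
        path = "file%d" % random.randrange(n_unique)
        if path not in sizes:
            raw = random.expovariate(1.0 / mean_size)
            sizes[path] = int(min(max(raw, min_size), max_size))
        action = "modify" if random.random() < p_modify else "access"
        log.append((t, action, path, sizes[path]))
    return log
```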

Ultimately the activity “log” produced by the simulator is in the same format pro-

duced by the inotify monitoring application used in Section 4.3.1. As such, it can be fed

into the same simulation program that was used earlier, and the behavior of Thin AV for

the simulated log can be characterized. The key simulation parameters were examined in

turn by modifying the experimental parameter while holding all other parameters con-

stant. Each simulation was run ten times and the average results were recorded. It should

be noted that in performing these experiments, some of the simulations were run with

activity logs that would not be likely to occur during any sort of “normal” file system

activity. However these results were included for the purposes of illuminating trends in

the relationships between the characteristics of file system activity and the performance

of Thin AV.

4.4.2 Results

The result of each experiment will be discussed in turn. For the sake of space, figures

will only show relevant relationships between the independent and dependent variables.


Figure 4.5: Example CDF of simulated files by size (file size in kilobytes, log scale, versus cumulative percentage of files).

Simulation input and output values that did not change for each experiment can be found

in Tables A.1 through A.5 in Appendix A.

The first simulations examined the relationship between the number of novel files in

an activity log, and the performance of Thin AV. Figure 4.6 shows a strong positive rela-

tionship between the number of novel files, and the overhead incurred by Thin AV. When

only one in ten accesses involves a new file, Thin AV already shows a 5000% overhead,

and this number increases almost eight-fold as the proportion of accesses involving new

files moves to 1. Unless specifically mentioned, the simulated activity logs did not include

any file modification events. This was done to prevent unnecessarily complicating the

simulation results, as well as to help clarify the trends in the data.

An analogous trend can be seen when the number of access events is manipulated

(Figure 4.7). The Thin AV overhead drops off dramatically as the number of system

accesses increases. The trend in the cache hit rates follow similar trends in both Figures

4.6 and 4.7, as the ratio between the number of unique files and the number of accesses

approaches one-to-one, the frequency of cache hits drops off precipitously.

The next series of simulations was designed to show the impact of file size on Thin


Figure 4.6: Proportion of accesses which involved a unique (uncached) file (log10) versus Thin AV induced overhead (left axis), and Thin AV cache hit rate (i.e., the chance that an access was serviced by the cache versus online scanning, right axis). See Table A.1 for further details.

Figure 4.7: Number of file system accesses (log10) versus Thin AV induced overhead (left axis), and Thin AV cache hit rate (i.e., the chance that an access was serviced by the cache versus online scanning, right axis). See Table A.2 for further details.


Figure 4.8: File size in bytes (log10) versus Thin AV induced overhead (left axis), and Thin AV cache hit rate (i.e., the chance that an access was serviced by the cache versus online scanning, right axis). See Table A.3 for further details.

AV performance. By changing the mean of the file size distribution, it can be shown that

there is a distinct peak in the overhead of Thin AV. When the bulk of the files are very

small (i.e., one hundred bytes, or less), the overhead is negligible, then as the average

file size increases so too does the overhead (Figure 4.8). This trend peaks around the

10 MB mark at which point the system overhead begins to decline. At the same time,

changing the average file size shows little impact on the cache hit rate. The reasons for

this behavior will be discussed in Section 4.4.3.

Figure 4.9 shows the proportion of files which are scanned by each of the three scan-

ning services as the mean file size changes. Recall that the scanner priority for Thin AV

is Kaspersky, VirusChief, then VirusTotal. This is evident in the figure as the mean file

size starts off small, and as such, all files are scanned by Kaspersky. Gradually, as the

average file size increases, VirusChief and VirusTotal each become more prevalent. At

the same time, the number of files which were too large to be scanned by any of the

services begins to increase, slowly at first, but with a sharp increase at the 10 MB mark.


Figure 4.9: Mean file size in bytes (log10) versus the proportion of accesses scanned by each scanning service, and the proportion of accesses un-scanned. See Table A.3 for further details.

Figure 4.10: Proportion of file system events which modify files versus the Thin AV induced overhead, Thin AV cache hit rate (left axis), and average time between file accesses (right axis). See Table A.4 for further details.


Figure 4.11: Average time between file accesses in seconds, versus Thin AV induced overhead (both axes are log10 scale). See Table A.5 for further details.

Next, the impact of file modifications was examined. Here, the probability that

controls whether a file event is a modification or an access, was adjusted from zero to

one, and the resulting impact on Thin AV behavior was recorded (Figure 4.10). The

most direct relationship is the inverse relationship between the modification rate and the

cache hit rate. Additionally, there is an overall decrease in Thin AV overhead as the file

modification rate increases.

Finally, there exists a strong (non-linear) positive relationship between the modifica-

tion rate and the time between file accesses. When the time between file accesses was

manipulated directly, a very clear inverse linear relationship emerges between the mean

inter-access time of a file trace, and the overhead incurred by Thin AV (Figure 4.11).

4.4.3 Discussion

The fact that the trends in Figures 4.6 and 4.7 are very similar suggests that one of the key variables in predicting Thin AV overhead is not the number of accesses or the number

of unique files per se, but rather, the ratio between the number of unique files and the

number of accesses. As that ratio approaches one-to-one, the performance of Thin AV


drops off dramatically. The reason for this is intuitive, as the ratio approaches one-to-one,

most accesses will involve files that are not in the cache, and therefore must be uploaded

and scanned. As a result the Thin AV cache becomes increasingly ineffective.

When it comes to the size of the files being scanned, the overhead of Thin AV has

much more to do with specific speeds at which the individual scanning services can return

a scan result. Figures 4.8 and 4.9 show that the overhead from Thin AV is relatively

low when the file sizes being scanned are very small. This is because all of the files

are being scanned with Kaspersky, the fastest of the three scanning services. As the

file size increases, there is a large peak as VirusChief (a much slower scanner) becomes

the dominant scanning engine. Ultimately however, as the file size exceeds the 10 MB

limit imposed by VirusChief, the overhead decreases, because VirusTotal simply accepts

the upload and returns a “waiting” response to Thin AV, which is much faster than

continually polling for a response. Finally, as the mean file size moves past the 20 MB

mark, the Thin AV overhead drops to near zero, as none of the scanning services are

capable of servicing the scan request, and the file access is permitted without scanning.

The results in Figure 4.10 are somewhat misleading. One might conclude that as the

file modification rate increases, so would the performance of Thin AV. This trend is in

spite of the fact that the cache hit rate also appears to be decreasing at the same time. In

reality, it is the time between file accesses that actually impacts the performance of Thin AV. As the proportion of file modifications increases, this naturally

creates an increasing gap between file access events. Because Thin AV only scans on file

accesses, this has the effect of reducing the overall amount of scanning that needs to be

performed. This reduction in the need for scanning goes much farther towards reducing

the overhead of Thin AV than any reduction in performance that occurs from reducing

the effectiveness of the cache. This extremely tight relationship between inter-access time

and system overhead can be seen in Figure 4.11.


In summary, these simulations have pointed to three key elements that heavily impact

the performance of Thin AV. First, as the ratio between the number of unique files and

the total number of file accesses approaches one-to-one, performance decreases. Next,

when the mean file size falls into the 10 MB to 20 MB range, the performance significantly

decreases. Finally, as the gap between accesses increases, either due to a lack of activity

or an increase in the frequency of file modifications, the overhead incurred by Thin AV

decreases.

Given the large range of possible overheads and the variety of factors that influence

those overheads, it is possible that a deployment scenario exists that would make Thin

AV a practical anti-malware tool. First and foremost, such a system would require

a substantial pool of dedicated resources on the server side of the transaction. The

dramatic speed difference between scanning with Kaspersky versus the other services

show what is possible even on an unregulated, freely available resource. If a company

were to offer a dedicated scanning service (either as a standalone offering, or as part

of a larger suite of anti-malware products), it would likely be possible to achieve even

faster scanning speeds than were displayed by Kaspersky. This would be because such

a service would have a smaller user base, and if those users were paying for the service,

there would be a greater incentive for the provider to ensure the service had adequate

resources. If this service were paired to a fast server-side cache similar to VirusTotal, the

performance could be improved even more.

Unfortunately, it is not possible to define a relationship between the number of scan-

ning engines running on a service and the performance of that service. Intuitively, it follows that more scanning engines would result in longer scanning times. This could be

consistent with the performance differences between Kaspersky, VirusChief and Virus-

Total, which have 1, 13 and 42 different scanning engines, respectively. However, without

knowing what kind of hardware is underpinning these services, it is not possible to ac-


curately gauge the relationship between number of scanning engines and performance.

The performance of CloudAV gives some indication of what might be possible in terms of

performance when hardware constraints are eliminated [82]. In general, it is likely safe to

assume that the best possible performance would be achieved on a system that only used

a single scanning engine. As such, an ideal deployment of Thin AV should be limited

to a single high performance scanning engine, until it can be established that additional

engines can be added without a significant performance penalty.

The ideal deployment on the client computer is a somewhat more challenging issue

as it is largely out of the hands of the service provider. However, based on the file

inter-access time results, and the characteristics of the web and advanced workloads it is

possible to conclude that Thin AV is more conducive to systems that are typically used for

casual internet activities as opposed to more developer-oriented activities. Beyond that,

users could further improve performance by running Thin AV with the passive security

policy (Table 3.1). This would offer a large performance boost, but would come at a

significant price. Specifically, files infected with malware would be allowed to execute,

and users would only be notified of the infection after the fact. Due to the extremely lax

security guarantees offered under this policy, and the fact that Thin AV does not offer a

mechanism for malware removal, this trade-off does not seem equitable. For this reason,

the performance overhead of this scenario was not studied.


Chapter 5

System Evaluation - Mobile Thin AV

The evaluation of the mobile version of Thin AV was considerably more challenging than

the evaluation of the desktop version. There are several reasons for this. There is no

established library of Android malware for use by researchers. Porter Felt et al. have

collected information on 18 malicious Android apps circulating in the wild [60], but their

study did not involve the collection of actual malware samples. Therefore, evaluation of

Thin AV was done with a collection of apps downloaded from the official Google Android

market. This data set will be described in greater detail in Section 5.1. However, without

an Android malware data set it was not possible to fully gauge the effectiveness of the

third party scanning engines when scanning Android malware. It was determined that

several of the scanning engines used by Thin AV do, in fact, detect Android malware,

and this issue will be discussed in Section 5.2.

All development and testing was done on the Android emulator provided in the SDK.

This was because it allowed for rapid development on different versions of the Android

operating system, and it allowed for changes to be made to the Android source code.

While it may have been possible to meet these requirements on a rooted Android device,

it would have been a significant gamble as to whether or not compatibility issues would

have arisen given that virtually all commercially available Android devices run a version of

Android that has been modified by the device manufacturer, such as Samsung’s TouchWiz

or HTC’s Sense UI. Further discussion on the performance of the Android emulator can

be found in Section 5.3.

Working on the emulator also presents evaluation issues with respect to network

performance. Because Thin AV is heavily reliant on the network, the connection speed


can greatly impact the performance of Thin AV. On a mobile device like a cell phone,

the speed of the cellular connection can be impacted by the location of the user, radio

interference, the load on the cellular network, as well as other factors. Because of the

challenges involved with network measurements, previous research results from Gass and Diot comparing the speed of cellular and WiFi networks were used instead [62].

The remainder of this section is laid out as follows. Section 5.4 will discuss the

evaluation of the ComDroid scanning module. Section 5.5 will provide evaluation of the

best and worst case performance of the Thin AV safe installer. Finally, Section 5.6 will

conclude with an analysis of the cost of running the Thin AV killswitch on an ongoing

basis. Where relevant, the following sections are subdivided into the same protocol-

results-discussion format from Chapter 4.

5.1 Data Set

The data set used for testing consists of 1,022 apps downloaded from Google’s Android

market. To download the apps, a program was created which made use of the unofficial

market API [9]. The API provided a means of retrieving the asset IDs of the packages to

be downloaded. These asset IDs were then combined with a set of valid Google account

credentials in order to download the actual packages. An attempt was made to download

the top fifty free apps in each application category, as ranked by user votes on January

3, 2012. The majority of package downloads were successful, with 28 downloads causing

repeated failures. This resulted in 1,022 apps spread across 21 application categories,

with each category having between 46 and 50 packages.

Table 5.1 summarizes the key file size statistics of the data set, while Figure 5.1 shows

the median file size for the apps broken down by application category.


Number of Apps                 1022
Mean App Size                  2.65 MB
Median App Size                1.78 MB
Minimum App Size               0.02 MB
Maximum App Size               37.06 MB
Proportion of Apps < 1 MB      34.64%
Proportion of Apps < 10 MB     97.16%
Proportion of Apps < 20 MB     99.51%

Table 5.1: General file size characteristics of the Android test data set.

Figure 5.1: Median file size of the Android test data set packages for each Google Market application category.


5.2 Malware Detection

The entire collection of Android packages was uploaded to the VirusTotal scanning ser-

vice. This was done for several reasons: first, it would show whether or not any of the

42 scanning engines used by VirusTotal detected any malware in the data set. Second,

it had the potential to show how many of the engines were capable of even detecting

Android malware. Third, because VirusTotal includes the Kaspersky scanning engine,

and most of the scanning engines in VirusChief, it would have the effect of testing the

data set on the scanning engines used by the other two third-party scanning services as

well. Finally, if VirusTotal was capable of detecting malicious Android apps, it would be

the most preferable of the three scanning services for the mobile implementation of Thin

AV. This is because VirusTotal's slow response time is considerably less of an issue on Android: the set of possible inputs (packages downloaded from various markets) is tiny and relatively predictable in comparison to the near-infinite array of files that might be seen in the desktop implementation. This allows for the possibility of priming

VirusTotal with packages from various markets. The details of this deployment scenario

will be expanded upon in Section 6.2.
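The upload client is not reproduced here; a minimal sketch of how one package could be submitted and its report retrieved, assuming VirusTotal's public API v2 and the requests library (the endpoint names and 60-second poll interval are assumptions, not the actual client code):

```python
import hashlib
import time
import requests

API = "https://www.virustotal.com/vtapi/v2"
KEY = "..."  # placeholder for a valid VirusTotal API key

def report_for(apk_path):
    """Upload one Android package and poll until its report is ready."""
    with open(apk_path, "rb") as f:
        data = f.read()
    sha256 = hashlib.sha256(data).hexdigest()
    requests.post(API + "/file/scan", data={"apikey": KEY},
                  files={"file": (apk_path, data)})
    while True:
        resp = requests.get(API + "/file/report",
                            params={"apikey": KEY, "resource": sha256})
        if resp.status_code == 200:
            report = resp.json()
            if report.get("response_code") == 1:  # 1: report available
                return report  # report["positives"], report["scans"], ...
        time.sleep(60)  # stay under the public API's request quota
```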

In a very surprising result, VirusTotal flagged several possible instances of malware

in the data set downloaded from the Google market. Of the 1,022 apps uploaded, 1,019

were scanned (with three being skipped due to size restrictions). Of the 1,019 scanned

packages, 27 were flagged as malware by at least one scanning engine. One package was

flagged as malware by four different engines, nine packages were flagged by two engines,

and the remaining seventeen packages were flagged as malware by a single engine. Table

5.2 provides details on some of the commonly flagged samples. The most commonly

identified sample was from the Adware.Airpush family. However, the majority of these

samples were identified by a single scanning engine (DrWeb), which raises the possibility

Sample Name             Malware Type   Occurrences   Detection Engine(s)
Adware.Airpush (2, 3)   Adware         15            DrWeb, Kaspersky
Plankton (A, D, G)      Trojan         6             Kaspersky, Comodo, NOD32, TrendMicro
SmsSend (151, 261)      Dialer         2             DrWeb
Rootcager               Trojan         2             Symantec

Table 5.2: Most frequent samples of malware detected in the Google Market data set. Detection engine refers to which VirusTotal scanning engines detected the sample.

of this being a false positive. The next most common sample was Plankton, which was

identified by a variety of scanning engines. The remaining malware samples had far fewer

occurrences in the data set.

While Google has itself admitted to finding malicious apps in its market [40],

it was very surprising to find numerous possible instances of malware in Google’s official

Android market. It is even more surprising considering that these apps were selected for

the data set because they were among the fifty most popular apps in their respective

categories on the day they were downloaded. This only serves to reinforce the problem

presented by mobile malware, and if the official Google Market can fall victim to this

issue, it is worrying to consider the prevalence of malware in third-party markets.

The detection of malware in the data set shows that Thin AV, in its current form,

can take advantage of existing third-party scanning services to prevent the installation

of malware on Android devices. While the test data set only suggested that 6 of the AV

engines used by VirusTotal are capable of detecting Android malware, follow-up research

showed that as many as 26, or more than half, of the scanning engines in VirusTotal are capable of detecting some form of Android-specific malware [27].


5.3 Emulator Performance

As mentioned above, all development and evaluation of Thin AV were done on the Android

emulator. In order to provide context for the performance results taken from the emu-

lator, it was necessary to assess the performance of the Android emulator as compared

to a physical Android device. The Java-based numerical benchmark SciMark [17] was

ported to run on an Android device. The benchmark consists of five CPU-bound tasks:

Fast Fourier Transforms (FFT), Jacobi Successive Over-relaxation (SOR), Monte Carlo

integration (MC), Sparse matrix-multiply (Sparse), and dense LU matrix factorization

(LU). The specific details of each test can be found in [18]. The benchmark was then

modified to test the speed of sequential reads and writes. This was necessary because the emulator uses the RAM and hard disk of the host device for storage, whereas

Android devices use flash memory.

The benchmark was run on the Android emulator as well as three different physical

Android devices. Table 5.3 shows the results of the benchmark for each device. It is

clear from these results that any performance testing done on the Android emulator will

represent a lower bound on the performance of a production deployment of Thin AV.

In general, the Android emulator running on a modern computer is about an order of

magnitude slower than the same operation computed on a contemporary Android device.

5.4 ComDroid Evaluation

The ComDroid scanning service was added to Thin AV both to further demonstrate the

modularity and extensibility of the Thin AV architecture, as well as to add a scanning

service that was specifically targeted at Android applications. This section discusses the

evaluation of the ComDroid module.

                 Emulator          Samsung Galaxy S         HTC Desire              HTC Evo 3D
Hardware         See Section 3.2   Samsung Exynos 3110,     Qualcomm QSD8250        Qualcomm MSM8660
                                   ARM Cortex A8 @ 1 GHz,   Snapdragon @ 1 GHz,     dual-core @ 1.2 GHz,
                                   512 MB RAM               576 MB RAM              1 GB RAM
OS Version       2.3.3             2.3.3                    2.2                     2.3.4
Average Score    3.039             19.457                   37.909                  41.771
FFT Score        1.750             12.542                   32.129                  35.406
SOR Score        4.584             32.825                   72.464                  81.101
MC Score         0.656             6.357                    7.970                   5.433
Sparse Score     4.290             15.398                   31.019                  26.114
LU Score         3.916             30.164                   45.965                  60.800
Write (MB/s)     11.062            9.597                    12.438                  37.879
Read (MB/s)      19.802            112.36                   121.951                 192.308

Table 5.3: Comparison of benchmark scores for the Android SDK emulator and three different physical Android devices. The benchmark consists of five CPU-bound tasks: Fast Fourier Transforms (FFT), Jacobi Successive Over-relaxation (SOR), Monte Carlo integration (MC), Sparse matrix-multiply (Sparse), and dense LU matrix factorization (LU). The benchmark also measures the mean speed of sequential reading and writing from flash memory. For all benchmarks, higher values are better.

ComDroid    f(x) = 0.0132 × x + 9.6893

Table 5.4: Linear equation for the ComDroid scanning service.

5.4.1 Testing Protocol

The ComDroid module was tested in a manner somewhat similar to the other three

scanning modules from earlier in this chapter. All 1,022 of the apps described in Section

5.1 were uploaded to the ComDroid scanning service. Roughly half of the uploads were

performed on January 26 with the remainder being uploaded on January 27, 2012. For

each upload the time required for ComDroid to return a result was recorded; additionally,

the scan report for each of the applications was saved.

5.4.2 Results

Of the 1,022 packages uploaded to ComDroid, 993 were scanned, with the remainder

being rejected by the server due to a 10 MB size limitation. Of the 993 packages scanned

by ComDroid, 8 returned a scan error, resulting in 985 valid scan results. The mean

response time was 40.67 seconds (σ = 77.60 seconds), and the median response time was

18.63 seconds. Figure 5.2 shows the response time plotted as a function of the package

size, and the exact function is specified in Table 5.4. It is clear there is some positive

linear relationship between package size and scan time, although numerous outliers are

apparent.

The vast majority of packages, 971 of 985 (or 98.6%), show some potential for exposed

communication. Table 5.5 provides a summary of the exposed communication found

within the testing data set.

Figure 5.2: Response time of the ComDroid service as a function of package size.

Type of Warning                                 Packages       Occurrences   Average
Action Misuse                                   331 (33.6%)    5640          17.0
Possible Activity Hijacking                     961 (97.6%)    14671         15.3
Possible Malicious Activity Launch              481 (48.8%)    2200          4.6
Possible Broadcast Theft                        501 (50.9%)    4630          9.2
Possible Broadcast Injection                    613 (62.2%)    3703          6.0
Possible Service Hijacking                      261 (26.5%)    980           3.8
Possible Malicious Service Launch               167 (17.0%)    315           1.9
Protected System Broadcast w/o Action Check     108 (11.0%)    134           1.2

Table 5.5: Breakdown of exposed communication found by ComDroid in the testing data set. The packages column refers to the number of applications with at least one instance of a given warning type. The occurrences column refers to the total number of potentially exploitable attack surfaces that exist for a given warning type, within the entire data set. The average column is the average number of occurrences per package. For a complete explanation of the types of warnings see [45].

5.4.3 Discussion

The performance of the ComDroid service is somewhat similar to both the Kaspersky and

VirusChief services (Section 4.1.2). However, the linear trend is much less prominent.

The most likely explanation for this lies in the nature of the analysis performed by

ComDroid. ComDroid is a static code analysis tool, and as such, it is safe to assume

that the time required to analyze an Android app has more to do with the amount of

code in the package, than the total size of the package. Given that many apps contain

numerous resource files (images, sounds, video, etc.) which are not scanned by ComDroid,

it is easy to imagine how a package might have a large size, but a relatively small amount

of code, resulting in a much faster scan than for an app with a large amount of code and

few resource files. It is quite likely that the observed linear trend is much more a result

of the upload time and not the code-to-resource-file ratio of the package.

The prevalence of exposed communication within the data set seems very high, with

less than 3% of packages reporting no warnings. However, interestingly, all values in

the packages column of Table 5.5 are within 10% of the values reported by Chin et al. in

[45], suggesting that their initial findings were fairly representative of a larger data set.

The pervasiveness of programming errors detected by ComDroid suggests that in its

current form, simply flagging an application as being “at risk” if there is any instance of

exposed communication would be overkill. It would effectively cripple the ability of users

to install apps on their device. As was pointed out in [45], a manual inspection of a subset

of warnings found that only about 10-15% of warnings were genuine vulnerabilities. This

does suggest that there is a place for ComDroid in the Thin AV architecture. However,

the behavior of this Thin AV module would likely have to be adjusted over time to

prevent excessive false positives. This could be done by creating thresholds which would

flag a package as vulnerable if it had significantly more exposed surfaces than average

for a given type of warning.
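A minimal sketch of such a thresholding rule is shown below, using the per-warning-type averages from Table 5.5. The multiplier K is an arbitrary illustrative choice, and the warning counts would be parsed from a ComDroid scan report.

    # Sketch of the proposed thresholding rule: flag a package only when it
    # shows substantially more exposed surfaces than average for some warning
    # type. Averages are from Table 5.5; the multiplier K is arbitrary.
    AVERAGE_OCCURRENCES = {
        "Action Misuse": 17.0,
        "Possible Activity Hijacking": 15.3,
        "Possible Malicious Activity Launch": 4.6,
        "Possible Broadcast Theft": 9.2,
        "Possible Broadcast Injection": 6.0,
        "Possible Service Hijacking": 3.8,
        "Possible Malicious Service Launch": 1.9,
        "Protected System Broadcast w/o Action Check": 1.2,
    }
    K = 3.0   # flag only when a count exceeds K times the average

    def is_vulnerable(warning_counts):
        """warning_counts maps a warning type to its count in one scan report."""
        return any(count > K * AVERAGE_OCCURRENCES.get(wtype, float("inf"))
                   for wtype, count in warning_counts.items())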

Network Configuration   Upload Speed (KBps)   Download Speed (KBps)
Typical 3G              16.25                 84.13
Ideal 3G                1792.00               1792.00
Typical WiFi            190.38                155.38
Ideal WiFi              76800.00              76800.00

Table 5.6: Network speeds used for evaluating the mobile implementation of Thin AV.

5.5 Safe Installer Performance

The first line of defense provided by Thin AV is the safe installer, which checks for

malicious apps at install time. The performance of the safe installer is based on three

factors: the size of the package being scanned, the speed of the network to which the

device is connected, and whether or not the package being installed has already been

scanned by Thin AV.

For the purposes of evaluating the safe installer, three different file sizes (small,

medium, and large) were chosen: 0.76 MB, 1.78 MB, and 3.56 MB, corresponding to

the median size of apps in the category with the smallest median size (medical apps),

the median size for the entire data set, and the median size of apps in the category with

the largest median size (educational apps).

Additionally, four different network configurations will be examined. These will be

referred to as “Ideal 3G”, “Typical 3G”, “Ideal WiFi”, and “Typical WiFi”. The speeds

for each of these configurations (listed in Table 5.6) have been taken from [62] and [67].

The best case scenario for the performance of the safe installer is one in which the

package being installed has already been scanned by Thin AV. There are several reasons

why this could occur, and they will be discussed in Chapter 6. In this case, the cost

for performing an install time check is equal to the time required to hash the installing

application, send the hash to Thin AV, look up the scan result, and return the scan

result.
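The following Python sketch illustrates this best case flow from the client's perspective. The lookup URL and response fields are hypothetical stand-ins for the Thin AV web application's interface, and SHA-1 is used here only as an example of a cryptographic hash.

    # Best case install-time check: hash the package locally, then ask Thin AV
    # for a cached verdict. The URL and response format are hypothetical.
    import hashlib
    import requests

    THINAV_LOOKUP_URL = "http://thinav.example.org/lookup"   # hypothetical

    def check_package(apk_path):
        # Hashing took 0.033-0.293 s on the emulator, depending on file size.
        sha1 = hashlib.sha1()
        with open(apk_path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha1.update(chunk)
        # The request and the cached verdict are each roughly 100 bytes.
        resp = requests.get(THINAV_LOOKUP_URL, params={"sha1": sha1.hexdigest()})
        return resp.json()   # e.g., {"known": true, "malicious": false}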

Network Configuration   Small File   Medium File   Large File
Ideal 3G                0.034 s      0.232 s       0.293 s
Typical 3G              0.041 s      0.239 s       0.300 s
Ideal WiFi              0.034 s      0.231 s       0.293 s
Typical WiFi            0.035 s      0.233 s       0.294 s

Table 5.7: Time required to check a package in Thin AV, for three different file sizes and four different network configurations, assuming the scan result is already cached by Thin AV.

The time required to hash a small, medium, and large application on the Android

emulator was measured, and the average of five runs was taken for each size. The small

file took 0.033 seconds to hash, the medium file took 0.231 seconds to hash and the large

file took 0.293 seconds to hash. The total amount of data uploaded and downloaded for

transmitting the hash and receiving the result was recorded. This was approximately

200 bytes (100 up, 100 down), although this amount varied slightly with the file being

scanned. Finally, the cost of Thin AV performing a cache lookup was examined in Section

4.3.1, and so here too, the cost of Thin AV performing a lookup from cache will be taken

to be 0.0002 seconds. Table 5.7 summarizes the results for this best case scenario. In

general, even the largest file over the slowest network only takes 0.3 seconds to check

with Thin AV.

The worst case scenario is one in which the application being installed has not been

scanned by Thin AV, and the whole package must be uploaded to Thin AV, which must

then upload the package to one or more of the third-party scanning services. Using the

formulæ in Tables 4.10 and 5.4, and the file sizes and network speeds above, it is possible

to compute the time required to upload and scan these files at install time. Because

the time spent uploading and scanning a file will dwarf the time required to hash an

application and upload that hash, these costs will not be included in the calculation.

It should be noted that when calculating the time required to scan a package both

Network Configuration   Small File   Medium File   Large File
Ideal 3G                36.56 s      98.13 s       170.29 s
Typical 3G              84.66 s      210.00 s      394.14 s
Ideal WiFi              36.13 s      97.14 s       168.31 s
Typical WiFi            40.23 s      106.68 s      187.39 s

Table 5.8: Time required to check a package in Thin AV, for three different file sizes and four different network configurations, assuming the scan result is not cached by Thin AV.

the time to scan with the appropriate anti-virus scanning service and the time required for scanning with ComDroid must be added. This is because, as currently configured, ComDroid runs in addition to the anti-virus scanner appropriate to the size of the package. The performance drawbacks of this configuration are obvious. However, it does mean that the results presented in this section

represent a highly conservative estimation of possible Thin AV performance in a produc-

tion deployment.

Table 5.8 summarizes the results for this worst case scenario. In general, the time

required to upload and scan an Android package ranges from a low of 36 seconds to a

high of almost 400 seconds, depending on the size of the file and the speed of the network.
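As a worked example, the medium-file/ideal-3G entry of Table 5.8 can be reproduced from the pieces above. The sketch below assumes the ComDroid model of Table 5.4 takes the package size in KB (as suggested by the axis of Figure 5.2) and derives a per-app VirusChief scan time from Table 5.14, since the equations of Table 4.10 are not reproduced in this chapter.

    # Worked example reproducing the medium-file/ideal-3G entry of Table 5.8.
    def comdroid_time(size_kb):
        return 0.0132 * size_kb + 9.6893       # Table 5.4

    medium_kb = 1.78 * 1024                     # the 1.78 MB medium file
    upload_s = medium_kb / 1792.0               # ideal 3G upload speed (Table 5.6)
    viruschief_s = 634.032 / 10                 # per-app time implied by Table 5.14

    total = upload_s + comdroid_time(medium_kb) + viruschief_s
    print(round(total, 2))                      # ~98.2 s, vs. 98.13 s in Table 5.8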

The best case scenario, where Thin AV already has a cached scan result, is extremely

fast. At 0.3 seconds, this check would be unnoticeable to a user. On the other hand,

if the file needed to be uploaded and scanned, this process could take as long as 400

seconds, or almost seven minutes. This could be seen as a serious inconvenience to the

user, but considering that this check would only take place when a user is installing an

unknown app, it is not likely to be a frequent occurrence. Additionally, given that Thin

AV could be primed with packages from a variety of sources, including regular downloads

of applications from various application markets, upload of applications by developers,

and the upload of applications by other users running Thin AV, the chance that a user


would have to upload a package for scanning at install time could be made very rare. So,

while the worst case scenario is not ideal, it is not likely to be a frequent occurrence.

Finally, while not a specific performance test, an end-to-end functionality test was

run in which the Thin AV safe installer correctly blocked the installation of an app from

the testing data set which was flagged as malware by VirusTotal.

5.6 Killswitch Cost

During normal operation, the most frequently used functionality of Thin AV would typ-

ically be the killswitch service, which is periodically activated and checks for revoked

apps. To evaluate the performance of the killswitch, several factors must be examined:

the cost of hashing apps to generate a system fingerprint, the network cost associated

with uploading the fingerprint, the cost of looking up the hashes in Thin AV, and the

network cost associated with returning those hashes to the client. The last and most

costly aspect of the killswitch is the manual upload feature, because this is the only time

when the killswitch should incur any cost for scanning a package. This is because it is

assumed that any missing packages will be scanned by Thin AV when they are uploaded

by the safe installer at the time of installation.

This section will examine the cost of the killswitch under normal operation, as well as

the cost of manually uploading missing packages. The normal operation will be assessed

in two parts, the cost of generating a system fingerprint, and the cost of sending and

receiving the system response. The cost of Thin AV performing a cache lookup will

again be taken to be 0.0002 seconds per cryptographic hash.

In general, the time required for the killswitch to perform a check for revoked apps

will be:

Time = hashing time + (hash upload size / upload speed) + cache lookup time + (response download size / download speed)    (5.1)

Because the cost of performing a manual upload of missing packages is dominated

by upload and scanning costs (similar to the safe installer above), only these costs will

be included in the calculation. The time required for the killswitch to manually upload

missing packages will be:

Time = (package upload size / upload speed) + scanning time    (5.2)

Similar to the safe installer, the time spent scanning is the sum of the time scanning

with the appropriate anti-virus service as well as scanning with ComDroid.
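Equations 5.1 and 5.2 translate directly into code. The sketch below is a literal transcription with sizes in KB and speeds in KBps; the trailing call uses placeholder values for illustration only.

    # Literal transcription of Equations 5.1 and 5.2. The cache lookup cost
    # is the 0.0002 s per hash taken from Section 4.3.1.
    CACHE_LOOKUP_S = 0.0002

    def fingerprint_check_time(hashing_s, hash_upload_kb, upload_kbps,
                               num_hashes, response_kb, download_kbps):
        """Equation 5.1: one killswitch check for revoked apps."""
        return (hashing_s
                + hash_upload_kb / upload_kbps
                + num_hashes * CACHE_LOOKUP_S
                + response_kb / download_kbps)

    def manual_upload_time(package_kb, upload_kbps, scanning_s):
        """Equation 5.2: uploading and scanning one missing package."""
        return package_kb / upload_kbps + scanning_s

    # Placeholder values for illustration only (not taken from any table).
    print(fingerprint_check_time(1.0, 3.5, 190.38, 100, 0.07, 155.38))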

5.6.1 Testing Protocol

To test the performance of the hashing function, the top five apps (by user popularity ranking) from each of the 21 market categories were installed on the Android emulator incrementally (i.e., first the top app from each category was installed, then the top two, then the top three, and so on).

A complete system fingerprint was generated ten times, with the local fingerprint cache deleted after each run, and the average time was taken. This represents the worst case scenario,

one in which none of the apps on the device have been hashed before, and all hashes must

be computed. Next, another ten fingerprints were generated and the average was taken,

but this time, the cache was left intact. This represents the best case scenario: one in

which all of the apps on the phone have already been hashed and the phone fingerprint

is stored locally. Along with the fingerprint generation time, the size of the fingerprint,

and the size of the server response to sending that fingerprint were recorded. This way

the data consumption of the killswitch can be evaluated.
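The caching behavior being measured can be sketched as follows. The cache file name and the use of JSON for the persistent cache are illustrative choices, not necessarily those of the prototype.

    # Sketch of cached fingerprint generation: every installed package is
    # hashed, but hashes from earlier runs are reused so that, in the steady
    # state, only newly installed apps need to be hashed.
    import hashlib
    import json
    import os

    CACHE_FILE = "fingerprint_cache.json"

    def fingerprint(apk_paths):
        cache = {}
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE) as f:
                cache = json.load(f)
        hashes = {}
        for path in apk_paths:
            key = "%s:%d" % (path, os.path.getmtime(path))
            if key not in cache:               # only hash unseen packages
                h = hashlib.sha1()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        h.update(chunk)
                cache[key] = h.hexdigest()
            hashes[path] = cache[key]
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f)
        return hashes                           # the device "fingerprint"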

Uncached   f(x) = 0.0895 × x − 0.27
Cached     f(x) = 0.0028 × x + 0.164

Table 5.9: Linear equations for generating a system fingerprint as a function of the total size (in MB) of the apps on a device, for both the cached and uncached scenarios.

Under normal use, the typical scenario would likely be the best case scenario, or very close to it. After the first fingerprint has been generated, the

only time an app will have to be hashed is when it has not been seen by the killswitch,

meaning it has just been installed. Unless a user installs numerous apps between the

scheduled runs of the killswitch, it is likely the number of apps that need to be hashed

would be near zero.

Combining the hashing performance with the file size data for the data set, the scanner

performance functions in Tables 4.10 and 5.4, and the experimental network performance

measurements from [62], the cost of performing manual uploads, as well as the cost of

fingerprinting can be calculated using Equations 5.1 and 5.2.

5.6.2 Results

Figure 5.3 shows the best (cached) and worst (uncached) case scenarios for the fingerprint

generation time as a function of both the number of packages on the device and the total

size of those packages. Table 5.9 shows the linear equations for both the cached and

uncached functions for the total number of bytes worth of apps on a device.

It is clear that the time to generate a system fingerprint grows in a mostly linear way with

both the number and size of packages on the device. In the worst case, with 110 apps

on the device, it only takes 29.95 seconds to generate a system fingerprint. However,

the best case scenario is dramatically better, with a fingerprint being generated in 1.09

seconds for the same 110 apps when the fingerprint has been cached.

Figure 5.3: Time required to generate a complete system fingerprint as a function of the number of packages installed on the device (a) and the total size of those packages (b). Both figures show the average time when all of the package hashes have been stored (cached) and when none of the package hashes are stored (uncached). Figure (a) also includes the number of bytes sent and received when communicating the fingerprint to the Thin AV server.

Interval   Data Consumption (5 Apps)   Data Consumption (110 Apps)
1 Day      24.47 KB                    349.41 KB
1 Week     171.28 KB                   2.39 MB
1 Month    5.19 MB                     74.04 MB

Table 5.10: Data consumption of the Thin AV killswitch over different time periods, for 5 and 110 apps installed on the device, assuming the killswitch is scheduled to run every fifteen minutes.

Data usage grows linearly with the number of packages on the device. The data

consumption per killswitch run ranges from 3.64 KB for 110 apps down to 261 bytes for 5 apps. The

majority of this transmission is in the form of the uploaded fingerprint, as the response

from Thin AV only downloads 70 bytes from the server. Note, however, that this is for a fingerprint containing no hashes corresponding to malicious apps.

Under the current configuration the killswitch is scheduled to generate a system fin-

gerprint every 15 minutes. Table 5.10 shows how much data would be consumed by Thin

AV (as it is currently configured) over different lengths of time.
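The daily figures in Table 5.10 follow directly from the per-run measurements above, as the short calculation below shows.

    # The daily rows of Table 5.10 follow from a 15-minute schedule:
    runs_per_day = 24 * 60 // 15                # 96 killswitch runs per day

    per_run_110_apps_kb = 3.64                  # measured fingerprint traffic
    per_run_5_apps_kb = 261 / 1024.0            # 261 bytes

    print(runs_per_day * per_run_110_apps_kb)   # ~349.4 KB/day (Table 5.10)
    print(runs_per_day * per_run_5_apps_kb)     # ~24.5 KB/day (Table 5.10)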

Using the same network measurements from Section 5.5, the measured fingerprint

generation times, and data transmission totals, it is possible to compute a variety of

potential running times for the entire fingerprinting operation of Thin AV killswitch

using Equation 5.1. These values are summarized in Table 5.11.

For calculating the cost of manually uploading missing packages, two additional as-

sumptions must be made: first, the number of packages being uploaded, and second, the

size of those packages. The three different package sizes will be the same as those used in

Section 5.5. These app sizes will be used to examine the case where 10, 25 and 50 apps

were being uploaded to the Thin AV service for scanning. These scenarios will again be

examined from the four different network configurations seen in the previous section.

Table 5.12 summarizes the total amount of data that would be uploaded for different

numbers of apps of different sizes. The key assumption underpinning this table is that the

Scenario                                                           Time (seconds)
110 apps over an ideal 3G connection with no hashes cached         26.206
110 apps over a typical 3G connection with no hashes cached        26.430
110 apps over an ideal WiFi connection with no hashes cached       26.204
110 apps over a typical WiFi connection with no hashes cached      26.223
110 apps over an ideal 3G connection with all hashes cached        3.424
110 apps over a typical 3G connection with all hashes cached       3.478
110 apps over an ideal WiFi connection with all hashes cached      3.423
110 apps over a typical WiFi connection with all hashes cached     3.428
26 apps over an ideal 3G connection with no hashes cached          1.034
26 apps over a typical 3G connection with no hashes cached         1.258
26 apps over an ideal WiFi connection with no hashes cached        1.032
26 apps over a typical WiFi connection with no hashes cached       1.051
26 apps over an ideal 3G connection with all hashes cached         0.285
26 apps over a typical 3G connection with all hashes cached        0.339
26 apps over an ideal WiFi connection with all hashes cached       0.285
26 apps over a typical WiFi connection with all hashes cached      0.290

Table 5.11: Time required to complete the fingerprinting operation for different numbers of applications, network performance, and caching scenarios. The definitions of "typical" and "ideal" for each connection type are the same as in Section 5.5.

                 Total Data Uploaded (MB)
Scenario         10 Apps   25 Apps   50 Apps
Small Apps       7.643     19.108    38.216
Medium Apps      17.775    44.438    88.875
Large Apps       35.570    88.925    177.850

Table 5.12: Total upload sizes used for calculations of bulk scanning performance.

                          Upload Time (Seconds)
Scenario                  10 Apps    25 Apps    50 Apps
Ideal 3G Connection
  Small Apps              4.367      10.919     21.837
  Medium Apps             10.157     25.393     50.786
  Large Apps              20.326     50.814     101.629
Typical 3G Connection
  Small Apps              485.360    1213.400   2426.801
  Medium Apps             1128.781   2821.953   5643.907
  Large Apps              2258.833   5647.082   11294.164
Ideal WiFi Connection
  Small Apps              0.102      0.255      0.510
  Medium Apps             0.237      0.593      1.185
  Large Apps              0.474      1.186      2.371
Typical WiFi Connection
  Small Apps              41.111     102.777    205.553
  Medium Apps             95.609     239.023    478.046
  Large Apps              191.326    478.315    956.630

Table 5.13: Upload times for the values in Table 5.12, for four different network configurations.

                                                 Scanning Time (Seconds)
Scenario                                         10 Apps    25 Apps    50 Apps
Small Apps Scanned With Kaspersky                161.021    402.552    805.104
Medium Apps Scanned With VirusChief              634.032    1585.080   3170.159
Large Apps Scanned With VirusChief               1104.893   2762.232   5524.465
Small Apps Scanned With ComDroid                 107.224    252.563    494.796
Medium Apps Scanned With ComDroid                120.919    266.259    508.491
Large Apps Scanned With ComDroid                 144.972    290.312    532.544

Table 5.14: Scan times for different numbers of apps with small, medium, and large sizes, using conventional scanning engines (Kaspersky and VirusChief) and the Android-specific scanner, ComDroid.

sizes for a given number of apps are assumed to be the same (e.g., ten small apps would

be 10 × 0.764 MB). Using these total upload sizes, the upload times can be calculated

based on the network speeds specified in the previous section. The upload times for the

different numbers and sizes of apps are summarized in Table 5.13. Using the size and

quantity of each app, the scanning time could then be computed using the equations in

Tables 4.10 and 5.4. These results are summarized in Table 5.14. Finally, referring to

Equation 5.2, it is possible to compute the time required to upload and scan missing

apps under different scenarios.

The best case scenario is when ten small apps are uploaded and scanned over an

ideal WiFi connection. In this case the total operation would take 289.2 seconds, or just

under five minutes. The worst case scenario is one in which fifty large apps are uploaded

and scanned over a typical 3G connection. This operation would take 17351.2 seconds,

or nearly five hours. However, if the same operation is performed over a typical WiFi

connection, the time required to complete this one-time operation drops by more than

half, to 1.95 hours.

5.6.3 Discussion

From both a time and data consumption perspective, Thin AV has a relatively minor

impact on an Android device. Fingerprinting is the only operation that would likely

take place with any frequency during long-term use. Given the best case scenario for the

killswitch, 1 second of computation followed by less than 4 KB of data transmission for

all 110 apps, it is likely that this operation would be unnoticeable to a user. Furthermore,

given that these tests were performed on the Android emulator, the fingerprinting would

almost certainly take considerably less time on a physical Android device.

In terms of data consumption, the 74 MB a month for uploading the fingerprint of

110 apps is not trivial. However, given that cellular carriers offer data plans ranging


from 500 MB a month to unlimited data usage, the impact of Thin AV would represent

a small fraction of a user’s allotted data consumption for a given month. Furthermore,

it would be possible to reduce the amount of data consumed by a large fraction simply

by reducing the frequency with which the killswitch is run, and by removing extraneous

bytes from the messages sent and received by Thin AV.

It should be noted that the above results assumed that Thin AV has already scanned

the packages present on the mobile device. This assumption is reasonable considering

that the killswitch is intended to operate in conjunction with the safe installer. This

means that any app installed on a device that has not been scanned by Thin AV would

be uploaded and scanned at install time, as a consequence, the scan result would already

be present in Thin AV when the killswitch is later run. The one exception to this case

would be if the killswitch is installed on a device after several other apps have been

installed. In this case, the one-time upload and scanning of missing apps must take

place. The worst case performance for this operation is quite poor. Assuming fifty apps

were uploaded, each roughly 3.7 MB in size, sent over a typical 3G connection, it would

take nearly five hours to complete the operation. However, the same operation over a

typical WiFi connection would take less than two hours. Considering that this is a one

time operation, and it is at the user’s discretion when this operation takes place, these

results are not unreasonable. A user could simply initiate the upload over their home or

office WiFi network when their phone is charging.

In general, the long-term performance impact of using the Thin AV killswitch is quite

favorable.


Chapter 6

Discussion

This chapter will discuss some of the broader issues pertaining to Thin AV and the possi-

ble use of Thin AV as a production scale service. For discussions on specific experimental

results see the discussion subsections of Chapter 4. Section 6.1 of this chapter will talk

about the feasibility of Thin AV, specifically where the system succeeds and where it

fails. The different ideal deployment scenarios for both the mobile and desktop versions

of Thin AV will be discussed in Section 6.2. Finally the privacy concerns that would

come along with using Thin AV will be expanded upon in Section 6.3.

6.1 Thin AV Performance and Feasibility

After a thorough evaluation of Thin AV in both a mobile and a desktop environment, it

can be concluded that the desktop prototype of Thin AV is, at best, marginally successful

and in its current form, not highly feasible. Conversely, the mobile prototype, even in its

unpolished state, demonstrates a highly feasible mechanism for protecting smartphones

from malware.

There are two key factors that seriously impact the performance of Thin AV, and

keep it from being a truly feasible concept on the desktop: the size of the input space,

and the frequency of file access. Because Thin AV is not selective about the files it scans,

any file on the stacked file system will be uploaded to Thin AV if it is accessed. This

creates an extraordinarily large input space, only a portion of which is even remotely

predictable. This large input space presents a significant challenge by itself. However,

when combined with the fact that the files uploaded to Thin AV are often accessed several


at a time, and in rapid succession, such as when launching a program, it makes for a

very underwhelming experience for a user.

Fortunately, the aspects of Thin AV that made for slow performance in the desktop

implementation do not exist in the mobile environment. Because virtually all Android

malware comes in the form of malicious applications, the scanner input space is massively

reduced. It is not necessary to scan every individual file access, and even if it were

necessary, it would not be possible without fundamentally violating the Android security

model. Instead only applications need to be scanned. This application-centric design

does create a very different security model than the conventional file scanning model

present in the desktop version of Thin AV. However, enhancing the existing Android

security framework is preferable to violating the framework in the hopes of creating a

more direct comparison with the desktop security model. Furthermore, by opting for an

app-centric approach on Android and a file-centric approach on Linux, this research compares and contrasts two current and realistic scenarios for system security, as opposed to examining

a pair of hypothetical (but more similar) scenarios.

By modifying the Android package installation code, it was possible to check for ma-

licious code in an application before it was installed on the phone. Furthermore, because

of the vastly reduced input space, it is even possible to make predictions about what

applications will be installed, namely, the applications that exist in major application

markets, both official and third-party. Being able to predict which applications will be

installed allows for these apps to be proactively downloaded and scanned, allowing for the

quick return of cached results when performing application checks. This safe installation

mechanism, combined with a background killswitch, effectively work together to prevent

the installation of malicious apps, and prompt the removal of apps if they are found to

be malicious after they have been installed, all with minimal ongoing cost in computing

time and network bandwidth.


Another area in which the mobile version of Thin AV out-performs the desktop version

is in the area of connectivity. While most desktop computers are increasingly moving

towards having persistent connectivity, this is not always guaranteed. When internet

connectivity is unavailable, Thin AV can only function in a passive mode, simply allowing

access to files and not scanning them. It might be possible to offer some protection in this

scenario by logging file accesses for later scanning when an internet connection is available,

but this scenario was beyond the scope of this research. In a mobile environment, this

issue does not exist, a smartphone is by its very nature, intended to be a persistently

connected device. While it is possible to lose data connectivity due to lack of service, this would not be problematic: when a device cannot communicate with Thin AV, it also cannot download packages that would require verification. It might be possible to envision a problematic

scenario where a user downloads an Android application, but does not install it until a

later time when network connectivity is unavailable. While such a scenario does present

a problem for the safe installer, the killswitch would be capable of detecting a malicious

app once connectivity was restored.

6.2 Ideal Deployment

The goal of this research has been to determine if a cloud-based security service would

be feasible or appropriate for providing protection from malware on either a desktop

computer or a mobile device. The performance of the desktop and mobile Thin AV

prototypes suggest that such a system is definitely feasible for mobile devices, and with

significant changes, possibly feasible for desktop systems. Due to time and hardware

limitations, the prototypes that were built are, at best, rough implementations of a much

grander vision. To be truly useful as a mechanism for malware protection, a variety of


changes would have to be made to both the desktop and mobile systems. Section 6.2.1

will discuss the ideal deployment scenario for the desktop version of Thin AV, while

Section 6.2.2 will discuss the mobile version of Thin AV.

6.2.1 Desktop Deployment

It is clear from the performance experiments in Chapter 4 that the desktop version of Thin

AV has some significant performance impediments. There are four areas in which the

performance of Thin AV could be improved, potentially leading to a practical production

deployment.

The most basic change would be the development platform for Thin AV. The proto-

type was built using Python, an interpreted language that is not ideal for performance

intensive tasks. Python was chosen because it allowed for rapid prototyping. Addition-

ally, Python provides a number of feature-rich libraries which greatly increased the speed

of development. However, it is quite likely that some modest performance gains could

be realized by re-developing Thin AV in a compiled language such as C or C++. These

performance gains would be the most noticeable when accessing files in the Thin AV

cache, which is an extremely common occurrence.

The second area for improvement would be to allow the Thin AV client to selectively

filter the files that it sends for scanning. Bayer et al. [36] provides an overview of the

host and network behavior of a large corpus of Windows malware. If a comparable data

set were available for Linux systems, it could be used to inform the development of a

filter for Thin AV. Such a filter would cause Thin AV to be more selective about the files

it scans, rarely scanning files which typically pose a low risk of containing malware, and

more regularly scanning files which do carry such a risk.

Other areas for improving the performance of Thin AV are the network and scanning

performance. CloudAV showed impressive speed when uploading and scanning files [82].


This is not surprising considering that CloudAV was deployed in a university computer

lab. With a limited number of users, accessing a dedicated service over a local area

network, both file transfer times and file scanning times could be minimized.

Thin AV was an attempt to see how well this concept could be extended to a wide area

network. In order to realize a greater degree of performance, some of the benefits of

CloudAV would have to be applied to Thin AV. Most notably, Thin AV could vastly

benefit from running on a dedicated hardware platform with ample resources. In this

scenario, Thin AV would no longer rely on the specific third party scanning services used

in the prototype, but would have a hardware and software configuration much more like

CloudAV. It is not unreasonable to imagine a major anti-virus vendor providing such a

dedicated service to its customers either over the internet, or as a network appliance.

The last area in which Thin AV can improve is, not surprisingly, by operating over

faster network connections. For years, internet connection speeds have been increasing

for both home and business customers. While there are likely limitations to what sorts

of speeds are ultimately feasible, it is safe to say that in the short term, the speed of the

average internet connection will likely increase. Such speed increases can only serve to

improve the performance of Thin AV.

Given the success of Thin AV on the mobile platform compared to the desktop envi-

ronment discussed in Section 6.1, it is worth asking if it would be possible to make the

desktop operating system more like the mobile operating system, so that desktops could

reap the benefits of Thin AV. In general this is not an unreasonable proposition. The

desktop version of Thin AV was built on top of Ubuntu, while the mobile version was

built on Android, both of which are Linux-based operating systems. The key advantage

offered by Android is the application sandboxing which prevents application vulnerabil-

ities from compromising entire systems. Unfortunately, this sandboxing comes with a

price. It limits the interactions that are possible between applications. Android partially


solves this by providing a framework for application interaction. However, such a highly

sandboxed desktop operating system would surely require a major shift in the mindset

of users. Although, in-roads are already being made in this general direction, with Ap-

ple’s introduction of the Mac App Store [3] and Google’s work on Chrome OS and the

complementary Chrome Web Store [88, 6]. Both of these initiatives appear to be moti-

vated by the desire to make a simpler, more application-centric, and more user-friendly

desktop computing experience. However, this paradigm shift might bring with it some

very tangible security benefits.

6.2.2 Mobile Deployment

Unlike the desktop prototype of Thin AV, the mobile implementation is considerably

closer to an effective production scale system. Given the relatively low volume of files

that need to be scanned, it is quite possible to use the existing third party scanning

services in a production capacity. Obviously this presents a variety of challenges, not

least of which is the fact that Thin AV is completely reliant on the continued existence

of these scanning services in order to provide continued protection. However, similar to

the ideal desktop deployment scenario above, there is no reason why an anti-virus vendor

couldn’t provide a remote scanning service to subscribing customers. However, even if

this was not a feasible option, there are still several improvements which could allow

Thin AV to function as a more complete system.

The greatest performance boost to Thin AV would come from having a pre-populated

Thin AV cache. As stated previously, this is a much more realistic expectation on a mobile

device as opposed to a desktop computer. Because application markets (both official and

third-party) will likely continue to be the first stop for users seeking applications, the

apps users will be installing can be pre-scanned by Thin AV. In much the same way that

a large selection of apps were downloaded from the Google Market, and scanned with


VirusTotal for the purposes of evaluating Thin AV, a system could be developed which

would regularly crawl a variety of markets and download new and popular applications

and scan them with the different scanning services. This way, when Thin AV users go

to install these apps, they will already exist within the cache of the service, negating the

need to upload and scan the file, and thus vastly increasing the performance of the safe

installer and the killswitch. Another avenue for pre-populating the cache could also be

application developers. Thin AV could incorporate a tool allowing developers to upload

their application packages as part of the publication process.

Another area in which Thin AV has potential is in the extensibility of the system.

Currently, the addition of a new scanning module does require some very limited code

modification to the main Thin AV system, but it might be possible to modify Thin

AV such that these modifications could be removed, or at least moved to an external

configuration file. This would make it much easier in the future to develop and add

scanning modules for different services. This in turn leads to a compelling scenario, one

in which users, developers, and companies can create their own Thin AV-compatible

scanning modules to interface with their various service offerings, be they application

blacklists, static code analysis tools, application permission analyzers, social reputation

tools, or mobile-specific anti-virus scanners. This could lead to a scenario where Thin

AV is a highly configurable service, and users of the service could configure their Thin

AV clients to specify which scanning modules they want Thin AV to use when uploading

packages via the safe installer or killswitch.
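A minimal sketch of such a configuration-driven module registry is shown below. The module names are real services used by Thin AV, but the fields and size limits (other than ComDroid's 10 MB limit noted in Section 5.4.2) are hypothetical placeholders.

    # Sketch of a configuration-driven module registry for Thin AV. Except
    # for ComDroid's 10 MB limit (Section 5.4.2), the size limits and field
    # names here are hypothetical.
    SCANNING_MODULES = {
        "kaspersky":  {"kind": "anti-virus",      "max_size_mb": 1.0},   # hypothetical limit
        "viruschief": {"kind": "anti-virus",      "max_size_mb": 20.0},  # hypothetical limit
        "virustotal": {"kind": "anti-virus",      "max_size_mb": 32.0},  # hypothetical limit
        "comdroid":   {"kind": "static-analysis", "max_size_mb": 10.0},  # Section 5.4.2
    }

    def select_modules(package_size_mb, user_prefs):
        """Return the user's chosen modules that can accept a package of this size."""
        return [name for name in user_prefs
                if name in SCANNING_MODULES
                and package_size_mb <= SCANNING_MODULES[name]["max_size_mb"]]

    # Example: a client that opts in to ComDroid and VirusTotal.
    print(select_modules(12.5, ["comdroid", "virustotal"]))  # -> ['virustotal']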

The final change that would be necessary in order to fully realize Thin AV on Android

would be the addition of a mechanism that allowed Thin AV to interrupt and prevent

package installations, without modifying the operating system source code. This is a

very challenging requirement, as such a mechanism, if poorly implemented, could do far

more harm than good, by offering a means for malware to prevent the installation of


legitimate applications. A potential solution to this issue might be for Google to allow

applications to use such a mechanism only if the developer of the application is trusted

or in some way certified by Google.

6.3 Privacy

Making use of a third party scanning service carries with it some privacy concerns. While

the mobile version of Thin AV carries some very limited privacy concerns which will be

outlined in Section 6.3.2, it is the desktop version that poses the most serious privacy

concerns. These concerns will be discussed in Section 6.3.1.

6.3.1 Desktop Privacy

Because the files scanned by Thin AV are passed along to third-party scanning services,

users must accept the fact that the information contained within those files can be seen

by the organizations operating the scanning services. The websites for the three scanning

services do not make any mention of what is done with files after they have been scanned.

It is not safe to assume they are destroyed. Rather, operating under the assumption that

these files are saved by the scanning services is likely the best course of action. For many

individuals, the prospect of putting a potentially large amount of private or personal data

in the hands of such an organization is quite discomforting, and may not be permissible

for some individuals and organizations. This is further complicated by the fact that the

desktop implementation of Thin AV communicates directly with the scanning services.

This means that it would be possible for such a scanning service to collate all of the

uploads from a single IP address, and unfortunately, such in-depth information could be

used for extremely nefarious purposes.

Given that in its current form Thin AV cannot offer any sort of guarantees regarding

the privacy of user’s files, it seems that the most appropriate deployment environment


would be one in which there is a reduced expectation of privacy, such as public desktop

computers in libraries and other common areas. If Thin AV were to be deployed with

a dedicated scanning service as was described in Section 6.2.1, then the service provider

could offer some privacy guarantees, making such a security arrangement more palatable.

In such a scenario, individuals or groups might still be reticent to provide so much per-

sonal information to a single organization. However, in recent years users have become

quite tolerant to the idea of putting substantial amounts of personal, even highly private

information into the hands of companies. For example Google has access to the inter-

net searches, e-mail and personal documents of their users who take advantage of their

search, GMail, and Google Docs products. For a time, Google even considered providing

storage for health records [39]. And while Google may be the largest possessor of personal

information, it is hardly alone in this regard. Companies like Facebook have amassed

a wealth of data on their users ranging from private chat logs to personal photographs.

Despite the risk, users have become quite accustomed to willingly giving personal infor-

mation to companies in exchange for a desirable service. Therefore, the privacy concerns

of Thin AV may be serious, but still within the realm of reason for many users.

6.3.2 Mobile Privacy

The privacy concerns that are present in the desktop version of Thin AV are all but

non-existent in the mobile version. Because the mobile version only uploads Android

packages, most of which come from public markets, there is no risk of leaking personal

or private information from a device to the service provider. Furthermore, because the

mobile version of Thin AV has the Thin AV web service which acts as the aggregator

for the various scanning services, it would not even be possible for the scanning services

to collate uploads by IP, because all uploads, regardless of their original source, would

appear to come from the IP of the Thin AV web application.


The last aspect of privacy is the issue of file retention. Again, because only packages

are being uploaded, file retention is not a major concern. However, unlike the three anti-

virus scanning services, ComDroid explicitly states that it does not retain files after

scanning. In general, there are no tangible privacy concerns when it comes to the mobile

implementation of Thin AV.


Chapter 7

Conclusion

This thesis examined the concept of providing anti-malware protection to desktop com-

puters and mobile devices through remote third-party services. Host-based anti-virus is

the conventional answer to the malware problem, on both desktops and even on smart-

phones. However, given the vast amounts of new malware that is created on a daily basis,

these anti-virus systems require perpetual signature library updates. Furthermore, anti-

virus vendors must continually add features to their products in the hopes of standing

out in such a crowded market segment. This has led to an array of functionally similar

anti-virus products that are becoming increasingly bloated and resource intensive. This

problem is even more serious on smartphones, where computational resources are strictly limited by the available battery power.

Within the last decade, and particularly within the last five years, there has been

a push to move computation away from end host computers, and towards more high-

capacity remote computational resources. This has led to the notion of cloud computing.

Fundamentally the cloud is a novel business model layered over the existing concepts of

high capacity grid computing and distributed computing. Cloud computing allows for

software products and services to be offered remotely, and with sufficient capacity as to

effectively eliminate the appearance of resource constraints from the perspective of the

end user. The notion of cloud-based Security-as-a-Service has recently been examined as

a possible way to address the burgeoning malware problem.

The first major contribution of this research was the design and development of

Thin AV, a system for providing anti-virus scanning for Linux based desktop computers

by offloading the scanning of files from the host computer, to a set of pre-existing third-


party scanning services. The design of such a system was beneficial because it reduced

the software footprint on the host computer to a fraction of what it would be if a full-

fledged anti-virus product were installed. Additionally, it allowed for files to be scanned

with several different anti-virus engines, as opposed to a single engine as would be the

case with a host-based system. The key factor that differentiates Thin AV from earlier

cloud-based anti-malware solutions is its reliance on existing scanning services, which are

accessed over the internet, as opposed to making use of dedicated computing resources

located on the same local area network. While the latter case does provide tremendous

performance benefits, it does not accurately represent the performance that one would

see if the service in question were being offered remotely by a third-party.

Thin AV was evaluated by directly measuring the performance of the scanning ser-

vices as well as measuring the performance of the system when executing a series of

scripted user behaviors. These performance measurements were then used to inform the

development of a simulator which was used to test the limits of Thin AV under a variety

of file system behaviors. It was found that in certain cases, the performance of such a

system was acceptable. However, in its current form, the worst case performance was

highly noticeable to the point of being excessively disruptive to the user. However, in the

future it may be possible to address the performance concerns in Thin AV, resulting in a

system that would be capable of performing nearly transparent anti-malware protection

from the cloud.

The second major contribution of this thesis was an extension of the desktop version

of Thin AV, specifically targeted at smartphones and tablets. In recent years, malware on

mobile phones and smartphones has become a major issue, with virtually every smart-

phone platform being affected. The need for addressing the issue of mobile malware

is pressing because of the extent to which smartphones are becoming integrated into

the modern lifestyle. A substantial amount of personal and private information is often


stored on a person’s smartphone, and this presents an extremely tempting target to mal-

ware authors. Given the resource constraints that come with mobile devices, a remote

anti-malware service appeared to be a good fit for addressing mobile security.

The desktop Thin AV system was extended and wrapped in a web application in

order to serve as a unified interface for servicing anti-malware scan requests from Android

mobile devices. Additionally, a system was developed which prevents the installation of

malicious applications, by intercepting application installation requests and sending them

to the Thin AV web application for scanning. This was complemented by a background

killswitch which can prompt the removal of pre-existing applications if they are found to

be malicious. Both of these mechanisms rely on the third party scanning services used

in the desktop version of Thin AV. Because it was determined that the scanning services

are capable of detecting Android malware, this made for a fully functioning system,

capable of preventing the installation of malicious applications, or removing malicious

applications after installation. In order to further demonstrate the extensibility and

modularity of Thin AV, a fourth scanning service, capable of performing static code

vulnerability analysis, was added to the mobile version of Thin AV.

The evaluation of the mobile extension of Thin AV was done by assessing both the

typical and best case run time for both the safe installation mechanism and the killswitch.

This was done by independently measuring the time requirements of various aspects of

each system, and then calculating a range of possible running times based on a set of

empirical measurements of 3G and WiFi network performance. However, because all ex-

periments were run on the Android development emulator, they represent a lower bound

on the actual performance. The evaluation showed the system to be highly practical, with

the only major drawback being the need to manually upload pre-installed packages the

first time Thin AV is run. The reason for the much improved performance of Thin AV on

the smartphone was due to the nature of the malware threat on Android smartphones.


Unlike desktop systems where malware can come in many forms, virtually all Android

malware in the wild is spread through malicious applications. This meant that Thin AV

had to only scan application packages, and not the entire smartphone file system. This

significant reduction in input space, coupled with the fact that it is possible to pre-cache

scan results for packages, meant that it was possible to get anti-malware protection at a

very low cost in terms of running time.

The successful evaluation of Thin AV shows that the concept of providing security

services from the cloud is a real possibility. The fact that Thin AV was built on top of

shared public scanning services shows what can be achieved on the proverbial “shoestring”

budget. Given sufficient dedicated resources, it is quite likely that cloud-based Security-

as-a-Service could become a real alternative for desktop users who are tired of the

ever-increasing size of anti-virus software, or more ideally, smartphone users who want

the confidence to know that the applications they are downloading are free of malware,

without bearing the burden of a full host-based security system.

There are two main avenues for future work pertaining to Thin AV on the desktop.

The first are practical changes and improvements to increase the speed and robustness

of Thin AV. However, in terms of future research, the largest question that still remains

is how transparent anti-malware scanning can be made with the addition of dedicated

scanning resources. It would be interesting to create a Thin AV-like system which is

based not on freely available scanning services, but on private and dedicated systems

for scanning. Previous research has established the feasibility of this arrangement when

the scanning hardware is co-located with the host computers. However, to truly make

malware scanning a service it needs to be seen how much performance is lost when such

a dedicated service is offered over a wide area network. If such a service can be made

effectively transparent to end users, then it might be viable for security companies to

offer subscription based cloud security on the desktop.


Another potential topic for future research which came out of the desktop evaluation

is the notion of characterizing typical desktop software usage patterns. The scripts used

to generate file system activity on the desktop were inspired by a single study that had

been performed previously. Despite an extensive search, there does not appear to be any

major body of research which describes the general patterns of software use on desktop

computers. This is not surprising considering how vague and ill-defined the question

is. Nevertheless, such a study would have greatly aided the evaluation of Thin AV

on the desktop. One possible approach would be to generate a large range of possible

operational profiles, each incorporating numerous different user activities [80]. This might

help in more clearly defining the circumstances in which Thin AV excels.
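
A minimal sketch of that idea follows: an operational profile assigns occurrence probabilities to user activities, and a synthetic workload is drawn according to those weights. The activities and probabilities shown are invented purely for illustration:

    import random

    # Hypothetical operational profile: each user activity is assigned a
    # probability of occurrence (the probabilities should sum to 1).
    profile = {
        "create_file": 0.10,
        "modify_file": 0.25,
        "read_file":   0.60,
        "delete_file": 0.05,
    }

    def generate_workload(n_events, profile, seed=None):
        # Sample a sequence of activities according to the profile weights;
        # a fixed seed makes a given workload reproducible across runs.
        rng = random.Random(seed)
        activities = list(profile)
        weights = [profile[a] for a in activities]
        return rng.choices(activities, weights=weights, k=n_events)

    print(generate_workload(10, profile, seed=1))

Sweeping over many such profiles would expose how Thin AV's overhead varies with the mix of file creations, modifications, and reads, rather than with a single assumed workload.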

The future direction for Thin AV on the smartphone is somewhat less clear. The

most necessary research pertains to the Android operating system itself. Thin AV was

able to interrupt and terminate the installation of malicious apps by modifying the op-

erating system source code. This is not a practical long-term solution. Research must be

undertaken to find a safe and secure way for the operating system to allow applications to arrest the package installation procedure, because such a privileged operation could easily be abused for malicious purposes. While

there are other improvements that can be made to Thin AV to improve its extensibility

and performance, these would only be necessary when considering a production scale

deployment of Thin AV.

Another area of future work for the mobile version of Thin AV would be to assess

the power consumption of Thin AV in comparison to other anti-virus systems available

on the Android market. Due to the relatively small amount of processing and network

traffic generated by Thin AV, it stands to reason that such a security mechanism would

have a minimal impact on the battery life of a device in comparison to other anti-virus

products available for Android. While such a comparison would have been a desirable


addition to this research, it was impractical at this time for two reasons: first, such

an experiment would have required a physical Android device capable of running the

custom operating system developed with Thin AV and, as stated earlier, this is not

trivial as there are numerous compatibility issues involved in replacing the operating

system on a physical device; second, because virtually all of the anti-virus applications

on the Google Market are proprietary, it would not be reasonable to compare their

battery consumption with Thin AV without knowing what sort of processes are actually

taking place within these proprietary systems. However, given that previous research

on cloud-based anti-malware systems has shown mixed results when it comes to power

consumption, it would be highly desirable to assess the power consumption of Thin AV

before a final determination can be made about the suitability of Thin AV for mobile

devices.

This thesis has examined the feasibility of using cloud-based security services to protect computer systems from malware. While the findings of this research show that a cloud-based approach offers many benefits in the fight against malware, it is safe

to predict that this will not be the ultimate solution to the malware problem. The

continually evolving nature of the malware threat virtually guarantees that new systems

and techniques will continually need to be developed. However, for the moment cloud

computing may offer a temporary respite from the storm of malware to which users are

continually exposed.

Bibliography

[1] Android-APKTool. http://code.google.com/p/android-apktool/, Last Accessed: Jan. 2012.
[2] Android Developer Guide. http://developer.android.com/guide/index.html, Last Accessed: Feb. 2012.
[3] Apple Mac app store. http://www.apple.com/mac/app-store/, Last Accessed: Feb. 2012.
[4] AppsLib. http://appslib.com/, Last Accessed: Jan. 2012.
[5] Avira Operations GmbH & Co. KG. http://www.avira.com/, Last Accessed: Sept. 2011.
[6] Chrome web store. https://chrome.google.com/webstore/category/home, Last Accessed: Feb. 2012.
[7] DazukoFS. http://www.dazuko.org, Last Accessed: Sept. 2011.
[8] FileAdvisor by Bit9. http://fileadvisor.bit9.com, Last Accessed: Sept. 2011.
[9] Google market API. http://code.google.com/p/android-market-api/, Last Accessed: Jan. 2012.
[10] Indiroid. https://indiroid.com/, Last Accessed: Jan. 2012.
[11] inotify. http://linux.die.net/man/7/inotify, Last Accessed: Sept. 2011.
[12] Kaspersky free virus scan. http://www.kaspersky.com/virusscanner, Last Accessed: Sept. 2011.
[13] Kaspersky Lab. http://www.kaspersky.com/, Last Accessed: Sept. 2011.
[14] MiKandi. http://www.mikandi.com/, Last Accessed: Jan. 2012.
[15] Nduoa. http://www.nduoa.com/, Last Accessed: Jan. 2012.
[16] Samsung Galaxy S Forums - pro's and con's of installing custom ROMs. http://samsunggalaxysforums.com/showthread.php/7418-Pro-s-and-Con-s-of-Installing-custom-Roms, Last Accessed: Jan. 2012.
[17] SciMark. http://math.nist.gov/scimark2/, Last Accessed: Jan. 2012.
[18] SciMark test descriptions. http://math.nist.gov/scimark2/about.html, Last Accessed: Jan. 2012.
[19] Selenium WebKit. http://code.google.com/p/selenium/, Last Accessed: Sept. 2011.
[20] VirScan. http://virscan.org, Last Accessed: Sept. 2011.
[21] VirusChief. http://www.viruschief.com/, Last Accessed: Sept. 2011.
[22] VirusTotal. http://www.virustotal.com/, Last Accessed: Sept. 2011.
[23] VirusTotal terms of service. http://www.virustotal.com/terms.html, Last Accessed: Sept. 2011.
[24] Bowers v. Baystate Technologies, Inc., 320 F. 3d 1317 - Court of Appeals, Federal Circuit, 2003.
[25] Flask. http://flask.pocoo.org/, Last Accessed: Jan. 2012.
[26] McAfee SaaS endpoint protection suite. http://www.mcafee.com/us/products/saas-endpoint-protection-suite.aspx, Last Accessed: Feb. 2012.
[27] VirusTotal scan result. https://www.virustotal.com/file/7f0aaf040b475085713b09221c914a971792e1810b0666003bf38ac9a9b013e6/analysis/, Last Accessed: Jan. 2012.
[28] Nitin Agrawal, William J. Bolosky, John R. Douceur, and Jacob R. Lorch. A five-year study of file-system metadata. ACM Trans. Storage, 3:9:1-9:32, October 2007.
[29] Jerry Archer, Alan Boehme, Dave Cullinane, Nils Puhlmann, Paul Kurtz, and Jim Reavis. Defined categories of service 2011. Technical report, Security as a Service Working Group, Cloud Security Alliance, 2011.
[30] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. A view of cloud computing. Commun. ACM, 53:50-58, April 2010.
[31] John Aycock. Computer Viruses and Malware, volume 22 of Advances in Information Security. Springer, 2006.
[32] Mark Balanza. Android malware acts as an SMS relay. http://blog.trendmicro.com/android-malware-acts-as-an-sms-relay/, June 2011.
[33] Mark Balanza. Android malware eavesdrops on users, uses Google+ as disguise. http://blog.trendmicro.com/android-malware-eavesdrops-on-users-uses-google-as-disguise/, August 2011.
[34] David Barrera, William Enck, and Paul C. van Oorschot. Seeding a security-enhancing infrastructure for multi-market application ecosystems. Technical Report TR-11-06, Carleton University - School of Computer Science, 2011.
[35] David Barrera, H. Gunes Kayacik, Paul C. van Oorschot, and Anil Somayaji. A methodology for empirical analysis of permission-based security models and its application to Android. In Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS '10, pages 73-84, New York, NY, USA, 2010. ACM.
[36] Ulrich Bayer, Imam Habibi, Davide Balzarotti, Engin Kirda, and Christopher Kruegel. A view on current malware behaviors. In Proceedings of the 2nd USENIX Conference on Large-scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, LEET'09, pages 8-8, Berkeley, CA, USA, 2009. USENIX Association.
[37] Jeffrey Bickford, H. Andres Lagar-Cavilla, Alexander Varshavsky, Vinod Ganapathy, and Liviu Iftode. Security versus energy tradeoffs in host-based mobile malware detection. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys '11, pages 225-238, New York, NY, USA, 2011. ACM.
[38] Jeffrey Bickford, Ryan O'Hare, Arati Baliga, Vinod Ganapathy, and Liviu Iftode. Rootkits on smart phones: attacks, implications and opportunities. In Proceedings of the Eleventh Workshop on Mobile Computing Systems & Applications, HotMobile '10, pages 49-54, New York, NY, USA, 2010. ACM.
[39] Aaron Brown and Bill Weihl. An update on Google Health and Google PowerMeter. http://googleblog.blogspot.com/2011/06/update-on-google-health-and-google.html, Last Accessed: Feb. 2012.
[40] Rich Canning. An update on Android market security. http://googlemobile.blogspot.com/2011/03/update-on-android-market-security.html, Last Accessed: Nov. 2011.
[41] Sang Kil Cha, Iulian Moraru, Jiyong Jang, John Truelove, David Brumley, and David G. Andersen. SplitScreen: enabling efficient, distributed malware detection. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI'10, pages 25-25, Berkeley, CA, USA, 2010. USENIX Association.
[42] Brian Chen. Want porn? Buy an Android phone, Steve Jobs says. http://www.wired.com/gadgetlab/2010/04/steve-jobs-porn/, April 2010.
[43] Brian Chen. Amazon app store requires security compromise. http://www.wired.com/gadgetlab/2011/03/amazon-app-store-security/, March 2011.
[44] Jerry Cheng, Starsky H.Y. Wong, Hao Yang, and Songwu Lu. SmartSiren: virus detection and alert for smartphones. In Proceedings of the 5th International Conference on Mobile Systems, Applications and Services, MobiSys '07, pages 258-271, New York, NY, USA, 2007. ACM.
[45] Erika Chin, Adrienne Porter Felt, Kate Greenwood, and David Wagner. Analyzing inter-application communication in Android. In Proceedings of the 9th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys), 2011.
[46] Mihai Chiriac. Tales from cloud nine. In Virus Bulletin Conference, pages 1-6, 2009.
[47] Byung-Gon Chun and Petros Maniatis. Augmented smartphone applications through clone cloud execution. In Proceedings of the 12th Conference on Hot Topics in Operating Systems, HotOS'09, pages 8-8, Berkeley, CA, USA, 2009. USENIX Association.
[48] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. MAUI: making smartphones last longer with code offload. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, MobiSys '10, pages 49-62, New York, NY, USA, 2010. ACM.
[49] David Dagon, Tom Martin, and Thad Starner. Mobile phones as computing devices: The viruses are coming! IEEE Pervasive Computing, 3:11-15, 2004.
[50] Toralv Dirro, Paula Greve, Rahul Kashyap, David Marcus, François Paget, Craig Schmugar, Jimmy Shah, and Adam Wosotowsky. McAfee threats report: second quarter 2011. Technical report, McAfee Labs, August 2011.
[51] B. Dixon and S. Mishra. On rootkit and malware detection in smartphones. In 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W), pages 162-163, July 2010.
[52] The Economist. Clash of the clouds. http://www.economist.com/node/14637206?story_id=14637206, October 2009.
[53] Marc Fossi (Editor). Symantec report on the underground economy. Technical report, Symantec Corporation, 2008.
[54] W. Enck, M. Ongtang, and P. McDaniel. Understanding Android security. IEEE Security & Privacy, 7(1):50-57, Jan.-Feb. 2009.
[55] William Enck, Damien Octeau, Patrick McDaniel, and Swarat Chaudhuri. A study of Android application security. In Proceedings of the 20th USENIX Conference on Security, Berkeley, CA, USA, 2011. USENIX Association.
[56] William Enck, Machigar Ongtang, and Patrick McDaniel. On lightweight mobile phone application certification. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS '09, pages 235-245, New York, NY, USA, 2009. ACM.
[57] Georgina Enzer. Android attacks on the up says Trend Micro. http://www.itp.net/585773-android-attacks-on-the-up-says-trend-micro, August 2011.
[58] Independent Security Evaluators. Exploiting Android. http://securityevaluators.com/content/case-studies/android/index.jsp, November.
[59] Adrienne Porter Felt, Erika Chin, Steve Hanna, Dawn Song, and David Wagner. Android permissions demystified. In Proceedings of the 18th ACM Conference on Computer and Communications Security, CCS '11, pages 627-638, New York, NY, USA, 2011. ACM.
[60] Adrienne Porter Felt, Matthew Finifter, Erika Chin, Steve Hanna, and David Wagner. A survey of mobile malware in the wild. In Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, SPSM '11, pages 3-14, New York, NY, USA, 2011. ACM.
[61] J. Flinn, D. Narayanan, and M. Satyanarayanan. Self-tuned remote execution for pervasive computing. In Proceedings of the Eighth Workshop on Hot Topics in Operating Systems, pages 61-66, May 2001.
[62] Richard Gass and Christophe Diot. An experimental performance comparison of 3G and Wi-Fi. In Arvind Krishnamurthy and Bernhard Plattner, editors, Passive and Active Measurement, volume 6032 of Lecture Notes in Computer Science, pages 71-80. Springer Berlin / Heidelberg, 2010.
[63] R.L. Grossman. The case for cloud computing. IT Professional, 11(2):23-27, March-April 2009.
[64] Gartner Group. Gartner says sales of mobile devices grew 5.6 percent in third quarter of 2011; smartphone sales increased 42 percent. http://www.gartner.com/it/page.jsp?id=1848514, Last Accessed: Nov. 2011.
[65] Gartner Group. Gartner says cloud computing will be as influential as e-business. http://www.gartner.com/it/page.jsp?id=707508, June 2008.
[66] Mikko Hypponen. The state of cell phone malware in 2007. http://www.usenix.org/events/sec07/tech/hypponen.pdf, August 2007.
[67] IEEE 802.11n-2009. Wireless LAN medium access control (MAC) and physical layer specifications enhancements for higher throughput, IEEE, June 2009.
[68] Markus Jakobsson and Karl-Anders Johansson. Assured detection of malware with applications to mobile platforms. Technical report, DIMACS, February 2010.
[69] Markus Jakobsson and Karl-Anders Johansson. Retroactive detection of malware with applications to mobile platforms. In Proceedings of the 5th USENIX Conference on Hot Topics in Security, HotSec'10, pages 1-13, Berkeley, CA, USA, 2010. USENIX Association.
[70] Markus Jakobsson and Ari Juels. Server-side detection of malware infection. In Proceedings of the 2009 New Security Paradigms Workshop, NSPW '09, pages 11-22, New York, NY, USA, 2009. ACM.
[71] Gregg Keizer. Spike in mobile malware doubles Android users' chances of infection. http://www.computerworld.com/s/article/9218831/Spike_in_mobile_malware_doubles_Android_users_chances_of_infection, August 2011.
[72] Lei Liu, Guanhua Yan, Xinwen Zhang, and Songqing Chen. VirusMeter: Preventing your cellphone from spies. In Engin Kirda, Somesh Jha, and Davide Balzarotti, editors, Recent Advances in Intrusion Detection, volume 5758 of Lecture Notes in Computer Science, pages 244-264. Springer Berlin / Heidelberg, 2009.
[73] Hiroshi Lockheimer. Android and security. http://googlemobile.blogspot.com/2012/02/android-and-security.html, February 2012.
[74] Zachary Lutz. Carrier IQ: What it is, what it isn't, and what you need to know. http://www.engadget.com/2011/12/01/carrier-iq-what-it-is-what-it-isnt-and-what-you-need-to/, December 2011.
[75] Lorenzo Martignoni, Roberto Paleari, and Danilo Bruschi. A framework for behavior-based malware analysis in the cloud. In Atul Prakash and Indranil Sen Gupta, editors, Information Systems Security, volume 5905 of Lecture Notes in Computer Science, pages 178-192. Springer Berlin / Heidelberg, 2009.
[76] P. McDaniel and W. Enck. Not so great expectations: Why application markets haven't failed security. IEEE Security & Privacy, 8(5):76-78, Sept.-Oct. 2010.
[77] Jane McEntegart. Malicious iPhone virus takes control of your phone. http://www.tomshardware.com/news/iphone-virus-botnet-bank-details,9136.html, November 2009.
[78] Peter Mell and Timothy Grance. The NIST definition of cloud computing, September 2011.
[79] Yevgeniy Miretskiy, Abhijith Das, Charles P. Wright, and Erez Zadok. Avfs: an on-access anti-virus file system. In Proceedings of the 13th Conference on USENIX Security Symposium - Volume 13, SSYM'04, pages 6-6, Berkeley, CA, USA, 2004. USENIX Association.
[80] John D. Musa. Software Reliability Engineering: More Reliable Software Faster and Cheaper. AuthorHouse, 2nd edition, 2004, Chapter 2.
[81] Jon Oberheide, Evan Cooke, and Farnam Jahanian. Rethinking antivirus: executable analysis in the network cloud. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Security, pages 5:1-5:5, Berkeley, CA, USA, 2007. USENIX Association.
[82] Jon Oberheide, Evan Cooke, and Farnam Jahanian. CloudAV: N-version antivirus in the network cloud. In Proceedings of the 17th Conference on Security, pages 91-106, Berkeley, CA, USA, 2008. USENIX Association.
[83] Jon Oberheide and Farnam Jahanian. When mobile is harder than fixed (and vice versa): demystifying security challenges in mobile environments. In Proceedings of the Eleventh Workshop on Mobile Computing Systems & Applications, HotMobile '10, pages 43-48, New York, NY, USA, 2010. ACM.
[84] Jon Oberheide, Kaushik Veeraraghavan, Evan Cooke, Jason Flinn, and Farnam Jahanian. Virtualized in-cloud security services for mobile devices. In Proceedings of the First Workshop on Virtualization in Mobile Computing, MobiVirt '08, pages 31-35, New York, NY, USA, 2008. ACM.
[85] A.J. O'Donnell. When malware attacks (anything but Windows). IEEE Security & Privacy, 6(3):68-70, May-June 2008.
[86] M. Ongtang, S. McLaughlin, W. Enck, and P. McDaniel. Semantically rich application-centric security in Android. In Computer Security Applications Conference, 2009. ACSAC '09. Annual, pages 340-349, December 2009.
[87] Sarah Perez. Developer is building an app store for banned Android apps. http://techcrunch.com/2012/01/20/developer-is-building-an-app-store-for-banned-android-apps/, January 2012.
[88] Sundar Pichai. Introducing the Google Chrome OS. http://googleblog.blogspot.com/2009/07/introducing-google-chrome-os.html, Last Accessed: Feb. 2012.
[89] Georgios Portokalidis, Philip Homburg, Kostas Anagnostakis, and Herbert Bos. Paranoid Android: versatile protection for smartphones. In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC '10, pages 347-356, New York, NY, USA, 2010. ACM.
[90] Drew Roselli, Jacob R. Lorch, and Thomas E. Anderson. A comparison of file system workloads. In Proceedings of the Annual Conference on USENIX Annual Technical Conference, ATEC '00, pages 4-4, Berkeley, CA, USA, 2000. USENIX Association.
[91] Jamie Rosenberg. Introducing Google Play: All your entertainment, anywhere you go. http://googleblog.blogspot.com/2012/03/introducing-google-play-all-your.html, March 2012.
[92] Dan Rowinski. More than 50 percent of Android devices still running Froyo. http://www.readwriteweb.com/mobile/2011/09/more-than-50-of-android-device.php, Last Accessed: Jan. 2011.
[93] Neil Rubenking. Lab testing antivirus software. http://www.pcmag.com/article2/0,2817,2358764,00.asp, September 2010.
[94] Alexey Rudenko, Peter Reiher, Gerald J. Popek, and Geoffrey H. Kuenning. Saving portable computer battery power through remote process execution. SIGMOBILE Mob. Comput. Commun. Rev., 2:19-26, January 1998.
[95] Steven Salerno, Ameya Sanzgiri, and Shambhu Upadhyaya. Exploration of attacks on current generation smartphones. Procedia Computer Science, 5:546-553, 2011. The 2nd International Conference on Ambient Systems, Networks and Technologies (ANT-2011) / The 8th International Conference on Mobile Web Information Systems (MobiWIS 2011).
[96] Sharun Santhosh. Factoring file access patterns and user behavior into caching design for distributed file systems. Master's thesis, Wayne State University, Detroit, Michigan, 2004.
[97] James Schlichting. Federal Communications Commission. Google Voice and related iPhone applications. http://hraunfoss.fcc.gov/edocs_public/attachmatch/DA-09-1736A1.pdf, September 2009.
[98] A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz, K.A. Yuksel, S.A. Camtepe, and S. Albayrak. Static analysis of executables for collaborative malware detection on Android. In IEEE International Conference on Communications, 2009, pages 1-5, June 2009.
[99] Aubrey-Derrick Schmidt, Frank Peters, Florian Lamour, Christian Scheel, Seyit Camtepe, and Sahin Albayrak. Monitoring smartphones for anomaly detection. Mobile Networks and Applications, 14:92-106, 2009.
[100] Blake Stimac. Virus alert: Windows Mobile 6.5 virus found. http://www.intomobile.com/2010/04/15/virus-alert-windows-mobile-6-5-virus-found/, August 2010.
[101] Cisco Systems. Demystifying cloud computing: a three-minute tutorial. http://www.cisco.com/web/offer/fedbiz07/july2009/index.html, July 2009.
[102] Deepak Venugopal, Guoning Hu, and Nicoleta Roman. Intelligent virus detection on mobile devices. In Proceedings of the 2006 International Conference on Privacy, Security and Trust: Bridge the Gap Between PST Technologies and Business Services, PST '06, pages 65:1-65:4, New York, NY, USA, 2006. ACM.
[103] Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, and Keying Ye. Probability & Statistics for Engineers & Scientists. Pearson Prentice Hall, 8th edition, 2007, p. 236.
[104] Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Ronald Cramer, editor, Advances in Cryptology - EUROCRYPT 2005, volume 3494 of Lecture Notes in Computer Science, pages 561-561. Springer Berlin / Heidelberg, 2005.
[105] Oli Warner. What really slows Windows down. http://thepcspy.com/read/what_really_slows_windows_down/, September 2006.
[106] Joe Wells. A radical new approach to virus scanning. Technical report, CyberSoft, Inc., 1999.

Appendix A

Appendix

Number of events processed:  99999.7  99998  99994.2  99991.2  99921.8  99330.1  93426.7
Number of files processed:  77691  50000  10000  5000  1000  100  9.5
Mean file size generated:  977.002  976.4713  973.6049  977.5164  971.258  974.5984  927.2371
Median file size generated:  677.3671  676.6826  672.2553  676.6572  659.8002  713.8306  813.637
Max file size generated:  11539.964  10785.0526  9824.4234  9336.1715  7400.5678  5324.6606  2597.6249
Proportion of file modifications:  0  0  0  0  0  0  0
Mean file size scanned:  977.002  976.4713  973.6049  977.5164  971.258  974.5984  927.2371
Median file size scanned:  677.3671  676.6826  672.2553  676.6572  659.8002  713.8306  813.637
Max file size scanned:  11539.964  10785.0526  9824.4234  9336.1715  7400.5678  5324.6606  2597.6249
Un-scanned accesses:  0  0  0  0  0  0  0
Cache Hit Rate:  22.31%  50.00%  90.00%  95.00%  99.00%  99.90%  99.99%
Time for AV scanning (sec.):  2325464.1  1495301.15  298412.16  149908.66  29661.74  3015.74  300.18
Time for non-AV activities (sec.):  6673.1  6663.59  6663.27  6663.91  6657.84  6608.88  6230.54
Total time (sec.):  2332137.21  1501964.74  305075.42  156572.57  36319.57  9624.63  6530.72
AV Overhead:  34848.69%  22440.06%  4479.49%  2249.56%  445.51%  45.63%  4.83%
Average inter-access time:  0.067  0.067  0.067  0.067  0.067  0.067  0.067

Table A.1: Raw data from Figure 4.6.

Number of events processed:  97.8  196.3  490.8  985.6  9896.4  99183.6  983779.3
Number of files processed:  49.8  50  50  50  50  50  50
Mean file size generated:  93.5559  99.2755  90.5267  89.4505  95.0418  92.7249  94.8326
Median file size generated:  70.2959  75.4199  63.887  67.4278  73.7681  65.9585  69.502
Max file size generated:  368.1267  386.5154  395.7792  453.9882  457.671  416.767  408.3396
Proportion of file modifications:  0  0  0  0  0  0  0
Mean file size scanned:  93.5559  99.2755  90.5267  89.4505  95.0418  92.7249  94.8326
Median file size scanned:  70.2959  75.4199  63.887  67.4278  73.7681  65.9585  69.502
Max file size scanned:  368.1267  386.5154  395.7792  453.9882  457.671  416.767  408.3396
Un-scanned accesses:  0  0  0  0  0  0  0
Cache Hit Rate:  49.08%  74.53%  89.81%  94.93%  99.49%  99.95%  99.99%
Time for AV scanning (sec.):  154.6  160.66  152.46  151.55  158.6  174.27  353.18
Time for non-AV activities (sec.):  6.59  13.2  33.48  66.06  659.77  6605.13  65570.5
Total time (sec.):  161.19  173.86  185.94  217.6  818.37  6779.41  65923.68
Average inter-access time:  0.067  0.067  0.068  0.067  0.067  0.067  0.067
AV Overhead:  2352.20%  1221.41%  455.75%  229.57%  24.05%  2.64%  0.54%

Table A.2: Raw data from Figure 4.7.


Number of events processed:  99982.2  99984.4  99980.7  99992  99984.2  99985.9  99982.9  99987.3
Number of files processed:  5000  5000  5000  5000  5000  5000  5000  5000
Mean file size generated:  97197.7499  63435.8064  19433.8632  9866.4089  1962.7876  972.1442  97.612  9.7095
Median file size generated:  102400  67508.1407  13647.1049  6810.7295  1360.4033  674.1827  67.7155  6.6986
Max file size generated:  102400  102400  102400  90559.0626  18711.4329  8820.3111  801.2105  83.4222
Proportion of file modifications:  0  0  0  0  0  0  0  0
Mean file size scanned:  10033.4459  9843.104  8525.1434  6933.636  1961.1926  972.1442  97.612  9.7095
Median file size scanned:  9862.1326  9649.7395  7702.8734  5687.9268  1360.1801  674.1827  67.7155  6.6986
Max file size scanned:  20309.68  20464.6566  20468.9162  20463.6603  17379.1231  8820.3111  801.2105  83.4222
Un-scanned accesses:  93065.7  83882.3  25765.5  4804.2  0.7  0  0  0
Cache Hit Rate:  98.49%  94.14%  95.63%  95.40%  95.00%  95.00%  95.00%  95.00%
Time for AV scanning (sec.):  7838.98  73529.56  275772.28  405529.12  289792.97  148720.84  15924.7  7634.69
Time for non-AV activities (sec.):  6670.36  6667.93  6672.74  6662.6  6664.22  6664.69  6670.02  6657.17
Total time (sec.):  14509.34  80197.49  282445.02  412191.72  296457.19  155385.53  22594.71  14291.86
AV Overhead:  117.51%  1102.73%  4132.78%  6086.82%  4348.45%  2231.50%  238.76%  114.68%
Average inter-access time:  0.067  0.067  0.067  0.67  0.067  0.067  0.067  0.067
Kaspersky %:  6.43%  5.56%  7.82%  10.98%  40.60%  65.25%  100.00%  100.00%
VirusChief %:  45.11%  47.43%  54.72%  63.05%  58.89%  34.74%  0.00%  0.00%
VirusTotal %:  48.46%  47.01%  37.45%  25.96%  0.51%  0.00%  0.00%  0.00%
Unscanned %:  93.08%  83.90%  25.77%  4.80%  0.00%  0.00%  0.00%  0.00%

Table A.3: Raw data from Figures 4.8 and 4.9.


Number of events processed:  99948  99920.4  99898.7  99900.7  99889.5  99873.8  99951.8  99791.3  99744.9  99264.6
Number of files processed:  1000  1000  1000  997.3  883.7  748.5  587.9  451.1  297.8  148.2
Mean file size generated:  980.5748  975.3766  984.9695  977.6461  978.2049  1002.9018  975.5268  996.6749  983.568  1012.0571
Median file size generated:  683.9077  681.8096  686.4618  682.5642  673.7291  701.0202  673.7253  690.6021  671.5941  704.2243
Max file size generated:  6981.1646  7542.5187  7342.4595  7603.4454  6689.2825  6811.0934  6585.222  7263.6021  5928.4191  5517.0901
Proportion of file modifications:  0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
Modifications absolute:  0  9994  19976.1  29872  39945.1  49929.1  60029.1  69893.9  79825.8  89300.4
Mod then Use:  0  8924.9  15801  20759.5  23737.9  24635.9  23678  20689.2  15704.1  8840.6
Mean file size scanned:  980.5748  975.3766  984.9695  977.6461  978.2049  1002.9018  975.5268  996.6749  983.568  1012.0571
Median file size scanned:  683.9077  681.8096  686.4618  682.5642  673.7291  701.0202  673.7253  690.6021  671.5941  704.2243
Max file size scanned:  6981.1646  7542.5187  7342.4595  7603.4454  6689.2825  6811.0934  6585.222  7263.6021  5928.4191  5517.0901
Un-scanned accesses:  0  0  0  0  0  0  0  0  0  0
Cache Hit Rate:  99.00%  88.96%  78.98%  68.93%  58.93%  49.17%  39.22%  29.29%  19.67%  9.79%
Time for AV scanning (sec.):  30140.21  281190.78  563833.05  905131.4  707501.46  941231.74  1039714.32  647313.82  423906.85  317449.67
Time for non-AV activities (sec.):  6659.07  105553.71  204727.92  302953.53  402315.21  503420.99  604099.28  701433.59  798448.29  893453.96
Total time (sec.):  36799.29  386744.49  768560.97  1208084.93  1109816.67  1444652.74  1643813.59  1348747.4  1222355.14  1210903.63
AV Overhead:  452.61%  265.72%  274.64%  298.17%  176.42%  186.90%  172.08%  92.39%  53.12%  35.54%
Inter-access Time:  0.067  1.174  2.562  4.326  6.711  10.08  15.132  23.461  40.085  89.666

Table A.4: Raw data from Figure 4.10.


Number of events processed:  99987.1  99987.3  99988.4  99982.9  99983.4  99981.7  99986.7
Number of files processed:  5000  5000  5000  5000  5000  5000  5000
Mean file size generated:  97.326  97.927  97.349  97.391  97.758  97.338  97.501
Median file size generated:  67.351  67.832  67.711  67.901  68.074  67.328  67.862
Max file size generated:  852.013  865.051  882.332  840.176  862.779  834.322  810.934
Proportion of file modifications:  0  0  0  0  0  0  0
Mean file size scanned:  97.326  97.927  97.349  97.391  97.758  97.338  97.501
Median file size scanned:  67.351  67.832  67.711  67.901  68.074  67.328  67.862
Max file size scanned:  852.013  865.051  882.332  840.176  862.779  834.322  810.934
Un-scanned accesses:  0  0  0  0  0  0  0
Cache Hit Rate:  0.95  0.95  0.95  0.95  0.95  0.95  0.95
Time for AV scanning (sec.):  15897.76  15954.44  15899.91  15903.83  15938.42  15898.87  15914.27
Time for non-AV activities (sec.):  199933617  20046177.2  1996798.71  199981.72  19985.02  1997.55  199.91
Total time (sec.):  199949514.7  20062131.64  20062131.64  215885.55  35923.44  17896.42  16114.18
AV Overhead:  1999.594  200.487  19.97  20.2  0.02  0.002
Average inter-access time:  0.01  0.08  0.8  7.95  79.75  795.93  7960.64

Table A.5: Raw data from Figure 4.11.


Category              Apps  Mean (MB)  Median (MB)  Min (MB)  Max (MB)  % < 1 MB  % < 10 MB  % < 20 MB
Medical                 49       1.96         0.76      0.03     14.35    55.10%     95.92%    100.00%
Tools                   50       1.40         1.00      0.07      8.37    50.00%    100.00%    100.00%
Finance                 47       1.66         1.03      0.02     10.15    48.94%     97.87%    100.00%
Media and Video         50       1.81         1.03      0.02      5.74    48.00%    100.00%    100.00%
Comics                  49       2.34         1.03      0.04     37.06    48.98%     97.96%     97.96%
Productivity            50       1.77         1.37      0.05      7.92    40.00%    100.00%    100.00%
Business                49       2.40         1.39      0.11     13.35    44.90%     95.92%    100.00%
Personalization         48       2.18         1.55      0.03      9.10    41.67%    100.00%    100.00%
News and Magazines      48       1.84         1.58      0.06      6.51    31.25%    100.00%    100.00%
Weather                 50       2.20         1.62      0.02      9.08    40.00%    100.00%    100.00%
Photography             50       3.05         1.79      0.15     12.17    36.00%     94.00%    100.00%
Books and Reference     46       3.17         2.05      0.03     19.11    32.61%     95.65%    100.00%
Shopping                48       2.68         2.14      0.16     11.95    27.08%     97.92%    100.00%
Lifestyle               49       3.26         2.21      0.11     15.73    28.57%     95.92%    100.00%
Music and Audio         49       2.37         2.25      0.11      8.80    24.49%    100.00%    100.00%
Sports                  47       3.54         2.34      0.19     17.00    17.02%     97.87%    100.00%
Travel and Local        49       3.02         2.42      0.12     12.40    22.45%     97.96%    100.00%
Social                  49       2.83         2.43      0.07      8.33    18.37%    100.00%    100.00%
Communication           50       3.27         2.45      0.09     14.71    28.00%     94.00%    100.00%
Health and Fitness      48       4.16         2.93      0.06     21.57    18.75%     91.67%     97.92%
Education               47       4.80         3.56      0.09     28.44    23.40%     89.36%     97.87%

Table A.6: File size characteristics of Android testing dataset.

