UNIVERSITY OF CALGARY
Towards cloud-based anti-malware protection for desktop and mobile platforms
by
Christopher Jarabek
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
CALGARY, ALBERTA
April, 2012
© Christopher Jarabek 2012
UNIVERSITY OF CALGARY
FACULTY OF GRADUATE STUDIES
The undersigned certify that they have read, and recommend to the Faculty of Graduate
Studies for acceptance, a thesis entitled “Towards cloud-based anti-malware protection
for desktop and mobile platforms” submitted by Christopher Jarabek in partial fulfill-
ment of the requirements for the degree of MASTER OF SCIENCE.
Supervisor, Dr. John D. Aycock, Department of Computer Science
Internal Examiner, Dr. Michael E. Locasto, Department of Computer Science
External Examiner, Dr. Behrouz Far, Department of Electrical and Computer Engineering
Date
Abstract
Malware is a persistent and growing problem that threatens the privacy and property
of computer users. In recent years, this threat has spread to mobile devices such as
smartphones and tablet computers. At the same time, the main method for combating
malware, anti-virus software, has grown in size and complexity to the point where the
resource demands imposed by these security systems have become increasingly notice-
able. In an effort to create a more transparent security system, it is possible to move
the scanning of malware from the host computer to a scanning service in the cloud.
This relocation could offer the security of conventional host-based scanning, without the
resource demands involved with running a fully host-based anti-virus system.
This thesis shows that under the right circumstances, malware scanning services pro-
vided remotely are capable of replacing host-based anti-malware systems on desktop
computers, although such a cloud-based security system is better suited to protecting
smartphone users from malicious applications. To that end, a system was developed
that provides anti-malware security for desktop computers by making use of pre-existing
web-based file scanning services for malware detection. This system was evaluated and
found to have variable performance ranging from acceptable to very poor. The desktop
scanning system was then augmented and adapted to serve as a mechanism for identify-
ing malicious applications on Android smartphones. The evaluation of this latter system
showed favorable results, demonstrating that the system is an effective mechanism for
combating the growing mobile malware threat.
Acknowledgements
“No man is an island”; as such, this body of research would not have been what it is
without the help of several individuals. First and foremost, I would like to thank Dr.
John Aycock for his guidance and advice during my studies. His open and approachable
nature made him a pleasure to work with, and this research would not have reached its
full potential without his direction.
I would like to express my gratitude to Dr. Michael Locasto and Dr. Behrouz Far, for
serving on my examination committee. I would also like to thank Dr. William Enck and
Dave Barrera for their advice regarding Android development, as well as Erika Chin and
Adrienne Porter Felt for their assistance with tools for data analysis.
Special thanks also go to my student colleagues, Daniel De Castro and Jonathan
Gallagher, for their company and many enjoyable discussions.
Finally, I would like to thank my family: Chelsey Greene, and Patricia and Jim
Jarabek. Words alone are insufficient to express the scale of my gratitude for the
love, encouragement, and support they have shown me.
Table of Contents
Abstract . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . iii
Table of Contents . . . . . . . . . . . . iv
List of Tables . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . vii
List of Abbreviations . . . . . . . . . . . . viii
1 Introduction . . . . . . . . . . . . 1
1.1 Background . . . . . . . . . . . . 2
1.1.1 The Malware Threat . . . . . . . . . . . . 2
1.1.2 The Cloud . . . . . . . . . . . . 3
1.1.3 Smartphones . . . . . . . . . . . . 4
1.1.3.1 Mobile Malware . . . . . . . . . . . . 5
1.1.3.2 Android . . . . . . . . . . . . 6
1.2 Thesis Contributions . . . . . . . . . . . . 9
1.3 Thesis Outline . . . . . . . . . . . . 10
2 Related Work . . . . . . . . . . . . 12
2.1 Mobile Security and Malware . . . . . . . . . . . . 12
2.2 Cloud Based Anti-Malware . . . . . . . . . . . . 15
2.3 Device Based Mobile Anti-Malware . . . . . . . . . . . . 18
2.4 Non-Device Based Mobile Anti-Malware . . . . . . . . . . . . 20
2.5 Other Lightweight Anti-Virus Techniques . . . . . . . . . . . . 23
2.6 Summary . . . . . . . . . . . . 25
3 System Architecture . . . . . . . . . . . . 26
3.1 Scanning Services . . . . . . . . . . . . 26
3.1.1 Kaspersky . . . . . . . . . . . . 27
3.1.2 VirusChief . . . . . . . . . . . . 27
3.1.3 VirusTotal . . . . . . . . . . . . 27
3.1.4 Other Services . . . . . . . . . . . . 28
3.1.5 Terms of Service . . . . . . . . . . . . 28
3.2 Desktop Thin AV System . . . . . . . . . . . . 29
3.2.1 DazukoFS . . . . . . . . . . . . 31
3.2.2 File System Access Controller . . . . . . . . . . . . 31
3.2.3 Standalone Runner . . . . . . . . . . . . 33
3.2.4 Thin AV . . . . . . . . . . . . 33
3.2.5 Scanning Modules . . . . . . . . . . . . 34
3.2.6 System Circumvention . . . . . . . . . . . . 35
3.3 Mobile Thin AV System . . . . . . . . . . . . 38
3.3.1 Reuse of Existing Thin AV System . . . . . . . . . . . . 39
3.3.2 Android Specific Scanner . . . . . . . . . . . . 41
3.3.3 Safe Installer . . . . . . . . . . . . 42
3.3.4 Killswitch . . . . . . . . . . . . 44
3.3.5 System Circumvention . . . . . . . . . . . . 48
4 System Evaluation - Desktop Thin AV . . . . . . . . . . . . 49
4.1 Scanning Service Performance . . . . . . . . . . . . 50
4.1.1 Testing Protocol . . . . . . . . . . . . 50
4.1.2 Results . . . . . . . . . . . . 53
4.1.3 Discussion . . . . . . . . . . . . 56
4.2 Actual System Overhead . . . . . . . . . . . . 59
4.2.1 Testing Protocol . . . . . . . . . . . . 59
4.2.2 Results . . . . . . . . . . . . 62
4.2.3 Discussion . . . . . . . . . . . . 64
4.3 Predicted System Overhead . . . . . . . . . . . . 65
4.3.1 Testing Protocol . . . . . . . . . . . . 66
4.3.2 Results . . . . . . . . . . . . 68
4.3.3 Discussion . . . . . . . . . . . . 69
4.4 Large Scale System Simulations . . . . . . . . . . . . 71
4.4.1 Testing Protocol . . . . . . . . . . . . 72
4.4.2 Results . . . . . . . . . . . . 73
4.4.3 Discussion . . . . . . . . . . . . 78
5 System Evaluation - Mobile Thin AV . . . . . . . . . . . . 82
5.1 Data Set . . . . . . . . . . . . 83
5.2 Malware Detection . . . . . . . . . . . . 85
5.3 Emulator Performance . . . . . . . . . . . . 87
5.4 ComDroid Evaluation . . . . . . . . . . . . 87
5.4.1 Testing Protocol . . . . . . . . . . . . 89
5.4.2 Results . . . . . . . . . . . . 89
5.4.3 Discussion . . . . . . . . . . . . 91
5.5 Safe Installer Performance . . . . . . . . . . . . 92
5.6 Killswitch Cost . . . . . . . . . . . . 95
5.6.1 Testing Protocol . . . . . . . . . . . . 96
5.6.2 Results . . . . . . . . . . . . 97
5.6.3 Discussion . . . . . . . . . . . . 102
6 Discussion . . . . . . . . . . . . 104
6.1 Thin AV Performance and Feasibility . . . . . . . . . . . . 104
6.2 Ideal Deployment . . . . . . . . . . . . 106
6.2.1 Desktop Deployment . . . . . . . . . . . . 107
6.2.2 Mobile Deployment . . . . . . . . . . . . 109
6.3 Privacy . . . . . . . . . . . . 111
6.3.1 Desktop Privacy . . . . . . . . . . . . 111
6.3.2 Mobile Privacy . . . . . . . . . . . . 112
7 Conclusion . . . . . . . . . . . . 114
Bibliography . . . . . . . . . . . . 120
A Appendix . . . . . . . . . . . . 133
List of Tables
3.1 Thin AV security policy matrix. . . . . . . . . . . . 32
3.2 Speed comparison of the hashing functions available in Python. . . . . . . . . . . . 34
4.1 Kaspersky file scanning performance statistics. . . . . . . . . . . . 54
4.2 VirusChief file scanning performance statistics. . . . . . . . . . . . 54
4.3 VirusTotal file scanning performance statistics. . . . . . . . . . . . 55
4.4 VirusTotal file upload performance statistics. . . . . . . . . . . . 55
4.5 Linear equations for the three scanning services. . . . . . . . . . . . 57
4.6 Activities in the web and advanced workload scripts. . . . . . . . . . . . 60
4.7 Scenarios examined for assessing Thin AV overhead. . . . . . . . . . . . 61
4.8 General characteristics of testing workload scripts. . . . . . . . . . . . 63
4.9 Time to complete the three workload testing scripts while using Thin AV. . . . . . . . . . . . 63
4.10 Refined linear equations for each of the three scanning services. . . . . . . . . . . . 68
4.11 Simulation results of the Kaspersky service for three different activity logs. . . . . . . . . . . . 69
4.12 Simulation results of the VirusChief service for three different activity logs. . . . . . . . . . . . 70
4.13 Comparison of running time and simulation results for Kaspersky service. . . . . . . . . . . . 70
4.14 Comparison of running time and simulation results for VirusChief service. . . . . . . . . . . . 70
5.1 General file size characteristics of the Android test data set. . . . . . . . . . . . 84
5.2 Summary of malware found in the Google Market data set. . . . . . . . . . . . 86
5.3 Android emulator versus hardware performance comparison. . . . . . . . . . . . 88
5.4 Linear equation for the ComDroid scanning service. . . . . . . . . . . . 89
5.5 Summary of exposed communication in Google Market data set. . . . . . . . . . . . 90
5.6 Network speeds used for evaluating the mobile implementation of Thin AV. . . . . . . . . . . . 92
5.7 Thin AV safe installer cached performance summary. . . . . . . . . . . . 93
5.8 Thin AV safe installer uncached performance summary. . . . . . . . . . . . 94
5.9 Linear equations for generating a system fingerprint. . . . . . . . . . . . 97
5.10 Data consumption of Thin AV killswitch over different time periods. . . . . . . . . . . . 99
5.11 Fingerprint generation time for different conditions. . . . . . . . . . . . 100
5.12 Total upload sizes used for calculations of bulk scanning performance. . . . . . . . . . . . 100
5.13 Thin AV killswitch app upload times. . . . . . . . . . . . 101
5.14 Scan times for different numbers of apps. . . . . . . . . . . . 101
A.1 Raw data from Figure 4.6. . . . . . . . . . . . 133
A.2 Raw data from Figure 4.7. . . . . . . . . . . . 133
A.3 Raw data from Figures 4.8 and 4.9. . . . . . . . . . . . 134
A.4 Raw data from Figure 4.10. . . . . . . . . . . . 135
A.5 Raw data from Figure 4.11. . . . . . . . . . . . 136
A.6 File size characteristics of Android testing data set. . . . . . . . . . . . 137
List of Figures
3.1 System architecture for Thin AV. . . . . . . . . . . . 30
3.2 UML Class Diagram for Thin AV. . . . . . . . . . . . 36
3.3 System architecture diagram for the mobile implementation of Thin AV. . . . . . . . . . . . 40
3.4 User interfaces for the Android killswitch. . . . . . . . . . . . 47
4.1 Scan response time for the Kaspersky scanning service. . . . . . . . . . . . 56
4.2 Scan response time for the VirusChief scanning service. . . . . . . . . . . . 57
4.3 Scan response time for the VirusTotal scanning service. . . . . . . . . . . . 57
4.4 Upload response time for the VirusTotal scanning service. . . . . . . . . . . . 58
4.5 Example CDF of simulated files by size. . . . . . . . . . . . 74
4.6 Accesses which involved an uncached file versus Thin AV induced overhead. . . . . . . . . . . . 75
4.7 Number of file system accesses versus Thin AV induced overhead. . . . . . . . . . . . 75
4.8 File size in bytes versus Thin AV induced overhead. . . . . . . . . . . . 76
4.9 File size versus the proportion of accesses scanned by each scanning service. . . . . . . . . . . . 77
4.10 Proportion of file modifications versus Thin AV induced overhead. . . . . . . . . . . . 77
4.11 Average time between file accesses versus Thin AV overhead. . . . . . . . . . . . 78
5.1 Median file size of the Android test data set by category. . . . . . . . . . . . 84
5.2 Response time of the ComDroid service as a function of package size. . . . . . . . . . . . 90
5.3 Fingerprint generation time versus the number and size of packages. . . . . . . . . . . . 98
List of Abbreviations
AIDL . . . . . . . . . . . . Android Interface Definition Language
AJAX . . . . . . . . . . . . Asynchronous JavaScript and XML
API . . . . . . . . . . . . Application Programming Interface
APK . . . . . . . . . . . . Android Application Package File
ARM . . . . . . . . . . . . Advanced RISC Machine
ARP . . . . . . . . . . . . Address Resolution Protocol
AV . . . . . . . . . . . . Anti-Virus
CPU . . . . . . . . . . . . Central Processing Unit
DLL . . . . . . . . . . . . Dynamic-Link Library
DNS . . . . . . . . . . . . Domain Name System
FFBF . . . . . . . . . . . . Feed-Forward Bloom Filter
FSAC . . . . . . . . . . . . File System Access Controller
HTML . . . . . . . . . . . . HyperText Markup Language
HTTP(S) . . . . . . . . . . . . Hypertext Transfer Protocol (Secure)
IP . . . . . . . . . . . . Internet Protocol
IPC . . . . . . . . . . . . Inter-Process Communication
LOC . . . . . . . . . . . . Lines of Code
OS . . . . . . . . . . . . Operating System
RAM . . . . . . . . . . . . Random Access Memory
RISC . . . . . . . . . . . . Reduced Instruction Set Computer
VM . . . . . . . . . . . . Virtual Machine
WEP . . . . . . . . . . . . Wired Equivalent Privacy
XML . . . . . . . . . . . . Extensible Markup Language
Chapter 1
Introduction
Computer malware (malicious software) is a persistent and evolving threat to the privacy
and property of individuals and organizations. As software systems grow in complexity
every year, the opportunities to exploit these systems grow in kind. The most
common technique for identifying and removing malware from computers is anti-virus
software. However, anti-virus products that run on end-user computers have become
increasingly bloated in recent years, as developers push to include features that will serve
to differentiate their product in a competitive marketplace. This software bloat has a
negative impact on the performance of computer systems and on users’ willingness to use
anti-virus products to protect their computer systems. Recently, the idea of offering
security as a cloud-based service has begun to develop. Although there are a
variety of factors motivating the development of cloud-based security, from a customer’s
perspective this shift towards cloud-based security ultimately means that the products
that are currently used to ensure access, confidentiality, and integrity of both data and
computer systems can be replaced with a cloud-based service. Such services are already
being employed by security companies seeking to enhance their existing host-based anti-
virus software with cloud-based features [46].
This thesis aims to show that under the right circumstances, malware scanning ser-
vices provided remotely are capable of replacing host-based anti-malware systems on
desktop computers, although such a cloud-based security system is better suited to pro-
tecting smartphone users from malicious applications. The evidence to support this
thesis comes from the development and evaluation of Thin AV: a light-weight, cloud-
based anti-malware system that was implemented for both Linux desktops and Android
smartphones.
The remainder of this chapter is laid out as follows: Section 1.1 will broadly cover the
background for the main concepts relevant to this thesis. Next, Section 1.2 will detail the
exact contributions made by this thesis. Finally, Section 1.3 will describe the contents of
the remainder of this document.
1.1 Background
This thesis ties together a variety of different topics, including malware (of both the
mobile and non-mobile varieties), cloud computing, and smartphone security. Whereas
Chapter 2 will discuss a wide range of academic research relating these issues, this section
is intended to serve as a general introduction to the relevant topics, and provide the
context for the rest of the work contained within this thesis. The remainder of this
section is outlined as follows: Section 1.1.1 will discuss malware and the threat it poses;
Section 1.1.2 will talk about the concept of cloud computing; and finally, Section 1.1.3
will discuss smartphones, with a special focus on mobile and smartphone malware as well
as a discussion on the Android smartphone operating system.
1.1.1 The Malware Threat
Malware is, in the broadest sense, a computer program that is designed to compromise,
damage, exploit, or harm a computer system or the data residing on it [31]. While
the term “virus” has become somewhat synonymous with malware, this is incorrect, as
computer viruses constitute only a single type of malware. Malware refers to all varieties
of malicious computer programs, which are typically categorized based on the specific
malicious properties the program exhibits. In recent years the creation of new malware
has seen tremendous growth [50], and while malware is created for a variety of reasons,
the most prevalent incentive is financial gain [53].
The most common approach to combating malware is through anti-virus programs.
These are programs that examine the files on a computer and locate files that look or
behave like known malware samples [31]. While there are numerous companies that sell
anti-virus products, and even a few anti-virus products that are given away for free, most
of these products are fairly comparable in their ability to detect malware [82], at least
when it comes to detecting malware that is currently circulating in the wild [106]. This
has led to a scenario where companies have to continually add new features to their anti-
virus products in order to stand out in a crowded marketplace. While these features
may have some security benefit, there is almost always an associated performance cost
[105].
1.1.2 The Cloud
New trends are emerging in computing that may offer a new direction in the fight against
malware. Among these trends is the recent emergence of “cloud” computing. Cloud
computing is not so much a new technology, as it is a new business model for computing.
Cloud computing is the delivery of computation services as opposed to computation
products. These services are typically delivered over a network such as the internet [78]1.
A major motivating factor behind the adoption of cloud computing is the potential
for cost savings [63]. For example, rather than a company providing e-mail to their
employees through their own local e-mail server, they could pay a subscription fee to
a company that provides e-mail services over the internet. This service arrangement
saves the company the cost of buying, maintaining, and administering their own e-mail
server. The procurement, maintenance and potentially the administration of these cloud
e-mail servers is not the responsibility of the company, but rather the responsibility of
the service provider. While the geographic location of the cloud servers is controlled by
1The term “cloud” came about because historically a cloud shape is used to represent the internet in network topology diagrams [101].
the service provider, the location is very much a point of interest to customers, as the
location of a service provider’s cloud servers can significantly impact the performance of
the service, as well as pose significant legal concerns for cloud customers [30].
The concept of cloud computing has its roots in the mainframe computers of a previ-
ous generation, but the technology to actually implement cloud computing really began
to take shape when grid computing and operating system virtualization started seeing
widespread successful applications. The success of these underlying technologies, coupled
with a steady increase in the speed of internet connectivity around the globe, eventually
allowed for computation services to be delivered over the internet [65, 52].
The notion of providing computation as a service can be broken down into a number of
different service categories. Among the most common service offerings are Infrastructure-
as-a-Service, where a company will offer shared hardware resources, Platform-as-a-Service,
for developing and deploying applications, and Application-as-a-Service, which is simi-
lar to the e-mail example above [78]. The notion of offering Security-as-a-Service is a
relatively new concept [29], yet the security company McAfee is already offering a cloud-
based enterprise security service that includes malware protection [26], though the details
pertaining to the architecture of this proprietary system are not publicly available.
1.1.3 Smartphones
Smartphones are fundamentally just mobile phones with some sort of personal computing
functionality. This functionality typically includes the ability to run custom software or
applications, on top of purpose-built operating systems. It is somewhat difficult to specify
the point at which mobile phones started widely being referred to as smartphones, as their
development was simply the result of continual product evolution. However, it is safe
to say that the variety of touch screen devices ushered in by Apple’s iPhone, and later,
Google’s Android devices, can be classified as smartphones. The growth of smartphone
sales has been extremely high, with smartphone sales reaching more than 115 million
devices in the third quarter of 2011 [64].
1.1.3.1 Mobile Malware
Mobile malware is malware that has been written for a mobile device such as a tablet
computer or a mobile phone. The problem of mobile malware has been around for more
than a decade. Even in the pre-smartphone era there was considerable speculation as to
when malware on mobile phones would become commonplace, and what the capabilities
of said malware would be when it arrived [49]. As an emerging platform for malware, there
were many factors that dictated when malware authors would be sufficiently motivated
to begin writing mobile malware in earnest [85]. However, the tremendous increase in
smartphone use [64], coupled with the fact that smartphones increasingly store large
amounts of personal or private information, has been enough to push mobile malware
from a curiosity to a full fledged industry. In recent years the growth of mobile malware
has been dramatic, with F-Secure reporting a nearly 400% increase in mobile malware
between 2005 and 2007 [66], and McAfee Labs recording a doubling of mobile malware
samples between the beginning of 2009 and the middle of 2011. Much like desktop
malware, mobile malware ranges from mildly annoying to extremely insidious, and all
major platforms have been affected [77, 71, 33, 32, 57, 100].
Combating malware is not trivial on high-resource desktop computers, and the re-
source constraints present on mobile devices only increase the challenge of this task. It
is not simply that the processing and storage capacity of a mobile device is less than
that of a contemporary desktop computer; it is also that the uptime of the device is
limited by the available battery power. Thus, excessive computation caused either by
malware or anti-malware code running on the device will shorten the battery life, and
decrease the usefulness of the device [37].
1.1.3.2 Android
Given that a large portion of this research uses the Android operating system, it is worth
discussing Android, as well as the Android security model and some of the issues around
Android security. (For the remainder of this section, unless otherwise stated, please refer
to [2] for details pertaining to the Android operating system.) The selection of Android
as the platform for this study was based on a variety of factors. When comparing the top
smartphone operating systems (Android, iOS, Windows Phone, Symbian and Blackberry
OS), Android is the only mainstream operating system which is open-source, allowing
for modification of the operating system. This, coupled with the rise of Android as a
smartphone platform made it the obvious choice [64].
Android is middleware developed by Google and built on top of Linux. It is targeted
at mobile devices such as smartphones, tablets and e-readers. Like many mobile operating
systems, Android has been designed to provide developers with a rich environment in
which to develop applications (or “apps”) that leverage the available physical hardware.
Android apps are written in Java, but are not executed on a traditional Java Virtual
Machine. Rather, Android includes a high-performance, mobile-specific VM called the
Dalvik Virtual Machine that executes the compiled Android bytecode.
In order to create a secure operating environment, Android implements a high degree
of process isolation between apps. When an app is launched, a new process is created for
that app, owned by a user ID unique to that app. Within this process, a new Dalvik VM
is launched, within which the desired app is run. This process isolation, in conjunction
with Google’s design philosophy of “all apps are created equal”, is highly beneficial from a
security perspective. It means that flaws or exploits in a given app cannot easily result in
access to restricted data, processes or services. For example, a successful buffer overflow
attack on a particular app would only provide access to the files and process owned by
the compromised app [58], as well as any other public files present on the file system.
Another key component of the Android security system is the permissions model
which, broadly speaking, defines what portion of the Android API a given app has access
to, and what actions an app can perform when interacting with another app on the device
[54, 59]. For example, at install time, an application could declare that it requires access
to the internet and the ability to receive SMS messages. Before proceeding with the
installation, the user must approve these permission requests. However, an application
could potentially request a set of permissions that would allow for malicious behavior,
such as creating an application to monitor phone conversations, or track a user’s location
without their knowledge [56].
This permissions model is further complicated by the addition of the inter-process
communication model which provides a mechanism for passing messages and data be-
tween applications, or from the operating system to an app on the device. These messages
are referred to as intents, and these intents can be explicit (app A sends a message to app
B and only app B) or implicit (app A sends a message to any app which supports the
desired operation). Unfortunately, both explicit and implicit intents allow for a scenario
where an app can spoof an intent, in an attempt to gain information from the target
app. Additionally, the latter case creates a scenario where an intent can be intercepted
by a malicious app, bypassing its intended target [45].
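To illustrate the interception scenario, the manifest fragment below sketches a hypothetical malicious app; the component and action names are invented for illustration. By declaring an intent filter that matches an action string another app sends implicitly, this receiver makes itself an eligible recipient of that message, even though it was never the intended target.

```xml
<!-- Hypothetical eavesdropping app: registering a filter for an action
     string that a victim app is known to send as an implicit intent -->
<receiver android:name=".InterceptReceiver">
    <intent-filter>
        <action android:name="com.example.victim.RESULT_READY" />
    </intent-filter>
</receiver>
```

Had the sending app addressed the message explicitly to a named component, this form of interception would not apply, though explicit intents remain subject to the spoofing problem noted above.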
In light of the process isolation enforced by Android, it is becoming increasingly likely
that malware in the conventional sense is being eclipsed by the issue of malicious apps
which are unwittingly installed by a user [55]. These can be applications that ask for a
specific collection of permissions that could enable malicious behavior [56] or applications
which abuse Android’s message passing system for malicious purposes [45].
Apps for Android can be distributed in a variety of ways. The most common way
is via an application market. A market is simply an app that runs on a device and
allows a user to find and install other apps. The feature that differentiates the Android
Market2 model from other market models (most notably, the Apple App Store), is the
fact that developer submissions to the Android marketplace are relatively unregulated.
Submissions do not go through any sort of rigorous quality control checks. Specifically,
apps are not manually reviewed for quality and content prior to release, a hallmark of
the Apple App Store [42, 97]. While on one hand, Google’s marketplace model provides
developers with the ability to quickly take an app from development to deployment, it
also means that developers of malicious apps have fewer obstacles to overcome when
trying to quickly publish their apps to a wide audience. In order to combat this, both
the Google and Apple markets contain a remote “killswitch” that allows not only for
the removal of an app from their market, but also the remote removal of the app from
a user’s device [76]. Additionally, Google has potentially staked the reputation of their
brand on their Market, and so has a vested interest in preventing it from becoming filled
with malicious apps. Therefore, it is not surprising, given their more permissive market
model, that Google has had to actually use their killswitch to remove malicious apps
[40]. Furthermore, Google has very recently announced that due to the spate of malware
on their market, they have developed their own internal anti-malware scanning system
called Bouncer, which performs automated scanning of apps submitted to the market
[73].
Android’s market model is further complicated by the fact that a user does not need
to use the Android Market to install apps. Android allows the installation of apps
downloaded from the web, attached in an e-mail, transferred via USB from another
computer, or downloaded from any number of the third-party app stores that are available
for Android. McDaniel and Enck provide a brief discussion of some of the security
challenges presented in such a multi-market environment [76], arguing that markets by
²As of March 6, 2012, Google has grouped the Android Market together with a number of their other commercial services, creating a new service called Google Play [91]. Any future references to the Google Market or the Android Market refer specifically to the market for Android apps that is now part of Google Play.
themselves do not fail at security, because markets do not claim to provide security. Rather,
the onus is on users to make informed decisions about which apps they install. To
that end, it is suggested that what Android's multi-market ecosystem needs is a level
of automated application certification. Thin AV, the system described in
this thesis, is intended to be a step towards this goal.
While the official Android Market comes with a built-in killswitch for the removal
of malicious apps, the only other high-profile Android market, the Amazon App Store,
does not [43]. Then there are the numerous other, less well known Android application
markets, some of which are targeted at specific geographic regions [10, 15], others that are
targeted at specific hardware platforms [4], while others still are targeted to individuals
with more salacious tastes [14]. There is even a market under development that focuses
specifically on providing apps that have been banned by the official Google Market [87].
As the number of third-party app stores increases, it is likely that some of these
markets will be more interested in the quantity of apps available for download than the
quality of those applications. It is possible that these unofficial application markets will
become significant vectors for malicious applications in the years to come. The mobile
anti-malware system, Thin AV, which is described in Chapter 3, is a step in combating
this malware vector. By combining an install-time application check with a market-
independent killswitch capable of notifying users of malicious apps regardless of their
source, it is possible that these non-Google Market sources can be made safer for mobile
users.
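The kind of install-time check described above can be sketched as a hash lookup against a blacklist maintained by a market-independent service (a conceptual sketch only; the local hash set and function names are hypothetical, not Thin AV's actual protocol, and a real deployment would query a remote service):

```python
import hashlib

# Hypothetical blacklist: in a real deployment this would be a query to
# a remote, market-independent scanning service, not a local set.
EVIL_SAMPLE = b"hypothetical malicious package bytes"
MALWARE_HASHES = {hashlib.sha256(EVIL_SAMPLE).hexdigest()}

def install_time_check(package_bytes: bytes) -> bool:
    """Return True if the package may be installed (hash not blacklisted)."""
    digest = hashlib.sha256(package_bytes).hexdigest()
    return digest not in MALWARE_HASHES

allowed = install_time_check(b"some benign app package")
blocked = not install_time_check(EVIL_SAMPLE)
```

Because the check keys on the package contents rather than on any one market's metadata, the same lookup works regardless of where the app was obtained.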
1.2 Thesis Contributions
The first main contribution of this research is the design and development of Thin AV,
a system for providing anti-virus scanning for Linux-based desktop computers. Thin AV
combines a set of pre-existing, third-party scanning services and offloads the scanning of
files from the host computer to these services. The evaluation of Thin AV found that
performance of the system was highly dependent on the file system activity while the
system was active, but that there were specific instances where the system performed well.
The findings from this research can help to address the performance concerns involved
in cloud-based malware scanning. This could result in a system that would be capable
of performing nearly transparent anti-malware protection from the cloud.
The second contribution of this thesis is an extension of the desktop version of
Thin AV, specifically targeted at smartphones and other mobile devices. The system
was designed and developed for the Android operating system, and the evaluation of the
system showed favorable performance, suggesting that cloud-based anti-malware scanning
may be a very good fit for providing a level of security to mobile devices.
Finally, this research includes a comprehensive examination and summary of the
current body of academic research pertaining to cloud-based security for both desktop
computers and mobile devices, as well as research regarding low-impact anti-malware
techniques which might also be suitable for mobile devices.
1.3 Thesis Outline
This thesis is divided into chapters as follows: Chapter 2 will examine the existing
research in the related fields of mobile malware and cloud-based anti-malware, as well as
research into other lightweight anti-malware systems. Chapter 3 will introduce Thin
AV, the system at the centre of this thesis, with a significant focus on the design and
implementation of both the desktop and mobile versions of Thin AV. Chapter 4 will focus
on the evaluation of the desktop version of Thin AV, while Chapter 5 will deal with the
evaluation of the mobile version. Chapter 6 will discuss the results of the evaluation as
well as the areas in which Thin AV could be improved, giving specific attention to the
privacy implications of Thin AV. Chapter 7 will conclude this thesis.
Chapter 2
Related Work
Most of the research related to Thin AV can be grouped into one of three categories:
security and malware in mobile environments, which will be discussed in Section 2.1,
cloud-based anti-virus systems, which will be discussed in Section 2.2, and mobile anti-
virus systems, which are reviewed in Section 2.3. Section 2.4 contains a review of related
research that can be found in the overlap between these research areas. Section 2.5 will
discuss and critique the work that is relevant to Thin AV, but cannot be clearly classified
into any of these previous areas. Finally, Section 2.6 will conclude this chapter.
2.1 Mobile Security and Malware
The problem of mobile malware has been around for more than a decade. In that time,
the nature of the malware threat has shifted significantly. In the pre-smartphone era,
most malware came in the form of viruses or Trojan horses [66], while in recent years
most malware comes in the form of malicious applications [60]. However, Bickford et al.
have shown the possibility of developing rootkits for a modern smartphone, though their
work did not focus on a well-known smartphone operating system [38]. Additionally, [95]
showed that smartphones are susceptible to more traditional denial of service attacks due
to their lack of firewalls. The same study also raised the possibility of using smartphones
as offensive platforms, though this is less promising due to their limited power.
Porter Felt et al. conducted a survey of malware found in the wild on Android, iOS
and Symbian devices [60]. Their survey found that all instances of malware for Android
devices used application packages as their vector, meaning that users were unknowingly
installing the malware on their device. Interestingly, the only instances of malware on
the iPhone occurred through an SSH exploit in rooted (or “jailbroken”) devices. The
study went on to examine the incentives behind each piece of malware, most of which
were financially based, and outlined a series of practical changes to each of the mobile
platforms to help curb those incentives.
Given the current glut of mobile malware, and the rate at which smartphones are
being adopted, it is clear that mobile security has become a pressing issue. Oberheide
et al. provide an overview of security issues in mobile environments [83]. They point
out that previous approaches to mobile security are either overly entrenched in desktop
security practices, or argue for entirely new paradigms. Oberheide et al. suggest the
truth lies somewhere in between. They discuss five issues that cause security on mobile
platforms to be subtly different than in non-mobile environments: resource constraints,
different attack strategies, different hardware architectures, platform/network obscurity,
and usability.
Enck et al. performed a review of Android application security by developing a tool
for reverse engineering Java code from the compiled Android byte code, then performing
static analysis [55]. The top 1,100 apps from the Android market were downloaded
and analyzed for a host of security flaws and poor programming practices. Enck et al.
found a pervasive misuse of personally identifying information such as phone identifiers
and location information, as well as evidence of poor programming practices such as the
writing of sensitive data to Android’s public centralized log. Fortunately, no evidence was
found of exploits in the Android framework, or the presence of malware in the collection
of analyzed apps. However, given that the apps selected for study were the top apps in
the Android marketplace, this likely resulted in a bias towards higher quality code than
might be found in a broader cross-section of apps.
Chin et al. performed a study of Android inter-process communication (IPC) that is
complementary to the analysis in Enck et al. [45]. Using ComDroid, a custom static code
analysis tool, one hundred of the top Android applications in the Android Marketplace
were examined for vulnerabilities in how they sent and received IPC messages (intents).
Numerous vulnerabilities were identified, as well as several instances of misuse of the
Android framework. These findings motivated a collection of programming best-practice
guidelines for Android programmers.
The same 1,100 apps from [55] were also studied by Barrera et al. with the goal of
understanding how the Android permission model is used in practice [35]. The study
found that the use of Android permissions showed a distinctly heavy-tailed distribution,
with some permissions being employed in most apps (e.g., access to the internet) while
most other permissions were comparatively rare. Ultimately, it was concluded that the
Android permissions model could be improved by sub-dividing certain broad permissions
(e.g., internet access) to provide a more expressive model, while at the same time rarely
used permissions with related functionality could be grouped together (e.g., install /
uninstall applications). The findings of Barrera et al. are also in keeping with those
of Ongtang et al. [86]. Here, various elements of the Android permissions model were
enhanced and modified to accommodate a richer, more expressive set of permissions.
The Android permissions model was also examined by Felt et al. in their study of
overprivilege in applications [59]. By mapping the Android API, it was
possible to determine which API calls required which permissions. Using this permissions
map they were able to build a tool, Stowaway, to examine several hundred Android
applications, finding that almost a third of Android applications over-request permissions.
Additionally, they found that the Android permissions model is severely under-documented,
and in some cases, incorrectly documented.
2.2 Cloud-Based Anti-Malware
The notion of cloud-based malware scanning was first posited in [81], addressed at length
in [82], and was a significant source of inspiration for the creation of Thin AV. The system
described by Oberheide et al. is called CloudAV. It involves running a local cloud service
consisting of twelve parallel VMs, ten of which run different anti-virus engines, and two
running behavioral detection engines. End hosts run a lightweight client (300 LOC in
Linux, and 1200 LOC in Windows) which tracks and suspends file access requests, until
the file has been scanned. The use of several heterogeneous scanning engines dramatically
improved threat detection, with Oberheide et al. claiming a 98% detection rate when
testing with the Ann Arbor Malware Library. Such a high detection rate does increase
the risk of false positives. However, it was found that by requiring at least four of the
scanning engines to flag a file as malware, false positives could be eliminated, while the
overall detection rate only dropped by 4%.
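The four-engine threshold can be expressed as a simple N-of-M vote over per-engine verdicts (a sketch; the engine names and the threshold default are illustrative):

```python
def is_malicious(verdicts, threshold=4):
    """Flag a file only when at least `threshold` engines agree.

    `verdicts` maps engine name -> bool (True = that engine flagged the
    file). Requiring agreement trades a small drop in detection for a
    large reduction in false positives.
    """
    return sum(verdicts.values()) >= threshold

verdicts = {"engine_a": True, "engine_b": True, "engine_c": False,
            "engine_d": True, "engine_e": True}
flagged = is_malicious(verdicts)  # four engines agree
```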
Given that CloudAV was deployed with dedicated scanning servers in a LAN envi-
ronment, the performance impediment from network latency and system load is minimal.
This results in an average file scan time of just over one second. This process is sped up
through the use of caching, which was shown to be highly effective, producing a 99.8%
hit rate with a primed cache. The performance of Thin AV could give an indication as
to how such a remote scanning system like CloudAV would perform over a WAN, where
network latency can be significant.
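The caching that CloudAV relies on can be sketched as a verdict cache keyed by content hash, so a file that has been seen before is never rescanned or retransmitted (the stand-in substring "scanner" below is purely illustrative):

```python
import hashlib

class ScanCache:
    """Cache scan verdicts by content hash, so a file seen before is
    never scanned (or transmitted) a second time."""

    def __init__(self, scan_fn):
        self.scan_fn = scan_fn       # the expensive remote scan
        self.verdicts = {}           # content hash -> bool
        self.hits = 0
        self.misses = 0

    def scan(self, data: bytes) -> bool:
        key = hashlib.sha256(data).hexdigest()
        if key in self.verdicts:
            self.hits += 1
        else:
            self.misses += 1
            self.verdicts[key] = self.scan_fn(data)
        return self.verdicts[key]

# Stand-in scanner: flags anything containing the marker b"evil".
cache = ScanCache(scan_fn=lambda data: b"evil" in data)
cache.scan(b"quarterly report")      # miss: remote scan performed
cache.scan(b"quarterly report")      # hit: served from cache
hit_rate = cache.hits / (cache.hits + cache.misses)
```

With most real workloads touching the same files repeatedly, a primed cache of this shape is what produces hit rates like the 99.8% reported for CloudAV.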
Following their success on the desktop, Oberheide et al. applied their strategy to a
mobile environment [84]. Their results showed a marked reduction in power consumption
and improved malware coverage. However, they failed to provide any information on how
fast their solution operated in the lower-bandwidth / higher-latency mobile realm. Con-
versely, in their examination of the trade-offs between energy consumption and security,
Bickford et al. showed that cloud-based anti-malware scanning is more energy intensive
than host-based scanning when performed on a mobile device capable of running a VM
hypervisor [37]. It should be noted, however, that the latter study was an examination
of cloud-based rootkit detection, not virus detection, and the two implementations differed
greatly. Therefore, the latter is not necessarily a refutation of the results of Oberheide
et al.
A novel extension to cloud-based malware scanning was provided by Martignoni et
al. [75]. They implemented a system wherein suspect executables are uploaded to a
cloud-based analysis engine. The system executes the malware, intercepting the system
calls generated by the execution, and where necessary, passing those system calls back
to the original host. The rationale behind the approach is that most malware behavior-
based detection engines are based on running malware samples in a highly synthetic
environment. Yet often, the malicious characteristics of a piece of code are only triggered
by a very specific processing environment on the target machine (e.g., visiting a specific
banking web site). Like Thin AV, this approach reduces the user’s risk of infection, but
it also provides the scanning service with a much more diverse set of computing
environments in which to test potentially malicious code, thus improving coverage when
seeking malicious behavior in a piece of code. Such a system, implemented in a VM,
would make a compelling addition to other cloud-based anti-malware systems such as
Thin AV or CloudAV.
Jakobsson and Juels described a strategy for malware scanning that also relies on
external computing resources [70]. Their technique allows trusted servers to audit the
activity logs of remote clients in an effort to establish the security posture of the clients.
The trusted servers, in most cases, would be owned and operated by institutions suscep-
tible to malware-based fraud, such as banks. A client-based agent would be responsible
for logging activity on the client such as file downloads and installations. This log file
could then be sent to the trusted server which would then allow the server to decide
whether or not to proceed with the transaction with that particular client.
Jakobsson and Juels claim their technique is secure against log tampering because any
events that could result in a malware infection occur only after the event in question has
been logged, and the log has been locked. However, they do not address the case where
their agent software would be installed on an already compromised machine. Because only
logs are being processed, this technique is well suited to low-powered mobile environments,
where bandwidth is limited. Additionally, because logs and not entire files are being
transmitted, the privacy concerns are somewhat less than those presented by Thin AV,
where whole files are transmitted.
Clone Cloud [47] and MAUI [48] are both systems designed to enhance the processing
capabilities of smartphones by offloading intensive processing to highly resourced cloud-
based servers. The designers of Clone Cloud were the first to envision a system capable
of offloading smartphone malware scanning on to more powerful cloud-based hardware.
However, the ability to perform intensive malware scans was posited as only one of
many possible applications of their approach. It should be noted, though, that the
notion of moving intensive processing from mobile devices on to more powerful servers
predates Clone Cloud by many years [94, 61]; Clone Cloud was simply the first system to apply
this practice to modern smartphones, and the first to consider the potential security
applications of such an approach.
Paranoid Android is an implementation of a cloud-based anti-malware system which
follows very closely on the heels of Clone Cloud [89]. The technique involves replicating
an entire mobile device in a virtualized server-based environment. System calls on the
physical device are recorded, and transmitted to the server where the user’s behavior is
replicated. This allows the server to maintain a faithful copy of the user’s device most of
the time (barring network disruptions). This server-based replica can be scanned using
traditional CPU intensive techniques that would not be feasible on a mobile device. A
major upside of this approach is that once a replica has been established on a server, the
amount of traffic necessary to maintain a consistent state is quite small. The obvious
downside, as with CloudAV and other systems, is the privacy concern involved in replicating a
device which very likely contains personal information. However, such a solution would
be ideal in a highly-managed corporate environment where worker privacy on company
provided devices is not a given.
Finally, private security company BitDefender also developed a cloud-based anti-
malware product [46]. In their solution, they suggest that only the signature-based scanning
portion of the malware scan should be offloaded to the cloud. Their reasoning behind
this is that more than 90% of the size of BitDefender is composed of the static
signature-based scanning engine. Therefore, if the less intensive operations such as heuristic scan-
ning remain on the client, and signature-based scanning is done remotely, then network
traffic can be kept to a minimum. For privacy reasons, they also opt to have users only
upload cryptographic hashes of their files for analysis, only uploading the whole file in
the event that a hash cannot be matched. This is very similar to the approach used by
Thin AV which will be discussed in Chapter 3.
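The hash-first protocol shared by BitDefender's product and Thin AV can be sketched as follows (a conceptual sketch; the in-memory verdict table stands in for a real scanning service's database, and the substring test stands in for a real scan):

```python
import hashlib

class HashFirstService:
    """Hash-first protocol: the client sends only a cryptographic hash,
    and uploads the whole file only when the service has no verdict for
    that hash."""

    def __init__(self):
        self.verdicts = {}        # hash -> bool, held server-side
        self.bytes_uploaded = 0   # bandwidth spent on full uploads

    def scan(self, data: bytes) -> bool:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.verdicts:       # unknown hash: upload file
            self.bytes_uploaded += len(data)
            self.verdicts[digest] = b"evil" in data
        return self.verdicts[digest]

service = HashFirstService()
service.scan(b"A" * 1000)    # first sighting: 1000 bytes uploaded
service.scan(b"A" * 1000)    # second sighting: hash only, no upload
```

The privacy benefit follows directly: a whole file crosses the network only on the first sighting of its hash anywhere in the user population.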
2.3 Device-Based Mobile Anti-Malware
There are a host of anti-malware systems which are designed to run on resource con-
strained mobile devices. VirusMeter is a proposed approach for general malware detection
in a mobile environment [72]. The approach involves detecting malware by monitoring
battery consumption. The assumption is that if the battery consumption of benign be-
havior can be adequately modeled, then deviations from that model will suggest the
presence of unauthorized code. The key issue with their approach is that even the best-
case scenario has more than a 4% false-positive rate, which is high for a malware scanner.
More importantly, their system was prototyped on a comparatively old mobile device,
and it is unclear if their approach would work effectively on a modern smartphone which
typically runs a diverse collection of rich media applications capable of quickly draining
a device’s battery.
Heuristic-based anti-malware scanning is conducive to mobile platforms simply due
to its reduced overhead. The approach in [102] identifies malware based on the pattern of
DLL usage in a program. Venugopal et al. observed that many malware programs share
similar behaviors, and these behaviors are accessed through DLLs. Furthermore, the
spreading mechanisms and targeted exploits of viruses in the mobile domain are different
than those in the desktop domain, so the heuristic methods from the latter domain
cannot be applied to the former. By developing a heuristic system and training it on a
collection of Symbian viruses, they were able to successfully identify 95% of other (non-
training set) Symbian malware, with no false positives. Much like VirusMeter, the most
obvious problem with this solution is that it was developed in a pre-smartphone world.
Smartphones typically now run a diverse, customized collection of mobile applications. In
a software environment where new applications with novel functionality are being released
on a daily basis, this raises questions about the efficacy of such a heuristic technique, or at
the very least, about the rate of false positives in such an environment.
A similar strategy for malware identification on Android-based mobile devices can
be found in [98]. The strategy involves using Linux-based tools to analyze the low-level
function calls of ELF files. They then use various heuristic techniques to classify a file
as malicious or clean depending on the functions being called. They also suggested a
technique for combating infection by having co-located mobile devices collaborate to
identify malware. Prior to their work on Android, Schmidt et al. developed a technique
for instrumenting Symbian and Windows Mobile devices with the intention of recording
user behavior for the purposes of remote anomaly detection [99].
Jakobsson et al. provide a novel technique for mobile malware detection called memory
printing [68, 69]. This is done using a cryptographic function which fills the free RAM
on a phone. The key property of this cryptographic function is that it takes dramatically
longer to compute if the function is configured to use more space than actually exists.
When scanning for malware, legitimate applications in RAM are swapped out to flash
storage. Therefore, when the function is executed, if there is less RAM available than should
exist (due to a piece of malware), the memory-printing function will take much longer
than it would if no malware were present. Jakobsson et al. assert that because malware
can only exist in secondary storage or in RAM, any malware that is not detected in
the RAM scan will be found when secondary storage is scanned. Unfortunately, in this
approach, secondary storage is still scanned via white/black lists, signatures or heuristics,
which, as was pointed out in [83], are not efficient strategies in a mobile environment.
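The space-filling idea behind memory-printing can be illustrated with a toy hash chain (this sketch only shows the deterministic fill-and-fold step; the actual construction in [68, 69] is considerably more involved, and the timing side channel that detects hidden malware is not modeled here):

```python
import hashlib

def memory_print(num_blocks: int, seed: bytes = b"challenge") -> str:
    """Fill `num_blocks` 32-byte blocks with a sequential hash chain and
    fold them into one digest. In the real scheme, a verifier times this
    computation: if malware is occupying RAM, the fill is forced into
    (slow) secondary storage and the response arrives noticeably late."""
    accumulator = hashlib.sha256()
    block = seed
    for i in range(num_blocks):
        block = hashlib.sha256(block + i.to_bytes(4, "big")).digest()
        accumulator.update(block)
    return accumulator.hexdigest()

proof = memory_print(1024)   # deterministic for a given seed and size
```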
Finally, a more preventative approach to mobile malware can be seen in Kirin [56].
Kirin is a system developed from a security requirements analysis of the Android per-
missions system. It checks a collection of rules which, if violated, may indicate that
an application being installed is capable of malicious activity. The study examined 311
Android applications and found ten positives, five of which were false. The key draw-
back of their system is that user intervention is still required to validate positive results.
Unfortunately, Kirin can only prevent the installation of malicious applications obtained
from third-party sources, and not from the official Google Market.
2.4 Non-Device-Based Mobile Anti-Malware
Due to the processing and battery limitations of smartphones, there is a trend in research
towards anti-malware solutions for mobile devices that rely on a remote server, be it a
cloud service, a more conventional centralized server, or even a desktop or laptop computer.
SmartSiren is a centralized malware mitigation solution targeted at smartphones [44].
The system is the first example of decoupling security from smartphones. It is targeted
at the scenario where some smartphones do not have any AV software installed. Smart-
Siren consists of an agent that monitors phone behavior, and a proxy with whom the
agent communicates. The agent monitors general behaviors such as SMS and Bluetooth
traffic, as well as information about the phone, such as the cell towers to which it most
frequently connects. The proxy receives reports from participating agents and aggregates
these reports in an effort to find evidence of misbehaving smartphones. Their two key
techniques for detecting malware are statistical and anomaly monitoring. The former
looks to see if the phone’s capabilities are being used significantly more than would be
expected based on historical data. The latter attempts to identify phone numbers (which
would charge a fee to the phone’s owner) that are being contacted by the malware. As
malware using these phone numbers spreads, calls and messages to such numbers would
gradually rise in frequency. Alternatively, if the malware attempts to spread through the
smartphone’s contact list, fake contact list entries are used to determine if a device is
infected. Upon detection of an infection, the device owner is notified by SMS about the
infection, and how to deal with it. Additionally, individuals on the user’s contact list are
notified that they might be at risk of infection. Finally, all individuals who are signed
up with the SmartSiren service and frequent the same cell towers as the infected device
will be monitored.
One of the strengths of the work is its focus on the privacy of the individual. Re-
ports can be submitted by users, divulging as little personally identifiable information
as possible. There are some key drawbacks. First, it seems much more likely that most
smartphones will not have any AV software, rather than the reverse. It is unclear how
well the system would behave when it is operating on very incomplete information. Fur-
thermore, the concept of statistical monitoring might falter in a high-traffic, high-noise
environment generated by today's rich-media apps. Additionally, the system seems
reliant on having a historical record of clean data against which it can compare current
data to ferret out abnormalities. There is no indication as to how this system would
cope when introduced into a potentially tainted environment. Finally, it is questionable
how well the third-party notification system would work in reality. Years of exposure to
adware have left users wary of seemingly inexplicable automated messages that tell them
their electronic devices are at risk of being infected with malware. It is quite possible
that such messages will be assumed to be spam and ignored. The worst possible sce-
nario would be when users take these third-party notifications as an indication of
an infection on their own device, leading to unnecessary attempts at malware removal.
Dixon and Mishra describe a system which uses a desktop or laptop as the anti-
malware analysis device [51]. Rather than relying on remote cloud-based services or
other network-intensive techniques, their system validates the contents of a mobile device
when it is connected to a user’s computer via USB. File hashes are used to identify files
that have not been analyzed for malware (be they modified or new files), and only the
files corresponding to novel hashes are sent to the desktop for analysis with standard
anti-malware software. In order to combat a sophisticated attacker which could send
false hashes to the validating system, a keyed hashing mechanism would be used, with
the key being provided by the external validating system.
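The keyed-hash defense can be sketched with a standard HMAC: because the validating computer supplies a fresh key, the device cannot replay hashes computed before it was compromised (the function names and key values are illustrative, not those of Dixon and Mishra's implementation):

```python
import hashlib
import hmac

# The desktop supplies a fresh key for each validation session, so the
# device cannot replay hashes it computed before being compromised.

def device_response(key: bytes, file_bytes: bytes) -> str:
    """Computed on the mobile device for each file to be validated."""
    return hmac.new(key, file_bytes, hashlib.sha256).hexdigest()

def desktop_verify(key: bytes, file_bytes: bytes, response: str) -> bool:
    """Recomputed on the (trusted) desktop against a known-good copy."""
    expected = hmac.new(key, file_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)

session_key = b"fresh-key-from-desktop"
tag = device_response(session_key, b"app package contents")
accepted = desktop_verify(session_key, b"app package contents", tag)
stale_tag = device_response(b"old-key", b"app package contents")
rejected = not desktop_verify(session_key, b"app package contents", stale_tag)
```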
A broad vision for hindering the spread of malicious apps can be found in Stratus,
a theoretical system proposed by Barrera et al. [34]. Stratus comprises a
collection of information sources and services, such as developer registries, application
databases and remote application killswitches. The purpose of these entities is to help
provide some of the security guarantees that can be found in a single application market,
but in a multi-market environment. The proposed system was discussed with Android
in mind, both due to the increasing number of application markets available for An-
droid, and also due to the increased prevalence of malicious apps available for Android.
The backbone of the Stratus system is a universally unique application identifier that is
maintained by Stratus and is unique across all application markets.
Stratus is a well-thought-out, but as yet unimplemented, idea. The Stratus system
could potentially provide a framework in which Thin AV could operate. Specifically,
Thin AV could operate in conjunction with either the application databases or the
killswitches. As such, the Stratus system is highly complementary to the goals of Thin AV
in the mobile setting.
2.5 Other Lightweight Anti-Virus Techniques
Miretskiy et al. [79] offer a highly integrated anti-virus solution in Avfs, which makes two
key contributions. In addition to modifying the open source anti-virus system ClamAV
to significantly increase the speed of scanning, they tied their modified ClamAV (called
Oyster) into a stacked file system such that AV scanning is part of the file system, a
technique adopted by Thin AV. This has several advantages. It allows the scanning of
files at the earliest possible moment, as opposed to simply scanning when files are opened
or closed, which introduces a window of vulnerability. Additionally, it speeds up scanning
because scanning is taking place at the kernel level, as opposed to intercepting system calls
or message passing. To this end, Miretskiy et al. claim that their system demonstrates
less than 15% overhead above a standard non-Avfs based system. Finally, by integrating
anti-virus scanning into the file system, scanning is completely transparent to the user
and any infections can be easily quarantined such that no system process can access the
files. Their file system also supports a mode for built-in post-infection forensics. The key
drawback of the system is its reliance on ClamAV, which compares somewhat unfavorably
to commercial AV products [82]. While it is obvious that ClamAV was chosen because it
is open-source, it would have been interesting to see how Avfs would operate when tied
to a proprietary AV system.
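The file-system-integration idea can be illustrated with a toy user-level wrapper (Avfs itself works in a kernel-level stacked file system; this sketch, with an invented substring-based scanner, only shows how scanning at write time closes the window of vulnerability and lets infected files be quarantined from all readers):

```python
# Toy user-level sketch: every write passes through a scan hook before
# any process can read the file back, so there is no window between a
# file appearing and its first scan, and infected files are quarantined
# from all readers.

class ScanningStore:
    def __init__(self, is_malicious):
        self.is_malicious = is_malicious
        self.files = {}           # path -> contents
        self.quarantined = set()  # paths no process may read

    def write(self, path: str, data: bytes):
        self.files[path] = data
        if self.is_malicious(data):   # scanned at the earliest moment
            self.quarantined.add(path)

    def read(self, path: str) -> bytes:
        if path in self.quarantined:
            raise PermissionError(path + " is quarantined")
        return self.files[path]

# Invented substring-based "scanner" standing in for a real engine.
store = ScanningStore(is_malicious=lambda data: b"MALWARE-MARKER" in data)
store.write("/home/user/notes.txt", b"meeting at noon")
store.write("/home/user/dropper.bin", b"...MALWARE-MARKER...")
notes = store.read("/home/user/notes.txt")
```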
A method for improving the speed of conventional malware scanning is provided
by Cha et al. in the form of SplitScreen [41]. Both the number of signatures and the
number of files that need to be scanned are dramatically reduced by using SplitScreen’s
two-pass technique. This is done by performing a first-pass scan of all the files using
a feed-forward Bloom filter (FFBF) with a pre-calculated bit-vector hash of all known
malware samples. The defining feature of any Bloom filter is that it is very fast: it
may produce false positives, but it never produces false negatives. In this way, a system
can quickly be scanned for potential malware. If a possible candidate file is found (i.e.,
possibly infected with malware), the full anti-virus signature needed to positively identify
the malware can be downloaded from an external repository, and these candidate files
are rescanned using a conventional signature-based scanning approach. Because whole
files are not transferred, less user data is exposed and privacy concerns are somewhat
mitigated. Furthermore, Cha et al. claim their technique produces a doubling of scanning
speed and a halving of memory usage. They suggest that this inclines their technique to
being adopted on mobile devices. However, they do not address the worst-case runtime
of their solution, which appears to be worse than a standard AV scan. Furthermore, the
speed of SplitScreen is heavily dependent on cache-based optimization. The study results
show decreasing performance on CPUs with smaller L2 caches, yet the mobile devices
their system is targeted at are not currently endowed with very large caches.
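The first-pass check can be illustrated with a plain Bloom filter. The sketch below is illustrative only: it uses hypothetical signature fragments and omits the feed-forward bookkeeping that distinguishes SplitScreen's FFBF from an ordinary Bloom filter.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: fast membership tests that may produce
    false positives but never false negatives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly a malware fragment": rescan with full
        # signatures; False means definitely not present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

In a SplitScreen-style first pass, the filter would be populated with fragments of all known signatures; only files whose contents trigger a "might contain" hit are rescanned with the full, downloaded signatures.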
2.6 Summary
It is clear that malware is a problem for users of desktop computers. However, this
problem has now spilled over into the potentially lucrative domain of smartphones. Given
that conventional anti-malware strategies have resulted in unwieldy and low-performance
software solutions, significant effort has been dedicated to non-conventional approaches
to malware detection. The notion of using cloud resources to aid in malware detection is
a relatively new concept. However, in reality many of these efforts make use of non-local
resources which do not necessarily conform to the computing-as-a-service model which
defines true cloud computing.
The resource limitations of mobile devices have necessitated new research efforts into
mobile anti-malware strategies, some of which are device based, and some of which
rely on external computing resources. However, to date, no strategy for mobile malware
detection has been shown to be clearly superior. This is why Thin AV is a compelling
avenue of research. If it is possible to provide a reasonable level of protection for desktop
computers and mobile devices using pre-existing shared resources for malware detection,
it may be possible to significantly reduce the computing burden on these devices.
Chapter 3
System Architecture
This chapter presents an overview of the Thin AV anti-malware system, both on the
desktop and mobile platforms. The system was designed to have a modular architec-
ture, with separate modules for the individual scanning services. Section 3.1 will provide
background on the different scanning services that Thin AV uses. Section 3.2 details the
desktop-based implementation of Thin AV. The mobile implementation of Thin AV is
discussed in Section 3.3.
3.1 Scanning Services
The goal of Thin AV was to develop an anti-malware solution that offloads the chore of
scanning to third-party malware scanning services. At present, the system can scan files
with one of three different scanning services that are freely available online: Kaspersky,
VirusChief, and VirusTotal. These services all behave similarly insofar as a user can
upload any type of file (executable, data, etc.) through the website of the service and
receive a report as to any malware that might be contained in that file. Unfortunately,
these scanning services are based on proprietary anti-malware engines, and as such, the
exact details of the engines underpinning these services are very closely held trade secrets.
Therefore, the exact capabilities and limitations of these services with respect to threat
detection are not publicly known. Consumer testing of these anti-virus products can
provide some clue as to their capabilities [93], though in order to be in compliance with
the end-user license agreements (in the United States), these consumer tests must be
limited to black-box testing methodologies [24].
3.1.1 Kaspersky
Kaspersky Lab [13] offers a free service for scanning individual files [12] that are 1 MB
or smaller in size. The service scans files uploaded to the website using Kaspersky’s
proprietary anti-malware engine and returns a diagnosis to the user’s browser.
3.1.2 VirusChief
VirusChief [21] is a multi-engine based malware scanning service with a 10 MB file size
limit. Similar to Kaspersky, users upload files through their browser. Once received,
the file is scanned using up to 13 different scanning engines. Results from each scanning
engine are returned to the user’s browser via AJAX.
3.1.3 VirusTotal
VirusTotal [22] is a multi-engine based scanning service offered by Hispasec, which can
also scan files up to 20 MB in size.1
VirusTotal scans files with 42 different scanning engines, including many of the same
engines found in VirusChief as well as the Kaspersky engine. VirusTotal is also unique in
that in addition to their website, they offer a semi-public API for accessing their service.
Individuals can apply for a key which can be used to call the API for the purpose of
uploading files and retrieving reports. VirusTotal also attempts to increase performance
by storing cryptographic hashes of all uploaded files. That way, if a file has been scanned
previously, the report generated from the previous scan can be returned as opposed to
completely re-scanning the file.
1The file size limit for VirusTotal was raised to 32 MB some time during early 2012. However, all Thin AV evaluation relevant to VirusTotal was completed when the 20 MB limit was still in place.
3.1.4 Other Services
Two additional scanning services were examined for inclusion in Thin AV, but were
deemed to be inappropriate for Thin AV modules. VirScan [20] is a multi-engine web-
based file analyzer. However, the scanning process used by VirScan checks the uploaded
file sequentially, with each of 37 different scanning engines. Although VirScan is designed
in a way that could be used to power a Thin AV module, the sequential scanning results
in extremely poor performance, and for that reason VirScan was not selected for inclusion
in Thin AV.
FileAdvisor [8] by Bit9 was another service considered as a candidate for Thin AV.
However, FileAdvisor is not actually a real-time scanning service. Rather, it is a large
database of malware scan results. Users can upload a file, or simply a cryptographic
hash of a target file, and FileAdvisor will return information regarding previous malware
scans of that file. However, this is contingent upon a matching hash being found in
the FileAdvisor database. This means that novel files will always fail to return results.
Furthermore, although FileAdvisor boasts a database of more than 7 billion files [8], the
database is heavily biased towards Windows files. Finally, this service offering has a limit
of 15 lookups per day, making FileAdvisor a very poor candidate for inclusion in Thin
AV.
3.1.5 Terms of Service
Of the three scanning services included in Thin AV, only VirusTotal provides a terms of
service agreement for their offering. The pertinent language in the agreement as it applies
to Thin AV is that users must “abstain from any activity that could damage, overload,
harm or impede the normal functioning of Hispasec websites” [23]. To a large extent, this
requirement is automatically enforced given that VirusTotal’s API only allows a limited
number of requests from each user over a given time period.
Given that Thin AV was developed as an experimental proof-of-concept, the workloads
produced were comparatively small. Even so, attempts were made, where possible, to
minimize traffic to these scanning services during testing and performance evaluation by
making use of a simulator (discussed in Section 4.3.1) as opposed to actually uploading
files for scanning. Given the configuration of Thin AV on the desktop, any sort of large
scale deployment would be in violation of VirusTotal’s terms of service, and would surely
draw the ire of Kaspersky Lab and VirusChief. However, the configuration of the mobile
version of Thin AV is considerably less taxing on the third-party services; as such, it
might be possible to run a production-scale deployment of the mobile version of Thin
AV without violating VirusTotal’s terms of service, or overly taxing Kaspersky Lab and
VirusChief.
3.2 Desktop Thin AV System
The desktop-based implementation of Thin AV was written in Python 2.7 and deployed
on Ubuntu Linux 11.04 running a modified Linux kernel. The kernel which was originally
packaged with this version of Ubuntu (2.6.39.0) was replaced with version 2.6.36.4 so as
to be compatible with the stackable file system discussed in Section 3.2.1. The hardware
platform for development and testing of Thin AV was a laptop with an Intel Core i5
M460 CPU (2.53GHz) and 4 GB of RAM. The native operating systems were Windows
7 Professional SP1 64-bit, and Ubuntu 11.10. The desktop version of Thin AV was
developed and tested on Ubuntu which was deployed in a virtual machine running in
VMWare Player in Windows 7.
The anti-malware system has three major components (Figure 3.1): the DazukoFS
[7] stacked file system, the file system access controller, and the Thin AV anti-malware
scanner. The former, DazukoFS, was originally developed by Avira Operations GmbH
& Co. KG [5], before being made freely available under a BSD license. The latter two
components are original developments. In addition to these components, a testing program
(standalone runner) was developed which provided access to Thin AV independent
of DazukoFS and the file system access controller. Each of these components will now
be described in detail.
[Figure 3.1 depicts the Thin AV architecture. In user space, the standalone runner and the file system access controller (via the DazukoFS API) call into Thin AV, which consults the Thin AV cache and dispatches to the Kaspersky, VirusChief, and VirusTotal modules; each module communicates with its corresponding third-party scanning service. In kernel space, the DazukoFS stacked file system sits on top of the Linux file system.]
Figure 3.1: System architecture for Thin AV
3.2.1 DazukoFS
DazukoFS is a stackable file system that allows user space programs to perform file access
control at the kernel level. For the purposes of this work, version 3.1.4 of DazukoFS was
used. This version of DazukoFS installs as a kernel module for version 2.6.36 of the
Linux kernel. In addition to the kernel module, DazukoFS also provides an API library
for interacting with the mounted file system.
DazukoFS can be mounted on top of any directory in the Linux file system with the
exception of the root directory. Once mounted, all file access requests that take place
in that directory (or any subdirectories) will be intercepted by DazukoFS. A user space
program (using the DazukoFS API library) is then responsible for permitting or denying
access to the requested file. In the case of Thin AV, if a file is deemed to contain malware,
the file access operation is terminated and the user is informed that the file access was
not permitted.
3.2.2 File System Access Controller
The file system access controller is a user space program that runs in the background
and is responsible for allowing or denying file access requests. The controller for Thin AV
was written in Python, with the CTypes module providing access to the DazukoFS API
library. The controller creates an instance of the thinAv class, which in turn is responsible
for telling the controller whether or not the file being accessed contains malware.
There are currently five different infection statuses. All three services can specify that
a file is either clean or infected, meaning that the service returned a conclusive result.
However, the API powering the VirusTotal module demanded the inclusion of three
additional statuses: waiting, postponed, and questionable. “Waiting” indicates that a
file has been uploaded, but that the service must be checked later for the completed
report. “Postponed” means that an upload was denied because more than 20 files have
32
Scan Result Permissive Restrictive PassiveClean
√ √ √
Infected√
Questionable√
Waiting√ √
Error / Postponed√ √
Table 3.1: Thin AV security policy matrix. A check mark indicates whether file accessis allowed under each security policy for a given scan result.
been uploaded via the API in the last 5 minutes. Finally, because VirusTotal scans with
more than 40 anti-malware engines, there is an increased risk of a file falsely being labeled
as infected. Therefore, a threshold value is used to alleviate some of the false positives.
If the number of scanning engines indicating a file is infected is less than four, but more
than zero, the file will be labeled as “questionable”. For the desktop implementation of
ThinAV the threshold value is hard-coded.
Once the file infection status has been returned to the controller, a determination must
be made as to whether or not to allow access to the file. This determination is based on
the security policy of the access controller, which is set at run-time via command line
argument. There are three policies implemented in Thin AV: permissive, restrictive, and
passive. The permissive policy will allow access to any file that is not explicitly infected
or questionable (possibly infected). The restrictive policy will only allow access to files
that are explicitly labeled as clean. Finally, the passive policy never prevents access to a
file; rather, it will simply alert the user to the presence of any malware via the terminal.
Table 3.1 outlines the various security policies. It should be noted that these policies
apply only to the desktop version of Thin AV, and not to the mobile version. The reasons
for this will be discussed in Section 3.3.3.
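The policy decision summarized in Table 3.1 can be sketched as follows; the function and status names here are illustrative, not Thin AV's actual code.

```python
# Which scan statuses permit file access under each policy (Table 3.1).
ALLOWED = {
    "permissive": {"clean", "waiting", "error", "postponed"},
    "restrictive": {"clean"},
    "passive": {"clean", "infected", "questionable", "waiting",
                "error", "postponed"},
}

def allow_access(status, policy):
    """Return True if a file with the given scan status may be accessed
    under the given security policy."""
    if policy == "passive" and status != "clean":
        # The passive policy never blocks access; it only warns.
        print("Warning: file scan status is %r" % status)
    return status in ALLOWED[policy]
```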
3.2.3 Standalone Runner
A small user space Python script was developed which is capable of instantiating a thinAv
class and scanning a file, independent of the file system access controller, or DazukoFS.
This was developed for the purpose of debugging and performance evaluation (Section
4.1) of the Thin AV system, as well as on-demand scanning of individual files.
3.2.4 Thin AV
The Thin AV Python program is the core of the anti-malware scanning system (Figure
3.2). The thinAv class possesses a scan function, which takes a filename, and optionally,
a file descriptor and the name of a specific scanning service, and returns a status code
indicating whether or not the named file is free of malware. This status code is the result
of either an online scan of the file, or a successful search through the local Thin AV cache.
The Thin AV local cache is simply a flat file of previous scan results, which is read
after a scanning module is instantiated. Prior to uploading a file to one of the scanning
services, the local cache is first checked. This way if a file has already been scanned, a
quick local lookup is all that is necessary to allow access to the file. The cache contains
an MD5 hash of the file being analyzed, the full path to the file, the number of times
Thin AV has been asked to analyze the file, the last time such an access has occurred, the
infection status of the file, a note for additional scan details, and the module that was
used when the file was analyzed. Because files are identified by their MD5 hash, Thin AV
avoids having to track and compare file modification times to determine if the file (in its
current form) has already been scanned. MD5 was chosen, in spite of its flaws [104], as the hash
function for file identification in Thin AV. This is primarily due to the speed of MD5
when compared with other hashing functions available in the Python hashing module
(Table 3.2). However, given that VirusTotal supports a variety of hashing functions,
changing Thin AV to use an alternative hashing function would be trivial.
                           MD5           SHA1          SHA256        SHA512
Time (seconds)          2.37 × 10−3   9.12 × 10−3   2.40 × 10−2   5.04 × 10−2
Overhead compared to MD5    N/A        384.66 %      1013.45 %     2125.71 %

Table 3.2: Speed comparison of the hashing functions available in the Python hashlib module. Speeds are based on the average time required to hash a 1 MB file of pseudo-randomly generated data with each hashing function. The average time is the result of 10 trials on the hardware described in Section 3.2.
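The cache-lookup path described above can be sketched as follows. The record layout is simplified here to a comma-separated line with the hash first, and the pluggable algorithm argument illustrates how trivially MD5 could be swapped for another hashlib function.

```python
import hashlib

def file_digest(path, algorithm="md5"):
    """Hash a file's contents in chunks; swapping MD5 for, say,
    SHA-256 is a one-argument change."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def cache_lookup(cache_path, digest):
    """Search a flat-file cache (one comma-separated record per line,
    hash field first -- a simplification of the fields listed above)
    and return the matching record, or None on a cache miss."""
    with open(cache_path) as cache:
        for line in cache:
            fields = line.rstrip("\n").split(",")
            if fields and fields[0] == digest:
                return fields
    return None
```

On a cache hit, the stored infection status can be returned without contacting any scanning service; on a miss, the file must be uploaded for scanning.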
In order to determine whether or not a file contains malware, thinAv must instantiate
a scanning module. At present, the desktop version of Thin AV has four scanning modules,
one for each of the three scanning services described in Section 3.1, and a simulator
module which is used for performance evaluation purposes (Section 4.3.1), and does not
actually scan files for malware. All of the scanning modules inherit core functionality
from the thinAvParent class, which is responsible for providing functions for interacting
with the local cache, and uploading multipart/form-data via HTTP POST requests.
At present, the choice as to which module will be used when scanning is based on
the average amount of time each module takes to scan a file, with the fastest service
(Kaspersky) being selected first, followed by VirusChief and finally VirusTotal. If any
scanning module returns an error from an attempted online scan, then the next module in
the priority sequence is selected. If all three scanning modules fail, a general error code is
returned to the calling program (be it the file system access controller or the standalone
runner). The performance measurements which formed the basis of this decision are
discussed in Section 4.1.
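The priority-with-fallback behaviour can be sketched as below; the module list, scanner callables, and error code are assumptions for illustration, not the actual thinAv implementation.

```python
GENERAL_ERROR = "error"  # assumed general error code

def scan_with_fallback(path, modules):
    """Try each scanning module in priority order; fall through to the
    next module on an error, and return a general error code only if
    all modules fail."""
    for name, scan in modules:
        try:
            result = scan(path)
        except IOError:
            continue  # service unreachable: try the next module
        if result != GENERAL_ERROR:
            return name, result
    return None, GENERAL_ERROR
```

With the ordering described above, the list would be Kaspersky first, then VirusChief, then VirusTotal.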
3.2.5 Scanning Modules
Thin AV accesses the Kaspersky scanning service by simply constructing an HTTP POST
request with the appropriate fields, and searching the body of the HTTP response for
text strings which indicate whether Kaspersky has deemed the file to be clean or infected.
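The substring-matching approach can be sketched as follows; the marker strings below are hypothetical stand-ins for whatever text Kaspersky's HTML response actually contains.

```python
# Hypothetical marker strings; the real module would search for the
# exact phrases returned by the Kaspersky web service.
INFECTED_MARKER = "infected"
CLEAN_MARKER = "no infections found"

def parse_scan_response(body):
    """Classify a scan-service HTML response by substring search, as
    the Kaspersky module does. Returns 'infected', 'clean', or 'error'
    if neither marker is present (e.g., the page layout changed)."""
    text = body.lower()
    if INFECTED_MARKER in text:
        return "infected"
    if CLEAN_MARKER in text:
        return "clean"
    return "error"
```

A clear weakness of screen-scraping like this is fragility: any change to the service's response text silently breaks the module, which is one reason a documented API such as VirusTotal's is preferable.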
Using VirusChief for scanning in Thin AV is slightly more complicated than the
method used with Kaspersky. First, VirusChief checks for a cookie prior to scanning;
this means that Thin AV must initiate an HTTP GET request to the service in order to
procure a valid session ID. The file of interest is then uploaded via HTTP POST, along with
the session ID, and a report ID is returned. Because results are returned asynchronously
via AJAX, Thin AV polls the service once every second to check for scan results. Once
at least four scan results have been returned, the results are parsed, and the scan result
is returned.
Because VirusTotal provides an API for interacting with their scanning service, the
corresponding Thin AV module is somewhat different from the other scanning modules.
VirusTotal caches scan results, so prior to uploading a file, the hash of that file can be
checked against the VirusTotal database. If a match is found, the full report for that
file will be returned. If a match is not found, then the file can be uploaded, in full,
via an HTTP POST request, returning a report ID which can be used to look up the
scan report once it has been completed. Unfortunately, scan requests from the API are
given the lowest priority by VirusTotal, making the response time for a file scan highly
unpredictable. Although it would be possible to simply have the Thin AV module wait,
and periodically poll VirusTotal for the report, this would result in impractically long
wait times. Therefore, if a report is not immediately available for a file, the module will
return a status code indicating it is waiting for a result. Finally, the VirusTotal API only
allows 20 uploads every 5 minutes. If Thin AV exceeds this maximum, the module will
return a status code indicating that VirusTotal is temporarily unavailable.
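A client-side sketch of how a module might track the 20-uploads-per-5-minutes quota before reporting the temporarily-unavailable status; this bookkeeping is an illustration, not VirusTotal's or Thin AV's actual code.

```python
import time
from collections import deque

class RateLimiter:
    """Client-side tracker for an N-requests-per-window quota, such as
    VirusTotal's 20 uploads per 5 minutes."""

    def __init__(self, max_requests=20, window_seconds=300):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps = deque()  # times of recent uploads

    def try_acquire(self, now=None):
        """Return True if an upload may be attempted now; on False the
        module should report the 'postponed' status instead."""
        now = time.time() if now is None else now
        # Discard uploads that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```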
3.2.6 System Circumvention
In the current implementation of Thin AV, there are a number of security holes that
would have to be addressed were a production-scale system to be implemented. The two
Figure 3.2: UML Class Diagram for Thin AV.
most obvious avenues for attacking Thin AV are via a man-in-the-middle attack, and by
attacking the Thin AV local cache.
Of the three scanning services, only VirusTotal allows users to upload files via HTTPS.
This means that all traffic sent between Thin AV and both Kaspersky and VirusChief is
sent in the clear (unencrypted). It is possible that an attacker might be able to intercept
this traffic by taking advantage of weakly secured (WEP) or public Wi-Fi, or by using an
attack such as ARP cache poisoning. If successful, an attacker could modify the results
returned by the scanning services to indicate that the uploaded file is free of malware,
when this might not be the case. Unfortunately, the only solution to this attack is to
rely on communication via HTTPS, which is a decision that rests in the hands of the
scanning service providers.
Additionally, because the Thin AV local cache is implemented as an unencrypted text
file, it is a ripe target for any potential malware. If a piece of malware could edit the
local cache, then subverting Thin AV would be as trivial as flagging known malicious
files as clean. One possible solution to this problem would be to encrypt the Thin AV
local cache file when it is not being written or read. Any changes to the encrypted file
would then result in an inability to correctly read the cache file, which would indicate
the presence of malware on the system. Unfortunately, key management under such a
scenario would still be problematic.
One possible, though highly improbable, attack on Thin AV would be to take advantage
of the lack of collision resistance in the MD5 hashing algorithm used by Thin AV
to identify files. Given that it is possible to construct two distinct inputs which produce
the same MD5 hash [104], it is possible, though extremely unlikely, that an attacker
could construct a piece of malware that, when hashed, produces the same output as
a previously scanned benign file. Thin AV would then mistake this piece of malware
for a previously scanned file, and allow the execution of the file. As mentioned earlier,
changing the hash function used by Thin AV would be a trivial fix for this flaw.
A more problematic issue is that of file size. Because the largest file that can be
scanned by Thin AV is 20 MB (using the VirusTotal service), any files in excess of
20 MB will be ignored by Thin AV. As such, an attacker could simply write a very
large piece of malware in order to infect a system.
Finally, given that Thin AV is intended to operate only on files that a normal
(non-root) user can edit (e.g., files in their home directory), any malware which makes use
of a privilege escalation exploit, such as a rootkit, could potentially circumvent Thin AV
in a variety of ways, from modifying the cache file (as mentioned above), to replacing the
Python interpreter with a corrupted executable, thereby subverting Thin AV entirely.
However, given that this work is focused on the feasibility of a lightweight
cloud-based anti-malware system, defending against such attacks is beyond the scope of
this work.
3.3 Mobile Thin AV System
The mobile version of Thin AV was implemented on the Android platform. As mentioned
in Chapter 1, the decision to use Android was based on both the modifiability and the
widespread adoption of Android.
Because of the application isolation created by assigning each application a unique
user ID and process, the key threat to the Android user comes not from
traditional vectors for malware such as drive-by-downloads and application exploits, but
rather from malicious apps that are unwittingly installed by a user [60]. In order to
combat this threat, Thin AV on Android is application-centric and not file-centric (as
on the desktop). Specifically, the mobile version of Thin AV is focused on non-system
applications, that is, applications which were installed on the device after it was released
by the manufacturer or carrier. This was done for two reasons: first, unless a user has
“root” access on a device, the uninstallation of system apps is not possible; second, it
seems unlikely that a manufacturer would intentionally install a malicious application on
their product.2
2Recent events have shown this might not necessarily be the case [74].
The implementation of Thin AV for Android is an extension of the desktop scanning
system. The top-level portion of Thin AV as outlined in Figure 3.1 was, with minimal
modification, re-tasked to act as a unified front-end and caching mechanism for a
web-based scanning service used by the Android implementation. This web-based scanning
service is then used in two different ways by the Android device: first, the “safe installer”
provides a way of verifying Android applications (APKs) during installation, and second,
the “killswitch” informs users when already installed applications have been found to be
malicious. Figure 3.3 shows the overall system architecture of Thin AV for Android. Each
of the key components of the mobile Thin AV system will be described in the following
subsections.
All Android development for Thin AV was done on the same hardware platform
described in Section 3.2 and deployed on a virtualized Android device running version
2.3.7 of Android, also referred to as Gingerbread. This version of Android was selected
for development because 2.3.7 was the most up-to-date version of Gingerbread before the
project was forked and Android 3.0 (Honeycomb) was developed specifically for tablets.
The platform changes in Honeycomb and Gingerbread were merged back into a unified
platform in version 4.0 (Ice Cream Sandwich) which, at the time of writing, is the most
current version of Android. Ice Cream Sandwich was released in November of 2011, and
as such, deployment of this version is extremely limited, while Gingerbread constitutes
a large portion of the Android install base [92]. The availability of documentation,
examples, and the pervasiveness of Gingerbread, made it the ideal version for Android
development.
3.3.1 Reuse of Existing Thin AV System
The existing Thin AV system from the desktop implementation was modified to serve as
a unified scanning service for the mobile implementation of Thin AV. This was beneficial
because not only did it build upon the work which had already been completed for the
desktop version, but it also provided two key benefits to the system architecture. First, it
minimized the amount of code running on the Android device, and second, it put a layer
of abstraction between the Android device and the third-party scanning services. This is
beneficial because any changes to the scanning services can then be handled by modifying
the Thin AV code running on the server, while the code running on the Android device
would not require an update.

[Figure 3.3 depicts the mobile architecture. Application sources (the official Google Market, third-party markets, web sites, e-mail, and USB) deliver packages to the Android device, where the Thin AV safe installer (hooked into the Package Installer) and the Thin AV killswitch query the Thin AV web interface. Behind that interface, the desktop Thin AV implementation, with its cache and the Kaspersky, VirusChief, VirusTotal, and ComDroid modules, mediates access to the third-party scanning services.]
Figure 3.3: System architecture diagram for the mobile implementation of Thin AV.
A web front-end was created using Flask 0.8 [25], a web application micro-framework
for Python. The web application creates a simple HTML form capable of receiving both
HTTP GET and POST requests. A GET request is sent in one of two circumstances: first,
if the Android Safe Installer attempts to check the cryptographic hash of a package
prior to installation, the application will return a scan result if such a result exists (this
will be discussed in greater detail in Section 3.3.3). Second, if the Android killswitch
sends a system fingerprint, the web-application will return a list of cryptographic hashes
for applications which have been found to contain malware (this will be discussed in
greater detail in Section 3.3.4). A POST request is sent in the circumstance where the
cryptographic hash for an Android package was not found in the Thin AV cache, and the
whole package must be uploaded to Thin AV.
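A minimal sketch of this front-end behaviour, using Flask as the thesis does; the route name, parameter names, and in-memory cache here are assumptions for illustration rather than the actual Thin AV interface.

```python
import hashlib
from flask import Flask, request

app = Flask(__name__)
scan_cache = {}  # MD5 hex digest -> scan status (stand-in for the real cache)

@app.route("/scan", methods=["GET", "POST"])
def scan():
    if request.method == "GET":
        # Hash lookup: return a cached scan result if one exists.
        digest = request.args.get("hash", "")
        return scan_cache.get(digest, "unknown")
    # POST: cache miss, so the whole package is uploaded for scanning.
    data = request.files["package"].read()
    digest = hashlib.md5(data).hexdigest()
    # A real deployment would now hand the package to a scanning
    # module; here it is simply queued with the "waiting" status.
    scan_cache.setdefault(digest, "waiting")
    return "waiting"
```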
3.3.2 Android Specific Scanner
One of the major benefits of Thin AV is its modular and extensible architecture. In
order to further demonstrate this benefit, and to increase the functionality of Thin AV,
an Android specific scanning service, ComDroid, was added to Thin AV to complement
the existing third-party anti-virus scanners. However, because of the modular design of
Thin AV, theoretically, any type of analysis module could be used to evaluate the safety of
Android packages. An Android specific anti-virus scanning service, permissions analyzers
similar to Kirin [56] or Stowaway [59], or a social reputation analyzer such as the one
described in [34] are all compelling possibilities. In fact, ComDroid itself is not an anti-virus engine,
but rather it is a static code analysis tool which can identify potential vulnerabilities in
Android applications. The tool was developed by Chin et al. and described in detail in
[45].
ComDroid is publicly available as a web-based service hosted at the University of
California at Berkeley. Because ComDroid has a web interface, building a Thin AV scanning
module to take advantage of ComDroid was relatively straightforward. Beyond developing
a new scanning module called thinComDroid, which also inherits from thinAvParent
(see Figure 3.2), the only internal change that was necessary was the addition of a new
return status code. Because ComDroid identifies potentially exploitable apps and not
malicious apps, the ComDroid module can identify an Android package as being “at risk”
as opposed to being “infected”.
In the current deployment of Thin AV, the installation of a package will be blocked
if ComDroid identifies vulnerable communication channels within the package. However,
depending on the sensitivity of ComDroid, and the prevalence of potentially vulnerable
apps, a more permissive strategy might be warranted. If one of the existing scanning
modules identifies a malicious app, this status will supersede any status returned by
ComDroid. It should also be noted that an Android package is scanned with ComDroid
after scanning with the appropriate anti-virus scanner. The performance drawbacks of
this configuration are obvious, and a production-scale deployment of Thin AV should
incorporate the ability to perform scans in parallel.
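The precedence rule (a malware verdict from an anti-virus module supersedes ComDroid's "at risk" result) can be sketched as follows; the status names and ordering are illustrative.

```python
# Statuses ordered from least to most severe; an anti-virus "infected"
# verdict outranks ComDroid's "at_risk" result, which outranks "clean".
SEVERITY = ["clean", "at_risk", "infected"]

def combine_statuses(av_status, comdroid_status):
    """Return the more severe of the two verdicts."""
    return max(av_status, comdroid_status, key=SEVERITY.index)
```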
3.3.3 Safe Installer
In order to protect a device from malicious applications, a mechanism must exist for
preventing their installation; the Safe Installer is such a mechanism.
All applications not installed via Google’s Market are installed using Android’s Package
Installer system, and so this was the target for injecting the application check for Thin
AV. The technique used for hooking into the Package Installer was adapted from [56].
Unfortunately, Google’s Market does not use the Package Installer for installing apps.
This is because Google’s Market application is signed with the same certificate that is
used to sign the operating system; this gives the Market access to the highest level of
system permissions, a level that no other third-party application is granted. Therefore,
the Google Market has the ability to directly install and uninstall applications, bypassing
the Package Installer. As mentioned in Section 1.1.3.2, Google has staked the reputation
of their brand on the success of the Android Market, and thus has a vested interest in
keeping it free of malware. Other application markets may not have such a prominent
brand name to maintain.
In order to modify the Package Installer, the Android operating system source code
was modified. The Package Installer is part of Android’s Java middleware, which includes
a broad selection of programs and libraries for use by application developers. The
PackageInstallerActivity class was modified to make use of ThinAvService, a new
service class which was added to the source code for the purpose of communicating with
the Thin AV web application described in the previous section. The service provides
a single public function checkAPK, accessed via an interface, defined using the Android
Interface Definition Language (AIDL). When a package is to be installed by the Package
Installer, the APK must already reside on the file system, having been downloaded by a
third-party market or transferred via some other method. The checkAPK function takes
the file system path of the APK being installed, reads the file and creates an MD5 hash of
the bytes of the APK. This hash is then sent to the Thin AV web application, which re-
turns a scan report, if such a report exists. If no scan report exists, the APK is uploaded
to Thin AV where it is passed off to one of the third-party scanning services. When a
scan result is returned, that result is passed back to ThinAvService and checkAPK then
returns a Boolean as to whether or not the installation should be allowed to proceed. The
PackageInstallerActivity then allows or prevents the installation of the application,
displaying the appropriate information dialogs to the user, where necessary.
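The checkAPK flow is implemented on the device in Java and exposed over AIDL; the following Python sketch (function names are illustrative, not from the thesis) captures the same hash-first protocol — look up a cached report by MD5, upload only on a cache miss, and return a boolean install decision:

```python
import hashlib

def md5_of_apk(apk_bytes):
    """Hash the raw bytes of the package, as checkAPK does before
    querying the Thin AV web application."""
    return hashlib.md5(apk_bytes).hexdigest()

def should_allow_install(apk_bytes, lookup_report, upload_and_scan):
    """Mirror the checkAPK decision: try a cached scan report first,
    fall back to uploading the APK for a fresh scan, and return a
    boolean telling the Package Installer whether to proceed."""
    report = lookup_report(md5_of_apk(apk_bytes))  # None if no report cached
    if report is None:
        report = upload_and_scan(apk_bytes)
    return not report["malicious"]
```

Hashing first means the (potentially large) APK crosses the network only when Thin AV has never seen it before.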
In some sense, the safe installer acts similarly to the file system access controller in the
desktop version of Thin AV. Unlike the access controller, however, the safe installer has
no concept of multiple security policies. All package installs are subject to scanning, and package installation
will be terminated if Thin AV detects malware. Although different security policies could
be added to the mobile system, the faster performance of the mobile system compared
to the desktop system allowed for a single strict security policy without compromising
system performance (Section 5.5).
It should be noted that because the Package Installer is part of the
Android source, the system just described cannot simply be installed on any Android-capable
device. Replacing the operating system on a particular Android device requires
that the device be unlocked or “rooted”, as most devices are locked by the manufacturer
or service provider. Additionally, installing a new version of Android on a device voids
the device warranty and can have compatibility issues [16].
3.3.4 Killswitch
The safe installer described in the previous section can prevent the installation of appli-
cations known to be malicious. However, two other scenarios must also be addressed: one
where a malicious application has been installed on a device prior to the installation of
Thin AV, and one where an application was installed on a device but was not flagged as
malicious at the time of installation. A killswitch was developed that addresses these two
scenarios. It operates independently of any specific application installation mechanism,
making it ideal for the multi-market ecosystem available on Android devices.
Four different approaches were considered as potential alternatives for how to imple-
ment the killswitch. The first was to check for revocations at application launch time, by
modifying the app being launched. However, this was not realistic, because even though
tools exist for decoding Android packages [1], the lack of a main function in Android
applications would require that a hook be inserted into every single activity which could
be called by an intent. Furthermore, any code modifications would also invalidate the
certificate that is packaged with the application. The second alternative was to hook
into the application launcher. This would make it possible to interrupt launches
from the Android home screen, but not launches from the system application list, nor
would it catch launches caused by intents generated by other apps. The third option
was to modify the actual program execution code which resides at a much lower level
in the Android source code. This approach, while technically challenging, was feasible.
However, from a software architecture perspective, it would very likely create
undesirable cross-cutting concerns within the Android source code. The final option, which
was ultimately selected, was to develop a scheduled service which periodically checks for
revocations.
The killswitch was developed as a standard Android application capable of communi-
cating with the Thin AV web application, similar to the safe installer. The killswitch has
three different functions available to the user. It can upload all applications to Thin AV
for analysis (if said applications are not in the Thin AV cache), it can manually check if
any non-system applications on the device have been flagged as malicious, and finally, it
can set up the killswitch to regularly check the device for malicious applications using a
scheduled event (Figure 3.4). In the current implementation the killswitch is scheduled
to run every fifteen minutes.
The feature to upload missing packages to Thin AV was left as a manual
activity for the user. This decision was made because many
or even most of the packages on a device may be missing from Thin AV. The upload
and scanning of these apps, while a one-time activity, would still consume a great deal
of time and bandwidth. Therefore, by leaving the decision to the user, they can opt to
perform the upload when the device is connected to a WiFi network (as opposed to being
charged for using their cellular data plan), or at another time when the upload
would not be an inconvenience, such as when the device is charging.
When the killswitch is checking for malicious apps, it uses the PackageManager class
to locate all public Android packages installed on the device. These packages are stored
on the device and are read-only, making them ideal for analysis. The meta-data of each
of the packages is read, and if the package has not been previously seen by the killswitch,
the bytes of the package are hashed. Each computed hash is stored in a file accessible
only to the killswitch, so that it can be retrieved much more quickly than recomputing
the hash every time the device is fingerprinted. A collection of all package hashes is
then sent to the Thin AV web application via HTTP GET. If any of the hashes sent to the Thin AV web application are
found to be from a malicious app, the hashes corresponding to the malicious applications
are returned to the killswitch. The user is then notified of the issue, and presented a
list of applications suspected to be malicious. The user can then choose to initiate the
removal of those applications.
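The hash-once, reuse-thereafter behaviour of the killswitch can be sketched as follows; the class and function names are hypothetical stand-ins for the Java implementation:

```python
import hashlib

class HashCache:
    """Sketch of the killswitch's private hash store: once a package's
    bytes have been hashed, the digest is reused on later runs instead
    of re-reading the (read-only) APK from the file system."""
    def __init__(self):
        self._digests = {}  # package name -> MD5 digest

    def digest_for(self, package_name, read_bytes):
        if package_name not in self._digests:
            self._digests[package_name] = hashlib.md5(read_bytes()).hexdigest()
        return self._digests[package_name]

def fingerprint_device(packages, cache):
    """One digest per installed package; the collection would be sent to
    the Thin AV web application in a single HTTP GET."""
    return {name: cache.digest_for(name, reader) for name, reader in packages}
```

Because installed packages are read-only, a stored digest never goes stale, so the fifteen-minute fingerprinting run only pays the hashing cost the first time it sees a package.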
It might be preferable to have a killswitch which was capable of removing or in
some way quarantining a malicious application, without the input or consent of a user.
However, this would have required significant changes to the Android PackageManager,
as well as possibly other lower level components. This would have also created a potential
security vulnerability insofar as it would have created a mechanism by which an ordinary
application could uninstall other applications. This could result in a scenario where a
malicious application could uninstall the anti-virus or security apps on a device.
Figure 3.4: User interfaces for the Android killswitch: (a) main screen, (b) prompt to upload missing packages, (c) notification of malware, (d) malicious application removal screen.
3.3.5 System Circumvention
The key drawback of Thin AV as designed and deployed for Android is that
install-time checking of applications can only be achieved by rooting a device and
installing a custom operating system. As a prototype this technique is adequate. However,
this design is impractical for wide-scale deployment. A preferable scenario would be one
in which the main source code trunk of Android was modified to create a generic hook in
the PackageInstallerActivity class which would give ordinary apps the ability to allow
or deny application installations. Unfortunately such a hook in the PackageInstaller
could very easily be abused by malicious apps looking to prohibit the installation of le-
gitimate applications. A potential solution to this issue would be for Google to allow
applications to use the hook only if the developer of the application is trusted or in some
way certified by Google.
Finally, the mobile prototype of Thin AV is vulnerable to circumvention in much
the same way as the desktop version, most notably, the lack of encryption on HTTP
communication, and the possibility of forged MD5 hashes.
One major improvement offered by the mobile version of Thin AV is reduced
privacy concern: only Android applications, and not personal files,
are uploaded to Thin AV.
Chapter 4
System Evaluation - Desktop Thin AV
The evaluation of Thin AV was approached differently for the desktop version of the
system (Chapter 4) and the mobile implementation (Chapter 5). The evaluation of the
desktop version focuses on system speed, not detection rates. This is because the
detection performance of Thin
AV is reliant upon third-party proprietary software systems, and the detection perfor-
mance of these systems is regularly evaluated by consumer product testing groups [93].
Additionally, [82] provides a detailed analysis of the detection capabilities of many con-
sumer anti-virus products. Finally, because Thin AV relies on remote scanning services,
the performance of the system would be a much larger barrier to adoption than the
detection performance, which is generally high for all consumer-grade anti-virus products
[106].
The evaluation of the desktop version of Thin AV was performed in four phases. The
first phase in the evaluation was to assess the performance of the individual scanning
services. The goal of this test was to determine the relationship, if any, between the
size of files being uploaded, and the time required to receive a response from the various
scanning services. This phase will be discussed in Section 4.1.
The second phase of testing involved determining the actual overhead caused by Thin
AV. A series of workload scripts were used to generate file system activity while various
parts of Thin AV were active. This way the overhead incurred by each element of Thin
AV could be determined. This phase is elaborated upon in Section 4.2.
The third phase of testing involved using the timing results from the first phase
to produce a simulator which would predict the scanning time for a file of arbitrary
size for each scanning service. The formulæ powering this simulator were then used to
compute the predicted overhead of using Thin AV on a system while running the same
workload script from the previous phase. By comparing the predicted overheads to the
actual overheads measured in phase two, the formulæ for the scanning services could be
iteratively refined until they predicted the actual overhead of Thin AV with a very high
degree of accuracy. This phase will be detailed in Section 4.3.
Finally, with the response time formulæ refined, the simulator was improved, making
it possible to simulate file system access on a large scale and determine the
overhead of Thin AV under different file system access patterns. Simulation was chosen
as the method for large scale testing because it would allow for testing a variety of file
system access patterns, at very large scales, relatively quickly, and it would not draw the
wrath of the various scanning service providers. This phase will be detailed in Section
4.4.
For each of the phases discussed below, the precise testing protocol will be described,
followed by the results produced from the test, concluding with a brief discussion of the
results as they pertain to the testing phase in question.
4.1 Scanning Service Performance
Each of the three scanning services is hosted by a different organization, with different
hardware resources, and each service receives different loads. Therefore, it was first
critical to determine the response times of the different services based on the size of files
being uploaded.
4.1.1 Testing Protocol
In order to test the performance of the different scanning services, a small testing program
was developed which would scan a series of files using a specific scanning service, and
record the time necessary to complete the scanning operation. Consequently, these tests
did not examine any latency that might be introduced by DazukoFS (Section 3.2.1) or
the file system access controller (FSAC, Section 3.2.2). An unavoidable drawback of
this black-box approach is that it was not possible to determine what portion of
the response time was due to file uploading and what portion was attributable to file
scanning.
Each execution of the testing program scanned 12 different files of the following sizes:
0 KB, 1 KB, 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB and
1023 KB. The Kaspersky scanning service will not scan files 1024 KB or larger, hence
1023 KB was chosen as the upper limit on file size. Additionally, files were skewed to the
small end of the size spectrum because results from [96], [90] and [28] suggest that small
files comprise the bulk of accesses during typical file system operation. Finally, the 0 KB
size file was included in the test because it shows the best case response time for each
service.
The test files were generated by a script which produces files of a specified size filled
with pseudo-random bits. The assumption was that a file generated in such a way would
have an extremely low probability of being flagged as malware by one of the scanning
services. Test files are uploaded in a pseudo-random order every time the test program
is run. This was done so that the DNS lookup penalty incurred by the first upload of a
run would not consistently fall on the same file size.
The testing program was run up to 8 times a day for each of the three scanning
services (for a maximum of 24 runs per day). Testing took place on 8 different days:
28/08/11 to 2/9/11, and 8/9/11 to 10/9/11. Testing was done on several days in an
attempt to give a fair representation of average service performance, in the event that
one or more services were performing particularly poorly on any given day. The tests
for each scanner took place between the hours of 9:00AM and 5:30PM MDT, and were
spaced as close to one hour apart as possible, with Kaspersky tested first, followed by
VirusChief, and finally VirusTotal. Although it would have been preferable to have a
completely automated testing script which ran continuously for several days, the prospect
of local network outages and remote service failures meant that the tests had to be run
under human supervision. Testing was performed on the hardware platform described
in Section 3.2. The laptop was connected to the internet via the University of Calgary’s
AirUC 802.11 wireless network.
Unfortunately, due to the low priority placed on scan requests to VirusTotal, the
response times from this service were frequently excessive. As such, testing of all twelve
files could often not be completed in the (approximately) 55 minutes between testing
sessions. However, given that the test files were uploaded in a random order, it was
possible to collect some data for all of the different file sizes. Additionally, because
VirusTotal limits users to 20 API calls in any given 5 minute window, the testing program
had to be modified to periodically poll VirusTotal for results. After uploading a file, the
test program would sleep, first for 15, then 30, and finally 60 seconds, each time polling
VirusTotal for a result after sleeping. The test program would then continue to sleep for
60 seconds between polling attempts until a result was returned. It is therefore the case
that response time results for VirusTotal have up to a 60 second margin of error. Finally,
it should be noted that during the period of testing, there were occasional failures of the
VirusTotal service. These outages were always brief, each lasting less than 5 minutes.
In the event of a failure, the test was stopped and restarted once the service became
available.
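The back-off polling schedule used against VirusTotal (15 s, then 30 s, then 60 s between every further poll) can be sketched as follows; the function names are illustrative:

```python
def polling_delays(max_polls):
    """The test program's wait schedule against VirusTotal: sleep 15 s,
    then 30 s, then 60 s, with 60 s between every further poll."""
    schedule = [15, 30, 60]
    return [schedule[i] if i < len(schedule) else 60 for i in range(max_polls)]

def poll_for_result(fetch, sleep, max_polls=60):
    """After uploading, poll until fetch() returns a scan report (or the
    poll budget is exhausted). fetch() returns None while VirusTotal is
    still queueing the scan."""
    for delay in polling_delays(max_polls):
        sleep(delay)
        report = fetch()
        if report is not None:
            return report
    return None
```

Since the result is only observed after a sleep, a report that arrives just after a poll waits a full interval, which is the source of the 60 second margin of error noted above.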
Finally, as Section 4.1.2 will show, the response time for VirusTotal was extremely
high, and not strongly correlated with the file size of the upload. In order to at least gain
an understanding of the file-size-dependent portion of the VirusTotal scanning process, a
test was run to determine the time required to upload a file of a given size to VirusTotal,
and receive a response, without waiting for a scan result. For this experiment, eight files
of each of the file sizes listed earlier were uploaded to VirusTotal, and the
time required to receive a “waiting” response was measured.
4.1.2 Results
For each of the three scanning services, several hundred response time measurements were
recorded. Tables 4.1, 4.2, and 4.3 detail the key statistical measurements for each of the
three scanning services. Table 4.4 contains the measurements for the upload-only portion
of the VirusTotal scanning service. A cursory review of the data showed a handful of
extreme outliers for each service. As no standard technique exists for identifying outliers
[103], standard deviation was chosen as the means by which outliers were identified:
any measurement more than two standard deviations from the mean was classified as an
outlier. This threshold was chosen because it eliminated the most extreme results, while
retaining the vast majority of the data.
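The two-standard-deviation screening rule is straightforward to express; note the thesis does not say whether the sample or population standard deviation was used, so the sample form is assumed here:

```python
import statistics

def remove_outliers(samples, k=2.0):
    """Screening rule applied to the raw response times: discard any
    measurement more than k standard deviations from the mean.
    (Sample standard deviation is an assumption.)"""
    if len(samples) < 2:
        return list(samples)
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0:
        return list(samples)
    return [s for s in samples if abs(s - mu) <= k * sigma]
```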
A comparison of the measurements from the three services shows a clear difference of
nearly an order-of-magnitude between the performance of VirusTotal and the other two
scanning services. The average response times from Kaspersky and VirusChief range from
1.54 – 14.49 seconds and 6.82 – 28.70 seconds respectively, while the response times from
VirusTotal range from 1.21 – 229.28 seconds (though the latter range becomes 148.92
– 229.28 seconds, when only non-zero file sizes are considered). The upload portion of
VirusTotal shows response times similar to Kaspersky with response times ranging from
1.94 – 11.74 seconds.
With the outliers removed, the response time data was plotted in an attempt to
determine what, if any, relationship exists between file size and service response time for
the three scanning services. Figures 4.1, 4.2, and 4.3 show the upload file size plotted
versus the response time for each of the three scanning services, and Figure 4.4 graphs the
Kaspersky
                     With Outliers                        Without Outliers
Upload size    n   µ (sec.)  x (sec.)  σ (sec.)    n   µ (sec.)  x (sec.)  σ (sec.)
0 KB          64      1.69      1.48      0.81    61      1.54      1.45      0.31
1 KB          69      1.96      1.48      2.42    67      1.61      1.48      0.43
2 KB          69      2.09      1.76      1.90    66      1.75      1.75      0.21
4 KB          70      2.00      1.73      0.90    68      1.87      1.71      0.47
8 KB          69      2.89      2.00      3.48    67      2.33      1.99      1.12
16 KB         70      2.49      2.01      2.45    69      2.22      2.00      0.76
32 KB         69      2.90      2.30      2.50    68      2.63      2.29      1.02
64 KB         70      3.07      2.62      1.13    63      2.74      2.58      0.54
128 KB        69      3.67      3.37      1.41    66      3.41      3.34      0.31
256 KB        70      5.58      4.70      3.50    68      5.03      4.67      1.28
512 KB        70      9.92      8.12      6.02    65      8.48      7.96      3.07
1023 KB       68     16.17     13.78      6.12    62     14.49     13.63      2.59

Table 4.1: Number of measurements (n), mean response time (µ), median response time (x), and standard deviation (σ) for each size of file uploaded to the Kaspersky scanning service.
VirusChief
                     With Outliers                        Without Outliers
Upload size    n   µ (sec.)  x (sec.)  σ (sec.)    n   µ (sec.)  x (sec.)  σ (sec.)
0 KB          63      7.38      6.46      3.72    60      6.82      6.40      2.75
1 KB          68     17.84     18.01      5.80    62     18.72     18.22      3.97
2 KB          68     18.04     18.79     10.36    62     16.94     18.69      6.10
4 KB          68     18.22     18.11     10.09    65     16.51     18.06      6.19
8 KB          67     18.46     18.57      8.76    65     17.46     18.45      6.16
16 KB         68     20.23     19.19     11.70    66     18.66     19.15      6.62
32 KB         68     19.48     20.18      6.99    63     20.24     21.13      5.58
64 KB         68     20.31     20.31      7.36    61     20.49     20.42      5.10
128 KB        68     19.24     19.85      5.97    60     20.17     20.07      3.28
256 KB        69     20.70     22.18      6.61    61     21.36     22.25      4.25
512 KB        68     23.99     24.72      6.93    61     25.20     24.91      4.25
1023 KB       67     28.53     28.71      7.96    60     28.70     28.73      5.76

Table 4.2: Number of measurements (n), mean response time (µ), median response time (x), and standard deviation (σ) for each size of file uploaded to the VirusChief scanning service.
VirusTotal
                     With Outliers                        Without Outliers
Upload size    n   µ (sec.)  x (sec.)  σ (sec.)    n   µ (sec.)  x (sec.)  σ (sec.)
0 KB          36      2.20      0.84      6.00    35      1.21      0.84      0.93
1 KB          34    201.65    116.27    246.64    32    148.92    116.08     91.64
2 KB          39    210.30    146.06    173.05    36    168.80    134.25     96.23
4 KB          37    283.89    170.51    289.37    34    212.95    150.50    152.34
8 KB          41    282.63    170.58    408.17    40    224.62    170.44    171.45
16 KB         36    166.83    143.48    102.82    35    152.68    139.54     58.87
32 KB         38    192.47    134.82    151.96    35    162.14    118.65    114.00
64 KB         37    213.12    135.46    197.75    35    173.57    135.42    106.86
128 KB        35    286.85    171.24    386.21    34    229.28    154.04    184.92
256 KB        40    219.71    148.96    228.65    37    160.77    136.61     68.63
512 KB        39    230.78    173.11    203.92    36    182.71    156.84    113.89
1023 KB       33    198.52    140.38    140.44    31    171.32    139.93     91.09

Table 4.3: Number of measurements (n), mean response time (µ), median response time (x), and standard deviation (σ) for each size of file uploaded to the VirusTotal scanning service and polling until a scan result is returned.
VirusTotal (Upload Only)
                     With Outliers                        Without Outliers
Upload size    n   µ (sec.)  x (sec.)  σ (sec.)    n   µ (sec.)  x (sec.)  σ (sec.)
1 KB           8      2.30      1.68      1.07     8      2.30      1.68      1.07
2 KB           8      2.77      1.69      3.02     7      1.71      1.59      0.23
4 KB           8      1.94      1.97      0.15     8      1.94      1.97      0.15
8 KB           8      2.55      2.20      0.88     7      2.28      2.02      0.50
16 KB          8      2.73      2.37      0.94     7      2.43      2.17      0.46
32 KB          8      2.42      2.36      0.22     8      2.42      2.36      0.22
64 KB          8      3.17      2.53      1.49     7      2.67      2.50      0.48
128 KB         8      3.10      2.84      0.53     8      3.10      2.84      0.53
256 KB         8      4.54      3.52      2.26     7      3.79      3.47      0.86
512 KB         8      5.28      5.10      0.82     7      5.00      5.10      0.27
1023 KB        8     11.74      7.43      7.97     8     11.74      7.43      7.97

Table 4.4: Number of measurements (n), mean response time (µ), median response time (x), and standard deviation (σ) for each size of file uploaded to the VirusTotal scanning service when no polling for scan results was performed.
Figure 4.1: Scan response time versus file upload size for the Kaspersky virus scanner service. [Scatter plot; x-axis: file size (bytes), 0–1,200,000; y-axis: response time (seconds), 0–30.]
upload and response speed of VirusTotal. Kaspersky and VirusChief both show a similar
positive correlation between file size and response time, with the VirusChief data being
positively shifted on the y-axis by roughly fifteen seconds. VirusTotal, on the other
hand, shows little if any relationship between file size and response time. The trend of
the VirusTotal response time data is slightly negative, with a much larger y-intercept
than either of the other two scanning services. Conversely, the upload portion
of the VirusTotal scan shows a trend very similar to Kaspersky. With this data it was
possible to produce a set of linear equations that approximate the performance of the
scanning services as a function of the number of bytes in the file (Table 4.5). This was
done using a linear regression in Microsoft’s Excel spreadsheet program.
4.1.3 Discussion
It is not surprising that Kaspersky, which scans with only a single anti-virus engine,
returns the fastest results, and that VirusChief, which scans with six anti-virus engines, is
roughly fifteen seconds slower than Kaspersky when scanning a similarly sized file. It is
Figure 4.2: Scan response time versus file upload size for the VirusChief virus scanner service. [Scatter plot; x-axis: file size (bytes), 0–1,200,000; y-axis: response time (seconds), 0–45.]
Figure 4.3: Scan response time versus file upload size for the VirusTotal virus scanner service. [Scatter plot; x-axis: file size (bytes), 0–1,200,000; y-axis: response time (seconds), 0–900.]
Kaspersky                  f(x) = 10^-5 · x + 1.891
VirusChief                 f(x) = 10^-5 · x + 17.133
VirusTotal                 f(x) = -9 × 10^-6 · x + 182.98
VirusTotal (Upload Only)   f(x) = 9 × 10^-6 · x + 1.947

Table 4.5: Linear equations for each of the three scanning services derived from Figures 4.1, 4.2, 4.3, and 4.4. Equations calculate the response time (in seconds) for each scanning service for a file x bytes in size.
Figure 4.4: Upload response time versus file upload size for the VirusTotal virus scanner service when uploading a file and not polling for a scan result. [Scatter plot; x-axis: file size (bytes), 0–1,200,000; y-axis: response time (seconds), 0–30.]
the VirusTotal scanning service which behaves in a markedly different fashion. Again,
this is not surprising: because VirusTotal assigns the lowest priority to requests that are
sent via their formal API, the response time of the VirusTotal service is not dependent upon
the size of the uploaded file, but rather on how busy the VirusTotal scanning service is
at any given moment. This distinction is made much clearer when comparing the time
required to scan a file with VirusTotal with the time required to merely upload a file to
VirusTotal.
Finally, it should be noted that because VirusTotal maintains a database of scan
results, hashes that match entries in the database will return results much faster than
files for which there is no matching hash. This is evident when looking at the 0 KB length
files, which have a dramatically faster response time than files of any other size, even the
1 KB files. This is because all empty files produce the same hash value, meaning the
result is already stored in VirusTotal’s database; a previous scan report can
be returned immediately, and the file need not be uploaded for analysis.
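The effect is easy to verify: the MD5 digest of zero bytes is a fixed constant, so every 0 KB file maps to the same cached report (the helper name below is illustrative):

```python
import hashlib

def result_cache_key(file_bytes):
    """Thin AV keys cached scan reports by an MD5 hash of the file
    contents; VirusTotal's result database works the same way."""
    return hashlib.md5(file_bytes).hexdigest()

# All empty files collide on one well-known digest, so a report is
# always already cached and no upload is needed.
EMPTY_FILE_KEY = result_cache_key(b"")
```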
The results in this section describe the performance of the three malware scanning
services during a specific window in time. Every effort was made to collect numerous
measurements during that window in order to provide a fair and accurate representation
of the performance of each service. However, it is unreasonable to assume that the results
described herein will accurately reflect the performance of the scanning services in the
long term. Changes in service loads, hardware configuration, and even changes in the
service offerings themselves could alter the performance of the scanning services. Such
possibilities are unavoidable, and by no means a reason to shy away from such avenues
of research. Unless the changes in performance are drastic, they are unlikely to alter the
conclusions made regarding the feasibility of Thin AV as a security apparatus.
4.2 Actual System Overhead
Having characterized the response times of the three different scanning services, the
second step of evaluating the desktop implementation of Thin AV was to assess the
performance impact of running Thin AV.
4.2.1 Testing Protocol
To calculate the actual overhead incurred by Thin AV, a series of workloads were used
to generate file system activity. The technique used to generate workloads was modified
from the approach used by Bickford et al. [37]. Here, the workload was created using a
bash script to launch Firefox and navigate to certain websites. The CPU utilization of
the Firefox process was monitored, and once it dropped to below a specific threshold, the
browser was terminated, as it was assumed that content had been successfully loaded,
and the browser was idle. The same process was applied to launching and terminating the
Thunderbird e-mail client. The idea of using web-browsing and e-mail as a workload is
appealing because these activities are extremely common, and well understood by users.
However, testing of the workload script used by Bickford et al. showed some limitations.
Web workload                              Advanced workload
1) Launch Firefox                         1) Compile GZip from source
2) Navigate to the following websites:    2) Compile a five-page LaTeX paper
     http://www.google.ca                 3) Copy the directory containing the paper
     http://www.slashdot.org              4) Rename the copied directory
     http://www.reddit.com                5) Delete the copied directory
     http://www.youtube.org
     http://www.cbc.ca
3) Close Firefox
4) Open Thunderbird
5) Close Thunderbird

Table 4.6: Activities in the web and advanced workload scripts.
Most notably, the Firefox application would often be terminated before a page had been
completely loaded and rendered. Therefore, a different technique was used to generate
browser activity for Thin AV. Specifically, the automated testing tool Selenium was used
to drive Firefox [19].
Three bash scripts were produced, each corresponding to a different workload. These
workloads will be referred to as “web”, “advanced”, and “combined”, respectively. The
web script was directly inspired by Bickford et al., and was complemented by the ad-
vanced script, which was designed to capture a snapshot of more developer-oriented
behavior. Table 4.6 outlines the specific activities in the web and advanced scripts. The
combined script includes the activities of the web script followed by the activities in the
advanced script. In order to better characterize the activities in each of the workloads, a
small Python program was written which uses the Linux inotify subsystem [11]. Each of
the workloads was run once and the activities generated by the workload were recorded
in order to help contextualize the results from this experiment.
To assess the overhead of Thin AV, several different scenarios were examined in re-
lation to a base case scenario. In every scenario the three workload scripts were run
ten times, and the time required to complete the workload was recorded. The base case
Scenario                Description                                                Caches
Kaspersky (uncached)    Thin AV running with only Kaspersky scanning files         Cleared
VirusChief (uncached)   Thin AV running with only VirusChief scanning files        Cleared
All scanners (cached)   Thin AV running in the typical configuration with all
                        three services scanning files of the appropriate size      Not cleared
Dazuko Only             DazukoFS mounted without Thin AV running                   N/A
FSAC and Dazuko         DazukoFS mounted with the file system access controller
                        approving all accesses without checking files with
                        Thin AV                                                    N/A

Table 4.7: Scenarios examined for assessing Thin AV overhead. The Caches column specifies whether or not the Thin AV cache and browser cache were cleared between each of the ten runs of a given workload.
scenario was one in which no part of Thin AV was active while the workloads were being
executed. Table 4.7 outlines each of the examined scenarios. Kaspersky and VirusChief
were tested individually, with the Thin AV and browser caches being cleared between
each of the ten runs. VirusTotal was not tested by itself because the average response
time of the VirusTotal service was so large that it would be completely impractical
as a standalone scanning service. Additionally, because VirusTotal remotely caches
scan results, it would not have been possible to repeatedly test a workload on VirusTo-
tal without retrieving cached results from the first run of that workload (as opposed to
re-scanning files each time).
The “all scanners” scenario involved running Thin AV with all three scanning services
in concert when the Thin AV cache was primed. This is representative of how Thin AV
would normally operate. If a file cannot be found in the cache it will be scanned by
Kaspersky. If the file cannot be scanned by Kaspersky due to service unavailability or
file size, then Thin AV will attempt to scan with VirusChief, and finally VirusTotal. For
this test, neither the Thin AV local cache nor the Firefox browser cache was deleted
between runs. Because of the VirusTotal caching issue mentioned earlier, it was not
possible to perform a test of all three scanners in concert with an empty Thin AV cache.
It should also be noted that because some of the content on the websites in the web
and combined workloads is dynamic, a primed Thin AV cache does not mean all files
encountered in the workloads were previously cached by Thin AV.
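The scanning order just described (cache, then Kaspersky, then VirusChief, then VirusTotal) can be sketched as a simple fallback chain; the function name and the `None`-for-unavailable convention are illustrative assumptions:

```python
import hashlib

KASPERSKY_MAX_BYTES = 1024 * 1024  # Kaspersky rejects files of 1024 KB or more

def scan_with_fallback(file_bytes, cache, services):
    """services: ordered list of (name, size_limit, scan_fn). A cached
    verdict wins outright; otherwise the first service whose size limit
    admits the file and which is available produces the verdict."""
    digest = hashlib.md5(file_bytes).hexdigest()
    if digest in cache:
        return cache[digest], "cache"
    for name, size_limit, scan_fn in services:
        if size_limit is not None and len(file_bytes) >= size_limit:
            continue  # e.g. too large for Kaspersky
        verdict = scan_fn(file_bytes)
        if verdict is not None:  # None models service unavailability
            cache[digest] = verdict
            return verdict, name
    return None, None
```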
For each of the scenarios where all or part of Thin AV was running, only the home
directory of the active user was monitored. This decision was based on the fact that,
as previously mentioned, the goal of Thin AV is to provide minimalist anti-malware
protection. As such, areas of the file system which would normally be beyond the reach
of a standard user will be ignored by Thin AV, in favor of monitoring areas of high user
activity (i.e. the /home directory).
4.2.2 Results
Table 4.8 summarizes the characteristics of a representative run of each of the three
workloads. Given that the combined workload contains all of the activities of the web
and advanced workloads, it is sufficient to say that the characteristics of the combined
workload are approximately the sum (events, accesses, modifications, modify / access
events, unique files), average (mean file size), or maximum values from the advanced and
web workloads. The only similarities between the web and advanced workloads are the
number of unique files being accessed, and the median size of those files. The advanced
workload contains more than three times as many file events as the web workload, and
those events are split fairly evenly between access events and modification events. The
web workload, on the other hand, has almost three times as many access events as
modification events. Based on the file size statistics, it is clear that all of the files in
                                 Web       Advanced  Combined
Events                           158       529       698
Accesses                         116       245       354
Mean time between accesses (s)   0.157     0.010     0.064
Modifications                    42        284       344
Modify / access events           7         19        25
Unique files                     88        92        175
Mean file size (KB)              1138.35   17.74     581.59
Median file size (KB)            5.13      2.72      4.52
Maximum file size (KB)           53132.00  250.71    53132.00

Table 4.8: General characteristics for the three different workloads used for testing Thin AV.
                       Web                Advanced            Combined
Base case              21.4               1.9                 21.3
Kaspersky (uncached)   194.9 (910.75%)    186.2 (9800.00%)    376.3 (1766.67%)
VirusChief (uncached)  1786.3 (8347.20%)  1864.2 (98115.79%)  3587.6 (16843.31%)
All scanners (cached)  59.0 (275.70%)     36.9 (1942.11%)     100.9 (473.71%)
Dazuko only            21.6 (100.93%)     1.8 (94.74%)        23.2 (108.92%)
FSAC and Dazuko        21.8 (101.87%)     2.8 (147.37%)       26.5 (124.41%)

Table 4.9: Average time (in seconds) to complete each of the three workloads while running different configurations of Thin AV. Averages are based on ten runs of each workload for each Thin AV configuration. Percentage overhead is the result of dividing the average running time of each configuration by the average running time of the base case.
the advanced workload are relatively small (250 KB or less), whereas the web workload
contains a handful of much larger files.
The timing results in Table 4.9 show a high degree of similarity both between work-
loads and between scanning services. The Kaspersky service differs by less than ten
seconds for the web and advanced workloads (194.9 and 186.2 seconds respectively). The
results from VirusChief are roughly an order of magnitude larger than the Kaspersky
timing results, and the corresponding difference between the two workloads is less than
one hundred seconds. This similarity is in spite of the order of magnitude difference between the web and advanced workloads in the base case (21.4 seconds versus 1.9 seconds).
The running times for the workloads when Thin AV was running with a primed cache are
much smaller than the running times when active scanning was taking place. However,
the difference between the web and advanced workloads is somewhat larger (59.0 seconds
for the web workload, and 36.9 for the advanced workload). A marginal increase in run-
ning times of the web and combined workloads can be seen when only the Dazuko file
system is mounted, while the advanced workload actually shows a very small decrease.
The increase in running times is slightly more apparent when both DazukoFS and the
file system access controller are active, with the web and advanced workloads showing
increases of 0.4 and 0.9 seconds over the baseline running times. Not surprisingly, in all cases,
the timing results for the combined workload are approximately the sum of the results
for the constituent workloads. Additionally, the percentage overheads follow a similar
trend to the absolute timing results.
4.2.3 Discussion
Not surprisingly, the fastest service, Kaspersky, showed the smallest overheads across the
three workloads, whereas VirusChief, which had a much slower response time, showed
the highest overheads. In general, it is clear that the range of possible overheads for Thin
AV is extreme. Interestingly, despite the highly distinct file system access characteristics
of the web and advanced workloads, the ultimate running times of these workloads with
both Kaspersky and VirusChief are somewhat similar. However, when the overhead
is considered instead of simply the total running time, the advanced workload appears
particularly ill-suited to being scanned by Thin AV.
It is clear that the vast majority of the overhead incurred by Thin AV is caused by
uploading files to, and scanning them with, the remote services. The overheads caused by Dazuko
and the file system access controller are negligible, though it is possible that this overhead
could be further reduced if Thin AV were implemented in a higher performance language
such as C/C++ as opposed to Python. The case where the advanced workload appeared
to run faster than the baseline in the Dazuko-only scenario is likely the result of a
measurement error due to the granularity of the timing measurements recorded by the
workload script.
Based on the response time results from Section 4.1, one could have predicted that
VirusChief would not, by itself, make a practical anti-malware scanning tool. The timing
results from this experiment bear out that prediction. Furthermore, Kaspersky, while
dramatically faster than VirusChief, is still prohibitively slow when all the files being
encountered are novel. However, it was to be expected that Thin AV would perform
poorly when a cache of previously scanned files was not available.
The three workloads were intended to be indicative of general desktop usage for some
users. However, they are very short, and do not contain the wait time that would be
present if a user were manually executing the actions in the workloads. Therefore, it is
very likely that the length of the workloads had a major impact on the overhead of Thin
AV. Specifically, because none of the workloads involved a high degree of repetition, the
effectiveness of the Thin AV cache was somewhat lessened. Given that the workload run
times for the fully cached scenario are approaching a much more reasonable overhead, it
is quite probable that as Thin AV is active for longer periods of time, the cache will grow
in size, increasing the likelihood of a local cache-hit, and thus decreasing the overhead of
Thin AV. This scenario will be examined in greater detail in Section 4.4.
4.3 Predicted System Overhead
Given that it is the highly-cached scenario that provides the most compelling argument
for Thin AV, the performance of Thin AV in the long term (when a high degree of caching
is possible) must be assessed. This section details the development and refinement of a
simulator for Thin AV, built using the timing results from Sections 4.1 and 4.2. Once
the simulator was developed and its results were shown to be consistent with those from
Section 4.2.2, larger scale simulations could be run to better understand the performance
impact of Thin AV.
4.3.1 Testing Protocol
In order to predict the overhead of Thin AV, two elements are necessary: a simulator
and a workload. To calibrate the simulator accurately, the workloads from the previous
section were used. Based on the file accesses in a workload, it is possible to calculate
how much time Thin AV would spend scanning the files being accessed. Dividing the
total duration (time spent scanning with Thin AV plus time spent on non-Thin AV
activities) by the time spent on non-Thin AV activities yields the predicted overhead
imposed by Thin AV.
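The calculation above amounts to a one-line formula. The following sketch is illustrative; the function name and example values are chosen here, not taken from the Thin AV source.

```python
def predicted_overhead(scan_time, base_time):
    """Predicted Thin AV overhead: total running time (time spent scanning
    plus time spent on non-Thin AV activities) divided by the time spent
    on non-Thin AV activities alone."""
    return (scan_time + base_time) / base_time

# E.g., 9 s of scanning on top of 1 s of other activity is a 1000% overhead.
print(predicted_overhead(9.0, 1.0) * 100)  # → 1000.0
```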
The simulator reads a list of file system events (e.g., the access or modification of one
or more files) and, based on the sizes of the files, calculates the time that Thin AV would
take to run if it were actually scanning the files in the list. Using the workload scripts and the
inotify monitoring application discussed in Section 4.2.1, a log of file system activity was
created for each of the workload scripts which was then used to drive the simulator. In
addition to driving the simulator with a file system activity log, the simulator can also
be driven by a series of statistical parameters; this will be discussed in greater detail in
Section 4.4.1.
The function calculating the wait time for accessing a specific file may or may not depend
on the size of the file being accessed, depending on what kind of Thin AV activity is being
simulated. In the case of VirusChief and Kaspersky, the scanning time is calculated from
a pair of linear equations. The functions in Table 4.5 provided the starting point. In
a series of successive manual iterations, the simulator was run on each of the workloads
and scanning was simulated. By simulating Thin AV running off of strictly Kaspersky or
VirusChief, each of the linear equations could be tuned until the running time predictions
produced by the simulator matched the actual Thin AV running time measured in Section
4.2.2.
Because the performance of VirusTotal is so poor, it is not a suitable candidate
for real-time file scanning. As such, when Thin AV is operating under the permissive
security policy (Table 3.1), the file is only uploaded to VirusTotal. Then, assuming
the file has not been previously scanned by VirusTotal, a “waiting” status is returned
and Thin AV then allows access to the file. In order to simulate this behavior, the
linear equation which describes the time required to upload a file to VirusTotal was
used (without modification) to determine the wait time for VirusTotal (Table 4.5). This
allowed the simulator to accurately predict the overhead of Thin AV when running with
all three scanning services under the permissive security policy.
When the simulator is processing the list of file system events, each time a unique
file is accessed for the first time, the wait time imposed by Thin AV is calculated, based
on the size of the file being accessed, and the specific scanning service being simulated.
Once a file is accessed, it is added to a list of known files. Any future accesses of a known
file only incur a wait time of 0.0002 seconds. This simulates the caching behavior of Thin
AV. The wait time for the cache was arrived at by measuring 10,000 successive cache hits
by Thin AV, and calculating the average access time. Finally, if a known file is modified,
it is removed from the list of known files, and a subsequent access of the modified file
will once again incur a wait time, simulating the scanning of that file.
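Putting these rules together, the simulator's event-processing loop can be sketched as follows. The event tuple format and the pluggable wait_time function are stand-ins invented for illustration; only the 0.0002 second cache-hit cost and the evict-on-modify behavior come from the text.

```python
CACHE_HIT_WAIT = 0.0002  # measured average cost of a Thin AV cache hit (seconds)

def simulate(events, wait_time):
    """Replay (kind, path, size) file system events and return the total wait
    time Thin AV would impose. `wait_time(size)` models the scanning service
    being simulated. Scanning happens only on accesses; modifying a known
    file evicts it so that the next access is re-scanned."""
    known = set()      # files whose scan results are cached
    total = 0.0
    for kind, path, size in events:
        if kind == "modify":
            known.discard(path)       # a modified file must be re-scanned
        elif path in known:
            total += CACHE_HIT_WAIT   # cache hit
        else:
            total += wait_time(size)  # first access: upload and scan
            known.add(path)
    return total
```

For example, replaying four events (first scan, cache hit, modification, re-scan) against a constant one-second scanner yields 2.0002 seconds of total wait.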
Kaspersky     f(x) = 8.85 × 10^-5 x + 1.34
VirusChief    f(x) = 8.31 × 10^-5 x + 16.37

Table 4.10: Linear equations for the Kaspersky and VirusChief scanning services. The Kaspersky and VirusChief equations were manually modified from the functions in Table 4.5.
4.3.2 Results
Table 4.10 contains the linear equations that were finally settled on for the simulation of
Kaspersky and VirusChief. These formulæ were arrived at after approximately twenty
iterations of running the simulation and adjusting the slope and y-intercept of each
equation.
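As a sketch, the tuned equations in Table 4.10 translate directly into a pair of wait-time functions. The unit of x is inherited from Table 4.5 and is assumed here to be kilobytes; the function names are chosen for illustration.

```python
def kaspersky_wait(x):
    """Tuned Kaspersky scan time (seconds) for a file of size x."""
    return 8.85e-5 * x + 1.34

def viruschief_wait(x):
    """Tuned VirusChief scan time (seconds) for a file of size x."""
    return 8.31e-5 * x + 16.37
```

The y-intercepts capture each service's fixed per-request latency, which is why VirusChief remains roughly an order of magnitude slower than Kaspersky regardless of file size.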
Tables 4.11 and 4.12 show the results of simulating Thin AV using either Kaspersky
or VirusChief alone. The total running times of these simulations are compared with
the actual running times from Section 4.2.2 in Tables 4.13 and 4.14. As
can be seen in the latter two tables, the refinement of the wait-time functions was highly
successful, with the largest discrepancy between the actual and simulated run-times being
only 0.47%.
As was mentioned earlier, the web workload contains a handful of files that are significantly larger than the bulk of the files in either the web or advanced workload.
This can be quantified more precisely by examining the number of un-scanned accesses
in each workload. Un-scanned accesses are accesses of files that are too large for the
scanner to process (1 MB for Kaspersky, and 10 MB for VirusChief). Here we see that in the web
workload, eight files are too large to be scanned by Kaspersky, but only three are too big
to be scanned by VirusChief. In the advanced workload, all of the files being accessed
are small enough for scanning with both Kaspersky and VirusChief. This implies that
the vast majority of files are capable of being scanned by Thin AV, and that most of
these files would be scanned by Kaspersky, the fastest of the three scanning services.
Kaspersky                          Web       Advanced  Combined
Mean scanned file size (KB)        39.02     17.10     27.23
Median scanned file size (KB)      5.13      1.45      3.67
Maximum scanned file size (KB)     512.00    250.71    512.00
Un-scanned accesses                8 (6.9%)  0 (0.0%)  8 (2.3%)
Total size of uploaded files (MB)  3.24      1.85      5.08
Cache hit rate                     21.3%     54.69%    44.80%
Time for non-AV activities (sec.)  18.20     2.33      22.53
Time for AV scanning (sec.)        176.47    184.57    354.06
Total time (sec.)                  194.67    186.9     376.59
Overhead from AV                   969.61%   7921.40%  1571.52%

Table 4.11: Simulation results of the Kaspersky service for three different activity logs.
The low cache-hit rates that were alluded to in Section 4.2.3 can be seen in the
simulation results. The highest cache hit rate occurs in the advanced workload and is
only 54%, while the lowest cache hit rate is just over 20% in the web workload. In
spite of this, the runtimes for the web and advanced workloads are quite similar for both
Kaspersky and VirusChief.
4.3.3 Discussion
The key finding from this experiment was that it was possible to predict, very accurately,
the running time of Thin AV when using Kaspersky and VirusChief. Although a simu-
lation could be run which predicts the running time of Thin AV using VirusTotal, it has
already been established that VirusTotal is impractical for real-time scanning. Addition-
ally, because no actual overhead measurements were made for VirusTotal, there would be
no way of verifying the accuracy of the VirusTotal overhead predicted by the simulator.
Furthermore, because the scanner priority for Thin AV during normal operation is Kaspersky,
followed by VirusChief, then VirusTotal, it is somewhat less important to characterize
the performance of VirusTotal, as only files between ten and twenty megabytes will
VirusChief                         Web       Advanced   Combined
Mean file size scanned (KB)        125.08    17.10      67.05
Median file size scanned (KB)      5.13      1.45       3.99
Maximum file size scanned (KB)     2205.19   250.71     2205.19
Un-scanned accesses                3 (2.6%)  0 (0%)     3 (0.8%)
Total size of uploaded files (MB)  10.99     1.85       12.83
Cache hit rate                     20.35%    54.69%     44.16%
Time for non-AV activities (sec.)  18.20     2.33       22.53
Time for AV scanning (sec.)        1764.20   1866.14    3548.12
Total time (sec.)                  1782.40   1868.47    3570.65
Overhead from AV                   9693.41%  80092.03%  15748.44%

Table 4.12: Simulation results of the VirusChief service for three different activity logs.
Kaspersky
Workload   Running Time (s)  Simulated Time (s)  Difference
Web        194.90            194.67              -0.12%
Advanced   186.20            186.90              0.38%
Combined   376.30            376.59              0.08%

Table 4.13: Comparison of running time and simulation results for each of the workloads for the Kaspersky service.
VirusChief
Workload   Running Time (s)  Simulated Time (s)  Difference
Web        1786.30           1782.40             -0.22%
Advanced   1864.20           1868.50             0.23%
Combined   3587.63           3570.70             -0.47%

Table 4.14: Comparison of running time and simulation results for each of the workloads for the VirusChief service.
be uploaded to VirusTotal. By examining the file system access characteristics (Table
4.8) and the simulation results (Tables 4.11 and 4.12), it is apparent that this is not a
common occurrence. This assertion is also corroborated by the file system traces from
[90], [96] and [28].
Although it may have been possible to further increase the accuracy of the simulator
either by more manual tuning of the wait-time functions, or through an optimization
algorithm, the differences between the simulated and actual results are sufficiently low as
to afford a strong degree of confidence in the results produced by the simulator. It should
also be noted that when tuning the wait-time functions, the goal was to approximate the
total running time, and not the system overhead. This is because the system overhead
can change dramatically as a result of minor fluctuations in the base case run times.
Because the overhead is calculated by dividing the total running time by the base case
running time, small changes in such a comparatively small divisor can result in a dramatic
change in the resulting overhead. The advanced workload is an obvious example of this.
The promising finding from these results is that the workloads studied up to this
point only produce a low to modest degree of caching in Thin AV, nowhere near the 98%
cache hit rates reported by [82]. This means it is quite possible that the overhead of Thin
AV can be reduced to a more acceptable level in the long term, when a greater degree of
caching is possible. This scenario will be examined in the upcoming and final phase of
the evaluation of the desktop implementation of Thin AV.
4.4 Large Scale System Simulations
Given that the overhead values produced by the Thin AV simulator accurately predict the
actual running time of Thin AV, it is possible to examine different patterns of file access
when using Thin AV, and see if there are conditions under which Thin AV particularly
excels, or falls short. Although it may have been desirable to collect a set of actual usage
patterns for exploring the behavior of Thin AV, using the simulator to study different
scenarios is preferable for three key reasons. First, it allows for the examination of a
wider range of longer running, high-activity file system access traces, in much less time
than it would take to actually perform such traces on the implementation of Thin AV.
Second, it allows for the examination of traces with very specific characteristics (file
sizes, number of unique files, number of modifications, etc.), and devising a series of
actual workloads which would generate this desired level of activity would be extremely
onerous. Finally, despite a concerted search in the areas of software engineering, human-
computer interaction, and hardware systems research, it was not possible to locate a
precedent for how to characterize “typical” user behavior, upon which to base an actual
large-scale workload. This is quite likely due to the fact that the term “typical” itself
lacks a consistent definition when examining a wide cross-section of users.
4.4.1 Testing Protocol
The large-scale testing of Thin AV was done using the simulator described in Section
4.3.1. However, instead of using file activity traces produced by inotify, the activity was
generated by a collection of probability distributions. A collection of key parameters
control the general characteristics of the file system activity generated by the simulator.
The number of file system events controls the total number of file accesses and modifi-
cations that will occur in the simulation. The proportion of modifications specifies what
proportion of events will be modifications versus accesses. The number of unique files
specifies how many different files will be accessed throughout the lifetime of the simulation;
a unique file can be thought of as an absolute path (i.e., modifying a file does not
make it a new unique file).
Other key simulation parameters, specifically file size and time between events, are
drawn from exponential distributions. This distribution was chosen for file size generation
because it closely fits the distribution of file sizes measured in [28] and [90]. Figure 4.5
provides an example of file sizes generated by this distribution. The distribution can be
shifted left or right (i.e., smaller or larger files) as needed, depending on the parameter
provided to the exponential number generator. The file sizes generated are bounded by
a constant minimum and maximum; this was necessary to prevent the rare occurrence
where an extremely large file size was generated that would overly skew the mean file
size in the simulation. The exponential distribution was also chosen for generating the
time between file events. This has the effect of producing activity that is “bursty” with
periods of high activity (small times between events) broken up by less frequent periods
of inactivity (long times between events).
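A generator along these lines might look like the following sketch; the parameter names, the event format, and the clamping bounds are illustrative rather than taken from the actual simulator.

```python
import random

def generate_trace(n_events, p_modify, n_unique, mean_size_kb, mean_gap_s,
                   min_size_kb=0.1, max_size_kb=65536.0):
    """Generate a synthetic file activity trace. Sizes and inter-event times
    are drawn from exponential distributions; sizes are clamped so that a
    rare huge draw cannot skew the mean file size, as described in the text.
    Each of the n_unique files keeps a fixed size once drawn."""
    sizes = {}
    t = 0.0
    trace = []
    for _ in range(n_events):
        path = "file%d" % random.randrange(n_unique)
        if path not in sizes:
            draw = random.expovariate(1.0 / mean_size_kb)
            sizes[path] = min(max(draw, min_size_kb), max_size_kb)
        kind = "modify" if random.random() < p_modify else "access"
        t += random.expovariate(1.0 / mean_gap_s)  # "bursty" gaps between events
        trace.append((t, kind, path, sizes[path]))
    return trace
```

Because the exponential distribution concentrates mass near zero while allowing occasional large draws, the inter-event times naturally produce bursts of activity separated by longer idle periods.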
Ultimately the activity “log” produced by the simulator is in the same format pro-
duced by the inotify monitoring application used in Section 4.3.1. As such, it can be fed
into the same simulation program that was used earlier, and the behavior of Thin AV for
the simulated log can be characterized. The key simulation parameters were examined in
turn by modifying the experimental parameter while holding all other parameters con-
stant. Each simulation was run ten times and the average results were recorded. It should
be noted that in performing these experiments, some of the simulations were run with
activity logs that would not be likely to occur during any sort of “normal” file system
activity. However these results were included for the purposes of illuminating trends in
the relationships between the characteristics of file system activity and the performance
of Thin AV.
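The sweep protocol (vary one parameter while holding the rest constant, averaging ten runs per setting) can be expressed as a small helper; the names here are illustrative.

```python
def sweep(param_values, run_simulation, runs=10):
    """For each value of the experimental parameter, run the simulation
    `runs` times and record the mean result. Every other simulation
    parameter is held constant inside `run_simulation`."""
    results = {}
    for value in param_values:
        trials = [run_simulation(value) for _ in range(runs)]
        results[value] = sum(trials) / len(trials)
    return results
```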
4.4.2 Results
The result of each experiment will be discussed in turn. For the sake of space, figures
will only show relevant relationships between the independent and dependent variables.
[Figure: cumulative distribution curve. X-axis: file size (kilobytes, log scale); Y-axis: cumulative % of files.]
Figure 4.5: Example CDF of simulated files by size.
Simulation input and output values that did not change for each experiment can be found
in Tables A.1 through A.5 in Appendix A.
The first simulations examined the relationship between the number of novel files in
an activity log and the performance of Thin AV. Figure 4.6 shows a strong positive relationship between the number of novel files and the overhead incurred by Thin AV. When
only one in ten accesses involves a new file, Thin AV already shows a 5000% overhead,
and this number increases almost eight-fold as the proportion of accesses involving new
files moves to 1. Unless specifically mentioned, the simulated activity logs did not include
any file modification events. This was done to prevent unnecessarily complicating the
simulation results, as well as to help clarify the trends in the data.
An analogous trend can be seen when the number of access events is manipulated
(Figure 4.7). The Thin AV overhead drops off dramatically as the number of system
accesses increases. The cache hit rates follow a similar trend in both Figures 4.6 and 4.7:
as the ratio between the number of unique files and the number of accesses approaches
one-to-one, the frequency of cache hits drops off precipitously.
The next series of simulations were designed to show the impact of file size on Thin
[Figure: X-axis: proportion of accesses of novel files (log scale); left Y-axis: Thin AV overhead; right Y-axis: Thin AV cache hit rate. Series: unique file ratio, cache hit rate.]
Figure 4.6: Proportion of accesses which involved a unique (uncached) file (log10) versus Thin AV induced overhead (left axis), and Thin AV cache hit rate (i.e., the chance that an access was serviced by the cache versus online scanning, right axis). See Table A.1 for further details.
[Figure: X-axis: number of file system accesses (log scale); left Y-axis: Thin AV overhead; right Y-axis: Thin AV cache hit rate. Series: Thin AV overhead, cache hit rate.]
Figure 4.7: Number of file system accesses (log10) versus Thin AV induced overhead (left axis), and Thin AV cache hit rate (i.e., the chance that an access was serviced by the cache versus online scanning, right axis). See Table A.2 for further details.
[Figure: X-axis: file size (kilobytes, log scale); left Y-axis: Thin AV overhead; right Y-axis: Thin AV cache hit rate. Series: mean file size, median file size, cache hit rate.]
Figure 4.8: File size in bytes (log10) versus Thin AV induced overhead (left axis), and Thin AV cache hit rate (i.e., the chance that an access was serviced by the cache versus online scanning, right axis). See Table A.3 for further details.
AV performance. By changing the mean of the file size distribution, it can be shown that
there is a distinct peak in the overhead of Thin AV. When the bulk of the files are very
small (i.e., one hundred bytes or less), the overhead is negligible; then, as the average
file size increases, so too does the overhead (Figure 4.8). This trend peaks around the
10 MB mark, at which point the system overhead begins to decline. At the same time,
changing the average file size shows little impact on the cache hit rate. The reasons for
this behavior will be discussed in Section 4.4.3.
Figure 4.9 shows the proportion of files which are scanned by each of the three scan-
ning services as the mean file size changes. Recall that the scanner priority for Thin AV
is Kaspersky, VirusChief, then VirusTotal. This is evident in the figure as the mean file
size starts off small, and as such, all files are scanned by Kaspersky. Gradually, as the
average file size increases, VirusChief and VirusTotal each become more prevalent. At
the same time, the number of files which were too large to be scanned by any of the
services begins to increase, slowly at first, but with a sharp increase at the 10 MB mark.
[Figure: X-axis: mean file size (kilobytes, log scale); Y-axis: proportion of accesses scanned by each service. Series: Kaspersky, VirusChief, VirusTotal, unscanned.]
Figure 4.9: Mean file size in bytes (log10) versus the proportion of accesses scanned by each scanning service, and the proportion of accesses un-scanned. See Table A.3 for further details.
[Figure: X-axis: proportion of file modification events; left Y-axis: Thin AV overhead / cache hit rate; right Y-axis: time (seconds). Series: Thin AV overhead (%), Thin AV cache hit rate (%), average time between file accesses (seconds).]
Figure 4.10: Proportion of file system events which modify files versus the Thin AV induced overhead, Thin AV cache hit rate (left axis), and average time between file accesses (right axis). See Table A.4 for further details.
[Figure: X-axis: average time between file accesses (seconds, log scale); Y-axis: Thin AV overhead (%, log scale).]
Figure 4.11: Average time between file accesses in seconds, versus Thin AV induced overhead (both axes are log10 scale). See Table A.5 for further details.
Next, the impact of file modifications was examined. Here, the probability that
controls whether a file event is a modification or an access was adjusted from zero to
one, and the resulting impact on Thin AV behavior was recorded (Figure 4.10). The
most direct relationship is the inverse relationship between the modification rate and the
cache hit rate. Additionally, there is an overall decrease in Thin AV overhead as the file
modification rate increases.
Finally, there exists a strong (non-linear) positive relationship between the modifica-
tion rate and the time between file accesses. When the time between file accesses was
manipulated directly, a very clear inverse linear relationship emerges between the mean
inter-access time of a file trace, and the overhead incurred by Thin AV (Figure 4.11).
4.4.3 Discussion
The similarity of the trends in Figures 4.6 and 4.7 suggests that one of the key
variables in predicting Thin AV overhead is not the number of accesses or the number
of unique files per se, but rather, the ratio between the number of unique files and the
number of accesses. As that ratio approaches one-to-one, the performance of Thin AV
drops off dramatically. The reason for this is intuitive: as the ratio approaches one-to-one,
most accesses will involve files that are not in the cache, and therefore must be uploaded
and scanned. As a result, the Thin AV cache becomes increasingly ineffective.
When it comes to the size of the files being scanned, the overhead of Thin AV has
much more to do with the specific speeds at which the individual scanning services can
return a scan result. Figures 4.8 and 4.9 show that the overhead from Thin AV is relatively
low when the file sizes being scanned are very small. This is because all of the files
are being scanned with Kaspersky, the fastest of the three scanning services. As the
file size increases, there is a large peak as VirusChief (a much slower scanner) becomes
the dominant scanning engine. Ultimately however, as the file size exceeds the 10 MB
limit imposed by VirusChief, the overhead decreases, because VirusTotal simply accepts
the upload and returns a “waiting” response to Thin AV, which is much faster than
continually polling for a response. Finally, as the mean file size moves past the 20 MB
mark, the Thin AV overhead drops to near zero, as none of the scanning services are
capable of servicing the scan request, and the file access is permitted without scanning.
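This size-dependent dispatch can be condensed into a short sketch. The 1 MB and 10 MB limits and the roughly 20 MB VirusTotal ceiling come from the text; the function itself is an illustrative simplification.

```python
def choose_scanner(size_mb):
    """Pick the highest-priority service able to handle a file of the given
    size, mirroring Thin AV's Kaspersky > VirusChief > VirusTotal priority."""
    if size_mb <= 1:
        return "kaspersky"   # fastest: full scan before access is granted
    if size_mb <= 10:
        return "viruschief"  # much slower full scan
    if size_mb <= 20:
        return "virustotal"  # upload only; access allowed on "waiting"
    return None              # too large for any service: access unscanned
```

The overhead peak in Figure 4.8 falls out of this priority: files just under 10 MB are routed to the slow VirusChief scanner, while larger files incur only an upload (or no scan at all).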
The results in Figure 4.10 are somewhat misleading. One might conclude that as the
file modification rate increases, so would the performance of Thin AV. This trend is in
spite of the fact that the cache hit rate also appears to be decreasing at the same time.
In reality, it is the time between file accesses that is actually driving the
performance of Thin AV. As the proportion of file modifications increases, this naturally
creates an increasing gap between file access events. Because Thin AV only scans on file
accesses, this has the effect of reducing the overall amount of scanning that needs to be
performed. This reduction in the need for scanning goes much farther towards reducing
the overhead of Thin AV than any reduction in performance that occurs from reducing
the effectiveness of the cache. This extremely tight relationship between inter-access time
and system overhead can be seen in Figure 4.11.
In summary, these simulations have pointed to three key elements that heavily impact
the performance of Thin AV. First, as the ratio between the number of unique files and
the total number of file accesses approaches one-to-one, performance decreases. Next,
when the mean file size falls into the 10 MB to 20 MB range, the performance significantly
decreases. Finally, as the gap between accesses increases, either due to a lack of activity
or an increase in the frequency of file modifications, the overhead incurred by Thin AV
decreases.
Given the large range of possible overheads and the variety of factors that influence
those overheads, it is possible that a deployment scenario exists that would make Thin
AV a practical anti-malware tool. First and foremost, such a system would require
a substantial pool of dedicated resources on the server side of the transaction. The
dramatic speed difference between scanning with Kaspersky and the other services
shows what is possible even on an unregulated, freely available resource. If a company
were to offer a dedicated scanning service (either as a standalone offering, or as part
of a larger suite of anti-malware products), it would likely be possible to achieve even
faster scanning speeds than were displayed by Kaspersky. This would be because such
a service would have a smaller user base, and if those users were paying for the service,
there would be a greater incentive for the provider to ensure the service had adequate
resources. If this service were paired with a fast server-side cache similar to VirusTotal, the
performance could be improved even more.
Unfortunately, it is not possible to define a relationship between the number of scan-
ning engines running on a service and the performance of that service. Intuitively, it
follows that more scanning engines would result in longer scanning times. This could be
consistent with the performance differences between Kaspersky, VirusChief and Virus-
Total, which have 1, 13 and 42 different scanning engines, respectively. However, without
knowing what kind of hardware is underpinning these services, it is not possible to
accurately gauge the relationship between the number of scanning engines and performance.
The performance of CloudAV gives some indication of what might be possible when
hardware constraints are eliminated [82]. In general, it is likely safe to
assume that the best possible performance would be achieved on a system that only used
a single scanning engine. As such, an ideal deployment of Thin AV should be limited
to a single high performance scanning engine, until it can be established that additional
engines can be added without a significant performance penalty.
The ideal deployment on the client computer is a somewhat more challenging issue
as it is largely out of the hands of the service provider. However, based on the file
inter-access time results, and the characteristics of the web and advanced workloads, it is
possible to conclude that Thin AV is more conducive to systems that are typically used for
casual internet activities as opposed to more developer-oriented activities. Beyond that,
users could further improve performance by running Thin AV with the passive security
policy (Table 3.1). This would offer a large performance boost, but would come at a
significant price. Specifically, files infected with malware would be allowed to execute,
and users would only be notified of the infection after the fact. Due to the extremely lax
security guarantees offered under this policy, and the fact that Thin AV does not offer a
mechanism for malware removal, this trade-off does not seem equitable. For this reason,
the performance overhead of this scenario was not studied.
Chapter 5
System Evaluation - Mobile Thin AV
The evaluation of the mobile version of Thin AV was considerably more challenging than
the evaluation of the desktop version. There are several reasons for this. There is no
established library of Android malware for use by researchers. Porter Felt et al. have
collected information on 18 malicious Android apps circulating in the wild [60], but their
study did not involve the collection of actual malware samples. Therefore, evaluation of
Thin AV was done with a collection of apps downloaded from the official Google Android
market. This data set will be described in greater detail in Section 5.1. However, without
an Android malware data set it was not possible to fully gauge the effectiveness of the
third party scanning engines when scanning Android malware. It was determined that
several of the scanning engines used by Thin AV do, in fact, detect Android malware,
and this issue will be discussed in Section 5.2.
All development and testing was done on the Android emulator provided in the SDK.
This was because it allowed for rapid development on different versions of the Android
operating system, and it allowed for changes to be made to the Android source code.
While it may have been possible to meet these requirements on a rooted Android device,
it would have been a significant gamble as to whether or not compatibility issues would
have arisen given that virtually all commercially available Android devices run a version of
Android that has been modified by the device manufacturer, such as Samsung’s TouchWiz
or HTC’s Sense UI. Further discussion on the performance of the Android emulator can
be found in Section 5.3.
Working on the emulator also presents evaluation issues with respect to network
performance. Because Thin AV is heavily reliant on the network, the connection speed
can greatly impact the performance of Thin AV. On a mobile device like a cell phone,
the speed of the cellular connection can be impacted by the location of the user, radio
interference, the load on the cellular network, as well as other factors. Because of the
challenges involved with network measurements, previous research results from Gass
and Diot comparing the speed of cellular and WiFi networks were used instead [62].
The remainder of this chapter is laid out as follows. Section 5.4 will discuss the
evaluation of the ComDroid scanning module. Section 5.5 will provide evaluation of the
best and worst case performance of the Thin AV safe installer. Finally, Section 5.6 will
conclude with an analysis of the cost of running the Thin AV killswitch on an ongoing
basis. Where relevant, the following sections are subdivided into the same protocol-
results-discussion format from Chapter 4.
5.1 Data Set
The data set used for testing consists of 1,022 apps downloaded from Google’s Android
market. To download the apps, a program was created which made use of the unofficial
market API [9]. The API provided a means of retrieving the asset IDs of the packages to
be downloaded. These asset IDs were then combined with a set of valid Google account
credentials in order to download the actual packages. An attempt was made to download
the top fifty free apps in each application category, as ranked by user votes on January
3, 2012. The majority of package downloads were successful, with 28 downloads causing
repeated failures. This resulted in 1,022 apps spread across 21 application categories,
with each category having between 46 and 50 packages.
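The collection process described above can be sketched as a batch-download loop that records failures rather than aborting, mirroring the 28 failed downloads. The `fetch` callable standing in for the unofficial market API client is hypothetical, as is the exception it raises:

```python
def download_top_free(categories, top_n, fetch):
    """Attempt to download the top-N free apps in each category using
    fetch(category, rank); collect failures instead of aborting."""
    packages, failures = [], []
    for category in categories:
        for rank in range(1, top_n + 1):
            try:
                packages.append(fetch(category, rank))
            except IOError:
                failures.append((category, rank))
    return packages, failures
```

With 21 categories and 50 attempted downloads each, 28 repeated failures would leave the 1,022 packages described above.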
Table 5.1 summarizes the key file size statistics of the data set, while Figure 5.1 shows
the median file size for the apps broken down by application category.
Number of Apps              1022
Mean App Size               2.65 MB
Median App Size             1.78 MB
Minimum App Size            0.02 MB
Maximum App Size            37.06 MB
Proportion of Apps < 1 MB   34.64 %
Proportion of Apps < 10 MB  97.16 %
Proportion of Apps < 20 MB  99.51 %
Table 5.1: General file size characteristics of the Android test data set.
Figure 5.1: Median file size of the Android test data set packages for each Google Market application category.
5.2 Malware Detection
The entire collection of Android packages was uploaded to the VirusTotal scanning ser-
vice. This was done for several reasons: first, it would show whether or not any of the
42 scanning engines used by VirusTotal detected any malware in the data set. Second,
it had the potential to show how many of the engines were capable of even detecting
Android malware. Third, because VirusTotal includes the Kaspersky scanning engine,
and most of the scanning engines in VirusChief, it would have the effect of testing the
data set on the scanning engines used by the other two third-party scanning services as
well. Finally, if VirusTotal was capable of detecting malicious Android apps, it would be
the most preferable of the three scanning services for the mobile implementation of Thin
AV. This is because VirusTotal’s slow response time is considerably less of an issue on
Android because the set of possible inputs (packages downloaded from various markets)
is tiny and relatively predictable in comparison to the near infinite array of files that
might be seen in the desktop implementation. This allows for the possibility of priming
VirusTotal with packages from various markets. The details of this deployment scenario
will be expanded upon in Section 6.2.
In a very surprising result, VirusTotal flagged several possible instances of malware
in the data set downloaded from the Google market. Of the 1,022 apps uploaded, 1,019
were scanned (with three being skipped due to size restrictions). Of the 1,019 scanned
packages, 27 were flagged as malware by at least one scanning engine. One package was
flagged as malware by four different engines, nine packages were flagged by two engines,
and the remaining seventeen packages were flagged as malware by a single engine. Table
5.2 provides details on some of the commonly flagged samples. The most commonly
identified sample was from the Adware.Airpush family. However, the majority of these
samples were identified by a single scanning engine (DrWeb), which raises the possibility
Sample Name             Malware Type   Occurrences   Detection Engine(s)
Adware.Airpush (2, 3)   Adware         15            DrWeb, Kaspersky
Plankton (A, D, G)      Trojan         6             Kaspersky, Comodo, NOD32, TrendMicro
SmsSend (151, 261)      Dialer         2             DrWeb
Rootcager               Trojan         2             Symantec

Table 5.2: Most frequent samples of malware detected in Google Market data set. Detection engine refers to which VirusTotal scanning engines detected the sample.
of this being a false positive. The next most common sample was Plankton, which was
identified by a variety of scanning engines. The remaining malware samples had far fewer
occurrences in the data set.
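The per-package flag counts reported here can be tallied mechanically. The report shape assumed below (package name mapped to the set of engines that flagged it) is an illustration, not VirusTotal's actual response format:

```python
from collections import Counter

def tally_detections(reports):
    """Given {package: set of engines that flagged it}, return a Counter
    mapping k -> number of packages flagged by exactly k engines."""
    return Counter(len(engines) for engines in reports.values() if engines)
```

Applied to the 1,019 scan reports, such a tally would yield one package at k = 4, nine at k = 2, and seventeen at k = 1.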
While Google has itself admitted to finding malicious apps in its market [40],
it was very surprising to find numerous possible instances of malware in Google’s official
Android market. It is even more surprising considering that these apps were selected for
the data set because they were among the fifty most popular apps in their respective
categories on the day they were downloaded. This only serves to reinforce the problem
presented by mobile malware, and if the official Google Market can fall victim to this
issue, it is worrying to consider the prevalence of malware in third-party markets.
The detection of malware in the data set shows that Thin AV, in its current form,
can take advantage of existing third-party scanning services to prevent the installation
of malware on Android devices. While the test data set only suggested that 6 of the AV
engines used by VirusTotal are capable of detecting Android malware, follow-up research
showed that as many as 26, or more than half of the scanning engines in VirusTotal are
capable of detecting some form of Android specific malware [27].
5.3 Emulator Performance
As mentioned above, all development and evaluation of Thin AV was done on the Android
emulator. In order to provide context for the performance results taken from the emu-
lator, it was necessary to assess the performance of the Android emulator as compared
to a physical Android device. The Java-based numerical benchmark SciMark [17] was
ported to run on an Android device. The benchmark consists of five CPU-bound tasks:
Fast Fourier Transforms (FFT), Jacobi Successive Over-relaxation (SOR), Monte Carlo
integration (MC), Sparse matrix-multiply (Sparse), and dense LU matrix factorization
(LU). The specific details of each test can be found in [18]. The benchmark was then
modified to test the speed of sequential reads and writes. This test was necessary
because the emulator uses the RAM and hard disk of the host device for storage, whereas
Android devices use flash memory.
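As a rough illustration of the added sequential-I/O test (the actual benchmark was the Java-based SciMark port, not this code), a minimal Python sketch might time chunked writes and reads of a scratch file:

```python
import os
import tempfile
import time

def sequential_io_mbps(total_mb=8, chunk_kb=64):
    """Sequentially write, then read, total_mb megabytes in chunk_kb
    chunks, returning (write_mbps, read_mbps)."""
    chunk = b"\0" * (chunk_kb * 1024)
    iterations = (total_mb * 1024) // chunk_kb
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(iterations):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())   # ensure the data actually reaches storage
        write_mbps = total_mb / (time.perf_counter() - start)
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(chunk)):
                pass
        read_mbps = total_mb / (time.perf_counter() - start)
        return write_mbps, read_mbps
    finally:
        os.remove(path)
```

The fsync call matters on the emulator, where the host operating system would otherwise buffer writes in RAM and inflate the apparent write speed.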
The benchmark was run on the Android emulator as well as three different physical
Android devices. Table 5.3 shows the results of the benchmark for each device. It is
clear from these results that any performance testing done on the Android emulator will
represent a lower bound on the performance of a production deployment of Thin AV.
In general, the Android emulator running on a modern computer is about an order of
magnitude slower than the same operation computed on a contemporary Android device.
5.4 ComDroid Evaluation
The ComDroid scanning service was added to Thin AV both to further demonstrate the
modularity and extensibility of the Thin AV architecture, as well as to add a scanning
service that was specifically targeted at Android applications. This section discusses the
evaluation of the ComDroid module.
                 Emulator   Samsung Galaxy S       HTC Desire          HTC Evo 3D
Hardware         See 3.2    Samsung Exynos 3110    Qualcomm QSD8250    Qualcomm MSM8660
                            ARM Cortex A8 @ 1GHz   Snapdragon @ 1GHz   Dual-Core @ 1.2GHz
                            512MB RAM              576MB RAM           1GB RAM
OS Version       2.3.3      2.3.3                  2.2                 2.3.4
Average Score    3.039      19.457                 37.909              41.771
FFT Score        1.750      12.542                 32.129              35.406
SOR Score        4.584      32.825                 72.464              81.101
MC Score         0.656      6.357                  7.970               5.433
Sparse Score     4.290      15.398                 31.019              26.114
LU Score         3.916      30.164                 45.965              60.800
Write (MB/s)     11.062     9.597                  12.438              37.879
Read (MB/s)      19.802     112.36                 121.951             192.308

Table 5.3: Comparison of benchmark scores for the Android SDK emulator and three different physical Android devices. The benchmark consists of five CPU-bound tasks: Fast Fourier Transforms (FFT), Jacobi Successive Over-relaxation (SOR), Monte Carlo integration (MC), Sparse matrix-multiply (Sparse), and dense LU matrix factorization (LU). The benchmark also measures the mean speed of sequential reading and writing from flash memory. For all benchmarks, higher values are better.
ComDroid   f(x) = 0.0132x + 9.6893
Table 5.4: Linear equation for the ComDroid scanning service.
5.4.1 Testing Protocol
The ComDroid module was tested in a manner somewhat similar to the other three
scanning modules from earlier in this chapter. All 1,022 of the apps described in Section
5.1 were uploaded to the ComDroid scanning service. Roughly half of the uploads were
performed on January 26 with the remainder being uploaded on January 27, 2012. For
each upload the time required for ComDroid to return a result was recorded; additionally,
the scan report for each of the applications was saved.
5.4.2 Results
Of the 1,022 packages uploaded to ComDroid, 993 were scanned, with the remainder
being rejected by the server due to a 10 MB size limitation. Of the 993 packages scanned
by ComDroid, 8 returned a scan error, resulting in 985 valid scan results. The mean
response time was 40.67 seconds (σ = 77.60 seconds), and the median response time was
18.63 seconds. Figure 5.2 shows the response time plotted as a function of the package
size, and the exact function is specified in Table 5.4. It is clear there is some positive
linear relationship between package size and scan time, although numerous outliers are
apparent.
The vast majority of packages, 971 of 985 (or 98.6%), show some potential for exposed
communication. Table 5.5 provides a summary of the exposed communication found
within the testing data set.
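Assuming the fit in Table 5.4 is an ordinary least-squares regression of response time on package size, it can be reproduced with a small pure-Python routine:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Applied to the (package size, response time) pairs behind Figure 5.2, such a fit would yield coefficients like those in Table 5.4, though the numerous outliers mean the fit explains only part of the variance.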
Figure 5.2: Response time of the ComDroid service as a function of package size.
Type of Warning                               Packages      Occurrences   Average
Action Misuse                                 331 (33.6%)   5640          17.0
Possible Activity Hijacking                   961 (97.6%)   14671         15.3
Possible Malicious Activity Launch            481 (48.8%)   2200          4.6
Possible Broadcast Theft                      501 (50.9%)   4630          9.2
Possible Broadcast Injection                  613 (62.2%)   3703          6.0
Possible Service Hijacking                    261 (26.5%)   980           3.8
Possible Malicious Service Launch             167 (17.0%)   315           1.9
Protected System Broadcast w/o Action Check   108 (11.0%)   134           1.2

Table 5.5: Breakdown of exposed communication found by ComDroid in the testing data set. The packages column refers to the number of applications with at least one instance of a given warning type. The occurrences column refers to the total number of potentially exploitable attack surfaces that exist for a given warning type, within the entire data set. The average column is the average number of occurrences per package. For a complete explanation of the types of warnings see [45].
5.4.3 Discussion
The performance of the ComDroid service is somewhat similar to both the Kaspersky and
VirusChief services (Section 4.1.2). However, the linear trend is much less prominent.
The most likely explanation for this lies in the nature of the analysis performed by
ComDroid. ComDroid is a static code analysis tool, and as such, it is safe to assume
that the time required to analyze an Android app has more to do with the amount of
code in the package, than the total size of the package. Given that many apps contain
numerous resource files (images, sounds, video, etc.) which are not scanned by ComDroid,
it is easy to imagine how a package might have a large size, but a relatively small amount
of code, resulting in a much faster scan than for an app with a large amount of code and
few resource files. It is quite likely that the observed linear trend is much more a result
of the upload time and not the code-to-resource-file ratio of the package.
The prevalence of exposed communication within the data set seems very high, with
less than 3% of packages not reporting any errors. However, interestingly, all values in
the packages column of Table 5.5 are within 10% of the values reported by Chin et al. in
[45], suggesting that their initial findings were fairly representative of a larger data set.
The pervasiveness of programming errors detected by ComDroid suggests that in its
current form, simply flagging an application as being “at risk” if there is any instance of
exposed communication would be overkill. It would effectively cripple the ability of users
to install apps on their device. As was pointed out in [45], a manual inspection of a subset
of warnings found that only about 10-15% of warnings were genuine vulnerabilities. This
does suggest that there is a place for ComDroid in the Thin AV architecture. However,
the behavior of this Thin AV module would likely have to be adjusted over time to
prevent excessive false positives. This could be done by creating thresholds which would
flag a package as vulnerable if it had significantly more exposed surfaces than average
for a given type of warning.
Network Configuration   Upload Speed (KBps)   Download Speed (KBps)
Typical 3G              16.25                 84.13
Ideal 3G                1792.00               1792.00
Typical WiFi            190.38                155.38
Ideal WiFi              76800.00              76800.00

Table 5.6: Network speeds used for evaluating the mobile implementation of Thin AV.
5.5 Safe Installer Performance
The first line of defense provided by Thin AV is the safe installer, which checks for
malicious apps at install time. The performance of the Safe Installer is based on three
factors: the size of the package being scanned, the speed of the network to which the
device is connected, and whether or not the package being installed has already been
scanned by Thin AV.
For the purposes of evaluating the safe installer, three different file sizes (small,
medium, and large) were chosen: 0.76 MB, 1.78 MB, and 3.56 MB, corresponding to
the median size of apps in the category with the smallest median size (medical apps),
the median size for the entire data set, and the median size of apps in the category with
the largest median size (educational apps).
Additionally, four different network configurations will be examined. These will be
referred to as “Ideal 3G”, “Typical 3G”, “Ideal WiFi”, and “Typical WiFi”. The speeds
for each of these configurations (listed in Table 5.6) have been taken from [62] and [67].
The best case scenario for the performance of the safe installer is one in which the
package being installed has already been scanned by Thin AV. There are several reasons
why this could occur, and they will be discussed in Chapter 6. In this case, the cost
for performing an install time check is equal to the time required to hash the installing
application, send the hash to Thin AV, look up the scan result, and return the scan
result.
Network Configuration   Small File   Medium File   Large File
Ideal 3G                0.034 s      0.232 s       0.293 s
Typical 3G              0.041 s      0.239 s       0.300 s
Ideal WiFi              0.034 s      0.231 s       0.293 s
Typical WiFi            0.035 s      0.233 s       0.294 s

Table 5.7: Time required to check a package in Thin AV, for three different file sizes and four different network configurations, assuming the scan result is already cached by Thin AV.
The time required to hash a small, medium, and large application on the Android
emulator was measured, and the average of five runs was taken for each size. The small
file took 0.033 seconds to hash, the medium file took 0.231 seconds to hash and the large
file took 0.293 seconds to hash. The total amount of data uploaded and downloaded for
transmitting the hash and receiving the result was recorded. This was approximately
200 bytes (100 up, 100 down), although this amount varied slightly with the file being
scanned. Finally, the cost of Thin AV performing a cache lookup was examined in Section
4.3.1, and so here too, the cost of Thin AV performing a lookup from cache will be taken
to be 0.0002 seconds. Table 5.7 summarizes the results for this best case scenario. In
general, even the largest file over the slowest network only takes 0.3 seconds to check
with Thin AV.
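These best-case figures can be reproduced, to within rounding, by summing the four cost components measured above. Treating "KBps" as 1,000 bytes per second is an assumption of this sketch:

```python
HASH_TIME = {"small": 0.033, "medium": 0.231, "large": 0.293}  # measured, seconds
LOOKUP_TIME = 0.0002          # server-side cache lookup, seconds
REQ_BYTES = RESP_BYTES = 100  # approximate hash upload / result download sizes

def cached_check_time(size, up_kbps, down_kbps):
    """Best-case install check: hash locally, upload the hash, look it
    up in the Thin AV cache, and download the result.
    KBps is taken as 1,000 bytes per second (an assumption)."""
    return (HASH_TIME[size]
            + REQ_BYTES / (up_kbps * 1000.0)
            + LOOKUP_TIME
            + RESP_BYTES / (down_kbps * 1000.0))
```

For example, cached_check_time("small", 16.25, 84.13) comes out near 0.041 s, in line with the Typical 3G row of Table 5.7; the hash computation dominates, so the network configuration barely matters in this case.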
The worst case scenario is one in which the application being installed has not been
scanned by Thin AV, and the whole package must be uploaded to Thin AV, which must
then upload the package to one or more of the third-party scanning services. Using the
formulæ in Tables 4.10 and 5.4, and the file sizes and network speeds above, it is possible
to compute the time required to upload and scan these files at install time. Because
the time spent uploading and scanning a file will dwarf the time required to hash an
application and upload that hash, these costs will not be included in the calculation.
It should be noted that when calculating the time required to scan a package both
Network Configuration   Small File   Medium File   Large File
Ideal 3G                36.56 s      98.13 s       170.29 s
Typical 3G              84.66 s      210.00 s      394.14 s
Ideal WiFi              36.13 s      97.14 s       168.31 s
Typical WiFi            40.23 s      106.68 s      187.39 s

Table 5.8: Time required to check a package in Thin AV, for three different file sizes and four different network configurations, assuming the scan result is not cached by Thin AV.
the time to scan with the appropriate anti-virus scanning service and the time required
for scanning with ComDroid must be added. This is because, as currently configured,
ComDroid runs in addition to the anti-virus scanner appropriate for the size of the
package. The performance drawbacks of this configuration are obvious. However, it
does mean that the results presented in this section
represent a highly conservative estimation of possible Thin AV performance in a produc-
tion deployment.
Table 5.8 summarizes the results for this worst case scenario. In general, the time
required to upload and scan an Android package ranges from a low of 36 seconds to a
high of almost 400 seconds, depending on the size of the file and the speed of the network.
The best case scenario, where Thin AV already has a cached scan result, is extremely
fast. At 0.3 seconds, this check would be unnoticeable to a user. On the other hand,
if the file needed to be uploaded and scanned, this process could take as long as 400
seconds, or almost seven minutes. This could be seen as a serious inconvenience to the
user, but considering that this check would only take place when a user is installing an
unknown app, it is not likely to be a frequent occurrence. Additionally, given that Thin
AV could be primed with packages from a variety of sources, including regular downloads
of applications from various application markets, upload of applications by developers,
and the upload of applications by other users running Thin AV, the chance that a user
would have to upload a package for scanning at install time could be made very rare. So,
while the worst case scenario is not ideal, it should seldom arise in practice.
Finally, while not a specific performance test, an end-to-end functionality test was
run in which the Thin AV safe installer correctly blocked the installation of an app from
the testing data set which was flagged as malware by VirusTotal.
5.6 Killswitch Cost
During normal operation, the most frequently used functionality of Thin AV would typ-
ically be the killswitch service, which is periodically activated and checks for revoked
apps. To evaluate the performance of the killswitch, several factors must be examined:
the cost of hashing apps to generate a system fingerprint, the network cost associated
with uploading the fingerprint, the cost of looking up the hashes in Thin AV, and the
network cost associated with returning those hashes to the client. The last and most
costly aspect of the killswitch is the manual upload feature, because this is the only time
when the killswitch should incur any cost for scanning a package. This is because it is
assumed that any missing packages will be scanned by Thin AV when they are uploaded
by the safe installer at the time of installation.
This section will examine the cost of the killswitch under normal operation, as well as
the cost of manually uploading missing packages. The normal operation will be assessed
in two parts, the cost of generating a system fingerprint, and the cost of sending and
receiving the system response. The cost of Thin AV performing a cache lookup will
again be taken to be 0.0002 seconds per cryptographic hash.
In general, the time required for the killswitch to perform a check for revoked apps
will be:
Time = hashing time + (hash upload size / upload speed)
         + cache lookup time + (response download size / download speed)    (5.1)
Because the cost of performing a manual upload of missing packages is dominated
by upload and scanning costs (similar to the safe installer above), only these costs will
be included in the calculation. The time required for the killswitch to manually upload
missing packages will be:
Time = (package upload size / upload speed) + scanning time    (5.2)
Similar to the safe installer, the time spent scanning is the sum of the time scanning
with the appropriate anti-virus service as well as scanning with ComDroid.
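Equations 5.1 and 5.2 translate directly into code, with all quantities in consistent units (seconds, bytes, and bytes per second):

```python
def killswitch_check_time(hashing_time, hash_bytes, up_bps,
                          lookup_time, response_bytes, down_bps):
    """Equation 5.1: cost of a routine killswitch check for revoked apps."""
    return (hashing_time + hash_bytes / up_bps
            + lookup_time + response_bytes / down_bps)

def manual_upload_time(package_bytes, up_bps, scanning_time):
    """Equation 5.2: cost of manually uploading a missing package.
    scanning_time is the anti-virus scan time plus the ComDroid scan time."""
    return package_bytes / up_bps + scanning_time
```

These two functions are the basis of the running-time estimates later in this section, once measured hashing times and the network speeds of Table 5.6 are substituted in.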
5.6.1 Testing Protocol
To test the performance of the hashing function, the top five apps (by user popularity
ranking) from each of the 21 market categories were installed on the Android emulator
(i.e., first the top app from each category was installed, then the top two apps, then
the top three, and so on).
A complete system fingerprint was generated ten times, and the average was taken, af-
ter which the local fingerprint cache was deleted. This represents the worst case scenario,
one in which none of the apps on the device have been hashed before, and all hashes must
be computed. Next, another ten fingerprints were generated and the average was taken,
but this time, the cache was left intact. This represents the best case scenario: one in
which all of the apps on the phone have already been hashed and the phone fingerprint
is stored locally. Along with the fingerprint generation time, the size of the fingerprint,
and the size of the server response to sending that fingerprint were recorded. This way
the data consumption of the killswitch can be evaluated.
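The cached fingerprinting path can be sketched as follows; the use of SHA-1 and a cache keyed by package path are assumptions for illustration, not necessarily the exact scheme used by Thin AV:

```python
import hashlib

def fingerprint(package_paths, cache, read_bytes):
    """Build the system fingerprint (a list of package hashes),
    hashing only packages missing from the local cache."""
    hashes = []
    for path in package_paths:
        if path not in cache:
            # Cache miss: read and hash the package (the expensive step)
            cache[path] = hashlib.sha1(read_bytes(path)).hexdigest()
        hashes.append(cache[path])
    return hashes
```

Deleting the cache between runs forces every package to be re-hashed (the worst case above); leaving it intact reduces the work to a sequence of dictionary lookups (the best case).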
Uncached   f(x) = 0.0895x - 0.27
Cached     f(x) = 0.0028x + 0.164

Table 5.9: Linear equations for generating a system fingerprint as a function of the total size of the apps on a device, for both the cached and uncached scenarios.
Under normal use, the typical scenario is likely to be the best case scenario, or very
close to it. After the first fingerprint has been generated, the
only time an app will have to be hashed is when it has not been seen by the killswitch,
meaning it has just been installed. Unless a user installs numerous apps between the
scheduled runs of the killswitch, it is likely the number of apps that need to be hashed
would be near zero.
Combining the hashing performance with the file size data for the data set, the scanner
performance functions in Tables 4.10 and 5.4, and the experimental network performance
measurements from [62], the cost of performing manual uploads, as well as the cost of
fingerprinting can be calculated using Equations 5.1 and 5.2.
5.6.2 Results
Figure 5.3 shows the best (cached) and worst (uncached) case scenarios for the fingerprint
generation time as a function of both the number of packages on the device and the total
size of those packages. Table 5.9 shows the linear equations for both the cached and
uncached functions for the total number of bytes worth of apps on a device.
It is clear that the time to generate a system fingerprint grows in a mostly linear way with
both the number and size of packages on the device. In the worst case, with 110 apps
on the device, it only takes 29.95 seconds to generate a system fingerprint. However,
the best case scenario is dramatically better, with a fingerprint being generated in 1.09
seconds for the same 110 apps when the fingerprint has been cached.
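As a consistency check on Table 5.9 (taking x in megabytes, per the axis of Figure 5.3(b)), inverting the uncached fit at the measured 29.95 s gives the implied total package size, and the cached fit evaluated at that size should land near the measured 1.09 s. Small residuals are expected, since these are fitted lines rather than exact measurements:

```python
def uncached_time(mb):
    """Uncached fingerprint-generation fit from Table 5.9 (x in MB)."""
    return 0.0895 * mb - 0.27

def cached_time(mb):
    """Cached fingerprint-generation fit from Table 5.9 (x in MB)."""
    return 0.0028 * mb + 0.164

# Total app size implied by the measured 29.95 s uncached run
implied_mb = (29.95 + 0.27) / 0.0895
```

Here implied_mb comes out near 338 MB, and cached_time(implied_mb) is roughly 1.11 s, within a few hundredths of a second of the measured 1.09 s.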
Figure 5.3: Time required to generate a complete system fingerprint as a function of the number of packages installed on the device (a) and the total size of those packages (b). Both figures show the average time when all of the package hashes have been stored (cached) and when none of the package hashes are stored (uncached). Figure (a) also includes the number of bytes sent and received when communicating the fingerprint to the Thin AV server.
Interval   Data Consumption (5 Apps)   Data Consumption (110 Apps)
1 Day      24.47 KB                    349.41 KB
1 Week     171.28 KB                   2.39 MB
1 Month    5.19 MB                     74.04 MB

Table 5.10: Data consumption of Thin AV killswitch over different time periods, for 5 and 110 apps installed on the device, assuming the killswitch is scheduled to run every fifteen minutes.
Data usage grows linearly with the number of packages on the device. The data
consumption ranges from 3.64 KB for 110 apps, down to 261 bytes for 5 apps. The
majority of this transmission is in the form of the uploaded fingerprint, as the response
from Thin AV only downloads 70 bytes from the server. This is for a fingerprint that
included no hashes corresponding to malicious apps, however.
Under the current configuration the killswitch is scheduled to generate a system fin-
gerprint every 15 minutes. Table 5.10 shows how much data would be consumed by Thin
AV (as it is currently configured) over different lengths of time.
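The daily and weekly rows of Table 5.10 follow directly from the per-check transmission sizes (261 bytes for 5 apps, 3.64 KB for 110) and the fifteen-minute schedule; 1 KB = 1,024 bytes is assumed here:

```python
CHECKS_PER_DAY = 24 * 60 // 15   # killswitch runs every fifteen minutes: 96/day

def daily_kb(per_check_bytes):
    """Data consumed per day, in KB (1 KB = 1,024 bytes assumed)."""
    return per_check_bytes * CHECKS_PER_DAY / 1024

def weekly_kb(per_check_bytes):
    """Data consumed per week, in KB."""
    return daily_kb(per_check_bytes) * 7
```

For the 5-app case (261 bytes per check), this gives roughly 24.47 KB per day and 171.28 KB per week, matching the corresponding column of Table 5.10.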
Using the same network measurements from Section 5.5, the measured fingerprint
generation times, and data transmission totals, it is possible to compute a variety of
potential running times for the entire fingerprinting operation of Thin AV killswitch
using Equation 5.1. These values are summarized in Table 5.11.
For calculating the cost of manually uploading missing packages, two additional
assumptions must be made: the number of packages being uploaded, and the
size of those packages. The three different package sizes will be the same as those used in
Section 5.5. These app sizes will be used to examine the case where 10, 25 and 50 apps
were being uploaded to the Thin AV service for scanning. These scenarios will again be
examined from the four different network configurations seen in the previous section.
Table 5.12 summarizes the total amount of data that would be uploaded for different
numbers of apps of different sizes. The key assumption underpinning this table is that the
Scenario                                                        Time (seconds)
110 apps over an ideal 3G connection with no hashes cached      26.206
110 apps over a typical 3G connection with no hashes cached     26.430
110 apps over an ideal WiFi connection with no hashes cached    26.204
110 apps over a typical WiFi connection with no hashes cached   26.223
110 apps over an ideal 3G connection with all hashes cached     3.424
110 apps over a typical 3G connection with all hashes cached    3.478
110 apps over an ideal WiFi connection with all hashes cached   3.423
110 apps over a typical WiFi connection with all hashes cached  3.428
26 apps over an ideal 3G connection with no hashes cached       1.034
26 apps over a typical 3G connection with no hashes cached      1.258
26 apps over an ideal WiFi connection with no hashes cached     1.032
26 apps over a typical WiFi connection with no hashes cached    1.051
26 apps over an ideal 3G connection with all hashes cached      0.285
26 apps over a typical 3G connection with all hashes cached     0.339
26 apps over an ideal WiFi connection with all hashes cached    0.285
26 apps over a typical WiFi connection with all hashes cached   0.290

Table 5.11: Time required to complete the fingerprinting operation for different numbers of applications, network performance, and caching scenarios. The definitions of "typical" and "ideal" for each connection type are the same as in Section 5.5.
              Total Data Uploaded (MB)
Scenario      10 Apps   25 Apps   50 Apps
Small Apps    7.643     19.108    38.216
Medium Apps   17.775    44.438    88.875
Large Apps    35.570    88.925    177.850

Table 5.12: Total upload sizes used for calculations of bulk scanning performance.
                            Upload Time (Seconds)
Scenario                  10 Apps     25 Apps     50 Apps
Ideal 3G Connection
  Small Apps                4.367      10.919      21.837
  Medium Apps              10.157      25.393      50.786
  Large Apps               20.326      50.814     101.629
Typical 3G Connection
  Small Apps              485.360    1213.400    2426.801
  Medium Apps            1128.781    2821.953    5643.907
  Large Apps             2258.833    5647.082   11294.164
Ideal WiFi Connection
  Small Apps                0.102       0.255       0.510
  Medium Apps               0.237       0.593       1.185
  Large Apps                0.474       1.186       2.371
Typical WiFi Connection
  Small Apps               41.111     102.777     205.553
  Medium Apps              95.609     239.023     478.046
  Large Apps              191.326     478.315     956.630

Table 5.13: Upload times for the values in Table 5.12, for four different network
configurations.
                                          Scanning Time (Seconds)
Scenario                               10 Apps     25 Apps     50 Apps
Small Apps Scanned With Kaspersky      161.021     402.552     805.104
Medium Apps Scanned With VirusChief    634.032    1585.080    3170.159
Large Apps Scanned With VirusChief    1104.893    2762.232    5524.465
Small Apps Scanned With ComDroid       107.224     252.563     494.796
Medium Apps Scanned With ComDroid      120.919     266.259     508.491
Large Apps Scanned With ComDroid       144.972     290.312     532.544

Table 5.14: Scan times for different numbers of apps with small, medium and large sizes,
using conventional scanning engines (Kaspersky and VirusChief) and the Android-specific
scanner, ComDroid.
sizes for a given number of apps are the same (e.g., ten small apps would
be 10 × 0.764 MB). Using these total upload sizes, the upload times can be calculated
based on the network speeds specified in the previous section. The upload times for the
different numbers and sizes of apps are summarized in Table 5.13. Using the size and
quantity of each app, the scanning time could then be computed using the equations in
Tables 4.10 and 5.4. These results are summarized in Table 5.14. Finally, referring to
Equation 5.2, it is possible to compute the time required to upload and scan missing
apps under different scenarios.
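The structure of this calculation can be sketched in Python, the language of the Thin AV prototype. The additive model of upload time plus scan time is a simplification of Equation 5.2, and the uplink speed and aggregate scan time used in the example are illustrative assumptions rather than the measured values behind Tables 5.12 to 5.14.

```python
# Sketch of the bulk upload-and-scan estimate: total time is approximated here
# as upload time plus scanning time. The uplink speed and aggregate scan time
# below are illustrative assumptions, not measured values.

def upload_time(total_mb, uplink_mbps):
    """Seconds to upload total_mb megabytes over an uplink of uplink_mbps megabits/s."""
    return (total_mb * 8.0) / uplink_mbps

def bulk_scan_time(total_mb, uplink_mbps, scan_seconds):
    """Approximate time to upload and scan a batch of missing apps."""
    return upload_time(total_mb, uplink_mbps) + scan_seconds

# Example: 50 large apps (177.850 MB, Table 5.12) over an assumed 1 Mbit/s
# uplink, with an assumed aggregate scan time of 6600 seconds.
total = bulk_scan_time(177.850, 1.0, 6600.0)
```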
The best case scenario is when ten small apps are uploaded and scanned over an
ideal WiFi connection. In this case the total operation would take 289.2 seconds, or just
under five minutes. The worst case scenario is one in which fifty large apps are uploaded
and scanned over a typical 3G connection. This operation would take 17351.2 seconds,
or nearly five hours. However, if the same operation is performed over a typical WiFi
connection, the time required to complete this one-time operation drops by more than
half, to 1.95 hours.
5.6.3 Discussion
From both a time and data consumption perspective, Thin AV has a relatively minor
impact on an Android device. Fingerprinting is the only operation that would likely
take place with any frequency during long-term use. Given the best case scenario for the
killswitch, 1 second of computation followed by less than 4 KB of data transmission for
all 110 apps, it is likely that this operation would be unnoticeable to a user. Furthermore,
given that these tests were performed on the Android emulator, the fingerprinting would
almost certainly take considerably less time on a physical Android device.
In terms of data consumption, the 74 MB a month for uploading the fingerprint of
110 apps is not trivial. However, given that cellular carriers offer data plans ranging
from 500 MB a month to unlimited data usage, the impact of Thin AV would represent
a small fraction of a user’s allotted data consumption for a given month. Furthermore,
it would be possible to reduce the amount of data consumed by a large fraction simply
by reducing the frequency with which the killswitch is run, and by removing extraneous
bytes from the messages sent and received by Thin AV.
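A back-of-the-envelope sketch makes the scale of this data budget concrete. The 74 MB monthly upload volume and the 500 MB plan size come from the discussion above; the killswitch run frequencies are illustrative assumptions.

```python
# Back-of-the-envelope check of the killswitch data budget: 74 MB of monthly
# fingerprint uploads against a 500 MB data plan, and the effect of running
# the killswitch less often. Run frequencies are illustrative assumptions.

def monthly_fraction(upload_mb_per_month, plan_mb):
    """Fraction of a monthly data plan consumed by fingerprint uploads."""
    return upload_mb_per_month / plan_mb

def scaled_usage_mb(upload_mb_per_month, runs_per_day, baseline_runs_per_day):
    """Monthly upload volume if the killswitch runs more or less often than a baseline."""
    return upload_mb_per_month * (runs_per_day / baseline_runs_per_day)

share = monthly_fraction(74, 500)      # roughly 15% of a 500 MB plan
halved = scaled_usage_mb(74, 1, 2)     # halving the run frequency halves the data
```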
It should be noted that the above results assumed that Thin AV has already scanned
the packages present on the mobile device. This assumption is reasonable considering
that the killswitch is intended to operate in conjunction with the safe installer. This
means that any app installed on a device that has not been scanned by Thin AV would
be uploaded and scanned at install time; as a consequence, the scan result would already
be present in Thin AV when the killswitch is later run. The one exception to this case
would be if the killswitch is installed on a device after several other apps have been
installed. In this case, the one-time upload and scanning of missing apps must take
place. The worst case performance for this operation is quite poor. Assuming fifty apps
were uploaded over a typical 3G connection, each roughly 3.7 MB in size, it would
take nearly five hours to complete the operation. However, the same operation over a
typical WiFi connection would take less than two hours. Considering that this is a one-
time operation, and it is at the user’s discretion when this operation takes place, these
results are not unreasonable. A user could simply initiate the upload over their home or
office WiFi network when their phone is charging.
In general, the long-term performance impact of using the Thin AV killswitch is quite
favorable.
Chapter 6
Discussion
This chapter will discuss some of the broader issues pertaining to Thin AV and the possi-
ble use of Thin AV as a production scale service. For discussions on specific experimental
results see the discussion subsections of Chapter 4. Section 6.1 of this chapter will talk
about the feasibility of Thin AV, specifically where the system succeeds and where it
fails. The different ideal deployment scenarios for both the mobile and desktop versions
of Thin AV will be discussed in Section 6.2. Finally, the privacy concerns that would
come along with using Thin AV will be expanded upon in Section 6.3.
6.1 Thin AV Performance and Feasibility
After a thorough evaluation of Thin AV in both a mobile and a desktop environment, it
can be concluded that the desktop prototype of Thin AV is, at best, marginally successful
and, in its current form, not highly feasible. Conversely, the mobile prototype, even in its
unpolished state, demonstrates a highly feasible mechanism for protecting smartphones
from malware.
There are two key factors that seriously impact the performance of Thin AV, and
keep it from being a truly feasible concept on the desktop: the size of the input space,
and the frequency of file access. Because Thin AV is not selective about the files it scans,
any file on the stacked file system will be uploaded to Thin AV if it is accessed. This
creates an extraordinarily large input space, only a portion of which is even remotely
predictable. This large input space presents a significant challenge by itself. However,
when combined with the fact that the files uploaded to Thin AV are often accessed several
at a time, and in rapid succession, such as when launching a program, it makes for a
very underwhelming experience for a user.
Fortunately, the aspects of Thin AV that made for slow performance in the desktop
implementation do not exist in the mobile environment. Because virtually all Android
malware comes in the form of malicious applications, the scanner input space is massively
reduced. It is not necessary to scan every individual file access, and even if it were
necessary, it would not be possible without fundamentally violating the Android security
model. Instead only applications need to be scanned. This application-centric design
does create a very different security model than the conventional file scanning model
present in the desktop version of Thin AV. However, enhancing the existing Android
security framework is preferable to violating the framework in the hopes of creating a
more direct comparison with the desktop security model. Furthermore, opting for an
app-centric approach on Android and a file-centric approach on Linux compares and
contrasts two current and realistic scenarios for system security, as opposed to examining
a pair of hypothetical (but more similar) scenarios.
By modifying the Android package installation code, it was possible to check for ma-
licious code in an application before it was installed on the phone. Furthermore, because
of the vastly reduced input space, it is even possible to make predictions about what
applications will be installed, namely, the applications that exist in major application
markets, both official and third-party. Being able to predict which applications will be
installed allows for these apps to be proactively downloaded and scanned, allowing for the
quick return of cached results when performing application checks. This safe installation
mechanism, combined with a background killswitch, effectively works to prevent
the installation of malicious apps, and prompt the removal of apps if they are found to
be malicious after they have been installed, all with minimal ongoing cost in computing
time and network bandwidth.
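The cooperation between the safe installer and the killswitch described above can be sketched as follows. The scan_service interface, the cache structure, and the byte-marker verdict used here are hypothetical stand-ins for the Thin AV web application and its third-party scanning engines.

```python
# Minimal sketch of the safe installer / killswitch interplay. The scanning
# back end is simulated: a cache of verdicts keyed by package hash, with a
# stand-in "engine" that flags a marker byte string as malicious.

CACHE = {}  # package hash -> scan verdict; pre-populated by proactive scanning

def scan_service(pkg_hash, pkg_bytes):
    """Return a cached verdict if present; otherwise scan the upload and cache it."""
    if pkg_hash in CACHE:
        return CACHE[pkg_hash]
    verdict = "malicious" if b"EVIL" in pkg_bytes else "clean"  # stand-in for real engines
    CACHE[pkg_hash] = verdict
    return verdict

def safe_install(pkg_hash, pkg_bytes, install):
    """Block installation when the scanning service flags the package."""
    if scan_service(pkg_hash, pkg_bytes) == "malicious":
        return False
    install()
    return True

def killswitch(installed):
    """Re-check installed packages; return the set of hashes to remove."""
    return {h for h, data in installed.items() if scan_service(h, data) == "malicious"}
```

Because the killswitch reuses the same cache as the installer, packages vetted at install time are re-checked at essentially no cost.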
Another area in which the mobile version of Thin AV outperforms the desktop version
is connectivity. While most desktop computers are increasingly moving
towards having persistent connectivity, this is not always guaranteed. When internet
connectivity is unavailable, Thin AV can only function in a passive mode, simply allowing
access to files and not scanning them. It might be possible to offer some protection in this
scenario by logging file accesses for later scanning when an internet connection is available,
but this scenario was beyond the scope of this research. In a mobile environment, this
issue does not exist: a smartphone is, by its very nature, intended to be a persistently
connected device. While it is possible to lose data connectivity due to lack of service,
this would not be problematic for Thin AV because while it would not be possible to
communicate with Thin AV, it would also not be possible to download packages which
would require verification by Thin AV. It might be possible to envision a problematic
scenario where a user downloads an Android application, but does not install it until a
later time when network connectivity is unavailable. While such a scenario does present
a problem for the safe installer, the killswitch would be capable of detecting a malicious
app once connectivity was restored.
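The deferred-scanning idea mentioned above, although beyond the scope of this research, might be sketched as follows. The scan callable is a hypothetical stand-in for the Thin AV client’s upload path.

```python
# Sketch of passive mode with deferred scanning: while offline, log file
# accesses instead of scanning them, then submit the backlog once
# connectivity returns. `scan` is a hypothetical stand-in for the client.

from collections import deque

class PassiveModeLogger:
    def __init__(self, scan):
        self.scan = scan          # callable: path -> verdict
        self.pending = deque()    # file accesses seen while offline
        self.online = False

    def on_access(self, path):
        """Scan immediately when online; otherwise allow access and defer the scan."""
        if self.online:
            return self.scan(path)
        self.pending.append(path)
        return None

    def on_reconnect(self):
        """Flush the backlog of deferred scans and return their verdicts."""
        self.online = True
        results = {}
        while self.pending:
            path = self.pending.popleft()
            results[path] = self.scan(path)
        return results
```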
6.2 Ideal Deployment
The goal of this research has been to determine if a cloud-based security service would
be feasible or appropriate for providing protection from malware on either a desktop
computer or a mobile device. The performance of the desktop and mobile Thin AV
prototypes suggest that such a system is definitely feasible for mobile devices, and with
significant changes, possibly feasible for desktop systems. Due to time and hardware
limitations, the prototypes that were built are, at best, rough implementations of a much
grander vision. To be truly useful as a mechanism for malware protection, a variety of
changes would have to be made to both the desktop and mobile systems. Section 6.2.1
will discuss the ideal deployment scenario for the desktop version of Thin AV, while
Section 6.2.2 will discuss the mobile version of Thin AV.
6.2.1 Desktop Deployment
It is clear from the performance experiments in Chapter 4 that the desktop version of Thin
AV has some significant performance impediments. There are four areas in which the
performance of Thin AV could be improved, potentially leading to a practical production
deployment.
The most basic change would be the development platform for Thin AV. The proto-
type was built using Python, an interpreted language that is not ideal for performance
intensive tasks. Python was chosen because it allowed for rapid prototyping. Addition-
ally, Python provides a number of feature-rich libraries which greatly increased the speed
of development. However, it is quite likely that some modest performance gains could
be realized by re-developing Thin AV in a compiled language such as C or C++. These
performance gains would be the most noticeable when accessing files in the Thin AV
cache, which is an extremely common occurrence.
The second area for improvement would be to allow the Thin AV client to selectively
filter the files that it sends for scanning. Bayer et al. [36] provides an overview of the
host and network behavior of a large corpus of Windows malware. If a comparable data
set were available for Linux systems, it could be used to inform the development of a
filter for Thin AV. Such a filter would cause Thin AV to be more selective about the files
it scans, rarely scanning files which typically pose a low risk of containing malware, and
more regularly scanning files which do carry such a risk.
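Such a filter might be sketched as follows. The list of high-risk extensions here is an illustrative assumption; a deployed filter would be informed by a measurement study of the kind described by Bayer et al. [36].

```python
# Sketch of a selective client-side filter: upload for scanning only those
# files whose type plausibly carries malware risk. The extension list below
# is an illustrative assumption, not a result from any measurement study.

import os

HIGH_RISK_SUFFIXES = {".exe", ".dll", ".sh", ".bin", ".so"}  # assumed, for illustration

def should_scan(path, always_scan_unknown=True):
    """Decide whether the Thin AV client should upload this file for scanning."""
    ext = os.path.splitext(path)[1].lower()
    if not ext:
        return always_scan_unknown  # no extension: err on the side of scanning
    return ext in HIGH_RISK_SUFFIXES
```

A predicate of this shape would sit in the client’s file-access hook, so low-risk accesses never incur an upload at all.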
Other areas for improving the performance of Thin AV are the network and scanning
performance. CloudAV showed impressive speed when uploading and scanning files [82].
This is not surprising considering that CloudAV was deployed in a university computer
lab. With a limited number of users accessing a dedicated service over a local area
network, both file transfer times and file scanning times could be kept to a minimum.
Thin AV was an attempt to see how well this concept could be extended to a wide area
network. In order to realize a greater degree of performance, some of the benefits of
CloudAV would have to be applied to Thin AV. Most notably, Thin AV could vastly
benefit from running on a dedicated hardware platform with ample resources. In this
scenario, Thin AV would no longer rely on the specific third party scanning services used
in the prototype, but would have a hardware and software configuration much more like
CloudAV. It is not unreasonable to imagine a major anti-virus vendor providing such a
dedicated service to its customers either over the internet, or as a network appliance.
The last way in which Thin AV could improve is, not surprisingly, by operating over
faster network connections. For years, internet connection speeds have been increasing
for both home and business customers. While there are likely limitations to what sorts
of speeds are ultimately feasible, it is safe to say that in the short term, the speed of the
average internet connection will likely increase. Such speed increases can only serve to
improve the performance of Thin AV.
Given the success of Thin AV on the mobile platform compared to the desktop envi-
ronment discussed in Section 6.1, it is worth asking if it would be possible to make the
desktop operating system more like the mobile operating system, so that desktops could
reap the benefits of Thin AV. In general this is not an unreasonable proposition. The
desktop version of Thin AV was built on top of Ubuntu, while the mobile version was
built on Android, both of which are Linux-based operating systems. The key advantage
offered by Android is the application sandboxing which prevents application vulnerabil-
ities from compromising entire systems. Unfortunately, this sandboxing comes with a
price. It limits the interactions that are possible between applications. Android partially
solves this by providing a framework for application interaction. However, such a highly
sandboxed desktop operating system would surely require a major shift in the mindset
of users. That said, inroads are already being made in this general direction, with Ap-
ple’s introduction of the Mac App Store [3] and Google’s work on Chrome OS and the
complementary Chrome Web Store [88, 6]. Both of these initiatives appear to be moti-
vated by the desire to make a simpler, more application-centric, and more user-friendly
desktop computing experience. In addition, this paradigm shift might bring with it some
very tangible security benefits.
6.2.2 Mobile Deployment
Unlike the desktop prototype of Thin AV, the mobile implementation is considerably
closer to an effective production scale system. Given the relatively low volume of files
that need to be scanned, it is quite possible to use the existing third party scanning
services in a production capacity. Obviously this presents a variety of challenges, not
least of which is the fact that Thin AV is completely reliant on the continued existence
of these scanning services in order to provide continued protection. However, similar to
the ideal desktop deployment scenario above, there is no reason why an anti-virus vendor
couldn’t provide a remote scanning service to subscribing customers. Even if
this were not a feasible option, there are still several improvements which could allow
Thin AV to function as a more complete system.
The greatest performance boost to Thin AV would come from having a pre-populated
Thin AV cache. As stated previously, this is a much more realistic expectation on a mobile
device as opposed to a desktop computer. Because application markets (both official and
third-party) will likely continue to be the first stop for users seeking applications, the
apps users will be installing can be pre-scanned by Thin AV. In much the same way that
a large selection of apps were downloaded from the Google Market, and scanned with
VirusTotal for the purposes of evaluating Thin AV, a system could be developed which
would regularly crawl a variety of markets and download new and popular applications
and scan them with the different scanning services. This way, when Thin AV users go
to install these apps, they will already exist within the cache of the service, negating the
need to upload and scan the file, and thus vastly increasing the performance of the safe
installer and the killswitch. Another avenue for pre-populating the cache could also be
application developers. Thin AV could incorporate a tool allowing developers to upload
their application packages as part of the publication process.
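Such a market-crawling system might be sketched as follows; fetch_popular, download, and scan_all are hypothetical stand-ins for market APIs and the Thin AV scanning back end.

```python
# Sketch of the cache pre-population loop: periodically crawl application
# markets and scan new or popular packages ahead of user requests, so that
# later install checks hit the cache. The market interface is hypothetical.

import hashlib

def prepopulate(markets, cache, fetch_popular, download, scan_all):
    """Scan popular apps from each market; return how many new scans were performed."""
    scanned = 0
    for market in markets:
        for app_id in fetch_popular(market):
            pkg = download(market, app_id)
            digest = hashlib.sha256(pkg).hexdigest()
            if digest in cache:
                continue  # already scanned; a cached verdict will be served instantly
            cache[digest] = scan_all(pkg)
            scanned += 1
    return scanned
```

Keying the cache by package digest means the same app distributed through multiple markets is only ever scanned once.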
Another area in which Thin AV has potential is in the extensibility of the system.
Currently, the addition of a new scanning module does require some very limited code
modification to the main Thin AV system, but it might be possible to modify Thin
AV such that these modifications could be removed, or at least moved to an external
configuration file. This would make it much easier in the future to develop and add
scanning modules for different services. This, in turn, leads to a compelling scenario, one
in which users, developers, and companies can create their own Thin AV-compatible
scanning modules to interface with their various service offerings, be they application
blacklists, static code analysis tools, application permission analyzers, social reputation
tools, or mobile-specific anti-virus scanners. This could lead to a scenario where Thin
AV is a highly configurable service, and users of the service could configure their Thin
AV clients to specify which scanning modules they want Thin AV to use when uploading
packages via the safe installer or killswitch.
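A configuration-driven module registry of this kind might be sketched as follows; the JSON configuration format and the module names are illustrative assumptions, not the interfaces of the existing prototype.

```python
# Sketch of scanner modules declared in an external configuration file rather
# than hard-coded into Thin AV. The config format and names are illustrative.

import json

def load_modules(config_text, available):
    """Return the enabled scanner callables named in a JSON config string."""
    names = json.loads(config_text).get("enabled_scanners", [])
    return [available[n] for n in names if n in available]

# Example: a user enables two of three registered modules.
AVAILABLE = {
    "kaspersky": lambda pkg: "clean",
    "viruschief": lambda pkg: "clean",
    "comdroid": lambda pkg: "no-vulnerabilities",
}
enabled = load_modules('{"enabled_scanners": ["kaspersky", "comdroid"]}', AVAILABLE)
```

Adding a new scanning service would then amount to registering a callable and naming it in the configuration file, with no change to the core system.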
The final change that would be necessary in order to fully realize Thin AV on Android
would be the addition of a mechanism that allowed Thin AV to interrupt and prevent
package installations, without modifying the operating system source code. This is a
very challenging requirement, as such a mechanism, if poorly implemented, could do far
more harm than good, by offering a means for malware to prevent the installation of
legitimate applications. A potential solution to this issue might be for Google to allow
applications to use such a mechanism only if the developer of the application is trusted
or in some way certified by Google.
6.3 Privacy
Making use of a third party scanning service carries with it some privacy concerns. While
the mobile version of Thin AV carries some very limited privacy concerns which will be
outlined in Section 6.3.2, it is the desktop version that poses the most serious privacy
concerns. These concerns will be discussed in Section 6.3.1.
6.3.1 Desktop Privacy
Because the files scanned by Thin AV are passed along to third-party scanning services,
users must accept the fact that the information contained within those files can be seen
by the organizations operating the scanning services. The websites for the three scanning
services make no mention of what is done with files after they have been scanned.
It is not safe to assume they are destroyed. Rather, operating under the assumption that
these files are saved by the scanning services is likely the best course of action. For many
individuals, the prospect of putting a potentially large amount of private or personal data
in the hands of such an organization is quite discomforting, and may not be permissible
for some individuals and organizations. This is further complicated by the fact that the
desktop implementation of Thin AV communicates directly with the scanning services.
This means that it would be possible for such a scanning service to collate all of the
uploads from a single IP address, and unfortunately, such in-depth information could be
used for extremely nefarious purposes.
Given that in its current form Thin AV cannot offer any sort of guarantees regarding
the privacy of users’ files, it seems that the most appropriate deployment environment
would be one in which there is a reduced expectation of privacy, such as public desktop
computers in libraries and other common areas. If Thin AV were to be deployed with
a dedicated scanning service as was described in Section 6.2.1, then the service provider
could offer some privacy guarantees, making such a security arrangement more palatable.
In such a scenario, individuals or groups might still be reluctant to provide so much per-
sonal information to a single organization. However, in recent years users have become
quite tolerant of the idea of putting substantial amounts of personal, even highly private
information into the hands of companies. For example Google has access to the inter-
net searches, e-mail and personal documents of their users who take advantage of their
search, GMail, and Google Docs products. For a time, Google even considered providing
storage for health records [39]. And while Google may be the largest possessor of personal
information, it is hardly alone in this regard. Companies like Facebook have amassed
a wealth of data on their users ranging from private chat logs to personal photographs.
Despite the risk, users have become quite accustomed to willingly giving personal infor-
mation to companies in exchange for a desirable service. Therefore, the privacy concerns
of Thin AV may be serious, but still within the realm of reason for many users.
6.3.2 Mobile Privacy
The privacy concerns that are present in the desktop version of Thin AV are all but
non-existent in the mobile version. Because the mobile version only uploads Android
packages, most of which come from public markets, there is no risk of leaking personal
or private information from a device to the service provider. Furthermore, because the
mobile version of Thin AV has the Thin AV web service which acts as the aggregator
for the various scanning services, it would not even be possible for the scanning services
to collate uploads by IP, because all uploads, regardless of their original source, would
appear to come from the IP of the Thin AV web application.
The last aspect of privacy is the issue of file retention. Again, because only packages
are being uploaded, file retention is not a major concern. However, unlike the three anti-
virus scanning services, ComDroid explicitly states that it does not retain files after
scanning. In general, there are no tangible privacy concerns when it comes to the mobile
implementation of Thin AV.
Chapter 7
Conclusion
This thesis examined the concept of providing anti-malware protection to desktop com-
puters and mobile devices through remote third-party services. Host based anti-virus is
the conventional answer to the malware problem, on both desktops and even on smart-
phones. However, given the vast amount of new malware that is created on a daily basis,
these anti-virus systems require perpetual signature library updates. Furthermore, anti-
virus vendors must continually add features to their products in the hopes of standing
out in such a crowded market segment. This has led to an array of functionally similar
anti-virus products that are becoming increasingly bloated and resource intensive. This
problem is even more serious on smartphones, where computational resources are strictly
limited by the available battery power.
Within the last decade, and particularly within the last five years, there has been
a push to move computation away from end host computers, and towards more high-
capacity remote computational resources. This has led to the notion of cloud computing.
Fundamentally the cloud is a novel business model layered over the existing concepts of
high capacity grid computing and distributed computing. Cloud computing allows for
software products and services to be offered remotely, and with sufficient capacity as to
effectively eliminate the appearance of resource constraints from the perspective of the
end user. The notion of cloud-based Security-as-a-Service has recently been examined as
a possible way to address the burgeoning malware problem.
The first major contribution of this research was the design and development of
Thin AV, a system for providing anti-virus scanning for Linux based desktop computers
by offloading the scanning of files from the host computer, to a set of pre-existing third-
party scanning services. The design of such a system was beneficial because it reduced
the software footprint on the host computer to a fraction of what it would be if a full-
fledged anti-virus product were installed. Additionally, it allowed for files to be scanned
with several different anti-virus engines, as opposed to a single engine as would be the
case with a host-based system. The key factor that differentiates Thin AV from earlier
cloud-based anti-malware solutions is its reliance on existing scanning services, which are
accessed over the internet, as opposed to making use of dedicated computing resources
located on the same local area network. While the latter case does provide tremendous
performance benefits, it does not accurately represent the performance that one would
see if the service in question were being offered remotely by a third-party.
Thin AV was evaluated by directly measuring the performance of the scanning ser-
vices as well as measuring the performance of the system when executing a series of
scripted user behaviors. These performance measurements were then used to inform the
development of a simulator which was used to test the limits of Thin AV under a variety
of file system behaviors. It was found that in certain cases, the performance of such a
system was acceptable. However, in its current form, the worst case performance was
highly noticeable to the point of being excessively disruptive to the user. In the
future, it may be possible to address the performance concerns in Thin AV, resulting in a
system that would be capable of performing nearly transparent anti-malware protection
from the cloud.
The second major contribution of this thesis was an extension of the desktop version
of Thin AV, specifically targeted at smartphones and tablets. In recent years, malware on
mobile phones and smartphones has become a major issue, with virtually every smart-
phone platform being affected. The need for addressing the issue of mobile malware
is pressing because of the extent to which smartphones are becoming integrated into
the modern lifestyle. A substantial amount of personal and private information is often
stored on a person’s smartphone, and this presents an extremely tempting target to mal-
ware authors. Given the resource constraints that come with mobile devices, a remote
anti-malware service appeared to be a good fit for addressing mobile security.
The desktop Thin AV system was extended and wrapped in a web application in
order to serve as a unified interface for servicing anti-malware scan requests from Android
mobile devices. Additionally, a system was developed which prevents the installation of
malicious applications, by intercepting application installation requests and sending them
to the Thin AV web application for scanning. This was complemented by a background
killswitch which can prompt the removal of pre-existing applications if they are found to
be malicious. Both of these mechanisms rely on the third party scanning services used
in the desktop version of Thin AV. Because it was determined that the scanning services
are capable of detecting Android malware, this made for a fully functioning system,
capable of preventing the installation of malicious applications, or removing malicious
applications after installation. In order to further demonstrate the extensibility and
modularity of Thin AV, a fourth scanning service, capable of performing static code
vulnerability analysis, was added to the mobile version of Thin AV.
The evaluation of the mobile extension of Thin AV was done by assessing both the
typical and best case run time for both the safe installation mechanism and the killswitch.
This was done by independently measuring the time requirements of various aspects of
each system, and then calculating a range of possible running times based on a set of
empirical measurements of 3G and WiFi network performance. However, because all ex-
periments were run on the Android development emulator, they represent a lower bound
to the actual performance. The evaluation showed the system to be highly practical, with
the only major drawback being the need to manually upload pre-installed packages the
first time Thin AV is run. The reason for the much improved performance of Thin AV on
the smartphone was due to the nature of the malware threat on Android smartphones.
Unlike desktop systems where malware can come in many forms, virtually all Android
malware in the wild is spread through malicious applications. This meant that Thin AV
had to only scan application packages, and not the entire smartphone file system. This
significant reduction in input space, coupled with the fact that it is possible to pre-cache
scan results for packages, meant that it was possible to get anti-malware protection at a
very low cost in terms of running time.
The successful evaluation of Thin AV shows that the concept of providing security
services from the cloud is a real possibility. The fact that Thin AV was built on top of
shared public scanning services shows what can be achieved on the proverbial “shoestring”
budget. Given sufficient dedicated resources, it is quite likely that cloud-based Security-
as-a-Service could become a real alternative for desktop users who are tired of the
ever-increasing size of anti-virus software, or more ideally, for smartphone users who want
the confidence to know that the applications they are downloading are free of malware,
without bearing the burden of a full host-based security system.
There are two main avenues for future work pertaining to Thin AV on the desktop.
The first is practical changes and improvements to increase the speed and robustness
of Thin AV. However, in terms of future research, the largest question that still remains
is how transparent anti-malware scanning can be made with the addition of dedicated
scanning resources. It would be interesting to create a Thin AV like system which is
based not on freely available scanning services, but on private and dedicated systems
for scanning. Previous research has established the feasibility of this arrangement when
the scanning hardware is co-located with the host computers. However, to truly make
malware scanning a service it needs to be seen how much performance is lost when such
a dedicated service is offered over a wide area network. If such a service can be made
effectively transparent to end users, then it might be viable for security companies to
offer subscription based cloud security on the desktop.
Another potential topic for future research arising from the desktop evaluation
is the characterization of typical desktop software usage patterns. The scripts
used to generate file system activity on the desktop were inspired by a single
previous study. Despite an extensive search, there does not appear to be any major
body of research describing general patterns of software use on desktop computers.
This is not surprising, considering how vague and ill-defined the question is.
Nevertheless, such a study would have greatly aided the evaluation of Thin AV on
the desktop. One possible approach would be to generate a large range of possible
operational profiles, each incorporating numerous different user activities [80].
This might help in more clearly defining the circumstances in which Thin AV excels.
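One way such operational profiles could be expressed is as a weighted mix of user activities, each with its own file-size behaviour. The sketch below is purely illustrative: the activity names, weights, and mean sizes are invented placeholders, and real profiles would be derived from measurement of actual users.

```python
import random

# Hypothetical operational profile: activity -> (weight, mean file size in KB).
# The entries here are invented for illustration only.
PROFILE = {
    "web_browsing":     (0.5, 100),
    "document_editing": (0.3, 1000),
    "media_playback":   (0.2, 5000),
}

def generate_workload(n_events, seed=0):
    """Draw a sequence of simulated file accesses from the profile.

    Each event is (activity, file size in KB); sizes are drawn from an
    exponential distribution around the activity's mean, so most files
    are small with an occasional large one."""
    rng = random.Random(seed)          # seeded for reproducible runs
    activities = list(PROFILE)
    weights = [PROFILE[a][0] for a in activities]
    events = []
    for _ in range(n_events):
        activity = rng.choices(activities, weights=weights)[0]
        size_kb = rng.expovariate(1.0 / PROFILE[activity][1])
        events.append((activity, size_kb))
    return events

workload = generate_workload(1000)
```

Sweeping the weights and size parameters across many such profiles would yield a family of workloads against which a system like Thin AV could be evaluated.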
The future direction for Thin AV on the smartphone is somewhat less clear. The
most pressing research pertains to the Android operating system itself. Thin AV
was able to interrupt and terminate the installation of malicious apps by
modifying the operating system source code, but this is not a practical long-term
solution. Research must be undertaken to find a safe and secure way for the
operating system to allow applications to arrest the package installation
procedure, because it would be very easy to use such a privileged operation for
malicious purposes. While other improvements could be made to Thin AV's
extensibility and performance, these would only be necessary when considering a
production-scale deployment of Thin AV.
Another area of future work for the mobile version of Thin AV would be to assess
its power consumption in comparison to other anti-virus systems available on the
Android market. Due to the relatively small amount of processing and network
traffic generated by Thin AV, it stands to reason that such a security mechanism
would have a minimal impact on the battery life of a device compared to other
anti-virus products available for Android. While such a comparison would have been
a desirable addition to this research, it was impractical at this time for two
reasons. First, such an experiment would have required a physical Android device
capable of running the custom operating system developed with Thin AV and, as
stated earlier, this is not trivial: there are numerous compatibility issues
involved in replacing the operating system on a physical device. Second, because
virtually all of the anti-virus applications on the Google Market are proprietary,
it would not be reasonable to compare their battery consumption with Thin AV
without knowing what sort of processes are actually taking place within these
proprietary systems. However, given that previous research on cloud-based
anti-malware systems has shown mixed results when it comes to power consumption,
assessing the power consumption of Thin AV remains highly desirable before a final
determination can be made about its suitability for mobile devices.
This thesis has examined the feasibility of using cloud-based security services to
protect computer systems from malware. While the findings of this research show
that a cloud-based approach offers many benefits in the fight against malware, it
is safe to predict that this will not be the ultimate solution to the malware
problem. The continually evolving nature of the malware threat virtually
guarantees that new systems and techniques will continually need to be developed.
For the moment, however, cloud computing may offer a temporary respite from the
storm of malware to which users are continually exposed.
Bibliography
[1] Android-APKTool. http://code.google.com/p/android-apktool/, Last Accessed: Jan. 2012.
[2] Android Developer Guide. http://developer.android.com/guide/index.html, Last Accessed: Feb. 2012.
[3] Apple Mac app store. http://www.apple.com/mac/app-store/, Last Accessed: Feb. 2012.
[4] AppsLib. http://appslib.com/, Last Accessed: Jan. 2012.
[5] Avira Operations GmbH & Co. KG. http://www.avira.com/, Last Accessed: Sept. 2011.
[6] Chrome web store. https://chrome.google.com/webstore/category/home, Last Accessed: Feb. 2012.
[7] DazukoFS. http://www.dazuko.org, Last Accessed: Sept. 2011.
[8] FileAdvisor by Bit9. http://fileadvisor.bit9.com, Last Accessed: Sept. 2011.
[9] Google market API. http://code.google.com/p/android-market-api/, Last Accessed: Jan. 2012.
[10] Indiroid. https://indiroid.com/, Last Accessed: Jan. 2012.
[11] inotify. http://linux.die.net/man/7/inotify, Last Accessed: Sept. 2011.
[12] Kaspersky free virus scan. http://www.kaspersky.com/virusscanner, Last Accessed: Sept. 2011.
[13] Kaspersky Lab. http://www.kaspersky.com/, Last Accessed: Sept. 2011.
[14] MiKandi. http://www.mikandi.com/, Last Accessed: Jan. 2012.
[15] Nduoa. http://www.nduoa.com/, Last Accessed: Jan. 2012.
[16] Samsung Galaxy S Forums - pro's and con's of installing custom ROMs. http://samsunggalaxysforums.com/showthread.php/7418-Pro-s-and-Con-s-of-Installing-custom-Roms, Last Accessed: Jan. 2012.
[17] SciMark. http://math.nist.gov/scimark2/, Last Accessed: Jan. 2012.
[18] SciMark Test Descriptions. http://math.nist.gov/scimark2/about.html, Last Accessed: Jan. 2012.
[19] Selenium WebKit. http://code.google.com/p/selenium/, Last Accessed: Sept. 2011.
[20] VirScan. http://virscan.org, Last Accessed: Sept. 2011.
[21] VirusChief. http://www.viruschief.com/, Last Accessed: Sept. 2011.
[22] VirusTotal. http://www.virustotal.com/, Last Accessed: Sept. 2011.
[23] VirusTotal terms of service. http://www.virustotal.com/terms.html, Last Accessed: Sept. 2011.
[24] Bowers v. Baystate Technologies, Inc., 320 F. 3d 1317 - Court of Appeals, Federal Circuit, 2003.
[25] Flask. http://flask.pocoo.org/, Last Accessed: Jan. 2012.
[26] McAfee SaaS endpoint protection suite. http://www.mcafee.com/us/products/saas-endpoint-protection-suite.aspx, Last Accessed: Feb. 2012.
[27] VirusTotal scan result. https://www.virustotal.com/file/7f0aaf040b475085713b09221c914a971792e1810b0666003bf38ac9a9b013e6/analysis/, Last Accessed: Jan. 2012.
[28] Nitin Agrawal, William J. Bolosky, John R. Douceur, and Jacob R. Lorch. A five-year study of file-system metadata. ACM Trans. Storage, 3:9:1–9:32, October 2007.
[29] Jerry Archer, Alan Boehme, Dave Cullinane, Nils Puhlmann, Paul Kurtz, and Jim Reavis. Defined categories of service 2011. Technical report, Security as a Service Working Group, Cloud Security Alliance, 2011.
[30] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. A view of cloud computing. Commun. ACM, 53:50–58, April 2010.
[31] John Aycock. Computer Viruses and Malware, volume 22 of Advances in Information Security. Springer, 2006.
[32] Mark Balanza. Android malware acts as an SMS relay. http://blog.trendmicro.com/android-malware-acts-as-an-sms-relay/, June 2011.
[33] Mark Balanza. Android malware eavesdrops on users, uses Google+ as disguise. http://blog.trendmicro.com/android-malware-eavesdrops-on-users-uses-google-as-disguise/, August 2011.
[34] David Barrera, William Enck, and Paul C. van Oorschot. Seeding a security-enhancing infrastructure for multi-market application ecosystems. Technical Report TR-11-06, Carleton University - School of Computer Science, 2011.
[35] David Barrera, H. Gunes Kayacik, Paul C. van Oorschot, and Anil Somayaji. A methodology for empirical analysis of permission-based security models and its application to Android. In Proceedings of the 17th ACM conference on Computer and communications security, CCS '10, pages 73–84, New York, NY, USA, 2010. ACM.
[36] Ulrich Bayer, Imam Habibi, Davide Balzarotti, Engin Kirda, and Christopher Kruegel. A view on current malware behaviors. In Proceedings of the 2nd USENIX conference on Large-scale exploits and emergent threats: botnets, spyware, worms, and more, LEET'09, pages 8–8, Berkeley, CA, USA, 2009. USENIX Association.
[37] Jeffrey Bickford, H. Andres Lagar-Cavilla, Alexander Varshavsky, Vinod Ganapathy, and Liviu Iftode. Security versus energy tradeoffs in host-based mobile malware detection. In Proceedings of the 9th international conference on Mobile systems, applications, and services, MobiSys '11, pages 225–238, New York, NY, USA, 2011. ACM.
[38] Jeffrey Bickford, Ryan O'Hare, Arati Baliga, Vinod Ganapathy, and Liviu Iftode. Rootkits on smart phones: attacks, implications and opportunities. In Proceedings of the Eleventh Workshop on Mobile Computing Systems & Applications, HotMobile '10, pages 49–54, New York, NY, USA, 2010. ACM.
[39] Aaron Brown and Bill Weihl. An update on Google Health and Google PowerMeter. http://googleblog.blogspot.com/2011/06/update-on-google-health-and-google.html, Last Accessed: Feb. 2012.
[40] Rich Canning. An update on Android market security. http://googlemobile.blogspot.com/2011/03/update-on-android-market-security.html, Last Accessed: Nov. 2011.
[41] Sang Kil Cha, Iulian Moraru, Jiyong Jang, John Truelove, David Brumley, and David G. Andersen. SplitScreen: enabling efficient, distributed malware detection. In Proceedings of the 7th USENIX conference on Networked systems design and implementation, NSDI'10, pages 25–25, Berkeley, CA, USA, 2010. USENIX Association.
[42] Brian Chen. Want porn? Buy an Android phone, Steve Jobs says. http://www.wired.com/gadgetlab/2010/04/steve-jobs-porn/, April 2010.
[43] Brian Chen. Amazon app store requires security compromise. http://www.wired.com/gadgetlab/2011/03/amazon-app-store-security/, March 2011.
[44] Jerry Cheng, Starsky H.Y. Wong, Hao Yang, and Songwu Lu. SmartSiren: virus detection and alert for smartphones. In Proceedings of the 5th international conference on Mobile systems, applications and services, MobiSys '07, pages 258–271, New York, NY, USA, 2007. ACM.
[45] Erika Chin, Adrienne Porter Felt, Kate Greenwood, and David Wagner. Analyzing inter-application communication in Android. In Proceedings of the 9th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys), 2011.
[46] Mihai Chiriac. Tales from cloud nine. In Virus Bulletin Conference, pages 1–6, 2009.
[47] Byung-Gon Chun and Petros Maniatis. Augmented smartphone applications through clone cloud execution. In Proceedings of the 12th conference on Hot topics in operating systems, HotOS'09, pages 8–8, Berkeley, CA, USA, 2009. USENIX Association.
[48] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. MAUI: making smartphones last longer with code offload. In Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys '10, pages 49–62, New York, NY, USA, 2010. ACM.
[49] David Dagon, Tom Martin, and Thad Starner. Mobile phones as computing devices: The viruses are coming! IEEE Pervasive Computing, 3:11–15, 2004.
[50] Toralv Dirro, Paula Greve, Rahul Kashyap, David Marcus, François Paget, Craig Schmugar, Jimmy Shah, and Adam Wosotowsky. McAfee threats report: second quarter 2011. Technical report, McAfee Labs, August 2011.
[51] B. Dixon and S. Mishra. On rootkit and malware detection in smartphones. In 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W), pages 162–163, July 2010.
[52] The Economist. Clash of the clouds. http://www.economist.com/node/14637206?story_id=14637206, October 2009.
[53] Marc Fossi (Editor). Symantec report on the underground economy. Technical report, Symantec Corporation, 2008.
[54] W. Enck, M. Ongtang, and P. McDaniel. Understanding Android security. IEEE Security & Privacy, 7(1):50–57, Jan.-Feb. 2009.
[55] William Enck, Damien Octeau, Patrick McDaniel, and Swarat Chaudhuri. A study of Android application security. In Proceedings of the 20th USENIX conference on Security, Berkeley, CA, USA, 2011. USENIX Association.
[56] William Enck, Machigar Ongtang, and Patrick McDaniel. On lightweight mobile phone application certification. In Proceedings of the 16th ACM conference on Computer and communications security, CCS '09, pages 235–245, New York, NY, USA, 2009. ACM.
[57] Georgina Enzer. Android attacks on the up says Trend Micro. http://www.itp.net/585773-android-attacks-on-the-up-says-trend-micro, August 2011.
[58] Independent Security Evaluators. Exploiting Android. http://securityevaluators.com/content/case-studies/android/index.jsp, November.
[59] Adrienne Porter Felt, Erika Chin, Steve Hanna, Dawn Song, and David Wagner. Android permissions demystified. In Proceedings of the 18th ACM conference on Computer and communications security, CCS '11, pages 627–638, New York, NY, USA, 2011. ACM.
[60] Adrienne Porter Felt, Matthew Finifter, Erika Chin, Steve Hanna, and David Wagner. A survey of mobile malware in the wild. In Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices, SPSM '11, pages 3–14, New York, NY, USA, 2011. ACM.
[61] J. Flinn, D. Narayanan, and M. Satyanarayanan. Self-tuned remote execution for pervasive computing. In Proceedings of the Eighth Workshop on Hot Topics in Operating Systems, pages 61–66, May 2001.
[62] Richard Gass and Christophe Diot. An experimental performance comparison of 3G and Wi-Fi. In Arvind Krishnamurthy and Bernhard Plattner, editors, Passive and Active Measurement, volume 6032 of Lecture Notes in Computer Science, pages 71–80. Springer Berlin / Heidelberg, 2010.
[63] R.L. Grossman. The case for cloud computing. IT Professional, 11(2):23–27, March-April 2009.
[64] Gartner Group. Gartner says sales of mobile devices grew 5.6 percent in third quarter of 2011; smartphone sales increased 42 percent. http://www.gartner.com/it/page.jsp?id=1848514, Last Accessed: Nov. 2011.
[65] Gartner Group. Gartner says cloud computing will be as influential as e-business. http://www.gartner.com/it/page.jsp?id=707508, June 2008.
[66] Mikko Hypponen. The state of cell phone malware in 2007. http://www.usenix.org/events/sec07/tech/hypponen.pdf, August 2007.
[67] IEEE 802.11n-2009. Wireless LAN medium access control (MAC) and physical layer specifications enhancements for higher throughput, IEEE, June 2009.
[68] Markus Jakobsson and Karl-Anders Johansson. Assured detection of malware with applications to mobile platforms. Technical report, DIMACS (February 2010), 2010.
[69] Markus Jakobsson and Karl-Anders Johansson. Retroactive detection of malware with applications to mobile platforms. In Proceedings of the 5th USENIX conference on Hot topics in security, HotSec'10, pages 1–13, Berkeley, CA, USA, 2010. USENIX Association.
[70] Markus Jakobsson and Ari Juels. Server-side detection of malware infection. In Proceedings of the 2009 workshop on New security paradigms workshop, NSPW '09, pages 11–22, New York, NY, USA, 2009. ACM.
[71] Gregg Keizer. Spike in mobile malware doubles Android users' chances of infection. http://www.computerworld.com/s/article/9218831/Spike_in_mobile_malware_doubles_Android_users_chances_of_infection, August 2011.
[72] Lei Liu, Guanhua Yan, Xinwen Zhang, and Songqing Chen. VirusMeter: Preventing your cellphone from spies. In Engin Kirda, Somesh Jha, and Davide Balzarotti, editors, Recent Advances in Intrusion Detection, volume 5758 of Lecture Notes in Computer Science, pages 244–264. Springer Berlin / Heidelberg, 2009. 10.1007/978-3-642-04342-0_13.
[73] Hiroshi Lockheimer. Android and security. http://googlemobile.blogspot.com/2012/02/android-and-security.html, February 2012.
[74] Zachary Lutz. Carrier IQ: What it is, what it isn't, and what you need to know. http://www.engadget.com/2011/12/01/carrier-iq-what-it-is-what-it-isnt-and-what-you-need-to/, December 2011.
[75] Lorenzo Martignoni, Roberto Paleari, and Danilo Bruschi. A framework for behavior-based malware analysis in the cloud. In Atul Prakash and Indranil Sen Gupta, editors, Information Systems Security, volume 5905 of Lecture Notes in Computer Science, pages 178–192. Springer Berlin / Heidelberg, 2009. 10.1007/978-3-642-10772-6_14.
[76] P. McDaniel and W. Enck. Not so great expectations: Why application markets haven't failed security. IEEE Security & Privacy, 8(5):76–78, Sept.-Oct. 2010.
[77] Jane McEntegart. Malicious iPhone virus takes control of your phone. http://www.tomshardware.com/news/iphone-virus-botnet-bank-details,9136.html, November 2009.
[78] Peter Mell and Timothy Grance. The NIST definition of cloud computing, September 2011.
[79] Yevgeniy Miretskiy, Abhijith Das, Charles P. Wright, and Erez Zadok. Avfs: an on-access anti-virus file system. In Proceedings of the 13th conference on USENIX Security Symposium - Volume 13, SSYM'04, pages 6–6, Berkeley, CA, USA, 2004. USENIX Association.
[80] John D. Musa. Software Reliability Engineering: More Reliable Software Faster and Cheaper. AuthorHouse, 2nd edition, 2004, Chapter 2.
[81] Jon Oberheide, Evan Cooke, and Farnam Jahanian. Rethinking antivirus: executable analysis in the network cloud. In Proceedings of the 2nd USENIX workshop on Hot topics in security, pages 5:1–5:5, Berkeley, CA, USA, 2007. USENIX Association.
[82] Jon Oberheide, Evan Cooke, and Farnam Jahanian. CloudAV: N-version antivirus in the network cloud. In Proceedings of the 17th Conference on Security, pages 91–106, Berkeley, CA, USA, 2008. USENIX Association.
[83] Jon Oberheide and Farnam Jahanian. When mobile is harder than fixed (and vice versa): demystifying security challenges in mobile environments. In Proceedings of the Eleventh Workshop on Mobile Computing Systems & Applications, HotMobile '10, pages 43–48, New York, NY, USA, 2010. ACM.
[84] Jon Oberheide, Kaushik Veeraraghavan, Evan Cooke, Jason Flinn, and Farnam Jahanian. Virtualized in-cloud security services for mobile devices. In Proceedings of the First Workshop on Virtualization in Mobile Computing, MobiVirt '08, pages 31–35, New York, NY, USA, 2008. ACM.
[85] A.J. O'Donnell. When malware attacks (anything but Windows). IEEE Security & Privacy, 6(3):68–70, May-June 2008.
[86] M. Ongtang, S. McLaughlin, W. Enck, and P. McDaniel. Semantically rich application-centric security in Android. In Computer Security Applications Conference, 2009. ACSAC '09. Annual, pages 340–349, December 2009.
[87] Sarah Perez. Developer is building an app store for banned Android apps. http://techcrunch.com/2012/01/20/developer-is-building-an-app-store-for-banned-android-apps/, January 2012.
[88] Sundar Pichai. Introducing the Google Chrome OS. http://googleblog.blogspot.com/2009/07/introducing-google-chrome-os.html, Last Accessed: Feb. 2012.
[89] Georgios Portokalidis, Philip Homburg, Kostas Anagnostakis, and Herbert Bos. Paranoid Android: versatile protection for smartphones. In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC '10, pages 347–356, New York, NY, USA, 2010. ACM.
[90] Drew Roselli, Jacob R. Lorch, and Thomas E. Anderson. A comparison of file system workloads. In Proceedings of the annual conference on USENIX Annual Technical Conference, ATEC '00, pages 4–4, Berkeley, CA, USA, 2000. USENIX Association.
[91] Jamie Rosenberg. Introducing Google Play: All your entertainment, anywhere you go. http://googleblog.blogspot.com/2012/03/introducing-google-play-all-your.html, March 2012.
[92] Dan Rowinski. More than 50 percent of Android devices still running Froyo. http://www.readwriteweb.com/mobile/2011/09/more-than-50-of-android-device.php, Last Accessed: Jan. 2011.
[93] Neil Rubenking. Lab testing antivirus software. http://www.pcmag.com/article2/0,2817,2358764,00.asp, September 2010.
[94] Alexey Rudenko, Peter Reiher, Gerald J. Popek, and Geoffrey H. Kuenning. Saving portable computer battery power through remote process execution. SIGMOBILE Mob. Comput. Commun. Rev., 2:19–26, January 1998.
[95] Steven Salerno, Ameya Sanzgiri, and Shambhu Upadhyaya. Exploration of attacks on current generation smartphones. Procedia Computer Science, 5:546–553, 2011. The 2nd International Conference on Ambient Systems, Networks and Technologies (ANT-2011) / The 8th International Conference on Mobile Web Information Systems (MobiWIS 2011).
[96] Sharun Santhosh. Factoring file access patterns and user behavior into caching design for distributed file systems. Master's thesis, Wayne State University, Detroit, Michigan, 2004.
[97] James Schlichting. Federal Communications Commission. Google Voice and related iPhone applications. http://hraunfoss.fcc.gov/edocs_public/attachmatch/DA-09-1736A1.pdf, September 2009.
[98] A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz, K.A. Yuksel, S.A. Camtepe, and S. Albayrak. Static analysis of executables for collaborative malware detection on Android. In IEEE International Conference on Communications, 2009, pages 1–5, June 2009.
[99] Aubrey-Derrick Schmidt, Frank Peters, Florian Lamour, Christian Scheel, Seyit Camtepe, and Sahin Albayrak. Monitoring smartphones for anomaly detection. Mobile Networks and Applications, 14:92–106, 2009. 10.1007/s11036-008-0113-x.
[100] Blake Stimac. Virus alert: Windows Mobile 6.5 virus found. http://www.intomobile.com/2010/04/15/virus-alert-windows-mobile-6-5-virus-found/, August 2010.
[101] Cisco Systems. Demystifying cloud computing: a three-minute tutorial. http://www.cisco.com/web/offer/fedbiz07/july2009/index.html, July 2009.
[102] Deepak Venugopal, Guoning Hu, and Nicoleta Roman. Intelligent virus detection on mobile devices. In Proceedings of the 2006 International Conference on Privacy, Security and Trust: Bridge the Gap Between PST Technologies and Business Services, PST '06, pages 65:1–65:4, New York, NY, USA, 2006. ACM.
[103] Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, and Keying Ye. Probability & Statistics for Engineers & Scientists. Pearson Prentice Hall, 8th edition, 2007, p. 236.
[104] Xiaoyun Wang and Hongbo Yu. How to break MD5 and other hash functions. In Ronald Cramer, editor, Advances in Cryptology - EUROCRYPT 2005, volume 3494 of Lecture Notes in Computer Science, pages 561–561. Springer Berlin / Heidelberg, 2005. 10.1007/11426639_2.
[105] Oli Warner. What really slows Windows down. http://thepcspy.com/read/what_really_slows_windows_down/, September 2006.
[106] Joe Wells. A radical new approach to virus scanning. Technical report, CyberSoft, Inc., 1999.
Appendix A
Appendix
Number of events processed:         99999.7     99998       99994.2    99991.2    99921.8   99330.1   93426.7
Number of files processed:          77691       50000       10000      5000       1000      100       9.5
Mean file size generated:           977.002     976.4713    973.6049   977.5164   971.258   974.5984  927.2371
Median file size generated:         677.3671    676.6826    672.2553   676.6572   659.8002  713.8306  813.637
Max file size generated:            11539.964   10785.0526  9824.4234  9336.1715  7400.5678 5324.6606 2597.6249
Proportion of file modifications:   0           0           0          0          0         0         0
Mean file size scanned:             977.002     976.4713    973.6049   977.5164   971.258   974.5984  927.2371
Median file size scanned:           677.3671    676.6826    672.2553   676.6572   659.8002  713.8306  813.637
Max file size scanned:              11539.964   10785.0526  9824.4234  9336.1715  7400.5678 5324.6606 2597.6249
Un-scanned accesses:                0           0           0          0          0         0         0
Cache Hit Rate:                     22.31%      50.00%      90.00%     95.00%     99.00%    99.90%    99.99%
Time for AV scanning (sec.):        2325464.1   1495301.15  298412.16  149908.66  29661.74  3015.74   300.18
Time for non-AV activities (sec.):  6673.1      6663.59     6663.27    6663.91    6657.84   6608.88   6230.54
Total time (sec.):                  2332137.21  1501964.74  305075.42  156572.57  36319.57  9624.63   6530.72
AV Overhead:                        34848.69%   22440.06%   4479.49%   2249.56%   445.51%   45.63%    4.83%
Average inter-access time:          0.067       0.067       0.067      0.067      0.067     0.067     0.067
Table A.1: Raw data from Figure 4.6.
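The AV Overhead rows in these tables appear to be the ratio of scanning time to non-AV activity time, expressed as a percentage; checking the first column of Table A.1 reproduces the reported figure to within rounding (the tabulated values are means over multiple simulation runs, so the ratio of the averaged times differs slightly from the averaged ratio):

```python
# AV Overhead = (time for AV scanning) / (time for non-AV activities) * 100%.
# Values taken from the first column of Table A.1.
scan_time = 2325464.1    # Time for AV scanning (sec.)
other_time = 6673.1      # Time for non-AV activities (sec.)
overhead_pct = 100 * scan_time / other_time
# Agrees with the reported 34848.69% up to averaging effects.
```

The same relation holds across the other columns and tables (e.g. 1495301.15 / 6663.59 in the second column gives roughly the reported 22440.06%).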
Number of events processed:         97.8      196.3     490.8     985.6     9896.4   99183.6  983779.3
Number of files processed:          49.8      50        50        50        50       50       50
Mean file size generated:           93.5559   99.2755   90.5267   89.4505   95.0418  92.7249  94.8326
Median file size generated:         70.2959   75.4199   63.887    67.4278   73.7681  65.9585  69.502
Max file size generated:            368.1267  386.5154  395.7792  453.9882  457.671  416.767  408.3396
Proportion of file modifications:   0         0         0         0         0        0        0
Mean file size scanned:             93.5559   99.2755   90.5267   89.4505   95.0418  92.7249  94.8326
Median file size scanned:           70.2959   75.4199   63.887    67.4278   73.7681  65.9585  69.502
Max file size scanned:              368.1267  386.5154  395.7792  453.9882  457.671  416.767  408.3396
Un-scanned accesses:                0         0         0         0         0        0        0
Cache Hit Rate:                     49.08%    74.53%    89.81%    94.93%    99.49%   99.95%   99.99%
Time for AV scanning (sec.):        154.6     160.66    152.46    151.55    158.6    174.27   353.18
Time for non-AV activities (sec.):  6.59      13.2      33.48     66.06     659.77   6605.13  65570.5
Total time (sec.):                  161.19    173.86    185.94    217.6     818.37   6779.41  65923.68
Average inter-access time:          0.067     0.067     0.068     0.067     0.067    0.067    0.067
AV Overhead:                        2352.20%  1221.41%  455.75%   229.57%   24.05%   2.64%    0.54%
Table A.2: Raw data from Figure 4.7.
Number of events processed:         99982.2     99984.4     99980.7     99992       99984.2     99985.9    99982.9   99987.3
Number of files processed:          5000        5000        5000        5000        5000        5000       5000      5000
Mean file size generated:           97197.7499  63435.8064  19433.8632  9866.4089   1962.7876   972.1442   97.612    9.7095
Median file size generated:         102400      67508.1407  13647.1049  6810.7295   1360.4033   674.1827   67.7155   6.6986
Max file size generated:            102400      102400      102400      90559.0626  18711.4329  8820.3111  801.2105  83.4222
Proportion of file modifications:   0           0           0           0           0           0          0         0
Mean file size scanned:             10033.4459  9843.104    8525.1434   6933.636    1961.1926   972.1442   97.612    9.7095
Median file size scanned:           9862.1326   9649.7395   7702.8734   5687.9268   1360.1801   674.1827   67.7155   6.6986
Max file size scanned:              20309.68    20464.6566  20468.9162  20463.6603  17379.1231  8820.3111  801.2105  83.4222
Un-scanned accesses:                93065.7     83882.3     25765.5     4804.2      0.7         0          0         0
Cache Hit Rate:                     98.49%      94.14%      95.63%      95.40%      95.00%      95.00%     95.00%    95.00%
Time for AV scanning (sec.):        7838.98     73529.56    275772.28   405529.12   289792.97   148720.84  15924.7   7634.69
Time for non-AV activities (sec.):  6670.36     6667.93     6672.74     6662.6      6664.22     6664.69    6670.02   6657.17
Total time (sec.):                  14509.34    80197.49    282445.02   412191.72   296457.19   155385.53  22594.71  14291.86
AV Overhead:                        117.51%     1102.73%    4132.78%    6086.82%    4348.45%    2231.50%   238.76%   114.68%
Average inter-access time:          0.067       0.067       0.067       0.67        0.067       0.067      0.067     0.067
Kaspersky %:                        6.43%       5.56%       7.82%       10.98%      40.60%      65.25%     100.00%   100.00%
VirusChief %:                       45.11%      47.43%      54.72%      63.05%      58.89%      34.74%     0.00%     0.00%
VirusTotal %:                       48.46%      47.01%      37.45%      25.96%      0.51%       0.00%      0.00%     0.00%
Unscanned %:                        93.08%      83.90%      25.77%      4.80%       0.00%       0.00%      0.00%     0.00%

Table A.3: Raw data from Figures 4.8 and 4.9.
Number of events processed:         99948      99920.4    99898.7    99900.7     99889.5     99873.8     99951.8     99791.3    99744.9     99264.6
Number of files processed:          1000       1000       1000       997.3       883.7       748.5       587.9       451.1      297.8       148.2
Mean file size generated:           980.5748   975.3766   984.9695   977.6461    978.2049    1002.9018   975.5268    996.6749   983.568     1012.0571
Median file size generated:         683.9077   681.8096   686.4618   682.5642    673.7291    701.0202    673.7253    690.6021   671.5941    704.2243
Max file size generated:            6981.1646  7542.5187  7342.4595  7603.4454   6689.2825   6811.0934   6585.222    7263.6021  5928.4191   5517.0901
Proportion of file modifications:   0          0.10       0.20       0.30        0.40        0.50        0.60        0.70       0.80        0.90
Modifications absolute:             0          9994       19976.1    29872       39945.1     49929.1     60029.1     69893.9    79825.8     89300.4
Mod then Use:                       0          8924.9     15801      20759.5     23737.9     24635.9     23678       20689.2    15704.1     8840.6
Mean file size scanned:             980.5748   975.3766   984.9695   977.6461    978.2049    1002.9018   975.5268    996.6749   983.568     1012.0571
Median file size scanned:           683.9077   681.8096   686.4618   682.5642    673.7291    701.0202    673.7253    690.6021   671.5941    704.2243
Max file size scanned:              6981.1646  7542.5187  7342.4595  7603.4454   6689.2825   6811.0934   6585.222    7263.6021  5928.4191   5517.0901
Un-scanned accesses:                0          0          0          0           0           0           0           0          0           0
Cache Hit Rate:                     99.00%     88.96%     78.98%     68.93%      58.93%      49.17%      39.22%      29.29%     19.67%      9.79%
Time for AV scanning (sec.):        30140.21   281190.78  563833.05  905131.4    707501.46   941231.74   1039714.32  647313.82  423906.85   317449.67
Time for non-AV activities (sec.):  6659.07    105553.71  204727.92  302953.53   402315.21   503420.99   604099.28   701433.59  798448.29   893453.96
Total time (sec.):                  36799.29   386744.49  768560.97  1208084.93  1109816.67  1444652.74  1643813.59  1348747.4  1222355.14  1210903.63
AV Overhead:                        452.61%    265.72%    274.64%    298.17%     176.42%     186.90%     172.08%     92.39%     53.12%      35.54%
Inter-access Time:                  0.067      1.174      2.562      4.326       6.711       10.08       15.132      23.461     40.085      89.666

Table A.4: Raw data from Figure 4.10.
Number of events processed:         99987.1    99987.3      99988.4     99982.9    99983.4   99981.7   99986.7
Number of files processed:          5000       5000         5000        5000       5000      5000      5000
Mean file size generated:           97.326     97.927       97.349      97.391     97.758    97.338    97.501
Median file size generated:         67.351     67.832       67.711      67.901     68.074    67.328    67.862
Max file size generated:            852.013    865.051      882.332     840.176    862.779   834.322   810.934
Proportion of file modifications:   0          0            0           0          0         0         0
Mean file size scanned:             97.326     97.927       97.349      97.391     97.758    97.338    97.501
Median file size scanned:           67.351     67.832       67.711      67.901     68.074    67.328    67.862
Max file size scanned:              852.013    865.051      882.332     840.176    862.779   834.322   810.934
Un-scanned accesses:                0          0            0           0          0         0         0
Cache Hit Rate:                     0.95       0.95         0.95        0.95       0.95      0.95      0.95
Time for AV scanning (sec.):        15897.76   15954.44     15899.91    15903.83   15938.42  15898.87  15914.27
Time for non-AV activities (sec.):  199933617  20046177.2   1996798.71  199981.72  19985.02  1997.55   199.91
Total time (sec.):                  199949514.7  20062131.64  20062131.64  215885.55  35923.44  17896.42  16114.18
AV Overhead:                        0.01       0.08         0.8         7.95       79.75     795.93    7960.64
Average inter-access time:          1999.594   200.487      19.97       2          0.2       0.02      0.002

Table A.5: Raw data from Figure 4.11.
Category             Number of Apps  Mean Size (MB)  Median Size (MB)  Minimum Size (MB)  Maximum Size (MB)  % < 1 MB  % < 10 MB  % < 20 MB
Medical              49              1.96            0.76              0.03               14.35              55.10%    95.92%     100.00%
Tools                50              1.40            1.00              0.07               8.37               50.00%    100.00%    100.00%
Finance              47              1.66            1.03              0.02               10.15              48.94%    97.87%     100.00%
Media and Video      50              1.81            1.03              0.02               5.74               48.00%    100.00%    100.00%
Comics               49              2.34            1.03              0.04               37.06              48.98%    97.96%     97.96%
Productivity         50              1.77            1.37              0.05               7.92               40.00%    100.00%    100.00%
Business             49              2.40            1.39              0.11               13.35              44.90%    95.92%     100.00%
Personalization      48              2.18            1.55              0.03               9.10               41.67%    100.00%    100.00%
News and Magazines   48              1.84            1.58              0.06               6.51               31.25%    100.00%    100.00%
Weather              50              2.20            1.62              0.02               9.08               40.00%    100.00%    100.00%
Photography          50              3.05            1.79              0.15               12.17              36.00%    94.00%     100.00%
Books and Reference  46              3.17            2.05              0.03               19.11              32.61%    95.65%     100.00%
Shopping             48              2.68            2.14              0.16               11.95              27.08%    97.92%     100.00%
Lifestyle            49              3.26            2.21              0.11               15.73              28.57%    95.92%     100.00%
Music and Audio      49              2.37            2.25              0.11               8.80               24.49%    100.00%    100.00%
Sports               47              3.54            2.34              0.19               17.00              17.02%    97.87%     100.00%
Travel and Local     49              3.02            2.42              0.12               12.40              22.45%    97.96%     100.00%
Social               49              2.83            2.43              0.07               8.33               18.37%    100.00%    100.00%
Communication        50              3.27            2.45              0.09               14.71              28.00%    94.00%     100.00%
Health and Fitness   48              4.16            2.93              0.06               21.57              18.75%    91.67%     97.92%
Education            47              4.80            3.56              0.09               28.44              23.40%    89.36%     97.87%

Table A.6: File size characteristics of Android testing dataset.