
J Comput Virol (2012) 8:61–71, DOI 10.1007/s11416-012-0162-3

ORIGINAL PAPER

Reducing the window of opportunity for Android malware
Gotta catch ’em all

Axelle Apvrille · Tim Strazzere

Received: 20 December 2011 / Accepted: 3 April 2012 / Published online: 26 April 2012
© Springer-Verlag France 2012

Abstract Spotting malicious samples in the wild has always been difficult, and Android malware is no exception. In fact, the task is made even harder because Android applications are (usually) not directly accessible from market places. For instance, Google enforces its own communication protocol to browse and download applications from its market. Thus, an efficient market crawler must reverse-engineer and implement this protocol, issue appropriate search requests and take the necessary steps so as not to be banned. On the end-users' side, the difficulty of spotting malicious mobile applications means that most Android malware remains unnoticed for up to 3 months before a security researcher finally stumbles upon it. To reduce this window of opportunity, this paper presents a heuristics engine that statically pre-processes and prioritizes samples. The engine uses 39 flags of different natures, such as Java API calls, presence of embedded executables, code size and URLs. Each flag is assigned a weight based on statistics we computed from the techniques mobile malware authors most commonly use in their code. The engine outputs a risk score which highlights the samples most likely to be malicious. The engine has been tested over a set of clean applications and a set of malicious ones. The results show a strong difference in both the average risk score and its distribution for the two sets, proving its usefulness for spotting malware.

A. Apvrille, Fortinet, EMEA AV Team, 120, rue Albert Caquot, 06410 Biot, France. e-mail: [email protected]

T. Strazzere, Lookout Mobile Security, 1 Front Street, Suite 2700, San Francisco, CA 94111, USA. e-mail: [email protected]

1 Introduction

The mobile Android operating system is rising in all areas. It is rising in market share (52.5% of sales in Q3 2011 according to Gartner, ahead of Symbian, iOS and Windows Mobile) and in number of applications: Wikipedia [19] now reports 370,000 applications in Google's Android Market. Unfortunately, it is also rising in malware. Schmidt et al. [14] had already predicted this in 2009, and most anti-virus vendors have acknowledged the rise in various blog posts [12], technical reports [10] or conferences [3]. In some cases, malicious samples have been downloaded massively: for example, Android/DrdDream was downloaded over 200,000 times [8]. As of November 2011, there are approximately 2,000 unique malicious Android samples belonging to 80 different families.1 If, as Anderson and Wolff [1] claimed, the web is dead in favour of users downloading applications rather than directly browsing the web, it is likely this trend will only sharpen in the next few years.

Usually, anti-virus vendors find new malware from one of the following sources:

• Users: Victims, users, customers, partners or security researchers regularly submit suspicious files they encounter to AV vendors. Those samples are analyzed and, if they are found to be malicious and undetected, a new signature is added to the AV engine. Hundreds of PC malware samples are submitted daily for analysis this way, but unfortunately there are only a few mobile submissions. One possible reason for this is that malicious files are often hidden on mobile phones. For instance,

1 Those statistics are taken from Fortinet internal databases. It should however be noted that figures vary among anti-virus vendors depending on the classification of samples.


an end-user typically does not see the Android package (APK) he/she installs, and therefore does not know what to submit for scanning. Other reasons are that end-users are not accustomed to using file browsers on their phones, and a general lack of education about mobile malware.

• AV malware exchange: AV vendors do daily automated exchanges with each other. This is a large resource for malicious samples and it ensures protection against virulent samples from all participating vendors. However, of course, the exchange only occurs once a vendor has spotted the malware. So, this source does not help find unknown malware in the wild.

• In the wild: Security researchers search the web, forums or social networks for malicious samples, but in the middle of hundreds of thousands of genuine mobile applications, spotting malicious ones is (fortunately) like finding a needle in a haystack. Some automation is needed. Additionally, the advent of application stores hardens the process of downloading applications onto a desktop for analysis, because they lock access to a given user account and his/her mobile devices.

It is this third and last category this paper focuses on, and more specifically malware for Android platforms. We present a heuristics engine that helps sort out collections of applications (malicious or not). More precisely, we are interested in mobile malware that is not yet detected by any vendor. Those samples may belong to already known families but still be undetected because current signatures (anti-virus detection patterns) are not good enough, or, in other cases, they might consist of entirely new and unknown families. The latter is certainly more attractive to researchers but, still, both categories need to be detected to protect the end-user, and consequently both categories are taken into account in this paper.

As we show in the next section, little research has been conducted on finding mobile malware in the wild. We discuss the limitations of previous work and highlight what we are able to tackle. We also provide a rationale for crawling Android market places for unknown samples in the wild (Sect. 3). Basically, our contribution consists in explaining the issues with scanning Google's Android Market (Sect. 4) and how we manage to put a magnifying glass on the haystack and find the needles (Sect. 5). The results of our system are discussed in Sect. 6. This work can certainly be improved (a few options are detailed in the results section), but it opens research on the subject.

2 State of the art

Spotting malware in the wild, as early as possible before it has had time to cause harm (or too much of it), is an idea which has already been developed multiple times for PC malware, particularly with the use of honeypots.

Wang et al. [18] presented an automated web patrol system which browses the web using potentially vulnerable browsers piloted by monkey programs, in the hope of identifying exploits in the wild. Then, Ikinci et al. [6] built a malicious web site crawler and analyzer named Monkey-Spider. From various sources such as keywords or email spam, their system generates a list of URLs to crawl. They download everything they find on the website and make sure to follow links, even those found in Javascript or PDF documents. Then, they search the dump for malware using common anti-virus products as a first stage, and sandboxes in a second stage. Alternatively, Ma et al. [9] proposed to identify malicious web sites using an automated URL classification method. The idea consists in only using the URL (its look, length of hostname, number of dots...) and its host (IP address, whois properties) to determine whether it is malicious or not. The content, or context, is not downloaded, which results in a lightweight detection system.

However, all those systems show severe limitations when it comes to finding mobile malware, because of the specificities of mobile networks:

• Most application stores do not provide direct access to the applications they host. Consequently, a wget on the stores only downloads HTML pages (which are irrelevant for mobile malware) and not the mobile application itself. In particular, the Android Market implements its own specific protocol to download applications [11,16]. The crawler of Ikinci et al. [6] would need to support such protocols to provide any result.

• URLs to be displayed on mobile phones generally have a different format than on PCs. They are typically shorter or shortened (using a URL shortening service), prefixed with "m." or using the mobi domain name, etc. Thus, Ma et al.'s [9] work and Monkey-Spider's seeder [6] would need to be adapted.

• Up to now, there are only a few exploits for mobile phones and, actually, no browser vulnerability at all has ever been used by Android malware. This is because malware mostly manages to do its malicious tasks through clever calls to public APIs or social engineering [2]. So, solutions like Wang et al.'s [18], which only look for browser exploits, would be bound to miss many malicious samples.

• Mobile phones are less easy to manipulate than a desktop (limited resources etc). Thus, solutions such as Wang et al.'s [18], which browse the web from the client itself (the mobile phone in our case), are not very practical for mobile phones. Moreover, monkey programs to automate browsing on mobile phones (like the monkeyrunner for Android) are not fully mature yet.


• In the case of mobile platforms, the maliciousness of mobile malware cannot be limited to downloading applications, accessing given URLs or connecting to remote servers. The very fact that mobile phones operate on a different network (GSM) opens up other targets, such as calling premium phone numbers, sending SMS messages or accessing WAP gateways. Ikinci et al.'s [6] sandbox would need to detect malicious behaviours for these. Actually, writing a sandbox for a mobile environment is a project in its own right. Currently, for Android devices, we are only aware of DroidBox (https://code.google.com/p/droidbox), but it is in alpha stage. It requires manual source code modifications, and when we tested it on Android/Geinimi.A!tr it was slow and unable to detect the malware's malicious activities.

So, it seems that research for PCs cannot be directly applied to mobile platforms and requires in-depth modifications. Hence, we searched for work specifically meant for mobile phones, but there is little prior art in this domain. A few months ago, a Google Summer of Code project consisting of an Android Market crawler was proposed [7] but later cancelled. Another project, named DroidRanger [20], seems promising, having found several malicious applications in the Android Market and alternative markets, but it isn't published yet.

One of the closest matches to mobile malware scanners is [4]. In that paper, the authors propose an Android Application Sandbox (AAS) to detect suspicious software. For each sample, they first perform static analysis: they decompile the code and match 5 different types of patterns: use of JNI, reflection, spawning children, use of services and IPC, and requested permissions. Then, they complement their approach with dynamic analysis consisting of a system call logger at kernel level.

Note that [4] only handles the malware analysis part, not the crawling of market places: AAS is meant to be installed within the Android Market, for instance. In our paper, Sects. 3 and 4 detail the crawling of market places from independent, external hosts. As for static analysis, the work we present in this paper covers many more malicious patterns (see Sect. 5) and thus makes detection of suspicious malware more accurate.

Teufl et al. [17] proposed an Android Market metadata extractor and analyzer. They downloaded the metadata (i.e. information available on the application's page: permissions, download counts, ratings, price, description...) for 130,211 applications and then analyzed it for various correlations. Our paper goes a few steps further: first, it downloads the applications from the market, which is more complicated than getting the metadata, and second, the analysis is performed on far more parameters, 39 properties currently to be precise.

The process we propose in this paper (see Fig. 3) consists in:

1. Crawling mobile market places for Android applications. Section 4 presents a few hints at how to crawl the Android Market,

2. Statically analyzing samples as we receive them with a heuristics engine (Sect. 5). This analysis computes a risk score based on the features it sees in the application's decompiled code. Static analysis has the advantage of being virtually undetectable, as the malware obviously cannot modify its behaviour during analysis. The only method to bypass static analysis is code obfuscation, and we detect some attempts which use encryption. Static analysis is also relatively fast, hence with fewer risks of creating a bottleneck, as mentioned by [6].

3. If this score is greater than a given threshold, the sample is labeled as suspicious and undergoes further treatment. This treatment is out of the scope of this paper, but it can for instance be manual analysis, AV scanning or dynamic analysis. The goal of our work here is to prioritize samples that need closer investigation. Given the amount of samples to process, it is important to start with those which are the most likely to be malicious. Others can be scanned later, if there is time.
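To make this triage step concrete, here is a minimal sketch, in Python, of how crawled samples could be scored and prioritized. It is only an illustration: the actual prototype is a Perl script (see Sect. 6.2), compute_risk_score() merely stands in for the 39-detector engine of Sect. 5, and the threshold value is an arbitrary example.

from pathlib import Path

SUSPICIOUS_THRESHOLD = 30  # hypothetical example value, not the paper's threshold

def compute_risk_score(apk: Path) -> int:
    """Placeholder for the heuristics engine described in Sect. 5."""
    return 0

def triage(samples_dir: str):
    # Score every APK collected by the crawler, most suspicious first.
    scored = sorted(
        ((compute_risk_score(apk), apk) for apk in Path(samples_dir).glob("*.apk")),
        reverse=True,
    )
    # Samples above the threshold go to manual analysis, AV scanning or
    # dynamic analysis; the rest can be processed later, if there is time.
    return [(score, apk) for score, apk in scored if score >= SUSPICIOUS_THRESHOLD]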

3 Rationale

In this section, we explore the reasons and difficulties for finding Android malware in the wild.

Android applications can be downloaded from several sources. The best known application store is Google's Android Market, with 370,000 applications and 7 billion downloads in November 2011 [19], but numerous other sources exist, ranging from perfectly legitimate and manually reviewed stores like Amazon's [8] to more unofficial markets (e.g. blapkmarket).

The total count of Android applications is unknown, as the list of market places itself evolves regularly and because several of them do not provide an accurate head count of the applications they store. Moreover, some applications are listed in multiple places. We know for sure there are 370,000 applications in Google's market [19], and we counted the number of applications in 10 other market places: 199,617 applications. There are still 37 other market places we did not count applications for, not to mention forums and file sharing websites. So, even if our figures include a few duplicates, there are probably over 600,000 Android applications in the wild.

Fig. 1 Number of days between the first release in the wild of a malware sample and its first detection by any vendor (bar chart; y-axis: days, 0 to 400; x-axis: one bar per malware family, from AdSms to Yzhcsms)

Given those facts, scanning Android applications is a considerable job, and it is not surprising that many malware samples stay undetected in the wild before an anti-virus vendor or

a security researcher finally spots them and alerts the community. For Android malware, according to our research, the gap between the release in the wild of a new mobile malware and its detection by an anti-virus vendor is even bigger than one might expect: it is approximately 80 days! We explain below how Fig. 1 was computed.

The date of first detection by an anti-virus vendor is precisely known and, generally, it does not vary much from one vendor to another. The difficulty resides in finding the day the malware was first released in the wild. Initially, we considered using the begin date of the certificate used to sign the malware, but developers typically re-use their certificates or use public certificates, so this date would be earlier than the actual release of the malware. Instead, we chose to use the timestamp of the package's ZIP.

We are aware this date is not fully reliable, and only consider it as an approximation of the malware's release date. Indeed, the package's timestamp is not cryptographically signed, so it can obviously be tampered with. It is also possible that the malware author zips the package days before he/she actually releases it, etc.

As an additional validation of the date, we only considered cases where:

certificate begin date ≤ package's zip date ≤ first detection date

Out of 90 malicious samples, only 4 had their dates discarded this way.
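As an illustration, the following Python sketch approximates a sample's release date from its ZIP entry timestamps and applies the consistency check above. Taking the most recent entry timestamp as "the package's ZIP date" is our assumption here; the certificate and detection dates are supposed to come from separate tooling.

import zipfile
from datetime import datetime

def package_zip_date(apk_path: str) -> datetime:
    # Approximate the release date with the newest timestamp found in the archive.
    with zipfile.ZipFile(apk_path) as apk:
        return max(datetime(*info.date_time) for info in apk.infolist())

def date_is_consistent(cert_begin: datetime, apk_path: str,
                       first_detection: datetime) -> bool:
    # certificate begin date <= package's zip date <= first detection date
    return cert_begin <= package_zip_date(apk_path) <= first_detection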

The amount of Android applications to scan, combined with the detection delay, suggests we are currently unaware of several Android malware in the wild. This is yet another incentive for our work.

4 Hints to crawl the Android market

Google's Android Market is only accessible via a proprietary protocol implemented on the device. As Google hasn't released any official API for it, reverse engineering the protocol and mimicking a device is the only way to interact with the market. To make things even harder, Google has been revving their market application nearly as often as they release new operating systems. Still, since the first market release the protocol has remained rather constant, keeping it backwards compatible and not breaking older clients.

The market currently uses protocol buffers [13] for communication between the client and the server, which are then encoded with a Base64 websafe character set. This is where most of the complexity of crawling the market lies. In order to properly crawl the market, the protocol must be kept up to date and in sync with what the server expects. The basic request context contains data specific to the device the request is from: the software version, product name, SDK version, user language, user country, operator name and a client ID. It also contains the authentication token, which is tied to the user's Google account. Lastly, the request context also holds the Android ID, which is linked to both the user's account and the device. For a proper request to be created, all these values must match up, or the resulting response can vary drastically. Crawling will only get harder, since application developers can restrict who is allowed to see their application by any combination of these values.
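As a purely illustrative aid, the sketch below models such a request context as a plain Python structure. The real protocol uses Google's own protocol buffer messages [13]; the field names here simply mirror the values listed above and are not the actual schema.

from dataclasses import dataclass, fields

@dataclass
class RequestContext:
    software_version: int   # market client version
    product_name: str       # device product string, e.g. that of a Nexus S
    sdk_version: int        # e.g. 10
    user_language: str      # e.g. "en"
    user_country: str       # e.g. "us"
    operator_name: str      # e.g. "Verizon"
    client_id: str
    auth_token: str         # tied to the user's Google account
    android_id: str         # tied to both the account and the device

    def looks_complete(self) -> bool:
        # A real crawler must keep all of these mutually consistent, otherwise
        # the server's response can vary drastically.
        return all(getattr(self, f.name) for f in fields(self))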

Additionally, in order to keep a full listing of the applications available in the Android Market, a crawler must perform odd sets of searches. Since the protocol is based on mimicking


a device, the same limitations that are imposed on a device are imposed on a crawler. This means searches can only be conducted with 10 results being returned at a time, with a maximum of 800 results for any given search or category.

Within each category there are different views, such as 'Top Free', 'Top Paid', 'Just In Free' and 'Just In Paid'. This means that for one valid request context (a US based user on Verizon running SDK 10 on a Nexus S, for example) there are 40 different categories with 4 different views, giving a possibility of 128,000 different applications, though these are not necessarily all unique packages. To list them, it would require a minimum of 12,800 search requests (10 results for each search), followed by 128,000 requests to get each specific application's metadata. Then, to download the applications, it would require another 128,000 requests. These are only the applications available for browsing by a phone; there are still countless other applications, with more being added almost every minute.
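For reference, the request counts quoted above follow directly from the device-imposed limits:

categories = 40
views = 4                 # 'Top Free', 'Top Paid', 'Just In Free', 'Just In Paid'
max_results = 800         # cap per search or category
page_size = 10            # results returned per request

visible_apps = categories * views * max_results       # 128,000 (not all unique)
search_requests = visible_apps // page_size            # 12,800 paging requests
metadata_requests = visible_apps                        # 128,000, one per application
download_requests = visible_apps                        # 128,000 more to fetch the APKs
print(visible_apps, search_requests, metadata_requests, download_requests)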

The last thing to complicate matters a bit more is banning from the market. On top of keeping track of request contexts, to maintain a good crawler a wide array of accounts must be created and maintained. If too many requests happen too fast, there are many different types of bans that may come into effect. These can range from an IP address ban, to an account being blacklisted (both from searching and downloading), to an Android ID and device ban. Monitoring the health of the crawling accounts is another thing that must be taken into consideration for proper crawling.

When creating a fully functional crawler for the Android Market, one should keep in mind, again, that you are mimicking real devices. Unlike crawling a normal web page, this means you need to maintain multiple accounts, multiple device contexts and possibly multiple IP addresses. We found that using a combination of these, along with exponential back-off rate limiting, helps ensure we don't get banned. Along with this, we run many health checks to ensure accounts do not appear to be flagged or to be returning bad data. Rationalizing your crawler's traffic is easier when you think of it in these terms: how much traffic could one device possibly generate, and how fast could it do so? Asking ourselves these questions often provides a good gut check on how to design the rate limiting properly so as not to get banned.
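A minimal sketch of this strategy is shown below, with rotation over accounts and device contexts plus jittered exponential back-off. fetch() and BannedError are placeholders, not the real market API.

import itertools
import random
import time

class BannedError(Exception):
    """Raised by fetch() when the market refuses to serve a request."""

def fetch(request, context):
    """Placeholder for the reverse-engineered market call."""
    return None

def crawl_with_backoff(requests, contexts, base_delay=1.0, max_delay=600.0):
    delay = base_delay
    pool = itertools.cycle(contexts)        # rotate accounts, devices, IP addresses
    for request in requests:
        while True:
            try:
                yield fetch(request, next(pool))
                delay = base_delay          # success: reset the back-off
                break
            except BannedError:
                time.sleep(delay + random.uniform(0, delay))  # jittered wait
                delay = min(delay * 2, max_delay)             # exponential growth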

After maintaining a crawler for some time, the value is easily seen. If we keep all the metadata and binaries, we can perform countless types of queries which are not possible through the normal market protocol. One example is looking for anything named "Angry Birds" which isn't developed by "Rovio"2 or which has different permissions. Another example is searching all descriptions and package names for similar characteristics, such as "DroidDream", which was one

2 This was the case for samples of Android/RuFraud recently.

way extra accounts being used by the DroidDream crew were found [8].

5 Heuristics engine

As various market places are being crawled (see Sect. 4), Android applications are massively being downloaded. Not all those applications are malicious (fortunately for end users), so we need to quickly sort out from the mass those which are the most suspicious. This work is performed by a heuristics engine. This tool can be seen as a quick pre-analyzer that rates applications according to their presumed suspiciousness. Applications which receive the highest scores, and are thus the most likely to be malicious, then undergo further analysis, outside the scope of this paper.

For each Android application, the engine's algorithm is fairly simple and is illustrated in Fig. 3. First, the sample is uncompressed. This is usually nothing more than unzipping, as the Android package format (APK) is a ZIP file. We also handle cases where the sample is additionally zipped or RARed. Then, the sample's classes.dex, which concentrates the application's Dalvik code, is decompiled using baksmali [15] to produce a more or less human readable format named smali. The package's manifest, initially in binary format, is also decoded to plain XML. Finally, all those elements (smali code, XML manifest, package's resources and assets) are analyzed. The analysis consists in searching for particular risky properties or patterns, and accordingly incrementing the risk score.
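A rough sketch of these pre-processing steps is shown below. It is not the authors' implementation (the prototype is a Perl script); the baksmali command line corresponds to the 1.x releases of the tool [15], and decoding of the binary manifest is left abstract since the text does not name a tool for it.

import subprocess
import zipfile
from pathlib import Path

def unpack_and_decompile(apk_path: str, work_dir: str) -> Path:
    out = Path(work_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(apk_path) as apk:   # an APK is simply a ZIP archive
        apk.extractall(out)
    # Decompile the Dalvik bytecode (classes.dex) to smali with baksmali.
    subprocess.run(
        ["java", "-jar", "baksmali.jar",
         "-o", str(out / "smali"), str(out / "classes.dex")],
        check=True,
    )
    # The binary AndroidManifest.xml must still be decoded to plain XML
    # before analysis; that step is tool-dependent and omitted here.
    return out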

The difficulty, and the success of the engine, actually lies in writing clever property detectors so as to generate few false positives and false negatives. The task turns out to be more difficult than expected, because genuine applications sometimes use unexpected functionalities. For example, as some of the most advanced Android malware typically uses encryption algorithms to obfuscate its goal (e.g. Android/DroidKungFu), we considered raising an alarm whenever an application uses encryption. A naive detector consists in detecting calls to the Java KeySpec API, but this isn't any good, because it also detects the many advertisement kits that genuine applications use. Indeed, many advertisement kits encrypt communication with their servers or implement tricks to ensure the ads are not removed. After analyzing a series of clean applications, we implemented the following pattern:

calls to the KeySpec, SecretKey or Cipher APIs, but not from com.google.ads, nor mobileads.google.com, nor com.android.vending.licensing, nor openfeint, nor oauth.signpost.signature, nor org.apache.james.mime4j, nor com.google.android.youtube.core
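A minimal sketch of this pattern in Python, assuming the smali tree layout produced by baksmali; the matching is simplified (for instance, the mobileads.google.com exclusion, which is a URL rather than a package, is omitted):

from pathlib import Path

CRYPTO_TOKENS = ("KeySpec", "SecretKey", "Ljavax/crypto/Cipher")
EXCLUDED_PACKAGES = (
    "com/google/ads", "com/android/vending/licensing", "openfeint",
    "oauth/signpost/signature", "org/apache/james/mime4j",
    "com/google/android/youtube/core",
)

def uses_suspicious_crypto(smali_dir: str) -> bool:
    for smali_file in Path(smali_dir).rglob("*.smali"):
        path = smali_file.as_posix()
        if any(pkg in path for pkg in EXCLUDED_PACKAGES):
            continue                      # advertisement kit or known library code
        text = smali_file.read_text(errors="ignore")
        if any(token in text for token in CRYPTO_TOKENS):
            return True
    return False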

In most property detectors, we had to filter out cases where the functionality was being used within advertisement kits, billing APIs, legitimate authentication, youtube, social gaming networks such as Openfeint, etc.

Table 1 Implemented Java API call detectors which are particularly monitored to detect Android malware

API detected | Threat
sendTextMessage(), sendMultipartTextMessage() | Sending SMS to short numbers
Constants FEATURE_ENABLE_MMS, EXTRA_STREAM, content://mms | Sending MMS without consent
Constants EXTRA_EMAIL, EXTRA_SUBJECT | E.g. communicating with a remote server via emails
SmsMessage.createFromPdu(), getOriginatingAddress(), SMS_RECEIVED, content://sms, sms_body ... | Forwarding SMS to a spy number, deleting SMS etc.
Intent.ACTION_CALL | Calling a premium phone number
Constant POST or class HttpPost | POSTing private information via HTTP
KeySpec, SecretKey, Cipher classes | Using encryption to obfuscate part of the code
Methods of the TelephonyManager class: getDeviceId(), getSubscriberId(), getNetworkOperator(), getLine1Number(), getSimOperator(), getSimSerialNumber(), getSimCountryIso() | IMEI, IMSI are personal information
Methods of the PackageInfo class: signatures(), getInstalledPackages() | Checking the malware's integrity, deleting given packages, posting the list of packages to a remote server etc.
DexClassLoader class | Loading a class in a stealthy manner
Static methods Class.forName(), Method.invoke() | Reflection: loading a class in a stealthy manner
Method Runtime.exec(), the android.os.Exec class or createSubprocess() | E.g. executing an exploit
JNIenv, jclass, jmethodID, jfieldID, FindClass | JNI: executing native code

The 39 property detectors we implemented so far fall in one of the 7 categories below:

• Permissions required in the Android manifest. This is one of the first properties that comes to mind to check what a sample does. We keep a particular eye on Internet, SMS, MMS, calls, geographic location, contacts and package installation permissions. If those permissions are requested, we increment the risk score. However, permissions alone are insufficient to spot malware.

• API call detectors. This is the most important category of property detectors. It consists in detecting the use of particular Java methods, classes or constants by spotting them in the decompiled smali code. The API elements it detects are of very different natures. Some concern actions or features of the phone: sending/receiving SMS, calling phone numbers, geographic location, getting telephony information, listing or installing other packages on the phone. Others concern Java language tweaks: dynamic class loading, reflection or JNI. Finally, a few calls concern the underlying Linux operating system, such as the creation of new processes. See Table 1 for details.

• Command detectors. The engine also detects the use of specific Unix or shell commands. This is similar to detecting API calls, except commands can be located within scripts of raw resources or assets, so those directories need to be scanned too. Currently, we only increment the risk score when the command pm install (installation of Android packages) is detected.

• Presence of executables or zip files in resources or assets. This is used to detect malware which runs exploits on the phone. We said previously that exploits aren't used that often yet but, when an exploit is used, the sample is generally malicious.

• Geographic detectors. Currently, 40% of mobile malware families seem to originate from Russia, Ukraine, Belarus, Latvia and Lithuania, and 35% from China.3 The engine consequently slightly raises the risk score for samples which appear to come from those countries. In particular, it raises the risk score if the signing certificate's country is one of those, or if the malware mentions a few keywords such as 10086 (China Mobile customer service portal), cmnet or cmwap (China Mobile gateways).

• URL detectors. On one hand, access to the Internet is important to malware authors, to report back information, update settings or get further commands, so it seems important to increment the risk score when the sample accesses the Internet. On the other hand, there are so many genuine reasons to access the Internet that we have to make sure not to raise the alarm unnecessarily. We chose to raise the risk score only once if a URL is encountered (i.e. if the engine detects 4 URLs, the risk score is only incremented once), and also to skip extremely frequent URLs such as the Android Market's URL, Google services (mail, documents, calendar…), XML schemas and advertisement companies.

3 Statistics from Fortinet’s internal databases.


Table 2 Comparing likelihood percentages of a subset of situations, computed for 97 malware and 217 clean files

Situation | Likelihood % for Android malware | Likelihood % for clean files | Weight
Send SMS | 59 | 6 | 5
Receive SMS | 60 | 10 | 5
Performs HTTP POST | 68 | 25 | 4
Combination of SMS and access to Internet | 46 | 6 | 4
Gets IMEI | 63 | 20 | 4
Uses HTTP or views a URL | 85 | 50 | 3
Gets IMSI | 36 | 1 | 3
Code contains a URL | 65 | 41 | 2
Uses encryption | 34 | 10 | 2
Gets phone line number | 27 | 6 | 2
Gets information concerning the SIM card | 32 | 5 | 2
Specifically targets China | 26 | 0 | 2
Lists installed packages | 33 | 5 | 2
Other situations | - | - | 1
Size < 70,000 bytes | 30 | 21 | 1

Note there is a particular case for URLs: if the URL downloads an APK, we raise the risk score more significantly, as this can mean the sample is trying to update or install another application.

• Size of code. An analysis of the size of malicious APKs compared to benign APKs shows that the average size is comparable, but that the distribution is different. In particular, there are more very small malware samples, with sizes less than 70,000 bytes: 30% of malicious files against 21% of clean files (see Table 2). So, we raise the risk score for samples below 70,000 bytes.

• Combinations. We increment the risk score if some specific conditions are met, such as if the sample gets the geographic location and accesses the Internet. Indeed, in that case, the sample has the capability to report the end-user's geographic location, which results in a privacy threat.
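To make the scoring mechanism concrete, here is a minimal sketch of how individual detectors and their weights might be combined into a risk score. The app object (with smali and size_bytes attributes) is hypothetical, and the example weights are the ones listed in Table 2.

def risk_score(app, detectors):
    """detectors is a list of (predicate, weight) pairs; weights add up."""
    return sum(weight for predicate, weight in detectors if predicate(app))

# Example wiring; the predicates are crude stand-ins for the detectors above.
DETECTORS = [
    (lambda app: "sendTextMessage" in app.smali, 5),   # sends SMS
    (lambda app: "getDeviceId" in app.smali, 4),       # retrieves the IMEI
    (lambda app: app.size_bytes < 70_000, 1),          # very small package
]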

The risk score has no particular unit. It is incremented based on the different likelihoods of given situations for Android malware and clean applications. To compute the weights (risk score increments), we analyzed 97 malicious samples (taken from 41 different families) and 217 clean samples and tested whether each property was found or not. For each situation, we therefore compute a likelihood percentage.

Then, basically, we are interested in big differences between the percentages for Android malware and the percentages for clean samples.

If the difference of percentage points is ≥ 50, we assign a weight of 5.

If the difference is ≥ 40 and < 50, weight = 4.
If the difference is ≥ 30 and < 40, weight = 3.
If the difference is ≥ 20 and < 30, weight = 2.

Finally, if the difference is less than 20 points, we use the smallest weight, 1.

So, for example, Table 2 shows that 59% of Android malware send SMS messages whereas only 6% of clean samples do. The difference is 53 percentage points, so we increment the risk score by 5 if the situation is encountered.
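In code form, the weight assignment rule reads as follows (a direct transcription of the buckets above):

def weight(malware_pct: float, clean_pct: float) -> int:
    """Map the difference in likelihood (percentage points) to a weight."""
    diff = malware_pct - clean_pct
    if diff >= 50:
        return 5
    if diff >= 40:
        return 4
    if diff >= 30:
        return 3
    if diff >= 20:
        return 2
    return 1

assert weight(59, 6) == 5    # Send SMS: 53-point difference (Table 2)
assert weight(68, 25) == 4   # Performs HTTP POST: 43-point difference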

6 Results

This section details the results of the tools presented in this paper.

6.1 Market scanner results

The implementation of the market scanner is not public and consequently cannot be disclosed in this paper. However, it successfully scans Google's Android Market in its entirety.

Besides its initial goal (retrieving samples from the market place), the market scanner turned out to be quite useful in another area: tracking malware authors.


Fig. 2 Distribution of risk scores for a set of malicious samples and a set of clean samples (histogram; x-axis: risk score buckets from <5 to ≥55; y-axis: percentage)

Indeed, by combining heuristics on the applications, common package naming schemes and tracked metadata, developers can be tracked, people pirating their applications can be tracked, and so can malware authors.

Actually, this method was used to track Legacy (aka Android/DroidKungFu) in the Android Market. Originally, we found Legacy samples in third party Chinese markets, which appear to be the place where the authors first released it. Then, since we had historical data for that market (we keep a chronology of the metadata gathered when searching that market), we saw the authors build a user base with an application and then push a malicious update to the market. When we searched for the non-malicious update, we could see that it was being seeded into the Android Market.

6.2 Heuristics engine results

For its part, the heuristics engine prototype was implemented as a basic Perl script. Overall, we processed more than 3,000 samples with it. In particular, it was at the origin of the discovery of Riskware/Sheriff and Riskware/ESSecurity.

To test the engine, we had it analyze a set of 947 clean samples,4 and a set of 107 malicious samples.5 Note that we did

4 Those clean samples were taken from the web, in particular from the F-Droid open source market place, and were manually double checked to ensure they were genuine.
5 Malicious samples were downloaded from Mila Parkour's repository at http://contagiominidump.blogspot.com and from a malware exchange with NetQin.

not use the same sets as the ones used to compute the risk score increments in Table 2, so as not to influence the results.

The results are displayed in Fig. 2. They show a clear difference in the distribution of risk scores for the two sets: clean samples tend to have most of their risk scores below 15, while malicious samples are most frequent above 45. For those data sets, there was no risk score above 40 for clean samples and none above 55 for malicious ones.

The computed average risk score for clean samples is 8, while it is 44 for malicious samples.

6.3 Limitations

Though its results are quite promising so far, there are a few limitations and improvements to plan for the engine.

First, it is important to understand that a heuristics engine is by design not perfect: it generates false positives and false negatives.

Such a case occurred for an application named Prepay Widget, an Android widget to display your plan's balance, free minutes, traffic etc. The application was sending USSD commands6 to get the plan's information and thus triggered the call property detector. It was also reading incoming SMS, as some operators reply to USSD via SMS, and was thus caught by the SMS receiver detector. It was signed with a Russian certificate, so caught by the geographical detector. It was testing whether the phone was rooted (to put the dialer in the background) and thus triggered the Runtime.exec() detector, etc.

6 USSD is a GSM protocol to communicate with the operator.


Fig. 3 Process to find new mobile malware. The risk evaluation engine is our heuristics engine

All those alarms could be explained after a manual check, but they resulted in a high risk score (36) for a non-malicious application. However, given all the borderline techniques it used, we believe the heuristics engine was nevertheless right to raise the alarm, as this application could have been malicious.

Conversely, Fig. 2 shows a few malicious applications have risk scores below 15, and this will be the case for a few clever applications like [5]. This proof of concept provides a

remote shell via the installation of an application that does not request any permission. The trick consists in having the application launch a web browser, registering a custom URI and having the remote server send encoded commands through the connection to the web browser. Then, the commands are decoded and executed. The code for the PoC is not published, so we could not test the heuristics engine on it; however, from discussions with its author, it appears it would at least have triggered the API call detector for Runtime.exec().

Clearly, the goal of the heuristics engine is not to detect everything, but to help detect most malware. The results we presented in Fig. 2 show this is statistically true.

Technically speaking, the engine could be enhanced in several ways:

• Performance. For example, currently each property detector results in a search (grep) over a given directory. This means parsing the analysis directory several times (nearly once for each property, depending on the properties). Instead, a single common search could be done, and the results scanned for each property.

• Improving or adding detectors. We plan to improve the executable detector, which currently merely detects the presence of an executable or zip file in assets or raw resources. It is therefore triggered by the presence of genuine libraries such as libGoogleAnalytics.jar. The detector could filter out such libraries or scan for given keywords in the executables. We also plan to add a public certificate detector to spot applications signed with a debug, test or development certificate. Indeed, the following public certificate has been used several times by malware authors and might be a potential indicator (a minimal sketch of such a check is given after this list):

[email protected]=Android OU=Android O=Android L=Mountain View ST=California C=US, serial number: 936eacbe07f201df

Fig. 4 Sample output report of the heuristics engine (extract)

We also consider improving the URL detector, as nearly all URLs trigger the alarm (see Sect. 5). It would be interesting to update [9] for mobile URLs and to increment the risk score differently depending on which URLs are found.

• Computing weights. Data mining approaches would be an improvement to our heuristics engine. In our prototype, we arbitrarily decided to assign weights from 1 to 5 depending on the difference of percentage points, but a real engine should certainly be tuned from the results of data mining research.

• Recursive applications. The heuristics engine detects that a given sample contains another APK in its resources or assets, as indeed this is an additional risk. However, the engine does not then recursively analyze that APK.

• Tests. We plan to test the engine against larger sets of samples, to fine tune the detectors and risk increments. Testing the engine against clean file sets is particularly time consuming, because each application has to be manually inspected to make sure it is not malicious or infected.
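As a sketch of the public certificate detector mentioned above, the following Python fragment extracts the signing certificates from META-INF and compares their serial numbers against a list of known debug/test certificates. It relies on the third-party cryptography package; the only serial listed is the one quoted above, and error handling is omitted.

import zipfile
from cryptography.hazmat.primitives.serialization import pkcs7

KNOWN_PUBLIC_SERIALS = {0x936EACBE07F201DF}   # serial quoted in the text

def signed_with_public_certificate(apk_path: str) -> bool:
    with zipfile.ZipFile(apk_path) as apk:
        signature_files = [name for name in apk.namelist()
                           if name.startswith("META-INF/")
                           and name.endswith((".RSA", ".DSA"))]
        for name in signature_files:
            for cert in pkcs7.load_der_pkcs7_certificates(apk.read(name)):
                if cert.serial_number in KNOWN_PUBLIC_SERIALS:
                    return True
    return False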

7 Conclusion

In this paper, we have presented ways to help spot Android malware in the wild. We explained the catches in implementing a market scanner, and implemented one that entirely scans Google's Android Market. To do so, it uses protocol buffers and performs several searches using different combinations of parameters such as the country, language, application category etc. It makes sure to crawl all existing applications, and not only those visible to a given device. It also deals with the risk of getting banned because of too many downloads. As a side effect, by keeping a history of the scanned market metadata, the scanner has also successfully been used to track malware authors such as the creators of Legacy / DroidKungFu.

While markets are being crawled and samples pile up on disk, we use a heuristics engine to quickly pre-process the samples. The goal is to give a rough idea of which samples are the most likely to be malicious and to prioritize them. This heuristics engine is detailed in the paper. It is a static analyzer that checks for 39 different properties such as requested permissions, calls to particular Java methods or classes, constants, assumed geographic data, code size etc. Each property corresponds to a given risk score increment. The value of the increment itself has been computed from training data sets. An engine prototype is currently implemented as a Perl script and has been tested against 947 clean samples and 107 malicious ones, for which it produces clearly different risk scores.

The concepts and implementation of the heuristics engine are strongly based on Android malware statistics. The paper therefore also presents a few interesting results, such as the fact that Android malware sits in the wild for 3 months on average before anybody spots it, or that 63% of malicious Android samples retrieve the phone's IMEI and 59% send SMS messages.

The question of scanning mobile markets is relatively new and there hasn't been much research on it yet. So, the tools presented in this paper are relative newcomers. They are consequently expected to improve much in the future, in performance, tuning of scores and selectivity of malware alike.

Appendix: Android market places

Amazon AppStore 4,000
Android Blip > 70,000
Android Pazari 2,052
Appoke 3,300
AppsLib 38,771
F-Droid 502
GetJar 75,000
Hyper Market 792
Indiroid > 700
Soc.io 4,500

http://andappstore.com
http://andiim3.com
http://androides-os.com
http://androidis.ru
http://android-phones.ru/category/files/
http://www.anzhi.com
http://aptoide.com
http://apk.hiapk.com
http://apps.opera.com/
http://blapkmarket.com
http://indiroid.com
http://mikandi.com
http://myandroid.su/index.php/catprog
http://onlyandroid.mobihand.com
http://open.app.qq.com
http://snappzmarket.com/
http://wandoujia.com/
http://www.androidpit.com


http://www.19sy.com
http://www.1mobile.com
http://www.92apk.com
http://www.androidonline.net
http://www.androidz.com.br
http://www.appchina.com
http://www.appitalism.com
http://www.aproov.com
http://www.downapk.com
http://www.eoemarket.com
http://www.handster.com
http://www.insydemarket.com
http://moyandroid.net
http://www.nduoa.com
http://www.openappmkt.com
http://www.pocketgear.com
http://slideme.org
http://www.sjapk.com
http://www.starandroid.com
http://www.yingyonghui.com
http://www.zerosj.com/
http://yaam.mobi/
http://forum.xda-developers.com
http://4pda.ru/forum/index.php?showforum=281

References

1. Anderson, C., Wolff, M.: The web is dead. Long live the internet (2010). http://www.wired.com/magazine/2010/08/ff_webrip/all/1
2. Apvrille, A., Zhang, J.: Four malware and a funeral. In: 5th Conference on Network Architectures and Information Systems Security (SAR-SSI) (2010)
3. Armstrong, T., Maslennikov, D.: Android malware is on the rise. In: Virus Bulletin Conference (2011)
4. Bläsing, T., Schmidt, A.-D., Batyuk, L., Camtepe, S.A., Albayrak, S.: An Android application sandbox system for suspicious software detection. In: 5th International Conference on Malicious and Unwanted Software (MALWARE'2010), Nancy, France (2010)
5. Cannon, T.: No-permission Android app gives remote shell (2011). http://viaforensics.com/security/nopermission-android-app-remote-shell.html
6. Ikinci, A., Holz, T., Freiling, F.C.: Monkey-Spider: detecting malicious websites with low-interaction honeyclients. In: Sicherheit, pp. 407–421 (2008)
7. Logan, R., Desnos, A., Smith, R.: The Android Marketplace Crawler (2011). http://www.honeynet.org/gsoc/ideas
8. Lookout Mobile Security: Lookout Mobile Threat Report (2011)
9. Ma, J., Saul, L.K., Savage, S., Voelker, G.M.: Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1245–1254. ACM, New York (2009)
10. McAfee Labs: McAfee Threats Report: Third Quarter 2011 (2011). http://www.mcafee.com/us/resources/reports/rp-quarterly-threat-q3-2011.pdf
11. Pišljar, P.: Reversing Android Market protocol (2010). http://peter.pisljar.si/
12. de Pontevès, K.: Android malware surges in 2011 (2011). https://blog.fortinet.com/android-malware-surges-in-2011
13. Protocol Buffers: (n.d.). http://code.google.com/p/protobuf/
14. Schmidt, A.-D., Schmidt, H.-G., Batyuk, L., Clausen, J.H., Camtepe, S.A., Albayrak, S.: Smartphone malware evolution revisited: Android next target? In: 4th International Conference on Malicious and Unwanted Software (MALWARE), IEEE, pp. 1–7 (2009)
15. Smali: (n.d.). https://code.google.com/p/smali
16. Strazzere, T.: Downloading market applications without the Vending app (2009). http://strazzere.com/blog/?p=293
17. Teufl, P., Kraxberger, S., Orthacker, C., Lackner, G., Gissing, M., Marsalek, A., et al.: Android Market analysis with activation patterns. In: Proceedings of the International ICST Conference on Security and Privacy in Mobile Information and Communication (MobiSec) (2011)
18. Wang, Y.-M., Beck, D., Jiang, X., Roussev, R., Verbowski, C., Chen, S., et al.: Automated web patrol with Strider HoneyMonkeys: finding web sites that exploit browser vulnerabilities. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2006, San Diego, California, USA. The Internet Society (2006)
19. Wikipedia: Android Market (2011). https://en.wikipedia.org/wiki/Android_Market
20. Zhou, Y., Wang, Z., Zhou, W., Jiang, X.: Hey, you, get off of my market: detecting malicious apps in official and alternative Android markets. In: Proceedings of the 19th Network and Distributed System Security Symposium (NDSS 2012), San Diego, CA (2012)
