
Reconciling User Privacy and Implicit Authentication for Mobile Devices∗

Siamak F. Shahandashti, Newcastle University, UK
[email protected]

Reihaneh Safavi-Naini, University of Calgary, Canada
[email protected]

Nashad Ahmed Safa, University of Calgary, Canada

July 15, 2015

Abstract

In an implicit authentication system, a user profile is used as an additional factor to strengthen the authentication of mobile users. The profile consists of features that are constructed using the history of user actions on her mobile device over time. The profile is stored on the server and is used to authenticate an access request originating from the device at a later time. An access request will include a vector of recent measurements of the features on the device, which will subsequently be matched against the features stored at the server to accept or reject the request. The features however include private information such as user location or websites that have been visited. We propose a privacy-preserving implicit authentication system that achieves implicit authentication without revealing information about the usage profiles of the users to the server. We propose an architecture, give a formal security model, and give a construction with provable security in two settings where: (i) the device follows the protocol, and (ii) the device is captured and behaves maliciously.

Keywords: Implicit Authentication, User Privacy, Homomorphic Encryption, Provable Security, Behavioural Features

∗ This manuscript has been accepted for publication in Computers & Security. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all disclaimers that apply to the journal apply to this manuscript. A definitive version is published in Computers & Security (2015) under DOI: 10.1016/j.cose.2015.05.009 [51]. This is an extended version of a paper that appeared in the proceedings of the 29th International Information Security and Privacy Conference, IFIP SEC 2014 [49]. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND). It can be shared as long as the original work is credited, but cannot be changed in any way or used commercially. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0

1 Introduction

In applications such as mobile commerce, users often provide authentication information using Mobile Internet Devices (MIDs) such as cell phones, tablets, and notebooks. In most cases, a password is the primary method of authentication. The weaknesses of password-based authentication systems, including the widespread use of weak passwords, have been widely studied (see e.g. [56] and the references within). In addition to these weaknesses, the limitations of user interfaces on MIDs result in an error-prone process for inputting passwords, encouraging even poorer choices of password by users.

Two-factor authentication systems can potentially provide higher security. Second factors that use special hardware, such as RSA SecurID tokens¹ or biometrics, incur additional cost, which limits their wide usage. An attractive method of strengthening password systems is to use implicit authentication [30] as an additional factor for authentication. The idea is to use the history of a user's actions on the device to construct a profile for the user, consisting of a set of features, and employ it to verify a future authentication request. In the authentication phase, the device reports recent user behaviour, and authentication succeeds if the reported recent user behaviour "matches" her stored profile. Experiments in [30] showed that the features collected from the device history can be effectively used to distinguish users. Although the approach is general, it is primarily used to enhance the security of mobile users carrying MIDs, because of the richness of the sensor and other data that can be collected on these devices. In such a scenario, a network service provider (the carrier) wishes to authenticate a user in possession of the MID.

An important distinction one needs to make is that the goal of implicit authentication is to authenticate the user in possession of the device rather than the device itself. Consequently, the user profile needs to be stored at the carrier side to ensure that a compromised device cannot be used to impersonate the legitimate user.

The collected data about the user's actions can be divided into the following categories: (i) device data, such as GPS location data, WiFi/Bluetooth connections, and other sensor data; (ii) carrier data, such as information on cell towers seen by the device, or Internet access points; and (iii) third-party data, such as cloud data, app usage data, and calendar entries. As discussed, the user profile, including data from a mixture of these categories, is stored at the carrier side. This profile however includes private and potentially sensitive user data, including device and third-party data, that must be protected. One might be led to think that there is an inherent trade-off between user privacy on one hand and the effectiveness of implicit authentication on the other. In this paper we show that this is a false trade-off; i.e., user privacy and effective implicit authentication can coexist. In particular, we propose an efficient privacy-preserving implicit authentication system with verifiable security.

¹ www.emc.com/security/rsa-securid.htm



We consider a network-based implicit authentication system where user authentication is performed collaboratively by the device (the MID) and the carrier (the network service provider), and will be used by application servers to authenticate users. The implicit authentication protocol generates a score for each feature representing the confidence level of the authentication based on that individual feature. Individual scores are subsequently combined based on the carrier's authentication policy to accept or reject the user. Individual scores are obtained through a secure two-party computation between the device and the carrier. Secure two-party protocols can be constructed using generic constructions based on secure circuit evaluation, e.g. [60, 26], or fully homomorphic encryption [24]. We however opt to design a special-purpose protocol fit for the type of computations needed in implicit authentication. This allows us to achieve a level of efficiency which is practical and higher than that provided by generic constructions.

1.1 Our Contributions

We propose an implicit authentication system in which user data is encrypted and stored at the carrier, and an interactive protocol between the MID and the carrier is used to compute the authentication score. Data privacy is guaranteed since user data is stored in encrypted form. Because no data is stored on the MID, user data stays protected even if the device is lost or stolen. The main contributions of this paper are proposing a profile matching function that uses the statistics of features to accept or reject a new sample presented by a user, and designing a privacy-preserving protocol for computing a score function for newly presented data.

We assume the user profile is a vector of multiple features (V_1, ..., V_n), each corresponding to a random variable with an associated probability distribution. Samples from the distribution of V_i are stored as the set of values of the variable in the last ℓ_i successful logins. A new login attempt generates a vector of values, one for each feature. The verification function must decide whether this vector has indeed been generated by the claimed user. Our proposed verification algorithm considers each feature separately and computes a score for each feature, indicating the confidence level in the presented value being from the claimed user. The final verdict is reached by combining the scores from all features.

To determine if a new value presented for a feature v_i matches the model (the stored distribution of the feature), we use a statistical decision-making approach based on the Average Absolute Deviation (AAD) of the distribution. We use AAD to define an interval around the reported value v_i, given by [v_i − AAD(V_i), v_i + AAD(V_i)], and then determine a score representing the concentration of past user behaviour observations close to the reported value v_i by counting the number of the stored values in the user profile that fall within the interval: the higher the count, the higher the score for that feature. Eventually the scores from all features are considered and the outcome of the authentication is decided according to a certain policy. AAD and standard deviation are commonly used statistical measures of dispersion, estimating the "spread" of a distribution. Our verification algorithm effectively measures the similarity of the presented value to the "most common" readings of the variable. Using AAD allows more efficient private computation.

Constructing User Profiles: A user profile is a feature vector (V_1, ..., V_n), where feature V_i is modelled by a vector of ℓ_i past samples. The vector can be seen as a sliding window containing the data from the latest ℓ_i successful authentications. Different ℓ_i may be used for different features for better estimation of the feature distributions. Possible features are the frequency of phone calls made or received by the user, the user's typical locations at particular times, commonly used WiFi access points, websites that the user frequently visits, and the like. We survey the literature and find several features that are appropriate for our protocols; these are listed in Section 3.2. Some features might be dependent on others. For example, given that the user is in his office and it is lunch time, there is a higher chance that he receives a call from home. We do not model dependence between features and take special care to select features that appear independent.

Privacy-Preserving Authentication: All user profile data is stored in encrypted form at the carrier, and the decryption keys are only known to the device. To find the authentication score for each feature, the device and the carrier perform a secure two-party computation protocol that outputs the authentication score to the carrier, and nothing to the device. We propose two 3-round protocols between the device and the carrier that allow the carrier to "securely" calculate the score. The two protocols are designed to provide security in two different threat models, where the device is considered either honest-but-curious or malicious. To provide the required efficiency, we have to sacrifice some privacy, in the sense that although the actual data samples are not leaked, the protocols do expose limited structural information related to the relative order of data samples. We give formal definitions of our notions of privacy, which guarantee that no information other than the relative order of samples is revealed by a secure protocol in the two threat models. We then prove the security of both protocols in the corresponding adversarial models.



The paper is organised as follows. We discuss related work in the field of behavioural authentication in Section 1.2. Section 2 contains the preliminaries needed for our protocols. The system architecture, adversarial models, and the core implicit authentication protocol without privacy guarantees are presented in Section 3. We give details of our proposed protocols for semi-honest and malicious devices and analyse their computation and communication complexity in Section 4. Formal security definitions and proofs are provided in the appendices.

1.2 Related Work

The privacy problem in implicit authentication was noted in [30]. The three approaches proposed for enhancing privacy are: (i) removing unique identifier information; (ii) using pseudonyms; and (iii) using aggregate data instead of fine-grained data. All these methods however have limited effectiveness in protecting users' privacy while maintaining the usefulness of the system. It is well known that user data with identification information removed can be combined with other public data to re-identify individuals [55], and fixed pseudonyms do not prevent linkability of records [35]. Finally, coarse aggregates result in inaccurate authentication decisions.

User authentication is a widely studied problem with a wide range of mechanisms, ranging from cryptographic protocols to biometrics and systems that use special tokens [43, 8, 17, 2]. In a user study of authentication on mobile devices [22], Furnell et al. showed that users prefer systems that authenticate the user periodically on a continuous basis throughout the day in order to maintain confidence in the identity of the user.

Implicit authentication systems that authenticate users continuously without disturbing the user, like the system considered in this paper, are the best fit for this requirement. These schemes have the potential to augment existing cryptographic and password-based authentication systems. Implicit authentication systems have been proposed based on many different features. These include systems that authenticate the user based on biometrics [36, 40, 42, 50], accelerometers [11], gait recognition [33, 23, 20], call and SMS patterns [52], location patterns [10, 53, 9, 12, 54], keystroke dynamics [14, 28, 61, 38, 19, 59], proximity to other devices [32, 48], and touchscreen dynamics [58, 16, 21]. Industry products such as Sentry from AdmitOne Security² and BehavioMobile from BehavioSec³ implement implicit authentication.

² www.admitonesecurity.com

A major weakness of all these systems, however, is that the carrier learns the user's behaviour. Protecting the user's privacy in such a situation is the motivation for this paper.

Protecting user privacy in implicit authentication systems has been the motivating problem for a few works in the literature. The TrustCube framework proposed in [13] supports implicit authentication. The implicit authentication system based on keystroke dynamics in [41] also provides a level of privacy for the user. However, both of these systems require trusted remote platforms to carry out part or all of the computation. A trusted platform is typically implemented as a secure chip able to carry out limited security-sensitive operations. Examples of such platforms include the Trusted Platform Module (TPM), or its proposed mobile device equivalent, the Mobile Trusted Module (MTM)⁴. Such trusted platforms are yet to be deployed extensively and we view this requirement as a limiting factor of these works. Finally, the authors of [45] propose an implicit authentication system with user privacy that requires remote attestation, i.e., the calculations of the device being certified by a trusted third party. The involvement of such a trusted third party, we believe, significantly complicates and limits the usability of such a system. The aim of this paper is to achieve user privacy without requiring a trusted platform or involving a trusted third party.

³ www.behaviosec.com
⁴ Specifications of TPM and MTM are available from the Trusted Computing Group web page: www.trustedcomputinggroup.org

2 Preliminaries

Our constructions use homomorphic encryption and order-preserving (symmetric) encryption. In the following we first give an overview of these primitives.

Homomorphic Encryption (HE): We use an additively homomorphic public-key encryption scheme [44, 15] which supports addition and scalar multiplication in the ciphertext domain. Let E^HE_pk(·) denote such an encryption algorithm. Given encryptions of a and b, an encryption of a + b can be computed as E^HE_pk(a + b) = E^HE_pk(a) ⊞ E^HE_pk(b), where ⊞ represents an efficient operation in the ciphertext space. The existence of the operation ⊞ also makes scalar multiplication possible in the ciphertext domain; that is, given an encryption of a, an encryption of ca can be calculated efficiently for any known c. To simplify the notation, we use + for both the operations + and ⊞. As an instantiation we use the Paillier cryptosystem [44, 15], in which the operation ⊞ is carried out by multiplying ciphertexts, and scalar multiplication by raising a ciphertext to a power. The Paillier cryptosystem is semantically secure under the decisional composite residuosity assumption [44, 15].



Order-Preserving Encryption (OPE): A function f : D → R is order preserving if for all i, j ∈ D: f(i) > f(j) if and only if i > j. An encryption scheme with plaintext and ciphertext spaces D and R, respectively, is order preserving if its encryption algorithm is an order-preserving function from D to R for all keys; i.e., an OPE maps plaintext values to the ciphertext space in such a way that the order of the plaintext values remains intact. Order-preserving (symmetric) encryption was introduced in [5]. The provided construction was proven secure in the POPF-CCA (pseudorandom order-preserving function against chosen-ciphertext attack) model. More details on the security model and encryption system are given in Appendix A.
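The construction of [5] is substantially more involved than what follows; this toy sketch illustrates only the order-preserving property itself, using a random strictly increasing table as the key (our illustration, not the scheme of [5]).

```python
# Toy order-preserving "encryption": sorting distinct random samples yields
# a random strictly increasing map {0..domain_size-1} -> {0..range_size-1}.
import random

def toy_ope_key(seed, domain_size, range_size):
    rng = random.Random(seed)
    return sorted(rng.sample(range(range_size), domain_size))

key = toy_ope_key(seed=2014, domain_size=256, range_size=2**20)
enc = lambda m: key[m]           # E^OPE_k(m)
dec = lambda c: key.index(c)     # D^OPE_k(c)

assert dec(enc(42)) == 42
assert (enc(5) < enc(17)) == (5 < 17)   # ciphertext order mirrors plaintext order
```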

Secure Two-party Computation: In a secure two-party computation, two parties A and B with private inputs x and y, respectively, compute a function f(x, y), ensuring that correctness and privacy are guaranteed. Correctness means that the output is indeed f(x, y) and not something else. Privacy means that neither A nor B learns anything about the other party's input, other than what they would learn from the outputs (if any) that they receive by participating in the protocol. To formalise the security of a two-party protocol, the execution of the protocol is compared to an "ideal execution" in which the parties send their inputs to a trusted third party who computes the function using the inputs that it receives. Informally, a protocol is considered secure if a real adversary in a real execution can learn "the same" amount of information as, or can "change the protocol output" not more than, what an ideal adversary can do in the ideal model.

Security of two-party protocols is considered against different types of adversaries. In the semi-honest model (a.k.a. the honest-but-curious model), the adversary follows the protocol specification but tries to learn extra information from the protocol communications. A malicious (a.k.a. dishonest) adversary however follows an arbitrary strategy (bounded by polynomial-time algorithms) and can deviate from the protocol specification.

There are a number of generic constructions for secure two-party computation, e.g. [60, 26]; however, they have proven to be too inefficient in practice, especially on resource-restricted devices. An alternative approach to realising specific secure two-party protocols is based on homomorphic encryption (HE). In this approach, one party sends its encrypted inputs to the other party, who then computes the specific desired function in the encrypted domain using the homomorphic properties of the encryption system. Paillier's additively homomorphic cryptosystem [44] and Gentry's fully homomorphic scheme [25] are the commonly used tools in this approach.

Average Absolute Deviation: In our protocol we use a model of feature comparison based on the average absolute deviation. The median of a data set is the numeric value separating the higher half of the distribution from the lower half. The average absolute deviation (AAD) of a data set is the average of the absolute deviations from the median and characterises a summary of the statistical dispersion of the data set. For a set X = {x_1, x_2, ..., x_N} with median Med(X), AAD is defined as

AAD(X) = (1/N) Σ_{i=1}^{N} |x_i − Med(X)|.

Let N be an odd number and let T and B denote respectively the sets of top-half and bottom-half indexes, i.e., T = {i | x_i > Med(X)} and B = {i | x_i < Med(X)}. Note that T and B are of the same size. The above equation for calculating AAD can then be simplified for an odd N as follows:

AAD(X) = (1/N) ( Σ_{i∈T} x_i − Σ_{i∈B} x_i ).    (1)
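As a quick sanity check, the following snippet (with made-up, distinct sample values) verifies that the simplified form (1) agrees with the direct definition for an odd N:

```python
# Check that Equation (1) matches the direct AAD definition (odd N,
# distinct sample values so the top and bottom halves are well defined).
def med(xs):
    return sorted(xs)[len(xs) // 2]      # middle element for odd N

def aad_direct(xs):
    m = med(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def aad_split(xs):                       # Equation (1)
    m = med(xs)
    return (sum(x for x in xs if x > m) - sum(x for x in xs if x < m)) / len(xs)

samples = [12, 7, 9, 15, 11]             # N = 5
assert abs(aad_direct(samples) - aad_split(samples)) < 1e-12   # both 2.2
```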

Notation: Throughout the paper we use E^HE_pk and D^HE_sk to denote the encryption and decryption algorithms of a homomorphic encryption scheme, such as the Paillier cryptosystem, with public and secret key pair (pk, sk). For the OPE algorithm we use E^OPE_k and D^OPE_k to refer to encryption and decryption with key k. Key generation algorithms are denoted by KeyGen^HE and KeyGen^OPE, respectively, for the HE and OPE schemes.

3 System Model

We consider a system including three players: a device, a carrier, and an application server. A user who has the device wishes to obtain some service from the application server. The application server wishes to ensure that a legitimate user is in possession of the device, but at the same time does not want to require the user's frequent active involvement to authenticate herself. Since the carrier is continuously providing service to the device, it has sufficient behavioural information on the user to be able to decide if the user in possession of the device is the legitimate user or not. Hence a natural course of action for the application server is to consult the carrier on the legitimacy of the user.

A typical protocol for the above scenario consists of the following sequence of messages. First the device requests the service from the application server, which subsequently sends a request to the carrier to pass its judgement on whether the user in possession of the device is the legitimate user or not. The carrier, which has been continuously authenticating the user, is then able to respond to the application server's request, and the application server either provides or refuses service to the device accordingly. Figure 1 shows this scenario. Our focus in this paper is the continuous implicit authentication protocol between the carrier and the device. Note that this protocol runs continuously and is transparent to the user.

Throughout the paper we use "device" to refer to both the user and the device, since the device carries out the computations involved in the protocol. Note however that the aim of implicit authentication is to authenticate the user in possession of the device.


[Figure 1: The system model and scenario. (1) The device requests service; (2) the application server requests authentication; (3) the carrier sends its authentication decision; (4) the application server provides service. We propose a privacy-preserving implicit authentication protocol (denoted by Imp. Auth.), which is continuously carried out by the device and the carrier to determine whether the user in possession of the device is legitimate.]


Trust Assumptions and the Adversarial Model: We assume the communication channels in the protocol are secure and the information is communicated safely across these channels. User data is stored in encrypted form at the carrier. The device records user data, encrypts it, and sends it to the carrier. No data used to develop the user profile in implicit authentication is stored on the device. This ensures that if the device is compromised, the adversary cannot learn the user profile and simulate her behaviour.

We aim to protect the data collected by the device, and thus in our protocol we only consider such device data. The information collected by the carrier is known to the carrier and is not included. Selection of an appropriate set of features that allow sufficient distinguishability of users is outside the scope of this paper; the goal here is to provide privacy for the user features that are used as part of the user profile. Nevertheless, we give concrete examples of such distinguishing features from the literature in Section 3.2.

We assume that the carrier correctly follows the protocol but tries to learn user data through the information it receives by participating in the implicit authentication protocol. This, we believe, is a reasonable assumption given the stature and reputation of carriers on one hand, and the difficulty of tracing the source of data leakage on the other. We assume the device is used by the legitimate user for a period of time before being compromised; this is the period during which the user profile is constructed.

We consider two types of adversaries. First, we consider a less sophisticated adversary that tries to use a stolen device without tampering with the hardware or the software, so the device is assumed to follow the protocol. This also corresponds to the case where the authentication program resides in a tamper-proof [27, 39] part of the device and cannot be modified by the adversary, so a captured device follows the protocol but takes input data from the adversary. We assume the program can be read by the device holder, but cannot be changed. In the second case, the device behaves maliciously and may deviate from the protocol arbitrarily to succeed in the authentication protocol. This corresponds to the case where the device software or hardware is tampered with by the adversary in possession of the device.

In both cases the system must guarantee the privacy of the user: that is, neither the carrier nor the adversary in possession of the compromised device should learn the user profile data. Naturally, a stolen device used by an illegitimate user must also fail authentication.

3.1 Authentication without Privacy

A user profile consists of a record of the distributions of one or more behavioural features of the user. A feature is a random variable that can be sampled by the device and, in combination with other features, provides a reliable means of distinguishing users. We denote feature i by the random variable V_i, which is sampled at each authentication request. If the authentication is successful, the sample is stored by the carrier and used as part of the distribution samples for the evaluation of future authentication requests. The distribution of the i-th feature is approximated as V_i = (v_i(t_1), v_i(t_2), ..., v_i(t_{ℓ_i})). Here, v_i(t_j) is the feature value at time t_j and ℓ_i is the length of the feature vector stored in the profile. This feature vector length is a system parameter, assumed to be an odd integer to simplify the calculation of the median. As discussed before, we only consider independent features.

Let us denote the user profile by U. The profile consists of a tuple of n features; that is, U = (V_1, V_2, ..., V_n).

The implicit authentication protocol is carried out between a carrier and a device, continuously and periodically. We consider one round of authentication in the following. At the beginning of each round, the carrier is in possession of a user profile U that consists of features V_i. The device wishes to authenticate itself to the carrier as a device whose user behaviour matches the recorded user profile U. The protocol works as follows. The device samples the current features {v_i(t)}_{i=1}^{n} and reports them to the carrier. The carrier considers each reported feature sample v_i(t) and, by comparing it to the sample distribution V_i = (v_i(t_1), v_i(t_2), ..., v_i(t_{ℓ_i})) from the user profile, decides how likely it is that the reported sample belongs to this distribution. We call this likelihood the authentication score for feature i, and denote it by s_i.


[Figure 2: The authentication protocol flow in each round. The device's current behaviour is input to the scoring algorithm; if the carrier's policy is satisfied, the output is authentication success and the profile update algorithm is run against the stored user profile; otherwise the output is authentication failure. Data flow is denoted by dashed arrows.]

Having calculated all the individual feature scores {s_i}_{i=1}^{n}, the carrier then decides, based on a policy, whether the authentication succeeds. At the end of the round, if the authentication succeeds, the carrier updates the user profile to include the reported sample in the recorded samples. Figure 2 shows the flow of the authentication protocol in each round.

The authentication policy may vary between carriers, and it is crucial for an implicit authentication protocol to be able to support various carrier authentication policies. An example of a policy is one that requires each score to be above a certain threshold. Another carrier might require that at least a certain number of feature scores are above their corresponding threshold values. A simple and popular authentication policy is to require that a weighted linear combination of the feature scores is above a certain threshold. In this case, the feature scores are combined linearly to calculate a combined score as S = w_1 s_1(t) + ... + w_n s_n(t), where w_i represents the weight assigned to the i-th feature and S is the combined authentication score.
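As an illustration, a weighted-sum policy can be expressed in a few lines; the weights and threshold below are arbitrary placeholders that a carrier would choose itself.

```python
# Toy weighted-sum authentication policy; weights/threshold are illustrative.
def accept(scores, weights, threshold):
    combined = sum(w * s for w, s in zip(weights, scores))   # S = sum of w_i * s_i(t)
    return combined >= threshold

print(accept(scores=[0.8, 0.6, 0.9], weights=[0.5, 0.3, 0.2], threshold=0.7))
# 0.5*0.8 + 0.3*0.6 + 0.2*0.9 = 0.76 >= 0.7 -> True
```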

Each individual feature score is calculated by the carrier as the likelihood that a reported feature value belongs to the corresponding sample distribution recorded as part of the user profile. In this paper, we propose a simple and effective method for calculating these scores, as follows. The carrier first calculates a measure of dispersion for the recorded sample distribution, namely the average absolute deviation (AAD). Then, the carrier considers an interval centred around the reported sample, with a length double the size of the AAD. The carrier counts the number of recorded samples from the user profile that fall within this interval and takes the proportion of recorded samples that fall within the interval to be the score for the feature. Intuitively, the closer the reported sample is to the centre of concentration of the recorded samples, the more recorded samples will fall in the interval, and hence the higher the feature score will be.

More formally, let AAD(V_i) represent the average absolute deviation of the data in the set V_i. Also let the reported value for the i-th feature at time t be denoted by v_i(t). For a feature V_i we define our scoring function at time t as follows:

s_i(t) = Pr[ b_i^l(t) ≤ V_i ≤ b_i^h(t) ],    (2)

where b_i^l(t) = v_i(t) − AAD(V_i) and b_i^h(t) = v_i(t) + AAD(V_i).

The probability Pr[ b_i^l(t) ≤ V_i ≤ b_i^h(t) ] is approximated by counting the number of elements of V_i that fall within the interval [b_i^l(t), b_i^h(t)] and dividing the count by the total number of elements, i.e. ℓ_i, thus calculating the proportion of elements that fall within the interval.
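A plaintext version of this scoring function, with illustrative values, might look as follows (the protocol in Section 4 computes the same quantity over encrypted data):

```python
from statistics import median

def aad(xs):
    m = median(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def score(profile, reported):
    # Equation (2): proportion of stored samples within +/- AAD of the
    # reported value.
    dev = aad(profile)
    return sum(reported - dev <= v <= reported + dev for v in profile) / len(profile)

history = [9, 10, 10, 11, 12, 10, 9]            # illustrative stored samples
print(score(history, 10), score(history, 13))   # approx. 0.43 vs 0.0
```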

The above method can in theory work with any reasonable measure of dispersion instead of AAD. However, as will be shown in Section 4, the choice of AAD(V_i) allows the carrier to perform the required computation on encrypted data.

3.2 Feature Selection

There are many works in the literature on distinguishing users through their different usage patterns. Most of these works use multiple features extracted from different sources of measurement and then export all the extracted features to a server, in which a machine learning algorithm is first trained and then employed to tell users apart. Since such solutions do not address the issue of privacy, they can afford to use sophisticated algorithms based on arbitrarily chosen features which might not each be sufficiently discriminating on their own between different usage patterns. Our authentication protocols, on the other hand, require features that are reasonably discriminating on their own. A more careful look at the literature reveals that many such features are indeed available. In the following we list some candidates to be used as features in our protocols. Note that all the following candidates can be categorised under "device data", as described in Section 1.

Perhaps the most natural choice is the device location as sensed by GPS sensors. Jakobsson et al. analysed the location traces of participants in their study and found that they tend to be concentrated in three clusters corresponding to where the user lives, works, and shops [30]. Furthermore, a user's location is highly correlated with the time of day and day of week. Hence, device latitude and longitude at specific times of day and days of week make good choices as features in our system. Effectively, implicit authentication using device location will then succeed with good probability if the device is being used in a "usual" location, and fail with good probability otherwise.

The study by Kang et al. [34] provides a few other feature candidates. They investigate smartphone usage patterns, and their results show, for example, that although average daily device idle time does not vary much amongst different users, power consumption while idle as a percentage of total power consumption by the device varies significantly between different users, and hence may be considered a distinguishing factor. They also find that WiFi session durations for different users are concentrated around considerably different average values, which span about an order of magnitude of varying lengths. Perhaps more interestingly, their results also demonstrate that users exhibit different and distinct habits in terms of when they start charging their smartphone. The median battery level at the start of charging varies from around 20% for some users to around 80% for others. Users with usually lower battery levels at the start of charging are the ones who are comfortable waiting until their battery is quite drained before worrying about recharging, whereas those with usually higher battery levels at the start of charging actively ensure that they have a good amount of charge in their battery most of the time.



Another interesting study is that of Falaki et al. [18]. They show, among other results, that users spend quite different and distinctive amounts of time interacting with their smartphones during the period roughly corresponding to "normal working hours" of 9-to-5 (i.e., 9:00 to 17:00), whereas the patterns of their interaction times may not be as significantly distinguishable during other hours of the day or at the weekend. This 9-to-5 interaction time is distributed around an average which can vary between around 10 minutes per hour and around 20 minutes per hour.

As a concrete system example, consider one with the following features: device location latitude and longitude, power consumption while idle (henceforth PCI), WiFi session length (henceforth WSL), and battery level at the start of charging (henceforth BLS). Let all the features be reported on an hourly basis, with latitude and longitude being the GPS reading at the time, PCI being the power consumption while idle in the past hour as a percentage of the total power consumption in the past hour, WSL being the total WiFi session length in minutes in the past hour, and BLS being reported as a percentage and only present if charging has started in the past hour. A possible implicit authentication policy may be as follows: the scores from latitude and longitude are considered first; if they are both above certain thresholds, then at least one of the other scores, i.e., the scores from PCI, WSL, and possibly BLS, needs to be above a certain threshold for implicit authentication to succeed; otherwise, all of the other scores need to be above certain thresholds for implicit authentication to succeed. Effectively, in such a system, if the device is located where it is usually used, implicit authentication succeeds if the usage pattern (expressed as PCI, WSL, and BLS) loosely follows the previous usage pattern of the device; if the device is located somewhere else, the usage pattern must strictly conform to the previous usage pattern for implicit authentication to succeed.

For all of the features mentioned above, the reported usage pattern seems to be highly dependent on the time of day and day of week. Hence, it would make sense to compare a feature reported by the device only to those entries in the recorded usage history profile that belong to the same time of day and day of week. That is, a usage pattern reported on a Wednesday at 17:00 would only be compared to the usage pattern history of previous Wednesdays at around the same time.

Note that although we have focussed on measurements of continuous variables (such as latitude and longitude, amount of power, and duration of time) in our examples above, we expect that our proposed protocols would work just as well for any discrete variable, or in fact for any ordered nominal data type. As an example of a discrete variable, the 9-to-5 interaction time discussed above may be replaced by the number of sessions of interaction during 9-to-5, which, as Falaki et al. show, is also distinct between different users [18]. As an example of an ordered nominal variable, the activity classes that the Jigsaw sensing engine [37] provides may be considered as a feature in our system. Using the accelerometer sensor, Jigsaw is able to robustly classify the user's activity as stationary, walking, running, cycling, or in a moving vehicle. Assuming users have certain distinctive routines, for example cycling to work around a certain hour of the day, the activity output by Jigsaw in different hours of the day can potentially constitute features that can reasonably distinguish between different user behaviours. In some cases, other scoring functions such as those suggested by Jakobsson et al. [30], e.g. estimating Pr[V_i = v_i] or Pr[V_i ≥ v_i] instead of what we propose (see Equation 2), would be more appropriate. Our protocols are generic in the sense that they can be easily modified to support such variants.

4 Privacy-Preserving Authentication

At the heart of the authentication protocol proposed in the previous section is the score computing algorithm. It takes two inputs, the stored distribution and the fresh device sample, and produces a feature score. All the computation takes place at the carrier side, given the two inputs above, where the former is stored by the carrier and the latter is provided by the device; both inputs are in plaintext. In this section, we focus on this algorithm and provide a two-party score computing protocol that is able to calculate the feature score from encrypted profiles stored at the carrier and encrypted fresh samples provided by the device, where the decryption keys are only known to the device.

We chose to provide private protocols for score computation at the feature score level, as opposed to the combined score level, for two reasons: first, different carriers might have different authentication policies, and our formulation leaves the choice of a specific authentication policy open for the carrier; second, we consider it an overkill to require that the carrier only finds out a potential combined score and nothing about the individual scores, and indeed solutions satisfying such a requirement are likely to be inefficient in practice.



In the following we propose a protocol between a device and a carrier that enables the carrier to calculate a feature score for the device, while provably guaranteeing that no information about the stored profile at the carrier is revealed to the device other than the AAD of the stored feature values, and no information about the fresh feature value provided by the device is revealed to the carrier other than how it is ordered with respect to the stored profile feature values.

4.1 A Protocol Secure against Semi-Honest Adversaries

Let HE = (KeyGen^HE, E^HE, D^HE) be a homomorphic encryption scheme, such as the Paillier cryptosystem, and OPE = (KeyGen^OPE, E^OPE, D^OPE) be an order-preserving encryption scheme. The protocol Π we propose consists of four phases: system setup, (user) profile initialisation, authentication, and profile update. System setup and profile initialisation are carried out once per device; afterwards, the authentication and profile update phases are carried out once per authentication round. Authentication rounds are carried out periodically and continuously. The protocol works as follows:

Phase 1. System Setup: Performed once for each device: KeyGen^HE and KeyGen^OPE are run by the device to generate the HE key pair (pk, sk) and the OPE key k_2. The public parameters of the two encryption systems HE and OPE, including pk, are communicated to the carrier.

Phase 2. Profile Initialisation: This phase is performed only once for each device, to record the initial ℓ_i feature readings and compute an initial AAD for each feature. During this phase the device is assumed to be honest. Recall that implicit authentication requires a period of honest device usage to set up a usage profile, based on which it is subsequently able to authenticate the user. During this phase, the device periodically sends HE- and OPE-encrypted feature readings e_i(t) = E^HE_pk(v_i(t)) and e'_i(t) = E^OPE_{k_2}(v_i(t)) to the carrier. The communications end after ℓ_i feature readings. At the end of this phase, the carrier has 2ℓ_i ciphertexts for the i-th feature: { e_i(t_j), e'_i(t_j) }_{j=1}^{ℓ_i}. Since the OPE ciphertexts e'_i(t_j) enable the carrier to compare the corresponding plaintexts, the carrier is able to find the HE encryption of the median of the feature readings, E^HE_pk(Med(V_i)), where Med(V_i) denotes the median of { v_i(t_j) }_{j=1}^{ℓ_i}. The carrier finds the indexes of the top and bottom halves of the plaintexts with respect to the median. Let us denote the set of top-half indexes by T_i and the set of bottom-half indexes by B_i. In other words:

T_i = { j | v_i(t_j) > Med(V_i) },   B_i = { j | v_i(t_j) < Med(V_i) }.

Now the carrier uses the homomorphic property of HE to compute the encryption of the AAD based on Equation (1) as follows:

E^HE_pk(AAD(V_i)) = ℓ_i^{-1} · ( Σ_{j∈T_i} e_i(t_j) − Σ_{j∈B_i} e_i(t_j) ).

The setup and profile initialisation phases are now complete, and from now on the system enters the mode in which the device is not trusted any more. In this mode, a continuous and periodic succession of authentication and profile update phases is carried out.
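To make the carrier-side computation concrete, the following is a minimal Python sketch of Phase 2 under stated assumptions: the phe library stands in for the HE scheme, and a plain identity tag stands in for the OPE ciphertexts (it offers no privacy; the carrier only ever uses the order of these tags).

```python
# Sketch of carrier-side profile initialisation (Phase 2). The phe library
# provides the HE scheme; ope_tag is a toy stand-in for E^OPE_k2.
from phe import paillier

pk, sk = paillier.generate_paillier_keypair()    # generated on the device
ope_tag = lambda v: v                             # illustration only, not private

readings = [9, 12, 10, 15, 11]                    # l_i = 5 honest-phase readings
pairs = [(pk.encrypt(v), ope_tag(v)) for v in readings]   # sent to the carrier

# Carrier: rank the HE ciphertexts by their OPE counterparts to find the
# top and bottom halves (the index sets T_i and B_i) around the median.
ranked = sorted(pairs, key=lambda p: p[1])
mid = len(ranked) // 2
top = [ct for ct, _ in ranked[mid + 1:]]
bottom = [ct for ct, _ in ranked[:mid]]

# Encryption of AAD(V_i) via Equation (1), computed homomorphically.
enc_aad = (sum(top[1:], top[0]) - sum(bottom[1:], bottom[0])) * (1 / len(ranked))
assert abs(sk.decrypt(enc_aad) - 1.6) < 1e-9      # AAD of the readings above
```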

Phase 3. Authentication: The device and the carrier enter the authentication phase with the carrier holding a profile of the device user, including the ℓ_i HE ciphertexts for the i-th feature, { e_i(t_j) = E^HE_pk(v_i(t_j)) }_{j=1}^{ℓ_i}, the ℓ_i corresponding OPE ciphertexts, { e'_i(t_j) = E^OPE_{k_2}(v_i(t_j)) }_{j=1}^{ℓ_i}, and the HE encryption of the AAD of the feature, E^HE_pk(AAD(V_i)). The device reports to the carrier the encryptions of a new reading as follows:

e_i(t) = E^HE_pk(v_i(t))   and   e'_i(t) = E^OPE_{k_2}(v_i(t)).

The HE ciphertext allows the carrier to perform the necessary computations, namely addition and scalar multiplication, in the encrypted domain, while the OPE ciphertext helps the carrier find the order information necessary for the computation. The carrier calculates E^HE_pk(b_i^l(t)) and E^HE_pk(b_i^h(t)) as follows:

E^HE_pk(b_i^l(t)) ← E^HE_pk(v_i(t)) − E^HE_pk(AAD(V_i)),
E^HE_pk(b_i^h(t)) ← E^HE_pk(v_i(t)) + E^HE_pk(AAD(V_i)).

The carrier however does not know the order of the newly generated encrypted values with respect to the stored ciphertexts in the user profile. To find the order, the carrier interacts with the device as follows. The carrier first sends E^HE_pk(b_i^l(t)) and E^HE_pk(b_i^h(t)) back to the device for all features. The device decrypts the ciphertexts using the decryption function D^HE_sk, obtains b_i^l(t) and b_i^h(t), and then encrypts them to compute the following OPE ciphertexts:

c_i^l(t) = E^OPE_{k_2}(b_i^l(t))   and   c_i^h(t) = E^OPE_{k_2}(b_i^h(t)).

The device sends c_i^l(t) and c_i^h(t) back to the carrier. The carrier computes the individual score s_i(t) as the number of OPE ciphertexts e'_i(t_j) in the profile that satisfy c_i^l(t) ≤ e'_i(t_j) ≤ c_i^h(t). Note that this condition is equivalent to b_i^l(t) ≤ v_i(t_j) ≤ b_i^h(t). Note also that the scores for all features are calculated in parallel, and in only three rounds of interaction. The final authentication decision is then made by the carrier based on its authentication policy, e.g. the weighted sum method described earlier in Section 3.1. If implicit authentication is not successful, the device is challenged on an explicit authentication method; e.g., the user is logged out of a service and prompted to log in anew by providing a password. If either implicit or explicit authentication is successful, the carrier enters the profile update phase. Figure 3 shows the interaction diagram of the authentication phase of the protocol.


[Figure 3: The authentication phase of our protocol Π. Message 1 (device → carrier): {e_i(t), e'_i(t)}_{i=1}^{n}. The carrier computes E^HE_pk(b_i^l(t)) and E^HE_pk(b_i^h(t)) for all i. Message 2 (carrier → device): {E^HE_pk(b_i^l(t)), E^HE_pk(b_i^h(t))}_{i=1}^{n}. The device computes c_i^l(t) = E^OPE_{k_2}(b_i^l(t)) and c_i^h(t) = E^OPE_{k_2}(b_i^h(t)). Message 3 (device → carrier): {c_i^l(t), c_i^h(t)}_{i=1}^{n}. The carrier then calculates each s_i(t) and the final authentication score.]

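The sketch below walks through one authentication round (Phase 3) for a single feature, under the same assumptions as the Phase 2 sketch (phe for HE, a toy order-preserving tag in place of OPE); all values are illustrative.

```python
from phe import paillier
from statistics import median

pk, sk = paillier.generate_paillier_keypair()    # device's HE keypair
ope = lambda v: round(v * 1000)                   # toy stand-in for E^OPE_k2

# Carrier-side state from previous rounds. The AAD is encrypted directly
# here for brevity; in the protocol the carrier obtains it homomorphically,
# as in the Phase 2 sketch.
profile = [9, 10, 10, 11, 12]
ope_ct = [ope(v) for v in profile]
m = median(profile)
enc_aad = pk.encrypt(sum(abs(v - m) for v in profile) / len(profile))

# Message 1 (device -> carrier): encryptions of the fresh reading v_i(t).
v_t = 10
he_new, ope_new = pk.encrypt(v_t), ope(v_t)

# Carrier: interval endpoints b^l, b^h in the encrypted domain.
enc_lo, enc_hi = he_new - enc_aad, he_new + enc_aad

# Message 2 (carrier -> device), then Message 3 (device -> carrier):
# the device decrypts the endpoints and returns their OPE encryptions.
c_lo, c_hi = ope(sk.decrypt(enc_lo)), ope(sk.decrypt(enc_hi))

# Carrier: the score counts stored OPE ciphertexts inside [c_lo, c_hi],
# which is equivalent to counting v_i(t_j) inside [b^l, b^h].
s = sum(c_lo <= c <= c_hi for c in ope_ct) / len(ope_ct)
print(s)    # 0.4 for these values (AAD = 0.8, interval [9.2, 10.8])
```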

Phase 4. Profile Update: The carrier enters this phase after a successful implicit or explicit authentication. The carrier has the ciphertext of a new feature value, e_i(t) = E^HE_pk(v_i(t)), and from the authentication phase it knows how v_i(t) compares with the previously recorded features {v_i(t_j)}_{j=1}^{ℓ_i}. The carrier updates the recorded features and the AAD as follows. Assume e_i(t_1) is the ciphertext corresponding to the oldest feature value and is to be omitted from the feature list, with the new feature ciphertext e_i(t_{ℓ_i+1}) = e_i(t) added instead. Let T_i^old and B_i^old respectively denote the sets of top-half and bottom-half indexes for the old features {v_i(t_j)}_{j=1}^{ℓ_i}, and let T_i^new and B_i^new denote the sets defined similarly for the updated features {v_i(t_j)}_{j=2}^{ℓ_i+1}. Also let AAD_old(V_i) and AAD_new(V_i) denote the old and updated AADs. We have

E^HE_pk(AAD_old(V_i)) = ℓ_i^{-1} · ( Σ_{j∈T_i^old} e_i(t_j) − Σ_{j∈B_i^old} e_i(t_j) ),

E^HE_pk(AAD_new(V_i)) = ℓ_i^{-1} · ( Σ_{j∈T_i^new} e_i(t_j) − Σ_{j∈B_i^new} e_i(t_j) ).

Let us denote by Δ_i the difference between the two AAD ciphertexts times ℓ_i, i.e.

Δ_i = ℓ_i · ( E^HE_pk(AAD_new(V_i)) − E^HE_pk(AAD_old(V_i)) ).

Once Δ_i is calculated, the updated AAD can be calculated as follows:

E^HE_pk(AAD_new(V_i)) = E^HE_pk(AAD_old(V_i)) + ℓ_i^{-1} · Δ_i.

Let \ denote the set difference operation. To calculate Δ_i given T_i^old, B_i^old, T_i^new, and B_i^new, the carrier computes the following:

Δ_i = Σ_{j ∈ T_i^new \ T_i^old} e_i(t_j) − Σ_{j ∈ T_i^old \ T_i^new} e_i(t_j) − Σ_{j ∈ B_i^new \ B_i^old} e_i(t_j) + Σ_{j ∈ B_i^old \ B_i^new} e_i(t_j).

Note that each of the above four set differences includes at most one element. This means the profile update phase can be carried out very efficiently. At the end of this phase, the carrier holds a set of updated feature ciphertexts and an updated AAD ciphertext. The carrier will then re-enter the authentication phase and wait for a new feature reading to be reported by the device.
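The following plaintext simulation (illustrative values, our own bookkeeping) checks that adding ℓ_i^{-1}·Δ_i to the old AAD reproduces the AAD of the shifted window; in the protocol the same sums are taken over HE ciphertexts.

```python
# Plaintext simulation of the Phase 4 update: slide the window by one
# sample and update the AAD via the four (at most singleton) set
# differences rather than recomputing from scratch.
from statistics import median

old = {1: 9, 2: 10, 3: 11, 4: 12, 5: 15}        # indexes t_1..t_5
new = {j: v for j, v in old.items() if j != 1}   # drop oldest sample t_1
new[6] = 13                                      # add new reading t_6

def top_bottom(win):
    m = median(win.values())
    return ({j for j, v in win.items() if v > m},
            {j for j, v in win.items() if v < m})

T_old, B_old = top_bottom(old)
T_new, B_new = top_bottom(new)
values = {**old, **new}                          # index -> sample value

s = lambda idx: sum(values[j] for j in idx)      # sums over index sets
delta = s(T_new - T_old) - s(T_old - T_new) - s(B_new - B_old) + s(B_old - B_new)

l = len(old)
aad = lambda win: sum(abs(v - median(win.values())) for v in win.values()) / l
assert abs(aad(old) + delta / l - aad(new)) < 1e-12   # 1.6 - 0.2 = 1.4
```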

Complexity: We discuss the computation complexity of the profile initialisation, authentication, and profile update phases of our protocol Π in the following. We also implemented Paillier and OPE to confirm computation benchmarks in the literature, and calculate concrete running times for our protocol. In the following we analyse the computation complexity of the protocol for one feature. To calculate approximate execution times for multiple features, the figures may be multiplied by the number of features.

In the profile initialisation phase, the device calculates a total of ℓ_i HE encryptions and ℓ_i OPE encryptions, and the carrier calculates ℓ_i ciphertext-space homomorphic additions and 1 ciphertext-space homomorphic scalar multiplication. Recall that this phase is only executed once.

The computation in the authentication phase is dominated by 1 homomorphic encryption, 2 homomorphic decryptions, and 3 order-preserving encryptions on the device side, and 2 ciphertext-space homomorphic additions (implemented in the Paillier scheme by multiplications) on the carrier side, for each feature.

In the profile update phase, the carrier performs 4 ciphertext-space homomorphic additions and 1 ciphertext-space homomorphic scalar multiplication.

For typical parameters and on platforms comparable to today's smartphones, HE encryption, decryption, and OPE encryption and decryption each take at most in the order of a few tens of milliseconds, as reported by previous works on the implementation of Paillier and recent works on the implementation of OPE.


We confirm these benchmarks through implementations of our own. Hence, we can see that the authentication phase for one feature is almost real-time. For multiple features, this phase can take at the longest in the order of a second to complete, which is reasonable given that there is no requirement for implicit authentication to be real-time.

To confirm the efficiency of the Paillier homomorphic encryption and OPE implementations, we have benchmarked both schemes using Java-based implementations on an Intel 2.66 GHz Core 2 Duo processor (comparable to the processors of today's smartphones) while running other processes (including a web browser, word processor, terminal, music player, etc.) in the background. Hyper-threading was not activated and only one core was used by the implementation.

For Paillier with 1024-bit keys (i.e. moduli) we have found that encryption takes 26 ms (milliseconds), decryption takes 35 ms, and both homomorphic addition and homomorphic scalar multiplication take comparatively negligible time (both are more than 500 times faster). We did not apply any optimisation techniques. These benchmarks are comparable to the ones reported in the literature on PCs with comparable specifications. Basu, Kikuchi, and Vaidya report comparable results, namely encryption times of 17 and 125 ms, and decryption times of 17 and 124 ms, with 1024-bit and 2048-bit keys, respectively [3]. Jakobsen, Makkes, and Nielsen report encryption times of 8 and 17 ms for optimised Paillier with 1024-bit and 2048-bit keys, respectively [29]. We use our own 1024-bit Paillier benchmarks, i.e. HE encryption: 26 ms, HE decryption: 35 ms, to demonstrate how efficient a simple implementation of our protocol can be in practice. However, from the above examples it can be seen that optimised versions of the protocol can achieve higher efficiency even when 2048-bit keys are used. To provide some insight into how efficiency can be improved using optimisation techniques, we also give concrete execution times for our protocol using the best benchmarks known to us (as discussed above), i.e. HE encryption: 8 ms, HE decryption: 17 ms.
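Such numbers are easy to reproduce with a micro-benchmark along the following lines (using phe as an example implementation; absolute figures will vary with machine, key size, and library):

```python
# Micro-benchmark sketch for Paillier encryption/decryption (phe library).
import time
from phe import paillier

pk, sk = paillier.generate_paillier_keypair(n_length=1024)

n = 20
t0 = time.perf_counter()
cts = [pk.encrypt(i) for i in range(n)]
t1 = time.perf_counter()
for ct in cts:
    sk.decrypt(ct)
t2 = time.perf_counter()

print(f"encrypt: {(t1 - t0) / n * 1000:.1f} ms/op, "
      f"decrypt: {(t2 - t1) / n * 1000:.1f} ms/op")
```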

We have also implemented the OPE scheme proposed in [5]. Our implementation is independent of the only two other implementations of this scheme in the literature known to us, which are parts of the encrypted database systems CryptDB [46] and Monomi [57]. Using typical features for implicit authentication, a maximum plaintext length of around 100 bits seems to be sufficient for our protocol Π. The execution times required for encryption and decryption in our implementation are at most 56 ms for plaintext lengths between 100 and 1000 bits⁵.

⁵ OPE complexity depends on ciphertext length as well as plaintext length. In our implementations, we have considered combinations of plaintext sizes of 100 and 1000 bits and ciphertext sizes of 10, 100, and 1000 kilobits.

CryptDB's authors report an initial encryption time of 25 ms for 32-bit plaintexts [46], and were able to optimise the encryption to bring the encryption time down to 4 ms [47]. We use the 56 ms benchmark to calculate concrete execution times for a simple implementation of our protocol. However, as the CryptDB implementation demonstrates, execution times can be around 10 times lower; hence we also provide concrete execution times based on the optimised benchmarks above, i.e. OPE encryption/decryption: 4 ms. This optimised benchmark assumes that features may be expressed in 32 bits, which is a reasonable assumption for the majority of features proposed in the literature for implicit authentication. In fact, all the candidate features listed in Section 3.2 can be expressed in 32 bits. GPS coordinates are usually expressed in the NMEA 0183 standard format [1], which for a one-metre precision consists of at most 9 digits: at most 3 for degrees, 2 for minutes, and at most 4 for decimal minutes. Nine digits may be expressed in 30 bits, and an additional bit is needed to indicate either N/S or E/W; hence latitude and longitude can each be expressed in 32 bits. The other candidates discussed, such as power consumption in percentage, WiFi session duration in minutes, battery level in percentage, and interaction time in minutes, can immediately be seen to be expressible in fewer than 32 bits.

Using the above two sets of benchmarks, the simple and the optimised, we can estimate concrete times for our protocol on the device side. Table 1 summarises the computational complexity of the profile initialisation, authentication, and profile update phases of protocol Π for one feature. On the device side, the initialisation phase complexity is reported for each period in which the device reports a pair of ciphertexts, whereas on the carrier side the computation of the initial AAD is assumed to take place at once at the end of the phase, hence the reported complexity is for the whole initialisation phase. On the carrier side, the computations are limited to ciphertext-space homomorphic additions and scalar multiplications, which are in the order of hundreds of times faster than HE and OPE encryption. Besides, the carrier side naturally has much more computational power than the device side. Hence, despite the multiplicative factor ℓ_i (typically in the order of 100) in the complexity of the profile initialisation phase on the carrier side, the concrete execution time for all phases on the carrier side ends up being negligible compared to those on the device side.

Finally, note that the authentication phase for each feature takes less than 300 ms on the device side and negligible time on the carrier side, even with a non-optimised implementation. The concrete system example given in Section 3.2 involves at most five features, and hence the total system authentication time will be at most five times the above figure. Considering that the whole process is executed implicitly as a background process, the overhead of introducing privacy is not significant.

Phase     Complexity                Concrete times (simple / optimised)
----------------------------------------------------------------------
Device-side:
  Init.   t_HE + t_OPE              82 ms / 12 ms
  Auth.   t_HE + 2 t_HD + 3 t_OPE   264 ms / 54 ms
Carrier-side:
  Init.   ℓ_i t_HA + t_HM           negl. / negl.
  Auth.   2 t_HA                    negl. / negl.
  Upd.    4 t_HA + t_HM             negl. / negl.

Table 1: Computation complexity of protocol Π based on one feature, assuming ℓ_i = 100. Legend: Init.: profile initialisation phase; Auth.: authentication phase; Upd.: profile update phase; t_HE, t_HD: HE encryption and decryption times; t_OPE: OPE encryption time; t_HA, t_HM: HE ciphertext-space addition and scalar multiplication times; negl.: negligible.

In terms of communication complexity, for each feature the device needs to send one HE ciphertext and 3 OPE ciphertexts in each round of authentication, and the carrier needs to send 2 HE ciphertexts. Each HE ciphertext is 1 kb (kilobit) and each OPE ciphertext may be implemented as 10 kb for typical plaintext sizes in our scheme. This means that the device needs to send less than 4 kB (kilobytes) and receive around 0.25 kB in each round of communication.
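As a quick sanity check, the per-feature device-side authentication time and the per-round traffic follow directly from the stated benchmarks and ciphertext sizes; a toy calculation:

```python
# Sanity-checking the per-feature figures for protocol Π quoted above.
t_HE, t_HD, t_OPE = 26, 35, 56              # ms, the "simple" benchmarks
print(t_HE + 2 * t_HD + 3 * t_OPE)          # -> 264 ms device-side authentication

HE_CT, OPE_CT = 1_000, 10_000               # ciphertext sizes in bits
print((HE_CT + 3 * OPE_CT) / 8_000)         # -> 3.875 kB sent per round
print(2 * HE_CT / 8_000)                    # -> 0.25 kB received per round
```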

Security: We discuss the security of our protocol considering semi-honest devices and carriers in Appendix B. We provide a formal definition of privacy for our protocol against honest-but-curious devices and carriers. The definition intuitively guarantees that, by participating in the protocol, the device only learns the AAD of the usage data stored at the carrier side, and the carrier learns little beyond the order information of the current sample with respect to the stored data. We argue that the AAD and order information learned during the protocol reveal little about the actual content of the data in question, and hence our definition guarantees a high level of privacy. Eventually, in Appendix B.1, we prove the following theorem guaranteeing the privacy of our protocol:

Theorem 1. Our protocol Π is provably secure against semi-honest devices and semi-honest carriers.

4.2 Securing the Protocol against Malicious Devices

In the above version of the protocol, secure against honest-but-curious adversaries, the carrier interacts with the device in the authentication phase as follows: the carrier sends the homomorphic ciphertexts E^HE_pk(b^l_i(t)) and E^HE_pk(b^h_i(t)) to the device, and the device is expected to reply with order-preserving ciphertexts of the same plaintexts, i.e. E^OPE_k2(b^l_i(t)) and E^OPE_k2(b^h_i(t)). These order-preserving ciphertexts are subsequently used to compare the values of b^l_i(t) and b^h_i(t), in the order-preserving ciphertext space, with the feature values, and to find out how many feature values lie between b^l_i(t) and b^h_i(t). However, a malicious device cannot be trusted to return correctly formatted order-preserving ciphertexts.

First, we note that the device cannot be forced to use an honest feature value v_i(t) to start with. In the absence of trusted hardware such as tamper-proof hardware, the device may enter the interaction with the carrier on any arbitrary input. Even with the recent advances in smartphone technology, e.g. ARM's TrustZone⁶, the device cannot be prevented from changing the sensor readings unless the whole algorithm is run in the so-called Trusted Execution Environment (TEE). However, the device can be required to show that the ciphertext E^HE_pk(v_i(t)) is well-formed. To enforce this requirement, we require that the device send a proof of knowledge of the corresponding plaintext v_i(t) along with the ciphertext E^HE_pk(v_i(t)). Efficient proofs of knowledge of a plaintext exist for most public-key encryption schemes; for Paillier encryption, a concrete and efficient interactive proof protocol can be found in [4]. The protocol can be made non-interactive using the well-known Fiat-Shamir heuristic, by replacing the random challenge generated by the verifier with the hash of the protocol parameters concatenated with the message sent by the prover in the first round. We denote the resulting proof of knowledge of the plaintext v_i(t) by PoK{v_i(t)}.
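To illustrate the transform itself (not the concrete Paillier proof of [4]), a minimal sketch of deriving the challenge non-interactively; the helper name is ours:

```python
# Minimal sketch of the Fiat-Shamir transform: the verifier's random
# challenge is replaced by a hash of the public protocol parameters
# concatenated with the prover's first-round message.
import hashlib

def fiat_shamir_challenge(params: bytes, first_message: bytes,
                          bits: int = 128) -> int:
    digest = hashlib.sha256(params + first_message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)   # truncate to `bits`
```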

Apart from the inclusion of the above proof of knowledge, a further modification is required to make the protocol secure against malicious devices. The main idea is as follows: instead of asking the device for order-preserving ciphertexts, the ability to interact with the device is used to directly compare b^l_i(t) and b^h_i(t) with the feature values, using only the homomorphic ciphertexts. Assume that the carrier wishes to compare b^l_i(t) with v_i(t_j). The carrier has homomorphic encryptions of both, i.e. E^HE_pk(b^l_i(t)) and E^HE_pk(v_i(t_j)), and hence can calculate E^HE_pk(b^l_i(t) − v_i(t_j)). The carrier is therefore interested in knowing whether b^l_i(t) − v_i(t_j) is positive, negative, or zero. In the following, we show how the carrier is able to interact with the device and determine whether this value is positive, negative, or zero, without the device being able to cheat or having a noticeable chance of finding out any information about the value in question.

In the following, we propose a modified version of the protocol that is secure against malicious devices. We call this modified version Π*. Let HE = (KeyGen^HE, E^HE, D^HE) be a homomorphic encryption scheme, such as the Paillier cryptosystem. The protocol Π* consists of four phases: system setup, (user) profile initialisation, authentication, and profile update. The profile initialisation phase is exactly the same as that of the protocol Π described in Section 4.1, and thus is not repeated here. System setup is carried out once for each device, but afterwards the authentication and profile update phases are carried out once per authentication round. Authentication rounds are carried out periodically and continuously. The protocol works as follows:

⁶ www.arm.com/products/processors/technologies/trustzone

Phase 1. System Setup: This phase is performed once for each device. KeyGen^HE is run by the device to generate the HE key pair (pk, sk). The public key pk is communicated to the carrier. The private key sk is kept by the device.

Phase 2. Profile Initialisation: This phase is performed only once for each device, to record the initial ℓ_i feature readings and compute an initial AAD for each feature. During this phase the device is assumed to be honest. This phase is similar to the user initialisation phase in protocol Π, but here there are no OPE ciphertexts involved. During this phase, the device periodically sends HE-encrypted feature readings e_i(t) = E^HE_pk(v_i(t)) to the carrier. The device also keeps a record of the v_i(t) values and, along with each HE ciphertext, sends the carrier the order information of the value with respect to the previous values. The communications end after ℓ_i feature readings. At the end of this phase, the carrier has ℓ_i ciphertexts for the i-th feature, { e_i(t_j) }_{j=1}^{ℓ_i}, and the ordering information about the corresponding plaintexts. Since the carrier knows the ordering of the plaintexts, it is able to find the encryption of the median of the feature readings, E^HE_pk(Med(V_i)), where Med(V_i) denotes the median of { v_i(t_j) }_{j=1}^{ℓ_i}. The carrier finds the indexes of the top and bottom halves of the plaintexts with respect to the median. Let us denote the set of top-half indexes by T_i and the set of bottom-half indexes by B_i. The carrier uses the homomorphic property of HE to compute the encryption of the AAD based on Equation 1 as follows:

E^HE_pk(AAD(V_i)) = ℓ_i^{-1} · ( Σ_{j∈T_i} e_i(t_j) − Σ_{j∈B_i} e_i(t_j) ).

The device deletes the record of v_i(t) values it has been keeping at the end of this phase. The setup and initialisation of the system are complete, and from now on the system enters the mode in which the device is no longer trusted. In this mode, a continuous and periodic succession of authentication and profile update phases is carried out.
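To make this carrier-side computation concrete, a minimal sketch using the third-party python-paillier package (`phe`); the toy plaintext readings stand in for values the carrier never sees, and the ordering is assumed to have been reported by the (honest, at this phase) device:

```python
# Sketch of the carrier-side encrypted AAD computation, assuming the
# python-paillier package. The carrier holds only ciphertexts plus ordering.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)
readings = [3, 9, 5, 7]                               # device-side toy data
cts = [pub.encrypt(v) for v in readings]              # what the carrier stores

order = sorted(range(len(readings)), key=lambda j: readings[j])  # reported order
bottom, top = order[:2], order[2:]                    # index sets B_i and T_i

enc_sum = sum(cts[j] for j in top) - sum(cts[j] for j in bottom)
enc_aad = enc_sum * (1 / len(readings))               # homomorphic scalar mult.

print(priv.decrypt(enc_aad))                          # -> 2.0 for this toy profile
```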

Phase 3. Authentication: The device and the carrier enter the authentication phase with the carrier holding a profile of the device user, including ℓ_i HE ciphertexts for the i-th feature, { e_i(t_j) = E^HE_pk(v_i(t_j)) }_{j=1}^{ℓ_i}, and the HE encryption of the AAD of the features, E^HE_pk(AAD(V_i)). The device reports to the carrier the HE encryption of a new reading, e_i(t) = E^HE_pk(v_i(t)). The device also sends a proof of knowledge of the plaintext, PoK{v_i(t)}, to show that the ciphertext is well-formed. The carrier verifies the proof of knowledge and, if the verification fails, deems authentication failed. Otherwise, the carrier calculates the following using the homomorphic property:

E^HE_pk(b^l_i(t)) ← E^HE_pk(v_i(t)) − E^HE_pk(AAD(V_i))
E^HE_pk(b^h_i(t)) ← E^HE_pk(v_i(t)) + E^HE_pk(AAD(V_i))

Now the carrier needs to find out how b^l_i(t) and b^h_i(t) compare to { v_i(t_j) }_{j=1}^{ℓ_i}, to be able to count the number of e_i(t_j) values that fall between b^l_i(t) and b^h_i(t) for the purpose of authentication. The carrier also needs to find out how the newly reported reading v_i(t) compares to the previous ones { v_i(t_j) }_{j=1}^{ℓ_i}, so that if the authentication succeeds it has the ordering information necessary to update the profile accordingly in the profile update phase. Let us define, for all i and j:

δ^l_ij = b^l_i(t) − v_i(t_j),
δ_ij = v_i(t) − v_i(t_j), and
δ^h_ij = b^h_i(t) − v_i(t_j).

To compare any of the above values, i.e. b^l_i(t), v_i(t), and b^h_i(t), with v_i(t_j), the carrier needs to find out whether the corresponding differences δ^l_ij, δ_ij, and δ^h_ij, as defined above, are each negative, zero, or positive. To achieve this, the carrier first calculates, for all j ∈ [1, ℓ_i], the ciphertexts E^HE_pk(δ^l_ij), E^HE_pk(δ_ij), and E^HE_pk(δ^h_ij), using the homomorphic property of the encryption scheme, from E^HE_pk(v_i(t_j)), E^HE_pk(v_i(t)), E^HE_pk(b^l_i(t)), and E^HE_pk(b^h_i(t)). Then the carrier chooses ℓ_i random bits and, for each j ∈ [1, ℓ_i], based on the j-th bit, either leaves the calculated HE ciphertext triplet as is, or calculates the ciphertext triplet for −δ^l_ij, −δ_ij, and −δ^h_ij through ciphertext-space homomorphic scalar multiplication by −1. Let us denote these ciphertexts by E^HE_pk(±δ^l_ij), E^HE_pk(±δ_ij), and E^HE_pk(±δ^h_ij). This makes sure that these differences are distributed independently of the value of the current reading in terms of being positive or negative; that is, on any v_i(t), a similar number of the differences will be positive or negative.

Assume the i-th feature values belong to the interval [min_i, max_i] with a range d_i = max_i − min_i. This means δ_ij ∈ [−d_i, d_i]. The carrier chooses σℓ_i random values {{δ′_ijk}_{j=1}^{ℓ_i}}_{k=1}^{σ} from the interval [−d_i, d_i], where σ is a security parameter. The values {δ′_ijk}_{k=1}^{σ} serve as values among which δ_ij will be hidden. Also note that δ^l_ij = δ_ij − AAD(V_i) and δ^h_ij = δ_ij + AAD(V_i). Let us define, analogously, δ′^l_ijk = δ′_ijk − AAD(V_i) and δ′^h_ijk = δ′_ijk + AAD(V_i). The carrier now calculates the corresponding ciphertexts for the "fake" difference values as follows: for all j and k it calculates E^HE_pk(δ′_ijk), and then E^HE_pk(δ′^l_ijk) and E^HE_pk(δ′^h_ijk).

The carrier then puts together the following set of values for all j ∈ [1, ℓ_i] and all k ∈ [1, σ]: E^HE_pk(δ^l_ij), E^HE_pk(δ_ij), E^HE_pk(δ^h_ij), E^HE_pk(δ′_ijk), E^HE_pk(δ′^l_ijk), and E^HE_pk(δ′^h_ijk). The carrier shuffles these values and sends them to the device. The device decrypts the ciphertexts and replies to the carrier, indicating whether each ciphertext corresponds to a positive, zero, or negative plaintext. The device is able to compute AAD(V_i) and also to distinguish the three sets of values: the differences δ_ij and δ′_ijk, the differences minus the AAD, and the differences plus the AAD. However, among the differences δ_ij and δ′_ijk, the device cannot distinguish between "real" and "fake" values. The carrier, on the other hand, knows what the response should be for all fake differences δ′_ijk. Also, if δ_ij is positive then the carrier knows that δ^h_ij should also be positive, and if δ_ij is negative then the carrier knows that δ^l_ij should also be negative. Hence, upon receiving the responses, the carrier checks whether these responses are correct, and if not, the authentication is deemed failed. The idea here is that since all the σ + 1 differences (real and fake altogether) look indistinguishable to the device, a malicious device has at most a 1/(σ + 1) chance of cheating without getting caught. σ is a security parameter of the protocol and controls a trade-off between complexity and security: the larger σ is, the less chance there is for a malicious device to cheat, but at the same time the higher the complexity of the protocol.

If the responses pass all the checks, then from the responses for the real differences the carrier is able to find out how each of b^l_i(t), v_i(t), and b^h_i(t) compares to { v_i(t_j) }_{j=1}^{ℓ_i}. The carrier computes the individual score s_i(t) as the number of v_i(t_j) that lie between b^l_i(t) and b^h_i(t). The final authentication decision is then made by the carrier based on its authentication policy, e.g. the weighted sum method described earlier in Section 3.1. If implicit authentication is not successful, the device is challenged on an explicit authentication method; e.g., the user is logged out of a service and prompted to log in anew by providing a password. If either implicit or explicit authentication is successful, the carrier enters the profile update phase. Figure 4 shows the interaction diagram of the authentication phase of the protocol.

Phase 4. Profile Update: The carrier enters this phase after a successful implicit or explicit authentication, and updates the recorded features and the AAD in this phase. The calculations in this phase are the same as those of the profile update phase in protocol Π. At the end of this phase, the carrier holds a set of updated feature ciphertexts and an updated AAD ciphertext. The carrier then re-enters the authentication phase and waits for a new feature reading to be reported by the device.

Complexity: We discuss the computation complexity of the profile initialisation, authentication, and profile update phases of our protocol Π* in the following, and calculate concrete running times for the protocol. As before, we analyse the computation complexity of the protocol for one feature; to calculate approximate execution times for multiple features, the figures may be multiplied by the number of features.

[Figure 4: The authentication phase of our protocol Π*. The device sends e_i(t) = E^HE_pk(v_i(t)) together with PoK{v_i(t)} for each feature; the carrier computes E^HE_pk(b^l_i(t)) and E^HE_pk(b^h_i(t)), prepares the blinded real difference ciphertexts and the random fake difference ciphertexts, and sends them shuffled to the device; the device replies with a response in {−, 0, +} for each value received; the carrier checks the responses for the known values, computes the individual scores s_i(t), and calculates the final authentication score.]

The profile initialisation and update phases are similar to those of protocol Π, with the exception that OPE ciphertexts are no longer involved.

The authentication phase, on the other hand, differs substantially from that of protocol Π. In the authentication phase, the protocol requires 1 homomorphic encryption, 1 proof-of-knowledge generation, and (σ + 1)ℓ_i homomorphic decryptions on the device side. Given that the proof-of-knowledge generation takes only a couple of multiplications, the computation complexity here is dominated by the (σ + 1)ℓ_i homomorphic decryptions. On the carrier side, the following computations are required: 1 proof-of-knowledge verification (roughly as complex as 1 multiplication); 2 homomorphic ciphertext additions to calculate E^HE_pk(b^l_i(t)) and E^HE_pk(b^h_i(t)) (roughly as expensive as a multiplication each); then 3ℓ_i homomorphic ciphertext additions to calculate E^HE_pk(δ^l_ij), E^HE_pk(δ_ij), and E^HE_pk(δ^h_ij); then an expected ℓ_i/2 homomorphic ciphertext scalar multiplications to calculate E^HE_pk(±δ^l_ij), E^HE_pk(±δ_ij), and E^HE_pk(±δ^h_ij); then σℓ_i homomorphic encryptions to calculate E^HE_pk(δ′_ijk); and finally 2σℓ_i homomorphic ciphertext additions to calculate E^HE_pk(δ′^l_ijk) and E^HE_pk(δ′^h_ijk). This means that on the carrier side the total computation cost is dominated by the σℓ_i homomorphic encryption operations.

Choosing a small σ means that a malicious device is caught at the time of protocol execution with lower probability; however, the device does not gain any meaningful advantage by cheating and will not have a higher chance of succeeding in authentication. Hence, even a small σ provides a reasonable level of protection against malicious devices. Consequently, we consider σ to be a small multiplicative factor, and we can state that the complexity of the modified protocol is approximately proportional to ℓ_i. In other words, the complexity grows linearly with the size of the user profile.

Note that finding how each of the ciphertexts E^HE_pk(b^l_i(t)), E^HE_pk(v_i(t)), and E^HE_pk(b^h_i(t)) compares with the recorded features can be carried out in log ℓ_i rounds (instead of all at once) through a binary search. That is, since the carrier knows the ordering of the recorded profile features, in each round the carrier can ask the device to help with comparing the above ciphertexts with one recorded feature value and, based on the answer to each round, decide which recorded feature value to use for comparison in the next round. This is a trade-off between the round complexity and the communication complexity. Carrying out the comparison in this way requires log ℓ_i rounds of communication (instead of one), σ log ℓ_i homomorphic encryption operations on the server side, and (σ + 1) log ℓ_i homomorphic decryption operations on the client side. Thus this trade-off brings the communication complexity down to a logarithmic function of the size of the user profile. We consider this a reasonable price to pay for protection against malicious devices.

To give concrete examples, consider σ = 9 (which means a cheating device escapes immediate detection with probability at most 1/10 each time it deviates from the protocol) and a typical profile size of ℓ_i = 100.

Table 2 summarises the computational complexity of the profile initialisation, authentication, and profile update phases of protocol Π* for one feature, using the same benchmarks as in the previous section. As before, on the device side, the initialisation phase complexity is reported for each period, whereas on the carrier side the reported complexity is for the whole initialisation phase. On the carrier side, the computations in the profile initialisation and update phases are limited to ciphertext-space homomorphic additions and scalar multiplications, which end up being negligible compared to the other computation times. The authentication phase, however, requires σ log ℓ_i homomorphic encryptions on the carrier side. To calculate a nominal concrete execution time for the carrier side, we assume that the carrier has 10 times the processing power of the device. This assumption gives the concrete execution times for the authentication phase on the carrier side reported in Table 2. Of course, the concrete figures in this case are to be treated merely as an indication of the efficiency of the protocol.

Phase     Complexity                 Concrete times (simple / optimised)
-----------------------------------------------------------------------
Device-side:
  Init.   t_HE                       26 ms / 8 ms
  Auth.   (σ + 1) log ℓ_i · t_HD     2326 ms / 1130 ms
Carrier-side:
  Init.   ℓ_i t_HA + t_HM            negl. / negl.
  Auth.   σ log ℓ_i · t_HE           156 ms / 48 ms
  Upd.    4 t_HA + t_HM              negl. / negl.

Table 2: Computation complexity of protocol Π* based on one feature, assuming σ = 9, ℓ_i = 100, and that the carrier side has 10 times the computation power of the device side. Legend: Init.: profile initialisation phase; Auth.: authentication phase; Upd.: profile update phase; t_HE, t_HD: HE encryption and decryption times; t_HA, t_HM: HE ciphertext-space addition and scalar multiplication times; negl.: negligible.

Finally, considering the execution times on both sides, note that an authentication failure for one feature is discovered in around 2.5 seconds after the first feature reading is reported by the device, even with a non-optimised implementation. The concrete system example given in Section 3.2 involves at most five features, and hence the total system authentication time will be at most five times the above figure. We stress again that implicit authentication is an ongoing background process and does not need to be real-time.
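As a rough sanity check, the authentication figures in Table 2 follow from the stand-alone benchmarks; a toy calculation under the stated assumptions (σ = 9, ℓ_i = 100, carrier 10 times faster):

```python
# Reproducing the "simple" authentication figures in Table 2 (times in ms).
import math

sigma, l_i = 9, 100
t_HE, t_HD = 26, 35                              # our 1024-bit Paillier benchmarks
device = (sigma + 1) * math.log2(l_i) * t_HD     # (sigma+1) log l_i decryptions
carrier = sigma * math.log2(l_i) * t_HE / 10     # carrier assumed 10x faster
print(math.ceil(device), math.ceil(carrier))     # -> 2326 156
```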

In terms of communication complexity, for each feature the device needs to send one HE ciphertext, one proof of knowledge, and 3(σ + 1) log ℓ_i bits in each round of authentication, and the carrier needs to send 3(σ + 1) log ℓ_i HE ciphertexts. Each HE ciphertext is 1 kb (kilobit) and each proof of knowledge is 2 kb for typical parameters in our scheme. This means that the device needs to send less than 0.5 kB (kilobytes) and receive around 25 kB in each round of communication.
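The stated sizes give these totals directly; a toy calculation using the 1 kb ciphertext and 2 kb proof sizes mentioned above:

```python
# Per-feature, per-round traffic for protocol Π*, from the stated sizes (bits).
import math

sigma, l_i = 9, 100
HE_CT, POK = 1_000, 2_000                        # 1 kb ciphertext, 2 kb proof
sent = HE_CT + POK + 3 * (sigma + 1) * math.log2(l_i)
received = 3 * (sigma + 1) * math.log2(l_i) * HE_CT
print(sent / 8_000, received / 8_000)            # -> ~0.4 kB sent, ~24.9 kB received
```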

Security: We discuss the security of our protocol considering malicious devices in Appendix C. We provide a formal definition of privacy for our protocol against maliciously-controlled devices. The definition intuitively guarantees that, even if the device is maliciously controlled, it will not be able to learn any information beyond what it would learn during an honest execution of the protocol. Eventually, in Appendix C.1, we prove the following theorem guaranteeing the privacy of our protocol:

Theorem 2. Our protocol Π* is provably secure against maliciously-controlled devices (with probability at least σ/(σ + 1)), and is provably secure against honest-but-curious carriers.


Conclusion

In this paper we proposed a privacy-preserving implicit authentication system that can calculate the authentication score using a realistic scoring function. We argued that using user behaviour as an additional factor in authentication has attractive applications. We showed that by relaxing the notion of privacy, one can construct efficient protocols that ensure user privacy and can be used in practice. The low computation and communication complexity of our proposed protocol in the case of a semi-honest adversary makes it executable almost in real-time for the carrier and modern MIDs. We also provided a modification to the basic protocol to ensure security in the case of a malicious device. Our proposed protocol in this case has a complexity that grows logarithmically with the size of the user profile. We argued that this translates into a reasonable time-frame for implicit authentication with protection against malicious devices. Our benchmark implementations and other optimised implementations of the primitives used in our protocols give concrete estimates of execution times for our protocols. We provided such concrete times and argued that our protocols are sufficiently efficient in practice.

Acknowledgements

The authors would like to thank the anonymous reviewers of Elsevier's Computers & Security as well as those of IFIP SEC 2014 for their constructive comments, which improved this article considerably.

References

[1] The NMEA 0183 Standard. The National Marine Electronics Association. http://www.nmea.org.

[2] F. Aloul, S. Zahidi, and W. El-Hajj. Two Factor Authentication Using Mobile Phones. In Computer Systems and Applications (AICCSA 2009), IEEE/ACS Int'l Conf. on, pages 641–644. IEEE, 2009.

[3] A. Basu, H. Kikuchi, and J. Vaidya. Privacy-Preserving Weighted Slope One Predictor for Item-based Collaborative Filtering. In Proceedings of the Int'l Workshop on Trust and Privacy in Distributed Information Sharing (IFIP TP-DIS 2011), 2011.

[4] O. Baudron, P.-A. Fouque, D. Pointcheval, J. Stern, and G. Poupard. Practical Multi-Candidate Election System. In Proc. 20th ACM Symposium on Principles of Distributed Computing, pages 274–283. ACM, 2001.

[5] A. Boldyreva, N. Chenette, Y. Lee, and A. O'Neill. Order-Preserving Symmetric Encryption. In Advances in Cryptology – EUROCRYPT 2009, pages 224–241. Springer, 2009.

[6] A. Boldyreva, N. Chenette, and A. O'Neill. Order-Preserving Encryption Revisited: Improved Security Analysis and Alternative Solutions. In Advances in Cryptology – CRYPTO 2011, pages 578–595. Springer, 2011.

[7] D. Boneh, K. Lewi, M. Raykova, A. Sahai, M. Zhandry, and J. Zimmerman. Semantically Secure Order-Revealing Encryption: Multi-Input Functional Encryption Without Obfuscation. In Proceedings of EUROCRYPT 2015 (to appear), 2015. Preprint available at http://eprint.iacr.org/2014/834.

[8] X. Boyen, Y. Dodis, J. Katz, R. Ostrovsky, and A. Smith. Secure Remote Authentication Using Biometric Data. In Advances in Cryptology – EUROCRYPT 2005, pages 147–163. Springer, 2005.

[9] S. Capkun, M. Cagalj, and M. Srivastava. Secure Localization with Hidden and Mobile Base Stations. In Int'l Conf. on Computer Communication (INFOCOM 2006), 2006.

[10] S. Capkun and J.-P. Hubaux. Secure Positioning of Wireless Devices with Application to Sensor Networks. In INFOCOM 2005: 24th Annual Joint Conf. of the IEEE Computer and Communications Societies, volume 3, pages 1917–1928. IEEE, 2005.

[11] K.-H. Chang, J. Hightower, and B. Kveton. Inferring Identity Using Accelerometers in Television Remote Controls. In Pervasive Computing, pages 151–167. Springer, 2009.

[12] J. T. Chiang, J. J. Haas, and Y.-C. Hu. Secure and Precise Location Verification Using Distance Bounding and Simultaneous Multilateration. In 2nd ACM Conference on Wireless Network Security, pages 181–192. ACM, 2009.

[13] R. Chow, M. Jakobsson, R. Masuoka, J. Molina, Y. Niu, E. Shi, and Z. Song. Authentication in the Clouds: A Framework and Its Application to Mobile Users. In Proceedings of the 2010 ACM Workshop on Cloud Computing Security Workshop, CCSW '10, pages 1–6, New York, NY, USA, 2010. ACM.

[14] N. Clarke and S. Furnell. Authenticating Mobile Phone Users Using Keystroke Analysis. International Journal of Information Security, 6(1):1–14, 2007.

[15] I. Damgard and M. Jurik. A Generalisation, a Simplification and Some Applications of Paillier's Probabilistic Public-Key System. In Public Key Cryptography, pages 119–136. Springer, 2001.

[16] A. De Luca, A. Hang, F. Brudy, C. Lindner, and H. Hussmann. Touch Me Once and I Know It's You! Implicit Authentication Based on Touch Screen Patterns. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, pages 987–996, New York, NY, USA, 2012. ACM.

[17] Y. Dodis, L. Reyzin, and A. Smith. Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data. In Advances in Cryptology – EUROCRYPT 2004, pages 523–540. Springer, 2004.

[18] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R. Govindan, and D. Estrin. Diversity in Smartphone Usage. In Proceedings of the 8th Int'l Conf. on Mobile Systems, Applications, and Services, MobiSys '10, pages 179–194. ACM, 2010.

[19] T. Feng, X. Zhao, B. Carbunar, and W. Shi. Continuous Mobile Authentication Using Virtual Key Typing Biometrics. In Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on, pages 1547–1552, July 2013.

[20] J. Frank, S. Mannor, and D. Precup. Activity and Gait Recognition with Time-Delay Embeddings, 2010.

[21] M. Frank, R. Biedert, E. Ma, I. Martinovic, and D. Song. Touchalytics: On the Applicability of Touchscreen Input as a Behavioral Biometric for Continuous Authentication. Information Forensics and Security, IEEE Transactions on, 8(1):136–148, Jan 2013.

[22] S. Furnell, N. Clarke, and S. Karatzouni. Beyond the PIN: Enhancing User Authentication for Mobile Devices. Computer Fraud & Security, 2008(8):12–17, 2008.

[23] D. Gafurov, K. Helkala, and T. Søndrol. Biometric Gait Authentication Using Accelerometer Sensor. Journal of Computers, 1(7):51–59, 2006.

[24] C. Gentry. A Fully Homomorphic Encryption Scheme. PhD thesis, Stanford University, 2009.

[25] C. Gentry and S. Halevi. Implementing Gentry's Fully-Homomorphic Encryption Scheme. In Advances in Cryptology – EUROCRYPT 2011, pages 129–148. Springer, 2011.


[26] O. Goldreich, S. Micali, and A. Wigderson. How to Play Any Mental Game – A Completeness Theorem for Protocols with Honest Majority. In Proc. 19th ACM Symposium on Theory of Computing, pages 218–229. ACM, 1987.

[27] E. Haubert, J. Tucek, L. Brumbaugh, and W. Yurcik. Tamper-Resistant Storage Techniques for Multimedia Systems. In Electronic Imaging 2005, pages 30–40. International Society for Optics and Photonics, 2005.

[28] S.-s. Hwang, S. Cho, and S. Park. Keystroke Dynamics-based Authentication for Mobile Devices. Computers & Security, 28(1–2):85–93, 2009.

[29] T. Jakobsen, M. Makkes, and J. Nielsen. Efficient Implementation of the Orlandi Protocol. In J. Zhou and M. Yung, editors, Applied Cryptography and Network Security, volume 6123 of Lecture Notes in Computer Science, pages 255–272. Springer Berlin Heidelberg, 2010.

[30] M. Jakobsson, E. Shi, P. Golle, and R. Chow. Implicit Authentication for Mobile Devices. In Proc. of the 4th USENIX Conf. on Hot Topics in Security. USENIX Association, 2009.

[31] V. Kachitvichyanukul and B. Schmeiser. Computer Generation of Hypergeometric Random Variates. Journal of Statistical Computation and Simulation, 22(2):127–145, 1985.

[32] A. Kalamandeen, A. Scannell, E. de Lara, A. Sheth, and A. LaMarca. Ensemble: Cooperative Proximity-based Authentication. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, MobiSys '10, pages 331–344, New York, NY, USA, 2010. ACM.

[33] A. Kale, A. Rajagopalan, N. Cuntoor, and V. Kruger. Gait-Based Recognition of Humans Using Continuous HMMs. In Proc. 5th IEEE Int'l Conf. on Automatic Face & Gesture Recognition, pages 336–341. IEEE, 2002.

[34] J.-M. Kang, S.-S. Seo, and J. W.-K. Hong. Usage Pattern Analysis of Smartphones. In 13th Asia-Pacific Network Operations and Management Symposium (APNOMS '11), pages 1–8. IEEE, 2011.

[35] J. Krumm. Inference Attacks on Location Tracks. In Pervasive Computing, pages 127–143. Springer, 2007.

[36] J. Leggett, G. Williams, M. Usnick, and M. Longnecker. Dynamic Identity Verification via Keystroke Characteristics. International Journal of Man-Machine Studies, 35(6):859–870, 1991.

[37] H. Lu, J. Yang, Z. Liu, N. D. Lane, T. Choudhury, and A. T. Campbell. The Jigsaw Continuous Sensing Engine for Mobile Phone Applications. In Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, SenSys '10, pages 71–84. ACM, 2010.

[38] E. Maiorana, P. Campisi, N. Gonzalez-Carballo, and A. Neri. Keystroke Dynamics Authentication for Mobile Phones. In Proceedings of the 2011 ACM Symposium on Applied Computing, SAC '11, pages 21–26, New York, NY, USA, 2011. ACM.

[39] S. Moller, C. Perlov, W. Jackson, C. Taussig, and S. R. Forrest. A Polymer/Semiconductor Write-Once Read-Many-Times Memory. Nature, 426(6963):166–169, 2003.

[40] F. Monrose and A. Rubin. Authentication via Keystroke Dynamics. In Proceedings of the 4th ACM Conference on Computer and Communications Security, pages 48–56. ACM, 1997.

[41] M. Nauman, T. Ali, and A. Rauf. Using Trusted Computing for Privacy Preserving Keystroke-based Authentication in Smartphones. Telecommunication Systems, 52(4):2149–2161, 2013.

[42] M. Nisenson, I. Yariv, R. El-Yaniv, and R. Meir. Towards Behaviometric Security Systems: Learning to Identify a Typist. In Knowledge Discovery in Databases: PKDD 2003, pages 363–374. Springer, 2003.

[43] L. O'Gorman. Comparing Passwords, Tokens, and Biometrics for User Authentication. Proceedings of the IEEE, 91(12):2021–2040, 2003.

[44] P. Paillier. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes. In Advances in Cryptology – EUROCRYPT '99, pages 223–238. Springer, 1999.

[45] B. Parno, J. McCune, and A. Perrig. Bootstrapping Trust in Commodity Computers. In Security and Privacy (SP), 2010 IEEE Symposium on, pages 414–429, May 2010.

[46] R. A. Popa, C. M. S. Redfield, N. Zeldovich, and H. Balakrishnan. CryptDB: Protecting Confidentiality with Encrypted Query Processing. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP '11, pages 85–100, New York, NY, USA, 2011. ACM.

[47] R. A. Popa, N. Zeldovich, and H. Balakrishnan. CryptDB: A Practical Encrypted Relational DBMS. Technical Report MIT-CSAIL-TR-2011-005, Computer Science and Artificial Intelligence Lab (CSAIL), Massachusetts Institute of Technology, 2011. Available at http://hdl.handle.net/1721.1/60876.

[48] O. Riva, C. Qin, K. Strauss, and D. Lymberopoulos. Progressive Authentication: Deciding When to Authenticate on Mobile Phones. In Presented as part of the 21st USENIX Security Symposium (USENIX Security 12), pages 301–316, Bellevue, WA, 2012. USENIX.

[49] N. A. Safa, R. Safavi-Naini, and S. F. Shahandashti. Privacy-Preserving Implicit Authentication. In N. Cuppens-Boulahia, F. Cuppens, S. Jajodia, A. Abou El Kalam, and T. Sans, editors, ICT Systems Security and Privacy Protection, volume 428 of IFIP Advances in Information and Communication Technology, pages 471–484. Springer Berlin Heidelberg, 2014.

[50] S. F. Shahandashti, R. Safavi-Naini, and P. Ogunbona. Private Fingerprint Matching. In Information Security and Privacy, pages 426–433. Springer, 2012.

[51] S. F. Shahandashti, R. Safavi-Naini, and N. A. Safa. Reconciling User Privacy and Implicit Authentication for Mobile Devices. Computers & Security, 2015. DOI: 10.1016/j.cose.2015.05.009.

[52] E. Shi, Y. Niu, M. Jakobsson, and R. Chow. Implicit Authentication through Learning User Behavior. In M. Burmester, G. Tsudik, S. Magliveras, and I. Ilic, editors, Information Security, volume 6531 of Lecture Notes in Computer Science, pages 99–113. Springer Berlin Heidelberg, 2011.

[53] D. Singelee and B. Preneel. Location Verification Using Secure Distance Bounding Protocols. In Mobile Adhoc and Sensor Systems Conference, 2005. IEEE International Conference on, pages 840–846. IEEE, 2005.

[54] A. Studer and A. Perrig. Mobile User Location-specific Encryption (MULE): Using Your Office As Your Password. In Proceedings of the Third ACM Conference on Wireless Network Security, WiSec '10, pages 151–162, New York, NY, USA, 2010. ACM.

[55] K. Tan, G. Yan, J. Yeo, and D. Kotz. A Correlation Attack Against User Mobility Privacy in a Large-Scale WLAN Network. In Proc. of the 2010 ACM Workshop on Wireless of the Students, by the Students, for the Students, pages 33–36. ACM, 2010.

[56] C.-S. Tsai, C.-C. Lee, and M.-S. Hwang. Password Authentication Schemes: Current Status and Key Issues. IJ Network Security, 3(2):101–115, 2006.

[57] S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing Analytical Queries over Encrypted Data. Proc. VLDB Endow., 6(5):289–300, Mar. 2013.

[58] D.-S. Wang and J.-P. Li. A New Fingerprint-Based Remote User Authentication Scheme Using Mobile Devices. In Int'l Conf. on Apperceiving Computing and Intelligence Analysis (ICACIA 2009), pages 65–68. IEEE, 2009.


[59] H. Xu, Y. Zhou, and M. R. Lyu. Towards Continuous and Passive Authentication via Touch Biometrics: An Experimental Study on Smartphones. In Symposium On Usable Privacy and Security (SOUPS 2014), pages 187–198, Menlo Park, CA, July 2014. USENIX Association.

[60] A. C.-C. Yao. How to Generate and Exchange Secrets. In Foundations of Computer Science, 1986, 27th Annual Symposium on, pages 162–167. IEEE, 1986.

[61] S. Zahid, M. Shahzad, S. A. Khayam, and M. Farooq. Keystroke-Based User Identification on Smart Phones. In E. Kirda, S. Jha, and D. Balzarotti, editors, Recent Advances in Intrusion Detection, volume 5758 of Lecture Notes in Computer Science, pages 224–243. Springer Berlin Heidelberg, 2009.

A Order Preserving Encryption

Consider an order-preserving (symmetric) encryption scheme defined as OPE = (KeyGen^OPE, E^OPE, D^OPE), with key space K, plaintext space D, and ciphertext space R, in which we have |D| ≤ |R|. For an adversary A attacking the scheme, we define its POPF-CCA advantage (pseudorandom order-preserving function advantage under chosen-ciphertext attack) against OPE as the difference between the probability Pr[ k ∈_R K : A^{E^OPE_k(·), D^OPE_k(·)} = 1 ] and the probability Pr[ f ∈_R OPF_{D→R} : A^{f(·), f^{-1}(·)} = 1 ], where OPF_{D→R} represents the set of all order-preserving functions from D to R. We say that OPE is POPF-CCA-secure if no polynomial-time adversary has a non-negligible advantage against it.

Informally, the definition implies that OPE acts indistinguishably from a random order-preserving function, even if the adversary is given free access to encrypt and decrypt arbitrary messages of its choosing. For details of such an encryption scheme, readers are referred to [5]. The OPE scheme makes use of the implementation of the hypergeometric distribution (HyG) given in [31].

B Security of Protocol Π

To formulate a private score computing protocol, we first need to formalise a score computing protocol without privacy. We define such a protocol as follows:

Definition 1. A score computing protocol for feature V_i is a protocol between a device with input Z_i = (v_i(t), t) and a carrier with input Y_i, where t denotes the current time, v_i(t) denotes the current feature sample, and Y_i is a sample distribution of V_i with average absolute deviation AAD(V_i). The two parties also share an input which includes agreed protocol setup parameters. The protocol output for the carrier is a score s_i(t), and null for the device. The score is defined as s_i(t) = Pr[ b^l_i(t) ≤ V_i ≤ b^h_i(t) ], where b^l_i(t) = v_i(t) − AAD(V_i) and b^h_i(t) = v_i(t) + AAD(V_i).
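For concreteness, a plain, non-private reference implementation of this score, estimated over the stored sample (a sketch; the particular median convention below is an illustrative choice):

```python
# Non-private reference for Definition 1's score, estimated from samples.
def aad(samples):
    med = sorted(samples)[len(samples) // 2]          # toy choice: upper median
    return sum(abs(v - med) for v in samples) / len(samples)

def score(current, samples):
    a = aad(samples)
    b_low, b_high = current - a, current + a          # b_l(t) and b_h(t)
    return sum(1 for v in samples if b_low <= v <= b_high) / len(samples)

print(score(6, [3, 5, 7, 9]))   # AAD = 2, so [4, 8] contains 5 and 7 -> 0.5
```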

Let us first consider honest-but-curious (a.k.a. semi-honest) adversaries. An honest-but-curious party follows the protocol, but tries to infer extra information from the protocol execution. To formalise the security of score computing protocols, we use the standard simulation-based approach. The view of a party in a protocol execution is a tuple consisting of the party's input, its random selections, and all the messages it receives during an execution of the protocol. This tuple is a function of the inputs of the parties and their randomness. Let View^Π_D(Z_i, Y_i) (resp. View^Π_S(Z_i, Y_i)) denote the random variable representing the view of the device D (resp. carrier S), with device input Z_i and carrier input Y_i, and let ≡_c denote computational indistinguishability.

Π is said to be a perfectly private score computing protocol if there exists a probabilistic polynomial-time algorithm Sim_D (resp. Sim_S) that can simulate the view of D (resp. S) in Π, given only the device's input Z_i (resp. the carrier's input Y_i and its output s_i); that is, for all Z_i and Y_i:

View^Π_D(Z_i, Y_i) ≡_c Sim_D(Z_i)
( resp. View^Π_S(Z_i, Y_i) ≡_c Sim_S(Y_i, s_i) ).

To achieve the above security level, one can design a protocol using a fully homomorphic encryption system [25], or using a general two-party computation protocol. However, the communication and computation cost of these approaches would be prohibitive. For example, Gentry's fully homomorphic encryption scheme takes 32 seconds on a typical processor to perform a single re-crypt operation when the modulus is 2048 bits [24, 25].

To improve efficiency, we sacrifice perfect privacy of the protocol and allow the device and carrier to learn some aggregate and order information about the profile data, respectively. We argue that although this means some leakage of information, no direct values are revealed, and the leaked information does not affect the privacy of the user data in any significant way. Nor does it increase the adversary's success chance in authentication in any significant way.

We therefore consider the protocol private if the device only learns the average absolute deviation (AAD) of V_i stored in the profile U, and the carrier only learns the information that can be implied from the output of an ideal random order-preserving function f on input Z_i, i.e., only the information that can be implied from f(Z_i). The information implied from such a function is shown to be little other than the order of the device input with respect to the stored data. In fact, Boldyreva et al. have proven that such a function leaks neither the precise value of any input nor the precise distance between any two inputs [6].

Alternatively, one may use an order-revealing encryption (ORE) scheme instead of an OPE. OREs provide similar functionality and may be employed as a building block in our protocols with little change required. Recently, OREs have been shown to leak strictly less information than OPEs: Boneh et al. have shown that their ORE construction, although computationally more expensive, reveals no information other than the order of the plaintexts [7].


We note that, although knowing the AAD or the order of a data set does leak some information, it reveals little about the actual content. For example, the sets {8, 1, 4, 3, 11} and {130, 121, 127, 125, 131} have the same order and the same AAD with completely different elements. Similarly, two sets of GPS coordinates may have the same order and average absolute deviation but be completely different, and in fact belong to completely different places.

To formalise our notion of privacy, let us define the augmented tuple V^+_i that, besides the elements in V_i, includes v_i(t); i.e., for V_i = (v_i(t_1), v_i(t_2), ..., v_i(t_{ℓ_i})) we have V^+_i = (v_i(t_1), v_i(t_2), ..., v_i(t_{ℓ_i}), v_i(t)). Also let f be an ideal random order-preserving function. Let I_f(V^+_i) denote the information about V^+_i that can be implied from f(V^+_i). We emphasise again that it has been proven that I_f(V^+_i) includes little more than the order information of the elements of V^+_i. Hence, practically, one can think of I_f(V^+_i) as the information on how the elements of V^+_i are ordered. We define a private score computing protocol as follows:

Definition 2. Let D and S denote the device and carrier entities in Π, respectively. We say that Π is a private score computing protocol for honest-but-curious devices (resp. carriers) if there exists a probabilistic polynomial-time algorithm Sim_D (resp. Sim_S, for any random order-preserving function f) to simulate the view of D (resp. S) in Π, given the device's input Z_i (resp. the carrier's input Y_i and its output s_i) and the average absolute deviation of V_i in U (resp. I_f(V^+_i)); that is, for all Z_i and Y_i:

View^Π_D(Z_i, Y_i) ≡_c Sim_D(Z_i, AAD(V_i))
( resp. View^Π_S(Z_i, Y_i) ≡_c Sim_S(Y_i, s_i, I_f(V^+_i)) ).

Intuitively, the above definition requires that the information revealed to the parties during the protocol execution is limited to merely the AAD of the stored data, or to little other than the order information of the current sample with respect to the stored data, respectively.

B.1 Proof of Theorem 1

Proof (Outline): In Π, the device has the input Z_i, and receives the values Z_i − AAD(V_i) and Z_i + AAD(V_i) from the carrier during the protocol execution. Therefore,

View^Π_D(Z_i, Y_i) = ( Z_i, Z_i − AAD(V_i), Z_i + AAD(V_i) ).

The device has no output at the end of the protocol. Now, let us define Sim_D such that, for given inputs (Z_i, AAD(V_i)) (according to Definition 2), it outputs (Z_i, Z_i − AAD(V_i), Z_i + AAD(V_i)), where V_i ∈ U. Then, for all Z_i and Y_i, the distributions Sim_D(Z_i, AAD(V_i)) and View^Π_D(Z_i, Y_i) are indistinguishable. Hence the protocol is secure against honest-but-curious devices.

The carrier has the input Y_i and, during the execution of Π, it receives the following values: E^HE_pk(Z_i), E^OPE_k2(Z_i), E^OPE_k2(b^l_i(t)), and E^OPE_k2(b^h_i(t)). Therefore, for its view of the protocol, we have

View^Π_S(Z_i, Y_i) = ( Y_i, E^HE_pk(Z_i), E^OPE_k2(Z_i), E^OPE_k2(b^l_i(t)), E^OPE_k2(b^h_i(t)) ),

where b^l_i(t) = Z_i − AAD(V_i) and b^h_i(t) = Z_i + AAD(V_i). The carrier has the output s_i(t).

Let Sim_S(Y_i, s_i, I_f(V^+_i)) be defined as follows. On inputs Y_i, s_i, and I_f(V^+_i), and for a given random order-preserving function f, it first selects a random Z̄_i such that Z̄_i satisfies the information that I_f(V^+_i) includes about Z_i, and in particular the order relations between Z_i and the elements of V_i. At the same time, we require that Z̄_i is chosen in a way that achieves the score s_i with respect to V_i, i.e., the number of elements in V_i that lie within the distance AAD(V_i) of Z̄_i is s_i. This is possible by shifting Z̄_i. Then Sim_S computes and outputs the following: Y_i, E^HE_pk(Z̄_i), f(Z̄_i), f(Z̄_i − AAD(V_i)), and f(Z̄_i + AAD(V_i)).

We claim that the distribution of this output is indistinguishable from the distribution of View^Π_S(Z_i, Y_i) for all Z_i and Y_i. If not, a standard hybrid argument implies that at least one of the following is true:

(A) there exists an algorithm that distinguishes E^HE_pk(Z_i) and E^HE_pk(Z̄_i); or

(B) there exists an algorithm that distinguishes the tuple

( f(Z̄_i), f(Z̄_i − AAD(V_i)), f(Z̄_i + AAD(V_i)) )

and the tuple

( E^OPE_k2(Z_i), E^OPE_k2(Z_i − AAD(V_i)), E^OPE_k2(Z_i + AAD(V_i)) ).

The former, (A), contradicts the semantic security of the homomorphic encryption scheme HE. We prove in the following that the latter, (B), contradicts the POPF security of the order-preserving encryption OPE.

Assume (B) is true. It follows that there is a distinguisher for at least one of the following pairs: f(Z̄_i) and E^OPE_k2(Z_i); or f(Z̄_i − AAD(V_i)) and E^OPE_k2(Z_i − AAD(V_i)); or f(Z̄_i + AAD(V_i)) and E^OPE_k2(Z_i + AAD(V_i)). We consider these possibilities next.

Assume there is a distinguisher for f(Z̄_i) and E^OPE_k2(Z_i). A hybrid argument implies that there must be a distinguisher for at least one of the following pairs: f(Z̄_i) and f(Z_i), or f(Z_i) and E^OPE_k2(Z_i). A distinguisher for the former pair is impossible because Z̄_i is chosen to conform to I_f(V^+_i), i.e., the information implied from either of f(Z̄_i) or f(Z_i) is the same. A distinguisher for the latter pair, on the other hand, implies that it is possible to distinguish the order-preserving encryption OPE from f, which contradicts the security of the OPE.


Now note that since AAD(V_i) is a constant determined by Y_i, the three distributions Z_i, Z_i − AAD(V_i), and Z_i + AAD(V_i) are merely shifted versions of one another. The same is true for Z̄_i, Z̄_i − AAD(V_i), and Z̄_i + AAD(V_i). Hence, similar arguments can be made to show that a distinguisher for either of the pairs f(Z̄_i − AAD(V_i)) and E^OPE_k2(Z_i − AAD(V_i)), or f(Z̄_i + AAD(V_i)) and E^OPE_k2(Z_i + AAD(V_i)), would also contradict the POPF security of the OPE. Therefore, (B) contradicts the security of OPE.

We have shown that both (A) and (B) would contradict the security of the underlying schemes. Hence, assuming that the underlying schemes are secure, Sim_S is able to produce an output with a distribution indistinguishable from that of View^Π_S(Z_i, Y_i), and therefore the protocol is secure against honest-but-curious carriers. □

C Security of Protocol Π*

In order to formalise security against malicious adversaries, one usually compares a real execution of the protocol with an ideal execution. During the ideal execution, which takes place in an ideal world, both device and carrier submit their inputs to a trusted party TP at the beginning of the protocol. TP computes the outputs of the parties and sends the outputs back to them. For an ideal device ID and an ideal carrier IS, let Ideal^{IΠ*}_{ID,IS}(Z_i, Y_i) denote the joint output of the execution of the ideal protocol IΠ* for computing s_i(t), where Z_i is the input of ID and Y_i is the input of IS in the ideal world. Also let Real^{Π*}_{D,S}(Z_i, Y_i) denote the joint output of the real device D with input Z_i and the real carrier S with input Y_i after a real execution of protocol Π*. We use M as a prefix to denote 'malicious' and similarly H to denote 'honest'. Security of Π* is defined as follows. We say Π* is perfectly secure against malicious devices if for any malicious real-world device algorithm MD there exists an ideal-world algorithm MID such that, for all Z_i and Y_i, the output in the ideal world, Ideal^{IΠ*}_{ID,IS}(Z_i, Y_i), is computationally indistinguishable from the output in the real world, Real^{Π*}_{D,S}(Z_i, Y_i). Perfect security is defined based on a perfect ideal-world protocol in which the trusted party TP is given all the inputs, carries out all the calculations, and outputs the score only to the carrier. This captures the ideal security requirement that the device learns nothing by participating in the protocol and the carrier learns only the feature score.

In the case of score computing protocols, however, in order to achieve higher efficiency we do not aim for perfect security and view some information leakage as acceptable. For each feature, we accept leakage of the AAD of that feature in the stored user profile to the device. We also allow the carrier to learn the order of the encrypted profile values with respect to each other. Hence we relax the above definition as follows. To incorporate the leakage of the AAD on one hand and the ordering information on the other, we model the ideal protocol IΠ* in the following way. After receiving each entity's reported input, i.e., the current feature reading from the device and the stored user profile from the carrier, TP calculates the score along with the AAD of the profile features and the ordering information of the new feature with respect to the profile features. Then TP outputs to the device the AAD of the profile features, and to the carrier the score and the ordering information. Considering this ideal protocol, we define security against malicious devices as follows:

Definition 3. Let (HD, HS) and (HID, HIS) denote the honest device and carrier programs for protocol Π* in the real and ideal worlds, respectively. Let IΠ* be the ideal protocol in which, upon receiving each party's input, the TP outputs the AAD of the carrier input to the device. We say that Π* is a private score computing protocol for malicious devices if for any probabilistic polynomial-time algorithm MD there exists a probabilistic polynomial-time algorithm MID such that, for all Z_i, Y_i:

Ideal^{IΠ*}_{MID,HIS}(Z_i, Y_i) ≡_c Real^{Π*}_{MD,HS}(Z_i, Y_i).

Intuitively, the above definition guarantees that a malicious device following an arbitrary strategy does not find any information other than the AAD of the stored profile features. In the following we prove that our protocol Π* satisfies this definition.

C.1 Proof of Theorem 2

Proof (Outline): We prove the security of Π* in two stages. First, we prove that the protocol is secure against malicious devices, and then we prove that the protocol is secure against honest-but-curious carriers. We provide a sketch of the first stage of the proof in the following. The second stage of the proof is similar to that of Theorem 1, and hence we do not repeat it.

Stage 1. Based on Definition 3, we have to prove that for every probabilistic polynomial-time algorithm MD there exists a probabilistic polynomial-time algorithm MID such that, for all Z_i, Y_i:

Ideal^{IΠ*}_{MID,HIS}(Z_i, Y_i) ≡_c Real^{Π*}_{MD,HS}(Z_i, Y_i),

where Z_i and Y_i are the respective inputs of the device and the carrier. We note that, as the carrier is honest, in the ideal world HIS forwards its input Y_i without any change to TP, and hence Ideal^{IΠ*}_{MID,HIS}(Z_i, Y_i) will be the score produced by TP on receiving the honest input Y_i from HIS and an arbitrary value Z̄_i = MID(Z_i) from MID. In other words, to ensure security against a malicious device, we have to show that for any possible device behaviour in the real world, there is an input that the device provides to the TP in the ideal world such that the score produced in the ideal world is the same as the score produced in the real world.


Given a real-world malicious device MD, the ideal-world device MID is constructed as follows. MID executes MD to obtain the current encrypted feature value E^HE_pk(Z_i) and a proof of knowledge of the plaintext. By rewinding the proof, MID is able to extract Z_i. MID sends Z_i to TP, which replies with the AAD of the HIS input: AAD(V_i). MID then selects ℓ_i arbitrary values to construct a mock user profile such that the AAD of the mock profile features is equal to AAD(V_i). It is then able to calculate all the following values according to the protocol for all j and k: E^HE_pk(δ^l_ij), E^HE_pk(δ_ij), E^HE_pk(δ^h_ij), E^HE_pk(δ′_ijk), E^HE_pk(δ′^l_ijk), and E^HE_pk(δ′^h_ijk). MID shuffles these values and sends them to MD. The latter three sets of values are distributed identically to the protocol. The former three sets of values are based on the mock profile feature values rather than the real ones; however, the malicious device MD is assumed to have no knowledge about the user behaviour, and hence the device is not able to distinguish them hidden within the latter three sets of values. MD then replies, indicating for each one of the received values whether it is positive, negative, or zero. MID checks all the values and makes sure MD does not cheat. MD does not get any output at the end of the simulation.

Of all the values MD receives, 1/(σ + 1) of them deviate from a real protocol execution; however, these values are hidden among the values that are calculated following the protocol. Thus, MD has at most a 1/(σ + 1) chance of being able to output a value which would be distinguishable from a real-world execution of Π*. This means that the protocol is secure against malicious devices with probability at least σ/(σ + 1).

Stage 2. By arguments similar to those presented in the proof of Theorem 1, we can claim that an honest-but-curious carrier only learns the order of the feature data. Therefore, we can similarly show that protocol Π* is secure against an honest-but-curious carrier. □
