

A Framework for Private Location-based Queries using Cryptographic Protocols

Yan Huang, Stephen R. Tate, Roopa Vishwanathan


Abstract: An important privacy issue in Location Based Services (LBS) is to hide a user’s identity and location while still providing quality service. A user’s identity can be easily hidden through anonymous web browsing services; however, a user’s location can reveal a user’s identity. For example, if a user at home asks queries such as “Find the nearest hospital around me” to an LBS server, then based on publicly available demographic data and the location of the user, the server could potentially guess the identity of the querying user. A common way to achieve location privacy is through cloaking, e.g., the user/client sends a cloaked region to the server and filters the results to find the exact answer, but existing solutions such as cloaking using k-anonymity, or using a trusted anonymizer, aren’t completely privacy-preserving and leak some spatial information. In this paper, we propose an efficient two-phase framework for privacy in LBS based on two cryptographic protocols: Private Information Retrieval (PIR) and Oblivious Transfer (OT). Our framework provides privacy for the client and server, does not use a trusted party or anonymizer, and is provably privacy-preserving. When compared to previous approaches, it ensures that the server reveals only as much data as is required by the querying client. In addition to providing a proof of security, we conduct experiments to measure the performance of our proposed framework and find that the costs of our solutions are reasonable and not prohibitively expensive to implement in practice.

1 INTRODUCTION

Location based services are becoming increasingly common with location-enabled client devices like mobile phones or PDAs making queries to location servers. Clients may not want to reveal their own identity and location while querying the server. A client’s identity can be hidden by using a fake one, but their location is needed to answer location-based queries. One way for the client to hide its own location is to query a large region and filter the query results to find its exact location within that region. As a result the server needs to respond to the clients’ queries with more data than what is needed. However, the server may not want to reveal unnecessary information that is not relevant to or close to the client’s precise location. Hence, we need to achieve a trade-off between the privacy of the client and the amount of data supplied by the server. Most solutions proposed so far take into account only the privacy of the client. In addition to this, we also need to ensure that while protecting the privacy and anonymity of the client, we do not inadvertently release more information than what is required about other data and locations stored in the server. Typically a location server might contain thousands of points of interest (POIs) listed in a particular category (e.g., all restaurants in a given area) organized into a table, and if a user queries this table, the server does not want to return every item in its listing. In addition to being unnecessary, this would also enable a user to engage the server’s resources for long periods of time. Also, the location database is a potentially valuable asset for the server and if the server is operating on a commercial basis, it might be charging the client for each query; revealing extra information may not be in the business interests of the server. Hence we need to ensure that the data returned is fine-grained and precise (very close to the client’s location). This would also save the client time, since it does not have to sort through reams of extraneous data. That is, if the server maintains a number of POIs with each type organized into a table, it may allow clients to query part of the table about a limited region around a specific location, but not access all (or large amounts of) the data stored in it. For example, the server may allow queries like: “How many people are within 1 mile of me?” or “Where is the nearest gas station to me?” But the server may not want the client to ask queries along the lines of “Give me a list of all restaurants in the state of Texas”.

One of the common queries in location services is a client query about their nearest neighbour(s). A client might query the location server to find its approximate nearest neighbour, exact nearest neighbour, k nearest neighbours, or reverse nearest neighbour. In this paper, we present a solution for the exact nearest neighbour query. Specifically, if a client/user u has a location l and needs to query an LBS to obtain its nearest neighbour from a set of POIs S where S = {s_1, s_2, ..., s_α}, the goal is to develop an efficient protocol that does not allow the LBS to learn l while reducing the subset of S to be sent to u in order to answer the query.

• Author 1 is with the Dept. of Computer Science and Engineering, University of North Texas, Denton, TX 76203. Email: [email protected]

• Authors 2 and 3 are with the Department of Computer Science, University of North Carolina at Greensboro, Greensboro, NC 27402. Email: [email protected], [email protected]

2 RELATED WORK

One line of previous work in privacy in location services mainly focuses on cloaking techniques using anonymizers [4], [15], [22]. In this approach, service requests are first transmitted to a third-party trusted server called a location anonymizing server (AS). The AS strips off a service requester’s identifying information, such as the user’s network address and name. Various perturbation operations can be performed on the AS to obfuscate the original location information. Location based service (LBS) requests are then issued from the AS to the LBS providers. The AS can also help filter the query results and return the exact answers to the original requesters. One major problem with this approach is that the anonymizer becomes a single point of failure and a performance bottleneck, and the client and server will have to trust the anonymizer.

Another solution is to organize the network as a peer-to-peer network, instead of having a centralized location server [3], [7]. In this approach, a mobile client first finds a peer group that meets the privacy requirement (i.e., finds k−1 neighbours) through peer searching. Then the client calculates a region that includes the k−1 peers (called k-anonymity [10]). It randomly chooses one of its neighbours as the agent to request LBS using the cloaking region. The query results are eventually forwarded to the original client through the agent. Since the disclosed location of a requester is expanded to an area that includes at least k−1 other mobile users, any of the k people within the disclosed area could have been the user. For private locations that are likely to release a requester’s identification, the cloaking area is generally large if we use k-anonymity due to the small number of users in private (e.g., residential) areas. This approach requires the construction of a large peer network of mutually trusting and honest users and cannot guarantee service availability when clients are sparsely distributed.

Taking a spatial transformation-based approach, Yiu et al. [29] develop a solution for preserving privacy in outsourced spatial data; this work is orthogonal to ours since it tackles a different problem, i.e., a situation where the server outsources its spatial database to a third party, but transforms the database beforehand and shares the transformation key with the client(s). This methodology is more suitable for strengthening security in the k-anonymity approach (as the authors point out). Also, this work is vulnerable to access pattern or correlation attacks.

In work that uses cryptography for constructing privacy-preserving solutions, Wong et al. [28] present a way to securely compute k-nearest neighbors on an encrypted database. In their work, one requires Θ(n²) time for each encryption and decryption operation, where n is the number of elements in the database. Also, [28] only focuses on server (database) privacy and does not consider the situation where the database tries to gain access to a querying client’s information. Additionally, [28] recommends using 80 different data dimensions for optimum security; one resulting problem of this approach is that there might not be enough indices for all the data points in the resulting space due to the curse of dimensionality [26].

Ghinita et al. [6] present a solution based on computational Private Information Retrieval (PIR) [2]. This approach has several advantages: it does not reveal any spatial information, it is resistant against correlation attacks, and it does not require any trusted third party, among others. The paper first presents a PIR-based framework for location based services and then presents solutions to the problems of finding approximate nearest neighbours and exact nearest neighbours. The approach is expensive in terms of communication and computation; some of the improvements and optimizations the authors have proposed include compressing the data sent by the server, having rectangular grids/matrices in addition to square ones, and using off-line data mining to avoid redundant computation. However, the server reveals large amounts of extra information, which results in exposing large portions of the server’s data assets to the client.

In addition to computational PIR as used by [6], there has been work in the area of hardware-assisted PIR and applying hardware-assisted PIR solutions to solve nearest neighbor privacy problems. In [27], Williams and Sion propose a hardware-assisted PIR scheme for obtaining practical privacy solutions. In their solution, clients need O(√n) working memory and they use a system augmented with a fairly expensive (upwards of $10,000) secure co-processor. In our solution, clients do not need any extra working memory and do not need any physical enhancements to current business-class systems. In recent work that uses the hardware-assisted PIR protocol of [27], Papadopoulos et al. [24] present a framework for arbitrary k-nearest neighbor queries. We note that besides having the same issues as [27] (namely using an expensive, physical add-on co-processor and clients requiring working memory in the order of O(√n)), [24] also leaks valuable database indexing information.

Along similar lines, recent work by Khoshgozaran et al. [18], [17] presents a solution where a powerful secure co-processor, SC, is placed close to (or inside) an untrusted LBS server, LBS_DB. The SC first shuffles the location database, LBS_DB, using a private shuffling function π such that LBS_DB[i] = LBS_DB_π[π[i]] and encrypts the shuffled database, which can then be safely stored on the untrusted server. A querying client which wants to retrieve the i-th bit from LBS_DB then sends its query, q = LBS_DB[i], encrypted with the public key of the SC, to the server, which passes it along to the SC. The SC decrypts q to find i and retrieves LBS_DB_π[π[i]] from the untrusted server, decrypts it, re-encrypts it with the client’s public key, and sends it back to the client. The SC processes and returns answers to all client queries and through the entire process, the server remains oblivious as to what data the client has retrieved from it. Hengartner takes a trusted computing-based approach in [12], in which the PIR protocol is executed, and the responses to queries (locations) are given, by a trusted module such as a Trusted Platform Module (TPM) chip. This methodology has not been implemented yet. Other work by Khoshgozaran and Shahabi on nearest neighbour queries uses tamper-evident hardware on both the server and client side [16]. The basic idea is to utilize one-way transformations to map the space of all static and dynamic objects to another space and resolve the query blindly in the transformed space. This approach requires that the transformation used preserves relative proximity of POIs. In another related paper, Ghinita et al. [5] present a hybrid technique that combines cloaking and PIR, but the resulting solution reveals coarse-grained information about a client’s location to the server.


Hardware-assisted PIR, in general, requires current systems to be augmented with powerful, expensive secure co-processors (such as the IBM 4764) and cannot be realized on existing systems as they are, whether with no co-processor at all or with an inexpensive, widely deployed commodity processor such as RSA’s SecurID tokens [20], Verisign’s e-token [14], Entrust’s IdentityGuard minitoken [13] or the Trusted Computing Group’s Trusted Platform Module (TPM) [9] chip. It would be ideal to realize hardware-assisted PIR using a cost-effective, already widely deployed hardware token, but such tokens are necessarily computationally lightweight, in order to keep their costs down and to encourage their widespread use. Hence, none of them have the computational power to fully support complex cryptographic protocols such as PIR; they can only support the most basic of cryptographic primitives such as encryption, signatures, hash functions, and pseudo-random number generation, and they lack any application-specific support (e.g., for LBS). Prior research has also shown that they need significant extensions in order to realize cryptographic protocols similar to PIR, such as Oblivious Transfer and Secure Function Evaluation [11], [25]. Designing a hardware-assisted PIR protocol wholly supported by cost-effective tokens would require significant extensions to their specifications, and would be speculative at best and impossible at worst.

In conclusion, cryptographic protocols such as PIR and Oblivious Transfer (OT) seem to be the best solution for realizing privacy in location-based services, since they don’t suffer from the drawbacks of previously proposed solutions such as k-anonymity or having a trusted perturbation anonymizer. There are two ways one can realize PIR for use in applications such as LBS: computational PIR and hardware-assisted PIR. We believe that hardware-assisted PIR, although a promising concept, currently requires systems to be enhanced with expensive, powerful co-processors and cannot be realized by using cost-effective, commodity hardware chips such as the TPM [9] or smartcards [20], [14], [13]. Hence we have based our methodology on computational PIR, which offers reasonable performance with no additional system setup requirements.

2.1 Our Contributions and Outline

We present an efficient solution for the exact nearest neighbour query where a location-based server does not gain any information about a querying client’s location and the client does not gain information about any location in the server’s database other than its own. Our solution is a general-purpose one and can use either a two-phase PIR or a combination of PIR and Oblivious Transfer (OT). We use a layout of the server database similar to Ghinita et al. [6], where the server superimposes a grid on the POIs. A user u in a grid cell wants to know its nearest neighbour but does not want to release its own location or the grid cell that (s)he is in.

Using the single PIR approach proposed by Ghinita et al. [6] on the grid will result in the server returning an entire column of the grid, thus releasing more information than is necessary to the client. We note that if the data (locations) are not arranged in a grid, but as a one-dimensional list, we can still use PIR to return just one cell, but this approach would incur a communication cost of Θ(fn), where n is the number of grid cells and f is the number of bits in the PIR modulus. Using 1-of-n OT [23] by itself will require the server to transmit the whole grid of encrypted POIs to the client, the communication cost of which is Θ(gn), where g is the length of the symmetric keys used in OT.

One possible solution to this problem is to use a two-phase protocol wherein the server recursively performs a PIR or OT on the grid, and the other solution is to define a two-phase protocol that combines the features of PIR and OT, which brings the cost down from Θ(fn) or Θ(gn) to Θ(f√n) and Θ(g√n) respectively. We explore both of these approaches in our methodology. The paper is organized as follows: In Section 3 we describe the PIR and OT protocols. In Section 4 we present a two-phase framework in which one can use three different combinations of PIR and OT (PIR+OT, OT+PIR, PIR+PIR), and discuss why a two-phase OT approach (OT+OT), while possible in principle, isn’t very efficient in terms of cost. In Section 5 we describe a single-phase approach where one can use either (1) PIR over a square grid as described in [6], or (2) 1-of-n OT over the entire database organized as a single-dimensional list, or (3) a random OT protocol which offers a lower degree of privacy. In Section 6, we formally define the security properties of our protocol and give a proof sketch. In Section 7 we analyze the cost of our two-phase framework and compare it with the single-phase approaches.
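To give a feel for the gap between the asymptotic costs quoted above, the following minimal sketch evaluates the four expressions with all hidden constants set to 1. This is a simplifying assumption: the Θ-notation does not fix the constants. The function name and the sample values n = 10,000, f = 1024 and g = 128 are hypothetical choices for illustration, not parameters taken from this paper.

```python
import math

def communication_bits(n: int, f: int, g: int) -> dict:
    """Evaluate the asymptotic cost expressions with constant factor 1 (an assumption).

    n: number of grid cells, f: PIR modulus length in bits,
    g: OT symmetric key length in bits.
    """
    root = math.isqrt(n)
    if root * root != n:
        raise ValueError("this sketch assumes n is a perfect square")
    return {
        "pir_list": f * n,          # Theta(f n): PIR over a one-dimensional list
        "ot_list": g * n,           # Theta(g n): 1-of-n OT over the whole database
        "pir_two_phase": f * root,  # Theta(f sqrt(n))
        "ot_two_phase": g * root,   # Theta(g sqrt(n))
    }

# Hypothetical parameters, chosen only to make the arithmetic concrete.
costs = communication_bits(n=10_000, f=1024, g=128)
for name, bits in costs.items():
    print(f"{name:>14}: {bits:>10} bits")
```

With these illustrative values, each √n-based expression is smaller than its linear counterpart by a factor of √n = 100.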

3 PRELIMINARIES

In this section, we give a system architecture of our proposed schemes and a table of notations that summarizes symbols that appear frequently in our protocols presented later in Section 4. We also give a brief introduction to PIR and OT, the two cryptographic protocols which are used as building blocks in our framework. Readers familiar with them can skip to Section 4.

3.1 System Architecture and Notations

Figure 1 summarizes our system architecture. The system setup does not require any third party and keys aren’t shared between users. Consider a database server which holds an n-element string, X_1, ..., X_n, arranged as a √n × √n matrix, and a user who wants to retrieve one element, X_{i,j}, from the matrix. In the theoretical definitions of PIR and OT, each element is a single bit, although in applications this is usually generalized to m-bit strings (as in our protocols in Section 4). The user first sends an obfuscated request, q_{i,j} = E_2(E_1(X_{i,j})), using some combination of PIR and OT represented by E_1 and E_2. The server responds with a tuple (X_{i,j}, q_{i,j}), using which the user can later compute the value of X_{i,j} = E_2(E_1(q_{i,j})). PIR and OT rely on the fact that it is computationally infeasible for a polynomial-time server to deduce the value of i, j given q_{i,j}.

Table 1 introduces notations that are frequently referred to in Section 4.


Fig. 1. System architecture

TABLE 1
Description of Notations

Notation                     Description
f                            Length of PIR modulus in bits
g                            Length of OT symmetric keys in bits
M                            Location database matrix maintained by server
n                            Total number of cells in server’s database, M
√n                           Number of cells in each column
m                            Number of bits in each cell
M_{a,b}                      Cell querying user is located in
M_{1,1}, ..., M_{√n,√n}      Granularity of matrix
p, p′                        Large primes of length f/2 bits
N = p × p′                   Composite modulus of length f bits
⊕                            Binary XOR operation
⊕_{i=1}^{k}                  Cumulative XOR from 1 to k

3.2 Private Information Retrieval (PIR)

Private Information Retrieval (PIR), first introduced by Chor et al. [2], is a technique or protocol to let a user retrieve information from a database server without the database server knowing the element or record that has been retrieved. In the traditional definition of PIR (computational PIR), the database server has an n-bit string X = X_1, ..., X_n, and the client wants to know the value of X_I. The client sends a request to the server in the form of an obfuscated vector, E(I) = q, where E denotes the algorithm used for generating the obfuscated vector. The server responds with a value v(X, q). Using this, the client can compute the value of X_I. Typically we require that the client’s request remain private in the presence of a computationally-bounded adversary, which is referred to as “computational PIR” (as opposed to “information theoretic PIR” in which the adversary is not bounded). Kushilevitz and Ostrovsky’s well-known solution [19] to the computational PIR problem is based on the computational intractability of deciding whether a given number is a quadratic residue of a composite modulus. Although there are other forms of PIR such as hardware-assisted PIR, multiple-server PIR and symmetric PIR, in this paper we focus on computational PIR and the protocol from Kushilevitz and Ostrovsky. In the following paragraphs we give a high-level description of this solution to the computational PIR problem.

Let p and p′ be large primes, and N = p · p′ be a composite modulus. Let Z*_N denote the set of numbers that are co-prime (relatively prime) with N. The set of quadratic residues is defined by QR_N = {y ∈ Z*_N | ∃x ∈ Z*_N : y = x² mod N}, and the set of non-quadratic residues is the complement of QR_N, written QR′_N. If we denote the set Z^{+1}_N = {y ∈ Z*_N | (y/N) = 1}, where (y/N) denotes the Jacobi symbol, exactly half of the numbers in Z^{+1}_N are in the set QR_N and the other half are in QR′_N. Determining whether a given x ∈ Z^{+1}_N is in QR_N or in QR′_N is easy if the factorization of N is known, but is commonly believed to be computationally intractable if the factorization of N is unknown; this is a widely-used cryptographic assumption known as the “quadratic residuosity assumption.” For a detailed theoretical exposition, readers are referred to [2].
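The following minimal sketch illustrates these definitions on a toy modulus. It assumes the tiny primes p = 7 and p′ = 11 (far too small for any real use), and the helper names `jacobi` and `is_qr` are our own. The residuosity test applies Euler's criterion modulo each prime factor, which is exactly the step that requires knowing the factorization of N.

```python
import math

def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd positive n, via the standard reciprocity loop."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def is_qr(y: int, p: int, p_prime: int) -> bool:
    """y in QR_N iff y is a square modulo p and modulo p' (Euler's criterion)."""
    return (pow(y, (p - 1) // 2, p) == 1
            and pow(y, (p_prime - 1) // 2, p_prime) == 1)

p, p_prime = 7, 11            # toy primes, for illustration only
N = p * p_prime

units = [y for y in range(1, N) if math.gcd(y, N) == 1]      # Z*_N
z_plus = [y for y in units if jacobi(y, N) == 1]             # Z^{+1}_N
residues = [y for y in z_plus if is_qr(y, p, p_prime)]       # QR_N
pseudo = [y for y in z_plus if not is_qr(y, p, p_prime)]     # Z^{+1}_N minus QR_N

print(len(units), len(z_plus), len(residues), len(pseudo))
```

For N = 77 the four counts are 60, 30, 15 and 15, matching the statement that exactly half of Z^{+1}_N consists of quadratic residues.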

To form a PIR query, the client first randomly selects two f/2-bit primes, p and p′, and computes N = p · p′, where N is f bits long. The query is then formed as E(I) = q = [q_1, ..., q_n], where each q_i is drawn from Z^{+1}_N such that q_I ∈ QR′_N and ∀i ≠ I, q_i ∈ QR_N. The server computes v(X, q) = ∏_{j=1}^{n} w_j, where w_j = q_j² if X_j = 0, or q_j otherwise. Note that v(X, q) ∈ QR_N if and only if X_I = 0, so the client uses its knowledge of p and p′ to test whether v(X, q) ∈ QR_N and thus determines the value of X_I. As just described, this produces a query of size Θ(fn) and a response of size Θ(f); however, in the simplest form of the Kushilevitz/Ostrovsky protocol the server’s data is organized into a √n × √n matrix, and this basic PIR step is run on every row with a single E(I) vector. As a result, the query size is Θ(f√n) and the response size is also Θ(f√n). Further optimizations are possible; see the original paper for details [19].

We note that, like many cryptographic protocols, the traditional PIR definition above is one in which data items are single bits. In practice the items would typically be multiple bits, which is easily accomplished by running multiple instances of the PIR protocol in parallel.
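As a rough illustration of the basic (one-dimensional) PIR step described above, the sketch below runs one query against an 8-bit string. It is a toy under stated assumptions, not the authors' implementation: the primes 1019 and 1031 are far too small to be secure, Python's `random` module stands in for a cryptographic random source, indices are 0-based, and all function names are hypothetical.

```python
import math
import random

def in_qr(y, p, p_prime):
    """Euler's criterion modulo both prime factors of N = p * p'."""
    return (pow(y, (p - 1) // 2, p) == 1
            and pow(y, (p_prime - 1) // 2, p_prime) == 1)

def sample(N, p, p_prime, want_qr, rng):
    """Draw from Z^{+1}_N: a residue if want_qr, else a non-residue with Jacobi symbol +1."""
    while True:
        y = rng.randrange(2, N)
        if math.gcd(y, N) != 1:
            continue
        res_p = pow(y, (p - 1) // 2, p) == 1
        res_q = pow(y, (p_prime - 1) // 2, p_prime) == 1
        if want_qr and res_p and res_q:
            return y
        # Non-residue modulo both primes: Jacobi symbol is (-1)(-1) = +1.
        if not want_qr and not res_p and not res_q:
            return y

def make_query(n, I, N, p, p_prime, rng):
    """Client: q_I is a non-residue, every other q_i is a residue."""
    return [sample(N, p, p_prime, i != I, rng) for i in range(n)]

def respond(X, query, N):
    """Server: multiply w_j = q_j^2 if X_j = 0, else q_j, all modulo N."""
    v = 1
    for x_j, q_j in zip(X, query):
        w_j = q_j if x_j else (q_j * q_j) % N
        v = (v * w_j) % N
    return v

def decode(v, p, p_prime):
    """Client: v is a quadratic residue exactly when X_I = 0."""
    return 0 if in_qr(v, p, p_prime) else 1

p, p_prime = 1019, 1031      # toy primes, insecure by design
N = p * p_prime
X = [1, 0, 1, 1, 0, 0, 1, 0]
rng = random.Random(7)       # seeded only to make the demo repeatable
recovered = [decode(respond(X, make_query(len(X), I, N, p, p_prime, rng), N), p, p_prime)
             for I in range(len(X))]
print(recovered == X)
```

The server only ever sees N and the query vector; without the factorization of N it cannot tell which entry is the non-residue, under the quadratic residuosity assumption.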

3.3 Oblivious Transfer (OT)

Oblivious Transfer is a cryptographic protocol that was introduced by Brassard, Crépeau and Roberts [1] and is used as a fundamental construct in various cryptographic protocols, including those for multi-party secure function evaluation. Consider a setting where Bob has a string of bits X_1, ..., X_n and Alice wants to know one of them. Alice does not want Bob to know which bit she has chosen, and Bob does not want Alice to know any bit other than the one she has chosen. The solution for this is known as the 1-of-n OT protocol. There are two variants of this protocol: the 1-of-2 OT protocol and the k-of-n OT protocol. In 1-of-2 OT, Bob has 2 elements, X_0 and X_1. Alice chooses to know one of them. Alice receives exactly one element without learning anything about the other element, while Bob remains oblivious as to which element was sent. In the k-of-n OT protocol, Bob has a string of bits X_1, ..., X_n, and Alice wants to know a subset of k of these bits. Many efficient protocols have been proposed for the Oblivious Transfer problem, including a protocol in the standard model due to Naor and Pinkas [23], and more recently a hardware-assisted protocol due to Gunupudi and Tate [11]. In this paper we use Naor and Pinkas’s OT [23]; we describe their technique for doing 1-of-N OT using an existing 1-of-2 OT below. The protocols below handle m-bit messages directly, rather than relying on multiple single-bit OTs.

Let Bob be a server who holds an N-element string X_1, ..., X_N where each X_i ∈ {0, 1}^m is an m-bit value. Let Alice be a client who wishes to obtain element X_I. The 1-of-N OT protocol proceeds thus: Bob first prepares l = ⌈log N⌉ random key-pairs (K_1^0, K_1^1), (K_2^0, K_2^1), ..., (K_l^0, K_l^1), where for all 1 ≤ j ≤ l and b ∈ {0, 1}, K_j^b is a t-bit key to a pseudo-random function F_K. For all 1 ≤ I ≤ N let (i_1, i_2, ..., i_l) be the bits of I. Bob encrypts each element in the string as Y_I = X_I ⊕ ⊕_{j=1}^{l} F_{K_j^{i_j}}(I) and sends all the encrypted strings Y_1, ..., Y_N to Alice. Alice and Bob then engage in a 1-of-2 OT for each pair of keys (K_j^0, K_j^1). If Alice wants element X_I, she picks key K_j^{i_j}. Finally Alice reconstructs X_I = Y_I ⊕ ⊕_{j=1}^{l} F_{K_j^{i_j}}(I).
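The masking and unmasking steps of this 1-of-N construction can be sketched as follows. Several things here are assumptions made purely for illustration: HMAC-SHA256 truncated to m bytes stands in for the pseudo-random function F_K, indices are 0-based, and the l underlying 1-of-2 OTs are not implemented at all. The client is simply handed the keys it would have obtained from them. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def prf(key: bytes, index: int, m_bytes: int) -> bytes:
    """Stand-in for F_K(I): HMAC-SHA256 truncated to m_bytes (an assumption)."""
    if m_bytes > 32:
        raise ValueError("this sketch supports messages of at most 32 bytes")
    return hmac.new(key, index.to_bytes(4, "big"), hashlib.sha256).digest()[:m_bytes]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def bits_of(index: int, l: int) -> list:
    """Bits (i_1, ..., i_l) of the index, least significant first."""
    return [(index >> j) & 1 for j in range(l)]

def server_encrypt(X, t_bytes=16):
    """Bob: draw l key pairs and mask every X_I with the l keys selected by the bits of I."""
    l = max(1, (len(X) - 1).bit_length())      # l = ceil(log2 N)
    keys = [(secrets.token_bytes(t_bytes), secrets.token_bytes(t_bytes))
            for _ in range(l)]
    Y = []
    for I, x in enumerate(X):
        mask = bytes(len(x))
        for j, bit in enumerate(bits_of(I, l)):
            mask = xor(mask, prf(keys[j][bit], I, len(x)))
        Y.append(xor(x, mask))
    return keys, Y

def client_decrypt(Y, I, chosen_keys):
    """Alice: strip the mask from Y_I using the l keys she obtained."""
    mask = bytes(len(Y[I]))
    for key in chosen_keys:
        mask = xor(mask, prf(key, I, len(Y[I])))
    return xor(Y[I], mask)

X = [bytes([i]) * 4 for i in range(8)]         # eight 4-byte items
keys, Y = server_encrypt(X)
I = 5
# Stand-in for the l 1-of-2 OTs: Alice learns only K_j^{i_j} for each j.
chosen = [keys[j][bit] for j, bit in enumerate(bits_of(I, len(keys)))]
recovered = client_decrypt(Y, I, chosen)
print(recovered == X[I])
```

Because every other index differs from I in at least one bit, Alice is missing at least one key for each of the other ciphertexts, which is what keeps them hidden in the actual protocol.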

4 A TWO-PHASE FRAMEWORK

For organizing the location database, we follow the same basic methodology given in [6]. As part of pre-processing, the whole space is divided into Voronoi tessellations using the set of POIs, S = {s_1, s_2, ..., s_α}, and a grid or matrix M of size ⌈√n⌉ × ⌈√n⌉ is superimposed on top of it as shown in Figure 2. For each grid cell gc (which can also be referred to as M_{a,b}, where a is the row number and b is the column number), the server computes the POIs located in all the Voronoi cells that intersect gc and prepares a neighbour list that contains the possible nearest neighbours for any point in the cell. Figure 2 shows the neighbour list for each grid cell. The server sets the maximum number of POIs in the lists to a fixed constant, P_max (e.g., in Figure 2, P_max = 3). In case some of the cells do not have enough POIs in the list, the server pads the list until it reaches length P_max. This can be seen in Figure 2 in the neighbour lists for A1, C2, etc. In the figure, we have used duplicate entries for the padding, but one can use a pre-agreed padding value as well.
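The padding step can be sketched as below. The function name is hypothetical, and the unpadded input lists are our own guess at what precedes the padded lists shown in Figure 2; only the padded outputs appear in the figure. The sketch pads by repeating the last entry, which is one way to produce duplicate-entry padding.

```python
def pad_neighbour_lists(lists: dict, p_max: int) -> dict:
    """Pad every neighbour list to exactly p_max entries by repeating its last POI."""
    padded = {}
    for cell, pois in lists.items():
        if not pois:
            raise ValueError(f"cell {cell} has no candidate neighbours")
        if len(pois) > p_max:
            raise ValueError(f"cell {cell} exceeds P_max = {p_max}")
        padded[cell] = pois + [pois[-1]] * (p_max - len(pois))
    return padded

# Assumed unpadded lists (hypothetical); the padded results match Figure 2.
raw = {
    "A1": ["P1"],
    "A2": ["P1", "P3"],
    "B2": ["P1", "P2", "P3"],
    "C2": ["P2", "P5"],
    "C4": ["P3", "P4"],
}
padded = pad_neighbour_lists(raw, p_max=3)
for cell, pois in padded.items():
    print(cell, ",".join(pois))
```

Fixing every list to the same length means each cell occupies the same number of bits, so the size of a retrieved cell does not reveal how many POIs it actually holds.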

When a user u with location l requests its nearest neighbour, it simply requests the neighbour list associated with the grid cell that contains l and filters the results to find the nearest neighbour of l. Let M_{a,b} be the matrix cell corresponding to the user’s location l. Using PIR, in [6], the user selects a query message y = [y_1, ..., y_⌈√n⌉] (y_i ∈ Z^{+1}_N, i = 1, 2, ..., ⌈√n⌉) where y_b ∈ QR′_N and ∀j ≠ b, y_j ∈ QR_N (here b is the column the user is located in). The server computes, for each row r of the matrix M, the value z_r = ∏_{j=1}^{⌈√n⌉} w_{r,j} and returns z = [z_1, ..., z_⌈√n⌉]. Here w_{r,j} = y_j² if M_{r,j} = 0, or y_j otherwise. The client can then use a method similar to the one introduced in Section 3 to calculate the values of M_{∗,b}, where ∗ represents any row in M.
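The column query above can be sketched as follows for a 4 × 4 grid of single-bit cells. This is an illustrative toy under stated assumptions: the primes 1019 and 1031 are insecurely small, `random` stands in for a cryptographic random source, indices are 0-based, and the function names are hypothetical. Note how a single query vector yields one response per row, so the client decodes the whole column b.

```python
import math
import random

def sample(N, p, p_prime, want_qr, rng):
    """Draw from Z^{+1}_N: a residue if want_qr, else a non-residue with Jacobi symbol +1."""
    while True:
        y = rng.randrange(2, N)
        if math.gcd(y, N) != 1:
            continue
        res_p = pow(y, (p - 1) // 2, p) == 1
        res_q = pow(y, (p_prime - 1) // 2, p_prime) == 1
        if want_qr and res_p and res_q:
            return y
        if not want_qr and not res_p and not res_q:
            return y

def column_query(side, b, N, p, p_prime, rng):
    """Client: y_b is a non-residue, every other y_j is a residue."""
    return [sample(N, p, p_prime, j != b, rng) for j in range(side)]

def server_response(M, y, N):
    """Server: z_r is the product over j of w_{r,j}, with w = y_j^2 if M[r][j] == 0 else y_j."""
    z = []
    for row in M:
        z_r = 1
        for bit, y_j in zip(row, y):
            w = y_j if bit else (y_j * y_j) % N
            z_r = (z_r * w) % N
        z.append(z_r)
    return z

def decode_column(z, p, p_prime):
    """Client: z_r is a residue exactly when M[r][b] == 0."""
    return [0 if (pow(v, (p - 1) // 2, p) == 1
                  and pow(v, (p_prime - 1) // 2, p_prime) == 1) else 1
            for v in z]

p, p_prime = 1019, 1031      # toy primes, insecure by design
N = p * p_prime
M = [[1, 0, 0, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1]]
rng = random.Random(11)      # seeded only to make the demo repeatable
b = 2
column = decode_column(server_response(M, column_query(4, b, N, p, p_prime, rng), N),
                       p, p_prime)
print(column)                # every row's bit in column b, not just row a
```

For m-bit cells the same step would be repeated once per bit position, so the client still ends up with every cell in the column.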

Since the data (locations) are organized in a grid, the server returns extra information which the user has not queried about, and most of the data returned is not relevant. Specifically, if the user requests one grid or matrix cell M_{a,b}, the server returns, along with M_{a,b}, the contents of the entire column b. Since the size of the data returned increases with the size of the grid, the user has to sift through a lot of extraneous data, or data that might not be particularly useful for finding their nearest neighbours. In the optimization techniques proposed in [6], data compression and using a rectangular matrix instead of a square one are cited as ways to reduce the amount of data returned by the server. By compressing the query results returned by the server, we would just be removing duplicates and reducing the size of the data returned, without reducing the actual content returned. Organizing the data as a rectangular matrix would return as many extra locations as there are rows in the grid. Neither of these approaches ensures that the server always returns just one grid cell and its corresponding neighbour list. One could possibly achieve this by organizing the location database as a one-dimensional list; however, this approach would be too expensive. Our two-level approaches ensure that the server always returns only one cell to the user, and at a reasonable cost. Furthermore, our approaches are orthogonal to techniques proposed in [6] and can be applied on top of their methodology.

In our two-phase framework, the user first asks the server to set aside its encrypted column, b, using PIR or OT. In the second step, to retrieve its cell in the column, i.e., cell M_{a,b}, the user can use either PIR, if they had used PIR or OT in the first step, or 1-of-√n OT, if they had used PIR in the first step. Below we describe our framework for the exact nearest neighbour query, EXACT NN, with three cases: PIR+OT, PIR+PIR, OT+PIR, and the two sub-protocols which are used in them: OT^1_{√n} SERVER and OT^1_{√n} CLIENT, where OT^1_{√n} refers to 1-of-√n OT. We also discuss why the possible fourth case, OT+OT, although possible, isn’t as efficient as the other three approaches.

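[Figure 2: grid superimposed on the Voronoi tessellation, with a neighbour list per grid cell. Lists legible in the figure: A1: P1,P1,P1; A2: P1,P3,P3; B2: P1,P2,P3; C2: P2,P5,P5; C4: P3,P4,P4.]

Fig. 2. Sample dataset for protocol illustration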

Protocol 1 EXACT NN WITH PIR+OT

1) Server – Bob uses the POIs S to create Voronoi tessellations of the space.

2) Bob divides the space into a √n × √n grid M, so the grid is represented by (M_{1,1}, ..., M_{√n,√n}). The neighbour list for each grid cell is prepared. Here, the contents of each grid cell are an m-bit string.

3) User – Alice initiates a query in order to find the neighbour list of the cell she is in. Alice also randomly generates a composite modulus N = p · p′, where p, p′ are large primes, and sends N to Bob. Note that only Alice knows the factorization of N, not Bob.

4) Bob sends the granularity of the grid.

5) Let Alice be located in M_{a,b}. Alice selects the column in which her cell is located, i.e., column b. At this point, Alice can choose to engage in either of two protocols: PIR or OT^1_{√n} CLIENT. For this protocol, Alice chooses to engage in PIR. Let PIR(b) = [y_1, ..., y_⌈√n⌉] such that each y_i ∈ Z^{+1}_N, y_b ∈ QR′_N, and ∀j ≠ b, y_j ∈ QR_N, as in Section 3.2. Alice issues a PIR request PIR(b) to Bob.

6) Bob prepares the PIR response, z = [z_1, ..., z_⌈√n⌉], for each row r of the matrix thus:

z_r = ∏_{j=1}^{⌈√n⌉} w_{r,j}

where w_{r,j} = y_j² if M_{r,j} = 0, or y_j otherwise. In total there will be m responses for each z_i (denoted by z_i[1], z_i[2], ..., z_i[m]). At this point Bob can choose to engage in either of two protocols: OT^1_{√n} SERVER or PIR. For this protocol, Bob chooses OT^1_{√n} SERVER. Bob runs the protocol OT^1_{√n} SERVER(z), where z is the PIR response for each bit of the m-bit string.

7) Alice runs the OT^1_{√n} CLIENT protocol for obtaining the a-th element in z. In the course of the OT^1_{√n} CLIENT(a, Y) protocol, Alice gets the encrypted values Y = [Y_1, ..., Y_⌈√n⌉], of which Alice can only decrypt Y_a = z_a. After decrypting z_a, Alice can check, for each bit i (i = 1, 2, ..., m), whether z_a[i] ∈ QR_N or z_a[i] ∈ QR′_N using the formula (z_a[i]^{(p−1)/2} = 1 mod p) ∧ (z_a[i]^{(p′−1)/2} = 1 mod p′). If this expression evaluates to true, then z_a[i] is in QR_N. For each bit, if z_a[i] ∈ QR_N, then Alice concludes that the bit is 0; otherwise z_a[i] ∈ QR′_N and the bit is 1. Alice repeats this process for each bit of the m-bit string. Once she does this, she can get the contents of her cell M_{a,b}. From the contents of M_{a,b}, she can get the list of neighbours (POIs) associated with M_{a,b}. Once Alice knows the set of POIs, she can calculate their individual distances and find her nearest neighbour.

8) Return the nearest neighbour.

Protocol 2 EXACT NN WITH OT+PIR

1) Server – Bob uses the POIs S to create Voronoi tessellations of the space.

2) Bob divides the space into a √n × √n grid M, so the grid is represented by (M_{1,1}, ..., M_{√n,√n}). The neighbour list for each grid cell is prepared. Bob converts each column (along with its cells’ neighbours) into a string: (X_1 = [X_{1,1}, ..., X_{1,√n}], ..., X_{√n} = [X_{√n,1}, ..., X_{√n,√n}]), with each element m bits long. So in all there will be √n strings. Bob then executes OT^1_{√n} SERVER(X) on each column of the grid and creates √n encrypted columns: (Y_1 = [Y_{1,1}, ..., Y_{1,√n}], ..., Y_{√n} = [Y_{√n,1}, ..., Y_{√n,√n}]). Please note that Bob needs to encrypt each column of the grid with a different set of keys, but the same key is used for encrypting all the cells in a single column. So Bob needs to generate a total of log √n keys, one for each column. Bob does not engage in the 1-of-2 OT with Alice at this point to give her the keys, but waits until after the PIR step.

3) User – Alice initiates a query in order to find the neighbour list of the cell she is in (for each bit in the neighbour list). Alice also randomly generates a composite modulus $N = p \cdot p'$ and sends N to Bob ($p, p'$ are large primes). Note that only Alice knows the factorization of N.

4) Bob sends the granularity of the grid.

5) Alice chooses the encrypted column containing her cell $M_{a,b}$, e.g., column $Y_b = [Y_1, \cdots, Y_{\lceil\sqrt{n}\rceil}]$, and engages in PIR. Let $Z^*_N$ denote the set of integers that are co-prime with N. The set of quadratic residues (QRs) modulo N is defined as:

$$QR_N = \{ y \in Z^*_N \mid \exists x \in Z^*_N : y = x^2 \bmod N \}$$

Let $PIR(b) = [y_1, \ldots, y_{\lceil\sqrt{n}\rceil}]$ such that $y_b \notin QR_N$ and $\forall j \neq b$, $y_j \in QR_N$ (the same convention as in the single-phase PIR protocol, and the one under which a response in $QR_N$ decodes to bit 0). Alice issues a PIR request PIR(b) to Bob.

6) Bob prepares the PIR response $z = [z_1, \cdots, z_{\lceil\sqrt{n}\rceil}]$ for each row r of the column b thus:

$$z_r = \prod_{j=1}^{\lceil\sqrt{n}\rceil} w_{r,j}$$

where $w_{r,j} = y_j^2$ if $M_{r,j} = 0$, or $y_j$ otherwise. Please note that the PIR request and response are prepared over Bob's encrypted database as opposed to the plaintext database as in the EXACT NN WITH PIR+OT and EXACT NN WITH PIR+PIR protocols. Bob sends the z vector to Alice.

7) For each element of the encrypted z vector, Alice checks, for each bit $i$ ($i = 1, 2, \ldots, m$), whether $z_a[i] \in QR_N$ or $z_a[i] \in QR'_N$ using the formula $(z_a[i]^{\frac{p-1}{2}} = 1 \bmod p) \wedge (z_a[i]^{\frac{p'-1}{2}} = 1 \bmod p')$. If this expression evaluates to true, then $z_a[i]$ is in $QR_N$. For each bit, if $z_a[i] \in QR_N$, then Alice concludes that the bit is 0; else if $z_a[i] \in QR'_N$, the bit is 1.

At this point, Alice and Bob engage in a 1-of-$\sqrt{n}$ OT over the $\log \sqrt{n}$ keys. Bob converts the $\log \sqrt{n}$ keys into a bit string T and executes $OT^{1}_{\sqrt{n}}$SERVER(T). Alice executes $OT^{1}_{\sqrt{n}}$CLIENT(a, E(T)), where E(T) is the encrypted T string. Alice chooses the keys that correspond to the $z_a$ value (more precisely, the keys of the element with index a in the z vector). Once she does this, she can get the contents of her cell $M_{a,b}$. From the contents of $M_{a,b}$, she can get the list of neighbours (POIs) associated with $M_{a,b}$. Once Alice knows the set of POIs, she can calculate their individual distances and find her nearest neighbour.

8) Return the nearest neighbour.

Protocol 3 EXACT NN WITH PIR+PIR

1) Server – Bob uses the POIs S to create Voronoi tessellations of the space.

2) Bob divides the space into a $\sqrt{n} \times \sqrt{n}$ grid M, so the grid is represented by $(M_{1,1}, \cdots, M_{\sqrt{n},\sqrt{n}})$. The neighbour list for each grid cell is prepared. Here, the contents of each grid cell is an m-bit string.

3) User – Alice initiates a query in order to find the neighbour list of the cell she is in (for each bit in the neighbour list). Alice also randomly generates a composite modulus $N = p \cdot p'$ and sends N to Bob ($p, p'$ are large primes). Note that only Alice knows the factorization of N.


4) Bob sends the granularity of the grid.

5) Let Alice be located in $M_{a,b}$. Alice selects the column in which her cell is located, i.e., column b. At this point, Alice can choose to engage in either of two protocols: PIR or $OT^{1}_{\sqrt{n}}$CLIENT. For this protocol, Alice chooses to engage in PIR. Let $Z^*_N$ denote the set of integers that are co-prime with N. The set of quadratic residues (QRs) modulo N is defined as:

$$QR_N = \{ y \in Z^*_N \mid \exists x \in Z^*_N : y = x^2 \bmod N \}$$

Let $PIR(b) = [y_1, \ldots, y_{\lceil\sqrt{n}\rceil}]$ such that $y_b \notin QR_N$ and $\forall j \neq b$, $y_j \in QR_N$ (the same convention as in the single-phase PIR protocol, and the one under which a response in $QR_N$ decodes to bit 0). Alice issues a PIR request PIR(b) to Bob.

6) Bob computes, for each row r of the matrix M, the value $z_r = \prod_{j=1}^{\lceil\sqrt{n}\rceil} w_{r,j}$, where $w_{r,j} = y_j^2$ if $M_{r,j} = 0$, or $y_j$ otherwise, and returns $z = [z_1, \ldots, z_{\lceil\sqrt{n}\rceil}]$ (please note this process needs to be done for every bit of the contents of the cell).

7) Alice sends a request $y' = [y'_1, \cdots, y'_{\lceil\sqrt{n}\rceil}]$ with only $y'_a \in QR'_N$. The next step is carried out for each bit i of the elements in the vector z, denoted by $z_j[i]$.

8) Bob computes the value $z' = \prod_{j=1}^{\lceil\sqrt{n}\rceil} w_{j,i}$, where $w_{j,i} = {y'_j}^2$ if $z_j[i] = 0$, and $y'_j$ otherwise. The server returns $z'$ to the client.

9) Alice recovers $z_a$ from $z'$ (bit by bit) and then $M_{a,b}$ from $z_a$ using a similar approach as described in the EXACT NN WITH PIR+OT and EXACT NN WITH OT+PIR protocols. From the contents of $M_{a,b}$, she can get the list of neighbours (POIs) associated with $M_{a,b}$. Once Alice knows the set of POIs, she can calculate their individual distances and find her nearest neighbour.

10) Return the nearest neighbour.
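The PIR round used in the protocols above can be illustrated with a small sketch. It is not the authors' Java implementation: the function names, the toy primes, and the one-bit-per-cell matrix are assumptions made for illustration, and a real deployment would use large primes and repeat the answer step for each of the m bits of a cell. The query holds a non-residue (with Jacobi symbol +1) at the wanted column and residues elsewhere, so the response for a row is a residue exactly when the selected bit is 0.

```python
import math
import secrets


def is_qr(v, p, q):
    """Euler's criterion modulo both prime factors of N = p * q."""
    return pow(v, (p - 1) // 2, p) == 1 and pow(v, (q - 1) // 2, q) == 1


def random_unit(N):
    """Random element of Z*_N (co-prime with N)."""
    while True:
        x = secrets.randbelow(N - 2) + 2
        if math.gcd(x, N) == 1:
            return x


def random_qr(N):
    """Random quadratic residue modulo N."""
    return pow(random_unit(N), 2, N)


def random_pseudo_residue(N, p, q):
    """Non-residue modulo both p and q, so its Jacobi symbol modulo N is +1."""
    while True:
        x = random_unit(N)
        if pow(x, (p - 1) // 2, p) == p - 1 and pow(x, (q - 1) // 2, q) == q - 1:
            return x


def pir_query(b, width, N, p, q):
    """Client: non-residue at column b, residues everywhere else."""
    return [random_pseudo_residue(N, p, q) if j == b else random_qr(N)
            for j in range(width)]


def pir_answer(bits, query, N):
    """Server: z_r is the product over j of y_j^2 (bit 0) or y_j (bit 1)."""
    answer = []
    for row in bits:
        z = 1
        for bit, y in zip(row, query):
            w = y if bit else pow(y, 2, N)
            z = (z * w) % N
        answer.append(z)
    return answer


def pir_decode(z, p, q):
    """Client: a residue decodes to bit 0, a non-residue to bit 1."""
    return 0 if is_qr(z, p, q) else 1
```

With the toy modulus N = 1019 * 1031, calling `pir_query(b, 3, N, 1019, 1031)`, passing the result to `pir_answer`, and decoding each returned value with `pir_decode` yields column b of the bit matrix, one bit per row.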

Sub-protocol 1 $OT^{1}_{\sqrt{n}}$SERVER(X)

1) Let $X = [X_1, X_2, \cdots, X_{\sqrt{n}}]$. The client would like to know the contents of one element $X_i$. Let $l = \log_2 \sqrt{n}$.

2) The server chooses a symmetric encryption algorithm $E_K$, which uses key K, and generates l random pairs of keys. The pairs of keys are represented by $(K_1^0, K_1^1), (K_2^0, K_2^1), \cdots, (K_l^0, K_l^1)$. So, $K_j^b$ is a key for the encryption algorithm $E_K$, $\forall\, 1 \leq j \leq l$ and $\forall\, b \in \{0, 1\}$.

3) For each $1 \leq i \leq \sqrt{n}$, let $\langle p_1, p_2, \ldots, p_l \rangle$ represent the bits of i. The server prepares an encryption: $Y_i = X_i \oplus \bigoplus_{j=1}^{l} E_{K_j^{p_j}}(i)$. Thus, there are a total of $\sqrt{n}$ encryptions.

4) For all $1 \leq j \leq l$, the server engages in a 1-of-2 OT with the client on the strings $\langle K_j^0, K_j^1 \rangle$.

5) The server sends the strings $Y = Y_1, \cdots, Y_{\sqrt{n}}$ to the client.

Sub-protocol 2 $OT^{1}_{\sqrt{n}}$CLIENT(i, Y)

1) The client picks an index i to get from the server, where $\langle p_1, p_2, \ldots, p_l \rangle$ represents the bits of i.

2) The client engages in a 1-of-2 OT with the server to learn the key $K_j^{p_j}$ for all j ($1 \leq j \leq l$).

3) Once the client gets the value of $Y = Y_1, \cdots, Y_{\sqrt{n}}$ from the server, it can reconstruct $X_i$ thus: $X_i = Y_i \oplus \bigoplus_{j=1}^{l} E_{K_j^{p_j}}(i)$.
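The masking in the two sub-protocols above can be sketched as follows. This is only an illustration under stated assumptions: HMAC-SHA256 in counter mode stands in for the unspecified encryption algorithm $E_K$, indices are 0-based, all function names are hypothetical, and the 1-of-2 OT of each key pair is not implemented (the caller simply picks one key from each pair, which a real protocol must do obliviously).

```python
import hashlib
import hmac
import secrets


def prf(key, index, length):
    """Stand-in for E_K(i): HMAC-SHA256 in counter mode, truncated to length."""
    out = b""
    counter = 0
    while len(out) < length:
        msg = index.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def index_bits(i, l):
    """Bits <p_1, ..., p_l> of index i, least significant first."""
    return [(i >> j) & 1 for j in range(l)]


def ot_server(X):
    """Server side: generate l key pairs and mask every element of X."""
    l = max(1, (len(X) - 1).bit_length())
    key_pairs = [(secrets.token_bytes(16), secrets.token_bytes(16))
                 for _ in range(l)]
    Y = []
    for i, x in enumerate(X):
        mask = bytes(len(x))
        for j, bit in enumerate(index_bits(i, l)):
            mask = xor(mask, prf(key_pairs[j][bit], i, len(x)))
        Y.append(xor(x, mask))
    return key_pairs, Y


def ot_client(i, Y, chosen_keys):
    """Client side: unmask Y[i] using the one key learned from each pair."""
    mask = bytes(len(Y[i]))
    for key in chosen_keys:
        mask = xor(mask, prf(key, i, len(Y[i])))
    return xor(Y[i], mask)
```

A client holding the keys selected by the bits of i recovers `X[i]`; for any other index at least one key is missing, so the corresponding mask cannot be recomputed.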

We have given three protocols using different combinations of PIR and OT: PIR+OT, OT+PIR and PIR+PIR. A possible fourth protocol would be OT+OT. One can implement this option by first having the server encrypt the grid row-wise and then re-encrypting the encrypted grid column-wise. The server and client would then perform two $OT^{1}_{\sqrt{n}}$'s for the client to get its cell's row and column keys. The communication cost of performing OT+OT is the same as that required for a single-level OT ($OT^{1}_{n}$ over the entire database), since, in single-level OT, the server and client need to perform $OT^{1}_{2}$ over $\log n$ key-pairs, while in OT+OT, they need to perform $OT^{1}_{2}$'s over two vectors of $\log(\sqrt{n})$ key-pairs (and $\log n = 2 \log(\sqrt{n})$). The computation cost on the server's side would also be much higher in OT+OT, since the server needs to encrypt the entire database twice. Hence the efficiency gain (computation and communication-wise) from this would not be as significant as with the other three two-phase protocols, and the cost of performing OT+OT would be close to, if not more than, that of single-phase OT. Hence we do not pursue the OT+OT approach any further.

We note that the PIR+PIR protocol is quite similar to the recursive PIR protocol given in Kushilevitz and Ostrovsky [19]. The major difference between their protocol and ours is that in the recursive scheme given in [19], if we assume each element of the $z = [z_1, \cdots, z_{\sqrt{n}}]$ vector to be f bits long, where f is the length of the PIR modulus N, the client and server engage in f executions of the PIR scheme, one for each bit of the element $z_a$ ($z_a$ is the element the client wants). In our scheme, the client gets $z_a$ in one go. Additionally, the PIR+OT and OT+PIR schemes are essentially equivalent to constructing a symmetric PIR scheme where a client gets exactly one element from the server. Kushilevitz and Ostrovsky had suggested a way in which their PIR protocol can be extended to support symmetric PIR, but their solution requires the client and server to execute a zero knowledge proof in which the client proves that it followed the PIR protocol correctly. Such zero knowledge proofs are expensive protocols and should generally be avoided in practice. Using our protocols, we can implement symmetric PIR without having to execute a zero knowledge proof. We note that the idea of constructing a symmetric PIR scheme using PIR and 1-of-n OT was also pointed out by Naor and Pinkas [23].

5 SINGLE-PHASE APPROACHES

In this section, we describe three single-phase approaches for the purpose of comparing performance with the two-phase approaches. The single-phase approaches include a single-phase PIR protocol, a single-phase 1-of-n OT protocol and a "random" OT protocol. In the first case, single-phase PIR, the server database is organized as a grid in exactly the same way and the protocol is the same as given in [6]. In the second case, single-phase OT, the database is organized as a one-dimensional list and the protocol is a simple invocation of $OT^{1}_{n}$SERVER and $OT^{1}_{n}$CLIENT. These are exactly the same as $OT^{1}_{\sqrt{n}}$SERVER and $OT^{1}_{\sqrt{n}}$CLIENT with all instances of $\sqrt{n}$ replaced by n. In the random OT case, we organize the database as a grid and do a 1-of-k OT over k randomly chosen cells (chosen by the client). The last approach is less expensive than the two-phase approaches or other single-phase approaches, but with the caveat that it isn't completely privacy-preserving: the server has a 1/k probability of guessing the client's location.

Protocol 4 EXACT NN WITH PIR

1) Server - Bob uses the POIs S to create Voronoi tessellations of the space.

2) Bob arranges the database as a single-dimensional grid with n cells, $X_1, \cdots, X_n$. Here, the contents of each grid cell is an m-bit string.

3) User - Alice initiates a query in order to find the neighbour list of the cell she is in (for each bit in the neighbour list). Alice also randomly generates a composite modulus $N = p \cdot p'$ and sends N to Bob ($p, p'$ are large primes). Note that only Alice knows the factorization of N.

4) Bob sends the value of n.

5) Let Alice be located in $X_i$. Alice issues a PIR request $[y_1, \ldots, y_{\lceil n \rceil}]$ to Bob such that $y_i \notin QR_N$ and, for all $j \neq i$, $y_j \in QR_N$.

6) Bob prepares the PIR response, $z = [z_1, \cdots, z_{\lceil n \rceil}]$, for each bit of the m-bit string. So there will be m responses for each $z_i$ (denoted by $z_i[1], z_i[2], \cdots, z_i[m]$).

7) From the previous step, Alice gets $z_i[1], z_i[2], \cdots, z_i[m]$, of which Alice can only decrypt $z_a$. After decrypting $z_a$, Alice checks, for each bit $i$ ($i = 1, 2, \ldots, m$), whether $z_a[i] \in QR_N$ or $z_a[i] \in QR'_N$ using the formula $(z_a[i]^{\frac{p-1}{2}} = 1 \bmod p) \wedge (z_a[i]^{\frac{p'-1}{2}} = 1 \bmod p')$. For each bit, if $z_a[i] \in QR_N$, then Alice concludes that the bit is 0; else if $z_a[i] \in QR'_N$, the bit is 1. Alice repeats this process for each bit of the m-bit string. Once she does this, she can get the contents of her cell $X_i$. From the contents of $X_i$, she gets the list of neighbours (POIs) associated with $X_i$. Once Alice knows the set of POIs, she calculates their individual distances and finds her nearest neighbour.

8) Return the nearest neighbour.

Protocol 5 EXACT NN WITH OT


1) Server - Bob uses the POIs S to create Voronoi tessellations of the space.

2) Bob arranges the database as a single-dimensional grid with n cells, $X_1, \cdots, X_n$. Here, the contents of each grid cell is an m-bit string.

3) Alice calls $OT^{1}_{n}$CLIENT(i, Y), where i is the index of the cell Alice wants.

4) Bob executes $OT^{1}_{n}$SERVER(X) and prepares the response $Y = [Y_1, \cdots, Y_{\lceil n \rceil}]$ to return to Alice.

5) Alice computes $X_i$ from $Y = [Y_1, \cdots, Y_{\lceil n \rceil}]$ and, from the contents of $X_i$, gets her list of neighbours and computes the nearest neighbour based on minimum individual distance.
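The final client-side step shared by all the protocols, picking the nearest POI from the retrieved neighbour list, might look like the sketch below. It assumes POIs are given as planar coordinate tuples and uses Euclidean distance; the function name is hypothetical.

```python
import math


def nearest_neighbour(location, pois):
    """Return the POI with minimum Euclidean distance to the client's location."""
    if not pois:
        raise ValueError("the neighbour list is empty")
    return min(pois, key=lambda poi: math.dist(location, poi))
```

Because this step runs entirely on the client after the cell contents have been recovered, it reveals nothing further to the server.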

Protocol 6 EXACT NN WITH RANDOM OT


2) Bob divides the space into a $\sqrt{n} \times \sqrt{n}$ grid M, so the grid is represented by $(M_{1,1}, \cdots, M_{\sqrt{n},\sqrt{n}})$. The neighbour list for each grid cell is prepared. Here, the contents of each grid cell is an m-bit string. Also, Bob converts the $\sqrt{n} \times \sqrt{n}$ matrix M into an n-element string: $X = X_1, \cdots, X_n$.

3) User - Alice initiates a query in order to find the neighbour list of the cell she is in (for each bit in the neighbour list).

4) Bob sends the granularity of the grid.

5) Let Alice be located in $X_i$. Alice chooses $k - 1$ other random cells in the grid so she has a total of k indices, $i_1, i_2, \ldots, i_k$, where i (Alice's location) is one of the indices.

6) Alice then runs the $OT^{k}_{n}$CLIENT($i_1, \cdots, i_k$, Y) algorithm as given in Naor and Pinkas [23].

7) Bob runs the $OT^{k}_{n}$SERVER(k) algorithm as in [23] and returns $X_i$ and its associated nearest neighbours to Alice.

8) Alice computes the nearest neighbour from the list of neighbours based on minimum individual distance.
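Step 5 of the random OT protocol, in which the client hides its own cell among k - 1 decoys, can be sketched as follows. The function name and the 0-based indexing are assumptions for illustration; the decoys are drawn uniformly, and the 1/k guessing probability additionally assumes that the server has no prior knowledge about where the client is likely to be.

```python
import secrets


def choose_indices(i, n, k):
    """Return k distinct cell indices in [0, n), including i, in random order."""
    if not 0 <= i < n or not 1 <= k <= n:
        raise ValueError("need 0 <= i < n and 1 <= k <= n")
    rng = secrets.SystemRandom()
    others = [c for c in range(n) if c != i]
    picked = rng.sample(others, k - 1) + [i]
    rng.shuffle(picked)  # hide the position of the real index in the request
    return picked
```

Shuffling matters here: if the real index were always sent last, the server could identify it regardless of how the decoys were chosen.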

6 DEFINITIONS, PROOF AND COMPLEXITY

In this section, we define the security properties that are required for the client and server in the two-phase framework and then show that our protocols satisfy them. PIR and OT have been individually proven secure [19], [23], and we do not reproduce the proofs here. We need to prove that our combination of PIR and OT retains the relevant security properties. Additionally, we assume that the OT protocol in [23] uses a secure 1-of-2 OT protocol. Our definitions are based on the concept of computational indistinguishability and we assume our protocols are executed in the presence of static, semi-honest adversaries. Both of these concepts are defined below as in [8].

Computational indistinguishability: Let there be two distribution ensembles, $X = \{X_k\}_{k \in \mathbb{N}}$ and $Y = \{Y_k\}_{k \in \mathbb{N}}$, where k is a security parameter. We say that X and Y are computationally indistinguishable if, for every non-uniform polynomial-time circuit family $\{C_k\}_{k \in \mathbb{N}}$ (also known as a distinguisher), every positive polynomial $p(\cdot)$, and all sufficiently large k, it holds that:

$$\left| \Pr[C_k(X_k) = 1] - \Pr[C_k(Y_k) = 1] \right| < \frac{1}{p(k)}$$

Definition 6.1 (Secure function evaluation): Let $f : (\{0,1\}^*)^m \to (\{0,1\}^*)^m$ be an m-ary functionality, where $f_i(x_1, \cdots, x_m)$ denotes the i-th element of $f(x_1, \cdots, x_m)$. For $I = \{i_1, \cdots, i_t\} \subseteq [m] \stackrel{\mathrm{def}}{=} \{1, \cdots, m\}$, let $f_I(x_1, \cdots, x_m)$ denote the subsequence $f_{i_1}(x_1, \cdots, x_m), \cdots, f_{i_t}(x_1, \cdots, x_m)$. Let $\Pi$ be an m-party protocol for computing f. The view of the i-th party during an execution of $\Pi$ on $x = (x_1, \cdots, x_m)$ is denoted by $\mathrm{view}^{\Pi}_{i}(x)$, and for $I = \{i_1, \cdots, i_t\}$, we let $\mathrm{view}^{\Pi}_{I}(x) \stackrel{\mathrm{def}}{=} (I, \mathrm{view}^{\Pi}_{i_1}(x), \cdots, \mathrm{view}^{\Pi}_{i_t}(x))$. Let $\stackrel{c}{\equiv}$ denote computational indistinguishability by non-uniform families of polynomial-time circuits. We say that $\Pi$ privately computes f if there exists a probabilistic polynomial-time algorithm denoted S, such that for every $I \subseteq [m]$, it holds that

$$\{(S(I, (x_{i_1}, \cdots, x_{i_t}), f_I(x)), f(x))\}_{x \in (\{0,1\}^*)^m} \stackrel{c}{\equiv} \{(\mathrm{view}^{\Pi}_{I}(x), \mathrm{output}^{\Pi}(x))\}_{x \in (\{0,1\}^*)^m}$$

Informally put, this says that the view of all the parties in I can be efficiently simulated solely based on their inputs and outputs.

6.1 Definition of privacy properties

Definition 6.2 gives the formal definition of the desired privacy properties, but these properties also have simple intuitive descriptions. The Correctness property states that if the client and server are honest and the client requests the neighbours of a grid cell i, then the client will learn the neighbours of cell i. The Client's Privacy property states that the server's view of the transaction when the client requests the neighbours of cell i is computationally indistinguishable from the case where the client requests the neighbours of a different cell i′. The Server's Privacy property states that the client's view of the transaction can be completely simulated by a polynomial-time simulator that is given access to just the client's inputs and output, such that the simulated and real executions are computationally indistinguishable. This implies that a corrupted client cannot gain any information that it is not meant to learn.

6.1.1 Formal definition

Definition 6.2 (EXACT NN Security Properties): As before, we let $M_{i,j}$ for $1 \leq i, j \leq \sqrt{n}$ be the cells in the matrix M, where each cell contains the list of Voronoi regions that intersect the cell and is exactly m bits long (with padding added if necessary). Let k be a security parameter that determines, among other things, the length of the keys in the cryptographic constructs that are used in our protocols; the size of the input (nm) should also be bounded by a polynomial in k. Let $I = (a, b)$ denote the client's location, so the full input to the EXACT NN computation is $X = (I, M)$, with I being known to the client and M being known to the server. For an EXACT NN protocol $\Pi$, let $\mathrm{view}^{\Pi}_{s}(I, M)$ and $\mathrm{view}^{\Pi}_{c}(I, M)$ denote the server's and client's views of the protocol, respectively (note that since $\Pi$ can be randomized, these are actually random variables, or ensembles indexed by (I, M)). Similarly, let $\mathrm{output}^{\Pi}_{c}(I, M)$ denote the output of the client when running protocol $\Pi$ (note that while the server interacts with the client, it does not have an "output" or result of its own). Given these definitions, an EXACT NN protocol $\Pi$ is secure if the following properties hold:

1) Correctness: For honest client c and honest server s interacting as defined by $\Pi$,

$$\Pr\left[\mathrm{output}^{\Pi}_{c}(I, M) \neq M_I\right] \leq \frac{1}{p(k)}$$

for any polynomial p(k) and sufficiently large k.

2) Client's Privacy: For any PPT server s (not necessarily the honest server specified by protocol $\Pi$), there exists a polynomial time simulator $S_s$ such that

$$S_s(M) \stackrel{c}{\equiv} \mathrm{view}^{\Pi}_{s}(I, M).$$

Note that since the simulation $S_s(M)$ operates independently of I, it follows that for every $I, I' \in \{1, \ldots, \sqrt{n}\} \times \{1, \ldots, \sqrt{n}\}$,

$$\mathrm{view}^{\Pi}_{s}(I, M) \stackrel{c}{\equiv} \mathrm{view}^{\Pi}_{s}(I', M).$$

3) Server's Privacy: For any PPT client c (not necessarily the honest client specified by protocol $\Pi$), there exists a polynomial time simulator $S_c$ such that

$$S_c(I, M_I) \stackrel{c}{\equiv} \mathrm{view}^{\Pi}_{c}(I, M).$$

Note that $S_c$ is given only c's input I and the correct output $M_I$, so there is no information leaked about the other cells of M.

6.2 Theorem

As a starting point for our constructions, we assume that we are given secure protocols for PIR and OT. In our protocols we use the PIR protocol due to Kushilevitz and Ostrovsky (KO) [19] and the OT protocol due to Naor and Pinkas (NP) [23]. We need to prove that our combination of PIR and OT preserves the security properties of the original protocols. The two primary security proof models for cryptographic protocols are the simulation-based model and the reduction model. Sequentially composing two proofs in the simulation model is typically straightforward; however, the KO PIR protocol was previously proved secure in the reduction model. Hence, we first convert the PIR reduction proof of KO to a simulation-based proof, and then sequentially compose it with the simulation-based OT proof of Naor and Pinkas. We note here that the security of the PIR+PIR scheme follows directly from the security of the recursive PIR scheme presented in [19] and is not discussed further.

Readers can either refer to Section 3 of the KO paper for the description of the PIR scheme or refer to Section 3 of this paper. We re-write the KO proof in the simulation model below. Let Bob be the server with inputs $(M_{1,1}, \cdots, M_{\sqrt{n},\sqrt{n}})$ arranged in a $\sqrt{n} \times \sqrt{n}$ matrix, and let Alice be a querying client who wants element $M_{a,b}$.

Theorem 6.1: The PIR protocol as described in the KO paper preserves the privacy of Alice against a corrupt static, semi-honest PPT server, Bob.

Proof: The correctness property follows directly from the description of the PIR scheme. For protecting against a corrupt Bob (or for preserving Alice's privacy), we need to construct a simulator for Bob, $B(M_{1,1}, \cdots, M_{\sqrt{n},\sqrt{n}})$, who can simulate Bob's view of the protocol. B first picks two f/2-bit primes, multiplies them and gets an f-bit modulus N. Since B knows the factorization of N, B can easily generate a vector of fake values: $y'_1, \cdots, y'_{\sqrt{n}} \in Z^{+1}_{N}$. This is not the same as Alice's input $y_1, \cdots, y_{\sqrt{n}}$ in the real, non-simulated protocol, but is nevertheless computationally indistinguishable from the real y values since N belongs to a hard set for which quadratic residuosity predicates are hard to determine. It is easy to see that B can generate the $z_1, \cdots, z_{\sqrt{n}}$ vector, which would be indistinguishable from the real Bob's z vector. Hence B can completely simulate the view of Bob.

Since this is PIR, not OT, we do not have to account for a corrupted Alice, since PIR only preserves Alice's privacy, not Bob's. Hence the proof.

Next follows our main theorem.

Theorem 6.2: Using the KO PIR protocol and the NP OT protocol, the PIR+OT and OT+PIR protocols given in Section 4 are secure EXACT NN protocols as defined in Definition 6.2.

Proof: We consider the individual properties from Definition 6.2 below.

1) Correctness: Follows directly from the correctness of PIR and OT.

2) Client's privacy: We construct a simulator for the server (Bob), $S_s(M)$, by combining Bob's simulators for PIR and OT: $B_{PIR}$ from Theorem 6.1 and $B_{OT}$. If we combine the simulators in the order $(B_{PIR}, B_{OT})$, i.e., $B_{PIR}$ goes first, then since Alice's contribution to the OT protocol is independent of the result of the PIR, we simply concatenate $B_{PIR}$ and $B_{OT}$ as the simulation of our PIR+OT protocol (Bob computes the $z_1, \cdots, z_{\sqrt{n}}$ vector, so it can be given as input to the $B_{OT}$ simulator). If we combine the simulators in the order $(B_{OT}, B_{PIR})$, i.e., $B_{OT}$ goes first, $B_{OT}$'s output is $\sqrt{n}$ encrypted vectors, each corresponding to one column of the grid: $E(y_1), \cdots, E(y_{\sqrt{n}})$. The encrypted vectors are then given to the $B_{PIR}$ simulator, which then simulates the rest of the PIR protocol over the encrypted matrix instead of the plaintext matrix.

Since the outputs of the PIR and OT simulators are individually indistinguishable from the views of the real protocols, and since Alice's contributions to the two protocols are independent, the concatenated simulated view is indistinguishable from the view of the sequentially composed real protocols.

In both cases (PIR+OT and OT+PIR) we have created a simulator for the server's view, and therefore Alice's privacy is preserved in the presence of a corrupted Bob.

3) Server's privacy: For a client c (i.e., Alice), we need to construct a simulator $S_c(I, M_I)$ that completely simulates Alice's view of the transaction. We assume we have a simulator $A_{OT}$ that simulates Alice's view of the OT protocol. We do not have an $A_{PIR}$ simulator since PIR by itself is not secure against a malicious Alice. For PIR+OT, where PIR is performed first, Bob performs the first half and does not return anything to the real Alice, so $S_c$ does not need to simulate anything. In the second half, $S_c$ can easily simulate Alice's part by running the $A_{OT}$ simulator. If the OT keys are generated using a secure PRF and if a secure $OT^{1}_{2}$ exists, the output of $S_c$ will be computationally indistinguishable from the real Alice's output.

In OT+PIR, where OT goes first, $S_c$ does not have to do anything for OT, since Bob does not return any result in the first half to the real Alice. $S_c$ needs to simulate Alice's view in the second half, PIR. For this, $S_c$ randomly generates a vector $y'_1, \cdots, y'_{\sqrt{n}}$ and the corresponding $z'_1, \cdots, z'_{\sqrt{n}}$ vector, which will be fake strings, but indistinguishable from the real y and z vectors because of the quadratic residuosity assumption. Hence one can construct a simulator for Alice, and Bob's privacy is preserved in the presence of a corrupted Alice.

Hence any combination of PIR and OT is sequentially composable.

6.3 Complexity discussion

The computation and communication complexity of the single-phase and two-phase protocols are shown in Table 2, Table 3, Table 4, and Table 5. In the analysis, we will assume that the location database is organized as a $\sqrt{n} \times \sqrt{n}$ grid with the contents of each grid cell being m bits long. The length of the PIR modulus is f bits and the length of the keys used in OT is g bits. In the following, we use polynomials p(f) and p(g) to denote the cost of basic computations on f-bit and g-bit values; at worst these are modular powering operations, and so $p(x) = O(x^3)$. The costs of basic operations used in the analysis are given below:

1) The computation cost of a user preparing a PIR request is $O(p(f)\sqrt{n})$. This is the cost of a user preparing the $y = [y_1, \cdots, y_{\sqrt{n}}]$ vector.

2) The server computation cost for PIR is $O(p(f)mn)$. This is the cost of the server computing the $z = [z_1, \cdots, z_{\sqrt{n}m}]$ vector.

3) The computation cost of a user decrypting the z array received from the server as the PIR response is $O(p(f)m\sqrt{n})$.

4) The communication cost of PIR is $O(fm\sqrt{n})$.

5) The user computation cost for OT is $O(p(g) \log n)$: in the last step of OT, the user needs to decrypt the server's response $\log n$ times, and g is the length of the keys used.

6) The server computation cost for OT is $O(p(g)n)$: the server needs to encrypt the entire database, where g is the length of the keys being used.

7) The communication cost of OT is $O(gn)$, since the server sends the entire encrypted database over to the user. Note that there is also the cost of $\log n$ 1-of-2 OTs, but this cost of $O(g \log n)$ is dominated by the database communication cost.

Analysis: The single-phase PIR and single-phase OT analyses, given in Table 2 and Table 3, are just direct applications of the above costs. For analyzing the computation cost of random OT, we replace n cells by k cells. For the communication cost of random OT, the main cost is due to sending k encrypted cells from the server, with other communication (initial communication of k cells and the cost of doing 1-of-2 OTs) being insignificant in comparison. For the two-phase protocols, the computation and communications analysis is briefly explained below.

TABLE 2
Computational cost for single-phase protocols

| Party | PIR | OT | Random OT |
|---|---|---|---|
| User | $O(p(f)m\sqrt{n})$ | $O(p(g) \log n)$ | $O(p(g) \log k)$ |
| Server | $O(p(f)mn)$ | $O(p(g)n)$ | $O(p(g)k)$ |
| Total time | $O(p(f)mn)$ | $O(p(g)n)$ | $O(p(g)k)$ |

TABLE 3
Communication cost for single-phase protocols

| PIR | OT | Random OT |
|---|---|---|
| $O(fm\sqrt{n})$ | $O(gn)$ | $O(gk)$ |

Computation analysis: In the two-phase PIR (PIR+PIR) computation analysis, we consider the server's computation cost for performing PIR twice and the user's computation cost for preparing the request and decrypting the server's PIR response. In the PIR+OT computation analysis, we consider the user's computation cost of preparing the PIR request, and the server's cost of performing the PIR computation and encrypting the result of the first phase (PIR) using OT, besides generating the key pairs for OT. Also, we consider the user's computation cost for decrypting the final result of the OT that the server sends to the user. For the OT+PIR computation analysis, we consider the server's cost of encrypting the database and generating keys in the first step for OT, and the user's cost for decrypting the final PIR response that the server sends. The computation cost of the two-phase protocols is given in Table 4.

Communication analysis: For the communication cost, in the PIR+PIR protocol, we consider the cost of the user sending the initial PIR request vector and the server sending back the final PIR response. For PIR+OT, we consider the user sending the initial PIR vector, the server and user doing the 1-of-2 OTs for the user to get their keys, and the cost of the server sending the encrypted OT data to the user. For the OT+PIR analysis, we consider the cost of the user and server performing 1-of-2 OTs for the user to pick their keys and the cost of the server sending the final PIR response to the user. The communication cost of the two-phase protocols is given in Table 5.

TABLE 5
Communication cost for two-phase protocols

| PIR+PIR | PIR+OT | OT+PIR |
|---|---|---|
| $O(fm\sqrt{n})$ | $O(gm\sqrt{n})$ | $O(fg\sqrt{n})$ |

The main advantage of our two-phase scheme over single-phase PIR is that the amount of data revealed by the server is very low. Hence if we have a server that allows the user to query some part of the database, but does not want to reveal the entire database, it can do so more effectively with our protocol than with single-phase PIR (as described in [6] for square grids). It is possible to achieve this with PIR over a single-dimensional list or array of data (not a grid), but that would be too expensive. Additionally, this also reduces the burden on the client to sort through redundancies in the data the server sends to find the data element that the client is looking for. Also, one can use OT over the entire grid and achieve the same level of security, but the cost of doing this is significantly higher than the two-phase approach ($\Theta(gn)$ in single-phase OT vs. $\Theta(g\sqrt{n})$ in two-phase). Random OT has a lower cost than the single and two-phase approaches, but provides a lower degree of privacy: the server has a 1/k probability of guessing the client's location.
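To give a feel for the gap between $\Theta(gn)$ and $\Theta(g\sqrt{n})$, the sketch below evaluates both leading terms for a concrete grid. It is only a back-of-the-envelope illustration: constant factors and the cell length m are dropped, the function names are hypothetical, and n is assumed to be a perfect square.

```python
import math


def single_phase_ot_cost(n, g):
    """Leading term g * n: the whole encrypted database is sent."""
    return g * n


def two_phase_ot_cost(n, g):
    """Leading term g * sqrt(n): only one column's worth of data is sent."""
    root = math.isqrt(n)
    if root * root != n:
        raise ValueError("n is assumed to be a perfect square")
    return g * root


def saving_factor(n, g):
    """Ratio of the two leading terms, which equals sqrt(n)."""
    return single_phase_ot_cost(n, g) // two_phase_ot_cost(n, g)
```

For a 100 x 100 grid (n = 10,000) and 128-bit keys, the leading terms are 1,280,000 and 12,800 bits, so the two-phase approach is smaller by a factor of $\sqrt{n} = 100$ under these simplifications.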

7 EXPERIMENTS

We used Java to implement the two-phase and single-phase frameworks and used the Stopwatch library to instrument timing measurements in the code; the numbers given in the experiments are all actual execution times. For measuring computation time at the server and user, we averaged the time taken over 100 queries. We needed two datasets to use in our experiments: a user location dataset and a Points of Interest (POIs) dataset, for which we used a set of spatial datasets provided by Li et al. [21], a real dataset of a road network and POIs in California. The number of user locations in our experiments ranges from 0-100K. For generating user locations, we partially used the California road network dataset [21], which contains 21,048 locations, and the rest were randomly generated by us. Each user location corresponds to a grid cell. The POIs in our experiments range from 0-100K, for which we used the California POI dataset provided in [21], which contains 104,770 POIs. The dataset we used contains POIs from 62 categories such as school, church, airport, beach, etc. In the experiments, for computing the user's nearest neighbour, we measured the Euclidean distance from the user locations to the POIs and considered all POIs in a single category. In particular, in the set of experiments where we keep the number of POIs constant and vary the number of grid cells, we set an upper limit on the number of POIs (1000). The 1000 POIs can be taken from any category as long as they are closest to the user's location. This can easily be extended to a more restricted case where the user's neighbours are from a specific category of POIs by reducing the limit on the POIs returned by the server and choosing POIs only from a specific category. For the restricted case, we estimate the costs will be less than the ones reported in this paper.

Since PIR requires large integers, we used the Java BigInteger data type in the PIR implementation. We measured the performance of the three protocols in our two-phase framework and the performance of the three protocols in the single-phase framework in terms of computational and communication cost. In our experiments, we varied the length of the modulus for PIR, N, from 576-1536 bits (the size of a public key) and varied the number of POIs and grid cells from 10K to 100K. In our implementation, we used Kushilevitz and Ostrovsky's version of PIR [19] and Naor and Pinkas's version of 1-of-n OT [23]. In the experiments, we have compared our two-phase protocols with protocols that offer a similar degree of privacy (but with different costs), and hence do not compare them with approaches such as k-anonymity or data perturbation, which do not offer the same level of privacy that we are offering (total non-disclosure of user/server data). The comparison benchmarks we use are the protocol in [6] and our own implementation of Oblivious Transfer.

First we compare the server's computation time for the two-phase protocols (PIR+PIR, PIR+OT, OT+PIR) and the single-phase protocols (PIR as proposed in [6], OT, Random OT) for varying grid sizes from 10K to 100K, the results of which are shown in Figure 3 and Figure 4. For this set of experiments, we kept the number of POIs per cell constant at 1000 (since we were varying the grid size). In the random OT protocol, we varied the number of random cells chosen by the user from 2K for a 10K grid to 20K for a 100K grid. We can see that PIR+PIR and PIR+OT have similar times while OT+PIR requires significantly more time. This is due to the fact that the server needs to encrypt the entire database column-by-column in OT+PIR, whereas in PIR+OT, the server just has to encrypt one column. The two-level protocols are, as expected, more expensive than the single-level PIR and random OT but also offer more privacy. In particular, the client gets only their cell and its neighbours in the two-level protocols, but using single-level PIR, it would get the entire grid column, and using random OT, the server would have a 1/k probability of guessing the client's location, where k is the number of cells the client chooses. The two-phase protocols provide a higher level of privacy at the expense of increasing the computation time on the server's side. The cost of single-phase OT is much higher than the two-phase protocols though, which confirms our hypothesis that a 1-of-n OT over the entire grid would be too expensive, although it provides the same level of privacy.

TABLE 4
Computational cost for two-phase protocols

| Party | PIR+PIR | PIR+OT | OT+PIR |
|---|---|---|---|
| User | $O(p(f)m\sqrt{n})$ | $O(p(f)m\sqrt{n} + p(g)m \log \sqrt{n})$ | $O(p(f)g\sqrt{n})$ |
| Server | $O(p(f)mn)$ | $O(p(f)mn + p(g)m\sqrt{n})$ | $O(p(g)n + p(f)g\sqrt{n})$ |
| Total time | $O(p(f)mn)$ | $O(p(f)mn + p(g)m\sqrt{n})$ | $O(p(g)n + p(f)g\sqrt{n})$ |

Fig. 3. Server computation time with varying grid size in two-phase protocols. [Plot: x-axis, number of grid cells (10K to 100K); y-axis, server time (s); series: PIR+PIR, PIR+OT, OT+PIR.]

Fig. 4. Server computation time with varying grid size in single-phase protocols. [Plot: x-axis, number of grid cells (10K to 100K); y-axis, server time (s); series: PIR, OT, Random OT.]

We next compared the server’s computation time for the two-phase protocols with that of the single-phase protocols for varying modulus sizes. The modulus N used in PIR is the size of a public key (between 576 and 1536 bits). As the modulus grows it becomes harder to factor and hence provides higher security, but it also incurs more computation cost. Figure 5 shows that even for a modulus of 1536 bits, which is reasonably large, the computation time is a little over a minute in the two-phase protocols. We estimate that in applications such as LBS, the typical size of the modulus would be around 768 bits. Figure 6 shows the computation time for single-phase PIR, which is less than the time for the two-phase protocols since we perform computations on the modulus just once. Also, the modulus is used only in PIR, hence we have not shown graphs for single-phase OT and random OT. In the two-phase protocols, this is reflected in the fact that the times for PIR+OT and OT+PIR do not increase as much as that for PIR+PIR.
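The dependence of cost on modulus length can be explored with a small micro-benchmark. The sketch below is an assumption-laden illustration rather than the measurement harness used for Figures 5 and 6: it times repeated modular multiplications in Python with a random odd integer of the given bit length standing in for a real modulus, and the function name and repetition count are hypothetical.

```python
import secrets
import time


def time_modmul(bits: int, reps: int = 2000) -> float:
    """Seconds taken by `reps` modular multiplications with a `bits`-bit odd modulus."""
    # Random odd integer with the top bit set; NOT a product of two primes.
    modulus = secrets.randbits(bits) | (1 << (bits - 1)) | 1
    x = secrets.randbelow(modulus - 2) + 2
    acc = 1
    start = time.perf_counter()
    for _ in range(reps):
        acc = acc * x % modulus
    return time.perf_counter() - start


# Modulus lengths taken from the experiments above.
timings = {bits: time_modmul(bits) for bits in (576, 768, 1024, 1536)}
for bits, seconds in timings.items():
    print(f"{bits:5d}-bit modulus: {seconds * 1e3:.2f} ms")
```

Absolute timings from such a loop depend on the machine and the big-integer library, so only the trend across modulus lengths is meaningful.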

Fig. 5. Server computation time with varying modulus size in two-phase protocols. [Plot: x-axis, number of bits in modulus (576, 768, 1024, 1536); y-axis, server time (s); series: PIR+PIR, PIR+OT, OT+PIR.]

Fig. 6. Server computation time with varying modulus size in single-phase protocol. [Plot: x-axis ticks 576, 768, 1024, 1536; y-axis, server time (s); series: PIR.]

Figure 7 shows the communication cost in Kb for the two-phase protocols with the number of POIs returned increasing linearly from 10K to 100K. The communication cost here denotes the amount of data returned from server to client. The communication cost for PIR+PIR is the lowest since the server just has to send the PIR response vector (the z vector) back to the client. For PIR+OT and OT+PIR, it is slightly higher since the server needs to send an encrypted column back to the user and also perform a 1-of-2 OT for exchanging keys. Figure 8 shows the communication cost in MB for POIs varying from 10K to 100K for the single-phase protocols. We note that the communication cost for the single-phase protocols is measured in MB rather than Kb as in the two-phase protocols; this is one of the major points of difference between the two-phase and single-phase protocols. In single-phase PIR, the communication cost is higher since the server has to send extraneous data to the client. In single-phase OT, we do a 1-of-n OT over the entire database, hence the server and client have to perform log n 1-of-2 OTs for the client to get its keys, besides the server having to send the entire encrypted database to the client. In the two-phase protocols where we use 1-of-n OT, the OT is performed over a single column, and the amount of encrypted data and keys the server has to send the client is much less than in OT over the entire database. In random OT, the server has to send k encrypted cells of the grid to the client and keys for decrypting a single cell out of the k cells.
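The log n figure comes from the way a 1-of-n OT can be built from 1-of-2 OTs, as in the construction of [23]: the server masks every item under one key per bit of its index, and the client obtains one key per bit position. The Python sketch below illustrates only this key structure. It is not private as written, because the 1-of-2 OT step is replaced by a plain lookup, and the SHA-256-based mask function and all names are assumptions made for the example.

```python
import hashlib
import secrets


def prf(key: bytes, index: int, length: int) -> bytes:
    """Stand-in PRF (assumption): SHA-256 over key, index and a counter."""
    out = b""
    counter = 0
    while len(out) < length:
        block = key + index.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def server_encrypt(items: list[bytes]):
    """Mask item i with the keys selected by the bits of i; return key pairs and masked items."""
    bits = max(1, (len(items) - 1).bit_length())  # ceil(log2 n) key pairs
    key_pairs = [(secrets.token_bytes(16), secrets.token_bytes(16)) for _ in range(bits)]
    masked = []
    for i, item in enumerate(items):
        pad = bytes(len(item))
        for j in range(bits):
            pad = xor(pad, prf(key_pairs[j][(i >> j) & 1], i, len(item)))
        masked.append(xor(item, pad))
    return key_pairs, masked


def client_fetch(choice: int, key_pairs, masked: list[bytes]):
    """Recover masked[choice]; returns the item and the number of keys used."""
    # In the real protocol each of these keys is obtained through one 1-of-2 OT.
    keys = [pair[(choice >> j) & 1] for j, pair in enumerate(key_pairs)]
    ciphertext = masked[choice]
    pad = bytes(len(ciphertext))
    for key in keys:
        pad = xor(pad, prf(key, choice, len(ciphertext)))
    return xor(ciphertext, pad), len(keys)


# Hypothetical 16-item database.
items = [f"poi-{i:02d}".encode() for i in range(16)]
key_pairs, masked = server_encrypt(items)
value, keys_used = client_fetch(11, key_pairs, masked)
print(value, keys_used)
```

With this structure, a database of 10,000 items needs 14 key transfers, whereas a single column of 100 items needs only 7, which is the log n versus log √n gap discussed above.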

Fig. 7. Server communication cost in two-phase protocols. [Plot: x-axis, number of POIs (10K to 100K); y-axis, server communication cost in Kb; series: PIR+PIR, PIR+OT, OT+PIR.]

Fig. 8. Server communication cost in single-phase protocols. [Plot: x-axis, number of POIs (10K to 100K); y-axis, server communication cost in MB; series: PIR, OT, Random OT.]

We next compare the user computation time with an increasing number of POIs (10K to 100K) in the two-phase and single-phase protocols. We can see from Figure 9 that the user computation time for PIR+OT and OT+PIR is slightly higher than for PIR+PIR, since the user needs to decrypt its grid cell and the cell’s nearest neighbours from the data items returned by the server, using the keys obtained from the server through 1-of-2 OT. In the case of PIR+PIR, the user does not have to perform any decryptions: it just needs to check whether the bits of the string returned by the server are quadratic residues or not. In the single-phase protocols in Figure 10, the user computation time is a bit higher for single-phase PIR than for two-phase PIR, since the user has to perform the quadratic residue/non-residue check for a few more cells than in the two-level case. The time taken for single-phase OT is significantly higher than for the two-phase protocols, since the user needs to perform decryptions over log n elements as opposed to log √n in the two-phase protocols that involve OT. The user computation time in random OT is less than in single-phase OT, but this depends on the choice of k in the random OT. If k is small, the cost will be lower, but the privacy offered will also be lower. The largest value of k is n; if we set k = n, the cost will be the same as for single-phase OT. This confirms that a single-level OT, while offering the same level of privacy as the two-level protocols, is much more expensive than the two-level approaches.

Fig. 9. User computation time with varying number of POIs in two-phase protocols. [Plot: x-axis, number of POIs (10K to 100K); y-axis, user time (s); series: PIR+PIR, PIR+OT, OT+PIR.]

Fig. 10. User computation time with varying number of POIs in single-phase protocols. [Plot: x-axis, number of POIs (10K to 100K); y-axis, user time (s); series: PIR, OT, Random OT.]

Figure 11 shows the user computation time with a varying modulus N, from 576 bits to 1536 bits. This is relevant only in the case of PIR+PIR since OT does not use a modulus. In the two-phase protocols, PIR+PIR shows significant variation while the PIR+OT and OT+PIR times are almost constant. Figure 12 shows the user computation time in the single-phase protocols with the modulus varying from 576 to 1536 bits; for two-phase PIR, this is 2-3 seconds less than for single-phase PIR.

Fig. 11. User computation time with varying modulus size in two-phase protocols. [Plot: x-axis ticks 576, 768, 1024, 1536; y-axis, user time (s); series: PIR+PIR, PIR+OT, OT+PIR.]

Fig. 12. User computation time with varying modulus size in single-phase protocol. [Plot: x-axis ticks 576, 768, 1024, 1536; y-axis, user time (s); series: PIR.]

Figure 13 and Figure 14 show the user computation time for the two-phase protocols and single-phase protocols, respectively, with varying size of OT keys. The keys used in OT are symmetric keys with sizes ranging from 80 to 256 bits. We can see that in the two-phase protocols, the times for PIR+OT and OT+PIR are almost the same, ranging from 1 to 13 seconds. For single-phase OT, the user computation time goes from 1 to 25 seconds. This is due to the extra decryptions performed on the user’s side (log n decryptions in the single-phase protocol as compared to log √n in the two-phase approaches). We measured only the user’s computation time when varying the key size, since in typical LBS scenarios the user’s device would be a mobile phone, PDA or other resource-constrained device, while the server would be a more powerful machine. Hence one would be more interested in minimizing the computation cost on the user’s side rather than on the server’s side.
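As a rough check on the decryption counts quoted above, the relation between the two logarithms can be written out explicitly. The numerical example assumes base-2 logarithms and the smallest grid size used in the experiments (n = 10K cells); both are choices made for illustration only.

```latex
\[
\log \sqrt{n} \;=\; \tfrac{1}{2}\,\log n ,
\qquad\text{e.g. for } n = 10^{4}:\quad
\log_2 n \approx 13.29 , \qquad
\log_2 \sqrt{n} \;=\; \log_2 100 \approx 6.64 .
\]
```

The two-phase protocols therefore need about half as many decryptions as single-phase OT, which is in line with the roughly twofold gap between the upper ends of the reported ranges (13 seconds versus 25 seconds).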

Fig. 13. User computation time with varying OT key size in two-phase protocols. [Plot: x-axis, size of OT keys (80, 128, 160, 256); y-axis, user time (s); series: PIR+OT, OT+PIR.]

Fig. 14. User computation time with varying OT key size in single-phase protocol. [Plot: x-axis, size of OT keys (80, 128, 160, 256); y-axis, user time (s); series: OT, Random OT.]

The three two-phase protocols (PIR+PIR, PIR+OT and OT+PIR) have a slightly higher (30-50 seconds) server computation cost than single-phase PIR but offer an extra layer of security. The same level of security can be obtained by using a 1-of-n OT, the cost of which is too high. One can use random OT as a middle ground between the conflicting goals of providing privacy and minimizing cost, but it does not provide the 100% privacy that the other protocols do. Also, we note that the communication cost of the two-phase protocols is far lower than that of the single-phase protocols.

8 DISCUSSION AND FUTURE WORK

In this paper, we have proposed a way to achieve user privacy in location-based services, while ensuring that the user does not learn information about any location in the server’s database other than its own. We have also defined the privacy properties desired of such protocols in general and provided a proof sketch for our protocols. Our experiments show that the two-phase protocols using PIR and OT have reasonable costs, with most operations taking around a minute, and are thus feasible to use in real-world LBS applications.

One way our framework can be improved is by using a more efficient OT protocol such as the one presented by Gunupudi and Tate [11], which makes the OT protocol non-interactive, as opposed to the interactive OT used in this paper. Using a non-interactive OT protocol would significantly decrease the communication cost of the OT operation, but it requires extra hardware security assumptions in the form of a Trusted Platform Module (TPM) chip [9] on the server’s side. In this paper we do not consider the hardware-assisted model of computation and hence have not used [11]. Previous work has explored the idea of hardware-assisted PIR by requiring the system to be augmented with expensive, computationally powerful secure co-processors. It would be interesting to explore the idea of realizing hardware-assisted PIR using cost-effective (but computationally weak) commodity hardware chips such as the TPM or smartcards [20], [14], [13]. One can also extend our framework beyond nearest neighbour queries to queries such as k-nearest neighbour or other spatial queries. One direction of future work that we are currently exploring is to consider different models of communication in LBS, such as the peer-to-peer model, and to investigate the application of cryptographic protocols to provide privacy solutions for spatial queries in them, such as group nearest neighbour queries.

REFERENCES

[1] G. Brassard, C. Crepeau, and J.M. Roberts. All or nothing disclosures of secrets. In CRYPTO, pages 234–238, 1986.

[2] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private Information Retrieval. In Proc. of the IEEE FOCS, pages 41–50, 1995.

[3] C. Y. Chow, M. Mokbel, and X. Liu. A peer-to-peer spatial cloaking algorithm for anonymous location-based services. In Proc. of the ACM GIS, pages 247–256, 2006.

[4] B. Gedik and L. Liu. Privacy in mobile systems: A personalized anonymization model. In Proc. of ICDCS, pages 620–629, 2005.

[5] G. Ghinita, P. Kalnis, M. Kantarcioglu, and E. Bertino. A hybrid technique for private location-based queries with database protection. In Proc. of SSTD, pages 98–116, 2009.

[6] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.L. Tan. Private queries in location based services: Anonymizers are not necessary. In Proc. of the ACM SIGMOD, pages 121–132, 2008.

[7] G. Ghinita, P. Kalnis, and S. Skiadopoulos. PRIVE: Anonymous location-based queries in distributed mobile systems. In Proc. of the 1st Int. Conference on World Wide Web (WWW), pages 371–380, 2007.

[8] Oded Goldreich. Foundations of Cryptography, volume 2: Basic Applications. Cambridge University Press, New York, First edition, 2001.

[9] Trusted Computing Group. Trusted Platform Module Specifications – Parts 1–3. Available at https://www.trustedcomputinggroup.org/specs/TPM/.

[10] Marco Gruteser and Dirk Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proc. of MobiSys, pages 31–42, 2003.

[11] Vandana Gunupudi and Stephen R. Tate. Generalized non-interactive oblivious transfer using count-limited objects with applications to secure mobile agents. In Proc. of Financial Cryptography, pages 98–112, 2008.

[12] U. Hengartner. Location privacy based on trusted computing and secure logging. In SecureComm: Proc. of the 4th international conference on security and privacy in communication networks, pages 1–8, 2008.

[13] Entrust Inc. Entrust identityguard. Available at http://www.entrust.com/strong-authentication/identityguard/.

[14] Verisign Inc. Verisign e-tokens. Available at http://www3.safenet-inc.com/news/2004/etoken/verisign-otp.aspx.


[15] P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias. Preventing location-based identity inference in anonymous spatial queries. In Proc. of the IEEE TKDE, pages 1719–1733, 2007.

[16] A. Khoshgozaran and C. Shahabi. Blind evaluation of nearest neighbour queries using spatial transformation to preserve location privacy. In Proc. of SSTD, pages 239–257, 2007.

[17] A. Khoshgozaran and C. Shahabi. Privacy in location-based applications, research issues and emerging trends. In Privacy in Location-Based Applications, volume 5599, pages 59–83, 2009.

[18] A. Khoshgozaran, C. Shahabi, and H. Shirani-Mehr. Location privacy: going beyond k-anonymity, cloaking and anonymizers. Knowledge and Information Systems KAIS, 26:435–465, 2011.

[19] E. Kushilevitz and R. Ostrovsky. Replication is not needed: single database, computationally private information retrieval. In Proc. of IEEE FOCS, pages 364–373, 1997.

[20] RSA Labs. RSA SecurID authenticators. Available at http://www.rsa.com/node.aspx?id=1311.

[21] Feifei Li, Dihan Cheng, Marios Hadjieleftheriou, George Kollios, and Shang hua Teng. On trip planning queries in spatial databases. In Proc. of SSTD, pages 273–290, 2005.

[22] M. Mokbel, C. Chow, and W. Aref. The new casper: Query processing for location services without compromising privacy. In Proc. of VLDB, pages 219–229, 2006.

[23] Moni Naor and Benny Pinkas. Computationally secure oblivious transfer. Journal of Cryptology, pages 1–35, 2005.

[24] S. Papadopoulos, S. Bakiras, and D. Papadias. Nearest neighbor search with strong location privacy. Proc. of VLDB Endowment, 3:619–629, 2010.

[25] S. R. Tate and R. Vishwanathan. General secure function evaluation using standard trusted computing hardware. In Proc. of Privacy, Security and Trust (PST), To appear, 2011.

[26] R. Weber, H. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proc. of VLDB, pages 194–205, 1998.

[27] P. Williams and R. Sion. Usable PIR. In Proc. of NDSS, 2008.

[28] W. Wong, D. Cheung, B. Kao, and N. Mamoulis. Secure kNN computation on encrypted databases. In Proc. of the 35th SIGMOD international conference on Management of data, pages 139–152, 2009.

[29] M. L. Yiu, G. Ghinita, C. S. Jensen, and P. Kalnis. Enabling search services on outsourced private spatial data. VLDB Journal, pages 363–384, 2010.
