Research Article
Efficient Searchable Symmetric Encryption Supporting Dynamic Multikeyword Ranked Search

Yu Zhang (1), Yin Li (2), and Yifan Wang (3)

(1) School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
(2) School of Cyberspace Security, Dongguan University of Technology, Dongguan, China
(3) Wayne State University, 42 W. Warren Ave., Detroit, MI 48202, USA

Correspondence should be addressed to Yu Zhang.

Received 9 January 2020; Revised 10 June 2020; Accepted 24 June 2020; Published 16 July 2020

Academic Editor: Stelvio Cimato

Hindawi, Security and Communication Networks, Volume 2020, Article ID 7298518, 16 pages, https://doi.org/10.1155/2020/7298518

Copyright © 2020 Yu Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Searchable symmetric encryption that supports dynamic multikeyword ranked search (SSE-DMKRS) has been intensively studied in recent years. Such a scheme allows data users to dynamically update documents and retrieve the most wanted documents efficiently. Previous schemes suffer from high computational costs since their time and space complexities are linear in the size of the dictionary generated from the dataset. In this paper, by utilizing a shallow neural network model called "Word2vec" together with a balanced binary tree structure, we propose a highly efficient SSE-DMKRS scheme. The "Word2vec" tool can effectively convert the documents and queries into a group of vectors whose dimensions are much smaller than the size of the dictionary. As a result, we can significantly reduce the related space and time costs. Moreover, with the use of the tree-based index, our scheme can achieve a sublinear search time and support dynamic operations like insertion and deletion. Both theoretical and experimental analyses demonstrate that the efficiency of our scheme surpasses that of other schemes of the same kind, so that it has wide application prospects in the real world.

1. Introduction

With the development of network and virtualization technology, cloud computing has advanced rapidly. Through cloud services, enterprises and individuals can obtain better computing and storage services at a lower cost. Since cloud servers are not entirely trusted, utilizing cloud services while maintaining data privacy is an essential concern. A straightforward way to address this issue is to encrypt the data before outsourcing it to the cloud servers. However, this approach fails to meet the requirement of data retrieval, since traditional encryption scrambles the original data and makes it inconvenient to use. In this scenario, the users have to download all the ciphertext and decrypt it locally, which incurs huge transmission, storage, and computation overhead and is therefore impractical in a cloud environment.

Searchable encryption (SE) supports keyword search without decrypting the data, and thus it is well suited to keyword search over ciphertext. In an SE scheme, data owners and authorized users share a secret key. Data owners encrypt the sensitive data and upload it to the cloud server. If data users want to search the encrypted data, they generate an encrypted trapdoor by using the query and the secret key. When the cloud server receives the trapdoor, it tests the trapdoor against the encrypted data without decrypting it and returns the data related to the query to the users. The first searchable symmetric encryption (SSE) scheme was proposed by Song et al. [1]. This scheme not only encrypts data but also provides a search mechanism over the encrypted data. As the security and efficiency of SSE schemes have improved, they have attracted the attention of the community. In recent years, researchers have focused on how to construct solutions with complex query functions, such as multikeyword search [2–7], similarity search [8, 9], and ranked search [10–15]. In particular, ranked search schemes can sort the query results according to the degree of relevance between the documents and queries and return only the most related (top-k) documents. Thus, ranked schemes can significantly reduce the computation and storage costs.

The first ranked search schemes were proposed in [10, 11], and they support only single-keyword search. An early SSE scheme supporting multikeyword ranked search was given by Cao et al. [12]. The score evaluation method used in their scheme is the inner product between the query and document vectors. Since each document has its own vector representation, the search time is linear in the number of documents in the dataset, which leads to a very high overhead in a big data environment. Then, Sun et al. [13] gave a similar scheme with better-than-linear search efficiency by using a tree-based index [16, 17]. They adopt term frequency-inverse document frequency (TF-IDF) to evaluate the score between the index and queries. To further improve the search efficiency and support dynamic update, Xia et al. proposed an efficient SSE scheme supporting dynamic multikeyword ranked search [14]. They construct a tree-based index and propose a parallel search algorithm to accelerate the search process. Moreover, they provide a dynamic update method to cope flexibly with the deletion and insertion of documents. Recently, by utilizing the Bloom filter [18], Guo et al. constructed an efficient SSE scheme supporting dynamic multikeyword ranked search [15] that further improves the efficiency of keyword search and index construction. Owing to the Bloom filter, the internal nodes in the index tree do not need to be encrypted, and the dimension of the vectors in the internal nodes is also reduced. As a result, this scheme achieves better performance than the previous similar schemes.

Another kind of SE is searchable public key encryption (SPE), which is built on a public key system. In SSE, the key for encrypting data is the same as the key for generating the search trapdoor. By contrast, in SPE, the public key for encrypting data is open to the public, while the secret key for generating the search trapdoor is given only to the authorized data receivers. The very first SPE scheme supporting keyword search was introduced by Boneh et al. and is called public key encryption with keyword search (PEKS) [19]. However, their work supports only single-keyword search. To support more expressive queries, many SPE schemes [20–23] were proposed to realize advanced search, for example, conjunctive, disjunctive, and Boolean keyword search. By using a special hidden structure, Xu et al. proposed two SPE schemes supporting single-keyword search [24, 25] whose search performance is very close to that of a practical SSE scheme. By converting an attribute-based encryption scheme, Han et al. proposed an SPE scheme that can control a user's search permission according to an access control policy [26]. After this, Kai et al. proposed an SPE scheme achieving both Boolean keyword search and fine-grained search permission [27]. Sepehri et al. proposed a scalable proxy-based protocol for privacy-preserving queries, which allows authorized users to perform queries over data encrypted with different keys [28]. Later, by utilizing an ElGamal elliptic curve encryption system, Sepehri et al. gave a similar scheme with better efficiency [29]. To improve search accuracy, Zhang et al. proposed an SPE scheme supporting semantic keyword search by adopting a method called "Word2vec" [30]. For the sake of brevity, we summarize some SPE and SSE schemes in Table 1, which describes the differences between our scheme and previous schemes.

1.1. Motivation. The previous ranked search schemes in the symmetric-key setting are secure and somewhat efficient. However, the index building, trapdoor generation, and search times all depend on the size of the dictionary generated from the dataset, which is not suitable for a big data environment. According to the statistical information given in [20], the vocabulary size of a dataset is commonly on the order of $10^6$. Therefore, it is necessary to construct a more efficient ranked search scheme. Motivated by this, we aim to construct a novel SSE scheme supporting dynamic multikeyword ranked search (SSE-DMKRS) with high efficiency.

1.2. Contributions. The main contributions are summarized as follows:

(1) Based on the "Word2vec" [31] technique, we propose a novel method that converts the documents and queries into vector representations. The dimension of the vector representation obtained by our method is nearly 10% of that in the previous SSE-DMKRS schemes [14, 15].

(2) We propose an efficient index building algorithm that creates a balanced binary tree to index all the documents. The obtained index tree achieves a sublinear search time and supports dynamic update operations.

(3) By applying the secure k-nearest neighbour (KNN) scheme [32] to encrypt the index tree and the query, we propose an efficient SSE-DMKRS scheme.

In addition, we implement our scheme on a widely used data collection. The experimental results show that our scheme greatly reduces the time cost of index building, trapdoor generation, keyword search, and update without losing too much accuracy; e.g., the time cost of index building in our scheme is nearly 10% of that in the previous schemes. Meanwhile, the storage cost of the encrypted index is also greatly reduced; e.g., the storage cost of the index in our scheme is nearly one percent of that in the previous schemes. In conclusion, compared to the previous SSE-DMKRS schemes [14, 15], our scheme is well suited to the mobile cloud environment, in which the client device has limited computation and storage resources.

1.3. Organization. This paper is organized as follows. In Section 2, we give a formal definition of the system model and threat model of our scheme and introduce the tools we adopt, namely "Word2vec" and the vector space model. In Section 3, we present the construction of the search index tree and the SSE-DMKRS scheme. A detailed security analysis and the update operations of our scheme are also given there. Theoretical and experimental analyses are given in Section 4. Section 5 gives the conclusion.

2. Preliminaries

In this section, we first give the framework of the system model and introduce the threat model adopted in our scheme. Then, we introduce the tools adopted in our scheme, including "Word2vec", a well-known term representation method in the field of natural language processing, and the vector space model. Finally, we present the design goals of our scheme. In addition, the main notations used in this paper are summarized in Table 2.

2.1. System Model. The system model contains three different roles: data owner, data user, and cloud server. The data owner outsources a group of documents $F = \{f_1, f_2, \ldots, f_n\}$ to the cloud in ciphertext form $C = \{c_1, c_2, \ldots, c_n\}$. Moreover, the data owner also generates an encrypted searchable index for keyword search. For each query with an arbitrary keyword set $Q$, the data user computes a search trapdoor $T_Q$ and sends it to the cloud server. Upon receiving $T_Q$ from the data user, the cloud server searches the encrypted index and returns the candidate encrypted documents. After this, the data user decrypts the candidate documents and obtains the plaintext.

As illustrated in Figure 1, the architecture of the system model is formally described as follows:

(1) Data Owner (DO). DO holds a group of documents $F = \{f_1, f_2, \ldots, f_n\}$ and generates a secure searchable index $I$ from $F$ and an encrypted document collection $C$ for $F$. Then, DO uploads $I$ and $C$ to the cloud server and distributes the secret key to the authorized data users. Furthermore, DO is responsible for updating the index and documents stored in the cloud server.

(2) Data User (DU). An authorized DU can launch a keyword query over the encrypted data by utilizing a trapdoor, which is generated by using the secret key fetched from DO. Moreover, DU can decrypt the encrypted documents by utilizing the secret key.

(3) Cloud Server (CS). CS stores the encrypted index $I$ and documents $C$ from DO. When CS receives the trapdoor for query $Q$ from DU, CS executes the keyword query over the index and returns the top-k most relevant encrypted documents associated with the query $Q$. Upon receiving update information from DO, CS also performs the update operation over the encrypted data. In addition, we assume that CS is "honest-but-curious", an assumption employed by many searchable encryption schemes [12, 14, 15]. This means that CS honestly and correctly executes the algorithms in our scheme. However, CS curiously infers and analyses the received data to obtain extra private information.

2.2. Threat Model. Throughout the paper, we mainly utilize two threat models proposed by Cao et al. [12]:

(1) Known Ciphertext Model. CS knows only the encrypted index, the ciphertext, and the trapdoor. That is to say, CS can execute ciphertext-only attacks in this model.

(2) Known Background Model. CS knows more information than in the known ciphertext model, such as statistical information inferred from the documents. By taking advantage of this statistical information, e.g., term frequency (TF) and inverse document frequency (IDF), CS can conduct a statistical attack to verify whether certain keywords are in the query [33].

2.3. Design Goals. As mentioned before, we aim to build a secure and efficient SSE-DMKRS scheme. The design goals of our scheme are described as follows:

(1) Efficiency. The scheme aims to realize sublinear search efficiency, with time and space costs of index building and trapdoor generation that are much lower than those of the current schemes.

Table 1: Comparison between previous SE schemes and ours.

| Type | Ref. | Query condition | Additional special abilities |
|------|------|-----------------|------------------------------|
| SSE | [6] | Conjunctive keyword search | None |
| SSE | [8] | Multikeyword fuzzy search | None |
| SSE | [11] | Single-keyword ranked search | None |
| SSE | [12] | Multikeyword ranked search | None |
| SSE | [14] | Multikeyword ranked search | Dynamic update |
| SSE | Ours | Multikeyword ranked search | Semantic search and dynamic update |
| SPE | [22] | Conjunctive and disjunctive keyword search | None |
| SPE | [23] | Boolean keyword search | None |
| SPE | [25] | Single-keyword search | Fast search |
| SPE | [27] | Boolean keyword search | Access control |
| SPE | [29] | Multikeyword search | Data sharing |
| SPE | [30] | Multikeyword search | Semantic search |

(2) Privacy Preserving. Similar to previous schemes [12, 14, 15], our scheme needs to prevent CS from learning extra private information inferred from the documents, the secure index, and the queries. More precisely, the privacy requirements are listed as follows:

    (a) Index and Trapdoor Privacy. The plaintext information concealed in the index and the trapdoor cannot be leaked to CS. This information involves the keywords and the corresponding vector representation of each keyword.

    (b) Trapdoor Unlinkability. CS cannot determine whether two trapdoors are built from the same query.

    (c) Keyword Privacy. CS cannot identify whether a specific keyword is in the trapdoor or index by analysing the search results and the statistical information of the documents.

(3) Dynamic. The scheme can efficiently support dynamic operations such as document insertion and deletion. Note that the efficiency of update operations in our scheme is better than that of the previous SSE-DMKRS schemes.

2.4. Word2Vec. The "Word2Vec" model is a shallow, two-layer neural network that is used to convert words into a group of vector representations [31]. Under this model, each word in the document set is mapped to a vector, which can be used to calculate the similarity between words. For instance, Figure 2 shows that, through training on a simple corpus, three words "dog", "fox", and "orange" are mapped to three vector representations, respectively. By utilizing these vectors, the similarity among these three words can be calculated. We find that the similarity between "dog" and "fox" is higher than that between "dog" and "orange", since "dog" and "fox" are both animals. Thus, we can utilize "Word2Vec" to convert the keywords in a corpus into a group of vector representations and then apply these vectors to perform ranked search.
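As a rough illustration of how word vectors yield a similarity score, the sketch below computes the cosine similarity between three made-up 3-dimensional vectors. The values are hypothetical and are not taken from a trained Word2Vec model; real embeddings typically have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings chosen only for illustration.
vectors = {
    "dog": [0.9, 0.8, 0.1],
    "fox": [0.8, 0.9, 0.2],
    "orange": [0.1, 0.2, 0.9],
}

sim_dog_fox = cosine_similarity(vectors["dog"], vectors["fox"])
sim_dog_orange = cosine_similarity(vectors["dog"], vectors["orange"])

# With these made-up values, "dog" is closer to "fox" than to "orange".
print(sim_dog_fox > sim_dog_orange)
```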

Table 2: Notations.

| Notation | Description |
|----------|-------------|
| $F$ | A document set $\{f_1, f_2, \ldots, f_n\}$ |
| $n$ | The number of documents in $F$ |
| $C$ | The encrypted form of $F$, denoted by $\{c_1, c_2, \ldots, c_n\}$ |
| $W_i$ | The keyword set $\{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ for the document $f_i$ |
| $t_i$ | The number of keywords in $W_i$, $i \in [1, n]$ |
| $w_{ij}$ | The $j$th keyword in $W_i$, $i \in [1, n]$, $j \in [1, t_i]$ |
| $\vec{W}_i$ | The vector representation for $W_i$ |
| $\vec{w}_{ij}$ | The vector representation for $w_{ij}$ |
| $u$ | A node in the index tree |
| $\vec{u}$ | The vector representation for the node $u$ |
| $\vec{u}_{\min}$, $\vec{u}_{\max}$ | The minimum and maximum vector representations stored in the node $u$ |
| $I_u$ | The encrypted index for the node $u$ |
| $I_T$ | The encrypted index tree of $F$ |
| $Q$ | The keyword set $\{q_1, q_2, \ldots, q_t\}$ for a query |
| $q_j$ | A keyword in $Q$, $j \in [1, t]$ |
| $\vec{q}$ | The vector representation for query $Q$ |
| $\vec{q}_{\min}$, $\vec{q}_{\max}$ | Vector representations obtained by splitting $\vec{q}$ |
| $\vec{q}_j$ | The vector representation for $q_j$ |
| $T_Q$ | The trapdoor of $Q$ |
| $D$ | A dictionary $\{w_1, w_2, \ldots, w_m\}$ containing all keywords in $F$ |
| $M_{11}, M_{12}, M_{21}, M_{22}$ | Matrices for encryption (encryption key) |
| $M_{11}^{-1}, M_{12}^{-1}, M_{21}^{-1}, M_{22}^{-1}$ | Matrices for decryption (decryption key) |
| $N$ | The number of semantic keywords associated with each dictionary keyword |
| $m$ | The number of keywords in $D$ |
| $d$ | The dimension of the vectors generated by using "Word2vec" |
| $k$ | The number of files returned to the user |

[Figure 1: System model of the keywords search over encrypted data. Labels: data owners, data users, semitrusted cloud server, secret key, encrypted e-mails, encrypted index, trapdoor, search results.]

2.5. Vector Space Model. The vector space model is a very popular method in the field of information retrieval, usually used along with the TF-IDF rule to realize top-k search, where TF is the term frequency and IDF is the inverse document frequency [34]. By utilizing the TF-IDF rule, the documents and queries can be represented as a group of vectors. These vectors can be adopted in top-k search over the ciphertext [12, 14, 15]. However, the dimension of the obtained vectors is linear in the number of words in the dataset, which is inefficient if the dataset has a lot of words. To address this issue, we apply "Word2Vec" to present a novel keyword conversion method, which is described as follows:

(1) By applying "Word2Vec" to a corpus, we create a dictionary in which each keyword is associated with a vector representation.

(2) For the keyword set $W_i = \{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ of the document $f_i$, we obtain a vector $\vec{x}_i = \vec{w}_{i1} + \vec{w}_{i2} + \cdots + \vec{w}_{it_i}$ by looking up the dictionary, where $\vec{w}_{ij}$ is the vector representation for $w_{ij}$, $i \in [1, n]$, and $j \in [1, t_i]$. After this, we set $\vec{W}_i = \vec{x}_i / \|\vec{x}_i\|$ as the vector representation of $W_i$.

(3) For the query keyword set $Q = \{q_1, q_2, \ldots, q_t\}$, we utilize the dictionary to construct a vector $\vec{v} = \vec{q}_1 + \vec{q}_2 + \cdots + \vec{q}_t$. Then, we set $\vec{Q} = \vec{v} / \|\vec{v}\|$ as the vector representation of $Q$.

Note that the dimensions of $\vec{W}_i$ and $\vec{Q}$ are very small, e.g., 200, which is significantly smaller than the number of words in the dataset. Thus, the proposed method is more efficient than the previous method based on the TF-IDF rule. In addition, together with the vector space model mentioned above, we use the cosine measure to evaluate the relevance between the document and the query. The relevance evaluation function is defined in the next section.
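The conversion steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the dictionary below is a hypothetical 3-dimensional stand-in for one trained with Word2Vec, every keyword is assumed to be present in it, and the summed vector is assumed to be non-zero. The helper names `normalize` and `keywords_to_vector` are illustrative only.

```python
import math

def normalize(vec):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def keywords_to_vector(keywords, dictionary):
    """Sum the dictionary vectors of the keywords, then normalize the sum."""
    dim = len(next(iter(dictionary.values())))
    total = [0.0] * dim
    for word in keywords:
        for i, value in enumerate(dictionary[word]):
            total[i] += value
    return normalize(total)

# Hypothetical dictionary; a real one would be produced by Word2Vec.
dictionary = {
    "cloud": [0.2, -0.1, 0.4],
    "search": [0.3, 0.5, -0.2],
    "index": [-0.1, 0.4, 0.1],
}

doc_vec = keywords_to_vector(["cloud", "search", "index"], dictionary)
query_vec = keywords_to_vector(["search", "index"], dictionary)

# Both vectors have unit length, so their inner product is the cosine measure.
relevance = sum(x * y for x, y in zip(doc_vec, query_vec))
```

Because both vectors are normalized, ranking by inner product and ranking by cosine similarity give the same order.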

3. Proposed Scheme

In this section, we first give the index tree building algorithm and the search algorithm on this tree. Then, we give the concrete construction of our scheme and its dynamic update operations. Finally, we give a detailed analysis of the security of our scheme.

3.1. Search Index: Balanced Binary Tree. In this section, we adopt a balanced binary tree to create the search index, which will be used in our main scheme. Inspired by the construction process in [14], the tree building and search processes for our scheme are described as follows.

3.1.1. Tree Building Process. Formally, the data structure of the tree node $u$ is defined as $u = \langle \mathrm{ID}, \vec{u}_{\min}, \vec{u}_{\max}, P_l, P_r, \mathrm{FID} \rangle$, where ID is the identity of the node $u$, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are the vector representations of the node $u$, $P_l$ and $P_r$ are pointers to the left and right children of $u$, respectively, and FID stores the identity of a document if $u$ is a leaf node. Note that, compared with the previous index trees [12, 14, 15], a node in our tree has two vectors, while it has only one vector in previous trees. The main reason is that a node vector in our tree may contain negative entries, while a node vector in previous trees contains only non-negative entries. For clarity, we give a simple example. Let $\vec{a} = (0.1, 0.2, -0.3)$ and $\vec{b} = (-0.5, 0, 0.3)$ be the vectors of leaf nodes $A$ and $B$, respectively. In the previous index trees, the vector of the parent node $C$ of these two leaf nodes is $\vec{c} = (0.1, 0.2, 0.3)$, in which the value of each dimension is the larger of the corresponding values of $\vec{a}$ and $\vec{b}$. For a query vector $\vec{v} = (-1, 0, -1)$, the scores of the nodes $A$, $B$, and $C$ are 0.2, 0.2, and −0.4, respectively. It is very important to note that the score of the parent node is less than the scores of its children, so these two leaf nodes would be ignored in the tree search process even though they should be considered.
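The failure case in this example can be reproduced with a short script. It uses the three vectors from the text and a plain inner product as the score; the helper name `dot` is illustrative only.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(x, y))

a = [0.1, 0.2, -0.3]   # leaf node A
b = [-0.5, 0.0, 0.3]   # leaf node B
v = [-1.0, 0.0, -1.0]  # query vector

# Single-vector parent as in the earlier index trees: element-wise maximum.
c = [max(p, q) for p, q in zip(a, b)]

score_a, score_b, score_c = dot(a, v), dot(b, v), dot(c, v)

# The parent scores about -0.4 while both children score about 0.2, so a
# search that prunes on the parent's score would wrongly skip A and B.
print(score_a, score_b, score_c)
```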

In our index tree, let the dimensions of $\vec{u}_{\min}$ and $\vec{u}_{\max}$ both be $d$. The methods for constructing $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are denoted by M1 and M2, respectively, and are given as follows:

(1) M1: if the node $u$ is a leaf node corresponding to a file $f$, we create a vector $\vec{u}$ for $f$ by adopting the keyword conversion method mentioned in Section 2.5. Then, we set $\vec{u}_{\min} = \vec{u}$ and $\vec{u}_{\max} = \vec{u}$.

(2) M2: if the node $u$ is an internal node, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are based on the vectors of its children. Let $P_l.\vec{u}_{\min}$ and $P_l.\vec{u}_{\max}$ be the two vectors of the left child of $u$, and let $P_r.\vec{u}_{\min}$ and $P_r.\vec{u}_{\max}$ be the two vectors of the right child of $u$.

Suppose that Min() and Max() are the minimum and maximum functions, respectively. $\vec{u}_{\min}$ is built as follows:

$$\vec{u}_{\min}[i] = \mathrm{Min}\left(P_l.\vec{u}_{\min}[i],\; P_r.\vec{u}_{\min}[i]\right), \quad i \in [1, d]. \tag{1}$$

And $\vec{u}_{\max}$ is built as follows:

$$\vec{u}_{\max}[i] = \mathrm{Max}\left(P_l.\vec{u}_{\max}[i],\; P_r.\vec{u}_{\max}[i]\right), \quad i \in [1, d]. \tag{2}$$

That is, each entry of $\vec{u}_{\max}$ is the larger of the corresponding entries of $P_l.\vec{u}_{\max}$ and $P_r.\vec{u}_{\max}$, and each entry of $\vec{u}_{\min}$ is the smaller of the corresponding entries of $P_l.\vec{u}_{\min}$ and $P_r.\vec{u}_{\min}$.
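Methods M1 and M2 amount to an element-wise minimum and maximum, as the following minimal sketch shows. The function names `leaf_vectors` and `internal_vectors` are illustrative only, and the two leaf vectors are the ones used in the earlier example with nodes $A$ and $B$.

```python
def leaf_vectors(u):
    """M1: a leaf stores the same document vector as both u_min and u_max."""
    return list(u), list(u)

def internal_vectors(left, right):
    """M2: element-wise min of the children's u_min, max of their u_max.

    `left` and `right` are (u_min, u_max) pairs of the two children.
    """
    u_min = [min(l, r) for l, r in zip(left[0], right[0])]
    u_max = [max(l, r) for l, r in zip(left[1], right[1])]
    return u_min, u_max

leaf_a = leaf_vectors([0.1, 0.2, -0.3])
leaf_b = leaf_vectors([-0.5, 0.0, 0.3])
parent_min, parent_max = internal_vectors(leaf_a, leaf_b)

# parent_min is [-0.5, 0.0, -0.3] and parent_max is [0.1, 0.2, 0.3].
print(parent_min, parent_max)
```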

[Figure 2: A vector space representation of the words "dog", "fox", and "orange", showing that "dog" is closer to "fox" than to "orange" since they share more common attributes.]

An illustration of the above methods is given in Figure 3. In Figure 3, let the node $u$ be a leaf node and let $W$ be the keyword set of the file that $u$ stores. By using the keyword conversion method, $W$ is converted to a vector $\vec{u} = (-0.2, 0.2, 0.5, -0.7, 0.8)$. Then, we set $\vec{u}_{\min} = \vec{u}_{\max} = \vec{u}$. If the node $u$ is an internal node and the vectors of its children are $P_l.\vec{u}_{\min}$, $P_l.\vec{u}_{\max}$, $P_r.\vec{u}_{\min}$, and $P_r.\vec{u}_{\max}$, the vectors of the internal node are $\vec{u}_{\min} = (-0.2, -0.2, -0.3, -0.7, -0.5)$ and $\vec{u}_{\max} = (0.3, 0.7, 0.3, -0.1, 0.3)$.

Based on the methods M1 and M2 and inspired by the tree building algorithm introduced in [14], our tree building algorithm is given in Algorithm 1. An example of the proposed index tree is given in Example 1 and Figure 4. In Algorithm 1, we use the function GenID() to generate the unique identity ID for each node and apply GenFID() to generate the unique file ID for each leaf node. CurrentNodeSet contains the nodes that have no parent node and still need to be processed, and |CurrentNodeSet| is the number of nodes in CurrentNodeSet. If |CurrentNodeSet| is even, we assume that |CurrentNodeSet| = 2h; otherwise, we assume that |CurrentNodeSet| = 2h + 1, where h is a positive integer. TempNodeSet is a set containing the newly generated nodes. Moreover, for each node $u$, if $u$ is a leaf node, we use method M1 to generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$; otherwise, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are created by using M2.

3.1.2. Search Process. For a query vector $\vec{q}$ of query $Q$, we split $\vec{q}$ into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$. For each dimension $i \in [1, d]$, if $\vec{q}[i] < 0$, then $\vec{q}_{\min}[i] = \vec{q}[i]$ and $\vec{q}_{\max}[i] = 0$; otherwise, $\vec{q}_{\min}[i] = 0$ and $\vec{q}_{\max}[i] = \vec{q}[i]$. Obviously, $\vec{q}_{\min}$ holds all the negative entries of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive ones. For clarity, we denote this splitting method for query $Q$ by M3. An illustration of this method is given in Figure 5. If the query vector is $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$, then $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$. For a query $Q$ and a node $u$, the score is calculated as

$$\mathrm{Score}(u, Q) = \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max}. \tag{3}$$

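We can utilize the above equation to evaluate which documents are the most related to the query. Moreover, we can verify that the score of a parent node is never smaller than the scores of its children: every entry of the parent's $\vec{u}_{\min}$ is no larger than that of a child and is multiplied by a nonpositive entry of $\vec{q}_{\min}$, while every entry of the parent's $\vec{u}_{\max}$ is no smaller than that of a child and is multiplied by a non-negative entry of $\vec{q}_{\max}$. This property can significantly reduce the number of nodes that need to be checked in the search process.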
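The splitting method M3 and the score in equation (3) can be sketched as follows. The first query vector is the one from the text; the two leaf vectors and the second query are chosen for illustration, and the function names are not from the paper.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(x, y))

def split_query(q):
    """M3: q_min keeps the negative entries of q, q_max the non-negative ones."""
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [x if x >= 0 else 0.0 for x in q]
    return q_min, q_max

def score(u_min, u_max, q_min, q_max):
    """Equation (3): u_min . q_min + u_max . q_max."""
    return dot(u_min, q_min) + dot(u_max, q_max)

q_min, q_max = split_query([0.1, -0.2, 0.3, -0.4, 0.5])
# q_min is [0.0, -0.2, 0.0, -0.4, 0.0]; q_max is [0.1, 0.0, 0.3, 0.0, 0.5].

# Two illustrative leaves and their parent built with element-wise min/max.
leaf_a = [0.1, 0.2, -0.3]
leaf_b = [-0.5, 0.0, 0.3]
p_min = [min(x, y) for x, y in zip(leaf_a, leaf_b)]
p_max = [max(x, y) for x, y in zip(leaf_a, leaf_b)]

v_min, v_max = split_query([-1.0, 0.0, -1.0])
leaf_scores = [score(leaf, leaf, v_min, v_max) for leaf in (leaf_a, leaf_b)]
parent_score = score(p_min, p_max, v_min, v_max)

# The parent's score is an upper bound on the scores of its children.
print(parent_score >= max(leaf_scores))
```

For a leaf, `u_min` and `u_max` are the same vector, so the score reduces to the ordinary inner product with the query.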

The search process is given in Algorithm 2. In Algorithm 2, we use RList to store the top-k files, i.e., those with the k largest relevance scores to the query. RList is initialized to be an empty list, and it is updated whenever a relevant file is found. The kth score is defined as the smallest relevance score in the current RList, and it is initialized to a very small number. By using the kth score, we can accelerate the search process by ignoring paths with low scores. In Example 1 and Figure 4, an illustration of the search process is given, where $F = \{f_1, f_2, \ldots, f_6\}$, the query vectors are $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$, and $d$ (the vector dimension) is 3.
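A pruned top-k search of this kind might look like the sketch below. It is not a transcription of Algorithm 2: it is a depth-first variant written under the assumptions that the tree is unencrypted, that k is at least 1, and that the parent's score bounds the scores in its subtree. The `Node` class, the helper functions, and the four leaf vectors are all hypothetical.

```python
import heapq

class Node:
    def __init__(self, u_min, u_max, left=None, right=None, fid=None):
        self.u_min, self.u_max = u_min, u_max
        self.left, self.right, self.fid = left, right, fid

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def score(node, q_min, q_max):
    return dot(node.u_min, q_min) + dot(node.u_max, q_max)

def search(root, q_min, q_max, k):
    rlist = []  # min-heap of (score, fid); rlist[0] is the k-th score when full

    def visit(node):
        s = score(node, q_min, q_max)
        if len(rlist) == k and s <= rlist[0][0]:
            return  # nothing in this subtree can enter the top-k
        if node.fid is not None:  # leaf node
            if len(rlist) < k:
                heapq.heappush(rlist, (s, node.fid))
            else:
                heapq.heapreplace(rlist, (s, node.fid))
            return
        # Visit the higher-scoring child first so the k-th score rises quickly.
        children = sorted((node.left, node.right),
                          key=lambda c: score(c, q_min, q_max), reverse=True)
        for child in children:
            visit(child)

    visit(root)
    return [fid for _, fid in sorted(rlist, reverse=True)]

def leaf(vec, fid):
    return Node(list(vec), list(vec), fid=fid)

def parent(l, r):
    return Node([min(a, b) for a, b in zip(l.u_min, r.u_min)],
                [max(a, b) for a, b in zip(l.u_max, r.u_max)], l, r)

# Hypothetical leaf vectors, not the ones shown in Figure 4.
leaves = [leaf(v, f) for f, v in {
    "f1": [0.3, -0.4, 0.8], "f2": [0.2, -0.2, 0.5],
    "f3": [0.1, 0.1, 0.3], "f4": [-0.5, 0.7, -0.2],
}.items()]
root = parent(parent(leaves[0], leaves[1]), parent(leaves[2], leaves[3]))
top2 = search(root, [0.0, -0.3, 0.0], [0.1, 0.0, 0.2], k=2)
```

With these values, the subtree holding the two lowest-scoring leaves is skipped once RList is full, because its root scores below the current kth score.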

3.1.3. Example 1. An example of an index tree and a search process on this tree is illustrated in Figure 4. In Figure 4, we show an index tree with $F = \{f_1, f_2, \ldots, f_6\}$, in which the dimension of the vector for each node is 3. For each node $u$ in the tree, the upper vector and the lower vector correspond to $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. In the tree building process, we first generate the leaf nodes from $F$ and then create the internal nodes based on these leaf nodes.

Moreover, Figure 4 also gives an illustration of the search process. In Figure 4, we set $\vec{q} = (0.1, -0.3, 0.2)$ and split it into $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$. We suppose that the top-3 files will be returned to the data user. According to Algorithm 2, the search process begins with the root node $r$ and calculates the scores between the query $Q$ and the two child nodes $r_{11}$ and $r_{12}$ of $r$ by using equation (3). The calculation is as follows:

$$\begin{aligned}
\mathrm{Score}(r_{11}, Q) &= (-0.4, -0.2, -0.3) \cdot (0, -0.3, 0) + (0.3, 0.3, 0.8) \cdot (0.1, 0, 0.2) = 0.25, \\
\mathrm{Score}(r_{12}, Q) &= (-0.5, 0.7, -0.2) \cdot (0, -0.3, 0) + (0.5, 0.9, 0.8) \cdot (0.1, 0, 0.2) = 0.
\end{aligned} \tag{4}$$

Because the score between $r_{11}$ and $Q$ is higher than that between $r_{12}$ and $Q$, Algorithm 2 will traverse the subtree with $r_{11}$ as the root node and compute the scores between the query $Q$ and the two child nodes of $r_{11}$. Since the score between $r_{21}$ and $Q$ is higher than that between $r_{22}$ and $Q$, Algorithm 2 will traverse the subtree with $r_{21}$ as the root node and add the leaf nodes $f_1$ and $f_2$ to RList. After this, the subtree with $r_{22}$ as the root node will be traversed, and the leaf nodes $f_3$ and $f_4$ are reached. Since the number of files in RList is less than 3, $f_3$ is added to RList directly. For the file $f_4$, since the number of files in RList now equals 3, Algorithm 2 will compare the score between $f_4$ and $Q$ to the minimum score in RList. Because the score between $f_4$ and $Q$ is smaller than the minimum score in RList, $f_4$ is not added to RList. At this point, the subtree with $r_{11}$ as the root node has been traversed, and Algorithm 2 will turn to the subtree with $r_{12}$ as the root node. The score between $r_{12}$ and $Q$ is smaller than the minimum score in RList, which means that the scores of all descendants of $r_{12}$ are smaller than the minimum score in RList (this property is described in Section 3.1.2), so $f_5$ and $f_6$ will not be checked. Therefore, Algorithm 2 outputs RList = $\{f_1, f_2, f_3\}$.
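The arithmetic in equation (4) can be re-checked with a few lines of Python, using the node and query vectors that appear in the equation. The helper name `dot` is illustrative only.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(x, y))

q_min = [0.0, -0.3, 0.0]
q_max = [0.1, 0.0, 0.2]

# (u_min, u_max) of the two children of the root, as used in equation (4).
r11 = ([-0.4, -0.2, -0.3], [0.3, 0.3, 0.8])
r12 = ([-0.5, 0.7, -0.2], [0.5, 0.9, 0.8])

score_r11 = dot(r11[0], q_min) + dot(r11[1], q_max)
score_r12 = dot(r12[0], q_min) + dot(r12[1], q_max)

# Up to floating-point rounding, the scores are 0.25 and 0.
print(score_r11, score_r12)
```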

3.2. Construction of SSE-DMKRS. In this section, through combining the secure KNN algorithm [32] and the index tree building algorithm, we propose a concrete SSE-DMKRS scheme. The SSE-DMKRS scheme consists of five algorithms. The algorithms KeyGen, DictionaryBuild, and IndexBuild are executed by the data owners, while the algorithms TrapdoorGen and Search are performed by the data users and the cloud server, respectively.

(i) KeyGen(λ): given a security parameter λ, this algorithm first randomly chooses four d × d invertible matrices $M_{11}$, $M_{12}$, $M_{21}$, and $M_{22}$, where d is the dimension of $\vec{u}_{\min}$ and $\vec{u}_{\max}$. Then, it randomly generates a d-bit vector S. Finally, it outputs the secret key $sk = \{S, M_{11}, M_{12}, M_{21}, M_{22}\}$.

Figure 3: An example of the vector generation of the node u. (a) Method M1: u is a leaf node, W is the keyword set for the file which u stores, and $\vec{u}$ is a vector generated by adopting the keyword conversion method mentioned in Section 2.5. (b) Method M2: u is an internal node, and $P_l.\vec{u}_{\min}$, $P_l.\vec{u}_{\max}$, $P_r.\vec{u}_{\min}$, and $P_r.\vec{u}_{\max}$ are the vectors of its children.

Input: the document collection F = {$f_1$, $f_2$, ..., $f_n$}; a semantic dictionary D generated by applying “Word2Vec” to F.
Output: the index tree T.

(1) for each i ∈ [1, n] do
(2)     Construct a leaf node u for $f_i$ with u.ID = GenID(), u.Pl = u.Pr = NULL, u.FID = GenFID($f_i$), and generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M1
(3)     Insert u to CurrentNodeSet
(4) end for
(5) while |CurrentNodeSet| > 1 do
(6)     if |CurrentNodeSet| is even, i.e., 2h, then
(7)         for each pair of nodes u′ and u″ in CurrentNodeSet do
(8)             Create a parent node u for u′ and u″ with u.ID = GenID(), u.Pl = u′, u.Pr = u″, u.FID = NULL, and set $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M2
(9)             Insert u to TempNodeSet
(10)        end for
(11)    else // Suppose that |CurrentNodeSet| = 2h + 1
(12)        for each pair of nodes u′ and u″ of the former 2h − 2 nodes in CurrentNodeSet do
(13)            Create a parent node u for u′ and u″
(14)            Insert u to TempNodeSet
(15)        end for
(16)        Create a parent node $u_1$ for the (2h − 1)-th and (2h)-th nodes, and then generate a parent node u for the (2h + 1)-th node and $u_1$
(17)        Insert u to TempNodeSet
(18)    end if
(19)    Set CurrentNodeSet = TempNodeSet and clear TempNodeSet
(20) end while
(21) return CurrentNodeSet
(22) Note that the CurrentNodeSet only contains one node, which is the root of the index tree T

Algorithm 1: BuildIndexTree (FileSet F, Dictionary D).

(ii) DictionaryBuild(F): given the document set F = {$f_1$, $f_2$, ..., $f_n$}, the algorithm runs “Word2vec” to generate the dictionary D of F. In the dictionary D, each keyword is associated with a vector representation. In addition, each keyword corresponds to a set of semantically related keywords.

(iii) IndexBuild(sk, F, D): given the document set F and the dictionary D for F, the algorithm first creates the index tree T by using the algorithm BuildIndexTree(F, D) (Algorithm 1). Then, for each node u in the tree T, the algorithm generates two random vector pairs $\{V'_{u_{\min}}, V''_{u_{\min}}\}$ and $\{V'_{u_{\max}}, V''_{u_{\max}}\}$ for the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. More precisely, if S[i] = 0, it sets $V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$; if S[i] = 1, $V'_{u_{\min}}[i]$, $V''_{u_{\min}}[i]$, $V'_{u_{\max}}[i]$, and $V''_{u_{\max}}[i]$ are set as four random values under the constraints $V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$. This process is expressed as the following equation:

$$
\begin{cases}
V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 0, \\
V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 0, \\
V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 1, \\
V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 1,
\end{cases}
\qquad i \in [1, d].
\tag{5}
$$

Finally, for each node u, it computes $I_u = \{M_{11}^{T} V'_{u_{\min}}, M_{12}^{T} V''_{u_{\min}}, M_{21}^{T} V'_{u_{\max}}, M_{22}^{T} V''_{u_{\max}}\}$. Through replacing the plaintext vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ with the encrypted index $I_u$, an encrypted index tree $I_T$ is created.

(iv) TrapdoorGen(sk, Q): given a query keyword set Q, the algorithm first extends Q to a new semantic keyword set Q′. The process is as follows:

(a) It generates a new keyword set Q′, which is initialized to an empty set.

Figure 4: An example of Algorithm 1 and Algorithm 2 (example 1). The figure shows the index tree with root r, internal nodes $r_{11}$, $r_{12}$, $r_{21}$, and $r_{22}$, leaf nodes $f_1$ to $f_6$, the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ of each node, the query vectors $\vec{q}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$, and the order (1) to (7) in which the nodes are visited.

Figure 5: An example of the vector generation of the query Q (Method M3). $\vec{q}$ is a vector generated by adopting the keyword conversion method mentioned in Section 2.5; $\vec{q}_{\min}$ holds all the negative parts of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive parts. In the example, $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$ gives $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$.


(b) Note that each keyword in the dictionary is associated with a group of keywords semantically related to this keyword. For each keyword q in Q, it randomly chooses k′ semantic keywords based on the dictionary and inserts these keywords into Q′, where k′ is chosen dynamically and k′ ∈ [1, N].

Then, based on Q′, the TrapdoorGen algorithm generates a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is similar to the process in the IndexBuild algorithm and can be expressed as the following equations:

$$
\begin{cases}
Q'_{q_{\min}}[i] = Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\max}}[i] = Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\min}}[i] + Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 1, \\
Q'_{q_{\max}}[i] + Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 1,
\end{cases}
\qquad i \in [1, d].
\tag{6}
$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{\min}}, M_{12}^{-1} Q''_{q_{\min}}, M_{21}^{-1} Q'_{q_{\max}}, M_{22}^{-1} Q''_{q_{\max}}\}$ as the trapdoor for Q.

(v) Search(sk, $T_Q$, $I_T$): for each node u in $I_T$, the algorithm computes

$$
\begin{aligned}
I_u \cdot T_Q &= \left(M_{11}^{T} V'_{u_{\min}}\right) \cdot \left(M_{11}^{-1} Q'_{q_{\min}}\right) + \left(M_{12}^{T} V''_{u_{\min}}\right) \cdot \left(M_{12}^{-1} Q''_{q_{\min}}\right) \\
&\quad + \left(M_{21}^{T} V'_{u_{\max}}\right) \cdot \left(M_{21}^{-1} Q'_{q_{\max}}\right) + \left(M_{22}^{T} V''_{u_{\max}}\right) \cdot \left(M_{22}^{-1} Q''_{q_{\max}}\right) \\
&= V'_{u_{\min}} \cdot Q'_{q_{\min}} + V''_{u_{\min}} \cdot Q''_{q_{\min}} + V'_{u_{\max}} \cdot Q'_{q_{\max}} + V''_{u_{\max}} \cdot Q''_{q_{\max}} \\
&= \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max} \\
&= \mathrm{Score}(u, Q).
\end{aligned}
\tag{7}
$$

According to equation (3), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score(u, Q). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform ranked search.
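The identity in equation (7) can be checked numerically. The sketch below is a minimal pure-Python illustration of KeyGen, IndexBuild for one node, TrapdoorGen for one query vector, and the score computed in Search. It uses exact rational arithmetic so the comparison is not blurred by rounding. One assumption deserves emphasis: the sketch follows the usual secure kNN convention, in which index vectors are split at positions where S[i] = 1 and query vectors at the complementary positions where S[i] = 0. Under that convention each coordinate contributes exactly u[i]·q[i] to the result. All function names, the small matrix entries, and the fixed random seed are hypothetical choices made for the example.

```python
import random
from fractions import Fraction as Fr

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(m, v):
    return [dot(row, v) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse(m):
    """Gauss-Jordan inverse over Fractions; raises ValueError if singular."""
    n = len(m)
    aug = [list(row) + [Fr(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_invertible(d, rng):
    while True:
        m = [[Fr(rng.randint(-5, 5)) for _ in range(d)] for _ in range(d)]
        try:
            inverse(m)
            return m
        except ValueError:
            continue

def keygen(d, rng):
    S = [rng.randint(0, 1) for _ in range(d)]
    return (S, *[random_invertible(d, rng) for _ in range(4)])

def split(vec, S, split_bit, rng):
    # Positions with S[i] == split_bit get random shares that sum to vec[i];
    # all other positions are copied into both shares.
    a, b = [], []
    for x, bit in zip(vec, S):
        r = Fr(rng.randint(-10, 10), 7) if bit == split_bit else None
        a.append(x if r is None else r)
        b.append(x if r is None else x - r)
    return a, b

def index_build(sk, u_min, u_max, rng):
    S, M11, M12, M21, M22 = sk
    parts = split(u_min, S, 1, rng) + split(u_max, S, 1, rng)
    return [matvec(transpose(M), v) for M, v in zip((M11, M12, M21, M22), parts)]

def trapdoor_gen(sk, q, rng):
    S, M11, M12, M21, M22 = sk
    q_min = [min(x, Fr(0)) for x in q]   # method M3: negative part
    q_max = [max(x, Fr(0)) for x in q]   # method M3: positive part
    parts = split(q_min, S, 0, rng) + split(q_max, S, 0, rng)
    return [matvec(inverse(M), v) for M, v in zip((M11, M12, M21, M22), parts)]

def encrypted_score(I_u, T_Q):
    return sum(dot(i, t) for i, t in zip(I_u, T_Q))

rng = random.Random(0)
sk = keygen(3, rng)
u_min = [Fr(-4, 10), Fr(-2, 10), Fr(-3, 10)]   # node r11 from equation (4)
u_max = [Fr(3, 10), Fr(3, 10), Fr(8, 10)]
q = [Fr(1, 10), Fr(-3, 10), Fr(2, 10)]
I_u = index_build(sk, u_min, u_max, rng)
T_Q = trapdoor_gen(sk, q, rng)
enc = encrypted_score(I_u, T_Q)
```

The value `enc` equals Score($r_{11}$, Q) from equation (4), because $(M^{T} v) \cdot (M^{-1} w) = v \cdot w$ for any invertible M, so the matrices cancel pairwise.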

3.3. Dynamic Update Operations. Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, satisfying the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows:

(i) UpdateInfoGen(sk, $T_s$, $f_i$, Utype): this algorithm is executed by the data owners and generates the update information {$I_s$, $c_i$} for the cloud server, where $T_s$ is a set containing all the updated nodes, $I_s$ is an encrypted form of $T_s$, $f_i$ is the target document, $c_i$ is an encrypted form of $f_i$, and Utype is the update type. In order to reduce the communication cost, the data owners store the unencrypted index tree on their own devices. For Utype ∈ {Ins, Del}, the algorithm works as follows:

(a) If Utype = “Del”, the algorithm deletes a document $f_i$ from the tree. The algorithm first finds the leaf node associated with the document $f_i$ and deletes it. In addition, the internal nodes associated with this leaf node are also added to $T_s$. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can set the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts $T_s$ to generate $I_s$. Finally, the algorithm sends $I_s$ to the cloud server and sets $c_i$ to null.

Input: a vector $\vec{q}$ of the query Q; a semantic dictionary D generated by applying “Word2Vec” to F; a root node u of IndexTree; and RList.
Output: RList.

(1) Split $\vec{q}$ into $\vec{q}_{\min}$ and $\vec{q}_{\max}$ according to the method M3
(2) if u is an internal node then
(3)     if Score(u, Q) > k-th score then
(4)         SearchIndexTree($\vec{q}$, D, u.Pl, RList)
(5)         SearchIndexTree($\vec{q}$, D, u.Pr, RList)
(6)     else
(7)         return
(8)     end if
(9) else
(10)    if Score(u, Q) > k-th score then // Update RList
(11)        Delete the element holding the smallest relevance score in RList
(12)        Insert a new element <Score(u, Q), u.FID> in the RList and sort the elements in RList
(13)    end if
(14)    return
(15) end if

Algorithm 2: SearchIndexTree (QueryVector $\vec{q}$, Dictionary D, TreeNode u, RList).


(b) If Utype = “Ins”, the algorithm inserts a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes which are placed on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update($I_T$, C, $I_s$, $c_i$, Utype): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set $I_s$. After this, if Utype = “Del”, the algorithm removes $c_i$ from C. Otherwise, the algorithm inserts $c_i$ into C.
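The plaintext side of UpdateInfoGen, that is, which nodes end up in $T_s$, can be sketched as follows. This is a simplified illustration under several assumptions. A deleted leaf is always turned into a fake leaf, represented here by a leaf with no file identifier and no vectors. Internal nodes are recomputed from their non-fake children using element-wise min and max. Insertion only reuses an existing fake leaf. Encryption of $T_s$ and $f_i$ is omitted. The class and function names are hypothetical.

```python
class Node:
    def __init__(self, fid=None, vec=None, left=None, right=None):
        self.fid, self.left, self.right, self.parent = fid, left, right, None
        self.vmin = list(vec) if vec is not None else None
        self.vmax = list(vec) if vec is not None else None
        for child in (left, right):
            if child is not None:
                child.parent = self
        if left is not None or right is not None:
            refresh(self)

def refresh(node):
    # Recompute an internal node from its children, skipping fake subtrees.
    live = [c for c in (node.left, node.right) if c is not None and c.vmin is not None]
    if live:
        node.vmin = [min(col) for col in zip(*(c.vmin for c in live))]
        node.vmax = [max(col) for col in zip(*(c.vmax for c in live))]
    else:
        node.vmin = node.vmax = None

def path_to_root(leaf):
    # T_s: the changed leaf plus every ancestor, refreshed bottom-up.
    ts, node = [leaf], leaf.parent
    while node is not None:
        refresh(node)
        ts.append(node)
        node = node.parent
    return ts

def delete(leaf):
    leaf.fid = leaf.vmin = leaf.vmax = None   # mark as fake, keep tree shape
    return path_to_root(leaf)

def find_fake_leaf(node):
    if node is None:
        return None
    if node.left is None and node.right is None:
        return node if node.fid is None else None
    return find_fake_leaf(node.left) or find_fake_leaf(node.right)

def insert(root, fid, vec):
    slot = find_fake_leaf(root)
    if slot is None:
        raise NotImplementedError("no fake leaf available; growing the tree is not sketched")
    slot.fid, slot.vmin, slot.vmax = fid, list(vec), list(vec)
    return path_to_root(slot)

f1, f2 = Node("f1", [-0.2, 0.3, 0.7]), Node("f2", [-0.4, -0.2, 0.8])
f3, f4 = Node("f3", [0.3, -0.2, 0.3]), Node("f4", [-0.2, -0.1, -0.3])
root = Node(left=Node(left=f1, right=f2), right=Node(left=f3, right=f4))

ts_del = delete(f2)
after_delete = (list(root.vmin), list(root.vmax))
ts_ins = insert(root, "f7", [0.9, 0.0, -0.9])
```

Each operation touches one leaf and the O(log n) internal nodes on its path to the root, which is why only those nodes need to be re-encrypted and sent to the cloud server.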

Note that after a period of insertion and deletion operations, the number of keywords in the dictionary will change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. For the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by the “Word2vec” tool and set by the users. For example, if we set the dimension of the vector to 200, the dimension of each keyword’s vector is 200, and thus the dimensions of the vectors $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node u in the index tree and the query Q in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, the attackers cannot obtain the original vectors in the tree nodes and the query, which means that the index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector will be split randomly. Moreover, the same keyword set Q will be extended to multiple different semantic keyword sets Q′. So, the same query Q will be encrypted to different trapdoors, which means that the goal of trapdoor unlinkability is achieved.

(3) Keyword Privacy. Since the index and the trapdoor are protected by the secure KNN algorithm, the adversary cannot infer the plaintext information from the index and the trapdoor under the known ciphertext model. Considering that the known background model is common in real-world applications, we will analyse the security of the proposed scheme under the known background model. For the TrapdoorGen algorithm, the original query keyword set Q is extended to a new set Q′. Specifically, for each keyword q in Q, after randomly choosing a number k′, the algorithm chooses k′ semantic keywords related to q by utilizing the dictionary and inserts these keywords into Q′. Suppose that each keyword is associated with N semantic keywords in the dictionary; each keyword can then generate $2^N$ different keyword sets, since each semantic keyword can be chosen or not. For example, if a keyword q is associated with three semantic keywords $q_1$, $q_2$, and $q_3$, then q can generate $2^3$ keyword sets: {q}, {q, $q_1$}, {q, $q_2$}, {q, $q_3$}, {q, $q_1$, $q_2$}, {q, $q_1$, $q_3$}, {q, $q_2$, $q_3$}, and {q, $q_1$, $q_2$, $q_3$}. Since the query Q usually contains more than one keyword, Q will generate more than $2^N$ different semantic keyword sets. According to this method, the final similarity score is obfuscated by these random semantic keyword sets. As in the analysis in [14, 15], our scheme can protect the keyword privacy under the known background model.
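The counting argument in item (3) can be made concrete with a few lines of Python. The sketch simply enumerates every way of choosing or not choosing each related keyword. The function name `extended_sets` and the keyword labels are illustrative.

```python
from itertools import combinations

def extended_sets(q, related):
    """All keyword sets obtained by adding any subset of `related` to q."""
    sets = []
    for r in range(len(related) + 1):
        for combo in combinations(related, r):
            sets.append({q, *combo})
    return sets

sets = extended_sets("q", ["q1", "q2", "q3"])
for s in sets:
    print(sorted(s))
```

For N = 3 related keywords, this lists the same $2^3 = 8$ sets as the example in the text.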

4. Proposed Scheme

In this section, we analyse the proposed SSE-DMKRS scheme theoretically and experimentally. A detailed experiment is given to demonstrate that our scheme can efficiently perform dynamic ranked keyword search over the encrypted data. Our experiment is run on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and is based on a real-world e-mail dataset called the Enron e-mail dataset [35]. We mainly analyse the performance of our scheme in two aspects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update; (2) the relationship between the search precision and the privacy level. Moreover, in order to show the advantages of our scheme, we also compare it to two related previous schemes. For simplicity, we denote these two schemes, introduced in [14, 15], by X15 and G19.

4.1. Efficiency

4.1.1. Index Building. The process of index building mainly consists of two steps: (1) creating an unencrypted index tree by utilizing Algorithm 1; (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates O(n) nodes based on the document set F. Because each node has two vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ whose dimensions are both d, the vector splitting process needs O(d) time and the matrix multiplication operations take $O(d^2)$ time in the encryption step. According to these two steps, the whole time complexity of index building is $O(nd^2)$, which means that the time cost for index building mainly depends on the number of documents in F and the dimension of each node’s vector.


Since the dimensions of each node’s vector in X15 and G19 are both linear in the number of keywords in the dictionary (m), the time costs for index building in X15 and G19 are both $O(nm^2)$. Because d ≪ m, the time cost for index building in our scheme is much less than that in X15 and G19. In addition, for the scheme G19, the internal nodes are constructed with a Bloom filter, and thus the dimension of each internal node’s vector is linear in the Bloom filter size b. Since b is usually smaller than m, the index building time in G19 is less than that in X15.

Figure 6(a) shows that the time cost for index building in our scheme is much less than that in X15 and G19. More precisely, when n = 1000, m = 20000, d = 1000, and b = 10000, the time consumption for index building in X15 and G19 is nearly 100 to 200 times that in our scheme. As m increases, the advantages of our scheme will become even more significant.

In addition, because the index tree has O(n) nodes and each node holds two d-dimensional vectors, the space complexity of the index tree is O(nd). By contrast, the space complexities of the index tree in X15 and G19 are both O(nm). From Table 3, even if we set n = 1000, m = 20000, d = 1000, and b = 10000, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.

4.1.2. Trapdoor Generation. In our scheme, the query is converted into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ whose dimensions are both d. The trapdoor generation process multiplies these two vectors by the d × d matrices in the key. So, the time complexity of trapdoor generation in our scheme is $O(d^2)$. By contrast, since the dimensions of the query vectors in X15 and G19 are both m, the time complexities of trapdoor generation are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when n = 1000, m = 20000, and d = 1000, the time cost for trapdoor generation in our scheme is 15 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.

4.1.3. Search. In the search process, if the relevance score of an internal node u and the query Q is less than the minimum relevance score of the current top-k documents, the subtree which uses node u as the root node will not be accessed. Thus, not all of the nodes in the tree will be accessed during the search process. We suppose that there are θ leaf nodes that contain at least one keyword in the query Q. Since the height of the tree is O(log n) and the time complexity of the relevance score calculation is O(d), the time complexity of the search process is $O(\theta d \log n)$. For the scheme X15, because the time complexity of the relevance score calculation is O(m), the time complexity of the search process is $O(\theta m \log n)$. For the scheme G19, because each internal node contains a Bloom filter whose size is b and each leaf node involves a vector whose size is m, the time complexity of the search process is $O(\theta (m + b \log n))$. From Figure 6(c), when n = 1000, m = 20000, d = 1000, and b = 10000, the search time cost in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.

4.1.4. Update. When the data owners want to insert or delete a document, they will not only insert or delete a leaf node but also update O(log n) internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(d^2 \log n)$. For the X15 scheme, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(m^2 \log n)$. For the G19 scheme, because the internal nodes are based on the Bloom filter, which is not encrypted, the time cost for updating the internal nodes can be ignored. Thus, the time complexity of update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when n = 1000, m = 20000, d = 1000, and b = 10000, the time cost for updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.
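The asymptotic costs derived in Sections 4.1.1 to 4.1.4 can be collected in a small Python cost model. This is a back-of-the-envelope sketch. It only evaluates the leading terms with all constants dropped, so the ratios indicate how the costs scale and do not predict the measured times reported in Figure 6. The function name `op_counts` and the value chosen for θ are hypothetical.

```python
import math

def op_counts(n, m, d, b, theta):
    """Leading-order operation counts per scheme (constants ignored)."""
    log_n = math.log2(n)
    return {
        "index_build": {"ours": n * d**2, "X15": n * m**2, "G19": n * m**2},
        "trapdoor":    {"ours": d**2,     "X15": m**2,     "G19": m**2},
        "search":      {"ours": theta * d * log_n,
                        "X15":  theta * m * log_n,
                        "G19":  theta * (m + b * log_n)},
        "update":      {"ours": log_n * d**2,
                        "X15":  log_n * m**2,
                        "G19":  m**2},
    }

counts = op_counts(n=1000, m=20000, d=1000, b=10000, theta=10)
for phase, by_scheme in counts.items():
    ratio = by_scheme["X15"] / by_scheme["ours"]
    print(f"{phase}: X15 / ours = {ratio:.0f}")
```

With m = 20000 and d = 1000, the quadratic terms differ by a factor of $(m/d)^2 = 400$ and the linear search term by a factor of m/d = 20.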

4.2. Precision and Privacy. The search precision of our scheme is affected by a group of semantic keywords related to the original index and query keywords. We measure our scheme by adopting a metric called “precision” defined in [12]. The metric of precision is defined as follows:

$$
P_k = \frac{k'}{k},
\tag{8}
$$

where k′ is the number of real top-k documents in the retrieved k documents.

In addition, the semantic keywords in the index and query keyword set will disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through the statistical information about the dataset. To measure the extent of the disturbance of the relevance score, we use the following equation, called “rank privacy” and introduced in [12], to quantify this obscureness:

$$
P'_k = \frac{\sum_i \left| r_i - r'_i \right|}{k^2},
\tag{9}
$$

where $r_i$ is the rank number of the document i in the retrieved top-k documents and $r'_i$ is document i’s real rank number in the real ranked results.
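Equations (8) and (9) translate directly into code. The sketch below is a minimal illustration that assumes the retrieved list contains k ≥ 1 documents in rank order and that the real ranking covers every retrieved document. The function names and the toy rankings are illustrative.

```python
def precision(retrieved, true_ranking):
    """Equation (8): fraction of the retrieved k documents that are real top-k."""
    k = len(retrieved)
    true_top_k = set(true_ranking[:k])
    return sum(1 for doc in retrieved if doc in true_top_k) / k

def rank_privacy(retrieved, true_ranking):
    """Equation (9): summed rank displacement, normalised by k squared."""
    k = len(retrieved)
    true_rank = {doc: i for i, doc in enumerate(true_ranking, start=1)}
    displacement = sum(abs(r - true_rank[doc])
                       for r, doc in enumerate(retrieved, start=1))
    return displacement / k**2

true_ranking = ["a", "b", "c", "d", "e"]   # real ranked results
retrieved = ["b", "a", "d"]                # top-3 returned by a search

p = precision(retrieved, true_ranking)
rp = rank_privacy(retrieved, true_ranking)
```

In this toy case two of the three retrieved documents belong to the real top-3, and each retrieved document is displaced by one rank, so the precision is 2/3 and the rank privacy is 3/9.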

We compare our scheme to the schemes X15 and G19 in terms of “precision” and “rank privacy.” Note that an important parameter in the previous two schemes is a standard deviation σ, which is utilized to adjust the relevance score for the dummy keywords. In the comparison, we set σ = 0.05, which is the value usually used in the previous schemes. Besides, in our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node’s vector to 1000 (d = 1000). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as k grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For the schemes X15 and G19, the precision decreases and the rank privacy increases when k grows. This characteristic exists in all three schemes. Because the vector representations for the index tree and query in our scheme are compressed deeply, some statistical information in the index and the query will be lost.


Figure 6: Impact of m on the time cost of index building (a), trapdoor generation (b), search (c), and update (d) (n = 1000, d = 1000, b = 10000, and m ∈ {12000, 14000, 16000, 18000, 20000}). Each panel plots the time cost ($10^3$ ms) against the dictionary size for X15, G19, and our scheme.

Table 3: Storage consumption of the index tree (MB).

Dictionary size    [14]    [15]    Vector dimension    Proposed
m = 12000          188     174     d = 200             7
m = 14000          219     190     d = 400             14
m = 16000          251     206     d = 600             20
m = 18000          283     222     d = 800             26
m = 20000          315     238     d = 1000            33


Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents (n = 1000, d = 1000, b = 10000, m = 12000, and σ = 0.05). Both panels plot the metric (%) against the number of retrieved documents (10 to 50) for X15, G19, and our scheme.

Figure 8: Impact of d on the time cost of index building (a) and trapdoor generation, search, and update (b) (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Both panels plot the time cost (s) against the vector size.


Thus, the precision of our scheme is less than that in X15 and G19. However, the rank privacy in our scheme is accordingly higher than that in X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation (d), which we set in “Word2vec,” is an important parameter in our scheme. Next, we discuss the impact of d on our scheme. The impact of d on the efficiency of our scheme is given in Figure 8. From Figure 8, we know that the time costs of index building, trapdoor generation, search, and update all increase when d grows. Besides, Figure 9 gives an illustration of the impact of d on the precision and rank privacy in our scheme. As d increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy decreases gradually. These phenomena are all consistent with our previous theoretical analysis. So, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter d to satisfy the requirements of different applications.

4.4. Discussion. From the experiment results, when n = 1000, m = 20000, d = 200, and b = 10000, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, which are all much better than those of the previous schemes X15 and G19. This efficiency demonstrates that our scheme is well suited to practical applications, especially the mobile cloud setting, in which the clients have limited computation and storage resources.

The experiment result shows that the precision of our scheme is less than that in the previous two schemes, while the rank privacy is accordingly higher than that in the previous schemes. In addition, by using the “Word2vec” method, the vector representations used in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying “Word2Vec” to construct the vector representations of the documents and queries and adopting the balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme can substantially reduce the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Both panels plot the metric (%) against the number of retrieved documents (10 to 50), with one curve per vector size.


Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grant nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searching on encrypted datardquo in Proceedings of the 2000IEEE Symposium on Research in Security and Privacy Ber-keley CA USA May 2000

[2] Y Zhu D Ma and S Wang ldquoSecure data retrieval of out-sourced data with complex query supportrdquo in Proceedings ofthe 2012 32nd International Conference on DistributedComputing Systems Workshops pp 481ndash490 Macau ChinaJune 2012

[3] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2015

[4] E J Goh ldquoSecure indexesrdquo IACR Cryptology ePrint Archivevol 2003 p 216 2003

[5] R Curtmola J Garay S Kamara and R OstrovskyldquoSearchable symmetric encryption improved definitions andefficient constructionsrdquo Journal of Computer Security vol 19no 5 pp 895ndash934 2011

[6] J W Byun D H Lee and J Lim ldquoEfficient conjunctivekeyword search on encrypted data storage systemrdquo EuropeanPublic Key Infrastructure Workshop Springer Berlin Ger-many pp 184ndash196 2006

[7] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo Infor-mation and Communications Security Springer BerlinGermany pp 414ndash426 2005

[8] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162017

[9] M Kuzu M S Islam and M Kantarcioglu ldquoEfficient simi-larity search over encrypted datardquo in Proceedings of the 2012IEEE 28th International Conference on Data Engineeringpp 1156ndash1167 Washington DC USA April 2012

[10] S Zerr D Olmedilla W Nejdl and W Siberski ldquoZerber + rtop-k retrieval from a confidential indexrdquo in Proceedings of the12th International Conference on Extending Database Tech-nology Advances in Database Technology pp 439ndash449 SaintPetersburg Russia March 2009

[11] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[12] N Cao C Wang M Li et al ldquoPrivacy-preserving multi-keyword ranked search over encrypted cloud datardquo IEEETransactions on Parallel and Distributed Systems vol 25 no 1pp 222ndash233 2013

[13] W Sun B Wang N Cao et al ldquoPrivacy-preserving multi-keyword text search in the cloud supporting similarity-basedrankingrdquo in Proceedings of the 8th ACM SIGSAC Symposiumon Information Computer and Communications Securitypp 71ndash82 Hangzhou China 2013

[14] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[15] C Guo R Zhuang C-C Chang and Q Yuan ldquoDynamicmulti-keyword ranked search based on bloom filter overencrypted cloud datardquo IEEE Access vol 7 pp 35826ndash358372019

[16] D Cash S Jarecki C Jutla et al ldquoHighly-scalable searchablesymmetric encryption with support for boolean queriesrdquoAnnual Cryptology Conference Springer Berlin Germanypp 353ndash373 2013

[17] D. Cash, J. Jaeger, S. Jarecki, et al., "Dynamic searchable encryption in very-large databases: data structures and implementation," in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky, et al., "Public key encryption with keyword search," International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, "Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system," Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, "Predicate encryption supporting disjunctions, polynomial equations, and inner products," Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, "Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group," Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, "Novel multi-keyword search on encrypted data in the cloud," IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, "Generating searchable public-key ciphertexts with hidden structures for fast keyword search," IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, "Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks," IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, "A general transformation from KP-ABE to searchable encryption," Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, "Attribute-based hybrid boolean keyword search over outsourced encrypted data," IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, "Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries," in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, "Efficient implementation of a proxy-based protocol for data sharing on the cloud," in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, "Searchable public key encryption supporting semantic multi-keywords search," IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado, et al., "Efficient estimation of word representations in vector space," 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, "Secure kNN computation on encrypted databases," in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, "Zerber: r-confidential indexing for distributed documents," in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, "Enron E-mail dataset," http://www.cs.cmu.edu/~enron/.


and only return the most related (top-k) documents. Thus, ranked schemes can significantly reduce the computation and storage costs.

The first ranked search schemes were proposed in [10, 11], which only support single-keyword search. An early SSE scheme supporting multikeyword ranked search was given by Cao et al. [12]. The score evaluation method used in their scheme is the inner product between the query and document vectors. In their scheme, since each document has its own vector representation, the search time is linear in the number of documents in the dataset, and the storage overhead is very high in a big data environment. Then, Sun et al. [13] gave a similar scheme with better-than-linear search efficiency by using a tree-based index [16, 17]. They adopt the term frequency-inverse document frequency (TF-IDF) technique to evaluate the score between the index and queries. To further improve the search efficiency and support dynamic update, Xia et al. proposed an efficient SSE scheme supporting dynamic multikeyword ranked search [14]. In their scheme, they construct a tree-based structure and propose a parallel search algorithm to accelerate the search process. Moreover, they also provide a dynamic update method to cope with the deletion and insertion of documents flexibly. Recently, by utilizing the Bloom filter [18], Guo et al. constructed an efficient SSE scheme supporting dynamic multikeyword ranked search [15] to further improve the efficiency of keyword search and index construction. Owing to the Bloom filter, the internal nodes in the index tree need not be encrypted, and the dimension of the vectors in the internal nodes is also reduced. As a result, this scheme achieves better performance than previous similar schemes.

Another kind of SE is called searchable public key encryption (SPE), which is established on the public key system. In SSE, the key for encrypting data is the same as the key for generating the search trapdoor. By contrast, in SPE, the public key for encrypting data is open to the public, while the secret key for generating the search trapdoor is only given to the authorized data receivers. The very first SPE scheme supporting keyword search was introduced by Boneh et al. and is called public key encryption with keyword search (PEKS) [19]. However, their work only supports single-keyword search. In order to support more expressive queries, many SPE schemes [20–23] were proposed to realize advanced search, for example, conjunctive, disjunctive, and Boolean keyword search. By using a special hidden structure, Xu et al. proposed two SPE schemes supporting single-keyword search [24, 25], whose search performance is very close to that of a practical SSE scheme. By converting an attribute-based encryption scheme, Han et al. proposed an SPE scheme which can control a user's search permission according to an access control policy [26]. After this, Kai et al. proposed an SPE scheme achieving both Boolean keyword search and fine-grained search permission [27]. Sepehri et al. proposed a scalable proxy-based protocol for privacy-preserving queries, which allows authorized users to perform queries over data encrypted with different keys [28]. Later, by utilizing an ElGamal elliptic curve encryption system, Sepehri et al. gave a similar scheme with better efficiency [29]. In order to improve search accuracy, Zhang et al. proposed an SPE scheme supporting semantic keyword search by adopting a method called "Word2vec" [30]. For the sake of brevity, we summarize some SPE and SSE schemes in Table 1, which describes the differences between our scheme and previous schemes.

1.1. Motivation. The previous ranked search schemes in the symmetric key setting are secure and somewhat efficient. However, the index building, trapdoor generation, and search time are all related to the size of the dictionary generated from the dataset, which is not suitable for a big data environment. According to the statistical information given in [20], the vocabulary size of a dataset is commonly on the order of $10^6$. Therefore, it is necessary to construct a more efficient ranked search scheme. Motivated by this, in this paper we aim to construct a novel SSE scheme supporting dynamic multikeyword ranked search (SSE-DMKRS) with high efficiency.

1.2. Contributions. The main contributions are summarized as follows:

(1) Based on the "Word2vec" [31] technique, we propose a novel method which converts the documents and queries into vector representations. The dimension of the vector representation obtained by our method is nearly 10% of that in the previous SSE-DMKRS schemes [14, 15].

(2) We propose an efficient index building algorithm which creates a balanced binary tree to index all the documents. The obtained index tree achieves a sublinear search time and supports dynamic update operations.

(3) By applying the secure k-nearest neighbour (KNN) scheme [32] to encrypt the index tree and the query, we propose an efficient SSE-DMKRS scheme.

In addition, we implement our scheme on a widely used data collection. The experimental results show that our scheme greatly reduces the time cost of index building, trapdoor generation, keyword search, and update without losing too much accuracy; e.g., the time cost of index building in our scheme is nearly 10% of that in the previous schemes. Meanwhile, the storage cost of the encrypted index is also reduced greatly; e.g., the storage cost of the index in our scheme is nearly one percent of that in the previous schemes. In conclusion, compared to the previous SSE-DMKRS schemes [14, 15], our scheme is very suitable for the mobile cloud environment, in which the client device has limited computation and storage resources.

1.3. Organization. This paper is organized as follows. In Section 2, we give a formal definition of the system model and threat model of our scheme and introduce the tools we adopt, namely "Word2vec" and the vector space model. In Section 3, we present the construction of the search index tree and the SSE-DMKRS scheme. In addition, a detailed security analysis and the update operations of our scheme are given. Theoretical and experimental analyses are given in Section 4. Section 5 gives the conclusion.

2. Preliminaries

In this section, we first give the framework of the system model and introduce the threat model adopted in our scheme. Then, we introduce the tools adopted in our scheme, namely "Word2vec", a well-known term representation method in the field of natural language processing, and the vector space model. Finally, we present the design goals of our scheme. In addition, the main notations used in this paper are summarized in Table 2.

2.1. System Model. The system model contains three different roles: data owner, data user, and cloud server. The data owner outsources a group of documents $F = \{f_1, f_2, \ldots, f_n\}$ to the cloud in ciphertext form $C = \{c_1, c_2, \ldots, c_n\}$. Moreover, the data owner also generates an encrypted searchable index for the keyword search operation. For each query over an arbitrary keyword set $Q$, the data user computes a search trapdoor $T_Q$ of the query $Q$ and sends it to the cloud server. Upon receiving $T_Q$ from the data user, the cloud server searches against the encrypted index and returns the candidate encrypted documents. After this, the data user decrypts the candidate documents and obtains the plaintext.

As illustrated in Figure 1, the architecture of the system model is formally described as follows:

(1) Data Owner (DO). DO holds a group of documents $F = \{f_1, f_2, \ldots, f_n\}$ and generates a secure searchable index $I$ from $F$ and an encrypted document collection $C$ for $F$. Then, DO uploads $I$ and $C$ to the cloud server and distributes the secret key to the authorized data users. Furthermore, DO is responsible for updating the index and documents stored in the cloud server.

(2) Data User (DU). An authorized DU can launch keyword queries over the encrypted data by utilizing a trapdoor, which is generated by using the secret key fetched from DO. Moreover, DU can decrypt the encrypted documents by utilizing the secret key.

(3) Cloud Server (CS). CS stores the encrypted index $I$ and documents $C$ from DO. When CS receives the trapdoor for query $Q$ from DU, CS executes the keyword query over the index and returns the top-k most relevant encrypted documents associated with the query $Q$. Upon receiving the update information from DO, CS also performs the update operation over the encrypted data. In addition, we assume that CS is "honest-but-curious", an assumption employed by many searchable encryption schemes [12, 14, 15]. This means that CS honestly and correctly executes the algorithms in our scheme. However, CS curiously infers and analyses the received data to obtain extra private information.

2.2. Threat Model. Throughout the paper, we mainly utilize two threat models proposed by Cao et al. [12]:

(1) Known Ciphertext Model. CS only knows the encrypted index, the ciphertext, and the trapdoor. That is to say, CS can execute ciphertext-only attacks in this model.

(2) Known Background Model. CS knows more information than in the known ciphertext model, such as statistical information inferred from the documents. By taking advantage of this statistical information, e.g., term frequency (TF) and inverse document frequency (IDF), CS can conduct a statistical attack to verify whether certain keywords are in the query [33].

2.3. Design Goals. As mentioned before, we aim to build a secure and efficient SSE-DMKRS scheme. The design goals of our scheme are described as follows:

(1) Efficiency. The scheme aims to realize sublinear search efficiency, with time and space costs of index building and trapdoor generation that are much lower than those of the current schemes.

Table 1: Comparison between previous SE schemes and ours.

| Type | Ref. | Query condition | Additional special abilities |
| SSE | [6] | Conjunctive keyword search | - |
| SSE | [8] | Multikeyword fuzzy search | - |
| SSE | [11] | Single-keyword ranked search | - |
| SSE | [12] | Multikeyword ranked search | - |
| SSE | [14] | Multikeyword ranked search | Dynamic update |
| SSE | Ours | Multikeyword ranked search | Semantic search and dynamic update |
| SPE | [22] | Conjunctive and disjunctive keyword search | - |
| SPE | [23] | Boolean keyword search | - |
| SPE | [25] | Single-keyword search | Fast search |
| SPE | [27] | Boolean keyword search | Access control |
| SPE | [29] | Multikeyword search | Data sharing |
| SPE | [30] | Multikeyword search | Semantic search |


(2) Privacy Preserving. Similar to previous schemes [12, 14, 15], our scheme needs to prevent CS from learning extra private information inferred from the documents, the secure index, and the queries. More precisely, the privacy requirements are listed as follows:

(a) Index and Trapdoor Privacy. The plaintext information concealed in the index and the trapdoor cannot be leaked to CS. This information involves the keywords and the corresponding vector representation of each keyword.

(b) Trapdoor Unlinkability. CS cannot determine whether two trapdoors are built from the same query.

(c) Keyword Privacy. CS cannot identify whether a specific keyword is in the trapdoor or index by analysing the search results and the statistical information of the documents.

(3) Dynamic. The scheme can efficiently support dynamic operations such as document insertion and deletion. Note that the efficiency of update operations in our scheme is better than that of the previous SSE-DMKRS schemes.

2.4. Word2Vec. The "Word2Vec" model is a shallow, two-layer neural network which is used to convert words into vector representations [31]. Under this model, each word in the document set is mapped to a vector, which can be used to calculate the similarity between words. For instance, Figure 2 shows that, after training on a simple corpus, the three words "dog", "fox", and "orange" are mapped to three vector representations. By utilizing these vectors, the similarity among the three words can be calculated. We find that the similarity between "dog" and "fox" is higher than that between "dog" and "orange", since "dog" and "fox" are both animals. Thus, we can utilize "Word2Vec" to convert the keywords in a corpus into vector representations and then apply these vectors to perform ranked search.

Table 2: Notations.

| Notation | Description |
| $F$ | A document set $\{f_1, f_2, \ldots, f_n\}$ |
| $n$ | The number of documents in $F$ |
| $C$ | The encrypted form of $F$, denoted by $\{c_1, c_2, \ldots, c_n\}$ |
| $W_i$ | The keyword set $\{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ for the document $f_i$ |
| $t_i$ | The number of keywords in $W_i$, $i \in [1, n]$ |
| $w_{ij}$ | The $j$th keyword in $W_i$, $i \in [1, n]$, $j \in [1, t_i]$ |
| $\vec{W}_i$ | The vector representation for $W_i$ |
| $\vec{w}_{ij}$ | The vector representation for $w_{ij}$ |
| $u$ | A node in the index tree |
| $\vec{u}$ | The vector representation for the node $u$ |
| $\vec{u}_{\min}$, $\vec{u}_{\max}$ | Vector representations obtained by splitting $\vec{u}$ |
| $I_u$ | The encrypted index for the node $u$ |
| $I_T$ | The encrypted index tree of $F$ |
| $Q$ | The keyword set $\{q_1, q_2, \ldots, q_t\}$ for a query |
| $q_j$ | A keyword in $Q$, $j \in [1, t]$ |
| $\vec{q}$ | The vector representation for query $Q$ |
| $\vec{q}_{\min}$, $\vec{q}_{\max}$ | Vector representations obtained by splitting $\vec{q}$ |
| $\vec{q}_j$ | The vector representation for $q_j$ |
| $T_Q$ | The trapdoor of $Q$ |
| $D$ | A dictionary $\{w_1, w_2, \ldots, w_m\}$ containing all keywords in $F$ |
| $M_{11}, M_{12}, M_{21}, M_{22}$ | Matrices for encryption (encryption key) |
| $M_{11}^{-1}, M_{12}^{-1}, M_{21}^{-1}, M_{22}^{-1}$ | Matrices for decryption (decryption key) |
| $N$ | The number of semantic keywords associated with each keyword of the dictionary |
| $m$ | The number of keywords in $D$ |
| $d$ | The dimension of the vectors generated by using "Word2vec" |
| $k$ | The number of files returned to the user |

Figure 1: System model of the keyword search over encrypted data. (Diagram labels: data owners, data users, semitrusted cloud server, secret key, encrypted e-mails, encrypted index, trapdoor, search results.)

2.5. Vector Space Model. The vector space model is a very popular method in the field of information retrieval, usually used along with the TF-IDF rule to realize top-k search, where TF is the term frequency and IDF is the inverse document frequency [34]. By utilizing the TF-IDF rule, the documents and queries can be represented as a group of vectors. These vectors can be adopted in the top-k search over the ciphertext [12, 14, 15]. However, the dimension of these vectors is linear in the number of words in the dataset, which is inefficient if the dataset has a lot of words. To address this issue, we apply "Word2Vec" to present a novel keyword conversion method, which is described as follows:

(1) By applying "Word2Vec" to a corpus, we create a dictionary in which each keyword is associated with a vector representation.

(2) For the keyword set $W_i = \{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ of the document $f_i$, we obtain a vector $\vec{x}_i = \vec{w}_{i1} + \vec{w}_{i2} + \cdots + \vec{w}_{it_i}$ by looking up the dictionary, where $\vec{w}_{ij}$ is the vector representation for $w_{ij}$ and $i \in [1, n]$, $j \in [1, t_i]$. After this, we set $\vec{W}_i = \vec{x}_i / \|\vec{x}_i\|$ as the vector representation of $W_i$.

(3) For the query keyword set $Q = \{q_1, q_2, \ldots, q_t\}$, we utilize the dictionary to construct a vector $\vec{v} = \vec{q}_1 + \vec{q}_2 + \cdots + \vec{q}_t$. Then, we set $\vec{Q} = \vec{v} / \|\vec{v}\|$ as the vector representation of $Q$.

Note that the dimensions of $\vec{W}_i$ and $\vec{Q}$ are very small, e.g., 200, which is significantly smaller than the number of words in the dataset. Thus, the proposed method is more efficient than the previous method based on the TF-IDF rule. In addition, together with the vector space model mentioned above, we use the cosine measure to evaluate the relevance between a document and a query. The relevance evaluation function is defined in the next section.
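The conversion steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional embeddings in `DICTIONARY` are made-up stand-ins for a trained Word2Vec dictionary, and the helper names `to_unit_vector` and `relevance` are hypothetical. Because both vectors are scaled to unit length, their inner product is the cosine measure.

```python
import math

# Hypothetical 3-dimensional embeddings standing in for a trained Word2Vec dictionary.
DICTIONARY = {
    "dog": [0.8, 0.1, 0.3],
    "fox": [0.7, 0.2, 0.4],
    "orange": [-0.1, 0.9, -0.2],
}

def to_unit_vector(keywords, dictionary):
    """Sum the keyword vectors (steps 2 and 3) and scale the sum to unit length."""
    dim = len(next(iter(dictionary.values())))
    total = [0.0] * dim
    for word in keywords:
        for i, value in enumerate(dictionary[word]):
            total[i] += value
    norm = math.sqrt(sum(x * x for x in total))
    if norm == 0:
        raise ValueError("keyword vectors cancel out; cannot normalize")
    return [x / norm for x in total]

def relevance(doc_vec, query_vec):
    """Inner product of two unit vectors, i.e. their cosine similarity."""
    return sum(a * b for a, b in zip(doc_vec, query_vec))

doc_animals = to_unit_vector(["dog", "fox"], DICTIONARY)
doc_fruit = to_unit_vector(["orange"], DICTIONARY)
query = to_unit_vector(["dog"], DICTIONARY)

# With these toy embeddings the animal document scores higher for the query "dog".
print(relevance(doc_animals, query) > relevance(doc_fruit, query))
```

The vector length stays fixed at the embedding dimension no matter how many keywords a document has, which is the point of the method.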

3. Proposed Scheme

In this section, we first give the algorithm for building the index tree and the search algorithm on this tree. Then, we give the concrete construction of our scheme and its dynamic update operations. Finally, we give a detailed analysis of the security of our scheme.

3.1. Search Index: Balanced Binary Tree. In this section, we adopt a balanced binary tree to create the search index, which will be used in our main scheme. Inspired by the construction process in [14], the tree building and search processes for our scheme are described as follows.

3.1.1. Tree Building Process. Formally, the data structure of the tree node $u$ is defined as $u = \langle ID, \vec{u}_{\min}, \vec{u}_{\max}, P_l, P_r, FID \rangle$, where $ID$ is the identity of the node $u$, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are the vector representations of the node $u$, $P_l$ and $P_r$ are pointers to $u$'s left and right children, respectively, and $FID$ stores the identity of a document if $u$ is a leaf node. Note that, compared with the previous index trees [12, 14, 15], a node in our tree has two vectors, whereas a node in previous trees has only one. The main reason is that a node vector in our tree may contain negative entries, while a node vector in previous trees contains only positive entries. For clarity, we give a simple example. Let $\vec{a} = (0.1, 0.2, -0.3)$ and $\vec{b} = (-0.5, 0, 0.3)$ be the vectors of leaf nodes $A$ and $B$, respectively. In the previous index trees, the vector of the parent node $C$ of these two leaf nodes is $\vec{c} = (0.1, 0.2, 0.3)$, in which the value of each dimension is the larger of the corresponding values of $\vec{a}$ and $\vec{b}$. For a query vector $\vec{v} = (-1, 0, -1)$, the scores of the nodes $A$, $B$, and $C$ are 0.2, 0.2, and $-0.4$, respectively. It is important to note that the score of the parent node is less than the scores of its children, so these two leaf nodes would be ignored in the tree search process even though they should be considered.

In our index tree, let the dimensions of $\vec{u}_{\min}$ and $\vec{u}_{\max}$ both be $d$. The methods for constructing $\vec{u}_{\min}$ and $\vec{u}_{\max}$ for leaf nodes and for internal nodes are denoted by M1 and M2, respectively, and are given as follows:

(1) M1: if the node $u$ is a leaf node corresponding to a file $f$, we create a vector $\vec{u}$ for $f$ by adopting the keyword conversion method described in Section 2.5. Then, we set $\vec{u}_{\min} = \vec{u}$ and $\vec{u}_{\max} = \vec{u}$.

(2) M2: if the node $u$ is an internal node, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are based on the vectors of its children. Let $P_l.\vec{u}_{\min}$ and $P_l.\vec{u}_{\max}$ be the two vectors of $u$'s left child, and let $P_r.\vec{u}_{\min}$ and $P_r.\vec{u}_{\max}$ be the two vectors of $u$'s right child.

Suppose that Min() and Max() are the minimum and maximum functions, respectively. Then $\vec{u}_{\min}$ is built as follows:

$$\vec{u}_{\min}[i] = \mathrm{Min}\left(P_l.\vec{u}_{\min}[i],\ P_r.\vec{u}_{\min}[i]\right), \quad i \in [1, d]. \tag{1}$$

And $\vec{u}_{\max}$ is built as follows:

$$\vec{u}_{\max}[i] = \mathrm{Max}\left(P_l.\vec{u}_{\max}[i],\ P_r.\vec{u}_{\max}[i]\right), \quad i \in [1, d]. \tag{2}$$

That is, each entry of $\vec{u}_{\max}$ is the larger of the corresponding entries of $P_l.\vec{u}_{\max}$ and $P_r.\vec{u}_{\max}$, and each entry of $\vec{u}_{\min}$ is the smaller of the corresponding entries of $P_l.\vec{u}_{\min}$ and $P_r.\vec{u}_{\min}$.

Figure 2: A vector space representation of words ("dog", "fox", "orange") shows that "dog" is closer to "fox", since they share more common attributes than "dog" and "orange".

An illustration of the above methods is given in Figure 3. In Figure 3(a), the node $u$ is a leaf node and $W$ is the keyword set of the file that $u$ stores. By using the keyword conversion method, $W$ is converted to the vector $\vec{u} = (-0.2, 0.2, 0.5, -0.7, 0.8)$. Then, we set $\vec{u}_{\min} = \vec{u}_{\max} = \vec{u}$. In Figure 3(b), the node $u$ is an internal node, the vectors of its children are $P_l.\vec{u}_{\min}$, $P_l.\vec{u}_{\max}$, $P_r.\vec{u}_{\min}$, and $P_r.\vec{u}_{\max}$, and the vectors of the internal node are $\vec{u}_{\min} = (-0.2, -0.2, -0.3, -0.7, -0.5)$ and $\vec{u}_{\max} = (0.3, 0.7, 0.3, -0.1, 0.3)$.

Based on the methods M1 and M2, and inspired by the tree building algorithm introduced in [14], our tree building algorithm is given in Algorithm 1. An example of the proposed index tree is given in Example 1 and Figure 4. In Algorithm 1, we use the function GenID() to generate the unique identity ID for each node and apply GenFID() to generate the unique file ID for each leaf node. CurrentNodeSet contains the nodes that have no parent node yet and still need to be processed. |CurrentNodeSet| is the number of nodes in CurrentNodeSet. If |CurrentNodeSet| is even, we write |CurrentNodeSet| = 2h; otherwise, we write |CurrentNodeSet| = 2h + 1, where h is a positive integer. TempNodeSet is a set containing the newly generated nodes. Moreover, for each node $u$, if $u$ is a leaf node, we use method M1 to generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$; otherwise, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are created by using M2.
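The bottom-up construction described above can be sketched as follows. This is a minimal Python illustration, not the implementation evaluated in this paper: the `Node` class, the counter standing in for GenID(), and the five document vectors are assumptions made for the example, and the document vectors are taken as already produced by the conversion method of Section 2.5.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

@dataclass
class Node:
    id: int
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None  # set only for leaf nodes

_ids = count(1)  # hypothetical stand-in for GenID()

def make_leaf(fid, vec):
    # Method M1: a leaf keeps the document vector as both u_min and u_max.
    return Node(next(_ids), list(vec), list(vec), fid=fid)

def make_parent(left, right):
    # Method M2: entry-wise minimum of the u_min vectors, maximum of the u_max vectors.
    u_min = [min(a, b) for a, b in zip(left.u_min, right.u_min)]
    u_max = [max(a, b) for a, b in zip(left.u_max, right.u_max)]
    return Node(next(_ids), u_min, u_max, left, right)

def build_index_tree(doc_vectors):
    """Build the tree level by level; doc_vectors maps a file ID to its vector."""
    current = [make_leaf(fid, vec) for fid, vec in doc_vectors.items()]
    if not current:
        raise ValueError("empty document collection")
    while len(current) > 1:
        nxt = []
        if len(current) % 2 == 0:
            for i in range(0, len(current), 2):
                nxt.append(make_parent(current[i], current[i + 1]))
        else:
            # 2h + 1 nodes: pair the first 2h - 2, then fold the last three together.
            for i in range(0, len(current) - 3, 2):
                nxt.append(make_parent(current[i], current[i + 1]))
            u1 = make_parent(current[-3], current[-2])
            nxt.append(make_parent(u1, current[-1]))
        current = nxt
    return current[0]

# Hypothetical document vectors (d = 3), used only to exercise the odd-sized branch.
docs = {
    "f1": [0.3, -0.2, 0.8],
    "f2": [0.2, -0.1, 0.5],
    "f3": [-0.4, 0.1, 0.6],
    "f4": [0.1, 0.3, 0.2],
    "f5": [0.5, 0.9, -0.2],
}
root = build_index_tree(docs)
print(root.u_min, root.u_max)
```

Each pass roughly halves the number of parentless nodes, so the loop runs a logarithmic number of times in the number of documents.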

3.1.2. Search Process. For a query vector $\vec{q}$ of query $Q$, we split $\vec{q}$ into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$. For each dimension $i \in [1, d]$, if $\vec{q}[i] < 0$, then $\vec{q}_{\min}[i] = \vec{q}[i]$ and $\vec{q}_{\max}[i] = 0$; otherwise, $\vec{q}_{\min}[i] = 0$ and $\vec{q}_{\max}[i] = \vec{q}[i]$. Obviously, $\vec{q}_{\min}$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive part. For clarity, we denote this splitting method for query $Q$ by M3. An illustration of this method is given in Figure 5. If the query vector is $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$, then $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$. For a query $Q$ and a node $u$, the score is calculated as

$$\mathrm{Score}(u, Q) = \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max}. \tag{3}$$

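We can utilize the above equation to evaluate which documents are the most related to the query. Moreover, we can verify that the score of a parent node is never smaller than the scores of its children: every entry of $\vec{q}_{\min}$ is nonpositive and the parent's $\vec{u}_{\min}$ is entry-wise no larger than a child's, while every entry of $\vec{q}_{\max}$ is nonnegative and the parent's $\vec{u}_{\max}$ is entry-wise no smaller than a child's. This property can significantly reduce the number of nodes that are checked in the search process.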
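The splitting method M3 and equation (3) can be checked on the vectors $\vec{a}$, $\vec{b}$, and $\vec{v}$ from the example in Section 3.1.1. The following Python sketch (helper names are hypothetical) shows that a single entry-wise maximum vector gives the parent a score below its leaves, whereas the two-vector node gives it a score that bounds both leaves from above.

```python
def split_query(q):
    """Method M3: negative entries go to q_min, all other entries to q_max."""
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [x if x >= 0 else 0.0 for x in q]
    return q_min, q_max

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def score(u_min, u_max, q):
    """Equation (3): u_min . q_min + u_max . q_max."""
    q_min, q_max = split_query(q)
    return dot(u_min, q_min) + dot(u_max, q_max)

a = [0.1, 0.2, -0.3]   # leaf A
b = [-0.5, 0.0, 0.3]   # leaf B
v = [-1.0, 0.0, -1.0]  # query vector

# Previous trees: one vector per node, the entry-wise maximum of the children.
c = [max(x, y) for x, y in zip(a, b)]
old_parent = dot(c, v)              # about -0.4, below both leaves

# Two-vector node built with method M2.
c_min = [min(x, y) for x, y in zip(a, b)]
c_max = c
new_parent = score(c_min, c_max, v)  # about 0.8

leaf_a = score(a, a, v)              # about 0.2
leaf_b = score(b, b, v)              # about 0.2
print(old_parent, new_parent, leaf_a, leaf_b)
```

For a leaf, `u_min` and `u_max` coincide, so its score reduces to the ordinary inner product with the query.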

)e search process is given in Algorithm 2 In Algo-rithm 2 we use RList to store the top-k files which have thek-largest relevance scores to the query )e RList is ini-tialized to be an empty list and it is updated when finding arelevance file )e kth score is defined as the smallest rel-evance score in the current RList which is initialized to be avery small integer By using the kth score we can acceleratethe search process by ignoring some paths with low scoresIn Example 1 and Figure 4 an illustration of the searchprocess is given where F f1 f2 f6 query vectors areqminrarr

0 minus03 0 and qmaxrarr

01 0 02 and d (vector di-mension) is 3
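The pruning idea described above can be sketched as a depth-first search that keeps the current top-k in a min-heap. This is an illustrative reading of the description, not a transcription of Algorithm 2: the dictionary-based node layout, the function names, and the six leaf vectors are assumptions (the leaf vectors are hypothetical and not normalized), and only the query vector is taken from Example 1.

```python
import heapq

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def split_query(q):
    return [x if x < 0 else 0.0 for x in q], [x if x >= 0 else 0.0 for x in q]

def score(node, q_min, q_max):
    return dot(node["u_min"], q_min) + dot(node["u_max"], q_max)

def leaf(fid, vec):
    return {"fid": fid, "u_min": vec, "u_max": vec, "children": []}

def parent(left, right):
    return {
        "fid": None,
        "u_min": [min(a, b) for a, b in zip(left["u_min"], right["u_min"])],
        "u_max": [max(a, b) for a, b in zip(left["u_max"], right["u_max"])],
        "children": [left, right],
    }

def top_k(root, q, k):
    """Return the IDs of the k highest-scoring files, best first."""
    if k <= 0:
        return []
    q_min, q_max = split_query(q)
    rlist = []  # min-heap of (score, fid); rlist[0][0] is the k-th score once full

    def visit(node):
        s = score(node, q_min, q_max)
        if len(rlist) == k and s <= rlist[0][0]:
            return  # a node's score bounds its subtree, so nothing below can enter RList
        if not node["children"]:
            if len(rlist) < k:
                heapq.heappush(rlist, (s, node["fid"]))
            else:
                heapq.heapreplace(rlist, (s, node["fid"]))
            return
        # Visit the more promising child first so the k-th score rises quickly.
        for child in sorted(node["children"],
                            key=lambda c: score(c, q_min, q_max), reverse=True):
            visit(child)

    visit(root)
    return [fid for _, fid in sorted(rlist, reverse=True)]

# Hypothetical leaf vectors arranged in the shape of the tree in Figure 4.
f = {name: leaf(name, vec) for name, vec in {
    "f1": [0.3, -0.2, 0.8], "f2": [0.2, -0.1, 0.5], "f3": [-0.4, 0.1, 0.6],
    "f4": [0.1, 0.3, 0.2], "f5": [0.5, 0.9, -0.2], "f6": [-0.5, 0.7, 0.8],
}.items()}
r11 = parent(parent(f["f1"], f["f2"]), parent(f["f3"], f["f4"]))
r12 = parent(f["f5"], f["f6"])
root = parent(r11, r12)

query = [0.1, -0.3, 0.2]
result = top_k(root, query, 3)
print(result)
```

With these vectors the subtree under `r12` is never expanded for k = 3, because its score does not exceed the third-best score already found.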

3.1.3. Example 1. An example of an index tree and of a search process on this tree is illustrated in Figure 4. In Figure 4, we show an index tree with $F = \{f_1, f_2, \ldots, f_6\}$, in which the dimension of the vector for each node is 3. For each node $u$ in the tree, the upper vector and the lower vector correspond to $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. In the tree building process, we first generate the leaf nodes from $F$ and then create the internal nodes based on these leaf nodes.

Moreover, Figure 4 also gives an illustration of the search process. In Figure 4, we set $\vec{q} = (0.1, -0.3, 0.2)$ and split it into $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$. We suppose that the top-3 files will be returned to the data user. According to Algorithm 2, the search process begins with the root node $r$ and calculates the score between the query $Q$ and the two child nodes $r_{11}$ and $r_{12}$ of $r$ by using equation (3). The calculation is as follows:

$$\begin{aligned}
\mathrm{Score}(r_{11}, Q) &= (-0.4, -0.2, -0.3) \cdot (0.0, -0.3, 0.0) + (0.3, 0.3, 0.8) \cdot (0.1, 0.0, 0.2) = 0.25, \\
\mathrm{Score}(r_{12}, Q) &= (-0.5, 0.7, -0.2) \cdot (0.0, -0.3, 0.0) + (0.5, 0.9, 0.8) \cdot (0.1, 0.0, 0.2) = 0.
\end{aligned} \tag{4}$$

Because the score between $r_{11}$ and $Q$ is higher than that between $r_{12}$ and $Q$, Algorithm 2 traverses the subtree rooted at $r_{11}$ first and computes the scores between the query $Q$ and the two child nodes of $r_{11}$. Since the score between $r_{21}$ and $Q$ is higher than that between $r_{22}$ and $Q$, Algorithm 2 traverses the subtree rooted at $r_{21}$ and adds the leaf nodes $f_1$ and $f_2$ to RList. After this, the subtree rooted at $r_{22}$ is traversed, and the leaf nodes $f_3$ and $f_4$ are reached. Since the number of files in RList is less than 3, $f_3$ is added to RList directly. For the file $f_4$, since the number of files in RList now equals 3, Algorithm 2 compares the score between $f_4$ and $Q$ with the minimum score in RList. Because the score between $f_4$ and $Q$ is smaller than the minimum score in RList, $f_4$ is not added to RList. At this point, the subtree rooted at $r_{11}$ has been traversed, and Algorithm 2 turns to the subtree rooted at $r_{12}$. The score between $r_{12}$ and $Q$ is smaller than the minimum score in RList, which means that the scores of all descendants of $r_{12}$ are also smaller than the minimum score in RList (this property is described in Section 3.1.2), so $f_5$ and $f_6$ are not checked. Therefore, Algorithm 2 outputs RList $= \{f_1, f_2, f_3\}$.
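The two scores in equation (4) can be reproduced with a short arithmetic check. The vectors below are the ones printed in equation (4); the variable names are only for illustration, and the results are rounded to two decimals because the inputs are decimal fractions that binary floating point cannot represent exactly.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

q_min = [0.0, -0.3, 0.0]
q_max = [0.1, 0.0, 0.2]

# (u_min, u_max) of the two children of the root, as given in equation (4).
r11 = ([-0.4, -0.2, -0.3], [0.3, 0.3, 0.8])
r12 = ([-0.5, 0.7, -0.2], [0.5, 0.9, 0.8])

def score(node):
    u_min, u_max = node
    return dot(u_min, q_min) + dot(u_max, q_max)

score_r11 = round(score(r11), 2)  # 0.06 + 0.19
score_r12 = round(score(r12), 2)  # -0.21 + 0.21
print(score_r11, score_r12)
```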

Figure 3: An example of the vector generation for the node $u$. (a) Method M1: $u$ is a leaf node, $W$ is the keyword set of the file which $u$ stores, and $\vec{u}$ is a vector generated by adopting the keyword conversion method described in Section 2.5. (b) Method M2: $u$ is an internal node, and $P_l.\vec{u}_{\min}$, $P_l.\vec{u}_{\max}$, $P_r.\vec{u}_{\min}$, and $P_r.\vec{u}_{\max}$ are the vectors of its children.

Algorithm 1: BuildIndexTree (FileSet F, Dictionary D).
Input: the document collection $F = \{f_1, f_2, \ldots, f_n\}$; a semantic dictionary $D$ generated by applying "Word2Vec" to $F$.
Output: the index tree $T$.
(1) for each $i \in [1, n]$ do
(2)     Construct a leaf node u for $f_i$ with u.ID = GenID(), u.Pl = u.Pr = NULL, u.FID = GenFID($f_i$), and generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M1
(3)     Insert u into CurrentNodeSet
(4) end for
(5) while |CurrentNodeSet| > 1 do
(6)     if |CurrentNodeSet| is even, i.e., 2h, then
(7)         for each pair of nodes u′ and u″ in CurrentNodeSet do
(8)             Create a parent node u for u′ and u″ with u.ID = GenID(), u.Pl = u′, u.Pr = u″, u.FID = NULL, and set $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M2
(9)             Insert u into TempNodeSet
(10)        end for
(11)    else (i.e., |CurrentNodeSet| = 2h + 1)
(12)        for each pair of nodes u′ and u″ of the former 2h − 2 nodes in CurrentNodeSet do
(13)            Create a parent node u for u′ and u″
(14)            Insert u into TempNodeSet
(15)        end for
(16)        Create a parent node u1 for the (2h − 1)-th and (2h)-th nodes, and then generate a parent node u for the (2h + 1)-th node and u1
(17)        Insert u into TempNodeSet
(18)    end if
(19)    Set CurrentNodeSet = TempNodeSet and clear TempNodeSet
(20) end while
(21) return CurrentNodeSet
(22) Note that CurrentNodeSet now contains only one node, which is the root of the index tree $T$.

3.2. Construction of SSE-DMKRS. In this section, by combining the secure KNN algorithm [32] and the index tree building algorithm, we propose a concrete SSE-DMKRS scheme. The SSE-DMKRS scheme consists of five algorithms. The algorithms KeyGen, DictionaryBuild, and IndexBuild are executed by the data owners, while the algorithms TrapdoorGen and Search are performed by the data users and the cloud server, respectively.

(i) KeyGen($\lambda$): given a security parameter $\lambda$, this algorithm first randomly chooses four $d \times d$ invertible matrices $M_{11}$, $M_{12}$, $M_{21}$, and $M_{22}$, where $d$ is the dimension of $\vec{u}_{\min}$ and $\vec{u}_{\max}$. Then, it randomly generates a $d$-bit vector $S$. Finally, it outputs the secret key $sk = \{S, M_{11}, M_{12}, M_{21}, M_{22}\}$.

(ii) DictionaryBuild($F$): given the document set $F = \{f_1, f_2, \ldots, f_n\}$, the algorithm runs "Word2vec" to generate the dictionary $D$ of $F$. In the dictionary $D$, each keyword is associated with a vector representation. In addition, each keyword is associated with a set of semantically related keywords.

(iii) IndexBuild($sk$, $F$, $D$): given the document set $F$ and the dictionary $D$ for $F$, the algorithm first creates the index tree $T$ by using the algorithm BuildIndexTree($F$, $D$) (Algorithm 1). Then, for each node $u$ in the tree $T$, the algorithm generates two random vector pairs $\{V'_{u_{\min}}, V''_{u_{\min}}\}$ and $\{V'_{u_{\max}}, V''_{u_{\max}}\}$ for the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. More precisely, if $S[i] = 0$, it sets $V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$; if $S[i] = 1$, $V'_{u_{\min}}[i]$, $V''_{u_{\min}}[i]$, $V'_{u_{\max}}[i]$, and $V''_{u_{\max}}[i]$ are set to four random values under the constraints $V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$. This process is expressed as the following equation:

$$\begin{cases}
V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 0, \\
V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 0, \\
V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 1, \\
V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 1,
\end{cases} \quad i \in [1, d]. \tag{5}$$

Finally, for each node $u$, it computes $I_u = \{M_{11}^{T} V'_{u_{\min}}, M_{12}^{T} V''_{u_{\min}}, M_{21}^{T} V'_{u_{\max}}, M_{22}^{T} V''_{u_{\max}}\}$. By replacing the plaintext vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ with the encrypted index $I_u$, an encrypted index tree $I_T$ is created.

(iv) TrapdoorGen($sk$, $Q$): given a query keyword set $Q$, the algorithm first extends $Q$ to a new semantic keyword set $Q'$. The process is as follows:

(a) It generates a new keyword set $Q'$, which is initialized to an empty set.

Figure 4: An example of Algorithm 1 and Algorithm 2 (Example 1). (Diagram: the index tree with root $r$, internal nodes $r_{11}$, $r_{12}$, $r_{21}$, $r_{22}$, and leaves $f_1, \ldots, f_6$, each node labelled with $\vec{u}_{\min}$ above $\vec{u}_{\max}$, together with the query vectors $\vec{q} = (0.1, -0.3, 0.2)$, $\vec{q}_{\min} = (0, -0.3, 0)$, and $\vec{q}_{\max} = (0.1, 0, 0.2)$; the numbers (1) to (7) mark the order in which nodes are visited.)

Figure 5: An example of the vector generation for the query $Q$ (method M3). $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$ is a vector generated by adopting the keyword conversion method described in Section 2.5; $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$ holds the positive part.

(b) Note that each keyword in the dictionary is associated with a group of keywords semantically related to it. For each keyword $q$ in $Q$, the algorithm randomly chooses $k'$ semantic keywords based on the dictionary and inserts these keywords into $Q'$, where $k'$ is chosen dynamically and $k' \in [1, N]$.

Then, based on $Q'$, the TrapdoorGen algorithm converts the keyword set into a query vector $\vec{q}$ with the keyword conversion method of Section 2.5 and splits it into a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is analogous to the one in the IndexBuild algorithm and can be expressed as the following equations:

$$
\left\{
\begin{array}{ll}
Q'_{q_{min}}[i] = Q''_{q_{min}}[i] = \vec{q}_{min}[i], & \text{if } S[i] = 0,\\
Q'_{q_{max}}[i] = Q''_{q_{max}}[i] = \vec{q}_{max}[i], & \text{if } S[i] = 0,\\
Q'_{q_{min}}[i] + Q''_{q_{min}}[i] = \vec{q}_{min}[i], & \text{if } S[i] = 1,\\
Q'_{q_{max}}[i] + Q''_{q_{max}}[i] = \vec{q}_{max}[i], & \text{if } S[i] = 1,
\end{array}
\right\}
\quad i \in [1, d]. \tag{6}
$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{min}}, M_{12}^{-1} Q''_{q_{min}}, M_{21}^{-1} Q'_{q_{max}}, M_{22}^{-1} Q''_{q_{max}}\}$ as the trapdoor for Q.

(v) Search(sk, $T_Q$, $I_T$): for each node u in $I_T$, the algorithm computes

$$
\begin{aligned}
I_u \cdot T_Q &= (M_{11}^{T} V'_{u_{min}}) \cdot (M_{11}^{-1} Q'_{q_{min}}) + (M_{12}^{T} V''_{u_{min}}) \cdot (M_{12}^{-1} Q''_{q_{min}}) \\
&\quad + (M_{21}^{T} V'_{u_{max}}) \cdot (M_{21}^{-1} Q'_{q_{max}}) + (M_{22}^{T} V''_{u_{max}}) \cdot (M_{22}^{-1} Q''_{q_{max}}) \\
&= V'_{u_{min}} \cdot Q'_{q_{min}} + V''_{u_{min}} \cdot Q''_{q_{min}} + V'_{u_{max}} \cdot Q'_{q_{max}} + V''_{u_{max}} \cdot Q''_{q_{max}} \\
&= \vec{u}_{min} \cdot \vec{q}_{min} + \vec{u}_{max} \cdot \vec{q}_{max} \\
&= \mathrm{Score}(u, Q).
\end{aligned} \tag{7}
$$

According to equation (3), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score(u, Q). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform ranked search.
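The cancellation behind equation (7) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it uses small random integer matrices and exact fractions so that the equality is exact, and all function names (`keygen`, `encrypt_node`, `trapdoor`, `encrypted_score`) are hypothetical. It also assumes the complementary splitting rule of the secure kNN construction in [32], in which the index vectors are split additively where S[i] = 1 and the query vectors where S[i] = 0; the inner products only cancel when the two sides are split on complementary positions.

```python
import random
from fractions import Fraction


def transpose(M):
    return [list(row) for row in zip(*M)]


def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def invert(M):
    """Gauss-Jordan inverse over exact fractions; raises ValueError if singular."""
    d = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(d)]
         for i, row in enumerate(M)]
    for col in range(d):
        pivot = next((r for r in range(col, d) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(d):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[d:] for row in A]


def random_invertible(d, rng):
    """Draw random integer matrices until one is invertible; return (M, M^-1)."""
    while True:
        M = [[Fraction(rng.randint(-5, 5)) for _ in range(d)] for _ in range(d)]
        try:
            return M, invert(M)
        except ValueError:
            continue


def split(v, S, split_bit, rng):
    """Split v[i] additively where S[i] == split_bit, otherwise copy it to both shares."""
    a, b = [], []
    for x, s in zip(v, S):
        if s == split_bit:
            r = Fraction(rng.randint(-9, 9), 10)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b


def keygen(d, rng):
    S = [rng.randint(0, 1) for _ in range(d)]
    mats = [random_invertible(d, rng) for _ in range(4)]  # M11, M12, M21, M22
    return S, mats


def encrypt_node(key, u_min, u_max, rng):
    S, mats = key
    shares = split(u_min, S, 1, rng) + split(u_max, S, 1, rng)  # assumed index rule
    return [mat_vec(transpose(M), v) for (M, _), v in zip(mats, shares)]


def trapdoor(key, q_min, q_max, rng):
    S, mats = key
    shares = split(q_min, S, 0, rng) + split(q_max, S, 0, rng)  # complementary rule
    return [mat_vec(M_inv, v) for (_, M_inv), v in zip(mats, shares)]


def encrypted_score(I_u, T_Q):
    return sum(dot(a, b) for a, b in zip(I_u, T_Q))


# Illustrative 3-dimensional vectors (hypothetical values).
rng = random.Random(0)
u_min = [Fraction(-5, 10), Fraction(-2, 10), Fraction(-3, 10)]
u_max = [Fraction(5, 10), Fraction(9, 10), Fraction(8, 10)]
q_min = [Fraction(0), Fraction(-2, 10), Fraction(0)]
q_max = [Fraction(1, 10), Fraction(0), Fraction(3, 10)]

key = keygen(3, rng)
plain = dot(u_min, q_min) + dot(u_max, q_max)
secure = encrypted_score(encrypt_node(key, u_min, u_max, rng),
                         trapdoor(key, q_min, q_max, rng))
assert secure == plain
```

Exact fractions are used only to make the equality testable without rounding; a practical implementation would work with floating-point d × d matrices.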

3.3. Dynamic Update Operations. Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, which satisfies the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows:

(i) UpdateInfoGen(sk, $T_s$, $f_i$, Utype): this algorithm is executed by the data owner and generates the update information $\{I_s, c_i\}$ for the cloud server, where $T_s$ is a set containing all the updated nodes, $I_s$ is an encrypted form of $T_s$, $f_i$ is the target document, $c_i$ is an encrypted form of $f_i$, and Utype is the update type. In order to reduce the communication cost, the data owner stores the unencrypted index tree on its own device. For Utype $\in$ {Ins, Del}, the algorithm works as follows:

(a) If Utype = "Del", the algorithm deletes a document $f_i$ from the tree. The algorithm first finds the leaf node associated with the document $f_i$ and deletes it. In addition, the internal nodes associated with this leaf node are added to $T_s$. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can set the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts $T_s$ to generate $I_s$. Finally, the algorithm sends $I_s$ to the cloud server and sets $c_i$ as null.

Input: A vector $\vec{q}$ of query Q, a semantic dictionary D generated by applying "Word2Vec" to F, a root node u of IndexTree, and RList
Output: RList

    Split $\vec{q}$ into $\vec{q}_{min}$ and $\vec{q}_{max}$ according to the method M3
    if u is an internal node then
        if Score(u, Q) > k-th score then
            SearchIndexTree($\vec{q}$, D, u.Pl, RList)
            SearchIndexTree($\vec{q}$, D, u.Pr, RList)
        else
            return
        end if
    else
        if Score(u, Q) > k-th score then Update RList
            Delete the element holding the smallest relevance score in RList
            Insert a new element <Score(u, Q), u.FID> in the RList and sort the elements in RList
        end if
        return
    end if

ALGORITHM 2: SearchIndexTree(QueryVector $\vec{q}$, Dictionary D, TreeNode u, RList).
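As an illustration of Algorithm 2, the sketch below runs the pruned depth-first search on plaintext vectors. It is a simplified reading rather than the authors' code: the names `Node`, `make_leaf`, `make_internal`, `split_m3`, and `search_index_tree` are hypothetical, the document vectors are made-up values, and it assumes that an internal node stores the element-wise minimum and maximum of its children, so that Score(u, Q) is an upper bound on the score of every leaf below u. RList is kept as a min-heap of at most k entries instead of a sorted list.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None   # u.Pl in Algorithm 2
    right: Optional["Node"] = None  # u.Pr in Algorithm 2
    fid: Optional[str] = None       # u.FID, set only on leaf nodes


def make_leaf(fid, vec):
    # A leaf carries the document vector as both u_min and u_max.
    return Node(list(vec), list(vec), fid=fid)


def make_internal(left, right):
    # Assumption: element-wise min / max over the two children.
    return Node([min(a, b) for a, b in zip(left.u_min, right.u_min)],
                [max(a, b) for a, b in zip(left.u_max, right.u_max)],
                left, right)


def split_m3(q):
    """Method M3: q_min keeps the negative entries of q, q_max the positive ones."""
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [x if x > 0 else 0.0 for x in q]
    return q_min, q_max


def score(u, q_min, q_max):
    return (sum(a * b for a, b in zip(u.u_min, q_min))
            + sum(a * b for a, b in zip(u.u_max, q_max)))


def search_index_tree(u, q_min, q_max, k, rlist):
    """rlist is a min-heap of (score, fid) pairs holding the current top-k."""
    s = score(u, q_min, q_max)
    if len(rlist) == k and s <= rlist[0][0]:
        return  # cannot beat the current k-th score: prune this node or subtree
    if u.fid is None:
        search_index_tree(u.left, q_min, q_max, k, rlist)
        search_index_tree(u.right, q_min, q_max, k, rlist)
    elif len(rlist) < k:
        heapq.heappush(rlist, (s, u.fid))
    else:
        heapq.heapreplace(rlist, (s, u.fid))  # drop the smallest, insert the new one


# Hypothetical 3-dimensional document vectors.
docs = {
    "f1": [0.5, 0.7, 0.8],
    "f2": [-0.2, 0.3, 0.7],
    "f3": [-0.4, -0.2, 0.8],
    "f4": [0.3, -0.2, 0.3],
}
leaves = [make_leaf(fid, vec) for fid, vec in docs.items()]
root = make_internal(make_internal(leaves[0], leaves[1]),
                     make_internal(leaves[2], leaves[3]))

q_min, q_max = split_m3([0.1, -0.2, 0.3])
rlist = []
search_index_tree(root, q_min, q_max, 2, rlist)
top_k = [fid for _, fid in sorted(rlist, reverse=True)]
```

With these values, `top_k` lists the two documents with the highest inner product against the query, which can be confirmed by scoring every leaf directly.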


(b) If Utype = "Ins", the algorithm inserts a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update($I_T$, C, $I_s$, $c_i$, Utype): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set $I_s$. After this, if Utype = "Del", the algorithm removes $c_i$ from C. Otherwise, the algorithm inserts $c_i$ into C.

Note that after a period of insertion and deletion operations, the number of keywords in the dictionary may change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. For the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by the "Word2vec" tool and set by the users. For example, if we set the dimension of the vector to 200, the dimension of each keyword's vector is 200, and thus the dimensions of the vectors $\vec{u}_{min}$, $\vec{u}_{max}$, $\vec{q}_{min}$, and $\vec{q}_{max}$ are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node u in the index tree and the query Q in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, attackers cannot obtain the original vectors in the tree nodes and the query, which means that the index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set Q can be extended to multiple different semantic keyword sets Q'. So the same query Q is encrypted into different trapdoors, which means that the goal of trapdoor unlinkability is achieved.

(3) Keyword Privacy. Since the index and the trapdoor are protected by the secure KNN algorithm, the adversary cannot infer the plaintext information from the index and the trapdoor under the known ciphertext model. Considering that the known background model is common in real-world applications, we analyse the security of the proposed scheme under the known background model. In the TrapdoorGen algorithm, the original query keyword set Q is extended to a new set Q'. Specifically, for each keyword q in Q, the algorithm randomly chooses a number k', selects k' semantic keywords related to q by utilizing the dictionary, and inserts these keywords into Q'. Suppose that each keyword is associated with N semantic keywords in the dictionary; then each keyword can generate $2^N$ different keyword sets, since each semantic keyword can be chosen or not. For example, if a keyword q is associated with three semantic keywords q1, q2, and q3, then q can generate $2^3$ keyword sets: {q}, {q, q1}, {q, q2}, {q, q3}, {q, q1, q2}, {q, q1, q3}, {q, q2, q3}, and {q, q1, q2, q3}. Since the query Q usually contains more than one keyword, Q will generate more than $2^N$ different semantic keyword sets. With this method, the final similarity score is obfuscated by these random semantic keyword sets. As in the analysis in [14, 15], our scheme can protect the keyword privacy under the known background model.
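The counting argument and the random extension step can be sketched as follows. This is an illustrative reading only: the function names and the toy dictionary are hypothetical, and the sketch assumes that each original keyword q stays in Q' (as in the example sets listed above) and that k' is drawn uniformly from [1, N].

```python
import random
from itertools import combinations


def semantic_sets(q, related):
    """All keyword sets made of q plus any subset of its related keywords."""
    return [frozenset((q,) + combo)
            for r in range(len(related) + 1)
            for combo in combinations(related, r)]


def expand_query(Q, dictionary, rng):
    """Build Q' by adding k' randomly chosen semantic keywords for each q in Q."""
    Q_prime = set()
    for q in Q:
        related = dictionary[q]
        k_prime = rng.randint(1, len(related))  # k' in [1, N], chosen per keyword
        Q_prime.add(q)                          # assumption: q itself is kept
        Q_prime.update(rng.sample(related, k_prime))
    return Q_prime


# Toy dictionary: each keyword maps to its semantically related keywords.
dictionary = {"q": ["q1", "q2", "q3"], "p": ["p1", "p2", "p3"]}

sets_for_q = semantic_sets("q", dictionary["q"])
assert len(sets_for_q) == 2 ** 3

# Two runs on the same query generally yield different extended sets.
rng = random.Random(7)
first = expand_query(["q", "p"], dictionary, rng)
second = expand_query(["q", "p"], dictionary, rng)
```

Because the extended set changes from run to run, the resulting query vector, and hence the trapdoor, differs between two submissions of the same query Q.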

4. Proposed Scheme

In this section, we analyse the proposed SSE-DMKRS scheme theoretically and experimentally. A detailed experiment is given to demonstrate that our scheme can efficiently perform dynamic ranked keyword search over the encrypted data. Our experiment is run on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and is based on a real-world e-mail dataset called the Enron e-mail dataset [35]. We mainly analyse the performance of our scheme in two aspects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update; (2) the relationship between the search precision and the privacy level. Moreover, in order to show the advantages of our scheme, we also compare it to two related previous schemes. For simplicity, we denote these two schemes, introduced in [14, 15], by X15 and G19, respectively.

4.1. Efficiency

4.1.1. Index Building. The process of index building mainly consists of two steps: (1) creating an unencrypted index tree by utilizing Algorithm 1; (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates O(n) nodes based on the document set F. Because each node has two vectors $\vec{u}_{min}$ and $\vec{u}_{max}$, whose dimensions are both d, the vector splitting process needs O(d) time and the matrix multiplication operations take $O(d \times d)$ time in the encryption step. According to these two steps, the whole time complexity of index building is $O(nd^2)$, which means that the time cost for index building mainly depends on the number of documents in F and the dimension of each node's vector.


Since the dimensions of each node's vector in X15 and G19 are both linear in the number of keywords in the dictionary (m), the time costs for index building in X15 and G19 are both $O(nm^2)$. Because $d \ll m$, the time cost for index building in our scheme is much less than that in X15 and G19. In addition, for the scheme G19, the internal nodes are constructed by means of a Bloom filter, and thus the dimension of each internal node's vector is linear in b. Since b is usually smaller than m, the index building time in G19 is less than that in X15.
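As a rough illustration of what $d \ll m$ means for these bounds, the leading terms can be compared for the parameter values used in the experiments (n = 1000, m = 20000, d = 1000). This is an order-of-magnitude estimate only: constant factors, tree construction, and the unencrypted Bloom-filter nodes of G19 are ignored, so measured speedups can be smaller.

```latex
\frac{n m^{2}}{n d^{2}} = \left(\frac{m}{d}\right)^{2}
  = \left(\frac{20000}{1000}\right)^{2} = 400
```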

Figure 6(a) shows that the time cost for index building in our scheme is much less than that in X15 and G19. More precisely, when n = 1000, m = 20000, d = 1000, and b = 10000, the time consumption for index building in X15 and G19 is roughly 100 to 200 times that in our scheme. As m increases, the advantage of our scheme becomes even more significant.

In addition, because the index tree has O(n) nodes and each node holds two d-dimensional vectors, the space complexity of the index tree is O(nd). By contrast, the space complexities of the index tree in X15 and G19 are both O(nm). From Table 3, even if we set n = 1000, m = 20000, d = 1000, and b = 10000, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.

4.1.2. Trapdoor Generation. In our scheme, the query is converted into two vectors $\vec{q}_{min}$ and $\vec{q}_{max}$, whose dimensions are both d. The trapdoor generation process multiplies these two vectors by the $d \times d$ matrices in the key. So the time complexity of trapdoor generation in our scheme is $O(d^2)$. By contrast, since the dimensions of the query vectors in X15 and G19 are both m, the time complexities of trapdoor generation are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when n = 1000, m = 20000, and d = 1000, the time cost for trapdoor generation in our scheme is 15 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.

4.1.3. Search. In the search process, if the relevance score of an internal node u and the query Q is less than the minimum relevance score of the current top-k documents, the subtree rooted at u will not be accessed. Thus, not all of the nodes in the tree are accessed during the search process. We suppose that there are θ leaf nodes that contain at least one keyword in the query Q. Since the height of the tree is O(log n) and the time complexity of the relevance score calculation is O(d), the time complexity of the search process is $O(\theta \cdot d \cdot \log n)$. For the scheme X15, because the time complexity of the relevance score calculation is O(m), the time complexity of the search process is $O(\theta \cdot m \cdot \log n)$. For the scheme G19, because each internal node contains a Bloom filter whose size is b and each leaf node involves a vector whose size is m, the time complexity of the search process is $O(\theta(m + b \cdot \log n))$. From Figure 6(c), when n = 1000, m = 20000, d = 1000, and b = 10000, the search time cost in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.

4.1.4. Update. When the data owner wants to insert or delete a document, it will not only insert or delete a leaf node but also update O(log n) internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(\log n \cdot d^2)$. For the X15 scheme, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(\log n \cdot m^2)$. For the G19 scheme, because the internal nodes are based on the Bloom filter, which is not encrypted, the time cost for updating the internal nodes can be ignored. Thus, the time complexity of update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when n = 1000, m = 20000, d = 1000, and b = 10000, the time cost for updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.

4.2. Precision and Privacy. The search precision of our scheme is affected by a group of semantic keywords related to the original index and query keywords. We measure our scheme by adopting a metric called "precision" defined in [12]. The metric of precision is defined as follows:

$$
P_k = \frac{k'}{k}, \tag{8}
$$

where k' is the number of real top-k documents in the retrieved k documents.

In addition, the semantic keywords in the index and query keyword set disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through the statistical information about the dataset. To measure the extent of this disturbance of the relevance score, we use the following equation, called "rank privacy" and introduced in [12], to quantify this obscureness:

$$
P'_k = \frac{\sum_i \left| r_i - r'_i \right|}{k^2}, \tag{9}
$$

where $r_i$ is the rank number of document i in the retrieved top-k documents and $r'_i$ is document i's real rank number in the real ranked results.
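Equations (8) and (9) translate directly into code. The sketch below is a minimal illustration with hypothetical function names and a made-up ranking; it assumes that every retrieved document also appears somewhere in the real ranked list, so that its real rank is defined.

```python
def precision(retrieved, true_ranking, k):
    """P_k = k' / k, where k' counts retrieved documents that are in the real top-k."""
    true_top_k = set(true_ranking[:k])
    k_prime = sum(1 for doc in retrieved[:k] if doc in true_top_k)
    return k_prime / k


def rank_privacy(retrieved, true_ranking, k):
    """P'_k = sum_i |r_i - r'_i| / k^2 over the retrieved top-k documents."""
    true_rank = {doc: r for r, doc in enumerate(true_ranking, start=1)}
    total = sum(abs(r - true_rank[doc])
                for r, doc in enumerate(retrieved[:k], start=1))
    return total / k ** 2


# Hypothetical example: real ranking versus the ranking returned by a search.
true_ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
retrieved = ["d2", "d1", "d5", "d3"]

p = precision(retrieved, true_ranking, k=4)       # 3 of the 4 are in the real top-4
rp = rank_privacy(retrieved, true_ranking, k=4)   # (1 + 1 + 2 + 1) / 16
```

Here three of the four retrieved documents belong to the real top-4, giving a precision of 0.75, and the rank displacements sum to 5, giving a rank privacy of 5/16.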

We compare our scheme to the schemes X15 and G19 in terms of "precision" and "rank privacy." Note that an important parameter in the previous two schemes is a standard deviation σ, which is utilized to adjust the relevance score for the dummy keywords. In the comparison, we set σ = 0.05, which is commonly used in the previous schemes. Besides, in our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node's vector to 1000 (d = 1000). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as k grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For the schemes X15 and G19, the precision also decreases and the rank privacy increases when k grows. This characteristic exists in all three schemes. Because the vector representations for the index tree and query in our scheme are compressed deeply, some statistical information in the index and the query will be lost.


Figure 6: Impact of m on the time cost of index building (a), trapdoor generation (b), search (c), and update (d) (n = 1000, d = 1000, b = 10000, and m = {12000, 14000, 16000, 18000, 20000}). (Plots not reproduced; each panel shows the time cost in $10^3$ ms against the dictionary size for the schemes X15, G19, and ours.)

Table 3: Storage consumption of the index tree (MB).

Dictionary size    [14]    [15]    Vector dimension    Proposed
m = 12000          188     174     d = 200             7
m = 14000          219     190     d = 400             14
m = 16000          251     206     d = 600             20
m = 18000          283     222     d = 800             26
m = 20000          315     238     d = 1000            33


Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents (n = 1000, d = 1000, b = 10000, m = 12000, and σ = 0.05). (Plots not reproduced; each panel shows the metric in percent against the number of retrieved documents, from 10 to 50, for the schemes X15, G19, and ours.)

Figure 8: Impact of d on the time cost of index building (a) and trapdoor generation, search, and update (b) (n = 1000 and d = {200, 400, 600, 800, 1000}). (Plots not reproduced; each panel shows the time cost in seconds against the vector size.)


Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is accordingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation (d), which we set in "Word2vec," is an important parameter in our scheme. Next, we discuss the impact of d on our scheme. The impact of d on the efficiency of our scheme is given in Figure 8. From Figure 8, we can see that the time costs of index building, trapdoor generation, search, and update all increase when d grows. Besides, Figure 9 illustrates the impact of d on the precision and rank privacy of our scheme. As d increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy decreases gradually. These phenomena are all consistent with our previous theoretical analysis. So, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter d to satisfy the requirements of different applications.

4.4. Discussion. From the experiment results, when n = 1000, m = 20000, d = 200, and b = 10000, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, which are all much better than those of the previous schemes X15 and G19. This efficiency demonstrates that our scheme is well suited to practical applications, especially the mobile cloud setting in which the clients have limited computation and storage resources.

The experiment results show that the precision of our scheme is lower than that of the previous two schemes, while the rank privacy is accordingly higher. In addition, by using the "Word2vec" method, the vector representations used in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying "Word2Vec" to construct the vector representations of the documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme can greatly reduce the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions (n = 1000 and d = {200, 400, 600, 800, 1000}). (Plots not reproduced; each panel shows the metric in percent against the number of retrieved documents, from 10 to 50, with one curve per vector size d.)


Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, "Practical techniques for searching on encrypted data," in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, "Secure data retrieval of outsourced data with complex query support," in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, "Enabling personalized search over encrypted outsourced data with efficiency improvement," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, "Secure indexes," IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, "Efficient conjunctive keyword search on encrypted data storage system," European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, "Achieving efficient conjunctive keyword searches over encrypted data," Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, "Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement," IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, "Efficient similarity search over encrypted data," in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, "Zerber+r: top-k retrieval from a confidential index," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, "Enabling secure and efficient ranked keyword search over outsourced cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., "Privacy-preserving multi-keyword ranked search over encrypted cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., "Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking," in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, "A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, "Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data," IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., "Highly-scalable searchable symmetric encryption with support for boolean queries," Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., "Dynamic searchable encryption in very-large databases: data structures and implementation," in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., "Public key encryption with keyword search," International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, "Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system," Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, "Predicate encryption supporting disjunctions, polynomial equations, and inner products," Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, "Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group," Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, "Novel multi-keyword search on encrypted data in the cloud," IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, "Generating searchable public-key ciphertexts with hidden structures for fast keyword search," IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, "Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks," IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, "A general transformation from KP-ABE to searchable encryption," Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, "Attribute-based hybrid boolean keyword search over outsourced encrypted data," IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, "Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries," in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, "Efficient implementation of a proxy-based protocol for data sharing on the cloud," in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, "Searchable public key encryption supporting semantic multi-keywords search," IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., "Efficient estimation of word representations in vector space," 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, "Secure kNN computation on encrypted databases," in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, "Zerber: r-confidential indexing for distributed documents," in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, "Enron E-mail dataset," http://www.cs.cmu.edu/~enron.


construction of the search index tree and the SSE-DMKRS scheme. Besides, a detailed security analysis and the update operations of our scheme are also given. Theoretical and experimental analyses are given in Section 4. Section 5 gives the conclusion.

2. Preliminaries

In this section, we first give the framework of the system model and introduce the threat model adopted in our scheme. Then, we introduce some tools adopted in our scheme, including a well-known term representation method in the field of natural language processing, namely "Word2vec," and the vector space model. Finally, we present the design goals of our scheme. In addition, the main notations used in this paper are summarized in Table 2.

2.1. System Model. The system model contains three different roles: data owner, data user, and cloud server. The data owner outsources a group of documents $F = \{f_1, f_2, \ldots, f_n\}$ to the cloud in ciphertext form $C = \{c_1, c_2, \ldots, c_n\}$. Moreover, the data owner also generates an encrypted searchable index for the keyword search operation. For each query of an arbitrary keyword set Q, the data user computes a search trapdoor $T_Q$ of the query Q and sends it to the cloud server. Upon receiving $T_Q$ from the data user, the cloud server searches against the encrypted index and returns the candidate encrypted documents. After this, the data user decrypts the candidate documents and obtains the plaintext.

As illustrated in Figure 1, the architecture of the system model is formally described as follows:

(1) Data Owner (DO). DO holds a group of documents $F = \{f_1, f_2, \ldots, f_n\}$ and generates a secure searchable index I from F and an encrypted document collection C for F. Then, DO uploads I and C to the cloud server and distributes the secret key to the authorized data users. Furthermore, DO needs to update the index and documents stored in the cloud server.

(2) Data User (DU). An authorized DU can launch a keyword query over the encrypted data by utilizing a trapdoor, which is generated by using the secret key fetched from DO. Moreover, DU can decrypt the encrypted documents by utilizing the secret key.

(3) Cloud Server (CS). CS stores the encrypted index I and documents C from DO. When CS receives the trapdoor for query Q from DU, CS executes the keyword query over the index and returns the top-k most relevant encrypted documents associated with the query Q. Upon receiving the update information from DO, CS also performs the update operation over the encrypted data. In addition, we assume that CS is "honest-but-curious," an assumption adopted by many searchable encryption schemes [12, 14, 15]. This means that CS honestly and correctly executes the algorithms in our scheme. However, CS curiously infers and analyses the received data to obtain extra private information.

2.2. Threat Model. Throughout the paper, we mainly utilize two threat models proposed by Cao et al. [12].

(1) Known Ciphertext Model. CS only knows the information of the encrypted index, ciphertext, and trapdoor. That is to say, CS can only execute ciphertext-only attacks in this model.

(2) Known Background Model. CS knows more information than in the known ciphertext model, such as the statistical information inferred from the documents. By taking advantage of this statistical information, e.g., term frequency (TF) and inverse document frequency (IDF), CS can conduct a statistical attack to verify whether certain keywords are in the query [33].

2.3. Design Goals. As mentioned before, we aim to build a secure and efficient SSE-DMKRS scheme. The design goals of our scheme are described as follows:

(1) Efficiency. The scheme aims to realize sublinear search efficiency, with time and space costs of index building and trapdoor generation that are much lower than those of the current schemes.

Table 1: Comparison between previous SE schemes and ours.

Type    Ref.    Query condition                                Additional special abilities
SSE     [6]     Conjunctive keyword search                     -
SSE     [8]     Multikeyword fuzzy search                      -
SSE     [11]    Single-keyword ranked search                   -
SSE     [12]    Multikeyword ranked search                     -
SSE     [14]    Multikeyword ranked search                     Dynamic update
SSE     Ours    Multikeyword ranked search                     Semantic search and dynamic update
SPE     [22]    Conjunctive and disjunctive keyword search     -
SPE     [23]    Boolean keyword search                         -
SPE     [25]    Single-keyword search                          Fast search
SPE     [27]    Boolean keyword search                         Access control
SPE     [29]    Multikeyword search                            Data sharing
SPE     [30]    Multikeyword search                            Semantic search


(2) Privacy Preserving. Similar to previous schemes [12, 14, 15], our scheme needs to prevent CS from learning extra private information inferred from the documents, the secure index, and the queries. More precisely, the privacy requirements are listed as follows.

Index and Trapdoor Privacy. The plaintext information concealed in the index and the trapdoor cannot be leaked to CS. This information includes the keywords and the corresponding vector representation of each keyword.

Trapdoor Unlinkability. CS cannot determine whether two trapdoors are built from the same query.

Keyword Privacy. CS cannot identify whether a specific keyword is in the trapdoor or the index by analysing the search results and the statistical information of the documents.

(3) Dynamic. The scheme can efficiently support dynamic operations such as document insertion and deletion. Note that update operations in our scheme are more efficient than in previous SSE-DMKRS schemes.

2.4. Word2Vec. The "Word2Vec" model is a shallow, two-layer neural network that converts words into vector representations [31]. Under this model, each word in the document set is mapped to a vector, which can be used to calculate the similarity between words. For instance, Figure 2 shows that, after training on a simple corpus, the three words "dog," "fox," and "orange" are mapped to three vector representations. Using these vectors, the similarity among the three words can be calculated. The similarity between "dog" and "fox" is higher than that between "dog" and "orange," since "dog" and "fox" are both animals. Thus, we can utilize "Word2Vec" to convert the keywords in a corpus into vector representations and then apply these vectors to perform ranked search.
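As a minimal sketch of the similarity comparison described above, the following Python snippet computes the cosine similarity between three hand-picked vectors. The vector values are hypothetical stand-ins chosen for illustration; they are not the output of a trained "Word2Vec" model, whose embeddings typically have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (assumed values, not trained ones).
vectors = {
    "dog": [0.8, 0.6, 0.1],
    "fox": [0.7, 0.7, 0.2],
    "orange": [0.1, 0.2, 0.9],
}

sim_dog_fox = cosine(vectors["dog"], vectors["fox"])
sim_dog_orange = cosine(vectors["dog"], vectors["orange"])
print(sim_dog_fox > sim_dog_orange)
```

With these assumed vectors, "dog" is closer to "fox" than to "orange," which mirrors the relationship illustrated in Figure 2.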

2.5. Advice on Equations. The vector space model is a very popular method in the field of information retrieval and is usually used together with the TF-IDF rule to realize top-k search, where TF is the term frequency and IDF is the inverse document frequency [34]. By utilizing the TF-IDF rule, the

Table 2: Notations.

| Notation | Description |
|----------|-------------|
| $F$ | A document set $\{f_1, f_2, \ldots, f_n\}$ |
| $n$ | The number of documents in $F$ |
| $C$ | The encrypted form of $F$, denoted by $\{c_1, c_2, \ldots, c_n\}$ |
| $W_i$ | The keyword set $\{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ for the document $f_i$ |
| $t_i$ | The number of keywords in $W_i$, $i \in [1, n]$ |
| $w_{ij}$ | The $j$th keyword in $W_i$, $i \in [1, n]$, $j \in [1, t_i]$ |
| $\vec{W}_i$ | The vector representation for $W_i$ |
| $\vec{w}_{ij}$ | The vector representation for $w_{ij}$ |
| $u$ | A node in the index tree |
| $\vec{u}$ | The vector representation for the node $u$ |
| $\vec{u}_{\min}$, $\vec{u}_{\max}$ | Vector representations obtained by splitting $\vec{u}$ |
| $I_u$ | The encrypted index for the node $u$ |
| $I_T$ | The encrypted index tree of $F$ |
| $Q$ | The keyword set $\{q_1, q_2, \ldots, q_t\}$ for the query |
| $q_j$ | A keyword in $Q$, $j \in [1, t]$ |
| $\vec{q}$ | The vector representation for the query $Q$ |
| $\vec{q}_{\min}$, $\vec{q}_{\max}$ | Vector representations obtained by splitting $\vec{q}$ |
| $\vec{q}_j$ | The vector representation for $q_j$ |
| $T_Q$ | The trapdoor of $Q$ |
| $D$ | A dictionary $\{w_1, w_2, \ldots, w_m\}$ containing all keywords in $F$ |
| $M_{11}, M_{12}, M_{21}, M_{22}$ | Matrices for encryption (encryption key) |
| $M_{11}^{-1}, M_{12}^{-1}, M_{21}^{-1}, M_{22}^{-1}$ | Matrices for decryption (decryption key) |
| $N$ | The number of semantic keywords associated with each keyword in the dictionary |
| $m$ | The number of keywords in $D$ |
| $d$ | The dimension of the vectors generated by using "Word2vec" |
| $k$ | The number of files returned to the user |

Figure 1: System model of the keyword search over encrypted data. (Entities: data owners, data users, and a semitrusted cloud server; exchanged items: secret key, encrypted e-mails, encrypted index, trapdoor, and search results.)


documents and queries can be represented as vectors. These vectors can be adopted in top-k search over the ciphertext [12, 14, 15]. However, the dimension of these vectors is linear in the number of words in the dataset, which is inefficient if the dataset has many words. To address this issue, we apply "Word2Vec" to present a novel keyword conversion method, which is described as follows.

(1) By applying "Word2Vec" to a corpus, we create a dictionary in which each keyword is associated with a vector representation.

(2) For the keyword set $W_i = \{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ of the document $f_i$, we obtain a vector $\vec{x}_i = \vec{w}_{i1} + \vec{w}_{i2} + \cdots + \vec{w}_{it_i}$ by looking up the dictionary, where $\vec{w}_{ij}$ is the vector representation for $w_{ij}$, $i \in [1, n]$, and $j \in [1, t_i]$. After this, we set $\vec{W}_i = \vec{x}_i / \|\vec{x}_i\|$ as the vector representation of $W_i$.

(3) For the query keyword set $Q = \{q_1, q_2, \ldots, q_t\}$, we utilize the dictionary to construct a vector $\vec{v} = \vec{q}_1 + \vec{q}_2 + \cdots + \vec{q}_t$. Then, we set $\vec{Q} = \vec{v} / \|\vec{v}\|$ as the vector representation of $Q$.

Note that the dimensions of $\vec{W}_i$ and $\vec{Q}$ are very small, e.g., 200, which is significantly smaller than the number of words in the dataset. Thus, the proposed method is more efficient than the previous method based on the TF-IDF rule. In addition, together with the vector space model mentioned above, we use the cosine measure to evaluate the relevance between a document and a query. The relevance evaluation function is defined in the next section.
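The conversion steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary entries are hypothetical three-dimensional vectors rather than trained "Word2Vec" output, the function name `convert_keywords` is ours, and the final step is assumed to scale the summed vector to unit length so that a dot product between a document vector and a query vector equals their cosine similarity.

```python
import math


def convert_keywords(keywords, dictionary):
    """Sum the dictionary vectors of the keywords and scale to unit length."""
    dim = len(next(iter(dictionary.values())))
    summed = [0.0] * dim
    for word in keywords:
        for i, value in enumerate(dictionary[word]):
            summed[i] += value
    norm = math.sqrt(sum(x * x for x in summed))
    # Guard against an all-zero sum; otherwise normalise.
    return [x / norm for x in summed] if norm > 0 else summed


# Hypothetical dictionary: keyword -> vector (d = 3 for readability).
dictionary = {
    "network": [0.4, -0.1, 0.3],
    "security": [0.2, 0.5, -0.2],
    "cloud": [-0.3, 0.2, 0.6],
}

doc_vec = convert_keywords(["network", "security", "cloud"], dictionary)
query_vec = convert_keywords(["cloud", "security"], dictionary)

# With unit-length vectors, the dot product is the cosine measure.
relevance = sum(x * y for x, y in zip(doc_vec, query_vec))
```

The vector length depends only on the embedding dimension d, not on the number of keywords in the dictionary, which is the source of the savings claimed over TF-IDF vectors.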

3. Proposed Scheme

In this section, we first give the index tree building algorithm and the search algorithm on this tree. Then, we give the concrete construction of our scheme and its dynamic update operations. Finally, we give a detailed analysis of the security of our scheme.

3.1. Search Index: Balanced Binary Tree. In this section, we adopt a balanced binary tree to create the search index, which will be used in our main scheme. Inspired by the construction process in [14], the tree building and search processes for our scheme are described as follows.

3.1.1. Tree Building Process. Formally, the data structure of a tree node u is defined as $u = \langle ID, \vec{u}_{\min}, \vec{u}_{\max}, P_l, P_r, FID \rangle$, where ID is the identity of the node u, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are the vector representations of the node u, $P_l$ and $P_r$ are pointers to u's left and right children, respectively, and FID stores the identity of a document if u is a leaf node. Note that, compared with the previous index trees [12, 14, 15], a node in our tree has two vectors, whereas a node in the previous trees has only one. The main reason is that a node vector in our tree may contain negative numbers, while a node vector in the previous trees contains only positive numbers. For clarity, we give a simple example. Let $\vec{a} = (0.1, 0.2, -0.3)$ and $\vec{b} = (-0.5, 0, 0.3)$ be the vectors of two leaf nodes A and B, respectively. In the previous index trees, the vector of the parent node C of these two leaf nodes is $\vec{c} = (0.1, 0.2, 0.3)$, in which the value of each dimension is the larger of the corresponding values of $\vec{a}$ and $\vec{b}$. For a query vector $\vec{v} = (-1, 0, -1)$, the scores of the nodes A, B, and C are 0.2, 0.2, and −0.4, respectively. It is important to note that the score of the parent node is less than the scores of its children, so these two leaf nodes would be ignored in the tree search process even though they should be considered.
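The counterexample above is easy to check numerically. The short Python sketch below recomputes the three scores with a plain dot product; the variable names are ours, and the values are the ones given in the text.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = [0.1, 0.2, -0.3]   # leaf node A
b = [-0.5, 0.0, 0.3]   # leaf node B
# Single-vector parent as in the previous index trees: element-wise maximum.
c = [max(x, y) for x, y in zip(a, b)]
v = [-1.0, 0.0, -1.0]  # query vector

score_a, score_b, score_c = dot(a, v), dot(b, v), dot(c, v)

# The parent score fails to bound the children's scores from above.
parent_is_upper_bound = score_c >= max(score_a, score_b)
print(round(score_a, 6), round(score_b, 6), round(score_c, 6))
```

Because the query has negative entries, taking the element-wise maximum no longer yields an upper bound, which is what motivates keeping two vectors per node.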

In our index tree, let the dimensions of $\vec{u}_{\min}$ and $\vec{u}_{\max}$ both be d. The methods for constructing $\vec{u}_{\min}$ and $\vec{u}_{\max}$ for a leaf node and for an internal node are denoted by M1 and M2, respectively, and are given as follows.

(1) M1: if the node u is a leaf node corresponding to a file f, we create a vector $\vec{u}$ for f by adopting the keyword conversion method mentioned in Section 2.5. Then, we set $\vec{u}_{\min} = \vec{u}$ and $\vec{u}_{\max} = \vec{u}$.

(2) M2: if the node u is an internal node, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are based on its children's vectors. Let $P_l \cdot \vec{u}_{\min}$ and $P_l \cdot \vec{u}_{\max}$ be the two vectors of u's left child, and let $P_r \cdot \vec{u}_{\min}$ and $P_r \cdot \vec{u}_{\max}$ be the two vectors of u's right child.

Suppose that Min() and Max() are the minimum and maximum functions, respectively. $\vec{u}_{\min}$ is built as follows:

$$\vec{u}_{\min}[i] = \mathrm{Min}\left(P_l \cdot \vec{u}_{\min}[i],\; P_r \cdot \vec{u}_{\min}[i]\right), \quad i \in [1, d]. \tag{1}$$

And $\vec{u}_{\max}$ is built as follows:

$$\vec{u}_{\max}[i] = \mathrm{Max}\left(P_l \cdot \vec{u}_{\max}[i],\; P_r \cdot \vec{u}_{\max}[i]\right), \quad i \in [1, d]. \tag{2}$$

That is, $\vec{u}_{\max}$ takes, in each dimension, the larger value of $P_l \cdot \vec{u}_{\max}$ and $P_r \cdot \vec{u}_{\max}$, and $\vec{u}_{\min}$ takes the smaller value of $P_l \cdot \vec{u}_{\min}$ and $P_r \cdot \vec{u}_{\min}$.

Figure 2: A vector space representation of the words "dog," "fox," and "orange," showing that "dog" is closer to "fox," since they share more common attributes than "dog" and "orange."


An illustration of the above methods is given in Figure 3. In Figure 3, let the node u be a leaf node and let W be the keyword set of the file that u stores. By using the keyword conversion method, W is converted to a vector $\vec{u} = (-0.2, 0.2, 0.5, -0.7, 0.8)$. Then, we set $\vec{u}_{\min} = \vec{u}_{\max} = \vec{u}$. If the node u is an internal node and the vectors of its children are $P_l \cdot \vec{u}_{\min}$, $P_l \cdot \vec{u}_{\max}$, $P_r \cdot \vec{u}_{\min}$, and $P_r \cdot \vec{u}_{\max}$, then the vectors of the internal node are $\vec{u}_{\min} = (-0.2, -0.2, -0.3, -0.7, -0.5)$ and $\vec{u}_{\max} = (0.3, 0.7, 0.3, -0.1, 0.3)$.

Based on the methods M1 and M2 and inspired by the tree building algorithm introduced in [14], our tree building algorithm is given in Algorithm 1. An example of the proposed index tree is given in Example 1 and Figure 4. In Algorithm 1, we use the function GenID() to generate a unique identity ID for each node and apply GenFID() to generate a unique file ID for each leaf node. CurrentNodeSet contains the nodes that have no parent node and still need to be processed, and |CurrentNodeSet| is the number of nodes in CurrentNodeSet. If |CurrentNodeSet| is even, we assume that |CurrentNodeSet| = 2h; otherwise, we assume that |CurrentNodeSet| = 2h + 1, where h is a positive integer. TempNodeSet is a set containing the newly generated nodes. Moreover, for each node u, if u is a leaf node, we use method M1 to generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$; otherwise, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are created by using M2.

3.1.2. Search Process. For a query vector $\vec{q}$ of query Q, we split $\vec{q}$ into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$. For each dimension $i \in [1, d]$, if $\vec{q}[i] < 0$, then $\vec{q}_{\min}[i] = \vec{q}[i]$ and $\vec{q}_{\max}[i] = 0$; otherwise, $\vec{q}_{\min}[i] = 0$ and $\vec{q}_{\max}[i] = \vec{q}[i]$. Obviously, $\vec{q}_{\min}$ holds all the negative entries of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive entries. For clarity, we denote this splitting method for query Q by M3. An illustration of this method is given in Figure 5. If the query vector is $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$, then $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$. For a query Q and a node u, the score is calculated as

$$\mathrm{Score}(u, Q) = \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max}. \tag{3}$$

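We can utilize the above equation to evaluate which documents are the most relevant to the query. Moreover, we can verify that the score of a parent node is never smaller than the scores of its children: every entry of $\vec{q}_{\min}$ is nonpositive and the parent's $\vec{u}_{\min}$ is the element-wise minimum of its children's, while every entry of $\vec{q}_{\max}$ is nonnegative and the parent's $\vec{u}_{\max}$ is the element-wise maximum. This property can significantly reduce the number of nodes that are checked in the search process.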
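A small Python sketch may help make the splitting method M3 and the score function concrete. The function names are ours; the vectors are the ones used in the text, namely the five-dimensional query of the M3 example and the leaf vectors a = (0.1, 0.2, −0.3) and b = (−0.5, 0, 0.3) with query v = (−1, 0, −1).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def split_query(q):
    """Method M3: negative entries go to q_min, all others to q_max."""
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [x if x >= 0 else 0.0 for x in q]
    return q_min, q_max


def score(u_min, u_max, q):
    """Score(u, Q) = u_min . q_min + u_max . q_max."""
    q_min, q_max = split_query(q)
    return dot(u_min, q_min) + dot(u_max, q_max)


example_split = split_query([0.1, -0.2, 0.3, -0.4, 0.5])

a = [0.1, 0.2, -0.3]
b = [-0.5, 0.0, 0.3]
v = [-1.0, 0.0, -1.0]

# Internal node built by method M2 from the two leaves.
parent_min = [min(x, y) for x, y in zip(a, b)]
parent_max = [max(x, y) for x, y in zip(a, b)]

score_a = score(a, a, v)           # leaf: u_min = u_max = a
score_b = score(b, b, v)
score_parent = score(parent_min, parent_max, v)
```

For this query, the two-vector parent scores about 0.8, which bounds both leaf scores (about 0.2 each) from above, in contrast to the single-vector parent of the earlier counterexample.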

The search process is given in Algorithm 2. In Algorithm 2, we use RList to store the top-k files, i.e., those with the k largest relevance scores to the query. RList is initialized as an empty list and is updated whenever a more relevant file is found. The kth score is defined as the smallest relevance score in the current RList and is initialized to a very small number. By using the kth score, we can accelerate the search process by ignoring paths with low scores. In Example 1 and Figure 4, an illustration of the search process is given, where $F = \{f_1, f_2, \ldots, f_6\}$, the query vectors are $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$, and the vector dimension d is 3.

3.1.3. Example 1. An example of an index tree and a search process on this tree is illustrated in Figure 4. In Figure 4, we show an index tree with $F = \{f_1, f_2, \ldots, f_6\}$, in which the dimension of the vector for each node is 3. For each node u in the tree, the upper vector and the lower vector correspond to $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. In the tree building process, we first generate the leaf nodes from F and then create the internal nodes based on these leaf nodes.

Moreover, Figure 4 also gives an illustration of the search process. In Figure 4, we set $\vec{q} = (0.1, -0.3, 0.2)$ and split it into $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$. We suppose that the top-3 files will be returned to the data user. According to Algorithm 2, the search process begins with the root node r and calculates the scores between the query Q and the two child nodes $r_{11}$ and $r_{12}$ of r by using equation (3). The calculation process is presented as follows:

$$\begin{aligned}
\mathrm{Score}(r_{11}, Q) &= (-0.4, -0.2, -0.3) \cdot (0.0, -0.3, 0.0) + (0.3, 0.3, 0.8) \cdot (0.1, 0.0, 0.2) = 0.25, \\
\mathrm{Score}(r_{12}, Q) &= (-0.5, 0.7, -0.2) \cdot (0.0, -0.3, 0.0) + (0.5, 0.9, 0.8) \cdot (0.1, 0.0, 0.2) = 0.
\end{aligned} \tag{4}$$

Because the score between $r_{11}$ and Q is higher than that between $r_{12}$ and Q, Algorithm 2 will traverse the subtree rooted at $r_{11}$ and compute the scores between the query Q and the two child nodes of $r_{11}$. Since the score between $r_{21}$ and Q is higher than that between $r_{22}$ and Q, Algorithm 2 will traverse the subtree rooted at $r_{21}$ and add the leaf nodes $f_1$ and $f_2$ to the RList. After this, the subtree rooted at $r_{22}$ will be traversed, and the leaf nodes $f_3$ and $f_4$ are reached. Since the number of files in RList is less than 3, $f_3$ is added to RList directly. For the file $f_4$, since the number of files in RList now equals 3, Algorithm 2 will compare the score between $f_4$ and Q with the minimum score in the RList. Because the score between $f_4$ and Q is smaller than the minimum score in the RList, $f_4$ is not added to the RList. At this point, the subtree rooted at $r_{11}$ has been fully traversed, and Algorithm 2 moves on to the subtree rooted at $r_{12}$. The score between $r_{12}$ and Q is smaller than the minimum score in the RList, which means that the scores of all descendants of $r_{12}$ are also smaller than the minimum score in the RList (this property is described in Section 3.1.2). Thus, $f_5$ and $f_6$ will not be checked. Therefore, Algorithm 2 outputs RList = $\{f_1, f_2, f_3\}$.
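For readers who wish to check the two root-level scores by hand, the dot products in equation (4) can be expanded term by term, dropping the products that contain a zero entry of the query vectors:

```latex
\begin{aligned}
\mathrm{Score}(r_{11}, Q)
  &= (-0.2)(-0.3) + (0.3)(0.1) + (0.8)(0.2)
   = 0.06 + 0.03 + 0.16 = 0.25, \\
\mathrm{Score}(r_{12}, Q)
  &= (0.7)(-0.3) + (0.5)(0.1) + (0.8)(0.2)
   = -0.21 + 0.05 + 0.16 = 0.
\end{aligned}
```

Only the second dimension contributes through $\vec{q}_{\min}$, and only the first and third dimensions contribute through $\vec{q}_{\max}$.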

3.2. Construction of SSE-DMKRS. In this section, by combining the secure KNN algorithm [32] and the index tree building algorithm, we propose a concrete SSE-DMKRS scheme. The SSE-DMKRS scheme consists of five algorithms. The algorithms KeyGen, DictionaryBuild, and IndexBuild are executed by the data owners, while the algorithms TrapdoorGen and Search are performed by the data users and the cloud server, respectively.

(i) KeyGen(λ): given a security parameter λ, this algorithm first randomly chooses four d × d invertible matrices $M_{11}$, $M_{12}$, $M_{21}$, and $M_{22}$, where d is the


Figure 3: An example of the vector generation for the node u. (a) Method M1: u is a leaf node, W is the keyword set of the file that u stores, and $\vec{u}$ is the vector generated by adopting the keyword conversion method mentioned in Section 2.5. (b) Method M2: u is an internal node, and $P_l \cdot \vec{u}_{\min}$, $P_l \cdot \vec{u}_{\max}$, $P_r \cdot \vec{u}_{\min}$, and $P_r \cdot \vec{u}_{\max}$ are the vectors of its children.

Input: the document collection $F = \{f_1, f_2, \ldots, f_n\}$; a semantic dictionary D generated by applying "Word2Vec" to F.
Output: the index tree T.

for each i ∈ [1, n] do
    Construct a leaf node u for $f_i$ with u.ID = GenID(), u.$P_l$ = u.$P_r$ = NULL, and u.FID = GenFID($f_i$), and generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M1
    Insert u into CurrentNodeSet
end for
while |CurrentNodeSet| > 1 do
    if |CurrentNodeSet| is even, i.e., 2h, then
        for each pair of nodes u′ and u″ in CurrentNodeSet do
            Create a parent node u for u′ and u″ with u.ID = GenID(), u.$P_l$ = u′, u.$P_r$ = u″, and u.FID = NULL, and set $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M2
            Insert u into TempNodeSet
        end for
    else (suppose that |CurrentNodeSet| = 2h + 1)
        for each pair of nodes u′ and u″ among the first 2h − 2 nodes in CurrentNodeSet do
            Create a parent node u for u′ and u″
            Insert u into TempNodeSet
        end for
        Create a parent node $u_1$ for the (2h − 1)-th and (2h)-th nodes, and then create a parent node u for the (2h + 1)-th node and $u_1$
        Insert u into TempNodeSet
    end if
    Set CurrentNodeSet = TempNodeSet and clear TempNodeSet
end while
return CurrentNodeSet (which now contains only one node, the root of the index tree T)

ALGORITHM 1: BuildIndexTree(FileSet F, Dictionary D).
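The bottom-up construction of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the `Node` class, the counter standing in for GenID(), and the function names are ours, the keyword conversion is assumed to have been done already (the input maps file IDs to vectors), and the six leaf vectors are hypothetical values chosen so that the two children of the root carry the vectors used in equation (4).

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

_ids = count(1)  # stand-in for GenID()


@dataclass
class Node:
    id: int
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None  # set only for leaf nodes


def make_leaf(fid, vec):
    """Method M1: both vectors of a leaf equal the document vector."""
    return Node(next(_ids), list(vec), list(vec), fid=fid)


def make_parent(left, right):
    """Method M2: element-wise minimum and maximum of the children."""
    u_min = [min(a, b) for a, b in zip(left.u_min, right.u_min)]
    u_max = [max(a, b) for a, b in zip(left.u_max, right.u_max)]
    return Node(next(_ids), u_min, u_max, left=left, right=right)


def build_index_tree(doc_vectors):
    current = [make_leaf(fid, vec) for fid, vec in doc_vectors.items()]
    if not current:
        return None
    while len(current) > 1:
        temp = []
        odd = len(current) % 2 == 1
        # Pair all nodes (even case) or all but the last three (odd case).
        paired = len(current) - 3 if odd else len(current)
        for i in range(0, paired, 2):
            temp.append(make_parent(current[i], current[i + 1]))
        if odd:
            u1 = make_parent(current[-3], current[-2])
            temp.append(make_parent(u1, current[-1]))
        current = temp
    return current[0]


# Hypothetical document vectors (d = 3).
docs = {
    "f1": [0.3, -0.2, 0.8], "f2": [-0.2, 0.3, 0.7],
    "f3": [0.3, -0.2, 0.3], "f4": [-0.4, -0.1, -0.3],
    "f5": [0.5, 0.7, 0.8], "f6": [-0.5, 0.9, -0.2],
}
root = build_index_tree(docs)
```

With six documents, the first pass produces three internal nodes; the second pass hits the odd case and yields a root whose left child covers f1 to f4 and whose right child covers f5 and f6, which is the shape of the tree in Example 1.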


dimension of $\vec{u}_{\min}$ and $\vec{u}_{\max}$. Then, it randomly generates a d-bit vector S. Finally, it outputs the secret key $sk = \{S, M_{11}, M_{12}, M_{21}, M_{22}\}$.

(ii) DictionaryBuild(F): given the document set $F = \{f_1, f_2, \ldots, f_n\}$, the algorithm runs "Word2vec" to generate the dictionary D of F. In the dictionary D, each keyword is associated with a vector representation. In addition, each keyword is associated with a set of semantically related keywords.

(iii) IndexBuild(sk, F, D): given the document set F and the dictionary D for F, the algorithm first creates the index tree T by using the algorithm BuildIndexTree(F, D) (Algorithm 1). Then, for each node u in the tree T, the algorithm generates two random vector pairs $\{V'_{u_{\min}}, V''_{u_{\min}}\}$ and $\{V'_{u_{\max}}, V''_{u_{\max}}\}$ for the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. More precisely, if $S[i] = 0$, it sets $V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$; if $S[i] = 1$, $V'_{u_{\min}}[i]$, $V''_{u_{\min}}[i]$, $V'_{u_{\max}}[i]$, and $V''_{u_{\max}}[i]$ are set as four random values under the constraints $V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$. This process is expressed as the following equation:

$$\begin{cases}
V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 0, \\
V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 0, \\
V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 1, \\
V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 1,
\end{cases} \quad i \in [1, d]. \tag{5}$$

Finally, for each node u, it computes $I_u = \{M_{11}^{T} V'_{u_{\min}}, M_{12}^{T} V''_{u_{\min}}, M_{21}^{T} V'_{u_{\max}}, M_{22}^{T} V''_{u_{\max}}\}$. By replacing the plaintext vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ with the encrypted index $I_u$, an encrypted index tree $I_T$ is created.

(iv) TrapdoorGen(sk, Q): given a query keyword set Q, the algorithm first extends Q to a new semantic keyword set Q′. The process is as follows.

(a) It generates a new keyword set Q′, which is initialized to an empty set.

Figure 4: An example of Algorithm 1 and Algorithm 2 (Example 1). (The tree has root r, internal nodes $r_{11}$, $r_{12}$, $r_{21}$, and $r_{22}$, and leaf nodes $f_1$ to $f_6$; each node shows $\vec{u}_{\min}$ above $\vec{u}_{\max}$, and the query is $\vec{q} = (0.1, -0.3, 0.2)$ with $\vec{q}_{\min} = (0.0, -0.3, 0.0)$ and $\vec{q}_{\max} = (0.1, 0.0, 0.2)$.)

Figure 5: An example of the vector generation for the query Q. Method M3: $\vec{q}$ is a vector generated by adopting the keyword conversion method mentioned in Section 2.5; $\vec{q}_{\min}$ holds all the negative entries of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive entries.


(b) Note that each keyword in the dictionary is associated with a group of keywords semantically related to it. For each keyword q in Q, the algorithm randomly chooses k′ semantic keywords based on the dictionary and inserts these keywords into Q′, where k′ is chosen dynamically and k′ ∈ [1, N].

Then, based on Q′, the TrapdoorGen algorithm generates a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is similar to the process in the IndexBuild algorithm and can be expressed as the following equation:

$$\begin{cases}
Q'_{q_{\min}}[i] = Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\max}}[i] = Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\min}}[i] + Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 1, \\
Q'_{q_{\max}}[i] + Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 1,
\end{cases} \quad i \in [1, d]. \tag{6}$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{\min}}, M_{12}^{-1} Q''_{q_{\min}}, M_{21}^{-1} Q'_{q_{\max}}, M_{22}^{-1} Q''_{q_{\max}}\}$ as the trapdoor for Q.

(v) Search(sk, $T_Q$, $I_T$): for each node u in $I_T$, the algorithm computes

$$\begin{aligned}
I_u \cdot T_Q &= \left(M_{11}^{T} V'_{u_{\min}}\right) \cdot \left(M_{11}^{-1} Q'_{q_{\min}}\right) + \left(M_{12}^{T} V''_{u_{\min}}\right) \cdot \left(M_{12}^{-1} Q''_{q_{\min}}\right) \\
&\quad + \left(M_{21}^{T} V'_{u_{\max}}\right) \cdot \left(M_{21}^{-1} Q'_{q_{\max}}\right) + \left(M_{22}^{T} V''_{u_{\max}}\right) \cdot \left(M_{22}^{-1} Q''_{q_{\max}}\right) \\
&= V'_{u_{\min}} \cdot Q'_{q_{\min}} + V''_{u_{\min}} \cdot Q''_{q_{\min}} + V'_{u_{\max}} \cdot Q'_{q_{\max}} + V''_{u_{\max}} \cdot Q''_{q_{\max}} \\
&= \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max} \\
&= \mathrm{Score}(u, Q).
\end{aligned} \tag{7}$$

By equations (3) and (7), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score(u, Q). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform ranked search.
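The following Python sketch illustrates how the encrypted inner product can recover Score(u, Q) without revealing the plaintext vectors. It is a toy illustration under several assumptions of ours: exact rational arithmetic (`fractions.Fraction`) replaces floating point so that the equality can be checked exactly, the matrices are small random integer matrices, all function names are hypothetical, and the splitting follows the usual secure kNN convention in which the query is split at the positions where the index vector is duplicated and duplicated at the positions where the index vector is split. That last choice is made so that the cross terms cancel and the product reduces exactly to Score(u, Q). The node and query vectors are those of $r_{11}$ and Q in Example 1.

```python
import random
from fractions import Fraction as F


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inverse(M):
    """Gauss-Jordan inverse over exact fractions; ValueError if singular."""
    n = len(M)
    A = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def key_gen(d, rng):
    """Return S and four (M, M^-1) pairs for M11, M12, M21, M22."""
    S = [rng.randint(0, 1) for _ in range(d)]
    mats = []
    while len(mats) < 4:
        M = [[rng.randint(-5, 5) for _ in range(d)] for _ in range(d)]
        try:
            mats.append((M, inverse(M)))
        except ValueError:
            pass  # singular draw, try again
    return S, mats


def split(vec, S, split_bit, rng):
    """Duplicate entry i unless S[i] == split_bit, where it becomes a random sum."""
    first, second = [], []
    for x, s in zip(vec, S):
        if s == split_bit:
            r = F(rng.randint(-100, 100), 10)
            first.append(r)
            second.append(x - r)
        else:
            first.append(x)
            second.append(x)
    return first, second


def encrypt_node(u_min, u_max, key, rng):
    S, mats = key
    shares = split(u_min, S, 1, rng) + split(u_max, S, 1, rng)
    return [mat_vec(transpose(M), v) for (M, _), v in zip(mats, shares)]


def trapdoor(q_min, q_max, key, rng):
    S, mats = key
    shares = split(q_min, S, 0, rng) + split(q_max, S, 0, rng)
    return [mat_vec(M_inv, v) for (_, M_inv), v in zip(mats, shares)]


def encrypted_score(I_u, T_Q):
    return sum(dot(a, b) for a, b in zip(I_u, T_Q))


rng = random.Random(2020)
key = key_gen(3, rng)
u_min, u_max = [F("-0.4"), F("-0.2"), F("-0.3")], [F("0.3"), F("0.3"), F("0.8")]
q_min, q_max = [F(0), F("-0.3"), F(0)], [F("0.1"), F(0), F("0.2")]
I_u = encrypt_node(u_min, u_max, key, rng)
T_Q = trapdoor(q_min, q_max, key, rng)
plain = dot(u_min, q_min) + dot(u_max, q_max)
```

Each product $(M^{T} V) \cdot (M^{-1} Q)$ collapses to $V \cdot Q$, so the server can rank nodes by `encrypted_score(I_u, T_Q)` while holding only the transformed vectors.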

3.3. Dynamic Update Operations. Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, satisfying the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows.

(i) UpdateInfoGen(sk, $T_s$, $f_i$, Utype): this algorithm is executed by the data owners and generates the update information $\{I_s, c_i\}$ for the cloud server, where $T_s$ is a set containing all the nodes to be updated, $I_s$ is the encrypted form of $T_s$, $f_i$ is the target document, $c_i$ is the encrypted form of $f_i$, and Utype is the update type. In order to reduce the communication cost, the data owners store the unencrypted index tree on their own devices. For Utype ∈ {Ins, Del}, the algorithm works as follows.

(a) If Utype = "Del," the algorithm will delete a document $f_i$ from the tree. The algorithm first finds the leaf node associated with the document $f_i$ and deletes it. In addition, the internal nodes associated with this leaf node are added to $T_s$. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can set the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts $T_s$ to generate $I_s$. Finally, the algorithm sends $I_s$ to the cloud server and sets $c_i$ as null.

Input: a vector $\vec{q}$ of query Q; a semantic dictionary D generated by applying "Word2Vec" to F; a root node u of the index tree; and RList.
Output: RList.

Split $\vec{q}$ into $\vec{q}_{\min}$ and $\vec{q}_{\max}$ according to the method M3
if u is an internal node then
    if Score(u, Q) > k-th score then
        SearchIndexTree($\vec{q}$, D, u.$P_l$, RList)
        SearchIndexTree($\vec{q}$, D, u.$P_r$, RList)
    else
        return
    end if
else
    if Score(u, Q) > k-th score then (update RList)
        Delete the element holding the smallest relevance score in RList if RList already holds k elements
        Insert a new element ⟨Score(u, Q), u.FID⟩ into RList and sort the elements in RList
    end if
    return
end if

ALGORITHM 2: SearchIndexTree(QueryVector $\vec{q}$, Dictionary D, TreeNode u, RList).
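A plaintext version of Algorithm 2 can be sketched in Python as follows. The sketch makes several assumptions of ours: RList is kept as a min-heap of (score, FID) pairs so that the k-th score is always at the front, the `Node` class and function names are hypothetical, the tree is built by hand in the shape of Example 1, and the six leaf vectors are illustrative values chosen so that $r_{11}$ and $r_{12}$ carry the vectors used in equation (4). The extra `checked` list is not part of Algorithm 2; it only records which leaves were scored.

```python
import heapq


class Node:
    def __init__(self, u_min, u_max, left=None, right=None, fid=None):
        self.u_min, self.u_max = u_min, u_max
        self.left, self.right, self.fid = left, right, fid


def leaf(fid, vec):
    return Node(vec, vec, fid=fid)


def parent(l, r):
    return Node([min(a, b) for a, b in zip(l.u_min, r.u_min)],
                [max(a, b) for a, b in zip(l.u_max, r.u_max)], l, r)


def score(node, q_min, q_max):
    return (sum(a * b for a, b in zip(node.u_min, q_min))
            + sum(a * b for a, b in zip(node.u_max, q_max)))


def search_index_tree(node, q_min, q_max, k, rlist, checked):
    # k-th score: smallest score in a full RList, otherwise "very small".
    kth_score = rlist[0][0] if len(rlist) == k else float("-inf")
    s = score(node, q_min, q_max)
    if node.fid is not None:
        checked.append(node.fid)
    if s <= kth_score:
        return  # prune this node (and its subtree, if any)
    if node.fid is None:
        search_index_tree(node.left, q_min, q_max, k, rlist, checked)
        search_index_tree(node.right, q_min, q_max, k, rlist, checked)
    elif len(rlist) == k:
        heapq.heapreplace(rlist, (s, node.fid))  # drop the smallest, add new
    else:
        heapq.heappush(rlist, (s, node.fid))


# Hypothetical leaf vectors (d = 3), arranged as in Example 1.
f = {name: leaf(name, vec) for name, vec in {
    "f1": [0.3, -0.2, 0.8], "f2": [-0.2, 0.3, 0.7],
    "f3": [0.3, -0.2, 0.3], "f4": [-0.4, -0.1, -0.3],
    "f5": [0.5, 0.7, 0.8], "f6": [-0.5, 0.9, -0.2],
}.items()}
r11 = parent(parent(f["f1"], f["f2"]), parent(f["f3"], f["f4"]))
r12 = parent(f["f5"], f["f6"])
root = parent(r11, r12)

rlist, checked = [], []
search_index_tree(root, [0.0, -0.3, 0.0], [0.1, 0.0, 0.2], 3, rlist, checked)
top = sorted(rlist, reverse=True)
```

With these assumed vectors, the subtree under $r_{12}$ is pruned once three results are in the list, so f5 and f6 are never scored, which is the behaviour described in Example 1.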


(b) If Utype = "Ins," the algorithm will insert a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update($I_T$, C, $I_s$, $c_i$, Utype): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set $I_s$. After this, if Utype = "Del," the algorithm removes $c_i$ from C. Otherwise, the algorithm inserts $c_i$ into C.

Note that, after a period of insertion and deletion operations, the number of keywords in the dictionary may change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. For the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by the "Word2vec" tool and set by the users. For example, if we set the dimension of the vector to 200, the dimension of each keyword's vector is 200, and thus the dimensions of $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node u in the index tree and the query Q in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, the attackers cannot obtain the original vectors in the tree nodes and the query, which means that the index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set Q can be extended to multiple different semantic keyword sets Q′. Therefore, the same query Q is encrypted into different trapdoors, which means that the goal of trapdoor unlinkability is achieved.

(3) Keyword Privacy. Since the index and the trapdoor are protected by the secure KNN algorithm, the adversary cannot infer the plaintext information from the index and the trapdoor under the known ciphertext model. Considering that the known background model is common in real-world applications, we analyse the security of the proposed scheme under the known background model. In the TrapdoorGen algorithm, the original query keyword set Q is extended to a new set Q′. Specifically, for each keyword q in Q, the algorithm randomly chooses a number k′, selects k′ semantic keywords related to q by utilizing the dictionary, and inserts these keywords into Q′. Suppose that each keyword is associated with N semantic keywords in the dictionary. Then each keyword can generate $2^N$ different keyword sets, since each semantic keyword can be either chosen or not. For example, if a keyword q is associated with three semantic keywords $q_1$, $q_2$, and $q_3$, then q can generate $2^3$ keyword sets: $\{q\}$, $\{q, q_1\}$, $\{q, q_2\}$, $\{q, q_3\}$, $\{q, q_1, q_2\}$, $\{q, q_1, q_3\}$, $\{q, q_2, q_3\}$, and $\{q, q_1, q_2, q_3\}$. Since the query Q usually contains more than one keyword, Q will generate more than $2^N$ different semantic keyword sets. With this method, the final similarity score is obfuscated by these random semantic keyword sets. As in the analyses in [14, 15], our scheme can protect the keyword privacy under the known background model.
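The counting argument above can be checked with a few lines of Python that enumerate every way of extending a keyword with a subset of its semantic keywords. The function name and the keyword labels are ours and serve only as an illustration.

```python
from itertools import combinations


def semantic_extensions(keyword, related):
    """All keyword sets formed by the keyword plus any subset of its related keywords."""
    sets = []
    for size in range(len(related) + 1):
        for chosen in combinations(related, size):
            sets.append({keyword, *chosen})
    return sets


extensions = semantic_extensions("q", ["q1", "q2", "q3"])
print(len(extensions))
```

With three related keywords, the enumeration yields the eight sets listed above; in general the number of sets doubles with every additional semantic keyword.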

4. Performance Analysis

In this section, we analyse the proposed SSE-DMKRS scheme theoretically and experimentally. A detailed experiment is given to demonstrate that our scheme can efficiently perform dynamic ranked keyword search over the encrypted data. Our experiment is run on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and is based on a real-world e-mail dataset, the Enron e-mail dataset [35]. We mainly analyse the performance of our scheme in two aspects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update; (2) the relationship between the search precision and the privacy level. Moreover, in order to show the advantages of our scheme, we also compare it with two closely related previous schemes. For simplicity, we denote the two schemes introduced in [14, 15] by X15 and G19, respectively.

4.1. Efficiency

4.1.1. Index Building. The process of index building mainly consists of two steps: (1) creating an unencrypted index tree by utilizing Algorithm 1; (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates $O(n)$ nodes based on the document set F. Because each node has two vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ whose dimensions are both d, the vector splitting process needs $O(d)$ time and the matrix multiplication operations take $O(d \times d)$ time in the encryption step. According to these two steps, the overall time complexity of index building is $O(nd^2)$, which means that the time cost for index building mainly depends on the number of documents in F and the dimension of each node's vector.


Since the dimensions of each node's vector in X15 and G19 are both linear in the number of keywords in the dictionary (m), the time costs for index building in X15 and G19 are both $O(nm^2)$. Because $d \ll m$, the time cost for index building in our scheme is much less than that in X15 and G19. In addition, for the scheme G19, the internal nodes are constructed by using a Bloom filter, and thus the dimension of each internal node's vector is linear in the Bloom filter size b. Since b is usually smaller than m, the index building time in G19 is less than that in X15.

Figure 6(a) shows that the time cost for index building in our scheme is much less than that in X15 and G19. More precisely, when n = 1000, m = 20000, d = 1000, and b = 10000, the time consumption for index building in X15 and G19 is nearly 100∼200 times that in our scheme. As m increases, the advantages of our scheme become even more significant.

In addition, because the index tree has $O(n)$ nodes and each node holds two d-dimensional vectors, the space complexity of the index tree is $O(nd)$. By contrast, the space complexities of the index trees in X15 and G19 are both $O(nm)$. From Table 3, even if we set n = 1000, m = 20000, d = 1000, and b = 10000, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.

4.1.2. Trapdoor Generation. In our scheme, the query is converted into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ whose dimensions are both d. The trapdoor generation process multiplies these two vectors by the d × d matrices in the key, so the time complexity of trapdoor generation in our scheme is $O(d^2)$. By contrast, since the dimensions of the query vectors in X15 and G19 are both m, the time complexities of trapdoor generation in these schemes are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when n = 1000, m = 20000, and d = 1000, the time cost for trapdoor generation in our scheme is 15ms, while that in G19 and X15 is 287ms and 290ms, respectively.

4.1.3. Search. In the search process, if the relevance score of an internal node u and the query Q is less than the minimum relevance score of the current top-k documents, the subtree rooted at u will not be accessed. Thus, not all of the nodes in the tree are accessed during the search process. We suppose that there are θ leaf nodes that contain at least one keyword in the query Q. Since the height of the tree is $O(\log n)$ and the time complexity of the relevance score calculation is $O(d)$, the time complexity of the search process is $O(\theta d \log n)$. For the scheme X15, because the time complexity of the relevance score calculation is $O(m)$, the time complexity of the search process is $O(\theta m \log n)$. For the scheme G19, because each internal node contains a Bloom filter whose size is b and each leaf node involves a vector whose size is m, the time complexity of the search process is $O(\theta(m + b \log n))$. From Figure 6(c), when n = 1000, m = 20000, d = 1000, and b = 10000, the search time cost in our scheme is 36ms, while that in G19 and X15 is 135ms and 214ms, respectively.

4.1.4. Update. When the data owners want to insert or delete a document, they will not only insert or delete a leaf node but also update $O(\log n)$ internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(\log n \cdot d^2)$. For the X15 scheme, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(\log n \cdot m^2)$. For the G19 scheme, because the internal nodes are based on the Bloom filter, which is not encrypted, the time cost for updating the internal nodes can be ignored. Thus, the time complexity of update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the time cost for updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.

4.2. Precision and Privacy. The search precision of our scheme is affected by a group of semantic keywords related to the original index and query keywords. We measure our scheme by adopting a metric called "precision" defined in [12]. The metric of precision is defined as follows:

$$P_k = \frac{k'}{k}, \tag{8}$$

where $k'$ is the number of real top-$k$ documents in the retrieved $k$ documents.

In addition, the semantic keywords in the index and query keyword set will disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through the statistical information about the dataset. To measure the disturbance extent of the relevance score, we use the following equation, called "rank privacy" and introduced in [12], to quantify this obscureness:

$$P'_k = \frac{\sum_{i} \left| r_i - r'_i \right|}{k^2}, \tag{9}$$

where $r_i$ is the rank number of document $i$ in the retrieved top-$k$ documents and $r'_i$ is document $i$'s real rank number in the real ranked results.
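The two metrics can be sketched in a few lines of Python. This is only an illustration, reading equation (8) as $k'/k$ and equation (9) as the sum of $|r_i - r'_i|$ over the retrieved documents divided by $k^2$. The function names, the toy document identifiers, and the assumption that every retrieved document also appears in the real ranking are ours, not part of the paper.

```python
def precision(retrieved, true_topk):
    """P_k = k'/k: share of the k retrieved documents that are real top-k documents."""
    k = len(retrieved)
    real = set(true_topk)
    return sum(1 for doc in retrieved if doc in real) / k


def rank_privacy(retrieved, true_ranking):
    """P'_k = sum_i |r_i - r'_i| / k^2 over the k retrieved documents.

    r_i is the 1-based rank of a document in the retrieved list and r'_i is
    its 1-based rank in the real ranked results (assumed to contain it).
    """
    k = len(retrieved)
    real_rank = {doc: pos for pos, doc in enumerate(true_ranking, start=1)}
    total = sum(abs(pos - real_rank[doc])
                for pos, doc in enumerate(retrieved, start=1))
    return total / (k * k)


# Hypothetical data: the real ranking and a perturbed top-3 answer.
true_ranking = ["f1", "f2", "f3", "f4", "f5", "f6"]
retrieved = ["f2", "f1", "f5"]
print(precision(retrieved, true_ranking[:3]))   # 2 of the 3 results are real top-3 documents
print(rank_privacy(retrieved, true_ranking))    # rank displacements 1, 1 and 2, divided by 9
```

A larger rank privacy value means the returned order deviates more from the real order, which is why the two metrics move in opposite directions in the comparison.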

We compare our scheme to the schemes of X15 and G19 in terms of "precision" and "rank privacy". Note that an important parameter in the previous two schemes is a standard deviation $\sigma$, which is utilized to adjust the relevance score for the dummy keywords. In the comparison, we set $\sigma = 0.05$, which is usually used in the previous schemes. Besides, in our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node's vector to 1000 ($d = 1000$). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as $k$ grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For the schemes X15 and G19, the precision also decreases and the rank privacy also increases when $k$ grows, so this characteristic exists in all three schemes. Because the vector representations for the index tree and query in our scheme are compressed deeply, some statistical information in the index and the query will be lost.


Figure 6: Impact of $m$ on the time cost of (a) index building, (b) trapdoor generation, (c) search, and (d) update for the schemes X15, G19, and ours ($n = 1000$, $d = 1000$, $b = 10000$, and $m \in \{12000, 14000, 16000, 18000, 20000\}$). Each panel plots the time cost ($10^3$ ms) against the dictionary size.

Table 3: Storage consumption of the index tree (MB).

Dictionary size    [14]    [15]    Vector dimension    Proposed
m = 12000          188     174     d = 200             7
m = 14000          219     190     d = 400             14
m = 16000          251     206     d = 600             20
m = 18000          283     222     d = 800             26
m = 20000          315     238     d = 1000            33


Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents for the schemes X15, G19, and ours ($n = 1000$, $d = 1000$, $b = 10000$, $m = 12000$, and $\sigma = 0.05$). Each panel plots the metric (%) against the number of retrieved documents (10 to 50).

Figure 8: Impact of $d$ on the time cost of index building (a) and of trapdoor generation, search, and update (b) ($n = 1000$ and $d \in \{200, 400, 600, 800, 1000\}$). Each panel plots the time cost (s) against the vector size.


Thus, the precision of our scheme is less than that in X15 and G19. However, the rank privacy in our scheme is accordingly higher than that in X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation ($d$), which we set in "Word2vec", is an important parameter in our scheme. Next, we discuss the impact of $d$ on our scheme. The impact of $d$ on the efficiency of our scheme is given in Figure 8. From Figure 8, we know that the time costs of index building, trapdoor generation, search, and update all increase when $d$ grows. Besides, Figure 9 illustrates the impact of $d$ on the precision and rank privacy in our scheme. As $d$ increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy decreases gradually. These phenomena are all consistent with our previous theoretical analysis. So, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter $d$ to satisfy the requirements of different applications.

4.4. Discussion. From the experiment results, when $n = 1000$, $m = 20000$, $d = 200$, and $b = 10000$, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, which are all much better than those of the previous schemes X15 and G19. This efficiency demonstrates that our scheme is well suited to practical applications, especially the mobile cloud setting, in which the clients have limited computation and storage resources.

The experiment results show that the precision of our scheme is less than that in the previous two schemes, while the rank privacy is accordingly higher. In addition, by using the "Word2vec" method, the vector representations used in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying "Word2Vec" to construct the vector representations of the documents and queries and adopting the balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme substantially reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions ($n = 1000$ and $d \in \{200, 400, 600, 800, 1000\}$). Each panel plots the metric (%) against the number of retrieved documents (10 to 50), with one curve per vector size.


Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, "Practical techniques for searching on encrypted data," in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, "Secure data retrieval of outsourced data with complex query support," in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, "Enabling personalized search over encrypted outsourced data with efficiency improvement," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, "Secure indexes," IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, "Efficient conjunctive keyword search on encrypted data storage system," European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, "Achieving efficient conjunctive keyword searches over encrypted data," Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, "Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement," IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, "Efficient similarity search over encrypted data," in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, "Zerber+R: top-k retrieval from a confidential index," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, "Enabling secure and efficient ranked keyword search over outsourced cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., "Privacy-preserving multi-keyword ranked search over encrypted cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., "Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking," in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, "A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, "Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data," IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., "Highly-scalable searchable symmetric encryption with support for boolean queries," Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., "Dynamic searchable encryption in very-large databases: data structures and implementation," in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., "Public key encryption with keyword search," International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, "Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system," Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, "Predicate encryption supporting disjunctions, polynomial equations, and inner products," Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, "Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group," Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, "Novel multi-keyword search on encrypted data in the cloud," IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, "Generating searchable public-key ciphertexts with hidden structures for fast keyword search," IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, "Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks," IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, "A general transformation from KP-ABE to searchable encryption," Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, "Attribute-based hybrid boolean keyword search over outsourced encrypted data," IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, "Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries," in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, "Efficient implementation of a proxy-based protocol for data sharing on the cloud," in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, "Searchable public key encryption supporting semantic multi-keywords search," IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., "Efficient estimation of word representations in vector space," 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, "Secure kNN computation on encrypted databases," in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, "Zerber: r-confidential indexing for distributed documents," in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, "Enron E-mail dataset," http://www.cs.cmu.edu/~enron.


(2) Privacy Preserving. Similar to previous schemes [12, 14, 15], our scheme needs to prevent CS from learning extra private information that can be inferred from the documents, the secure index, and the queries. More precisely, the privacy requirements are listed as follows:

Index and Trapdoor Privacy. The plaintext information concealed in the index and the trapdoor cannot be leaked to CS. This information involves the keywords and the corresponding vector representation of each keyword.

Trapdoor Unlinkability. CS cannot determine whether two trapdoors are built from the same query.

Keyword Privacy. CS cannot identify whether a specific keyword is in the trapdoor or index by analysing the search results and the statistical information of the documents.

(3) Dynamic. The scheme can efficiently support dynamic operations like document insertion and deletion. Note that the efficiency of update operations in our scheme is better than that of the previous SSE-DMKRS schemes.

2.4. Word2Vec. The "Word2Vec" model is a shallow, two-layer neural network which is used to convert words into a group of vector representations [31]. Under this model, each word in the document set is mapped to a vector, which can be used to calculate the similarity between words. For instance, Figure 2 shows that, through training on a simple corpus, the three words "dog", "fox", and "orange" are mapped to three vector representations, respectively. By utilizing these vectors, the similarity among these three words can be calculated. We find that the similarity between "dog" and "fox" is higher than that between "dog" and "orange", since "dog" and "fox" are both animals. Thus, we can utilize "Word2Vec" to convert the keywords in a corpus into a group of vector representations and then apply these vectors to perform ranked search.

2.5. Advice on Equations. The vector space model is a very popular method in the field of information retrieval, usually used along with the TF-IDF rule to realize the top-$k$ search, where TF is the term frequency and IDF is the inverse document frequency [34]. By utilizing the TF-IDF rule, the documents and queries can be represented as a group of vectors. These vectors can be adopted in the top-$k$ search over the ciphertext [12, 14, 15]. However, the dimension of these vectors is linear in the number of words in the dataset, which is not efficient if the dataset has a lot of words. To address this issue, we apply "Word2Vec" to present a novel keyword conversion method, which is described as follows:

Table 2: Notations.

$F$    A document set $\{f_1, f_2, \ldots, f_n\}$
$n$    The number of documents in $F$
$C$    The encrypted form of $F$, denoted by $\{c_1, c_2, \ldots, c_n\}$
$W_i$    The keyword set $\{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ for the document $f_i$
$t_i$    The number of keywords in $W_i$, $i \in [1, n]$
$w_{ij}$    The $j$th keyword in $W_i$, $i \in [1, n]$, $j \in [1, t_i]$
$\vec{W}_i$    The vector representation for $W_i$
$\vec{w}_{ij}$    The vector representation for $w_{ij}$
$u$    A node in the index tree
$\vec{u}$    The vector representation for the node $u$
$\vec{u}_{\min}$, $\vec{u}_{\max}$    Vector representations obtained by splitting $\vec{u}$
$I_u$    The encrypted index for the node $u$
$I_T$    The encrypted index tree of $F$
$Q$    The keyword set $\{q_1, q_2, \ldots, q_t\}$ for the query
$q_j$    A keyword in $Q$, $j \in [1, t]$
$\vec{q}$    The vector representation for the query $Q$
$\vec{q}_{\min}$, $\vec{q}_{\max}$    Vector representations obtained by splitting $\vec{q}$
$\vec{q}_j$    The vector representation for $q_j$
$T_Q$    The trapdoor of $Q$
$D$    A dictionary $\{w_1, w_2, \ldots, w_m\}$ containing all keywords in $F$
$M_{11}$, $M_{12}$, $M_{21}$, $M_{22}$    Matrices for encryption (encryption key)
$M_{11}^{-1}$, $M_{12}^{-1}$, $M_{21}^{-1}$, $M_{22}^{-1}$    Matrices for decryption (decryption key)
$N$    The number of semantic keywords associated with each keyword in the dictionary
$m$    The number of keywords in $D$
$d$    The dimension of the vectors generated by using "Word2vec"
$k$    The number of files returned to the user

Figure 1: System model of the keywords search over encrypted data. Diagram labels: data owners, data users, semitrusted cloud server, secret key, encrypted e-mails, encrypted index, trapdoor, and search results.

(1) Through applying "Word2Vec" to a corpus, we create a dictionary in which each keyword is associated with a vector representation.

(2) For the keyword set $W_i = \{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ of the document $f_i$, we obtain a vector $\vec{x}_i = \vec{w}_{i1} + \vec{w}_{i2} + \cdots + \vec{w}_{it_i}$ by looking up the dictionary, where $\vec{w}_{ij}$ is a vector representation for $w_{ij}$ and $i \in [1, n]$, $j \in [1, t_i]$. After this, we set $\vec{W}_i = \vec{x}_i / \|\vec{x}_i\|$ as the vector representation of $W_i$.

(3) For the query keyword set $Q = \{q_1, q_2, \ldots, q_t\}$, we utilize the dictionary to construct a vector $\vec{v} = \vec{q}_1 + \vec{q}_2 + \cdots + \vec{q}_t$. Then, we set $\vec{Q} = \vec{v} / \|\vec{v}\|$ as the vector representation of $Q$.

Note that the dimensions of $\vec{W}_i$ and $\vec{Q}$ are very small, e.g., 200, which is significantly smaller than the number of words in the dataset. Thus, the proposed method is more efficient than the previous method based on the TF-IDF rule. In addition, together with the vector space model mentioned above, we use the cosine measure to evaluate the relevance between the document and the query. The relevance evaluation function is defined in the next section.
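The conversion steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional toy dictionary is made up (a real dictionary would come from a trained "Word2Vec" model), the function names are hypothetical, and skipping keywords that are missing from the dictionary is our assumption. The sketch reads the normalization step as dividing the summed vector by its Euclidean norm, so that the cosine measure reduces to an inner product.

```python
import math


def to_unit_vector(keywords, dictionary):
    """Sum the dictionary vectors of the keywords, then scale the sum to unit length."""
    dim = len(next(iter(dictionary.values())))
    total = [0.0] * dim
    for word in keywords:
        vec = dictionary.get(word)
        if vec is None:          # assumption: unknown keywords are ignored
            continue
        for i, value in enumerate(vec):
            total[i] += value
    norm = math.sqrt(sum(x * x for x in total))
    if norm == 0.0:              # nothing to normalize
        return total
    return [x / norm for x in total]


def cosine_relevance(doc_vec, query_vec):
    """For unit-length vectors the cosine measure is just the inner product."""
    return sum(a * b for a, b in zip(doc_vec, query_vec))


# Made-up 3-dimensional word vectors standing in for a trained dictionary.
toy_dictionary = {
    "dog": [0.9, 0.1, 0.0],
    "fox": [0.8, 0.2, 0.1],
    "orange": [0.0, 0.1, 0.9],
}

doc = to_unit_vector(["dog", "fox"], toy_dictionary)
print(cosine_relevance(doc, to_unit_vector(["fox"], toy_dictionary)))
print(cosine_relevance(doc, to_unit_vector(["orange"], toy_dictionary)))
```

With these toy vectors the document about "dog" and "fox" scores much higher against the query "fox" than against "orange", and the vector length stays at the chosen dimension regardless of how many distinct words the dataset contains.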

3. Proposed Scheme

In this section, we first give the algorithms for building the index tree and for searching on this tree. Then, we give the concrete construction of our scheme and its dynamic update operations. Finally, we give a detailed analysis of the security of our scheme.

3.1. Search Index: Balanced Binary Tree. In this section, we adopt a balanced binary tree to create the search index, which will be used in our main scheme. Inspired by the construction process in [14], the tree building and the search process for our scheme are described as follows.

3.1.1. Tree Building Process. Formally, the data structure of the tree node $u$ is defined as $u = \langle ID, \vec{u}_{\min}, \vec{u}_{\max}, P_l, P_r, FID \rangle$, where $ID$ is the identity of the node $u$, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are the vector representations of the node $u$, $P_l$ and $P_r$ are pointers to $u$'s left and right children, respectively, and $FID$ stores the identity of a document if $u$ is a leaf node. Note that, compared with the previous index trees [12, 14, 15], a node in our tree has two vectors, while it has only one vector in previous trees. The main reason is that a node vector in our tree may contain negative numbers, while a node vector in previous trees only contains positive numbers. For clarity, we give a simple example. Let $\vec{a} = (0.1, 0.2, -0.3)$ and $\vec{b} = (-0.5, 0, 0.3)$ be the vectors of two leaf nodes $A$ and $B$, respectively. For the previous index trees, the vector of the parent node $C$ of these two leaf nodes is $\vec{c} = (0.1, 0.2, 0.3)$, in which the value of each dimension is the larger of the corresponding values of $\vec{a}$ and $\vec{b}$. For a query vector $\vec{v} = (-1, 0, -1)$, the scores of the nodes $A$, $B$, and $C$ are 0.2, 0.2, and −0.4, respectively. It is very important to note that the score of the parent node is less than the scores of its children, which means that these two leaf nodes would be ignored in the tree search process even though they should be considered.

In our index tree, let the dimensions of $\vec{u}_{\min}$ and $\vec{u}_{\max}$ both be $d$. The methods for constructing $\vec{u}_{\min}$ and $\vec{u}_{\max}$ for a leaf node and for an internal node are denoted by M1 and M2, respectively, and are given as follows:

(1) M1: if the node $u$ is a leaf node corresponding to a file $f$, we create a vector $\vec{u}$ for $f$ by adopting the keyword conversion method mentioned in Section 2.5. Then, we set $\vec{u}_{\min} = \vec{u}$ and $\vec{u}_{\max} = \vec{u}$.

(2) M2: if the node $u$ is an internal node, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are based on its children's vectors. Let $P_l.\vec{u}_{\min}$ and $P_l.\vec{u}_{\max}$ be the two vectors of $u$'s left child, and let $P_r.\vec{u}_{\min}$ and $P_r.\vec{u}_{\max}$ be the two vectors of $u$'s right child.

Suppose that Min() and Max() are the minimum and maximum functions, respectively. $\vec{u}_{\min}$ is built as follows:

$$\vec{u}_{\min}[i] = \mathrm{Min}\left(P_l.\vec{u}_{\min}[i],\ P_r.\vec{u}_{\min}[i]\right), \quad i \in [1, d]. \tag{1}$$

And $\vec{u}_{\max}$ is built as follows:

$$\vec{u}_{\max}[i] = \mathrm{Max}\left(P_l.\vec{u}_{\max}[i],\ P_r.\vec{u}_{\max}[i]\right), \quad i \in [1, d]. \tag{2}$$

That is, each component of $\vec{u}_{\max}$ is the larger of the corresponding components of $P_l.\vec{u}_{\max}$ and $P_r.\vec{u}_{\max}$, and each component of $\vec{u}_{\min}$ is the smaller of the corresponding components of $P_l.\vec{u}_{\min}$ and $P_r.\vec{u}_{\min}$.
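As a small illustration of methods M1 and M2, the sketch below builds the two vectors of a parent node from the leaf vectors $\vec{a}$ and $\vec{b}$ of the example in this section, and also reproduces the single-vector parent $\vec{c}$ of the earlier trees to show why it fails for the query $\vec{v} = (-1, 0, -1)$. The function names and the representation of a node as a pair of plain lists are assumptions made for this sketch.

```python
def leaf_vectors(u):
    """Method M1: a leaf uses the same vector for both u_min and u_max."""
    return list(u), list(u)


def internal_vectors(left, right):
    """Method M2: element-wise min of the children's u_min and max of their u_max.

    `left` and `right` are (u_min, u_max) pairs.
    """
    u_min = [min(a, b) for a, b in zip(left[0], right[0])]
    u_max = [max(a, b) for a, b in zip(left[1], right[1])]
    return u_min, u_max


def dot(x, y):
    """Plain inner product of two equally long lists."""
    return sum(a * b for a, b in zip(x, y))


a = [0.1, 0.2, -0.3]      # leaf A from the example
b = [-0.5, 0.0, 0.3]      # leaf B from the example
v = [-1.0, 0.0, -1.0]     # query vector from the example

# Single-vector parent used by the earlier trees: element-wise maximum only.
c = [max(x, y) for x, y in zip(a, b)]
print(dot(a, v), dot(b, v), dot(c, v))   # the parent scores below both children

# Two-vector parent of this scheme.
parent_min, parent_max = internal_vectors(leaf_vectors(a), leaf_vectors(b))
print(parent_min, parent_max)
```

Keeping both the element-wise minimum and maximum is what later allows a parent score that never falls below the scores of its children, even when the query has negative components.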

Figure 2: A vector space representation of the words "dog", "fox", and "orange" shows that "dog" is closer to "fox" than to "orange", since "dog" and "fox" share more common attributes.


An illustration of the above methods is given in Figure 3. In Figure 3(a), the node $u$ is a leaf node and $W$ is the keyword set of the file that $u$ stores. By using the keyword conversion method, $W$ is converted into a vector $\vec{u} = (-0.2, 0.2, 0.5, -0.7, 0.8)$. Then, we set $\vec{u}_{\min} = \vec{u}_{\max} = \vec{u}$. In Figure 3(b), the node $u$ is an internal node, the vectors of its children are $P_l.\vec{u}_{\min}$, $P_l.\vec{u}_{\max}$, $P_r.\vec{u}_{\min}$, and $P_r.\vec{u}_{\max}$, and the vectors of the internal node are $\vec{u}_{\min} = (-0.2, -0.2, -0.3, -0.7, -0.5)$ and $\vec{u}_{\max} = (0.3, 0.7, 0.3, -0.1, 0.3)$.

Based on the methods M1 and M2 and inspired by the tree building algorithm introduced in [14], our tree building algorithm is given in Algorithm 1. An example of the proposed index tree is given in Example 1 and Figure 4. In Algorithm 1, we use the function GenID() to generate the unique identity ID for each node and apply GenFID() to generate the unique file ID for each leaf node. CurrentNodeSet contains the nodes that have no parent node and still need to be processed, and |CurrentNodeSet| is the number of nodes in CurrentNodeSet. If |CurrentNodeSet| is even, we assume that |CurrentNodeSet| = 2h; otherwise, we assume that |CurrentNodeSet| = 2h + 1, where h is a positive integer. TempNodeSet is a set containing the newly generated nodes. Moreover, for each node $u$, if $u$ is a leaf node, we use method M1 to generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$; otherwise, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are created by using M2.
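The bottom-up construction described above can be sketched in Python as follows. This is an unofficial sketch under stated assumptions: the `Node` class, the counter standing in for GenID(), and the use of the dictionary key as the file identity in place of GenFID() are ours, and the input is assumed to be a mapping from file identities to already converted document vectors. For an odd number of nodes the sketch follows the description of Algorithm 1: the last three nodes are merged into a single parent.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

_ids = count(1)   # hypothetical stand-in for GenID()


@dataclass
class Node:
    id: int
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None


def make_leaf(fid, vec):
    """Method M1: both vectors of a leaf equal the document vector."""
    return Node(next(_ids), list(vec), list(vec), fid=fid)


def make_parent(left, right):
    """Method M2: element-wise min of the u_min vectors, max of the u_max vectors."""
    return Node(next(_ids),
                [min(a, b) for a, b in zip(left.u_min, right.u_min)],
                [max(a, b) for a, b in zip(left.u_max, right.u_max)],
                left=left, right=right)


def build_index_tree(file_vectors):
    """Build the tree level by level from {file id: document vector}."""
    current = [make_leaf(fid, vec) for fid, vec in file_vectors.items()]
    if not current:
        return None
    while len(current) > 1:
        temp = []
        if len(current) % 2 == 0:
            for i in range(0, len(current), 2):
                temp.append(make_parent(current[i], current[i + 1]))
        else:
            # 2h + 1 nodes: pair the first 2h - 2, then merge the last three.
            for i in range(0, len(current) - 3, 2):
                temp.append(make_parent(current[i], current[i + 1]))
            u1 = make_parent(current[-3], current[-2])
            temp.append(make_parent(u1, current[-1]))
        current = temp
    return current[0]


def leaf_ids(node):
    """Collect the file ids stored in the leaves, left to right."""
    if node.fid is not None:
        return [node.fid]
    return leaf_ids(node.left) + leaf_ids(node.right)


files = {"f%d" % i: [float(i), -float(i), 0.5] for i in range(1, 7)}
root = build_index_tree(files)
print(leaf_ids(root), root.u_min, root.u_max)
```

Because every level roughly halves the number of parentless nodes, the resulting tree has logarithmic height, which is the property the later complexity analysis of search and update relies on.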

3.1.2. Search Process. For a query vector $\vec{q}$ of query $Q$, we split $\vec{q}$ into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$. For each dimension $i \in [1, d]$, if $\vec{q}[i] < 0$, then $\vec{q}_{\min}[i] = \vec{q}[i]$ and $\vec{q}_{\max}[i] = 0$; otherwise, $\vec{q}_{\min}[i] = 0$ and $\vec{q}_{\max}[i] = \vec{q}[i]$. Obviously, $\vec{q}_{\min}$ holds all the negative components of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive components. For clarity, we denote this splitting method for query $Q$ by M3. The illustration of this method is given in Figure 5. If the query vector is $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$, then $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$. For a query $Q$ and a node $u$, the score is calculated as

$$\mathrm{Score}(u, Q) = \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max}. \tag{3}$$

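We can utilize the above equation to evaluate which documents are the most related to the query. Moreover, we can verify that the score of a parent node is never smaller than the scores of its children: every component of $\vec{q}_{\min}$ is nonpositive and the parent's $\vec{u}_{\min}$ is componentwise no larger than a child's, while every component of $\vec{q}_{\max}$ is nonnegative and the parent's $\vec{u}_{\max}$ is componentwise no smaller than a child's. This property can significantly reduce the number of nodes which will be checked in the search process.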
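The splitting method M3 and the score of equation (3) can be sketched as below. The function names are hypothetical; the node vectors for $r_{11}$ and $r_{12}$ are the ones used in Example 1, and the leaf vectors $\vec{a}$ and $\vec{b}$ are those of the example in Section 3.1.1.

```python
def split_query(q):
    """Method M3: q_min keeps the negative components, q_max the nonnegative ones."""
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [0.0 if x < 0 else x for x in q]
    return q_min, q_max


def score(u_min, u_max, q_min, q_max):
    """Equation (3): u_min . q_min + u_max . q_max."""
    return (sum(a * b for a, b in zip(u_min, q_min))
            + sum(a * b for a, b in zip(u_max, q_max)))


q_min, q_max = split_query([0.1, -0.3, 0.2])

# Internal nodes r11 and r12 with the vectors given in Example 1.
print(score([-0.4, -0.2, -0.3], [0.3, 0.3, 0.8], q_min, q_max))
print(score([-0.5, 0.7, -0.2], [0.5, 0.9, 0.8], q_min, q_max))

# Parent of the leaves a and b from Section 3.1.1, scored against v = (-1, 0, -1).
v_min, v_max = split_query([-1.0, 0.0, -1.0])
a, b = [0.1, 0.2, -0.3], [-0.5, 0.0, 0.3]
parent_min = [min(x, y) for x, y in zip(a, b)]
parent_max = [max(x, y) for x, y in zip(a, b)]
print(score(parent_min, parent_max, v_min, v_max),
      score(a, a, v_min, v_max), score(b, b, v_min, v_max))
```

For a leaf, where both vectors coincide, the score is simply the inner product of the leaf vector and the query, so the last line shows the parent score bounding both leaf scores from above.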

The search process is given in Algorithm 2. In Algorithm 2, we use RList to store the top-$k$ files, which have the $k$ largest relevance scores to the query. RList is initialized to be an empty list, and it is updated whenever a more relevant file is found. The $k$th score is defined as the smallest relevance score in the current RList and is initialized to a very small number. By using the $k$th score, we can accelerate the search process by ignoring some paths with low scores. In Example 1 and Figure 4, an illustration of the search process is given, where $F = \{f_1, f_2, \ldots, f_6\}$, the query vectors are $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$, and $d$ (the vector dimension) is 3.

3.1.3. Example 1. An example of an index tree and a search process on this tree is illustrated in Figure 4. In Figure 4, we show an index tree with $F = \{f_1, f_2, \ldots, f_6\}$, in which the dimension of the vector for each node is 3. For each node $u$ in the tree, the upper vector and the lower vector correspond to $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. In the tree building process, we first generate the leaf nodes from $F$ and then create the internal nodes based on these leaf nodes.

Moreover, Figure 4 also gives an illustration of the search process. In Figure 4, we set $\vec{q} = (0.1, -0.3, 0.2)$ and split it into $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$. We suppose that the top-3 files will be returned to the data user. According to Algorithm 2, the search process begins with the root node $r$ and calculates the score between the query $Q$ and the two child nodes $r_{11}$ and $r_{12}$ of $r$ by using equation (3). The calculation process is presented as follows:

$$\begin{aligned}
\mathrm{Score}(r_{11}, Q) &= (-0.4, -0.2, -0.3) \cdot (0.0, -0.3, 0.0) + (0.3, 0.3, 0.8) \cdot (0.1, 0.0, 0.2) = 0.25, \\
\mathrm{Score}(r_{12}, Q) &= (-0.5, 0.7, -0.2) \cdot (0.0, -0.3, 0.0) + (0.5, 0.9, 0.8) \cdot (0.1, 0.0, 0.2) = 0.
\end{aligned} \tag{4}$$

Because the score between $r_{11}$ and $Q$ is higher than that between $r_{12}$ and $Q$, Algorithm 2 will traverse the subtree with $r_{11}$ as the root node and compute the score between the query $Q$ and the two child nodes of $r_{11}$. Since the score between $r_{21}$ and $Q$ is higher than that between $r_{22}$ and $Q$, Algorithm 2 will traverse the subtree with $r_{21}$ as the root node and add the leaf nodes $f_1$ and $f_2$ to the RList. After this, the subtree with $r_{22}$ as the root node will be traversed, and the leaf nodes $f_3$ and $f_4$ are reached. Since the number of files in RList is less than 3, $f_3$ is added to RList directly. For the file $f_4$, since the number of files in RList now equals 3, Algorithm 2 will compare the score between $f_4$ and $Q$ to the minimum score in the RList. Because the score between $f_4$ and $Q$ is smaller than the minimum score in the RList, $f_4$ is not added to the RList. At this point, the subtree with $r_{11}$ as the root node has been traversed, and Algorithm 2 will turn to the subtree with $r_{12}$ as the root node. The score between $r_{12}$ and $Q$ is smaller than the minimum score in the RList, which means that the scores of all descendants of $r_{12}$ are also smaller than the minimum score in the RList (this property is described in Section 3.1.2), so $f_5$ and $f_6$ will not be checked. Therefore, Algorithm 2 outputs RList = $\{f_1, f_2, f_3\}$.
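The traversal walked through in this example can be sketched as a depth-first search with pruning. This is a sketch under stated assumptions rather than Algorithm 2 itself: the `Node` class and helper names are ours, $k$ is assumed to be at least 1, and the six leaf vectors are hypothetical values chosen so that the run follows the same path as the example (they are not the vectors of Figure 4). RList is kept as a min-heap so that its first entry is always the $k$th score.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None


def leaf(fid, vec):
    return Node(list(vec), list(vec), fid=fid)


def parent(left, right):
    return Node([min(a, b) for a, b in zip(left.u_min, right.u_min)],
                [max(a, b) for a, b in zip(left.u_max, right.u_max)],
                left, right)


def score(node, q_min, q_max):
    """Equation (3) applied to one node."""
    return (sum(a * b for a, b in zip(node.u_min, q_min))
            + sum(a * b for a, b in zip(node.u_max, q_max)))


def top_k_search(root, q, k):
    """Return (ranked file ids, leaves actually scored); assumes k >= 1."""
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [0.0 if x < 0 else x for x in q]
    rlist = []     # min-heap of (score, fid); rlist[0][0] is the kth score
    visited = []   # recorded only to make the pruning visible

    def visit(node):
        s = score(node, q_min, q_max)
        if node.fid is not None:                      # leaf node
            visited.append(node.fid)
            if len(rlist) < k:
                heapq.heappush(rlist, (s, node.fid))
            elif s > rlist[0][0]:
                heapq.heapreplace(rlist, (s, node.fid))
            return
        if len(rlist) == k and s < rlist[0][0]:
            return                                    # no descendant can enter RList
        for child in sorted((node.left, node.right),
                            key=lambda c: score(c, q_min, q_max), reverse=True):
            visit(child)

    visit(root)
    return [fid for _, fid in sorted(rlist, reverse=True)], visited


# Hypothetical leaf vectors; the tree shape mirrors the example (r21, r22, r11, r12).
f = {"f1": [0.3, -0.2, 0.8], "f2": [0.2, -0.4, 0.5], "f3": [-0.4, 0.1, 0.3],
     "f4": [0.1, 0.6, -0.3], "f5": [-0.5, 0.7, -0.2], "f6": [-0.1, 0.9, 0.1]}
r21 = parent(leaf("f1", f["f1"]), leaf("f2", f["f2"]))
r22 = parent(leaf("f3", f["f3"]), leaf("f4", f["f4"]))
r12 = parent(leaf("f5", f["f5"]), leaf("f6", f["f6"]))
root = parent(parent(r21, r22), r12)
print(top_k_search(root, [0.1, -0.3, 0.2], 3))
```

In this run the subtree under `r12` is skipped entirely, because its score is already below the smallest score held in RList once the left half of the tree has been processed.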

3.2. Construction of SSE-DMKRS. In this section, by combining the secure KNN algorithm [32] and the index tree building algorithm, we propose a concrete SSE-DMKRS scheme. The SSE-DMKRS scheme consists of five algorithms. The algorithms KeyGen, DictionaryBuild, and IndexBuild are executed by the data owners, while the algorithms TrapdoorGen and Search are performed by the data users and the cloud server, respectively.

(i) KeyGen (λ) given a security parameter λ this al-gorithm first randomly chooses four d times d invertiblematrices M11 M12 M21 and M22 where d is the

6 Security and Communication Networks

ndash02 02 05

05

ndash07 08

08

W

ndash02 02 05 ndash07 08 ndash02 02 ndash07

u

umin umax

(a)

ndash01 ndash02 02 ndash07 ndash05 ndash02 04 02 ndash01 03

ndash02 ndash02 ndash03 ndash01 02 03 07 03 ndash02 ndash01

ndash02 ndash02 ndash03 ndash07 ndash05 03 07 03 ndash01 03

plumin

prumin prumax

plumax

umin umax

(b)

Figure 3 An example of the vectors generation of the node u (a) MethodM1 u is a leaf nodeW is a keyword set for the file which u storesand u

rarr is a vector generated by adopting the keyword conversion method mentioned in Section 25 (b) Method M2 u is an internal nodeand Pl middot umin

rarr Pl middot umaxrarr Pr middot umin

rarr and Pr middot umaxrarr are the vectors of its children

Input: the document collection $F = \{f_1, f_2, \ldots, f_n\}$; a semantic dictionary D generated by applying "Word2Vec" to F.
Output: the index tree T.

(1) for each $i \in [1, n]$ do
(2)   Construct a leaf node u for $f_i$ with u.ID = GenID(), u.$P_l$ = u.$P_r$ = NULL, u.FID = GenFID($f_i$), and generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M1
(3)   Insert u to CurrentNodeSet
(4) end for
(5) while |CurrentNodeSet| > 1 do
(6)   if |CurrentNodeSet| is even, i.e., 2h, then
(7)     for each pair of nodes u′ and u″ in CurrentNodeSet do
(8)       Create a parent node u for u′ and u″ with u.ID = GenID(), u.$P_l$ = u′, u.$P_r$ = u″, u.FID = NULL, and set $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M2
(9)       Insert u to TempNodeSet
(10)    end for
(11)  else // Suppose that |CurrentNodeSet| = 2h + 1
(12)    for each pair of nodes u′ and u″ of the former (2h − 2) nodes in CurrentNodeSet do
(13)      Create a parent node u for u′ and u″
(14)      Insert u to TempNodeSet
(15)    end for
(16)    Create a parent node $u_1$ for the (2h − 1)-th and (2h)-th nodes, and then generate a parent node u for the (2h + 1)-th node and $u_1$
(17)    Insert u to TempNodeSet
(18)  end if
(19)  Set CurrentNodeSet = TempNodeSet and clear TempNodeSet
(20) end while
(21) return CurrentNodeSet
(22) // Note that CurrentNodeSet now contains only one node, which is the root of the index tree T

ALGORITHM 1: BuildIndexTree (FileSet F, Dictionary D).
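The bottom-up pairing in Algorithm 1 can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `Node`, `make_leaf`, `make_parent`, and `build_index_tree` are hypothetical, method M1 is assumed to store the document vector of a leaf as both of its bounds, and method M2 is assumed to take the element-wise minimum and maximum of the children's vectors, which is what the example in Figure 3(b) suggests.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional, Sequence, Tuple

_next_id = count(1)  # hypothetical stand-in for GenID()


@dataclass
class Node:
    id: int
    umin: List[float]
    umax: List[float]
    left: Optional["Node"] = None   # P_l
    right: Optional["Node"] = None  # P_r
    fid: Optional[str] = None       # FID, set only for leaf nodes


def make_leaf(fid: str, vec: Sequence[float]) -> Node:
    # Assumption for M1: a leaf uses its document vector as both bounds.
    return Node(next(_next_id), list(vec), list(vec), fid=fid)


def make_parent(left: Node, right: Node) -> Node:
    # Assumption for M2: element-wise min of the children's umin,
    # element-wise max of the children's umax.
    umin = [min(a, b) for a, b in zip(left.umin, right.umin)]
    umax = [max(a, b) for a, b in zip(left.umax, right.umax)]
    return Node(next(_next_id), umin, umax, left, right)


def build_index_tree(docs: Sequence[Tuple[str, Sequence[float]]]) -> Node:
    """docs is a non-empty sequence of (file id, document vector) pairs."""
    current = [make_leaf(fid, vec) for fid, vec in docs]
    while len(current) > 1:
        temp = []
        n = len(current)
        # Even case: pair everything. Odd case (n = 2h + 1): pair the
        # former 2h - 2 nodes and treat the last three nodes separately.
        paired = n if n % 2 == 0 else n - 3
        for i in range(0, paired, 2):
            temp.append(make_parent(current[i], current[i + 1]))
        if n % 2 == 1:
            u1 = make_parent(current[n - 3], current[n - 2])
            temp.append(make_parent(u1, current[n - 1]))
        current = temp
    return current[0]  # the root of the index tree
```

For instance, `build_index_tree([("f1", v1), ("f2", v2), ("f3", v3)])` first joins `f1` and `f2` under a node `u1` and then joins `u1` and `f3` under the root, following steps (11) to (17).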

Security and Communication Networks 7

dimension of $\vec{u}_{\min}$ and $\vec{u}_{\max}$). Then, it randomly generates a d-bit vector S. Finally, it outputs the secret key $sk = \{S, M_{11}, M_{12}, M_{21}, M_{22}\}$.

(ii) DictionaryBuild (F): given the document set $F = \{f_1, f_2, \ldots, f_n\}$, the algorithm runs "Word2vec" to generate the dictionary D of F. In the dictionary D, each keyword is associated with a vector representation. In addition, each keyword corresponds to a set of semantically related keywords.

(iii) IndexBuild (sk, F, D): given the document set F and the dictionary D for F, the algorithm first creates the index tree T by using the algorithm BuildIndexTree (F, D) (Algorithm 1). Then, for each node u in the tree T, the algorithm generates two random vector pairs $\{V'_{u_{\min}}, V''_{u_{\min}}\}$ and $\{V'_{u_{\max}}, V''_{u_{\max}}\}$ for the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. More precisely, if $S[i] = 0$, it sets $V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$; if $S[i] = 1$, $V'_{u_{\min}}[i]$, $V''_{u_{\min}}[i]$, $V'_{u_{\max}}[i]$, and $V''_{u_{\max}}[i]$ are set as four random values under the constraints $V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$. This process is expressed as the following equation:

$$
\begin{cases}
V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 0, \\
V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 0, \\
V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 1, \\
V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 1,
\end{cases}
\qquad i \in [1, d]. \tag{5}
$$

Finally, for each node u, it computes $I_u = \{M_{11}^{T} V'_{u_{\min}}, M_{12}^{T} V''_{u_{\min}}, M_{21}^{T} V'_{u_{\max}}, M_{22}^{T} V''_{u_{\max}}\}$. By replacing the plaintext vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ with the encrypted index $I_u$, an encrypted index tree $I_T$ is created.

(iv) TrapdoorGen (sk, Q): given a query keyword set Q, the algorithm first extends Q to a new semantic keyword set Q′. The process is as follows:

(a) It generates a new keyword set Q′, which is initialized to an empty set.

Figure 4: An example of Algorithm 1 and Algorithm 2 (example 1).

Figure 5: An example of the vector generation of the query Q. Method M3: $\vec{q}$ is a vector generated by the keyword conversion method described in Section 2.5; $\vec{q}_{\min}$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive part. In the example, $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$ is split into $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$ and $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$.

(b) Note that each keyword in the dictionary is associated with a group of keywords semantically related to this keyword. For each keyword q in Q, it randomly chooses k′ semantic keywords based on the dictionary and inserts these keywords into Q′, where k′ is chosen dynamically and $k' \in [1, N]$.

Then, based on Q′, the TrapdoorGen algorithm generates a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is similar to the process in the IndexBuild algorithm and can be expressed as the following equation:

$$
\begin{cases}
Q'_{q_{\min}}[i] = Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\max}}[i] = Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\min}}[i] + Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 1, \\
Q'_{q_{\max}}[i] + Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 1,
\end{cases}
\qquad i \in [1, d]. \tag{6}
$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{\min}}, M_{12}^{-1} Q''_{q_{\min}}, M_{21}^{-1} Q'_{q_{\max}}, M_{22}^{-1} Q''_{q_{\max}}\}$ as the trapdoor for Q.

(v) Search (sk, $T_Q$, $I_T$): for each node u in $I_T$, the algorithm computes

$$
\begin{aligned}
I_u \cdot T_Q &= \left(M_{11}^{T} V'_{u_{\min}}\right) \cdot \left(M_{11}^{-1} Q'_{q_{\min}}\right) + \left(M_{12}^{T} V''_{u_{\min}}\right) \cdot \left(M_{12}^{-1} Q''_{q_{\min}}\right) \\
&\quad + \left(M_{21}^{T} V'_{u_{\max}}\right) \cdot \left(M_{21}^{-1} Q'_{q_{\max}}\right) + \left(M_{22}^{T} V''_{u_{\max}}\right) \cdot \left(M_{22}^{-1} Q''_{q_{\max}}\right) \\
&= V'_{u_{\min}} \cdot Q'_{q_{\min}} + V''_{u_{\min}} \cdot Q''_{q_{\min}} + V'_{u_{\max}} \cdot Q'_{q_{\max}} + V''_{u_{\max}} \cdot Q''_{q_{\max}} \\
&= \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max} \\
&= \mathrm{Score}(u, Q).
\end{aligned} \tag{7}
$$

According to equation (3), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score (u, Q). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform ranked search.
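The identity in equation (7) can be checked numerically with a small, self-contained sketch. It is not the authors' code: all function names are hypothetical, the matrices are tiny random integer matrices, and exact rational arithmetic (`fractions.Fraction`) is used only so that the final comparison is exact. One point is an assumption that goes beyond the text: the sketch splits the query entries where $S[i] = 0$ and duplicates them where $S[i] = 1$, which is the complementary rule of the secure kNN construction in [32]. With the identical rule on both sides, the middle equality of equation (7) would not hold in general.

```python
import random
from fractions import Fraction


def invert(m):
    """Gauss-Jordan inverse over exact fractions; None if m is singular."""
    d = len(m)
    a = [list(row) + [Fraction(int(i == j)) for j in range(d)]
         for i, row in enumerate(m)]
    for c in range(d):
        pivot = next((r for r in range(c, d) if a[r][c] != 0), None)
        if pivot is None:
            return None
        a[c], a[pivot] = a[pivot], a[c]
        piv = a[c][c]
        a[c] = [x / piv for x in a[c]]
        for r in range(d):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[d:] for row in a]


def key_gen(d, rng):
    """Returns S, [M11, M12, M21, M22], and the four inverse matrices."""
    S = [rng.randint(0, 1) for _ in range(d)]
    mats, invs = [], []
    while len(mats) < 4:
        m = [[Fraction(rng.randint(-9, 9)) for _ in range(d)] for _ in range(d)]
        inv = invert(m)
        if inv is not None:  # keep only invertible matrices
            mats.append(m)
            invs.append(inv)
    return S, mats, invs


def split(vec, S, share_bit, rng):
    """Where S[i] == share_bit, vec[i] is split into two random shares that
    sum to vec[i]; elsewhere both output vectors simply copy vec[i]."""
    a, b = [], []
    for x, s in zip(vec, S):
        if s == share_bit:
            r = Fraction(rng.randint(-100, 100), 10)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def encrypt_node(umin, umax, S, mats, rng):
    parts = split(umin, S, 1, rng) + split(umax, S, 1, rng)  # equation (5)
    # list(zip(*m)) is the transpose of m, so each entry is M^T V.
    return [mat_vec(list(zip(*m)), p) for m, p in zip(mats, parts)]


def trapdoor(qmin, qmax, S, invs, rng):
    # Assumption: complementary split rule (shares where S[i] == 0).
    parts = split(qmin, S, 0, rng) + split(qmax, S, 0, rng)
    return [mat_vec(inv, p) for inv, p in zip(invs, parts)]  # M^{-1} Q


def encrypted_score(I_u, T_Q):
    return sum(dot(i, t) for i, t in zip(I_u, T_Q))
```

Given vectors of `Fraction` values, `encrypted_score(encrypt_node(...), trapdoor(...))` equals `dot(umin, qmin) + dot(umax, qmax)` because $(M^{T}V) \cdot (M^{-1}Q) = V^{T} M M^{-1} Q = V \cdot Q$ for every invertible M. A practical implementation would use floating-point matrices of dimension d rather than exact fractions.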

3.3. Dynamic Update Operations. Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, which satisfies the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows:

(i) UpdateInfoGen (sk, $T_s$, $f_i$, Utype): this algorithm is executed by the data owners and generates the update information $\{I_s, c_i\}$ for the cloud server, where $T_s$ is a set containing all the nodes to be updated, $I_s$ is an encrypted form of $T_s$, $f_i$ is the target document, $c_i$ is an encrypted form of $f_i$, and Utype is the update type. In order to reduce the communication cost, the data owners store the unencrypted index tree on their own devices. For Utype ∈ {Ins, Del}, the algorithm works as follows:

(a) If Utype = "Del", the algorithm deletes a document $f_i$ from the tree. The algorithm first finds the leaf node associated with the document $f_i$ and deletes it. In addition, the internal nodes associated with this leaf node are also added to $T_s$. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can set the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts $T_s$ to generate $I_s$. Finally, the algorithm sends $I_s$ to the cloud server and sets $c_i$ as null.

Input: a vector $\vec{q}$ of the query Q; a semantic dictionary D generated by applying "Word2Vec" to F; a root node u of the index tree; and RList.
Output: RList.

(1) Split $\vec{q}$ into $\vec{q}_{\min}$ and $\vec{q}_{\max}$ according to the method M3
(2) if u is an internal node then
(3)   if Score (u, Q) > k-th score then
(4)     SearchIndexTree ($\vec{q}$, D, u.$P_l$, RList)
(5)     SearchIndexTree ($\vec{q}$, D, u.$P_r$, RList)
(6)   else
(7)     return
(8)   end if
(9) else
(10)  if Score (u, Q) > k-th score then // Update RList
(11)    Delete the element holding the smallest relevance score in RList
(12)    Insert a new element ⟨Score (u, Q), u.FID⟩ into RList and sort the elements in RList
(13)  end if
(14)  return
(15) end if

ALGORITHM 2: SearchIndexTree (QueryVector $\vec{q}$, Dictionary D, TreeNode u, RList).


(b) If Utype = "Ins", the algorithm inserts a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update ($I_T$, C, $I_s$, $c_i$, Utype): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set $I_s$. After this, if Utype = "Del", the algorithm removes $c_i$ from C; otherwise, the algorithm inserts $c_i$ into C.

Note that, after a period of insertion and deletion operations, the number of keywords in the dictionary may change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear with the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. For the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by the "Word2vec" tool and set by the users. For example, if we set the dimension of the vector as 200, the dimension of each keyword's vector is 200, and thus the dimensions of the vectors $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node u in the index tree and the query Q in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, the attackers cannot obtain the original vectors in the tree nodes and the query, which means that the index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set Q is extended to multiple different semantic keyword sets Q′. So, the same query Q is encrypted into different trapdoors, which means that the goal of trapdoor unlinkability is achieved.

(3) Keyword Privacy. Since the index and the trapdoor are protected by the secure KNN algorithm, the adversary cannot infer the plaintext information from the index and the trapdoor under the known ciphertext model. Considering that the known background model is common in real-world applications, we analyse the security of the proposed scheme under the known background model. In the TrapdoorGen algorithm, the original query keyword set Q is extended to a new set Q′. Specifically, for each keyword q in Q, the algorithm randomly chooses a number k′, chooses k′ semantic keywords related to q by utilizing the dictionary, and inserts these keywords into Q′. Suppose that each keyword is associated with N semantic keywords in the dictionary; then each keyword can generate $2^N$ different keyword sets, since each semantic keyword can be chosen or not. For example, if a keyword q is associated with three semantic keywords $q_1$, $q_2$, and $q_3$, then q can generate $2^3 = 8$ keyword sets: $\{q\}$, $\{q, q_1\}$, $\{q, q_2\}$, $\{q, q_3\}$, $\{q, q_1, q_2\}$, $\{q, q_1, q_3\}$, $\{q, q_2, q_3\}$, and $\{q, q_1, q_2, q_3\}$. Since the query Q usually contains more than one keyword, Q will generate more than $2^N$ different semantic keyword sets. With this method, the final similarity score is obfuscated by these random semantic keyword sets. As in the analysis in [14, 15], our scheme can protect the keyword privacy under the known background model.
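The counting argument above can be illustrated with a short sketch. The function names and the toy dictionary below are hypothetical. The enumeration includes the empty choice, as the eight-set example in the text does, whereas the random extension draws $k' \in [1, N]$ as stated for TrapdoorGen; keeping the original keyword q in the extended set is an assumption taken from that example.

```python
import random
from itertools import combinations


def all_extended_sets(q, related):
    """Every set {q} ∪ A, where A is a subset of the keywords related to q."""
    return [frozenset((q,) + extra)
            for r in range(len(related) + 1)
            for extra in combinations(related, r)]


def extend_query(Q, dictionary, rng):
    """Build Q' by adding k' randomly chosen related keywords for each q."""
    extended = set(Q)  # assumption: the original keywords stay in Q'
    for q in Q:
        related = dictionary.get(q, [])
        if not related:
            continue  # no semantic neighbours recorded for q
        k_prime = rng.randint(1, len(related))  # k' in [1, N]
        extended.update(rng.sample(related, k_prime))
    return extended


if __name__ == "__main__":
    sets = all_extended_sets("q", ["q1", "q2", "q3"])
    print(len(sets))  # number of candidate keyword sets for N = 3
    print(extend_query(["q"], {"q": ["q1", "q2", "q3"]}, random.Random()))
```

Because a fresh random subset is drawn for every trapdoor, two trapdoors for the same Q generally encode different vectors, which is the source of the score obfuscation described above.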

4. Performance Analysis

In this section, we analyse the proposed SSE-DMKRS scheme theoretically and experimentally. A detailed experiment is given to demonstrate that our scheme can efficiently perform dynamic ranked keyword search over encrypted data. Our experiment is run on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and is based on a real-world e-mail dataset, the Enron e-mail dataset [35]. We mainly analyse the performance of our scheme in two aspects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update; (2) the relationship between the search precision and the privacy level. Moreover, in order to show the advantages of our scheme, we also compare it with two related previous schemes. For simplicity, we denote the two schemes introduced in [14, 15] by X15 and G19, respectively.

4.1. Efficiency

4.1.1. Index Building. The process of index building mainly consists of two steps: (1) creating an unencrypted index tree by utilizing Algorithm 1; (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates O(n) nodes based on the document set F. Because each node has two vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ whose dimensions are both d, the vector splitting process needs O(d) time and the matrix multiplication operations take $O(d^2)$ time in the encryption step. According to these two steps, the overall time complexity of index building is $O(nd^2)$, which means that the time cost of index building mainly depends on the number of documents in F and the dimension of each node's vector.


Since the dimensions of each node's vector in X15 and G19 are both linear with the number of keywords in the dictionary (m), the time costs of index building in X15 and G19 are both $O(nm^2)$. Because $d \ll m$, the time cost of index building in our scheme is much less than that in X15 and G19. In addition, in the scheme G19, the internal nodes are constructed with a Bloom filter, and thus the dimension of each internal node's vector is linear with the Bloom filter size b. Since b is usually smaller than m, the index building time in G19 is less than that in X15.

Figure 6(a) shows that the time cost of index building in our scheme is much less than that in X15 and G19. More precisely, when n = 1000, m = 20000, d = 1000, and b = 10000, the time consumption of index building in X15 and G19 is roughly 100 to 200 times that in our scheme. As m increases, the advantage of our scheme becomes even more significant.

In addition, because the index tree has O(n) nodes and each node holds two d-dimensional vectors, the space complexity of the index tree is O(nd). By contrast, the space complexities of the index tree in X15 and G19 are both O(nm). As Table 3 shows, even if we set n = 1000, m = 20000, d = 1000, and b = 10000, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.

4.1.2. Trapdoor Generation. In our scheme, the query is converted into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ whose dimensions are both d. The trapdoor generation process multiplies these two vectors by the d × d matrices in the key. So, the time complexity of trapdoor generation in our scheme is $O(d^2)$. By contrast, since the dimensions of the query vectors in X15 and G19 are both m, their time complexities of trapdoor generation are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when n = 1000, m = 20000, and d = 1000, the time cost of trapdoor generation in our scheme is 15 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.

4.1.3. Search. In the search process, if the relevance score of an internal node u and the query Q is less than the minimum relevance score of the current top-k documents, the subtree rooted at u will not be accessed. Thus, not all of the nodes in the tree are accessed during the search process. We suppose that there are θ leaf nodes that contain at least one keyword in the query Q. Since the height of the tree is O(log n) and the time complexity of the relevance score calculation is O(d), the time complexity of the search process is $O(\theta d \log n)$. For the scheme X15, because the time complexity of the relevance score calculation is O(m), the time complexity of the search process is $O(\theta m \log n)$. For the scheme G19, because each internal node contains a Bloom filter whose size is b and each leaf node involves a vector whose size is m, the time complexity of the search process is $O(\theta (m + b \log n))$. From Figure 6(c), when n = 1000, m = 20000, d = 1000, and b = 10000, the search time cost in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.
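The pruning rule described above can be sketched on plaintext vectors as follows. This is a hedged illustration rather than the scheme itself: the search runs on unencrypted nodes, the `Node` class and function names are hypothetical, RList is kept as a min-heap instead of a sorted list, and the optional `visited` argument exists only to make the pruning observable. The score of an internal node is an upper bound on the scores of all leaves below it, provided that each leaf vector lies element-wise between the node's `umin` and `umax`.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    umin: List[float]
    umax: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None  # document identifier, leaf nodes only


def split_query(q: List[float]) -> Tuple[List[float], List[float]]:
    """Method M3: qmin keeps the negative entries of q, qmax the positive."""
    return [min(x, 0.0) for x in q], [max(x, 0.0) for x in q]


def score(u: Node, qmin: List[float], qmax: List[float]) -> float:
    """Score(u, Q) = umin . qmin + umax . qmax."""
    return (sum(a * b for a, b in zip(u.umin, qmin))
            + sum(a * b for a, b in zip(u.umax, qmax)))


def search(u, qmin, qmax, k, rlist, visited=None):
    """Depth-first top-k search; rlist is a min-heap of (score, fid)."""
    if visited is not None:
        visited.append(u)
    # Before k results exist, every node qualifies.
    kth = rlist[0][0] if len(rlist) == k else float("-inf")
    s = score(u, qmin, qmax)
    if s <= kth:
        return  # neither u nor any leaf below it can enter the top-k
    if u.fid is not None:  # leaf node: update RList
        if len(rlist) == k:
            heapq.heapreplace(rlist, (s, u.fid))
        else:
            heapq.heappush(rlist, (s, u.fid))
        return
    search(u.left, qmin, qmax, k, rlist, visited)
    search(u.right, qmin, qmax, k, rlist, visited)
```

A caller would run `qmin, qmax = split_query(q)`, start with `rlist = []`, call `search(root, qmin, qmax, k, rlist)`, and read the ranking from `sorted(rlist, reverse=True)`.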

4.1.4. Update. When the data owners want to insert or delete a document, they not only insert or delete a leaf node but also update O(log n) internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(d^2 \log n)$. For the X15 scheme, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(m^2 \log n)$. For the G19 scheme, because the internal nodes are based on the Bloom filter, which is not encrypted, the time cost of updating the internal nodes can be ignored. Thus, the time complexity of an update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when n = 1000, m = 20000, d = 1000, and b = 10000, the time cost of updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.

4.2. Precision and Privacy. The search precision of our scheme is affected by the group of semantic keywords related to the original index and query keywords. We measure our scheme by adopting a metric called "precision" defined in [12]. The metric is defined as follows:

$$
P_k = \frac{k'}{k}, \tag{8}
$$

where k′ is the number of real top-k documents among the k retrieved documents.

In addition, the semantic keywords in the index and query keyword sets disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through statistical information about the dataset. To measure the extent of this disturbance, we use the following equation, called "rank privacy" and introduced in [12], to quantify the obscureness:

$$
P'_k = \frac{\sum \left| r_i - r'_i \right|}{k^2}, \tag{9}
$$

where $r_i$ is the rank number of document i in the retrieved top-k documents and $r'_i$ is the real rank number of document i in the real ranked results.
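Equations (8) and (9) can be computed directly from two rankings, as the following minimal sketch shows. The function names and the toy rankings are hypothetical, ranks are counted from 1, and every retrieved document is assumed to appear somewhere in the real ranking.

```python
def precision(retrieved, real_ranking, k):
    """P_k = k'/k, with k' the number of real top-k documents retrieved."""
    real_top_k = set(real_ranking[:k])
    k_prime = sum(1 for doc in retrieved[:k] if doc in real_top_k)
    return k_prime / k


def rank_privacy(retrieved, real_ranking, k):
    """P'_k = sum_i |r_i - r'_i| / k^2 over the k retrieved documents."""
    real_rank = {doc: r for r, doc in enumerate(real_ranking, start=1)}
    total = sum(abs(r - real_rank[doc])
                for r, doc in enumerate(retrieved[:k], start=1))
    return total / k ** 2


if __name__ == "__main__":
    retrieved = ["a", "b", "c", "d"]          # ranking returned by the scheme
    real = ["b", "a", "e", "c", "d"]          # ranking without obfuscation
    print(precision(retrieved, real, 4))      # 3 of the real top-4 are found
    print(rank_privacy(retrieved, real, 4))   # each document is off by one rank
```

In this toy example three of the four retrieved documents belong to the real top-4, and each retrieved document is displaced by one position, so a higher rank privacy here comes at the price of a lower precision.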

We compare our scheme with the schemes X15 and G19 in terms of "precision" and "rank privacy." Note that an important parameter in the previous two schemes is the standard deviation σ, which is used to adjust the relevance score for the dummy keywords. In the comparison, we set σ = 0.05, a value commonly used in the previous schemes. In our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node's vector to 1000 (d = 1000). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as k grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For the schemes X15 and G19, the precision also decreases and the rank privacy also increases as k grows, so this characteristic exists in all three schemes. Because the vector representations for the index tree and the query in our scheme are heavily compressed, some statistical information in the index and the query is lost.

Figure 6: Impact of m on the time cost of index building (a), trapdoor generation (b), search (c), and update (d) (n = 1000, d = 1000, b = 10000, and m = {12000, 14000, 16000, 18000, 20000}). Each panel plots the time cost ($\times 10^3$ ms) against the dictionary size for the schemes X15, G19, and ours.

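Table 3: Storage consumption of the index tree (MB).

| Dictionary size | X15 [14] | G19 [15] | Vector dimension | Proposed |
|---|---|---|---|---|
| m = 12000 | 188 | 174 | d = 200 | 7 |
| m = 14000 | 219 | 190 | d = 400 | 14 |
| m = 16000 | 251 | 206 | d = 600 | 20 |
| m = 18000 | 283 | 222 | d = 800 | 26 |
| m = 20000 | 315 | 238 | d = 1000 | 33 |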

Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents (n = 1000, d = 1000, b = 10000, m = 12000, and σ = 0.05). Both panels plot a percentage against the number of retrieved documents (10 to 50) for the schemes X15, G19, and ours.

Figure 8: Impact of d on the time cost of index building (a) and of trapdoor generation, search, and update (b) (n = 1000 and d = {200, 400, 600, 800, 1000}). Both panels plot the time cost (s) against the vector size.

Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is correspondingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation (d), which we set in "Word2vec," is an important parameter in our scheme. Next, we discuss the impact of d on our scheme. The impact of d on the efficiency of our scheme is given in Figure 8. From Figure 8, the time costs of index building, trapdoor generation, search, and update all increase as d grows. In addition, Figure 9 illustrates the impact of d on the precision and rank privacy of our scheme. As d increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy decreases gradually. These phenomena are all consistent with our previous theoretical analysis. So, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter d to satisfy the requirements of different applications.

4.4. Discussion. From the experimental results, when n = 1000, m = 20000, d = 200, and b = 10000, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, all of which are much better than in the previous schemes X15 and G19. This efficiency indicates that our scheme is well suited to practical applications, especially the mobile cloud setting, in which the clients have limited computation and storage resources.

The experimental results show that the precision of our scheme is lower than that of the previous two schemes, while the rank privacy is correspondingly higher. In addition, by using the "Word2vec" method, the vector representations used in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying "Word2Vec" to construct the vector representations of the documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme substantially reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions (n = 1000 and d = {200, 400, 600, 800, 1000}). Both panels plot a percentage against the number of retrieved documents (10 to 50), with one curve per vector size d.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, "Practical techniques for searching on encrypted data," in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, "Secure data retrieval of outsourced data with complex query support," in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, "Enabling personalized search over encrypted outsourced data with efficiency improvement," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, "Secure indexes," IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, "Efficient conjunctive keyword search on encrypted data storage system," European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, "Achieving efficient conjunctive keyword searches over encrypted data," Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, "Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement," IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, "Efficient similarity search over encrypted data," in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, "Zerber+R: top-k retrieval from a confidential index," in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, "Enabling secure and efficient ranked keyword search over outsourced cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., "Privacy-preserving multi-keyword ranked search over encrypted cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., "Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking," in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, "A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, "Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data," IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., "Highly-scalable searchable symmetric encryption with support for boolean queries," Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., "Dynamic searchable encryption in very-large databases: data structures and implementation," in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., "Public key encryption with keyword search," International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, "Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system," Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, "Predicate encryption supporting disjunctions, polynomial equations, and inner products," Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, "Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group," Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, "Novel multi-keyword search on encrypted data in the cloud," IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, "Generating searchable public-key ciphertexts with hidden structures for fast keyword search," IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, "Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks," IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, "A general transformation from KP-ABE to searchable encryption," Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, "Attribute-based hybrid boolean keyword search over outsourced encrypted data," IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, "Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries," in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, "Efficient implementation of a proxy-based protocol for data sharing on the cloud," in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, "Searchable public key encryption supporting semantic multi-keywords search," IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., "Efficient estimation of word representations in vector space," 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, "Secure kNN computation on encrypted databases," in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, "Zerber: r-confidential indexing for distributed documents," in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, "Enron E-mail dataset," http://www.cs.cmu.edu/~enron.


documents and queries can be represented as a group of vectors. These vectors can be adopted in the top-k search over the ciphertext [12, 14, 15]. However, the dimension of these vectors is linear with the number of words in the dataset, which is inefficient if the dataset has a lot of words. To address this issue, we apply "Word2Vec" to present a novel keyword conversion method, which is described as follows:

(1) By applying "Word2Vec" to a corpus, we create a dictionary in which each keyword is associated with a vector representation.

(2) For the keyword set $W_i = \{w_{i1}, w_{i2}, \ldots, w_{it_i}\}$ of the document $f_i$, we obtain a vector $\vec{x}_i = \vec{w}_{i1} + \vec{w}_{i2} + \cdots + \vec{w}_{it_i}$ by looking up the dictionary, where $\vec{w}_{ij}$ is the vector representation of $w_{ij}$, $i \in [1, n]$, and $j \in [1, t_i]$. After this, we set $\vec{W}_i = \vec{x}_i / \|\vec{x}_i\|$ as the vector representation of $W_i$.

(3) For the query keyword set $Q = \{q_1, q_2, \ldots, q_t\}$, we utilize the dictionary to construct a vector $\vec{v} = \vec{q}_1 + \vec{q}_2 + \cdots + \vec{q}_t$. Then, we set $\vec{Q} = \vec{v} / \|\vec{v}\|$ as the vector representation of Q.

Note that the dimensions of $\vec{W}_i$ and $\vec{Q}$ are very small, e.g., 200, which is significantly smaller than the number of words in the dataset. Thus, the proposed method is more efficient than the previous method based on the TF-IDF rule. In addition, together with the vector space model mentioned above, we use the cosine measure to evaluate the relevance between the document and the query. The relevance evaluation function is defined in the next section.
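The conversion steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional `toy_dictionary` and its values are hypothetical stand-ins for a trained Word2Vec model, and the helper names `keyword_set_vector` and `cosine_relevance` are introduced here and do not come from the paper.

```python
import math

def keyword_set_vector(keywords, dictionary):
    """Sum the dictionary vectors of the keywords and scale to unit length."""
    dim = len(next(iter(dictionary.values())))
    total = [0.0] * dim
    for word in keywords:
        for i, value in enumerate(dictionary[word]):
            total[i] += value
    norm = math.sqrt(sum(x * x for x in total))
    if norm == 0.0:
        return total  # all-zero sum: nothing to normalise
    return [x / norm for x in total]

def cosine_relevance(doc_vec, query_vec):
    """Inner product of two unit vectors, i.e. their cosine similarity."""
    return sum(a * b for a, b in zip(doc_vec, query_vec))

# Hypothetical toy dictionary; a real model would use e.g. 200 dimensions.
toy_dictionary = {
    "dog": [0.9, 0.1, 0.0],
    "fox": [0.8, 0.2, 0.1],
    "orange": [0.0, 0.1, 0.9],
}
doc_vec = keyword_set_vector(["dog", "fox"], toy_dictionary)
related_query = keyword_set_vector(["fox"], toy_dictionary)
unrelated_query = keyword_set_vector(["orange"], toy_dictionary)
```

Because both vectors are normalised, the plain inner product already equals the cosine measure, which is why the later score computations only need dot products.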

3. Proposed Scheme

In this section, we first give the algorithm for building the index tree and the search algorithm on this tree. Then, we give the concrete construction of our scheme and its dynamic update operations. Finally, we give a detailed analysis of the security of our scheme.

3.1. Search Index: Balanced Binary Tree. In this section, we adopt a balanced binary tree to create the search index, which will be used in our main scheme. Inspired by the construction process in [14], the tree building and the search process for our scheme are described as follows.

3.1.1. Tree Building Process. Formally, the data structure of the tree node $u$ is defined as $u = \langle ID, \vec{u}_{\min}, \vec{u}_{\max}, P_l, P_r, FID \rangle$, where $ID$ is the identity of the node $u$, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are the vector representations of the node $u$, $P_l$ and $P_r$ are pointers which point to $u$'s left and right children, respectively, and $FID$ stores the identity of a document if $u$ is a leaf node. Note that, compared with the previous index trees [12, 14, 15], the node in our tree has two vectors, while it has only one vector in previous trees. The main reason is that the node vector in our tree may contain negative numbers, while the node vector in previous trees contains only positive numbers. For clarity, we give a simple example. Let $\vec{a} = (0.1, 0.2, -0.3)$ and $\vec{b} = (-0.5, 0, 0.3)$ be the vectors of two leaf nodes A and B, respectively. For the previous index trees, the vector of the parent node C of these two leaf nodes is $\vec{c} = (0.1, 0.2, 0.3)$, in which the value of each dimension is the larger value of $\vec{a}$ and $\vec{b}$. For a query vector $\vec{v} = (-1, 0, -1)$, the scores of the nodes A, B, and C are 0.2, 0.2, and $-0.4$, respectively. It is very important to note that the score of the parent node is less than the scores of its children, which means that these two leaf nodes will be ignored in the tree search process even if they should be considered.
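The arithmetic of this counterexample can be checked directly. The snippet below is a small sketch that reuses the vectors $\vec{a}$, $\vec{b}$, and $\vec{v}$ from the text; the helper `dot` and the variable names are introduced here for illustration only.

```python
def dot(x, y):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(x, y))

a = [0.1, 0.2, -0.3]                    # leaf A
b = [-0.5, 0.0, 0.3]                    # leaf B
c = [max(x, y) for x, y in zip(a, b)]   # single-vector parent C, as in earlier trees
v = [-1.0, 0.0, -1.0]                   # query vector

score_a, score_b, score_c = dot(a, v), dot(b, v), dot(c, v)

# The parent scores lower than both children, so a search that prunes
# on the parent's score would wrongly skip A and B.
parent_underestimates = score_c < min(score_a, score_b)
```

With only the element-wise maximum stored, a negative query coordinate turns the parent's "upper bound" into a lower bound, which is what the second vector $\vec{u}_{\min}$ is meant to repair.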

In our index tree, let the dimensions of $\vec{u}_{\min}$ and $\vec{u}_{\max}$ both be $d$. The methods for constructing $\vec{u}_{\min}$ and $\vec{u}_{\max}$ for a leaf node and for an internal node are denoted by M1 and M2, respectively, and given as follows:

(1) M1: if the node $u$ is a leaf node corresponding to a file $f$, we create a vector $\vec{u}$ for $f$ by adopting the keywords conversion method mentioned in Section 2.5. Then, we set $\vec{u}_{\min} = \vec{u}$ and $\vec{u}_{\max} = \vec{u}$.

(2) M2: if the node $u$ is an internal node, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are based on its children's vectors. Let $P_l \cdot \vec{u}_{\min}$ and $P_l \cdot \vec{u}_{\max}$ be the two vectors of $u$'s left child, and let $P_r \cdot \vec{u}_{\min}$ and $P_r \cdot \vec{u}_{\max}$ be the two vectors of $u$'s right child.

Suppose that Min() and Max() are the functions of the minimum and maximum, respectively. $\vec{u}_{\min}$ is built as follows:

$$\vec{u}_{\min}[i] = \mathrm{Min}\left(P_l \cdot \vec{u}_{\min}[i],\; P_r \cdot \vec{u}_{\min}[i]\right), \quad i \in [1, d]. \tag{1}$$

And $\vec{u}_{\max}$ is built as follows:

$$\vec{u}_{\max}[i] = \mathrm{Max}\left(P_l \cdot \vec{u}_{\max}[i],\; P_r \cdot \vec{u}_{\max}[i]\right), \quad i \in [1, d]. \tag{2}$$

In other words, each entry of $\vec{u}_{\max}$ is the larger of the corresponding entries of $P_l \cdot \vec{u}_{\max}$ and $P_r \cdot \vec{u}_{\max}$, and each entry of $\vec{u}_{\min}$ is the smaller of the corresponding entries of $P_l \cdot \vec{u}_{\min}$ and $P_r \cdot \vec{u}_{\min}$.
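As a minimal sketch of methods M1 and M2, the two rules reduce to copying a vector and taking element-wise minima and maxima. The function names `leaf_vectors` and `internal_vectors` are hypothetical, and the leaf values reuse the vectors of leaves A and B from the earlier example.

```python
def leaf_vectors(u):
    """M1: a leaf uses its document vector for both u_min and u_max."""
    return list(u), list(u)

def internal_vectors(left, right):
    """M2: element-wise min of the children's u_min, max of their u_max."""
    (l_min, l_max), (r_min, r_max) = left, right
    u_min = [min(x, y) for x, y in zip(l_min, r_min)]
    u_max = [max(x, y) for x, y in zip(l_max, r_max)]
    return u_min, u_max

leaf_a = leaf_vectors([0.1, 0.2, -0.3])
leaf_b = leaf_vectors([-0.5, 0.0, 0.3])
parent_min, parent_max = internal_vectors(leaf_a, leaf_b)
```

Applied recursively, every internal node ends up holding, per dimension, the smallest and largest value found among the leaves below it.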

Figure 2: A vector space representation of words shows that "dog" is closer to "fox" since they share more common attributes than "dog" and "orange."


An illustration of the above methods is given in Figure 3. From Figure 3, let the node $u$ be a leaf node and let $W$ be the keyword set of the file that $u$ stores. By using the keyword conversion method, $W$ is converted to a vector $\vec{u} = (-0.2, 0.2, 0.5, -0.7, 0.8)$. Then, we set $\vec{u}_{\min} = \vec{u}_{\max} = \vec{u}$. If the node $u$ is an internal node and the vectors of its children are $P_l \cdot \vec{u}_{\min}$, $P_l \cdot \vec{u}_{\max}$, $P_r \cdot \vec{u}_{\min}$, and $P_r \cdot \vec{u}_{\max}$, the vectors of the internal node are $\vec{u}_{\min} = (-0.2, -0.2, -0.3, -0.7, -0.5)$ and $\vec{u}_{\max} = (0.3, 0.7, 0.3, -0.1, 0.3)$.

Based on the methods M1 and M2, and inspired by the tree building algorithm introduced in [14], our tree building algorithm is given in Algorithm 1. An example of the proposed index tree is given in Example 1 and Figure 4. In Algorithm 1, we use the function GenID() to generate the unique identity ID for each node and apply GenFID() to generate the unique file ID for each leaf node. CurrentNodeSet contains a group of nodes that have no parent node and still need to be processed. |CurrentNodeSet| is the number of nodes in CurrentNodeSet. If |CurrentNodeSet| is even, we assume that |CurrentNodeSet| = 2h; otherwise, we assume that |CurrentNodeSet| = 2h + 1, where h is a positive integer. TempNodeSet is a set containing the newly generated nodes. Moreover, for each node $u$, if $u$ is a leaf node, we use method M1 to generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$; otherwise, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are created by using M2.
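The bottom-up construction described above can be sketched as follows. This is an illustrative reading of Algorithm 1, not the authors' implementation: the `Node` class, `make_leaf`, `make_parent`, and `leaf_ids` are names introduced here, a simple counter stands in for GenID(), and the file identifiers are used directly in place of GenFID().

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

@dataclass
class Node:
    node_id: int
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None  # set only on leaves

_ids = count(1)  # stand-in for GenID()

def make_leaf(fid, vec):
    # M1: both vectors equal the document vector
    return Node(next(_ids), list(vec), list(vec), fid=fid)

def make_parent(a, b):
    # M2: element-wise min / max over the two children
    u_min = [min(x, y) for x, y in zip(a.u_min, b.u_min)]
    u_max = [max(x, y) for x, y in zip(a.u_max, b.u_max)]
    return Node(next(_ids), u_min, u_max, left=a, right=b)

def build_index_tree(doc_vectors):
    """doc_vectors: non-empty dict mapping file id -> document vector."""
    current = [make_leaf(fid, vec) for fid, vec in doc_vectors.items()]
    while len(current) > 1:
        odd = len(current) % 2 == 1
        pairs_end = len(current) - 3 if odd else len(current)
        temp = [make_parent(current[i], current[i + 1])
                for i in range(0, pairs_end, 2)]
        if odd:
            # last three nodes: pair two of them, then join with the third
            u1 = make_parent(current[-3], current[-2])
            temp.append(make_parent(u1, current[-1]))
        current = temp
    return current[0]

def leaf_ids(node):
    """Left-to-right list of the file ids stored below a node."""
    if node.fid is not None:
        return [node.fid]
    return leaf_ids(node.left) + leaf_ids(node.right)

# Hypothetical two-dimensional document vectors.
root = build_index_tree({
    "f1": [0.1, 0.2], "f2": [-0.5, 0.0], "f3": [0.3, -0.1],
    "f4": [0.0, 0.4], "f5": [-0.2, 0.6],
})
```

Handling the odd case by merging the last three nodes keeps every internal node with exactly two children, so the search routine never has to deal with a missing child.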

3.1.2. Search Process. For a query vector $\vec{q}$ of query $Q$, we split $\vec{q}$ into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$. For each dimension $i \in [1, d]$, if $\vec{q}[i] < 0$, then $\vec{q}_{\min}[i] = \vec{q}[i]$ and $\vec{q}_{\max}[i] = 0$; otherwise, $\vec{q}_{\min}[i] = 0$ and $\vec{q}_{\max}[i] = \vec{q}[i]$. Obviously, $\vec{q}_{\min}$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive part. For clarity, we denote this splitting method for query $Q$ by M3. The illustration of this method is given in Figure 5. If the query vector is $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$, then $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$. For a query $Q$ and a node $u$, the score is calculated as

$$\mathrm{Score}(u, Q) = \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max}. \tag{3}$$

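We can utilize the above equation to evaluate which documents are the most related to the query. Moreover, we can verify that the score of a parent node is never smaller than the scores of its children: $\vec{q}_{\min}$ has only nonpositive entries and $\vec{q}_{\max}$ only nonnegative entries, while M2 gives the parent the element-wise smallest $\vec{u}_{\min}$ and largest $\vec{u}_{\max}$ of its children. This property can significantly reduce the number of nodes which will be checked in the search process.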
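A short sketch of the splitting method M3 and of equation (3) follows. The function names `split_query` and `score` are chosen here for illustration; the first query is the one from the text, and the second check reuses leaf A and its parent from the example in Section 3.1.1.

```python
def split_query(q):
    """M3: q_min keeps the negative entries of q, q_max the non-negative ones."""
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [0.0 if x < 0 else x for x in q]
    return q_min, q_max

def score(u_min, u_max, q_min, q_max):
    """Equation (3): u_min . q_min + u_max . q_max."""
    return (sum(a * b for a, b in zip(u_min, q_min))
            + sum(a * b for a, b in zip(u_max, q_max)))

q_min, q_max = split_query([0.1, -0.2, 0.3, -0.4, 0.5])

# Leaf A and its two-vector parent (element-wise min / max of leaves A and B).
a = [0.1, 0.2, -0.3]
parent_min, parent_max = [-0.5, 0.0, -0.3], [0.1, 0.2, 0.3]
v_min, v_max = split_query([-1.0, 0.0, -1.0])
leaf_score = score(a, a, v_min, v_max)
parent_score = score(parent_min, parent_max, v_min, v_max)
```

For the query that broke the single-vector tree, the two-vector parent now scores about 0.8 against 0.2 for the leaf, so the subtree is no longer pruned by mistake.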

)e search process is given in Algorithm 2 In Algo-rithm 2 we use RList to store the top-k files which have thek-largest relevance scores to the query )e RList is ini-tialized to be an empty list and it is updated when finding arelevance file )e kth score is defined as the smallest rel-evance score in the current RList which is initialized to be avery small integer By using the kth score we can acceleratethe search process by ignoring some paths with low scoresIn Example 1 and Figure 4 an illustration of the searchprocess is given where F f1 f2 f6 query vectors areqminrarr

0 minus03 0 and qmaxrarr

01 0 02 and d (vector di-mension) is 3
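The pruned depth-first search can be sketched as below. This is one possible reading of Algorithm 2 under assumptions made here: RList is kept as a min-heap so that its first element is the kth score, the `Node` tuple and the helpers `leaf` and `parent` are hypothetical, and the four leaf vectors are invented for the demonstration.

```python
import heapq
from collections import namedtuple

# Minimal node: leaves carry a file id, internal nodes carry two children.
Node = namedtuple("Node", "u_min u_max left right fid")

def leaf(fid, vec):
    return Node(list(vec), list(vec), None, None, fid)

def parent(a, b):
    return Node([min(x, y) for x, y in zip(a.u_min, b.u_min)],
                [max(x, y) for x, y in zip(a.u_max, b.u_max)], a, b, None)

def score(node, q_min, q_max):
    return (sum(a * b for a, b in zip(node.u_min, q_min))
            + sum(a * b for a, b in zip(node.u_max, q_max)))

def search_index_tree(node, q_min, q_max, k, rlist):
    """rlist is a min-heap of (score, fid); rlist[0] holds the k-th score."""
    kth_score = rlist[0][0] if len(rlist) == k else float("-inf")
    s = score(node, q_min, q_max)
    if s <= kth_score:
        return                      # prune: nothing below can beat the k-th score
    if node.fid is None:            # internal node: visit both children
        search_index_tree(node.left, q_min, q_max, k, rlist)
        search_index_tree(node.right, q_min, q_max, k, rlist)
    elif len(rlist) < k:
        heapq.heappush(rlist, (s, node.fid))
    else:
        heapq.heapreplace(rlist, (s, node.fid))  # drop the smallest, add the new one

# Hypothetical leaves; the query is the one used in Example 1.
tree = parent(parent(leaf("f1", [0.3, -0.2, 0.8]), leaf("f2", [-0.2, 0.3, 0.7])),
              parent(leaf("f3", [-0.4, -0.2, 0.8]), leaf("f4", [0.5, 0.7, -0.2])))
rlist = []
search_index_tree(tree, [0.0, -0.3, 0.0], [0.1, 0.0, 0.2], 2, rlist)
top_files = [fid for _, fid in sorted(rlist, reverse=True)]
```

A heap keeps the update of RList at O(log k) per accepted leaf; sorting is only needed once, when the final ranking is read out.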

3.1.3. Example 1. An example of an index tree and a search process on this tree is illustrated in Figure 4. In Figure 4, we show an index tree with $F = \{f_1, f_2, \ldots, f_6\}$, in which the dimension of the vector for each node is 3. For each node $u$ in the tree, the upper vector and the lower vector correspond to $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. In the tree building process, we first generate the leaf nodes from $F$ and then create the internal nodes based on these leaf nodes.

Moreover, Figure 4 also gives an illustration of the search process. In Figure 4, we set $\vec{q} = (0.1, -0.3, 0.2)$ and split it into $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$. We suppose that the top-3 files will be returned to the data user. According to Algorithm 2, the search process begins with the root node $r$ and calculates the score between the query $Q$ and the two child nodes $r_{11}$ and $r_{12}$ of $r$ by using equation (3). The calculation process is presented as follows:

$$
\begin{aligned}
\mathrm{Score}(r_{11}, Q) &= (-0.4, -0.2, -0.3) \cdot (0.0, -0.3, 0.0) + (0.3, 0.3, 0.8) \cdot (0.1, 0.0, 0.2) = 0.25,\\
\mathrm{Score}(r_{12}, Q) &= (-0.5, 0.7, -0.2) \cdot (0.0, -0.3, 0.0) + (0.5, 0.9, 0.8) \cdot (0.1, 0.0, 0.2) = 0.
\end{aligned} \tag{4}
$$
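For readers who want to check the two values, the inner products in equation (4) expand term by term as follows (only the numbers already given in the equation are used):

```latex
\begin{aligned}
\mathrm{Score}(r_{11}, Q)
  &= \bigl[(-0.4)(0) + (-0.2)(-0.3) + (-0.3)(0)\bigr]
   + \bigl[(0.3)(0.1) + (0.3)(0) + (0.8)(0.2)\bigr] \\
  &= 0.06 + (0.03 + 0.16) = 0.25, \\
\mathrm{Score}(r_{12}, Q)
  &= \bigl[(-0.5)(0) + (0.7)(-0.3) + (-0.2)(0)\bigr]
   + \bigl[(0.5)(0.1) + (0.9)(0) + (0.8)(0.2)\bigr] \\
  &= -0.21 + (0.05 + 0.16) = 0.
\end{aligned}
```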

Because the score between $r_{11}$ and $Q$ is higher than that between $r_{12}$ and $Q$, Algorithm 2 will traverse the subtree with $r_{11}$ as the root node and compute the score between the query $Q$ and the two child nodes of $r_{11}$. Since the score between $r_{21}$ and $Q$ is higher than that between $r_{22}$ and $Q$, Algorithm 2 will traverse the subtree with $r_{21}$ as the root node and add the leaf nodes $f_1$ and $f_2$ to the RList. After this, the subtree with $r_{22}$ as the root node will be traversed, and the leaf nodes $f_3$ and $f_4$ are reached. Since the number of files in RList is less than 3, $f_3$ is added to RList directly. For the file $f_4$, since the number of files in RList now equals 3, Algorithm 2 will compare the score between $f_4$ and $Q$ to the minimum score in the RList. Because the score between $f_4$ and $Q$ is smaller than the minimum score in the RList, $f_4$ is not added to the RList. At this point, the subtree with $r_{11}$ as the root node has been traversed, and Algorithm 2 turns to the subtree with $r_{12}$ as the root node. The score between $r_{12}$ and $Q$ is smaller than the minimum score in the RList, which means that the scores of all child nodes of $r_{12}$ are also smaller than the minimum score in the RList (this property is described in Section 3.1.2), so $f_5$ and $f_6$ will not be checked. Therefore, Algorithm 2 outputs RList $= \{f_1, f_2, f_3\}$.

Figure 3: An example of the vector generation of the node $u$. (a) Method M1: $u$ is a leaf node, $W$ is a keyword set for the file which $u$ stores, and $\vec{u}$ is a vector generated by adopting the keyword conversion method mentioned in Section 2.5. (b) Method M2: $u$ is an internal node, and $P_l \cdot \vec{u}_{\min}$, $P_l \cdot \vec{u}_{\max}$, $P_r \cdot \vec{u}_{\min}$, and $P_r \cdot \vec{u}_{\max}$ are the vectors of its children.

ALGORITHM 1: BuildIndexTree (FileSet F, Dictionary D).
Input: the document collection $F = \{f_1, f_2, \ldots, f_n\}$ and a semantic dictionary D generated by applying "Word2Vec" to F.
Output: the index tree T.
    for each $i \in [1, n]$ do
        Construct a leaf node u for $f_i$ with u.ID = GenID(), u.Pl = u.Pr = NULL, and u.FID = GenFID($f_i$), and generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M1
        Insert u to CurrentNodeSet
    end for
    while |CurrentNodeSet| > 1 do
        if |CurrentNodeSet| is even, i.e., 2h, then
            for each pair of nodes u′ and u″ in CurrentNodeSet do
                Create a parent node u for u′ and u″ with u.ID = GenID(), u.Pl = u′, u.Pr = u″, and u.FID = NULL, and set $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M2
                Insert u to TempNodeSet
            end for
        else (suppose that |CurrentNodeSet| = 2h + 1)
            for each pair of nodes u′ and u″ of the former 2h − 2 nodes in CurrentNodeSet do
                Create a parent node u for u′ and u″
                Insert u to TempNodeSet
            end for
            Create a parent node u1 for the (2h − 1)-th and (2h)-th nodes, and then generate a parent node u for the (2h + 1)-th node and u1
            Insert u to TempNodeSet
        end if
        Set CurrentNodeSet = TempNodeSet and clear TempNodeSet
    end while
    return CurrentNodeSet (which now contains only one node, the root of the index tree T)

3.2. Construction of SSE-DMKRS. In this section, by combining the secure KNN algorithm [32] with the index tree building algorithm, we propose a concrete SSE-DMKRS scheme. The SSE-DMKRS scheme consists of five algorithms. The algorithms KeyGen, DictionaryBuild, and IndexBuild are executed by the data owners, while the algorithms TrapdoorGen and Search are performed by the data users and the cloud server, respectively.

(i) KeyGen ($\lambda$): given a security parameter $\lambda$, this algorithm first randomly chooses four $d \times d$ invertible matrices $M_{11}$, $M_{12}$, $M_{21}$, and $M_{22}$, where $d$ is the dimension of $\vec{u}_{\min}$ and $\vec{u}_{\max}$. Then, it randomly generates a $d$-bit vector $S$. Finally, it outputs the secret key $sk = \{S, M_{11}, M_{12}, M_{21}, M_{22}\}$.

(ii) DictionaryBuild ($F$): given the document set $F = \{f_1, f_2, \ldots, f_n\}$, the algorithm runs "Word2vec" to generate the dictionary $D$ of $F$. In the dictionary $D$, each keyword is associated with a vector representation. Besides, each keyword also corresponds to a set of semantically related keywords.

(iii) IndexBuild ($sk$, $F$, $D$): given the document set $F$ and the dictionary $D$ for $F$, the algorithm first creates the index tree $T$ by using the algorithm BuildIndexTree ($F$, $D$) (Algorithm 1). Then, for each node $u$ in the tree $T$, the algorithm generates two random vector pairs $\{V'_{u_{\min}}, V''_{u_{\min}}\}$ and $\{V'_{u_{\max}}, V''_{u_{\max}}\}$ for the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. More precisely, if $S[i] = 0$, it sets $V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$; if $S[i] = 1$, $V'_{u_{\min}}[i]$, $V''_{u_{\min}}[i]$, $V'_{u_{\max}}[i]$, and $V''_{u_{\max}}[i]$ are set as four random values under the constraints $V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$. This process is expressed as the following equation:

$$
\begin{cases}
V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 0,\\
V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 0,\\
V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 1,\\
V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 1,
\end{cases}
\qquad i \in [1, d]. \tag{5}
$$

Finally, for each node $u$, it computes $I_u = \{M_{11}^{T} V'_{u_{\min}},\; M_{12}^{T} V''_{u_{\min}},\; M_{21}^{T} V'_{u_{\max}},\; M_{22}^{T} V''_{u_{\max}}\}$. By replacing the plaintext vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ with the encrypted index $I_u$, an encrypted index tree $I_T$ is created.

(iv) TrapdoorGen ($sk$, $Q$): given a query keyword set $Q$, the algorithm first extends $Q$ to a new semantic keyword set $Q'$. The process is as follows:

(a) It generates a new keyword set $Q'$, which is initialized to an empty set.

Figure 4: An example of Algorithm 1 and Algorithm 2 (Example 1).

Figure 5: An example of the vector generation of the query $Q$. Method M3: $\vec{q}$ is a vector generated by adopting the keyword conversion method mentioned in Section 2.5; $\vec{q}_{\min}$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive part.


(b) Note that each keyword in the dictionary is associated with a group of $N$ keywords semantically related to this keyword. For each keyword $q$ in $Q$, it randomly chooses $k'$ semantic keywords based on the dictionary and inserts these keywords into $Q'$, where $k'$ is chosen dynamically and $k' \in [1, N]$.

Then, based on $Q'$, the TrapdoorGen algorithm generates a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is similar to the process in the IndexBuild algorithm, except that, as in the secure KNN scheme [32], the roles of $S[i] = 0$ and $S[i] = 1$ are exchanged so that the inner products in equation (7) are preserved. It can be expressed as the following equations:

$$
\begin{cases}
Q'_{q_{\min}}[i] + Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 0,\\
Q'_{q_{\max}}[i] + Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 0,\\
Q'_{q_{\min}}[i] = Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 1,\\
Q'_{q_{\max}}[i] = Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 1,
\end{cases}
\qquad i \in [1, d]. \tag{6}
$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{\min}},\; M_{12}^{-1} Q''_{q_{\min}},\; M_{21}^{-1} Q'_{q_{\max}},\; M_{22}^{-1} Q''_{q_{\max}}\}$ as the trapdoor for $Q$.

(v) Search ($sk$, $T_Q$, $I_T$): for each node $u$ in $I_T$, the algorithm computes

$$
\begin{aligned}
I_u \cdot T_Q &= \left(M_{11}^{T} V'_{u_{\min}}\right) \cdot \left(M_{11}^{-1} Q'_{q_{\min}}\right) + \left(M_{12}^{T} V''_{u_{\min}}\right) \cdot \left(M_{12}^{-1} Q''_{q_{\min}}\right) \\
&\quad + \left(M_{21}^{T} V'_{u_{\max}}\right) \cdot \left(M_{21}^{-1} Q'_{q_{\max}}\right) + \left(M_{22}^{T} V''_{u_{\max}}\right) \cdot \left(M_{22}^{-1} Q''_{q_{\max}}\right) \\
&= V'_{u_{\min}} \cdot Q'_{q_{\min}} + V''_{u_{\min}} \cdot Q''_{q_{\min}} + V'_{u_{\max}} \cdot Q'_{q_{\max}} + V''_{u_{\max}} \cdot Q''_{q_{\max}} \\
&= \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max} \\
&= \mathrm{Score}(u, Q).
\end{aligned} \tag{7}
$$

According to equation (3), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score($u$, $Q$). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform ranked search.
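The score-preserving property can be illustrated with a small, self-contained sketch for a single node vector and a single query vector. Several points are assumptions of this sketch rather than details taken from the paper: exact `Fraction` arithmetic is used so that equality can be tested, the query side uses the complementary split of the secure KNN construction [32], only two matrices are shown (the scheme repeats the same steps for $\vec{u}_{\max}$ with $M_{21}$, $M_{22}$), and all helper names are hypothetical.

```python
import random
from fractions import Fraction

def invert(m):
    """Gauss-Jordan inverse over exact fractions; returns None if singular."""
    d = len(m)
    a = [row[:] + [Fraction(int(i == j)) for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = next((r for r in range(col, d) if a[r][col] != 0), None)
        if pivot is None:
            return None
        a[col], a[pivot] = a[pivot], a[col]
        piv = a[col][col]
        a[col] = [x / piv for x in a[col]]
        for r in range(d):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[d:] for row in a]

def random_invertible(d, rng):
    """Draw random integer matrices until one is invertible."""
    while True:
        m = [[Fraction(rng.randint(-5, 5)) for _ in range(d)] for _ in range(d)]
        inv = invert(m)
        if inv is not None:
            return m, inv

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def split_index(vec, s, rng):
    """Index side: copy where S[i] = 0, random additive shares where S[i] = 1."""
    v1, v2 = [], []
    for x, bit in zip(vec, s):
        r = x if bit == 0 else Fraction(rng.randint(-9, 9))
        v1.append(r)
        v2.append(x if bit == 0 else x - r)
    return v1, v2

def split_query(vec, s, rng):
    """Query side (assumed complementary rule): shares where S[i] = 0, copy where S[i] = 1."""
    return split_index(vec, [1 - bit for bit in s], rng)

rng = random.Random(7)
d = 3
s = [rng.randint(0, 1) for _ in range(d)]
(m1, m1_inv), (m2, m2_inv) = random_invertible(d, rng), random_invertible(d, rng)

u = [Fraction(3, 10), Fraction(-2, 10), Fraction(8, 10)]  # one plaintext node vector
q = [Fraction(1, 10), Fraction(-3, 10), Fraction(2, 10)]  # one plaintext query vector

v1, v2 = split_index(u, s, rng)
q1, q2 = split_query(q, s, rng)
index = (mat_vec(transpose(m1), v1), mat_vec(transpose(m2), v2))
trapdoor = (mat_vec(m1_inv, q1), mat_vec(m2_inv, q2))
encrypted_score = dot(index[0], trapdoor[0]) + dot(index[1], trapdoor[1])
```

The server only ever sees `index` and `trapdoor`, yet `encrypted_score` equals the plaintext inner product, because $(M^{T} V) \cdot (M^{-1} Q) = V \cdot Q$ and the two shares recombine coordinate by coordinate.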

ALGORITHM 2: SearchIndexTree (QueryVector $\vec{q}$, Dictionary D, TreeNode u, RList).
Input: a vector $\vec{q}$ of query Q, a semantic dictionary D generated by applying "Word2Vec" to F, a root node u of IndexTree, and RList.
Output: RList.
    Split $\vec{q}$ into $\vec{q}_{\min}$ and $\vec{q}_{\max}$ according to the method M3
    if u is an internal node then
        if Score(u, Q) > k-th score then
            SearchIndexTree($\vec{q}$, D, u.Pl, RList)
            SearchIndexTree($\vec{q}$, D, u.Pr, RList)
        else
            return
        end if
    else
        if Score(u, Q) > k-th score then (update RList)
            If RList already holds k elements, delete the element holding the smallest relevance score in RList
            Insert a new element ⟨Score(u, Q), u.FID⟩ in the RList and sort the elements in RList
        end if
        return
    end if

3.3. Dynamic Update Operations. Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, satisfying the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows:

(i) UpdateInfoGen ($sk$, $T_s$, $f_i$, Utype): this algorithm is executed by the data owners and generates the update information $\{I_s, c_i\}$ for the cloud server, where $T_s$ is a set containing all the nodes to be updated, $I_s$ is an encrypted form of $T_s$, $f_i$ is the target document, $c_i$ is an encrypted form of $f_i$, and Utype is the update type. In order to reduce the communication cost, the data owners store the unencrypted index tree on their own devices. For Utype $\in$ {Ins, Del}, the algorithm works as follows:

(a) If Utype = "Del," the algorithm will delete a document $f_i$ from the tree. The algorithm first finds the leaf node associated with the document $f_i$ and deletes it. In addition, the internal nodes associated with this leaf node are added to $T_s$. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can set the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts $T_s$ to generate $I_s$. Finally, the algorithm sends $I_s$ to the cloud server and sets $c_i$ as null.

(b) If Utype = "Ins," the algorithm will insert a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes which are placed on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update ($I_T$, $C$, $I_s$, $c_i$, Utype): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set $I_s$. After this, if Utype = "Del," then the algorithm removes $c_i$ from $C$. Otherwise, the algorithm inserts $c_i$ into $C$.
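On the data owner's plaintext tree, an insertion that reuses a fake leaf amounts to overwriting that leaf and recomputing M2 along the path to the root. The sketch below illustrates this under assumptions made here: the `Node` class with a parent pointer, the functions `refresh` and `replace_leaf`, and the representation of a fake leaf as an all-zero vector with no file id are all hypothetical, since the text does not specify how a fake node is encoded.

```python
class Node:
    def __init__(self, u_min, u_max, left=None, right=None, fid=None):
        self.u_min, self.u_max = list(u_min), list(u_max)
        self.left, self.right, self.fid = left, right, fid
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self

def refresh(node):
    """Recompute an internal node's vectors from its children (method M2)."""
    node.u_min = [min(a, b) for a, b in zip(node.left.u_min, node.right.u_min)]
    node.u_max = [max(a, b) for a, b in zip(node.left.u_max, node.right.u_max)]

def replace_leaf(leaf, fid, vec):
    """Overwrite a (fake) leaf with a new document; return the touched nodes T_s."""
    leaf.fid, leaf.u_min, leaf.u_max = fid, list(vec), list(vec)
    touched = [leaf]
    node = leaf.parent
    while node is not None:      # walk up to the root, refreshing each ancestor
        refresh(node)
        touched.append(node)
        node = node.parent
    return touched

f1 = Node([0.1, 0.2], [0.1, 0.2], fid="f1")
fake = Node([0.0, 0.0], [0.0, 0.0])            # assumed encoding of a fake leaf
root = Node([0.0, 0.0], [0.0, 0.0], left=f1, right=fake)
refresh(root)

touched = replace_leaf(fake, "f2", [-0.5, 0.4])
```

The returned list plays the role of $T_s$: these are the only nodes that would need to be re-encrypted and sent to the cloud server, one leaf plus the O(log n) ancestors above it.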

Note that, after a period of insertion and deletion operations, the number of keywords in the dictionary may change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear with the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. For the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by the "Word2vec" tool and set by the users. For example, if we set the dimension of the vector as 200, the dimension of each keyword's vector is 200, and thus the dimensions of the vectors $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node $u$ in the index tree and the query $Q$ in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, the attackers cannot obtain the original vectors in the tree nodes and the query, which means that the index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set $Q$ will be extended to multiple different semantic keyword sets $Q'$. So, the same query $Q$ will be encrypted to different trapdoors, which means that the goal of trapdoor unlinkability is achieved.

(3) Keyword Privacy. Since the index and the trapdoor are protected by the secure KNN algorithm, the adversary cannot infer the plaintext information from the index and the trapdoor under the known ciphertext model. Considering that the known background model is common in real-world applications, we analyse the security of the proposed scheme under the known background model. In the TrapdoorGen algorithm, the original query keyword set $Q$ is extended to a new set $Q'$. Specifically, for each keyword $q$ in $Q$, the algorithm randomly chooses a number $k'$, then chooses $k'$ semantic keywords related to $q$ by utilizing the dictionary and inserts these keywords into $Q'$. Suppose that each keyword is associated with $N$ semantic keywords in the dictionary; then each keyword can generate $2^N$ different keyword sets, since each semantic keyword can be chosen or not. For example, if a keyword $q$ is associated with three semantic keywords $\{q_1, q_2, q_3\}$, then $q$ can generate $2^3$ keyword sets: $\{q\}$, $\{q, q_1\}$, $\{q, q_2\}$, $\{q, q_3\}$, $\{q, q_1, q_2\}$, $\{q, q_1, q_3\}$, $\{q, q_2, q_3\}$, and $\{q, q_1, q_2, q_3\}$. Since the query $Q$ usually contains more than one keyword, $Q$ will generate more than $2^N$ different semantic keyword sets. According to this method, the final similarity score is obfuscated by these random semantic keyword sets. As in the analysis in [14, 15], our scheme can protect the keyword privacy under the known background model.
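The count of $2^N$ candidate sets in the example above can be reproduced by enumerating all subsets of the related keywords. The function name `semantic_keyword_sets` is introduced here for illustration; the keyword labels are the ones used in the text.

```python
from itertools import combinations

def semantic_keyword_sets(keyword, related):
    """All sets made of the keyword plus any subset of its related keywords."""
    sets = []
    for size in range(len(related) + 1):
        for chosen in combinations(related, size):
            sets.append({keyword, *chosen})
    return sets

candidates = semantic_keyword_sets("q", ["q1", "q2", "q3"])
```

Here `candidates` holds the eight sets listed in the text, from `{"q"}` alone up to the full set with all three related keywords.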

4. Performance Analysis

In this section, we analyse the proposed SSE-DMKRS scheme theoretically and experimentally. A detailed experiment is given to demonstrate that our scheme can efficiently perform dynamic ranked keyword search over the encrypted data. Our experiment is run on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and is based on a real-world e-mail dataset called the Enron e-mail dataset [35]. We mainly analyse the performance of our scheme in two aspects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update, and (2) the relationship between the search precision and the privacy level. Moreover, in order to show the advantages of our scheme, we also compare it with two related previous schemes. For simplicity, we denote the two schemes introduced in [14, 15] by X15 and G19, respectively.

4.1. Efficiency

4.1.1. Index Building. The process of index building mainly consists of two steps: (1) creating an unencrypted index tree by utilizing Algorithm 1 and (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates $O(n)$ nodes based on the document set $F$. Because each node has two vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ whose dimensions are both $d$, the vector splitting process needs $O(d)$ time, and the matrix multiplication operations take $O(d \times d)$ time in the encryption step. According to these two steps, the whole time complexity of index building is $O(nd^2)$, which means that the time cost for index building mainly depends on the number of documents in $F$ and the dimension of each node's vector.


Since the dimensions of each node's vector in X15 and G19 are both linear with the number of keywords in the dictionary ($m$), the time costs for index building in X15 and G19 are both $O(nm^2)$. Because $d \ll m$, the time cost for index building in our scheme is much less than that in X15 and G19. In addition, for the scheme G19, the internal nodes are constructed with a Bloom filter, and thus the dimension of each internal node's vector is linear with the Bloom filter size $b$. Since $b$ is usually smaller than $m$, the index building time in G19 is less than that in X15.

Figure 6(a) shows that the time cost for index building in our scheme is much less than that in X15 and G19. More precisely, when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the time consumption for index building in X15 and G19 is nearly 100∼200 times that in our scheme. As $m$ increases, the advantages of our scheme will become even more significant.

In addition, because the index tree has $O(n)$ nodes and each node holds two $d$-dimensional vectors, the space complexity of the index tree is $O(nd)$. By contrast, the space complexities of the index tree in X15 and G19 are both $O(nm)$. As shown in Table 3, even if we set $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.

4.1.2. Trapdoor Generation. In our scheme, the query is converted to two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ whose dimensions are both $d$. The trapdoor generation process multiplies these two vectors by the $d \times d$ matrices in the key. So, the time complexity of trapdoor generation in our scheme is $O(d^2)$. By contrast, since the dimensions of the query vectors in X15 and G19 are both $m$, the time complexities of trapdoor generation are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when $n = 1000$, $m = 20000$, and $d = 1000$, the time cost for trapdoor generation in our scheme is 15 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.

4.1.3. Search. In the search process, if the relevance score of an internal node $u$ and the query $Q$ is less than the minimum relevance score of the current top-k documents, the subtree which uses node $u$ as the root node will not be accessed. Thus, not all of the nodes in the tree will be accessed during the search process. We suppose that there are $\theta$ leaf nodes that contain at least one keyword in the query $Q$. Since the height of the tree is $O(\log n)$ and the time complexity of the relevance score calculation is $O(d)$, the time complexity of the search process is $O(\theta d \cdot \log n)$. For the scheme X15, because the time complexity of relevance score calculation is $O(m)$, the time complexity of the search process is $O(\theta m \cdot \log n)$. For the scheme G19, because each internal node contains a Bloom filter whose size is $b$ and each leaf node involves a vector whose size is $m$, the time complexity of the search process is $O(\theta(m + b \cdot \log n))$. From Figure 6(c), when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the search time cost in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.

4.1.4. Update. When the data owners want to insert or delete a document, they will not only insert or delete a leaf node but also update $O(\log n)$ internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(\log n \cdot d^2)$. For the X15 scheme, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(\log n \cdot m^2)$. For the G19 scheme, because the internal nodes are based on the Bloom filter, which is not encrypted, the time cost for updating the internal nodes can be ignored. Thus, the time complexity of update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the time cost for updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.

4.2. Precision and Privacy. The search precision of our scheme is affected by the group of semantic keywords related to the original index and query keywords. We measure our scheme by adopting a metric called "precision" defined in [12]. The metric of precision is defined as follows:

$$P_k = \frac{k'}{k}, \tag{8}$$

where $k'$ is the number of real top-k documents in the retrieved $k$ documents.

In addition, the semantic keywords in the index and query keyword set will disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through statistical information about the dataset. To measure the extent to which the relevance score is disturbed, we use the following equation, called "rank privacy" and introduced in [12], to quantify this obscureness:

$$P'_k = \frac{\sum \left| r_i - r'_i \right|}{k^2}, \tag{9}$$

where $r_i$ is the rank number of the document $i$ in the retrieved top-k documents and $r'_i$ is document $i$'s real rank number in the real ranked results.
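Both metrics are straightforward to compute once the retrieved list and the real ranking are available. The following sketch implements equations (8) and (9) under the assumption that every retrieved document also appears in the real ranking; the function names and the toy rankings are hypothetical.

```python
def precision(retrieved, true_ranking, k):
    """P_k = k'/k, with k' the retrieved documents that are in the real top-k."""
    real_top_k = set(true_ranking[:k])
    return sum(1 for doc in retrieved[:k] if doc in real_top_k) / k

def rank_privacy(retrieved, true_ranking, k):
    """P'_k = sum_i |r_i - r'_i| / k^2 over the retrieved top-k documents."""
    real_rank = {doc: pos for pos, doc in enumerate(true_ranking, start=1)}
    return sum(abs(r - real_rank[doc])
               for r, doc in enumerate(retrieved[:k], start=1)) / (k * k)

true_ranking = ["a", "b", "c", "d", "e"]   # real order by relevance
retrieved = ["b", "a", "e"]                # order returned by a perturbed search
p_k = precision(retrieved, true_ranking, 3)
rp_k = rank_privacy(retrieved, true_ranking, 3)
```

In this toy case two of the three retrieved documents belong to the real top 3, so the precision is 2/3, and the rank displacements 1, 1, and 2 give a rank privacy of 4/9; a larger perturbation lowers the first value and raises the second.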

We compare our scheme to the schemes X15 and G19 in terms of "precision" and "rank privacy." Note that an important parameter in the previous two schemes is a standard deviation $\sigma$, which is utilized to adjust the relevance score for the dummy keywords. In the comparison, we set $\sigma = 0.05$, a value commonly used in the previous schemes. Besides, in our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node's vector to 1000 ($d = 1000$). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as $k$ grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For the schemes X15 and G19, the precision also decreases and the rank privacy also increases when $k$ grows, so this characteristic exists in all three schemes. Because the vector representations for the index tree and query in our scheme are compressed deeply, some statistical information in the index and the query will be lost.


Figure 6: Impact of $m$ on the time cost of index building (a), trapdoor generation (b), search (c), and update (d) ($n = 1000$, $d = 1000$, $b = 10000$, and $m = \{12000, 14000, 16000, 18000, 20000\}$). In each panel, the x-axis is the dictionary size, the y-axis is the time cost ($10^3$ ms), and the schemes compared are X15, G19, and ours.

Table 3: Storage consumption of the index tree (MB).

Dictionary size    [14]    [15]    Vector dimension    Proposed
m = 12000          188     174     d = 200             7
m = 14000          219     190     d = 400             14
m = 16000          251     206     d = 600             20
m = 18000          283     222     d = 800             26
m = 20000          315     238     d = 1000            33


Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents (n = 1000, d = 1000, b = 10000, m = 12000, and σ = 0.05). Both panels plot percentages against the number of retrieved documents (10 to 50) for the schemes X15, G19, and ours.

Figure 8: Impact of d on the time cost of index building (a) and of trapdoor generation, search, and update (b) (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Both panels plot the time cost (s) against the vector size.


Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is correspondingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation (d), which we set in “Word2vec,” is an important parameter in our scheme. Next, we discuss the impact of d on our scheme. The impact of d on the efficiency of our scheme is given in Figure 8. From Figure 8, we see that the time costs of index building, trapdoor generation, search, and update all increase as d grows. In addition, Figure 9 illustrates the impact of d on the precision and rank privacy of our scheme. As d increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy decreases gradually. These phenomena are all consistent with our previous theoretical analysis. Thus, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter d to satisfy the requirements of different applications.

4.4. Discussion. From the experimental results, when n = 1000, m = 20000, d = 200, and b = 10000, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, which are all much better than those of the previous schemes X15 and G19. This efficiency indicates that our scheme is well suited to practical applications, especially the mobile cloud setting, in which the clients have limited computation and storage resources.

The experimental results show that the precision of our scheme is lower than that of the previous two schemes, while the rank privacy is correspondingly higher. In addition, by using the “Word2vec” method, the vector representations used in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying “Word2vec” to construct the vector representations of the documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme greatly reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Both panels plot percentages against the number of retrieved documents (10 to 50), with one curve per vector size d.


Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searching on encrypted data,” in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, “Secure data retrieval of outsourced data with complex query support,” in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalized search over encrypted outsourced data with efficiency improvement,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, “Secure indexes,” IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, “Efficient conjunctive keyword search on encrypted data storage system,” European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, “Achieving efficient conjunctive keyword searches over encrypted data,” Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, “Efficient similarity search over encrypted data,” in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+R: top-k retrieval from a confidential index,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., “Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking,” in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, “Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data,” IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., “Highly-scalable searchable symmetric encryption with support for boolean queries,” Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., “Dynamic searchable encryption in very-large databases: data structures and implementation,” in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., “Public key encryption with keyword search,” International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, “Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system,” Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, “Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group,” Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, “Novel multi-keyword search on encrypted data in the cloud,” IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, “Generating searchable public-key ciphertexts with hidden structures for fast keyword search,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, “Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, “A general transformation from KP-ABE to searchable encryption,” Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, “Attribute-based hybrid boolean keyword search over outsourced encrypted data,” IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, “Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries,” in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, “Efficient implementation of a proxy-based protocol for data sharing on the cloud,” in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, “Searchable public key encryption supporting semantic multi-keywords search,” IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., “Efficient estimation of word representations in vector space,” 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, “Secure kNN computation on encrypted databases,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, “Zerber: r-confidential indexing for distributed documents,” in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, “Enron E-mail dataset,” http://www.cs.cmu.edu/~enron.


An illustration of the above methods is given in Figure 3. In Figure 3, let the node u be a leaf node and let W be the keyword set of the file that u stores. By using the keyword conversion method, W is converted into a vector $\vec{u} = (-0.2, 0.2, 0.5, -0.7, 0.8)$. Then, we set $\vec{u}_{\min} = \vec{u}_{\max} = \vec{u}$. If the node u is an internal node and the vectors of its children are $P_l.\vec{u}_{\min}$, $P_l.\vec{u}_{\max}$, $P_r.\vec{u}_{\min}$, and $P_r.\vec{u}_{\max}$, then the vectors of the internal node are $\vec{u}_{\min} = (-0.2, -0.2, -0.3, -0.7, -0.5)$ and $\vec{u}_{\max} = (0.3, 0.7, 0.3, -0.1, 0.3)$.

Based on the methods M1 and M2, and inspired by the tree building algorithm introduced in [14], our tree building algorithm is given in Algorithm 1. An example of the proposed index tree is given in Example 1 and Figure 4. In Algorithm 1, we use the function GenID() to generate the unique identity ID for each node and apply GenFID() to generate the unique file ID for each leaf node. CurrentNodeSet contains the nodes that have no parent node and still need to be processed, and |CurrentNodeSet| is the number of nodes in CurrentNodeSet. If |CurrentNodeSet| is even, we assume that |CurrentNodeSet| = 2h; otherwise, we assume that |CurrentNodeSet| = 2h + 1, where h is a positive integer. TempNodeSet is a set containing the newly generated nodes. Moreover, for each node u, if u is a leaf node, we use method M1 to generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$; otherwise, $\vec{u}_{\min}$ and $\vec{u}_{\max}$ are created by using M2.
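The tree building procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Node`, `make_leaf`, `make_parent`, and `build_index_tree` are hypothetical, M2 is taken to be the element-wise minimum and maximum over the children (which agrees with the numbers in Figure 3(b)), and the document vectors are assumed to be already produced by the keyword conversion method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None  # file ID, set only on leaf nodes


def make_leaf(fid: str, vec: List[float]) -> Node:
    # Method M1: a leaf stores the document vector as both u_min and u_max.
    return Node(list(vec), list(vec), fid=fid)


def make_parent(l: Node, r: Node) -> Node:
    # Method M2 (assumed): element-wise min of the children's u_min
    # and element-wise max of the children's u_max.
    return Node([min(a, b) for a, b in zip(l.u_min, r.u_min)],
                [max(a, b) for a, b in zip(l.u_max, r.u_max)], l, r)


def build_index_tree(docs: Dict[str, List[float]]) -> Node:
    """Bottom-up construction; docs maps a file ID to its vector (non-empty)."""
    current = [make_leaf(fid, vec) for fid, vec in docs.items()]
    while len(current) > 1:
        temp = []
        if len(current) % 2 == 0:
            for i in range(0, len(current), 2):
                temp.append(make_parent(current[i], current[i + 1]))
        else:
            # Pair the first 2h - 2 nodes, then merge the last three nodes.
            for i in range(0, len(current) - 3, 2):
                temp.append(make_parent(current[i], current[i + 1]))
            u1 = make_parent(current[-3], current[-2])
            temp.append(make_parent(u1, current[-1]))
        current = temp
    return current[0]


# Hypothetical 2-dimensional document vectors, for illustration only.
root = build_index_tree({"f1": [0.1, -0.2], "f2": [-0.3, 0.4], "f3": [0.5, 0.0]})
```

With three documents, the odd branch merges `f1` and `f2` first and then attaches `f3`, so the root covers all leaves with `u_min = [-0.3, -0.2]` and `u_max = [0.5, 0.4]`.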

3.1.2. Search Process. For a query vector $\vec{q}$ of query Q, we split $\vec{q}$ into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$. For each dimension $i \in [1, d]$, if $\vec{q}[i] < 0$, then $\vec{q}_{\min}[i] = \vec{q}[i]$ and $\vec{q}_{\max}[i] = 0$; otherwise, $\vec{q}_{\min}[i] = 0$ and $\vec{q}_{\max}[i] = \vec{q}[i]$. Obviously, $\vec{q}_{\min}$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive part. For clarity, we denote this splitting method for query Q by M3. An illustration of this method is given in Figure 5. If the query vector is $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$, then $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$. For a query Q and a node u, the score is calculated as

$$\mathrm{Score}(u, Q) = \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max}. \tag{3}$$
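As a quick check of method M3 and equation (3), the following sketch splits a query vector and scores a node. The function names `split_query`, `dot`, and `score` are hypothetical; the numeric values are the ones used in the text and in the calculation for nodes r11 and r12 in Example 1.

```python
def split_query(q):
    # Method M3: q_min keeps the negative entries, q_max the non-negative ones.
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [x if x >= 0 else 0.0 for x in q]
    return q_min, q_max


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score(u_min, u_max, q):
    # Equation (3): Score(u, Q) = u_min . q_min + u_max . q_max
    q_min, q_max = split_query(q)
    return dot(u_min, q_min) + dot(u_max, q_max)


q_min, q_max = split_query([0.1, -0.2, 0.3, -0.4, 0.5])

# Nodes r11 and r12 of Example 1 with the query q = (0.1, -0.3, 0.2).
s_r11 = score([-0.4, -0.2, -0.3], [0.3, 0.3, 0.8], [0.1, -0.3, 0.2])
s_r12 = score([-0.5, 0.7, -0.2], [0.5, 0.9, 0.8], [0.1, -0.3, 0.2])
```

For a leaf node, where u_min and u_max are both equal to the document vector, the score reduces to the ordinary inner product of the document vector and the query vector.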

We can utilize the above equation to evaluate which documents are the most related to the query. Moreover, we can verify that the score of a parent node is never smaller than the scores of its children. This property can significantly reduce the number of nodes that need to be checked in the search process.

The search process is given in Algorithm 2. In Algorithm 2, we use RList to store the top-k files, that is, the files with the k largest relevance scores to the query. RList is initialized to be an empty list and is updated whenever a more relevant file is found. The k-th score is defined as the smallest relevance score in the current RList and is initialized to a very small value. By using the k-th score, we can accelerate the search process by ignoring paths with low scores. In Example 1 and Figure 4, an illustration of the search process is given, where $F = \{f_1, f_2, \ldots, f_6\}$, the query vectors are $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$, and the vector dimension d is 3.

3.1.3. Example 1. An example of an index tree and of a search process on this tree is illustrated in Figure 4. In Figure 4, we show an index tree with $F = \{f_1, f_2, \ldots, f_6\}$, in which the dimension of the vector for each node is 3. For each node u in the tree, the upper vector and the lower vector correspond to $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. In the tree building process, we first generate the leaf nodes from F and then create the internal nodes based on these leaf nodes.

Moreover, Figure 4 also illustrates the search process. In Figure 4, we set $\vec{q} = (0.1, -0.3, 0.2)$ and split it into $\vec{q}_{\min} = (0, -0.3, 0)$ and $\vec{q}_{\max} = (0.1, 0, 0.2)$. We suppose that the top-3 files will be returned to the data user. According to Algorithm 2, the search process begins with the root node r and calculates the scores between the query Q and the two child nodes $r_{11}$ and $r_{12}$ of r by using equation (3). The calculation is as follows:

$$
\begin{aligned}
\mathrm{Score}(r_{11}, Q) &= (-0.4, -0.2, -0.3) \cdot (0.0, -0.3, 0.0) + (0.3, 0.3, 0.8) \cdot (0.1, 0.0, 0.2) = 0.25, \\
\mathrm{Score}(r_{12}, Q) &= (-0.5, 0.7, -0.2) \cdot (0.0, -0.3, 0.0) + (0.5, 0.9, 0.8) \cdot (0.1, 0.0, 0.2) = 0.
\end{aligned}
\tag{4}
$$

Because the score between $r_{11}$ and Q is higher than that between $r_{12}$ and Q, Algorithm 2 first traverses the subtree rooted at $r_{11}$ and computes the scores between the query Q and the two child nodes of $r_{11}$. Since the score between $r_{21}$ and Q is higher than that between $r_{22}$ and Q, Algorithm 2 traverses the subtree rooted at $r_{21}$ and adds the leaf nodes $f_1$ and $f_2$ to RList. After this, the subtree rooted at $r_{22}$ is traversed, and the leaf nodes $f_3$ and $f_4$ are reached. Since the number of files in RList is less than 3, $f_3$ is added to RList directly. For the file $f_4$, since the number of files in RList now equals 3, Algorithm 2 compares the score between $f_4$ and Q with the minimum score in RList. Because the score between $f_4$ and Q is smaller than the minimum score in RList, $f_4$ is not added to RList. At this point, the subtree rooted at $r_{11}$ has been traversed, and Algorithm 2 turns to the subtree rooted at $r_{12}$. The score between $r_{12}$ and Q is smaller than the minimum score in RList, which means that the scores of all descendants of $r_{12}$ are also smaller than the minimum score in RList (this property is described in Section 3.1.2), so $f_5$ and $f_6$ are not checked. Therefore, Algorithm 2 outputs RList = $\{f_1, f_2, f_3\}$.
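The pruned traversal walked through above can be sketched as follows. This is an illustrative reading of Algorithm 2, not the authors' code: nodes are plain dictionaries, RList is kept as a min-heap of at most k `(score, file ID)` pairs, the higher-scoring child is visited first as in the walkthrough, and the six document vectors are hypothetical values rather than the ones in Figure 4.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leaf(fid, vec):
    return {"min": list(vec), "max": list(vec), "fid": fid}


def parent(l, r):
    return {"min": [min(a, b) for a, b in zip(l["min"], r["min"])],
            "max": [max(a, b) for a, b in zip(l["max"], r["max"])],
            "left": l, "right": r}


def score(node, q_min, q_max):
    return dot(node["min"], q_min) + dot(node["max"], q_max)


def search(node, q_min, q_max, k, rlist):
    # The k-th score is the smallest score in a full RList, else -infinity.
    kth = rlist[0][0] if len(rlist) == k else float("-inf")
    s = score(node, q_min, q_max)
    if s <= kth:
        return  # no descendant can beat the current k-th score
    if "fid" in node:
        if len(rlist) == k:
            heapq.heapreplace(rlist, (s, node["fid"]))
        else:
            heapq.heappush(rlist, (s, node["fid"]))
        return
    children = sorted((node["left"], node["right"]),
                      key=lambda c: score(c, q_min, q_max), reverse=True)
    for child in children:
        search(child, q_min, q_max, k, rlist)


# Hypothetical document vectors (not the values shown in Figure 4).
docs = {"f1": [0.3, -0.2, 0.8], "f2": [0.2, -0.4, 0.1], "f3": [-0.1, 0.3, 0.5],
        "f4": [-0.4, 0.6, -0.2], "f5": [-0.5, 0.7, 0.1], "f6": [0.0, 0.9, -0.3]}
lv = [leaf(f, v) for f, v in docs.items()]
root = parent(parent(parent(lv[0], lv[1]), parent(lv[2], lv[3])),
              parent(lv[4], lv[5]))

q_min, q_max = [0.0, -0.3, 0.0], [0.1, 0.0, 0.2]
rlist = []
search(root, q_min, q_max, 3, rlist)
top3 = [fid for _, fid in sorted(rlist, reverse=True)]
```

The pruning test is safe because an internal node's score is an upper bound on the scores of all leaves below it, so a subtree whose bound does not exceed the current k-th score cannot change RList.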

3.2. Construction of SSE-DMKRS. In this section, by combining the secure KNN algorithm [32] with the index tree building algorithm, we propose a concrete SSE-DMKRS scheme. The SSE-DMKRS scheme consists of five algorithms. The algorithms KeyGen, DictionaryBuild, and IndexBuild are executed by the data owners, while the algorithms TrapdoorGen and Search are performed by the data users and the cloud server, respectively.

Figure 3: An example of the vector generation of the node u. (a) Method M1: u is a leaf node, W is the keyword set of the file which u stores, and $\vec{u}$ is a vector generated by adopting the keyword conversion method mentioned in Section 2.5. (b) Method M2: u is an internal node, and $P_l.\vec{u}_{\min}$, $P_l.\vec{u}_{\max}$, $P_r.\vec{u}_{\min}$, and $P_r.\vec{u}_{\max}$ are the vectors of its children.

ALGORITHM 1: BuildIndexTree (FileSet F, Dictionary D).

Input: the document collection $F = \{f_1, f_2, \ldots, f_n\}$; a semantic dictionary D generated by applying “Word2vec” to F.
Output: the index tree T.

for each i ∈ [1, n] do
    Construct a leaf node u for f_i with u.ID = GenID(), u.P_l = u.P_r = NULL, u.FID = GenFID(f_i), and generate u_min and u_max according to the method M1
    Insert u to CurrentNodeSet
end for
while |CurrentNodeSet| > 1 do
    if |CurrentNodeSet| is even, i.e., 2h, then
        for each pair of nodes u′ and u″ in CurrentNodeSet do
            Create a parent node u for u′ and u″ with u.ID = GenID(), u.P_l = u′, u.P_r = u″, u.FID = NULL, and set u_min and u_max according to the method M2
            Insert u to TempNodeSet
        end for
    else (suppose that |CurrentNodeSet| = 2h + 1)
        for each pair of nodes u′ and u″ of the former 2h − 2 nodes in CurrentNodeSet do
            Create a parent node u for u′ and u″
            Insert u to TempNodeSet
        end for
        Create a parent node u_1 for the (2h − 1)-th and (2h)-th nodes, and then generate a parent node u for the (2h + 1)-th node and u_1
        Insert u to TempNodeSet
    end if
    Set CurrentNodeSet = TempNodeSet and clear TempNodeSet
end while
return CurrentNodeSet (note that CurrentNodeSet now contains only one node, which is the root of the index tree T)

(i) KeyGen (λ): given a security parameter λ, this algorithm first randomly chooses four d × d invertible matrices $M_{11}$, $M_{12}$, $M_{21}$, and $M_{22}$, where d is the dimension of $\vec{u}_{\min}$ and $\vec{u}_{\max}$. Then, it randomly generates a d-bit vector S. Finally, it outputs the secret key $sk = \{S, M_{11}, M_{12}, M_{21}, M_{22}\}$.

(ii) DictionaryBuild (F): given the document set $F = \{f_1, f_2, \ldots, f_n\}$, the algorithm runs “Word2vec” to generate the dictionary D of F. In the dictionary D, each keyword is associated with a vector representation. In addition, each keyword is associated with a set of semantically related keywords.

(iii) IndexBuild (sk, F, D): given the document set F and the dictionary D for F, the algorithm first creates the index tree T by using the algorithm BuildIndexTree (F, D) (Algorithm 1). Then, for each node u in the tree T, the algorithm generates two random vector pairs $\{V'_{u_{\min}}, V''_{u_{\min}}\}$ and $\{V'_{u_{\max}}, V''_{u_{\max}}\}$ for the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. More precisely, if S[i] = 0, it sets $V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$; if S[i] = 1, $V'_{u_{\min}}[i]$, $V''_{u_{\min}}[i]$, $V'_{u_{\max}}[i]$, and $V''_{u_{\max}}[i]$ are set as four random values under the constraints $V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$. This process is expressed as the following equation:

$$
\left\{
\begin{array}{ll}
V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 0, \\
V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 0, \\
V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 1, \\
V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 1,
\end{array}
\right\}
\quad i \in [1, d].
\tag{5}
$$

Finally, for each node u, it computes $I_u = \{M_{11}^{T} V'_{u_{\min}}, M_{12}^{T} V''_{u_{\min}}, M_{21}^{T} V'_{u_{\max}}, M_{22}^{T} V''_{u_{\max}}\}$. By replacing the plaintext vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ with the encrypted index $I_u$, an encrypted index tree $I_T$ is created.

(iv) TrapdoorGen (sk, Q): given a query keyword set Q, the algorithm first extends Q to a new semantic keyword set Q′. The process is as follows:

(a) It generates a new keyword set Q′, which is initialized to an empty set.

Figure 4: An example of Algorithm 1 and Algorithm 2 (Example 1). Each node of the index tree shows $\vec{u}_{\min}$ above $\vec{u}_{\max}$; the query is $\vec{q} = (0.1, -0.3, 0.2)$ with $\vec{q}_{\min} = (0.0, -0.3, 0.0)$ and $\vec{q}_{\max} = (0.1, 0.0, 0.2)$.

Figure 5: An example of the vector generation of the query Q. Method M3: $\vec{q}$ is a vector generated by adopting the keyword conversion method mentioned in Section 2.5; $\vec{q}_{\min}$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive part. In the example, $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$, $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$, and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$.


(b) Note that each keyword in the dictionary is associated with a group of keywords semantically related to it. For each keyword q in Q, the algorithm randomly chooses k′ semantic keywords based on the dictionary and inserts these keywords into Q′, where k′ is chosen dynamically and k′ ∈ [1, N].

Then, based on Q′, the TrapdoorGen algorithm generates a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is similar to the process in the IndexBuild algorithm and can be expressed as the following equations:

$$
\left\{
\begin{array}{ll}
Q'_{q_{\min}}[i] = Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\max}}[i] = Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\min}}[i] + Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 1, \\
Q'_{q_{\max}}[i] + Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 1,
\end{array}
\right\}
\quad i \in [1, d].
\tag{6}
$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{\min}}, M_{12}^{-1} Q''_{q_{\min}}, M_{21}^{-1} Q'_{q_{\max}}, M_{22}^{-1} Q''_{q_{\max}}\}$ as the trapdoor for Q.

(v) Search (sk, $T_Q$, $I_T$): for each node u in $I_T$, the algorithm computes

$$
\begin{aligned}
I_u \cdot T_Q &= \left(M_{11}^{T} V'_{u_{\min}}\right) \cdot \left(M_{11}^{-1} Q'_{q_{\min}}\right) + \left(M_{12}^{T} V''_{u_{\min}}\right) \cdot \left(M_{12}^{-1} Q''_{q_{\min}}\right) \\
&\quad + \left(M_{21}^{T} V'_{u_{\max}}\right) \cdot \left(M_{21}^{-1} Q'_{q_{\max}}\right) + \left(M_{22}^{T} V''_{u_{\max}}\right) \cdot \left(M_{22}^{-1} Q''_{q_{\max}}\right) \\
&= V'_{u_{\min}} \cdot Q'_{q_{\min}} + V''_{u_{\min}} \cdot Q''_{q_{\min}} + V'_{u_{\max}} \cdot Q'_{q_{\max}} + V''_{u_{\max}} \cdot Q''_{q_{\max}} \\
&= \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max} \\
&= \mathrm{Score}(u, Q).
\end{aligned}
\tag{7}
$$

According to equation (7), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score (u, Q) defined in equation (3). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform the ranked search.
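The identity in equation (7) can be checked numerically with a small sketch. Everything here is an assumption made for illustration: the helper names are hypothetical, the matrices are tiny random integer matrices over exact fractions (not a secure key), and the query is split at the positions where S[i] = 0 while the index is split where S[i] = 1, which is the convention of the secure kNN construction in [32] and is what makes the split shares recombine into the plain inner product.

```python
import random
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inverse(m):
    """Gauss-Jordan inverse over exact fractions; None if m is singular."""
    d = len(m)
    a = [list(row) + [Fraction(int(i == j)) for j in range(d)]
         for i, row in enumerate(m)]
    for c in range(d):
        p = next((r for r in range(c, d) if a[r][c] != 0), None)
        if p is None:
            return None
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [x / pivot for x in a[c]]
        for r in range(d):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[d:] for row in a]


def rand_invertible(d, rng):
    while True:
        m = [[Fraction(rng.randint(-5, 5)) for _ in range(d)] for _ in range(d)]
        inv = inverse(m)
        if inv is not None:
            return m, inv


def split(vec, S, split_bit, rng):
    # Copy entry i into both shares, or split it into two random shares
    # that sum to the entry when S[i] == split_bit.
    v1, v2 = [], []
    for x, s in zip(vec, S):
        r = Fraction(rng.randint(-100, 100), 10) if s == split_bit else x
        v1.append(r)
        v2.append(x - r if s == split_bit else x)
    return v1, v2


def enc_index(u, S, Ma, Mb, rng):
    v1, v2 = split(u, S, 1, rng)  # equation (5): split where S[i] = 1
    return matvec(transpose(Ma), v1), matvec(transpose(Mb), v2)


def enc_query(q, S, Ma_inv, Mb_inv, rng):
    q1, q2 = split(q, S, 0, rng)  # assumption: split where S[i] = 0
    return matvec(Ma_inv, q1), matvec(Mb_inv, q2)


rng = random.Random(0)
d = 3
S = [rng.randint(0, 1) for _ in range(d)]
(M11, M11i), (M12, M12i), (M21, M21i), (M22, M22i) = (
    rand_invertible(d, rng) for _ in range(4))


def frac(xs):
    return [Fraction(str(x)) for x in xs]


# Node r11 and the query of Example 1.
u_min, u_max = frac([-0.4, -0.2, -0.3]), frac([0.3, 0.3, 0.8])
q_min, q_max = frac([0, -0.3, 0]), frac([0.1, 0, 0.2])

I_u = enc_index(u_min, S, M11, M12, rng) + enc_index(u_max, S, M21, M22, rng)
T_Q = enc_query(q_min, S, M11i, M12i, rng) + enc_query(q_max, S, M21i, M22i, rng)

encrypted_score = sum(dot(i, t) for i, t in zip(I_u, T_Q))
plain_score = dot(u_min, q_min) + dot(u_max, q_max)
```

Exact fractions are used so that the encrypted and plaintext scores can be compared for equality; a practical implementation would use floating-point matrices of dimension d and tolerate rounding error.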

3.3. Dynamic Update Operations. Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, satisfying the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows:

(i) UpdateInfoGen (sk, $T_s$, $f_i$, $U_{type}$): this algorithm is executed by the data owners and generates the update information $\{I_s, c_i\}$ for the cloud server, where $T_s$ is a set containing all the nodes to be updated, $I_s$ is the encrypted form of $T_s$, $f_i$ is the target document, $c_i$ is the encrypted form of $f_i$, and $U_{type}$ is the update type. In order to reduce the communication cost, the data owners store the unencrypted index tree on their own devices. For $U_{type} \in \{\text{Ins}, \text{Del}\}$, the algorithm works as follows:

(a) If $U_{type}$ = “Del,” the algorithm deletes a document $f_i$ from the tree. The algorithm first finds the leaf node associated with the document $f_i$ and deletes it. In addition, the internal nodes associated with this leaf node are added to $T_s$. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can set the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts $T_s$ to generate $I_s$. Finally, the algorithm sends $I_s$ to the cloud server and sets $c_i$ to null.

ALGORITHM 2: SearchIndexTree (QueryVector $\vec{q}$, Dictionary D, TreeNode u, RList).

Input: a vector $\vec{q}$ of query Q, a semantic dictionary D generated by applying “Word2vec” to F, a root node u of IndexTree, and RList.
Output: RList.

Split q into q_min and q_max according to the method M3
if u is an internal node then
    if Score (u, Q) > k-th score then
        SearchIndexTree (q, D, u.P_l, RList)
        SearchIndexTree (q, D, u.P_r, RList)
    else
        return
    end if
else
    if Score (u, Q) > k-th score then (update RList)
        Delete the element holding the smallest relevance score in RList
        Insert a new element <Score (u, Q), u.FID> into RList and sort the elements in RList
    end if
    return
end if


(b) If $U_{type}$ = “Ins,” the algorithm inserts a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update ($I_T$, C, $I_s$, $c_i$, $U_{type}$): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set $I_s$. After this, if $U_{type}$ = “Del,” the algorithm removes $c_i$ from C; otherwise, the algorithm inserts $c_i$ into C.

Note that after a period of insertion and deletion operations, the number of keywords in the dictionary is likely to change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. In the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by the “Word2vec” tool and set by the users. For example, if we set the dimension of the vector to 200, the dimension of each keyword’s vector is 200, and thus the dimensions of the vectors $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node u in the index tree and the query Q in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, the attackers cannot obtain the original vectors in the tree nodes and the query, which means that the index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set Q is extended to multiple different semantic keyword sets Q′. Thus, the same query Q is encrypted into different trapdoors, which means that the goal of trapdoor unlinkability is achieved.

(3) Keyword Privacy. Since the index and the trapdoor are protected by the secure KNN algorithm, the adversary cannot infer plaintext information from the index and the trapdoor under the known ciphertext model. Considering that the known background model is common in real-world applications, we also analyse the security of the proposed scheme under the known background model. In the TrapdoorGen algorithm, the original query keyword set Q is extended to a new set Q′. Specifically, for each keyword q in Q, the algorithm randomly chooses a number k′, selects k′ semantic keywords related to q by utilizing the dictionary, and inserts these keywords into Q′. Suppose that each keyword is associated with N semantic keywords in the dictionary; then each keyword can generate $2^N$ different keyword sets, since each semantic keyword can be chosen or not. For example, if a keyword q is associated with three semantic keywords $q_1$, $q_2$, and $q_3$, then q can generate $2^3$ keyword sets: $\{q\}$, $\{q, q_1\}$, $\{q, q_2\}$, $\{q, q_3\}$, $\{q, q_1, q_2\}$, $\{q, q_1, q_3\}$, $\{q, q_2, q_3\}$, and $\{q, q_1, q_2, q_3\}$. Since the query Q usually contains more than one keyword, Q will generate more than $2^N$ different semantic keyword sets. According to this method, the final similarity score is obfuscated by these random semantic keyword sets. As in the analysis of [14, 15], our scheme can protect keyword privacy under the known background model.
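The counting argument above can be illustrated with a short sketch. The function names and the toy dictionary entry are hypothetical; `extended_sets` enumerates every set that contains q together with any subset of its related keywords, and `extend_query` draws one random extension with k′ ∈ [1, N], keeping q itself in Q′ as in the enumeration of the example (an assumption about the intended construction).

```python
import random
from itertools import combinations


def extended_sets(q, related):
    """All keyword sets made of q plus any subset of its related keywords."""
    return [{q, *chosen}
            for r in range(len(related) + 1)
            for chosen in combinations(related, r)]


def extend_query(Q, dictionary, rng):
    """Draw one random semantic extension Q' of the query keyword set Q."""
    Q_prime = set()
    for q in Q:
        related = dictionary[q]
        k = rng.randint(1, len(related))   # k' chosen dynamically in [1, N]
        Q_prime.add(q)                     # assumption: q itself stays in Q'
        Q_prime.update(rng.sample(related, k))
    return Q_prime


# Toy dictionary entry: q is related to three semantic keywords.
dictionary = {"q": ["q1", "q2", "q3"]}
all_sets = extended_sets("q", dictionary["q"])
one_draw = extend_query({"q"}, dictionary, random.Random(1))
```

With N = 3 related keywords the enumeration yields 2^3 = 8 sets, matching the example in the text; repeated calls to `extend_query` with fresh randomness generally return different sets for the same query.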

4. Proposed Scheme

In this section, we analyse the proposed SSE-DMKRS scheme theoretically and experimentally. A detailed experiment is given to demonstrate that our scheme can efficiently perform dynamic ranked keyword search over encrypted data. Our experiment is run on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and is based on a real-world e-mail dataset called the Enron e-mail dataset [35]. We mainly analyse the performance of our scheme in two aspects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update, and (2) the relationship between the search precision and the privacy level. Moreover, in order to show the advantages of our scheme, we also compare it with two related previous schemes. For simplicity, we denote the two schemes introduced in [14, 15] by X15 and G19, respectively.

4.1. Efficiency

4.1.1. Index Building. The process of index building mainly consists of two steps: (1) creating an unencrypted index tree by utilizing Algorithm 1 and (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates O(n) nodes based on the document set F. Because each node has two vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, whose dimensions are both d, the vector splitting process needs O(d) time and the matrix multiplication operations take $O(d^2)$ time per node in the encryption step. Combining these two steps, the overall time complexity of index building is $O(nd^2)$, which means that the time cost of index building mainly depends on the number of documents in F and the dimension of each node’s vector.


Since the dimension of each node's vector in X15 and G19 is linear in the number of keywords in the dictionary (m), the time costs of index building in X15 and G19 are both $O(nm^2)$. Because $d \ll m$, the time cost of index building in our scheme is much less than that in X15 and G19. In addition, in the scheme G19, the internal nodes are constructed with Bloom filters, and thus the dimension of each internal node's vector is linear in the Bloom filter size b. Since b is usually smaller than m, the index building time in G19 is less than that in X15.
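As a rough illustration of the gap between $O(nd^2)$ and $O(nm^2)$, the sketch below plugs the experimental parameters into the dominant term only. It ignores constant factors and the Bloom filter size b, so it is a back-of-the-envelope model rather than a prediction of measured running time; the function name `index_build_ops` is a hypothetical helper.

```python
def index_build_ops(n: int, dim: int) -> int:
    """Dominant operation count for encrypting n tree nodes of dimension dim."""
    return n * dim * dim

n, m, d = 1000, 20000, 1000
ours = index_build_ops(n, d)      # O(n d^2), the proposed scheme
previous = index_build_ops(n, m)  # O(n m^2), as stated for X15 and G19
print(previous // ours)           # ratio of the dominant terms, (m / d)**2
```

The ratio grows quadratically in m / d, which is consistent with the observation that the advantage increases as the dictionary grows.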

Figure 6(a) shows that the time cost of index building in our scheme is much less than that in X15 and G19. More precisely, when n = 1000, m = 20000, d = 1000, and b = 10000, the time consumption of index building in X15 and G19 is roughly 100 to 200 times that in our scheme. As m increases, the advantage of our scheme becomes even more significant.

In addition, because the index tree has $O(n)$ nodes and each node holds two d-dimensional vectors, the space complexity of the index tree is $O(nd)$. By contrast, the space complexities of the index tree in X15 and G19 are both $O(nm)$. As Table 3 shows, even when we set n = 1000, m = 20000, d = 1000, and b = 10000, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.

4.1.2. Trapdoor Generation. In our scheme, the query is converted into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, both of dimension d. Trapdoor generation multiplies these vectors by the d × d matrices in the key, so the time complexity of trapdoor generation in our scheme is $O(d^2)$. By contrast, since the query vectors in X15 and G19 both have dimension m, their time complexities of trapdoor generation are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when n = 1000, m = 20000, and d = 1000, the time cost of trapdoor generation in our scheme is 15 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.

4.1.3. Search. In the search process, if the relevance score between an internal node u and the query Q is less than the minimum relevance score of the current top-k documents, the subtree rooted at u is not accessed. Thus, not all of the nodes in the tree are visited during a search. Suppose that there are θ leaf nodes that contain at least one keyword in the query Q. Since the height of the tree is $O(\log n)$ and the relevance score calculation takes $O(d)$ time, the time complexity of the search process is $O(\theta d \log n)$. For the scheme X15, because the relevance score calculation takes $O(m)$ time, the time complexity of the search process is $O(\theta m \log n)$. For the scheme G19, because each internal node contains a Bloom filter of size b and each leaf node holds a vector of size m, the time complexity of the search process is $O(\theta(m + b \log n))$. From Figure 6(c), when n = 1000, m = 20000, d = 1000, and b = 10000, the search time cost in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.
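The pruning rule can be sketched on a small unencrypted tree. This is a minimal illustration, not the authors' implementation: it assumes that a leaf stores its document vector as both $\vec{u}_{\min}$ and $\vec{u}_{\max}$, that an internal node stores the element-wise minimum and maximum of its children's vectors, and that the query is split into its negative and positive parts. Under these assumptions, the score of an internal node is an upper bound on the score of every leaf below it. The names `Node`, `leaf`, `parent`, and `search_top_k` are hypothetical, and a heap replaces the sorted result list.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None  # document id, set only on leaf nodes

def leaf(fid, vec):
    # Assumption: a leaf uses its document vector for both bounds.
    return Node(list(vec), list(vec), fid=fid)

def parent(a, b):
    # Element-wise min of the children's u_min, max of their u_max.
    return Node([min(x, y) for x, y in zip(a.u_min, b.u_min)],
                [max(x, y) for x, y in zip(a.u_max, b.u_max)], a, b)

def split_query(q):
    # q_min keeps the negative entries of q, q_max the positive ones.
    q_min = [x if x < 0 else 0.0 for x in q]
    q_max = [x if x > 0 else 0.0 for x in q]
    return q_min, q_max

def score(node, q_min, q_max):
    return (sum(u * v for u, v in zip(node.u_min, q_min))
            + sum(u * v for u, v in zip(node.u_max, q_max)))

def search_top_k(root, q, k):
    q_min, q_max = split_query(q)
    top = []      # min-heap of (score, fid); top[0] is the current k-th score
    visited = 0

    def visit(node):
        nonlocal visited
        visited += 1
        s = score(node, q_min, q_max)
        if len(top) == k and s <= top[0][0]:
            return  # prune: nothing below this node can enter the top-k
        if node.fid is not None:
            if len(top) == k:
                heapq.heapreplace(top, (s, node.fid))
            else:
                heapq.heappush(top, (s, node.fid))
            return
        visit(node.left)
        visit(node.right)

    visit(root)
    return sorted(top, reverse=True), visited

docs = {"f1": [-0.2, 0.3, 0.7], "f2": [-0.4, -0.2, 0.8],
        "f3": [0.3, -0.2, 0.3], "f4": [-0.2, -0.1, -0.3]}
leaves = [leaf(f, v) for f, v in docs.items()]
root = parent(parent(leaves[0], leaves[1]), parent(leaves[2], leaves[3]))
results, visited = search_top_k(root, [0.0, 0.0, 1.0], k=1)
print(results, visited)
```

In this toy run the right subtree is skipped after a single score computation at its root, so fewer than the seven nodes of the tree are visited.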

4.1.4. Update. When the data owners want to insert or delete a document, they not only insert or delete a leaf node but also update $O(\log n)$ internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(d^2 \log n)$. For the scheme X15, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(m^2 \log n)$. For the scheme G19, because the internal nodes are based on Bloom filters, which are not encrypted, the time cost of updating the internal nodes can be ignored. Thus, the time complexity of an update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when n = 1000, m = 20000, d = 1000, and b = 10000, the time cost of updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.

4.2. Precision and Privacy. The search precision of our scheme is affected by the group of semantic keywords related to the original index and query keywords. We measure our scheme by adopting the metric called “precision” defined in [12]. The metric is defined as follows:

$$P_k = \frac{k'}{k}, \tag{8}$$

where $k'$ is the number of real top-k documents among the k retrieved documents.
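A minimal sketch of equation (8), assuming both rankings are given as lists of document identifiers ordered from most to least relevant; the function and argument names are illustrative.

```python
def precision(retrieved, true_ranking, k):
    """P_k = k' / k, where k' counts retrieved documents in the real top-k."""
    true_top_k = set(true_ranking[:k])
    k_prime = sum(1 for doc in retrieved[:k] if doc in true_top_k)
    return k_prime / k

# Three of the four retrieved documents belong to the real top-4.
print(precision(["a", "b", "x", "d"], ["a", "b", "c", "d", "e"], k=4))
```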

In addition, the semantic keywords in the index and query keyword sets disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through statistical information about the dataset. To measure the extent of this disturbance, we use the following equation, called “rank privacy” and introduced in [12]:

$$P'_k = \frac{\sum_{i} \left| r_i - r'_i \right|}{k^2}, \tag{9}$$

where $r_i$ is the rank of document i in the retrieved top-k documents and $r'_i$ is the rank of document i in the real ranked results.
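Equation (9) can be sketched in the same style. The sketch assumes 1-based ranks and that every retrieved document appears in the real ranking; the function name `rank_privacy` is illustrative.

```python
def rank_privacy(retrieved, true_ranking, k):
    """P'_k = sum_i |r_i - r'_i| / k**2 over the k retrieved documents."""
    true_rank = {doc: pos for pos, doc in enumerate(true_ranking, start=1)}
    total = 0
    for r_i, doc in enumerate(retrieved[:k], start=1):
        total += abs(r_i - true_rank[doc])
    return total / (k * k)

# "b" and "a" are swapped relative to the real ranking; "c" is in place.
print(rank_privacy(["b", "a", "c"], ["a", "b", "c", "d"], k=3))
```

A value of 0 means the retrieved order matches the real order exactly; larger values mean the returned ranking reveals less about the real one.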

We compare our scheme with the schemes X15 and G19 in terms of “precision” and “rank privacy.” Note that an important parameter in the previous two schemes is the standard deviation σ, which is used to adjust the relevance scores of the dummy keywords. In the comparison, we set σ = 0.05, a value commonly used in the previous schemes. In our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node's vector to 1000 (d = 1000). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as k grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55% and the rank privacy increases slightly from 26% to 28%. For the schemes X15 and G19, the precision also decreases and the rank privacy also increases as k grows, so this trend is shared by all three schemes. Because the vector representations of the index tree and the query in our scheme are deeply compressed, some statistical information in the index and the query is lost.


Figure 6: Impact of m on the time cost of index building (a), trapdoor generation (b), search (c), and update (d) for X15, G19, and our scheme (n = 1000, d = 1000, b = 10000, and m ∈ {12000, 14000, 16000, 18000, 20000}). Axes: dictionary size versus time cost (×10^3 ms).

Table 3: Storage consumption of the index tree (MB).

Dictionary size    [14]    [15]    Vector dimension    Proposed
m = 12000          188     174     d = 200             7
m = 14000          219     190     d = 400             14
m = 16000          251     206     d = 600             20
m = 18000          283     222     d = 800             26
m = 20000          315     238     d = 1000            33


Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents for X15, G19, and our scheme (n = 1000, d = 1000, b = 10000, m = 12000, and σ = 0.05). Axes: number of retrieved documents versus precision (%) and rank privacy (%).

Figure 8: Impact of d on the time cost of index building (a) and of trapdoor generation, search, and update (b) (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Axes: vector size versus time cost (s).


Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is correspondingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation (d), which we set in “Word2vec,” is an important parameter in our scheme. We now discuss its impact. The impact of d on the efficiency of our scheme is shown in Figure 8: the time costs of index building, trapdoor generation, search, and update all increase as d grows. Figure 9 illustrates the impact of d on the precision and rank privacy of our scheme. As d increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy gradually decreases. These observations are consistent with our theoretical analysis. So, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter d to satisfy the requirements of different applications.

4.4. Discussion. From the experimental results, when n = 1000, m = 20000, d = 200, and b = 10000, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, all of which are much better than in the previous schemes X15 and G19. This efficiency shows that our scheme is well suited to practical applications, especially the mobile cloud setting, in which clients have limited computation and storage resources.

The experimental results show that the precision of our scheme is lower than that of the previous two schemes, while its rank privacy is correspondingly higher. In addition, by using the “Word2vec” method, the vector representations used in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5 Conclusions

In this paper, by applying “Word2vec” to construct the vector representations of the documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme greatly reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Axes: number of retrieved documents versus precision (%) and rank privacy (%).


Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searching on encrypted data,” in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, “Secure data retrieval of outsourced data with complex query support,” in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalized search over encrypted outsourced data with efficiency improvement,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, “Secure indexes,” IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, “Efficient conjunctive keyword search on encrypted data storage system,” European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, “Achieving efficient conjunctive keyword searches over encrypted data,” Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, “Efficient similarity search over encrypted data,” in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+r: top-k retrieval from a confidential index,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., “Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking,” in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, “Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data,” IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., “Highly-scalable searchable symmetric encryption with support for boolean queries,” Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., “Dynamic searchable encryption in very-large databases: data structures and implementation,” in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., “Public key encryption with keyword search,” International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, “Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system,” Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, “Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group,” Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, “Novel multi-keyword search on encrypted data in the cloud,” IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, “Generating searchable public-key ciphertexts with hidden structures for fast keyword search,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, “Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, “A general transformation from KP-ABE to searchable encryption,” Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, “Attribute-based hybrid boolean keyword search over outsourced encrypted data,” IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, “Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries,” in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, “Efficient implementation of a proxy-based protocol for data sharing on the cloud,” in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, “Searchable public key encryption supporting semantic multi-keywords search,” IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., “Efficient estimation of word representations in vector space,” 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, “Secure kNN computation on encrypted databases,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, “Zerber: r-confidential indexing for distributed documents,” in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, “Enron E-mail dataset,” http://www.cs.cmu.edu/~enron.



Figure 3: An example of the vector generation of the node u. (a) Method M1: u is a leaf node, W is the keyword set of the file that u stores, and $\vec{u}$ is a vector generated by the keyword conversion method described in Section 2.5. (b) Method M2: u is an internal node, and $P_l.\vec{u}_{\min}$, $P_l.\vec{u}_{\max}$, $P_r.\vec{u}_{\min}$, and $P_r.\vec{u}_{\max}$ are the vectors of its children.

Input: the document collection F = {f1, f2, ..., fn}; a semantic dictionary D generated by applying “Word2Vec” to F.
Output: the index tree T.

for each i ∈ [1, n] do
    Construct a leaf node u for fi with u.ID = GenID(), u.Pl = u.Pr = NULL, u.FID = GenFID(fi), and generate $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M1
    Insert u into CurrentNodeSet
end for
while |CurrentNodeSet| > 1 do
    if |CurrentNodeSet| is even, i.e., 2h, then
        for each pair of nodes u′ and u″ in CurrentNodeSet do
            Create a parent node u for u′ and u″ with u.ID = GenID(), u.Pl = u′, u.Pr = u″, u.FID = NULL, and set $\vec{u}_{\min}$ and $\vec{u}_{\max}$ according to the method M2
            Insert u into TempNodeSet
        end for
    else (suppose that |CurrentNodeSet| = 2h + 1)
        for each pair of nodes u′ and u″ of the former 2h − 2 nodes in CurrentNodeSet do
            Create a parent node u for u′ and u″
            Insert u into TempNodeSet
        end for
        Create a parent node u1 for the (2h − 1)-th and (2h)-th nodes, and then generate a parent node u for the (2h + 1)-th node and u1
        Insert u into TempNodeSet
    end if
    Set CurrentNodeSet = TempNodeSet and clear TempNodeSet
end while
return CurrentNodeSet (which now contains only one node, the root of the index tree T)

ALGORITHM 1: BuildIndexTree (FileSet F, Dictionary D).
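The bottom-up construction in Algorithm 1 can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes that method M1 gives a leaf its document vector as both bounds and that method M2 takes the element-wise minimum of the children's $\vec{u}_{\min}$ and the element-wise maximum of their $\vec{u}_{\max}$. Node identifiers (GenID) are omitted, and the names `Node`, `make_leaf`, `make_parent`, and `build_index_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Node:
    u_min: List[float]
    u_max: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None  # document id, set only on leaf nodes

def make_leaf(fid: str, vec: List[float]) -> Node:
    # Assumed M1: the document vector serves as both u_min and u_max.
    return Node(list(vec), list(vec), fid=fid)

def make_parent(a: Node, b: Node) -> Node:
    # Assumed M2: element-wise min of u_min and max of u_max.
    return Node([min(x, y) for x, y in zip(a.u_min, b.u_min)],
                [max(x, y) for x, y in zip(a.u_max, b.u_max)],
                left=a, right=b)

def build_index_tree(doc_vectors: Dict[str, List[float]]) -> Node:
    current = [make_leaf(fid, vec) for fid, vec in doc_vectors.items()]
    if not current:
        raise ValueError("the document set must not be empty")
    while len(current) > 1:
        odd = len(current) % 2 == 1
        # With 2h + 1 nodes, only the first 2h - 2 are paired directly.
        paired = len(current) - 3 if odd else len(current)
        nxt = [make_parent(current[i], current[i + 1])
               for i in range(0, paired, 2)]
        if odd:
            # The last three nodes are merged under a single parent.
            u1 = make_parent(current[-3], current[-2])
            nxt.append(make_parent(current[-1], u1))
        current = nxt
    return current[0]

docs = {"f1": [-0.2, 0.3, 0.7], "f2": [-0.4, -0.2, 0.8],
        "f3": [0.3, -0.2, 0.3], "f4": [-0.2, -0.1, -0.3]}
root = build_index_tree(docs)
print(root.u_min, root.u_max)
```

Each pass halves the number of nodes, so the tree has logarithmic height and the root's vectors bound every document vector in the collection.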


dimension of $\vec{u}_{\min}$ and $\vec{u}_{\max}$. Then, it randomly generates a d-bit vector S. Finally, it outputs the secret key sk = {S, M11, M12, M21, M22}.

(ii) DictionaryBuild (F): given the document set F = {f1, f2, ..., fn}, the algorithm runs “Word2vec” to generate the dictionary D of F. In the dictionary D, each keyword is associated with a vector representation. In addition, each keyword is associated with a set of semantically related keywords.

(iii) IndexBuild (sk, F, D): given the document set F and the dictionary D for F, the algorithm first creates the index tree T by using the algorithm BuildIndexTree (F, D) (Algorithm 1). Then, for each node u in the tree T, the algorithm generates two random vector pairs $\{V'_{u_{\min}}, V''_{u_{\min}}\}$ and $\{V'_{u_{\max}}, V''_{u_{\max}}\}$ for the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. More precisely, if S[i] = 0, it sets $V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$; if S[i] = 1, $V'_{u_{\min}}[i]$, $V''_{u_{\min}}[i]$, $V'_{u_{\max}}[i]$, and $V''_{u_{\max}}[i]$ are set to four random values under the constraints $V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$. This process is expressed by the following equation:

$$
\left\{
\begin{array}{ll}
V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 0,\\
V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 0,\\
V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 1,\\
V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 1,
\end{array}
\right\}
\quad i \in [1, d]. \tag{5}
$$

Finally, for each node u, it computes $I_u = \{M_{11}^{T} V'_{u_{\min}}, M_{12}^{T} V''_{u_{\min}}, M_{21}^{T} V'_{u_{\max}}, M_{22}^{T} V''_{u_{\max}}\}$. By replacing the plaintext vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ with the encrypted index $I_u$, an encrypted index tree $I_T$ is created.

(iv) TrapdoorGen (sk, Q): given a query keyword set Q, the algorithm first extends Q to a new semantic keyword set Q′. The process is as follows:

(a) It generates a new keyword set Q′, which is initialized to an empty set.

Figure 4: An example of Algorithm 1 and Algorithm 2 (example 1).

Figure 5: An example of the vector generation of the query Q. Method M3: $\vec{q}$ is a vector generated by the keyword conversion method described in Section 2.5; $\vec{q}_{\min}$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive part. In the example, $\vec{q}$ = (0.1, −0.2, 0.3, −0.4, 0.5), $\vec{q}_{\max}$ = (0.1, 0, 0.3, 0, 0.5), and $\vec{q}_{\min}$ = (0, −0.2, 0, −0.4, 0).


(b) Note that each keyword in the dictionary is associated with a group of keywords semantically related to it. For each keyword q in Q, the algorithm randomly chooses k′ semantic keywords based on the dictionary and inserts these keywords into Q′, where k′ is chosen dynamically and k′ ∈ [1, N].

Then, based on Q′, the TrapdoorGen algorithm generates a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is similar to the process in the IndexBuild algorithm and can be expressed as the following equations:

$$
\left\{
\begin{array}{ll}
Q'_{q_{\min}}[i] = Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 0,\\
Q'_{q_{\max}}[i] = Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 0,\\
Q'_{q_{\min}}[i] + Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 1,\\
Q'_{q_{\max}}[i] + Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 1,
\end{array}
\right\}
\quad i \in [1, d]. \tag{6}
$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{\min}}, M_{12}^{-1} Q''_{q_{\min}}, M_{21}^{-1} Q'_{q_{\max}}, M_{22}^{-1} Q''_{q_{\max}}\}$ as the trapdoor for Q.

(v) Search (sk, $T_Q$, $I_T$): for each node u in $I_T$, the algorithm computes

$$
\begin{aligned}
I_u \cdot T_Q &= \left(M_{11}^{T} V'_{u_{\min}}\right) \cdot \left(M_{11}^{-1} Q'_{q_{\min}}\right) + \left(M_{12}^{T} V''_{u_{\min}}\right) \cdot \left(M_{12}^{-1} Q''_{q_{\min}}\right) \\
&\quad + \left(M_{21}^{T} V'_{u_{\max}}\right) \cdot \left(M_{21}^{-1} Q'_{q_{\max}}\right) + \left(M_{22}^{T} V''_{u_{\max}}\right) \cdot \left(M_{22}^{-1} Q''_{q_{\max}}\right) \\
&= V'_{u_{\min}} \cdot Q'_{q_{\min}} + V''_{u_{\min}} \cdot Q''_{q_{\min}} + V'_{u_{\max}} \cdot Q'_{q_{\max}} + V''_{u_{\max}} \cdot Q''_{q_{\max}} \\
&= \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max} \\
&= \mathrm{Score}(u, Q).
\end{aligned}
\tag{7}
$$

According to equation (3), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score(u, Q). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform ranked search.
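The first step of the derivation rests on the identity $(M^{T} v) \cdot (M^{-1} q) = v^{T} M M^{-1} q = v \cdot q$, which holds for any invertible matrix M. The sketch below checks it with exact rational arithmetic on a toy 2 × 2 matrix. It covers only this matrix step, not the random splitting by S, and the matrix, vectors, and helper names are illustrative assumptions (the scheme itself uses four d × d matrices).

```python
from fractions import Fraction

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

M = [[2, 1], [1, 1]]                          # toy invertible key matrix
v = [Fraction(3, 10), Fraction(-2, 10)]       # plaintext index vector
q = [Fraction(1, 2), Fraction(4, 5)]          # plaintext query vector

enc_index = mat_vec(transpose(M), v)          # M^T v, stored at the server
enc_query = mat_vec(inverse_2x2(M), q)        # M^{-1} q, sent as trapdoor

print(dot(enc_index, enc_query) == dot(v, q))
```

The server can therefore compute the relevance score from the two transformed vectors without seeing v or q in plaintext.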

3.3. Dynamic Update Operations. Besides search, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, satisfying the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows:

(i) UpdateInfoGen (sk, Ts, fi, Utype): this algorithm is executed by the data owner and generates the update information {Is, ci} for the cloud server, where Ts is a set containing all the nodes to be updated, Is is the encrypted form of Ts, fi is the target document, ci is the encrypted form of fi, and Utype is the update type. To reduce the communication cost, the data owner stores the unencrypted index tree on its own device. For Utype ∈ {Ins, Del}, the algorithm works as follows:

(a) If Utype = “Del,” the algorithm deletes a document fi from the tree. The algorithm first finds the leaf node associated with the document fi and deletes it. In addition, the internal nodes associated with this leaf node are added to Ts. If the deletion would break the balance of the index tree, the algorithm can mark the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts Ts to generate Is. Finally, the algorithm sends Is to the cloud server and sets ci to null.

Input: a vector $\vec{q}$ of query Q; a semantic dictionary D generated by applying “Word2Vec” to F; a root node u of IndexTree; and RList.
Output: RList.

Split $\vec{q}$ into $\vec{q}_{\min}$ and $\vec{q}_{\max}$ according to the method M3
if u is an internal node then
    if Score(u, Q) > k-th score then
        SearchIndexTree($\vec{q}$, D, u.Pl, RList)
        SearchIndexTree($\vec{q}$, D, u.Pr, RList)
    else
        return
    end if
else
    if Score(u, Q) > k-th score then (update RList)
        Delete the element holding the smallest relevance score in RList
        Insert a new element ⟨Score(u, Q), u.FID⟩ into RList and sort the elements in RList
    end if
    return
end if

ALGORITHM 2: SearchIndexTree (QueryVector $\vec{q}$, Dictionary D, TreeNode u, RList).


(b) If Utype = “Ins,” the algorithm inserts a document fi into the tree. The algorithm first creates a leaf node for fi according to the method M1 introduced in Section 3.1 and inserts this leaf node into Ts. Then, based on the method M2, the algorithm updates the vectors of the internal nodes on the path from the root to the new leaf node and inserts these internal nodes into Ts. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts Ts and fi to generate Is and ci, respectively, and sends them to the cloud server.

(ii) Update ($I_T$, C, Is, ci, Utype): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set Is. After this, if Utype = “Del,” the algorithm removes ci from C; otherwise, the algorithm inserts ci into C.

Note that after a period of insertion and deletion operations, the number of keywords in the dictionary may change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. In the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by “Word2vec” and set by the users. For example, if we set the vector dimension to 200, the dimension of each keyword's vector is 200, and thus the dimensions of $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ are all 200. Based on the above analysis, our scheme is more suitable for update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node u in the index tree and the query Q in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, attackers cannot obtain the original vectors in the tree nodes and the query, which means that index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set Q is extended to multiple different semantic keyword sets Q′. So the same query Q is encrypted into different trapdoors, which means that the goal of trapdoor unlinkability is achieved.

(3) Keyword Privacy. Since the index and the trapdoor are protected by the secure KNN algorithm, the adversary cannot infer plaintext information from the index and the trapdoor under the known ciphertext model. Considering that the known background model is common in real-world applications, we will analyse the security of the proposed scheme under the known background model. In the TrapdoorGen algorithm, the original query keyword set Q is extended to a new set Q′. Specifically, for each keyword q in Q, the algorithm randomly chooses a number k′, selects k′ semantic keywords related to q from the dictionary, and inserts these keywords into Q′. Suppose that each keyword is associated with N semantic keywords in the dictionary; then each keyword can generate $2^N$ different keyword sets, since each semantic keyword can be chosen or not. For example, if a keyword q is associated with three semantic keywords q1, q2, and q3, then q can generate $2^3 = 8$ keyword sets: {q}, {q, q1}, {q, q2}, {q, q3}, {q, q1, q2}, {q, q1, q3}, {q, q2, q3}, and {q, q1, q2, q3}. Since the query Q usually contains more than one keyword, Q can generate more than $2^N$ different semantic keyword sets. In this way, the final similarity score is obfuscated by these random semantic keyword sets. Following the analysis in [14, 15], our scheme can protect keyword privacy under the known background model.

4. Performance Analysis

In this section, we analyse the proposed SSE-DMKRS scheme theoretically and experimentally. A detailed experiment is given to demonstrate that our scheme can efficiently perform dynamic ranked keyword search over the encrypted data. Our experiment is run on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and is based on a real-world e-mail dataset, the Enron e-mail dataset [35]. We mainly analyse the performance of our scheme in two aspects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update, and (2) the relationship between the search precision and the privacy level. Moreover, in order to show the advantages of our scheme, we also compare it with two related previous schemes. For simplicity, we denote the two schemes introduced in [14] and [15] by X15 and G19, respectively.

4.1. Efficiency

4.1.1. Index Building. The process of index building mainly consists of two steps: (1) creating an unencrypted index tree by utilizing Algorithm 1 and (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates O(n) nodes based on the document set F. Because each node has two vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ whose dimensions are both d, in the encryption step the vector splitting process needs O(d) time per node and the matrix multiplication operations take O(d × d) time per node. Combining these two steps, the overall time complexity of index building is $O(nd^2)$, which means that the time cost for index building mainly depends on the number of documents in F and the dimension of each node’s vector.

10 Security and Communication Networks

Since the dimensions of each node’s vector in X15 and G19 are both linear in the number of keywords in the dictionary (m), the time costs for index building in X15 and G19 are both $O(nm^2)$. Because d ≪ m, the time cost for index building in our scheme is much less than that in X15 and G19. In addition, for the scheme G19, the internal nodes are constructed by using a Bloom filter, and thus the dimension of each internal node’s vector is linear in the Bloom filter size b. Since b is usually smaller than m, the index building time in G19 is less than that in X15.
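As a rough worked example of this argument, take the parameter values used in the experiments of this section (m = 20000 and d = 1000). Ignoring constant factors and lower-order terms, which is an assumption of this sketch, the asymptotic index building costs compare as follows:

```latex
\frac{n m^{2}}{n d^{2}} = \left(\frac{m}{d}\right)^{2} = \left(\frac{20000}{1000}\right)^{2} = 400 .
```

Measured ratios can differ from this figure because the hidden constants are not the same across schemes.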

Figure 6(a) shows that the time cost for index building in our scheme is much less than that in X15 and G19. More precisely, when n = 1000, m = 20000, d = 1000, and b = 10000, the time consumption for index building in X15 and G19 is roughly 100 to 200 times that in our scheme. As m increases, the advantage of our scheme becomes even more significant.

In addition, because the index tree has O(n) nodes and each node holds two d-dimensional vectors, the space complexity of the index tree is O(nd). By contrast, the space complexities of the index tree in X15 and G19 are both O(nm). From Table 3, even if we set n = 1000, m = 20000, d = 1000, and b = 10000, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.

4.1.2. Trapdoor Generation. In our scheme, the query is converted to two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ whose dimensions are both d. The trapdoor generation process multiplies these two vectors by the d × d matrices in the key. So, the time complexity of trapdoor generation in our scheme is $O(d^2)$. By contrast, since the dimensions of the query vectors in X15 and G19 are both m, their time complexities of trapdoor generation are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when n = 1000, m = 20000, and d = 1000, the time cost for trapdoor generation in our scheme is 15 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.

4.1.3. Search. In the search process, if the relevance score of an internal node u and the query Q is less than the minimum relevance score of the current top-k documents, the subtree rooted at u will not be accessed. Thus, not all of the nodes in the tree will be accessed during the search process. We suppose that there are θ leaf nodes that contain at least one keyword in the query Q. Since the height of the tree is O(log n) and the time complexity of the relevance score calculation is O(d), the time complexity of the search process is $O(\theta d \log n)$. For the scheme X15, because the time complexity of the relevance score calculation is O(m), the time complexity of the search process is $O(\theta m \log n)$. For the scheme G19, because each internal node contains a Bloom filter whose size is b and each leaf node involves a vector whose size is m, the time complexity of the search process is $O(\theta(m + b \log n))$. From Figure 6(c), when n = 1000, m = 20000, d = 1000, and b = 10000, the search time cost in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.
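The pruning rule described above can be illustrated with a minimal plaintext sketch (no encryption). It assumes that a leaf stores its document vector as both u_min and u_max and that an internal node stores the element-wise minimum and maximum of its children; these conventions, the class `Node`, and the helper names are assumptions made for illustration, not the authors' implementation.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class Node:
    """Tree node; leaves carry a document id (fid), internal nodes two children."""
    def __init__(self, u_min, u_max, left=None, right=None, fid=None):
        self.u_min, self.u_max = u_min, u_max
        self.left, self.right, self.fid = left, right, fid

def make_leaf(fid, vec):
    # Assumption: a leaf stores the document vector as both u_min and u_max.
    return Node(vec, vec, fid=fid)

def make_internal(left, right):
    # Assumption: an internal node stores element-wise min / max of its children.
    u_min = [min(a, b) for a, b in zip(left.u_min, right.u_min)]
    u_max = [max(a, b) for a, b in zip(left.u_max, right.u_max)]
    return Node(u_min, u_max, left, right)

def score(node, q_min, q_max):
    # Score(u, Q) = u_min . q_min + u_max . q_max
    return dot(node.u_min, q_min) + dot(node.u_max, q_max)

def search(node, q_min, q_max, k, top):
    """Depth-first search; `top` is a min-heap of (score, fid) with at most k items."""
    s = score(node, q_min, q_max)
    if len(top) == k and s <= top[0][0]:
        return  # prune: nothing in this subtree can beat the current k-th score
    if node.fid is not None:
        if len(top) == k:
            heapq.heapreplace(top, (s, node.fid))
        else:
            heapq.heappush(top, (s, node.fid))
        return
    search(node.left, q_min, q_max, k, top)
    search(node.right, q_min, q_max, k, top)

docs = {"f1": [0.5, -0.2, 0.1], "f2": [-0.4, 0.3, 0.8],
        "f3": [0.2, 0.9, -0.3], "f4": [-0.5, -0.1, 0.7]}
q = [0.1, -0.2, 0.3]
q_min = [min(x, 0.0) for x in q]  # negative part of the query vector
q_max = [max(x, 0.0) for x in q]  # positive part of the query vector

leaves = [make_leaf(fid, vec) for fid, vec in docs.items()]
root = make_internal(make_internal(leaves[0], leaves[1]),
                     make_internal(leaves[2], leaves[3]))
top = []
search(root, q_min, q_max, 2, top)
ranked = [fid for _, fid in sorted(top, reverse=True)]
print(ranked)  # ['f4', 'f2']
```

Under these assumptions the score of an internal node is an upper bound on the score of every leaf below it, which is what makes skipping a subtree safe.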

4.1.4. Update. When the data owners want to insert or delete a document, they not only insert or delete a leaf node but also update O(log n) internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(\log n \cdot d^2)$. For the X15 scheme, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(\log n \cdot m^2)$. For the G19 scheme, because the internal nodes are based on the Bloom filter, which is not encrypted, the time cost for updating the internal nodes can be ignored. Thus, the time complexity of update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when n = 1000, m = 20000, d = 1000, and b = 10000, the time cost for updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.

4.2. Precision and Privacy. The search precision of our scheme is affected by a group of semantic keywords related to the original index and query keywords. We measure our scheme by adopting a metric called “precision” defined in [12]. The metric of precision is defined as follows:

$$P_k = \frac{k'}{k}, \tag{8}$$

where k′ is the number of real top-k documents in the retrieved k documents.
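A minimal sketch of equation (8) follows. The function name `precision_at_k` and the sample rankings are hypothetical; `true_ranking` stands for the ranking that would be obtained without any obfuscation.

```python
def precision_at_k(retrieved, true_ranking, k):
    """P_k = k' / k, where k' counts retrieved documents that are real top-k documents."""
    true_top_k = set(true_ranking[:k])
    hits = sum(1 for doc in retrieved[:k] if doc in true_top_k)
    return hits / k

retrieved = ["f3", "f1", "f7", "f2"]
true_ranking = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
print(precision_at_k(retrieved, true_ranking, 4))  # 0.75
```

Here three of the four retrieved documents (f3, f1, and f2) belong to the real top-4, so k′ = 3.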

In addition, the semantic keywords in the index and query keyword set disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through statistical information about the dataset. To measure the extent of this disturbance, we use the following equation, called “rank privacy” and introduced in [12], to quantify the obscureness:

$$P'_k = \frac{\sum_{i=1}^{k} \left| r_i - r'_i \right|}{k^2}, \tag{9}$$

where $r_i$ is the rank number of document i in the retrieved top-k documents and $r'_i$ is document i’s real rank number in the real ranked results.
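Equation (9) can be sketched in the same way. The function name `rank_privacy` and the sample data are hypothetical, and the sketch assumes that every retrieved document also appears in the real ranking.

```python
def rank_privacy(retrieved, true_ranking, k):
    """P'_k = sum(|r_i - r'_i|) / k^2 over the retrieved top-k documents."""
    true_rank = {doc: pos + 1 for pos, doc in enumerate(true_ranking)}
    total = sum(abs((pos + 1) - true_rank[doc])
                for pos, doc in enumerate(retrieved[:k]))
    return total / (k * k)

retrieved = ["f3", "f1", "f7", "f2"]
true_ranking = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
print(rank_privacy(retrieved, true_ranking, 4))  # 0.5625
```

The four rank differences are 2, 1, 4, and 2, giving 9 / 16. A larger value means that the returned order reveals less about the real order.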

We compare our scheme with the schemes X15 and G19 in terms of “precision” and “rank privacy.” Note that an important parameter in the previous two schemes is a standard deviation σ, which is utilized to adjust the relevance score for the dummy keywords. In the comparison, we set σ = 0.05, which is commonly used in the previous schemes. Besides, in our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node’s vector to 1000 (d = 1000). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as k grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For the schemes X15 and G19, the precision likewise decreases and the rank privacy increases as k grows, so this trend is common to all three schemes. Because the vector representations for the index tree and query in our scheme are compressed deeply, some statistical information in the index and the query will be lost.

Figure 6: Impact of m on the time cost of index building (a), trapdoor generation (b), search (c), and update (d) (n = 1000, d = 1000, b = 10000, and m ∈ {12000, 14000, 16000, 18000, 20000}). Each panel plots the time cost (×10³ ms) against the dictionary size for X15, G19, and our scheme.

Table 3: Storage consumption of the index tree (MB).

Dictionary size    X15 [14]    G19 [15]    Vector dimension    Proposed
m = 12000          188         174         d = 200             7
m = 14000          219         190         d = 400             14
m = 16000          251         206         d = 600             20
m = 18000          283         222         d = 800             26
m = 20000          315         238         d = 1000            33

Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents (n = 1000, d = 1000, b = 10000, m = 12000, and σ = 0.05). Both panels plot the metric (%) against the number of retrieved documents (10 to 50) for X15, G19, and our scheme.

Figure 8: Impact of d on the time cost of index building (a) and trapdoor generation, search, and update (b) (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Both panels plot the time cost (s) against the vector size.

Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is correspondingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation (d), which we set in “Word2vec,” is an important parameter in our scheme. Next, we discuss the impact of d on our scheme. The impact of d on the efficiency of our scheme is given in Figure 8. From Figure 8, we see that the time costs of index building, trapdoor generation, search, and update all increase as d grows. Besides, Figure 9 illustrates the impact of d on the precision and rank privacy of our scheme. As d increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy decreases gradually. These observations are all consistent with our previous theoretical analysis. So, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter d to satisfy the requirements of different applications.

4.4. Discussion. From the experiment results, when n = 1000, m = 20000, d = 200, and b = 10000, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, which are all much better than those of the previous schemes X15 and G19. This efficiency makes our scheme well suited to practical applications, especially the mobile cloud setting, in which the clients have limited computation and storage resources.

The experiment results show that the precision of our scheme is lower than that of the previous two schemes, while its rank privacy is correspondingly higher. In addition, by using the “Word2vec” method, the vector representations used in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying “Word2vec” to construct the vector representations of the documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme greatly reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Both panels plot the metric (%) against the number of retrieved documents (10 to 50).

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searching on encrypted data,” in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, “Secure data retrieval of outsourced data with complex query support,” in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalized search over encrypted outsourced data with efficiency improvement,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, “Secure indexes,” IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, “Efficient conjunctive keyword search on encrypted data storage system,” European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, “Achieving efficient conjunctive keyword searches over encrypted data,” Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, “Efficient similarity search over encrypted data,” in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+R: top-k retrieval from a confidential index,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., “Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking,” in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, “Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data,” IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., “Highly-scalable searchable symmetric encryption with support for boolean queries,” Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., “Dynamic searchable encryption in very-large databases: data structures and implementation,” in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., “Public key encryption with keyword search,” International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, “Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system,” Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, “Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group,” Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, “Novel multi-keyword search on encrypted data in the cloud,” IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, “Generating searchable public-key ciphertexts with hidden structures for fast keyword search,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, “Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, “A general transformation from KP-ABE to searchable encryption,” Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, “Attribute-based hybrid boolean keyword search over outsourced encrypted data,” IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, “Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries,” in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, “Efficient implementation of a proxy-based protocol for data sharing on the cloud,” in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, “Searchable public key encryption supporting semantic multi-keywords search,” IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., “Efficient estimation of word representations in vector space,” 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, “Secure kNN computation on encrypted databases,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, “Zerber: r-confidential indexing for distributed documents,” in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, “Enron E-mail dataset,” http://www.cs.cmu.edu/~enron.


dimension of $\vec{u}_{\min}$ and $\vec{u}_{\max}$. Then, it randomly generates a d-bit vector S. Finally, it outputs the secret key $sk = \{S, M_{11}, M_{12}, M_{21}, M_{22}\}$.

(ii) DictionaryBuild (F): given the document set $F = \{f_1, f_2, \ldots, f_n\}$, the algorithm runs “Word2vec” to generate the dictionary D of F. In the dictionary D, each keyword is associated with a vector representation. Besides, each keyword is also associated with a set of semantically related keywords.

(iii) IndexBuild (sk, F, D): given the document set F and the dictionary D for F, the algorithm first creates the index tree T by using the algorithm BuildIndexTree (F, D) (Algorithm 1). Then, for each node u in the tree T, the algorithm generates two random vector pairs $\{V'_{u_{\min}}, V''_{u_{\min}}\}$ and $\{V'_{u_{\max}}, V''_{u_{\max}}\}$ for the vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$, respectively. More precisely, if S[i] = 0, it sets $V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$; if S[i] = 1, $V'_{u_{\min}}[i]$, $V''_{u_{\min}}[i]$, $V'_{u_{\max}}[i]$, and $V''_{u_{\max}}[i]$ are set as four random values under the constraints $V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i]$ and $V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i]$. This process is expressed as the following equation:

$$
\begin{cases}
V'_{u_{\min}}[i] = V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 0,\\
V'_{u_{\max}}[i] = V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 0,\\
V'_{u_{\min}}[i] + V''_{u_{\min}}[i] = \vec{u}_{\min}[i], & \text{if } S[i] = 1,\\
V'_{u_{\max}}[i] + V''_{u_{\max}}[i] = \vec{u}_{\max}[i], & \text{if } S[i] = 1,
\end{cases}
\qquad i \in [1, d]. \tag{5}
$$

Finally, for each node u, it computes $I_u = \{M_{11}^{T} V'_{u_{\min}}, M_{12}^{T} V''_{u_{\min}}, M_{21}^{T} V'_{u_{\max}}, M_{22}^{T} V''_{u_{\max}}\}$. By replacing the plaintext vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ with the encrypted index $I_u$, an encrypted index tree $I_T$ is created.

(iv) TrapdoorGen (sk, Q): given a query keyword set Q, the algorithm first extends Q to a new semantic keyword set Q′. The process is as follows:

(a) It generates a new keyword set Q′, which is initialized to an empty set.

Figure 4: An example of Algorithm 1 and Algorithm 2 (example 1).

Figure 5: An example of the vector generation of the query Q (Method M3). $\vec{q}$ is a vector generated by adopting the keyword conversion method mentioned in Section 2.5; $\vec{q}_{\min}$ holds all the negative part of $\vec{q}$, while $\vec{q}_{\max}$ holds the positive part. In the example, $\vec{q} = (0.1, -0.2, 0.3, -0.4, 0.5)$, $\vec{q}_{\min} = (0, -0.2, 0, -0.4, 0)$, and $\vec{q}_{\max} = (0.1, 0, 0.3, 0, 0.5)$.

(b) Note that each keyword in the dictionary is associated with a group of keywords semantically related to this keyword. For each keyword q in Q, it randomly chooses k′ semantic keywords based on the dictionary and inserts these keywords into Q′, where k′ is chosen dynamically and k′ ∈ [1, N].

Then, based on Q′, the TrapdoorGen algorithm generates a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is similar to the process in the IndexBuild algorithm and can be expressed as the following equation:

$$
\begin{cases}
Q'_{q_{\min}}[i] = Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 0,\\
Q'_{q_{\max}}[i] = Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 0,\\
Q'_{q_{\min}}[i] + Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 1,\\
Q'_{q_{\max}}[i] + Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 1,
\end{cases}
\qquad i \in [1, d]. \tag{6}
$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{\min}}, M_{12}^{-1} Q''_{q_{\min}}, M_{21}^{-1} Q'_{q_{\max}}, M_{22}^{-1} Q''_{q_{\max}}\}$ as the trapdoor for Q.

(v) Search (sk, $T_Q$, $I_T$): for each node u in $I_T$, the algorithm computes

$$
\begin{aligned}
I_u \cdot T_Q &= \left(M_{11}^{T} V'_{u_{\min}} \cdot M_{11}^{-1} Q'_{q_{\min}}\right) + \left(M_{12}^{T} V''_{u_{\min}} \cdot M_{12}^{-1} Q''_{q_{\min}}\right) \\
&\quad + \left(M_{21}^{T} V'_{u_{\max}} \cdot M_{21}^{-1} Q'_{q_{\max}}\right) + \left(M_{22}^{T} V''_{u_{\max}} \cdot M_{22}^{-1} Q''_{q_{\max}}\right) \\
&= \left(V'_{u_{\min}} \cdot Q'_{q_{\min}}\right) + \left(V''_{u_{\min}} \cdot Q''_{q_{\min}}\right) + \left(V'_{u_{\max}} \cdot Q'_{q_{\max}}\right) + \left(V''_{u_{\max}} \cdot Q''_{q_{\max}}\right) \\
&= \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max} \\
&= \mathrm{Score}(u, Q).
\end{aligned}
\tag{7}
$$

According to equation (3), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score (u, Q). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform ranked search.
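The chain from splitting to equation (7) can be checked numerically with a small sketch. Everything named here (`split`, `random_invertible`, `inverse`, the toy vectors, and the tiny integer matrices) is hypothetical and chosen so that exact rational arithmetic can confirm the equality; it is not a secure parameter choice. One assumption deserves emphasis: the sketch splits the index vectors where S[i] = 1 and the query vectors where S[i] = 0, which is the complementary rule of the secure kNN scheme [32]. The inner product is preserved only when index and query are split at different positions, so the sketch follows that rule rather than applying the same condition to both sides.

```python
import random
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def inverse(M):
    """Gauss-Jordan inverse over exact fractions (assumes M is invertible)."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def random_invertible(d, rng):
    """Unit lower-triangular times unit upper-triangular, so the determinant is 1."""
    L = [[1 if i == j else (rng.randint(-3, 3) if j < i else 0) for j in range(d)]
         for i in range(d)]
    U = [[1 if i == j else (rng.randint(-3, 3) if j > i else 0) for j in range(d)]
         for i in range(d)]
    return [[dot(L[i], [U[k][j] for k in range(d)]) for j in range(d)]
            for i in range(d)]

def split(vec, S, split_bit, rng):
    """Random additive split where S[i] == split_bit; plain copy elsewhere."""
    a, b = [], []
    for x, bit in zip(vec, S):
        if bit == split_bit:
            r = Fraction(rng.randint(-100, 100), 10)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b

rng = random.Random(0)
d = 5
S = [rng.randint(0, 1) for _ in range(d)]
M = [random_invertible(d, rng) for _ in range(4)]  # M11, M12, M21, M22

# Toy plaintext node vectors and query vector.
u_min = [Fraction(x, 10) for x in (-5, -2, -3, -4, -2)]
u_max = [Fraction(x, 10) for x in (5, 9, 8, 3, 7)]
q = [Fraction(x, 10) for x in (1, -2, 3, -4, 5)]
q_min = [min(x, Fraction(0)) for x in q]  # method M3: negative part
q_max = [max(x, Fraction(0)) for x in q]  # method M3: positive part

# IndexBuild: split where S[i] = 1, then multiply by the transposed matrices.
parts_u = split(u_min, S, 1, rng) + split(u_max, S, 1, rng)
I_u = [mat_vec(transpose(Mk), v) for Mk, v in zip(M, parts_u)]

# TrapdoorGen: ASSUMED complementary split where S[i] = 0, then multiply by inverses.
parts_q = split(q_min, S, 0, rng) + split(q_max, S, 0, rng)
T_Q = [mat_vec(inverse(Mk), v) for Mk, v in zip(M, parts_q)]

encrypted_score = sum(dot(a, b) for a, b in zip(I_u, T_Q))
plain_score = dot(u_min, q_min) + dot(u_max, q_max)
print(encrypted_score == plain_score)  # True
```

Exact fractions are used instead of floating point so that the comparison is an equality test rather than a tolerance check.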

3.3. Dynamic Update Operations. Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, satisfying the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows:

(i) UpdateInfoGen (sk, $T_s$, $f_i$, $U_{type}$): this algorithm is executed by the data owners and generates the update information $\{I_s, c_i\}$ for the cloud server, where $T_s$ is a set containing all the updated nodes, $I_s$ is an encrypted form of $T_s$, $f_i$ is the target document, $c_i$ is an encrypted form of $f_i$, and $U_{type}$ is the update type. In order to reduce the communication cost, the data owners store the unencrypted index tree on their own devices. For $U_{type} \in \{\mathrm{Ins}, \mathrm{Del}\}$, the algorithm works as follows:

(a) If $U_{type}$ = “Del”, the algorithm deletes a document $f_i$ from the tree. The algorithm first finds the leaf node associated with the document $f_i$ and deletes it. In addition, the internal nodes associated with this leaf node are also added to $T_s$. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can set the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts $T_s$ to generate $I_s$. Finally, the algorithm sends $I_s$ to the cloud server and sets $c_i$ as null.

Input: a vector $\vec{q}$ of query Q, a semantic dictionary D generated by applying “Word2vec” to F, a root node u of IndexTree, and RList.
Output: RList.

    Split $\vec{q}$ into $\vec{q}_{\min}$ and $\vec{q}_{\max}$ according to the method M3
    if u is an internal node then
        if Score (u, Q) > k-th score then
            SearchIndexTree ($\vec{q}$, D, u.Pl, RList)
            SearchIndexTree ($\vec{q}$, D, u.Pr, RList)
        else
            return
        end if
    else
        if Score (u, Q) > k-th score then (update RList)
            Delete the element holding the smallest relevance score in RList
            Insert a new element <Score (u, Q), u.FID> into RList and sort the elements in RList
        end if
        return
    end if

ALGORITHM 2: SearchIndexTree (QueryVector $\vec{q}$, Dictionary D, TreeNode u, RList).

(b) If $U_{type}$ = “Ins”, the algorithm inserts a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update ($I_T$, C, $I_s$, $c_i$, $U_{type}$): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set $I_s$. After this, if $U_{type}$ = “Del”, the algorithm removes $c_i$ from C. Otherwise, the algorithm inserts $c_i$ into C.
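The insertion case can be sketched on a plaintext tree as follows. This is only an illustration under stated assumptions: an internal node is assumed to hold the element-wise minimum and maximum of its children, a fake leaf is modelled as a zero vector, and the names `Node`, `refresh`, `internal`, and `replace_leaf_vector` are hypothetical. Encryption of the collected nodes is omitted.

```python
class Node:
    def __init__(self, u_min, u_max, left=None, right=None):
        self.u_min, self.u_max = list(u_min), list(u_max)
        self.left, self.right, self.parent = left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self

def refresh(node):
    """Recompute an internal node's vectors from its two children (assumed rule)."""
    node.u_min = [min(a, b) for a, b in zip(node.left.u_min, node.right.u_min)]
    node.u_max = [max(a, b) for a, b in zip(node.left.u_max, node.right.u_max)]

def internal(left, right):
    node = Node([], [], left, right)
    refresh(node)
    return node

def replace_leaf_vector(leaf, vec):
    """Overwrite a (fake) leaf with a new document vector; return the touched nodes T_s."""
    leaf.u_min, leaf.u_max = list(vec), list(vec)
    touched = [leaf]
    node = leaf.parent
    while node is not None:  # walk the path from the leaf up to the root
        refresh(node)
        touched.append(node)
        node = node.parent
    return touched

# Four leaves; the last one is a fake leaf modelled as a zero vector.
leaves = [Node(v, v) for v in ([0.5, -0.2], [-0.4, 0.3], [0.2, 0.9], [0.0, 0.0])]
root = internal(internal(leaves[0], leaves[1]), internal(leaves[2], leaves[3]))

T_s = replace_leaf_vector(leaves[3], [-0.5, 1.0])
print(len(T_s))  # 3: the leaf plus the two internal nodes on its path
```

Only the nodes returned in `T_s` would need to be re-encrypted and sent to the server, which matches the O(log n) update path of a balanced tree.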

Note that after a period of insertion and deletion operations, the number of keywords in the dictionary will change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. In the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by “Word2vec” and set by the users. For example, if we set the dimension of the vector to 200, the dimension of each keyword’s vector is 200, and thus the dimensions of the vectors $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

34 SecurityAnalysis In this section we analyse the securityof the proposed SSE-DMKRS scheme according to theprivacy requirement introduced in Section 23

(1) Index and Trapdoor Privacy In the proposed schemeeach node u in the index tree and the query Q in thetrapdoor are encrypted by using the secure KNNalgorithm introduced in [32] )us the attackerscannot obtain the original vectors in the tree nodesand the query which means that the index andtrapdoor privacy are well protected

(2) Trapdoor Unlinkability In the trapdoor generationphase the query vector will be split randomlyMoreover the same keyword set Q will be extendedto bemultiple different semantic keyword setsQprime Sothe same query Q will be encrypted to be differenttrapdoors which means that the goal of the trapdoorunlinkability is achieved

(3) Keyword Privacy Since the index and the trapdoorare protected by the secure KNN algorithm theadversary cannot infer the plaintext informationfrom the index and the trapdoor under the knownciphertext model Considering that the known

background model is common in real-world appli-cations we will analyse the security of the proposedscheme under the known backgroundmodel For theTrapdoorGen algorithm the original query keywordsetQ is extended to a new setQprime Specifically for eachkeyword q inQ randomly choosing a number kprime thealgorithm chooses kprime semantic keywords related to qby utilizing the dictionary and inserts these keywordsinto the Qprime Suppose that each keyword is associatedwith N semantic keywords in the dictionary eachkeyword can generate 2N different keyword sets sinceeach semantic keyword can be chosen or not Forexample if a keyword q is associated with threesemantic keywords q1 q2 q3 then q can generate 23keyword sets q q q1 q q2 q q3 q q1 q2 qq1 q3 q q2 q3 and q q1 q2 q3 Since the queryQusually contains more than one keyword Q willgenerate more than 2N different semantic keywordsets According to this method the final similarityscore is obfuscated by these random semantic key-word sets As the analysis in [14 15] our scheme canprotect the keyword privacy under the knownbackground model

4 Proposed Scheme

In this section we analyse the proposed SSE-DMKRSscheme theoretically and experimentally A detailed ex-periment is given to demonstrate that our scheme can ef-ficiently perform dynamic ranked keywords search over theencrypted data Our experiment is run on Intelreg Coretrade i7CPU at a 290GHz processor and 16GB memory size and isbased on a real-world e-mail dataset called Enron e-maildataset [35] We mainly analyse the performance of ourscheme in two aspects (1) the efficiency of the proposedscheme including index building trapdoor generationsearch and update (2) the relationship between the searchprecision and the privacy level Moreover in order to showthe advantages of our scheme we also compare our schemeto two previous schemes related to our scheme For sim-plicity we denote these two schemes introduced in [14 15]by X15 and G19

4.1. Efficiency

4.1.1. Index Building. The process of index building mainly consists of two steps: (1) creating an unencrypted index tree by utilizing Algorithm 1; (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates $O(n)$ nodes based on the document set $F$. Because each node has two vectors $\vec{u}_{\min}$ and $\vec{u}_{\max}$ whose dimensions are both $d$, the vector splitting process needs $O(d)$ time and the matrix multiplication operations take $O(d \times d)$ time in the encryption step. Combining these two steps, the overall time complexity of index building is $O(nd^2)$, which means that the time cost for index building mainly depends on the number of documents in $F$ and the dimension of each node's vector.


Since the dimensions of each node's vector in X15 and G19 are both linear in the number of keywords in the dictionary ($m$), the time costs for index building in X15 and G19 are both $O(nm^2)$. Because $d \ll m$, the time cost for index building in our scheme is much less than that in X15 and G19. In addition, in the scheme G19, the internal nodes are constructed with a Bloom filter, and thus the dimension of each internal node's vector is linear in $b$. Since $b$ is usually smaller than $m$, the index building time in G19 is less than that in X15.

Figure 6(a) shows that the time cost for index building in our scheme is much less than that in X15 and G19. More precisely, when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the time consumption for index building in X15 and G19 is roughly 100 to 200 times that in our scheme. As $m$ increases, the advantage of our scheme becomes even more significant.

In addition, because the index tree has $O(n)$ nodes and each node holds two $d$-dimensional vectors, the space complexity of the index tree is $O(nd)$. By contrast, the space complexities of the index tree in X15 and G19 are both $O(nm)$. From Table 3, even if we set $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.

4.1.2. Trapdoor Generation. In our scheme, the query is converted into two vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ whose dimensions are both $d$. The trapdoor generation process multiplies these two vectors by the $d \times d$ matrices in the key, so the time complexity of trapdoor generation in our scheme is $O(d^2)$. By contrast, since the dimensions of the query vectors in X15 and G19 are both $m$, the time complexities of trapdoor generation are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when $n = 1000$, $m = 20000$, and $d = 1000$, the time cost for trapdoor generation in our scheme is 15 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.

4.1.3. Search. In the search process, if the relevance score of an internal node $u$ and the query $Q$ is less than the minimum relevance score of the current top-$k$ documents, the subtree rooted at node $u$ will not be accessed. Thus, not all of the nodes in the tree are accessed during the search process. We suppose that there are $\theta$ leaf nodes that contain at least one keyword in the query $Q$. Since the height of the tree is $O(\log n)$ and the time complexity of the relevance score calculation is $O(d)$, the time complexity of the search process is $O(\theta d \log n)$. For the scheme X15, because the time complexity of the relevance score calculation is $O(m)$, the time complexity of the search process is $O(\theta m \log n)$. For the scheme G19, because each internal node contains a Bloom filter whose size is $b$ and each leaf node involves a vector whose size is $m$, the time complexity of the search process is $O(\theta(m + b \log n))$. From Figure 6(c), when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the search time cost in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.
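The pruning rule described in this paragraph can be sketched on plaintext vectors as follows. This is only an illustration under stated assumptions: scores are computed as $\vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max}$ on unencrypted vectors, the `Node` class and helper names are hypothetical, and internal nodes are built as element-wise maxima of their children as a stand-in for the paper's own construction, so that an internal score upper-bounds its subtree for non-negative queries.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    u_min: List[int]
    u_max: List[int]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None  # document identifier, set only on leaf nodes


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score(node, q_min, q_max):
    return dot(node.u_min, q_min) + dot(node.u_max, q_max)


def kth_score(rlist, k):
    # rlist is a min-heap, so its root is the current k-th best score.
    return rlist[0][0] if len(rlist) >= k else float("-inf")


def search_index_tree(node, q_min, q_max, k, rlist):
    if node is None:
        return
    s = score(node, q_min, q_max)
    if s <= kth_score(rlist, k):
        return  # prune: nothing below this node can enter the top-k
    if node.fid is None:  # internal node: descend into both children
        search_index_tree(node.left, q_min, q_max, k, rlist)
        search_index_tree(node.right, q_min, q_max, k, rlist)
    elif len(rlist) >= k:
        heapq.heapreplace(rlist, (s, node.fid))  # drop the smallest, insert the new leaf
    else:
        heapq.heappush(rlist, (s, node.fid))


def top_k(root, q_min, q_max, k):
    rlist = []
    search_index_tree(root, q_min, q_max, k, rlist)
    return sorted(rlist, reverse=True)


def parent(left, right):
    # Assumed stand-in: element-wise maxima of the children's vectors.
    return Node([max(a, b) for a, b in zip(left.u_min, right.u_min)],
                [max(a, b) for a, b in zip(left.u_max, right.u_max)],
                left, right)


leaves = [Node([3, 0], [0, 2], fid="f1"), Node([1, 1], [1, 1], fid="f2"),
          Node([0, 2], [2, 0], fid="f3"), Node([2, 1], [1, 2], fid="f4")]
root = parent(parent(leaves[0], leaves[1]), parent(leaves[2], leaves[3]))
print(top_k(root, [1, 0], [0, 1], 2))  # [(5, 'f1'), (4, 'f4')]
```

In this toy run the leaf `f3` is skipped because its score does not exceed the current second-best score, which is the behaviour the complexity estimate relies on.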

4.1.4. Update. When the data owners want to insert or delete a document, they will not only insert or delete a leaf node but also update $O(\log n)$ internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(d^2 \log n)$. For the X15 scheme, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(m^2 \log n)$. For the G19 scheme, because the internal nodes are based on the Bloom filter, which is not encrypted, the time cost for updating the internal nodes can be ignored. Thus, the time complexity of an update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the time cost for updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.
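To compare the four complexity results of Section 4.1 side by side, the sketch below evaluates them with every constant factor dropped. The function name `asymptotic_costs`, the base-2 logarithm, and the value of $\theta$ are assumptions made for illustration; the outputs are operation counts, not timings, so they need not match the measured ratios in Figure 6.

```python
import math


def asymptotic_costs(n, m, d, b, theta):
    """Operation counts from Section 4.1 with all constant factors dropped."""
    log_n = math.log2(n)  # assumed base; the text only states O(log n)
    return {
        "index_building": {"ours": n * d**2, "X15": n * m**2, "G19": n * m**2},
        "trapdoor": {"ours": d**2, "X15": m**2, "G19": m**2},
        "search": {"ours": theta * d * log_n, "X15": theta * m * log_n,
                   "G19": theta * (m + b * log_n)},
        "update": {"ours": log_n * d**2, "X15": log_n * m**2, "G19": m**2},
    }


# theta = 50 is a hypothetical number of matching leaf nodes.
costs = asymptotic_costs(n=1000, m=20000, d=1000, b=10000, theta=50)
for operation, by_scheme in costs.items():
    ratio = by_scheme["X15"] / by_scheme["ours"]
    print(f"{operation}: X15/ours = {ratio:.0f}")
```

The quadratic terms give a ratio of $(m/d)^2$ for index building, trapdoor generation, and update, while search scales only linearly in $m/d$.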

4.2. Precision and Privacy. The search precision of our scheme is affected by a group of semantic keywords related to the original index and query keywords. We measure our scheme by adopting a metric called “precision” defined in [12]. The metric of precision is defined as follows:

$$P_k = \frac{k'}{k}, \tag{8}$$

where $k'$ is the number of real top-$k$ documents among the $k$ retrieved documents.

In addition, the semantic keywords in the index and query keyword sets disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through statistical information about the dataset. To measure the extent to which the relevance score is disturbed, we use the following equation, called “rank privacy” and introduced in [12]:

$$P'_k = \frac{\sum \left| r_i - r'_i \right|}{k^2}, \tag{9}$$

where $r_i$ is the rank number of the document $i$ in the retrieved top-$k$ documents and $r'_i$ is document $i$'s real rank number in the real ranked results.
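Equations (8) and (9) translate directly into code. The following is a minimal sketch: the function names and the toy document identifiers are hypothetical, ranks are assumed to start at 1, and `true_ranking` is assumed to contain every retrieved document.

```python
def precision(retrieved, true_ranking):
    """Equation (8): share of the k retrieved documents that are real top-k documents."""
    k = len(retrieved)
    k_prime = len(set(retrieved) & set(true_ranking[:k]))
    return k_prime / k


def rank_privacy(retrieved, true_ranking):
    """Equation (9): summed rank displacement of the retrieved documents over k squared."""
    k = len(retrieved)
    true_rank = {doc: rank for rank, doc in enumerate(true_ranking, start=1)}
    displacement = sum(abs(rank - true_rank[doc])
                       for rank, doc in enumerate(retrieved, start=1))
    return displacement / k**2


true_ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
retrieved = ["d2", "d1", "d5", "d3"]  # k = 4
print(precision(retrieved, true_ranking))     # 0.75
print(rank_privacy(retrieved, true_ranking))  # 0.3125
```

In this toy case three of the four retrieved documents belong to the real top-4, and the rank displacements 1, 1, 2, and 1 sum to 5, giving $5/16$.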

We compare our scheme with the schemes X15 and G19 in terms of “precision” and “rank privacy.” Note that an important parameter in the previous two schemes is a standard deviation $\sigma$, which is utilized to adjust the relevance score for the dummy keywords. In the comparison, we set $\sigma = 0.05$, a value commonly used in the previous schemes. Besides, in our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node's vector to 1000 ($d = 1000$). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as $k$ grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For the schemes X15 and G19, the precision also decreases and the rank privacy also increases as $k$ grows, so this characteristic is shared by all three schemes. Because the vector representations for the index tree and query in our scheme are deeply compressed, some statistical information in the index and the query is lost.


Figure 6: Impact of $m$ on the time cost of (a) index building, (b) trapdoor generation, (c) search, and (d) update ($n = 1000$, $d = 1000$, $b = 10000$, and $m \in \{12000, 14000, 16000, 18000, 20000\}$). Each panel plots time cost ($10^3$ ms) against dictionary size for the schemes X15, G19, and ours.

Table 3: Storage consumption of the index tree (MB).

Dictionary size    X15 [14]    G19 [15]    Vector dimension    Proposed
m = 12000          188         174         d = 200             7
m = 14000          219         190         d = 400             14
m = 16000          251         206         d = 600             20
m = 18000          283         222         d = 800             26
m = 20000          315         238         d = 1000            33


Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents ($n = 1000$, $d = 1000$, $b = 10000$, $m = 12000$, and $\sigma = 0.05$). Both panels plot percentages against the number of retrieved documents (10 to 50) for the schemes X15, G19, and ours.

Figure 8: Impact of $d$ on the time cost of (a) index building and (b) trapdoor generation, search, and update ($n = 1000$ and $d \in \{200, 400, 600, 800, 1000\}$). Both panels plot time cost (s) against vector size.


Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is correspondingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation ($d$), which we set in “Word2vec,” is an important parameter in our scheme. Next, we discuss the impact of $d$ on our scheme. The impact of $d$ on the efficiency of our scheme is given in Figure 8, which shows that the time costs of index building, trapdoor generation, search, and update all increase as $d$ grows. Besides, Figure 9 illustrates the impact of $d$ on the precision and rank privacy of our scheme. As $d$ increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy decreases gradually. These phenomena are all consistent with our previous theoretical analysis. So, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter $d$ to satisfy the requirements of different applications.

4.4. Discussion. From the experiment results, when $n = 1000$, $m = 20000$, $d = 200$, and $b = 10000$, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, all of which are much better than those of the previous schemes X15 and G19. This efficiency demonstrates that our scheme is well suited to practical applications, especially the mobile cloud setting, in which the clients have limited computation and storage resources.

The experiment results show that the precision of our scheme is lower than that of the previous two schemes, while the rank privacy is correspondingly higher. In addition, by using the “Word2vec” method, the vector representations used in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying “Word2Vec” to construct the vector representations of the documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme substantially reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions ($n = 1000$ and $d \in \{200, 400, 600, 800, 1000\}$). Both panels plot percentages against the number of retrieved documents (10 to 50), with one curve per vector size.


Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grant nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searching on encrypted data,” in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, “Secure data retrieval of outsourced data with complex query support,” in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalized search over encrypted outsourced data with efficiency improvement,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, “Secure indexes,” IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, “Efficient conjunctive keyword search on encrypted data storage system,” European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, “Achieving efficient conjunctive keyword searches over encrypted data,” Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, “Efficient similarity search over encrypted data,” in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+R: top-k retrieval from a confidential index,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., “Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking,” in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, “Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data,” IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., “Highly-scalable searchable symmetric encryption with support for boolean queries,” Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., “Dynamic searchable encryption in very-large databases: data structures and implementation,” in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., “Public key encryption with keyword search,” International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, “Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system,” Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, “Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group,” Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, “Novel multi-keyword search on encrypted data in the cloud,” IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, “Generating searchable public-key ciphertexts with hidden structures for fast keyword search,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, “Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, “A general transformation from KP-ABE to searchable encryption,” Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, “Attribute-based hybrid boolean keyword search over outsourced encrypted data,” IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, “Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries,” in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, “Efficient implementation of a proxy-based protocol for data sharing on the cloud,” in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, “Searchable public key encryption supporting semantic multi-keywords search,” IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., “Efficient estimation of word representations in vector space,” 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, “Secure kNN computation on encrypted databases,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, “Zerber: r-confidential indexing for distributed documents,” in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, “Enron E-mail dataset,” http://www.cs.cmu.edu/~enron.


(b) Note that each keyword in the dictionary is associated with a group of keywords semantically related to this keyword. For each keyword $q$ in $Q$, it randomly chooses $k'$ semantic keywords based on the dictionary and inserts these keywords into $Q'$, where $k'$ is chosen dynamically and $k' \in [1, N]$.

Then, based on $Q'$, the TrapdoorGen algorithm generates a pair of vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$ by adopting the method M3. After this, it generates two random vector pairs $\{Q'_{q_{\min}}, Q''_{q_{\min}}\}$ and $\{Q'_{q_{\max}}, Q''_{q_{\max}}\}$ for the vectors $\vec{q}_{\min}$ and $\vec{q}_{\max}$, respectively. This process is similar to the process in the IndexBuild algorithm and can be expressed as the following equations:

$$\left\{ \begin{array}{ll}
Q'_{q_{\min}}[i] = Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\max}}[i] = Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 0, \\
Q'_{q_{\min}}[i] + Q''_{q_{\min}}[i] = \vec{q}_{\min}[i], & \text{if } S[i] = 1, \\
Q'_{q_{\max}}[i] + Q''_{q_{\max}}[i] = \vec{q}_{\max}[i], & \text{if } S[i] = 1,
\end{array} \right\} \quad i \in [1, d]. \tag{6}$$

Finally, this algorithm generates $T_Q = \{M_{11}^{-1} Q'_{q_{\min}}, M_{12}^{-1} Q''_{q_{\min}}, M_{21}^{-1} Q'_{q_{\max}}, M_{22}^{-1} Q''_{q_{\max}}\}$ as the trapdoor for $Q$.

(v) Search ($sk$, $T_Q$, $I_T$): for each node $u$ in $I_T$, the algorithm computes

$$\begin{aligned}
I_u \cdot T_Q &= \left(M_{11}^{T} V'_{u_{\min}}\right) \cdot \left(M_{11}^{-1} Q'_{q_{\min}}\right) + \left(M_{12}^{T} V''_{u_{\min}}\right) \cdot \left(M_{12}^{-1} Q''_{q_{\min}}\right) \\
&\quad + \left(M_{21}^{T} V'_{u_{\max}}\right) \cdot \left(M_{21}^{-1} Q'_{q_{\max}}\right) + \left(M_{22}^{T} V''_{u_{\max}}\right) \cdot \left(M_{22}^{-1} Q''_{q_{\max}}\right) \\
&= V'_{u_{\min}} \cdot Q'_{q_{\min}} + V''_{u_{\min}} \cdot Q''_{q_{\min}} + V'_{u_{\max}} \cdot Q'_{q_{\max}} + V''_{u_{\max}} \cdot Q''_{q_{\max}} \\
&= \vec{u}_{\min} \cdot \vec{q}_{\min} + \vec{u}_{\max} \cdot \vec{q}_{\max} \\
&= \mathrm{Score}(u, Q).
\end{aligned} \tag{7}$$

According to equation (3), the relevance score calculated from the encrypted vector $I_u$ and the trapdoor $T_Q$ equals the value of Score($u$, $Q$). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform ranked search.
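The identity in equation (7) can be verified numerically with a small, self-contained sketch. Several choices here are assumptions made for illustration only: the dimension $d = 4$, small random integer matrices inverted exactly with `fractions.Fraction`, and the index-side splitting rule (additive shares where $S[i] = 0$), which is taken as the complement of the query-side rule in equation (6) because the IndexBuild equations are not part of this excerpt.

```python
import random
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def invert(m):
    """Gauss-Jordan inverse over exact fractions; raises ValueError if singular."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def random_invertible(d, rng):
    while True:  # retry until the random matrix happens to be invertible
        m = [[rng.randint(-5, 5) for _ in range(d)] for _ in range(d)]
        try:
            return m, invert(m)
        except ValueError:
            continue


def split(vec, s, split_bit, rng):
    """Random additive shares where s[i] == split_bit, identical copies elsewhere."""
    first, second = [], []
    for x, bit in zip(vec, s):
        if bit == split_bit:
            r = rng.randint(-10, 10)
            first.append(r)
            second.append(x - r)
        else:
            first.append(x)
            second.append(x)
    return first, second


def demo(seed=0):
    rng = random.Random(seed)
    d = 4
    s = [rng.randint(0, 1) for _ in range(d)]
    keys = [random_invertible(d, rng) for _ in range(4)]  # M11, M12, M21, M22
    u_min, u_max = [1, 0, 2, 3], [4, 1, 2, 5]
    q_min, q_max = [2, 1, 0, 1], [3, 1, 1, 2]
    # Index side: shares where S[i] == 0 (assumed complement of equation (6)).
    index_shares = split(u_min, s, 0, rng) + split(u_max, s, 0, rng)
    # Query side: shares where S[i] == 1, as in equation (6).
    query_shares = split(q_min, s, 1, rng) + split(q_max, s, 1, rng)
    encrypted_index = [mat_vec(transpose(m), v) for (m, _), v in zip(keys, index_shares)]
    trapdoor = [mat_vec(m_inv, q) for (_, m_inv), q in zip(keys, query_shares)]
    encrypted_score = sum(dot(i, t) for i, t in zip(encrypted_index, trapdoor))
    plain_score = dot(u_min, q_min) + dot(u_max, q_max)
    return encrypted_score, plain_score


print(demo())
```

Each term satisfies $(M^T V) \cdot (M^{-1} Q) = V^T M M^{-1} Q = V \cdot Q$, and the complementary splitting rules make the share products add up to $\vec{u} \cdot \vec{q}$ position by position, so both returned values agree.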

3.3. Dynamic Update Operations. Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, satisfying the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14, 15], the update algorithm is presented as follows:

(i) UpdateInfoGen ($sk$, $T_s$, $f_i$, $U_{type}$): this algorithm is executed by the data owners and generates the update information $\{I_s, c_i\}$ for the cloud server, where $T_s$ is a set containing all the updated nodes, $I_s$ is an encrypted form of $T_s$, $f_i$ is the target document, $c_i$ is an encrypted form of $f_i$, and $U_{type}$ is the update type. In order to reduce the communication cost, the data owners store the unencrypted index tree on their own devices. For $U_{type} \in \{\text{Ins}, \text{Del}\}$, the algorithm works as follows:

(a) If $U_{type}$ = “Del,” the algorithm deletes a document $f_i$ from the tree. The algorithm first finds the leaf node associated with the document $f_i$ and deletes it. In addition, the internal nodes associated with this leaf node are added to $T_s$. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can mark the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts $T_s$ to generate $I_s$. Finally, the algorithm sends $I_s$ to the cloud server and sets $c_i$ to null.

Input: A vector $\vec{q}$ of query $Q$, a semantic dictionary $D$ generated by applying “Word2Vec” to $F$, a root node $u$ of IndexTree, and RList
Output: RList

Split $\vec{q}$ into $\vec{q}_{\min}$ and $\vec{q}_{\max}$ according to the method M3
if $u$ is an internal node then
    if Score($u$, $Q$) > $k$-th score then
        SearchIndexTree($\vec{q}$, $D$, $u.P_l$, RList)
        SearchIndexTree($\vec{q}$, $D$, $u.P_r$, RList)
    else
        return
    end if
else
    if Score($u$, $Q$) > $k$-th score then    // Update RList
        Delete the element holding the smallest relevance score in RList
        Insert a new element <Score($u$, $Q$), $u.FID$> in RList and sort the elements in RList
    end if
    return
end if

ALGORITHM 2: SearchIndexTree (QueryVector $\vec{q}$, Dictionary $D$, TreeNode $u$, RList).


(b) If $U_{type}$ = “Ins,” the algorithm inserts a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes that lie on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update ($I_T$, $C$, $I_s$, $c_i$, $U_{type}$): this algorithm is executed by the cloud server to update the index tree $I_T$ with the encrypted node set $I_s$. After this, if $U_{type}$ = “Del,” the algorithm removes $c_i$ from $C$. Otherwise, the algorithm inserts $c_i$ into $C$.

Note that, after a period of insertion and deletion operations, the number of keywords in the dictionary may change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. In the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by the “Word2vec” tool and set by the users. For example, if we set the dimension of the vector to 200, the dimension of each keyword's vector is 200, and thus the dimensions of the vectors $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node $u$ in the index tree and the query $Q$ in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, the attackers cannot obtain the original vectors in the tree nodes and the query, which means that the index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set $Q$ is extended to multiple different semantic keyword sets $Q'$. So the same query $Q$ is encrypted into different trapdoors, which means that the goal of trapdoor unlinkability is achieved.

(3) Keyword Privacy Since the index and the trapdoorare protected by the secure KNN algorithm theadversary cannot infer the plaintext informationfrom the index and the trapdoor under the knownciphertext model Considering that the known

background model is common in real-world appli-cations we will analyse the security of the proposedscheme under the known backgroundmodel For theTrapdoorGen algorithm the original query keywordsetQ is extended to a new setQprime Specifically for eachkeyword q inQ randomly choosing a number kprime thealgorithm chooses kprime semantic keywords related to qby utilizing the dictionary and inserts these keywordsinto the Qprime Suppose that each keyword is associatedwith N semantic keywords in the dictionary eachkeyword can generate 2N different keyword sets sinceeach semantic keyword can be chosen or not Forexample if a keyword q is associated with threesemantic keywords q1 q2 q3 then q can generate 23keyword sets q q q1 q q2 q q3 q q1 q2 qq1 q3 q q2 q3 and q q1 q2 q3 Since the queryQusually contains more than one keyword Q willgenerate more than 2N different semantic keywordsets According to this method the final similarityscore is obfuscated by these random semantic key-word sets As the analysis in [14 15] our scheme canprotect the keyword privacy under the knownbackground model

4 Proposed Scheme

In this section we analyse the proposed SSE-DMKRSscheme theoretically and experimentally A detailed ex-periment is given to demonstrate that our scheme can ef-ficiently perform dynamic ranked keywords search over theencrypted data Our experiment is run on Intelreg Coretrade i7CPU at a 290GHz processor and 16GB memory size and isbased on a real-world e-mail dataset called Enron e-maildataset [35] We mainly analyse the performance of ourscheme in two aspects (1) the efficiency of the proposedscheme including index building trapdoor generationsearch and update (2) the relationship between the searchprecision and the privacy level Moreover in order to showthe advantages of our scheme we also compare our schemeto two previous schemes related to our scheme For sim-plicity we denote these two schemes introduced in [14 15]by X15 and G19

41 Efficiency

411 Index Building )e process of index building mainlyconsists of two steps (1) creating an unencrypted index treeby utilizing Algorithm 1 (2) encrypting each node in the treeby using the secure KNN scheme In the tree building stepAlgorithm 1 generates O (n) nodes based on the documentset F Because each node has two vectors umin

rarr umaxrarr whose

dimensions are both d the vector splitting process needs O(d) time and the matrix multiplication operations take O(d times d) time in the encryption step According to these twosteps the whole time complexity of index building isO (nd2)which means that the time cost for index building mainlydepends on the number of documents in F and the di-mension of each nodersquos vector

10 Security and Communication Networks

Since the dimensions of each nodersquos vector in X15 andG19 are both linear with the number of keywords in thedictionary (m) the time costs for index building in X15 andG19 are both O (nm2) Due to d≪m we can argue that thetime cost for index building in our scheme is much less thanthat in X15 and G19 In addition for the scheme G15 theinternal nodes are constructed by the tool called bloom filterand thus the dimension of each internal nodersquos vector islinear with b Since b is usually smaller than m the indexbuilding time in G19 is less than that in X15

Figure 6(a) shows that the time cost for index building inour scheme is much less than that in X15 and G19 Moreprecisely when n 1000m 20000 d 1000 and b 10000the time consumption for index building in X15 and G19 isnearly 100sim200 times that in our scheme respectively As mincreases the advantages of our scheme will become evenmore significant

In addition because the index tree has O (n) nodes andeach node holds two d-dimensional vectors the spacecomplexity of the index tree is O (nd) By contrast the spacecomplexities of the index tree in X15 and G19 are both O(nm) From Table 3 even if we set n 1000 m 20000d 1000 and b 10000 the storage cost of the index tree inour scheme is still much less than that in X15 and G19

412 Trapdoor Generation In our scheme the query isconverted to be two vectors qmin

rarr and qmaxrarr whose dimen-

sions are both d )e trapdoor generation process is tomultiply these two vectors by the d times d matrices in the keySo the time complexity of trapdoor generation in ourscheme is O (d2) By contrast since the dimensions of queryvectors in X15 and G19 are both m the time complexities oftrapdoor generation are both O (m2) )us the time cost oftrapdoor generation of our scheme is much less than that inX15 and G19 Particularly from Figure 6(b) when n 1000m 20000 and d 1000 the time cost for trapdoor gener-ation in our scheme is 15ms while that in G19 and X15 is287ms and 290ms respectively

413 Search In the search process if the relevance score ofan internal node u and the queryQ is less than the minimumrelevance score of the current top-k documents the subtreewhich uses node u as the root node will not be accessed)us not all of the nodes in the tree will be accessed duringthe search process We suppose that there are θ leaf nodesthat contain at least one keyword in the query Q Since theheight of the tree is O (log n) and the time complexity of therelevance score calculation is O (d) the time complexity ofthe search process is O (θ d middot log n) For the scheme X15because the time complexity of relevance score calculation isO (m) the time complexity of the search process is O(θm middot log n) in X15 For the scheme G19 because each in-ternal node contains a Bloom filter whose size is b and eachleaf node involves a vector whose size is m the timecomplexity of search process in G19 is O (θ(m + b middot log n))From Figure 6(c) when n 1000 m 20000 d 1000 andb 10000 the search time cost in our scheme is 36ms whilethat in G19 and X15 is 135ms and 214ms respectively

4.1.4. Update. When data owners insert or delete a document, they not only insert or delete a leaf node but also update $O(\log n)$ internal nodes. Since encrypting each node takes $O(d^2)$ time, an update operation takes $O(d^2 \log n)$ time. For X15, encrypting each node takes $O(m^2)$ time, so an update operation takes $O(m^2 \log n)$ time. For G19, the internal nodes are Bloom filters, which are not encrypted, so the cost of updating them can be ignored; only the leaf node is encrypted, and the time complexity of an update in G19 is $O(m^2)$. From Figure 6(d), when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, updating one document takes 16 ms in our scheme, versus 1020 ms in X15 and 107 ms in G19.
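To illustrate why an update touches only $O(\log n)$ nodes, here is a small sketch on an unencrypted tree stored in heap layout. It assumes the number of leaves is a power of two, one vector per node, and internal nodes equal to the element-wise maximum of their children; modelling a deletion as overwriting the leaf with an all-zero placeholder is also an assumption made for this example. `build` and `update_leaf` are hypothetical names, and in the real scheme each touched node would additionally be re-encrypted at $O(d^2)$ cost.

```python
def build(leaves):
    """Build a heap-layout tree: node i has children 2i and 2i + 1; index 0 is unused."""
    n = len(leaves)
    tree = [None] * (2 * n)
    tree[n:] = [list(v) for v in leaves]
    for i in range(n - 1, 0, -1):
        tree[i] = [max(a, b) for a, b in zip(tree[2 * i], tree[2 * i + 1])]
    return tree


def update_leaf(tree, index, vector):
    """Replace one leaf and recompute its ancestors; return how many nodes changed."""
    n = len(tree) // 2
    i = n + index
    tree[i] = list(vector)
    touched = 1
    i //= 2
    while i >= 1:
        tree[i] = [max(a, b) for a, b in zip(tree[2 * i], tree[2 * i + 1])]
        touched += 1
        i //= 2
    return touched
```

For eight leaves, a single update rewrites four nodes (the leaf plus three ancestors) out of fifteen.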

4.2. Precision and Privacy. The search precision of our scheme is affected by the group of semantic keywords related to the original index and query keywords. We evaluate our scheme with the “precision” metric defined in [12]:

$$P_k = \frac{k'}{k}, \tag{8}$$

where $k'$ is the number of real top-$k$ documents among the $k$ retrieved documents.
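As a small illustration of equation (8), the metric can be computed from two ranked lists of document identifiers. The function name `precision_at_k` and the list-based interface are assumptions made for this sketch.

```python
def precision_at_k(retrieved, true_ranking, k):
    """P_k = k' / k, where k' counts retrieved documents that are in the real top-k."""
    true_top_k = set(true_ranking[:k])
    hits = sum(1 for doc in retrieved[:k] if doc in true_top_k)
    return hits / k
```

For example, if three of the four retrieved documents belong to the real top-4, the precision is 0.75.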

In addition, the semantic keywords in the index and the query keyword set perturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through statistical information about the dataset. To quantify this obscureness, we use the “rank privacy” measure introduced in [12]:

$$P'_k = \frac{\sum_{i=1}^{k} \left| r_i - r'_i \right|}{k^2}, \tag{9}$$

where $r_i$ is the rank of document $i$ in the retrieved top-$k$ documents and $r'_i$ is the real rank of document $i$ in the real ranked results.
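Equation (9) can be sketched in the same style. The function name `rank_privacy` is hypothetical, ranks are taken to start at 1, and every retrieved document is assumed to appear somewhere in the real ranking.

```python
def rank_privacy(retrieved, true_ranking, k):
    """P'_k = sum(|r_i - r'_i|) / k^2 over the retrieved top-k documents."""
    true_rank = {doc: pos for pos, doc in enumerate(true_ranking, start=1)}
    total = sum(abs(r - true_rank[doc])
                for r, doc in enumerate(retrieved[:k], start=1))
    return total / (k * k)
```

A retrieved list that matches the real ranking exactly gives 0; the more the returned order deviates from the real one, the larger the value.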

We compare our scheme with X15 and G19 in terms of precision and rank privacy. An important parameter in the two previous schemes is the standard deviation $\sigma$, which is used to adjust the relevance scores of the dummy keywords. In the comparison, we set $\sigma = 0.05$, a value commonly used in the previous schemes. In our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node’s vector to 1000 ($d = 1000$). Based on these settings, the comparison is illustrated in Figure 7.

From Figure 7, as $k$ grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For X15 and G19, the precision also decreases and the rank privacy also increases as $k$ grows, so this trend holds for all three schemes. Because the vector representations of the index tree and the query in our scheme are heavily compressed, some statistical information in the index and the query is lost.

Figure 6: Impact of $m$ on the time cost of (a) index building, (b) trapdoor generation, (c) search, and (d) update ($n = 1000$, $d = 1000$, $b = 10000$, and $m \in \{12000, 14000, 16000, 18000, 20000\}$). Each panel plots the time cost ($10^3$ ms) against the dictionary size for X15, G19, and our scheme.

Table 3: Storage consumption of the index tree (MB).

Dictionary size   X15 [14]   G19 [15]   Vector dimension   Proposed
m = 12000         188        174        d = 200            7
m = 14000         219        190        d = 400            14
m = 16000         251        206        d = 600            20
m = 18000         283        222        d = 800            26
m = 20000         315        238        d = 1000           33

Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents ($n = 1000$, $d = 1000$, $b = 10000$, $m = 12000$, and $\sigma = 0.05$). Each panel plots the metric (%) against the number of retrieved documents (10 to 50) for X15, G19, and our scheme.

Figure 8: Impact of $d$ on the time cost of (a) index building and (b) trapdoor generation, search, and update ($n = 1000$ and $d \in \{200, 400, 600, 800, 1000\}$). Both panels plot the time cost (s) against the vector size.

Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is correspondingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension $d$ of the vector representation, which we set in “Word2vec”, is an important parameter in our scheme, and we now discuss its impact. Figure 8 shows the impact of $d$ on efficiency: the time costs of index building, trapdoor generation, search, and update all increase as $d$ grows. Figure 9 illustrates the impact of $d$ on the precision and rank privacy of our scheme: as $d$ increases from 200 to 1000, the precision increases slightly while the rank privacy decreases gradually. These observations are consistent with our theoretical analysis. Therefore, in the proposed scheme, data users can balance efficiency and accuracy by adjusting $d$ to meet the requirements of different applications.

4.4. Discussion. The experimental results show that when $n = 1000$, $m = 20000$, $d = 200$, and $b = 10000$, index building takes 3 s, generating a single trapdoor takes 15 ms, and a search takes 36 ms, all of which are much better than in the previous schemes X15 and G19. This efficiency makes our scheme well suited to practical applications, especially the mobile cloud setting, in which clients have limited computation and storage resources.

The experimental results also show that the precision of our scheme is lower than that of the two previous schemes, while its rank privacy is correspondingly higher. In addition, because the vector representations used in our scheme are produced by “Word2vec”, they carry the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying “Word2vec” to construct the vector representations of the documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme greatly reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Since the precision of our scheme can be further improved, in future work we will construct a more accurate scheme based on recent information retrieval techniques.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions ($n = 1000$ and $d \in \{200, 400, 600, 800, 1000\}$). Each panel plots the metric (%) against the number of retrieved documents (10 to 50), with one curve per vector size.


Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searching on encrypted data,” in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, “Secure data retrieval of outsourced data with complex query support,” in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalized search over encrypted outsourced data with efficiency improvement,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, “Secure indexes,” IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, “Efficient conjunctive keyword search on encrypted data storage system,” European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, “Achieving efficient conjunctive keyword searches over encrypted data,” Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, “Efficient similarity search over encrypted data,” in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+R: top-k retrieval from a confidential index,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., “Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking,” in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, “Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data,” IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., “Highly-scalable searchable symmetric encryption with support for boolean queries,” Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., “Dynamic searchable encryption in very-large databases: data structures and implementation,” in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., “Public key encryption with keyword search,” International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, “Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system,” Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, “Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group,” Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, “Novel multi-keyword search on encrypted data in the cloud,” IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, “Generating searchable public-key ciphertexts with hidden structures for fast keyword search,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, “Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, “A general transformation from KP-ABE to searchable encryption,” Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, “Attribute-based hybrid boolean keyword search over outsourced encrypted data,” IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, “Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries,” in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, “Efficient implementation of a proxy-based protocol for data sharing on the cloud,” in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, “Searchable public key encryption supporting semantic multi-keywords search,” IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., “Efficient estimation of word representations in vector space,” 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, “Secure kNN computation on encrypted databases,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, “Zerber: r-confidential indexing for distributed documents,” in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, “Enron E-mail dataset,” http://www.cs.cmu.edu/~enron.


(b) If Utype = “Ins”, the algorithm inserts a document $f_i$ into the tree. The algorithm first creates a leaf node for $f_i$ according to the method M1 introduced in Section 3.1 and inserts this leaf node into $T_s$. Then, based on the method M2, the algorithm updates the vectors of the internal nodes on the path from the root to the new leaf node and inserts these internal nodes into $T_s$. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a new leaf node. Finally, the algorithm encrypts $T_s$ and $f_i$ to generate $I_s$ and $c_i$, respectively, and sends them to the cloud server.

(ii) Update($I_T$, $C$, $I_s$, $c_i$, Utype): this algorithm is executed by the cloud server to update the index tree $I_T$ with the set of encrypted nodes $I_s$. After this, if Utype = “Del”, the algorithm removes $c_i$ from $C$; otherwise, the algorithm inserts $c_i$ into $C$.

Note that after a period of insertion and deletion operations, the number of keywords in the dictionary may have changed. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, these schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem: the dimensions of the vectors in the index and trapdoor are determined by “Word2vec” and set by the users. For example, if we set the vector dimension to 200, each keyword’s vector has dimension 200, and thus the vectors $\vec{u}_{\min}$, $\vec{u}_{\max}$, $\vec{q}_{\min}$, and $\vec{q}_{\max}$ all have dimension 200. Consequently, our scheme is better suited to update operations than the previous schemes.

3.4. Security Analysis. In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3.

(1) Index and Trapdoor Privacy. In the proposed scheme, each node $u$ in the index tree and the query $Q$ in the trapdoor are encrypted by using the secure KNN algorithm introduced in [32]. Thus, attackers cannot obtain the original vectors in the tree nodes and the query, which means that the index and trapdoor privacy are well protected.

(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set $Q$ is extended to multiple different semantic keyword sets $Q'$. So the same query $Q$ is encrypted into different trapdoors, which means that trapdoor unlinkability is achieved.

(3) Keyword Privacy. Since the index and the trapdoor are protected by the secure KNN algorithm, the adversary cannot infer plaintext information from the index and the trapdoor under the known ciphertext model. Considering that the known background model is common in real-world applications, we analyse the security of the proposed scheme under the known background model. In the TrapdoorGen algorithm, the original query keyword set $Q$ is extended to a new set $Q'$. Specifically, for each keyword $q$ in $Q$, the algorithm randomly chooses a number $k'$, selects $k'$ semantic keywords related to $q$ from the dictionary, and inserts these keywords into $Q'$. Suppose that each keyword is associated with $N$ semantic keywords in the dictionary; then each keyword can generate $2^N$ different keyword sets, since each semantic keyword can be chosen or not. For example, if a keyword $q$ is associated with three semantic keywords $q_1$, $q_2$, $q_3$, then $q$ can generate $2^3 = 8$ keyword sets: $\{q\}$, $\{q, q_1\}$, $\{q, q_2\}$, $\{q, q_3\}$, $\{q, q_1, q_2\}$, $\{q, q_1, q_3\}$, $\{q, q_2, q_3\}$, and $\{q, q_1, q_2, q_3\}$. Since the query $Q$ usually contains more than one keyword, $Q$ can generate more than $2^N$ different semantic keyword sets. In this way, the final similarity score is obfuscated by these random semantic keyword sets. As in the analysis in [14, 15], our scheme can protect keyword privacy under the known background model.
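The counting argument in item (3) can be checked with a short sketch. The function names are hypothetical, and drawing $k'$ uniformly from $0$ to $N$ is an assumption, since the text only says that $k'$ is chosen at random.

```python
import random
from itertools import combinations


def all_extended_sets(keyword, semantic_keywords):
    """Enumerate every set formed by the keyword plus any subset of its semantic keywords."""
    sets = []
    for size in range(len(semantic_keywords) + 1):
        for chosen in combinations(semantic_keywords, size):
            sets.append({keyword, *chosen})
    return sets


def random_extended_set(keyword, semantic_keywords, rng):
    """Pick k' at random, then add k' randomly chosen semantic keywords."""
    k_prime = rng.randint(0, len(semantic_keywords))
    return {keyword, *rng.sample(semantic_keywords, k_prime)}
```

For a keyword with three semantic keywords, `all_extended_sets` returns the eight sets listed in the example above, and each call to `random_extended_set` returns one of them.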

4. Proposed Scheme

In this section, we analyse the proposed SSE-DMKRS scheme theoretically and experimentally. A detailed experiment demonstrates that our scheme can efficiently perform dynamic ranked keyword search over encrypted data. Our experiment runs on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and is based on a real-world e-mail dataset, the Enron e-mail dataset [35]. We analyse the performance of our scheme in two aspects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update; (2) the relationship between the search precision and the privacy level. Moreover, to show the advantages of our scheme, we compare it with two related previous schemes. For simplicity, we denote the schemes introduced in [14, 15] by X15 and G19, respectively.

4.1. Efficiency

4.1.1. Index Building. The process of index building consists mainly of two steps: (1) creating an unencrypted index tree by using Algorithm 1; (2) encrypting each node in the tree by using the secure KNN scheme. In the tree building step, Algorithm 1 generates $O(n)$ nodes based on the document set $F$. Because each node has two vectors, $\vec{u}_{\min}$ and $\vec{u}_{\max}$, both of dimension $d$, in the encryption step the vector splitting process needs $O(d)$ time and the matrix multiplication operations take $O(d \times d)$ time per node. Combining these two steps, the overall time complexity of index building is $O(nd^2)$, which means that the time cost of index building depends mainly on the number of documents in $F$ and the dimension of each node’s vector.
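The tree building step can be sketched as a bottom-up pairing of nodes. This is not Algorithm 1 itself: it assumes that an internal node’s two vectors are the element-wise minimum and maximum of its children’s vectors, which is only a guess at the construction rule, and `Node` and `build_index_tree` are hypothetical names. The point of the sketch is the node count: $n$ documents yield $2n - 1$ nodes, which is the $O(n)$ bound used above.

```python
class Node:
    def __init__(self, u_min, u_max, left=None, right=None, doc_id=None):
        self.u_min = u_min
        self.u_max = u_max
        self.left = left
        self.right = right
        self.doc_id = doc_id


def build_index_tree(doc_vectors):
    """Pair nodes level by level; return the root and the total number of nodes.

    doc_vectors must be a non-empty list of equal-length vectors.
    """
    level = [Node(list(v), list(v), doc_id=i) for i, v in enumerate(doc_vectors)]
    count = len(level)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            left, right = level[i], level[i + 1]
            u_min = [min(a, b) for a, b in zip(left.u_min, right.u_min)]
            u_max = [max(a, b) for a, b in zip(left.u_max, right.u_max)]
            parents.append(Node(u_min, u_max, left, right))
            count += 1
        if len(level) % 2 == 1:
            parents.append(level[-1])  # an unpaired node moves up unchanged
        level = parents
    return level[0], count
```

Each of the `count` nodes would then be encrypted with the secure KNN scheme, which is where the $d^2$ factor in $O(nd^2)$ comes from.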


Since the dimension of each node’s vector in X15 and G19 is linear in the number of keywords in the dictionary ($m$), the time costs of index building in X15 and G19 are both $O(nm^2)$. Because $d \ll m$, the time cost of index building in our scheme is much lower than that in X15 and G19. In addition, in G19 the internal nodes are constructed with Bloom filters, and thus the dimension of each internal node’s vector is linear in $b$. Since $b$ is usually smaller than $m$, the index building time in G19 is less than that in X15.
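As a rough worked example based only on the stated complexities (constant factors and lower-order terms are ignored, so measured ratios can differ considerably), the asymptotic gap between $O(nm^2)$ and $O(nd^2)$ for parameter values used in the experiments is:

```latex
\frac{n m^{2}}{n d^{2}} = \left(\frac{m}{d}\right)^{2}
  = \left(\frac{20000}{1000}\right)^{2} = 400
  \quad (m = 20000,\ d = 1000),
\qquad
\left(\frac{20000}{200}\right)^{2} = 10000
  \quad (m = 20000,\ d = 200).
```

The ratio does not depend on $n$, and it grows quadratically as the dictionary size $m$ increases for a fixed $d$.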

Figure 6(a) shows that the time cost of index building in our scheme is much lower than that in X15 and G19. More precisely, when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the index building time in X15 and G19 is roughly 100 to 200 times that in our scheme. As $m$ increases, the advantage of our scheme becomes even more significant.


[9] M Kuzu M S Islam and M Kantarcioglu ldquoEfficient simi-larity search over encrypted datardquo in Proceedings of the 2012IEEE 28th International Conference on Data Engineeringpp 1156ndash1167 Washington DC USA April 2012

[10] S Zerr D Olmedilla W Nejdl and W Siberski ldquoZerber + rtop-k retrieval from a confidential indexrdquo in Proceedings of the12th International Conference on Extending Database Tech-nology Advances in Database Technology pp 439ndash449 SaintPetersburg Russia March 2009

[11] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[12] N Cao C Wang M Li et al ldquoPrivacy-preserving multi-keyword ranked search over encrypted cloud datardquo IEEETransactions on Parallel and Distributed Systems vol 25 no 1pp 222ndash233 2013

[13] W Sun B Wang N Cao et al ldquoPrivacy-preserving multi-keyword text search in the cloud supporting similarity-basedrankingrdquo in Proceedings of the 8th ACM SIGSAC Symposiumon Information Computer and Communications Securitypp 71ndash82 Hangzhou China 2013

[14] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[15] C Guo R Zhuang C-C Chang and Q Yuan ldquoDynamicmulti-keyword ranked search based on bloom filter overencrypted cloud datardquo IEEE Access vol 7 pp 35826ndash358372019

[16] D Cash S Jarecki C Jutla et al ldquoHighly-scalable searchablesymmetric encryption with support for boolean queriesrdquoAnnual Cryptology Conference Springer Berlin Germanypp 353ndash373 2013

[17] D Cash J Jaeger S Jarecki et al ldquoDynamic searchableencryption in very-large databases data structures andimplementationrdquo in Proceedings of the Network and Dis-tributed System Security Symposium pp 23ndash26 San DiegoCA USA February 2014

[18] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquo Communications of the ACM vol 13 no 7pp 422ndash426 1970

[19] D Boneh G D Crescenzo R Ostrovsky et al ldquoPublic keyencryption with keyword searchrdquo International Conference onthe 2eory and Applications of Cryptographic Techniquespp 506ndash522 Springer Berlin Germany 2004

[20] Y Zhang Y Li and Y Wang ldquoConjunctive and disjunctivekeyword search over encrypted mobile cloud data in publickey systemrdquoMobile Information Systems vol 2018 Article ID3839254 11 pages 2018

[21] J Katz A Sahai and B Waters ldquoPredicate encryption sup-porting disjunctions polynomial equations and innerproductsrdquo Advances in CryptologyndashEUROCRYPT 2008pp 146ndash162 Springer Berlin Germany 2008

[22] Y Zhang Y Li and Y Wang ldquoSecure and efficient searchablepublic key encryption for resource constrained environmentbased on pairings under prime order grouprdquo Security andCommunication Networks vol 2019 Article ID 528080614 pages 2019

[23] Y Wu J Hou J Liu W Zhou and S Yao ldquoNovel multi-keyword search on encrypted data in the cloudrdquo IEEE Accessvol 7 pp 31984ndash31996 2019

[24] P Xu Q Wu W Wang W Susilo J Domingo-Ferrer andH Jin ldquoGenerating searchable public-key ciphertexts withhidden structures for fast keyword searchrdquo IEEE Transactionson Information Forensics and Security vol 10 no 9pp 1993ndash2006 2017

[25] P Xu S He W Wang W Susilo and H Jin ldquoLightweightsearchable public-key encryption for cloud-assisted wirelesssensor networksrdquo IEEE Transactions on Industrial Infor-matics vol 14 no 8 pp 3712ndash3723 2017

[26] F Han J Qin H Zhao and J Hu ldquoA general transformationfrom KP-ABE to searchable encryptionrdquo Future GenerationComputer Systems vol 30 pp 107ndash115 2014

[27] H Kai G Jun W Jian J Weng J K Liu and X Yi ldquoAt-tribute-based hybrid boolean keyword search over outsourcedencrypted datardquo IEEE Transactions on Dependable and SecureComputing p 1 2018

[28] M Sepehri S Cimato E Damiani and C Y Yeun ldquoDatasharing on the cloud a scalable proxy-based protocol forprivacy-preserving queriesrdquo in Proceedings of the 2015 IEEE

Security and Communication Networks 15

TrustcomBigDataSEISPA pp 1357ndash1362 Helsinki FinlandAugust 2015

[29] M Sepehri S Cimato and E Damiani ldquoEfficient imple-mentation of a proxy-based protocol for data sharing on thecloudrdquo in Proceedings of the Fifth ACM InternationalWorkshop on Security in Cloud Computing pp 67ndash74 NewYork NY USA April 2017

[30] Y Zhang Y Wang and Y Li ldquoSearchable public key en-cryption supporting semantic multi-keywords searchrdquo IEEEAccess vol 7 pp 122078ndash122090 2019

[31] T Mikolov K Chen G Corrado et al ldquoEfficient estimation ofword representations in vector spacerdquo 2013 httpsarxivorgabs13013781

[32] W K Wong D W-L Cheung B Kao and N MamoulisldquoSecure kNN computation on encrypted databasesrdquo in Pro-ceedings of the 2009 ACM SIGMOD International Conferenceon Management of Data pp 139ndash152 New York NY USA2009

[33] S Zerr E Demidova D Olmedilla W Nejdl M Winslettand S Mitra ldquoZerber r-confidential indexing for distributeddocumentsrdquo in Proceedings of the 11th International Con-ference on Extending Database Technology Advances in Da-tabase Technology pp 287ndash298 Nantes France March 2008

[34] C D Manning P Raghavan and H SchAtildeijtze Introductionto Information Retrieval Cambridge University Press Cam-bridge UK 2008

[35] W W Cohen ldquoEnron E-mail datasetrdquo httpwwwcscmuedusimenron

16 Security and Communication Networks

Page 11: EfficientSearchableSymmetricEncryptionSupportingDynamic ...downloads.hindawi.com/journals/scn/2020/7298518.pdf · ResearchArticle EfficientSearchableSymmetricEncryptionSupportingDynamic

Since the dimensions of each node's vector in X15 and G19 are both linear in the number of keywords in the dictionary ($m$), the time costs for index building in X15 and G19 are both $O(nm^2)$. Because $d \ll m$, the time cost for index building in our scheme is much less than that in X15 and G19. In addition, in G19 the internal nodes are constructed with a Bloom filter, so the dimension of each internal node's vector is linear in $b$. Since $b$ is usually smaller than $m$, the index building time in G19 is less than that in X15.

Figure 6(a) shows that the time cost for index building in our scheme is much less than that in X15 and G19. More precisely, when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the time consumption for index building in X15 and G19 is roughly 100 to 200 times that in our scheme. As $m$ increases, the advantage of our scheme becomes even more significant.

In addition, because the index tree has $O(n)$ nodes and each node holds two $d$-dimensional vectors, the space complexity of the index tree is $O(nd)$. By contrast, the space complexities of the index tree in X15 and G19 are both $O(nm)$. As Table 3 shows, even with $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.
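As a rough illustration of the $O(nd)$ versus $O(nm)$ gap, the sketch below counts stored vector components. It assumes a full binary tree with one leaf per document (so $2n - 1$ nodes) and, purely for comparison, the same two-vectors-per-node layout for an $m$-dimensional baseline. The function names are hypothetical, and the counts are not meant to reproduce the measured sizes in Table 3.

```python
def tree_node_count(n_docs: int) -> int:
    """Nodes in a full binary tree with one leaf per document (assumed layout)."""
    if n_docs < 1:
        raise ValueError("n_docs must be positive")
    return 2 * n_docs - 1


def index_entries(n_docs: int, dim: int, vectors_per_node: int = 2) -> int:
    """Number of stored vector components, which grows as O(n * dim)."""
    return tree_node_count(n_docs) * vectors_per_node * dim


n = 1000
ours = index_entries(n, dim=1000)       # d-dimensional vectors
baseline = index_entries(n, dim=20000)  # m-dimensional vectors, same layout assumed
print(ours, baseline, baseline // ours)  # 3998000 79960000 20
```

Under these assumptions the component count scales with the ratio $m/d$; actual storage also depends on per-scheme details that this count ignores.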

4.1.2. Trapdoor Generation. In our scheme, the query is converted into two vectors, $\vec{q}_{\min}$ and $\vec{q}_{\max}$, both of dimension $d$. Trapdoor generation multiplies these two vectors by the $d \times d$ matrices in the key, so its time complexity in our scheme is $O(d^2)$. By contrast, since the query vectors in X15 and G19 both have dimension $m$, their trapdoor generation time complexities are both $O(m^2)$. Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, from Figure 6(b), when $n = 1000$, $m = 20000$, and $d = 1000$, the time cost for trapdoor generation in our scheme is 15 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.
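The quadratic cost comes from the matrix-vector products. The following minimal sketch shows that step only; the names `mat_vec`, `make_trapdoor`, `key_a`, and `key_b` are hypothetical, and which key matrix is applied to which query vector is an assumption made for illustration rather than the scheme's actual key layout.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a d x d matrix by a length-d vector (d * d scalar multiplications)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("matrix and vector dimensions do not match")
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def make_trapdoor(q_min: Vector, q_max: Vector,
                  key_a: Matrix, key_b: Matrix) -> Tuple[Vector, Vector]:
    """Transform both query vectors with the secret matrices (assumed pairing)."""
    return mat_vec(key_a, q_min), mat_vec(key_b, q_max)


# Toy example with d = 2; the matrices are chosen only for readability.
key_a = [[2.0, 0.0], [0.0, 3.0]]
key_b = [[1.0, 1.0], [0.0, 1.0]]
trapdoor = make_trapdoor([1.0, 2.0], [3.0, 4.0], key_a, key_b)
print(trapdoor)  # ([2.0, 6.0], [7.0, 4.0])
```

Each call to `mat_vec` performs $d^2$ multiplications, so shrinking the vector dimension from $m$ to $d$ reduces the work by a factor of about $(m/d)^2$.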

4.1.3. Search. In the search process, if the relevance score between an internal node $u$ and the query $Q$ is less than the minimum relevance score of the current top-$k$ documents, the subtree rooted at $u$ is not accessed. Thus, not all of the nodes in the tree are visited during a search. Suppose that $\theta$ leaf nodes contain at least one keyword in the query $Q$. Since the height of the tree is $O(\log n)$ and the time complexity of one relevance score calculation is $O(d)$, the time complexity of the search process is $O(\theta d \cdot \log n)$. For X15, because the time complexity of relevance score calculation is $O(m)$, the time complexity of the search process is $O(\theta m \cdot \log n)$. For G19, because each internal node contains a Bloom filter of size $b$ and each leaf node involves a vector of size $m$, the time complexity of the search process is $O(\theta(m + b \cdot \log n))$. From Figure 6(c), when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the search time cost in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.
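The pruning rule can be sketched on plaintext data as follows. This is an illustration under simplifying assumptions, not the encrypted scheme: vectors and queries are nonnegative, and each internal node stores the element-wise maximum of its children, so its inner product with the query upper-bounds the score of every leaf below it. All names (`Node`, `build`, `top_k`) are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    vec: List[float]               # leaf: document vector; internal: max of children
    doc_id: Optional[int] = None   # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def build(docs: List[Tuple[int, List[float]]]) -> Node:
    """Pair nodes level by level until a single root remains."""
    if not docs:
        raise ValueError("at least one document is required")
    nodes = [Node(vec=v, doc_id=i) for i, v in docs]
    while len(nodes) > 1:
        nxt = []
        for j in range(0, len(nodes) - 1, 2):
            l, r = nodes[j], nodes[j + 1]
            bound = [max(a, b) for a, b in zip(l.vec, r.vec)]
            nxt.append(Node(vec=bound, left=l, right=r))
        if len(nodes) % 2:
            nxt.append(nodes[-1])  # odd node is carried up unchanged
        nodes = nxt
    return nodes[0]


def top_k(root: Node, query: List[float], k: int):
    heap: List[Tuple[float, int]] = []  # min-heap of (score, doc_id)
    visited = 0

    def visit(node: Node) -> None:
        nonlocal visited
        visited += 1
        score = dot(node.vec, query)
        if len(heap) == k and score < heap[0][0]:
            return  # prune: nothing below can enter the current top-k
        if node.doc_id is not None:
            if len(heap) < k:
                heapq.heappush(heap, (score, node.doc_id))
            else:
                heapq.heappushpop(heap, (score, node.doc_id))
            return
        # Visit the more promising child first so the threshold rises early.
        for child in sorted((node.left, node.right),
                            key=lambda c: dot(c.vec, query), reverse=True):
            visit(child)

    visit(root)
    return sorted(heap, reverse=True), visited


docs = [(0, [1.0, 0.0]), (1, [0.0, 1.0]), (2, [0.9, 0.1]), (3, [0.1, 0.2])]
results, _ = top_k(build(docs), [1.0, 0.0], k=2)
print([doc_id for _, doc_id in results])  # [0, 2]
```

Visiting the higher-scoring child first is a design choice of this sketch: it raises the top-$k$ threshold sooner, so more subtrees fail the bound test and are skipped.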

4.1.4. Update. When data owners want to insert or delete a document, they must not only insert or delete a leaf node but also update $O(\log n)$ internal nodes. Since the encryption time for each node is $O(d^2)$, the time complexity of an update operation is $O(\log n \cdot d^2)$. For X15, because the encryption time for each node is $O(m^2)$, the time complexity of an update operation is $O(\log n \cdot m^2)$. For G19, because the internal nodes are based on Bloom filters, which are not encrypted, the time cost of updating the internal nodes can be ignored. Thus, the time complexity of an update in G19 is $O(m^2)$, since only the leaf node is encrypted. From Figure 6(d), when $n = 1000$, $m = 20000$, $d = 1000$, and $b = 10000$, the time cost for updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.
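A small cost model makes the three update complexities easier to compare. It assumes a balanced tree of height $\lceil \log_2 n \rceil$ and charges one $\text{dim}^2$ unit per re-encrypted node; constants and the unencrypted Bloom filter work are ignored, so the output indicates growth rates only, not the measured timings in Figure 6(d). The function names are hypothetical.

```python
import math


def nodes_touched(n_docs: int, encrypt_internal: bool = True) -> int:
    """The leaf plus, optionally, the internal nodes on its path to the root."""
    height = math.ceil(math.log2(n_docs)) if n_docs > 1 else 0
    return 1 + (height if encrypt_internal else 0)


def update_cost(n_docs: int, dim: int, encrypt_internal: bool = True) -> int:
    """Scalar multiplications, assuming dim * dim per re-encrypted node."""
    return nodes_touched(n_docs, encrypt_internal) * dim * dim


n = 1000
print(update_cost(n, 1000))          # d = 1000, path re-encrypted: 11000000
print(update_cost(n, 20000))         # m = 20000, path re-encrypted: 4400000000
print(update_cost(n, 20000, False))  # m = 20000, leaf only: 400000000
```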

4.2. Precision and Privacy. The search precision of our scheme is affected by a group of semantic keywords related to the original index and query keywords. We evaluate our scheme with the metric called “precision” defined in [12]:

$$P_k = \frac{k'}{k}, \tag{8}$$

where $k'$ is the number of real top-$k$ documents among the $k$ retrieved documents.
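Equation (8) can be computed directly from two ranked lists. In the sketch below, `retrieved` is the list returned by the scheme and `true_ranking` is the plaintext ranking; both names, and the function name, are illustrative.

```python
from typing import Hashable, Sequence


def precision_at_k(retrieved: Sequence[Hashable],
                   true_ranking: Sequence[Hashable], k: int) -> float:
    """P_k = k' / k, where k' counts retrieved documents that are real top-k documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    true_top_k = set(true_ranking[:k])
    hits = sum(1 for doc in retrieved[:k] if doc in true_top_k)
    return hits / k


# Two of the four retrieved documents (1 and 2) are in the real top-4.
print(precision_at_k([1, 2, 3, 4], [2, 1, 5, 6], k=4))  # 0.5
```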

In addition, the semantic keywords in the index and query keyword sets disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor from statistical information about the dataset. To measure the extent of this disturbance, we use the “rank privacy” metric introduced in [12]:

$$P'_k = \frac{\sum |r_i - r'_i|}{k^2}, \tag{9}$$

where $r_i$ is the rank of document $i$ in the retrieved top-$k$ documents and $r'_i$ is document $i$'s rank in the real ranked results.
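Equation (9) can be evaluated in the same way. The sketch assumes that `true_ranking` is a complete ranking containing every retrieved document and that ranks start at 1; the function and parameter names are illustrative.

```python
from typing import Hashable, Sequence


def rank_privacy(retrieved: Sequence[Hashable],
                 true_ranking: Sequence[Hashable], k: int) -> float:
    """P'_k = sum(|r_i - r'_i|) / k^2 over the retrieved top-k documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    true_rank = {doc: pos for pos, doc in enumerate(true_ranking, start=1)}
    total = sum(abs(r - true_rank[doc])
                for r, doc in enumerate(retrieved[:k], start=1))
    return total / (k * k)


# "b" and "a" are swapped relative to the real ranking; "c" keeps its rank.
score = rank_privacy(["b", "a", "c"], ["a", "b", "c", "d"], k=3)
print(round(score, 4))  # 0.2222
```

A larger value means the returned order deviates more from the real order, which is why higher rank privacy accompanies lower precision in the comparison.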

We compare our scheme with X15 and G19 in terms of “precision” and “rank privacy.” Note that an important parameter in the two previous schemes is the standard deviation $\sigma$, which is used to adjust the relevance score of the dummy keywords. In the comparison, we set $\sigma = 0.05$, a value commonly used in the previous schemes. For our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node's vector to 1000 ($d = 1000$). The comparison under these settings is shown in Figure 7.

From Figure 7, as $k$ grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For X15 and G19, the precision likewise decreases and the rank privacy increases as $k$ grows, so this trend is common to all three schemes. Because the vector representations for the index tree and the query in our scheme are heavily compressed, some statistical information in the index and the query is lost.

Figure 6: Impact of $m$ on the time cost of (a) index building, (b) trapdoor generation, (c) search, and (d) update ($n = 1000$, $d = 1000$, $b = 10000$, and $m \in \{12000, 14000, 16000, 18000, 20000\}$). Each panel plots the time cost ($10^3$ ms) against the dictionary size for X15, G19, and our scheme.

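Table 3: Storage consumption of the index tree (MB).

| Dictionary size | [14] | [15] | Vector dimension | Proposed |
|---|---|---|---|---|
| m = 12000 | 188 | 174 | d = 200 | 7 |
| m = 14000 | 219 | 190 | d = 400 | 14 |
| m = 16000 | 251 | 206 | d = 600 | 20 |
| m = 18000 | 283 | 222 | d = 800 | 26 |
| m = 20000 | 315 | 238 | d = 1000 | 33 |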

Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents ($n = 1000$, $d = 1000$, $b = 10000$, $m = 12000$, and $\sigma = 0.05$). Both panels plot the metric (%) against the number of retrieved documents (10 to 50) for X15, G19, and our scheme.

Figure 8: Impact of $d$ on the time cost of (a) index building and (b) trapdoor generation, search, and update ($n = 1000$ and $d \in \{200, 400, 600, 800, 1000\}$). Both panels plot the time cost (s) against the vector size.

Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is correspondingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation ($d$), which we set in “Word2vec,” is an important parameter in our scheme, and we now discuss its impact. The impact of $d$ on the efficiency of our scheme is given in Figure 8, which shows that the time costs of index building, trapdoor generation, search, and update all increase as $d$ grows. Figure 9 illustrates the impact of $d$ on the precision and rank privacy of our scheme. As $d$ increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy gradually decreases. These observations are consistent with our theoretical analysis. Therefore, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter $d$ to meet the requirements of different applications.

4.4. Discussion. From the experimental results, when $n = 1000$, $m = 20000$, $d = 200$, and $b = 10000$, the time cost of index building is 3 s, the generation time of a single trapdoor is 15 ms, and the search time is 36 ms, all of which are much better than those of the previous schemes X15 and G19. This efficiency shows that our scheme is highly suitable for practical applications, especially the mobile cloud setting, in which clients have limited computation and storage resources.

The experimental results also show that the precision of our scheme is lower than that of the two previous schemes, while its rank privacy is correspondingly higher. In addition, because we use the “Word2vec” method, the vector representations in our scheme contain the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

5. Conclusions

In this paper, by applying “Word2vec” to construct the vector representations of the documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with the previous schemes, our scheme greatly reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Since the precision of our scheme can be further improved, we will construct a more accurate scheme based on recent information retrieval techniques in future work.

Data Availability

The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron/.

Figure 9: The precision (a) and rank privacy (b) of searches with different vector dimensions ($n = 1000$ and $d \in \{200, 400, 600, 800, 1000\}$). Both panels plot the metric (%) against the number of retrieved documents (10 to 50), with one curve per vector size.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searching on encrypted data,” in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, “Secure data retrieval of outsourced data with complex query support,” in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalized search over encrypted outsourced data with efficiency improvement,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, “Secure indexes,” IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, “Efficient conjunctive keyword search on encrypted data storage system,” European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, “Achieving efficient conjunctive keyword searches over encrypted data,” Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, “Efficient similarity search over encrypted data,” in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+R: top-k retrieval from a confidential index,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., “Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking,” in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, “Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data,” IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., “Highly-scalable searchable symmetric encryption with support for boolean queries,” Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., “Dynamic searchable encryption in very-large databases: data structures and implementation,” in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., “Public key encryption with keyword search,” International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, “Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system,” Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, “Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group,” Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, “Novel multi-keyword search on encrypted data in the cloud,” IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, “Generating searchable public-key ciphertexts with hidden structures for fast keyword search,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, “Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, “A general transformation from KP-ABE to searchable encryption,” Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, “Attribute-based hybrid boolean keyword search over outsourced encrypted data,” IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, “Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries,” in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, “Efficient implementation of a proxy-based protocol for data sharing on the cloud,” in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, “Searchable public key encryption supporting semantic multi-keywords search,” IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., “Efficient estimation of word representations in vector space,” 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, “Secure kNN computation on encrypted databases,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, “Zerber: r-confidential indexing for distributed documents,” in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, “Enron E-mail dataset,” http://www.cs.cmu.edu/~enron/.

Page 12: EfficientSearchableSymmetricEncryptionSupportingDynamic ...downloads.hindawi.com/journals/scn/2020/7298518.pdf · ResearchArticle EfficientSearchableSymmetricEncryptionSupportingDynamic

0

100

300

500

Tim

e cos

t of i

ndex

bui

ldin

g (1

03 ms)

200001600012000Dictionary size

Scheme nameX15G19Ours

(a)

Scheme nameX15G19Ours

00

01

02

03

04

05

Tim

e cos

t of t

rapd

oor g

ener

atio

n (1

03 ms)

16000 2000012000Dictionary size

(b)

Scheme nameX15G19Ours

16000 2000012000Dictionary size

00

01

02

03

04

05

Tim

e cos

t of s

earc

h (1

03 ms)

(c)

Scheme nameX15G19Ours

00

01

02

03

04

05

Tim

e cos

t of u

pdat

ing

(103 m

s)

16000 2000012000Dictionary size

(d)

Figure 6 Impact of m on the time cost of index building (a) trapdoor generation (b) search (c) and update (d) (n 1000 d 1000b 10000 and m (12000 14000 16000 18000 20000))

Table 3 Storage consumption of the index tree (MB)

Dictionary size [14] [15] Vector dimension Proposedm 12000 188 174 d 200 7m 14000 219 190 d 400 14m 16000 251 206 d 600 20m 18000 283 222 d 800 26m 20000 315 238 d 1000 33

12 Security and Communication Networks

Scheme nameX15G19Ours

0

20

40

60

80Pr

ecisi

on (

)

30 40 5010 20Number of retrieved documents

(a)

Scheme nameX15G19Ours

0

20

40

60

80

20 30 40 5010Number of retrieved documents

Rank

priv

acy

()

(b)

Figure 7 )e precision (a) and rank privacy (b) of searches with different numbers of retrieved documents (n 1000 d 1000 b 10000m 12000 and σ 005)

0

2

4

6

8

10

12

Tim

e cos

t (s)

400 600 800 1000200Vector size

Index building

(a)

000

002

004

006

008

010

Tim

e cos

t (s)

400 600 800 1000200Vector size

Trapdoor generationSearchUpdate

(b)

Figure 8 Impact of d on the time cost of index building (a) and trapdoor generation search and update (b) (n 1000 and d (200 400 600800 1000))

Security and Communication Networks 13

)us the precision of our scheme is less than that in X15 andG19 However the rank privacy in our scheme is accordinglymore than that in X15 and G19

43 Impact of the Dimension of Vector Representation)e dimension of the vector representation (d) which we setin the ldquoWord2vecrdquo is an important parameter in our schemeNext we give the discussion of the impact of d for ourscheme )e impact of d on the efficiency of our scheme isgiven in Figure 8 From Figure 8 we know that the time costsof index building trapdoor generation search and updateall increase when d grows Besides Figure 9 gives an il-lustration of the impact of d on the precision and rankprivacy in our scheme As d increases from 200 to 1000 theprecision of our scheme increases slightly while the rankprivacy decreases gradually accordingly )ese phenomenaare all consistent with our previous theoretical analysis Soin the proposed scheme data users can balance the efficiencyand accuracy by adjusting the parameter d to satisfy therequirements of different applications

44Discussion From the experiment results when n 1000m 20000 d 200 and b 10000 the time cost of indexbuilding is 3 s the generation time of a single trapdoor is15ms and the search time is 36ms which are all muchbetter than the previous schemes X15 and G19 Efficiency inour scheme demonstrates that our scheme is extremelysuitable for practical applications especially the mobilecloud setting in which the clients have limited computationand storage resources

)e experiment result shows that the precision of ourscheme is less than that in the previous two schemes whilethe rank privacy is more than that in the previous schemesaccordingly In addition by using the ldquoWord2vecrdquo methodthe vector representations used in our scheme contain thesemantic information of the documents and queries Basedon these facts we argue that the proposed scheme is suitablefor applications requiring similarity and semantic searchsuch as mobile recommendation system mobile searchengine and online shopping system

5 Conclusions

In this paper by applying ldquoWord2Vecrdquo to construct thevector representations of the documents and queries andadopting the balanced binary tree to index the documentswe proposed a searchable symmetric encryption schemesupporting dynamic multikeyword ranked search Com-pared with the previous schemes our scheme can tre-mendously reduce the time costs of index building trapdoorgeneration search and update Moreover the storage cost ofthe secure index is also reduced significantly Consideringthat the precision of our scheme can be further improved wewill construct a more accurate scheme based on the recentinformation retrieval techniques in the future work

Data Availability

)e data used to support the findings of this study isavailable from the following website Httpwwwcscmuedusimenron

Vector sized = 200d = 400

d = 800d = 1000

d = 600

20 30 40 5010Number of retrieved documents

50

52

54

56

58

60Pr

ecisi

on (

)

(a)

Vector sized = 200d = 400

d = 800d = 1000

d = 600

25

30

35

40

Rank

priv

acy

()

20 30 40 5010Number of retrieved documents

(b)

Figure 9 )e precision (a) and rank privacy (b) of searches with different vector dimensions (n 1000 and d (200 400 600 800 1000))

14 Security and Communication Networks

Conflicts of Interest

)e authors declare that they have no conflicts of interestregarding the publication of this paper

Acknowledgments

The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grants nos. 61402393 and 61601396 and the Nanhu Scholars Program for Young Scholars of XYNU.

References

[1] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searching on encrypted data,” in Proceedings of the 2000 IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, May 2000.

[2] Y. Zhu, D. Ma, and S. Wang, “Secure data retrieval of outsourced data with complex query support,” in Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops, pp. 481–490, Macau, China, June 2012.

[3] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalized search over encrypted outsourced data with efficiency improvement,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 9, pp. 2546–2559, 2015.

[4] E. J. Goh, “Secure indexes,” IACR Cryptology ePrint Archive, vol. 2003, p. 216, 2003.

[5] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” Journal of Computer Security, vol. 19, no. 5, pp. 895–934, 2011.

[6] J. W. Byun, D. H. Lee, and J. Lim, “Efficient conjunctive keyword search on encrypted data storage system,” European Public Key Infrastructure Workshop, Springer, Berlin, Germany, pp. 184–196, 2006.

[7] L. Ballard, S. Kamara, and F. Monrose, “Achieving efficient conjunctive keyword searches over encrypted data,” Information and Communications Security, Springer, Berlin, Germany, pp. 414–426, 2005.

[8] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, “Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 12, pp. 2706–2716, 2017.

[9] M. Kuzu, M. S. Islam, and M. Kantarcioglu, “Efficient similarity search over encrypted data,” in Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pp. 1156–1167, Washington, DC, USA, April 2012.

[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber + r: top-k retrieval from a confidential index,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 439–449, Saint Petersburg, Russia, March 2009.

[11] C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 8, pp. 1467–1479, 2012.

[12] N. Cao, C. Wang, M. Li et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 222–233, 2013.

[13] W. Sun, B. Wang, N. Cao et al., “Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking,” in Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 71–82, Hangzhou, China, 2013.

[14] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 340–352, 2016.

[15] C. Guo, R. Zhuang, C.-C. Chang, and Q. Yuan, “Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data,” IEEE Access, vol. 7, pp. 35826–35837, 2019.

[16] D. Cash, S. Jarecki, C. Jutla et al., “Highly-scalable searchable symmetric encryption with support for boolean queries,” Annual Cryptology Conference, Springer, Berlin, Germany, pp. 353–373, 2013.

[17] D. Cash, J. Jaeger, S. Jarecki et al., “Dynamic searchable encryption in very-large databases: data structures and implementation,” in Proceedings of the Network and Distributed System Security Symposium, pp. 23–26, San Diego, CA, USA, February 2014.

[18] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.

[19] D. Boneh, G. D. Crescenzo, R. Ostrovsky et al., “Public key encryption with keyword search,” International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, Springer, Berlin, Germany, 2004.

[20] Y. Zhang, Y. Li, and Y. Wang, “Conjunctive and disjunctive keyword search over encrypted mobile cloud data in public key system,” Mobile Information Systems, vol. 2018, Article ID 3839254, 11 pages, 2018.

[21] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” Advances in Cryptology–EUROCRYPT 2008, pp. 146–162, Springer, Berlin, Germany, 2008.

[22] Y. Zhang, Y. Li, and Y. Wang, “Secure and efficient searchable public key encryption for resource constrained environment based on pairings under prime order group,” Security and Communication Networks, vol. 2019, Article ID 5280806, 14 pages, 2019.

[23] Y. Wu, J. Hou, J. Liu, W. Zhou, and S. Yao, “Novel multi-keyword search on encrypted data in the cloud,” IEEE Access, vol. 7, pp. 31984–31996, 2019.

[24] P. Xu, Q. Wu, W. Wang, W. Susilo, J. Domingo-Ferrer, and H. Jin, “Generating searchable public-key ciphertexts with hidden structures for fast keyword search,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1993–2006, 2017.

[25] P. Xu, S. He, W. Wang, W. Susilo, and H. Jin, “Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks,” IEEE Transactions on Industrial Informatics, vol. 14, no. 8, pp. 3712–3723, 2017.

[26] F. Han, J. Qin, H. Zhao, and J. Hu, “A general transformation from KP-ABE to searchable encryption,” Future Generation Computer Systems, vol. 30, pp. 107–115, 2014.

[27] H. Kai, G. Jun, W. Jian, J. Weng, J. K. Liu, and X. Yi, “Attribute-based hybrid boolean keyword search over outsourced encrypted data,” IEEE Transactions on Dependable and Secure Computing, p. 1, 2018.

[28] M. Sepehri, S. Cimato, E. Damiani, and C. Y. Yeun, “Data sharing on the cloud: a scalable proxy-based protocol for privacy-preserving queries,” in Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 1357–1362, Helsinki, Finland, August 2015.

[29] M. Sepehri, S. Cimato, and E. Damiani, “Efficient implementation of a proxy-based protocol for data sharing on the cloud,” in Proceedings of the Fifth ACM International Workshop on Security in Cloud Computing, pp. 67–74, New York, NY, USA, April 2017.

[30] Y. Zhang, Y. Wang, and Y. Li, “Searchable public key encryption supporting semantic multi-keywords search,” IEEE Access, vol. 7, pp. 122078–122090, 2019.

[31] T. Mikolov, K. Chen, G. Corrado et al., “Efficient estimation of word representations in vector space,” 2013, https://arxiv.org/abs/1301.3781.

[32] W. K. Wong, D. W.-L. Cheung, B. Kao, and N. Mamoulis, “Secure kNN computation on encrypted databases,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 139–152, New York, NY, USA, 2009.

[33] S. Zerr, E. Demidova, D. Olmedilla, W. Nejdl, M. Winslett, and S. Mitra, “Zerber: r-confidential indexing for distributed documents,” in Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 287–298, Nantes, France, March 2008.

[34] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, UK, 2008.

[35] W. W. Cohen, “Enron E-mail dataset,” http://www.cs.cmu.edu/~enron/.


Figure 7: The precision (a) and rank privacy (b) of searches with different numbers of retrieved documents (n = 1000, d = 1000, b = 10000, m = 12000, and σ = 0.05). Both panels compare the schemes X15, G19, and ours as the number of retrieved documents grows from 10 to 50.

Figure 8: Impact of d on the time cost of index building (a) and trapdoor generation, search, and update (b) (n = 1000 and d ∈ {200, 400, 600, 800, 1000}). Both panels plot time cost (s) against vector size.


Thus, the precision of our scheme is lower than that of X15 and G19. However, the rank privacy of our scheme is correspondingly higher than that of X15 and G19.

4.3. Impact of the Dimension of Vector Representation. The dimension of the vector representation (d), which we set in “Word2vec”, is an important parameter in our scheme. Next, we discuss the impact of d on our scheme. The impact of d on the efficiency of our scheme is shown in Figure 8, from which we see that the time costs of index building, trapdoor generation, search, and update all increase as d grows. Figure 9 illustrates the impact of d on the precision and rank privacy of our scheme. As d increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy gradually decreases. These observations are consistent with our previous theoretical analysis. Thus, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter d to satisfy the requirements of different applications.
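One reason the time costs grow with d can be seen in a plain, unencrypted sketch: scoring a document against a query by an inner product of two d-dimensional vectors takes d multiply-add steps, so every comparison becomes more expensive as d increases. The code below is only an illustration under that assumption. It uses random vectors and a linear scan, not the encrypted tree index of the scheme, and the helper names are hypothetical.

```python
import random
import time


def inner_product(u, v):
    """Relevance score of two equal-length vectors (d multiply-add steps)."""
    return sum(a * b for a, b in zip(u, v))


def time_scoring(d, n_docs=1000, seed=0):
    """Time one query scored against n_docs random d-dimensional vectors."""
    rng = random.Random(seed)
    docs = [[rng.random() for _ in range(d)] for _ in range(n_docs)]
    query = [rng.random() for _ in range(d)]
    start = time.perf_counter()
    scores = [inner_product(doc, query) for doc in docs]
    return time.perf_counter() - start, scores


# Same dimensions as in Figures 8 and 9; absolute timings depend on the machine.
for d in (200, 400, 600, 800, 1000):
    elapsed, _ = time_scoring(d)
    print(f"d = {d:4d}: {elapsed * 1000:.2f} ms for 1000 inner products")
```

The printed times should rise roughly in proportion to d, which matches the direction of the trend reported for Figure 8. The measured values are not comparable to the paper's numbers.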

44Discussion From the experiment results when n 1000m 20000 d 200 and b 10000 the time cost of indexbuilding is 3 s the generation time of a single trapdoor is15ms and the search time is 36ms which are all muchbetter than the previous schemes X15 and G19 Efficiency inour scheme demonstrates that our scheme is extremelysuitable for practical applications especially the mobilecloud setting in which the clients have limited computationand storage resources

)e experiment result shows that the precision of ourscheme is less than that in the previous two schemes whilethe rank privacy is more than that in the previous schemesaccordingly In addition by using the ldquoWord2vecrdquo methodthe vector representations used in our scheme contain thesemantic information of the documents and queries Basedon these facts we argue that the proposed scheme is suitablefor applications requiring similarity and semantic searchsuch as mobile recommendation system mobile searchengine and online shopping system

5 Conclusions

In this paper by applying ldquoWord2Vecrdquo to construct thevector representations of the documents and queries andadopting the balanced binary tree to index the documentswe proposed a searchable symmetric encryption schemesupporting dynamic multikeyword ranked search Com-pared with the previous schemes our scheme can tre-mendously reduce the time costs of index building trapdoorgeneration search and update Moreover the storage cost ofthe secure index is also reduced significantly Consideringthat the precision of our scheme can be further improved wewill construct a more accurate scheme based on the recentinformation retrieval techniques in the future work

Data Availability

)e data used to support the findings of this study isavailable from the following website Httpwwwcscmuedusimenron

Vector sized = 200d = 400

d = 800d = 1000

d = 600

20 30 40 5010Number of retrieved documents

50

52

54

56

58

60Pr

ecisi

on (

)

(a)

Vector sized = 200d = 400

d = 800d = 1000

d = 600

25

30

35

40

Rank

priv

acy

()

20 30 40 5010Number of retrieved documents

(b)

Figure 9 )e precision (a) and rank privacy (b) of searches with different vector dimensions (n 1000 and d (200 400 600 800 1000))

14 Security and Communication Networks

Conflicts of Interest

)e authors declare that they have no conflicts of interestregarding the publication of this paper

Acknowledgments

)e authors gratefully acknowledge the support of theNational Natural Science Foundation of China under Grantsnos 61402393 and 61601396 and the Nanhu Scholars Pro-gram for Young Scholars of XYNU

References

[1] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searching on encrypted datardquo in Proceedings of the 2000IEEE Symposium on Research in Security and Privacy Ber-keley CA USA May 2000

[2] Y Zhu D Ma and S Wang ldquoSecure data retrieval of out-sourced data with complex query supportrdquo in Proceedings ofthe 2012 32nd International Conference on DistributedComputing Systems Workshops pp 481ndash490 Macau ChinaJune 2012

[3] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2015

[4] E J Goh ldquoSecure indexesrdquo IACR Cryptology ePrint Archivevol 2003 p 216 2003

[5] R Curtmola J Garay S Kamara and R OstrovskyldquoSearchable symmetric encryption improved definitions andefficient constructionsrdquo Journal of Computer Security vol 19no 5 pp 895ndash934 2011

[6] J W Byun D H Lee and J Lim ldquoEfficient conjunctivekeyword search on encrypted data storage systemrdquo EuropeanPublic Key Infrastructure Workshop Springer Berlin Ger-many pp 184ndash196 2006

[7] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo Infor-mation and Communications Security Springer BerlinGermany pp 414ndash426 2005

[8] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162017

[9] M Kuzu M S Islam and M Kantarcioglu ldquoEfficient simi-larity search over encrypted datardquo in Proceedings of the 2012IEEE 28th International Conference on Data Engineeringpp 1156ndash1167 Washington DC USA April 2012

[10] S Zerr D Olmedilla W Nejdl and W Siberski ldquoZerber + rtop-k retrieval from a confidential indexrdquo in Proceedings of the12th International Conference on Extending Database Tech-nology Advances in Database Technology pp 439ndash449 SaintPetersburg Russia March 2009

[11] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[12] N Cao C Wang M Li et al ldquoPrivacy-preserving multi-keyword ranked search over encrypted cloud datardquo IEEETransactions on Parallel and Distributed Systems vol 25 no 1pp 222ndash233 2013

[13] W Sun B Wang N Cao et al ldquoPrivacy-preserving multi-keyword text search in the cloud supporting similarity-basedrankingrdquo in Proceedings of the 8th ACM SIGSAC Symposiumon Information Computer and Communications Securitypp 71ndash82 Hangzhou China 2013

[14] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[15] C Guo R Zhuang C-C Chang and Q Yuan ldquoDynamicmulti-keyword ranked search based on bloom filter overencrypted cloud datardquo IEEE Access vol 7 pp 35826ndash358372019

[16] D Cash S Jarecki C Jutla et al ldquoHighly-scalable searchablesymmetric encryption with support for boolean queriesrdquoAnnual Cryptology Conference Springer Berlin Germanypp 353ndash373 2013

[17] D Cash J Jaeger S Jarecki et al ldquoDynamic searchableencryption in very-large databases data structures andimplementationrdquo in Proceedings of the Network and Dis-tributed System Security Symposium pp 23ndash26 San DiegoCA USA February 2014

[18] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquo Communications of the ACM vol 13 no 7pp 422ndash426 1970

[19] D Boneh G D Crescenzo R Ostrovsky et al ldquoPublic keyencryption with keyword searchrdquo International Conference onthe 2eory and Applications of Cryptographic Techniquespp 506ndash522 Springer Berlin Germany 2004

[20] Y Zhang Y Li and Y Wang ldquoConjunctive and disjunctivekeyword search over encrypted mobile cloud data in publickey systemrdquoMobile Information Systems vol 2018 Article ID3839254 11 pages 2018

[21] J Katz A Sahai and B Waters ldquoPredicate encryption sup-porting disjunctions polynomial equations and innerproductsrdquo Advances in CryptologyndashEUROCRYPT 2008pp 146ndash162 Springer Berlin Germany 2008

[22] Y Zhang Y Li and Y Wang ldquoSecure and efficient searchablepublic key encryption for resource constrained environmentbased on pairings under prime order grouprdquo Security andCommunication Networks vol 2019 Article ID 528080614 pages 2019

[23] Y Wu J Hou J Liu W Zhou and S Yao ldquoNovel multi-keyword search on encrypted data in the cloudrdquo IEEE Accessvol 7 pp 31984ndash31996 2019

[24] P Xu Q Wu W Wang W Susilo J Domingo-Ferrer andH Jin ldquoGenerating searchable public-key ciphertexts withhidden structures for fast keyword searchrdquo IEEE Transactionson Information Forensics and Security vol 10 no 9pp 1993ndash2006 2017

[25] P Xu S He W Wang W Susilo and H Jin ldquoLightweightsearchable public-key encryption for cloud-assisted wirelesssensor networksrdquo IEEE Transactions on Industrial Infor-matics vol 14 no 8 pp 3712ndash3723 2017

[26] F Han J Qin H Zhao and J Hu ldquoA general transformationfrom KP-ABE to searchable encryptionrdquo Future GenerationComputer Systems vol 30 pp 107ndash115 2014

[27] H Kai G Jun W Jian J Weng J K Liu and X Yi ldquoAt-tribute-based hybrid boolean keyword search over outsourcedencrypted datardquo IEEE Transactions on Dependable and SecureComputing p 1 2018

[28] M Sepehri S Cimato E Damiani and C Y Yeun ldquoDatasharing on the cloud a scalable proxy-based protocol forprivacy-preserving queriesrdquo in Proceedings of the 2015 IEEE

Security and Communication Networks 15

TrustcomBigDataSEISPA pp 1357ndash1362 Helsinki FinlandAugust 2015

[29] M Sepehri S Cimato and E Damiani ldquoEfficient imple-mentation of a proxy-based protocol for data sharing on thecloudrdquo in Proceedings of the Fifth ACM InternationalWorkshop on Security in Cloud Computing pp 67ndash74 NewYork NY USA April 2017

[30] Y Zhang Y Wang and Y Li ldquoSearchable public key en-cryption supporting semantic multi-keywords searchrdquo IEEEAccess vol 7 pp 122078ndash122090 2019

[31] T Mikolov K Chen G Corrado et al ldquoEfficient estimation ofword representations in vector spacerdquo 2013 httpsarxivorgabs13013781

[32] W K Wong D W-L Cheung B Kao and N MamoulisldquoSecure kNN computation on encrypted databasesrdquo in Pro-ceedings of the 2009 ACM SIGMOD International Conferenceon Management of Data pp 139ndash152 New York NY USA2009

[33] S Zerr E Demidova D Olmedilla W Nejdl M Winslettand S Mitra ldquoZerber r-confidential indexing for distributeddocumentsrdquo in Proceedings of the 11th International Con-ference on Extending Database Technology Advances in Da-tabase Technology pp 287ndash298 Nantes France March 2008

[34] C D Manning P Raghavan and H SchAtildeijtze Introductionto Information Retrieval Cambridge University Press Cam-bridge UK 2008

[35] W W Cohen ldquoEnron E-mail datasetrdquo httpwwwcscmuedusimenron

16 Security and Communication Networks

Page 14: EfficientSearchableSymmetricEncryptionSupportingDynamic ...downloads.hindawi.com/journals/scn/2020/7298518.pdf · ResearchArticle EfficientSearchableSymmetricEncryptionSupportingDynamic

)us the precision of our scheme is less than that in X15 andG19 However the rank privacy in our scheme is accordinglymore than that in X15 and G19

43 Impact of the Dimension of Vector Representation)e dimension of the vector representation (d) which we setin the ldquoWord2vecrdquo is an important parameter in our schemeNext we give the discussion of the impact of d for ourscheme )e impact of d on the efficiency of our scheme isgiven in Figure 8 From Figure 8 we know that the time costsof index building trapdoor generation search and updateall increase when d grows Besides Figure 9 gives an il-lustration of the impact of d on the precision and rankprivacy in our scheme As d increases from 200 to 1000 theprecision of our scheme increases slightly while the rankprivacy decreases gradually accordingly )ese phenomenaare all consistent with our previous theoretical analysis Soin the proposed scheme data users can balance the efficiencyand accuracy by adjusting the parameter d to satisfy therequirements of different applications

44Discussion From the experiment results when n 1000m 20000 d 200 and b 10000 the time cost of indexbuilding is 3 s the generation time of a single trapdoor is15ms and the search time is 36ms which are all muchbetter than the previous schemes X15 and G19 Efficiency inour scheme demonstrates that our scheme is extremelysuitable for practical applications especially the mobilecloud setting in which the clients have limited computationand storage resources

)e experiment result shows that the precision of ourscheme is less than that in the previous two schemes whilethe rank privacy is more than that in the previous schemesaccordingly In addition by using the ldquoWord2vecrdquo methodthe vector representations used in our scheme contain thesemantic information of the documents and queries Basedon these facts we argue that the proposed scheme is suitablefor applications requiring similarity and semantic searchsuch as mobile recommendation system mobile searchengine and online shopping system

5 Conclusions

In this paper by applying ldquoWord2Vecrdquo to construct thevector representations of the documents and queries andadopting the balanced binary tree to index the documentswe proposed a searchable symmetric encryption schemesupporting dynamic multikeyword ranked search Com-pared with the previous schemes our scheme can tre-mendously reduce the time costs of index building trapdoorgeneration search and update Moreover the storage cost ofthe secure index is also reduced significantly Consideringthat the precision of our scheme can be further improved wewill construct a more accurate scheme based on the recentinformation retrieval techniques in the future work

Data Availability

)e data used to support the findings of this study isavailable from the following website Httpwwwcscmuedusimenron

Vector sized = 200d = 400

d = 800d = 1000

d = 600

20 30 40 5010Number of retrieved documents

50

52

54

56

58

60Pr

ecisi

on (

)

(a)

Vector sized = 200d = 400

d = 800d = 1000

d = 600

25

30

35

40

Rank

priv

acy

()

20 30 40 5010Number of retrieved documents

(b)

Figure 9 )e precision (a) and rank privacy (b) of searches with different vector dimensions (n 1000 and d (200 400 600 800 1000))

14 Security and Communication Networks

Conflicts of Interest

)e authors declare that they have no conflicts of interestregarding the publication of this paper

Acknowledgments

)e authors gratefully acknowledge the support of theNational Natural Science Foundation of China under Grantsnos 61402393 and 61601396 and the Nanhu Scholars Pro-gram for Young Scholars of XYNU

References

[1] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searching on encrypted datardquo in Proceedings of the 2000IEEE Symposium on Research in Security and Privacy Ber-keley CA USA May 2000

[2] Y Zhu D Ma and S Wang ldquoSecure data retrieval of out-sourced data with complex query supportrdquo in Proceedings ofthe 2012 32nd International Conference on DistributedComputing Systems Workshops pp 481ndash490 Macau ChinaJune 2012

[3] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2015

[4] E J Goh ldquoSecure indexesrdquo IACR Cryptology ePrint Archivevol 2003 p 216 2003

[5] R Curtmola J Garay S Kamara and R OstrovskyldquoSearchable symmetric encryption improved definitions andefficient constructionsrdquo Journal of Computer Security vol 19no 5 pp 895ndash934 2011

[6] J W Byun D H Lee and J Lim ldquoEfficient conjunctivekeyword search on encrypted data storage systemrdquo EuropeanPublic Key Infrastructure Workshop Springer Berlin Ger-many pp 184ndash196 2006

[7] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo Infor-mation and Communications Security Springer BerlinGermany pp 414ndash426 2005

[8] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162017

[9] M Kuzu M S Islam and M Kantarcioglu ldquoEfficient simi-larity search over encrypted datardquo in Proceedings of the 2012IEEE 28th International Conference on Data Engineeringpp 1156ndash1167 Washington DC USA April 2012

[10] S Zerr D Olmedilla W Nejdl and W Siberski ldquoZerber + rtop-k retrieval from a confidential indexrdquo in Proceedings of the12th International Conference on Extending Database Tech-nology Advances in Database Technology pp 439ndash449 SaintPetersburg Russia March 2009

[11] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[12] N Cao C Wang M Li et al ldquoPrivacy-preserving multi-keyword ranked search over encrypted cloud datardquo IEEETransactions on Parallel and Distributed Systems vol 25 no 1pp 222ndash233 2013

[13] W Sun B Wang N Cao et al ldquoPrivacy-preserving multi-keyword text search in the cloud supporting similarity-basedrankingrdquo in Proceedings of the 8th ACM SIGSAC Symposiumon Information Computer and Communications Securitypp 71ndash82 Hangzhou China 2013

[14] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[15] C Guo R Zhuang C-C Chang and Q Yuan ldquoDynamicmulti-keyword ranked search based on bloom filter overencrypted cloud datardquo IEEE Access vol 7 pp 35826ndash358372019

[16] D Cash S Jarecki C Jutla et al ldquoHighly-scalable searchablesymmetric encryption with support for boolean queriesrdquoAnnual Cryptology Conference Springer Berlin Germanypp 353ndash373 2013

[17] D Cash J Jaeger S Jarecki et al ldquoDynamic searchableencryption in very-large databases data structures andimplementationrdquo in Proceedings of the Network and Dis-tributed System Security Symposium pp 23ndash26 San DiegoCA USA February 2014

[18] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquo Communications of the ACM vol 13 no 7pp 422ndash426 1970

[19] D Boneh G D Crescenzo R Ostrovsky et al ldquoPublic keyencryption with keyword searchrdquo International Conference onthe 2eory and Applications of Cryptographic Techniquespp 506ndash522 Springer Berlin Germany 2004

[20] Y Zhang Y Li and Y Wang ldquoConjunctive and disjunctivekeyword search over encrypted mobile cloud data in publickey systemrdquoMobile Information Systems vol 2018 Article ID3839254 11 pages 2018

[21] J Katz A Sahai and B Waters ldquoPredicate encryption sup-porting disjunctions polynomial equations and innerproductsrdquo Advances in CryptologyndashEUROCRYPT 2008pp 146ndash162 Springer Berlin Germany 2008

[22] Y Zhang Y Li and Y Wang ldquoSecure and efficient searchablepublic key encryption for resource constrained environmentbased on pairings under prime order grouprdquo Security andCommunication Networks vol 2019 Article ID 528080614 pages 2019

[23] Y Wu J Hou J Liu W Zhou and S Yao ldquoNovel multi-keyword search on encrypted data in the cloudrdquo IEEE Accessvol 7 pp 31984ndash31996 2019

[24] P Xu Q Wu W Wang W Susilo J Domingo-Ferrer andH Jin ldquoGenerating searchable public-key ciphertexts withhidden structures for fast keyword searchrdquo IEEE Transactionson Information Forensics and Security vol 10 no 9pp 1993ndash2006 2017

[25] P Xu S He W Wang W Susilo and H Jin ldquoLightweightsearchable public-key encryption for cloud-assisted wirelesssensor networksrdquo IEEE Transactions on Industrial Infor-matics vol 14 no 8 pp 3712ndash3723 2017

[26] F Han J Qin H Zhao and J Hu ldquoA general transformationfrom KP-ABE to searchable encryptionrdquo Future GenerationComputer Systems vol 30 pp 107ndash115 2014

[27] H Kai G Jun W Jian J Weng J K Liu and X Yi ldquoAt-tribute-based hybrid boolean keyword search over outsourcedencrypted datardquo IEEE Transactions on Dependable and SecureComputing p 1 2018

[28] M Sepehri S Cimato E Damiani and C Y Yeun ldquoDatasharing on the cloud a scalable proxy-based protocol forprivacy-preserving queriesrdquo in Proceedings of the 2015 IEEE

Security and Communication Networks 15

TrustcomBigDataSEISPA pp 1357ndash1362 Helsinki FinlandAugust 2015

[29] M Sepehri S Cimato and E Damiani ldquoEfficient imple-mentation of a proxy-based protocol for data sharing on thecloudrdquo in Proceedings of the Fifth ACM InternationalWorkshop on Security in Cloud Computing pp 67ndash74 NewYork NY USA April 2017

[30] Y Zhang Y Wang and Y Li ldquoSearchable public key en-cryption supporting semantic multi-keywords searchrdquo IEEEAccess vol 7 pp 122078ndash122090 2019

[31] T Mikolov K Chen G Corrado et al ldquoEfficient estimation ofword representations in vector spacerdquo 2013 httpsarxivorgabs13013781

[32] W K Wong D W-L Cheung B Kao and N MamoulisldquoSecure kNN computation on encrypted databasesrdquo in Pro-ceedings of the 2009 ACM SIGMOD International Conferenceon Management of Data pp 139ndash152 New York NY USA2009

[33] S Zerr E Demidova D Olmedilla W Nejdl M Winslettand S Mitra ldquoZerber r-confidential indexing for distributeddocumentsrdquo in Proceedings of the 11th International Con-ference on Extending Database Technology Advances in Da-tabase Technology pp 287ndash298 Nantes France March 2008

[34] C D Manning P Raghavan and H SchAtildeijtze Introductionto Information Retrieval Cambridge University Press Cam-bridge UK 2008

[35] W W Cohen ldquoEnron E-mail datasetrdquo httpwwwcscmuedusimenron

16 Security and Communication Networks

Page 15: EfficientSearchableSymmetricEncryptionSupportingDynamic ...downloads.hindawi.com/journals/scn/2020/7298518.pdf · ResearchArticle EfficientSearchableSymmetricEncryptionSupportingDynamic

Conflicts of Interest

)e authors declare that they have no conflicts of interestregarding the publication of this paper

Acknowledgments

)e authors gratefully acknowledge the support of theNational Natural Science Foundation of China under Grantsnos 61402393 and 61601396 and the Nanhu Scholars Pro-gram for Young Scholars of XYNU

References

[1] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searching on encrypted datardquo in Proceedings of the 2000IEEE Symposium on Research in Security and Privacy Ber-keley CA USA May 2000

[2] Y Zhu D Ma and S Wang ldquoSecure data retrieval of out-sourced data with complex query supportrdquo in Proceedings ofthe 2012 32nd International Conference on DistributedComputing Systems Workshops pp 481ndash490 Macau ChinaJune 2012

[3] Z Fu K Ren J Shu X Sun and F Huang ldquoEnablingpersonalized search over encrypted outsourced data withefficiency improvementrdquo IEEE Transactions on Parallel andDistributed Systems vol 27 no 9 pp 2546ndash2559 2015

[4] E J Goh ldquoSecure indexesrdquo IACR Cryptology ePrint Archivevol 2003 p 216 2003

[5] R Curtmola J Garay S Kamara and R OstrovskyldquoSearchable symmetric encryption improved definitions andefficient constructionsrdquo Journal of Computer Security vol 19no 5 pp 895ndash934 2011

[6] J W Byun D H Lee and J Lim ldquoEfficient conjunctivekeyword search on encrypted data storage systemrdquo EuropeanPublic Key Infrastructure Workshop Springer Berlin Ger-many pp 184ndash196 2006

[7] L Ballard S Kamara and F Monrose ldquoAchieving efficientconjunctive keyword searches over encrypted datardquo Infor-mation and Communications Security Springer BerlinGermany pp 414ndash426 2005

[8] Z Fu X Wu C Guan X Sun and K Ren ldquoToward efficientmulti-keyword fuzzy search over encrypted outsourced datawith accuracy improvementrdquo IEEE Transactions on Infor-mation Forensics and Security vol 11 no 12 pp 2706ndash27162017

[9] M Kuzu M S Islam and M Kantarcioglu ldquoEfficient simi-larity search over encrypted datardquo in Proceedings of the 2012IEEE 28th International Conference on Data Engineeringpp 1156ndash1167 Washington DC USA April 2012

[10] S Zerr D Olmedilla W Nejdl and W Siberski ldquoZerber + rtop-k retrieval from a confidential indexrdquo in Proceedings of the12th International Conference on Extending Database Tech-nology Advances in Database Technology pp 439ndash449 SaintPetersburg Russia March 2009

[11] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquoIEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[12] N Cao C Wang M Li et al ldquoPrivacy-preserving multi-keyword ranked search over encrypted cloud datardquo IEEETransactions on Parallel and Distributed Systems vol 25 no 1pp 222ndash233 2013

[13] W Sun B Wang N Cao et al ldquoPrivacy-preserving multi-keyword text search in the cloud supporting similarity-basedrankingrdquo in Proceedings of the 8th ACM SIGSAC Symposiumon Information Computer and Communications Securitypp 71ndash82 Hangzhou China 2013

[14] Z Xia XWang X Sun and QWang ldquoA secure and dynamicmulti-keyword ranked search scheme over encrypted clouddatardquo IEEE Transactions on Parallel and Distributed Systemsvol 27 no 2 pp 340ndash352 2016

[15] C Guo R Zhuang C-C Chang and Q Yuan ldquoDynamicmulti-keyword ranked search based on bloom filter overencrypted cloud datardquo IEEE Access vol 7 pp 35826ndash358372019

[16] D Cash S Jarecki C Jutla et al ldquoHighly-scalable searchablesymmetric encryption with support for boolean queriesrdquoAnnual Cryptology Conference Springer Berlin Germanypp 353ndash373 2013

[17] D Cash J Jaeger S Jarecki et al ldquoDynamic searchableencryption in very-large databases data structures andimplementationrdquo in Proceedings of the Network and Dis-tributed System Security Symposium pp 23ndash26 San DiegoCA USA February 2014

[18] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquo Communications of the ACM vol 13 no 7pp 422ndash426 1970

[19] D Boneh G D Crescenzo R Ostrovsky et al ldquoPublic keyencryption with keyword searchrdquo International Conference onthe 2eory and Applications of Cryptographic Techniquespp 506ndash522 Springer Berlin Germany 2004

[20] Y Zhang Y Li and Y Wang ldquoConjunctive and disjunctivekeyword search over encrypted mobile cloud data in publickey systemrdquoMobile Information Systems vol 2018 Article ID3839254 11 pages 2018

[21] J Katz A Sahai and B Waters ldquoPredicate encryption sup-porting disjunctions polynomial equations and innerproductsrdquo Advances in CryptologyndashEUROCRYPT 2008pp 146ndash162 Springer Berlin Germany 2008

[22] Y Zhang Y Li and Y Wang ldquoSecure and efficient searchablepublic key encryption for resource constrained environmentbased on pairings under prime order grouprdquo Security andCommunication Networks vol 2019 Article ID 528080614 pages 2019

[23] Y Wu J Hou J Liu W Zhou and S Yao ldquoNovel multi-keyword search on encrypted data in the cloudrdquo IEEE Accessvol 7 pp 31984ndash31996 2019

[24] P Xu Q Wu W Wang W Susilo J Domingo-Ferrer andH Jin ldquoGenerating searchable public-key ciphertexts withhidden structures for fast keyword searchrdquo IEEE Transactionson Information Forensics and Security vol 10 no 9pp 1993ndash2006 2017

[25] P Xu S He W Wang W Susilo and H Jin ldquoLightweightsearchable public-key encryption for cloud-assisted wirelesssensor networksrdquo IEEE Transactions on Industrial Infor-matics vol 14 no 8 pp 3712ndash3723 2017

[26] F Han J Qin H Zhao and J Hu ldquoA general transformationfrom KP-ABE to searchable encryptionrdquo Future GenerationComputer Systems vol 30 pp 107ndash115 2014

[27] H Kai G Jun W Jian J Weng J K Liu and X Yi ldquoAt-tribute-based hybrid boolean keyword search over outsourcedencrypted datardquo IEEE Transactions on Dependable and SecureComputing p 1 2018

[28] M Sepehri S Cimato E Damiani and C Y Yeun ldquoDatasharing on the cloud a scalable proxy-based protocol forprivacy-preserving queriesrdquo in Proceedings of the 2015 IEEE

Security and Communication Networks 15

TrustcomBigDataSEISPA pp 1357ndash1362 Helsinki FinlandAugust 2015

[29] M Sepehri S Cimato and E Damiani ldquoEfficient imple-mentation of a proxy-based protocol for data sharing on thecloudrdquo in Proceedings of the Fifth ACM InternationalWorkshop on Security in Cloud Computing pp 67ndash74 NewYork NY USA April 2017

[30] Y Zhang Y Wang and Y Li ldquoSearchable public key en-cryption supporting semantic multi-keywords searchrdquo IEEEAccess vol 7 pp 122078ndash122090 2019

[31] T Mikolov K Chen G Corrado et al ldquoEfficient estimation ofword representations in vector spacerdquo 2013 httpsarxivorgabs13013781

[32] W K Wong D W-L Cheung B Kao and N MamoulisldquoSecure kNN computation on encrypted databasesrdquo in Pro-ceedings of the 2009 ACM SIGMOD International Conferenceon Management of Data pp 139ndash152 New York NY USA2009

[33] S Zerr E Demidova D Olmedilla W Nejdl M Winslettand S Mitra ldquoZerber r-confidential indexing for distributeddocumentsrdquo in Proceedings of the 11th International Con-ference on Extending Database Technology Advances in Da-tabase Technology pp 287ndash298 Nantes France March 2008

[34] C D Manning P Raghavan and H SchAtildeijtze Introductionto Information Retrieval Cambridge University Press Cam-bridge UK 2008

[35] W W Cohen ldquoEnron E-mail datasetrdquo httpwwwcscmuedusimenron

16 Security and Communication Networks

Page 16: EfficientSearchableSymmetricEncryptionSupportingDynamic ...downloads.hindawi.com/journals/scn/2020/7298518.pdf · ResearchArticle EfficientSearchableSymmetricEncryptionSupportingDynamic

TrustcomBigDataSEISPA pp 1357ndash1362 Helsinki FinlandAugust 2015

[29] M Sepehri S Cimato and E Damiani ldquoEfficient imple-mentation of a proxy-based protocol for data sharing on thecloudrdquo in Proceedings of the Fifth ACM InternationalWorkshop on Security in Cloud Computing pp 67ndash74 NewYork NY USA April 2017

[30] Y Zhang Y Wang and Y Li ldquoSearchable public key en-cryption supporting semantic multi-keywords searchrdquo IEEEAccess vol 7 pp 122078ndash122090 2019

[31] T Mikolov K Chen G Corrado et al ldquoEfficient estimation ofword representations in vector spacerdquo 2013 httpsarxivorgabs13013781

[32] W K Wong D W-L Cheung B Kao and N MamoulisldquoSecure kNN computation on encrypted databasesrdquo in Pro-ceedings of the 2009 ACM SIGMOD International Conferenceon Management of Data pp 139ndash152 New York NY USA2009

[33] S Zerr E Demidova D Olmedilla W Nejdl M Winslettand S Mitra ldquoZerber r-confidential indexing for distributeddocumentsrdquo in Proceedings of the 11th International Con-ference on Extending Database Technology Advances in Da-tabase Technology pp 287ndash298 Nantes France March 2008

[34] C D Manning P Raghavan and H SchAtildeijtze Introductionto Information Retrieval Cambridge University Press Cam-bridge UK 2008

[35] W W Cohen ldquoEnron E-mail datasetrdquo httpwwwcscmuedusimenron

16 Security and Communication Networks


Recommended