Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets
Shuo Qiu, Boyang Wang, Ming Li, Jesse Victors, Jiqiang Liu, Yangfeng Shi, Wei Wang
4th ACM International Workshop on Security in Cloud Computing (SCC 2016)
Xi’an, China, 2016
SWIM Seminar
August 4, 2016
Mateus Cruz
Introduction Preliminaries Proposal Experiments Conclusion
OUTLINE
1 Introduction
2 Preliminaries
3 Proposal
4 Experiments
5 Conclusion
OVERVIEW
Use of Jaccard similarity (JS)
Privacy concerns
  - The parties want to disclose only the similarity
Previous approaches use multi-party computation (MPC)
  - High performance overhead
CONTRIBUTIONS
Protocol 1
  - Assumes a semi-honest server
  - Uses MinHash and deterministic encryption
  - Leaks only the Jaccard similarity
Protocol 2
  - Builds on Protocol 1
  - Verifies whether the server is malicious
JACCARD SIMILARITY (JS)
Measure similarity between sets A and B
$JS(A,B) = \frac{|A \cap B|}{|A \cup B|}$
Example
  - $A = \{1,2,4\}$
  - $B = \{2,4,8,9\}$
  - $JS(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{5}$
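The example above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `jaccard` and the choice to return 1.0 for two empty sets are assumptions made for illustration, not something the slides specify.

```python
def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |a ∩ b| / |a ∪ b|."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


A = {1, 2, 4}
B = {2, 4, 8, 9}
print(jaccard(A, B))  # 0.4, i.e. 2/5: the intersection is {2, 4}, the union has 5 elements
```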
MINHASH
Approximation of Jaccard similarity
Uses k hash functions: $\{h_1, \ldots, h_k\}$
Keeps the minimum hash value under each function: $\min\{h_i(A)\}$
Generates signatures from sets
  - Compact representations of sets
  - Signature of A: $h^{(k)}(A) = \{\min\{h_i(A)\}\}_{i=1}^{k}$
    - Length k
$JS(A,B) \approx \frac{|h^{(k)}(A) \cap h^{(k)}(B)|}{k}$
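A minimal Python sketch of the MinHash estimate described above. The hash family here (SHA-256 salted with the index i) is an assumption chosen for convenience; the slides do not say which hash functions are used. The sketch also compares the two signatures position by position, which is the usual way to evaluate the MinHash estimator, whereas the slide writes the same count as a set intersection.

```python
import hashlib


def h(i: int, x) -> int:
    """Hypothetical i-th hash function: SHA-256 over 'i:x', truncated to 64 bits."""
    digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(s: set, k: int) -> list:
    """Signature of length k: the minimum hash value of s under each h_i."""
    return [min(h(i, x) for x in s) for i in range(k)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of positions where the two signatures hold the same minimum."""
    agree = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return agree / len(sig_a)


A = set(range(0, 300))
B = set(range(100, 400))  # exact JS(A, B) = 200 / 400 = 0.5
k = 256
estimate = estimate_jaccard(minhash_signature(A, k), minhash_signature(B, k))
print(estimate)  # an approximation of 0.5; larger k gives a tighter estimate
```

Each signature has exactly k entries regardless of the size of the set, which is what makes it a compact representation.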
DETERMINISTIC ENCRYPTION
Same ciphertext for the same message
  - $m_1 = m_2 \rightarrow Enc(m_1) = Enc(m_2)$
Allows equality checks
Algorithms
  - $sk \leftarrow KeyGen(1^\lambda)$
    - Security parameter $\lambda$, secret key $sk$
  - $c \leftarrow Enc(sk, m)$
    - Message $m$, ciphertext $c$
  - $m \leftarrow Dec(sk, c)$
  - $Dec(sk, Enc(sk, m)) = m$
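The three algorithms can be illustrated with a toy scheme in Python. This is a sketch under loud assumptions: it is a synthetic-IV style construction built from HMAC-SHA256, picked only because it runs on the standard library. It is not the cipher used in the experiments, and it has not been reviewed for security. The names `keygen`, `enc`, `dec`, `_keystream` and `IV_LEN` are hypothetical.

```python
import hashlib
import hmac
import secrets

IV_LEN = 16  # bytes of the message-derived IV stored in front of the ciphertext


def keygen(security_bytes: int = 32) -> bytes:
    """sk <- KeyGen(1^lambda): sample a uniformly random key."""
    return secrets.token_bytes(security_bytes)


def _keystream(sk: bytes, iv: bytes, length: int) -> bytes:
    """Expand (sk, iv) into `length` bytes using HMAC-SHA256 in counter mode."""
    blocks = []
    for counter in range((length + 31) // 32):
        msg = b"stream" + iv + counter.to_bytes(4, "big")
        blocks.append(hmac.new(sk, msg, hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def enc(sk: bytes, m: bytes) -> bytes:
    """c <- Enc(sk, m): the IV is derived from m itself, so no randomness is used."""
    iv = hmac.new(sk, b"iv" + m, hashlib.sha256).digest()[:IV_LEN]
    body = bytes(x ^ y for x, y in zip(m, _keystream(sk, iv, len(m))))
    return iv + body


def dec(sk: bytes, c: bytes) -> bytes:
    """m <- Dec(sk, c): regenerate the keystream from the stored IV and undo the XOR."""
    iv, body = c[:IV_LEN], c[IV_LEN:]
    return bytes(x ^ y for x, y in zip(body, _keystream(sk, iv, len(body))))


sk = keygen()
c1 = enc(sk, b"element-42")
c2 = enc(sk, b"element-42")
print(c1 == c2)                      # True: equal messages give equal ciphertexts
print(dec(sk, c1) == b"element-42")  # True: decryption inverts encryption
```

Because encryption uses no fresh randomness, anyone holding two ciphertexts can test whether the underlying messages are equal without knowing sk, which is the property the protocols rely on.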
ADVERSARY MODEL
Semi-honest adversary (Protocol 1)
  - Follows the protocol
  - Tries to learn from the data
Malicious adversary (Protocol 2)
  - May not execute the protocol correctly
    - Returns a random similarity (Case I)
    - Returns a partial result (Case II)
    - Returns a false approximation (Case III)
No collusion between parties
PROBLEM DEFINITION
Calculate similarity between sets
  - Using Jaccard similarity
  - Alice has set A
  - Bob has set B
  - Compute the similarity on a remote server
Security requirements
  - Alice, Bob and the server only learn JS(A,B)
  - Alice does not learn |B|
  - Bob does not learn |A|
  - The server does not learn |A|, |B|, |A∩B|
PROTOCOL 1 (SEMI-HONEST SERVER)
Each client...
  1 Computes MinHash signatures
    - Using k shared hash functions
  2 Encrypts the signatures
    - Using deterministic encryption, which allows equality checks between ciphertexts
    - Secret key shared between Alice and Bob
  3 Sends the ciphertexts to the server
The server...
  4 Calculates JS(A,B)
    - By comparing the encrypted signatures
  5 Returns JS(A,B) to the clients
PROTOCOL 2 (MALICIOUS SERVER)
Two-round consistency check
Round 1
  - Calculate $JS(A,B)$
Round 2
  - Calculate $JS(D_A, D_B)$
    - $D_A = A \cup S_0 \cup S_1$
    - $D_B = B \cup S_0 \cup S_2$
    - $S_0, S_1, S_2$ are disjoint dummy sets
  - Compare $JS(A,B)$ with $JS(D_A, D_B)$
    - Determine whether the server is malicious
ADDITIONAL NOTATION
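$|A| = |B| = n$ and $|S_0| = |S_1| = |S_2| = t$
$\varepsilon$: Approximation bias
  - $\varepsilon = \frac{1}{\sqrt{k}}$, where k is the number of hash functions
$\sigma$: Real similarity between A and B
  - $\sigma = \frac{|A \cap B|}{2n - |A \cap B|}$
$\sigma_d$: Real similarity between $D_A$ and $D_B$
  - $\sigma_d = \frac{|A \cap B| + t}{2n - |A \cap B| + 3t}$
$\sigma_1$: Approximate similarity between A and B
  - $\sigma_1 \in [\sigma - \varepsilon, \sigma + \varepsilon]$
$\sigma_2$: Approximate similarity between $D_A$ and $D_B$
  - $\sigma_2 \in [\sigma_d - \varepsilon, \sigma_d + \varepsilon]$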
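The two real similarities are linked: eliminating $|A \cap B|$ from the definitions above expresses $\sigma_d$ purely in terms of $\sigma$, n and t. A short derivation, writing $x = |A \cap B|$ as shorthand (x is introduced here for illustration and is not part of the slides' notation):

```latex
\begin{align*}
\sigma = \frac{x}{2n - x}
  &\;\Longrightarrow\; x = \frac{2n\sigma}{1 + \sigma}, \\
\sigma_d = \frac{x + t}{2n - x + 3t}
  &= \frac{2n\sigma + t(1 + \sigma)}{2n(1 + \sigma) - 2n\sigma + 3t(1 + \sigma)}
   = \frac{(2n + t)\sigma + t}{3t\sigma + 2n + 3t}.
\end{align*}
```

The second line multiplies numerator and denominator by $(1 + \sigma)$ after substituting x.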
CONSISTENCY CHECK
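Can detect malicious servers
Apply a map $f : \sigma \mapsto \sigma_d$
  - $\sigma_d = f(\sigma) = \frac{(2n+t)\sigma + t}{3t\sigma + 2n + 3t}$
Given $\sigma_1$ and $\sigma_2$, Alice...
  - Outputs 1 if $\sigma_2 \in [f(\sigma_1 - \varepsilon) - \varepsilon, f(\sigma_1 + \varepsilon) + \varepsilon]$
  - Outputs 0 otherwise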
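The check above translates almost directly into code. The following Python sketch assumes $\varepsilon = 1/\sqrt{k}$; the function names and the numeric values of n, t and k are illustrative assumptions, not values taken from the paper.

```python
import math


def f(sigma: float, n: int, t: int) -> float:
    """Map the similarity of (A, B) to the similarity of (D_A, D_B)."""
    return ((2 * n + t) * sigma + t) / (3 * t * sigma + 2 * n + 3 * t)


def consistency_check(sigma1: float, sigma2: float, n: int, t: int, k: int) -> int:
    """Return 1 if sigma2 is consistent with sigma1, and 0 otherwise."""
    eps = 1 / math.sqrt(k)  # approximation bias for k hash functions
    low = f(sigma1 - eps, n, t) - eps
    high = f(sigma1 + eps, n, t) + eps
    return 1 if low <= sigma2 <= high else 0


# Illustrative parameters: n = 1000 elements per set, t = 500 dummies, k = 400 hashes.
n, t, k = 1000, 500, 400
print(consistency_check(0.25, 0.29, n, t, k))  # 1: 0.29 is close to f(0.25)
print(consistency_check(0.25, 0.90, n, t, k))  # 0: 0.90 is far outside the interval
```

Since f is increasing in σ for positive n and t, evaluating it at the two ends of the interval around σ1 is enough to bound the expected value of σ2.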
ACCURACY OF CONSISTENCY CHECK
Evaluates whether the check works
False positives
  - Honest server, but the check says it is malicious
False negatives
  - Malicious server, but the check says it is honest
SETUP
Hardware
  - Client
    - Windows Server 7 with 8 vCPUs
    - 14 GB RAM
  - Server
    - Windows Server 2012 with 8 vCPUs
    - 12 GB RAM
Software
  - C++
  - Crypto++ library
  - AES-ECB cryptosystem
EFFICIENCY
Pipeline mode
  - Single thread
Parallel mode
  - Multiple threads
  - Signatures are calculated concurrently
VERIFIABILITY
False Positive Rate (FPR)
False Negative Rate (FNR)
CONCLUSION
Secure and scalable similarity computation
  - Using MinHash and deterministic encryption
Benefits from parallel execution
  - Speedups of about 5 times
Detection of a malicious server
  - Can have false positives and false negatives
Detailed Protocols
PROTOCOL 1: SETUP
DE = {KeyGen, Enc, Dec}
  - Secret key $sk \leftarrow DE.KeyGen(1^\lambda)$
  - sk shared between Alice and Bob
Alice has input A, and Bob has input B
  - $|A| = |B| = n$
k random hash functions $\{h_1, \ldots, h_k\}$
PROTOCOL 1: STEPS
1 Alice (Bob) computes the signature of A (B)
  - $h^{(k)}(A) = \{\min\{h_i(A)\}\}_{i=1}^{k}$
  - $h^{(k)}(B) = \{\min\{h_i(B)\}\}_{i=1}^{k}$
2 Alice (Bob) calculates the ciphertexts
  - $T_A \leftarrow DE.Enc(sk, h^{(k)}(A))$
  - $T_B \leftarrow DE.Enc(sk, h^{(k)}(B))$
3 Alice (Bob) sends $T_A$ ($T_B$) to the server
4 The server computes the similarity $\sigma$
  - $\sigma = \frac{|T_A \cap T_B|}{k}$
5 The server returns $\sigma$ to both clients
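The five steps can be walked through end to end in Python. This is a sketch under stated assumptions rather than the authors' implementation. HMAC-SHA256 stands in for DE.Enc: it is deterministic under a fixed key, so it supports the equality check, but it offers no Dec. The hash family is SHA-256 salted with the index i. Each token also binds the index i, so that the set intersection on the server counts position-wise agreement; that binding is an assumption of this sketch. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

K = 256  # number of shared hash functions (illustrative value)


def h(i: int, x) -> int:
    """Hypothetical i-th shared hash function."""
    return int.from_bytes(hashlib.sha256(f"{i}:{x}".encode()).digest()[:8], "big")


def signature(s: set, k: int = K) -> list:
    """Step 1: MinHash signature of a client's set."""
    return [min(h(i, x) for x in s) for i in range(k)]


def det_tokens(sk: bytes, sig: list) -> set:
    """Step 2: deterministic tokens, one per signature entry (stand-in for DE.Enc)."""
    return {
        hmac.new(sk, f"{i}:{v}".encode(), hashlib.sha256).digest()
        for i, v in enumerate(sig)
    }


def server_similarity(ta: set, tb: set, k: int = K) -> float:
    """Step 4: the server sees only tokens and computes |TA ∩ TB| / k."""
    return len(ta & tb) / k


sk = secrets.token_bytes(32)      # key shared by Alice and Bob, unknown to the server
A = set(range(0, 300))            # Alice's input
B = set(range(150, 450))          # Bob's input, exact JS(A, B) = 150 / 450
TA = det_tokens(sk, signature(A))  # step 3: Alice sends TA
TB = det_tokens(sk, signature(B))  # step 3: Bob sends TB
sigma = server_similarity(TA, TB)  # step 5: returned to both clients
print(sigma)  # an approximation of 1/3
```

The server receives exactly k tokens from each client whatever the input sizes, so in this sketch the messages do not reveal |A| or |B|.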
PROTOCOL 2: SETUP
DE = {KeyGen, Enc, Dec}
  - Secret key $sk \leftarrow DE.KeyGen(1^\lambda)$
  - sk shared between Alice and Bob
Alice has input A, and Bob has input B
  - $|A| = |B| = n$
  - $A, B \subseteq D \subseteq E$
    - E is the whole data space
k random hash functions $\{h_1, \ldots, h_k\}$
PROTOCOL 2: STEPS
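1 Alice chooses dummy sets $S_0, S_1, S_2$
  - $S_0, S_1, S_2 \subseteq D' \subseteq E$
  - $D \cap D' = \emptyset$
    - $A, B \subseteq D$
  - $S_0 \cap S_1 \cap S_2 = \emptyset$
  - $|S_0| = |S_1| = |S_2| = t$
    - $|A| = |B| = n$
2 Alice and Bob obtain $JS(A,B) = \frac{|T_A \cap T_B|}{k}$
  - Following Protocol 1
3 Alice (Bob) generates $D_A$ ($D_B$)
  - $D_A = A \cup S_0 \cup S_1$
  - $D_B = B \cup S_0 \cup S_2$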
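The dummy-set construction in steps 1 and 3 can be checked on a small example. The Python sketch below uses exact set arithmetic instead of the server-side approximation, so it only illustrates how padding with $S_0$, $S_1$, $S_2$ shifts the true similarity from $\sigma$ to $f(\sigma)$. Representing dummy elements as tagged tuples, so that they cannot collide with the integer elements of D, is an assumption of this sketch, as are all names and the values of n and t.

```python
from fractions import Fraction


def jaccard(x: set, y: set) -> Fraction:
    """Exact Jaccard similarity as a fraction."""
    return Fraction(len(x & y), len(x | y))


def make_dummy_sets(t: int):
    """Three pairwise disjoint dummy sets of size t, drawn from outside D."""
    return tuple({("dummy", j, i) for i in range(t)} for j in range(3))


def f(sigma, n: int, t: int):
    """Expected similarity of (D_A, D_B) given the similarity of (A, B)."""
    return ((2 * n + t) * sigma + t) / (3 * t * sigma + 2 * n + 3 * t)


n, t = 6, 3
A = {1, 2, 3, 4, 5, 6}
B = {3, 4, 5, 6, 7, 8}
S0, S1, S2 = make_dummy_sets(t)

DA = A | S0 | S1  # Alice's padded set
DB = B | S0 | S2  # Bob's padded set

sigma = jaccard(A, B)      # 4 / 8 = 1/2
sigma_d = jaccard(DA, DB)  # (4 + 3) / (8 + 9) = 7/17
print(sigma, sigma_d, f(sigma, n, t))  # 1/2 7/17 7/17
```

Only $S_0$ is shared between the two padded sets, which is why the intersection grows by t while the union grows by 3t.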