
Post on 01-Jan-2016


http://www.cs.ucla.edu/~rafail/

Private Keyword Search on Streaming Data

Rafail Ostrovsky, William Skeith (UCLA)

(patent pending)

Motivating Example

The intelligence community collects data from multiple sources that might potentially be “useful” for future analysis: network traffic, chat rooms, web sites, etc.

However, what is “useful” is often classified.

Current Practice

Continuously transfer all data to a secure environment.

After data is transferred, filter in the classified environment and keep only a small fraction of documents.

[Figure: three document streams (…, D(1,3), D(1,2), D(1,1); …, D(2,3), D(2,2), D(2,1); …, D(3,3), D(3,2), D(3,1)) all flow into a single Filter and Storage inside the Classified Environment. Callout: filter rules are written by an analyst and are classified!]

Current Practice

Drawbacks: communication, processing.

How to improve performance?

Distribute work to many locations on a network.

Seemingly ideal solution, but there is a major problem: it is not clear how to maintain privacy, which is the focus of this talk.

[Figure: each of the three document streams passes through its own Filter and Storage. The filters' storage holds only encrypted documents, E(D(1,2)), E(D(1,3)) and E(D(2,2)); these are sent to the Classified Environment, where a Decrypt step places D(1,2), D(1,3) and D(2,2) in Storage.]

Example Filter:

Look for all documents that contain special classified keywords, selected by an analyst (perhaps an alias of a dangerous criminal).

Privacy:

Must hide what words are used to create the filter.

Output must be encrypted.

More generally:

We define the notion of Public Key Program Obfuscation.

Encrypted version of a program.

Performs the same functionality as the un-obfuscated program, but produces encrypted output and is impossible to reverse engineer.

A little more formally:

Public Key Program Obfuscation

Privacy

Related Notions

PIR (Private Information Retrieval) [CGKS], [KO], [CMS]…

Keyword PIR [KO], [CGN], [FIPR]

Program Obfuscation [BGIRSVY]… Here the output is identical to that of the un-obfuscated program, but in our case it is encrypted.

Public Key Program Obfuscation: a more general notion than PIR, with lots of applications.

What we want

[Figure: the stream …, D(1,3), D(1,2), D(1,1) enters the Filter and Storage. The example stream mixes matching documents #1, #2 and #3 with three non-matching documents.]

How to accomplish this?

Several solutions based on homomorphic encryption.

For this talk: Paillier encryption. Properties:

Plaintext set = $\mathbb{Z}_n$

Ciphertext set = $\mathbb{Z}^*_{n^2}$

Homomorphic, i.e., $E(x)\,E(y) = E(x+y)$
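The homomorphic property can be checked on a toy instance. The following is a minimal sketch of textbook Paillier with the common choice g = n + 1; the function names (`keygen`, `encrypt`, `decrypt`) and the tiny primes are assumptions made for illustration, not taken from the talk, and the parameters are far too small to be secure.

```python
import math
import random

def keygen(p=1789, q=1861):
    # Toy primes (illustrative assumption); real keys use primes of 1024+ bits.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n mod n^2.
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

n, sk = keygen()
cx, cy = encrypt(n, 20), encrypt(n, 22)
product = (cx * cy) % (n * n)   # multiply ciphertexts ...
print(decrypt(n, sk, product))  # ... to add plaintexts modulo n
```

A consequence of the same property is that raising a ciphertext to a power k multiplies the plaintext by k, so `pow(cx, k, n * n)` decrypts to `k * x mod n`.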

Simplifying Assumptions for this Talk

All keywords come from some poly-size dictionary.

Truncate documents beyond a certain length.

[Figure: the filter is a dictionary of words w_1, …, w_t, each paired with a ciphertext: E(1) for the keywords (shown: w_2, w_5, w_{t-2}) and E(0) for all other words. For a document D the filter computes a pair (g, g^D) and multiplies it into slots of an output buffer that initially holds E(0) in every slot.]

[Figure: output buffer holding matching documents #1, #3 and #2, with another matching document arriving.]

Collisions cause two problems:

1. Good documents are destroyed

2. Non-existent documents could be fabricated
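To see how collisions arise, here is a plaintext stand-in for the output buffer. It is only a sketch under stated assumptions: the real buffer holds ciphertexts that are combined homomorphically, whereas this simulation keeps a list of document ids per slot so that collisions are visible. The helper names, the number of copies per document, and the use of distinct random slots are all hypothetical choices for illustration.

```python
import random

def fill_buffer(matching_docs, buffer_size, copies, rng):
    """Write each matching document id into `copies` distinct random slots.

    Plaintext stand-in (assumption): a real slot would hold the combined
    ciphertext, not a list of ids.
    """
    slots = [[] for _ in range(buffer_size)]
    for doc_id in matching_docs:
        for pos in rng.sample(range(buffer_size), copies):
            slots[pos].append(doc_id)
    return slots

def recover(slots):
    """A slot is readable only if exactly one document was written to it."""
    survivors = {slot[0] for slot in slots if len(slot) == 1}
    collided = sum(1 for slot in slots if len(slot) > 1)
    return survivors, collided

rng = random.Random(0)
slots = fill_buffer(range(5), buffer_size=40, copies=3, rng=rng)
survivors, collided = recover(slots)
print(sorted(survivors), collided)
```

A document is lost only if every one of its copies collides, which is why writing several copies into a larger buffer makes loss unlikely. A collided slot decrypts to a combination of documents, which is the source of the second problem listed above.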

We’ll make use of two combinatorial lemmas…

How to detect collisions?

Append a highly structured (yet random) k-bit string to the message.

The sum of two or more such strings will be another such string with negligible probability in k.

Specifically, partition the k bits into triples of bits, and set exactly one bit from each triple to 1.

  100|001|100|010|010|100|001|010|010
⊕ 010|001|010|001|100|001|100|001|010
⊕ 010|100|100|100|010|001|010|001|010
= 100|100|010|111|100|100|111|010|010

(The fourth and seventh triples of the result are 111, so the sum is not a valid string.)
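The tagging rule can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the helper names are hypothetical, and combining tags by bitwise XOR is an assumption inferred from the worked example on the slide (how the sum is realised under encryption is not specified here).

```python
import secrets

def random_tag(num_triples):
    """Return a list of triples, each with exactly one bit set."""
    tag = []
    for _ in range(num_triples):
        triple = [0, 0, 0]
        triple[secrets.randbelow(3)] = 1
        tag.append(triple)
    return tag

def xor_tags(*tags):
    """Combine tags bit by bit (XOR is an assumption, see lead-in)."""
    out = [[0, 0, 0] for _ in tags[0]]
    for tag in tags:
        for i, triple in enumerate(tag):
            for j, bit in enumerate(triple):
                out[i][j] ^= bit
    return out

def is_valid(tag):
    """A tag is well formed if every triple has exactly one bit set."""
    return all(sum(triple) == 1 for triple in tag)

def parse(text):
    """Parse the slide notation, e.g. '100|001|010'."""
    return [[int(b) for b in triple] for triple in text.split("|")]

# The three strings from the slide.
a = parse("100|001|100|010|010|100|001|010|010")
b = parse("010|001|010|001|100|001|100|001|010")
c = parse("010|100|100|100|010|001|010|001|010")
combined = xor_tags(a, b, c)
print(is_valid(a), is_valid(combined))  # True False
```

Under XOR, two valid tags never combine into a valid one, since each triple becomes either 000 or a triple with two bits set. With three tags, a given triple stays valid with probability 7/9, so the chance that all k/3 triples stay valid shrinks exponentially in k.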

Detecting Overflow > m

Double the buffer size from m to 2m.

If m < #documents < 2m, output “overflow”.

If #documents > 2m, then the expected number of collisions is large, thus output “overflow” in this case as well.

Not yet in the eprint version; will appear soon, as well as some other extensions.

More from the paper that we don’t have time to discuss…

Reducing program size below dictionary size (using Φ-Hiding from [CMS])

Queries containing AND (using [BGN] machinery)

Eliminating negligible error (using perfect hashing)

Scheme based on arbitrary homomorphic encryption

Conclusions

Private searching on streaming data

Public key program obfuscation, more general than PIR

Practical, efficient protocols

Many open problems

Thanks For Listening!