Privacy Preserving Indexing of Documents on the Network
Mayank Bawa, Roberto J. Bayardo Jr., Rakesh Agrawal
Sharing Private Content
• Rapid growth in Private & Semi-Private information on the network
  – Experimental results of drug tests
  – Drafts of research papers, patents, …
  – Architectural CAD documents
• Mechanisms to search information have failed to keep pace
  – Public Information: Google, Yahoo!
  – Private Information: ???
Talk Overview
1. Content Privacy issues in sharing access-controlled content
2. Data structure for search on access-controlled content
3. Algorithm for building such a data structure
Provider
• Shares documents
• Enforces access policy
P1
Alzheimer’s Disease (Alice, Bob)
AIDS (Alice)
…
Small-Pox (Alice, Bob, Lisa, …)
[Figure: providers P1, P2, P3, P32, P2026]
Searcher
• Wants documents that match her keyword query Q
• Has an identity
[Figure: searcher Alice with query Q = “Amyloid Peptide”; providers P1, P2, P3, P32, P2026]
Automating Search
A searcher s issues a query q expecting a set of documents d such that
1. d is shared by some provider p
2. d matches the query q
3. d is accessible to s as dictated by p’s access policy
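The three conditions above can be sketched as a small filter. This is a minimal illustration, not the authors' system: the `Document` and `Provider` classes, the keyword sets, and the choice of conjunctive keyword matching are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    keywords: set          # hypothetical keyword set for the document
    allowed: set           # identities granted access by the provider


@dataclass
class Provider:
    name: str
    documents: list = field(default_factory=list)

    def search(self, searcher, query):
        # Conditions 2 and 3: d matches q, and s may access d under p's policy.
        return [d for d in self.documents
                if query <= d.keywords and searcher in d.allowed]


def search_all(searcher, query, providers):
    # Condition 1: every returned document is shared by some provider p.
    results = {}
    for p in providers:
        hits = p.search(searcher, query)
        if hits:
            results[p.name] = hits
    return results


p1 = Provider("P1", [
    Document("Alzheimer's Disease", {"amyloid", "peptide"}, {"Alice", "Bob"}),
    Document("AIDS", {"hiv"}, {"Alice"}),
])
print(search_all("Alice", {"amyloid", "peptide"}, [p1]))
```

Note that the access check runs at the provider, which is the only party that knows the policy.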
Content Privacy
An adversary A should not be able to deduce, using the search mechanism, that provider P is sharing document d with keywords q unless A has been granted access to d by P
Soln #1: Document Index
[Figure: Alice sends Q = “Amyloid Peptide” to an Inverted Index over providers P1, P2, P3, P32, P2026. P1 contributes Documents and Access Policy; the index must decide “?Alice”.]
Soln #2: Keyword Index
[Figure: Alice/George sends Q = “Amyloid Peptide” to a Keyword Index over providers P1, P2, P3, P32, P2026. P1 contributes Keywords. The index answers: P1 has a document with words “Amyloid Peptide”.]
Keyword Index
$t_i \mapsto \{p : t_i \in d,\ \mathrm{provider}(d) = p\}$
Example:
Amyloid → {…, P1, …}
Peptide → {…, P1, …}
Problem cause: Every term is mapped precisely
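A minimal sketch of such a keyword index shows why the precise mapping leaks. The function names and the sample data are hypothetical; the lookup intersects the per-term provider lists, which is one simple way to answer a multi-keyword query.

```python
from collections import defaultdict


def build_keyword_index(shared):
    """shared: {provider name: [set of terms, one set per document]}."""
    index = defaultdict(set)
    for provider, docs in shared.items():
        for terms in docs:
            for t in terms:
                index[t].add(provider)   # exact mapping: term -> providers
    return index


def lookup(index, query):
    # Providers that appear under every term of the query.
    lists = [index.get(t, set()) for t in query]
    return set.intersection(*lists) if lists else set()


shared = {
    "P1": [{"amyloid", "peptide"}],
    "P2": [{"insulin"}],
}
index = build_keyword_index(shared)
print(lookup(index, ["amyloid", "peptide"]))
```

Anyone who can query the index, including a searcher with no access rights at P1, learns that P1 holds the matching terms.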
Soln #2: Keyword Index
Intuition
Add “false positives”
Example
Amyloid → {…, P1, P2, …}
Peptide → {…, P1, P2, …}
Soln #3: Privacy Preserving Index
Soln #3: Privacy Preserving Index (PPI)
[Figure: Alice/George sends Q = “Amyloid Peptide” to a Privacy Preserving Index over providers P1, P2, P3, P32, P2026. The index returns P1 and P2: P1 or P2 may have a document with words “Amyloid Peptide”.]
Soln #3: Privacy Preserving Index
Privacy Preserving Index
$t_i \mapsto M \subseteq P$
[A] $M = \emptyset$ only if $\nexists\, d_j : t_i \in d_j$
[B] $M = P_{true} \cup P_{false}$, $|P_{false}| \geq |P_{true}|$
[C] $M = P$
Completeness, Quantifiable Privacy on Reiter-Rubin scale, Loss in Selectivity
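The three admissible shapes of a result set M can be checked mechanically. The sketch below is an illustration under stated assumptions: `ppi_case` is a hypothetical helper, and it reads case [B] as requiring at least as many false positives as true positives.

```python
def ppi_case(result, true_positives, all_providers):
    """Classify a result set M as 'A', 'B' or 'C'; None if it fits no case."""
    m = set(result)
    p_true = set(true_positives)
    p_all = set(all_providers)
    if not m <= p_all or not p_true <= m:
        return None                      # unknown provider, or a true positive dropped
    if not m:
        return "A"                       # empty only because nothing matches
    if m == p_all:
        return "C"                       # every provider is returned
    p_false = m - p_true
    return "B" if len(p_false) >= len(p_true) else None


providers = {"P1", "P2", "P3"}
print(ppi_case({"P1", "P2"}, {"P1"}, providers))
print(ppi_case({"P1"}, {"P1"}, providers))
```

The second call shows the plain keyword index failing the check: returning exactly the true positives fits none of the three cases.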
Consistency of Behavior
1. Results for “Peptide” should tally with results from earlier searches
2. Results for “Amyloid Peptide”, “Amyloid” and “Peptide” should tally
3. …
Filtering of “noise” impossible
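Step 3: Group (OR) Vector
Error: $0 < \epsilon < 1$
[Formula for the number of rounds $r$ in terms of $c$ and $\epsilon$: not recoverable from the source]
Theorem: After $r$ rounds, the Group Vector $V_G$ subsumes $\bigvee_i V_i$ with prob. $1 - \epsilon$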
Searches
[Figure: providers P1, P2, P3, P32, P2026 partitioned into Group A, Group F and Group S. Alice/George sends Q = “Amyloid Peptide” to the Keyword Index (PPI), which returns Group F.]
Intuition: 3. Group Vector
Group Vector is a logical OR => Members are indistinguishable
Privacy ∝ size of group
Search Cost ∝ size of group
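A short sketch makes the OR intuition concrete. It assumes each provider summarizes its content as a bit vector (stored here as a Python int) and that query terms map to fixed bit positions; the bit assignments and group memberships are invented for the example.

```python
def group_vector(member_vectors):
    """Bitwise OR of the members' bit vectors."""
    v = 0
    for bits in member_vectors:
        v |= bits
    return v


def candidates(groups, query_bits):
    """groups: {group name: {provider: bit vector}}. Providers to contact."""
    out = []
    for members in groups.values():
        gv = group_vector(members.values())
        if gv & query_bits == query_bits:
            # The OR hides which member set the bits, so all are returned.
            out.extend(sorted(members))
    return out


AMYLOID, PEPTIDE, INSULIN = 1 << 0, 1 << 1, 1 << 2   # hypothetical bit positions
groups = {
    "F": {"P1": AMYLOID | PEPTIDE, "P2": INSULIN},
    "S": {"P3": INSULIN, "P32": 0},
}
print(candidates(groups, AMYLOID | PEPTIDE))
```

Every member of a matching group must be contacted, so both the number of plausible holders and the number of follow-up queries grow with the group size.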
Privacy vs Performance Tradeoff
Evaluation Procedure
• YouServ: Personal web-server deployed within IBM corporate intranet since 2001
• Content from 324 YouServ web-servers
• Partitioned into privacy groups of size C
• Query Set consisting of 100 queries chosen randomly from YouServ query logs
Summary
• Searches on access-controlled data
  – Privacy Preserving Indexes
  – Randomized Construction
• Project Home
  – Google: Stanford Peers
  – Google: IBM YouServ
Comments & Questions
• Google: Stanford Peers
  – http://www-db.stanford.edu/peers
• Google: IBM YouServ
  – http://almaden.ibm.com/cs/people/bayardo/userv
Growing Privacy Concerns
• Popular Press
  – Economist: The End of Privacy (’99)
  – Time: The Death of Privacy (’97)
• Govt. Directives/Commissions
  – European Union Directive on Privacy Protection (’98)
  – Canadian Personal Information Protection Act (’01)
Context
“The misuse of subpoena process by an adult entertainment company emphasizes the potential for abuse with insufficient privacy protections in the law.”
--- Cindy Cohn (Legal Director, Electronic Frontier Foundation)
Context
“Better support for anonymity and privacy is sorely needed […] amid the RIAA’s campaign to subpoena information about customers.”
--- Wendy Seltzer (Staff Attorney, Electronic Frontier Foundation)
Growing Privacy Concerns
In 07/2003, the RIAA began filing, at the rate of 75 or more per day, DMCA Section 512(h) subpoenas to force ISPs to identify file sharers.
DMCA 512(h) subpoenas are issued without prior judicial review […and so…] may be used to obtain identity information in cases where there is no copyright infringement.
Growing Privacy Concerns
• Unfair: Walmart/KMart against a customer who posted their prices at a comparison-shopping site
• Errors: RIAA against Prof. Usher at Penn State Dept. of Astronomy & Astrophysics [+ dozen other cases]
• Vested: A person against ISPs to erase record of his past messages
• Others: Against Internet Archive, …
Adversary
Passive (observes sent messages: queries, responses, indexes)
Active (acts deliberately: searcher, provider, indexer)
Global/Local view
Collude/Independent actions
Search Methodology
Privacy Preserving Index
$t_i \mapsto M \subseteq P$
[A] $M = \emptyset$ only if $\nexists\, d_j : t_i \in d_j$
[B] $M = P_{true} \cup P_{false}$, $|P_{false}| \geq |P_{true}|$
[C] $M = P$
Loss in Selectivity: $|P_{false}|/|P_{true}|$ for [B]; at most 2 for [C]
Search Methodology
Privacy Preserving Index
$t_i \mapsto M \subseteq P$
[A] $M = \emptyset$ only if $\nexists\, d_j : t_i \in d_j$
[B] $M = P_{true} \cup P_{false}$, $|P_{false}| \geq |P_{true}|$
[C] $M = P$
Correctness: No true positives excluded; provider enforces access control
Search Methodology
Privacy Preserving Index
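$t_i \mapsto M \subseteq P$
[A] $M = \emptyset$ only if $\nexists\, d_j : t_i \in d_j$
[B] $M = P_{true} \cup P_{false}$, $|P_{false}| \geq |P_{true}|$
[C] $M = P$
Privacy: All providers equivalent in [A,C]
[Figure: probability scale marked 0, 1/2, 1, with [B] placed on it]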
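3. Constructing OR Vector
Group F: for each bit $i$, a provider compares its own bit $b_i$ with the group vector's bit $B_i$:
if $b_i = B_i$: no-op
else if $(b_i = 1, B_i = 0)$: $B_i := 1$ with prob. $P_{in}$
else if $(b_i = 0, B_i = 1)$: $B_i := 0$ with prob. $P_{out}$
Start: $P_{in}$ and $P_{out}$ are initialized (the source shows a value of 1/2; exact expressions not recoverable)
Every Round: $P_{in}$ and $P_{out}$ are updated (exact expressions not recoverable)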
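The randomized construction of the OR vector can be simulated in a few lines. The sketch below is an illustration, not the authors' exact algorithm: the slide's probability schedule is not legible, so the code assumes $P_{out} = 1/2^r$ in round $r$ and $P_{in} = 1 - P_{out}$, an all-zero starting vector, and a vector that visits each provider once per round. The name `build_group_vector` is hypothetical.

```python
import random


def build_group_vector(member_bits, rounds, rng):
    """member_bits: list of equal-length 0/1 lists, one per provider."""
    width = len(member_bits[0])
    group = [0] * width                  # assumed all-zero starting vector
    for r in range(1, rounds + 1):
        p_out = 0.5 ** r                 # assumed schedule, halves every round
        p_in = 1.0 - p_out
        for bits in member_bits:         # the vector visits each provider in turn
            for i in range(width):
                b, B = bits[i], group[i]
                if b == B:
                    continue             # bits agree: no-op
                if b == 1 and B == 0:
                    if rng.random() < p_in:
                        group[i] = 1     # set the bit with prob. P_in
                elif rng.random() < p_out:
                    group[i] = 0         # clear the bit with prob. P_out
    return group


members = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1]]
print(build_group_vector(members, rounds=40, rng=random.Random(0)))
```

In early rounds a provider's visible changes are noisy, so a neighbor watching the vector pass by cannot attribute a set bit to it with certainty; in late rounds $P_{out}$ is negligible and the vector settles on the OR of the members' bits.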
Construction Properties
Completeness: For any query q, the result set Mq contains all providers that share documents matching q
Correctness: The mapping Mq is expected to be a Privacy Preserving Index
Construction Properties
Privacy: Within a privacy group G, an active adversary can only breach its neighbor’s privacy with probability < 0.71 (Possible Innocence)
[Figure: probability scale marked 0, 1/2, 1]