Managing Information Leakage
Steven Whang, Hector Garcia-Molina
Stanford University
Information Leakage
[Figure label: Joseph]
Model
• p = {<N,n1>,<C,c1>,<C,c2>,<P,p1>,<A,a1>}
• D = {s = {<N,n1>,<C,c1>,<P,p1>}, t = {<N,n1>,<C,c2>}}
• M(x,y) = true if x and y have the same N and C, or the same N and P
• µ(x,y) = x ∪ y
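The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are represented as frozensets of (label, value) pairs, and "same N" is read as "the two records share at least one value for label N". The helper names `record`, `shares`, `match`, and `merge` are hypothetical and do not come from the slides.

```python
def record(*attrs):
    """Build a record as a frozenset of (label, value) pairs."""
    return frozenset(attrs)

def shares(x, y, label):
    """True if x and y have at least one common value for `label` (assumed reading of "same")."""
    return any(a in y for a in x if a[0] == label)

def match(x, y):
    """M(x, y): same N and C, or same N and P."""
    return shares(x, y, "N") and (shares(x, y, "C") or shares(x, y, "P"))

def merge(x, y):
    """mu(x, y): union of the two records."""
    return x | y

p = record(("N", "n1"), ("C", "c1"), ("C", "c2"), ("P", "p1"), ("A", "a1"))
s = record(("N", "n1"), ("C", "c1"), ("P", "p1"))
t = record(("N", "n1"), ("C", "c2"))
D = [s, t]

print(match(s, t))  # False: same N, but no common C or P value
```

With this reading, s and t do not match each other, so D stays as two separate records.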
Record Leakage
• p = {<N,n1>,<C,c1>,<C,c2>,<P,p1>,<A,a1>}
• r = {<N,n1>,<C,c1>,<P,x>}
• Lr(p, r) = |p ∩ r| - |r - p| = 2 - 1 = 1
  – In general, Lr can be any function
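The arithmetic for this particular Lr can be checked with a short sketch. It assumes records are frozensets of (label, value) pairs; the function name `record_leakage` is hypothetical, and the slide notes that Lr may be any function, so this is only the example measure shown here.

```python
def record_leakage(p, r):
    """Lr(p, r) = |p ∩ r| - |r - p|: correct attributes minus incorrect ones."""
    return len(p & r) - len(r - p)

p = frozenset({("N", "n1"), ("C", "c1"), ("C", "c2"), ("P", "p1"), ("A", "a1")})
r = frozenset({("N", "n1"), ("C", "c1"), ("P", "x")})

print(record_leakage(p, r))  # 1: two correct attributes, one incorrect (<P,x>)
```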
Query Leakage
• q = {<N,n1>,<C,c1>}
• Lq(p, q, D) = max{Lr(p, µ(q,s))} = max{3} = 3
[Figure labels: q, s, t, µ(q,s)]
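A minimal sketch of the query leakage computation, assuming a simplified one-step reading: q is merged with each database record it matches, and the largest record leakage over those merged records is returned. If nothing matches, the sketch falls back to Lr(p, q), which is an assumption. All function names are hypothetical.

```python
def shares(x, y, label):
    """True if x and y have at least one common value for `label`."""
    return any(a in y for a in x if a[0] == label)

def match(x, y):
    """M(x, y): same N and C, or same N and P."""
    return shares(x, y, "N") and (shares(x, y, "C") or shares(x, y, "P"))

def record_leakage(p, r):
    """Lr(p, r) = |p ∩ r| - |r - p|."""
    return len(p & r) - len(r - p)

def query_leakage(p, q, database):
    """Lq(p, q, D): best record leakage after merging q with each matching record."""
    merged = [q | s for s in database if match(q, s)]
    return max(record_leakage(p, r) for r in (merged or [q]))

p = frozenset({("N", "n1"), ("C", "c1"), ("C", "c2"), ("P", "p1"), ("A", "a1")})
s = frozenset({("N", "n1"), ("C", "c1"), ("P", "p1")})
t = frozenset({("N", "n1"), ("C", "c2")})
q = frozenset({("N", "n1"), ("C", "c1")})

print(query_leakage(p, q, [s, t]))  # 3: q matches only s, and Lr(p, mu(q, s)) = 3
```

A full entity resolution algorithm may merge records transitively; the one-step version is used here only because it is enough to reproduce the slide's numbers.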
Database Leakage
• Ld(p, D) = max{Lq(p, s, D-{s}), Lq(p, t, D-{t})} = max{Lr(p, s), Lr(p, t)} = max{3, 2} = 3
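The database leakage formula can be sketched the same way: each record in D is used in turn as the query against the rest of D. The code below assumes the simplified one-step query leakage (merge with each matching record, fall back to the query itself when nothing matches); function names are hypothetical.

```python
def shares(x, y, label):
    """True if x and y have at least one common value for `label`."""
    return any(a in y for a in x if a[0] == label)

def match(x, y):
    """M(x, y): same N and C, or same N and P."""
    return shares(x, y, "N") and (shares(x, y, "C") or shares(x, y, "P"))

def record_leakage(p, r):
    """Lr(p, r) = |p ∩ r| - |r - p|."""
    return len(p & r) - len(r - p)

def query_leakage(p, q, database):
    """Simplified one-step Lq(p, q, D)."""
    merged = [q | s for s in database if match(q, s)]
    return max(record_leakage(p, r) for r in (merged or [q]))

def database_leakage(p, database):
    """Ld(p, D) = max over r in D of Lq(p, r, D - {r})."""
    return max(
        query_leakage(p, r, database[:i] + database[i + 1:])
        for i, r in enumerate(database)
    )

p = frozenset({("N", "n1"), ("C", "c1"), ("C", "c2"), ("P", "p1"), ("A", "a1")})
s = frozenset({("N", "n1"), ("C", "c1"), ("P", "p1")})
t = frozenset({("N", "n1"), ("C", "c2")})

print(database_leakage(p, [s, t]))  # 3 = max{Lr(p, s), Lr(p, t)} = max{3, 2}
```

Because s and t do not match each other, each query leakage reduces to the record leakage of the record itself, as in the slide.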
Key Features
• Incorporated Entity Resolution
• Privacy: NOT all or nothing
• Uncertainty
• Incorrect Information
[Figure labels: No privacy, Perfect privacy]
Interesting Problems
• Releasing Critical Information
• Comparing Entity Resolution Algorithms
• Releasing Disinformation
Releasing Disinformation
[Figure labels: Joseph, XXX, YYY]
• Minimize Ld(p, D ∪ S) s.t.
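The optimization above can be illustrated with a brute-force sketch. The constraint after "s.t." is not given here, so the code uses a purely hypothetical stand-in: at most `max_records` disinformation records may be released, chosen from a hypothetical candidate `pool`. Leakage is computed with the simplified one-step query leakage, and all names (`best_disinformation`, `b1`, `b2`) are assumptions for illustration.

```python
from itertools import combinations

def shares(x, y, label):
    return any(a in y for a in x if a[0] == label)

def match(x, y):
    """M(x, y): same N and C, or same N and P."""
    return shares(x, y, "N") and (shares(x, y, "C") or shares(x, y, "P"))

def record_leakage(p, r):
    return len(p & r) - len(r - p)

def query_leakage(p, q, database):
    merged = [q | s for s in database if match(q, s)]
    return max(record_leakage(p, r) for r in (merged or [q]))

def database_leakage(p, database):
    return max(
        query_leakage(p, r, database[:i] + database[i + 1:])
        for i, r in enumerate(database)
    )

def best_disinformation(p, database, pool, max_records):
    """Try every subset S of `pool` with |S| <= max_records (hypothetical constraint)."""
    best_set, best_leak = (), database_leakage(p, database)
    for k in range(1, max_records + 1):
        for subset in combinations(pool, k):
            leak = database_leakage(p, database + list(subset))
            if leak < best_leak:
                best_set, best_leak = subset, leak
    return best_set, best_leak

p = frozenset({("N", "n1"), ("C", "c1"), ("C", "c2"), ("P", "p1"), ("A", "a1")})
s = frozenset({("N", "n1"), ("C", "c1"), ("P", "p1")})
t = frozenset({("N", "n1"), ("C", "c2")})
# Hypothetical bogus records: each matches one real record and adds wrong values.
b1 = frozenset({("N", "n1"), ("C", "c1"), ("P", "x1"), ("A", "y1")})
b2 = frozenset({("N", "n1"), ("C", "c2"), ("P", "x2")})

print(best_disinformation(p, [s, t], [b1, b2], 1)[1])  # 2
print(best_disinformation(p, [s, t], [b1, b2], 2)[1])  # 1
```

In this toy setting, each bogus record merges with a real record and lowers that record's leakage by adding incorrect attributes. Exhaustive search is only feasible for tiny pools.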
Conclusion
• We have formalized information leakage
  – Incorporated Entity Resolution
  – Privacy: NOT all or nothing
  – Uncertainty
  – Incorrect Information
• We have listed several challenges for managing information leakage
Thanks!
Releasing Critical Information
• u = {<N,n1>,<C,c2>,<P,p1>}
• Ld(p, D1) = 3, Ld(p, D1 ∪ {u}) = 3
• Ld(p, D2) = 3, Ld(p, D2 ∪ {u}) = 4
[Figure label: Online shopping websites]
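The effect of releasing u can be sketched with two small databases. The slide does not define D1 or D2, so both are hypothetical here: `D2` reuses the records s and t from the Model slide, and `D1` holds a single made-up record w that u does not match. Leakage uses the simplified one-step query leakage, and all function names are assumptions.

```python
def shares(x, y, label):
    return any(a in y for a in x if a[0] == label)

def match(x, y):
    """M(x, y): same N and C, or same N and P."""
    return shares(x, y, "N") and (shares(x, y, "C") or shares(x, y, "P"))

def record_leakage(p, r):
    return len(p & r) - len(r - p)

def query_leakage(p, q, database):
    merged = [q | s for s in database if match(q, s)]
    return max(record_leakage(p, r) for r in (merged or [q]))

def database_leakage(p, database):
    return max(
        query_leakage(p, r, database[:i] + database[i + 1:])
        for i, r in enumerate(database)
    )

p = frozenset({("N", "n1"), ("C", "c1"), ("C", "c2"), ("P", "p1"), ("A", "a1")})
u = frozenset({("N", "n1"), ("C", "c2"), ("P", "p1")})

# Hypothetical D1: one record that u does not match (different C, no P).
w = frozenset({("N", "n1"), ("C", "c1"), ("A", "a1")})
D1 = [w]
# Hypothetical D2: the records s and t from the Model slide.
s = frozenset({("N", "n1"), ("C", "c1"), ("P", "p1")})
t = frozenset({("N", "n1"), ("C", "c2")})
D2 = [s, t]

print(database_leakage(p, D1), database_leakage(p, D1 + [u]))  # 3 3
print(database_leakage(p, D2), database_leakage(p, D2 + [u]))  # 3 4
```

In the second case u matches s through N and P, so the merged record carries four correct attributes and the leakage rises from 3 to 4.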
Related Work
• ReputationDefender – Promotes positive information
• TrackMeNot – Obfuscates search queries
Current Work
• Model – Distinguish attributes, better leakage measures, update/delete, utility, privacy measure, …
• Implementation – Bogus creation, scalability, …
• More problems – Negative effect of disinformation, promoting good information, enhance record, check hypothesis, …