Auditing Batches of SQL Queries
Rajeev MotwaniShubha NabarDilys Thomas
Stanford University
Database Query Auditing
• Auditing Aggregate (Sum, Max, Median) queries
• Perfect Privacy
• Auditing SQL Queries
• Auditing a Batch of SQL Queries
Aggregate Queries• [C86] Chin: Security problems on inference control for
sum, max and min queries. JACM 1986• [CO82] Chin, Ozsoyglu: Auditing and inference control
in statistical databases. TSE 1982• [DJL79] Dobkin, Jones, Lipton: Secure Databases:
Protection against user influence. TODS 1979• [KMN05] Kenthapadi, Mishra, Nissim: Simulatable
auditing. PODS 2005• [KPR00] Kleinberg, Papadimitriou, Raghavan: Auditing
Boolean Attributes. PODS 2000• [R79] S. P. Reiss. Security in databases: A
combinatorial study. JACM 1979
Aggregate Queries
• How many aggregate queries: sum / max / median queries can you pose to a database of numbers before you find out the value of an element
• Some amount of work in the 80’s
• Theoretically interesting and basis of more practical schemes today
Perfect Privacy
• [MS04] Miklau, Suciu: A formal analysis of information disclosure in data exchange. SIGMOD 2004
• [MG06] Machanavajhala, Gehrke: On the efficiency of checking perfect privacy. PODS 2006
Perfect Privacy[MS04,MG06]
• Table Patient(Name, Phone number)• Want to keep secret: All phone-numbers in
the database• Query: select name from Patient• Perfect Privacy violation!• Reveals some information --- the phone
database is not empty.• Too strong
SQL Auditing: Single Table
• Audit for address, SSN and phone numbers of all patients with diabetes
• Say Alice has diabetes• Then any query that returns the address, SSN
and phone number of Alice is suspicious wrt to the audit expression
[ABFKRS04] Agrawal, Bayardo, Faloutsos,
Kiernan, Rantzau, Srikant: Auditing compliance with a Hippocratic Database VLDB2004
Auditing SQL Queries[ABFKRS04]
• An audit expression is like a SQL Query
AUDIT audit list
FROM table list
WHERE condition list
ExampleSELECT zipcode
FROM Patients p
WHERE p.disease = ‘diabetes’
AUDIT zipcode
FROM Patients p
WHERE p.disease = ‘high blood pressure’
AUDIT disease
FROM Patients p
WHERE p.zipcode = 94305
Suspicious if someone in 94305 has diabetes
Not Suspicious wrt this
Formally, SQL Auditing
• Query Q=COQPQ
(T £ R))
• Audit expression A= COA(PA
( T £ S))
• Where, T =T1 £ T2 £ T2 …. Tn
R=R1 £ R2 £ R2 …. Rn
S=S1 £ S2 £ S2 …. Sn
SQL Auditing: Q suspicious wrt A
£ £
T(T)
R(R)S(S)
(1)9 v 2 T : (a) R Æ T(R £ {v} ) (b) S Æ T({v} £ S )
(2) All audited columns are projected by the query
Requires execution of queries on the database
v
Auditing a Batch of SQL Queries
Previous work for
(1)Batch of queries like sum, max and median
--can answers be stitched together to reveal more than what a single query can reveal?
(2)Singleton SQL queries
We introduce the notion of auditing a batch of SQL queries
SQL Auditing• Batch of SQL queries, each of form
Project col1 col2 col3 …. colk From R Where C1 and C2 and C3 and … Cj
Each Ci : (colm = value), (colm <= value) , (colm >= value), (value1 <= colm <= value2)
col1, col2, .. colk includes primary key so that result of query can be joined with other results
Semantically Suspicious
• A query batch Q1, Q2, .. Qn is said to be suspicious wrt to an audit expression A if an expression combining the results of these queries as base tables is suspicious wrt A
• Natural extension of a suspicious query to a query batch
Syntactically Suspicious
• A query batch is said to be syntactically suspicious with respect to an audit expression A if there exists an instantiation of the database tables for which it is suspicious wrt A
• Does not require execution of the queries against the table
SQL Batch AuditingQuery 1
Query 2
Query 3
Audited tuple columns are
covered
Query 4
Audit expression
Query batch suspicious wrt audit expression iff queries together cover all audited columns
of at least audited tuple
syntacticallysemantically
on table Ton some table T
Syntactic and Semantic Auditing
• Syntactially suspicious implies semantically suspicious
• To check semantic suspiciousness check for syntactic suspiciousness first and then execute the queries on tables to verify
• How to check syntactic suspiciousness covered next
Compatible Queries• A batch of queries is compatible if the
conjunction of their selection conditions is satisfiable.
• To test compatibility of a set of queries you only need to check pairwise compatibility [Helley’s Theorem]
Syntactic Auditing: Graph ProblemQuery 1
Query 2Query 3
Query 4=AuditExp
{ }
{ } { }
Suspicious iff there exists an independent set, including audit expression that covers all
audited colors
Syntactic Auditing
• Query batch suspicious iff there is a subset of queries compatible with the audit expression and they cover all audited columns.
• Need not consider hyperedges as due to Helley’s Theorem you only need to check pairwise compatibility
• Independent set implies the query batch is compatible
• Has all audited colors implies that all audited columns are covered
Syntactic Auditing is NP complete
• Reduction from 3-SAT
X1
X1
X2 X3X4
X2 X3X4
X1 X2 X4Ç Ç X2 X3 X4Ç Ç
Semantic Auditing
• If table given an implicit representation then NP complete
• Explicitly mentioned table, polynomial time algorithm
THANK YOU!
Questions?