Posted on 10-Mar-2018
CryptDB: Processing Queries on an Encrypted Database
Raluca Ada Popa, Catherine M. S. Redfield, Nickolai Zeldovich, and Hari Balakrishnan
MIT CSAIL
Problem
• Confidential data leaks from databases (DB)
• 2012: hackers extracted 6.5 million hashed passwords from the DB of LinkedIn
[Diagram: Users 1, 2, and 3 use an application, which sends SQL to the DB server; hackers and the system administrator can access the DB server.]
Threat: passive DB server attacks
Contributions
• Process SQL queries on encrypted data
1. First practical DBMS to process most SQL queries on encrypted data: hide the DB from system administrators, outsource the DB to the cloud
2. Modest overhead: 26% throughput loss for TPC-C
3. No changes to the DBMS (e.g., Postgres, MySQL) and no changes to applications
• Unencrypted databases: fast, but insecure (the server answers "salary = 100" by looking up plaintext values in an index)
• FHE [Gentry'09], [GHS'12], …: strong security, but slow (the server evaluates a circuit C over encrypted values)
• CryptDB: fast, with a high degree of security (the server looks up encrypted values, e.g., x4be2, x17ce, x2ea8, in an index)
Key ideas:
• Most SQL uses a limited set of operations
• Security: reveal only the relations among data that queries require, at column granularity
Other work offers weaker security, functionality, and/or efficiency:
• Search on encrypted data (e.g., [Song et al.,'00])
• Systems proposals (e.g., [Hacigumus et al.,'02])
• These require significant client-side processing
System Setup
[Diagram: Application → plain query → Proxy → transformed query → DB Server (encrypted DB); encrypted results → Proxy → decrypted results → Application]
• Application and proxy: trusted
• Proxy: stores the schema and master key; no data storage; no query execution
• DB server: under passive attack; processes queries completely at the DBMS, on the encrypted database
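Example: equality query
• Application: SELECT * FROM emp WHERE salary = 100
• Proxy rewrites it to: SELECT * FROM table1 WHERE col3 = x5a8c34
• The server stores emp as table1, with columns col1/rank, col2/name, col3/salary; the salaries are 60, 100, 800, 100.
• Randomized encryption: every salary has a distinct ciphertext (x4be219, x95c623, x2ea887, x17cea7), so the server cannot find the rows matching the query.
• Deterministic encryption: both salaries of 100 become x5a8c34, so the server returns those two rows and the proxy decrypts them for the application.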
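The rewrite above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the anonymized schema map, the key, and the helper names (`det_encrypt`, `rewrite_equality`) are hypothetical, and HMAC-SHA256 replaces the AES-in-CMC construction listed for DET, so these tags cannot be decrypted. It only shows why equal salaries yield equal ciphertexts that the server can match.

```python
import hashlib
import hmac

# Hypothetical proxy metadata: plaintext table/column names -> anonymized names.
SCHEMA = {"emp": ("table1", {"rank": "col1", "name": "col2", "salary": "col3"})}


def det_encrypt(key: bytes, column: str, value: str) -> str:
    """Deterministic per-column tag: equal values in a column give equal output.

    HMAC-SHA256 stands in for AES-CMC here. Unlike real DET it cannot be
    decrypted, so it only illustrates equality matching on the server.
    """
    mac = hmac.new(key, f"{column}|{value}".encode(), hashlib.sha256)
    return "x" + mac.hexdigest()[:16]


def rewrite_equality(key: bytes, table: str, column: str, constant: str) -> str:
    """Rewrite `SELECT * FROM table WHERE column = constant` for the server."""
    anon_table, columns = SCHEMA[table]
    anon_column = columns[column]
    return (f"SELECT * FROM {anon_table} WHERE {anon_column} = "
            f"{det_encrypt(key, anon_column, constant)}")


key = b"hypothetical per-column DET key"
salaries = ["60", "100", "800", "100"]
stored = [det_encrypt(key, "col3", s) for s in salaries]  # what the server holds

query = rewrite_equality(key, "emp", "salary", "100")
constant = query.rsplit(" ", 1)[1]
matches = [row for row, ct in enumerate(stored) if ct == constant]
print(query)
print(matches)  # [1, 3]: the two rows whose salary is 100
```

The server compares opaque strings only; it learns which rows repeat a value, which is exactly the leakage the deck attributes to DET.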
Example: range query
• Application: SELECT * FROM emp WHERE salary ≥ 100
• The proxy adjusts col3 from deterministic to OPE (order-preserving) encryption: 60 → x1eab81, 100 → x638e54, 800 → x922eb4, so ciphertext order matches plaintext order.
• Proxy rewrites the query to: SELECT * FROM table1 WHERE col3 ≥ x638e54
• The server returns x638e54, x922eb4, x638e54 (the rows with salaries 100, 800, 100).
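A toy order-preserving encoding makes the range example concrete. This sketch is an assumption, not the Boldyreva et al. scheme listed for OPE: each ciphertext is the sum of keyed pseudorandom gaps (HMAC-SHA256) up to the plaintext, which guarantees that x < y implies Enc(x) < Enc(y) and nothing more. The key and function names are hypothetical.

```python
import hashlib
import hmac

DOMAIN = 2**16  # toy plaintext domain: integers 0 .. 65535


def _gap(key: bytes, i: int) -> int:
    # Pseudorandom gap in [1, 65536]; keeping every gap >= 1 keeps order strict.
    digest = hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:2], "big")


def ope_encrypt(key: bytes, x: int) -> int:
    """Toy order-preserving encoding: the sum of pseudorandom gaps up to x.

    Runs in O(x) time and is NOT the BCLO'09 scheme; it only shows that
    x < y implies enc(x) < enc(y), which is all the server needs for >=.
    """
    if not 0 <= x < DOMAIN:
        raise ValueError("plaintext outside toy domain")
    return sum(_gap(key, i) for i in range(x + 1))


key = b"hypothetical OPE key for col3"
salaries = [60, 100, 800, 100]
stored = [ope_encrypt(key, s) for s in salaries]

# Proxy encrypts the constant of `salary >= 100`; server compares ciphertexts.
bound = ope_encrypt(key, 100)
rows = [i for i, c in enumerate(stored) if c >= bound]
print(rows)  # [1, 2, 3]
```

Because the server compares ciphertexts directly, it also learns the full order of the column, which is why the deck marks OPE as leaking order.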
Two techniques
1. Use an SQL-aware set of encryption schemes
2. Adjust the encryption of the database based on queries
Encryption schemes (ordered from highest security toward more functionality)
Scheme | Function | Construction
RND | none | AES in CBC
HOM | + (e.g., sum) | Paillier
DET | equality (e.g., =, !=, IN, COUNT, GROUP BY, DISTINCT) | AES in CMC
SEARCH | word search (restricted ILIKE) | Song et al.,'00
JOIN | join | our new scheme
OPE | order (e.g., >, <, ORDER BY, SORT, MAX, MIN, GREATEST) | Boldyreva et al.'09 (BCLO'09) + our new scheme
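The table can be read as a lookup from the SQL operations a query applies to a column to the scheme that must be exposed for it. The sketch below encodes only the example operations listed in the table, plus >= and <= (added as an assumption); `schemes_for` is a hypothetical helper, not CryptDB's query analyzer.

```python
# Operations named in the table for each scheme; ">=" and "<=" are added
# here as obvious order comparisons (an assumption beyond the table's list).
OPS_BY_SCHEME = {
    "DET": {"=", "!=", "IN", "COUNT", "GROUP BY", "DISTINCT"},
    "OPE": {">", "<", ">=", "<=", "ORDER BY", "SORT", "MAX", "MIN", "GREATEST"},
    "HOM": {"SUM"},
    "SEARCH": {"ILIKE"},
    "JOIN": {"JOIN"},
}


def schemes_for(ops):
    """Return the set of schemes needed for the operations applied to a column.

    A column that no query operates on needs nothing beyond RND.
    """
    needed = set()
    for op in ops:
        schemes = [s for s, supported in OPS_BY_SCHEME.items() if op in supported]
        if not schemes:
            raise ValueError(f"not supported on encrypted data: {op}")
        needed.add(schemes[0])
    return needed or {"RND"}


print(sorted(schemes_for(["=", "ORDER BY", "SUM"])))  # ['DET', 'HOM', 'OPE']
print(schemes_for([]))                                # {'RND'}
```

An operation outside every scheme (e.g., trigonometry) raises an error, mirroring the "queries not supported" category later in the deck.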
JOIN
• Problem: the columns to be joined are not known a priori!
• KeyGen(sec. param): SK
• Encrypt(SK, m, col i): C_m^i, deterministic, under column i's key
• Token(SK, col i, col j): (t_i, t_j)
• Adjust(t_i, C_m^i): C_m, under the join key for col i – col j
[Diagram: the proxy gives the server tokens that adjust columns i and j to a common join key.]
JOIN (cont'd)
• Security: the server does not learn join relations without a token
• Implementation: 192 bits long, 0.52 ms to encrypt, 0.56 ms to adjust
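The Encrypt/Token/Adjust interface can be illustrated with a discrete-log toy. This hedged sketch is not CryptDB's construction: it swaps in exponentiation modulo the Mersenne prime 2^127 − 1 with SHA-256 as the hash, which is not secure. It also simplifies Token: instead of two tokens (t_i, t_j) that move both columns onto a shared join key, one hypothetical `delta` moves column i onto column j's key.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy modulus, far too small for real security
G = 3           # fixed public base


def _h(value: str) -> int:
    # Hash the join value to an exponent (stand-in for a keyed PRF).
    return int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")


def keygen() -> int:
    """Per-column key: a random exponent invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 2) + 1
        if math.gcd(k, P - 1) == 1:
            return k


def encrypt(k: int, value: str) -> int:
    # Deterministic within one column: equal values give equal ciphertexts.
    return pow(G, (k * _h(value)) % (P - 1), P)


def token(k_i: int, k_j: int) -> int:
    # delta = k_j / k_i (mod P - 1): raising to delta moves key k_i to key k_j.
    return (k_j * pow(k_i, -1, P - 1)) % (P - 1)


def adjust(delta: int, c: int) -> int:
    # Server side: (G^(k_i*h))^delta = G^(k_j*h) mod P, by Fermat's little theorem.
    return pow(c, delta, P)


k_i, k_j = keygen(), keygen()
col_i = [encrypt(k_i, v) for v in ["alice", "bob"]]
col_j = [encrypt(k_j, v) for v in ["bob", "carol"]]

delta = token(k_i, k_j)  # handed to the server only when the join is needed
adjusted = [adjust(delta, c) for c in col_i]
pairs = [(a, b) for a, x in enumerate(adjusted) for b, y in enumerate(col_j) if x == y]
print(pairs)  # "bob" (row 1 of col i) matches row 0 of col j
```

Before the token is issued, the two columns use unrelated keys, so equal values look different across columns; this is the "no join relations without a token" property in miniature.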
How to encrypt each data item?
• The encryption schemes needed depend on the queries
• The queries may not be known ahead of time
Store ALL encryptions of each value? For example, the rank column ('CEO', 'worker') would become col1-RND, col1-HOM, col1-SEARCH, col1-DET, col1-JOIN, col1-OPE. This leaks order!
Instead, wrap each value in onions of encryptions (layers listed outermost first):
• Onion Equality: RND, DET, JOIN, value
• Onion Order: RND, OPE, OPE-JOIN, value
• Onion Search: SEARCH, text value
• Onion Add: HOM, int value
• Same key for all items in a column for the same onion layer
• Start out the database with the most secure encryption scheme
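One way to get "the same key for all items in a column for the same onion layer" from the single master key the proxy stores is to derive each layer key from that master key. The derivation below (HMAC-SHA256 over a table/column/onion/layer label) and every name in it are assumptions for illustration, and the DET layer is again a non-decryptable HMAC stand-in. It shows that equal values match within a column but not across columns, which is why joins need a separate adjustment.

```python
import hashlib
import hmac

MASTER_KEY = b"hypothetical master key held only by the proxy"


def layer_key(table: str, column: str, onion: str, layer: str) -> bytes:
    """Derive one key per (table, column, onion, layer) from the master key.

    Every row of a column shares this key at a given layer, which is what
    lets DET ciphertexts of equal values match. HMAC-SHA256 as the
    derivation function is an assumption made for this sketch.
    """
    label = "|".join([table, column, onion, layer]).encode()
    return hmac.new(MASTER_KEY, label, hashlib.sha256).digest()


def det_tag(key: bytes, value: str) -> bytes:
    # Deterministic stand-in for the DET layer (keyed tag, not decryptable).
    return hmac.new(key, value.encode(), hashlib.sha256).digest()


k_rank = layer_key("table1", "col1", "Eq", "DET")
k_name = layer_key("table1", "col2", "Eq", "DET")
same_column = det_tag(k_rank, "CEO") == det_tag(k_rank, "CEO")
across_columns = det_tag(k_rank, "CEO") == det_tag(k_name, "CEO")
print(same_column, across_columns)  # True False
```

Deriving keys on demand means the proxy never stores per-column keys, consistent with the slide's note that it holds only the schema and master key.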
Adjust encryption
• Strip off layers of the onions
• The proxy gives keys to the server using an SQL UDF ("user-defined function")
• The proxy remembers the onion layer of each column
• Onion layers are never put back
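The proxy's bookkeeping for onion adjustment can be sketched as a small state machine. Assumptions: the operation names, the `OnionState` class, and UDF names other than the slide's `Decrypt_RND` (e.g., `Decrypt_DET`) are hypothetical; identifiers such as `col1-OnionEq` mirror the slide's notation and would need quoting in a real DBMS. The sketch peels layers only as far as an operation needs and never re-wraps them.

```python
# Onion layers, outermost first, as on the "Onions of encryptions" slide.
ONIONS = {
    "Eq": ["RND", "DET", "JOIN"],
    "Order": ["RND", "OPE", "OPE-JOIN"],
}
# Which onion and layer each operation needs (hypothetical operation names).
NEEDS = {
    "equality": ("Eq", "DET"),
    "join": ("Eq", "JOIN"),
    "order": ("Order", "OPE"),
    "range join": ("Order", "OPE-JOIN"),
}


class OnionState:
    """Proxy-side record of the current outermost layer of every onion."""

    def __init__(self, columns):
        # Every onion starts at its most secure (outermost) layer.
        self.depth = {(col, onion): 0 for col in columns for onion in ONIONS}

    def adjust(self, table, column, operation):
        """Return the UPDATE statements that expose the layer `operation` needs."""
        onion, target = NEEDS[operation]
        layers = ONIONS[onion]
        name = f"{column}-Onion{onion}"
        statements = []
        # Peel from the outside in; nothing is ever re-wrapped, so a column
        # already at or below the target layer needs no work.
        while self.depth[(column, onion)] < layers.index(target):
            outer = layers[self.depth[(column, onion)]]
            statements.append(
                f"UPDATE {table} SET {name} = Decrypt_{outer}(key, {name});")
            self.depth[(column, onion)] += 1
        return statements


state = OnionState(["col1", "col2"])
print(state.adjust("table1", "col1", "equality"))
# ['UPDATE table1 SET col1-OnionEq = Decrypt_RND(key, col1-OnionEq);']
print(state.adjust("table1", "col1", "equality"))  # [] (already at DET)
```

Because adjustments are one-way, a column's exposure only ever grows to the least secure layer its queries have needed so far.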
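Example: SELECT * FROM emp WHERE rank = 'CEO';
• emp has columns rank, name, salary (rank values include 'CEO' and 'worker')
• Encrypted table 1 has columns col1-OnionEq, col1-OnionOrder, col1-OnionSearch, col2-OnionEq, …
• Initially every onion is at its outermost layer: RND (SEARCH for the search onion)
• The Onion Equality of 'CEO' consists of the JOIN, DET, and RND layers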
Example (cont'd)
• Original query: SELECT * FROM emp WHERE rank = 'CEO';
• The proxy strips the RND layer of col1's equality onion:
  UPDATE table1 SET col1-OnionEq = Decrypt_RND(key, col1-OnionEq);
• col1-OnionEq is now at DET, so the proxy sends:
  SELECT * FROM table1 WHERE col1-OnionEq = xda5c0407;
• The other onions stay at their outermost layers (RND; SEARCH for the search onion)
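The two statements above can be run end to end on an unmodified SQLite database, with Python's `sqlite3.Connection.create_function` registering the `Decrypt_RND` UDF. Everything cryptographic here is a toy assumption: the RND layer is a SHA-256 keystream with a random nonce (standing in for AES in CBC), the DET layer is an HMAC tag, and the keys and rows are hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3

DET_KEY = b"hypothetical DET-layer key for col1-OnionEq"
RND_KEY = b"hypothetical RND-layer key for col1-OnionEq"


def det(value: str) -> bytes:
    # Inner DET layer stand-in: a deterministic keyed tag (HMAC-SHA256).
    return hmac.new(DET_KEY, value.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    blocks = (hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
              for i in range((n + 31) // 32))
    return b"".join(blocks)[:n]


def rnd_encrypt(key: bytes, data: bytes) -> bytes:
    # Outer RND layer stand-in: fresh random nonce + XOR keystream (toy, not AES-CBC).
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


def rnd_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


db = sqlite3.connect(":memory:")
db.create_function("Decrypt_RND", 2, rnd_decrypt)  # the server-side UDF
db.execute('CREATE TABLE table1 ("col1-OnionEq" BLOB)')
ranks = ["CEO", "worker", "CEO"]  # hypothetical rows
db.executemany("INSERT INTO table1 VALUES (?)",
               [(rnd_encrypt(RND_KEY, det(r)),) for r in ranks])

before = [r[0] for r in db.execute('SELECT "col1-OnionEq" FROM table1 ORDER BY rowid')]

# Proxy strips the RND layer once (passing the layer key), then queries at DET.
db.execute('UPDATE table1 SET "col1-OnionEq" = Decrypt_RND(?, "col1-OnionEq")', (RND_KEY,))
rows = db.execute('SELECT rowid FROM table1 WHERE "col1-OnionEq" = ? ORDER BY rowid',
                  (det("CEO"),)).fetchall()
print(before[0] == before[2], rows)  # False [(1,), (3,)]
```

Before the adjustment the two 'CEO' rows are indistinguishable from any other row; afterwards they share a ciphertext, so the DBMS can match them with an ordinary equality predicate.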
Security guarantees
• Never reveals plaintext
• The encryption schemes exposed for each column are the most secure ones that still enable the queries
• Overall: reveal only the data relations needed for each query type, at column granularity

Queries common in practice, the encryption scheme each needs, and the resulting leakage:
• no filter on a column: RND, leaks nothing
• aggregation on a column: HOM, leaks nothing
• equality predicate on a column: DET, leaks repeats

Security threshold (schemes from most to least secure, with their leakage):
• RND, HOM: virtually nothing
• SEARCH
• DET, DETJOIN: repeats
• OPE, OPEJOIN: order
• Plaintext: everything
• Example: a threshold on an SSN column (≥ repeats)
• Most sensitive columns naturally stay above the threshold.
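The per-column threshold can be sketched as an ordering over leakage levels. This is one possible reading of the slide, with hypothetical names: `check_threshold` refuses any adjustment that would expose a column at a scheme leaking more than the threshold set for it. SEARCH is omitted because the slide does not give it its own leakage label.

```python
# Leakage levels from the slide, least to most; SEARCH is left out because
# the slide does not give it a separate label.
LEVELS = {
    "RND": 0, "HOM": 0,        # virtually nothing
    "DET": 1, "DETJOIN": 1,    # repeats
    "OPE": 2, "OPEJOIN": 2,    # order
    "Plaintext": 3,            # everything
}


def check_threshold(column: str, required: str, thresholds: dict) -> None:
    """Raise if exposing `column` at scheme `required` exceeds its threshold.

    `thresholds` maps column names to the most-leaking scheme allowed
    (a hypothetical encoding of the per-column security threshold).
    """
    limit = thresholds.get(column, "Plaintext")
    if LEVELS[required] > LEVELS[limit]:
        raise PermissionError(f"{column}: {required} leaks more than {limit} allows")


thresholds = {"ssn": "DET"}                 # hypothetical: SSN may leak at most repeats
check_threshold("ssn", "DET", thresholds)   # equality on SSN: allowed
try:
    check_threshold("ssn", "OPE", thresholds)  # ORDER BY on SSN: refused
except PermissionError as err:
    print(err)  # ssn: OPE leaks more than DET allows
```

Columns without an entry default to no restriction, so only the columns an administrator marks as sensitive are constrained.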
Implementation
[Diagram: Application → query → CryptDB Proxy → transformed query → Server (unmodified DBMS + CryptDB SQL UDFs, "user-defined functions"); encrypted results → CryptDB Proxy → results → Application, all over the standard SQL interface]
• No change to the DBMS
• Portable: from Postgres to MySQL with 86 lines
• No change to applications
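The "unmodified DBMS plus UDFs" design can be illustrated with SQLite, which lets Python register an aggregate through `sqlite3.Connection.create_aggregate`. The sketch assumes toy Paillier parameters (two 10-bit primes, insecure) and hypothetical names (`hom_sum`, `col3_add`); it shows the server summing HOM ciphertexts without any secret key, while only the proxy decrypts the total.

```python
import math
import secrets
import sqlite3

# Toy Paillier key (real deployments use far larger primes).
P, Q = 1009, 1013
N = P * Q
N2 = N * N                          # public; the server needs it for additions
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)                # valid because the generator is g = N + 1


def hom_encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def hom_decrypt(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N


class HomSum:
    """Aggregate UDF: multiplying ciphertexts mod N^2 adds the plaintexts."""

    def __init__(self):
        self.acc = 1  # 1 is a valid encryption of 0

    def step(self, c):
        self.acc = self.acc * c % N2

    def finalize(self):
        return self.acc


db = sqlite3.connect(":memory:")           # unmodified DBMS
db.create_aggregate("hom_sum", 1, HomSum)  # only a UDF is registered
db.execute("CREATE TABLE table1 (col3_add INTEGER)")
db.executemany("INSERT INTO table1 VALUES (?)",
               [(hom_encrypt(s),) for s in [60, 100, 800, 100]])
(enc_total,) = db.execute("SELECT hom_sum(col3_add) FROM table1").fetchone()
print(hom_decrypt(enc_total))  # 1060, decrypted at the proxy
```

The DBMS engine itself is untouched: the only server-side addition is the registered function, which is the portability argument the slide makes.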
Evaluation
1. Does it support real queries/applications?
2. What is the resulting confidentiality?
3. What is the performance overhead?
Queries not supported
• More complex operators, e.g., trigonometry
• Operations that require combining encryption schemes, e.g., T1.a + T1.b > T2.c
• Extensions: split queries, precompute columns, use FHE or other encryption schemes
Real queries/applications
Application | Total columns | Encrypted columns | # cols not supported
phpBB | 563 | 23 | 0
HotCRP | 204 | 22 | 0
grad-apply | 706 | 103 | 0
TPC-C | 92 | 92 | 0
sql.mit.edu | 128,840 | 128,840 | 1,094
Example of an unsupported query: SELECT 1/log(series_no+1.2) … … WHERE sin(latitude + PI()) …
Resulting confidentiality
Application | Total columns | Encrypted columns | Min level is RND | Min level is DET | Min level is OPE
phpBB | 563 | 23 | 21 | 1 | 1
HotCRP | 204 | 22 | 18 | 1 | 2
grad-apply | 706 | 103 | 95 | 6 | 2
TPC-C | 92 | 92 | 65 | 19 | 8
sql.mit.edu | 128,840 | 128,840 | 80,053 | 34,212 | 13,131
• Most columns stay at RND
• Most of the columns at OPE that we analyzed were less sensitive
Performance: DB server throughput and latency
[Diagram: MySQL setup: Applications 1 and 2 query a plain database directly. CryptDB setup: Applications 1 and 2 each go through a CryptDB proxy to an encrypted database.]
• Hardware: 2.4 GHz Intel Xeon E5620, 8 cores, 12 GB RAM
[Figure 10 plot: queries/sec (0 to 50,000) vs. number of server cores (1 to 8), for MySQL and CryptDB.]
Figure 10: Throughput for TPC-C queries, for a varying number of cores on the underlying MySQL DBMS server.
[Figure 11 plot: queries/sec (0 to 14,000) per query type (Equality, Join, Range, Sum, Delete, Insert, Upd. set, Upd. inc) for MySQL, CryptDB, and Strawman.]
Figure 11: Throughput of different types of SQL queries from the TPC-C query mix running under MySQL, CryptDB, and the strawman design. "Upd. inc" stands for UPDATE that increments a column, and "Upd. set" stands for UPDATE which sets columns to a constant.
8.4.1 TPC-C
We compare the performance of a TPC-C query mix when running on an unmodified MySQL server versus on a CryptDB proxy in front of the MySQL server. We trained CryptDB on the query set (§3.5.2) so there are no onion adjustments during the TPC-C experiments. Figure 10 shows the throughput of TPC-C queries as the number of cores on the server varies from one to eight. In all cases, the server spends 100% of its CPU time processing queries. Both MySQL and CryptDB scale well initially, but start to level off due to internal lock contention in the MySQL server, as reported by SHOW STATUS LIKE 'Table%'. The overall throughput with CryptDB is 21–26% lower than MySQL, depending on the exact number of cores.
To understand the sources of CryptDB's overhead, we measure the server throughput for different types of SQL queries seen in TPC-C, on the same server, but running with only one core enabled. Figure 11 shows the results for MySQL, CryptDB, and a strawman design; the strawman performs each query over data encrypted with RND by decrypting the relevant data using a UDF, performing the query over the plaintext, and re-encrypting the result (if updating rows). The results show that CryptDB's throughput penalty is greatest for queries that involve a SUM (2.0× less throughput) and for incrementing UPDATE statements (1.6× less throughput); these are the queries that involve HOM additions at the server. For the other types of queries, which form a larger part of the TPC-C mix, the throughput overhead is modest. The strawman design performs poorly for almost all queries because the DBMS's indexes on the RND-encrypted data are useless for operations on the underlying plaintext data. It is pleasantly surprising that the higher security of CryptDB over the strawman also brings better performance.

Query (& scheme) | MySQL Server | CryptDB Server | CryptDB Proxy | CryptDB Proxy*
Select by = (DET) | 0.10 ms | 0.11 ms | 0.86 ms | 0.86 ms
Select join (JOIN) | 0.10 ms | 0.11 ms | 0.75 ms | 0.75 ms
Select range (OPE) | 0.16 ms | 0.22 ms | 0.78 ms | 28.7 ms
Select sum (HOM) | 0.11 ms | 0.46 ms | 0.99 ms | 0.99 ms
Delete | 0.07 ms | 0.08 ms | 0.28 ms | 0.28 ms
Insert (all) | 0.08 ms | 0.10 ms | 0.37 ms | 16.3 ms
Update set (all) | 0.11 ms | 0.14 ms | 0.36 ms | 3.80 ms
Update inc (HOM) | 0.10 ms | 0.17 ms | 0.30 ms | 25.1 ms
Overall | 0.10 ms | 0.12 ms | 0.60 ms | 10.7 ms

Figure 12: Server and proxy latency for different types of SQL queries from TPC-C. For each query type, we show the predominant encryption scheme used at the server. Due to details of the TPC-C workload, each query type affects a different number of rows, and involves a different number of cryptographic operations. The left two columns correspond to server throughput, which is also shown in Figure 11. "Proxy" shows the latency added by CryptDB's proxy; "Proxy*" shows the proxy latency without the ciphertext pre-computing and caching optimization (§3.5). Bold numbers show where pre-computing and caching ciphertexts helps. The "Overall" row is the average latency over the mix of TPC-C queries. "Update set" is an UPDATE where the fields are set to a constant, and "Update inc" is an UPDATE where some fields are incremented.

Scheme | Encrypt | Decrypt | Special operation
Blowfish (1 int.) | 0.0001 ms | 0.0001 ms | -
AES-CBC (1 KB) | 0.008 ms | 0.007 ms | -
AES-CMC (1 KB) | 0.016 ms | 0.015 ms | -
OPE (1 int.) | 9.0 ms | 9.0 ms | Compare: 0 ms
SEARCH (1 word) | 0.01 ms | 0.004 ms | Match: 0.001 ms
HOM (1 int.) | 9.7 ms | 0.7 ms | Add: 0.005 ms
JOIN-ADJ (1 int.) | 0.52 ms | - | Adjust: 0.56 ms

Figure 13: Microbenchmarks of cryptographic schemes, per unit of data encrypted (one 32-bit integer, 1 KB, or one 15-byte word of text), measured by taking the average time over many iterations.
To understand the latency introduced by CryptDB's proxy, we measure the server and proxy processing times for the same types of SQL queries as above. Figure 12 shows the results. We can see that there is an overall server latency increase of 20% with CryptDB, which we consider modest. The proxy adds an average of 0.60 ms to a query; of that time, 24% is spent in MySQL proxy, 23% is spent in encryption and decryption, and the remaining 53% is spent parsing and processing queries. The cryptographic overhead is relatively small because most of our encryption schemes are efficient; Figure 13 shows their performance. OPE and HOM are the slowest, but the ciphertext pre-computing and caching optimization (§3.5) masks the high latency of queries requiring OPE and HOM. Proxy* in Figure 12 shows the latency without these optimizations, which is significantly higher for the corresponding query types. SELECT queries that involve a SUM use HOM but do not benefit from this optimization, because the proxy performs decryption, rather than encryption.
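The latency figures quoted in that paragraph can be broken down directly; the percentages come from the paragraph and the server times from the Overall row of the latency table (0.10 ms for MySQL, 0.12 ms for the CryptDB server):

```latex
\begin{aligned}
0.60\ \text{ms} \times 0.24 &\approx 0.14\ \text{ms} && \text{(MySQL proxy)}\\
0.60\ \text{ms} \times 0.23 &\approx 0.14\ \text{ms} && \text{(encryption and decryption)}\\
0.60\ \text{ms} \times 0.53 &\approx 0.32\ \text{ms} && \text{(parsing and processing)}\\
0.12\ \text{ms} / 0.10\ \text{ms} &= 1.20 && \text{(20\% server latency increase)}\\
0.12\ \text{ms (server)} + 0.60\ \text{ms (proxy)} &= 0.72\ \text{ms per query}
\end{aligned}
```

The last line matches the 0.72 ms per-query CryptDB latency quoted on the TPC-C performance slide.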
In all TPC-C experiments, the proxy used less than 20 MB of memory. Caching ciphertexts for the 30,000 most common values for OPE accounts for about 3 MB, and pre-computing ciphertexts and randomness for 30,000 values at HOM required 10 MB.
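The ciphertext caching described above can be approximated with a bounded cache around the expensive encryption call. In this sketch `slow_ope_encrypt` is a hypothetical placeholder that merely sleeps for about the 9 ms the microbenchmarks report for OPE, and the least-recently-used eviction of `functools.lru_cache` is an assumption: the paragraph says the proxy caches the most common values, which an LRU policy only approximates.

```python
import functools
import time


def slow_ope_encrypt(value: int) -> int:
    """Hypothetical stand-in for an OPE encryption costing about 9 ms.

    The arithmetic below is a placeholder, not an encryption scheme.
    """
    time.sleep(0.009)
    return value * 1_000 + 7


# Keep up to 30,000 ciphertexts, as the proxy does for common OPE values;
# evicting the least recently used entry is an assumption of this sketch.
ope_encrypt_cached = functools.lru_cache(maxsize=30_000)(slow_ope_encrypt)

values = [100, 60, 100, 800, 100, 60]   # constants seen in incoming queries
cts = [ope_encrypt_cached(v) for v in values]
info = ope_encrypt_cached.cache_info()
print(info.hits, info.misses)  # 3 3: only the first use of each value pays ~9 ms
```

Caching works for OPE because encryption is deterministic; HOM is randomized, which is why the paragraph describes pre-computing randomness for it rather than caching final ciphertexts.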
8.4.2 Multi-User Web Applications
To evaluate the impact of CryptDB on application performance, we measure the throughput of phpBB for a workload with 10 parallel clients, which ensured 100% CPU load at the server. Each client continuously issued HTTP requests to browse the forum, write and
TPC-C performance
• Throughput loss: 26%
• Latency (ms/query): 0.10 MySQL vs. 0.72 CryptDB
TPC-C microbenchmarks
[Figure: queries/sec (0 to 14,000) per query type (Equality, Join, Range, Delete, Insert, Upd. set, Upd. inc, Sum) for MySQL and CryptDB; the Sum and Upd. inc bars are annotated "Homomorphic addition".]
• No cryptography at the DB server in the steady state!
CryptDB is practical