Date post: 10-Mar-2018
CryptDB: Processing Queries on an Encrypted Database

Raluca Ada Popa, Catherine M. S. Redfield, Nickolai Zeldovich, and Hari Balakrishnan

MIT CSAIL

Problem

• Confidential data leaks from databases (DB)
• 2012: hackers extracted 6.5 million hashed passwords from the DB of LinkedIn

Threat: passive DB server attacks (hackers, system administrator)

[Diagram: Users 1, 2, 3 → Application → SQL → DB Server]

Contributions

• Process SQL queries on encrypted data

1. First practical DBMS to process most SQL queries on encrypted data. Hide DB from sys. admins., outsource DB to the cloud
2. Modest overhead: 26% throughput loss for TPC-C
3. No changes to DBMS (e.g., Postgres, MySQL) and no changes to applications

[Figure: three ways to answer an equality query over an indexed salary column (60, 100, 800, 100)]

• Unencrypted databases: fast, insecure
• FHE ([Gentry’09], [GHS’12], ...): the query is evaluated as a circuit C over ciphertexts; slow, strong security
• CryptDB: the query runs directly over an encrypted column and its index; fast, high degree of security

• Most SQL uses a limited set of operations
• Security: reveal only relations among data that are required by queries, at column granularity


Other work: weaker security, functionality, and/or efficiency:

• Search on encrypted data (e.g., [Song et al.,’00])
• Systems proposals (e.g., [Hacigumus et al.,’02])
• Require significant client-side processing

System Setup

[Diagram: Application → plain query → Proxy → transformed query → DB Server; DB Server → encrypted results → Proxy → decrypted results → Application]

• Trusted: application and proxy
• Under passive attack: DB Server
• Proxy: stores schema, master key; no data storage; no query execution
• Process queries completely at the DBMS, on encrypted database

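Example: equality query

Application: SELECT * FROM emp WHERE salary = 100
Proxy to DB server: SELECT * FROM table1 WHERE col3 = x5a8c34

Encrypted DB: table1/emp with columns col1/rank, col2/name, col3/salary

• Plaintext salary values: 60, 100, 800, 100
• Randomized encryption: x4be219, x95c623, x2ea887, x17cea7. No two ciphertexts match, so the server cannot evaluate the predicate.
• Deterministic encryption: both rows with salary 100 encrypt to x5a8c34 and match the constant in the transformed query; the other rows hold x934bc1 and x84a21c.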
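The equality rewrite can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `det_token`, `rewrite_equality`, `COLUMN_KEY`, and `NAME_MAP` are hypothetical names, and HMAC-SHA256 stands in for a deterministic cipher (a real DET layer must also be decryptable, which an HMAC is not).

```python
import hashlib
import hmac

# Hypothetical proxy state: one key per column and the anonymized schema.
COLUMN_KEY = b"demo key for emp.salary"          # assumption, not from the slides
NAME_MAP = {"emp": "table1", "salary": "col3"}   # names as shown on the slide

def det_token(key: bytes, value: int) -> str:
    """Deterministic stand-in for DET: equal plaintexts give equal tokens."""
    digest = hmac.new(key, str(value).encode(), hashlib.sha256).hexdigest()
    return "x" + digest[:6]

def rewrite_equality(table: str, column: str, value: int) -> str:
    """Rewrite SELECT * FROM table WHERE column = value for the DB server."""
    token = det_token(COLUMN_KEY, value)
    return f"SELECT * FROM {NAME_MAP[table]} WHERE {NAME_MAP[column]} = {token}"

print(rewrite_equality("emp", "salary", 100))
```

Because the token depends only on the key and the value, the server can compare it against the stored column without learning the plaintext 100.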

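Example: range query

Application: SELECT * FROM emp WHERE salary ≥ 100
Proxy to DB server: SELECT * FROM table1 WHERE col3 ≥ x638e54

Encrypted DB: table1 (emp) with columns col1/rank, col2/name, col3/salary

• Plaintext salary values: 60, 100, 800, 100
• Deterministic encryption (x934bc1, x5a8c34, x5a8c34, x84a21c) supports equality only
• OPE (order) encryption: x1eab81, x638e54, x922eb4, x638e54. Ciphertext order follows plaintext order.
• Rows matching the transformed query: x638e54, x922eb4, x638e54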
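To see why an order-preserving encoding lets the server answer the range query, here is a toy sketch. It is not the OPE construction named in the slides: `toy_ope` and `gap` are hypothetical helpers that build a strictly increasing keyed function from pseudo-random gaps, which is slow and only suitable for small non-negative integers.

```python
import hashlib
import hmac

KEY = b"demo key for emp.salary"  # assumption: hypothetical column key

def gap(key: bytes, i: int) -> int:
    """Pseudo-random positive gap between the codes of i - 1 and i."""
    digest = hmac.new(key, str(i).encode(), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:2], "big")

def toy_ope(key: bytes, value: int) -> int:
    """Strictly increasing code for a small non-negative integer (O(value) time)."""
    return sum(gap(key, i) for i in range(value + 1))

salaries = [60, 100, 800, 100]
encrypted = [toy_ope(KEY, s) for s in salaries]

# The server only sees `encrypted` and the encrypted constant.
threshold = toy_ope(KEY, 100)
matches = [s for s, c in zip(salaries, encrypted) if c >= threshold]
print(matches)  # [100, 800, 100]
```

Every gap is at least 1, so a larger plaintext always gets a larger code, and the comparison on ciphertexts selects exactly the rows with salary ≥ 100.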

Two techniques

1. Use SQL-aware set of encryption schemes
2. Adjust encryption of database based on queries

Encryption schemes

Scheme | Construction    | Function    | SQL operations
RND    | AES in CBC      | none        |
HOM    | Paillier        | +           | e.g., sum
DET    | AES in CMC      | equality    | e.g., =, !=, IN, COUNT, GROUP BY, DISTINCT
SEARCH | Song et al.,‘00 | word search | restricted ILIKE
JOIN   | our new scheme  | join        |
OPE    | BCLO’09         | order       | e.g., >, <, ORDER BY, SORT, MAX, MIN, GREATEST

Security: highest at RND.

+ our new scheme
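The HOM row relies on the additive property of Paillier: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The sketch below is a textbook Paillier with the common generator choice g = N + 1 and two small known primes. The primes, function names, and salary values are assumptions for illustration, and the parameters are far too small for real use.

```python
import math
import secrets

# Toy parameters: two small known primes (insecure, illustration only).
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1

def encrypt(m: int) -> int:
    """c = (1 + m*N) * r^N mod N^2, with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    """m = L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def hom_add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2

# A server-side SUM over encrypted salaries, decrypted by the proxy.
total = encrypt(60)
for salary in (100, 800, 100):
    total = hom_add(total, encrypt(salary))
print(decrypt(total))  # 1060
```

The server multiplies ciphertexts without holding the secret values LAM and MU; only the key holder can decrypt the aggregate.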

JOIN

• Do not know columns to be joined a priori!

[Diagram: the proxy provides the join key for col i and col j]

• KeyGen (sec. param): SK
• Encrypt (SK, m, col i): C^m_i (deterministic)
• Token (SK, col i, col j): (t_i, t_j)
• Adjust (t_i, C^m_i): C^m

JOIN (cont’d)

• Security: do not learn join relations without token
• Implementation: 192 bits long, 0.52 ms encrypt, 0.56 ms adjust
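The KeyGen / Encrypt / Token / Adjust interface can be illustrated with a toy analogue based on modular exponentiation. This is a sketch under assumptions, not the scheme from the talk: it works in the multiplicative group modulo the Mersenne prime 2^127 - 1 rather than on an elliptic curve, it uses a single token that re-keys column i to column j (the slide shows a token pair), and all function names are hypothetical.

```python
import hashlib
import hmac
import math
import secrets

P = 2**127 - 1   # a Mersenne prime; toy group, not a secure choice
G = 3            # assumption: arbitrary base element
ORDER = P - 1    # exponents are reduced modulo the group order

def keygen() -> int:
    """Pick a column key that is invertible modulo the group order."""
    while True:
        k = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(k, ORDER) == 1:
            return k

def prf(sk: bytes, m: str) -> int:
    digest = hmac.new(sk, m.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % ORDER

def encrypt(sk: bytes, col_key: int, m: str) -> int:
    """Deterministic per-column ciphertext G^(col_key * PRF(m)) mod P."""
    return pow(G, col_key * prf(sk, m) % ORDER, P)

def token(key_i: int, key_j: int) -> int:
    """Token that moves ciphertexts from column i's key to column j's key."""
    return key_j * pow(key_i, -1, ORDER) % ORDER

def adjust(t: int, c: int) -> int:
    """Server-side re-keying: no plaintext or column key is needed."""
    return pow(c, t, P)

sk = b"demo master key"
key_i, key_j = keygen(), keygen()
c_i = encrypt(sk, key_i, "alice")
c_j = encrypt(sk, key_j, "alice")
print(adjust(token(key_i, key_j), c_i) == c_j)  # True
```

Before adjustment the two columns are encrypted under different keys, so matching values cannot be linked. After the server applies the token, equal plaintexts in the two columns have equal ciphertexts and an ordinary SQL join works.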

Encryption schemes

Scheme | Construction        | Function
RND    | AES in CBC          | none
HOM    | Paillier            | +, *
DET    | AES in CMC          | equality
SEARCH | Song et al.,‘00     | word search
JOIN   | our new scheme      | join
OPE    | Boldyreva et al.’09 | order

Axes: Security (highest at RND), Functionality.

+ our new scheme

How to encrypt each data item?

• Encryption schemes needed depend on queries
• May not know queries ahead of time
• Encrypt each value (e.g., rank: ‘CEO’, ‘worker’) under ALL schemes (col1-RND, col1-HOM, col1-SEARCH, col1-DET, col1-JOIN, col1-OPE)? Leaks order!

Onions of encryptions (for each value)

• Onion Equality: RND over DET over JOIN over value
• Onion Order: RND over OPE over OPE-JOIN over value
• Onion Search: SEARCH over text value
• Onion Add: HOM over int value
• Same key for all items in a column for same onion layer
• Start out the database with the most secure encryption scheme
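The layering idea can be sketched with two toy ciphers. Everything here is an assumption for illustration: a SHA-256 counter keystream stands in for the AES-based RND and DET layers, the innermost JOIN layer is omitted, and the fixed-keystream "DET" cipher is insecure outside a demo.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (stand-in for AES)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def det_encrypt(key: bytes, data: bytes) -> bytes:
    """Deterministic layer: same key and plaintext give the same ciphertext."""
    return xor(data, keystream(key, b"", len(data)))

def rnd_encrypt(key: bytes, data: bytes) -> bytes:
    """Randomized layer: a fresh IV makes equal plaintexts look different."""
    iv = secrets.token_bytes(16)
    return iv + xor(data, keystream(key, iv, len(data)))

def rnd_decrypt(key: bytes, blob: bytes) -> bytes:
    iv, body = blob[:16], blob[16:]
    return xor(body, keystream(key, iv, len(body)))

K_DET, K_RND = b"column det key", b"column rnd key"  # one key per column and layer
ranks = [b"CEO", b"worker", b"CEO"]

# Onion Equality (two layers shown): RND over DET over value.
onions = [rnd_encrypt(K_RND, det_encrypt(K_DET, v)) for v in ranks]

# Stripping the RND layer exposes DET ciphertexts, which reveal only repeats.
peeled = [rnd_decrypt(K_RND, o) for o in onions]
print(peeled[0] == peeled[2], peeled[0] == peeled[1])  # True False
```

With the RND layer in place the stored values carry no usable relation; once it is stripped, the server can test equality but still never sees ‘CEO’ itself.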

Adjust encryption

• Strip off layers of the onions
• Proxy gives keys to server using a SQL UDF (“user-defined function”)
• Proxy remembers onion layer for columns
• Do not put back onion layer

Example:

SELECT * FROM emp WHERE rank = ‘CEO’;

• emp (rank, name, salary) is stored as table 1 with columns col1-OnionEq, col1-OnionOrder, col1-OnionSearch, col2-OnionEq, …
• Onion Equality for ‘CEO’: RND over DET over JOIN
• Initial outer layers: col1-OnionEq at RND, col1-OnionOrder at RND, col1-OnionSearch at SEARCH, col2-OnionEq at RND

Example (cont’d)

SELECT * FROM emp WHERE rank = ‘CEO’;

1. The proxy has the server strip the RND layer of the Onion Equality column:

UPDATE table1 SET col1-OnionEq = Decrypt_RND(key, col1-OnionEq);

2. The proxy then issues the transformed query:

SELECT * FROM table1 WHERE col1-OnionEq = xda5c0407;

• After the update, col1-OnionEq is at DET; col1-OnionOrder and col2-OnionEq stay at RND, col1-OnionSearch at SEARCH
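The bookkeeping in this example (strip a layer once, remember it, never put it back) can be sketched as a small state machine on the proxy side. The class and method names are hypothetical; only `Decrypt_RND`, the column names, and the two SQL statements follow the slides.

```python
class OnionState:
    """Hypothetical proxy-side record of the outermost layer per onion column."""

    def __init__(self):
        self.layer = {}  # (table, onion column) -> current outermost layer

    def statements_for_equality(self, table, column, key, constant):
        """Return the SQL the proxy would send for `column = constant`."""
        stmts = []
        if self.layer.get((table, column), "RND") == "RND":
            # First equality query on this column: strip RND with the UDF.
            stmts.append(
                f"UPDATE {table} SET {column} = Decrypt_RND({key}, {column});"
            )
            self.layer[(table, column)] = "DET"  # the layer is never put back
        stmts.append(f"SELECT * FROM {table} WHERE {column} = {constant};")
        return stmts

state = OnionState()
first = state.statements_for_equality("table1", "col1-OnionEq", "key", "xda5c0407")
second = state.statements_for_equality("table1", "col1-OnionEq", "key", "xda5c0407")
print(len(first), len(second))  # 2 1
```

The first equality query pays for the UPDATE; later equality queries on the same column go straight to the SELECT, which is why onion adjustments disappear in the steady state.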

Security guarantees

Queries                        | encryption schemes | leakage
no filter on a column          | RND                | nothing
aggregation on a column        | HOM                | nothing
equality predicate on a column | DET                | repeats

These query types are common in practice.

• Never reveals plaintext
• Encryption schemes exposed for each column are the most secure enabling queries
• Overall: Reveal only data relations needed for query type, at column granularity

Security threshold

Leakage, from least to most:

• RND, HOM: virtually nothing
• SEARCH
• DET, DETJOIN: repeats
• OPE, OPEJOIN: order
• Plaintext: everything

• Threshold example: SSN column, ≥ repeats
• Most sensitive columns naturally stay above threshold.

Implementation

[Diagram: Application → query → CryptDB Proxy → transformed query → Server (unmodified DBMS with CryptDB SQL UDFs, user-defined functions); Server → encrypted results → CryptDB Proxy → results → Application. The application talks to the proxy through the SQL interface.]

• No change to the DBMS
• Portable: from Postgres to MySQL with 86 lines
• No change to applications

Evaluation

1. Does it support real queries/applications?
2. What is the resulting confidentiality?
3. What is the performance overhead?

Queries not supported

• More complex operators, e.g., trigonometry
• Operations that require combining encryption schemes, e.g., T1.a + T1.b > T2.c
• Extensions: split queries, precompute columns, use FHE or other encryption schemes

Real queries/applications

Application | Total columns | Encrypted columns | # cols not supported
phpBB       | 563           | 23                | 0
HotCRP      | 204           | 22                | 0
grad-apply  | 706           | 103               | 0
TPC-C       | 92            | 92                | 0
sql.mit.edu | 128,840       | 128,840           | 1,094

Examples of unsupported queries:

SELECT 1/log(series_no+1.2) …
… WHERE sin(latitude + PI()) …

Resulting confidentiality

Application | Total columns | Encrypted columns | Min level is RND | Min level is DET | Min level is OPE
phpBB       | 563           | 23                | 21               | 1                | 1
HotCRP      | 204           | 22                | 18               | 1                | 2
grad-apply  | 706           | 103               | 95               | 6                | 2
TPC-C       | 92            | 92                | 65               | 19               | 8
sql.mit.edu | 128,840       | 128,840           | 80,053           | 34,212           | 13,131

• Most columns at RND
• Most columns at OPE analyzed were less sensitive

Performance

• Metrics: DB server throughput, latency
• Hardware: 2.4 GHz Intel Xeon E5620, 8 cores, 12 GB RAM

[Diagram: MySQL setup: Applications 1 and 2 query a plain database. CryptDB setup: Applications 1 and 2 each go through a CryptDB Proxy to the encrypted database.]

[Figure 10 plot: queries/sec (0 to 50,000) against number of server cores (1 to 8), for MySQL and CryptDB]

Figure 10: Throughput for TPC-C queries, for a varying number of cores on the underlying MySQL DBMS server.

[Figure 11 plot: queries/sec (0 to 14,000) for Equality, Join, Range, Sum, Delete, Insert, Upd. set, and Upd. inc queries, under MySQL, CryptDB, and the strawman]

Figure 11: Throughput of different types of SQL queries from the TPC-C query mix running under MySQL, CryptDB, and the strawman design. “Upd. inc” stands for UPDATE that increments a column, and “Upd. set” stands for UPDATE which sets columns to a constant.

8.4.1 TPC-C

We compare the performance of a TPC-C query mix when running on an unmodified MySQL server versus on a CryptDB proxy in front of the MySQL server. We trained CryptDB on the query set (§3.5.2) so there are no onion adjustments during the TPC-C experiments. Figure 10 shows the throughput of TPC-C queries as the number of cores on the server varies from one to eight. In all cases, the server spends 100% of its CPU time processing queries. Both MySQL and CryptDB scale well initially, but start to level off due to internal lock contention in the MySQL server, as reported by SHOW STATUS LIKE ’Table%’. The overall throughput with CryptDB is 21–26% lower than MySQL, depending on the exact number of cores.

To understand the sources of CryptDB’s overhead, we measure the server throughput for different types of SQL queries seen in TPC-C, on the same server, but running with only one core enabled. Figure 11 shows the results for MySQL, CryptDB, and a strawman design; the strawman performs each query over data encrypted with RND by decrypting the relevant data using a UDF, performing the query over the plaintext, and re-encrypting the result (if updating rows). The results show that CryptDB’s throughput penalty is greatest for queries that involve a SUM (2.0× less throughput) and for incrementing UPDATE statements (1.6× less throughput); these are the queries that involve HOM additions at the server. For the other types of queries, which form a larger part of the TPC-C mix, the throughput overhead is modest. The strawman design performs poorly for almost all queries because the DBMS’s indexes on the

Query (& scheme)   | MySQL Server | CryptDB Server | CryptDB Proxy | CryptDB Proxy*
Select by = (DET)  | 0.10 ms      | 0.11 ms        | 0.86 ms       | 0.86 ms
Select join (JOIN) | 0.10 ms      | 0.11 ms        | 0.75 ms       | 0.75 ms
Select range (OPE) | 0.16 ms      | 0.22 ms        | 0.78 ms       | 28.7 ms
Select sum (HOM)   | 0.11 ms      | 0.46 ms        | 0.99 ms       | 0.99 ms
Delete             | 0.07 ms      | 0.08 ms        | 0.28 ms       | 0.28 ms
Insert (all)       | 0.08 ms      | 0.10 ms        | 0.37 ms       | 16.3 ms
Update set (all)   | 0.11 ms      | 0.14 ms        | 0.36 ms       | 3.80 ms
Update inc (HOM)   | 0.10 ms      | 0.17 ms        | 0.30 ms       | 25.1 ms
Overall            | 0.10 ms      | 0.12 ms        | 0.60 ms       | 10.7 ms

Figure 12: Server and proxy latency for different types of SQL queries from TPC-C. For each query type, we show the predominant encryption scheme used at the server. Due to details of the TPC-C workload, each query type affects a different number of rows, and involves a different number of cryptographic operations. The left two columns correspond to server throughput, which is also shown in Figure 11. “Proxy” shows the latency added by CryptDB’s proxy; “Proxy*” shows the proxy latency without the ciphertext pre-computing and caching optimization (§3.5). Bold numbers show where pre-computing and caching ciphertexts helps. The “Overall” row is the average latency over the mix of TPC-C queries. “Update set” is an UPDATE where the fields are set to a constant, and “Update inc” is an UPDATE where some fields are incremented.

Scheme            | Encrypt   | Decrypt   | Special operation
Blowfish (1 int.) | 0.0001 ms | 0.0001 ms | -
AES-CBC (1 KB)    | 0.008 ms  | 0.007 ms  | -
AES-CMC (1 KB)    | 0.016 ms  | 0.015 ms  | -
OPE (1 int.)      | 9.0 ms    | 9.0 ms    | Compare: 0 ms
SEARCH (1 word)   | 0.01 ms   | 0.004 ms  | Match: 0.001 ms
HOM (1 int.)      | 9.7 ms    | 0.7 ms    | Add: 0.005 ms
JOIN-ADJ (1 int.) | 0.52 ms   | -         | Adjust: 0.56 ms

Figure 13: Microbenchmarks of cryptographic schemes, per unit of data encrypted (one 32-bit integer, 1 KB, or one 15-byte word of text), measured by taking the average time over many iterations.

RND-encrypted data are useless for operations on the underlying plaintext data. It is pleasantly surprising that the higher security of CryptDB over the strawman also brings better performance.

To understand the latency introduced by CryptDB’s proxy, we measure the server and proxy processing times for the same types of SQL queries as above. Figure 12 shows the results. We can see that there is an overall server latency increase of 20% with CryptDB, which we consider modest. The proxy adds an average of 0.60 ms to a query; of that time, 24% is spent in MySQL proxy, 23% is spent in encryption and decryption, and the remaining 53% is spent parsing and processing queries. The cryptographic overhead is relatively small because most of our encryption schemes are efficient; Figure 13 shows their performance. OPE and HOM are the slowest, but the ciphertext pre-computing and caching optimization (§3.5) masks the high latency of queries requiring OPE and HOM. Proxy* in Figure 12 shows the latency without these optimizations, which is significantly higher for the corresponding query types. SELECT queries that involve a SUM use HOM but do not benefit from this optimization, because the proxy performs decryption, rather than encryption.

In all TPC-C experiments, the proxy used less than 20 MB of memory. Caching ciphertexts for the 30,000 most common values for OPE accounts for about 3 MB, and pre-computing ciphertexts and randomness for 30,000 values at HOM required 10 MB.
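The effect of caching ciphertexts at the proxy can be sketched with a memoized encryption function. This is an illustration under assumptions: `slow_ope_encrypt` is a hypothetical stand-in for an expensive deterministic encryption, and `functools.lru_cache` evicts least recently used entries, which only approximates keeping the most common values.

```python
import hashlib
import hmac
from functools import lru_cache

SLOW_CALLS = 0  # counts how often the expensive path actually runs

def slow_ope_encrypt(value: int) -> str:
    """Hypothetical stand-in for an expensive encryption (OPE takes about 9 ms)."""
    global SLOW_CALLS
    SLOW_CALLS += 1
    return hmac.new(b"demo key", str(value).encode(), hashlib.sha256).hexdigest()

@lru_cache(maxsize=30_000)  # the text mentions caching 30,000 values
def cached_ope_encrypt(value: int) -> str:
    return slow_ope_encrypt(value)

for v in [100, 60, 100, 100, 800, 60]:
    cached_ope_encrypt(v)

print(SLOW_CALLS)  # 3: one slow call per distinct value
```

Caching works here because the encryption is deterministic. For a randomized scheme such as HOM, the text instead describes pre-computing ciphertexts and randomness ahead of time.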

8.4.2 Multi-User Web Applications

To evaluate the impact of CryptDB on application performance, we measure the throughput of phpBB for a workload with 10 parallel clients, which ensured 100% CPU load at the server. Each client continuously issued HTTP requests to browse the forum, write and

TPC-C performance

• Throughput loss: 26%
• Latency (ms/query): 0.10 MySQL vs. 0.72 CryptDB
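The 0.72 ms figure appears to be the sum of the “Overall” server and proxy latencies in Figure 12. The short calculation below checks that reading and splits the proxy time using the percentages quoted in the paper text; the variable names are illustrative.

```python
# "Overall" row of Figure 12 (milliseconds per query).
mysql_server_ms = 0.10
cryptdb_server_ms = 0.12
proxy_ms = 0.60

total_ms = cryptdb_server_ms + proxy_ms
server_increase = (cryptdb_server_ms - mysql_server_ms) / mysql_server_ms

# Share of proxy time per component, as stated in the text.
shares = {
    "MySQL proxy": 0.24,
    "encryption and decryption": 0.23,
    "parsing and processing": 0.53,
}
breakdown_ms = {name: round(proxy_ms * share, 3) for name, share in shares.items()}

print(round(total_ms, 2))            # 0.72
print(round(server_increase * 100))  # 20
print(breakdown_ms)
```

Under this reading, most of the added latency comes from the proxy, and less than a quarter of the proxy's time is cryptography.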

TPC-C microbenchmarks

[Figure: queries/sec (0 to 14,000) for Equality, Join, Range, Delete, Insert, Upd. set, Upd. inc, and Sum queries, under MySQL and CryptDB; annotation: homomorphic addition]

• No cryptography at the DB server in the steady state!
• CryptDB is practical

Demo

Conclusions

1. The first practical DBMS for running most standard queries on encrypted data
2. Modest overhead and no changes to DBMS

CryptDB website: http://css.csail.mit.edu/cryptdb/

Thanks!

