Date posted: 19-Jul-2015
Searching Encrypted Cloud Data: Case Study on Academia + Industry (Done Right)
Alexandra (Sasha) Boldyreva
School of Computer Science in the College of Computing Georgia Institute of Technology
TWO WORLDS OF CRYPTO DEVELOPMENT
Academia vs. Industry
Ø Why are the two worlds so disjointed?
Ø Is this unavoidable?
TWO WORLDS, A CLOSER LOOK: ACADEMIA
Priorities in protocol design
Ø Competitiveness
Ø Can be published
Ø Novel
Ø Non-trivial, uses interesting ideas
Ø Provably secure
Ø Uses novel, useful techniques
Ø Has impact on future research
TWO WORLDS, A CLOSER LOOK: INDUSTRY
Priorities in protocol design
Ø Competitiveness
Ø Can be sold
Ø Novel
Ø Useful
Ø Very efficient
Ø Legislation compliant
Ø Resists obvious attacks
TWO WORLDS, A CLOSER LOOK
Priorities in protocol design – let's highlight the most important differences:
ACADEMIA: novel; non-trivial, uses interesting ideas; provably secure; uses novel, useful techniques; has impact on future research
INDUSTRY: novel; useful; very efficient; legislation compliant; resists obvious attacks
TWO WORLDS, A CLOSER LOOK
ACADEMIA: public; complex; not very efficient; rarely used; provably secure (provide security guarantees)
INDUSTRY: often proprietary; simpler; very efficient; solve real problems; used; security is not well understood
They lead to differences in schemes’ common properties …
TWO WORLDS, A CLOSER LOOK
How academics view crypto products from the industry: "Practitioners should not design crypto schemes, as they cannot prove their schemes secure. They should use our schemes."
TWO WORLDS, A CLOSER LOOK
Provable security is a great methodology that allows us to have schemes with security guarantees. However,
Ø the definitions (and proofs) are very often hard to understand, and it is hard to judge how well they "match" reality;
Ø it is hard to design schemes that are efficient and also provably secure under strong definitions and well-studied assumptions.
TWO WORLDS, A CLOSER LOOK
How practitioners view the work produced by academia: "Academics do not understand what is needed in practice. What they call 'efficient' and 'practical' is not. Their papers are hard to understand. Strong security is a hassle."
PRODUCT OVERVIEW
Skyhigh Networks' product allows customers to use existing cloud service providers with added security and without losing functionality, e.g., search. It employs
Ø symmetric searchable encryption (can search for an encrypted keyword or sort encrypted data),
Ø format-preserving encryption
Ø …
IMPORTANT CONSIDERATIONS
① Which schemes do we employ, and how do we shepherd an algorithm from conception to deployment?
② How do we optimize cryptographic schemes without invalidating their security proofs?
③ In what situations is it appropriate to trade security for functionality in a piece of commercial software?
These questions must be answered before a new algorithm reaches a customer.
GENERAL CONCERNS WITH ENCRYPTION
No secure coding guidelines: hard to know what is acceptable and when
Fighting misinformation in the market
• Consumers don't understand security/usability tradeoffs
• They expect full security and full functionality for their data
Weighing tradeoffs between security and functionality
• When is it appropriate to have a weakened security guarantee?
• For what kinds of data?
THE START OF GREAT COLLABORATION
Skyhigh reached out to me, as I was actively working on protocols for efficiently-searchable encryption.
MY WORK ON SEARCHABLE ENCRYPTION
Joint effort with my colleagues: Georgios Amanatidis, Nathan Chenette, Younho Lee, Adam O'Neill
CLOUD STORAGE
• A.k.a. Database-as-a-Service
• Server efficiently responds to client's queries/updates
• Query efficiency: search time sub-linear in database size
• Query functionality: exact-match, range, error-tolerant (fuzzy), …
Example: the client stores the pairs ($35k, rec1), ($50k, rec2), ($68k, rec3), ($72k, rec4), ($95k, rec5) on the cloud server (database). The exact-match query ExactMatch($68k) returns ($68k, rec3).
Range query on the same database: Range($40k, $68k) returns {($50k, rec2), ($68k, rec3)}.
Error-tolerant (fuzzy) query on the same database: Fuzzy($70k) returns {($68k, rec3), ($72k, rec4)}.
SECURE CLOUD STORAGE: GOALS
Three goals: security, efficiency, functionality
The secure cloud server holds an encrypted database: (EncK($72k), rec4), (EncK($68k), rec3), (EncK($95k), rec5), (EncK($35k), rec1), (EncK($50k), rec2).
• Security: searchable data is symmetrically encrypted
• Efficiency: server responds to a query in sub-linear time
• Functionality: various query types, data updates, …
EFFICIENT SEARCHABLE ENCRYPTION
• The study of schemes balancing these goals is efficient searchable encryption (ESE)
– Cryptographic efforts often focus on strong security
– Practitioners wonder: how much security is possible without sacrificing efficient functionality?
• Efficiency, security, and functionality are at odds
– E.g., strong encryption requires linear search time
PAST RESULTS IN SEARCHABLE SYMMETRIC ENCRYPTION
Scheme | Security | Efficiency | Functionality
Oblivious RAM [GO96] | Excellent | Impractical | All query types
Fully homomorphic encryption [G09] | Excellent | Impractical | All query types
Exact-match SSE [SLDHJ10, GO96, G09, 33, CM05] | Great | Linear+ | Exact-match
Exact-match SSE [CGKO06, SWP00, KO12] | Great | Sub-linear | Exact-match; no dynamic updates
Range-query SSE [BW07] | Great | Linear+ | Range
Prefix-preserving encryption [KIK12, BBKN01, XFAM02] | Vulnerable | Sub-linear | Range; specialized implementation
Order-preserving encryption [AKSX04] | Undefined/Unknown | Sub-linear | Range; simple to implement
Efficient fuzzy-searchable encryption [KIK12] | Undefined/Unknown | Sub-linear | Error-tolerant
OUR GOALS
Provide provably-secure solutions supporting efficient (sublinear) search on encrypted data for
Ø exact-match
Ø range
Ø error-tolerant queries
OUR RESULTS
Provide provably-secure solutions supporting efficient (sublinear) search on encrypted data for
• exact-match: efficiently-searchable encryption [ABO07],
• range: order-preserving encryption (OPE) [BCLO09, BCO11],
• error-tolerant: fuzzy-searchable encryption [BC14].
ORDER-PRESERVING ENCRYPTION (OPE)
A symmetric encryption scheme is order-preserving if encryption is deterministic and strictly increasing.
[Figure: example OPE function EncK(·) for a key K ←$ KeyGen, mapping plaintexts to ciphertexts.]
ORDER-PRESERVING ENCRYPTION (OPE)
[Figure: for plaintexts m0 < m1, EncK(m0) < EncK(m1).]
ORIGINS OF OPE
Ø OPE has a long history in the form of one-part codes.
Ø In a one-part code, code words and translations have the same order.
Ø To encrypt or decrypt requires only a single look-up table.
Ø More recently, [AKSX04] suggested OPE as a protocol to support range queries for secure cloud storage.
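A one-part code is exactly a look-up table for an order-preserving function. The toy sketch below samples such a table by choosing a uniformly random m-element subset of [n] and assigning it to the plaintexts 1..m in increasing order; decryption reads the same table backwards. The names `random_opf_table` and `one_part_code` are hypothetical, and Python's `random` module is not a cryptographic source of randomness, so this only illustrates the structure.

```python
import random

def random_opf_table(m, n, rng):
    """Sample a random order-preserving function from [m] = {1..m} to [n] = {1..n}.

    A uniformly random m-element subset of [n], listed in increasing order,
    is the image of a uniformly random OPF.
    """
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    image = sorted(rng.sample(range(1, n + 1), m))
    return {p: c for p, c in zip(range(1, m + 1), image)}

def one_part_code(m, n, seed):
    """Encryption and decryption tables of a toy one-part code (illustration only)."""
    enc = random_opf_table(m, n, random.Random(seed))
    dec = {c: p for p, c in enc.items()}  # the same table, read in the other direction
    return enc, dec

enc, dec = one_part_code(10, 100, seed=2015)
assert all(enc[p] < enc[p + 1] for p in range(1, 10))  # strictly increasing
assert all(dec[enc[p]] == p for p in range(1, 11))     # decryption inverts encryption
```

The catch, of course, is that the table itself is the key, so its size grows with the domain; the lazy-sampling construction later in this section avoids storing it.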
EFFICIENT RANGE QUERIES VIA OPE
• Range query support is effortless using OPE [AKSX04]
• Can we make it secure?
• Actually… how do we even define security?
The server (encrypted database) stores (EncK($35k), rec1), (EncK($50k), rec2), (EncK($68k), rec3), (EncK($72k), rec4), (EncK($95k), rec5). The client turns Range($40k, $68k) into Range(EncK($40k), EncK($68k)), and the server returns {(EncK($50k), rec2), (EncK($68k), rec3)}.
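To make the "effortless" claim concrete, here is a toy sketch of the server side. It assumes a stand-in `enc` built from a random order-preserving table over values 1..200 (thousands of dollars) into ciphertexts 1..10,000; this is for illustration only and is not a secure OPE. Because ciphertexts sort in plaintext order, the server answers Range(EncK(lo), EncK(hi)) with two binary searches over its sorted ciphertexts, i.e., in logarithmic time plus the size of the answer.

```python
import bisect
import random

# Hypothetical stand-in for EncK: a random order-preserving table (NOT a secure OPE).
rng = random.Random(7)
_image = sorted(rng.sample(range(1, 10_001), 200))

def enc(value_k):
    """Toy order-preserving 'encryption' of a value in 1..200 (thousands of dollars)."""
    return _image[value_k - 1]

# Server side: encrypted rows kept sorted by ciphertext; plaintext values never leave the client.
server_rows = sorted((enc(v), rec) for v, rec in
                     [(35, "rec1"), (50, "rec2"), (68, "rec3"), (72, "rec4"), (95, "rec5")])
keys = [c for c, _ in server_rows]

def server_range(c_lo, c_hi):
    """All rows with c_lo <= ciphertext <= c_hi, found by binary search."""
    i = bisect.bisect_left(keys, c_lo)
    j = bisect.bisect_right(keys, c_hi)
    return server_rows[i:j]

# Client: Range($40k, $68k) becomes Range(EncK($40k), EncK($68k)).
result = server_range(enc(40), enc(68))
print([rec for _, rec in result])  # ['rec2', 'rec3']
```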
TOWARDS OPE SECURITY MODEL
• OPE cannot be IND-CPA secure because it is deterministic.
• We have to weaken the IND-CPA definition.
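Why determinism rules out IND-CPA can be seen with a two-query adversary in the left-or-right game: it asks for an encryption of the same message on both sides, then asks again with different messages, and checks whether the two ciphertexts coincide. In this sketch HMAC-SHA256 stands in for an arbitrary deterministic scheme (an assumption; it is not decryptable, which does not matter for the attack), and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def det_enc(key, msg):
    """Stand-in deterministic 'encryption': same key and message, same output."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def adversary(lr):
    """Wins the IND-CPA game against any deterministic scheme."""
    c1 = lr(b"same", b"same")    # both sides encrypt b"same"
    c2 = lr(b"same", b"other")   # left side: b"same", right side: b"other"
    return 0 if c1 == c2 else 1  # a repeated ciphertext reveals the left oracle

def ind_cpa_game(b):
    """Run the left-or-right game with hidden bit b; return the adversary's guess."""
    key = secrets.token_bytes(32)

    def lr_oracle(m0, m1):
        return det_enc(key, m1 if b else m0)

    return adversary(lr_oracle)

assert all(ind_cpa_game(b) == b for b in (0, 1) for _ in range(100))
```

The adversary always guesses correctly, so its advantage is 1, whatever deterministic scheme is plugged in.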
ATTEMPT #1: IND-DISTINCTCPA
• What if the equality patterns of LEFT and RIGHT queries must match?
• Suitable for deterministic encryption
• Still unachievable by an OPE scheme, because order is leaked!
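[Figure: the LEFT oracle answers each query pair (M0, M1) with EK(M0), the RIGHT oracle with EK(M1); adversary A sees the ciphertexts EK(Mb) and outputs a guess for b. In the illustrated attack, the order of the returned ciphertexts tells A whether to guess b = 1 or b = 0.]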
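The order-based attack can be made explicit. The adversary queries (1, 2) and then (2, 1): every message on the left side is distinct and every message on the right side is distinct, so the equality patterns match and the queries are legal under IND-DCPA, yet the order of the two ciphertexts flips between the sides. The `toy_ope` below is a hypothetical random order-preserving table used only to run the game; any OPE would lose the same way.

```python
import random

def toy_ope(seed, n_plain=100, n_cipher=10_000):
    """Hypothetical toy OPE: a random order-preserving table from [n_plain] to [n_cipher]."""
    image = sorted(random.Random(seed).sample(range(1, n_cipher + 1), n_plain))
    return lambda m: image[m - 1]

def ind_dcpa_game(b, seed):
    """Left-or-right game with hidden bit b against an OPE; returns the adversary's guess."""
    enc = toy_ope(seed)

    def lr(m0, m1):
        return enc(m1 if b else m0)

    c1 = lr(1, 2)  # left side encrypts 1, right side encrypts 2
    c2 = lr(2, 1)  # left side encrypts 2, right side encrypts 1
    return 0 if c1 < c2 else 1  # increasing means left; decreasing means right

assert all(ind_dcpa_game(b, s) == b for b in (0, 1) for s in range(50))
```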
ATTEMPT #2: IND-ORDEREDCPA
• What if the order patterns of LEFT and RIGHT queries must match?
[Figure: the L oracle and R oracle answer query pairs (M0, M1) with EK(M0) and EK(M1), respectively. Query sequences whose M0 side and M1 side (values 1–5 in the illustration) have different order patterns are not allowed.]
ATTEMPT #2: IND-ORDEREDCPA
• In fact, there is still a general attack against any OPE scheme [BCLO09].
• It demonstrates that OPE must leak the relative distance between plaintexts.
A DIFFERENT APPROACH TO SECURITY
• Instead of trying to relax IND-CPA further, we take an approach similar to PRF security.
• Require that an OPE is indistinguishable from an "ideal" object, namely a random order-preserving function (ROPF).
POPF-SECURITY
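We call an OPE scheme pseudorandom-OPF-secure (POPF-secure) if no efficient adversary can output 1 with noticeably different probabilities in the two experiments: one in which it has oracle access to EncK(·) for a random key K, and one in which it has oracle access to a random order-preserving function with the same domain and range.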
TOWARD A CONSTRUCTION
• It is not immediately clear how the regular building block, a blockcipher, helps.
• Solution: combinatorics and statistics!
OPFS AND COMBINATIONS
Ø Observation: There is a bijection between the set of OPFs from [M] to [N] and the set of M-out-of-N combinations.
Ø Example:
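A brute-force check of the bijection for small parameters, here M = 3 and N = 5 (chosen arbitrarily): enumerate every function [M] → [N], keep the strictly increasing ones, and compare them with the M-element subsets of [N] written in increasing order. Both sets contain C(5, 3) = 10 elements and coincide.

```python
from itertools import combinations, product
from math import comb

M, N = 3, 5

# Every function [M] -> [N], written as the tuple (f(1), ..., f(M)); keep the strictly increasing ones.
opfs = [f for f in product(range(1, N + 1), repeat=M)
        if all(f[k] < f[k + 1] for k in range(M - 1))]

# Every M-element subset of [N], listed in increasing order.
subsets = list(combinations(range(1, N + 1), M))

# An OPF is determined by its image, and every M-subset is the image of exactly one OPF.
assert sorted(opfs) == sorted(subsets)
assert len(opfs) == comb(N, M) == 10
```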
THE NHGD CONNECTION
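Ø The image of plaintext i under a random OPF (i.e., the ith smallest element of a random M-element subset of [N]) follows the negative hypergeometric distribution (NHGD) on parameters: range [N], domain [M], index i.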
Ø Assume we have an efficient way to sample NHGD.
$$\Pr[\mathrm{NHGD}([N],[M],i) = c] = \frac{\binom{c-1}{i-1}\binom{N-c}{M-i}}{\binom{N}{M}}$$
Lazy-sampling a POPF on a message i (domain [M], range [N]) ≅ lazy-sampling the ith smallest element of a (pseudo)random M-element subset of [N].
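The correspondence above can be checked exactly for small parameters. This sketch evaluates the closed-form NHGD probability with exact fractions and compares it, for every index i and value c, against the distribution of the ith smallest element obtained by enumerating all M-element subsets of [N] (here N = 9, M = 5, chosen arbitrarily). The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def nhgd_pmf(n, m, i, c):
    """Pr[NHGD([n],[m],i) = c] from the closed-form NHGD formula."""
    return Fraction(comb(c - 1, i - 1) * comb(n - c, m - i), comb(n, m))

def ith_smallest_distribution(n, m, i):
    """Exact distribution of the i-th smallest element of a uniform m-subset of [n]."""
    subsets = list(combinations(range(1, n + 1), m))  # each tuple is already sorted
    counts = {}
    for s in subsets:
        counts[s[i - 1]] = counts.get(s[i - 1], 0) + 1
    return {c: Fraction(k, len(subsets)) for c, k in counts.items()}

n, m = 9, 5
for i in range(1, m + 1):
    exact = ith_smallest_distribution(n, m, i)
    for c in range(1, n + 1):
        assert nhgd_pmf(n, m, i, c) == exact.get(c, 0)
```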
SINGLE-POINT LAZY SAMPLING
Example of lazy-sampling a single point:
[Figure: plaintexts 1–5 and ciphertexts 1–9; only the image of plaintext 3 is determined.]
Domain [5], range [9]. To encrypt only i = 3: sample NHGD([9], [5], 3). Suppose the outcome is 6. This occurs with probability
$$\Pr[\mathrm{NHGD}([9],[5],3) = 6] = \frac{\binom{5}{2}\binom{3}{2}}{\binom{9}{5}} \approx 0.24$$
and specifies the (incomplete) OPF {?, ?, 6, ?, ?}, i.e., the (incomplete) 5-element subset of [9] whose third smallest element is 6.
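A single-point sample can be driven by an explicit coin. In this sketch (a hypothetical helper, not the authors' sampler) the coin u is an integer in [0, C(n, m)), and the inverse-CDF loop assigns to each outcome c exactly as many coins as there are m-subsets of [n] whose ith smallest element is c, so a uniform u yields the NHGD distribution. Running it over every possible coin for the example above recovers the probability 30/126 ≈ 0.24.

```python
from math import comb

def nhgd_from_coin(n, m, i, u):
    """Inverse-CDF sample of NHGD([n],[m],i) driven by an explicit coin u in [0, comb(n, m))."""
    if not 0 <= u < comb(n, m):
        raise ValueError("coin out of range")
    acc = 0
    for c in range(i, n - m + i + 1):                   # support of the i-th smallest element
        acc += comb(c - 1, i - 1) * comb(n - c, m - i)  # number of subsets whose i-th smallest is c
        if u < acc:
            return c
    raise AssertionError("unreachable: the counts sum to comb(n, m)")

# Domain [5], range [9], encrypt only i = 3: run the sampler over every possible coin.
outcomes = [nhgd_from_coin(9, 5, 3, u) for u in range(comb(9, 5))]
print(outcomes.count(6), len(outcomes))  # 30 126
```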
MULTI-POINT LAZY-SAMPLING
Ø For the function to be deterministic and order-preserving, lazy-sampling must take into account "existing" points when selecting new points.
Ø An inefficient method would be to remember every existing point and adjust further sampling parameters accordingly.
Ø But to make our eventual scheme stateless, we will instead take a binary-search approach.
Ø For now, assume a state consisting of pre-determined random coins (bitstrings) r1, r2, …, rM, and consider this as the key to our scheme.
LAZY-SAMPLING EXAMPLE
[Figure: domain [7] (plaintexts 1–7) and range [16] (ciphertexts 1–16).]
Under coins r1, r2, …, rM, each step samples the image of the midpoint of the current domain and then continues in the half that contains the plaintext:
Encrypt "3": NHGD([16], [7], 4; r4) → 10, then NHGD([9], [3], 2; r2) → 5, then NHGD({6,7,8,9}, {3}, 3; r3) → 6, so 3 encrypts to 6.
Encrypt "1": NHGD([16], [7], 4; r4) → 10, then NHGD([9], [3], 2; r2) → 5, then NHGD([4], {1}, 1; r1) → 2, so 1 encrypts to 2.
REMARKS ON LAZY-SAMPLING
Notice that
Ø Given fixed random coins for NHGD, we lazily construct a (pseudo)random OPF.
Ø Each encryption uses roughly log2(M) calls to the NHGD sampler (at most ⌈log2(M+1)⌉, the depth of the binary search).
Ø Efficiency: about log2(M) · tNHGD.
Ø The state consists only of the coins r1, r2, …, rM.
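The binary-search lazy sampler can be sketched end to end for toy sizes. Assumptions of this sketch: the coins are stored in a dictionary keyed by the midpoint plaintext (matching the r4, r2, r3 labels in the example), the seeds are arbitrary, and the bundled NHGD sampler is an exact inverse-CDF loop that costs O(N) big-integer work per call, far too slow for real parameters. It also counts NHGD calls; for M = 7 it never needs more than 3.

```python
import random
from math import comb

def sample_nhgd(n, m, i, rng):
    """Exact inverse-CDF sample of the i-th smallest element of a uniform m-subset of [n]."""
    u = rng.randrange(comb(n, m))
    acc = 0
    for c in range(i, n - m + i + 1):
        acc += comb(c - 1, i - 1) * comb(n - c, m - i)
        if u < acc:
            return c
    raise AssertionError("unreachable: the counts sum to comb(n, m)")

def lazy_encrypt(x, M, N, coins):
    """Encrypt x in [M] to [N] by binary search over the domain; returns (ciphertext, nhgd_calls).

    coins[p] seeds the NHGD call made when p is the midpoint of the current domain,
    so every encryption retraces exactly the same choices.
    """
    if not 1 <= x <= M <= N:
        raise ValueError("need 1 <= x <= M <= N")
    d_lo, d_hi, r_lo, r_hi = 1, M, 1, N
    calls = 0
    while True:
        mid = (d_lo + d_hi) // 2
        # Image of mid: its rank within [d_lo, d_hi] picks the matching element of a
        # random (d_hi - d_lo + 1)-subset of the current range [r_lo, r_hi].
        y = r_lo - 1 + sample_nhgd(r_hi - r_lo + 1, d_hi - d_lo + 1, mid - d_lo + 1,
                                   random.Random(coins[mid]))
        calls += 1
        if x == mid:
            return y, calls
        if x < mid:  # smaller plaintexts must land strictly below y
            d_hi, r_hi = mid - 1, y - 1
        else:        # larger plaintexts must land strictly above y
            d_lo, r_lo = mid + 1, y + 1

M, N = 7, 16
coins = {p: random.Random(100 + p).getrandbits(64) for p in range(1, M + 1)}  # arbitrary fixed coins
table = [lazy_encrypt(x, M, N, coins)[0] for x in range(1, M + 1)]
print(table)
```

Each node's domain and range depend only on the samples of its ancestors, which are fixed by the coins, so the whole function is well defined even though it is only ever evaluated one point at a time.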
REMOVING THE STATE
Instead of storing the random coins, we use a pseudorandom function (PRF) that takes as input the parameters to NHGD. The secret key to our scheme is just the key K to the blockcipher used to build the PRF:
r1 = PRFK(D1, R1, x1), then NHGD(D1, R1, x1; r1);
r2 = PRFK(D2, R2, x2), then NHGD(D2, R2, x2; r2);
r3 = PRFK(D3, R3, x3), then NHGD(D3, R3, x3; r3); …
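Deriving the coins from the NHGD parameters can be sketched as follows. Two stand-ins are assumptions of this sketch rather than the scheme's actual components: HMAC-SHA256 plays the role of the PRF (the slides use a blockcipher), and its output seeds Python's `random.Random`, which is a convenient but not cryptographically sound way to expand coins. The helper name `prf_coins` is hypothetical.

```python
import hashlib
import hmac
import random

def prf_coins(key, domain, rng_range, x):
    """Coins for NHGD(domain, range, x), derived as PRF_K(domain, range, x).

    domain and rng_range are (low, high) pairs; x is the plaintext being placed.
    """
    msg = repr((domain, rng_range, x)).encode()
    seed = hmac.new(key, msg, hashlib.sha256).digest()
    return random.Random(int.from_bytes(seed, "big"))

key = b"\x00" * 32  # toy key for illustration only
first = prf_coins(key, (1, 7), (1, 16), 4).getrandbits(64)
again = prf_coins(key, (1, 7), (1, 16), 4).getrandbits(64)
assert first == again  # same key and parameters give the same coins, so nothing else is stored
```

Because each NHGD call's inputs are fixed by the binary-search path, the same key always regenerates the same coins, which is what makes the scheme stateless.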
MOVE TO HYPERGEOMETRIC
Ø There does not seem to be an efficient NHGD sampling algorithm.
Ø Instead we use a related distribution, the hypergeometric distribution (HGD), which can be sampled efficiently [KS85].
Ø It describes how many members of a random M-element subset of [N] are less than a value y, for 1 ≤ y ≤ N.
Ø HGD can be used if we slightly modify the algorithms.
Ø This gives rise to a POPF-secure OPE. ☺
Ø Efficiency is the same, log M · tHGD, on average.
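One way to see why HGD can stand in for NHGD: the event "the ith smallest member of the random subset is at most y" is the same event as "at least i members are at most y". This sketch checks that identity exactly for N = 16, M = 7. It counts members in {1, …, y}; counting members strictly below y instead only shifts y by one. The modified algorithm itself is not spelled out on the slide, so this shows only the distributional link, not the scheme.

```python
from fractions import Fraction
from math import comb

def hgd_pmf(n, m, y, k):
    """Pr[exactly k members of a uniform m-subset of [n] are <= y]."""
    return Fraction(comb(y, k) * comb(n - y, m - k), comb(n, m))

def nhgd_pmf(n, m, i, c):
    """Pr[the i-th smallest member of a uniform m-subset of [n] equals c]."""
    return Fraction(comb(c - 1, i - 1) * comb(n - c, m - i), comb(n, m))

def check_link(n, m):
    """Verify Pr[NHGD([n],[m],i) <= y] == Pr[HGD(n, m, y) >= i] for all i and y."""
    for i in range(1, m + 1):
        for y in range(1, n + 1):
            lhs = sum(nhgd_pmf(n, m, i, c) for c in range(1, y + 1))
            rhs = sum(hgd_pmf(n, m, y, k) for k in range(i, m + 1))
            assert lhs == rhs, (i, y)
    return True

print(check_link(16, 7))  # True
```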
RECAP OF OPE
Ø Appropriate definition of security: POPF
Ø Our later study [BCO11] helped to clarify the security leakage of POPF.
Ø POPF-secure OPE construction via lazy-sampling on the HGD distribution.
GREAT COLLABORATION AT A GLANCE
Skyhigh and I had numerous fruitful discussions. I was incredibly pleased with their approach and questions.
Ø They valued and wanted to understand provable security, and wanted to employ provably-secure schemes.
Ø They asked great questions and listened.
Ø They think open source is a must.
Ø They read academic papers and attended academic conferences.
Ø They hired an advisory board of crypto experts.
Ø They managed to make us think.
Ø They managed to spark new research projects.
CHALLENGES WITH DEPLOYING OPE
Speed of algorithm
• HGD sampling means an un-optimized implementation is very slow.
• Optimization required extensive use of low-level floating-point libraries to speed up HGD sampling.
Ciphertext length
• Padding is required to preserve lexicographic order. Padding plaintexts also means the ciphertexts are long.
Need to fix input and output lengths in advance
• No known secure way to use OPE like a block cipher
• Makes using OPE for different types of data (longer/shorter) difficult
What order is preserved?
• Lexicographic? Numeric? Alphabetic? ASCII-betic?
• Different orderings require different functions to encode input as integers before encryption
• Needs to be the same order as the cloud application's, but different apps could have different orderings
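The padding and encoding issues can be illustrated with a hypothetical encoder that maps strings to integers so that integer order equals byte-wise lexicographic order. It assumes UTF-8 text without NUL characters and a fixed width chosen in advance: the bytes are right-padded with 0x00 (which sorts below every other byte) and read as a big-endian integer, so a 16-byte width already forces an OPE domain of 2^128. A case-insensitive application needs a different encoding, here sketched by case-folding first.

```python
def encode_lex(s, width):
    """Map s to an integer whose order matches byte-wise lexicographic order of s.

    Assumes UTF-8 text of at most `width` bytes containing no NUL characters.
    """
    b = s.encode("utf-8")
    if len(b) > width or b"\x00" in b:
        raise ValueError("input too long or contains NUL")
    return int.from_bytes(b.ljust(width, b"\x00"), "big")  # pad with 0x00, read big-endian

words = ["banana", "Apple", "apple", "app", "cherry", "ap"]

# Byte-wise (ASCII-style) order: uppercase sorts before lowercase.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
by_code = sorted(words, key=lambda w: encode_lex(w, 16))
assert by_bytes == by_code

# A case-insensitive application needs a different encoding function.
by_ci = sorted(words, key=lambda w: encode_lex(w.casefold(), 16))
print(by_ci)  # ['ap', 'app', 'Apple', 'apple', 'banana', 'cherry']
```

Whichever encoding is chosen must match the ordering the cloud application uses, since the server will sort and compare ciphertexts under that order.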
MORE CHALLENGES
Ø Tradeoff of security for functionality
Ø Everybody wants to search everything, all the time.
Ø Can’t have great security at the same time.
Ø What security level is appropriate?
Ø How to explain to customers the security they are getting?
Ø May be easier for exact-match queries. However,
Ø When is frequency analysis an appropriate risk?
Ø For what data?
Ø Non-trivial for OPE
New research project
White paper
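Why frequency analysis matters for exact-match search: deterministic encryption gives equal plaintexts equal ciphertexts, so an attacker who knows the rough popularity of values can match ciphertext frequencies to plaintext frequencies. In this sketch HMAC-SHA256 stands in for deterministic encryption of a searchable field, and both the column and the attacker's auxiliary ranking are hypothetical toy data; real data with close or tied frequencies would give a less clean result.

```python
import hashlib
import hmac
from collections import Counter

def det_token(key, value):
    """Stand-in for deterministic encryption of a searchable field."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

key = b"\x01" * 32
column = ["NY"] * 50 + ["CA"] * 30 + ["TX"] * 15 + ["WA"] * 5  # hypothetical plaintext column
encrypted = [det_token(key, v) for v in column]

# The attacker sees only the ciphertexts plus public statistics about the field.
aux_ranking = ["NY", "CA", "TX", "WA"]  # assumed known: most to least common
ct_ranking = [c for c, _ in Counter(encrypted).most_common()]
guess = dict(zip(ct_ranking, aux_ranking))

recovered = [guess[c] for c in encrypted]
accuracy = sum(r == v for r, v in zip(recovered, column)) / len(column)
print(accuracy)  # 1.0
```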
MORE CHALLENGES
Ø Granularity of exact-match search
Ø Encrypt every word? Every line? Every paragraph? What is the appropriate tradeoff of usability for security?
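The granularity question can be made concrete with deterministic search tokens computed per word or per line (HMAC-SHA256 again stands in for deterministic encryption; `tokens` and `search` are hypothetical helpers, and the document text is made up). Word-level tokens let a single-word query match, but they hand the server many more repeated tokens for frequency analysis; line-level tokens leak less but only match whole lines.

```python
import hashlib
import hmac

def _tok(key, unit):
    return hmac.new(key, unit.encode(), hashlib.sha256).hexdigest()

def tokens(key, text, granularity):
    """Set of deterministic search tokens for `text` at word or line granularity."""
    if granularity == "word":
        units = text.split()
    elif granularity == "line":
        units = text.splitlines()
    else:
        raise ValueError(granularity)
    return {_tok(key, u) for u in units}

def search(key, index, query):
    """Exact-match search: does the query's token appear in the index?"""
    return _tok(key, query) in index

key = b"\x02" * 32
doc = "quarterly revenue up\nlayoffs planned in march"
word_index = tokens(key, doc, "word")
line_index = tokens(key, doc, "line")

print(search(key, word_index, "layoffs"), search(key, line_index, "layoffs"))  # True False
print(len(word_index), len(line_index))  # 7 2
```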
Ø If OPE can be stateful, can we improve efficiency?
New research project