
Searching Encrypted Cloud Data: Academia and Industry Done Right

Date post: 19-Jul-2015
Upload: skyhigh-networks-cloud-security-software

Searching Encrypted Cloud Data: Case Study on Academia + Industry (Done Right)

Alexandra (Sasha) Boldyreva

School of Computer Science in the College of Computing Georgia Institute of Technology

TWO WORLDS OF CRYPTO DEVELOPMENT

Academia and Industry

Ø Why are the two worlds so disjointed?

Ø Is this unavoidable?

TWO WORLDS, A CLOSER LOOK: ACADEMIA

Priorities in protocol design

Ø Competitiveness

Ø Can be published

Ø Novel
Ø Non-trivial, uses interesting ideas
Ø Provably-secure
Ø Uses novel useful techniques
Ø Has impact on future research

TWO WORLDS, A CLOSER LOOK: INDUSTRY

Priorities in protocol design

Ø Competitiveness

Ø Can be sold

Ø Novel
Ø Useful
Ø Very efficient
Ø Legislation compliant
Ø Resists obvious attacks

TWO WORLDS, A CLOSER LOOK

Priorities in protocol design – Let's highlight the most important differences:

ACADEMIA
Ø Novel
Ø Non-trivial, uses interesting ideas
Ø Provably-secure
Ø Uses novel useful techniques
Ø Has impact on future research

INDUSTRY
Ø Novel
Ø Useful
Ø Very efficient
Ø Legislation compliant
Ø Resists obvious attacks

TWO WORLDS, A CLOSER LOOK

The differing priorities lead to differences in the schemes' common properties:

ACADEMIA
Ø Public
Ø Complex
Ø Not very efficient
Ø Rarely used
Ø Provably-secure (provide security guarantees)

INDUSTRY
Ø Often proprietary
Ø Simpler
Ø Very efficient
Ø Solve real problems
Ø Used
Ø Security is not well understood

TWO WORLDS, A CLOSER LOOK

How academics view crypto products from the industry:

"Practitioners should not design crypto schemes, as they cannot prove their schemes secure. They should use our schemes."

TWO WORLDS, A CLOSER LOOK

Provable security is a great methodology that allows us to have schemes with security guarantees. However,

Ø the definitions (and proofs) are often hard to understand, and it is hard to judge how well they "match" reality;

Ø  it is hard to have schemes which are provably-secure under well-studied assumptions for strong definitions and are efficient.

TWO WORLDS, A CLOSER LOOK

How practitioners view the work produced by academia:

"Academics do not understand what is needed in practice. What they call "efficient" and "practical" are not. Their papers are hard to understand. Strong security is a hassle."

TWO WORLDS, CAN THEY WORK TOGETHER?

This  is  possible!  

REAL-LIFE SUCCESS STORY

Crypto researchers

WHAT IS SKYHIGH?

PRODUCT OVERVIEW

Skyhigh Networks' product allows customers to use existing cloud service providers with added security and without losing functionality, e.g. search. It employs

Ø symmetric searchable encryption (can search for an encrypted keyword or sort encrypted data),

Ø format-preserving encryption,

Ø  …

831 services used by an enterprise on average

CLOUD USE GROWING RAPIDLY

SECURITY CONCERNS INHIBITING USE

INDUSTRY TURNS TO ENCRYPTION

IMPORTANT CONSIDERATIONS

① Which schemes do we employ, and how do we shepherd an algorithm from conception to deployment?

② How do we optimize cryptographic schemes without invalidating their security proofs?

③ In what situations is it appropriate to trade security for functionality in a piece of commercial software?

These questions must be answered before a new algorithm reaches a customer.

GENERAL CONCERNS WITH ENCRYPTION

No secure coding guidelines - hard to know what is acceptable and when

Fighting misinformation in the market
• Consumers don't understand security/usability tradeoffs
• They expect full security and full functionality for their data

Weighing tradeoffs between security and functionality
• When is it appropriate to have a weakened security guarantee?
• For what kinds of data?

THE START OF GREAT COLLABORATION

Skyhigh reached out to me, as I was actively working on protocols for efficiently-searchable encryption.

MY WORK ON SEARCHABLE ENCRYPTION

Joint effort with my colleagues:

Georgios Amanatidis, Nathan Chenette, Younho Lee, Adam O'Neill

CLOUD STORAGE

• A.k.a. Database-as-a-Service
• Server efficiently responds to client's queries/updates
• Query efficiency: search time sub-linear in database size
• Query functionality: exact-match, range, error-tolerant (fuzzy), …

Cloud Server (database): ($35k, rec1), ($50k, rec2), ($68k, rec3), ($72k, rec4), ($95k, rec5)

Client query: ExactMatch($68k)
Server response: ($68k, rec3)

Client query: Range($40k, $68k)
Server response: {($50k, rec2), ($68k, rec3)}

Client query: Fuzzy($70k)
Server response: {($68k, rec3), ($72k, rec4)}
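The three query types on the plaintext database can be sketched as follows. This is only an illustration on the toy salary records above: the function names, the use of a sorted list with binary search, and the fuzzy tolerance of 2 (in $k) are assumptions, not part of the talk.

```python
from bisect import bisect_left, bisect_right

# Toy plaintext database from the slides, kept sorted by salary (in $k).
DB = [(35, "rec1"), (50, "rec2"), (68, "rec3"), (72, "rec4"), (95, "rec5")]
KEYS = [salary for salary, _ in DB]

def exact_match(value):
    """Records whose key equals value, found by binary search (sub-linear)."""
    return DB[bisect_left(KEYS, value):bisect_right(KEYS, value)]

def range_query(low, high):
    """Records whose key lies in the closed interval [low, high]."""
    return DB[bisect_left(KEYS, low):bisect_right(KEYS, high)]

def fuzzy(value, tolerance=2):
    """Hypothetical error-tolerant match: keys within +/- tolerance of value."""
    return range_query(value - tolerance, value + tolerance)

print(exact_match(68))      # the ExactMatch($68k) query
print(range_query(40, 68))  # the Range($40k, $68k) query
print(fuzzy(70))            # the Fuzzy($70k) query
```

Each query costs a logarithmic number of comparisons plus the size of the answer, which is the sub-linear behaviour the slides ask the encrypted version to keep.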

SECURE CLOUD STORAGE: GOALS

Three goals: security, efficiency, functionality

Secure Cloud Server (encrypted database): (EncK($72k), rec4), (EncK($68k), rec3), (EncK($95k), rec5), (EncK($35k), rec1), (EncK($50k), rec2)

Ø Security: searchable data is symmetrically encrypted
Ø Efficiency: server responds to query in sub-linear time
Ø Functionality: various query types, data updates, …

EFFICIENT SEARCHABLE ENCRYPTION

Ø The study of schemes balancing these goals is efficient searchable encryption (ESE)
  • Cryptographic efforts often focus on strong security
  • Practitioners wonder: how much security is possible without sacrificing efficient functionality?

Ø Efficiency, security, and functionality are at odds
  • E.g., strong encryption requires linear search time

PAST RESULTS IN SEARCHABLE SYMMETRIC ENCRYPTION

Scheme | Security | Efficiency | Functionality
Oblivious RAM [GO96] | Excellent | Impractical | All query types
Fully homomorphic encryption [G09] | Excellent | Impractical | All query types
Exact-match SSE [SLDHJ10,GO96,G09,33,CM05] | Great | Linear+ | Exact-match
Exact-match SSE [CGKO06,SWP00,KO12] | Great | Sub-linear | Exact-match; no dynamic updates
Range-query SSE [BW07] | Great | Linear+ | Range
Prefix-preserving encryption [KIK12,BBKN01,XFAM02] | Vulnerable | Sub-linear | Range; specialized implementation
Order-preserving encryption [AKSX04] | Undefined/Unknown | Sub-linear | Range; simple to implement
Efficient fuzzy-searchable encryption [KIK12] | Undefined/Unknown | Sub-linear | Error-tolerant

OUR GOALS

Provide provably-secure solutions for supporting efficient (sublinear)
Ø exact-match
Ø range
Ø error-tolerant
search on encrypted data

OUR RESULTS

Provide provably-secure solutions for supporting efficient (sublinear)

• exact-match: efficiently-searchable encryption [ABO07],
• range: order-preserving encryption (OPE) [BCLO09,BCO11],
• error-tolerant: fuzzy-searchable encryption [BC14]

search on encrypted data

ORDER-PRESERVING ENCRYPTION

ORDER-PRESERVING ENCRYPTION (OPE)

A symmetric encryption scheme is order-preserving if encryption is deterministic and strictly increasing.

Example OPE function EncK(·) for K ←$ KeyGen:

[Figure: plot of EncK(·) mapping plaintexts to ciphertexts]

[Figure: the same plot with two plaintexts m0, m1 and their ciphertexts EncK(m0), EncK(m1) marked; the ciphertexts appear in the same order as the plaintexts]

ORIGINS OF OPE

Ø OPE has a long history in the form of one-part codes.
Ø In a one-part code, code words and translations have the same order.
Ø Encrypting or decrypting requires only a single look-up table.

Ø More recently, [AKSX04] suggested OPE as a protocol to support range queries for secure cloud storage.

EFFICIENT RANGE QUERIES VIA OPE

• Range query support is effortless using OPE [AKSX04]
• Can we make it secure?
• Actually… how to even define security?

Server (encrypted database): (EncK($35k), rec1), (EncK($50k), rec2), (EncK($68k), rec3), (EncK($72k), rec4), (EncK($95k), rec5)

Client query: Range($40k, $68k), sent to the server as Range(EncK($40k), EncK($68k))
Server response: {(EncK($50k), rec2), (EncK($68k), rec3)}
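A minimal sketch of why range queries are effortless with OPE, assuming a toy table-based order-preserving function. The key here is simply a sorted random subset of the ciphertext range, which is not a secure or practical scheme; the sizes, seed, and function names are all hypothetical.

```python
import random
from bisect import bisect_left, bisect_right

M, N = 128, 4096  # hypothetical plaintext domain and ciphertext range sizes

def keygen(seed):
    """Toy key: a sorted random M-element subset of range(N). Not secure."""
    return sorted(random.Random(seed).sample(range(N), M))

def enc(key, m):
    """Deterministic and strictly increasing in m, i.e. order-preserving."""
    return key[m]

key = keygen(2015)
records = [(35, "rec1"), (50, "rec2"), (68, "rec3"), (72, "rec4"), (95, "rec5")]

# The server stores and indexes ciphertexts only; it never sees the key.
server_db = sorted((enc(key, salary), rec) for salary, rec in records)
server_keys = [c for c, _ in server_db]

def server_range(c_low, c_high):
    """Ordinary binary search, run directly on ciphertexts."""
    return server_db[bisect_left(server_keys, c_low):bisect_right(server_keys, c_high)]

def client_range(low, high):
    """The client encrypts both endpoints and forwards the query unchanged."""
    return [rec for _, rec in server_range(enc(key, low), enc(key, high))]

print(client_range(40, 68))
```

The server code is identical to what it would run on plaintexts, which is the appeal of OPE and also the reason its security needs a careful definition.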

OPE SECURITY MODEL

TOWARDS OPE SECURITY MODEL

• OPE cannot be IND-CPA because it is deterministic.
• We have to weaken the IND-CPA definition.

ATTEMPT  #1: IND-DISTINCTCPA

• What if equality patterns of LEFT and RIGHT queries must match?
• Suitable for deterministic encryption
• Still unachievable by an OPE scheme, because order is leaked!

[Figure: LEFT and RIGHT experiments. The adversary A submits query pairs (M0, M1); the L oracle returns EK(M0), the R oracle returns EK(M1), and A outputs a bit b. A second panel shows query pairs and the returned ciphertexts EK(Mb), from whose order A guesses b = 0 or b = 1.]
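The order leak can be turned into a concrete distinguisher. The sketch below assumes a toy table-based OPE (a sorted random subset as the key) and a simplified left-or-right oracle; the names and parameters are illustrative, not taken from the talk.

```python
import random

def toy_ope_keygen(rng, domain_size=16, range_size=256):
    """Toy OPE key: a sorted random subset, so key[m] is increasing in m."""
    return sorted(rng.sample(range(range_size), domain_size))

def make_lr_oracle(key, b):
    """Left-or-right oracle: encrypts m0 if b == 0, and m1 if b == 1."""
    def lr(m0, m1):
        return key[m1 if b else m0]
    return lr

def adversary(lr):
    # Two query pairs: the left messages are (1, 2), the right messages (2, 1).
    # Both sides consist of two distinct messages, so the equality patterns
    # match and the queries are legal, yet the order is reversed.
    c_first = lr(1, 2)
    c_second = lr(2, 1)
    return 0 if c_first < c_second else 1

def run_trials(trials=100, seed=0):
    """Count how often the adversary guesses the hidden bit correctly."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        key = toy_ope_keygen(rng)
        b = rng.randrange(2)
        wins += adversary(make_lr_oracle(key, b)) == b
    return wins

print(run_trials())
```

Because any order-preserving scheme maps 1 below 2, the adversary never guesses wrong, regardless of how the scheme is built.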

ATTEMPT #2: IND-ORDEREDCPA

• What if order patterns of LEFT and RIGHT queries must match?

[Figure: LEFT and RIGHT experiments as before, with the adversary A submitting query pairs (M0, M1) to the L oracle or R oracle and receiving ciphertexts EK(Mb). A query sequence whose order patterns differ between LEFT and RIGHT is marked "Not allowed!".]

ATTEMPT #2: IND-ORDEREDCPA

• In fact, there is still a general attack against any OPE scheme [BCLO09].
• Demonstrates that OPE must leak relative distance of plaintexts.

A DIFFERENT APPROACH TO SECURITY

• Instead of trying to relax IND-CPA further, we take an approach similar to PRF

• Require that an OPE is indistinguishable from an "ideal" object, namely a random order-preserving function (ROPF).

POPF-SECURITY

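We call an OPE scheme Pseudorandom-OPF-secure (POPF-secure) if no efficient adversary can output 1 with noticeably different probabilities in the two experiments: one where it interacts with the OPE scheme under a random key, and one where it interacts with a random order-preserving function (ROPF).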
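One way to write this down, as a sketch only: the notation below is an assumption, and it gives the adversary A just an encryption oracle, whereas a full definition may grant further oracles. Here OPF_{[M],[N]} denotes the set of all order-preserving functions from [M] to [N].

$$\mathbf{Adv}^{\mathrm{popf}}_{\mathrm{OPE}}(A) = \Pr\left[K \xleftarrow{\$} \mathrm{KeyGen} : A^{\mathrm{Enc}_K(\cdot)} = 1\right] - \Pr\left[g \xleftarrow{\$} \mathrm{OPF}_{[M],[N]} : A^{g(\cdot)} = 1\right]$$

POPF-security then asks that this advantage be negligible for every efficient A, in the same way PRF security compares a keyed function with a truly random one.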

SECURE OPE CONSTRUCTION

TOWARD A CONSTRUCTION

• It is not immediately clear how the regular building block, a blockcipher, helps.

• Solution: combinatorics and statistics!

OPFS AND COMBINATIONS

Ø Observation: There is a bijection between the set of OPFs from [M] to [N] and the set of M-out-of-N combinations.

Ø Example:  
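As a small sanity check of the observation, the following sketch enumerates every order-preserving function for M = 3 and N = 5 by brute force and matches them against the 3-out-of-5 combinations. The parameter choice and helper names are illustrative assumptions.

```python
from itertools import combinations, product
from math import comb

def all_opfs(M, N):
    """Every strictly increasing f: [M] -> [N], as the tuple (f(1), ..., f(M))."""
    return [f for f in product(range(1, N + 1), repeat=M)
            if all(a < b for a, b in zip(f, f[1:]))]

def opf_to_combination(f):
    """An OPF is determined by its image, an M-element subset of [N]."""
    return frozenset(f)

def combination_to_opf(subset):
    """Conversely, sorting the subset recovers the function values."""
    return tuple(sorted(subset))

M, N = 3, 5
opfs = all_opfs(M, N)
combos = {frozenset(c) for c in combinations(range(1, N + 1), M)}

print(len(opfs), comb(N, M))
print(opfs[0], "<->", sorted(opf_to_combination(opfs[0])))
```

The practical consequence is that sampling a random OPF is the same task as sampling a random M-element subset of [N].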

THE NHGD CONNECTION

Ø Lazy-sampling a POPF on a message i (domain [M], range [N]) ≅ lazy-sampling the i-th smallest element of a (pseudo)random M-element subset of [N].

Ø This value follows the negative hypergeometric distribution (NHGD) on parameters: range [N], domain [M], index i:

$$\Pr[\mathrm{NHGD}([N],[M],i) = c] = \frac{\binom{c-1}{i-1}\binom{N-c}{M-i}}{\binom{N}{M}}$$

Ø Assume we have an efficient way to sample NHGD.

SINGLE-POINT LAZY SAMPLING

Example of lazy-sampling a single point:

Domain [5], range [9]. To encrypt only i = 3: sample NHGD([9],[5],3). Suppose the outcome is 6. This occurs with probability

$$\Pr[\mathrm{NHGD}([9],[5],3) = 6] = \frac{\binom{5}{2}\binom{3}{2}}{\binom{9}{5}} \approx 0.24$$

and specifies the (incomplete) 5-element subset {?,?,6,?,?}, i.e. an (incomplete) OPF.

[Figure: plaintexts 1 to 5 against ciphertexts 1 to 9, with only the point 3 → 6 fixed]
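The probability in this example can be checked with exact arithmetic. The sketch below assumes the NHGD probability is the product of the two binomial coefficients over the third, as in the formula for NHGD([N],[M],i); the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def nhgd_pmf(N, M, i, c):
    """Pr[the i-th smallest element of a uniform M-subset of {1..N} is c]."""
    return Fraction(comb(c - 1, i - 1) * comb(N - c, M - i), comb(N, M))

# The slide's example: domain [5], range [9], index i = 3, outcome c = 6.
p = nhgd_pmf(9, 5, 3, 6)
print(p, float(p))

# Sanity check: the probabilities over all possible outcomes sum to 1.
total = sum(nhgd_pmf(9, 5, 3, c) for c in range(1, 10))
print(total)
```

Here the numerator counts the ways to place 2 subset elements among the 5 values below 6 and 2 among the 3 values above it, out of all 5-element subsets of [9].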

MULTI-POINT LAZY-SAMPLING

Ø For the function to be deterministic and order-preserving, lazy-sampling must take into account "existing" points when selecting new points.

Ø An inefficient method would be to remember every existing point and adjust further sampling parameters accordingly.

Ø But to make our eventual scheme stateless, we will instead take a binary search approach.
Ø For now, assume a state consisting of pre-determined random coins (bitstrings) r1,r2,…,rM and consider this as the key to our scheme

LAZY-SAMPLING EXAMPLE

Domain [7], range [16], under coins r1,r2,…,rM.

Encrypt "3":
NHGD([16],[7],4;r4) → 10
NHGD([9],[3],2;r2) → 5
NHGD({6,7,8,9},{3},3;r3) → 6

Encrypt "1":
NHGD([16],[7],4;r4) → 10
NHGD([9],[3],2;r2) → 5
NHGD([4],{1},1;r1) → 2

[Figure: plaintexts 1 to 7 against ciphertexts 1 to 16, with the sampled points 4 → 10, 2 → 5, 3 → 6 and 1 → 2 filled in step by step]

REMARKS ON LAZY-SAMPLING

Notice that
Ø Given random fixed coins for NHGD, we will lazily construct a (pseudo)random OPF
Ø Each encryption uses at most log2(M) calls to the NHGD sampler
Ø Efficiency: log2(M) · tNHGD
Ø The state consists only of the coins r1,r2,…,rM

REMOVING THE STATE

Instead of storing the random coins, we use a pseudorandom function (PRF) that takes as input the parameters to NHGD. The secret key to our scheme is just the key K to the blockcipher.

r1 = PRFK(D1,R1,x1), used as the coins in NHGD(D1,R1,x1; r1)
r2 = PRFK(D2,R2,x2), used as the coins in NHGD(D2,R2,x2; r2)
r3 = PRFK(D3,R3,x3), used as the coins in NHGD(D3,R3,x3; r3)
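Putting the binary search and the PRF-derived coins together gives a stateless toy encryption routine. This is a sketch under several assumptions, not the authors' implementation: HMAC-SHA256 stands in for the blockcipher-based PRF, the coins seed Python's `random.Random`, and the NHGD sampler is a naive inverse-CDF loop that is linear in the range size.

```python
import hashlib
import hmac
import random
from math import comb

def prf_coins(key, *params):
    """Stand-in PRF (assumption): HMAC-SHA256 over the NHGD parameters."""
    msg = ",".join(str(p) for p in params).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def nhgd_sample(N, M, i, coins):
    """Inverse-CDF sample of the i-th smallest element of a random M-subset
    of {1..N}, driven by `coins`. Linear in N, so only suitable as a toy."""
    u = random.Random(coins).randrange(comb(N, M))
    acc = 0
    for c in range(i, N - (M - i) + 1):
        acc += comb(c - 1, i - 1) * comb(N - c, M - i)
        if u < acc:
            return c
    raise AssertionError("unreachable: the weights sum to comb(N, M)")

def ope_encrypt(key, m, M, N):
    """Encrypt m in [1, M] to a ciphertext in [1, N] (requires N >= M)."""
    d_lo, d_hi, r_lo, r_hi = 1, M, 1, N
    while True:
        x = (d_lo + d_hi) // 2                       # midpoint of current domain
        size_d, size_r = d_hi - d_lo + 1, r_hi - r_lo + 1
        coins = prf_coins(key, d_lo, d_hi, r_lo, r_hi, x)
        c = r_lo - 1 + nhgd_sample(size_r, size_d, x - d_lo + 1, coins)
        if m == x:
            return c
        if m < x:                                    # recurse below the fixed point
            d_hi, r_hi = x - 1, c - 1
        else:                                        # recurse above the fixed point
            d_lo, r_lo = x + 1, c + 1

key = b"demo key"
print([ope_encrypt(key, m, 7, 16) for m in range(1, 8)])
```

Two encryptions that share a prefix of the binary search feed the PRF identical parameters, so they recompute identical intermediate points. That is what keeps the function deterministic and order-preserving without any stored state.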

MOVE TO HYPERGEOMETRIC

Ø There does not seem to be an efficient NHGD algorithm.
Ø Instead we use a related distribution: Hypergeometric Distribution (HGD), which can be sampled efficiently [KS85].
Ø It describes how many members of a random M-set are less than value y, for 1 ≤ y ≤ N

Ø HGD can be used if we slightly modify the algorithms.

Ø This gives rise to a POPF-secure OPE. ☺
Ø Efficiency is the same, log M · tHGD, on average.
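The link between the two distributions can be checked numerically: the i-th smallest member of the subset is at most y exactly when at least i members are at most y. The sketch below assumes the convention that HGD counts members less than or equal to y, which may differ by one from the convention on the slide; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def hgd_pmf(N, M, y, x):
    """Pr[exactly x members of a uniform M-subset of {1..N} are <= y]."""
    return Fraction(comb(y, x) * comb(N - y, M - x), comb(N, M))

def hgd_tail(N, M, y, i):
    """Pr[at least i members of the subset are <= y]."""
    return sum(hgd_pmf(N, M, y, x) for x in range(i, min(M, y) + 1))

def nhgd_cdf(N, M, i, y):
    """Pr[the i-th smallest member of the subset is <= y]."""
    return sum(Fraction(comb(c - 1, i - 1) * comb(N - c, M - i), comb(N, M))
               for c in range(i, y + 1))

N, M = 9, 5
agree = all(nhgd_cdf(N, M, i, y) == hgd_tail(N, M, y, i)
            for i in range(1, M + 1) for y in range(1, N + 1))
print(agree)
```

This equivalence is why a binary search over the range using HGD samples can replace a binary search over the domain using NHGD samples.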

RECAP OF OPE

Ø Appropriate definition of security: POPF
Ø Our later study [BCO11] helped to clarify the security leakage of POPF.

Ø POPF-secure OPE construction via lazy-sampling on the HGD.

COLLABORATION AT A GLANCE

GREAT COLLABORATION AT A GLANCE

Skyhigh and I had numerous fruitful discussions. I was incredibly pleased with their approach and questions.

Ø  They valued and wanted to understand provable security, and wanted to employ provably secure schemes.

Ø They asked great questions and listened.
Ø They think open source is a must.
Ø They read academic papers and attended academic conferences.
Ø They hired an advisory board of crypto experts.
Ø They managed to make us think.
Ø They managed to spark new research projects.

CHALLENGES WITH DEPLOYING OPE

Speed of algorithm
• HGD sampling means an unoptimized implementation is very slow.
• Optimization required extensive use of low-level floating point libraries to speed up HGD sampling.

Ciphertext length
• Padding is required to preserve lexicographic orders. Padding plaintexts also means the ciphertexts are long.

Need to fix input and output lengths in advance
• No known secure way to use OPE like a block cipher
• Makes using OPE for different types of data (longer/shorter) difficult

What order is preserved?
• Lexicographic? Numeric? Alphabetic? ASCII-betic?
• Different orderings require different functions to encode input as integers before encryption
• Needs to be the same order as the cloud application, but different apps could have different orderings
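The encoding and padding issue can be illustrated with one possible encoder for ASCII-betical order. This is a hypothetical sketch: the fixed width, the NUL padding byte, and the function name are assumptions, and inputs are assumed not to contain NUL bytes themselves.

```python
def encode_ascii(text, width):
    """Map an ASCII string of length <= width to an integer whose numeric
    order matches byte-wise lexicographic order of the strings."""
    data = text.encode("ascii")
    if len(data) > width:
        raise ValueError("input longer than the fixed width")
    padded = data.ljust(width, b"\x00")   # right-pad so shorter strings sort first
    return int.from_bytes(padded, "big")  # big-endian: first byte is most significant

words = ["apple", "app", "banana", "Zebra", "b"]
print(sorted(words, key=lambda w: encode_ascii(w, 8)))

# Lexicographic order is not numeric order: "10" sorts before "9".
print(encode_ascii("10", 2) < encode_ascii("9", 2))
```

Every plaintext is padded to the full width before encryption, which is one reason ciphertexts grow long and why input lengths must be fixed in advance. A numeric column would need a different encoder, for example parsing the value as an integer first.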

 

MORE CHALLENGES

Ø Tradeoff of security for functionality

Ø Everybody wants to search everything, all the time.

Ø Can’t have great security at the same time.

Ø What security level is appropriate?

Ø How to explain to customers the security they are getting?

Ø May be easier for exact-match queries. However,

Ø When is frequency analysis an appropriate risk?

Ø  For what data?

Ø Non-trivial for OPE

New  research  project  

White  paper  
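To make the frequency-analysis question concrete, the sketch below uses deterministic tags for exact-match search. HMAC-SHA256 is a stand-in chosen for illustration, and the column of US state codes is made-up example data; neither comes from the talk.

```python
import hashlib
import hmac
from collections import Counter, defaultdict

def det_tag(key, value):
    """Deterministic search tag (stand-in): equal plaintexts give equal tags."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

key = b"demo key"
column = ["NY", "CA", "NY", "TX", "NY", "CA"]   # made-up example data
tags = [det_tag(key, v) for v in column]

# Server-side index from tag to row numbers: exact match is one dictionary lookup.
index = defaultdict(list)
for row, tag in enumerate(tags):
    index[tag].append(row)

def search(value):
    """Client computes the tag; server returns the matching rows."""
    return index.get(det_tag(key, value), [])

print(search("NY"))

# What the server learns without the key: how often each value repeats.
tag_histogram = sorted(Counter(tags).values(), reverse=True)
print(tag_histogram)
```

The server never sees "NY", but the histogram of tags equals the histogram of plaintexts. Whether that is acceptable depends on the data: it matters far more for a low-entropy column with a known distribution than for near-unique values.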

MORE CHALLENGES

Ø Granularity of exact-match search

Ø Encrypt every word? Every line? Every paragraph? What is the appropriate tradeoff of usability for security?

Ø  If OPE can be stateful, can we improve efficiency?

New  research  project  

CRYPTO ADVISORY BOARD ESTABLISHED IN 2014

Crypto-Advisory Board

CONCLUSIONS

Ø  Industry and academia can be friends

Ø  It’s good to have new friends

