Verifiable Responses to Accumulo Queries
Cassandra Sparks
Robert K. Cunningham, Ariel Hamlin, Emily Shen,
Mayank Varia, David A. Wilson, Arkady Yerukhimovich
April 29, 2015
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-002. Opinions, interpretations, recommendations
and conclusions are those of the authors and are not necessarily endorsed by the United States Government.
Verifiable Queries - 2
CS 04/29/15
Introduction to MIT Lincoln Laboratory
Established 1951
Lincoln Laboratory is a Department of Defense FFRDC operated by MIT
FFRDC: Federally Funded Research and Development Center
MIT Lincoln Laboratory
Purpose: Technology in Support of National Security
Core Work Areas
• Sensors
• Information Extraction
• Communications
• Integrated Sensing and Decision Support (Secure – Countermeasure Resistant)
Current Mission Areas
• Space Control
• Intelligence, Surveillance, and Reconnaissance Systems and Technology
• Tactical Systems
• Air and Missile Defense Technology
• Homeland Protection
• Air Traffic Control
• Communication Systems
• Advanced Technology
• Cyber Security and Information Sciences
• Engineering
Highlighted: Cyber Security and Information Sciences
Common Big Data Architecture
[Figure: layered architecture. Users: Commanders, Operators, Analysts. Analytics: A, B, C, D, E. Computing: Web, Files, Scheduler. Ingest & Enrichment. Data: Maritime, Ground, Space, C2, Cyber, OSINT, <html>, Air, HUMINT, Weather.]
This talk: cryptographically securing Accumulo
Threats to Accumulo
• Outsourced "cloud" server
– Learn content of data/queries
– Misattribute data to inserting clients
• Malicious insider (likely a sysadmin)
– Learn/change data or queries
– Misinform honest users
• Malicious clients
– Make unauthorized queries
– Learn stored data
– Learn other clients’ queries
• External attacker
– Insert malware, hack, etc
– We won’t detect these, but our crypto provides resiliency
Our focus: security against the server
Secure Accumulo Overview
[Figure: Inserting Clients and Querying Clients connect over the Network to Accumulo, which runs with Zookeeper on the Hadoop Distributed Filesystem. Labels: TLS encryption; Data at rest encryption; System administrator, with the callout "Accumulo provides no safeguards!"]
Cryptographic protections shown:
• End-to-end signatures
• Attribute-based access control
• Cell-level encryption
• Verifiable query results
We improve the security of Accumulo with cryptography
Outline
• Introduction
• End-to-End Signatures
– Digital Signatures
– Design Overview
– Implementation Details
• Verifiable Query Results
• Conclusion
Inserts in Accumulo
[Figure: an Inserting Client writes a cell to Accumulo, shown as three Tablet Servers each hosting a Tablet, and a Querying Client reads it back; a "?" appears on the query path.]
Row        Column Family  Column Qualifier  Visibility  Timestamp  Value
Patient A  Hospital 1     Diagnoses         Doctor      12349857   …
Digital Signatures
• A signature algorithm has three phases:
– Key Generation
– Signing
– Verification
[Figure: a message is signed; verification accepts the signed message and rejects a wrong message.]
A signature scheme is secure if an adversary cannot forge a signature for a new message without having the signing key
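The three phases can be sketched with nothing but the Python standard library. Note the substitution: this is a Lamport one-time signature built from SHA-256, not the RSA or ECDSA schemes the talk uses, and it is chosen only because it needs no third-party package. The function names are illustrative, and each key pair may safely sign only one message.

```python
import hashlib
import secrets


def keygen():
    """Key generation: 256 pairs of random secrets; the public key is their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return [(digest >> (255 - i)) & 1 for i in range(256)]


def sign(sk, message: bytes):
    """Signing: reveal one secret per digest bit (one message per key pair only)."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    """Verification: every revealed secret must hash to the matching public value."""
    if len(signature) != 256:
        return False
    return all(
        hashlib.sha256(revealed).digest() == pk[i][bit]
        for i, (revealed, bit) in enumerate(zip(signature, _bits(message)))
    )


sk, pk = keygen()
sig = sign(sk, b"Patient A: Flu shot")
accepted = verify(pk, b"Patient A: Flu shot", sig)      # the signed message
rejected = verify(pk, b"Patient A: Broken knee", sig)   # a wrong message
```

Only the holder of `sk` can produce a signature that `verify` accepts, while anyone holding `pk` can check it, which is the property the security definition above asks for.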
Digital Signatures in Accumulo
[Figure: the Inserting Client signs each cell (Sign) before writing it to Accumulo, shown as three Tablet Servers each hosting a Tablet; the Querying Client checks the signature (Verify) on the cell it reads.]
Row        Column Family  Column Qualifier  Visibility Field  Timestamp  Value
Patient A  Hospital 1     Diagnoses         Doctor            12349857   …
Signature Code
• Implemented in Python as a client-side wrapper
– Uses the pyaccumulo library
– No server-side modifications needed
• Currently in the process of being open-sourced
– Contact pace-contact@ll.mit.edu for updates
• Several interesting design choices:
– Where to store the signature metadata?
– There are many signature algorithms—which one to use?
Storing Signature Metadata
• How do we store the signature of each cell?
Option 1: Separate table
– Pro: original table is unmodified
– Con: twice as many reads & writes
Patient Records
Row        Value        Visibility
Patient 1  Flu shot     Doctor
Patient 2  Broken knee  Admin
Patient 3  Chicken pox  Admin
Patient Records Signatures
Row        Value          Visibility
Patient 1  <signature 1>  Doctor
Patient 2  <signature 2>  Admin
Patient 3  <signature 3>  Admin
Option 2: Value field
– Pro: value field is good at storing unstructured data
– Con: interferes with iterators
Patient Records
Row        Value                      Visibility
Patient 1  <signature 1>|Flu shot     Doctor
Patient 2  <signature 2>|Broken knee  Admin
Patient 3  <signature 3>|Chicken pox  Admin
Option 3: Visibility Field
– Pro: all Accumulo functionality still works
– Con: interferes with visibility label evaluation optimizations
Patient Records
Row        Value        Visibility
Patient 1  Flu shot     Doctor|“<signature 1>”
Patient 2  Broken knee  Admin|“<signature 2>”
Patient 3  Chicken pox  Admin|“<signature 3>”
We support all three options
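A minimal sketch of how Options 2 and 3 might pack and unpack a signature, assuming the "|" delimiter shown in the tables. The helper names and the base64 encoding are assumptions for illustration, not the talk's actual wrapper code; base64 is used so that raw signature bytes can never contain the delimiter. The visibility helper assumes the original label is a single term, as in the slide.

```python
import base64

DELIM = "|"  # delimiter as it appears on the slide


def pack_value(signature: bytes, value: str) -> str:
    """Option 2: store '<signature>|<value>' in the value field."""
    return base64.b64encode(signature).decode("ascii") + DELIM + value


def unpack_value(stored: str):
    """Split on the first delimiter only, so the value itself may contain '|'."""
    encoded, value = stored.split(DELIM, 1)
    return base64.b64decode(encoded), value


def pack_visibility(visibility: str, signature: bytes) -> str:
    """Option 3: store '<visibility>|"<signature>"' in the visibility field."""
    encoded = base64.b64encode(signature).decode("ascii")
    return visibility + DELIM + '"' + encoded + '"'


def unpack_visibility(stored: str):
    """Split on the last delimiter and strip the quotes around the signature."""
    visibility, quoted = stored.rsplit(DELIM, 1)
    return visibility, base64.b64decode(quoted.strip('"'))


demo_sig = b"\x01|\xff placeholder signature bytes"  # deliberately contains '|'
value_cell = pack_value(demo_sig, "Flu shot")
visibility_cell = pack_visibility("Doctor", demo_sig)
```

The trade-offs listed above show up directly: with Option 2 every reader of the value must call something like `unpack_value` first, and with Option 3 the visibility expression grows by one extra term per cell.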
Signature Algorithm Options
Option 1: RSA Signatures
• Fast signature verification
• Large signature & key size
Option 2: Elliptic Curve Signatures (ECDSA)
• Fast signature creation
• Relatively small signature & key sizes
Option 3: Message Authentication Codes
• Symmetric key: uses the same key for signing & verification
• Much faster than RSA and ECDSA
• Con: one malicious client has more power to interfere with integrity
We support RSA and ECDSA signatures, and are investigating how to safely use MACs
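The drawback of Option 3 can be seen in a few lines using Python's standard `hmac` module. This is a sketch under the assumption that all clients share one key; the key handling and message format are illustrative only.

```python
import hashlib
import hmac
import secrets

shared_key = secrets.token_bytes(32)  # assumption: every client holds this same key


def mac(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag; the same key is used to create and to check it."""
    return hmac.new(key, message, hashlib.sha256).digest()


def mac_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(mac(key, message), tag)


# An honest client tags the cell it inserts.
honest_tag = mac(shared_key, b"Patient 1: Flu shot")

# Any other key holder, including a malicious client, can tag different content
# and the result is indistinguishable from an honest tag.
forged_tag = mac(shared_key, b"Patient 1: Chicken pox")

honest_ok = mac_verify(shared_key, b"Patient 1: Flu shot", honest_tag)
forgery_ok = mac_verify(shared_key, b"Patient 1: Chicken pox", forged_tag)
```

With RSA or ECDSA a client that can verify cannot also sign, so this forgery is not possible without the inserting client's private key.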
Performance
[Figure: signature performance results; the elliptic curve used is secp256r1.]
Benchmarked on a virtualized single-node Accumulo 1.7.0 instance
Security Summary: Signatures
• Signatures allow clients to verify data integrity
– Malicious server cannot modify or fabricate results
• Signatures cannot verify data completeness
– Server could omit both data & signature to avoid detection
Signatures can detect: Modification (yes), Insertion (yes), Omission (no)
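The gap can be made concrete with a small simulation. As a stand-in for the RSA or ECDSA signatures the talk uses, each cell here carries an HMAC tag under a key the server does not hold; the key, the row format, and the sample rows are assumptions for illustration.

```python
import hashlib
import hmac

CLIENT_KEY = b"held by clients only, never by the server"  # hypothetical key


def tag(row: str, value: str) -> bytes:
    """Per-cell integrity tag (stand-in for a per-cell signature)."""
    return hmac.new(CLIENT_KEY, f"{row}\x00{value}".encode(), hashlib.sha256).digest()


def all_valid(results) -> bool:
    """Client-side check: every returned cell must carry a valid tag."""
    return all(hmac.compare_digest(tag(r, v), t) for r, v, t in results)


table = [
    (row, value, tag(row, value))
    for row, value in [
        ("Patient 1", "Flu shot"),
        ("Patient 2", "Broken knee"),
        ("Patient 3", "Chicken pox"),
    ]
]

# Three ways a malicious server could answer a full-table query.
modified = [table[0], ("Patient 2", "Sprained knee", table[1][2]), table[2]]
inserted = table + [("Patient 4", "Measles", b"\x00" * 32)]
omitted = [table[0], table[2]]  # row and tag for Patient 2 silently dropped

detects_modification = not all_valid(modified)
detects_insertion = not all_valid(inserted)
detects_omission = not all_valid(omitted)
```

Every cell in `omitted` is individually valid, so a per-cell check has nothing to object to; catching this requires a structure that commits to the whole dataset.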
Outline
• Introduction
• End-to-End Signatures
• Verifiable Query Results
– Merkle Hash Trees
– Authenticated Skip Lists
• Conclusion
Authenticated Data Structures
• Data structures that allow provably correct queries
– Correctness defined relative to a trusted, well-known source
– Need to support range queries
[Figure: Inserting Client, Accumulo Server holding the ADS, and Querying Client (marked "?"), with a digest and VOs passed between them.]
The digest is a small value (constant size) that represents the entire dataset
ADS: Authenticated Data Structure
VO: Verification Object
Merkle Hash Trees
Values:  2     4     6     8     10     12     14     16
Leaves:  h(2)  h(4)  h(6)  h(8)  h(10)  h(12)  h(14)  h(16)
a = h(h(2), h(4))    b = h(h(6), h(8))    c = h(h(10), h(12))    d = h(h(14), h(16))
e = h(a, b)    f = h(c, d)
root = h(e, f)
Digest is the root node’s hash value
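The digest for this eight-element tree can be computed in a few lines. This is a sketch, assuming SHA-256 for h and a leaf count that is a power of two; the one-byte prefixes that separate leaf hashes from internal-node hashes are an added assumption, not something the slide specifies.

```python
import hashlib


def leaf_hash(value: int) -> bytes:
    """h(value) for a leaf; the 0x00 prefix marks leaf hashes."""
    return hashlib.sha256(b"\x00" + str(value).encode()).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """h(left, right) for an internal node; the 0x01 prefix marks node hashes."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(values) -> bytes:
    """Hash pairs level by level until one value remains (power-of-two sizes only)."""
    level = [leaf_hash(v) for v in values]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


values = [2, 4, 6, 8, 10, 12, 14, 16]
digest = merkle_root(values)

# The same computation spelled out with the slide's node names.
a = node_hash(leaf_hash(2), leaf_hash(4))
b = node_hash(leaf_hash(6), leaf_hash(8))
c = node_hash(leaf_hash(10), leaf_hash(12))
d = node_hash(leaf_hash(14), leaf_hash(16))
e, f = node_hash(a, b), node_hash(c, d)
root = node_hash(e, f)
```

The digest stays 32 bytes no matter how many values the tree holds, and changing any single value changes it.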
Merkle Hash Trees
[Figure: the same tree over 2, 4, ..., 16 with root = h(e, f), answering the query range(5, 9). Legend: part of the range returned; part of the verification object; computed based on returned information.]
Naïve solution allows a malicious server to omit elements at the ends of ranges
Naïve Merkle Tree Security
Can the client detect:  Omitting internal query results  Omitting boundary query results
Signatures:             No                               No
Naïve MHTs:             Yes                              No
Solution: return boundaries of the range
Merkle Hash Trees, Revisited
[Figure: the same tree over 2, 4, ..., 16 with root = h(e, f), answering range(5, 9); the returned range now also includes the boundary elements just outside the query range. Legend: part of the range returned; part of the verification object; computed based on returned information.]
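A sketch of a range query with boundaries on this tree, assuming SHA-256 for h, a power-of-two number of sorted leaves, a non-empty result, and a client that already trusts the digest and the dataset size. The function names and the shape of the verification object (one optional left and right sibling hash per level) are illustrative choices, not the talk's implementation.

```python
import hashlib


def leaf_hash(value):
    return hashlib.sha256(b"\x00" + str(value).encode()).digest()


def node_hash(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(values):
    """All tree levels, leaves first, root last."""
    levels = [[leaf_hash(v) for v in values]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([node_hash(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def prove(values, first, last):
    """VO for leaves first..last: the sibling hashes missing at each level."""
    vo, lo, hi = [], first, last
    for level in build_levels(values)[:-1]:
        left = level[lo - 1] if lo % 2 == 1 else None
        right = level[hi + 1] if hi % 2 == 0 else None
        vo.append((left, right))
        lo, hi = lo // 2, hi // 2
    return vo


def answer_range(values, low, high):
    """Server: matching values plus one boundary element on each side."""
    hits = [i for i, v in enumerate(values) if low <= v <= high]
    first, last = max(hits[0] - 1, 0), min(hits[-1] + 1, len(values) - 1)
    return values[first:last + 1], first, prove(values, first, last)


def verify_range(digest, n, low, high, returned, first, vo):
    """Client: recompute the root from the answer, then check both boundaries."""
    if not returned:
        return False
    level, lo = [leaf_hash(v) for v in returned], first
    for left, right in vo:
        hi = lo + len(level) - 1
        # A sibling must be supplied exactly where the position requires one.
        if (lo % 2 == 1) != (left is not None) or (hi % 2 == 0) != (right is not None):
            return False
        level = ([left] if left is not None else []) + level
        level = level + ([right] if right is not None else [])
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        lo //= 2
    if level != [digest]:
        return False
    last = first + len(returned) - 1
    left_ok = returned[0] < low or first == 0
    right_ok = returned[-1] > high or last == n - 1
    return left_ok and right_ok


values = [2, 4, 6, 8, 10, 12, 14, 16]
digest = build_levels(values)[-1][0]
returned, first, vo = answer_range(values, 5, 9)
honest_ok = verify_range(digest, len(values), 5, 9, returned, first, vo)
# A server that drops 8 and both boundaries: the root still checks out for [6],
# but the missing boundaries give the omission away.
cheat_ok = verify_range(digest, len(values), 5, 9, [6], 2, prove(values, 2, 2))
```

For range(5, 9) the honest server returns 4, 6, 8, 10: the two results plus the neighbours 4 and 10, which prove that nothing between 5 and 9 was left off either end.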
Security Summary: ADSs
Can the client detect:  Omitting internal query results  Omitting boundary query results
Signatures:             No                               No
Naïve MHTs:             Yes                              No
MHTs:                   Yes                              Yes
Merkle Hash Tree Disadvantages
• Mostly used for static data
• How to insert elements into MHTs?
– Approach 1: Unbalanced Insert. Linear time operations!
– Approach 2: Balanced Insert. Linear time insert!
Authenticated Skip Lists
        MHT        Skip List
Lookup  O(log(n))  O(log(n)) (expected)
Insert  O(n)       O(log(n)) (expected)
Verify  O(log(n))  O(log(n)) (expected)
Randomized skip lists have empirically better performance than other tree-like data structures
Outline
• Introduction
• End-to-End Signatures
• Verifiable Query Results
• Conclusion
Additional Work
• Confidentiality to hide data from the server & unauthorized users
– Per-cell encryption allows flexible encryption for different use cases
– Cryptographically enforcing Accumulo’s visibility labels with key management
• Using HMACs for better performance without sacrificing security
• Key management and distribution for all cryptographic components
Conclusion
• Signatures for data tampering detection
– Currently implemented in Python
– Client-side library
– Contact pace-contact@ll.mit.edu to be notified when the code is open-sourced
• Authenticated Data Structures for full query correctness checks
– Working on embedding in Accumulo for greater efficiency
Questions?