http://www.stealthsoftwareinc.com
Distributing a Classified Search*
Rafail Ostrovsky, William Skeith
Stealth Software Technologies, LLC
Topics We’ll Cover
Motivational example (“no-fly” list)
  Just one of many applications, but it illustrates the ideas well
The process:
  Generating an encrypted search
  Distributed execution/result monitoring
  Decryption/analysis of data
The benefits:
  Savings in computation
  Savings in communication
  Simple and efficient monitoring
The implementation
  High-performance
  Parallelized design
Demonstration
Motivational example: “No-fly” list
Search for classified names and aliases of suspected terrorists
Knowledge of aliases must be kept secret
  If not, the advantage derived from this intelligence may become void
Until now, this has precluded a distributed search
  Without our technology, one must rely on an “import, then process” method
Our technology allows any willing and able party to help perform the search
Problems with Import, then Process
Expensive in communication
Averse to dynamic data
Difficult to manage and synchronize data from vast and disparate sources
Expensive in processing
  Processing must be done locally
Not entirely respectful of citizens’ privacy
Our Technology
Allows data to be searched where it naturally resides, despite the criteria being sensitive or classified
Attractive alternative to the “import, then process” paradigm
Ideal for dynamic, distributed, streaming data
Creates savings in communication and processing
Enables low-latency, low-complexity monitoring
Symmetrically preserves privacy
  The records of “un-interesting” citizens will not be collected
Process Outline
Step 1: (secure environment) given sensitive or classified search criteria, create an encrypted search
Step 2: (any environment, may be unclassified) migrate encrypted search to multiple machines on any network
  Every machine runs encrypted search on (local) data, writing output to small encrypted buffers
  Migrate encrypted buffers to a classified machine *as needed* using real-time monitoring
Step 3: (secure environment) decrypt buffers and analyze results
Step 1: Create Encrypted Search
[Figure: the search terms (Mohamed Atta, Hani Hanjour, Ziad Jarrah) shown next to a long, random-looking bit string.]
Encrypted version of search is indistinguishable from a random distribution.
Encrypted Search:
Provably reveals no information about search terms
The guarantee of security holds even if an adversary acquires:
  The encrypted search description
  The program’s output (which is also encrypted)
  The program’s source code
Therefore, it can be distributed outside of a classified environment
Step 2: Distribute Search
[Figure: twelve copies of the same encrypted bit string.]
Step 2: Distribute Search
Any willing and able parties may now participate
The outside participants know they are helping with a search, but remain oblivious as to what they are searching for
Generic program (distributed only once) executes encrypted search descriptions on plaintext data
Results are collected in small encrypted buffers
How does it work?
Based on homomorphic encryption
  Given E(x), E(y) it holds that E(x)+E(y) = E(x+y)
  Allows a party without the decryption key to still do something “useful” with encrypted data, although it remains unreadable
Allows us to conditionally encrypt only matching documents
  Process outputs E(0) for a non-matching document, and outputs E(D) if the document D in fact matches the query
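The conditional-encryption idea above can be sketched with a toy additively homomorphic scheme. This is a minimal illustration under stated assumptions, not the presenters' implementation: the slides only say "homomorphic encryption", so the choice of Paillier, the small Mersenne-prime key, the one-name-per-record model, and every name in the code (`encrypt`, `process`, `dictionary`, `watch_list`, and so on) are hypothetical. In Paillier, the "+" on ciphertexts in E(x)+E(y) = E(x+y) is realized as multiplication modulo N².

```python
import math
import secrets

# Toy Paillier key pair. ASSUMPTION: the slides do not name a scheme; Paillier
# and these small Mersenne primes are illustrative only and NOT secure.
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q                                            # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                                 # private: LAM^-1 mod N (Python 3.8+)


def encrypt(m):
    """E(m) = (1 + N)^m * r^N mod N^2, with fresh randomness r."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    """Recover m from E(m) using the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1, c2):
    """Homomorphic addition: the product of two ciphertexts decrypts to x + y."""
    return c1 * c2 % N2


def encode(text):
    """Represent a short record as an integer smaller than N."""
    return int.from_bytes(text.encode("utf-8"), "big")


def decode(value):
    """Inverse of encode; the integer 0 decodes to the empty string."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big").decode("utf-8")


# Step 1 (secure environment): one encrypted bit per name in a public dictionary.
dictionary = ["John Doe", "Jane Lane", "Richard Roe"]   # hypothetical names
watch_list = {"Richard Roe"}                            # hypothetical secret criteria
encrypted_search = {name: encrypt(int(name in watch_list)) for name in dictionary}


def process(name):
    """Step 2 (any machine): E(b)^D = E(b * D), so E(D) on a match and E(0) otherwise."""
    return pow(encrypted_search[name], encode(name), N2)


# Step 3 (secure environment): decrypt every output; non-matches decrypt to 0.
results = [decode(decrypt(process(name))) for name in dictionary]
```

The machine running `process` never learns the bit hidden in `encrypted_search[name]`, yet its output decrypts to the record only when that bit is 1. The sketch leaves out how outputs are packed into small buffers.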
Real-Time Monitoring
Traditional methods are unpleasant: typically very complex and communication-intensive
Constant downloads / synchronization
  High complexity, high communication
Waiting for batches
  Reduces complexity, but increases latency and still involves unnecessary communication
Real-Time Monitoring – Our Solution
[Figure: travelers announce themselves (“I’m John Doe.”, “I’m Jane Lane.”, “Mohamed Atta.”); a small encrypted 0/1 flag is sent out.]
A small encrypted flag can be periodically transmitted indicating the presence or absence of any search results. This provides a simple mechanism for real-time monitoring.
Real-Time Monitoring
The encrypted flags can be aggregated so that one small value can indicate the presence or absence of results for an entire airport, if desired.
Rather than monitoring a constant stream of thousands of names, one small value can be periodically checked.
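One way such aggregation could work is to combine per-location encrypted flags homomorphically, so that a single ciphertext carries the total for the whole airport. The sketch below is an assumption-laden illustration rather than the presenters' design: it uses a toy Paillier scheme (the slides do not name one), insecure demonstration primes, and hypothetical names such as `gate_hits` and `aggregate`. The gates encrypt known counts here only to keep the example short; in the described system the flags would be produced by the encrypted search itself.

```python
import math
import secrets

# Toy Paillier key pair. ASSUMPTION: illustrative scheme and parameters, NOT secure.
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q                                            # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                                 # private: LAM^-1 mod N (Python 3.8+)


def encrypt(m):
    """E(m) = (1 + N)^m * r^N mod N^2, with fresh randomness r."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    """Recover m from E(m) using the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def aggregate(flags):
    """Combine encrypted flags without the key: the product decrypts to their sum."""
    total = encrypt(0)
    for flag in flags:
        total = total * flag % N2
    return total


# Hypothetical hit counts per gate, standing in for flags the search would emit.
gate_hits = {"gate A": 0, "gate B": 0, "gate C": 1}
gate_flags = [encrypt(hits) for hits in gate_hits.values()]

# Any unclassified machine can aggregate; only the secure side can read the result.
airport_flag = aggregate(gate_flags)
buffers_needed = decrypt(airport_flag) != 0
```

The monitoring side checks only `airport_flag`, a single ciphertext, and downloads result buffers when `buffers_needed` is true.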
Real-Time Monitoring
Saves communication: only download data when needed
  Furthermore, you only download what you need
Low-overhead, low-complexity method for monitoring vast data sources
Ideal for highly dynamic data
Ideal for situations where long knowledge latency is unacceptable
A Note on Encrypted Flags
Encrypted flags can contain a lot of information or only a little, depending on the application
  They can give additional information, e.g. a more specific location where a hit was found and the number of hits
  If desired, a flag can be guaranteed to take only the values “yes” or “no”
Step 3: Decryption
[Figure: six copies of the same encrypted bit string.]
Step 3: Decryption
Once it has been determined that “interesting data” has been collected:
  Download the small buffers
  Transfer to a classified environment
  Then, decrypt buffers to obtain results
Summary of Benefits
Strong security guarantees enable distribution of a sensitive/classified search
Massive parallelism
Process data where it naturally resides
Creates savings in processing
Creates vast savings in storage and communication
Low-latency monitoring on highly dynamic data and low-latency searching
Preserves privacy in both directions
Other applications
Google-like search service for the intelligence community, using an unclassified server farm to perform searches
Distributed intelligence search, similar to SETI@home
Monitor news feeds, etc…
Federal to state interactions
Agency to agency interactions
Ship/Truck manifests, routes, anomalies
Truck driver information
Private aircraft flight plan/pilot/cargo information
Financial data mining
  Auditing financial data in private
Immigration/visa data-mining
Implementation: Design and Performance
Parallelism: a growing industry trend
  Intel now ships nearly 100% of its servers with multi-core processors, and over 90% of its desktops
  “Multi-core processors represent a major evolution in computing technology… they will eventually become the pervasive computing model” – AMD
Our software dynamically takes advantage of all processors on the client system
  Absolutely no modification of code nor of input parameters is necessary
Implementation: Design and Performance
Based on an independently developed high-performance library for long integers and number theory
  The 64-bit library outperforms 64-bit optimized NTL (a well-respected high-performance library) by more than a factor of 7 for multiplication of 1024-bit integers.
  Most arithmetic routines are close to optimal, approaching the theoretical limits of the Intel Core 2 microarchitecture
Implementation: Design and Performance
Makes use of special-purpose arithmetic algorithms, ideal for the task
Processes documents at ≈ 100KB/sec. (for smaller documents) and ≈ 120KB/sec. (for larger documents) on a 2GHz Intel Core 2 Duo
  It may be of interest to note that the original prototype (based on NTL) processed documents at ≈ 1KB/sec.
Up next…
Demonstration
Questions