Ragib Hasan
University of Alabama at Birmingham
CS 491/691/791 Fall 2011 Lecture 10
09/15/2011
Security and Privacy in Cloud Computing
Securing Cloud Computations
09/20/2011 Fall 2011 Lecture 10 | UAB | Ragib Hasan
Goal: Learn about techniques for verifying computations outsourced to a cloud
Review Assignment #5
Du et al., RunTest: Assuring Integrity of Dataflow Processing in Cloud Computing Infrastructures, AsiaCCS 2010
Outsourcing Computations
• Goal?
  – Outsource a computation by sending the following to a cloud
    • A computation (e.g., a sequence of operations)
    • Input data
  – Get back the final result data set
Outsourcing Computations: Examples
• Sending a large-scale image processing job to a cloud
• Analyzing a large-scale data set
Outsourcing Computations: Model
• Dataflow computing is the dominant model
• Declares how things connect (unlike imperative programming, which focuses on how things happen)
• Data objects flow from one node to another
• Each node applies a specific function to data inputs to produce output data
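As a minimal sketch of the dataflow model (not taken from the lecture; the node names and the `run_dataflow` helper are hypothetical), a job can be declared as a set of nodes plus the order in which they connect, and a small driver pushes each data object through them:

```python
from typing import Any, Callable, Dict, List

# Each node applies one specific function to its input to produce an output.
# The node names and functions are illustrative assumptions.
NODES: Dict[str, Callable[[Any], Any]] = {
    "tokenize": lambda text: text.split(),
    "lowercase": lambda words: [w.lower() for w in words],
    "count": lambda words: len(words),
}

# The pipeline declares how nodes connect, not how each step executes.
PIPELINE: List[str] = ["tokenize", "lowercase", "count"]


def run_dataflow(pipeline: List[str], data: Any) -> Any:
    """Push a data object through the nodes in the declared order."""
    for name in pipeline:
        data = NODES[name](data)
    return data


result = run_dataflow(PIPELINE, "Hello World Bye World")
print(result)
```

Real dataflow systems allow general graphs rather than a single chain; the linear pipeline here only keeps the sketch short.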
Verifying Dataflow Computations in a Cloud
Scenario: A user sends her data processing job to the cloud.
Clouds provide dataflow operations as a service (e.g., MapReduce, Hadoop).
Problem: Users have no way of evaluating the correctness of results
Threat Model
• Assets:
  – Confidentiality of
    • Input data
    • Output data
    • Intermediate data
    • Functions
  – Integrity of computations
Threat Model
• Attacker:
  – The cloud provider, or an intruder who controls part of the cloud
  – The attacker can (selectively) modify the code running on the inputs, create invalid outputs, etc.
MapReduce
• Most popular dataflow computing system
• Invented by Google and at one time widely used for indexing web pages and computing PageRank
• Allows large-scale, reliable computation
MapReduce Overview
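[Figure: MapReduce architecture. The Master assigns map tasks to mappers M1 … Mn and reduce tasks to reducers R1 … Rr. Map phase: each mapper reads an input block (B1 … Bn) from the DFS and writes its intermediate result locally as partitions P1 … Pr. Reduce phase: each reducer remotely reads its partition from the mappers and writes its output (Output 1 … Output r) to the DFS.]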
MapReduce: The Map Step
[Figure: map is applied to each input key-value pair (k, v) and produces intermediate key-value pairs.]
MapReduce: The Reduce Step
[Figure: intermediate key-value pairs are grouped by key into key-value groups (k, [v, v, …]); reduce is applied to each group and produces output key-value pairs.]
Word Count using MapReduce

map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)

reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)
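The word count pseudocode above can be turned into a runnable Python sketch. This is a single-process stand-in for the framework, not a real MapReduce API: `run_job` is a hypothetical driver that plays the role of the group step, and splitting words with a simple regular expression is an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """key: document name; value: text of the document."""
    for word in re.findall(r"[A-Za-z]+", value):
        yield (word, 1)


def reduce_fn(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """key: a word; values: the counts emitted for that word."""
    return (key, sum(values))


def run_job(documents: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            groups[word].append(count)
    return dict(reduce_fn(word, counts) for word, counts in groups.items())


docs = {
    "doc1": "Hello World, Bye World!",
    "doc2": "Hello MapReduce, Goodbye to MapReduce.",
    "doc3": "Welcome to ACSAC, Goodbye to ACSAC.",
}
counts = run_job(docs)
print(counts)
```

In a real deployment the map calls, the grouping, and the reduce calls run on different machines; the sketch only preserves the data flow between the three steps.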
MapReduce – WordCount Application
Input documents (in DFS):
• "Hello World, Bye World!"
• "Hello MapReduce, Goodbye to MapReduce."
• "Welcome to ACSAC, Goodbye to ACSAC."

Map phase (intermediate results):
• M1: (Hello, 1) (Bye, 1) (World, 1) (World, 1)
• M2: (Hello, 1) (to, 1) (MapReduce, 1) (Goodbye, 1) (MapReduce, 1)
• M3: (Welcome, 1) (to, 1) (to, 1) (ACSAC, 1) (Goodbye, 1) (ACSAC, 1)

Reduce phase (output in DFS):
• R1: (Hello, 2) (Bye, 1) (Welcome, 1) (to, 3)
• R2: (World, 2) (ACSAC, 2) (Goodbye, 2) (MapReduce, 2)
Verification in Clouds
Problem: Given just the inputs to each node, how can a user verify the computation done in a cloud?
Possible approaches?
• Re-computation
• Sampling
• Replication
• Auditing
• Attestation
• Trusted computing
Re-computation
• Key idea:
  – Re-do the computation
• Advantages:
  – Any incorrect result is always detected (assuming the computation is deterministic and the re-computation itself is trusted)
• Disadvantages:
  – Worst-case cost (a check requires the same time and computation cost as the original computation)
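A minimal sketch of re-computation, assuming the job is deterministic and the verifier can run it on a machine it trusts. The `word_total` job and the `verify_by_recomputation` helper are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Sequence


def word_total(lines: Sequence[str]) -> int:
    """Example job: count the words in the input (stands in for any deterministic job)."""
    return sum(len(line.split()) for line in lines)


def verify_by_recomputation(
    job: Callable[[Any], Any], inputs: Any, claimed_result: Any
) -> bool:
    """Re-run the whole job locally and compare with the result the cloud returned."""
    return job(inputs) == claimed_result


inputs = ["Hello World", "Bye World"]
print(verify_by_recomputation(word_total, inputs, 4))  # honest result
print(verify_by_recomputation(word_total, inputs, 5))  # tampered result
```

The check costs exactly one more run of `job`, which is the disadvantage listed above.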
Sampling
• Key idea:
  – Feed known values in the inputs, check for known outcomes in the corresponding outputs
• Advantages:
  – Efficient
• Disadvantages:
  – A clever attacker can figure out the test inputs and be honest for that cycle
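A minimal sketch of sampling, assuming the cloud applies one function to every input item independently. The `square` function, the probe values, and the helper names are hypothetical. Probe inputs with precomputed answers are shuffled into the real batch, and only the probe positions are checked afterwards.

```python
import random
from typing import Dict, List, Optional, Tuple


def square(x: int) -> int:
    """The function the cloud is supposed to apply to every input."""
    return x * x


def build_batch(
    real_inputs: List[int], probes: Dict[int, int], rng: random.Random
) -> Tuple[List[int], Dict[int, int]]:
    """Mix probes into the real inputs; return the batch and {position: expected output}."""
    tagged: List[Tuple[int, Optional[int]]] = [(x, None) for x in real_inputs]
    tagged += [(x, y) for x, y in probes.items()]
    rng.shuffle(tagged)
    batch = [x for x, _ in tagged]
    expected = {i: y for i, (_, y) in enumerate(tagged) if y is not None}
    return batch, expected


def check_probes(outputs: List[int], expected: Dict[int, int]) -> bool:
    """Accept the batch only if every probe position holds the known answer."""
    return all(outputs[i] == y for i, y in expected.items())


batch, expected = build_batch([5, 6, 7, 8], {2: 4, 3: 9}, random.Random(0))
honest_outputs = [square(x) for x in batch]
lazy_outputs = [0 for _ in batch]  # a cloud that skips the work entirely
print(check_probes(honest_outputs, expected))
print(check_probes(lazy_outputs, expected))
```

The check passes for any cloud that answers the probes correctly, so a cloud that can tell probes apart from real inputs can still cheat on the rest, as the disadvantage above notes.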
Replication
• Key idea:
  – Replicate the same computation using multiple sets of nodes
  – Use majority voting to verify correctness
• Advantages:
  – Fast (same speed as a single run, since all replicas can run in parallel)
• Disadvantages:
  – Costly, since multiple copies of the same computation need to be run
  – Can be defeated by a clever adversary (e.g., one who controls a majority of the replicas)
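A minimal sketch of replication with majority voting. The replica functions are hypothetical stand-ins for independent sets of cloud nodes; a real system would run them in parallel on separate machines.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the result reported by a strict majority of replicas, else None."""
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes > len(results) / 2 else None


def honest_replica(data: List[int]) -> int:
    return sum(data)


def faulty_replica(data: List[int]) -> int:
    return sum(data) + 1  # returns a wrong answer


data = [1, 2, 3]
replicas: List[Callable[[List[int]], int]] = [honest_replica, faulty_replica, honest_replica]
results = [replica(data) for replica in replicas]
print(majority_vote(results))  # the honest majority wins

# If the adversary controls most replicas, the wrong answer is accepted.
colluding = [faulty_replica(data), faulty_replica(data), honest_replica(data)]
print(majority_vote(colluding))
```

The second call shows why voting alone is not sufficient: the vote is only as trustworthy as the majority of the replicas.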
Auditing
• Key idea:
  – Have each node sign its inputs, the operations it performed, and its outputs
  – Later, an auditor can check for correct computation
• Advantages:
  – Provides non-repudiation
  – Allows forensic investigation
• Disadvantages:
  – Adds to computation time due to the crypto
  – Expensive audits
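A minimal sketch of an audit record, under a loud assumption: it uses an HMAC from the Python standard library as a stand-in for a digital signature. An HMAC relies on a key shared between node and auditor, so unlike a real signature it does not provide non-repudiation; the record layout and helper names are also hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Callable, Dict


def make_record(
    node_key: bytes, node_id: str, operation: str, inputs: Any, outputs: Any
) -> Dict[str, Any]:
    """Node side: log inputs, operation, and outputs, then authenticate the log entry."""
    body = {"node": node_id, "op": operation, "inputs": inputs, "outputs": outputs}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    tag = hmac.new(node_key, payload, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def audit_record(
    node_key: bytes, record: Dict[str, Any], recompute: Callable[[Any], Any]
) -> bool:
    """Auditor side: check the tag first, then re-check the logged computation."""
    payload = json.dumps(record["body"], sort_keys=True).encode("utf-8")
    expected = hmac.new(node_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record["tag"]):
        return False  # record was altered, or not produced with this key
    return recompute(record["body"]["inputs"]) == record["body"]["outputs"]


key = b"demo-key-for-node-M1"  # illustrative key, not a real credential
good = make_record(key, "M1", "sum", [1, 2, 3], 6)
bad = make_record(key, "M1", "sum", [1, 2, 3], 7)  # node logged a wrong output
print(audit_record(key, good, sum))
print(audit_record(key, bad, sum))
```

The audit step re-runs the logged operation, which is where the "expensive audits" cost comes from; the tag computation is the per-operation crypto overhead.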
Attestation
• Key idea:
  – Verify the code or the execution path of a computation
• Advantages:
  – Can ensure that the correct code was run on the data
• Disadvantages:
  – Expensive to compute
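A heavily simplified sketch of the code-measurement step behind attestation, assuming the verifier knows a hash of the code it expects to run. In a real scheme the measurement would be taken and signed by a trusted component on the cloud node; here the `measure` and `attest` helpers and the example code bytes are hypothetical and everything runs in one process.

```python
import hashlib
import hmac

# The code the user expects the cloud node to run (illustrative bytes).
TRUSTED_CODE = b"def mapper(word):\n    return (word, 1)\n"
EXPECTED_MEASUREMENT = hashlib.sha256(TRUSTED_CODE).hexdigest()


def measure(code: bytes) -> str:
    """Compute a measurement (SHA-256 digest) of the code about to be run."""
    return hashlib.sha256(code).hexdigest()


def attest(code: bytes, expected_measurement: str) -> bool:
    """Accept only if the measured code matches the expected measurement."""
    return hmac.compare_digest(measure(code), expected_measurement)


modified_code = b"def mapper(word):\n    return (word, 0)\n"
print(attest(TRUSTED_CODE, EXPECTED_MEASUREMENT))
print(attest(modified_code, EXPECTED_MEASUREMENT))
```

This only shows that the expected code was loaded; checking the execution path taken through that code requires considerably more machinery.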
Trusted Computing
• Key idea:
  – Ensure that the cloud nodes are using a trustworthy configuration and software
• Advantages
• Disadvantages
Summary
• Verifying computations is difficult
• Provably secure approaches are often very computation-intensive, and therefore not practical