
POND: THE OCEANSTORE PROTOTYPE

Date post: 13-Mar-2016
Page 1: POND: THE OCEANSTORE PROTOTYPE

POND: THE OCEANSTORE PROTOTYPE

S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, J. Kubiatowicz

U. C. Berkeley

Page 2: POND: THE OCEANSTORE PROTOTYPE

Key Ideas

• Versioning file system
• Location-independent routing
  – Uses hashes instead of addresses
  – Mapping is done through Tapestry
• Byzantine update commitment
  – By nodes holding primary copies (inner ring)
  – Proactive threshold signatures allow inner ring membership updates

Page 3: POND: THE OCEANSTORE PROTOTYPE

Key Ideas

• Push-based update of other copies
  – Through an overlay multicast network
  – Copies are not permanent
• Continuous archiving in erasure-coded form
  – Very reliable
  – Very slow access

Page 4: POND: THE OCEANSTORE PROTOTYPE

Motivation

• Find a better solution for long-term management of data
• Enabling trends:
  – Near-universal connectivity through high-bandwidth links
  – Very fast increase of disk storage capacity per unit cost

Page 5: POND: THE OCEANSTORE PROTOTYPE

OceanStore

• Internet-scale cooperative file system
• Will provide
  – High durability
  – Universal accessibility
• Will use a two-tiered storage system
• Stores data objects

Page 6: POND: THE OCEANSTORE PROTOTYPE

Two-tiered organization

• Upper tier
  – Powerful, well-connected hosts
  – Serialize changes and archive results
• Lower tier
  – Less powerful hosts
    • Can be user workstations
  – Provide storage resources

Page 7: POND: THE OCEANSTORE PROTOTYPE

Two-tiered organization

[Figure: diagram with the labels Archive, Primary replica (in inner ring), and three Secondary replicas]

Page 8: POND: THE OCEANSTORE PROTOTYPE

Basic requirements

• OceanStore should
  1. Let information be accessed from any location
  2. Balance the tension between privacy and information sharing
  3. Offer an easily understandable and usable model of data consistency
  4. Guarantee data integrity

Page 9: POND: THE OCEANSTORE PROTOTYPE

First basic assumption

• Infrastructure cannot be trusted, except in aggregate
  – Hosts and routers can fail arbitrarily
  – Must consider
    • Passive failures: host snooping, …
    • Active failures: host injecting malicious messages, …

Page 10: POND: THE OCEANSTORE PROTOTYPE

Second basic assumption

• Infrastructure is continuously changing
  – Performance of communication paths varies
  – Resources enter and leave the network without warning
  – System should
    • Be self-organizing and self-repairing
    • Aim to be self-tuning

Page 11: POND: THE OCEANSTORE PROTOTYPE

The challenge

• Build a system that provides
  – An expressive user interface
  – High data availability
  – High data durability
  – High data privacy and integrity
  atop an untrusted and ever-changing base

More ambitious than FARSITE

Page 12: POND: THE OCEANSTORE PROTOTYPE

The data model

• OceanStore data object
  – Similar to a traditional file
  – Ordered sequence of read-only versions
• Versioning
  – Simplifies consistency issues
  – Allows recovery of previous versions
• Identical blocks are shared among versions

Page 13: POND: THE OCEANSTORE PROTOTYPE

Data object implementation (I)

• Each data object has an AGUID (Active Globally-Unique Identifier)
  – Secure hash of application-level name and public key of owner
• Each version has a VGUID (Version GUID)
  – BGUID of root block of a version
• Each block has a BGUID (Block GUID)
  – Secure hash of block contents
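
The three identifiers can be sketched in a few lines of Python. This is only an illustration: the slides say "secure hash" without naming one, so the use of SHA-1, the way the name and the owner's key are concatenated, and all function names below are assumptions.

```python
import hashlib


def secure_hash(data: bytes) -> str:
    """Hex digest standing in for the 'secure hash' (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def bguid(block: bytes) -> str:
    # A block's GUID is the secure hash of the block contents.
    return secure_hash(block)


def vguid(root_block: bytes) -> str:
    # A version's GUID is the BGUID of that version's root block.
    return bguid(root_block)


def aguid(name: str, owner_key: bytes) -> str:
    # Hash of the application-level name together with the owner's key.
    # The separator byte and encoding are hypothetical choices.
    return secure_hash(name.encode("utf-8") + b"\x00" + owner_key)
```

Because a BGUID depends only on the block contents, two versions that contain an identical block refer to it by the same identifier, which is what lets versions share blocks.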

Page 14: POND: THE OCEANSTORE PROTOTYPE

A data object

[Figure: diagram with the labels AGUID, VGUIDi, VGUIDi+1, M, root block, indirect blocks, data blocks, and COW (copy-on-write)]

Page 15: POND: THE OCEANSTORE PROTOTYPE

Data object implementation

• AGUID, VGUID and BGUID are location-transparent
  – OceanStore relies on a lower-level service to map GUIDs into addresses

Page 16: POND: THE OCEANSTORE PROTOTYPE

Application-level consistency (I)

• Updating an object means creating a new version
• Updates are
  – Atomic
  – Represented as an array of potential actions, each guarded by a predicate

Page 17: POND: THE OCEANSTORE PROTOTYPE

Application-level consistency (II)

• Actions can be
  – Appending data
  – Replacing bytes at a specific address
• Predicates can be
  – Checking the latest version number of the object
  – Verifying values of bytes at a specific address
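
The predicate-and-action model can be sketched as follows. This is a minimal single-process illustration, not Pond's code: the `Version` class, the helper names, and the rule that the first clause whose predicate holds is the one applied are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Version:
    number: int
    data: bytes


Predicate = Callable[[Version], bool]
Action = Callable[[bytes], bytes]


def version_is(n: int) -> Predicate:
    # Predicate: the latest version number equals n.
    return lambda v: v.number == n


def bytes_equal(offset: int, expected: bytes) -> Predicate:
    # Predicate: the bytes at a given offset have the expected value.
    return lambda v: v.data[offset:offset + len(expected)] == expected


def append(extra: bytes) -> Action:
    return lambda d: d + extra


def replace(offset: int, new: bytes) -> Action:
    return lambda d: d[:offset] + new + d[offset + len(new):]


def apply_update(current: Version,
                 update: List[Tuple[Predicate, List[Action]]]) -> Version:
    """Apply the actions of the first clause whose predicate holds.

    All actions of that clause take effect together in a new version;
    if no predicate holds, the current version is returned unchanged.
    """
    for predicate, actions in update:
        if predicate(current):
            data = current.data
            for act in actions:
                data = act(data)
            return Version(current.number + 1, data)
    return current
```

An update guarded by `version_is(n)` behaves like a compare-and-swap on the object, while an unguarded append (a predicate that always returns true) gives the weaker behaviour suited to something like a mailbox.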

Page 19: POND: THE OCEANSTORE PROTOTYPE

Application-level consistency (III)

• Predicate and action model
  – Allows multiple levels of consistency to be implemented
    • Atomic transactions satisfying ACID properties for database applications
    • Weaker consistency for mailboxes

Page 20: POND: THE OCEANSTORE PROTOTYPE

A footnote

• ACID properties of atomic transactions mean that atomic transactions
  – Are atomic
  – Bring the database from one consistent state to another consistent state
  – Isolate their partial results until the transaction is completed
  – Guarantee the durability of the final result

Page 21: POND: THE OCEANSTORE PROTOTYPE

Virtualization through Tapestry

• OceanStore messages are addressed with a GUID
• Tapestry forwards these messages to a host containing a resource with that GUID
  – Fully decentralized service
• Hosts can
  – Join Tapestry by supplying their GUID
  – Publish the GUIDs of the resources they have
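
The join, publish, and forward operations can be pictured with a toy locator. Note that this is a single in-memory table, so it is not decentralized and does not reproduce Tapestry's routing algorithm; the class and method names are hypothetical and only show the interface implied by the slide.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class ToyLocator:
    """Single-process stand-in for GUID-based location (NOT Tapestry itself)."""

    def __init__(self) -> None:
        self.hosts: Set[str] = set()
        self.published: Dict[str, Set[str]] = defaultdict(set)

    def join(self, host_guid: str) -> None:
        # A host joins by supplying its own GUID.
        self.hosts.add(host_guid)

    def publish(self, host_guid: str, resource_guid: str) -> None:
        # A joined host announces that it holds a resource.
        if host_guid not in self.hosts:
            raise ValueError("host must join before publishing")
        self.published[resource_guid].add(host_guid)

    def route(self, resource_guid: str) -> Optional[str]:
        # Return some host holding the resource, or None if nobody published it.
        holders = self.published.get(resource_guid)
        return min(holders) if holders else None
```

The caller addresses a message by resource GUID only and never learns or supplies a network address, which is the location independence the slide describes.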

Page 22: POND: THE OCEANSTORE PROTOTYPE

Replication and consistency (I)

• Each object has a single primary replica
• Primary replica
  – Serializes and applies all updates
  – Creates a certificate (heartbeat) mapping AGUID of object to VGUID of its latest version
  – Controls access to the object
  – …

Page 23: POND: THE OCEANSTORE PROTOTYPE

Replication and consistency (II)

• Heartbeat contains
  – An AGUID
  – A VGUID
  – A timestamp
  – A version sequence number
• Getting the most recent version of object means getting its most recent heartbeat
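
A heartbeat and the "most recent heartbeat wins" lookup can be sketched as below. The field types, the tie-break on timestamp, and the function name are assumptions; checking the signature that makes a heartbeat a certificate is deliberately left out of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Heartbeat:
    aguid: str        # which object this heartbeat is about
    vguid: str        # the object's latest version
    timestamp: float
    sequence: int     # version sequence number


def latest_version(aguid: str,
                   heartbeats: Iterable[Heartbeat]) -> Optional[str]:
    """Return the VGUID named by the newest heartbeat for this object.

    Signature verification is omitted here; a real client would discard
    heartbeats that do not verify before comparing them.
    """
    candidates = [h for h in heartbeats if h.aguid == aguid]
    if not candidates:
        return None
    newest = max(candidates, key=lambda h: (h.sequence, h.timestamp))
    return newest.vguid
```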

Page 24: POND: THE OCEANSTORE PROTOTYPE

The inner ring

• Small set of co-operating servers that manage primary replicas
• Implement a Byzantine fault-tolerant protocol to
  – Agree on all updates to an object
  – Digitally sign the result

Page 25: POND: THE OCEANSTORE PROTOTYPE

Archival storage

• Stores object versions that are not frequently accessed
• Uses erasure codes
  – Each block
    • Partitioned into m fragments
    • Encoded into n > m fragments
  – Any subset of m fragments suffices to reconstitute the block
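
The "any m of n fragments" property can be demonstrated with the simplest possible erasure code: m data fragments plus one XOR parity fragment, so n = m + 1. The slides do not say which code the archive uses, so this is a stand-in chosen for brevity rather than the prototype's scheme; a production code would allow n to be much larger than m.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes, m: int) -> List[bytes]:
    """Split a block into m equal fragments and append one XOR parity fragment."""
    size = -(-len(block) // m)  # ceiling division
    padded = block.ljust(size * m, b"\x00")
    data = [padded[i * size:(i + 1) * size] for i in range(m)]
    parity = reduce(xor_bytes, data)
    return data + [parity]


def decode(fragments: Dict[int, bytes], m: int, length: int) -> bytes:
    """Rebuild the block from any m of the n = m + 1 fragments.

    `fragments` maps fragment index (0..m, where m is the parity) to bytes.
    """
    if len(fragments) < m:
        raise ValueError("need at least m fragments")
    missing = [i for i in range(m) if i not in fragments]
    if missing:
        # At most one data fragment can be missing; the XOR of the
        # surviving fragments (including parity) restores it.
        (lost,) = missing
        fragments = dict(fragments)
        fragments[lost] = reduce(xor_bytes, list(fragments.values()))
    return b"".join(fragments[i] for i in range(m))[:length]
```

For example, with m = 3 a block becomes 4 fragments, and dropping any single one of them still lets `decode` return the original bytes.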

Page 26: POND: THE OCEANSTORE PROTOTYPE

Caching of data objects

• Retrieving data from archive is slow
• OceanStore also maintains copies of whole blocks
  – Secondary replicas
• Heartbeats always come from the primary replica
• Updates of secondary replicas are done through a dissemination tree

Page 27: POND: THE OCEANSTORE PROTOTYPE

Path of an OceanStore update

[Figure: diagram of the update path with the labels Application, Primary replica in inner ring, Archive, and three Secondary replicas]

Page 28: POND: THE OCEANSTORE PROTOTYPE

Updating primary replicas (I)

• Use a Byzantine fault-tolerant protocol
  – Tolerates up to f failures in a system made up of 3f + 1 hosts
• Protocol authenticates messages with symmetric-key message authentication codes
  – Faster than using public keys
  – Complicates the Byzantine agreement protocol
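
The 3f + 1 bound is easy to turn into arithmetic. The helper names below are made up for illustration; the functions only restate the bound from the slide.

```python
def ring_size(f: int) -> int:
    """Number of hosts needed to tolerate f Byzantine failures."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest f such that 3f + 1 <= n, for a ring of n hosts."""
    if n < 1:
        raise ValueError("need at least one host")
    return (n - 1) // 3
```

So a ring of 4 hosts tolerates 1 faulty host, a ring of 7 tolerates 2, and growing a ring from 4 to 6 hosts does not raise the number of tolerated failures.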

Page 29: POND: THE OCEANSTORE PROTOTYPE

Updating primary replicas (II)

• Solution was to use
  – Symmetric keys for all communications within the inner ring
  – Public keys to communicate with all other machines

Page 30: POND: THE OCEANSTORE PROTOTYPE

Proactive threshold signatures

• (listen to lecture)

Page 31: POND: THE OCEANSTORE PROTOTYPE

Prototype software architecture

[Figure: software architecture diagram with the labels Network (Java NBIO), Tapestry, Byzantine agreement, Inner ring, Archive, Dissemination tree/replicas, Client interface, and Application]

Page 32: POND: THE OCEANSTORE PROTOTYPE

The prototype

• Written in Java

Page 33: POND: THE OCEANSTORE PROTOTYPE

Conclusion

