Peer-to-peer archival data trading Brian Cooper and Hector Garcia-Molina Stanford University.

Post on 15-Jan-2016

Peer-to-peer archival data trading

Brian Cooper and Hector Garcia-Molina

Stanford University

2 Data trading

Problem: Fragile Data

Data: easy to create, hard to preserve
  Broken tapes
  Human deletions
  Going out of business

3 Data trading

Replication-based preservation

5 Data trading

Motivation

Several systems use replication
  Preserve digital collections
  SAV, others
Archival part of digital library
  Individual organizations cooperate
  Not a lot of money to spend

6 Data trading

Goal
  Reliable replication of digital collections
Given that
  Resources are limited
  Sites are autonomous
  Not all sites are equal
Traditional methods
  Central control
  Random
  Replicate popular
Metric
  Reliability
  Not necessarily “efficiency”

7 Data trading

Our solution

Data trading
  “I’ll store a copy of your collection if you’ll store a copy of mine”
Sites make local decisions
  Who to trade with
  How many copies to make
  How much space to provide
  Etc.

8 Data trading

Trading network

A series of binary, peer-to-peer trading links

[Figure: trading network of sites A, B, C, D, E, F, G, and H]

9 Data trading

Architecture

[Figure: architecture diagram. Labels: Users, Filesystem, InfoMonitor, SAV Archive, Reliability layer, Archived data, Internet, Local archive, Remote archive]

10 Data trading

Overview

  Trading model
  Trading algorithm
  Simulating trading
  Simulation results

11 Data trading

Trading model

16 Data trading

Trading model

Archive site: an autonomous archiving provider
Digital collection: a set of related digital materials
Archival storage: stores locally and remotely owned digital collections
Archiving client: deposit and retrieve materials
Data reliability: probability that data is not lost

17 Data trading

Deeds

A right to use space at another site
Bookkeeping mechanism for trades
Used, saved, split, or transferred

Trading algorithm
  Sites trade deeds
  Sites exercise deeds to replicate collections

[Figure: sample deed reading "Deed for space. For use by: Library of Congress, or for transfer. 623 gigabytes. Stanford University."]
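The deed operations named on this slide (use, save, split, transfer) can be sketched as a small data structure. This is only an illustrative sketch: the `Deed` class, its field names, and the `exercise` helper are hypothetical and do not come from the talk, and reading the sample deed as space at Stanford held by the Library of Congress is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Deed:
    """Hypothetical record: a right to use size_gb of space at issuer."""
    issuer: str   # site that provides the space (assumed field)
    holder: str   # site currently entitled to use it (assumed field)
    size_gb: int

    def split(self, first_gb: int) -> Tuple["Deed", "Deed"]:
        """Split one deed into two smaller deeds covering the same space."""
        if not 0 < first_gb < self.size_gb:
            raise ValueError("split point must fall strictly inside the deed")
        return (Deed(self.issuer, self.holder, first_gb),
                Deed(self.issuer, self.holder, self.size_gb - first_gb))

    def transfer(self, new_holder: str) -> "Deed":
        """Hand the deed to another site without changing the space it covers."""
        return Deed(self.issuer, new_holder, self.size_gb)


def exercise(deed: Deed, collection_gb: int) -> Optional[Deed]:
    """Use a deed to store a collection; return the saved remainder, if any."""
    if collection_gb > deed.size_gb:
        raise ValueError("collection does not fit in the deeded space")
    remaining = deed.size_gb - collection_gb
    return Deed(deed.issuer, deed.holder, remaining) if remaining else None


# Usage, based on the sample deed in the figure.
deed = Deed("Stanford University", "Library of Congress", 623)
small, rest = deed.split(200)
leftover = exercise(rest, 400)
```

Making the record immutable means every operation returns a new deed, which keeps the bookkeeping explicit; a real system would also need to record which deeds have already been exercised.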

18 Data trading

Deed trading

[Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

19 Data trading

The challenge

[Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

20 Data trading

The challenge

[Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

21 Data trading

Alternative solutions

Are there other ways besides trading?

22 Data trading

Other solutions: central control

[Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

23 Data trading

Other solutions: client-based

[Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

24 Data trading

Other solutions: random

[Figure: sites A, B, and C holding copies of Collection 1, Collection 2, and Collection 3]

25 Data trading

Why is trading good?

High reliability
  Framework for replication
Site autonomy
  Make local decisions
  No submission to external authority
Fairness
  Contribute more = more reliability
  Must contribute resources

[Figure: trading network of sites A, B, C, D, E, F, G, and H]

26 Data trading

Decisions facing an archive

  Who to trade with
  Providing space
  Advertising space
  Picking a number of copies
  Joining a cluster
  Coping with varying site reliabilities

27 Data trading

How do we evaluate policies?

Trading simulator
  Generate scenario
  Simulate trading with different policies
  Evaluate reliability for each policy
  Compare each policy

28 Data trading

Simulation parameters

Parameter                  Value
Number of sites            2 to 15
Site reliability           0.5 to 0.8
Collections per site       4 to 25
Data per collection        50 Gb to 1000 Gb
Space per site             2x data to 7x data
Replication goal           2 to 15 copies
Scenarios per simulation   200
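As a rough illustration of how scenarios could be drawn from these ranges, the sketch below samples each parameter uniformly. Uniform sampling, the `Site` record, the `generate_scenario` function, and the fixed choice of 10 sites are all assumptions made for the example; the slides give only the ranges, not the distributions or the simulator's actual design.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """Hypothetical description of one archive site in a scenario."""
    reliability: float          # probability the site does not fail
    collections_gb: List[int]   # size of each locally owned collection
    space_gb: float             # total storage the site provides


def generate_scenario(rng: random.Random, num_sites: int) -> List[Site]:
    """Draw one scenario from the parameter ranges on the slide (uniformly, by assumption)."""
    sites = []
    for _ in range(num_sites):
        # 4 to 25 collections per site, each 50 to 1000 units of data.
        collections = [rng.randint(50, 1000) for _ in range(rng.randint(4, 25))]
        sites.append(Site(
            reliability=rng.uniform(0.5, 0.8),
            collections_gb=collections,
            # Space per site is 2x to 7x the site's own data.
            space_gb=rng.uniform(2, 7) * sum(collections),
        ))
    return sites


# 200 scenarios per simulation, each seeded separately so runs are repeatable.
scenarios = [generate_scenario(random.Random(seed), num_sites=10) for seed in range(200)]
```

Seeding each scenario separately lets every trading policy be evaluated against the same 200 scenarios, which is one way to make the policy comparison fair.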

29 Data trading

Reliability

Site reliability
  Will a site fail?
  Example: 0.9 = 10% chance of failure
Data reliability
  How safe is the data?
  Despite site failures
  Example: 320 year MTTF
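One simple way to connect the two notions is sketched below. It assumes that sites fail independently and that a collection is lost only if every site holding a copy fails; neither assumption is stated on the slide, and the function name `data_reliability` is hypothetical. The slide does not say how reliability is converted to MTTF, so that step is left out.

```python
from math import prod
from typing import Iterable


def data_reliability(site_reliabilities: Iterable[float]) -> float:
    """Probability that at least one copy survives, assuming independent site failures."""
    # Each site fails with probability (1 - r); the data is lost only if all of them fail.
    return 1.0 - prod(1.0 - r for r in site_reliabilities)


# One copy at a site with reliability 0.9 versus copies at three such sites.
single = data_reliability([0.9])
triple = data_reliability([0.9, 0.9, 0.9])
```

Under these assumptions each additional copy multiplies the loss probability by that site's failure probability, which is why replication across even moderately reliable sites can help considerably.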

30 Data trading

Example: trading strategy

Who should we try to trade with?
  The most reliable sites?
  Sites with reliability close to ours?
  The sites we have traded with before?
  Some other policy (like random)?
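The candidate policies above can be sketched as functions that rank possible trading partners. The function names, the dictionary of candidate reliabilities, and the shared signature are hypothetical choices for this example and are not taken from the talk's simulator.

```python
import random
from typing import Dict, List


def most_reliable(local_reliability: float, candidates: Dict[str, float]) -> List[str]:
    """Prefer the most reliable sites, regardless of our own reliability."""
    return sorted(candidates, key=lambda site: candidates[site], reverse=True)


def closest_reliability(local_reliability: float, candidates: Dict[str, float]) -> List[str]:
    """Prefer sites whose reliability is closest to ours."""
    return sorted(candidates, key=lambda site: abs(candidates[site] - local_reliability))


def random_order(local_reliability: float, candidates: Dict[str, float],
                 rng: random.Random) -> List[str]:
    """Baseline policy: contact candidate sites in random order."""
    order = list(candidates)
    rng.shuffle(order)
    return order


# Usage: a local site with reliability 0.7 ranks three hypothetical partners.
partners = {"A": 0.9, "B": 0.6, "C": 0.75}
by_reliability = most_reliable(0.7, partners)
by_closeness = closest_reliability(0.7, partners)
```

A policy based on past trading partners would additionally need a trade history, which is omitted here.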

31 Data trading

Example: trading strategy

[Figure: average local data MTTF (axis ticks 1, 10, 100, 1000, 10000) versus local site reliability (0.5 to 0.9) for the Clustering, MostReliable, and ClosestReliability policies; annotated R=0.8]

35 Data trading

Results

Clusters of sites?
  Social or political clusters
  E.g. all universities within a particular state
  Is the cluster big enough?
  What if it isn’t?
Result
  A few archives are sufficient
  E.g. 5 archives to make 3 copies
  Too many sites is counter-productive

36 Data trading

Trading clusters

39 Data trading

Current and future work

Bidding versus direct trading
  Local site holds an auction
  Bids = size of local site’s deed
“Deviant” sites
  Greedy sites
  Follow protocol but do not play nice
Access
  Support searching over collections
  Distribute indexes via trading

40 Data trading

Current and future work

Security
  Will sites actually preserve data?
  Will they give it to others?
  Can I protect sensitive information?
  What if I fail and lose my keys?
  Can I authenticate myself?

41 Data trading

Other parts of SAV project

SAV data model
  Write-once objects
  Signature-based naming
How to get objects into SAV
  InfoMonitor – filesystem
  Other inputs (Web, DBMS, etc.)
Modeling archival repositories
  Arturo Crespo
  Choose best components and design
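To illustrate what write-once objects with signature-based naming could look like, the sketch below names each object by a hash of its content. The slide does not say which signature SAV uses or how its store is organized, so the use of SHA-256 and the `WriteOnceStore` class are assumptions made purely for illustration.

```python
import hashlib
from typing import Dict


class WriteOnceStore:
    """Hypothetical in-memory store: objects are named by a hash of their bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store an object and return its name; existing objects are never overwritten."""
        name = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumed choice
        self._objects.setdefault(name, data)
        return name

    def get(self, name: str) -> bytes:
        """Fetch an object by name; raises KeyError if it was never stored."""
        return self._objects[name]


# Usage: the same content always maps to the same name.
store = WriteOnceStore()
name = store.put(b"archived report, version 1")
same_name = store.put(b"archived report, version 1")
```

Because a name is derived from the content, a site that receives a replica can recompute the hash to check that the copy is intact.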

42 Data trading

Related work

Peer-to-peer replication
  SAV, Intermemory, LOCKSS, OceanStore…
Fault tolerant systems
  RAID, mirrored disks, replicated databases
Caching systems (Andrew, Coda)
Barter/auction based systems
  ContractNet
Distributed resource allocation
  File Allocation Problem

43 Data trading

Conclusion

Important, exciting area
  Preservation critical
  Difficult to accomplish
Many decisions are ad hoc today
  An effective framework is needed
  Scientific evaluation of decisions
Trading networks replicate data
  Model for trading networks
  Trading algorithm
  Simulation results

[Figure: trading network of sites A, B, C, D, E, F, G, and H]

44 Data trading

For more information

cooperb@stanford.edu
http://www-diglib.stanford.edu/