Date posted: 19-Dec-2015
Peer-to-peer archival data trading
Brian Cooper
Joint work with Hector Garcia-Molina (and others)
Stanford University
2 Data trading
Problem: Fragile Data
  Data: easy to create, hard to preserve
    Broken tapes
    Human deletions
    Going out of business
Replication-based preservation
Motivation
  Several systems use replication
    Preserve digital collections
    SAV, others
  Archival part of digital library
    Individual organizations cooperate
    Not a lot of money to spend
Goal
  Reliable replication of digital collections
  Given that
    Resources are limited
    Sites are autonomous
    Not all sites are equal
  Traditional methods
    Central control
    Random
    Replicate popular
  Metric
    Reliability
    Not necessarily “efficiency”
Our solution
  Data trading
    “I’ll store a copy of your collection if you’ll store a copy of mine”
  Sites make local decisions
    Who to trade with
    How many copies to make
    How much space to provide
    Etc.
Trading network
  A series of binary, peer-to-peer trading links
  [Figure: trading network linking sites A through H]
Architecture
  [Figure: a local archive and a remote archive connected over the Internet. Labels: Users, Filesystem, InfoMonitor, SAV Archive, Service layer, Reliability layer, Archived data]
  This architecture was developed with Arturo Crespo
Overview
  Trading model
  Trading algorithm
  Optimizing (and simulating) trading
  Some results
  Some stuff we are still working on
Trading model
  Archive site: an autonomous archiving provider
  Digital collection: a set of related digital materials
  Archival storage: stores locally and remotely owned digital collections
  Archiving client: deposit and retrieve materials
  Data reliability: probability that data is not lost
Deeds
  A right to use space at another site
  Bookkeeping mechanism for trades
  Used, saved, split, or transferred
Trading algorithm
  Sites trade deeds
  Sites exercise deeds to replicate collections
  [Figure: sample deed reading “Deed for space. For use by: Library of Congress or for transfer. 623 gigabytes. Stanford University”]
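The four deed operations listed above (used, saved, split, transferred) can be sketched as a small record type. This is only an illustrative sketch: the class name, field names, and method signatures are assumptions, not part of the SAV implementation, and a "saved" deed is modeled simply as one that has not been used yet.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Deed:
    """A right to use space at another site (hypothetical field names)."""
    issuer: str                      # site whose storage backs the deed
    holder: str                      # site currently entitled to the space
    size_gb: int                     # amount of space the deed covers
    used_for: Optional[str] = None   # collection stored with it; None while saved

    def _require_unused(self) -> None:
        if self.used_for is not None:
            raise ValueError("deed has already been used")

    def split(self, first_gb: int) -> Tuple["Deed", "Deed"]:
        """Split an unused deed into two deeds with the same total size."""
        self._require_unused()
        if not 0 < first_gb < self.size_gb:
            raise ValueError("split point must fall strictly inside the deed")
        return (replace(self, size_gb=first_gb),
                replace(self, size_gb=self.size_gb - first_gb))

    def transfer(self, new_holder: str) -> "Deed":
        """Hand an unused deed to another site."""
        self._require_unused()
        return replace(self, holder=new_holder)

    def use(self, collection: str, collection_gb: int) -> "Deed":
        """Exercise the deed to store a copy of a collection at the issuer."""
        self._require_unused()
        if collection_gb > self.size_gb:
            raise ValueError("collection does not fit in the deed")
        return replace(self, used_for=collection)


# Example based on the sample deed; the sizes passed to split/use are made up.
deed = Deed(issuer="Stanford University", holder="Library of Congress", size_gb=623)
part, rest = deed.split(200)
stored = part.use("Collection 1", collection_gb=150)
passed_on = rest.transfer("Site C")
```

Making the record immutable keeps the bookkeeping simple: every operation returns a new deed, so a site can keep a plain list of the deeds it currently holds.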
Deed trading
  [Figure: sites A, B, and C with copies of Collection 1, Collection 2, and Collection 3]
The challenge
  [Figure: placement of copies of Collections 1, 2, and 3 across sites A, B, and C]
Alternative solutions
  Are there other ways besides trading?
Other solutions: central control
  [Figure: placement of copies of Collections 1, 2, and 3 across sites A, B, and C]
Other solutions: client-based
  [Figure: placement of copies of Collections 1, 2, and 3 across sites A, B, and C]
Other solutions: random
  [Figure: placement of copies of Collections 1, 2, and 3 across sites A, B, and C]
Why is trading good?
  High reliability
    Framework for replication
  Site autonomy
    Make local decisions
    No submission to external authority
  Fairness
    Contribute more = more reliability
    Must contribute resources
  [Figure: trading network linking sites A through H]
Decisions facing an archive
  Who to trade with
  How much to trade
  When to ask for a trade
  Providing space
  Advertising space
  Picking a number of copies
  Coping with varying site reliabilities
  What to do with acquired resources
  How to deliver other services
Many, many degrees of freedom!
Our approach
  Define a basic trading protocol
    Deed trading
  Assume all sites follow the same rules
    Basic system for trading
  Extend: not all sites are equal
    Some are more reliable or trusted
  Extend: sites have freedom to negotiate
    Bid trading
  Extend: some sites are malicious
    Ensure documents survive despite evildoers
  For each model, what policies are best?
How do we evaluate policies?
  Trading simulator
    Generate scenario
    Simulate trading with different policies
    Evaluate reliability for each policy
    Compare each policy
Simulation parameters
  Number of sites            2 to 15
  Site reliability           0.5 to 0.8
  Collections per site       4 to 25
  Data per collection        50 GB to 1000 GB
  Space per site             2x data to 7x data
  Replication goal           2 to 15 copies
  Scenarios per simulation   200
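As a rough illustration of the "generate scenario" step, the parameter ranges above can be turned into a scenario generator. The slides give only the ranges, so everything else here is an assumption: the uniform sampling, the per-site draws, the type and function names, and reading collection sizes as gigabytes.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    reliability: float          # probability that the site does not fail
    collections_gb: List[int]   # size of each locally owned collection
    space_gb: float             # total storage space at the site


@dataclass
class Scenario:
    sites: List[Site]
    replication_goal: int       # target number of copies per collection


def generate_scenario(rng: random.Random) -> Scenario:
    """Draw one scenario, sampling each parameter uniformly from its range."""
    sites = []
    for _ in range(rng.randint(2, 15)):                      # number of sites
        collections = [rng.randint(50, 1000)                 # GB per collection
                       for _ in range(rng.randint(4, 25))]   # collections per site
        sites.append(Site(
            reliability=rng.uniform(0.5, 0.8),
            collections_gb=collections,
            space_gb=rng.uniform(2.0, 7.0) * sum(collections),  # 2x to 7x data
        ))
    return Scenario(sites=sites, replication_goal=rng.randint(2, 15))


def generate_scenarios(count: int = 200, seed: int = 0) -> List[Scenario]:
    """Generate the batch of scenarios for one simulation (200 by default)."""
    rng = random.Random(seed)
    return [generate_scenario(rng) for _ in range(count)]
```

Seeding the generator makes a batch reproducible, so different trading policies can be compared on exactly the same scenarios.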
Reliability
  Site reliability
    Will a site fail?
    Example: 0.9 = 10% chance of failure
  Data reliability
    How safe is the data?
    Despite site failures
    Example: 320-year MTTF (mean time to failure)
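To connect the two notions, here is a minimal sketch of how data reliability could be derived from site reliabilities. It assumes that sites fail independently and that a collection is lost only when every site holding a copy fails; the slides do not state the exact model, and the function name is hypothetical.

```python
from typing import Iterable


def data_reliability(site_reliabilities: Iterable[float]) -> float:
    """Probability that at least one copy survives.

    Assumes independent site failures: the data is lost only if every
    site holding a copy fails.
    """
    p_all_fail = 1.0
    for r in site_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError("site reliability must be between 0 and 1")
        p_all_fail *= 1.0 - r
    return 1.0 - p_all_fail


# Two copies on sites with reliability 0.9 each: both fail with
# probability 0.1 * 0.1, so the data survives with probability about 0.99.
two_copies = data_reliability([0.9, 0.9])
```

Under this model each additional copy multiplies the loss probability by that site's failure probability, which is why the number of copies and the choice of trading partners both matter.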
Basic trading approach
  How does trading work?
    Assuming all sites follow “the rules”
  Example: advertising policy
    “Let’s trade. How much space do you have?”
  [Figure: exchange between sites A and B]
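Advertising policy
  [Figure: sites A and B; reply “I have 120 GB”, with 120 GB of space shown]
  Space-fractional policy
    [Figure: sites A and B; reply “I have 60 GB”, with 60 GB of space shown]
  Data-proportional policy
    [Figure: sites A and B; reply “I have 40 GB”, with 40 GB of space and 40 GB of data shown]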
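The two named policies can be sketched as simple functions. The interpretation is an assumption based on the policy names and the figures: a space-fractional site advertises a fixed fraction of its free space, and a data-proportional site advertises a multiple of the data it owns, capped by its free space. The function and parameter names are hypothetical.

```python
def advertise_space_fractional(free_space_gb: float, fraction: float) -> float:
    """Advertise a fixed fraction of the site's free space (assumed meaning)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return fraction * free_space_gb


def advertise_data_proportional(own_data_gb: float, multiple: float,
                                free_space_gb: float) -> float:
    """Advertise space proportional to the data the site owns (assumed meaning),
    never promising more than the site actually has free."""
    if multiple < 0.0:
        raise ValueError("multiple must be non-negative")
    return min(multiple * own_data_gb, free_space_gb)


# Numbers chosen to match the figures: 120 GB free and 40 GB of own data.
print(advertise_space_fractional(120, fraction=0.5))                      # 60.0
print(advertise_data_proportional(40, multiple=1.0, free_space_gb=120))   # 40.0
```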
Result
  [Figure: global reliability (probability of no data loss), axis from 0 to 1.2, plotted against global FG (storage space as a multiple of data size), axis from 2 to 7. Series: Space-fractional, Data-proportional]
Extend: some sites > others
  May prefer certain sites
    More reliable
    Better reputation
    Part of same system
  Example: who to trade with?
  [Figure: site A with candidate partners marked “?”]
Who to trade with?
  [Figure: average local data MTTF (axis ticks 1, 10, 100, 1000, 10000) plotted against local site reliability (0.5 to 0.9). Series: Clustering, MostReliable, ClosestReliability]
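Two of the partner-selection policies named in the plot can be sketched directly from their names. This reading is an assumption: MostReliable is taken to pick the most reliable other site, and ClosestReliability the site whose reliability is nearest to the local site's own. Clustering is left out because the slides do not define it. The function names and the example reliabilities are hypothetical.

```python
from typing import Dict


def most_reliable(local: str, reliability: Dict[str, float]) -> str:
    """Pick the other site with the highest reliability (assumed policy meaning)."""
    candidates = [site for site in reliability if site != local]
    if not candidates:
        raise ValueError("no other site to trade with")
    return max(candidates, key=lambda site: reliability[site])


def closest_reliability(local: str, reliability: Dict[str, float]) -> str:
    """Pick the other site whose reliability is nearest to the local site's."""
    candidates = [site for site in reliability if site != local]
    if not candidates:
        raise ValueError("no other site to trade with")
    own = reliability[local]
    return min(candidates, key=lambda site: abs(reliability[site] - own))


# Made-up reliabilities for four sites.
sites = {"A": 0.6, "B": 0.9, "C": 0.65, "D": 0.5}
print(most_reliable("A", sites))        # B
print(closest_reliability("A", sites))  # C
```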
Extend: freedom to negotiate
  Bid for trades
  [Figure: site A asks “How much do I pay for 100 GB of your space?” and receives the bids “80 GB”, “95 GB”, and “120 GB”]
Bid trading
  Questions
    When do I call auctions?
    How much do I bid?
    Can I take advantage of the system by being clever?
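A minimal sketch of resolving one auction, using the bids from the example above. It assumes that each bid is the amount of space the asking site must give in return and that the asker simply accepts the cheapest offer; the slides leave the actual bidding and selection policies open. The bidder names and the function name are hypothetical.

```python
from typing import Dict, Tuple


def pick_winning_bid(bids_gb: Dict[str, int]) -> Tuple[str, int]:
    """Return the bidder asking the lowest price, with that price in GB.

    Assumes the asking site wants to pay as little space as possible.
    """
    if not bids_gb:
        raise ValueError("no bids received")
    winner = min(bids_gb, key=lambda bidder: bids_gb[bidder])
    return winner, bids_gb[winner]


# Prices quoted for 100 GB of space; bidder names are made up.
bids = {"B": 80, "C": 95, "D": 120}
print(pick_winning_bid(bids))  # ('B', 80)
```

A real policy would likely weigh the bidder's reliability as well as its price, which is one of the open questions listed above.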
Extend: some sites are malicious
  Secure services
    Publish: Makes copies to survive failures
    Search: Find documents
    Retrieve: Get a copy of a document
  Challenges
    Attacker may delete copy
    Attacker may provide fake search results
    Attacker may provide altered document
    Attacker may disrupt message routing
    …
  Joint work with Mayank Bawa and Neil Daswani
Current and future work
  Access
    Support searching over collections
    Distribute indexes via trading
  Prototype implementation
    Basic SAV architecture implemented
    Trading protocol/policies must be added
  Develop security techniques further
Current and future work
  Other topics of interest
    Designing peer-to-peer primitives
      Building other p2p services
    Other ways of acquiring data
      How to archive active systems
    Semantic archiving
      Managing “format obsolescence”
      Finding data once it is archived
Other parts of SAV project
  SAV data model
    Write-once objects
    Signature-based naming
  How to get objects into SAV
    InfoMonitor – filesystem
    Other inputs (Web, DBMS, etc.)
  Modeling archival repositories
    Arturo Crespo
    Choose best components and design
Related work
  Peer-to-peer replication
    SAV, Intermemory, LOCKSS, OceanStore…
  Fault tolerant systems
    RAID, mirrored disks, replicated databases
    Caching systems (Andrew, Coda)
    Deep storage (Tivoli)
  Barter/auction based systems
    ContractNet
  Distributed resource allocation
    File Allocation Problem
Conclusion
  Important, exciting area
    Preservation critical
    Difficult to accomplish
  Many decisions are ad hoc today
    An effective framework is needed
    Scientific evaluation of decisions
  Trading networks replicate data
    Model for trading networks
    Trading algorithm
    Simulation results
  [Figure: trading network linking sites A through H]