1p2p, Fall 06
Topics in Database Systems: Data Management in Peer-to-Peer Systems
PART 1: Replication and other issues
Agenda for today
1. Description of the course projects
2. Replication in general
3. Replication Theory for Unstructured (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
Term Projects
Three types of projects
They have a «research» character: you will need to think
There is no single solution (so the same project may be given to more than one team)
I would like 3 people per team
If you have another idea, it can be done, but not «automatically»
You will build a web page for the project, which you will send to me
Replicate content and not index (for durability)!
Term Projects
TYPE I PROJECT
=================
You will choose one article from a list of articles. The articles concern data management problems either in centralized systems or in distributed systems that lack the properties of peer-to-peer systems. The goal of the project is to design a version of the problem suitable for a peer-to-peer system. Your project must include some form of evaluation of your approach. This can be theoretical (e.g., an estimate of the complexity of the solution, or a proof of its correctness or of other properties such as load balancing) and/or include a small implementation. You will submit a paper in the form of a research paper (instructions will be given). You will also present your project in class (instructions will be given).
Term Projects
Articles for Type I Projects
[1-3] Choose any (one) of sections 3, 4 or 5 of: M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin and A. Yu. Mariposa: A Wide-Area Distributed Database System. VLDB J., 5(1), 1996, 48-63.
[4] Examine how the following, which we discussed in class, can be adapted for p2p: A. J. Demers, D. H. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. E. Sturgis, D. C. Swinehart, D. B. Terry: Epidemic Algorithms for Replicated Database Maintenance. PODC 1987: 1-12
[5] Consider a distributed (p2p) version of a bitmap index. For bitmap indexes you may consult any database textbook and/or the following: P. E. O'Neil and D. Quass. Improved Query Performance with Variant Indexes. Proc. SIGMOD Conference, 1997, 38-49.
[6] Examine how the following, which concerns sensor networks, can be applied to p2p systems: D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas, D. Srivastava. The Threshold Join Algorithm for Top-k Queries in Distributed Sensor Networks, DMSN Workshop, 2005.
Term Projects
TYPE II PROJECT
===================
You will choose an article on topics in the area of peer-to-peer systems that we have not covered in class, specifically: (i) security, (ii) trust/reputation, (iii) incentives, (iv) publish-subscribe systems.
Presentation of the article in class.
Either (a) propose some extension of the article, e.g., applying it to another type of overlay, improving one of its characteristics, etc. In this case, you must also include some form of evaluation of the extension. This can be theoretical (e.g., an estimate of the complexity of the solution) and/or include a small implementation. Or (b) implement a substantial part of the article.
You will submit a paper in the form of a research paper (instructions will be given).
You will also give a second presentation in class, this time of your own project (instructions will be given).
Term Projects
Articles for Type II Projects
• Security E. Sit and R. Morris: Security Considerations for Peer-to-Peer Distributed Hash Tables. IPTPS 2002: 261-269 D. S. Wallach: A Survey of Peer-to-Peer Security Issues. ISSS 2002: 42-57
• Incentives M. Feldman, K. Lai, I. Stoica and J. Chuang: Robust incentive techniques for peer-to-peer networks. ACM Conference on Electronic Commerce 2004: 102-111
• Trust/Reputation S. D. Kamvar, M. T. Schlosser, H. Garcia-Molina: The Eigentrust algorithm for reputation management in P2P networks. WWW 2003: 640-651
• Publish/subscribe M. Bender, S. Michel, S. Parkitny, and G. Weikum A Comparative Study of Pub/Sub Methods in Structured P2P Networks. DBISP2P 2006, Seoul, South Korea, Springer, 2006
Term Projects
TYPE III PROJECT
===================
You will choose one of the peer-to-peer software systems listed.
You will need to install the relevant software and build a small application.
You will submit a paper that includes a short manual for the system and a description of your application.
You will also present your project in class (instructions will be given). The presentation must include a short demo.
Term Projects
Systems for Type III Projects
[1] OpenDHT: OpenDHT is a publicly accessible distributed hash table (DHT) service.
[2] P2: Declarative Networking: P2 is a system which uses a high-level declarative language to express overlay networks in a highly compact and reusable form.
[3] PeerSim: PeerSim is a simulation environment for P2P protocols in Java.
Term Projects
Deadlines:
Dec 7: Team formation and project selection
Dec 14: 1-2 page "project proposal" (instructions will be given)
Dec 21: we may have a short presentation/discussion of the projects in the last week before Christmas
Jan 11: Presentations of the Type II articles
Jan 18: Presentations of the Type II articles
Jan 25: Project submission (for the paper, instructions will be given)
There will be a final workshop in which the projects of all teams will be presented.
Agenda for today
1. Description of the course projects
2. Replication in general
3. Replication Theory for Unstructured (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
Types of Replication
Two types of replication
Metadata/Index: replicate index entries
Data/Document replication: replicate the actual data (e.g., music files)
Metadata vs Data
(+) “Lighter” storage and bandwidth wise
(+) Sizes of replicated objects more uniform
(-) Adds an extra hop for actually getting the data
(-) More frequent updates
(-) Less durability/availability
Types of Replication
Caching vs Replication
Cache: Store data retrieved from a previous request (client-initiated)
Replication: More proactive, a copy of a data item may be stored at a node even if the node has not requested it
Reasons for Replication
Performance
– load balancing
– locality: place copies close to the requestor
– geographic locality (more choices for the next step in search)
– reduce number of hops
Availability
– In case of failures
– Peer departures
Reasons for Replication
Besides storage, cost associated with replication: Consistency Maintenance
Make reads faster at the expense of slower writes
• No proactive replication (Gnutella)
– Hosts store and serve only what they requested
– A copy can be found only by probing a host with a copy
• Proactive replication of “keys” (= meta data + pointer) for search efficiency (FastTrack, DHTs)
• Proactive replication of “copies” – for search and download efficiency, anonymity. (Freenet)
Issues
Which items (data/metadata) to replicate
Based on popularity
In traditional distributed systems, also rate of read/write
Cost benefit: the ratio read-savings/write-increase
Where to replicate (allocation schema)
Issues
How/When to update
Both data items and metadata
“Database-Flavored” Replication Control Protocols
Let's assume the existence of a data item x with copies x1, x2, …, xn
x: logical data item
xi's: physical data items
A replication control protocol is responsible for mapping each read/write on a logical data item (R(x)/W(x)) to a set of read/writes on a (possibly) proper subset of the physical data item copies of x
One Copy Serializability
Correctness
A DBMS for a replicated database should behave like a DBMS managing a one-copy (i.e., non-replicated) database insofar as users can tell
One-copy serializable (1SR)
The schedule of transactions on a replicated database is equivalent to a serial execution of those transactions on a one-copy database
One-copy schedule: replace operations on data copies with operations on data items
ROWA
Read One/Write All (ROWA)
A replication control protocol that maps each read to only one copy of the item and each write to a set of writes on all physical data item copies.
If even one of the copies is unavailable, an update transaction cannot terminate
Write-All-Available
A replication control protocol that maps each read to only one copy of the item and each write to a set of writes on all available physical data item copies.
Quorum-Based Voting
A transaction must collect a read quorum $V_r$ to read, or a write quorum $V_w$ to write, a data item
If a given data item has a total of $V$ votes, the quorums have to obey the following rules:
1. $V_r + V_w > V$
2. $V_w > V/2$
Rule 1 ensures that a data item is not read and written by two transactions concurrently (R/W)
Rule 2 ensures that two write operations from two transactions cannot occur concurrently on the same data item (W/W)
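The two quorum rules can be checked mechanically. The sketch below is only an illustration; the function names and the one-vote-per-copy assumption in `rowa_quorums` are hypothetical and do not come from the slides.

```python
def valid_quorums(v_total, v_read, v_write):
    """Return True if the read/write quorums satisfy both voting rules."""
    # Rule 1: every read quorum overlaps every write quorum (no R/W conflict).
    rule1 = v_read + v_write > v_total
    # Rule 2: any two write quorums overlap (no W/W conflict).
    rule2 = 2 * v_write > v_total
    return rule1 and rule2


def rowa_quorums(n_copies):
    """Read-one/write-all seen as a quorum choice, assuming one vote per copy."""
    return 1, n_copies
```

For example, with V = 5 votes the pair (Vr, Vw) = (3, 3) is valid, while (2, 3) violates rule 1 and (4, 2) violates rule 2.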
Distributing Writes
Immediate writes
Deferred writes
Access only one copy of the data item; the distribution of writes to other sites is delayed until the transaction has terminated and is ready to commit.
It maintains an intention list of deferred updates
After the transaction terminates, it sends the appropriate portion of the intention list to each site that contains replicated copies
Optimizations – aborts cost less – may delay commitment – delays the detection of conflicts
Primary or master copy
Updates at a single copy per item
Eager vs Lazy Replication
Eager replication: keeps all replicas synchronized by updating all replicas in a single transaction
Lazy replication: asynchronously propagates replica updates to other nodes after the updating transaction commits
In p2p: lazy replication (or soft state)
Update Propagation
Stateless or stateful (the “item-owners” know which nodes hold copies of the item)
Who initiates the update:
Push by the server item (copy) that changes
Pull by the client holding the copy
Update Propagation
When
Periodic
Immediate
Lazy: when an inconsistency is detected
Threshold-based: Freshness (e.g., number of updates or actual time), Value
Expiration-Time: Items expire (become invalid) after that time (most often used in p2p)
Adaptive periodic: Reduce or increase the period based on the updates seen between two successive updates
Stateless or stateful (the “item-owners” know which nodes hold copies of the item)
Summary: Design parameters and performance (CAN)
Parameter | Path length | Neighbor state | Total path latency | Per-hop latency | Volume | Multiple routes | Replicas
Dimensions (d) | $O(d\,n^{1/d})$ | $O(d)$ | ? | - | - | ? | -
Realities (r) | ? | $O(r)$ | ? | - | $O(r)$ | ? | $O(r)$
MAXPEERS (p) | $O(1/p)$ | $O(p)$ | ? | ? | $O(p)$* | ? | $O(p)$*
Hash functions (k) | - | - | ? | - | $O(k)$ | - | $O(k)$
RTT-weighted routing | - | - | ? | ? | - | - | -
Uniform partitioning heuristic | Reduced variance | Reduced variance | - | - | Reduced variance | - | -
* Only on replicated data
? Entry not recoverable from the source
CHORD: Failures
Replication
Each node maintains a successor list of its r nearest successors
Upon failure, use the next successor in the list
Modify stabilize to fix the list
Other nodes may attempt to send requests through the failed node
Use alternate nodes found in the routing table of preceding nodes or in the successor list
CHORD: Failures
A lookup fails if all r nodes in the successor list fail. With independent failures, each with probability 1/2, all r fail with probability $2^{-r}$, which equals $1/N$ for $r = \log_2 N$
Theorem: If we use a successor list with $r = O(\log N)$ in an initially stable network and then every node fails with probability 1/2, then
with high probability, find_successor returns the closest living successor
the expected time to execute find_successor in the failed network is $O(\log N)$
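A small numeric check of the successor-list argument, assuming independent node failures. The helper names are hypothetical; only the probability 2^-r and the choice of r on the order of log N come from the slide.

```python
import math


def all_successors_fail(r, p_fail=0.5):
    """Probability that all r successors fail, assuming independent failures."""
    return p_fail ** r


def successor_list_length(n_nodes):
    """Smallest r with 2**-r <= 1/N, i.e. r = ceil(log2 N)."""
    return math.ceil(math.log2(n_nodes))
```

With N = 1024 nodes this gives r = 10, and the probability that a whole successor list is lost is 1/1024 = 1/N.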
CHORD: Replication
Store replicas of a key at the k nodes succeeding the key
Successor list helps to keep the number of replicas per item known
Other approach: store a copy per region
BATON: Failures
Upon node departure or failure, the parent can reconstruct the entries
Assume node x fails; any detected failures of x are reported to its parent y
y regenerates the routing tables of x – Theorem 2
Messages are routed
Sideways (redundancy similar to CHORD)
Up-down (can find its parent through its neighbors)
There is “routing” redundancy
Replication - Beehive
Proactive – model-driven replication
Passive (demand-driven) replication such as caching objects along a lookup path
Hint for BATON
Beehive
The length of the average query path is reduced by one when an object is proactively replicated at all nodes logically preceding that node on all query paths
BATON
Range queries
Many paths to data
Any ideas?
Agenda for today
1. Description of the course projects
2. Replication in general
3. Replication Theory for Unstructured (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
Replication Theory: Replica Allocation Policies in Unstructured P2P Systems
E. Cohen and S. Shenker, “Replication Strategies in Unstructured Peer-to-Peer Networks”, SIGCOMM 2002
Q. Lv et al, “Search and Replication in Unstructured Peer-to-Peer Networks”, ICS’02 – Replication Part
Replication: Allocation Scheme
Question: how to use replication to improve search efficiency in unstructured networks?
How many copies of each object should there be so that the search overhead for the object is minimized, assuming that the total amount of storage for objects in the network is fixed?
Replication Theory - Model
Assume m objects and n nodes
Each node has capacity $\rho$; total capacity $R = n\rho$
How to allocate R among the m objects?
Determine $r_i$, the number of copies of object i (distinct nodes that hold a copy of i)
$\sum_{i=1}^{m} r_i = R$ (R is the total capacity)
Also, $p_i = r_i/R$: fraction of total capacity allocated to i
Allocation represented by the vector
$(p_1, p_2, \ldots, p_m) = (r_1/R, r_2/R, \ldots, r_m/R)$
Replication Theory - Model
Assume that object i is requested with relative rate $q_i$; we normalize by setting
$\sum_{i=1}^{m} q_i = 1$
For convenience, assume $1 \ll r_i \le n$ and that $q_1 \ge q_2 \ge \ldots \ge q_m$
Map the query distribution q to an allocation vector p
Replication Theory - Model
Assume all nodes have equal capacity $\rho = R/n$
$R \ge m$ (at least one copy per item)
$m > \rho$ (else the problem is trivial: maintain copies of all items everywhere)
Bounds for $p_i$
At least one copy, $r_i \ge 1$: lower value $l = 1/R$
At most n copies, $r_i \le n$: upper value $u = n/R$
Replication Theory
Assume that searches go on until a copy is found
We want to determine the $r_i$ that minimize the average search size (number of nodes probed) to locate an item i
Need to compute average search size per item
Searches consist of randomly probing sites until the desired object is found: search at each step draws a node uniformly at random and asks whether it has a copy
Search Example
[Figure: two random-probe searches, one finding the object after 2 probes and one after 4 probes]
Replication Theory
The probability Pr(k) that object i is found at the k-th probe is given by
$\Pr(k) = \Pr(\text{not found in the previous } k-1 \text{ probes}) \cdot \Pr(\text{found in the } k\text{-th probe}) = (1 - r_i/n)^{k-1} \cdot r_i/n$
k (search size: the step at which the item is found) is a random variable with geometric distribution and parameter $\theta = r_i/n$, so its expectation is $n/r_i$
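The expected search size n/ri can be checked with a short simulation of the random-probe model. This is a sketch under the stated model only (each probe draws a node uniformly at random, with replacement); the function names, the seed, and the convention that nodes 0..r-1 hold the replicas are assumptions made for illustration.

```python
import random


def probes_until_found(n, r, rng):
    """Probe nodes uniformly at random until one holding a replica is hit.

    Nodes 0..r-1 are assumed to hold the object (an arbitrary labeling).
    """
    probes = 0
    while True:
        probes += 1
        if rng.randrange(n) < r:
            return probes


def average_search_size(n, r, trials=20000, seed=1):
    """Empirical mean number of probes over many independent searches."""
    rng = random.Random(seed)
    return sum(probes_until_found(n, r, rng) for _ in range(trials)) / trials
```

With n = 100 nodes and r = 5 replicas, the empirical mean should come out close to n/r = 20.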
Replication Theory
$A_i$: the expectation (average search size) for object i is the inverse of the fraction of sites that have replicas of the object
$A_i = n/r_i$
The average search size A over all objects (average number of nodes probed per object query)
$A = \sum_i q_i A_i = n \sum_i q_i/r_i$
Minimize: $A = n \sum_i q_i/r_i$
Replication Theory
If we have no limit on $r_i$, replicate everything everywhere
Then, the average search size is
$A_i = n/r_i = 1$
Search becomes trivial
Assume a limit on R and that the average number of replicas per site $\rho = R/n$ is fixed
How to allocate these R replicas among the m objects: how many replicas per object?
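Replication Theory
Minimize: $\sum_i q_i/p_i$ (since $A = n \sum_i q_i/r_i = (1/\rho) \sum_i q_i/p_i$)
Subject to $\sum_i p_i = 1$ and $l \le p_i \le u$
Monotonicity
Since $q_1 \ge q_2 \ge \ldots \ge q_m$, we must have
$p_1 \ge p_2 \ge \ldots \ge p_m$
More copies to more popular objects, but how many?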
Uniform Replication
Create the same number of replicas for each object
$r_i = R/m$
Average search size for uniform replication
$A_i = n/r_i = m/\rho$
$A_{uniform} = \sum_i q_i\, m/\rho = m/\rho$ (that is, $m n/R$)
which is independent of the query distribution
Proportional Replication
Create a number of replicas for each object proportional to the query rate
$r_i = R q_i$
It makes sense to allocate more copies to objects that are frequently queried; this should reduce the search size for the more popular objects
Proportional Replication
Number of replicas for each object:
$r_i = R q_i$
Average search size for proportional replication
$A_i = n/r_i = n/(R q_i)$
$A_{proportional} = \sum_i q_i \cdot n/(R q_i) = m n/R = m/\rho = A_{uniform}$
again independent of the query distribution
Why? Objects whose query rates are greater than average (> 1/m) do better with proportional, and the others do better with uniform
The weighted average balances out to be the same
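A worked check that uniform and proportional allocation give the same average search size. The numbers are hypothetical (n = 120 nodes, capacity 1 per node, so R = 120, and three objects with query rates 1/2, 1/3, 1/6); exact fractions are used so the equality is not blurred by rounding.

```python
from fractions import Fraction


def average_search_size(q, r, n):
    """A = n * sum_i q_i / r_i for query rates q and replica counts r."""
    return n * sum(qi / ri for qi, ri in zip(q, r))


def uniform_allocation(q, R):
    """r_i = R / m for every object."""
    return [Fraction(R, len(q))] * len(q)


def proportional_allocation(q, R):
    """r_i = R * q_i."""
    return [R * qi for qi in q]


# Hypothetical setting: n = 120 nodes, rho = 1, hence R = 120 and m / rho = 3.
n, R = 120, 120
q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
a_uniform = average_search_size(q, uniform_allocation(q, R), n)
a_proportional = average_search_size(q, proportional_allocation(q, R), n)
# Per-object search sizes under proportional allocation.
per_object_proportional = [n / ri for ri in proportional_allocation(q, R)]
```

Both averages equal m/ρ = 3. Per object, proportional allocation gives search sizes 2, 3 and 6, against 3 for every object under uniform: the most popular object gains, the least popular one loses, and the query-weighted average is unchanged.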
Uniform and Proportional Replication
Summary:
• Uniform Allocation: $p_i = 1/m$
– Simple, resources are divided equally
• Proportional Allocation: $p_i = q_i$
– “Fair”, resources per item proportional to demand
– Reflects current P2P practices
Example: 3 items, $q_1 = 1/2$, $q_2 = 1/3$, $q_3 = 1/6$
[Figure: Uniform and Proportional allocations for the three items]
Space of Possible Allocations
$q_{i+1}/q_i$ ? $p_{i+1}/p_i$
As the query rate decreases, how does the ratio of allocated replicas behave?
Reasonable: $p_{i+1}/p_i \le 1$ (= 1 for uniform)
So what is the optimal way to allocate replicas so that A is minimized?
Space of Possible Allocations
Definition: Allocation $p_1, p_2, p_3, \ldots, p_m$ is “in-between” Uniform and Proportional if
for $1 < i < m$, $q_{i+1}/q_i < p_{i+1}/p_i < 1$
($p_{i+1}/p_i = 1$ for uniform, $p_{i+1}/p_i = q_{i+1}/q_i$ for proportional; we want to favor popular objects, but not too much)
Theorem 1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional
Theorem 2: p is worse than Uniform/Proportional if for all i, $p_{i+1}/p_i > 1$ (more popular gets less) OR for all i, $q_{i+1}/q_i > p_{i+1}/p_i$ (less popular gets less than “fair share”)
Proportional and Uniform are the worst “reasonable” strategies
[Figure: Space of allocations on 2 items. Axes: q2/q1 and p2/p1. Regions: worse than prop/uni (more popular item gets less); worse than prop/uni (more popular gets more than its proportional share); better than prop/uni. Marked allocations: Uniform, Proportional, SR]
So, what is the best strategy?
Square-Root Replication
Find the $r_i$ that minimize A,
$A = \sum_i q_i A_i = n \sum_i q_i/r_i$
This is achieved for $r_i = \lambda \sqrt{q_i}$, where $\lambda = R/\sum_i \sqrt{q_i}$
Then the average search size is
$A_{optimal} = (1/\rho) \left(\sum_i \sqrt{q_i}\right)^2$
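The square-root allocation and its average search size can be computed directly. The sketch below reuses hypothetical numbers (n = 120, R = 120, query rates 1/2, 1/3, 1/6) and treats replica counts as real numbers, ignoring rounding to whole copies.

```python
import math


def square_root_allocation(q, R):
    """r_i = lambda * sqrt(q_i), with lambda = R / sum_i sqrt(q_i)."""
    lam = R / sum(math.sqrt(qi) for qi in q)
    return [lam * math.sqrt(qi) for qi in q]


def average_search_size(q, r, n):
    """A = n * sum_i q_i / r_i."""
    return n * sum(qi / ri for qi, ri in zip(q, r))


# Hypothetical setting: rho = R / n = 1 and m = 3 objects.
n, R = 120, 120
q = [1 / 2, 1 / 3, 1 / 6]
r_sr = square_root_allocation(q, R)
a_sr = average_search_size(q, r_sr, n)
a_uniform = len(q) * n / R  # m / rho, also the value for proportional
```

For these rates the square-root allocation gives an average search size of about 2.87, below the value 3 shared by uniform and proportional, and it matches the closed form (1/ρ)(Σ √qi)².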
How much can we gain by using SR?
[Figure: $A_{uniform}/A_{SR}$ for Zipf-like query rates $q_i \propto 1/i^{w}$]
Other Metrics: Discussion
Utilization rate: the rate of requests that a replica of an object i receives
$U_i = R q_i/r_i$
For uniform replication, all objects have the same average search size, but replicas have utilization rates proportional to their query rates
Proportional replication achieves perfect load balancing with all replicas having the same utilization rate, but average search sizes vary with more popular objects having smaller average search sizes than less popular ones
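The load-balancing contrast can be seen by evaluating the utilization rate for both allocations. The setting is hypothetical (R = 120, query rates 1/2, 1/3, 1/6); exact fractions keep the result readable.

```python
from fractions import Fraction


def utilization(q, r, R):
    """U_i = R * q_i / r_i: rate of requests seen by each replica of object i."""
    return [R * qi / ri for qi, ri in zip(q, r)]


R = 120
q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
u_uniform = utilization(q, [Fraction(R, len(q))] * len(q), R)
u_proportional = utilization(q, [R * qi for qi in q], R)
```

Under uniform allocation the utilization rates are 3/2, 1 and 1/2, following the query rates, while under proportional allocation every replica sees the same rate 1.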
Replication: Summary
Pareto Distribution (for the queries)
Both model power-law distributions
Zipf: what is the size (popularity) of the r-th ranked item? $y \sim r^{-b}$
Pareto: how many have size > r? (look at the frequency distribution)
$P[X > x] \sim x^{-k}$
$P[X = x] \sim x^{-(k+1)} = x^{-a}$
"The r-th hottest item has n queries" is equivalent to saying "r items have n or more queries". This is exactly the definition of the Pareto distribution, except the x and y axes are flipped. Whereas for Zipf we have r (rank) and compute n, in Pareto we have n and compute r (rank)
Reference: http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
Replication (summary)
Each object i is replicated on $r_i$ nodes and the total number of object copies stored is R, that is
$\sum_{i=1}^{m} r_i = R$
(1) Uniform: All objects are replicated at the same number of nodes
$r_i = R/m$
(2) Proportional: The replication of an object is proportional to the query probability of the object
$r_i \propto q_i$
(3) Square-root: The replication of an object i is proportional to the square root of its query probability $q_i$
$r_i \propto \sqrt{q_i}$
What is the search size of a query ?
Soluble queries: number of probes until answer is found.
Insoluble queries: maximum search size
Query is soluble if there are sufficiently many copies of the item.
Query is insoluble if item is rare or non existent.
Assumption that there is at least one copy per object
What is the optimal strategy?
• SR is best for soluble queries
• Uniform minimizes cost of insoluble queries
OPT is a hybrid of Uniform and SR
Tuned to balance cost of soluble and insoluble queries: uniformly allocate a minimum number of copies per item, use SR for the rest
[Figure: optimal allocation between Uniform and SR for 10^4 items with Zipf-like query rates (w = 1.5); cases shown: All Soluble, 85% Soluble, All Insoluble]
We now know what we need.
How do we get there?
Replication Algorithms
Desired properties of algorithm:
• Fully distributed, where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search.
• Converge to/obtain SR allocation when query rates remain steady.
Uniform and Proportional are “easy”
– Uniform: When item is created, replicate its key in a fixed number of hosts.
– Proportional: for each query, replicate the key in a fixed number of hosts (need to know or estimate the query rate)
Replication - Implementation
Two strategies are popular
Owner Replication
When a search is successful, the object is stored at the requestor node only (used in Gnutella)
Path Replication
When a search succeeds, the object is stored at all nodes along the path from the requestor node to the provider node, following the reverse path back to the requestor (used in Freenet)
Achieving Square-Root Replication
How can we achieve square-root replication in practice?
Assume that each query keeps track of the search size
Each time a query is finished, the object is copied to a number of sites proportional to the number of probes
On average, object i will be replicated $c\,n/r_i$ times each time a query is issued (for some constant c)
It can be shown that this gives square-root replication
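One way to see why copying in proportion to the search size leads to square-root replication is a simple rate-balance model. The sketch below is an assumption-laden illustration, not the argument of the paper: queries for object i arrive at rate qi, each creates c·n/ri copies, and every copy is deleted at the same rate regardless of the object. Creation and deletion then balance when qi·c·n/ri = delete_rate·ri, i.e. ri proportional to √qi. All parameter names and values are hypothetical.

```python
import math


def steady_state_replicas(q, n=100, c=1.0, delete_rate=1.0, dt=0.01, steps=5000):
    """Euler integration of dr_i/dt = q_i * c * n / r_i - delete_rate * r_i."""
    r = [1.0] * len(q)  # start with one copy of every object
    for _ in range(steps):
        r = [ri + dt * (qi * c * n / ri - delete_rate * ri)
             for qi, ri in zip(q, r)]
    return r


q = [1 / 2, 1 / 3, 1 / 6]
r = steady_state_replicas(q)
# At the fixed point, the ratio of replica counts is the square root
# of the ratio of query rates.
ratio = r[0] / r[2]
```

For query rates 1/2 and 1/6 the replica counts settle at a ratio of √3, not 3 as proportional allocation would give.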
Replication - Conclusion
Thus, for Square-root replication
an object should be replicated at a number of nodes that is proportional to the number of probes that the search required
Replication - Implementation
If a p2p system uses k-walkers, the number of nodes between the requestor and the provider node is 1/k of the total nodes visited (number of probes)
Then, path replication should result in square-root replication
Problem: Tends to replicate objects at nodes that are topologically along the same path
Replication - Implementation
Random Replication
When a search succeeds, we count the number of nodes on the path between the requestor and the provider, say p
Then, randomly pick p of the nodes that the k walkers visited to replicate the object
Harder to implement
Achieving Square-Root Replication
What about replica deletion?
Steady state: replica creation rate equal to the deletion rate
The lifetime of replicas must be independent of object identity or query rate
FIFO or random deletion is ok
LRU or LFU is not
Replication: Evaluation
Study the three replication strategies in the Random graph network topology
Simulation Details
• Place the m distinct objects randomly into the network
• Query generator generates queries according to a Poisson process at 5 queries/sec
• Zipf-distribution of queries among the m objects (with a = 1.2)
• For each query, the initiator is chosen randomly
• Then a 32-walker random walk with state keeping and checking every 4 steps
• Each site stores at most objAllow (40) objects
• Random Deletion
• Warm-up period of 10,000 secs
• Snapshots every 2,000 query chunks
Replication: Evaluation
For each replication strategy
What kind of replication ratio distribution does the strategy generate?
What is the average number of messages per node in a system using the strategy?
What is the distribution of the number of hops in a system using the strategy?
Evaluation: Replication Ratio
Both path and random replication generate replication ratios quite close to the square root of the query rates
Evaluation: Messages
Path replication and random replication reduce the overall message traffic by a factor of 3 to 4
Evaluation: Hops
Much of the traffic reduction comes from reducing the number of hops
Path and random are better than owner
For example, queries that finish within 4 hops: 71% owner, 86% path, 91% random
Summary
• Random Search/replication Model: probes to “random” hosts
• Proportional allocation – current practice
• Uniform allocation – best for insoluble queries
• Soluble queries:
– Proportional and Uniform allocations are two extremes with same average performance
– Square-Root allocation minimizes Average Search Size
• OPT (all queries) lies between SR and Uniform
• SR/OPT allocation can be realized by simple algorithms.
Discussion
Cohen et al paper
Path replication overshoots or undershoots the fixed point if queries arrive in large bursts or if the time between a search and the subsequent copy generation is large; this calls for more involved algorithms than path replication
Extensions for variable item sizes or nodes with heterogeneous capacities
Many issues: other types of graphs, adaptability, etc.
Agenda for today
1. Description of the course projects
2. Replication in general
3. Replication Theory for Unstructured (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
Replication & Unstructured P2P
epidemic algorithms
Replication Policy
How many copies
Where (owner, path, random path)
Update Policy
Synchronous vs Asynchronous
Master Copy
Methods for spreading updates:
Push: originates from the site where the update appeared, to reach the sites that hold copies
Pull: the sites holding copies contact the master site
Expiration times
Epidemics for spreading updates
Update at a single site
Randomized algorithms for distributing updates and driving replicas towards consistency
Ensure that the effect of every update is eventually reflected in all replicas:
Sites become fully consistent only when all updating activity has stopped and the system has become quiescent
Analogous to epidemics
A. Demers et al, Epidemic Algorithms for Replicated Database Maintenance, PODC 87
Methods for spreading updates:
Direct mail: each new update is immediately mailed from its originating site to all other sites
(+) Timely & reasonably efficient
(-) Not all sites know all other sites (stateless)
(-) Mails may be lost
Anti-entropy: every site regularly chooses another site at random and by exchanging content resolves any differences between them
(+) Extremely reliable, but requires exchanging content and resolving updates
(-) Propagates updates much more slowly than direct mail
Methods for spreading updates:
Rumor mongering:
Sites are initially “ignorant”; when a site receives a new update it becomes a “hot rumor”
While a site holds a hot rumor, it periodically chooses another site at random and ensures that the other site has seen the update
When a site has tried to share a hot rumor with too many sites that have already seen it, the site stops treating the rumor as hot and retains the update without propagating it further
Rumor cycles can be more frequent than anti-entropy cycles, because they require fewer resources at each site, but there is a chance that an update will not reach all sites
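A minimal simulation sketch of rumor mongering, under assumptions that go beyond the slide: sites act in synchronous cycles, an infective site pushes to one other site chosen uniformly at random, and it stops spreading after `k` contacts with sites that already knew the update. The function name, the parameter `k`, and the state labels are hypothetical.

```python
import random


def rumor_mongering(n, k=2, seed=0):
    """Spread one update from site 0; return (residue, cycles).

    residue is the fraction of sites that never received the update.
    Requires n >= 2.
    """
    rng = random.Random(seed)
    state = ["susceptible"] * n
    unnecessary = [0] * n  # contacts with sites that already had the update
    state[0] = "infective"
    cycles = 0
    while "infective" in state:
        cycles += 1
        # Sites infected during this cycle start spreading in the next one.
        for s in [i for i in range(n) if state[i] == "infective"]:
            other = rng.randrange(n - 1)
            if other >= s:
                other += 1  # uniform choice among the sites other than s
            if state[other] == "susceptible":
                state[other] = "infective"
            else:
                unnecessary[s] += 1
                if unnecessary[s] >= k:
                    state[s] = "removed"  # the rumor is no longer hot at s
    return state.count("susceptible") / n, cycles
```

Calling `rumor_mongering(1000, k=2)` returns the fraction of sites the update missed, which illustrates the chance that an update does not reach all sites; larger `k` trades extra traffic for a smaller residue.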
Anti-entropy and rumor spreading are examples of epidemic algorithms
Three types of sites:
Infective: a site that holds an update and is willing to share it
Susceptible: a site that has not yet received an update
Removed: a site that has received an update but is no longer willing to share it
Anti-entropy: simple epidemic where all sites are always either infective or susceptible
A set S of n sites, each storing a copy of a database
The database copy at site $s \in S$ is a time-varying partial function
s.ValueOf: $K \to \{u : V \times t : T\}$
K: set of keys; V: set of values; T: set of timestamps (totally ordered by <)
V contains the element NIL
s.ValueOf[k] = (NIL, t): the item with key k has been deleted from the database
Assume just one item
s.ValueOf $\in \{u : V \times t : T\}$
thus, an ordered pair consisting of a value and a timestamp
The first component may be NIL, indicating that the item was deleted at the time indicated by the second component
The goal of the update distribution process is to drive the system towards
$\forall s, s' \in S$: s.ValueOf = s'.ValueOf
Operation invoked to update the database
Update[u:V] ≡ s.ValueOf ← (u, Now[])
Direct Mail
s: originator of the update; s': receiver of the update
At the site s where an update occurs:
  For each s' ∈ S
    PostMail[to: s', msg: ("Update", s.ValueOf)]
Each site s' receiving the update message ("Update", (u, t)):
  If s'.ValueOf.t < t
    s'.ValueOf ← (u, t)
The complete set S must be known to s (stateful server)
PostMail messages are queued so that the server is not delayed (asynchronous), but may fail when queues overflow or their destinations are inaccessible for a long time
n (number of sites) messages per update
Traffic proportional to n and the average distance between sites
Anti-Entropy
At each site s periodically execute:
  For some s' ∈ S
    ResolveDifference[s, s']
Three ways to execute ResolveDifference:
Push (sender (server) driven): s pushes its value to s'
  If s.ValueOf.t > s'.ValueOf.t
    s'.ValueOf ← s.ValueOf
Pull (receiver (client) driven): s pulls s' and gets the value of s'
  If s.ValueOf.t < s'.ValueOf.t
    s.ValueOf ← s'.ValueOf
Push-Pull
  If s.ValueOf.t > s'.ValueOf.t then s'.ValueOf ← s.ValueOf
  If s.ValueOf.t < s'.ValueOf.t then s.ValueOf ← s'.ValueOf
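The three variants of ResolveDifference can be sketched for the single-item case. This is an illustrative sketch, not the paper's code: the `Site` class, the integer timestamps, and the synchronous `anti_entropy_cycle` loop are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Site:
    value: object = None  # first component of ValueOf (None plays the role of NIL)
    timestamp: int = 0    # second component of ValueOf


def resolve_difference(s, other, mode="push-pull"):
    """Let site s resolve its difference with site other, newest timestamp wins."""
    if mode in ("push", "push-pull") and s.timestamp > other.timestamp:
        other.value, other.timestamp = s.value, s.timestamp  # s pushes to other
    elif mode in ("pull", "push-pull") and s.timestamp < other.timestamp:
        s.value, s.timestamp = other.value, other.timestamp  # s pulls from other


def anti_entropy_cycle(sites, rng, mode="push-pull"):
    """One cycle: every site picks one other site uniformly at random."""
    for s in sites:
        other = rng.choice([x for x in sites if x is not s])
        resolve_difference(s, other, mode)
```

Starting with one updated site among a few dozen and calling `anti_entropy_cycle` repeatedly drives every `Site` to the newest value, mirroring the convergence claim for anti-entropy.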
Anti-Entropy
Assume that
Site s' is chosen uniformly at random from the set S
Each site executes the anti-entropy algorithm once per period
It can be proved that
An update will eventually infect the entire population
Starting from a single infected site, this can be achieved in time proportional to the log of the population size
Anti-Entropy
Let $p_i$ be the probability of a site remaining susceptible (it has not received the update) after the i-th cycle of anti-entropy
For pull,
A site remains susceptible after the (i+1)-th cycle if (a) it was susceptible after the i-th cycle and (b) it contacted a susceptible site in the (i+1)-th cycle
$p_{i+1} = (p_i)^2$
For push,
A site remains susceptible after the (i+1)-th cycle if (a) it was susceptible after the i-th cycle and (b) no infectious site chose to contact it in the (i+1)-th cycle
$p_{i+1} = p_i (1 - 1/n)^{n(1 - p_i)}$
$1 - 1/n$: probability that the site is not contacted by a given node
$n(1 - p_i)$: number of infectious nodes at cycle i
Pull is preferable to push: for small $p_i$, push gives $p_{i+1} \approx p_i/e$, while pull squares $p_i$ in every cycle
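Iterating the two recurrences shows how much faster pull drives the susceptible fraction to zero. The starting point p0 = 0.5, the threshold 1e-6 and n = 1000 are hypothetical values chosen for illustration.

```python
def pull_step(p):
    """Pull: p_{i+1} = p_i ** 2."""
    return p * p


def push_step(p, n):
    """Push: p_{i+1} = p_i * (1 - 1/n) ** (n * (1 - p_i))."""
    return p * (1 - 1 / n) ** (n * (1 - p))


def cycles_until(step, p0, threshold):
    """Number of cycles until the susceptible probability drops below threshold."""
    p, cycles = p0, 0
    while p >= threshold:
        p = step(p)
        cycles += 1
    return cycles


n = 1000
pull_cycles = cycles_until(pull_step, 0.5, 1e-6)
push_cycles = cycles_until(lambda p: push_step(p, n), 0.5, 1e-6)
```

Starting from half of the sites susceptible, pull needs 5 cycles to get below 1e-6, while push shrinks the probability by a factor of at most about e per cycle and needs noticeably more.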
Anti-Entropy
More next week