Moving (and averaging) values over channels with message loss, replay, and re-ordering
Carlos Baquero
(Joint work with Paulo Almeida, Ali Shoker)
Presented at UPMC LIP6, January 2015
Introduction
Moving/Handoff Problem
Nodes in a network have splittable value quantities, and the task is to reliably move quantities from node to node.
Each transfer involves only two parties, no global agreement. Possible uses include:
Non-negative inc/dec shared counters (Positive PN-Counter)
Stock escrow
Token/lock transfers
Distributed averaging and derived data aggregates
Sketch of Handoff
Source Node i
state: v
on transfer(j, q)    (move q to node j; requires q ≤ v)
    v := v − q
    send_j(q)
Destination Node j
state: v
on receive_i(q)
    v := v + q
Sketch of Handoff, commutative monoid with split
Split definition:
(v′, q) = split(v, h) such that v′ ⊕ q = v and q ≤ h
Source Node i
state: v, any commutative monoid
on transfer(j, h)    (move h, or less, to node j)
    (v, q) := split(v, h)
    send_j(q)
Destination Node j
state: v, any commutative monoid
on receive_i(q)
    v := v ⊕ q
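The two sketches above can be made concrete. A minimal Python sketch, using the monoid (ℕ, +, 0); the function names and the `transfer` helper are illustrative, not part of the slides:

```python
# Handoff step for natural-number payloads, monoid (N, +, 0).
# split(v, h) gives as much as possible, up to the hint h.

def split(v, h):
    """Return (remainder, quantity) with remainder + quantity == v and quantity <= h."""
    q = min(v, h)
    return v - q, q

def transfer(src_val, dst_val, h):
    """Move up to h from source to destination; the total is conserved."""
    src_val, q = split(src_val, h)
    dst_val = dst_val + q          # destination merges with the monoid operation
    return src_val, dst_val

src, dst = transfer(8, 4, 2)       # moves 2: (8, 4) becomes (6, 6)
```

Note that conservation (src + dst unchanged) holds by construction of split, since remainder ⊕ quantity equals the original value.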
Network
Conservation of quantities requires exactly-once delivery from each send to the corresponding receive.
TCP mostly ensures exactly-once, but degrades to at-most-once upon connection break.
UDP can duplicate, drop and re-order messages.
Naive exactly-once over UDP
Source assigns a unique id to each sent message
Messages are re-transmitted until acknowledged
Destination stores unique ids to avoid duplicate deliveries
(more compact sequence-number ids can be used for FIFO channels)
+ Source can transmit immediately (one-way handshake)
− Node state at least linear in the number of (past) parties
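A sketch of this naive scheme; the `Destination` class and the tuple-shaped message ids are our own illustration:

```python
# Naive exactly-once delivery over an unreliable channel: the destination
# remembers every id it has already delivered, so retransmitted
# (duplicated) messages are ignored. The `seen` set is never pruned,
# so state grows linearly with the number of past messages.

class Destination:
    def __init__(self):
        self.value = 0
        self.seen = set()          # unbounded: one id per delivered message

    def receive(self, msg_id, q):
        if msg_id in self.seen:    # duplicate delivery: drop it
            return
        self.seen.add(msg_id)
        self.value += q

d = Destination()
d.receive(("i", 1), 5)
d.receive(("i", 1), 5)   # retransmission, ignored
d.receive(("i", 2), 3)
# d.value == 8
```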
TCP connection management
No connection-specific information kept between incarnations
Three-way handshake to make connection
Unbounded memory, to keep counters
A transfer over TCP pays a latency price and yet is still sensitive to connection breaks
Handoff
System Model
Network can duplicate, drop and re-order
Nodes only keep connection-specific info during transfers
Nodes can fail, but eventually recover
Three-way handshake is needed (Attiya, Rappoport. DC 1997)
Strategy
Adapt (by piggybacking) the three-way handshake steps:
1 Announce available value and sender counter/clock
2 Prepare receive slot and request quantity hint
3 Split value, up to hint, and send exactly-once quantity
4 (Garbage collect at sender, upon acknowledge)
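The four steps can be sketched as message handlers. This is a simplified single-transfer illustration, not the full protocol: all class, field, and message names are our own, and real handoff state (multiple concurrent slots, retransmission timers) is omitted:

```python
# Piggybacked three-way handshake for one transfer. The source keeps
# (val, sck); the destination keeps (val, dck) plus one slot per active
# transfer. The clock pair (sck, dck) uniquely tags a transfer, so
# duplicated or replayed messages are recognized and ignored.

class Source:
    def __init__(self, val):
        self.val, self.sck = val, 0
        self.tokens = {}                       # (sck, dck) -> quantity sent

    def announce(self):                        # step 1: advertise value + clock
        return ("announce", self.val, self.sck)

    def on_slot(self, sck, dck, hint):         # step 3: split and send token
        if sck != self.sck:                    # stale or duplicated slot msg
            return None
        q = min(self.val, hint)
        self.val -= q
        self.sck += 1                          # new incarnation of the clock
        self.tokens[(sck, dck)] = q            # kept until acknowledged
        return ("token", (sck, dck), q)

    def on_ack(self, sck, dck):                # step 4: garbage-collect
        self.tokens.pop((sck, dck), None)

class Destination:
    def __init__(self, val):
        self.val, self.dck = val, 0
        self.slot = None                       # at most one open slot here

    def on_announce(self, src_val, sck, hint): # step 2: prepare slot + hint
        self.slot = (sck, self.dck)
        self.dck += 1
        return ("slot", *self.slot, hint)

    def on_token(self, tag, q):
        if tag != self.slot:                   # duplicate or reordered token
            return None
        self.val += q                          # merge exactly once
        self.slot = None
        return ("ack", *tag)

# One transfer of (up to) 2 from i to j:
i, j = Source(8), Destination(4)
_, val, sck = i.announce()
_, s, d, hint = j.on_announce(val, sck, hint=2)
token = i.on_slot(s, d, hint)
ack = j.on_token(token[1], token[2])
j.on_token(token[1], token[2])                 # duplicate token: ignored
i.on_ack(ack[1], ack[2])                       # i can now forget the token
# i.val == 6 and j.val == 6: the quantity moved exactly once
```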
Handoff
[Timeline diagram, four build steps over Time 1..4:
source i starts at (val: 8, sck: 10); destination j starts at (val: 4, dck: 20).
j prepares slot s: i ((10,20), 2), advancing dck to 21;
i computes (6, 2) = split(8, 2), advancing sck to 11, and sends token t: j ((10,20), 2);
j merges 6 = 4 ⊕ 2, reaching (val: 6, dck: 21);
upon acknowledgement i garbage-collects, ending at (val: 6, sck: 11).]
Duplicate-Resilient Communication
Payload monoid data types: value averaging
Positive reals: ask for half the difference, give as much as possible
0 ≐ 0
⊕ ≐ +
needs(x, y) ≐ (y − x + |y − x|) / 4
split(x, h) ≐ ((x − h + |x − h|) / 2, (x + h − |x − h|) / 2)
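These definitions transcribe directly to Python (variable names are ours). Note that needs(x, y) equals max(0, (y − x)/2), half the positive gap, and split(x, h) yields (max(x − h, 0), min(x, h)):

```python
# Averaging payload on positive reals: a node asks for half the
# difference and gives as much as possible, up to the hint.

def needs(x, y):
    """Quantity x asks to receive from y: half the positive gap, else 0."""
    return (y - x + abs(y - x)) / 4

def split(x, h):
    """(remainder, quantity): give min(x, h), keep max(x - h, 0)."""
    return (x - h + abs(x - h)) / 2, (x + h - abs(x - h)) / 2

x, y = 2.0, 10.0
h = needs(x, y)          # x asks y for half the difference: 4.0
y, q = split(y, h)       # y gives 4.0, keeping 6.0
x = x + q                # both now hold 6.0, the average
```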
Derived aggregates include global sums and node counting
Payload monoid data types: hotel booking (with averaging strategy)
Monoidal values might not be totally ordered
X = {single ↦ 8, double ↦ 12}
Y = {single ↦ 1, double ↦ 20}
Leading to transfers in both directions
{double ↦ 4} = needs(X, Y)
{single ↦ 3} = needs(Y, X)
Eventually stabilizing with
X = {single ↦ 5, double ↦ 16}
Y = {single ↦ 4, double ↦ 16}
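A map-valued version applies needs and split pointwise per key. The sketch below (our own formulation, with integer half-differences rounded down and an illustrative `merge` helper) reproduces the hotel-booking numbers:

```python
# Map-valued payload: needs/split applied pointwise per key.

def needs(x, y):
    """Per-key quantities that x asks to receive from y: half the gap."""
    out = {}
    for k in set(x) | set(y):
        d = y.get(k, 0) - x.get(k, 0)
        if d > 1:                  # only positive gaps worth at least 1
            out[k] = d // 2
    return out

def split(x, h):
    """Per-key (remainder, quantity), giving at most h[k] of each key."""
    q = {k: min(x.get(k, 0), n) for k, n in h.items()}
    rem = dict(x)
    for k, n in q.items():
        rem[k] -= n
    return rem, q

def merge(x, q):
    """Pointwise monoid merge: add quantities key by key."""
    return {k: x.get(k, 0) + q.get(k, 0) for k in set(x) | set(q)}

X = {"single": 8, "double": 12}
Y = {"single": 1, "double": 20}
nX = needs(X, Y)                   # {"double": 4}
nY = needs(Y, X)                   # {"single": 3}
Y1, q1 = split(Y, nX)              # Y gives 4 doubles
X1 = merge(X, q1)
X2, q2 = split(X1, nY)             # X gives 3 singles
Y2 = merge(Y1, q2)
# X2 == {"single": 5, "double": 16}; Y2 == {"single": 4, "double": 16}
```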
Experiment setup: graph properties
Graph with n nodes and each with 2 log n links
(Symmetric forward and backward Chord)
Small-world topology: low path lengths, high clustering
Synchronous message model
Initial values drawn from a uniform integer distribution over 0..255
All converge to average, about 128
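A toy synchronous simulation in this spirit, with symmetric Chord-like links (2 log n per node); the parameters, round structure, and pairwise half-difference exchange are our simplification of the setup, not the paper's exact experiment:

```python
import random

# Toy gossip averaging on a symmetric Chord-like graph: each node links
# to offsets +/- 2^k, giving 2*log2(n) links, and repeatedly moves half
# the difference toward a random neighbour. Mass is conserved and all
# values contract toward the common average.

def simulate(n=64, rounds=500, seed=42):
    rng = random.Random(seed)
    vals = [float(rng.randint(0, 255)) for _ in range(n)]
    offsets = [1 << k for k in range(6)]       # 1, 2, 4, 8, 16, 32
    nbrs = [[(i + d) % n for d in offsets] + [(i - d) % n for d in offsets]
            for i in range(n)]
    for _ in range(rounds):
        for i in range(n):
            j = rng.choice(nbrs[i])
            half = (vals[j] - vals[i]) / 2     # ask for half the difference
            vals[i] += half
            vals[j] -= half
    return vals

vals = simulate()
# all nodes end close to the common average (about 128 in expectation)
```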
Experiment setup: faults
A simple experiment that aims to check resilience to message drop and message duplication faults (dropping and duplication can also lead to re-ordering events), and to show final GC of all connection meta-data.
Execution with no faults
Executions with 25, 50 and 75% message loss faults
Executions with 25, 50 and 75% message replay faults
Execution with 75% mixed faults
Storage probability for replay is at 20% (lower means older replays)
(Note: needs and split functions not yet optimized for this topology)
Comments
+/− Base algorithm is not optimized for this experiment
+ Still, there is clear high resilience to faults
+ State after t transfers is eventually O(log t)
− Topology must ensure symmetric exchanges
− Uncontrolled churn impacts GC:
− Meta-data kept is linear in the number of failed peer nodes k
+ If degree is log n, then k ≤ log n
+ Implemented in C++, for int, float and map payload
Related Work
The Level of Handshake Required for Managing a Connection. Hagit Attiya, Rinat Rappoport. Distributed Computing, 1997.
Scalable Eventually Consistent Counters over Unreliable Networks. Paulo Sérgio Almeida, Carlos Baquero. arXiv, 2013.