Post on 16-Dec-2015
Project 2 Review (Part 2)
Ananth Rao
Overview
• Stabilize and Notify
• Join (slides stolen from lecture)
• Coding Trivia
• Bootstrapping and debugging
Identifier to Node Mapping Example
• Node 8 maps [5, 8]
• Node 15 maps [9, 15]
• Node 20 maps [16, 20]
• …
• Node 4 maps [59, 4]
[Figure: identifier ring with nodes 4, 8, 15, 20, 32, 35, 44, 58]
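The mapping above amounts to "a node owns every ID after its predecessor, up to and including itself", with the last interval wrapping past zero. A minimal sketch of that test follows; the function name ownsId and the use of plain int IDs are assumptions made for illustration, not part of the project code.

```cpp
#include <cassert>

// Returns true if `id` falls in the clockwise arc (pred, self],
// i.e. the range of identifiers that node `self` is responsible for.
// Hypothetical helper; IDs are assumed to be non-negative integers.
bool ownsId(int id, int pred, int self) {
    assert(id >= 0 && pred >= 0 && self >= 0);
    if (pred < self) {
        return pred < id && id <= self;   // ordinary interval, e.g. (4, 8]
    }
    // The arc wraps past zero, e.g. (58, 4] covers 59, ..., 0, ..., 4.
    // When pred == self (a one-node ring) this is true for every id.
    return id > pred || id <= self;
}
```

With the ring from the slide, node 8 (predecessor 4) owns 5 through 8, and node 4 (predecessor 58) owns 59 and above as well as 0 through 4.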
Routing
• Each node maintains its successor
• Route packet (ID, data) to the node responsible for ID using successor pointers
[Figure: ring of nodes 4, 8, 15, 20, 32, 35, 44, 58, with a send(34, data) packet]
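Successor-only routing can be sketched as a walk around the ring that stops at the first successor whose arc contains the ID. The names route, inArc and exampleRing are hypothetical, the ring is held in one in-memory map purely for illustration, and starting the walk at node 4 is an assumption (the slide does not say which node originates the packet).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// True if `id` lies in the clockwise arc (from, to].
bool inArc(int id, int from, int to) {
    if (from < to) return from < id && id <= to;
    return id > from || id <= to;          // arc wraps past zero
}

// Follows successor pointers from `start` until reaching the node
// responsible for `id`. Returns every node the packet visits.
std::vector<int> route(const std::map<int, int>& successorOf, int start, int id) {
    assert(successorOf.count(start) == 1);
    std::vector<int> path{start};
    int cur = start;
    // At most one full lap: a consistent ring always delivers within N hops.
    for (std::size_t hop = 0; hop < successorOf.size(); ++hop) {
        int next = successorOf.at(cur);
        path.push_back(next);
        if (inArc(id, cur, next)) break;   // `next` is responsible for id
        cur = next;
    }
    return path;
}

// The ring drawn on the slide: each node mapped to its successor.
std::map<int, int> exampleRing() {
    return {{4, 8}, {8, 15}, {15, 20}, {20, 32},
            {32, 35}, {35, 44}, {44, 58}, {58, 4}};
}
```

Calling route(exampleRing(), 4, 34) visits 4, 8, 15, 20, 32 and ends at 35, the node whose range covers 34.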
Stabilize
• Sent to the current successorNode periodically
• “Request” for a notify packet from the successor
Notify
• Sent in reply to the stabilize packet.
• Helps build a list of k-successors at the predecessor.
Stabilize-Notify
• Direct communication only with immediate successor and predecessor
• You receive only “nth-hand” info about the nth successor
• It takes n * STABILIZE_PERIOD for a change in the nth successor to propagate to you
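One way to picture the "nth-hand" behaviour: if each notify carries the sender's own successor list, the receiver can rebuild its list as its successor followed by the first k-1 entries of that list. This is a sketch under that assumption; the slides do not specify the notify payload, and rebuildSuccessorList is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rebuilds this node's k-successor list from a notify sent by `successor`.
// `successorsOfSuccessor` is assumed to be the list carried in the notify.
std::vector<int> rebuildSuccessorList(int successor,
                                      const std::vector<int>& successorsOfSuccessor,
                                      int self,
                                      std::size_t k) {
    assert(k >= 1);
    std::vector<int> list{successor};
    for (int id : successorsOfSuccessor) {
        if (list.size() >= k) break;   // keep at most k entries
        if (id == self) break;         // list wrapped around a small ring
        list.push_back(id);
    }
    return list;
}
```

Because entry n is copied from the successor's entry n-1, which was itself copied one period earlier, a change at the nth successor needs about n rounds to arrive. The check against self is one of the small-ring corner cases that shows up with fewer than k nodes.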
Dealing with failures
• What happens when successorNode fails?
– Timeout while waiting to receive a notify
– Shift the successorNode list by one
• What happens when predecessorNode fails?
– Timeout on receiving a stabilize from the predecessor
Dealing with failures (cont.)
• We use fine-grained timers for detecting successor failures
• We use a coarse-grained timer for detecting a predecessor failure
– Predecessor is not useful for forwarding anyway
– A fine-grained timer is not useful unless we maintain a list of predecessors
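The successor-failure rule can be sketched as a retry counter plus a list shift. The struct and function names here are hypothetical, and the value of MAX_STABILIZE_RETRIES is made up for the example; only the "timeout, then shift the list by one" behaviour comes from the slides.

```cpp
#include <cassert>
#include <deque>

constexpr int MAX_STABILIZE_RETRIES = 3;   // hypothetical value

struct SuccessorState {
    std::deque<int> successors;   // front() is the current successorNode
    int missedNotifies = 0;       // consecutive stabilizes with no notify
};

// Call when the fine-grained notify timer expires.
// Returns true if the current successor was declared dead and dropped.
bool onNotifyTimeout(SuccessorState& s) {
    assert(s.missedNotifies >= 0);
    if (s.successors.empty()) return false;
    if (++s.missedNotifies <= MAX_STABILIZE_RETRIES) {
        return false;             // retry the same successor
    }
    s.successors.pop_front();     // shift the successor list by one
    s.missedNotifies = 0;
    return true;
}

// Call when a notify arrives from the current successor.
void onNotifyReceived(SuccessorState& s) {
    s.missedNotifies = 0;
}
```

With three retries, the successor is dropped on the fourth consecutive timeout, and the next entry in the list takes over as successorNode.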
Joining Operation
[Figure: ring of nodes 4, 8, 15, 20, 32, 35, 44, 58 with new node 50; labels: join(50), notify(58), succ=58]
• Node 50 asks node 15 to forward join message
• When join(50) reaches the destination (i.e., node 58), node 58 returns a notify message to node 50
• Node 50 updates its successor to 58
Joining Operation (cont’d)
[Figure: same ring; labels: succ=58, stabilize(), notify(predecessor=50), succ=50, pred=50]
• Node 50 sends a stabilize to Node 58. The predecessor gets updated at Node 58
• Node 44 sends a stabilize message to its successor, node 58
• Node 58 replies with a notify message
• Node 44 updates its successor to 50
Joining Operation (cont’d)
[Figure: same ring; labels: succ=58, succ=50, stabilize(), pred=44, pred=50]
• Node 44 sends a stabilize message to its new successor, node 50
• Node 50 sets its predecessor to node 44
Joining Operation (cont’d)
[Figure: same ring; labels: succ=58, succ=50, pred=44, pred=50]
• This completes the joining operation!
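The join walkthrough can be replayed as a tiny in-memory simulation. This is a sketch, not the project's wire protocol: messages are modelled as direct function calls, the names Node, Ring, onStabilize and stabilize are hypothetical, and it assumes the notify reply carries the receiver's current predecessor, as the notify(predecessor=50) label suggests.

```cpp
#include <cassert>
#include <map>

constexpr int kNone = -1;                 // "not known yet"

struct Node {
    int succ = kNone;
    int pred = kNone;
};
using Ring = std::map<int, Node>;         // node ID -> its pointers

// True if x lies strictly inside the clockwise arc (a, b).
bool inOpenArc(int x, int a, int b) {
    if (a < b) return a < x && x < b;
    return x > a || x < b;                // arc wraps past zero
}

// Node `self` receives a stabilize from `sender`: adopt the sender as
// predecessor if it is closer, then reply with the current predecessor.
int onStabilize(Ring& ring, int self, int sender) {
    Node& n = ring[self];
    if (n.pred == kNone || inOpenArc(sender, n.pred, self)) n.pred = sender;
    return n.pred;                        // payload of the notify reply
}

// Node `self` stabilizes with its successor and processes the notify.
void stabilize(Ring& ring, int self) {
    Node& n = ring[self];
    assert(n.succ != kNone);
    int reported = onStabilize(ring, n.succ, self);
    if (inOpenArc(reported, self, n.succ)) n.succ = reported;
}

// Replays the slides: node 50 has already learned succ=58 from the join.
Ring joinExample() {
    Ring ring;
    ring[44].succ = 58; ring[44].pred = 35;
    ring[58].succ = 4;  ring[58].pred = 44;
    ring[50].succ = 58;                   // set by notify(58) after join(50)
    stabilize(ring, 50);                  // 58 sets pred=50
    stabilize(ring, 44);                  // 58 reports pred=50; 44 sets succ=50
    stabilize(ring, 44);                  // 50 sets pred=44
    return ring;
}
```

After joinExample() the pointers match the final slide: 44 has succ=50, 50 has succ=58 and pred=44, and 58 has pred=50.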
Stabilize-Notify-Join
• Very simple
• Easy to code
• Can handle concurrent joins and failures
– Try a few examples. It may take a few more STABILIZE_PERIODs to converge, but it will eventually converge
Stabilize-Notify-Join (cont.)
• Not easy to understand
– When you get it.. you get it.
• Very hard to debug
• Hard to bootstrap
– Lots of corner cases when there are fewer than k nodes in the ring
Coding Advice
• Checkpoint submissions better than expected :-)
• No major flaws
• Be careful with timers
– “select” returns “no sooner than the requested timeout period”
– Each function call takes time!!
– Careful in dealing with negative struct timeval
• More feedback coming soon..
– Watch the newsgroup over the weekend :-(.
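The negative struct timeval warning usually bites when a due time has already passed: subtracting field by field leaves a negative value that select() rejects. A minimal sketch of a safe subtraction follows; timeUntil and makeTimeval are hypothetical helper names, and both inputs are assumed to be normalized (0 <= tv_usec < 1000000).

```cpp
#include <cassert>
#include <sys/time.h>

// Convenience constructor that does not depend on field order.
timeval makeTimeval(long sec, long usec) {
    timeval t;
    t.tv_sec = sec;
    t.tv_usec = usec;
    return t;
}

// Returns due - now, clamped to zero when `due` is already in the past,
// so the result is always a valid timeout for select().
timeval timeUntil(const timeval& due, const timeval& now) {
    assert(due.tv_usec >= 0 && due.tv_usec < 1000000);
    assert(now.tv_usec >= 0 && now.tv_usec < 1000000);
    timeval d;
    d.tv_sec = due.tv_sec - now.tv_sec;
    d.tv_usec = due.tv_usec - now.tv_usec;
    if (d.tv_usec < 0) {          // borrow one second
        d.tv_sec -= 1;
        d.tv_usec += 1000000;
    }
    if (d.tv_sec < 0) {           // due time already passed
        d.tv_sec = 0;
        d.tv_usec = 0;
    }
    return d;
}
```

A zero timeout makes select() return immediately, which is what an overdue event needs.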
Problems with timers
• After handling the event at the head of the queue..
– Get current time again
– Check the “due time” of the next event in the queue
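The point of re-reading the clock is that a handler may run long enough for the next event to become due. A sketch of such a loop, assuming a millisecond clock passed in as a callable so it can be faked in tests; Event, EventQueue and runDueEvents are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

struct Event {
    long long dueMs;                  // absolute due time in milliseconds
    std::function<void()> action;     // handler to run when due
};

// Orders the priority queue so the earliest due time is on top.
struct Later {
    bool operator()(const Event& a, const Event& b) const {
        return a.dueMs > b.dueMs;
    }
};
using EventQueue = std::priority_queue<Event, std::vector<Event>, Later>;

// Runs every event that is due, taking a fresh clock reading after each
// handler. Returns the wait in ms until the next event, or -1 if the
// queue is empty (i.e. no timer-driven timeout is needed).
long long runDueEvents(EventQueue& q, const std::function<long long()>& nowMs) {
    while (!q.empty()) {
        long long now = nowMs();      // handlers take time, so ask again
        if (q.top().dueMs > now) return q.top().dueMs - now;
        Event e = q.top();
        q.pop();
        assert(e.action && "event has no handler");
        e.action();
    }
    return -1;
}
```

The returned wait is what would be handed to select() as its timeout, after conversion to a struct timeval.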
Timers for stabilize
• Time out for receiving a notify
• When to send the next stabilize
– Keep track of lastStabilizeSentTime
– Use MIN(lastStabilizeSentTime + STABILIZE_PERIOD - currTime, nextEventDueTime) for timeout to select
– Careful when the successorNode changes
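The MIN expression can be sketched with plain millisecond counters. Two assumptions are made here: nextEventDueMs is an absolute time, so currMs is subtracted from it before comparing, and the result is clamped at zero so an overdue stabilize never produces a negative timeout. The STABILIZE_PERIOD_MS value is made up for the example.

```cpp
#include <algorithm>
#include <cassert>

constexpr long long STABILIZE_PERIOD_MS = 1000;   // hypothetical value

// Returns how long select() may block, in milliseconds: the sooner of
// "next stabilize is due" and "next queued event is due", never negative.
long long selectTimeoutMs(long long lastStabilizeSentMs,
                          long long currMs,
                          long long nextEventDueMs) {
    assert(currMs >= lastStabilizeSentMs);   // clock assumed not to go backwards
    long long untilStabilize = lastStabilizeSentMs + STABILIZE_PERIOD_MS - currMs;
    long long untilEvent = nextEventDueMs - currMs;
    return std::max(0LL, std::min(untilStabilize, untilEvent));
}
```

When successorNode changes, one reasonable choice is to reset lastStabilizeSentMs so the new successor is contacted promptly; the slides only warn to be careful here and do not prescribe a rule.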
Debugging Tips
• Most problems occur when bootstrapping the ring
• Prefer cerr/fprintf debugging to using gdb
– If you set a breakpoint in gdb, every other program on the ring is going to time out for some reason or the other
• In the beginning, you may want to increase timers to large values
Testing with lost packets
• With large timeouts
– Use keyboard input to determine whether or not to send a packet
– Make sure STABILIZE_PERIOD > (MAX_STABILIZE_RETRIES + 1) * STABILIZE_TIMEOUT
• Use randomized drops with a small drop percentage
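Both testing ideas fit in a few lines: the timing constraint can be checked at compile time, and randomized drops can be driven by a seeded generator so a failing run is reproducible. The constant values and the PacketDropper class are hypothetical; only the inequality and the idea of a small drop percentage come from the slides.

```cpp
#include <cassert>
#include <random>

// Hypothetical values, in milliseconds.
constexpr int STABILIZE_PERIOD_MS = 5000;
constexpr int STABILIZE_TIMEOUT_MS = 1000;
constexpr int MAX_STABILIZE_RETRIES = 3;

// All retries for one stabilize must finish before the next one is sent.
static_assert(STABILIZE_PERIOD_MS >
                  (MAX_STABILIZE_RETRIES + 1) * STABILIZE_TIMEOUT_MS,
              "STABILIZE_PERIOD too short for the retry budget");

// Decides, per outgoing packet, whether to silently drop it.
class PacketDropper {
public:
    PacketDropper(double dropProbability, unsigned seed)
        : rng_(seed), drop_(dropProbability) {
        assert(dropProbability >= 0.0 && dropProbability <= 1.0);
    }
    // True means "skip the send call for this packet".
    bool shouldDrop() { return drop_(rng_); }

private:
    std::mt19937 rng_;                     // seeded, so runs can be repeated
    std::bernoulli_distribution drop_;
};
```

A call such as PacketDropper(0.05, 42) would drop roughly one packet in twenty; checking shouldDrop() just before each send keeps the rest of the code unchanged.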
Go step-by-step
• Before implementing join, try to implement stabilize and notify
– Start with a predetermined ring
– Start with only one successor on the command line, but the list should soon grow (because of stabilize-notify)
– Detect failures only (no new nodes)
– Use a large (1s) timeout so you don’t have to start all “chatpeers” at exactly the same time
• Helps get rid of bootstrapping artifacts in the first step