Distributed Watchpoints: Debugging Very Large Ensembles of Robots
De Rosa, Goldstein, Lee, Campbell, Pillai
Aug 19, 2006
8/19/2006 Distributed Watchpoints2
Motivation
• Distributed errors are hard to find with traditional debugging tools
• Centralized snapshot algorithms
  – Expensive
  – Geared towards detecting one error at a time
• Special-purpose debugging code is difficult to write and may itself contain errors
Expressing and Detecting Distributed Conditions
“How can we represent, detect, and trigger on distributed conditions in very large multi-robot systems?”
• Generic detection framework, well suited to debugging
• Detect conditions that are not observable via the local state of one robot
• Support algorithm-level debugging (not code/HW debugging)
• Trigger arbitrary actions when condition is met
• Asynchronous, bandwidth/CPU-limited systems
Distributed/Parallel Debugging: State of the Art
Modes:
• Parallel: powerful nodes, regular (static) topology, shared memory
• Distributed: weak, mobile nodes
Tools:
• GDB
• printf()
• Race detectors
• Declarative network systems with debugging support (à la P2)
Example Errors: Leader Election
Scenario: One Leader Per Two-Hop Radius
Example Errors: Token Passing
Scenario: If a node has the token, exactly one of its neighbors must have had it last timestep
Example Errors: Gradient Field
Scenario: Gradient Values Must Be Smooth
Expressing Distributed Error Conditions
Requirements:
• Ability to specify shape of trigger groups
• Temporal operators
• Simple syntax (reduce programmer effort/learning curve)
A Solution:
• Inspired by Linear Temporal Logic (LTL)
  – A simple extension to first-order logic
  – Proven technique for single-robot debugging [Lamine01]
• Assumption: Trigger groups must be connected
  – For practical/efficiency reasons
Watchpoint Primitives
• Modules (implicitly quantified over all connected sub-ensembles)
• Topological restrictions (pairwise neighbor relations)
• Boolean connectives
• State variable comparisons (distributed)
• Temporal operators
nodes(a,b,c); n(b,c) & (a.var > b.var) & (c.prev.var != 2)
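As a rough illustration of how these primitives combine, the sketch below evaluates the example watchpoint by brute force on a four-module line. The topology, the state values, and the helper names (`neighbors`, `state`, `prev`, `connected`) are assumptions made for this example. It also assumes that "quantified over all connected sub-ensembles" means every ordered binding of distinct modules whose members form a connected subgraph. It is not the authors' implementation, which the slides do not show.

```python
from itertools import permutations

# Hypothetical toy ensemble: a line 1 - 2 - 3 - 4.
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
state = {1: {"var": 5}, 2: {"var": 3}, 3: {"var": 1}, 4: {"var": 0}}  # current timestep
prev = {1: {"var": 5}, 2: {"var": 2}, 3: {"var": 2}, 4: {"var": 7}}   # previous timestep

def n(x, y):
    """Topological restriction: x and y are neighbors."""
    return y in neighbors[x]

def connected(group):
    """True if the modules in `group` form a connected subgraph."""
    group = set(group)
    seen, frontier = set(), [next(iter(group))]
    while frontier:
        m = frontier.pop()
        if m in seen:
            continue
        seen.add(m)
        frontier.extend(neighbors[m] & (group - seen))
    return seen == group

def watchpoint(a, b, c):
    # nodes(a,b,c); n(b,c) & (a.var > b.var) & (c.prev.var != 2)
    return n(b, c) and state[a]["var"] > state[b]["var"] and prev[c]["var"] != 2

# Try every ordered binding of three distinct modules that is connected.
matches = [g for g in permutations(neighbors, 3)
           if connected(g) and watchpoint(*g)]
print(matches)
```

With these made-up values the only binding that satisfies the expression is a=2, b=3, c=4. Exhaustive enumeration like this is only practical for tiny ensembles.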
Distributed Errors: Example Watchpoints
nodes(a,b,c); n(a,b) & n(b,c) & (a.isLeader == 1) & (c.isLeader == 1)
nodes(a,b,c); n(a,b) & n(a,c) & (a.token == 1) & (b.prev.token == 1) & (c.prev.token == 1)
nodes(a,b); (a.state - b.state > 1)
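To make the three watchpoints concrete, here is a minimal Python sketch that checks each one against a five-module line with one error of each kind injected on purpose. The topology, the state values, and the function names are hypothetical. The two-module gradient check assumes `a` and `b` are neighbors, following the slides' assumption that trigger groups are connected.

```python
# Hypothetical 5-module line topology: 0 - 1 - 2 - 3 - 4
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# Current per-module state, with three errors injected on purpose.
now = {
    0: {"isLeader": 1, "token": 0, "state": 0},
    1: {"isLeader": 0, "token": 0, "state": 1},
    2: {"isLeader": 1, "token": 0, "state": 2},   # second leader two hops from module 0
    3: {"isLeader": 0, "token": 1, "state": 5},   # jump of 3 in the gradient
    4: {"isLeader": 0, "token": 0, "state": 6},
}
# Previous-timestep state: two neighbors of module 3 both held the token.
prev = {m: {"token": 0} for m in nbrs}
prev[2]["token"] = 1
prev[4]["token"] = 1

def two_leaders():
    # nodes(a,b,c); n(a,b) & n(b,c) & (a.isLeader == 1) & (c.isLeader == 1)
    return [(a, b, c) for b in nbrs for a in nbrs[b] for c in nbrs[b]
            if a != c and now[a]["isLeader"] == 1 and now[c]["isLeader"] == 1]

def duplicated_token():
    # nodes(a,b,c); n(a,b) & n(a,c) & (a.token == 1)
    #               & (b.prev.token == 1) & (c.prev.token == 1)
    return [(a, b, c) for a in nbrs for b in nbrs[a] for c in nbrs[a]
            if b != c and now[a]["token"] == 1
            and prev[b]["token"] == 1 and prev[c]["token"] == 1]

def rough_gradient():
    # nodes(a,b); (a.state - b.state > 1), with a and b adjacent
    return [(a, b) for a in nbrs for b in nbrs[a]
            if now[a]["state"] - now[b]["state"] > 1]

print(two_leaders())       # bindings (a, b, c) with two leaders two hops apart
print(duplicated_token())  # bindings where two neighbors held the token last step
print(rough_gradient())    # adjacent pairs whose gradient values differ by more than 1
```

In this sketch the symmetric watchpoints report each error once per ordering of the bound modules, so the leader and token errors each appear twice.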
Watchpoint Execution
[Figure: execution of a watchpoint nodes(a,b,c)… on a grid of modules numbered 1 to 32. Partial matches shown include 1; 2; 3; 1 2; 1 9; 1 9 2; and 1 9 10, the last marked √.]
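One way to read the figure is that a matcher starts at every module and grows one slot at a time by binding a module adjacent to any module it already holds, and is dropped as soon as the expression can no longer be satisfied. The sketch below follows that reading. The grid numbering, the function names `grid_neighbors` and `run_matchers`, and the stand-in watchpoint `nodes(a,b,c); n(b,c)` are assumptions, since the slide elides the actual expression.

```python
def grid_neighbors(width, height):
    """4-connected grid, modules numbered 1..width*height row by row."""
    nbrs = {}
    for r in range(height):
        for c in range(width):
            m = r * width + c + 1
            adj = []
            if c > 0:
                adj.append(m - 1)
            if c < width - 1:
                adj.append(m + 1)
            if r > 0:
                adj.append(m - width)
            if r < height - 1:
                adj.append(m + width)
            nbrs[m] = adj
    return nbrs

def run_matchers(nbrs, size, keep):
    """Grow matchers slot by slot; keep(partial) may discard a matcher early."""
    matchers = [(m,) for m in nbrs if keep((m,))]
    for _ in range(size - 1):
        grown = []
        for partial in matchers:
            # Candidates: modules adjacent to any module already in the matcher.
            frontier = {x for m in partial for x in nbrs[m]} - set(partial)
            for x in sorted(frontier):
                candidate = partial + (x,)
                if keep(candidate):
                    grown.append(candidate)
        matchers = grown
    return matchers

nbrs = grid_neighbors(8, 4)  # 32 modules, as in the figure

def keep(partial):
    # Stand-in watchpoint (hypothetical): nodes(a,b,c); n(b,c).
    # The restriction can only be checked once slots b and c are bound.
    return len(partial) < 3 or partial[2] in nbrs[partial[1]]

full = run_matchers(nbrs, 3, keep)
print((1, 9, 10) in full, (1, 9, 2) in full)
```

Under this stand-in expression the matcher 1 9 10 survives because 10 is adjacent to 9, while 1 9 2 is discarded because 2 is adjacent only to 1.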
Performance: Watchpoint Size
• 1000 modules, running for 100 timesteps
• Simulator overhead excluded
• Application: data aggregation with landmark routing
• Watchpoint: are the first and last robots in the watchpoint in the same state?
[Chart: Watchpoint Size vs. Simulation Time. x-axis: Size (slots): none, 2, 3, 4; y-axis: Time (s), 0 to 900]
Performance: Number of Matchers
• This particular watchpoint never terminates early
• Number of matchers increases exponentially
• Time per matcher remains within a factor of 2
• Details of the watchpoint expression are more important than its size
[Chart: Watchpoint Size vs. Number of Matchers. x-axis: Size (slots): none, 2, 3, 4; y-axis: Matchers, 0 to 16,000,000]
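The growth in matcher count can be reproduced on a toy topology. The sketch below counts the matchers created for a watchpoint that never terminates early, assuming each matcher is extended by any module adjacent to one it already holds. The ring topology and the name `count_matchers` are illustrative choices, not taken from the experiment above.

```python
def count_matchers(nbrs, size):
    """Number of matchers alive at each slot count when none terminate early."""
    partials = [(m,) for m in nbrs]
    counts = [len(partials)]
    for _ in range(size - 1):
        partials = [p + (x,)
                    for p in partials
                    # Extend by any module adjacent to the matcher, not yet in it.
                    for x in {y for m in p for y in nbrs[m]} - set(p)]
        counts.append(len(partials))
    return counts

# Hypothetical ring of 10 modules: each module has exactly two neighbors.
N = 10
ring = {i: {(i - 1) % N, (i + 1) % N} for i in range(N)}
print(count_matchers(ring, 4))
```

On this ring the count doubles with each added slot (10, 20, 40, 80). Denser topologies such as grids offer more candidate neighbors per step, so the count grows faster there.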
Performance: Periodically Running Watchpoints
[Chart: Watchpoint Activity % vs. Time. x-axis: Activity (%): 100%, 50%, 33%, 25%, 20%, never; y-axis: Time (ms), 0 to 70]
Future Work
• Distributed implementation
• More optimization
• User validation
• Additional predicates
Conclusions
• Simple, yet highly descriptive syntax
• Able to detect errors missed by more conventional techniques
• Low simulation overhead