Aug 19, 2006

Distributed Watchpoints: Debugging Very Large Ensembles of Robots

De Rosa, Goldstein, Lee, Campbell, Pillai


Motivation

• Distributed errors are hard to find with traditional debugging tools

• Centralized snapshot algorithms
  – Expensive
  – Geared toward detecting one error at a time

• Special-purpose debugging code is difficult to write and may itself contain errors


Expressing and Detecting Distributed Conditions

“How can we represent, detect, and trigger on distributed conditions in very large multi-robot systems?”

• Generic detection framework, well suited to debugging

• Detect conditions that are not observable via the local state of one robot

• Support algorithm-level debugging (not code/HW debugging)

• Trigger arbitrary actions when a condition is met

• Work in asynchronous, bandwidth- and CPU-limited systems


Distributed/Parallel Debugging: State of the Art

Modes:

• Parallel: powerful nodes, regular (static) topology, shared memory

• Distributed: weak, mobile nodes

Tools:

• GDB

• printf()

• Race detectors

• Declarative network systems with debugging support (à la P2)


Example Errors: Leader Election

Scenario: One Leader Per Two-Hop Radius


Example Errors: Token Passing

Scenario: If a node has the token, exactly one of its neighbors must have had it in the previous timestep


Example Errors: Gradient Field

Scenario: Gradient Values Must Be Smooth


Expressing Distributed Error Conditions

Requirements:

• Ability to specify shape of trigger groups

• Temporal operators

• Simple syntax (reduce programmer effort/learning curve)

A Solution:

• Inspired by Linear Temporal Logic (LTL)
  – A simple extension to first-order logic
  – Proven technique for single-robot debugging [Lamine01]

• Assumption: Trigger groups must be connected
  – For practical/efficiency reasons
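The connectivity assumption can be checked with an ordinary graph search. The sketch below is a minimal illustration, assuming the ensemble is described by a hypothetical Python dict that maps each robot id to the set of robots it can communicate with; it is not taken from the authors' implementation.

```python
from collections import deque


def is_connected(group, neighbors):
    """Return True if the robots in `group` form a connected trigger group.

    `neighbors` (hypothetical format) maps each robot id to the set of ids
    it can communicate with. Only links between members of `group` count.
    """
    group = set(group)
    if not group:
        return False
    start = next(iter(group))
    seen = {start}
    frontier = deque([start])
    while frontier:
        robot = frontier.popleft()
        # Follow only links that stay inside the candidate trigger group.
        for other in neighbors.get(robot, set()) & group:
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen == group


# Usage: four robots in a line, 1 - 2 - 3 - 4.
line = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(is_connected({1, 2, 3}, line))  # True
print(is_connected({1, 3}, line))     # False: 1 and 3 only meet through 2
```

Restricting the search to links inside the group matters: robots 1 and 3 are both in the ensemble's connected graph, but without robot 2 they do not form a connected trigger group.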


Watchpoint Primitives

• Modules (implicitly quantified over all connected sub-ensembles)

• Topological restrictions (pairwise neighbor relations)

• Boolean connectives

• State variable comparisons (distributed)

• Temporal operators

nodes(a,b,c); n(b,c) & (a.var > b.var) & (c.prev.var != 2)
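As a rough illustration of what these primitives mean, the example watchpoint above can be evaluated by brute force over every ordered triple of distinct, connected robots. This is a sketch only: the snapshot format (a neighbor dict plus per-robot state dicts for the current and previous timestep) and all function names are hypothetical, and `.prev` is taken to mean the value one timestep earlier.

```python
from itertools import permutations


def is_connected(group, neighbors):
    """True if `group` induces a connected subgraph of the neighbor graph."""
    group = set(group)
    seen, stack = set(), [next(iter(group))]
    while stack:
        robot = stack.pop()
        if robot in seen:
            continue
        seen.add(robot)
        stack.extend((neighbors[robot] & group) - seen)
    return seen == group


def example_watchpoint(a, b, c, neighbors, now, prev):
    # nodes(a,b,c); n(b,c) & (a.var > b.var) & (c.prev.var != 2)
    return (c in neighbors[b]
            and now[a]["var"] > now[b]["var"]
            and prev[c]["var"] != 2)


def find_matches(neighbors, now, prev):
    """Check the watchpoint on every connected, ordered triple of distinct robots."""
    return [(a, b, c) for a, b, c in permutations(neighbors, 3)
            if is_connected((a, b, c), neighbors)
            and example_watchpoint(a, b, c, neighbors, now, prev)]


# Usage: three robots in a line, 1 - 2 - 3.
neighbors = {1: {2}, 2: {1, 3}, 3: {2}}
prev = {1: {"var": 0}, 2: {"var": 2}, 3: {"var": 5}}
now = {1: {"var": 4}, 2: {"var": 1}, 3: {"var": 3}}
print(find_matches(neighbors, now, prev))  # [(1, 2, 3), (3, 2, 1)]
```

The implicit quantification over connected sub-ensembles is what `permutations` plus `is_connected` stands in for; the explicit `n(b,c)` clause then narrows which of those orderings match.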


Distributed Errors: Example Watchpoints

nodes(a,b,c); n(a,b) & n(b,c) & (a.isLeader == 1) & (c.isLeader == 1)

nodes(a,b,c);n(a,b) & n(a,c) & (a.token == 1) & (b.prev.token == 1) & (c.prev.token == 1)

nodes(a,b);(a.state - b.state > 1)
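The three watchpoints above (leader election, token passing, gradient field) can be written as ordinary predicates to make their meaning concrete. The sketch below checks them by brute force on one hypothetical snapshot of a three-robot line; the state-variable names come from the watchpoints, while the data layout and function names are assumptions. In the first two watchpoints the `n()` clauses already force the group to be connected; the gradient watchpoint has no `n()` clause, so the sketch adds the adjacency that the connected-group assumption implies for two robots.

```python
from itertools import permutations

# Hypothetical snapshot format: nbr maps id -> set of neighbor ids;
# now/prev map id -> dict of state variables at the current/previous timestep.


def leader_conflict(a, b, c, nbr, now, prev):
    # nodes(a,b,c); n(a,b) & n(b,c) & (a.isLeader == 1) & (c.isLeader == 1)
    return (b in nbr[a] and c in nbr[b]
            and now[a]["isLeader"] == 1 and now[c]["isLeader"] == 1)


def token_duplicated(a, b, c, nbr, now, prev):
    # nodes(a,b,c); n(a,b) & n(a,c) & (a.token == 1)
    #               & (b.prev.token == 1) & (c.prev.token == 1)
    return (b in nbr[a] and c in nbr[a] and now[a]["token"] == 1
            and prev[b]["token"] == 1 and prev[c]["token"] == 1)


def gradient_jump(a, b, nbr, now, prev):
    # nodes(a,b); (a.state - b.state > 1)
    # Assumption: a two-robot trigger group is connected, i.e. a and b are adjacent.
    return b in nbr[a] and now[a]["state"] - now[b]["state"] > 1


def check(predicate, arity, nbr, now, prev):
    """Return every ordered tuple of distinct robots that triggers `predicate`."""
    return [group for group in permutations(nbr, arity)
            if predicate(*group, nbr, now, prev)]


nbr = {1: {2}, 2: {1, 3}, 3: {2}}
prev = {1: {"isLeader": 1, "token": 1, "state": 0},
        2: {"isLeader": 0, "token": 0, "state": 1},
        3: {"isLeader": 1, "token": 1, "state": 2}}
now = {1: {"isLeader": 1, "token": 0, "state": 0},
       2: {"isLeader": 0, "token": 1, "state": 1},
       3: {"isLeader": 1, "token": 0, "state": 3}}

print(check(leader_conflict, 3, nbr, now, prev))   # [(1, 2, 3), (3, 2, 1)]
print(check(token_duplicated, 3, nbr, now, prev))  # [(2, 1, 3), (2, 3, 1)]
print(check(gradient_jump, 2, nbr, now, prev))     # [(3, 2)]
```

Each violation shows up once per ordering of the robots involved (for example, both (1, 2, 3) and (3, 2, 1) for the two leaders), which is one reason a real system would want to deduplicate triggers.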


Watchpoint Execution

nodes(a,b,c)…

[Figure: the watchpoint is evaluated on an ensemble of 32 numbered robots. Partial matchers grow one robot at a time, e.g. (1) → (1, 2), (1, 9) → … → (1, 9, 2), (1, 9, 10) ✓]
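One way to read the execution sequence (1) → (1, 2), (1, 9) → (1, 9, 2), (1, 9, 10) is that a matcher starts with a single robot bound to the first slot and is extended one slot at a time by a robot adjacent to any robot already in it, which keeps every partial group connected. The sketch below reproduces that sequence on a hypothetical 4 × 8 grid numbered row by row; the grid layout and the extension rule are assumptions inferred from the figure, not the authors' code. Under this rule only orderings whose every prefix is connected are generated.

```python
def grid(rows, cols):
    """Neighbor dict for a rows x cols grid, robots numbered from 1 row by row."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            robot = r * cols + c + 1
            adj = set()
            if c > 0:
                adj.add(robot - 1)
            if c < cols - 1:
                adj.add(robot + 1)
            if r > 0:
                adj.add(robot - cols)
            if r < rows - 1:
                adj.add(robot + cols)
            neighbors[robot] = adj
    return neighbors


def grow_matchers(start, size, neighbors):
    """All matchers of `size` slots whose first slot is bound to `start`."""
    matchers = [(start,)]
    for _ in range(size - 1):
        extended = []
        for m in matchers:
            # Any robot adjacent to the partial group that is not already bound.
            frontier = set().union(*(neighbors[r] for r in m)) - set(m)
            extended.extend(m + (r,) for r in sorted(frontier))
        matchers = extended
    return matchers


ensemble = grid(4, 8)
print(grow_matchers(1, 2, ensemble))  # [(1, 2), (1, 9)]
print(grow_matchers(1, 3, ensemble))
# [(1, 2, 3), (1, 2, 9), (1, 2, 10), (1, 9, 2), (1, 9, 10), (1, 9, 17)]
```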


Performance: Watchpoint Size

• 1000 modules, running for 100 timesteps

• Simulator overhead excluded

• Application: data aggregation with landmark routing

• Watchpoint: are the first and last robots in the watchpoint in the same state?

[Figure: Watchpoint Size vs. Simulation Time. x-axis: watchpoint size in slots (none, 2, 3, 4); y-axis: time (s), 0 to 900]


Performance: Number of Matchers

• This particular watchpoint never terminates early

• The number of matchers increases exponentially with watchpoint size

• Time per matcher remains within a factor of 2

• The details of the watchpoint expression matter more than its size
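A quick way to see why the matcher count explodes is to count the complete matchers for a size-k watchpoint on a small grid. The sketch below assumes a hypothetical growth rule in which each partial matcher is extended by any unused robot adjacent to the group, and a hypothetical 4 × 8 grid; its counts are illustrative only and are not the 1000-module results plotted above. Because a "first and last robot in the same state" comparison cannot be evaluated until the last slot is bound, nothing is pruned along the way.

```python
def grid(rows, cols):
    """Neighbor dict for a rows x cols grid, robots numbered from 1 row by row."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            robot = r * cols + c + 1
            neighbors[robot] = {robot + d for d, ok in (
                (-1, c > 0), (1, c < cols - 1),
                (-cols, r > 0), (cols, r < rows - 1)) if ok}
    return neighbors


def count_matchers(neighbors, size):
    """Number of complete matchers for a `size`-slot watchpoint.

    No early termination: a first-vs-last comparison needs the last slot,
    so every partial matcher is extended all the way.
    """
    matchers = [(r,) for r in neighbors]
    for _ in range(size - 1):
        matchers = [m + (r,) for m in matchers
                    for r in set().union(*(neighbors[x] for x in m)) - set(m)]
    return len(matchers)


ensemble = grid(4, 8)
for size in (2, 3, 4):
    print(size, count_matchers(ensemble, size))
```

Each extra slot multiplies the count by roughly the number of robots bordering a group, which is why the growth is exponential in watchpoint size.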

[Figure: Watchpoint Size vs. Number of Matchers. x-axis: watchpoint size in slots (none, 2, 3, 4); y-axis: number of matchers, 0 to 16,000,000]


Performance: Periodically Running Watchpoints

[Figure: Watchpoint Activity % vs. Time. Settings: 100%, 50%, 33%, 25%, 20%, never; axis labels: Activity (%) and Time (ms); scale 0 to 70]
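The activity settings (100%, 50%, 33%, 25%, 20%, never) suggest a watchpoint that is evaluated only on some timesteps. Assuming "X% activity" means the watchpoint runs once every 1/X timesteps (an interpretation, not stated on the slide), a scheduler needs only a period check. The sketch below uses hypothetical names and counts how many of 100 timesteps would evaluate the watchpoint at each setting.

```python
def should_run(timestep, period):
    """Evaluate the watchpoint every `period` timesteps; None means never."""
    return period is not None and timestep % period == 0


def evaluations(timesteps, period):
    """Number of timesteps in range(timesteps) on which the watchpoint runs."""
    return sum(should_run(t, period) for t in range(timesteps))


# Assumed mapping: periods 1..5 give roughly 100%, 50%, 33%, 25%, 20% activity.
for label, period in [("100%", 1), ("50%", 2), ("33%", 3),
                      ("25%", 4), ("20%", 5), ("never", None)]:
    print(label, evaluations(100, period))
```

Skipped timesteps cost nothing, but anything referenced through `.prev` still has to be recorded every timestep so that the history is available when the watchpoint does run.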


Future Work

• Distributed implementation

• More optimization

• User validation

• Additional predicates


Conclusions

• Simple, yet highly descriptive syntax

• Able to detect errors missed by more conventional techniques

• Low simulation overhead

Thank You


Backup Slides


Optimizations

• Temporal span

• Early termination

• Neighbor culling

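Early termination can be sketched by testing each clause as soon as every slot it reads is bound and dropping the partial matcher when a clause fails. The example below applies this to the leader-election watchpoint on a hypothetical five-robot line and counts how many matchers are created with and without pruning. The clause encoding, the growth rule (extend by any unused robot adjacent to the group), and all names are assumptions, not the original implementation.

```python
def run_watchpoint(neighbors, state, clauses, size, early=True):
    """Grow matchers slot by slot; return (matches, matchers_created).

    Each clause is (slots_read, test). With early=True a clause is tested as
    soon as its highest slot is bound; otherwise all clauses wait for the end.
    """
    created = 0
    frontier = [()]
    for slot in range(size):
        ready = [test for slots, test in clauses
                 if (max(slots) == slot if early else slot == size - 1)]
        extended = []
        for m in frontier:
            # Keep groups connected: the next robot must touch the group.
            candidates = (set().union(*(neighbors[r] for r in m)) - set(m)
                          if m else set(neighbors))
            for robot in sorted(candidates):
                candidate = m + (robot,)
                created += 1
                if all(test(candidate, neighbors, state) for test in ready):
                    extended.append(candidate)
        frontier = extended
    return frontier, created


# nodes(a,b,c); n(a,b) & n(b,c) & (a.isLeader == 1) & (c.isLeader == 1)
LEADER_CONFLICT = [
    ({0}, lambda m, nbr, s: s[m[0]]["isLeader"] == 1),
    ({0, 1}, lambda m, nbr, s: m[1] in nbr[m[0]]),
    ({1, 2}, lambda m, nbr, s: m[2] in nbr[m[1]]),
    ({2}, lambda m, nbr, s: s[m[2]]["isLeader"] == 1),
]

line = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
state = {r: {"isLeader": int(r in (1, 3))} for r in line}
print(run_watchpoint(line, state, LEADER_CONFLICT, 3, early=True))
# ([(1, 2, 3), (3, 2, 1)], 13)
print(run_watchpoint(line, state, LEADER_CONFLICT, 3, early=False))
# ([(1, 2, 3), (3, 2, 1)], 25)
```

Both runs find the same matches, but pruning on `a.isLeader == 1` at the first slot cuts the matchers created from 25 to 13. A watchpoint whose only state comparison involves the last slot gains nothing from this optimization.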
