One for (Almost) All: Using a Modern Programmable Programming Language in Robotics
or: A Roboticist in Language Wonderland
Berthold Bäuml
Autonomous Learning Robots Lab, Institute of Robotics and Mechatronics
German Aerospace Center (DLR)
Trend in Software Development
• use of modern programming languages
• moving away from dynamic scripting languages like Python or Ruby
[Figure: company logos with their language use. Recoverable labels: Haskell at Deutsche Bank (tools for an innovative trading group); further logos labeled "backend", "backend of chat service", "high speed trading", and "used for everything".]
• all are functional and (almost all) have a strong static type system
Demands of Modern Software Systems
• execution performance
• interpreters are intrinsically slow!
• modern compilation and JIT techniques to come close to C/C++
[Figure: Computer Language Benchmarks Game. Bar labels (partly garbled in extraction): SBCL Lisp, Scheme/Racket, Clojure, Scala, Haskell, OCaml, Erlang, gcc, g++, Fortran, Java 7, Ruby, Python.]
• maintenance and debugging
• dynamic scripting languages good for prototyping
• but a large code base and many developers need more language support for maintainability
• a modern strong, static type system with type inference uncovers many application-logic errors as type errors at compile time
• support for functional programming: encourages or enforces immutability -> flow of state is explicit -> easier to reason about
• productivity in developing complex algorithms and application logic
• function combination rather than object composition
• highly efficient functional data structures
• parallel execution
• support for multi-core/multi-CPU has to be built into the language
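The point about type systems turning application-logic errors into compile-time errors can be illustrated with a minimal Typed Racket sketch. The names `joint-angle`, `joint-torque`, and `clamp-torque` are hypothetical and chosen only for illustration; they are not taken from any existing framework.

```racket
#lang typed/racket

;; Two distinct wrapper types around the same raw representation (a float).
;; Both struct names are hypothetical, for illustration only.
(struct joint-angle ([rad : Flonum]))
(struct joint-torque ([nm : Flonum]))

;; Limit a torque to the symmetric interval [-limit, +limit].
(: clamp-torque (-> joint-torque Flonum joint-torque))
(define (clamp-torque t limit)
  (joint-torque (max (- limit) (min limit (joint-torque-nm t)))))

(define clamped (joint-torque-nm (clamp-torque (joint-torque 5.0) 2.0)))

;; Passing an angle where a torque is expected is a logic error.
;; The type checker rejects the following line before the program runs:
;; (clamp-torque (joint-angle 0.3) 2.0)
```

Both structs wrap a plain float, so an untyped program would accept the mixed-up call and fail (or silently misbehave) only at run time.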
Functional Programming -- Functional Programming Languages
• from Matthias Felleisen, “Functional Programming is Easy, and Good for You”
http://www.ccs.neu.edu/home/matthias/Presentations/11GS/gs.pdf
Functional programming is about clear, concise communication between programmers.
A good transition needs training, but training pays off.
Functional programming languages keep you honest about being functional.
                Haskell  Scala            OCaml            Erlang            Clojure  Scheme (Gambit / Racket)
functional      +        +                +                +                 +        +
typed (static)  +        +                +                -                 -        - / (+)
mutation        -        +                +                -                 +        +
strictness      lazy     strict (& lazy)  strict (& lazy)  strict            strict   strict / strict (lazy)
parallel        ++       +                JoCaml           ++                ++       Termite / +
compiled        native   JVM              native           VM (native HiPE)  JVM      C-code / VM-JIT
Demands of Advanced Robotic Systems
• which robots are we talking about? -> advanced, complex humanoids
• different from, e.g., fleets of quadrocopters
• with respect to the computing power that is necessary and available!
• esp. in our case: a platform for fundamental research -> flexibility for developing new solutions from the ground up is more important than integrating many existing “classical” solutions
• ...
Demands of Advanced Robotic Systems
sensing
• stereo cameras (2 MPixel/25 Hz)
• RGB-D sensor (0.5 MPixel/33 Hz)
• torque sensors (all DOF, 1 kHz)
• tactile skin on hands (3000 taxels/750 Hz)
• IMU (6D, 500 Hz)
acting
• 53 DOF = 8 (platform) + 19 (torso) + 26 (hands)
• torque control over all DOF
• 1 kHz, <3 ms latency, <100 µs jitter
computing
• 4x Core i7 Quadcore (onboard)
• CPU cluster with 64 cores
• GPGPU cluster with 16 NVIDIA K20
[Figure: distributed software architecture of the robot. Recoverable component labels (rate/latency): hand control (1 kHz); arm/torso/head control (19 DOF, 1 kHz); platform control (60 Hz/50 ms); state machine/communicator/view control (1 kHz); pose estimator (512 Hz/0.5 ms); two circle detectors (25 Hz/25 ms each); MHT/UKF (25 Hz/10 ms); ball tracker (25 Hz/35 ms); planner (25 Hz/60 ms); four SQP optimizers (60 ms each); user interaction; GUI/3D viewer; logic planner. Hosts: Linux/QuadCore, QNX/32 Cores, QNX/2x DualCore, Linux. Interconnects: GigE, 1394, Sercos, SpaceWire, CAN, USB.]
Demands of Advanced Robotic Systems
• easy interfacing/interaction with C/C++ for low-level, hard realtime, highly performant code (control algorithms, image processing)
• multi-platform support: robotic systems are often heterogeneous (realtime OS, drivers only for certain OSs, ...)
• actor model and continuations:
• for concurrent, parallel and distributed computing
• to build complex synchronization and execution patterns for orchestration of interwoven task/behavior nets
• not purely functional: to easily work with changing states of robots and world
• Domain Specific Languages (DSLs)
• robotic systems span a wide range of different tasks
• DSLs can also reflect the different abstractions syntactically, e.g.,
• kinematic/dynamic/geometrical description
• complex and concurrent state machines
• data types of the communication packets
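The actor-style demand listed above (components that own their state and interact only through messages) can be sketched with plain Racket threads and channels. This is a minimal illustration under stated assumptions: `start-controller` and the message shapes `(list 'set value)`, `(list 'get reply)`, and `'stop` are hypothetical names invented here, and the sketch does not show aRDx's actual API.

```racket
#lang racket

;; Hypothetical "controller" actor: it owns its setpoint and reacts to messages.
;; Returns the channel that serves as its mailbox.
(define (start-controller)
  (define mailbox (make-channel))
  (thread
   (lambda ()
     ;; The state is carried as a loop argument, so every change is explicit.
     (let loop ([setpoint 0.0])
       (match (channel-get mailbox)
         [(list 'set value) (loop value)]
         [(list 'get reply) (channel-put reply setpoint) (loop setpoint)]
         ['stop (void)]))))
  mailbox)

(define controller (start-controller))
(channel-put controller (list 'set 0.5))

;; Query the actor: send a reply channel and wait for the answer on it.
(define reply (make-channel))
(channel-put controller (list 'get reply))
(define current-setpoint (channel-get reply))

(channel-put controller 'stop)
```

Racket channels are synchronous, so each `channel-put` returns only once the actor has taken the message. That keeps the example deterministic; a real system would more likely use buffered or asynchronous mailboxes.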
Domain Specific Languages -- An Important Concept in Modern Software Design
DSLs are cool!
• a little story ... by courtesy of Matthew Flatt, University of Utah (to see the animation, run in Racket: (require (planet mflatt/princess:1:2/play-movie)))
• abstractions for different tasks/fields/domains are often best expressed in a specific language (with optimized syntax and semantics)
• embedded domain specific languages (DSELs) use the infrastructure of the implementing language and extend it: languages as libraries
• popular approaches for full-fledged DSELs (including control structures)
• lazy functional languages (Haskell, Scala): functions and combinators
• meta-programming == syntax rewriting/manipulating the AST
• Lisps, Clojure, Scheme: macro systems “directly” manipulate S-expressions
• Template Haskell, MetaOCaml, Scala Macros (since 2.10)
• if the implementing language is a compiled language, the DSL is also a compiled (and efficient) language!
• even full-fledged language extensions are possible ...
Macro Systems
• A macro extends a language by specifying how to compile a new feature into existing features
• The macro is itself implemented in the programming language, not an external tool.
• more on macros (taken from a talk of Robby Findler)...
• “history” of Scheme macros
• text replacing
• syntax replacing macros
• hygienic macros (obey lexical scoping)
• advanced macro systems
• with syntax object containing source location -> precise error messages
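To make the notions of "compiling a new feature into existing features" and "hygienic" concrete, here is a minimal Racket sketch. The `repeat` construct is a hypothetical example invented for illustration; it is not part of Racket or of the talk.

```racket
#lang racket

;; A new control construct, defined inside the language itself:
;; (repeat n body ...) is rewritten into an ordinary `for` loop.
(define-syntax-rule (repeat n body ...)
  (for ([i (in-range n)])
    body ...))

;; Hygiene: the loop variable `i` introduced by the macro does not
;; capture the user's own `i`, which keeps its value 10 in the body.
(define i 10)
(define counter 0)
(repeat 3 (set! counter (+ counter i)))
;; counter is now 30; a purely text-replacing macro would have added
;; the loop index instead (0 + 1 + 2 = 3).
```

The macro is expanded at compile time, so `repeat` costs no more at run time than the `for` loop it expands into.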
Java-like Class System as “DSEL”
• implemented with macros in Scheme (Racket)
• performance comparable to interface-based Java method calls
• M. Flatt, R. B. Findler, M. Felleisen, “Scheme with Classes, Mixins, and Traits”

    (class object%
      (init size)
      (define current-size size)
      (super-new)
      (define/public (get-size) current-size)
      (define/public (grow amt)
        (set! current-size (+ amt current-size)))
      (define/public (eat other-fish)
        (grow (send other-fish get-size))))

    (define fish% (class object% (init size) ....))
    (define charlie (new fish% [size 10]))

    > (send charlie grow 6)
    > (send charlie get-size)
    16
                    Haskell           Scala                  OCaml            Erlang            Clojure                Scheme (Gambit / Racket)
functional          +                 +                      +                +                 +                      +
typed (static)      +                 +                      +                -                 -                      - / (+)
mutation            -                 +                      +                -                 +                      +
strictness          lazy              strict (& lazy)        strict (& lazy)  strict            strict                 strict / strict (lazy)
parallel            ++                +                      JoCaml           ++                ++                     Termite / +
compiled            native            JVM                    native           VM (native HiPE)  JVM                    C-code / VM-JIT
distributed         Cloud Haskell     actors                 JoCaml           actors!!          no native, e.g., Akka  Termite / distrib. places
FFI (w/o glue)      +                 JNA                    experimental     +                 JNA                    + / +
platforms           Lin/Mac/Win       Lin/Mac/Win            Lin/Mac/Win      Lin/Mac/Win       Lin/Mac/Win            gcc / Lin/Mac/Win
DSELs (functional)  ++                ++                     +                +                 +                      + / +
meta programming    Template Haskell  macros (experimental)  MetaOCaml        simple macros     lisp macros            macros / adv. macros
Radical System Architecture: Use One Language for (Almost) All
• modern higher level functional languages
• are performant
• have “batteries included”: GUI, networking, FFI, ...
• meet (almost) all challenges in robotics
• radical design
• one language for all
• except for small C/C++ snippets (highest performance/determinism)
• language built-in parallel and distributed execution/communication
• efficient DSELs for the various data description or execution logic tasks
• benefits
• general higher productivity with higher level language
• homogeneity drastically reduces system complexity
• developers have to (and can) learn one language in depth
• no conceptual or practical “frictional loss” due to language coupling
• same higher level concepts in all components (type system, data structures, closures, continuations, ...)
• distributed communication of higher level concepts -- in contrast to the “least common denominator” of multi-language-compatible middleware (aRD: static packets, ROS: dynamic arrays, ...)
[Figure: animated diagram of the system components GPU, state machine, planner, vision, driver, GUI, and Simulink.]
The aRDx Framework (aRD Next Generation)
• aRDx follows this radical “one language” philosophy
• we chose the Scheme variant Racket as base language
• only for hard realtime (controllers) and high performance (image processing, GPU) do we additionally need C/C++ and Matlab/Simulink models
• aRDx allows seamless integration of C/C++ and Matlab/Simulink with Racket, up to module loading with auto-compilation
• aRDx provides a highly performant, hard-realtime-capable communication layer for raw data transport in C/C++ -- the communication logic is already set up in Racket
[Figure: software layers, labeled application, Racket, aRDx, aRDxRT, C/C++, OS.]
• a first prototype (many things still missing) already works successfully on Agile Justin
T. Hammer, B. Bäuml, “Raw Performance of Robotic Software Middleware: A Comparison and aRDx’s New Realtime Communication Layer”
[Figure: six log-log plots of round-trip time [s] over packet size [byte], one column each for the process, host, and distributed domains. Legend: aRDx, aRD, Orocos, ROS, ROS (fixed), YARP. One 1-client plot contains an inset showing round-trip time over pause [s].]
Fig. 5. Results of the stress test benchmark for 1 (top) and 20 (bottom) clients and for the three domains (columns). Each plot shows the mean round-trip time (averaged over some 100 runs) over the packet size for the various frameworks. Please be aware of the log-log scaling of the plots. The performance of aRDx is almost always the best, most dramatically for the host domain, where no other framework can provide zero-copy semantics. Only for small packet sizes (up to 1KB), where the transfer time is dominated by the constant overhead of a framework, is aRDx beaten by aRD’s minimalistic implementation, and in the 1-client case with large packets YARP is about 10% faster, presumably due to a slightly more clever configuration of the TCP sockets. In the 20-client case aRDx beats all other frameworks in the distributed domain by a factor of 2 because it has to transfer the packets sent from the master to the remote client only once and, hence, in each round of the test only 2+20 instead of 20+20 packets have to be transmitted over the GigE network. The increased constant overhead of aRDx for the host compared to the process domain is about 5x and, hence, close to the theoretically expected 6x due to the indirect communication through the daemon. Interestingly, although aRDx needs quite complex logic to provide zero-copy semantics in the host domain, its constant overhead is still 4x smaller than that of all other frameworks (except aRD). In what follows we discuss some features and quirks of the other frameworks that we came across. All these frameworks scale very well and roughly linearly with the number of clients. For the process domain YARP can provide zero-copy semantics. In this domain ROS with its nodelets was also expected to show constant transfer times but could do so only after we fixed the implementation (labeled ROS fixed); standard ROS (labeled ROS) completely initializes the memory of newly constructed packets and, hence, the transfer time has to scale with the packet size. For the host and the distributed domain YARP and ROS perform very similarly, as both communicate over TCP sockets (side note: for YARP, because of instabilities, we could not use the potentially more efficient multicast and shared memory modes). In the case of the host domain and large packets (> 1MB) they even reach almost the performance of the shared-memory-based transport of aRD, showing that the Linux loopback sockets are very efficient. In all tests the performance of Orocos was worst, although we always tried the optimal parameters. We suspect that this is due to the additional abstraction layer with ACE/TAO in its communication stack. For ROS we found another severe quirk in the host and distributed domain for packet sizes of 10KB to 100KB. There the round-trip time dramatically increases 100x. A further analysis showed that this effect disappears completely when adding a pause of at least 100ms between each round of the test (see the inset in the 1-client plot depicting the round-trip time over the pause time for 1KB packets). This means that ROS is not really stress resistant.
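As a rough plausibility check of the 20-client claim in the caption, assume that all packets have equal size and that the round-trip time in the distributed domain is dominated by the number of packets crossing the GigE link (both are simplifying assumptions, not statements from the paper). The packet counts given in the caption then yield

\[
\frac{20 + 20}{2 + 20} = \frac{40}{22} \approx 1.8 ,
\]

which is roughly consistent with the reported factor of 2.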
... with 100 clients running in the kHz range. Even for the distributed domain the worst-case round-trip latencies are no longer than 500µs.
IV. CONCLUSIONS
We presented the design considerations and implementation details of the new highly performant, realtime capable, minimalistic and simple communication layer of our aRDx software framework. In an in-depth benchmarking on Linux of the raw communication performance of aRDx and the popular robotic software frameworks ROS, YARP, Orocos and aRD it was shown that aRDx performs excellently in both extreme performance aspects, namely latency and bandwidth, and in part dramatically outperforms the other frameworks. In addition, due to the “stress” character of our tests we could uncover a number of severe quirks in all other frameworks.
Running on QNX, aRDx provides hard realtime performance even for distributed applications.
aRDx is already successfully in use on our advanced and complex humanoid robot Agile Justin. In future publications we will describe its other, high level parts, like the dynamic and flexible but less performant communication layer or the advanced mechanisms for startup and shutdown of large distributed applications.
Conclusions
• modern high level languages (beyond Python or Ruby) have much to offer
• functional programming, advanced static type systems, performance, DSELs ...
• this allows a radically new system architecture with One Language for (Almost) All
• our new aRDx framework successfully follows this philosophy
• many interesting candidate languages for robotics: Haskell, Scala, Erlang, Clojure, Schemes (Gambit, Racket, ...), OCaml, ...
• Tip:
Roboticists, go to language wonderland!
B. A. Tate, “Seven Languages in Seven Weeks”, The Pragmatic Bookshelf, 2010.