Simplifying debugging for multi-core Linux devices and low-power Linux clusters
Embedded World Exhibition & Conference
February 24, 2015
Introduction
Embedded Linux development
Why?
– Reuse
– Community
– Memory constraints
– C and C++
– Device compatibility
– Cost
Where?
– Routers
– Media streaming
– POS
– Hardware control
– Sensor display
Free Electrons.com
© 2015 ROGUE WAVE SOFTWARE, INC. ALL RIGHTS RESERVED 3
Multi-core
• 2- and 4-core devices are much more common
– Multi-core
– Many-core
• You have a choice
– Leave the extra cores idle
– Run additional processes
– Write multithreaded code to utilize the additional cores
• Graphics processing unit (GPU) accelerators on the device?
How to use the additional cores?
Multi-thread
• Concurrency: execution proceeds asynchronously along two or more sequences
– Parallelism: concurrency with simultaneous execution on multiple cores
• Interdependencies
– Explicit is generally better than implicit
• Synchronization
– Race conditions
– Deadlocks
– Livelocks
Taking advantage of parallelism in your device
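A minimal C sketch (POSIX threads; all names are hypothetical) of the race condition listed above and the explicit fix: a mutex around a shared counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Shared global: every thread reads and writes the same copy. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Racy worker: counter++ is a read-modify-write, so two threads can load
 * the same old value and one of the increments is lost. */
void *increment_racy(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++)
        counter++;                      /* unsynchronized: a data race */
    return NULL;
}

/* Fixed worker: the mutex serializes each read-modify-write. */
void *increment_locked(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts nthreads copies of worker, each adding per_thread, and returns
 * the final value of the shared counter. */
long run_counter(void *(*worker)(void *), int nthreads, long per_thread)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;
    assert(nthreads >= 1 && nthreads <= MAX_WORKERS);
    counter = 0;
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tids[t], NULL, worker, &per_thread) != 0)
            break;                      /* join only the threads that started */
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    return counter;
}
```

On a multi-core device, `run_counter(increment_racy, 4, 100000)` can return less than 400000, and a different value on each run, because it contains a data race (undefined behavior in C). `run_counter(increment_locked, 4, 100000)` returns 400000.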
Multi-device
• Computationally challenging problem
– Algorithm is parallelizable
• Requirements
– Power
– Space
– Cooling
• Fault tolerance
• Off-the-shelf parallel runtime vs. custom
Embedded clusters
High performance computing
• Typically Linux on x86
– Sometimes with GPGPU or Intel Xeon Phi accelerators
• Programmed as sets of multi-core nodes
– Data is distributed, with communication and synchronization as needed
– Communication typically takes the form of message passing
• Entire system is optimized for app performance
– Low-latency interconnect
– Parallel filesystem
• Access is by submitting batch jobs to a resource management system
Supercomputers and clusters with 100s – 1000s of nodes
Rogue Wave Software
Rogue Wave helps organizations simplify
complex software development, improve
code quality, and shorten cycle times
What we do
Capabilities
[Figure: capabilities map of Rogue Wave products: Klocwork, OpenLogic, TotalView, IMSL, SourcePro, HydraExpress, Stingray, PV-WAVE, Visualization]
Used by 3,000 customers in over 57 countries across diverse industries to develop mission-critical applications and software
Industries: Financial Services, Telecom, Gov’t / Defense, Technology, Other Verticals
Global, diversified customer base
Embedded use cases
Retail point of sale
• Highly connected
– Operations
– Ad and promotional services
– Sensors (scale, scanner)
• Modern C++
• Many threads
– 1 or more threads for each task
– Threads that read the sensor data have responsiveness requirements
Industrial device controller
• Expensive equipment
• Used in production testing
• Controller software
– Linux on x86
– C++
– Multi-threaded
• Customized at each site
– Customization takes the form of C code that runs in a precompiled framework
Sonar console
• Runs on 64-bit Linux and on Linux/ARM
– 2 GB flash memory
• Monolithic C++ with millions of lines of code
– Qt interface (touch displays)
– 100s of threads
• Rich visual data
– Video streaming
– One or more sensors
Signal processing compute cluster
• Computationally demanding
• Sophisticated algorithms
– Translated from fourth-generation languages/environments
• Need an answer quickly
• Using industry standard technologies
– C++
– MPI
– x86 processors for development
– Power processors for deployment
• Memory and power constraints
Techniques and best practices
Debugging distributed applications
• Print debugging doesn’t scale
• You can debug 1 of N processes
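– Do all processes exhibit the error?
– Needle-in-a-haystack problem: finding the one process that misbehaves
– “Bad apple” problem: a faulty process can pass bad data to the process that eventually fails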
• You can run N debuggers on N processes
– Frustrating with N=2, impractical above N=4
• You can use a parallel debugger
– One debugger controlling all N processes
Techniques for debugging distributed apps (1/3)
Debugging distributed applications
• Parallel debuggers will
– Let you focus on any process that fails and see its backtrace
– Allow you to synchronize your processes (if the code includes common execution pathways)
– Allow you to focus on any process
– Allow you to compare processes
– Give you ways to find outliers
– Give you ways to group processes and work with those groups
Techniques for debugging distributed apps (2/3)
Debugging distributed applications
• Re-run at different scales
– Debug at lowest scale that exhibits defect
– What is different at that scale?
• Compare program flow in working and non-working cases
• Follow bad data back from the symptom to the cause
• Look closely at communication points and data decomposition
• Racy bugs
– Try out different relative orders of execution
– Add synchronization
• Deadlocks and livelocks
– Examine sync points to make sure all assumptions are valid
– Examine flow control around sync points
• Take careful notes; there can be many subtle factors
Techniques for debugging distributed apps (3/3)
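To make "examine sync points" concrete, here is a sketch (hypothetical names) of the classic lock-ordering deadlock and one common fix: every thread acquires the two locks in the same global order. Lock ordering is a standard technique, not one these slides prescribe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Moves amount between two accounts while holding both locks.
 * Deadlock-prone variant (not used): always locking `from` first. If one
 * thread runs transfer(a, b) while another runs transfer(b, a), each can
 * hold one lock and wait forever for the other.
 * Fix: take both locks in one global order (by address). */
void transfer(struct account *from, struct account *to, long amount)
{
    assert(from != to);   /* locking the same mutex twice would self-deadlock */
    struct account *first  = (uintptr_t)from < (uintptr_t)to ? from : to;
    struct account *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

struct transfer_job { struct account *from, *to; int reps; };

static void *transfer_worker(void *arg)
{
    struct transfer_job *job = arg;
    for (int i = 0; i < job->reps; i++)
        transfer(job->from, job->to, 1);
    return NULL;
}

/* Two threads transfer in opposite directions at the same time. With
 * ordered locking this finishes, and the total balance is conserved. */
long opposing_transfers(int reps)
{
    struct account a = { .balance = 1000 }, b = { .balance = 1000 };
    pthread_mutex_init(&a.lock, NULL);
    pthread_mutex_init(&b.lock, NULL);

    struct transfer_job ab = { &a, &b, reps }, ba = { &b, &a, reps };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, transfer_worker, &ab);
    pthread_create(&t2, NULL, transfer_worker, &ba);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_mutex_destroy(&a.lock);
    pthread_mutex_destroy(&b.lock);
    return a.balance + b.balance;
}
```

If a run like this hangs in the deadlock-prone variant, a debugger showing every thread's backtrace typically reveals two threads each blocked in `pthread_mutex_lock` on the lock the other one holds.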
Debugging multi-core applications
• Multithreaded applications and shared memory programming
– Data can be shared (higher memory efficiency)
• Shared-memory programming between processes
– Complexity: only some memory is shared
• Multi-threaded programming
– All threads share the same heap and globals
– Separate stacks (but mutually readable)
• Concurrency is the same as in distributed applications
– Many of the same challenges and many of the same techniques
• Communication (accidental and intentional) is not as localized
• Memory management (new/delete, malloc/free) is shared
Observations about multi-threaded debugging
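A small sketch (hypothetical names) of what "same heap and globals, separate but mutually readable stacks" means in practice: a worker thread reads a variable on its creator's stack, writes heap memory it did not allocate, and updates a global. A stray pointer can do the same by accident, which is one reason memory corruption in threaded code often shows up far from its cause.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

int shared_global = 0;              /* one copy, shared by all threads */

struct peek_args {
    int *creator_stack_var;         /* points into another thread's stack */
    int *heap_block;                /* heap memory allocated by another thread */
    int seen;                       /* what the worker read from that stack */
};

static void *peek_worker(void *arg)
{
    struct peek_args *p = arg;
    p->seen = *p->creator_stack_var;    /* read a different thread's stack */
    p->heap_block[0] = 42;              /* write heap memory malloc'd elsewhere */
    shared_global = 7;                  /* write the shared global */
    return NULL;
}

/* Returns 1 if all of the worker's accesses are visible to the creator. */
int demo_shared_memory(void)
{
    int local = 123;                    /* lives on this thread's stack */
    int *block = malloc(sizeof *block);
    assert(block != NULL);
    block[0] = 0;

    struct peek_args args = { &local, block, 0 };
    pthread_t tid;
    if (pthread_create(&tid, NULL, peek_worker, &args) != 0) {
        free(block);
        return 0;
    }
    pthread_join(tid, NULL);            /* join makes the worker's writes visible */

    int ok = (args.seen == 123 && block[0] == 42 && shared_global == 7);
    free(block);                        /* shared heap: any thread may free it */
    return ok;
}
```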
Debugging multi-core applications
• Print debugging can work for some bugs but can be very confusing for others
– It changes timing, which can hide or expose race conditions
• Look carefully at the thread capabilities of your debugger
• A good multi-thread debugger will give you
– An asynchronous interface
• Doesn’t assume a simple running/stopped state
– Easy access to all threads
– Complete control over threads
– Display of thread states
– Thread-aware breakpoints
– Ways to synchronize threads
– Ways to hold threads
– Thread groups
– Display of thread-private data
– Display of data across threads
Techniques for multi-threaded debugging (1/2)
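Thread-private data in C11 is declared with `_Thread_local`. In this sketch (hypothetical names), each thread increments its own copy of a counter, so a thread-aware debugger would show a different value for the same variable name depending on which thread is in focus.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* C11 thread storage duration: every thread has its own copy. */
static _Thread_local int calls_in_this_thread = 0;

static void record_call(void) { calls_in_this_thread++; }

static void *tls_worker(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++)
        record_call();
    /* Hand this thread's private count back through the return value. */
    return (void *)(intptr_t)calls_in_this_thread;
}

/* Runs two workers doing different amounts of work and reports each
 * worker's private count plus the calling thread's own count. */
void count_per_thread(int n1, int n2, int *out1, int *out2, int *out_caller)
{
    assert(out1 != NULL && out2 != NULL && out_caller != NULL);
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;
    pthread_create(&t1, NULL, tls_worker, &n1);
    pthread_create(&t2, NULL, tls_worker, &n2);
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    *out1 = (int)(intptr_t)r1;
    *out2 = (int)(intptr_t)r2;
    *out_caller = calls_in_this_thread;   /* unaffected by the workers */
}
```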
Debugging multi-core applications
• Try to reproduce problems without threads
• Vary the number of threads
• Try different interleaving patterns
• Look at thread synchronization points (mutexes, semaphores, barriers)
• Use watchpoints (aka data breakpoints)
• Make sure resources are cleaned up before thread termination
• Use record and deterministic replay to capture the exact thread execution pattern
Techniques for multi-threaded debugging (2/2)
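A sketch (hypothetical names) of making the thread count a parameter, so the same code runs with one thread (no threads created) as a baseline and then with other thread counts. Uneven splits, such as 7 threads over 1000 elements, exercise remainder handling, a common source of bugs that appear only for some thread counts.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

struct slice { const int *data; size_t begin, end; long long sum; };

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    s->sum = 0;
    for (size_t i = s->begin; i < s->end; i++)
        s->sum += s->data[i];
    return NULL;
}

/* Sums data[0..n) using nthreads slices. With nthreads == 1 no thread is
 * created, which gives a thread-free baseline to compare against. */
long long parallel_sum(const int *data, size_t n, int nthreads)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    struct slice s[MAX_THREADS];
    pthread_t tid[MAX_THREADS];
    int started[MAX_THREADS] = {0};
    size_t chunk = n / (size_t)nthreads, extra = n % (size_t)nthreads, pos = 0;

    for (int t = 0; t < nthreads; t++) {
        /* The first `extra` slices take one leftover element each. */
        size_t len = chunk + ((size_t)t < extra ? 1 : 0);
        s[t] = (struct slice){ data, pos, pos + len, 0 };
        pos += len;
        if (nthreads > 1 && pthread_create(&tid[t], NULL, sum_slice, &s[t]) == 0)
            started[t] = 1;
        else
            sum_slice(&s[t]);       /* serial path (or thread creation failed) */
    }

    long long total = 0;
    for (int t = 0; t < nthreads; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += s[t].sum;
    }
    return total;
}
```

If `parallel_sum(data, n, 1)` and `parallel_sum(data, n, k)` disagree for some k, the defect is in the decomposition or synchronization rather than in the per-element work.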
Log file debugging
• Recompile with print statements that write to a log file
• Compile logging in and toggle it on or off with a runtime flag
• Trace with an external tool
– System call tracing
– Debugger-assisted tracing: refocus experiments without recompiling
• Tension & Trade-off
– Capture enough context to understand what is happening
– Manage the large volume of output that may be required
• Tips & Techniques
– Binary search to find the site of the error
– Consider file system / file size
– Flush output buffers; otherwise buffered writes can be lost if the program crashes
– Adding a print call sometimes changes the behavior itself (compiler bugs, optimization, race conditions)
– Print debugging can be hairy with multiple threads or processes
• Externally driven tracing tools may be preferable, to ensure that logging actually happens
Narrow down the site and capture the context of the bug
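A sketch of logging that is compiled in but toggled at runtime, flushed after every record, and locked per record so lines from different threads do not interleave. `APP_DEBUG_LOG` and the function names are hypothetical; `flockfile`/`funlockfile` are standard POSIX stdio calls.

```c
#define _POSIX_C_SOURCE 200809L   /* for flockfile/funlockfile */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Logging is always compiled in; a runtime flag decides whether it writes.
 * APP_DEBUG_LOG is a hypothetical environment variable name. */
static int debug_log_enabled = 0;
static FILE *debug_log_file = NULL;

void debug_log_init(FILE *out)
{
    const char *env = getenv("APP_DEBUG_LOG");
    debug_log_enabled = (env != NULL && env[0] == '1');
    debug_log_file = out;
}

void debug_log_set(int enabled) { debug_log_enabled = enabled; }

/* Writes one "file:line: message" record and flushes it immediately, so
 * the record survives even if the program crashes right afterwards. */
void debug_log_at(const char *file, int line, const char *fmt, ...)
{
    assert(fmt != NULL);
    if (!debug_log_enabled || debug_log_file == NULL)
        return;
    va_list ap;
    flockfile(debug_log_file);          /* keep the record whole across threads */
    fprintf(debug_log_file, "%s:%d: ", file, line);
    va_start(ap, fmt);
    vfprintf(debug_log_file, fmt, ap);
    va_end(ap);
    fputc('\n', debug_log_file);
    fflush(debug_log_file);             /* costs time, but nothing is lost */
    funlockfile(debug_log_file);
}

#define DEBUG_LOG(...) debug_log_at(__FILE__, __LINE__, __VA_ARGS__)
```

Flushing every record trades speed for reliability; for high-volume tracing, the slide's advice about file system and file size applies.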
Dynamic memory analysis
• Dynamic memory tools help catch hard-to-identify bugs
– Memory leaks can lurk in a code base
– Bounds violations can corrupt data
• They can be an open door for malicious agents
– Dangling pointers lead to racy, hard-to-reproduce symptoms
• Dynamic memory tools can also be used to inspect what is happening in heap memory
– Normally quite hard to visualize and understand
– Critical for optimizing for low memory environments
• Tips & techniques
– Maintain a policy of eliminating 100% of leaks
– Use with a testing system to make sure you exercise different kinds of input and different code paths
– Compare heap behavior over time to make sure OS and library changes don’t introduce problems
Pinpoint leaks and analyze memory use
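Real dynamic memory tools intercept `malloc`/`free` transparently and record call stacks. The sketch below (hypothetical names, fixed-size registry) only shows the underlying idea: remember each live allocation with its source location, flag frees of unknown pointers, and report whatever is still live as a leak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 1024

struct alloc_record { void *ptr; size_t size; const char *file; int line; };

static struct alloc_record records[MAX_TRACKED];
static size_t live_count = 0;

/* malloc plus bookkeeping: remember where each live block came from. */
void *tracked_malloc(size_t size, const char *file, int line)
{
    assert(live_count < MAX_TRACKED);        /* sketch: fixed-size registry */
    void *p = malloc(size);
    if (p != NULL && live_count < MAX_TRACKED)
        records[live_count++] = (struct alloc_record){ p, size, file, line };
    return p;
}

/* free plus bookkeeping: unknown pointers (double frees, wild pointers)
 * are reported instead of being passed to free(). */
void tracked_free(void *p)
{
    if (p == NULL)
        return;
    for (size_t i = 0; i < live_count; i++) {
        if (records[i].ptr == p) {
            records[i] = records[--live_count];   /* swap-remove the record */
            free(p);
            return;
        }
    }
    fprintf(stderr, "tracked_free: unknown pointer %p (double free?)\n", p);
}

/* Prints every block that is still live and returns how many there are. */
size_t report_leaks(FILE *out)
{
    for (size_t i = 0; i < live_count; i++)
        fprintf(out, "leak: %zu bytes allocated at %s:%d\n",
                records[i].size, records[i].file, records[i].line);
    return live_count;
}

#define TRACKED_MALLOC(size) tracked_malloc((size), __FILE__, __LINE__)
```

Calling `report_leaks` at the end of each test run supports the "eliminate 100% of leaks" policy: any nonzero count fails the run.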
Reverse debugging
• Record and deterministically replay the execution trajectory through the code
– Record nondeterministic inputs
– Replay those as needed to access any point in the execution
• If you can get a racy bug to reproduce, you can examine it at leisure
– Give yourself the full benefit of hindsight
– What steps led to it happening?
– Where did the program go wrong?
• Tips & Techniques
– Use watchpoints (data breakpoints) to find the source of corrupt data
– Wait until you are close to the bug before activating recording, to avoid paying the overhead for the entire run
– Capture recordings and save them to a file as part of bug reports
– Review recordings of defects in unfamiliar parts of the code with subject matter experts
Get “racy” bugs “on tape”
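Reverse debuggers record at the machine level. As an application-level illustration of the same principle, this sketch (hypothetical names) routes every nondeterministic input through one function that logs it while recording and feeds the logged values back while replaying, so a run can be repeated exactly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum rr_mode { RR_OFF, RR_RECORD, RR_REPLAY };

static enum rr_mode rr_current = RR_OFF;
static FILE *rr_log = NULL;

void rr_start(enum rr_mode mode, FILE *log)
{
    assert(mode == RR_OFF || log != NULL);
    rr_current = mode;
    rr_log = log;
}

/* Every nondeterministic input goes through here. Recording appends the
 * live value to the log; replay ignores the live value and returns the
 * logged one, so execution follows the recorded path. */
long rr_input(long live_value)
{
    long v = live_value;
    if (rr_current == RR_RECORD) {
        fwrite(&v, sizeof v, 1, rr_log);
    } else if (rr_current == RR_REPLAY) {
        if (fread(&v, sizeof v, 1, rr_log) != 1) {
            fprintf(stderr, "replay log exhausted: execution diverged\n");
            abort();
        }
    }
    return v;
}

/* Toy computation whose result depends on nondeterministic inputs
 * (rand() stands in for sensor readings, timing, and so on). */
unsigned long toy_run(int steps)
{
    unsigned long acc = 0;
    for (int i = 0; i < steps; i++)
        acc = acc * 31u + (unsigned long)rr_input(rand() % 1000);
    return acc;
}
```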
Remote Debugging / Cross Debugging
• Remote Debugging
– The debugger core runs on your workstation (host) system
– An agent process runs on the device (target) system
• The agent process is very lightweight
• The debugger core holds all the complex analysis data structures
• Tips & Techniques
– Start with a debug build on the host machine
• Copy and strip the version that goes on the device
– You can start the agent (server) first and then choose the target process
– Source files may need to be accessible on the host
– Use a tool that correctly handles mismatches between host and target libraries
– Be aware of security
Limit debugger resource utilization in the target system
Core file debugging
• The corefile isn’t always sufficient
– It can be trashed
– It represents the consequence of the defect, but not the cause
• Examine the site of the crash
• Look for ‘suspicious’ variables
• Tips & Techniques
– Compile with debug information
– You can match the unstripped executable with a corefile generated by its stripped counterpart
– Check more than one stack frame
A corefile is a good place to start
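Two small pieces that help make a corefile available and useful, assuming Linux/POSIX: raise the core-size resource limit at startup with `getrlimit`/`setrlimit`, and keep `assert()` checks in debug builds, since a failed assertion calls `abort()` and can leave a corefile that points at the failed check. Compile with debug information (`-g` for GCC and Clang). Function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Raises the soft core-file size limit to the hard limit so a crash can
 * leave a corefile behind. Returns 0 on success, -1 on failure. Where the
 * corefile is written is decided by the system (on Linux, see
 * /proc/sys/kernel/core_pattern). */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;      /* unprivileged: soft may rise up to hard */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}

/* Returns 1 if the soft core limit currently equals the hard limit. */
int core_limit_is_raised(void)
{
    struct rlimit rl;
    return getrlimit(RLIMIT_CORE, &rl) == 0 && rl.rlim_cur == rl.rlim_max;
}

/* A failed assert() calls abort(); with core dumps enabled, the corefile's
 * stack shows exactly which check failed. */
int checked_divide(int a, int b)
{
    assert(b != 0);
    return a / b;
}
```

When the device runs a stripped binary, load the corefile together with the unstripped build on the host (with gdb: `gdb ./app-unstripped core`; file names hypothetical).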
Static analysis
• Scan your code with a “sanity checker”
– Identifies patterns that may, or will, lead to errors
– Can check for compliance with coding standards
• Finds bugs that could lead to a crash, even if they don’t crash right away
• Finds certain kinds of resource leakage
• The sooner the better
– Faster feedback, easier to correct
– Ideally, this works like a spell checker, flagging problems as you write code
Catch defects early on
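An example of the kind of resource leak a static checker typically reports: an error path that returns without freeing a buffer. Nothing crashes, so tests rarely notice. The function is hypothetical; the leaking early return is kept as a comment next to the fix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated upper-case copy of src, or NULL if src is NULL,
 * longer than max_len, or memory runs out. The caller frees the result. */
char *upper_copy(const char *src, size_t max_len)
{
    if (src == NULL)
        return NULL;
    size_t len = strlen(src);
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    if (len > max_len) {
        /* return NULL;   <- defect a checker would flag: buf leaks here */
        free(buf);        /* fix: release the buffer on the error path */
        return NULL;
    }
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)toupper((unsigned char)src[i]);
    buf[len] = '\0';
    assert(strlen(buf) == len);
    return buf;
}
```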
Rogue Wave solutions
• TotalView
– Asynchronous Thread Control
– Parallel Debugging
– Core File Debugging
– Reverse Debugging
– Dynamic Memory Analysis
• Klocwork
– C and C++ Static Code Analysis
• OpenLogic
– Manage Your Open Source Components
We can help!
Visit us at booth #4-139
Resource slides