Post on 20-Jan-2017
Bounded Model Checking for C Programs in an Enterprise Environment
Michael Tautschnig, Amazon Web Services & Queen Mary University of London
Bounded Model Checking for C Programs in an Enterprise Environment | Michael Tautschnig
Customer: I would like to get a guarantee that there are no security bugs in this software.
“Software” eco system of can’t be published, but …
Ample Open-Source Software “out there”
• Debian (http://sources.debian.net/stats/, 21st October 2016)
  • 26,900 source packages
  • 13,736,903 individual source files
  • 1,276,743,654 lines of source code (any programming language)
  • 45.5% (approx 500M) C code, 22.2% C++, 5.6% shell, 4.7% Java
• SourceForge, GitHub, CodePlex, ...: how to automate any kind of analysis?
• Distributions (RedHat, Ubuntu/Debian, SuSE, ... but also industrial setups)!
  • Software organised in source packages
  • Uniform interface to access/download packages
  • Uniform build interface, dependency management
Building one Source Package: Compiler Tool-chain
• For now: C source code only
• goto-cc (part of CBMC distribution)
  • Uses compiler’s (here: GCC’s) preprocessor
  • Own C parser/front end (no Cil, LLVM, EDG, ...)
  • Supports GCC, Visual Studio, CodeWarrior, ARM-CC dialects and command line options
  • Builds intermediate representation understood by CBMC/CProver tools
  • Linking of compiled files/archives/libraries
Supporting arbitrary Build Systems
• Builds are performed in chroot environments
• /usr/bin/gcc and /usr/bin/ld replaced by scripts invoking goto-cc (+ more work)
• Key procedure:
  1. Run real compiler/linker (gcc/ld)
  2. Compile/link using goto-cc
  3. Add result as additional ELF section
• Resulting file remains executable
• Stable under file renaming, archiving, etc.
• Linking stage extracts intermediate representation from extra ELF section
[Figure: x86 binary + CProver IR]
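The three-step key procedure can be sketched as a small wrapper that only builds the command lines. This is an illustrative sketch, not the scripts from the talk: the relocated compiler path, the `.goto` file suffix and the section name are assumptions, and it relies on goto-cc accepting GCC-style options and on objcopy's `--add-section` option.

```python
# Hypothetical location of the original compiler after /usr/bin/gcc
# has been replaced by the wrapper script (assumption).
REAL_GCC = "/usr/bin/gcc.real"
# Hypothetical name of the extra ELF section holding the IR (assumption).
SECTION = "goto-cc"


def wrapper_commands(args, output):
    """Return the commands a gcc wrapper would run for one invocation.

    args:   the original compiler arguments, without "-o <output>"
    output: the object file or executable the build system expects
    """
    ir_file = output + ".goto"  # temporary file for the IR (assumption)
    return [
        # 1. Run the real compiler so the build sees a normal result.
        [REAL_GCC, *args, "-o", output],
        # 2. Compile the same inputs with goto-cc to obtain the IR.
        ["goto-cc", *args, "-o", ir_file],
        # 3. Attach the IR to the native file as an additional ELF section.
        ["objcopy", "--add-section", f"{SECTION}={ir_file}", output],
    ]


if __name__ == "__main__":
    for cmd in wrapper_commands(["-c", "foo.c"], "foo.o"):
        print(" ".join(cmd))
```

Because the IR travels inside the ordinary object file, renaming or archiving that file by the build system needs no special handling.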
Building Thousands of Packages
Infrastructure: (Ab-)using Jenkins
Scripts, notes, configuration: https://github.com/tautschnig/cprover-debian
• Jenkins master: 4 cores, 64 GB
• 5 slave nodes: each 64 cores, 256 GB memory
• Ultimate Debian Database: package versions, bugs
• Debian mirror: source archives
• Connections labelled in the diagram: SQL, SSH, FTP
Current per-package Work Flow
• Compile, link
• Store archive of all object files/executables
• dump-c: create human-readable C code from IR
• Add generic assertions (pointer checks, arithmetic overflow, no-NaN, ...)
• Run CBMC w/unwinding bound 1, Z3/Minisat (DAC’03, TACAS’04, CAV’13)
• Loop acceleration (CAV’13)
• Re-compile using goto-cc
• Static weak memory cycles (TOPLAS/PLDI’14)
• Re-compile using gcc (errors not fatal)
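As a rough illustration of the "add generic assertions" and "run CBMC" stages, the helper below assembles a CBMC command line with property checks and an unwinding bound of 1. It is a sketch only: the file name is hypothetical, the flags are taken from CBMC's command-line interface as I know it and should be checked against `cbmc --help` for the installed version, and the scripts used for the actual experiments may differ.

```python
def cbmc_command(goto_binary, unwind=1, use_z3=False):
    """Build a CBMC invocation for one file produced by goto-cc."""
    cmd = [
        "cbmc", goto_binary,
        "--pointer-check",          # invalid pointer dereferences
        "--signed-overflow-check",  # signed arithmetic overflow
        "--nan-check",              # floating-point operations yielding NaN
        "--unwind", str(unwind),    # loop unwinding bound
    ]
    if use_z3:
        cmd.append("--z3")  # use Z3 instead of the default SAT back end
    return cmd


if __name__ == "__main__":
    # "package.goto" is a hypothetical file name.
    print(" ".join(cbmc_command("package.goto")))
    print(" ".join(cbmc_command("package.goto", use_z3=True)))
```

An unwinding bound of 1 keeps each run cheap across thousands of packages, at the cost of only exploring the first iteration of every loop.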
Exercising Language Front Ends
• Many bug fixes and improvements to the parser, type checker
• Re-engineering of parts of the linker
• Bug fixes in IR construction
• Compilation (without further analysis steps) of entire archive: ~2 days
• > 250 GB of compressed archives of IR object files/executables
• 10314 archives available:
http://theory.eecs.qmul.ac.uk/debian+mole/pkgs/
Results relevant to Practitioners: Bug Reports
• Key feature: type checking at link time
• 844 bugs reported, 530 already fixed by developers
• Hundreds still to be reported
• http://bugs.debian.org/cgi-bin/pkgreport.cgi?users=mt@debian.org&tag=goto-cc&archive=both
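To make "type checking at link time" concrete, here is a toy model: each translation unit declares symbols with a signature, and the check compares all declarations of the same symbol across units. This is not goto-cc's implementation; the data layout and the function name are invented for illustration, and only the conflict labels echo the categories that appear in the results of this talk.

```python
from collections import defaultdict


def link_time_conflicts(units):
    """Compare declarations of each symbol across translation units.

    units maps a file name to {symbol: (return_type, [param_types])}.
    Returns a list of (symbol, file, reason) tuples.
    """
    seen = defaultdict(list)
    for unit, decls in units.items():
        for symbol, signature in decls.items():
            seen[symbol].append((unit, signature))

    conflicts = []
    for symbol, uses in sorted(seen.items()):
        _, (ret0, params0) = uses[0]  # first declaration is the reference
        for unit, (ret, params) in uses[1:]:
            if len(params) != len(params0):
                conflicts.append((symbol, unit, "parameter counts differ"))
            elif ret != ret0:
                conflicts.append((symbol, unit, "conflicting return types"))
            elif params != params0:
                conflicts.append((symbol, unit, "conflicting types"))
    return conflicts


if __name__ == "__main__":
    units = {
        "a.c": {"f": ("int", ["int"]), "g": ("int", [])},
        "b.c": {"f": ("int", ["int", "int"]), "g": ("long", [])},
    }
    for conflict in link_time_conflicts(units):
        print(conflict)
```

A conventional C linker matches symbols by name only, so mismatches of this kind pass silently in a normal build, which is why a type-aware linking step surfaces them.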
Reporting bugs
Where are the cats?
• CAV’14: J. Alglave, D. Kroening, V. Nimal, D. Poetzl: Don't sit on the fence: A static analysis approach to automatic fence insertion
• PLDI’14/TOPLAS: J. Alglave, L. Maranget, M. Tautschnig: Herding Cats - Modelling, simulation, testing, and data-mining for weak memory (cited in Linux Weekly News and C/C++ WG21/N4036)
Focus on improving/developing Methods
TOPLAS/PLDI’14: analysing 200 million LOC for potential weak memory susceptibility
Automated Information Leak Detection
Analysing the Patched Version
Overall Analysis Status (preliminary!)
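• In addition to 314 bugs reported and not yet fixed: 4915 packages with error reports. Top causes:
  • 1789 CBMC counterexamples (including several using loop acceleration)
  • 1711 Loop acceleration bugs
  • 200 Floating point support in Z3 back end
  • 198 Type-inconsistent access to heap with symbolic offset
  • 129 CBMC Out-of-memory
  • 54 Parameter counts differ
  • 48 Conflicting array sizes
  • 46 Conflicting types
  • 42 Conflicting struct types
  • 32 Conflicting return types (byte size)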
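The figures on these slides can be cross-checked with a few lines of arithmetic. The numbers are copied from the slides; the one assumption made here is that each package is counted under exactly one cause, which the slides do not state.

```python
# Bug reports: 844 reported, 530 already fixed.
reported, fixed = 844, 530
open_bugs = reported - fixed  # matches the 314 quoted above

# Top causes among the 4915 packages with error reports.
top_causes = {
    "CBMC counterexamples": 1789,
    "Loop acceleration bugs": 1711,
    "Floating point support in Z3 back end": 200,
    "Type-inconsistent access to heap with symbolic offset": 198,
    "CBMC out-of-memory": 129,
    "Parameter counts differ": 54,
    "Conflicting array sizes": 48,
    "Conflicting types": 46,
    "Conflicting struct types": 42,
    "Conflicting return types (byte size)": 32,
}
packages_with_errors = 4915

covered = sum(top_causes.values())
# Share of affected packages explained by the listed causes,
# assuming one cause per package (assumption, see lead-in).
share = covered / packages_with_errors

if __name__ == "__main__":
    print(f"open bugs: {open_bugs}")
    print(f"packages covered by listed causes: {covered} ({share:.1%})")
```

Under that assumption the ten listed causes account for 4249 of the 4915 packages, so a comparatively small tail of other causes remains.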
1789 CBMC counterexamples (including several using loop acceleration) 1711 Loop acceleration bugs 200 Floating point support in Z3 back end 198 Type-inconsistent access to heap with symbolic offset 129 CBMC Out-of-memory 54 Parameter counts differ 48 Conflicting array sizes 46 Conflicting types 42 Conflicting struct types 32 Conflicting return types (byte size)