
Software Verification:

Testing vs. Model Checking

A Comparative Evaluation of the State of the Art

Thomas Lemberger

Joint work with Dirk Beyer
LMU Munich, Germany

Null Hypothesis:

- Testing is better at finding bugs than model checking.
- Testing is faster than model checking.
- Testing is more precise than model checking.
- Testing is easier to use than model checking.

Where are the numbers?

Overview


Terminology

- Testing:
  - Execute a finite set of test cases on the program
  - Observe compliance/violation of the specification
  - Focus: test-case generation
- Model checking:
  - Formally describe possible program states
  - Prove compliance/violation of the specification
  - Abstraction is important
- Both are automated!


Scope

- Single, sequential programs
- Whitebox programs
- Task: bug finding

Comparability

Test-case generators

- Different conventions for program input, e.g.:

  klee_make_symbolic(&x, sizeof(x), "x");
  CREST_int(x);
  x = input();
  x = parse(fgets(...));
  input(&x, sizeof(x), "x");

- Different output formats for test cases, e.g. a KTEST file (simple.bc, sym: __VERIFIER_nondet_int, ...), plain value lists (1, -5, 3 / 1, -5, 0), "Test inputs: [42, 107]", or opaque binary output (zsd;as@d, 0xF203 0x0003 0xF203 0x0003)
- Different or no test executors, e.g. klee-replay for Klee, none for most other tools

Comparability

Model checkers

- Established standard for input programs:

  x = __VERIFIER_nondet_int();

- Established standard for the output format of the result: FALSE, UNKNOWN, TRUE

⇒ Adjust test-case generators to the standards of model checkers

Framework


Framework: TBF

TBF: Test-based falsifier

- Apply test-case generators to model-checker standards
- Create, execute, and observe tests
- Only variable: the test-case generation tool
- Specification: never call __VERIFIER_error
- Disclaimer: this is a comparison of tools, not of techniques

TBF Architecture

Input Program → Preprocessor → Prepared Program → Test-Case Generator → Test Cases → Test-Vector Extractor → Test Vectors → Harness Generator → Harness → Test Executor → Verdict

- Preprocessor: rewrites the standard input convention into the generator's own, e.g.

  int x = __VERIFIER_nondet_int();
  becomes
  int x; klee_make_symbolic(&x, sizeof(x), "x");

- Test-Case Generator: emits tool-specific test cases, e.g. a KTEST file (simple.bc, sym: __VERIFIER_nondet_int, ...)
- Test-Vector Extractor: extracts plain input vectors, e.g. <0, 3, 5>
- Harness Generator: produces a test harness, e.g.

  int __VERIFIER_nondet_int() {
      return (int) parse(input());
  }
  void __VERIFIER_error() {
      fprintf(stderr, "Err\n");
      exit(1);
  }

- Test Executor: replays each vector and derives the verdict:

  for vec in test_vectors:
      stderr = run(prog, harness, vec)
      if "Err" in stderr:
          return FALSE
  return UNKNOWN

Evaluation


Considered Tools

Tool         Technique
AFL-fuzz     Greybox fuzzing
Crest-ppc    Concolic execution, search-based
CPATiger     Model-checking-based testing, based on CPAchecker
FShell       Model-checking-based testing, based on CBMC
Klee         Symbolic execution, search-based
PRtest       Random testing
CBMC         Bounded model checking
CPA-seq      Explicit-state model checking, predicate abstraction, k-induction
ESBMC-incr   Bounded model checking, incremental loop bound
ESBMC-kInd   Bounded model checking, k-induction

Experiment Setup

- Benchmark tool: BenchExec
- Limits:
  - 2 CPUs
  - 15 GB of memory
  - 15 min of CPU time
- Benchmark set:
  - Openly available: https://github.com/sosy-lab/sv-benchmarks
  - Largest available benchmark set
  - C programs
  - 1490 tasks with a known bug
  - 4203 tasks without a bug

Experiments

1. Bug-finding capabilities: consider the 1490 tasks with a bug
2. Precision: consider the 4203 tasks without a bug
3. Validity: comparison with the existing klee-replay

1. Bug-Finding Capabilities I

Tool             Found (of 1490)   Found, compilable only (of 1115)   Median CPU time (s)
AFL-fuzz (T)     605               605                                11
CPATiger (T)     57                57                                 4.5
Crest-ppc (T)    376               376                                3.4
FShell (T)       236               236                                6.2
Klee (T)         826               826                                3.6
PRtest (T)       292               292                                3.6
CBMC (M)         830               779                                1.4
CPA-seq (M)      889               819                                15
ESBMC-incr (M)   949               830                                1.9
ESBMC-kInd (M)   844               761                                2.3
Union testers    887               887
Union MC         1092              930
Union all        1176              1014

- Model checkers find more bugs
- Model checkers don't need stubs
- Model checkers are comparable in speed


1. Bug-Finding Capabilities II

[Figure: quantile plot, CPU time (s, log scale, 0.1–1000) over the n-th fastest correct result (0–1400), for AFL-fuzz(T), CPATiger(T), Crest(T), FShell(T), KLEE(T), PRTest(T), CBMC(M), CPA-seq(M), ESBMC-incr(M), ESBMC-kInd(M)]

Time Performance

- CPU time of Klee(T)/AFL-fuzz(T) vs. ESBMC-incr(M) on solvable tasks

[Figure: two log–log scatter plots (0.1–1000 s): CPU time for ESBMC-incr(M) over CPU time for KLEE(T), and over CPU time for AFL-fuzz(T)]

⇒ Time performance is task-specific


2. Precision

- 4203 tasks without a bug
- Testers: no false alarms
- Model checkers: negligible (worst: ESBMC-incr, with 6 false alarms)

3. Validity

Comparison of TBF with klee-replay:

- Specific to the Klee test-case format
- Same concept as TBF
- Comparable performance

[Figure: log–log scatter plot (0.1–1000 s): CPU time for KLEE + klee-replay over CPU time for TBF with KLEE]

Conclusion I

TBF:

- makes 5 existing test-case generators comparable
- allows easy integration of new generators
- automatically transforms generated test cases into executable tests

Conclusion II

Can we confirm our null hypothesis?

- Testing is better at finding bugs than model checking. ✗
- Testing is faster than model checking. ✗
- Testing is more precise than model checking. ✓
- Testing is easier to use than model checking. ✗

Conclusion III

New null hypothesis:

- Model checking
  - can find more bugs,
  - in less time, and
  - requires fewer adjustments to the input program

