State coverage: an empirical analysis based on a user study
Dries Vanoverberghe, Emma Eyckmans, and Frank Piessens
Software Validation Metrics
• Software defects after product release are expensive
  – NIST 2002: $60 billion annually
  – MS Security bulletins: around 40/year, at $100k to $1M each
• Validating software (testing)
  – Reduce # defects before release
  – But not without a cost
• Make a tradeoff:
  – Estimate remaining # defects
=> Software validation metrics
Example: Code coverage
• Fraction of statements/basic blocks that are executed by the test suite
• Principle:
  – not executed => no defects discovered
• Hypothesis:
  – not executed => more likely contains a defect (see the sketch below)
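A minimal sketch of how statement coverage is counted (hypothetical Java, not from the studied application):

    // Three statements in total.
    int max(int a, int b) {
        if (a > b) {    // executed by max(2, 1)
            return a;   // executed by max(2, 1)
        }
        return b;       // not executed by a suite that only calls max(2, 1)
    }
    // A suite containing only assertEquals(2, max(2, 1)) executes 2 of 3
    // statements (~67% statement coverage); a defect in 'return b' cannot
    // be discovered by this suite.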
Example: Code coverage
• High statement coverage
  – No defects? Not necessarily
  – Different paths through the same statements may remain untested
• Structural coverage metrics:
  – e.g. path coverage, data flow coverage, …
  – Measure the degree of exploration
• Automatic tool assistance
  – Metrics evaluate tools rather than human effort
Problem statement
• Exploration is not sufficient
  – Tests also need to check requirements
  – Evaluate the completeness of the test oracle
• Impossible to automate:
  – Requirements would have to be guessed
  – Evaluation is critical!
• No good metrics available
State coverage
• Evaluate the strength of assertions
• Idea:
  – State updates must be checked by assertions (see the sketch below)
• Hypothesis:
  – Unchecked state update => more likely a defect
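To illustrate the idea, a minimal JUnit-style sketch; class and method names are hypothetical, not taken from the studied application:

    @Test
    public void addAppointment() {
        Calendar cal = new Calendar();
        cal.add(new Appointment("dentist"));  // state update: assigns to cal's fields
        // Checked state update: the assertion reads the updated state.
        assertEquals(1, cal.size());
        // If cal.add() also assigned a 'lastModified' field that no assertion
        // ever reads, that update would remain unchecked: a wrong value there
        // goes unnoticed. State coverage flags exactly such updates.
    }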
State coverage
• Complements code coverage
  – No replacement for it
• Metrics also assist developers
  – Code coverage => reachability of statements?
  – State coverage => invariant established by reachable statements?
State coverage
• Metric:
  – state coverage = (# state updates read in assertions) / (total # state updates)
• State update:
  – Assignment to fields of objects
  – Return values, local variables, … also possible
• Computation:
  – Runtime monitor (sketched below)
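A minimal sketch of such a runtime monitor (all names are illustrative; a real tool would invoke these hooks via instrumentation, e.g. bytecode rewriting):

    import java.util.HashSet;
    import java.util.Set;

    class StateCoverageMonitor {
        // Tracks state updates by field identifier; counting dynamic
        // update occurrences instead works the same way.
        private final Set<String> updated = new HashSet<>();
        private final Set<String> readInAssertions = new HashSet<>();

        // Hook: called on every assignment to an object field.
        void onStateUpdate(String fieldId) { updated.add(fieldId); }

        // Hook: called when an assertion reads a field's value.
        void onAssertionRead(String fieldId) {
            if (updated.contains(fieldId)) readInAssertions.add(fieldId);
        }

        // state coverage = # updates read in assertions / total # updates
        double coverage() {
            return updated.isEmpty() ? 0.0
                 : (double) readInAssertions.size() / updated.size();
        }
    }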
Design of experiment
• Existing evaluation:
  – Correlation with mutation adequacy (Koster et al.)
  – Case study by an expert user
• Goal:
  – Directly analyze correlation with 'real' defects
  – Average users rather than experts
Hypotheses
• Hypothesis 1:
  – When increasing state coverage (without increasing exploration), the number of discovered defects increases
  – Similar to the existing case study
• Hypothesis 2:
  – State coverage and the number of discovered defects are correlated
  – A much stronger claim
Structure of experiment
• Base program:
  – Small calendar management system
  – Result of a software design course
  – Existing test suite
  – Presence of software defects unknown
Structure of experiment
• Phase 1: case study
  – Extend the test suite to find defects
    • First increase code coverage
    • Then increase state coverage
  – Dry run of the experiment
    • Simplified application
    • Injected additional defects
Structure of experiment
• Phase 2: controlled user study
  – Create a new test suite
    • First increase code coverage
    • Then increase state coverage
  – Commit after each detected defect
Threats to validity
• Internal validity
  – Two sessions: no differences observed
  – Learning effect: subjects were familiar with the environment before the experiment
• External validity
  – Choice of application
  – Choice of faults
  – Subjects are students
Results
• Phase 1: case study
  – No additional defects discovered
  – No confirmation for hypothesis 1
  – Potential reasons:
    • Mostly structural faults
    • Non-structural faults were obvious
• Phase 2: controlled user study
  – No confirmation for hypothesis 1
[Figure: Code coverage (%) plotted against # detected faults, one series per user (users 1–13)]
[Figure: State coverage plotted against # detected faults, one series per user (users 1–13)]
Potential causes
• Frequency of logical faults
  – 3/20 incorrect state updates – only 1/14 discovered!
  – 5/14 are detected by assertions
  – Focusing on these 5 faults:
    • Higher state coverage (42% vs. 34%) for classes that detect at least one of these 5
  – How common are logical faults?
Potential causes
• Logical faults too obvious
  – Subjects had already discovered them while increasing code coverage
• State coverage is not monotonic
  – Adding new tests may decrease state coverage
  – Always relative to exploration (worked example below)
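A worked example of the non-monotonicity (numbers illustrative): a suite whose assertions read 8 of its 10 executed state updates has 80% state coverage. Adding a test that executes 5 new state updates but asserts on only 1 of them yields 9/15 = 60%: coverage drops even though the suite became strictly stronger, because the denominator grows with exploration.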
[Figure: Absolute state coverage (# covered state updates) plotted against # detected faults, one series per user (users 1–13)]
Conclusions
• The experiment fails to confirm the hypotheses
  – How frequent are logical faults?
  – Combine state coverage with code coverage?
    • Or compare test suites with similar code coverage
• But state coverage is also:
  – Simple
  – Efficient
Questions?