Effectively Prioritizing Tests in Development Environment
Amitabh Srivastava, Jay Thiagarajan
PPRC, Microsoft Research
Outline
• Context and Motivation
• Test Prioritization Systems
• Measurements and Results
• Summary
Motivation: Scenarios
• Pre-checkin tests
  • Developers before check-in
• Build Verification tests (BVT)
  • Build system after each build
• Regression tests
  • Testing team after each build
• Hot fixes for critical bugs
Using program changes
• Source code differencing
  • S. Elbaum, A. Malishevsky & G. Rothermel, “Test case prioritization: A family of empirical studies”, Feb. 2002
  • S. Elbaum, A. Malishevsky & G. Rothermel, “Prioritizing test cases for regression testing”, Aug. 2000
  • F. Vokolos & P. Frankl, “Pythia: a regression test selection tool based on text differencing”, May 1997
Using program changes
• Data and control flow analysis
  • T. Ball, “On the limit of control flow analysis for regression test selection”, Mar. 1998
  • G. Rothermel and M.J. Harrold, “A Safe, Efficient Regression Test Selection Technique”, Apr. 1997
• Code entities
  • Y. F. Chen, D.S. Rosenblum and K.P. Vo, “TestTube: A System for Selective Regression Testing”, May 1994
Analysis of various techniques
• Source code differencing
  • Simple and fast
  • Can be built using commonly available tools like “diff”
  • A simple variable rename is reported as a change
  • Fails when a macro definition changes
  • To avoid these pitfalls, static analysis is needed
• Data and control flow analysis
  • Flow analysis is difficult in languages like C/C++ with pointers, casts and aliasing
  • Interprocedural data flow techniques are extremely expensive and difficult to implement in a complex environment
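
To illustrate the source-differencing pitfall listed above, here is a minimal sketch that uses Python's standard `difflib` as a stand-in for `diff`. The two C-like snippets and the helper `changed_lines` are hypothetical and exist only for this illustration. They show that a pure variable rename, which leaves behavior unchanged, is still reported as a change by a text-based comparison.

```python
import difflib

def changed_lines(old_src: str, new_src: str) -> list[str]:
    """Return the lines a line-based text diff reports as removed or added."""
    delta = difflib.ndiff(old_src.splitlines(), new_src.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return [line for line in delta if line.startswith(("- ", "+ "))]

# Hypothetical snippets: NEW only renames the variable `total` to `sum`.
OLD = "int total = 0;\nfor (i = 0; i < n; i++)\n    total += a[i];\nreturn total;"
NEW = "int sum = 0;\nfor (i = 0; i < n; i++)\n    sum += a[i];\nreturn sum;"

for line in changed_lines(OLD, NEW):
    print(line)
```

Three of the four lines are flagged even though the program logic is identical, so every test touching this code would be treated as affected.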
Our Solution
• Focus on change from previous version
• Determine change at very fine granularity: basic block/instruction
• Operates on binary code
  • Easier to integrate in production environment
  • Scales well to compute results in minutes
• Simple heuristic algorithm to predict which part of code is impacted by the change
Test Effectiveness Infrastructure
[Figure: infrastructure diagram. Labels: TEST, Old Build, New Build, Binary Diff, Coverage, Repository, Magellan, Coverage Impact Analysis, Test Prioritization, ECHELON, BMAT/VULCAN, Coverage Tools.]
Echelon: Test Prioritization
• Leverage what has already been tested
[Figure: Echelon data flow. Labels: Old Build, New Build, Binary Differences, Block Change Analysis, Magellan Repository (*link with symbol server for symbols), Coverage Impact Analysis, Coverage for new build, Test Prioritization, Prioritized list of test cases.]
Block Change Analysis: Binary Matching
[Figure: Old Build compared with New Build. Blocks of the new build are labeled as New Blocks, Old Blocks (not changed) and Old Blocks (changed).]
BMAT: Binary Matching [Wang, Pierce and McFarling, JILP 2000]
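
The three block categories can be sketched as follows. This is not BMAT's matching algorithm, which is not described in these slides; the sketch assumes a matching between new and old blocks is already available. The function `classify_blocks`, the `match` mapping and the string fingerprints are hypothetical names and data chosen for illustration.

```python
def classify_blocks(old_blocks, new_blocks, match):
    """Split the new build's blocks into the three categories.

    old_blocks / new_blocks: dict of block id -> fingerprint of its contents.
    match: dict of new block id -> old block id, assumed to come from a matcher.
    """
    result = {"new": set(), "old_changed": set(), "old_unchanged": set()}
    for block_id, fingerprint in new_blocks.items():
        old_id = match.get(block_id)
        if old_id is None:
            result["new"].add(block_id)            # no counterpart in old build
        elif old_blocks[old_id] != fingerprint:
            result["old_changed"].add(block_id)    # matched, contents differ
        else:
            result["old_unchanged"].add(block_id)  # matched, contents identical
    return result

# Hypothetical toy data.
old = {"o1": "aa", "o2": "bb"}
new = {"n1": "aa", "n2": "bX", "n3": "cc"}
classes = classify_blocks(old, new, match={"n1": "o1", "n2": "o2"})

# Impacted blocks are the old changed blocks plus the new blocks.
impacted = classes["new"] | classes["old_changed"]
print(sorted(impacted))
```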
Coverage Impact Analysis
• Terminology
  • Trace: collection of one or more test cases
  • Impacted Blocks: old modified and new blocks
• Compute the coverage of traces for the new build
  • Coverage for old (unchanged and modified) blocks is the same as the coverage for the old build
  • Coverage for new nodes requires more analysis
Coverage Impact Analysis
[Figure: a New Block (N) between its Predecessor Blocks (P) and Successor Blocks (S), with an interprocedural edge.]
• A trace may cover a new block N if it covers at least one predecessor block and at least one successor block
• If P or S is a new block, then its predecessors or successors are used (iterative process)
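
The predecessor/successor heuristic above can be sketched as below. This is one possible reading of the two bullets, not Echelon's implementation: the graph representation, the function names `nearest_old` and `predict_new_coverage`, and the toy control flow graph are all assumptions made for illustration. New neighbors are looked through iteratively until blocks that existed in the old build are reached.

```python
def nearest_old(block, neighbors, new_blocks):
    """Walk through new blocks until blocks with known coverage are reached."""
    seen, stack, found = {block}, [block], set()
    while stack:
        for nb in neighbors.get(stack.pop(), ()):
            if nb in seen:
                continue
            seen.add(nb)
            if nb in new_blocks:
                stack.append(nb)   # new neighbor: use its neighbors instead
            else:
                found.add(nb)      # block with coverage from the old build
    return found

def predict_new_coverage(covered, preds, succs, new_blocks):
    """New blocks a trace may cover: one predecessor AND one successor covered."""
    return {
        n for n in new_blocks
        if nearest_old(n, preds, new_blocks) & covered
        and nearest_old(n, succs, new_blocks) & covered
    }

# Hypothetical graph: A -> N1 -> N2 -> B and A -> C -> B, where N1, N2 are new.
preds = {"N1": ["A"], "N2": ["N1"], "B": ["N2", "C"], "C": ["A"]}
succs = {"A": ["N1", "C"], "N1": ["N2"], "N2": ["B"], "C": ["B"]}
new_blocks = {"N1", "N2"}

print(sorted(predict_new_coverage({"A", "C", "B"}, preds, succs, new_blocks)))
print(sorted(predict_new_coverage({"A"}, preds, succs, new_blocks)))
```

The first trace covers both `A` and `B`, so it is predicted to possibly reach `N1` and `N2`. The second covers no successor, so no new block is predicted.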
Coverage Impact Analysis
Limitations: a new node may not be executed
• If there is a path from successor to predecessor
• If there are changes in control path due to data changes
Echelon: Test Case Prioritization
• Detects minimal sets of test cases that are likely to cover the impacted blocks (old changed and new blocks)
• Input is traces (test cases) and a set of impacted blocks
• Uses a greedy iterative algorithm for test selection
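
A greedy iterative selection of this kind can be sketched as follows. The slides do not spell out the exact algorithm, so this is an assumption-laden illustration: the function name `prioritize`, the tie-breaking by insertion order, and the sample data are hypothetical. Each pass builds one set by repeatedly picking the trace that covers the most still-uncovered impacted blocks, and traces that cover no impacted block go into a final set.

```python
def prioritize(coverage, impacted):
    """Group traces into ordered sets using a greedy weight-based choice.

    coverage: dict of trace -> set of blocks it is predicted to cover.
    impacted: set of impacted blocks (old changed and new).
    """
    remaining = {t: blocks & impacted for t, blocks in coverage.items()}
    sets = []
    while any(remaining.values()):
        uncovered = set(impacted)
        current = []
        while True:
            # Weight = number of still-uncovered impacted blocks a trace covers.
            weights = {t: len(b & uncovered) for t, b in remaining.items()}
            best = max(weights, key=weights.get, default=None)
            if best is None or weights[best] == 0:
                break
            current.append(best)
            uncovered -= remaining.pop(best)
        sets.append(current)
    if remaining:  # traces that cover no impacted block at all
        sets.append(sorted(remaining))
    return sets

# Hypothetical data: block 9 is not impacted.
impacted = {1, 2, 3, 4, 5}
coverage = {"T1": {1, 2, 3}, "T2": {4, 5}, "T3": {1, 2}, "T4": {3}, "T5": {9}}
print(prioritize(coverage, impacted))
```

With this data the first set is `T1, T2`, which together cover all five impacted blocks; `T3` and `T4` form the second set, and `T5` lands in the last one.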
Echelon: Test Selection
[Figure: Impacted Block Map for traces T1 to T5. A mark denotes that a trace T covers the impacted block. Each trace has a weight, and the selected traces are grouped into Set 1, Set 2 and Set 3.]
Echelon: Test Selection Output
Ordered List of Traces
[Figure: ordered list of traces (Trace T1, T2, T3, T4, T5, T7, T8, ..., Trace Tm) grouped into SET1, SET2, SET3, ..., SETn.]
• Each set contains test cases that will give maximum coverage of impacted nodes
• Gracefully handles the “main” modification case
• If all the tests can be run, they should be run in this order to maximize the chances of detecting failures early
Analysis of results
Three measurements of interest
• How many sequences of tests were formed?
• How effective is the algorithm in practice?
• How accurate is the algorithm in practice?
Details of BinaryE
                     Version 1       Version 2
Date                 12/11/2000      01/29/2001
Functions            31,020          31,026
Blocks               668,068         668,274
Arcs                 1,097,294       1,097,650
File size (bytes)    8,880,128       8,880,128
PDB size (bytes)     22,602,752      22,651,904
Impacted Blocks      0               378 (220 N, 158 OC)
Number of Traces     3128            3128
# Source Lines       ~1.8 Million    ~1.8 Million
(N = new blocks, OC = old changed blocks)
Echelon takes ~210 seconds for this 8MB binary
Effectiveness of Echelon
• Important measure of effectiveness is early defect detection
• Measured % of defects vs. % of unique defects in each sequence
• Unique defects are defects not detected by the previous sequence
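
The two percentages can be computed as in the sketch below. It assumes that “unique” means not detected by any earlier sequence, which is one reading of the definition above. The function name `defect_percentages` and the defect identifiers are hypothetical and are not measurements from this study.

```python
def defect_percentages(detected_by_sequence, total_defects):
    """Return (% defects, % unique defects) for each sequence, in order."""
    seen = set()
    rows = []
    for detected in detected_by_sequence:
        unique = detected - seen  # defects no earlier sequence has found
        rows.append((
            100 * len(detected) / total_defects,
            100 * len(unique) / total_defects,
        ))
        seen |= detected
    return rows

# Hypothetical example with four known defects and three sequences.
sequences = [{"d1", "d2", "d3"}, {"d2", "d4"}, {"d1"}]
for number, (pct, pct_unique) in enumerate(defect_percentages(sequences, 4), 1):
    print(f"sequence {number}: {pct:.0f}% detected, {pct_unique:.0f}% unique")
```

A prioritization that works well should show most of the unique defects in the first sequence, which is what early defect detection means here.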
Effectiveness of Echelon
[Figure: bar chart “Defects detected in each sequence”. x-axis: Sequence (1 to 4); y-axis: % Defects detected (0 to 100); series: % Defects detected, % Unique Defects.]
Effectiveness of Echelon
[Figure: bar chart “Defects detected in each sequence”. x-axis: Sequence (1 to 4); y-axis: % Defects detected (0 to 100); series: % Defects, % Unique defects.]
• Blocks predicted hit that were not hit
• Blocks predicted not hit that were actually hit (blocks that are targets of indirect calls are being predicted as not hit)
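
The two kinds of misprediction listed above are plain set differences between predicted and observed coverage. The sketch below uses a hypothetical function name, `prediction_errors`, and made-up block identifiers. It shows the bookkeeping only and is not how accuracy was measured in this work.

```python
def prediction_errors(predicted_hit, actually_hit):
    """Compare predicted coverage of impacted blocks with observed coverage."""
    return {
        # Predicted as covered, but the trace did not execute them.
        "predicted_hit_not_hit": predicted_hit - actually_hit,
        # Predicted as not covered, but the trace did execute them.
        "predicted_not_hit_but_hit": actually_hit - predicted_hit,
    }

# Hypothetical block ids for one trace.
errors = prediction_errors({"b1", "b2", "b3"}, {"b2", "b3", "b4"})
print(errors)
```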
Echelon Results: BinaryK
[Figure: “Number of Test Cases per Set”. x-axis: Sets (1 to 27); y-axis: Test Cases (0 to 8).]
[Figure: “Number of Impacted Blocks in Each Set”. x-axis: Sets (1 to 27); y-axis: # Impacted Blocks (0 to 200).]
[Figure: “Impacted Block coverage and Cumulative Coverage wrt Sets”. x-axis: Set (1 to 27); y-axis: Coverage (0% to 45%); series: Impacted, Cumulative.]
                   Build 2470    Build 2480
Date               05/01/2001    05/23/2001
Functions          1,761         1,774
Blocks             32,012        32,135
Arcs               47,131        47,323
File size          882,688       894,464
Impacted Blocks    0             589 (350 N, 239 OC)
Traces             56            56
Echelon Results: BinaryU
[Figure: “Number of Test Cases per Set”. x-axis: Sets (1 to 33); y-axis: Test Cases (0 to 3.5).]
[Figure: “Number of Impacted Blocks per Set”. x-axis: Sets (1 to 33); y-axis: 0 to 70.]
[Figure: “Impacted Block Coverage and Cumulative Total Coverage wrt Sets”. x-axis: Sets (1 to 33); y-axis: Coverage (0% to 60%); series: Impacted, Cumulative.]
                   Build 2470    Build 2480
Date               05/01/2001    05/23/2001
Functions          1,967         1,970
Blocks             30,916        31,003
Arcs               46,638        46,775
File size          528,384       528,896
Impacted Blocks    0             270 (190 N, 80 OC)
Traces             56            56
Summary
• Binary based test prioritization approach can effectively prioritize tests in large scale development environment
• Simple heuristic with program change in fine granularity works well in practice
• Currently integrated into Microsoft Development process
Coverage Impact Analysis
• Echelon provides a number of options
  • Control branch prediction
  • Indirect calls: if N is the target of an indirect call, a trace needs to cover at least one of its successor blocks
• Future improvements include heuristic branch prediction
  • Branch Prediction for Free [Ball, Larus]
Additional Motivation
• And of course to attend ISSTA and meet some great people
Echelon: Test Selection
Options
• Calculations of weights can be extended, e.g. traces with great historical fault detection can be given additional weights
• Include time each test takes into calculation
• Print changed (modified or new) source code that may not be covered by any trace
• Print all source code lines that may not be covered by any trace
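
As an illustration of the first two options, the weight used in the greedy choice could fold in historical fault detection and running time. The scoring formula below is purely an assumption for this sketch; the slides do not give Echelon's actual formula. The names `pick_next`, `history_bonus`, `runtime_s` and `past_faults`, and all the sample numbers, are hypothetical.

```python
def pick_next(uncovered, coverage, runtime_s, past_faults, history_bonus=0.1):
    """Pick the trace with the best assumed score, or None if none helps.

    Assumed score: newly covered impacted blocks, boosted by the number of
    faults the trace found historically, divided by its running time.
    """
    def score(trace):
        gain = len(coverage[trace] & uncovered)
        boost = 1 + history_bonus * past_faults.get(trace, 0)
        return gain * boost / runtime_s[trace]

    best = max(coverage, key=score, default=None)
    if best is None or score(best) == 0:
        return None
    return best

# Hypothetical data: T3 is short and has found two faults in the past.
coverage = {"T1": {1, 2, 3}, "T2": {1, 2}, "T3": {4}}
runtime_s = {"T1": 60, "T2": 10, "T3": 5}
past_faults = {"T3": 2}
print(pick_next({1, 2, 3, 4}, coverage, runtime_s, past_faults))
```

Under this scoring the short, historically productive `T3` is chosen first even though `T1` covers more impacted blocks. Whether that trade-off is desirable depends on how much test time is available.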