Predicting Software Metrics at Design Time
Rahul Premraj, Saarland University
Thomas Zimmermann, University of Calgary
Wolfgang Holz, Saarland University
Andreas Zeller, Saarland University
Early Resource Estimation
• Essential for successful project management.
• Most commonly estimated resources: Time or Cost.
• Many estimation models use software metrics as inputs.
• Many of these metrics are not available at design time.
Research Question
Software’s domain
Software metrics
Any relationship?
Predicting Component Failures at Design Time
Tom Zimmermann, University of Calgary
Adrian Schröter, University of Victoria
Andreas Zeller, Saarland University
International Symposium on Empirical Software Engineering
Rio de Janeiro, 2006
Predicting Vulnerable Software Components
Tom Zimmermann, University of Calgary
Stephan Neuhaus, Saarland University
Andreas Zeller, Saarland University
Conference on Computer and Communications Security
Alexandria, 2007
Both papers used import statements to represent the software’s domain.
Fig. 1: Approach overview. A learner takes existing code (imports together with software metrics) and produces a predictor; the predictor takes a new design (as imports) and returns a predicted metric. By learning from the relationship between imports and metrics in existing code, we can predict metrics based on imports alone.
3. Using the ECLIPSE code base, we show how to predict software complexity, as defined by the widely used object-oriented ckjm software metrics [6].
We expect that advance reliable knowledge of such product-specific metrics can be a boon to solving several management issues that constantly loom over all types of development projects at an early stage.
This paper is organized as follows. In Section 2, we discuss features and shortcomings of contemporary cost estimation models. The data used for our experimentation is elaborated upon in Section 3. Thereafter, we present our experimental setup in Section 4, which is followed by results and discussions in Section 5. Threats to validity are addressed in Section 6 and lastly, we conclude our work in Section 7.
2 Background
As discussed above, cost estimation is vital to a successful outcome of a software project. But most contemporary estimation models depend upon characteristics of the software that are typically unknown at the start. For example, many models take into account the relationship between software size and cost. Examples include algorithmic models such as COCOMO [7] and Putnam [8], regression models [9] and analogy-based models [10–13]. To use these models, an estimate of the size of the project is needed first. Again, size is unknown at the start of the project and can only be estimated based on other characteristics of the software. Hence, basing cost estimates on an estimate of size adds to the uncertainty of the estimates and of the fate of the project. This challenges the value of such models.
We propose a novel approach that, in contrast to others, focusses on estimating the size of a component with as little knowledge as its design. This places managers in a unique position from where they can choose between several alternatives to optimise not only size, but also other metrics of the software that serve as its quality indicators. We present these metrics in more detail in the following section.
Overview of Approach
Research Question
Software’s domain and software metrics: any relationship?
To investigate, we used 89 core plugins from the Eclipse project.
Data Collection
Scan through a .java file and identify lines that take the form
import a.b.c;
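The line-scanning step described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `ImportScanner`, the method `extractImports`, and the regular expression are hypothetical and not taken from the slides, and the sketch does not skip import-like lines that sit inside block comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the authors' tool: collects the names that follow "import".
public class ImportScanner {
    // Matches lines such as "import a.b.c;", "import a.b.*;" or "import static a.b.C.d;".
    private static final Pattern IMPORT_LINE = Pattern.compile(
            "^\\s*import\\s+(?:static\\s+)?"
            + "([A-Za-z_$][\\w$]*(?:\\.[A-Za-z_$][\\w$]*)*(?:\\.\\*)?)\\s*;");

    /** Returns the imported names found in the given Java source text, in order of appearance. */
    public static List<String> extractImports(String source) {
        List<String> imports = new ArrayList<>();
        for (String line : source.split("\\R")) {   // \R matches any line break
            Matcher m = IMPORT_LINE.matcher(line);
            if (m.find()) {
                imports.add(m.group(1));
            }
        }
        return imports;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "package demo;",
                "import java.sql.*;",
                "import java.util.List;",
                "public class Demo { }");
        System.out.println(extractImports(source)); // [java.sql.*, java.util.List]
    }
}
```

A regular expression is enough for the `import a.b.c;` form, but wildcard imports such as `java.sql.*` still need to be resolved to concrete classes in a separate step.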
import java.sql.*;
Connection conn = null;
Statement stmt = null;
Eclipse Abstract Syntax Tree Parser (ASTParser): Java file → AST
ASTView: the wildcard import resolves to the classes actually used
java.sql.Connection
java.sql.Statement
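The slides resolve `import java.sql.*;` to `java.sql.Connection` and `java.sql.Statement` with the Eclipse ASTParser. The sketch below does not use that parser. As a stand-in that needs only the standard library, it checks by reflection which of the simple type names used in a file exist in the wildcard package. The class `WildcardResolver` and its method are hypothetical, and the approach only works for packages that are loadable at run time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AST-based resolution; assumes the package is on the runtime class path.
public class WildcardResolver {
    /**
     * Expands a wildcard import such as "java.sql.*" into fully qualified names
     * for those simple type names that exist in the package at run time.
     */
    public static List<String> resolve(String wildcardImport, List<String> usedSimpleNames) {
        if (!wildcardImport.endsWith(".*")) {
            throw new IllegalArgumentException("not a wildcard import: " + wildcardImport);
        }
        String packageName = wildcardImport.substring(0, wildcardImport.length() - 2);
        List<String> resolved = new ArrayList<>();
        for (String simpleName : usedSimpleNames) {
            String candidate = packageName + "." + simpleName;
            try {
                // Load without initialising the class; we only need to know that it exists.
                Class.forName(candidate, false, WildcardResolver.class.getClassLoader());
                resolved.add(candidate);
            } catch (ClassNotFoundException | LinkageError e) {
                // The name does not belong to this package (or cannot be loaded); skip it.
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Simple names as they might be collected from declarations in a source file.
        System.out.println(resolve("java.sql.*", List.of("Connection", "Statement", "String")));
    }
}
```

An AST-based resolver has the advantage that it also works for project classes that have not been compiled yet, which matters when the input is a design rather than finished code.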
Lines of Code
David A. Wheeler
http://www.dwheeler.com/sloccount/sloccount.html
A physical SLOC is a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character.
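The definition above translates fairly directly into code. The following is a minimal sketch, not the SLOCCount implementation: `SlocCounter` is a hypothetical name, and the sketch assumes Java-style `//` and `/* */` comments and ignores the corner case of comment markers inside string literals.

```java
// Hypothetical counter that follows the physical-SLOC definition for Java-style comments.
public class SlocCounter {
    /** Counts lines that contain at least one non-whitespace, non-comment character. */
    public static int countPhysicalSloc(String source) {
        int sloc = 0;
        boolean inBlockComment = false;
        for (String line : source.split("\\R")) {
            boolean hasCode = false;
            int i = 0;
            while (i < line.length()) {
                if (inBlockComment) {
                    int end = line.indexOf("*/", i);
                    if (end < 0) {
                        break;                  // comment continues on the next line
                    }
                    inBlockComment = false;
                    i = end + 2;
                } else if (line.startsWith("//", i)) {
                    break;                      // rest of the line is a comment
                } else if (line.startsWith("/*", i)) {
                    inBlockComment = true;
                    i += 2;
                } else {
                    if (!Character.isWhitespace(line.charAt(i))) {
                        hasCode = true;
                    }
                    i++;
                }
            }
            if (hasCode) {
                sloc++;
            }
        }
        return sloc;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "// header comment",
                "import java.sql.*;",
                "",
                "/* block",
                "   comment */ int x = 1;",
                "int y = 2; // trailing comment");
        System.out.println(countPhysicalSloc(source)); // 3
    }
}
```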
CKJM Metrics
Diomidis D. Spinellis
Abbreviation  Metric
CA            Afferent Couplings
CBO           Coupling between Class Objects
CBOJDK        Java-specific CBO
DIT           Depth of Inheritance Tree
NOC           Number of Children
NPM           Number of Public Methods
LCOM          Lack of Cohesion in Methods
RFC           Response for a Class
WMC           Weighted Methods per Class
http://www.spinellis.gr/sw/ckjm/
Data
ClassName
1 1 0 0
Input Features (14,185 import statements)
SLOC WMC ... NPM
Output Features (SLOC and CKJM metrics)
Training Data (two-thirds of the data set)
Testing Data (remaining one-third)
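The data layout above, one 0/1 column per import statement followed by a two-thirds/one-third split, could be produced along these lines. This is a sketch under assumptions: `ImportDataset`, `encode`, and `split` are hypothetical names, the example vocabulary is made up, and rounding the training share down is an arbitrary choice that the slides do not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helpers for building the feature rows and the train/test split.
public class ImportDataset {
    /** Encodes a class's imports as a 0/1 vector over a fixed, ordered vocabulary of imports. */
    public static int[] encode(List<String> vocabulary, Set<String> classImports) {
        int[] row = new int[vocabulary.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = classImports.contains(vocabulary.get(i)) ? 1 : 0;
        }
        return row;
    }

    /**
     * Shuffles the row indices with the given seed and returns two lists:
     * element 0 holds the training indices (two-thirds, rounded down),
     * element 1 holds the testing indices (the remainder).
     */
    public static List<List<Integer>> split(int rowCount, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int trainSize = (2 * rowCount) / 3;
        List<Integer> train = new ArrayList<>(indices.subList(0, trainSize));
        List<Integer> test = new ArrayList<>(indices.subList(trainSize, rowCount));
        return List.of(train, test);
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of(
                "java.sql.Connection", "java.sql.Statement", "java.util.List", "java.io.File");
        int[] row = encode(vocabulary, Set.of("java.sql.Connection", "java.sql.Statement"));
        System.out.println(Arrays.toString(row)); // [1, 1, 0, 0]

        List<List<Integer>> parts = split(9, 1L);
        System.out.println(parts.get(0).size() + " training rows, "
                + parts.get(1).size() + " testing rows"); // 6 training rows, 3 testing rows
    }
}
```

Calling `split` with a different seed each time yields a different random partition, which is one plausible way to repeat the experiment several times.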
Reduce Sample Bias
30x (thirty experimental runs)
Training Phase: Import Statements from Code + Code Metrics → Learner, which produces a Predictor
Testing Phase: Import Statements from Planned Code → Predictor → Predicted Code Metrics
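To make the two phases concrete, here is a small sketch of a learner that produces predictions from import sets. The slides do not name the learning algorithm, so a k-nearest-neighbour regressor over Jaccard similarity is used purely as a stand-in. The class `ImportKnnPredictor`, its methods, and the example values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in learner (k-nearest neighbours); NOT the algorithm used by the authors.
public class ImportKnnPredictor {
    private final List<Set<String>> trainingImports = new ArrayList<>();
    private final List<Double> trainingMetrics = new ArrayList<>();
    private final int k;

    public ImportKnnPredictor(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        this.k = k;
    }

    /** Training phase: remember each class's import set together with its measured metric. */
    public void learn(Set<String> imports, double metric) {
        trainingImports.add(new HashSet<>(imports));
        trainingMetrics.add(metric);
    }

    /** Jaccard similarity: size of the intersection divided by size of the union (0 if both are empty). */
    public static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return (double) common.size() / union.size();
    }

    /** Testing phase: average the metric of the k training classes with the most similar imports. */
    public double predict(Set<String> plannedImports) {
        if (trainingImports.isEmpty()) {
            throw new IllegalStateException("no training data");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < trainingImports.size(); i++) {
            order.add(i);
        }
        // Sort training rows by descending similarity to the planned imports.
        order.sort(Comparator.comparingDouble(
                (Integer i) -> -jaccard(plannedImports, trainingImports.get(i))));
        int neighbours = Math.min(k, order.size());
        double sum = 0.0;
        for (int j = 0; j < neighbours; j++) {
            sum += trainingMetrics.get(order.get(j));
        }
        return sum / neighbours;
    }

    public static void main(String[] args) {
        ImportKnnPredictor predictor = new ImportKnnPredictor(1);
        predictor.learn(Set.of("java.sql.Connection", "java.sql.Statement"), 120.0); // made-up SLOC
        predictor.learn(Set.of("java.util.List"), 30.0);                             // made-up SLOC
        System.out.println(predictor.predict(Set.of("java.sql.Connection")));        // 120.0
    }
}
```

Any regression learner that accepts binary feature vectors could take the place of the nearest-neighbour step; the point of the sketch is only the separation between learning from existing code and predicting for planned code.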
Evaluation
Pred(x): % of predictions that lie within x% of the Actual Value
[Figure: Actual Value and Predicted Value for five examples, LOC 1 to LOC 5, each actual value with a band from -x% to +x%]
Pred(x) = 3/5 = .60 ⇒ 60%
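The Pred(x) measure can be computed as below. This is a minimal sketch: the class name `PredX` and the sample numbers are hypothetical, and treating a prediction that lies exactly on the x% boundary as a hit is an assumption, since the slide does not say whether the bound is inclusive.

```java
// Hypothetical helper that computes Pred(x) as a fraction between 0 and 1.
public class PredX {
    /**
     * Fraction of predictions whose absolute error is at most x percent of the actual value.
     * An entry with actual == 0 counts as a hit only if the prediction is also 0.
     */
    public static double pred(double[] actual, double[] predicted, double xPercent) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int hits = 0;
        for (int i = 0; i < actual.length; i++) {
            double error = Math.abs(predicted[i] - actual[i]);
            double tolerance = Math.abs(actual[i]) * xPercent / 100.0;
            if (error <= tolerance) {
                hits++;
            }
        }
        return (double) hits / actual.length;
    }

    public static void main(String[] args) {
        // Made-up values: three of the five predictions fall within 25% of the actual value.
        double[] actual = {100, 200, 300, 400, 500};
        double[] predicted = {110, 190, 450, 405, 200};
        System.out.println(pred(actual, predicted, 25)); // 0.6
    }
}
```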
Results
Fig. 4: Prediction accuracy for output metrics. [Plot: metrics CA, CBO, CBOJDK, DIT, LCOM, NOC, NPM, RFC, SLOC, WMC on the y-axis; PredX value from 0 to 100 on the x-axis; legend: Pred25, Pred50.]
5 Results and Discussion
Figure 4 presents the results from our experiments. All metrics are presented in alphabetical order on the y-axis, while the PredX values are plotted on the x-axis. For each metric, we have plotted both Pred25 (as circles) and Pred50 values (as triangles) from each of the thirty experimental runs. The plots are jittered [24], i.e., a small random variation has been introduced to ease observation of overlapping values on the x-axis.
We observe from the figure that SLOC is predicted with reasonable accuracy. Pred25 values hover around 42% while Pred50 values hover around 71%. Prediction results for CBO and CBOJDK, in contrast, are outstandingly good. The Pred25 values for CBO hover around 72% and are even higher for CBOJDK at 86%. Their Pred50 values hover around 88% and 97% respectively. The model also predicts RFC and DIT values with reasonable
[Plot: Pred(x) from 0 to 100 for each metric (WMC, SLOC, RFC, NPM, NOC, LCOM, DIT, CBOJDK, CBO, CA); legend: Pred 25, Pred 50]
Threats to Validity
• Issues with generalisation.
• Import statements at design time may differ from those at release time.
• No filtering of outliers, since they make interesting cases to study.
Data Free to Download!
ClassName
1 1 0 0
Input Features (14,185 import statements)
SLOC WMC ... NPM
Output Features (SLOC and CKJM metrics)
This data will be made available in the PROMISE repository by end-July ’08.
Conclusions
• Reliable estimation of resources at design time is possible.
• Prediction accuracy of up to 95% (Pred 25) for some metrics.
• Predictions can be further used as inputs for other resource estimation models.
• Imported components can determine many software metrics.