Predicting Software Metrics at Design Time

Rahul Premraj, Saarland University

Thomas Zimmermann, University of Calgary

Wolfgang Holz, Saarland University

Andreas Zeller, Saarland University

Early Resource Estimation

• Essential for successful project management.

• Most commonly estimated resources: Time or Cost.

• Many estimation models use software metrics as inputs.

• Many of these metrics are not available at design time.


Research Question

Is there any relationship between a software's domain and its software metrics?


Predicting Component Failures at Design Time
Adrian Schröter (University of Victoria), Tom Zimmermann (University of Calgary), Andreas Zeller (Saarland University)
International Symposium on Empirical Software Engineering, Rio de Janeiro, 2006

Predicting Vulnerable Software Components
Stephan Neuhaus (Saarland University), Tom Zimmermann (University of Calgary), Andreas Zeller (Saarland University)
Conference on Computer and Communication Security, Alexandria, 2007

Both papers used import statements as representatives of the software's domain.


Fig. 1: Approach overview. By learning from the relationship between imports and metrics in existing code, we can predict metrics based on imports alone. [Diagram: existing code (as imports with software metrics) → Learner → produces → Predictor; new design (as imports) → Predictor → predicted metric.]

3. Using the ECLIPSE code base, we show how to predict software complexity, as defined by the widely used object-oriented ckjm software metrics [6].

We expect that advance reliable knowledge of such product-specific metrics can be a boon to solving several management issues that constantly loom over all types of development projects at an early stage.

This paper is organized as follows. In Section 2, we discuss features and shortcomings of contemporary cost estimation models. The data used for our experimentation is elaborated upon in Section 3. Thereafter, we present our experimental setup in Section 4, which is followed by results and discussions in Section 5. Threats to validity are addressed in Section 6 and lastly, we conclude our work in Section 7.

2 Background

As discussed above, cost estimation is vital to a successful outcome of a software project. But most contemporary estimation models depend upon characteristics of the software that are typically unknown at start. For example, many models take into account the relationship between software size and cost. Examples include algorithmic models such as COCOMO [7] and Putnam [8], regression models [9] and analogy-based models [10–13]. To use these models, first an estimate of the size of the project is needed. Again, size is unknown at start of the project and can only be estimated based on other characteristics of the software. Hence, basing cost estimates on an estimate of size adds to uncertainty of the estimates and fate of the project. This challenges the value of such models.

We propose a novel approach that, in contrast to others, focusses on estimating the size of a component with as little knowledge as its design. This places managers at a unique position from where they can choose between several alternatives to optimise not only size, but also other metrics of the software that serve as its quality indicators. We present these metrics in more detail in the following section.

Overview of Approach

Research Question

To investigate, we used 89 core plugins from the Eclipse project.


Data Collection

Scan through a .java file and identify lines that take the form

import a.b.c;
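
The line scan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `extract_imports`, the regular expression, and the sample source are all hypothetical, and the sketch handles single-line import declarations only.

```python
import re

# Matches single-line Java import declarations such as
# "import a.b.c;", "import static a.b.C.d;" or "import a.b.*;".
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+(?:\.\*)?)\s*;")


def extract_imports(java_source):
    """Return the imported names in order of appearance."""
    imports = []
    for line in java_source.splitlines():
        match = IMPORT_RE.match(line)
        if match:
            imports.append(match.group(1))
    return imports


# Hypothetical input, for illustration only.
example = """\
package demo;

import java.util.List;
import java.sql.*;
// import not.a.real.Import;

public class Demo {}
"""

print(extract_imports(example))
```

A plain scan like this returns wildcard imports such as `java.sql.*` unchanged, so it cannot tell which classes of the package are actually used.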

import java.sql.*;

Connection conn = null;
Statement stmt = null;

Java file → Eclipse Abstract Syntax Tree Parser (ASTParser) → AST

ASTView

java.sql.Connection

java.sql.Statement
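
To illustrate the idea of replacing a wildcard import by the classes that are actually used, here is a rough Python sketch. It does not use the Eclipse ASTParser; it swaps in a simple identifier match against a hypothetical catalogue of package contents (`PACKAGE_CLASSES`), so it should be read as an approximation of the result, not of the mechanism.

```python
import re

# Hypothetical catalogue of classes per package. A real tool would obtain
# this from the resolved AST or the class path instead.
PACKAGE_CLASSES = {
    "java.sql": {"Connection", "Statement", "ResultSet", "DriverManager"},
}


def resolve_wildcard(java_source, wildcard_import):
    """Expand 'pkg.*' to the classes of pkg whose names occur in the source."""
    if not wildcard_import.endswith(".*"):
        raise ValueError("expected an import of the form 'pkg.*'")
    package = wildcard_import[:-2]  # strip the trailing ".*"
    identifiers = set(re.findall(r"\b[A-Za-z_]\w*\b", java_source))
    used = PACKAGE_CLASSES.get(package, set()) & identifiers
    return sorted(f"{package}.{name}" for name in used)


source = """\
import java.sql.*;

Connection conn = null;
Statement stmt = null;
"""

print(resolve_wildcard(source, "java.sql.*"))
# prints ['java.sql.Connection', 'java.sql.Statement']
```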

Lines of Code
David A. Wheeler
http://www.dwheeler.com/sloccount/sloccount.html

A physical SLOC is a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character.
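
The definition above can be turned into a small counter. The following Python sketch is not SLOCCount itself; the function name `count_physical_sloc` is hypothetical, and as a simplifying assumption it treats comment markers inside string literals as comments.

```python
def count_physical_sloc(java_source):
    """Count lines with at least one non-whitespace, non-comment character.

    Simplification (an assumption of this sketch): comment markers inside
    string literals are treated as comments.
    """
    sloc = 0
    in_block_comment = False
    for line in java_source.splitlines():
        code_chars = []
        i = 0
        while i < len(line):
            if in_block_comment:
                end = line.find("*/", i)
                if end == -1:
                    break  # the rest of the line is still inside the comment
                in_block_comment = False
                i = end + 2
            elif line.startswith("//", i):
                break  # the rest of the line is a line comment
            elif line.startswith("/*", i):
                in_block_comment = True
                i += 2
            else:
                code_chars.append(line[i])
                i += 1
        if "".join(code_chars).strip():
            sloc += 1
    return sloc


# Hypothetical input, for illustration only.
example = """\
/* header
   comment */
import java.sql.*;   // database access

public class Demo {
    Connection conn = null; /* inline */
}
"""

print(count_physical_sloc(example))  # prints 4
```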

CKJM Metrics
Diomidis D. Spinellis
http://www.spinellis.gr/sw/ckjm/

Abbreviation   Metric
CA             Afferent Couplings
CBO            Coupling between Class Objects
CBOJDK         Java specific CBO
DIT            Depth of Inheritance Tree
NOC            Number of Children
NPM            Number of Public Methods
LCOM           Lack of Cohesion in Methods
RFC            Response for a Class
WMC            Weighted Methods per Class




Data

Each row describes one class (ClassName):

Input Features (14,185 import statements): 1 1 0 0
Output Features (SLOC and CKJM metrics): SLOC WMC ... NPM

Training Data (two-thirds of the data set)
Testing Data (remaining one-third)
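
As a rough illustration of the input features, the Python sketch below encodes each class as a row of 0/1 flags, one column per distinct import. The 0/1 encoding is an assumption read off the "1 1 0 0" row on the slide, and the class names, imports, and the function `build_feature_matrix` are hypothetical.

```python
def build_feature_matrix(imports_by_class):
    """Return (columns, rows): one 0/1 column per distinct import."""
    columns = sorted({imp for imps in imports_by_class.values() for imp in imps})
    rows = {
        class_name: [1 if col in imps else 0 for col in columns]
        for class_name, imps in imports_by_class.items()
    }
    return columns, rows


# Hypothetical classes and imports, for illustration only.
imports_by_class = {
    "demo.Db": {"java.sql.Connection", "java.sql.Statement"},
    "demo.Ui": {"java.util.List"},
}

columns, rows = build_feature_matrix(imports_by_class)
print(columns)
print(rows)
```

With 14,185 distinct imports the real matrix would be wide and sparse, so a sparse representation may be preferable in practice.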

Reduce Sample Bias

30x (thirty experimental runs)
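
One way to read the thirty runs is as thirty independent two-thirds/one-third splits of the data. The Python sketch below assumes the splits are drawn at random, which the slides do not state explicitly; the function name, the seed, and the class names are hypothetical.

```python
import random


def repeated_splits(class_names, runs=30, train_fraction=2 / 3, seed=0):
    """Yield (training, testing) lists for each run, drawn at random."""
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    names = list(class_names)
    cut = round(len(names) * train_fraction)
    for _ in range(runs):
        shuffled = names[:]
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]


# Hypothetical class names, for illustration only.
all_classes = [f"Class{i}" for i in range(9)]
splits = list(repeated_splits(all_classes))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # prints 30 6 3
```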

Training Phase: Import Statements from Code + Code Metrics → Learner → produces → Predictor

Testing Phase: Import Statements from Planned Code → Predictor → Predicted Code Metrics
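
The two phases can be sketched as a function that takes training data and returns a predictor. The slides do not name the learner, so the Python example below swaps in a simple nearest-neighbour average over Jaccard similarity of import sets. This is an assumption for illustration, not the authors' method, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two import sets: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def learn(training):
    """Training phase: build a predictor from (imports, metric) pairs.

    Stand-in learner (an assumption, not the authors' method): average the
    metric of the k training classes with the most similar imports.
    Expects a non-empty training list.
    """
    def predict(planned_imports, k=2):
        ranked = sorted(
            training,
            key=lambda example: jaccard(example[0], planned_imports),
            reverse=True,
        )
        nearest = ranked[:k]
        return sum(metric for _, metric in nearest) / len(nearest)

    return predict


# Hypothetical training data: (set of imports, SLOC) per existing class.
training = [
    ({"java.sql.Connection", "java.sql.Statement"}, 120),
    ({"java.sql.Connection"}, 80),
    ({"java.util.List"}, 20),
]

# Testing phase: predict a metric from the imports of planned code alone.
predictor = learn(training)
print(predictor({"java.sql.Connection", "java.sql.Statement"}))  # prints 100.0
```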

Evaluation

Pred(x): % of predictions that lie within x% of the Actual Value

Example: for five classes (LOC 1 to LOC 5), three Predicted Values lie within ±x% of their Actual Values, so Pred(x) = 3/5 = .60 ⇒ 60%
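
The Pred(x) measure is easy to compute directly. The Python sketch below assumes that "within x%" is inclusive and relative to the actual value; the function name `pred` and the LOC values are hypothetical, chosen so that three of five predictions are hits, as in the example above.

```python
def pred(actual, predicted, x):
    """Return the fraction of predictions within x% of their actual value."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1
        for a, p in zip(actual, predicted)
        if abs(p - a) <= (x / 100) * abs(a)  # inclusive bound (assumption)
    )
    return hits / len(actual)


# Hypothetical LOC values for five classes.
actual = [100, 200, 300, 400, 500]
predicted = [110, 160, 320, 700, 200]

print(pred(actual, predicted, 25))  # prints 0.6
```

Multiplying the result by 100 gives the percentage form used on the slides.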

Results

Fig. 4: Prediction accuracy for output metrics. [Plot: metrics CA, CBO, CBOJDK, DIT, LCOM, NOC, NPM, RFC, SLOC, WMC on the y-axis; PredX value from 0 to 100 on the x-axis; legend: Pred25, Pred50.]

5 Results and Discussion

Figure 4 presents the results from our experiments. All metrics are presented in alphabetical order on the y-axis, while the PredX values are plotted on the x-axis. For each metric, we have plotted both Pred25 (as circles) and Pred50 (as triangles) values from each of the thirty experimental runs. The plots are jittered [24], i.e., a small random variation has been introduced to ease observation of overlapping values on the x-axis.

We observe from the figure that SLOC is predicted with reasonable accuracy. Pred25 values hover around 42% while Pred50 values hover around 71%. Prediction results for CBO and CBOJDK, in contrast, are outstandingly good. The Pred25 values for CBO hover around 72% and are even higher for CBOJDK at 86%. Their Pred50 values hover around 88% and 97% respectively. The model also predicts RFC and DIT values with reasonable

[Plot: Pred(x) from 0 to 100 on the x-axis; metrics WMC, SLOC, RFC, NPM, NOC, LCOM, DIT, CBOJDK, CBO, CA on the y-axis; legend: Pred 25, Pred 50.]


Threats to Validity

• Issues with generalisation.

• Import statements at design time may differ from those at release time.

• No filtering of outliers, since they make interesting cases to study.


Data Free to Download!

This data will be made available in the PROMISE repository by end-July ’08.


Conclusions

• Reliable estimation of resources at design time is possible.

• Imported components can determine many software metrics.

• Prediction accuracy of up to 95% (Pred 25) for some metrics.

• Predictions can be further used as inputs for other resource estimation models.


Date post:	24-Jan-2015
Upload:	rahul-premraj