A SOFTWARE IMPLEMENTATION PROGRESS MODEL
by
DWAYNE TOWELL
A THESIS
IN
SOFTWARE ENGINEERING
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
MASTER OF SCIENCE
IN
SOFTWARE ENGINEERING
Approved
Chairperson of the Committee
Accepted
Dean of the Graduate School
August, 2004
ACKNOWLEDGEMENTS
I would like to thank my wife, Lydia, for her support and understanding. I
could not have done this work without her support.
I would also like to thank my advisor, Dr. Jason Denton, for his guidance and
critical review. His understanding of and advice about the academic world was an
invaluable aid during my transition and this work.
Many thanks go to Terry Hamm and Roger Arce for access to project archives
that made this study possible. Without their support this would still only be an idea.
Additional thanks go to friends and family who supported this work by many
different contributions. They include Roger Bonzer, Dr. Mike Frazier, Dr. Rusty
Towell, Marta E. Calderon-Campos, and countless others. Thank you all.
This work is dedicated to Ayrea and Robert, my children and the world's
greatest kids, hands down.
TABLE OF CONTENTS
ACKNOWLEDGMENTS ii
LIST OF TABLES vii
LIST OF FIGURES viii
I. INTRODUCTION 1
1.1 Need for Models 3
1.2 Model Requirements 4
1.3 Implementation Progress Model Requirements 6
II. RELATED WORK 9
2.1 Implementation Evaluation and Control 10
2.2 Time-Series Shape Metrics 13
2.3 Process Models 15
III. RESEARCH DESIGN 18
3.1 Existing Progress Models 18
3.2 Implementation Progress Model 21
3.3 Model Interpretive Value 25
3.4 Data Requirements 28
3.5 Data Collection 31
3.6 Data Source 32
IV. EFFORT METRICS 34
4.1 Source Lines of Code 34
4.2 McCabe Complexity 36
4.3 Halstead Volume 38
4.4 Other Metrics 40
V. RESULTS 42
5.1 Alternative Models 42
5.2 Model Fitting Results 44
5.3 Anomalous Data Sets 49
5.4 Model Applicability 51
5.5 Conforming Cases 61
VI. CONCLUSIONS 67
6.1 Limitations 67
6.2 Interpretation 68
6.3 Future Work 69
REFERENCES 71
ABSTRACT
Current software project management techniques rely on collecting metrics
to provide the progress feedback necessary to allow control of the project; however,
interpretation of this data is difficult. A software implementation progress model is
needed to help interpret the collected data. Criteria for an implementation progress
model are developed and an implementation progress model is proposed. Findings
from the studied projects suggest the model is consistent with the observed behavior. In addition to quantitative validity, the model is shown to provide meaningful
interpretation of collected metric data.
LIST OF TABLES
5.1 Average squared residual error (R^2) for implementation progress models measuring source lines of code changed (SLOCC) by project. 53
5.2 Average squared residual error (R^2) for implementation progress models measuring code churn (CHURN) by project. 54
5.3 Average squared residual error (R^2) for implementation progress models measuring cyclomatic complexity change (MCC) by project. 55
5.4 Average squared residual error (R^2) for implementation progress models measuring Halstead length change (HLC) by project. 56
5.5 Source lines of code change (SLOCC) progress model parameters and R^2 relative to linear R^2. 62
5.6 Code churn (CHURN) progress model parameters and R^2 relative to linear R^2. 63
5.7 Cyclomatic complexity change (MCC) progress model parameters and R^2 relative to linear R^2. 64
5.8 Halstead length change (HLC) progress model parameters and R^2 relative to linear R^2. 65
LIST OF FIGURES
3.1 Accumulated change metrics for a typical project by day. 19
3.2 Idealized implementation velocity as a function of time. 23
3.3 Idealized implementation progress as a function of time. 23
5.1 Progress measured via accumulated Halstead length change (HLC) for project nine and progress model curves (with R^2). 45
5.2 Progress measured via accumulated cyclomatic complexity change (MCC) for project ten and progress model curves (with R^2). 47
5.3 Weekly progress measured via accumulated cyclomatic complexity change (MCC) for project ten and velocity model curve. 48
5.4 Progress measured via accumulated code churn (CHURN) for project fifteen and progress model curves (with R^2). 49
5.5 Progress measured via accumulated cyclomatic complexity change (MCC) for project four and progress model curves (with R^2). 50
5.6 Source lines of code change (SLOCC) average squared residual error (R^2) relative to linear R^2. 57
5.7 Code churn (CHURN) average squared residual error (R^2) relative to linear R^2. 58
5.8 Cyclomatic complexity change (MCC) average squared residual error (R^2) relative to linear R^2. 59
5.9 Halstead length change (HLC) average squared residual error (R^2) relative to linear R^2. 60
CHAPTER I
INTRODUCTION
Modern software development practices rely on periodically collected software metric data to provide management with feedback about the project and the process used to develop it. Metric data is most commonly used in the area of quality assessment. Well-defined metrics exist to report on quality attributes such as the expected number of remaining faults. However, other areas, such as implementation progress, make much less use of metric data for feedback. Metrics such as total source lines of code could be used to report on implementation progress, but such measures have not been leveraged as strongly as those for quality assessment. Metric data is widely regarded as a valuable management feedback tool, yet it is generally not used to monitor implementation progress.
An implementation progress model is presented and shown to identify project phase boundaries, express the rate of implementation during each phase, and allow objective comparisons between projects. This study provides a framework to help interpret periodically collected implementation data. The proposed progress model uses existing implementation artifact metrics, matches our intuitive understanding of implementation progress, and allows project estimation based on parameter estimates.
Well-defined and proven metrics exist for many areas of software development, especially quality assessment. Implementation progress has no such established metrics. Several existing metrics measure size-related attributes. While these size-related metrics may have been originally developed to support quality assessment, they can be used to monitor progress in terms of size. Project size is important because it is invariably used to estimate the resources needed and assess the project's status with regard to the schedule (DeMarco 1982; Albrecht and Gaffney 1983; Lind and Vairavan 1989; Jorgensen 1995). This compelling management need for feedback about implementation progress should demand its use. Its absence may be because metrics have not been established to support implementation progress feedback.
The lack of proven implementation progress metrics has been a barrier to any attempt to monitor implementation progress. However, the lack of a proven metric is not insurmountable. Size metrics are abundant, and a progress metric can be derived from a size metric by taking the difference of consecutive size measurements. A far larger barrier than the lack of a metric is the lack of a proven implementation progress model. Periodically collected data is rich in detailed information but is not inherently meaningful. A model provides a specific interpretation of the data and allows meaning to be extracted. An implementation progress model will allow periodically collected implementation data to be interpreted.
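The differencing described above is simple enough to sketch. This is an illustration, not part of the thesis; the sample values are hypothetical.

```python
# Sketch: deriving a per-period progress metric from periodic size
# measurements by differencing consecutive samples.

def progress_from_size(sizes):
    """Per-period progress: difference of consecutive size measurements."""
    return [b - a for a, b in zip(sizes, sizes[1:])]

# Hypothetical weekly SLOC totals for one project
weekly_sloc = [1200, 2600, 4500, 6100, 6900, 7100]
print(progress_from_size(weekly_sloc))  # per-week progress values
```

The resulting series is what a progress model must then interpret: the raw differences alone carry no meaning about phases or pace.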
1.1 Need for Models
Models bridge the gap between concrete sampled data and expectations. On
a small scale, models act as predictors to set expectations over the next few data
samples. For example, a defect model may indicate the number of faults to be found in
the next release candidate. The degree to which the actual number of faults discovered
differs from the predicted value can be an indication of unexpected circumstances
within the project. For instance, an unusually low value (when compared with the
predicted value) may indicate fewer code changes were made than expected, or that
less testing was performed than expected. Management has been forewarned; an
investigation can be made and the appropriate response can be taken. This small
scale feedback provides valuable and timely feedback to management within the scope
of the project.
In addition to this small scale feedback, models provide feedback on a larger scale, outside the scope of a single project. Large scale feedback assumes an entire project can be "boiled down" to its essence. Large scale feedback is less concerned with local variations and more concerned with the overall picture portrayed by the data. This "essential" portrait is regularly required by management, with or without a formal model. Without a formal model, members of management must rely on guesses and anecdotal evidence. In contrast to this haphazard approach, a formal model establishes a rigorous evaluation. A formal model establishes the critical parameters within the system. Using a formal model, projects may be evaluated or compared in terms of the model parameters. Model parameters allow evaluation and comparisons to be based on defensible data rather than guesses and hearsay. Additionally, given estimated values for the parameters, the model can make predictions about the outcome. This relationship between parameters (input) and predictions (output) codifies a causal effect believed to be true within the system.
1.2 Model Requirements
The primary purpose of a model is to provide a documented method of interpreting a set of data. In many cases the interpretation is masked by the sheer
quantity and detail of the data available. Information is revealed when the data is
interpreted in a particular way. The interpretation results can be used to evaluate
past performance, assess the current situation, and make predictions about future
performance. The interpretation can also be used to compare multiple data sets.
Results from the same model, applied to several data sets, allow the data sets to be
easily compared in terms of the model. The model provides a systematic method for
comparing projects.
One type of model interprets series data by attempting to fit collected data to a family of curves. The single curve which best fits the data is used to describe the data in terms of the model. The specific values used to generate the best fitting curve are considered parameters of the model. Parameters of a model reveal one or more dimensions of the collected data. In this case, parameters can be considered an output of the model. Collected data is the input and results summarizing that data are produced. Parameters can also be used as input, resulting in expected sample data. When used in this way, models make predictions based on estimated parameters. In either case, the expected progress as defined by the model is given by the model curve. The model equation and a specific set of parameters define the model curve.
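The fitting procedure described above can be sketched in miniature. This is a hedged illustration, not the thesis's model: the curve family here is simply y = a + b*t, fit by closed-form least squares, and the data values are hypothetical.

```python
# Fitting a one-parameter-family curve (a line) to sampled series data.
# The best-fit parameters (a, b) are the model's "output" summarizing
# the data; used as input, they predict expected samples.

def fit_line(ts, ys):
    """Closed-form least-squares fit of y = a + b*t."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / \
        sum((t - mt) ** 2 for t in ts)
    a = my - b * mt
    return a, b

def predict(a, b, t):
    # Parameters as input: the model curve gives the expected sample.
    return a + b * t

ts = [0, 1, 2, 3, 4]
ys = [1.0, 3.1, 4.9, 7.0, 9.1]   # hypothetical measurements
a, b = fit_line(ts, ys)
```

A richer family (more parameters) would fit more closely, which is exactly the trade-off the following paragraphs discuss.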
A valuable model is one that produces a clear and concise interpretation of
the data. Part of this interpretation is in the form of the specific values for the model
parameters. For example, consider two models, one uses only two parameters while
the other has eight parameters. Even though the eight-parameter model may predict
the data "twice" as well, it may not be the better model if its parameters have no
particular meaning or are hard to estimate.
Models should have as few parameters as possible while still modelling the data with sufficient accuracy. Fewer parameters mean the model is easier to understand. Part of understanding a model is understanding the relationships between the parameters. Parameters are related by the effects each has on the others. Knowing the trade-offs between parameters is necessary to understand a model. This is easier if the model contains fewer parameters.
In addition to relatively few parameters, individual model parameters should
be understandable. Understandable parameters produce simple results with meaning.
On the other hand, meaningless parameters do not help to simplify or interpret the
data since they must again be interpreted. Parameter meaning is even more important
when the model is used as a predictor for new projects. In this case, model parameters
must be estimated before any data has been collected. If the individual parameters
are well understood, better estimates for each will be made. Better estimates will
produce better predictions.
Related to individual parameter meaning is the parameter unit. Model parameters should be expressed in well-known units, rather than new or arbitrary ones. Parameters with direct interpretations allow the model results to be easily understood and used. Well-known units are also much easier to estimate. Again, this allows for better predictions.
Model parameters should be few in number, directly interpretable, and measured in existing units. These properties give the model parameters the most meaning and thus give the model the most "clarifying power".
1.3 Implementation Progress Model Requirements
In addition to general model requirements, an implementation progress model
must be compatible with existing models. Implementation progress is not a new concept. An informal progress model already exists; it can be seen in project vocabulary and assumptions. For example, project members speak of "getting over the learning curve" and being "up to speed". This informal model is commonly used to answer common project status queries, such as:

What is the expected completion date based on the current pace?
What was the size of the total effort for that project?
What fraction of the total effort is currently done?
What fraction of the total effort will be done by a certain date?
A proposed implementation model should serve the same purpose as the informal
model. The model must help provide answers to questions about implementation
speed and progress of current and future projects.
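The informal "current pace" reasoning behind those queries can be made concrete with a simple linear extrapolation. This is an illustrative sketch, not the thesis's model; function names and numbers are hypothetical, and the assumption of constant pace is exactly what the formal model later relaxes.

```python
# Answering two of the status queries above under a constant-pace
# assumption (hypothetical units of effort).

def expected_completion_day(done_so_far, current_day, total_size):
    """Linear extrapolation: total days needed at the average pace so far."""
    pace = done_so_far / current_day        # effort units per day
    return total_size / pace

def fraction_done(done_so_far, total_size):
    """What fraction of the total effort is currently done?"""
    return done_so_far / total_size

# e.g. 4000 units done in 50 days, against an estimated 10000 total
print(expected_completion_day(4000, 50, 10000))  # 125.0 days
print(fraction_done(4000, 10000))                # 0.4
```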
The informal progress model captures another key attribute of implementation progress. The informal model acknowledges that project speed is not constant throughout a project: projects "ramp up" and "slow down". These phrases refer to project speed and suggest the ability or desire to determine implementation velocity. As envisioned by experienced project managers, this velocity increases at the beginning and decreases near the end (McConnell 1998). A formal implementation progress model should be informed by this experience and capture the canonical variations in velocity during implementation.
In summary, the desired attributes of a formal implementation progress model include: relatively few parameters, understandable parameters, well-known parameter units, consistency with the informal progress model, and the ability to answer management questions involving size and velocity.
The next chapter (II) discusses supporting work indicating the need for a
progress reference model. Chapter III describes the proposed implementation progress
model, hypothesis, and research process. The specific metrics used are presented in
Chapter IV, including a description, discussion, and support for each. Results from
actual projects are presented in Chapter V. Finally, conclusions and directions for
future study are in Chapter VI.
CHAPTER II
RELATED WORK
Previous work applying software metrics to the development process has primarily been targeted at tasks before and after implementation. Much work has been done in the pre-implementation stages to improve effort prediction and estimation (Albrecht and Gaffney 1983; Lind and Vairavan 1989; Jorgensen 1995; Boraso, Montangero, and Sedehi 1996; Turski 2002). Metrics have also been used to evaluate architectural design before implementation. Significant work has been done in the post-implementation stages to predict failure rates, both for a product as a whole and for individual modules (Jelinski and Moranda 1971; Goel and Okumoto 1979; van Solingen and Berghout 1999; Fenton and Neil 1999; Schneidewind 1999). However, substantially less work has been published regarding the use of metrics for assessment, monitoring, or control of the implementation stage itself.
DeMarco asserts "you can't control what you can't measure" (DeMarco 1982, page 3). Indeed, every researcher proposing, defending, or merely discussing a metric agrees the reason behind metrics and measuring is to gain some degree of understanding and control over the complex process of software development (Fenton and Neil 2000). Recent studies have focused attention on how to use the vast array of data generated by existing measures.
Published works emphasizing how to use the potentially enormous data available can be coarsely divided into three groups. Many works assert metrics should be used to assist in monitoring, evaluation, and control of projects during the implementation phase (DeMarco 1982; Basili and Rombach 1988; Boehm 1988; Lott 1993; van Solingen and Berghout 1999; Kirsopp 2001). Other researchers recommend specifically that time-series metrics data be used to monitor and evaluate projects (DeMarco 1982; McConnell 1998; Schneidewind 1999). Finally, several researchers emphasize the insight gained from causal models over correlative models (Powell 1998; Fenton and Neil 2000; Turski 2002). They suggest models which provide an inherent causal relationship are more valuable than simpler correlation models.
2.1 Implementation Evaluation and Control
DeMarco presents a development process relying on steadily improving estimates to provide feedback and control during all phases of software development (DeMarco 1982). Metrics collected at each stage of development provide raw data for creating an improved estimate for the next stage. Metrics from the current project as well as previous projects are used to inform decisions. Inherent throughout his work is the requirement for continuing feedback via improving estimates, allowing the development effort to be directed. A compelling case supporting estimate reviews, process metrics, cost models, and quality improvement is developed. DeMarco identifies and recommends metrics appropriate for each stage of development.
In discussing appropriate metrics for the implementation stage, DeMarco alludes to process metrics such as compilation rate but does not explore them. The primary implementation measure is code weight, which is defined as a product of two
dimensions: size and complexity. DeMarco defines code size as information content within a program. He recommends using Halstead's volume metric (Halstead 1977) to find size. Several alternatives for measuring complexity are presented, but McCabe's cyclomatic complexity measure (McCabe 1976) is recommended. Using these two dimensions as parameters, an algorithm is presented for computing implementation weight. Historical data from similar projects and environments is also used to provide scaling factors. According to DeMarco, the primary motivation for computing implementation weight is to improve future project estimates. However, he also calls it a "project predictor"; that is, it should predict the final size of the project accurately. According to the system presented by DeMarco, the measure should be taken once near the middle of implementation. The progress model described in this study may provide a better estimate of implementation size. Since the proposed model considers the complete project history, not simply a single point in time, it is less susceptible to anomalies.
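The two measures DeMarco recommends can be sketched from their standard textbook definitions. This is not DeMarco's code-weight algorithm (which also folds in historical scaling factors), just the underlying metrics; the token lists are a toy example.

```python
import math

def halstead_volume(operators, operands):
    """Halstead volume: V = N * log2(n), where N is the total count and
    n the distinct count of operators plus operands (Halstead 1977)."""
    N = len(operators) + len(operands)
    n = len(set(operators)) + len(set(operands))
    return N * math.log2(n)

def cyclomatic_complexity(num_decision_points):
    """McCabe complexity of one routine: decision points + 1 (McCabe 1976)."""
    return num_decision_points + 1

# Toy fragment: x = a + b; if x > 0: ...
ops = ["=", "+", "if", ">"]
args = ["x", "a", "b", "x", "0"]
print(halstead_volume(ops, args))   # 9 tokens over 8 distinct -> 27.0
print(cyclomatic_complexity(1))     # one decision -> 2
```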
Boehm takes a broader approach to development feedback than simply focusing on improved estimates. He introduces a software development methodology whose principal goal is risk management (Boehm 1988). His spiral model of software development relies on risk evaluation as the impetus for each unit of work, whether the work unit is a prototype, design document, or code. Risk management implies the ability to control what is being managed. This agrees with DeMarco's argument that our need to measure the software development process stems from our desire to control the process (DeMarco 1982). Boehm's methodology assumes feedback metrics
exist to inform the risk evaluation process, but does not dictate specific measures or
measurement processes.
Addressing the selection of appropriate metrics for quality control, van Solingen and Berghout defined the Goal/Question/Metric (GQM) method of improving software quality (van Solingen and Berghout 1999). GQM integrates metrics into the development process in order to answer questions about quality raised by corporate goals. Their methodology relies on the ability to follow the connection between corporate goals and specific metrics, in both directions. Measurements are defined by goals and the results interpreted in terms of those goals. In the area of quality control, well-developed process models exist to help define and interpret metrics. However, implementation models in general, and implementation progress in particular, have not been as well developed.
Kirsopp addresses the need to capture development models and enough data to evaluate them. He strongly argues that the software development process needs measurements for feedback and that the integration must be close, detailed, and appropriate (Kirsopp 2001). Organizations must support metrics outside of a single project in order to validate the process, validate the results, and collect historic data. All three of these are required to assist future project estimates. An experience factory provides a repository for captured experiences and models, allowing reuse within an organization. Kirsopp cites the TAME (Tailoring a Metric Environment) Project (Basili and Rombach 1988) as a working example of an experience factory.
Lott provides an alternative approach; instead of suggesting or analyzing metrics, he studied several available and proposed software engineering environments (Lott 1993). Many of these environments include integrated tools for collecting numerous metrics about the various development artifacts created. Lott suggests collected data can be used to guide development and to call attention to atypical patterns worthy of investigation. In this regard, he assumes time-series data will be collected and evaluated. Inherent in this idea is the development of a canonical pattern or typical shape for a particular metric. Unfortunately, neither Lott nor the systems studied define how to select or interpret the automatically collected measurements.
2.2 Time-Series Shape Metrics
In addition to estimation, DeMarco briefly notes that process metrics, such as
compilation rate, can be used to identify project dysfunction and impending problems
(DeMarco 1982). Periodic sampling of a metric allows the value measured to be
graphed against time. For some measures, such as compilation rate, "healthy" projects
may all have similar shapes when viewed as time-series data. If this is the case,
"unhealthy" projects can be detected if or when they deviate from the canonical shape.
For example, DeMarco suggests compilation rates that continue at a steady rate
without showing any decline may be an indication of a "thrashing" development team.
While this particular evaluation may not apply to all development environments, the
idea of a "healthy" canonical shape can be applied to all environments. In another
instance, he recommends reporting test progress as a graph showing measurements
against time. Time-series graphs make it clear how test progress has been proceeding
and how its trends change over time. In general, comparing the current project
with similar historic projects using graphs can highlight abnormal trends which may
be an indicator of trouble. Given DeMarco's emphasis on continuous monitoring
and improvement, it is surprising he does not suggest using implementation artifact
metrics, such as size or complexity, to monitor implementation progress.
Schneidewind used time-series metrics to create a method for evaluating process stability (Schneidewind 1999). He illustrates its use by evaluating the process stability of the NASA Space Shuttle flight software development efforts. Schneidewind emphasizes that metric trends are a significant indicator of the underlying process and that monitoring the trends can provide feedback about the process. To quantify these trends, he introduces two new classes of indirect metrics. A change metric is computed using differences in consecutive values of a traditional metric. This metric can be viewed as the derivative of the primary metric. The other class of indirect metric introduced is the shape metric. A shape metric is derived from the curve of the time-series metric data when graphed against time. For example, one shape metric suggested is the time at which the failure rate is highest. Lower values for this metric may indicate process stability, while higher values may indicate instability in the development process. A strong case is presented for the use of time-series data, and indirect metrics derived from it, in the context of process stability. Monitoring progress during the development stage using change and shape metrics is an obvious extension of this study.
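Schneidewind's two indirect metric classes, as described above, are easy to sketch. The failure-rate series here is hypothetical and the peak-time shape metric is one of several he suggests.

```python
def change_metric(values):
    """Change metric: differences of consecutive samples of a primary metric
    (a discrete derivative)."""
    return [b - a for a, b in zip(values, values[1:])]

def peak_time(values):
    """A shape metric: the sample index at which the series peaks.
    An early peak in failure rate may suggest process stability."""
    return max(range(len(values)), key=lambda i: values[i])

failure_rate = [2, 5, 9, 7, 4, 2, 1]    # hypothetical, per release
print(change_metric(failure_rate))       # [3, 4, -2, -3, -2, -1]
print(peak_time(failure_rate))           # 2: the rate peaks early
```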
McConnell understands typical "code growth" on a project to contain three distinct phases (McConnell 1998). In the first phase, architectural development and detailed design generate very little code. The second phase provides staged deliveries and includes detailed design, coding, and unit testing. During this phase code growth is very high. During final release, the third phase, code growth slows to a crawl. McConnell shows a graph depicting a typical code growth pattern for a well-managed project. He indicates the phase transitions occur at approximately 25% and 85% of the total development time, but acknowledges that this varies to some degree. His main point is that periodic monitoring of code size is a valuable feedback tool for managers. No details are given about the specific metric(s) involved or the process used to collect the data. The proposed progress model clarifies how metrics are used and provides a specific interpretation of the three phases documented by McConnell.
2.3 Process Models
Powell expands the role of software measurement to explicitly include not only prediction and control but also assessment and understanding (Powell 1998). He makes arguments for assessment similar to those presented by Boehm and DeMarco for prediction and control. Regardless of the motive, measurements are always based on assumptions about the process in which the measurement is taken. Powell states "it is impossible to talk about measurements without implying some form of [process] model" (Powell 1998, page 5). Before measurements can be taken, and before metrics can be determined, a model of the development environment must be chosen.
Fenton and Neil proposed using Bayesian belief networks (BBNs) to model development environments (Fenton and Neil 2000). BBNs integrate causality, uncertainty, and expert subjective input to generate answers and assist in decision making. They observe the current lack of metrics in common practice within the software development community, and blame a lack of realistic decision-making models. Many existing models are simplistic in that they show strong correlations between measures but do not properly model causality. Using BBNs allows better causality modelling to be achieved. This allows the answers generated to be both more accurate and traceable. BBNs encode beliefs about the causality of a process in the network. The parameters can be populated with known values, known distributions, or even suspected distributions. Answers computed show not only the likely result but its range and distribution. BBNs provide a general solution to the problem of model encoding.
Turski presents a model for understanding the observed rate of software growth as a function of time (Turski 2002). Using the number of modules as the dependent variable and uniform interrelease intervals as the independent variable, he shows size correlates strongly with the third root of time (size ∝ time^(1/3)). While defendable on the basis of Lehman's Laws of Software Evolution (Lehman, Ramil, Wernick, and Perry 1997), Turski uses a simple mental model to understand the same relationship. He suggests envisioning a system as a sphere with "surface" modules being easy to modify while "interior" modules are much harder to modify. With this model in mind, it is easy to see that the proportion of easy modules to hard modules tends toward zero as the project (sphere) grows with time.
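Turski's sphere intuition can be sketched numerically. This is an illustration of the relationship described above, with arbitrary constants: a system of radius r has roughly r^3 modules in total but only about r^2 "easy" surface modules, so the easy-to-hard ratio shrinks as 1/r and size grows like time^(1/3).

```python
def size_at(t, k=1.0):
    """Cube-root growth law: size = k * t**(1/3)."""
    return k * t ** (1.0 / 3.0)

def easy_fraction(r):
    """Fraction of 'easy' surface modules (~r^2) among all (~r^3)."""
    return (r ** 2) / (r ** 3)

print(size_at(8.0))          # about 2.0
print(easy_fraction(2.0))    # 0.5 for a small system
print(easy_fraction(10.0))   # 0.1 -- growth makes changes harder
```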
Turski believes that simple and manageable models, such as that described above, provide powerful insights into understanding the forces at work in software development. In particular, models which exhibit causal relationships rather than simple statistical correlations provide not only better interpretation but improved understanding of the process. He suggests similar "back-of-an-envelope" models be developed precisely because they are simple to understand and intuitive. He expresses concern that so few causal-model-based arguments have been presented in the literature to date. The progress model proposed here includes such a simple and intuitive interpretation of the implementation process.
CHAPTER III
RESEARCH DESIGN
This chapter presents the implementation progress model studied and the procedures used in the study. First, the proposed model is described in detail along with the rationale for it. This is followed by a description of the analysis process. Then, the data required for the study is discussed. Finally, a summary of where appropriate projects were found and how the data was collected is presented.
3.1 Existing Progress Models
Project managers have developed an intuition about what should occur during a software development effort. An implementation progress model should be consistent with this experience. A condensed version of this collective wisdom is presented by McConnell (McConnell 1998). He uses code growth as a measure of progress and provides a nominal code growth pattern as well as a range of normal variations for well-run projects. An appropriate progress model should reflect the basic shape of accepted norms such as those presented by McConnell.
Another constraint on choosing an appropriate implementation progress model is its interpretive power. Interpretation of metric data relies on some understanding of or belief about the underlying process. For example, changes in the rate of progress
in an otherwise stable environment may indicate the project has transitioned to a new phase. This assumes the rate of progress is dependent on the project state. This process of drawing meaning from data, such as when a phase ends, is interpretation.

Figure 3.1: Accumulated change metrics for a typical project by day.
An implementation progress model must approximate actual project data collected. Figure 3.1 shows several common metrics sampled from the implementation phase of one project studied. The values have been arbitrarily scaled to allow simultaneous viewing of the four measures. The similar shape observed between data sets suggests the metrics are highly correlated. This observation has been shown previously (Lind and Vairavan 1989; Lake and Cook 1994).
The data in Figure 3.1 is very similar to that described by McConnell. Each
data series has approximately the same shape, an S-like curve. Initially the project
makes very little headway, but over a period of about seventy days the rate of progress
improves. During this time the project experiences a period of increasing speed
as resources are added and things get "up to speed". For the next 100 days the
metrics show approximately linear progress. Here the project experiences a time of
continuous, even-paced development during which much of the work is accomplished.
This is the period of highest efficiency for the project. Finally, during the last eighty
days progress continues but at a steadily slowing pace. The project experiences a
time of lower efficiency as it nears completion and attention to detail is paramount.
This graph demonstrates the important characteristics of typical progress data.
Overall progress is not linear with time, but approximately linear progress occurs
during the middle of the project. This sample and experience suggest the beginning
and end of the project contribute the least to overall progress. This means the fastest
pace occurs during the middle of the project while the ends are slower paced. The
slope of the progress curve indicates the speed of progress. The graph is consistent
with both expectations about progress and speed. The slope of progress (velocity)
can be used to determine how and when the project transitioned from "ramping up"
to its sustainable rate of progress. Likewise, near the end of the project, a slope
change indicates a change in velocity which may indicate a change in process and
possibly goals.
The typical "slow" startup phase may be partly due to resources being added,
team members establishing relationships, and processes developing within the project
group. McConnell suggests detailed requirements understanding and architecture
may still be under development during the startup phase as well (McConnell 1998).
Steady "fast" development begins only after this initial investment period. As the
project nears completion the rate of progress decreases. This slowing may be due to
inefficiencies related to critical paths, loss of resources, defect removal and attention
to detail required to meet final deliveries.
DeMarco acknowledges this non-linear relationship in his advice about when
to collect implementation metrics (DeMarco 1982). He strongly advocates the need
for feedback throughout the development process, yet recommends only a single
sampling of size metrics near the middle of the implementation phase. This inconsistency
is understandable after examining his assumptions. DeMarco uses a linear
model which is required to pass through the origin and the current project implementation
weight (size). It can be seen in Figure 3.1 that the only time an origin-based
linear model will be accurate is when the sample point is near the middle of the project.
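The inaccuracy of an origin-based linear projection away from mid-project can be illustrated numerically. The sketch below is an illustration, not part of the original study: the parameter values (s=100, tp=70, tq=180, te=250) are invented, and the symmetric S-curve is built from the trapezoidal velocity profile developed later in this chapter. Only the mid-project sample projects the true end date.

```python
# Illustration only: invented parameters (s=100, tp=70, tq=180, te=250)
# give a symmetric S-shaped progress curve: ramp up, steady, ramp down.
S, TP, TQ, TE = 100.0, 70.0, 180.0, 250.0

def progress(t):
    """Closed-form progress under a trapezoidal velocity profile."""
    if t < TP:
        return 0.5 * S * t * t / TP
    if t < TQ:
        return 0.5 * S * TP + S * (t - TP)
    return (0.5 * S * TP + S * (TQ - TP) + 0.5 * S * (TE - TQ)
            - 0.5 * S * (TE - t) ** 2 / (TE - TQ))

total = progress(TE)                   # final size: 18000.0
for t in (35.0, 125.0, 215.0):         # early, middle, late sample points
    # DeMarco-style projection: line through the origin and (t, progress(t))
    projected_end = t * total / progress(t)
    print(t, round(projected_end, 1))  # 720.0 early, 250.0 mid, 226.0 late
```

The early sample projects an end date of 720 days against a true value of 250; only the mid-project sample is exact.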
3.2 Implementation Progress Model
Before attempting to characterize implementation progress for a complete
project, it is appropriate to consider a definition for implementation progress. Software
implementation is the activity of creating artifacts which contribute to the goal
of delivering a working system. Implementation progress is the amount of change in
an artifact over the time needed to produce the change. Progress is best measured as
the sum of many changes rather than any single measure. This is due to the volatile
nature of implementation; code may be added, removed, moved or changed. The sum
of these individual measurements is a better approximation of the total progress than
any single measurement. This is similar to the way an odometer is a better measure
of distance travelled than simply the distance from the starting point.
An individual implementation progress measurement captures implementation
artifact change over time. Change over time is velocity, so individual progress measurements
attempt to capture implementation velocity. If implementation velocity
is in any way analogous to velocity found in the natural world, it cannot change
instantaneously. It is reasonable to assume implementation velocity does not change
instantly, so the first derivative (velocity) of the implementation progress model must
be continuous.
Experience suggests implementation velocity begins and ends at zero while
being at its highest in the middle. The simplest implementation velocity graph, consistent
with experience, consists of three linear segments. Implementation velocity
begins at zero. It increases linearly until the maximum sustainable velocity for implementation
has been reached. The velocity remains constant until near the end
of implementation when it begins to decrease. Implementation velocity constantly
decreases until it reaches zero.
Figure 3.2: Idealized implementation velocity as a function of time.

Figure 3.3: Idealized implementation progress as a function of time.
Figure 3.2 shows the idealized implementation velocity for a project as a function
of time. The horizontal, center phase represents the steady, efficient development
observed in the middle of the implementation phase. The positive slope at the left
represents increasing velocity as implementation "gathers speed". This increasing pace
may be the result of adding resources or team members establishing roles within the
project. The negative slope at the end of the graph shows the implementation phase
decreasing speed as the end approaches. The slowing may be due to inefficiencies
related to critical paths, loss of resources, and attention to detail required to meet
final deliveries. Slope changes in a graph showing velocity are good indicators of a
change in the environment, such as the mode of development or the goal of the project.
In the idealized graph shown in Figure 3.2 the slope changes indicate, respectively,
that the project has reached its maximum sustainable velocity and that the team is
attempting to complete all tasks rather than maintain velocity.
The idealized graph shown here is symmetric; however, symmetry is not common
in practice and is not required by the model presented. The idealized velocity
as a function of time (vt) can be described using four parameters.
v_t =
\begin{cases}
s\,t/t_p, & 0 \le t < t_p \\
s, & t_p \le t < t_q \\
s\,(t_e - t)/(t_e - t_q), & t_q \le t \le t_e
\end{cases}
\qquad (3.1)
Equation 3.1 gives the velocity as a function of time, where s is the maximum sustained
velocity, tp and tq are the times of the phase transitions, and te is the time at
the end of implementation. Time may be measured in any real unit, such as days.
Pseudo-time units such as releases, deliveries, or builds are not recommended unless
an equal amount of resources is spent during each interval. Velocity is measured in
size of metric change per time unit, such as lines of code per day.
Integration of the idealized velocity for a project produces idealized progress
as a function of time (pt).
p_t =
\begin{cases}
s\,t^2/(2 t_p), & 0 \le t < t_p \\
s\,(2t - t_p)/2, & t_p \le t < t_q \\
s\,(t^2 - 2 t_e t + t_q^2 + t_p t_e - t_p t_q)/(2(t_q - t_e)), & t_q \le t \le t_e
\end{cases}
\qquad (3.2)
Figure 3.3 shows an idealized implementation progress curve as a function of time.
Equation 3.2 gives the implementation progress as a function of time. Progress is
measured in accumulated metric change to date, such as total lines of code changed.
Equation 3.2 is given in terms of the previously defined parameters.
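Equations 3.1 and 3.2 can be transcribed directly into code. The sketch below uses arbitrary example parameter values (not data from the study) and checks the property the model requires: the progress function is the antiderivative of the velocity function.

```python
def v(t, s, tp, tq, te):
    """Idealized implementation velocity, Equation 3.1."""
    if t < tp:
        return s * t / tp
    if t < tq:
        return s
    return s * (te - t) / (te - tq)

def p(t, s, tp, tq, te):
    """Idealized implementation progress, Equation 3.2 (integral of v)."""
    if t < tp:
        return s * t * t / (2 * tp)
    if t < tq:
        return s * (2 * t - tp) / 2
    return s * (t * t - 2 * te * t + tq * tq + tp * te - tp * tq) / (2 * (tq - te))

args = (50.0, 20.0, 120.0, 160.0)   # s, tp, tq, te (example values)
# p must be the antiderivative of v: check with a central difference
for t in (10.0, 60.0, 140.0):       # one point in each phase
    dp = (p(t + 1e-4, *args) - p(t - 1e-4, *args)) / 2e-4
    assert abs(dp - v(t, *args)) < 1e-3
print(p(160.0, *args))              # total progress: 6500.0
```

Total progress at te agrees with the area under the velocity trapezoid, s(tq + te − tp)/2.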
The proposed model is consistent with ideas proposed by others. Schneidewind
suggests artifact metric trends may be used to study the stability of the development
process (Schneidewind 1999). He defines process stability as continued improvement
in an artifact metric over time. This relationship between artifact metrics and process
metrics is stated clearly by Woodings and Bundell. They assert the derivative with
respect to time of a valid product metric is a valid process metric (Woodings and
Bundell 2001). The product metric, in the proposed model, is any size-related measure
of code. The derivative of size is growth or implementation velocity, a measure of the
implementation process.
3.3 Model Interpretive Value
Time-series data sets are rich sources of raw data. However, the metric data
itself does not directly characterize the project progress. Trends within a data set
contain much of the global meaning (DeMarco 1982; Schneidewind 1999). Extracting
meaning from this data remains difficult due to noise inherent in both the sampling
mechanism and the process being measured. Extracting meaningful information from
the data relies on a belief about or understanding of the process which produced it
and its essential variables. The model allows the process data to be interpreted;
in other words, any modelling is an interpretation of the principal mechanisms
operating within the process.
Models with fewer parameters are preferred to models with many parameters.
Fewer model parameters reduce ambiguity and avoid overfitting. As the number
of parameters in a model grows, the likelihood that non-orthogonal parameters exist
increases. Non-orthogonal parameters allow multiple sets of parameter values (solutions)
to model the data equally well. If multiple parameter values produce an
equally good fit the model does not help differentiate between the solutions. A model
resulting in a single solution does not require an external differentiator and thus its
solution has captured more meaning. Another reason few parameters are preferred is
to avoid overfitting. Overfitting occurs when the model attempts to capture details
at or below the level of noise in the system. When overfitting occurs the model loses
generality and become specific to a particular data set or sets. To avoid overfitting
only parameters required by the interpretation are included in the model.
As described above, the model includes four parameters (s, tp, tq, te). However,
care must be taken when counting model parameters. Other projection models do
not include specific knowledge of the terminal condition. The terminal condition is
external in these models. Using these models, identifying the end time, for example,
is a property of the model parameters and the total size. In this example, total size
is effectively an additional parameter required to determine the end time. However,
total size is not a parameter determined by fitting. Similarly, te is not determined
by fitting; it is simply taken to be the maximum time in the data set. The proposed
model uses only three parameters which require fitting. Since implementation progress
has been seen to be both convex and concave, three parameters can be considered a
minimum parameter set.
A model expressing results in concrete units provides better insight than an
otherwise similar model using arbitrary or ambiguous units. The proposed model
does not include any scaling factors, constants, or other arbitrary parameters. The
proposed model uses only two units: time and metric-change. Both time and metric-change
are well-known, understandable units with obvious meaning to both developers
and managers. An intuitive understanding of the units, time and metric-change,
allows them to be easily used in planning and evaluation. Because they are
well-known, they require no special experience or "rules of thumb" normally associated
with scaling factors, constants, and other invented parameters. By using concrete
units the model results may be easily related back to the project. This allows the
results to be used directly to advise management about the project.
The proposed progress model helps answer speed and estimation questions.
The model parameters can be directly used to answer management questions about
project size and the speed of development. By providing a solution in readily usable
form, predictions for "what if" scenarios are easily computed.
The proposed model allows for project implementation comparisons. Models
allow data collected from different projects to be compared. Results from the same
model applied to different data sets can be compared, revealing differences in the
projects. A model also allows projects to be compared across time or under different
circumstances. As more project results become available, historic baselines can be developed.
Eventually canonical profiles for each set of circumstances can be produced.
Profiles, baselines and the ability to compare projects allow better predictions to be
made.
3.4 Data Requirements
Before project implementation can be evaluated or compared, data about the
implementation must be collected. The implementation progress model proposed assumes
an appropriate measure of progress is available. Ideally this measure would
report "effort expended wisely", "real progress made", or even "percent of work completed".
Unfortunately these highly desirable metrics are not available. Instead, a
surrogate measure is used with the knowledge that the dimension sampled is not the
actual quantity desired (Fenton 1991). In this case, the quantity sought is implementation
progress. Choosing an accurate surrogate measure of progress depends on
one's definition of progress. It may seem tempting to include all activities involved
in implementation when defining this dimension. However, the many activities required
during implementation are quite diverse. This diversity is so large that the
only measures they have in common are the time (or money) required to complete them.
Measuring resources expended is valuable because it allows results (progress) to be
compared with expenses. In order to allow this comparison, resource measurements
must be independent from progress measurements.
All of the diverse implementation activities contribute to progress in their
own way. Measuring each activity may be possible, but this approach does not easily
produce a consolidated measure unless the individual metrics are summed. Summing
many different metrics is fraught with problems such as identifying all components
and determining the correct weights for each. To avoid defining a progress metric
as the sum of many others, we define implementation progress in terms of the one
absolutely required engineering artifact necessary to deliver a software project: code.
Here we define implementation progress as a measure of the total effort captured in
deliverable code. For environments and phases that emphasize code as the primary
engineering delivery, this is an appropriate definition.
Limiting the search for a metric to an accurate measure of captured effort
within code still leaves a difficult task. Many code metrics and variations have been
proposed. A review of methods for quantifying software change covering many different
research areas concludes: "the most practical, successful measures to date appear
to be based on simple atomic counts" (Powell 1998, page 14). Even restricting a
search to "simple atomic count" metrics leaves many options. Many of the metrics
available today are primarily used for static analysis. For example, they may be used
to predict the probability that a module contains a fault rather than the total amount
of effort required to create it.
This study is interested in measuring progress in implementation artifacts.
Progress was measured as artifact change over time. Existing units of internal change
were unavailable, so an appropriate unit was developed. Implementation artifact
metrics traditionally used to statically measure size form the basis of the metrics
used in this study. Since this study was interested in evaluating change as a function
of time, the difference between consecutive samples was used. The change metric
is defined as the absolute difference between consecutive samples of a size metric.
This represents the change in the dimension of the size metric over the time interval
between samples. Equation 3.3 shows \Delta_{m,t} is the change in a metric m at time t.

\Delta_{m,t} = |m_t - m_{t-1}| \qquad (3.3)
The absolute value of each difference was used. This assumes that the correct
removal of code artifacts was approximately as challenging as correctly adding the
code. This point could be contested by those claiming that removing code is easier
than adding code. While that may be true, it cannot be claimed that the removal
effort represents progress away from the goal, which would be the case if "negative
effort" was permitted. To avoid introducing a weighting factor for "negative" changes,
all changes were treated equally. These new metrics measure the size of change over
the sample interval, or in other words, they give a measure of implementation progress.
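As a small sketch (the sample values below are invented), the change metric of Equation 3.3 and its accumulation into a progress measure can be computed as:

```python
def change_series(sizes):
    """Equation 3.3 applied to a series: |m_t - m_{t-1}| per interval."""
    return [abs(b - a) for a, b in zip(sizes, sizes[1:])]

sloc_samples = [0, 120, 260, 240, 310]   # hypothetical SLOC per sample
changes = change_series(sloc_samples)    # [120, 140, 20, 70]
print(sum(changes))                      # 350: total progress to date
```

Note the accumulated change (350) exceeds the final size (310); this is the "odometer" effect described above, where removed code still counts as progress.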
Existing artifact measures shown to be related to size were chosen for this
study. Chapter IV discusses the specific details of each metric used.
3.5 Data Collection
In a team environment, software implementation occurs in parallel as each
team member works semi-independently. From the perspective of the project source
repository, development occurs simultaneously across many source files. This chaotic
environment means capturing progress for a complete project requires summing the
change from many individual source files. Similarly, multiple changes within a single
file are accumulated to find the total change across the time interval of interest.
Collecting measurements of implementation change would typically be done as
each source file was submitted to the project source repository. This mechanism helps
provide part of the "real-time" data needed to manage the project. For this study,
"live" projects were not available, however project source repositories with version-
control were available. These source repositories allow source files to be retrieved for
any point in the history of the project. By retrieving each version of a file, a time-
stamped series of intermediate files can be collected. This retroactive recovery of each
version means the data collected is equivalent to what would have been collected
"live" during the project implementation phase. For each version of a source file,
the following pieces of raw data were collected: date and time, filename, version
number, submission engineer, and the various effort metrics discussed in the next
chapter. Additional derived pieces of data were computed including elapsed time
in project, elapsed time since the file last changed, and other files submitted at the
"same" time. Files were determined to be submitted at the "same" time if the same
author submitted both files and the recorded submission times differed by at most
three minutes.
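The "same time" rule can be sketched as follows. The records, field layout, and chaining behavior (each version is compared to the previous one in its group) are assumptions made for illustration, not details taken from the study.

```python
from datetime import datetime, timedelta

# Sketch of the "same time" grouping rule: consecutive versions by the
# same author within three minutes are treated as one submission event.
def group_submissions(records, window=timedelta(minutes=3)):
    """records: (timestamp, author, filename) tuples, sorted by time."""
    groups = []
    for rec in records:
        last = groups[-1][-1] if groups else None
        if last and last[1] == rec[1] and rec[0] - last[0] <= window:
            groups[-1].append(rec)
        else:
            groups.append([rec])
    return groups

records = [
    (datetime(2001, 3, 5, 9, 0),  "alice", "parser.cpp"),
    (datetime(2001, 3, 5, 9, 2),  "alice", "parser.h"),
    (datetime(2001, 3, 5, 9, 20), "bob",   "ui.cpp"),
]
print(len(group_submissions(records)))   # 2 submission events
```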
Several of the project data sets contained anomalies which were judiciously
removed. Six of the data sets included unusual entries long after the completion date
for the project. In some cases, this was evidently a mistaken submission of a file to the
wrong repository. In other cases, this was evidence that a later, minor maintenance
release of the project was produced. A gap of at least one month in the data was
used to identify the end of the project. Future work may compare these projects
against those without revisions to determine if the projects were shipped "too soon".
In addition to outliers near the end of the data set, three projects contained unusual
events related to repository submissions. In two cases, a zero-length file was submitted
to the repository immediately before being resubmitted correctly. In one case, several
hundred lines of code were removed from a file only to be immediately added to
another file. These three data events were considered to be the result of misuse of
the repository and the events were removed. Finally, project one was discovered to
include a prototype phase. Since the engineering team changed following completion
of the prototype, it was treated as two projects (1a and 1b).
3.6 Data Source
Seventeen projects from a single company were studied. All projects were developed
using the same process. They were six weeks to eighteen months in length and
involved one to eight engineers. All projects produced multimedia software designed
to be marketed to consumers for use with Microsoft® Windows® and on Macintosh®
personal computers between 1995 and 2002. Much of the total project effort for these
projects was not software development but rather multimedia content development.
Source and content development outside the usual definition of software was not considered;
only source files created by software engineers and containing source code
were included in the study.
The projects studied were not on-going efforts such as the more typical scenario
of software maintained for in-house use. The projects were intended for mass
distribution to consumers; maintenance changes were not anticipated or economically
acceptable. Clear delivery dates existed after which no work was to be done. This is
unlike some other environments, where software is deployed rather than delivered,
and implementation evolves into a continuous cycle of maintenance. The progress
model studied is expected to be meaningful when applied to each release of on-going
projects; however, additional studies will be needed to establish this. Results from
this homogeneous group of projects should apply to initial development efforts of all
projects and to projects without maintenance phases.
CHAPTER IV
EFFORT METRICS
Three established metrics were used in the study. Two variations on the lines
of code metric were considered. Also included are Halstead's volume measure and
McCabe's cyclomatic complexity measure. These metrics have been studied together
many times. Several studies showing the correlation with effort or faults have been
published using these metrics (Lind and Vairavan 1989; Kafura and Canning 1985).
Each of the metrics is described below including definition, references, and details
about implementation within the study. Measures other than those studied here
could be used and several suggestions for future investigation are included below.
These additional metrics are not included in this initial study.
4.1 Source Lines of Code
Software size can be measured using a count of source lines of code (SLOC)
contained within the project. This metric includes many variations. The Software
Engineering Institute has identified dozens of issues that should be considered when
defining SLOC (Park 1992). These issues help define what exactly counts as a line
of code. Typically lines containing only white space and lines consisting of comment
characters without any alphabetic characters are not counted. In addition to the
preceding rule, physical lines containing both code and comments count as two lines
for this study.
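A minimal counter following these rules might look like the sketch below. It handles only `//` line comments; block comments and string literals are ignored for brevity (an assumption, not the study's full rule set).

```python
import re

# Simplified SLOC counter: blank lines don't count, comment text counts
# only if it contains alphabetic characters, and a line holding both
# code and a comment counts as two lines, per the rules above.
def sloc(source):
    count = 0
    for line in source.splitlines():
        code, _, comment = line.partition("//")
        has_code = bool(code.strip())
        has_comment = bool(re.search(r"[A-Za-z]", comment))
        if has_code and has_comment:
            count += 2      # mixed code-and-comment line counts twice
        elif has_code or has_comment:
            count += 1
    return count

example = "int x = 0;  // loop index\n//------\n\nx++;\n"
print(sloc(example))        # 3: mixed line counts twice, 'x++;' once
```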
While some researchers continue to dismiss SLOC as a poor measure (Lehman,
Ramil, Wernick, and Perry 1997), studies have shown strong correlations between
SLOC and other properties such as effort, project duration, and quality. Lind and
Vairavan even show it can outperform more complex measures (Lind and Vairavan
1989). One of the main arguments against this metric is its susceptibility to "spoofing".
For example, simply using a different formatting scheme will produce different
results. Other intentional spoofing effects are even easier to imagine. In this case,
since the data was collected after the fact, the project engineers had no knowledge of
this study. Lacking knowledge of the study and since no measurements were taken or
published within the development environment, intentional spoofing effects are not
possible. Mitigating unintentional counting effects relies on consistent coding practices
or standards. In this case, the development environment strongly encouraged
conformance to a consistent set of coding standards. Since similar coding standards
were in effect for all projects, unintentional effects should be mitigated. In general,
strong conformance with local standards should improve the quality of this type of
metric.
New studies continue to cite SLOC as an independent variable for various
studies. Its continued use has been attributed to its use as the null hypothesis of
measures by Powell (Powell 1998). It is also regularly used to estimate projects
and measure productivity. These studies and estimates substitute a specific SLOC
metric for a true measure of the desired dimension. For example, Schneidewind uses
lines of code changed as a measure of total effort in his study of process stability
(Schneidewind 1999). Jorgensen uses lines of code changed to measure maintenance
task size in his study of maintenance task prediction models (Jorgensen 1995).
This study will consider two variations of SLOC. The simplest form counts the
SLOC change (SLOCC) for each file submitted. SLOCC is the absolute difference
in SLOC between source files consecutively submitted to the repository. That is, it
counts SLOC added or deleted from the previous version. The second form measures
the number of lines actually changed between submissions by comparing the files.
This second measure is sometimes referred to as code churn (CHURN) (El-Emam
2000). CHURN is the count of source lines inserted, deleted, or changed between
source files consecutively submitted to the repository. It is probably a better metric
than SLOCC since CHURN captures changed lines (which also represent real work
based on arguments presented above) that are not captured by SLOCC alone.
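The difference between the two measures can be sketched with Python's `difflib` (the two file versions below are invented):

```python
import difflib

# Contrasting SLOCC with CHURN on two versions of a file.
old = ["int a;", "int b;", "return a + b;"]
new = ["int a;", "int b = 1;", "int c = 2;", "return a + b;"]

slocc = abs(len(new) - len(old))    # net change in line count

# CHURN: lines inserted, deleted, or changed between the versions
churn = 0
matcher = difflib.SequenceMatcher(None, old, new)
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    if op != "equal":
        churn += max(i2 - i1, j2 - j1)

print(slocc, churn)   # 1 2 -- CHURN sees the changed line SLOCC misses
```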
4.2 McCabe Complexity
The second software size metric considered uses the cyclomatic, or McCabe, complexity
measure (MCM) as its basis (McCabe 1976). For a single function, cyclomatic
complexity can be computed by adding one to the count of flow-control decision
points. For several functions, its value is the count of all decision points plus the
count of functions under consideration. This study used this definition and grouped
functions by the source file in which they appeared.
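A rough, keyword-based approximation of this file-level count is sketched below. Real measurement parses the source; the regex, the keyword set, and the example function are assumptions for illustration only.

```python
import re

# File-level MCM approximation: one per flow-control decision point
# plus one per function, as described above.
DECISION = r"\b(?:if|for|while|case|catch)\b"

def mcm(source, function_count):
    return len(re.findall(DECISION, source)) + function_count

code = """
int clamp(int v) {
    if (v < 0) return 0;
    if (v > 9) return 9;
    return v;
}
"""
print(mcm(code, function_count=1))   # 3: two decision points + one function
```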
MCM was originally proposed as a complexity evaluator of functions or modules.
MCM was invented before object-oriented development was available, so it only
considers algorithmic size, not structural or object size. It has been studied almost as
extensively as SLOC and many studies have shown that MCM and SLOC are highly
correlated (Lake and Cook 1994; Lind and Vairavan 1989). Based on this reasoning,
Andersson et al. argue cyclomatic complexity is actually a measure of effort (Andersson,
Enholm, and Torn). They define effort as program length times real average
complexity (effort = length × complexity). They demonstrate cyclomatic complexity
captures the product of the length dimension and the hidden dimension of real complexity.
This finding indicates a measure based on MCM could provide an acceptable
surrogate for captured effort.
This study considers changes in MCM to represent a measure of effort required
to achieve the change. This McCabe complexity change (MCC) is defined as
the absolute difference between MCM values of source files consecutively submitted
to the repository. Ideally, several variations on this metric would be considered. For
example, the MCM of the actual lines added, deleted, or changed. This measure is
conceptually similar to the difference between SLOCC and CHURN. Another interesting
metric to consider is the sum of the complexity of all functions changed. This
suggestion assumes that any change to a function requires at least some understanding
of the entire function.
4.3 Halstead Volume
The third software size metric considered uses Halstead's Software Science
measure of program implementation length (Halstead 1977). Halstead implementation
length (HIL) is defined as the total of the operator count and the operand count.
Halstead defines operands as variables and constants; he defines operators as combinations
of symbols that affect the value or ordering of an operand. From these
definitions and his examples it is obvious that HIL varies only slightly from a count
of tokens. Halstead does not provide a precise definition of operators or operands
for the C++ programming language. Instead of attempting to arbitrarily map C++
constructs to Halstead operands and operators, a token count was used.
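A token-count approximation of HIL can be sketched as below. The tokenizer is deliberately crude (identifiers, numbers, string literals, then single punctuation marks) and is an assumption for illustration, not the study's actual implementation.

```python
import re

# Approximating Halstead implementation length (HIL) with a token count.
TOKEN = r'[A-Za-z_]\w*|\d+\.?\d*|"[^"]*"|[^\s\w]'

def hil(source):
    return len(re.findall(TOKEN, source))

print(hil("x = a + 42;"))   # 6 tokens: x, =, a, +, 42, ;
```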
Halstead defines many metrics derived from the counts of operators and operands.
From these he derived many more exotic Software Science metrics. Debate
continues as to the theoretical and empirical validity of most of these measures (Fenton
and Neil 1999). While many of these metrics have generally been disregarded,
his definition of implementation size remains in use for much the same reason SLOC
is used: it has an obvious meaning in most contexts and is easy to implement, estimate,
and understand. Albrecht studied the use of Halstead's implementation length
estimate and function points as a tool for estimating projects (Albrecht and John
E. Gaffney 1983). He shows that for large projects estimated function points can be
used to predict HIL, which can be used to predict SLOC and implementation time.
One Software Science metric with potential appeal for this study is programming
effort. Halstead describes programming effort as "the mental activity required
to reduce a preconceived algorithm to an actual implementation in a language in
which the implementor (writer) is fluent" (Halstead 1977, page 46). Unfortunately
this definition is more restrictive than what we desire in that it does not include time
to "conceive" the algorithm but rather seems to predict the time required to simply
enter the code.
This study considers changes in HIL to represent a measure of the effort required
to achieve the change. Halstead length change (HLC) is defined as the absolute
difference between HIL values of consecutive source files submitted to the repository.
Like the McCabe measures, ideally several other measures would be studied as well.
For example, the sum of HIL for the lines actually changed seems appropriate. As
above, this is based on the reasoning that changes in a line of code require an understanding
of at least that whole line. A similar argument can be made for the
containing function, and to a lesser extent the containing class or module. Halstead
also defines program volume in terms of the number of bits required to store it. Differences
between total volume, rather than size, could be considered. Attempting to
measure the volume of the current change could also be studied.
4.4 Other Metrics
The metrics selected for this study were chosen by a combination of factors
including demonstrated utility and practicality. Other metrics could have been chosen
and clearly they will need to be studied. For example, Lehman et al. use module count
as a surrogate for size in their study of software evolution (Lehman, Ramil, Wernick,
and Perry 1997). Several other interesting metrics suspected of providing high-quality
measures are described below.
One area of software size without an established and simple measure is that
of object-oriented code. Lake and Cook show that MCM and SLOC alone miss at
least one major dimension of object-oriented software (OOS) (Lake and Cook 1994).
Using component analysis they show OOS has a dimension related to the number
of classes, number of polymorphic functions, and number of inheritance lines. Their
analysis did not address complete systems, only class trees and individual classes.
Defining and evaluating a metric based on object-oriented attributes such as those
listed above seems promising. A simple, practical measure for the extra dimension(s)
found in OOS would be very useful.
Henry and Kafura created and studied a metric based on information flow
(Henry and Kafura 1981). This measure might provide a better measure of overall
size or complexity for OOS than the existing ones. However, it does not specifically
attempt to capture size or effort. It is also extremely complex to implement, so for
these reasons it was not included in this study.
One other metric which could be considered is inspired by Strelzoff (Strelzoff
and Petzold 2003). He uses differences in Huffman compression (Huffman 1952) as
a surrogate for the information distance between two versions of program source
code. A similar algorithm could be used to generate a potentially better size-change
metric. This is especially true if the differences, rather than the complete source,
were compressed using only the dictionary of the original source. This process would
give the information distance for the change.
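A rough sketch of this idea is possible with a general-purpose DEFLATE compressor in place of pure Huffman coding (an assumption for illustration; the thesis names Huffman compression specifically). Compressing the new version of a file against a preset dictionary built from the old version approximates the information distance of the change:

```python
import zlib

def change_size(old_source: str, new_source: str) -> int:
    """Approximate the information distance of a change by compressing
    the new version with the old version as a preset dictionary.
    Uses DEFLATE rather than pure Huffman coding (an assumption)."""
    comp = zlib.compressobj(level=9, zdict=old_source.encode())
    return len(comp.compress(new_source.encode()) + comp.flush())

old = "".join(f"int f{i}(int a, int b) {{ return a + {i} * b; }}\n"
              for i in range(12))
new = old + "int g(int a, int b) { return a - b; }\n"

# A small change compresses far better against the old version's
# dictionary than the whole new version compresses from scratch.
assert change_size(old, new) < len(zlib.compress(new.encode(), 9))
```

The dictionary-based size of the difference, rather than of the complete source, serves as the size-change measure.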
CHAPTER V
RESULTS
Parameterized models provide an approximation of the sampled data for a
particular data set. From a quantitative perspective, the model curve which most
closely fits the data is considered the best, since it introduces the least error. Model
fit can be measured using the squared residual after subtracting the model curve
from the sample data. To allow comparisons between data sets, the average squared
residual error (R^2) is usually used. Large values of R^2 indicate the model does not
fit the data. When comparing models, the model with the lowest R^2 for a particular
data set provides the closest approximation. In order to establish that a model is
generally better, similar results should be obtained from multiple data sets covering
the domain of interest.
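The fit statistic just described can be computed directly. Note that R^2 here denotes the average of the squared residuals (a mean squared error), not the statistical coefficient of determination. A minimal sketch, with illustrative names:

```python
import numpy as np

def avg_squared_residual(t, progress, model, params):
    """The text's R^2: mean of squared differences between the sampled
    progress data and the model curve evaluated at the sample times."""
    residuals = np.asarray(progress) - model(np.asarray(t), *params)
    return float(np.mean(residuals ** 2))

def linear(t, a, b):
    return a * t + b

# Toy check: residuals are (0.5, 0.0, 0.5, 0.0) -> mean square 0.125.
print(avg_squared_residual([0.0, 1.0, 2.0, 3.0],
                           [0.5, 1.0, 2.5, 3.0], linear, (1.0, 0.0)))
```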
5.1 Alternative Models

In addition to the proposed implementation progress model, three alternative
models were chosen to provide a context for evaluating the fit of the proposed model.
The usual quantitative analysis for a new model is to show that it performs better
than existing models; in this case, no established implementation progress model
exists, so the alternative models serve as a frame of reference for comparative
analysis.
The first model was a standard linear approximation. The linear model curve
is given by Equation 5.1. Linear approximation, with only two degrees-of-freedom,
represents a practical lower-bound on the number of model parameters. Alternatively,
it can be viewed as the model with the highest expected R^2; it was chosen as a
practical upper-bound on R^2.

linear_t = at + b    (5.1)

Of the three alternative models, linear approximation is the only model with
obvious interpretations for its parameters. The slope (a) represents the average
velocity. The y-intercept (b) can be interpreted as a "correction factor" necessary to
reduce error.
The second alternative model chosen was a multiphase, piecewise parabolic
approximation. It contains eleven degrees-of-freedom; its model curve is shown in
Equation 5.2. This model was chosen to represent a practical lower-bound on R^2. No
suggestion is made as to the interpretation of the individual model parameters.
                 ( at^2 + bt + c,   0 <= t < tp
multiphase_t =   { dt^2 + et + f,   tp <= t < tq          (5.2)
                 ( gt^2 + ht + i,   tq <= t <= te

The multiphase model provides a reasonable lower-bound for R^2 because it conforms
to the data extremely closely. Its equation contains the proposed model equation as
a special case: since the proposed model consists of three pieces, the multiphase
model is piecewise with three segments as well. Where the proposed model consists
of two parabolic segments and one linear segment, the multiphase model allows three
parabolic segments. The proposed model is smooth (its first derivative is continuous);
the multiphase model equation makes no such guarantee.
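The proposed model itself is defined in Chapter III; based only on the description here (two parabolic segments joined smoothly by a linear one), its curve can be sketched as the integral of a trapezoidal velocity profile with parameters s, tp, tq, and te. This reconstruction is an assumption for illustration, not the thesis's exact formula:

```python
def proposed_progress(t, s, tp, tq, te):
    """Accumulated progress under a trapezoidal velocity profile:
    ramp up to velocity s over [0, tp], hold s on [tp, tq], ramp back
    to zero over [tq, te]. A sketch consistent with the description in
    the text; the thesis's exact form appears in Chapter III."""
    if t < tp:                       # accelerating parabolic segment
        return s * t * t / (2.0 * tp)
    if t < tq:                       # steady linear segment
        return s * (t - tp / 2.0)
    u = min(t, te) - tq              # decelerating parabolic segment
    return s * (tq - tp / 2.0) + s * u - s * u * u / (2.0 * (te - tq))

# Smooth s-curve: s=100 units/day, ramp-up over 20 days, steady until
# day 80, finished at day 120; total accumulated progress is 9000 units.
print(proposed_progress(120.0, 100.0, 20.0, 80.0, 120.0))  # -> 9000.0
```

The curve is monotonically increasing and its first derivative is continuous at tp and tq, matching the smoothness property noted above.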
The third model was a third-degree polynomial approximation, with four
degrees-of-freedom. The polynomial model curve is shown in Equation 5.3. A third-
degree polynomial approximation provides enough flexibility to model the s-curve
observed. It also provides a model with approximately the same degrees-of-freedom as
the proposed model. Again, no interpretation of the individual model parameters is
suggested.

polynomial_t = at^3 + bt^2 + ct + d    (5.3)
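The linear and polynomial alternatives (Equations 5.1 and 5.3) can both be fit by ordinary least squares. The sketch below uses synthetic, illustrative data rather than the thesis's project data:

```python
import numpy as np

# Synthetic progress data shaped like the observed s-curve (illustrative
# only; not the thesis's project data).
t = np.linspace(0.0, 200.0, 50)
progress = np.clip(1500.0 * (t - 40.0) / 120.0, 0.0, 1500.0)

# Fit Eq. 5.1 (degree 1) and Eq. 5.3 (degree 3) by least squares and
# compare each fit's average squared residual.
fits = {}
for degree, name in ((1, "linear"), (3, "polynomial")):
    coeffs = np.polyfit(t, progress, degree)
    fits[name] = float(np.mean((progress - np.polyval(coeffs, t)) ** 2))

# The cubic model nests the linear one, so its residual cannot be worse.
assert fits["polynomial"] <= fits["linear"]
```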
5.2 Model Fitting Results

Evaluations of each metric for each project were performed using the three
alternative models described above and the proposed model. Model parameters were
found for each model to minimize R^2. Representative projects are discussed below.
Figure 5.1 shows progress measured via accumulated HLC and model curves
for project nine.

Figure 5.1: Progress measured via accumulated Halstead length change (HLC) for project nine and progress model curves (with R^2: linear 381,596,000; polynomial 35,019,000; proposed 66,508,000; multiphase 6,711,000).

As expected, the linear model provides a poor fit for the data, and the
multiphase model fits the data very accurately. Both the polynomial and proposed
models provide fits between the linear and multiphase models. The polynomial model
exhibits wild "swings" near the ends. These swings are typical of polynomial curves,
which tend to favor data points near the center rather than the ends. In this case,
the polynomial model suggests a "negative" amount of accumulated work had been
accomplished until about day forty of the project. Similarly, it indicates reverse-
progress begins to occur around day 220. In almost all cases, these polynomial model
swings suggest negative progress occurs at the beginning and end of the project.
Because of this, using the polynomial model to make predictions would be difficult
and error-prone.
The average squared residual error (R^2) for each model is given in the legend
of Figure 5.1. The values agree with a visual assessment of the fit except in the
case of the polynomial model. While the lower R^2 for the polynomial model is more
desirable, the polynomial fit suffers extensively from undesirable swings near the
ends. These swings violate a basic expectation of accumulated progress: it should
be monotonically increasing. While the proposed model has a larger R^2, it is
monotonically increasing and behaves as expected. This behavior appears to support
in-project predictions better than the polynomial model.
In this project, most of the proposed model's R^2 can be seen to occur in the first
third of the project. Each of the other three metrics shows similar results; this may
indicate early efforts are not as efficiently captured by the metrics as later efforts. It
may also indicate the project pace was unpredictably high (in violation of the implied
model) during the later part of the project. Without additional information about
the project, or its context, a determination cannot be made.
Figure 5.2 shows progress measured via accumulated MCC and model curves
for project ten.

Figure 5.2: Progress measured via accumulated cyclomatic complexity change (MCC) for project ten and progress model curves (with R^2: linear 8561; polynomial 1424; proposed 706; multiphase 324).

In addition to the general observations noted above, two observations
can be made from this graph. First, discontinuities in the multiphase model
are visibly apparent, occurring at about days 30 and 72. Discontinuities severely
reduce the predictive value of a model because no influence occurs across a
discontinuity. In addition, the discontinuity locations cannot be predicted without
all the data. Second, unlike project nine, R^2 for the proposed model is less than half
the R^2 for the polynomial model.
Implementation progress for this project follows the proposed model curve
closely. This suggests the process used was healthy and the project was consistently
"on track". Figure 5.3 shows weekly accumulated progress measured via MCC and
the velocity model curve for the project.

Figure 5.3: Weekly progress measured via accumulated cyclomatic complexity change (MCC) for project ten and velocity model curve.

Directly measuring velocity is not possible because the measurements taken
represent progress. Due to the sporadic nature of source submissions, determining
average velocity is difficult; accumulated weekly progress is used as a surrogate for
velocity. Again, the model curve suggests the project was well managed. The model
indicates implementation velocity reached a sustainable level within the first month
of development. This continued for about three weeks, then the velocity began to
decrease. The relatively long deceleration period may be worth investigating; it may
indicate the project experienced trouble finishing. However, considering the short
development period, it may instead indicate the startup was unusually efficient.
Without additional context information a determination cannot be made.
Figure 5.4: Progress measured via accumulated code churn (CHURN) for project fifteen and progress model curves (with R^2: linear 484,400; polynomial 468,400; proposed 406,400; multiphase 81,400).
5.3 Anomalous Data Sets
Results from some data sets were ambiguous. For example, Figure 5.4 shows
progress measured via accumulated CHURN and model curves for project fifteen.
Neither the polynomial nor the proposed model provides a significant improvement
over the linear model. Numerically, the multiphase model is significantly better than
the others; however, it suffers extensively from a discontinuity around day fifty.

It is worth noting the large gaps in collected data around days fifteen and
fifty, and the periodic gaps of one to two days near the beginning of the project.
49
4000-
3500
3000
2500
2000
1500
1000^
500
0 ^
50 100
da>s
+ measired ppqgtss
linear (9837)
— polynDmal (6566)
— proposed (8341)
mitiphase(1759)
150 200
Figure 5.5: Progress measured via accumulated cyclomatic complexity change (MCC) for project four and progress model curves (with R'^).
A closer look at the raw data shows the periodic gaps occur on weekends and the
larger gaps correspond to Thanksgiving and Christmas holidays respectively. It is
reasonable to assume no work was done during those times. Unfortunately, employee
billing information is not available for the project so this cannot be confirmed.
Figure 5.5 shows progress measured via accumulated MCC and model curves
for project four. Again, the multiphase model provides a substantial improvement
over the other models. In this case, the polynomial model reduces R^2 by about
one-third, but the proposed model gives a smaller improvement.
The project appears atypical in a number of ways, suggesting the project may
have been distressed in some way. First, the last datum in the series is almost two
weeks after the rest of the data. This suggests an unexpected round of changes was
made just before shipping. Second, excluding the final datum, implementation
velocity does not appear to decrease near the end. Days 170 through 180 show the
velocity increased over the prior thirty days. This suggests either primary
implementation was continuing or an unusually large number of defects were found. Third,
implementation velocity seems to oscillate. Without additional information about the
context of the project no specific diagnosis can be reached; however, some alternatives
bear investigation. The project may have suffered changing goals or personnel. If the
development team and goal were not in flux, the implementation process should be
examined. The changing velocity suggests detailed implementation may have preceded
design, which resulted in a lower velocity later as the design was revised.
5.4 Model Applicability
In all cases the proposed model reduces R^2 compared with the linear model.
This is expected since it has an additional degree-of-freedom. When the reduction is
minimal, the improvement given by the proposed model may be due to the additional
parameter rather than the appropriateness of the model. So, reducing R^2 is not by
itself enough to indicate the proposed model should be used to interpret a particular
data set.
On the other hand, in many cases the proposed model substantially reduces
R^2 when compared with a linear model. In a few of these cases the reduction in R^2 is
almost to the level achieved using the multiphase model. In these conforming cases
the proposed model provides a meaningful interpretation of the data.

The R^2 for each metric is given in Tables 5.1 through 5.4. Figures 5.6 through
5.9 show R^2 relative to the linear model R^2 for each project. To improve viewing,
projects are ordered by polynomial model relative R^2.
Consider the proposed model R^2 compared with the linear model R^2. In cases
where the proposed model substantially reduces R^2, the model gave improved results
with the addition of a single parameter. This substantial improvement suggests the
data conforms to the model and the results may be relied upon to correctly interpret
the data. That is, it is reasonable to assume the fit parameters (s, tp, and tq) are a
true representation of implementation progress for the project. In the non-conforming
cases, where the reduction is less significant, the model may not be appropriate and
the results should only be used judiciously. Based on the available projects, the
author suggests that the proposed model may be relied upon when its R^2 is at most
half that of the linear model.
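This rule of thumb reduces to a one-line check. The sample values below are taken from Table 5.3 (MCC):

```python
def is_conforming(proposed_r2: float, linear_r2: float) -> bool:
    """Rule of thumb from the text: trust the proposed model's fit
    parameters when its average squared residual is at most half the
    linear model's."""
    return proposed_r2 <= 0.5 * linear_r2

# Project ten, MCC (Table 5.3): 0.0706 vs 0.8561 -> conforming.
assert is_conforming(0.0706, 0.8561)
# Project four, MCC: 0.8341 vs 0.9837 -> non-conforming.
assert not is_conforming(0.8341, 0.9837)
```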
Projects may be non-conforming due to noise in the metric data. Almost
all projects exhibit noticeable one to two day "pauses". These are especially common
during the first half of each project and repeat with a seven-day period. Some projects
also include larger periods when no apparent progress is made during a holiday break.
Table 5.1: Average squared residual error (R^2) for implementation progress models measuring source lines of code changed (SLOCC) by project.

                              Model R^2 (x10^...)
Project   samples     linear   polynomial   proposed   multiphase
1a            129     0.0667       0.0482     0.0434       0.0094
1b           1408     1.8269       1.6747     1.1458       0.4171
3             406     3.8133       0.9391     0.6371       0.0762
4            1394     1.0997       0.5958     0.6843       0.1176
5              90     0.0316       0.0108     0.0094       0.0021
6            1455     8.6590       2.0236     2.4551       0.2863
7            2204    12.3603       3.0171     4.0937       0.7440
8             138     0.0915       0.0245     0.0239       0.0028
9            1555    11.3106       1.1201     2.1672       0.2127
10            481     0.4575       0.0738     0.0386       0.0147
11            164     0.0111       0.0043     0.0037       0.0017
13           1274     2.4112       0.8349     0.8790       0.1589
14            715     0.4516       0.1139     0.1331       0.0181
15            723     0.1256       0.1215     0.1074       0.0210
17            827     2.9719       1.1158     1.2222       0.2215
19            967     6.3210       2.6441     1.7401       0.2475
20           1214     1.9231       0.3513     0.1566       0.0414
Table 5.2: Average squared residual error (R^2) for implementation progress models measuring code churn (CHURN) by project.

                              Model R^2 (x10^6)
Project   samples     linear   polynomial   proposed   multiphase
1a            129     0.3485       0.2205     0.1601       0.0289
1b           1408    22.0988       8.2642    15.6206       1.7608
3             406    19.6692       4.9855     3.4455       0.2821
4            1394     2.6183       1.7401     2.0264       0.5516
5              90     0.2008       0.0530     0.0590       0.0119
6            1455    19.7174       5.3214     5.7359       0.8605
7            2204    36.1142       9.8627    12.7864       1.9951
8             138     0.2487       0.0674     0.0549       0.0087
9            1555    49.1829       4.7964     9.3587       0.8686
10            481     8.9162       2.8376     3.0023       0.0517
11            164     0.0362       0.0203     0.0155       0.0077
13           1274     9.4001       3.9973     3.2209       0.7597
14            715     1.7914       0.4591     0.5131       0.0593
15            723     0.4844       0.4684     0.4064       0.0814
17            827    11.0583       3.1825     3.3451       0.7040
19            967    20.5449       8.5899     5.9072       0.5758
20           1214     7.8927       1.7450     0.8902       0.1993
Table 5.3: Average squared residual error (R^2) for implementation progress models measuring cyclomatic complexity change (MCC) by project.

                              Model R^2 (x10^4)
Project   samples     linear   polynomial   proposed   multiphase
1a            129     0.4256       0.2388     0.1840       0.0327
1b           1408     4.5927       4.4837     3.4801       1.5498
3             406    17.0381       4.3102     3.4872       0.3541
4            1394     0.9837       0.6566     0.8341       0.1759
5              90     0.0967       0.0332     0.0330       0.0062
6            1455    18.8760       4.4041     5.1594       0.6829
7            2204    29.4597       6.1758     8.5235       1.7267
8             138     0.1469       0.0378     0.0375       0.0054
9            1555    23.4767       2.3326     5.7877       0.5433
10            481     0.8561       0.1424     0.0706       0.0324
11            164     0.0430       0.0167     0.0222       0.0045
13           1274     2.8732       1.9251     1.5800       0.5812
14            715     0.8285       0.1789     0.1859       0.0312
15            723     0.4218       0.3137     0.3645       0.0394
17            827     5.9287       2.6732     2.5447       0.4673
19            967     0.5985       0.3361     0.3670       0.0751
20           1214     2.6889       0.5030     0.4154       0.1015
Table 5.4: Average squared residual error (R^2) for implementation progress models measuring Halstead length change (HLC) by project.

                              Model R^2 (x10^7)
Project   samples     linear   polynomial   proposed   multiphase
1a            129     0.2576       0.1478     0.1036       0.0170
1b           1408     4.8578       3.9031     2.7345       1.1013
3             406     9.6200       2.5890     1.7247       0.1905
4            1394     1.1556       0.8592     0.9446       0.2056
5              90     0.0721       0.0232     0.0260       0.0067
6            1455    26.8214       6.3090     7.6396       0.8003
7            2204    45.4181      13.1349    16.5184       2.8797
8             138     0.3707       0.0945     0.0902       0.0097
9            1555    38.1596       3.5019     6.6508       0.6711
10            481     1.4100       0.2470     0.1229       0.0521
11            164     0.0316       0.0103     0.0103       0.0043
13           1274     3.3893       1.1128     0.9574       0.3244
14            715     2.1188       0.6570     0.8304       0.0895
15            723     0.4676       0.4521     0.3730       0.0812
17            827     7.6514       3.8954     3.5683       0.6345
19            967     8.3263       3.8056     3.7480       0.3882
20           1214     3.9044       0.7697     0.5633       0.1198
Figure 5.6: Source lines of code change (SLOCC) average squared residual error (R^2) relative to linear R^2, by project and model.
Figure 5.7: Code churn (CHURN) average squared residual error (R^2) relative to linear R^2, by project and model.
Figure 5.8: Cyclomatic complexity change (MCC) average squared residual error (R^2) relative to linear R^2, by project and model.
Figure 5.9: Halstead length change (HLC) average squared residual error (R^2) relative to linear R^2, by project and model.
Both of these phenomena can be seen in projects 1a and fifteen. Using work days
instead of calendar days would eliminate a major cause of time-related noise. Several
projects include substantial, sudden, and anomalous progress. In all cases where
these events were examined closely, the anomaly has proven to be the result of an
unfortunate side-effect of the data collection procedure. A commitment to collect
the needed data during the project could reduce noise by allowing anomalies to be
detected and corrected while any needed information is still available.
5.5 Conforming Cases

Tables 5.5 through 5.8 show the model parameters for each project and
metric. Projects are ordered by R^2 relative to linear R^2, and a line divides conforming
data sets from nonconforming ones. About three-quarters of the projects studied
show substantial reductions in R^2 and are considered conforming cases. With only
seventeen projects and no independent data available, few definite conclusions can be
reached; however, several items are worth noting.
The fewest conforming cases occur in Table 5.7, indicating that MCC may be
more discriminating with respect to implementation progress. This is likely due to
differences in emphasis. All the metrics studied have been shown to be essentially
equivalent for static analysis. However, studies have not been conducted on the typical
change of these metrics over time. Consider that MCC counts entry points (functions)
Table 5.5: Source lines of code change (SLOCC) progress model parameters and R^2 relative to linear R^2.

                            Model parameters
Project   relative R^2        s       tp       tq       te
20               0.081    289.8     64.2     64.3    187.1
10               0.084    156.9     30.6     30.7    132.0
3                0.167    376.0     48.9     49.0    120.3
9                0.192    480.4    109.8    109.8    246.7
8                0.262     42.5     17.2     17.5    128.9
19               0.275    501.7    108.1    108.4    164.8
6                0.284    361.9     57.5    126.1    248.0
14               0.295     91.8     21.3     97.8    238.2
5                0.297     92.0     14.9     14.9     48.8
7                0.331    183.9     93.4    288.9    555.2
11               0.336     48.9     27.5     62.4     70.9
13               0.365    250.5     93.8    213.9    220.8
17               0.411    268.4      6.3    110.4    181.0
----------------------------------------------------------
4                0.622    215.6     24.1    138.5    201.7
1b               0.627    303.1     28.9    220.2    248.3
1a               0.650    183.8     14.0     14.1     44.9
15               0.855     92.3      4.5    140.9    163.9
Table 5.6: Code churn (CHURN) progress model parameters and R^2 relative to linear R^2.

                            Model parameters
Project   relative R^2        s       tp       tq       te
20               0.113    560.6     68.2     68.3    187.1
3                0.175    820.9     45.7     45.7    120.3
9                0.190   1028.0    110.6    110.7    246.7
8                0.221     75.9     21.5     21.8    128.9
14               0.286    170.6     23.2     92.4    238.2
19               0.288    882.7    111.8    112.0    164.8
6                0.291    631.7     43.4    136.4    248.0
5                0.294    211.3     17.5     21.4     48.8
17               0.302    538.3     10.7    109.3    181.0
10               0.337    412.6     16.7     16.8    132.0
13               0.343    548.8     87.9    211.6    220.8
7                0.354    325.9     85.2    298.6    555.2
11               0.427    100.9     20.4     59.2     70.9
1a               0.459    365.8     14.5     14.5     44.9
----------------------------------------------------------
1b               0.707    667.8     54.1    238.3    248.3
4                0.774    386.7     17.5    148.2    201.7
15               0.839    181.7      0.5    139.7    163.9
Table 5.7: Cyclomatic complexity change (MCC) progress model parameters and R^2 relative to linear R^2.

                            Model parameters
Project   relative R^2        s       tp       tq       te
10               0.082     21.5     29.9     45.7    132.0
20               0.154     33.6     53.0    111.3    187.1
3                0.205     73.8     49.5     49.6    120.3
14               0.224     11.9     20.0     90.6    238.2
9                0.247     70.8    120.0    120.1    246.7
8                0.255      5.6     15.2     15.9    128.9
6                0.273     54.6     58.1    127.9    248.0
7                0.289     28.5    107.8    285.6    555.2
5                0.341     14.8     16.4     18.4     48.8
17               0.429     43.4     26.6    118.1    181.0
1a               0.432     37.0     13.2     13.3     44.9
----------------------------------------------------------
11               0.515      7.8     32.3     63.9     70.9
13               0.550     40.4     68.5    207.7    220.8
19               0.613     17.9     30.2    135.4    164.8
1b               0.758     47.2     26.9    225.2    248.3
4                0.848     21.3     11.8    148.3    201.7
15               0.864     14.1     21.0    146.8    163.9
Table 5.8: Halstead length change (HLC) progress model parameters and R^2 relative to linear R^2.

                            Model parameters
Project   relative R^2        s       tp       tq       te
10               0.087    884.3     28.2     28.2    132.0
20               0.144   1220.4     55.2     94.6    187.1
9                0.174   2752.8    110.2    110.3    246.7
3                0.179   1859.2     48.9     48.9    120.3
8                0.243    270.3     19.6     19.8    128.9
13               0.282   1065.1     87.8    210.9    220.8
6                0.285   2075.7     71.0    124.9    248.0
11               0.326    246.1     30.8     63.5     70.9
5                0.360    382.4      9.1     20.3     48.8
7                0.364   1078.1    111.6    294.9    555.2
14               0.392    608.8     26.5    102.4    238.2
1a               0.402    989.2     15.4     15.4     44.9
19               0.450   1886.1    102.9    109.7    164.8
17               0.466   1482.8     26.3    118.9    181.0
----------------------------------------------------------
1b               0.563   1516.0     32.4    217.3    248.3
15               0.798    525.0      0.8    135.9    163.9
4                0.817    901.0     24.2    159.4    201.7
and decision points, while the other metrics measure length. It may be that progress
measured via MCC differs in shape.
In about half the conforming cases, the model indicates tp and tq are essentially
the same. In these cases tq - tp is close to zero, suggesting steady progress did
not occur: implementation was either accelerating or decelerating. This could
indicate development under a tight schedule or a process that could be improved.
It is also interesting to note that in these cases the model was able to substantially
reduce R^2 while effectively using only two parameters.
In conforming cases where tq - tp is much larger than zero, the model indicates
steady, sustained implementation occurred between tp and tq. In these cases, the
implementation velocity (s) can be stated with great confidence. Velocity is a surrogate
for productivity in the dimension measured by the specific metric. For example, Table
5.5 shows project six averaged over 360 lines of new code per calendar day during a
two-month period.
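That figure can be checked against the Table 5.5 entries for project six:

```python
# Project six, SLOCC (Table 5.5): s = 361.9 lines/day, tp = 57.5,
# tq = 126.1 (days). The steady phase lasted tq - tp days.
s, tp, tq = 361.9, 57.5, 126.1
steady_days = tq - tp            # about 68.6 days, roughly two months
steady_lines = s * steady_days   # lines added during the steady phase
print(round(steady_days, 1), round(steady_lines))  # roughly 24,800 lines
```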
In the projects studied, implementation velocity (s) varies by more than an
order of magnitude. While part of this variation is due to the number of engineers
assigned to the project, some is likely due to proficiency. This is consistent with
anecdotal evidence that individual programmer productivity varies by as much as an
order of magnitude.
CHAPTER VI
CONCLUSIONS
Interpreting implementation progress measurements is difficult. A simple
model is needed to provide a framework to help interpret the data. We have developed
a piecewise approximation based on a three-phase model of linear implementation
velocity. The model corresponds well to our intuition of how project progress occurs. It
identifies project phase boundaries as well as the velocity of implementation during
each phase. Furthermore, the progress model allows objective comparisons of project
velocity between projects and easily supports estimating.

The progress model fits the available sample data much better than a linear
model. With only one additional degree-of-freedom, the model produces fits with
approximately two-thirds less error than a linear fit. When compared with a polynomial
fit, the progress model performs at least as well as a polynomial model, which has one
additional degree-of-freedom.
6.1 Limitations
The progress model presented here only considers non-maintenance
implementation. Projects with clear delivery dates, after which continuing development is not
planned, fall into this category. Projects in maintenance or under continuous
development may not exhibit phases similar to projects with firm end dates.
Any model is only as good as the data on which it is based. Errors were
discovered in both dimensions of the sample data. Spurious data entries were occasionally
introduced by the check-in process used. Similarly, project billing information
could have helped improve the quality of the time data collected.
6.2 Interpretation
Turski, and Fenton and Neil, suggest causal models should be used to interpret
software metric data (Turski 2002; Fenton and Neil 2000). A causal model may
incorporate an explanation for observed relationships using an analogy. A powerful
analogy addresses both the reason for typical observations and the relationships
between them.
An inertia analogy can be used to explain the observed project progress data
(Marasco 2002). If the implementation phase of a project possessed inertia, the
results obtained are expected and even necessary, rather than arbitrary. In this view,
the project process itself has mass and thus inertia. Examples of overcoming process
inertia are learning to work together, establishing and understanding roles, and
refining and specifying the project goal. Activities prior to implementation may also
represent overcoming inertia. Examples of these activities include software design,
acquiring domain knowledge, and skill development. Inertia prevents the project from
springing forth instantly "up to speed". Similarly, the same inertia prevents the project
from instantly halting at the finish. In this instance, inertia can be seen in such
tendencies as "feature creep" and the general dislike of engineers for producing releases.
Implementation progress follows the curve as if it were an accelerating (and
decelerating) mass. In both cases, effort must be exerted to change the project pace. An
inertia-based causal model for implementation progress is simple, intuitive, and
understandable.
6.3 Future Work
This work provides a sound basis for further study in this area. Application
of the progress model to continuous development projects should be investigated.
A data mining approach may be useful in those cases. Martin and Yohai
suggest slope changes may be found in time-series data using a technique originally
developed to detect outliers (Martin and Yohai 2001). Outlier elimination could
also be used to detect or remove spurious data entries.
Another line of study takes advantage of the stability of the model for making
predictions. Estimating project parameters such as final size, delivery date, and
development pace during implementation should be investigated. Similarly, the effect of
project properties, such as number of engineers, experience level, domain familiarity,
and length of project, on the model parameters should be studied.
Additionally, investigation of other metrics as a basis for measuring progress
should be undertaken. If a size metric for object-oriented software were developed,
investigating its use as a basis for a change metric would be very valuable. Variations
of existing metrics better tuned to capture change should be studied. One example
of this type of metric is the sum of MCM over all changed functions, rather than simply
the change in MCM. Another example of an improved change metric is the information
distance metric based on Huffman compression.
REFERENCES

Albrecht, A. J. and J. E. Gaffney, Jr. (1983, Nov). Software function, source lines of code, and development effort prediction: A software science validation. IEEE Transactions on Software Engineering 9(6), 639-648.

Andersson, T., K. Enholm, and A. Torn. RMC: a length-independent measure of software complexity. Reports on computer science and mathematics, Abo Akademi University.

Basili, V. R. and H. D. Rombach (1988). The TAME project: Towards improvement-oriented software environments. IEEE Transactions on Software Engineering 14(6), 758-773.

Boehm, B. W. (1988). A spiral model of software development and enhancement. IEEE Computer 21(5), 61-72.

Boraso, M., C. Montangero, and H. Sedehi (1996). Software cost estimation: an experimental study of model performances. Technical Report TR-96-22, Universita di Pisa, Dipartimento di Informatica.

DeMarco, T. (1982). Controlling Software Projects: Management, Measurement and Estimation. Englewood Cliffs, NJ: Yourdon Press.

El-Emam, K. (2000, June). A methodology for validating software product metrics. Technical Report NRC/ERB-1076 44142, National Research Council Canada, Institute for Information Technology.

Fenton, N. E. (1991). Software Metrics: A Rigorous Approach. London: Chapman and Hall.

Fenton, N. E. and M. Neil (1999). A critique of software defect prediction models. IEEE Transactions on Software Engineering 25(5), 675-689.

Fenton, N. E. and M. Neil (2000, June). Software metrics: roadmap. In Proceedings of the Conference on The Future of Software Engineering, pp. 357-370. ACM Press.

Goel, A. L. and K. Okumoto (1979, August). Time-dependent error-detection rate model for software reliability and other performance measures. IEEE Transactions on Reliability R-28(3), 206-210.

Halstead, M. H. (1977). Elements of Software Science. New York, NY: Elsevier Scientific.

Henry, S. M. and D. G. Kafura (1981, Sep). Software structure metrics based on information flow. IEEE Transactions on Software Engineering 7(5), 510-518.

Huffman, D. A. (1952, Sep). A method for the construction of minimum-redundancy codes. Proceedings of the Institute of Radio Engineers 40(9), 1098-1101.

Jelinski, Z. and P. Moranda (1971). Software reliability research. In W. Freiberger (Ed.), Statistical Computer Performance Evaluation, pp. 465-484. Providence, RI: Academic Press.

Jorgensen, M. (1995). Experience with the accuracy of software maintenance task effort prediction models. IEEE Transactions on Software Engineering 21(8), 674-681.

Kafura, D. and J. Canning (1985, Aug). A validation of software metrics using many metrics and two resources. In Proceedings of the 8th International Conference on Software Engineering, pp. 378-385.

Kirsopp, C. (2001, Apr). Measurement and the software development process. In 12th European Software Control and Metrics Conference, pp. 165-173.

Lake, A. and C. R. Cook (1994, Apr). Use of factor analysis to develop OOP software complexity metrics. In Proceedings of the Sixth Annual Oregon Workshop on Software Metrics.

Lehman, M. M., J. F. Ramil, P. D. Wernick, and D. E. Perry (1997). Metrics and laws of software evolution - the nineties view. In Proceedings of the 4th International Symposium on Software Metrics, p. 20.

Lind, R. K. and K. Vairavan (1989, May). An experimental investigation of software metrics and their relationship to software development effort. IEEE Transactions on Software Engineering 15(5), 649-653.

Lott, C. M. (1993, Oct). Process and measurement support in SEEs. ACM SIGSOFT Software Engineering Notes 18(4), 83-93.

Marasco, J. (2002, August). Tracking software development projects. Dr. Dobb's Journal.

Martin, R. D. and V. Yohai (2001). Data mining for unusual movements in temporal data. In KDD Workshop on Temporal Data Mining.

McCabe, T. J. (1976, Dec). A complexity measure. IEEE Transactions on Software Engineering 2(4), 308-320.

McConnell, S. (1998). Software Project Survival Guide. Redmond, WA: Microsoft Press.

Park, R. E. (1992). Software size measurement: A framework for counting source statements. Technical Report CMU/SEI-92-TR-20, Software Engineering Institute, Pittsburgh, PA.

Powell, A. L. (1998). A literature review on the quantification of software change. Technical Report YCS-98-305, University of York, Department of Computer Science.

Schneidewind, N. F. (1999, Nov). Measuring and evaluating maintenance process using reliability, risk, and test metrics. IEEE Transactions on Software Engineering 25(6), 761-781.

Strelzoff, A. and L. Petzold (2003). Revision recognition for scientific computing: theory and application. In 18th Annual Software Engineering and Knowledge Engineering Conference, pp. 46-52.

Turski, W. M. (2002, August). The reference model for smooth growth of software systems revisited. IEEE Transactions on Software Engineering 28(8), 814-815.

van Solingen, R. and E. Berghout (1999). The Goal/Question/Metric Method. London: McGraw-Hill Publishing Company.

Woodings, T. L. and G. A. Bundell (2001, April). A framework for software project metrics. In Proceedings of ESCOM 2001.
PERMISSION TO COPY
In presenting this thesis in partial fulfillment of the requirements for a master's
degree at Texas Tech University or Texas Tech University Health Sciences Center, I
agree that the Library and my major department shall make it freely available for
research purposes. Permission to copy this thesis for scholarly purposes may be
granted by the Director of the Library or my major professor. It is understood that any
copying or publication of this thesis for financial gain shall not be allowed without my
further written permission and that any user may be liable for copyright infringement.
Agree (Permission is granted.)

Student Signature                              Date

Disagree (Permission is not granted.)

Student Signature                              Date