Posted: 06-Jul-2015 | Category: Software | Uploaded by: mariana-azevedo
Similar Characteristics of Internal Software Quality
Attributes for Object-Oriented Open-Source Software Projects
PqES- Grupo de pesquisa em Engenharia de Software
http://pesquisa.dcc.ufla.br/pqes/index.php/home-english
Universidade Federal de Lavras – UFLA – Brasil
Mariana de Azevedo Santos, Rodrigo Amador, Heitor Costa, Paulo Henrique S. Bermejo
AGENDA
Introduction
Objective
Background: software domains, cluster analysis
Methodology
Results
Related work
Conclusion
Threats to validity
Future work
INTRODUCTION
Software quality assurance: vital component for software development
Organizations' concerns: low cost and high-quality products
Quality assurance is necessary but costly: it can consume more than 50% of a project's budget
What to do? Find efficient, low-cost methods to obtain information about the quality of software projects
INTRODUCTION (CONT.)
How to predict internal software quality? By measuring its internal attributes, such as source code quality or complexity [ISO/IEC 25010 2011]
Why OO measures (metrics)?
Expectation: because OO code is organized in classes, it should have high quality and maintainability
Why measure OO projects? Only empirical studies on the structure of real systems can
provide tangible answers about the project’s quality
OBJECTIVE
Use software metrics to identify similar characteristics among project structures across different software domains
RQ: Do the software domains have structural similarities with each other in aspects such as modularity, abstraction, stability, complexity, and specialization?
Assumption: software components with similar attributes will have similar quality characteristics
SOFTWARE DOMAINS
Software projects can be classified into different categories, related to different application domains
Content (input/output) determines the nature of an application or domain
Generic domains: as software complexity grows, specific domain characteristics
become unclear
Unclear means that domains can have similar or dissimilar characteristics!
SOFTWARE DOMAINS (CONT.)
Sourceforge domains (micro-categories):
Science & Engineering (SE), Development (D), System Administration (SA), Audio & Video, Home & Education, Security & Utilities (SU), Graphics (GPH), Communication (C), Business & Enterprise (BE), Games (G)
Pressman's approach (macro-categories):
System software (SS), Business software (BS), Engineering and scientific software (ESS), Web-based software (all Sourceforge domains that have web-based software), AI software (AIS), Personal computer software (PCS)
CLUSTER ANALYSIS
Goal: find groups of objects that are similar to one another and different from the objects in other groups
Phases:
1. Selection of entities: 150 software projects
2. Selection of grouping attributes: metrics
3. Selection of clustering algorithm: K-means, Expectation-Maximization, hierarchical clustering
4. Data interpretation: final classification
Algorithms selected: K-means with Euclidean distance for macro-categories; K-means with Manhattan distance for micro-categories
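The two K-means variants named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Weka implementation the study used: the function names and the toy (DIT, RMI) points are hypothetical. With Euclidean distance each centroid is the per-attribute mean; with Manhattan distance this sketch uses the per-attribute median, the choice that minimizes the summed L1 distance within a cluster.

```python
import math
import random
from statistics import median


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def mean(values):
    return sum(values) / len(values)


def kmeans(points, k, distance="euclidean", max_iter=100, seed=1):
    """Group `points` (tuples of metric values) into k clusters.

    Euclidean distance -> centroids are per-attribute means.
    Manhattan distance -> centroids are per-attribute medians.
    Returns (labels, centroids).
    """
    dist, centre = (euclidean, mean) if distance == "euclidean" else (manhattan, median)
    centroids = random.Random(seed).sample(points, k)  # k starting points drawn from the data
    labels = []
    for _ in range(max_iter):
        # Assignment step: each project joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: recompute every centroid from its current members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(centre(col) for col in zip(*members)))
            else:  # empty cluster: keep its previous centroid
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical projects described by (mean DIT, RMI); not data from the study.
projects = [(1.0, 0.20), (1.2, 0.25), (0.9, 0.22), (4.0, 0.80), (4.2, 0.85), (3.9, 0.75)]
labels, centroids = kmeans(projects, k=2, distance="manhattan")
print(labels)
```

Calling `kmeans(projects, k=2)` instead uses the Euclidean setting. The sketch does not normalize attributes, so metrics on very different scales (for example LOC versus RMI) would need to be rescaled before clustering.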
METHODOLOGY
Sample characterization:
LOC: 12,178,587
Number of classes: 69,334
Repositories: Github and Sourceforge
METHODOLOGY (CONT.)
Tools for metrics extraction: Eclipse plugins
Tool for data analysis: Weka
Parameters: metrics
Data analysis: hypotheses (H0 and H1) about the relationships between software quality measures
TOOLS AND MEASURES
Metrics: VG, WMC, NOVM, NOC, DIT, SIX, LCOM, CA, CE, RMI, RMA, NC, NOM, NOA
VizzMaintenance: CBO, RFC, MPC, DAC, TCC, LOC
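Assuming RMI and RMA are Robert C. Martin's instability and abstractness measures, as reported by the Eclipse Metrics plugin, both can be derived from per-package counts. The sketch below is illustrative only: the `PackageStats` class and its field names are hypothetical, and returning 0 for a package with no couplings is an assumption, since tools may treat that case differently.

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    """Hypothetical per-package counts, e.g. exported from a metrics tool."""
    ca: int              # afferent couplings: outside classes that depend on this package
    ce: int              # efferent couplings: inside classes that depend on other packages
    abstract_types: int  # abstract classes + interfaces in the package
    total_types: int     # all classes + interfaces in the package


def instability(p: PackageStats) -> float:
    """RMI = CE / (CA + CE): 0 is maximally stable, 1 maximally unstable."""
    total = p.ca + p.ce
    # Assumption: an uncoupled package is reported as 0.
    return p.ce / total if total else 0.0


def abstractness(p: PackageStats) -> float:
    """RMA = abstract types / all types in the package."""
    return p.abstract_types / p.total_types if p.total_types else 0.0


pkg = PackageStats(ca=6, ce=2, abstract_types=3, total_types=12)
print(instability(pkg), abstractness(pkg))  # 0.25 0.25
```

A package with many incoming dependencies and few outgoing ones scores low on RMI (stable); pairing that with RMA is what lets the abstraction-and-stability hypothesis compare domains.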
METHODOLOGY (CONT.)
What is expected? Example:
For H3-Depth and descendants:
H0: software domains whose classes are located deeper in the inheritance hierarchy (less abstract) differ from software domains whose classes are located less deep in the hierarchy (more abstract)
H1: the selected metrics are not capable of identifying characteristics that distinguish domains with respect to inheritance and abstraction; that is, inheritance and abstraction behave similarly in all domains
A hypothesis is partially validated if a pair of metrics presents the expected behavior.
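Depth and descendants can be derived from a map of each class to its direct superclass. In this minimal sketch the hierarchy is hypothetical, treating `Object` as the root at depth 0 is an assumption (tools differ on where the count starts), and averaging DIT per project is only one possible way to turn class-level values into a project-level clustering attribute.

```python
from statistics import mean


def depth_of_inheritance(cls: str, parent_of: dict) -> int:
    """DIT: number of superclass links from cls up to the root."""
    depth = 0
    while cls in parent_of:
        cls = parent_of[cls]
        depth += 1
    return depth


def number_of_children(cls: str, parent_of: dict) -> int:
    """NOC: number of classes whose direct superclass is cls."""
    return sum(1 for parent in parent_of.values() if parent == cls)


# Hypothetical hierarchy: class -> direct superclass ("Object" is the root).
parent_of = {
    "Shape": "Object",
    "Circle": "Shape",
    "Square": "Shape",
    "RoundedSquare": "Square",
}
classes = list(parent_of)
dits = [depth_of_inheritance(c, parent_of) for c in classes]
print(dits)                                     # [1, 2, 2, 3]
print(number_of_children("Shape", parent_of))   # 2
print(mean(dits))                               # 2
```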
RESULTS
For micro-categories and macro-categories:
H1-Coupling and cohesion, H2-Complexity and inheritance, and H5-Complexity and overriding were not validated
H3-Depth and descendants (DIT and RMI) and H4-Complexity and size (WMC and LOC) were partially validated
H6-Abstraction and stability (RMA and RMI) was fully validated
Macro-categories: although these solutions have a smaller error than the same analysis for micro-categories, they are less heterogeneous
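The slides do not define "error". Assuming it is the within-cluster sum of squared errors (SSE) that Weka's SimpleKMeans reports, it is the sum, over all projects, of the squared Euclidean distance from each project to its cluster centroid. The points, labels, and centroids below are hypothetical.

```python
def within_cluster_sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


# Hypothetical 2-D example: two clusters with centroids (1, 0) and (10, 0).
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]
centroids = [(1.0, 0.0), (10.0, 0.0)]
print(within_cluster_sse(points, labels, centroids))  # 2.0
```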
RESULTS: MICRO-CATEGORIES DIT (axis X) x RMI (axis Y):
Cluster 0
Projects seem to have a more balanced relationship between abstraction and inheritance
Major domains: Science & Engineering (SE), Business & Enterprise (BE)
Cluster 1
Projects tend to have a higher average number of descendants
Major domains: Security & Utilities (SU), Graphics (GPH)
Cluster 2
Projects tend to have few abstract classes (more stability) and few descendants
Major domains: Development (D), Communication (C)
Cluster 3
Projects have more stable classes than those in the other clusters
Major domains: Games (G), System Administration (SA)
RESULTS: MACRO-CATEGORIES DIT (axis X) x RMI (axis Y):
Cluster 0
Projects tend to have few abstract classes (more stability) and few descendants
Major domains: PCS, SS, AIS or Audio & Video, Communication, Graphics, Development, Games...
Cluster 1
Projects tend to have a higher average number of descendants
Major domains: PCS, SS, BS or Audio & Video, Communication, Graphics, Development, Business & Enterprise...
Cluster 2
Projects have more stable classes than those in the other clusters
Major domains: PCS, SS, AIS, ESS or Audio & Video, Communication, Graphics, Development, Games, Science & Engineering...
Cluster 3
Projects seem to have a more balanced relationship between abstraction and inheritance
Major domains: PCS, BS, AIS, ESS or Audio & Video, Communication, Graphics, Business & Enterprise, Games, Science & Engineering...
RELATED WORK
[Romano; Pinzger 2011]: specific metrics for Java interfaces have a strong correlation with changes in abstract and concrete classes of OO projects
[Malviya; Yadav 2012]: clustering to identify OO sustainable systems
[Jehad Al Dallal 2013]: internal quality attributes (size, cohesion, and coupling) vs. an external quality attribute (class maintainability)
[Souza; Maia 2013]: reference values for a set of coupling metrics, considering software domains
Our study:
Proposes a model that explains the similarity among domains in OO internal software quality
Provides a more immediate view of the trends and characteristics of internal Java software quality
THREATS TO VALIDITY
Construct validity: other measures could be relevant, and clustering alone may not be sufficient to completely detect and validate the characteristics inherent in object-oriented Java projects
Internal validity: the study does not provide in-depth technical details of the projects (e.g., from code inspections)
External validity: the study analyzes only OO software developed in Java
CONCLUSIONS
Some specific domains tend to have similarities relating to four properties (abstraction, stability, complexity, and specialization).
In general: Systems in the SU and GPH domains tend to have few descendants and
few abstract classes
Software in the D and C domains can have similar characteristics in inheritance and abstraction, with a higher average number of descendants
Software in the SE and BE domains tend to make good use of inheritance
Software in the G and SA domains may have more stable classes and be harder to maintain because they are less flexible to change
CONCLUSIONS (CONT.)
Contribution to SE:
Provide observations of structural aspects of OO development, such as
specialization, stability, abstraction, and complexity
Metrics such as WMC, DIT, LOC, RMI, and RMA are relevant to the characterization of Java internal software quality (similarity between domains)
For software developers, the study shows that some domains (for example, G and SA) tend to share the same characteristics and that more effort in these aspects is needed to keep such systems maintainable
FUTURE WORK
We suggest:
Repeating the analyses on a larger sample of software
The use of other repositories of available projects
The use of other metrics to obtain new results on characteristics that have not yet been explored
Thanks!
REFERENCES
ISO/IEC 25010 (2011) Systems and Software Engineering - Systems and Software Quality Requirements and Evaluation - System and Software Quality Models.
Al Dallal, J. (2013) Object-Oriented Class Maintainability Prediction Using Internal Quality Attributes. In: Information and Software Technology 55(11), pp. 2028-2048.
Malviya, A. K.; Yadav, V. K. (2012) Maintenance Activities in Object Oriented Software Systems Using K-Means Clustering Technique: A Review. In: Sixth International Conference on Software Engineering, pp. 1-5.
Romano, D.; Pinzger, M. (2011) Using Source Code Metrics to Predict Change-Prone Java Interfaces. In: International Conference on Software Maintenance, pp. 303-312.
Souza, L. B. L. de; Maia, M. de A. (2013) Do Software Categories Impact Coupling Metrics? In: Working Conference on Mining Software Repositories. pp. 217-220.