Date post: | 10-May-2015 |
Category: |
Software |
Upload: | rafat-al-msiedeen |
AN APPROACH TO RECOVER FEATURE MODELS FROM
OBJECT-ORIENTED SOURCE CODE
R. Al-Msie’deen, A. Djamel Seriai, M. Huchard, C. Urtado, S. Vauttier, and H. S. Eyal Salman
Outlines:
INTRODUCTION.
SOFTWARE PRODUCT LINE ENGINEERING (SPLE).
FEATURE AND FEATURE MODEL (FM).
FORMAL CONCEPT ANALYSIS (FCA).
INFORMATION RETRIEVAL (IR) & LATENT SEMANTIC INDEXING (LSI).
MOTIVATIONS.
APPROACH OVERVIEW => THE MAPPING MODEL + FEATURE EXTRACTION PROCESS.
OBJECT-ORIENTED SOURCE CODE VARIATIONS.
OBJECT-ORIENTED BUILDING ELEMENTS (OBE).
COMMONALITY AND VARIATION IDENTIFICATION USING FCA.
ATOMIC BLOCK OF VARIATIONS (FEATURE) IDENTIFICATION USING LSI AND FCA.
RELATED WORK.
CONCLUSION.
FUTURE WORK.
REFERENCES.
Introduction:
Software product variants are a set of similar products developed with the
copy-paste-modify technique rather than with a software product line (SPL) strategy.
Software product variants represent a starting point for building a software product line
(SPL) [FIG 08]
Software Product Line [SPL]:
A SPL is "a set of software intensive systems sharing a common, managed set of features that
satisfy the specific needs of a particular market segment or mission and are developed from a
common set of core assets in a prescribed way" [CLE 01]
[Figure labels: Products, Features, Domain, Core Assets, Production Plan]
Software Product Line Engineering [SPLE]:
SPLE consists of two major steps: [CLE 01]
1. Domain Engineering: Core Assets + Feature Model.
2. Application Engineering: Product Configurations.
Feature and Feature Model:
A feature is a system property relevant to some stakeholder, used to capture commonalities or
discriminate [variations] among systems in a family [CZA 00]
Feature models (FMs) are tree-like graphs of features and the relationships among them. In SPLE, FMs
are used to represent the commonality and variability of SPL members at different levels of
abstraction [POH 10]
[Figure labels: Feature, Feature Models, Optional Feature, Mandatory Feature]
Feature and Feature Model:
[Figure: FM of the text editing system (1).
Features: Text Editing System, File Management, Basic, Help, Edit, Basic Edit, Select All, Resize, Case Conversion, Clear, Read Only, Font, Color, Search, Split, Change Display Settings, Replacement, Unsplit All, Horizontal, Vertical.
Legend: Mandatory Feature, Optional Feature, AND, OR, XOR, Require, Exclude.]
(1) http://www.lirmm.fr/TextEditingSystemSPL
Formal Concept Analysis (FCA):
FCA is a mathematical method that provides a way to identify "meaningful groupings of objects
that have common attributes" [LOE 07]
A formal context is a triple K = (O, A, R) where O and A are sets (objects and attributes,
respectively) and R is a binary relation, i.e., R ⊆ O × A.
Galois lattices [BAR 70] and concept lattices [GAN 99] are the core structures of Formal Concept
Analysis, a data analysis framework for extracting an ordered set of concepts
from a dataset, called a formal context, composed of objects described by attributes.
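The definition of a formal context and its concepts can be made concrete with a minimal sketch. The toy context below (products P1 to P3 and their classes) is hypothetical and is not the table from the slides; the brute-force enumeration closes every subset of objects, so it only suits very small contexts.

```python
from itertools import combinations

# Hypothetical toy context: products (objects) described by classes (attributes).
CONTEXT = {
    "P1": {"Open", "Close", "Print"},
    "P2": {"Open", "Close", "Edit"},
    "P3": {"Open", "Close", "Edit", "Print"},
}


def common_attributes(context, objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(context, attributes):
    """Objects that own every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(context, subset)
            extent = objects_having(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(CONTEXT)
```

In this toy context the concept whose extent contains all three products has the intent {Open, Close}, which is the kind of concept the approach reads as the common part.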
Formal Concept Analysis (FCA):
[Table: a formal context describing product variants by source code elements. Rows: Product 1 to Product 5. Columns: Class (Open), Class (Close), Class (Edit), Class (Print), Class (Select All), Class (Red), Class (Green), Class (Blue), Class (Black). An x marks a class present in a product.]
[Figure: the concept lattice for the formal context of the table above. Labels: common concept, concept shared by two products, concept specific to one product.]
Information Retrieval (IR) & Latent Semantic Indexing (LSI):
INFORMATION RETRIEVAL (IR) has proven useful in many areas, such as software
maintenance and evolution, image extraction, speech recognition, and horizontal search
engines like Google. Feature location is one of the most common applications of IR
in software engineering [DAV 11]
LATENT SEMANTIC INDEXING (LSI) assumes that there are implicit relationships among
the words of documents that tend to appear together, so documents can be related even if they
do not share any terms; that is to say, there is some latent semantic structure in free text [DAV 11]
The effectiveness of IR methods is measured using the IR METRICS RECALL and PRECISION.
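As a reminder of how the two metrics are computed, here is a minimal sketch using their standard definitions. The function name and the example sets are illustrative assumptions, not part of the presented approach.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 elements retrieved, 3 truly relevant, 2 in common.
precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```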
Latent Semantic Indexing (LSI):
In our work, we use the most widely used threshold for cosine similarity, 0.70
[MAR 03]
All information must be normalized before it is suitable as input to LSI.
This preprocessing step includes: transforming all capital letters into lower case,
removing stop words (such as numbers), splitting all documents into terms, and
performing word stemming.
Similarity between documents is described by a similarity matrix, computed using
cosine similarity.
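The preprocessing and similarity steps can be sketched as follows. This is a simplification under stated assumptions: the stop-word list and the suffix-stripping stemmer are hypothetical stand-ins, and the cosine is computed on raw term counts, whereas LSI would first project the term-document matrix into a reduced space with a singular value decomposition.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "of", "and", "to"}


def naive_stem(term):
    """Very rough suffix stripping; a real stemmer (e.g. Porter) does far more."""
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def preprocess(text):
    """Lower-case, split into alphabetic terms (numbers are dropped),
    remove stop words, and stem."""
    terms = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in terms if t not in STOP_WORDS]


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-count vectors."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

For example, `preprocess("Copying the Selected Text")` yields the terms `copy`, `select`, and `text`, and two documents with no term in common get a similarity of 0.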
Motivations:
Many companies initially develop a number of similar software products without explicitly
planning for strategic reuse. Once a released product proves SUCCESSFUL on the market,
similar products are developed [JOH 09]
[Figure: INDIVIDUAL SYSTEMS (A B C D E F G; A B V G O J L; A B F D E R X) versus a SOFTWARE PRODUCT LINE (A B C D E F G; A B V G; A B F D E)]
Motivations:
Manually creating a feature model for an existing system is time-consuming, error-prone, and
requires substantial effort from a modeler [SHE 11]
Motivations:
REVERSE ENGINEERING an FM from source code aims to improve product maintenance and ease
system migration [CHI 90]; the extracted feature model may also lead to the production of
new products.
[Figure: REVERSE ENGINEERING FM, from Product Variants (P1, P2) to a Feature Model]
Motivations:
The general OBJECTIVE of our approach is to EXTRACT an INITIAL FM that models the common and
variable features of product variants. IN THIS PAPER we present the part concerning
FEATURE IDENTIFICATION from the OO source code of product variants using FCA and LSI.
We assume in this paper that the product variants use the same vocabulary to name
packages, classes, attributes, and methods in their source code.
Motivations:
Reverse engineering a feature model from the source code of a set of product variants makes
system features and dependencies explicit.
Feature models need to be extracted from source code in particular: it is the most important
source of information, yet features and dependencies are hidden in it.
Approach Overview:
A. The Mapping Model:
To identify features, we rely on a mapping model between these features and object-oriented
building elements (OBE).
[Figure: the mapping model. Elements: Product Variants, Product, Source code, Source code elements (Package, Class, Attribute, Method), Object-oriented source code variations, Source code variation, Block of variations, Atomic block of variations, Feature (Mandatory Feature, Optional Feature), Feature Model. An atomic block of variations corresponds to a feature.]
Approach Overview:
For object-oriented source code, mandatory features are realized by OBE that are common
to all product variants.
Optional features are realized by variable OBE that appear in some product variants or
in a single product, but not in all product variants.
We consider that a feature corresponds to one and only one set (group) of OBE. This means
that a feature always has the same implementation in all products where it is present.
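The split between common and variable OBE can be illustrated with plain set operations. This is a minimal sketch; the product names and the OBE strings are hypothetical, and the approach itself obtains this split through FCA rather than through direct set arithmetic.

```python
def split_common_and_variable(products):
    """Return (common, variable) OBE for a mapping product -> set of OBE.

    common   : OBE present in every product (candidates for mandatory features)
    variable : OBE present in at least one product but not in all of them
    """
    element_sets = list(products.values())
    common = set.intersection(*element_sets)
    variable = set.union(*element_sets) - common
    return common, variable


# Hypothetical product variants described by class-level OBE only.
products = {
    "P1": {"Class(Open)", "Class(Close)", "Class(Print)"},
    "P2": {"Class(Open)", "Class(Close)", "Class(Edit)"},
}
common, variable = split_common_and_variable(products)
```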
Approach Overview:
Since a feature corresponds to one and only one set of OBE, an optional feature is
implemented by the same set of variable OBE (VOBE) in all products where it is present.
We define a block of variations (BV) as a set of VOBE that are always associated (i.e., that are
always identified together in all the products in which they appear).
The subsets of VOBE that belong to a BV and represent one and only one feature are called
Atomic Blocks of Variations (ABV). A BV is composed of a set of ABVs. To determine its
parts (sub-groups), we cluster the closest VOBE according to the similarity
measures of the LSI method.
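One way to read the BV definition is that two VOBE belong to the same block when they occur in exactly the same set of products. The sketch below groups VOBE on that basis; it is an assumed simplification of what the concept lattice provides, and the product contents are hypothetical.

```python
from collections import defaultdict


def blocks_of_variations(products):
    """Group variable OBE by the exact set of products they appear in.

    Returns a dict: frozenset of product names -> set of VOBE.
    OBE present in every product are common and therefore skipped.
    """
    presence = defaultdict(set)
    for product, elements in products.items():
        for element in elements:
            presence[element].add(product)

    blocks = defaultdict(set)
    for element, owners in presence.items():
        if len(owners) < len(products):  # variable, not common
            blocks[frozenset(owners)].add(element)
    return dict(blocks)


# Hypothetical variants: Copy and Paste always occur together.
products = {
    "P1": {"Open", "Copy", "Paste"},
    "P2": {"Open", "Copy", "Paste", "Search"},
    "P3": {"Open", "Search"},
}
blocks = blocks_of_variations(products)
```

Here Copy and Paste end up in one block. Whether they implement one feature or two is exactly what the later split into atomic blocks of variations has to decide.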
Approach Overview: An illustrative example:
Approach Overview:
B. Feature extraction process:
The approach that we propose is illustrated in the figure below. The feature extraction process consists
of the following steps:
1. The OO source code is analyzed to extract object-oriented building elements (packages, classes,
methods, attributes) for all product variants.
2. Commonalities and variations are extracted for all product variants using FCA, which yields the
blocks of variations.
3. Blocks of variations are divided into atomic blocks of variations using LSI and FCA. Each atomic
block of variations corresponds to one and only one feature.
Object-oriented source code variations:
Four levels of variation:
Package Variation
  Package Set Variation (Name)
  Package Content Variation
Class Variation
  Class Signature Variation: Name, Relationship, Access Level (Public, Private, ...)
  Class Content Variation: Attributes Set Variation, Methods Set Variation
Attribute Variation (Access Level, Data Type, etc.)
Method Variation
  Signature: Access Level, Returned Data Type, Parameters List (order & data type), Exception
  Body: Local Variable, Invocation, Access
Object-oriented source code variations example:
Package Set Variation
Package Content Variation
Object-oriented building elements OBE:
In our case, each product variant PN is abstracted as a set of OBE as follows:
OBE for PN ={
Package (name);
Class (name, owner);
Attribute (name, owner);
Method (name, owner);
Parameter (name, owner);
Local Variable (name, owner);
Method Invocation (name, accessed in, owner);
Method Exception (name, owner)}.
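This abstraction maps naturally onto a set of immutable records. The sketch below is one possible encoding, not the authors' implementation: the `OBE` class, its field names, and the example elements are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OBE:
    """One object-oriented building element (hypothetical encoding)."""
    kind: str                          # "Package", "Class", "Method", ...
    name: str
    owner: Optional[str] = None        # enclosing element; None for packages
    accessed_in: Optional[str] = None  # only meaningful for method invocations


# Two hypothetical product variants, each abstracted as a set of OBE.
product_1 = {
    OBE("Package", "editor"),
    OBE("Class", "Document", "editor"),
    OBE("Method", "open", "Document"),
}
product_2 = {
    OBE("Package", "editor"),
    OBE("Class", "Document", "editor"),
    OBE("Method", "print", "Document"),
}

# Frozen dataclasses are hashable, so variants can be compared with set algebra.
shared = product_1 & product_2
```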
Commonality and variation identification using FCA:
Formal context describing text editing systems by object-oriented building elements (OBE).
In the formal context, products constitute the rows of the table.
In the formal context, OBE constitute the columns of the table.
Commonality and variation identification using FCA:
The concept lattice for the formal context of previous Table.
The common block
Block of Variations
Atomic block of variations (feature) identification using LSI and FCA:
To identify the atomic blocks of variations, each representing a single feature, within a block of
variations, we use LSI and FCA together.
Atomic block of variations (feature) identification using LSI and FCA:
In our case, each line in the block of variations represents a single document and at the same
time represents a query.
[Figure: The Similarity Matrix and The Context (Similarity Matrix) for θ = 0.70. In the visible cells, similarities of 1 and 0.70 become x in the context, and similarities of 0 stay 0.]
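Turning the similarity matrix into a formal context amounts to thresholding each cell. The sketch below assumes that a similarity greater than or equal to θ counts as similar, and it groups documents with identical rows as a simplified stand-in for reading the atomic blocks off the concept lattice. The matrix values are hypothetical.

```python
THETA = 0.70  # cosine similarity threshold used in the slides


def to_binary_context(similarity, theta=THETA):
    """Mark a cell True when the similarity reaches the threshold (assumed >=)."""
    return [[value >= theta for value in row] for row in similarity]


def group_identical_rows(binary):
    """Group document indices whose rows in the binary context are identical."""
    groups = {}
    for index, row in enumerate(binary):
        groups.setdefault(tuple(row), []).append(index)
    return list(groups.values())


# Hypothetical 3x3 similarity matrix: documents 0 and 1 are similar, 2 stands alone.
similarity = [
    [1.0, 0.70, 0.0],
    [0.70, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
context = to_binary_context(similarity)
groups = group_identical_rows(context)
```

With these values the grouping yields two candidate atomic blocks, documents 0 and 1 together and document 2 alone.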
Atomic block of variations (feature) identification using LSI and FCA:
The concept lattice shows three atomic blocks of variations extracted from one block of variations.
Related Work:
Ziadi et al. [ZIA 12] propose an automatic approach for feature identification from the source code
of a set of product variants.
Their approach only investigates products in which the variability is represented in the names of
classes, methods, and attributes, without considering product lines in which the variability is
mainly represented in the body of methods.
Related Work:
The approach of Ziadi et al. gathers all common features into a single mandatory feature called the
base feature.
We use FCA to extract commonalities and variations from product variants, distinguish
between the mandatory features by using LSI and FCA based on lexical similarity, and
extract all optional features and constraints such as "and" and "require".
Conclusion:
In this paper, we proposed an approach based on FCA and LSI to extract features from the
object-oriented source code of software system variants.
FCA is used to extract the common block and the blocks of variations.
LSI is used with FCA to recover atomic blocks of variations, each representing a single feature, using
textual similarity.
Future Work:
We will use both textual and semantic similarity to determine each feature
implementation from the OO source code more precisely.
We will organize the extracted features as a feature model including all cross-tree constraints
and feature group constraints, using the information contained in the concept lattice.
We will integrate our approach with linguistic matching techniques for the case where product variants
use different vocabulary to name packages, classes, attributes, and methods.
References:
[ZIA 12] ZIADI T., FRIAS L., DA SILVA M. A. A., ZIANE M., “Feature Identification from the Source Code of Product Variants”, MENS T. CLEVE A. F. R., Ed., Proceedings of the 15th European Conference on Software Maintenance and Reengineering, Los Alamitos, CA, USA, 2012, IEEE, p. 417–422.
[CLE 01] CLEMENTS P. C., NORTHROP L. M., Software product lines: practices and patterns, Addison-Wesley, 2001.
[JOH 09] JOHN I., EISENBARTH M., “A decade of scoping: a survey”, Proceedings of the 13th International Software Product Line Conference, Pittsburgh, PA, USA, 2009, Carnegie Mellon University, p. 31–40.
[FIG 08] FIGUEIREDO E., CACHO N., SANT’ANNA C., MONTEIRO M., KULESZA U., GARCIA A., SOARES S., FERRARI F., KHAN S., CASTOR FILHO F., DANTAS F., “Evolving software product lines with aspects: an empirical study on design stability”, Proceedings of the 30th international conference on Software engineering, ICSE ’08, New York, NY, USA, 2008, ACM, p. 261–270.
[CZA 00] CZARNECKI K., EISENECKER U. W., Generative programming: methods, tools, and applications, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000.
[POH 10] POHL K., BÖCKLE G., VAN DER LINDEN F. J., Software Product Line Engineering: Foundations, Principles and Techniques, Springer Publishing Company, Incorporated, 1st edition, 2010.
[LOE 07] LOESCH F., PLOEDEREDER E., “Restructuring Variability in Software Product Lines using Concept Analysis of Product Configurations”, KRIKHAAR R. L. VERHOEF C. L. G. A. D., Ed., Proceedings of the 11th European Conference on Software Maintenance and Reengineering, Amsterdam, Netherlands, March 2007, IEEE, p. 159–170.
[GAN 99] GANTER B., WILLE R., Formal Concept Analysis, Mathematical Foundations, Springer-Verlag, 1999.
[BAR 70] BARBUT M., MONJARDET B., Ordre et Classification: Algèbre et combinatoire, vol. 2, Hachette, 1970.
[DAV 11] DAVID B., LAWRIE D., “Information Retrieval Applications in Software Maintenance and Evolution”, In Encyclopedia of Software Engineering, 2011, p. 454–463.
[SHE 11] SHE S., LOTUFO R., BERGER T., WASOWSKI A., CZARNECKI K., “Reverse engineering feature models”, ICSE, 2011, p. 461–470.
[MAR 03] MARCUS A., MALETIC J. I., “Recovering documentation-to-source-code traceability links using latent semantic indexing”, Proceedings of the 25th International Conference on Software Engineering, ICSE ’03, Washington, DC, USA, 2003, IEEE Computer Society, p. 125–135.
Banking systems example: