Mining Logical Clones in Software: Revealing High-Level Business & Programming Rules
Wenyi Qian1, Xin Peng1, Zhenchang Xing2, Stan Jarzabek3, Wenyun Zhao1
1 Fudan University, China
2 Nanyang Technological University, Singapore
3 National University of Singapore, Singapore
Logical Clones
• may not be well documented
• reveal high-level rules
Logical Clones
• Logical clones consist of:
  – Similar methods
  – Similar code fragments
  – Similar entity classes
  – Persistent data objects
Logical Clones
• Today’s techniques for clone/similarity detection:
  – Simple clone (text, token, AST…)
  – Structural clone (simple clone)
  – Similar design structures (similarity metrics, machine learning)
• They are not enough to detect high-level clones:
  – they lack high-level information
  – they need pre-defined templates, such as a specific design pattern
Approach Overview
[Figure: approach overview with stages labelled input, abstraction, and output]
Program Model
• Methods & functional clusters
  – Semantic clustering
• Entity classes
  – Encapsulating information with getter/setter
• Code clones
  – Simple clones in different methods
• Persistent data objects
  – Data tables in DB or data entries in files
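One way to picture the program model is as a typed graph over the four element kinds listed above. The sketch below is illustrative only and is not taken from the tool: the class names, the `cluster` field, the relation labels ("uses", "accesses"), the getter/setter threshold, and all element names are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Sequence, Set, Tuple


class NodeKind(Enum):
    """The four element kinds listed for the program model."""
    METHOD = "Method"
    ENTITY_CLASS = "Entity class"
    CODE_CLONE = "Code clone"
    PERSISTENT_DATA = "Persistent data object"


@dataclass(frozen=True)
class Node:
    name: str
    kind: NodeKind
    # Assumption: a method carries the id of the functional cluster that
    # semantic clustering assigned to it; other kinds leave this as None.
    cluster: Optional[str] = None


@dataclass
class ProgramModel:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each edge is (source name, relation label, target name).
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def relate(self, source: str, relation: str, target: str) -> None:
        # Refuse edges that mention elements missing from the model.
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(name)
        self.edges.add((source, relation, target))


def looks_like_entity_class(method_names: Sequence[str],
                            threshold: float = 0.8) -> bool:
    """Hypothetical heuristic: most methods are getters or setters."""
    if not method_names:
        return False
    accessors = [m for m in method_names if m.startswith(("get", "set"))]
    return len(accessors) / len(method_names) >= threshold


# Hypothetical elements, invented for illustration only.
model = ProgramModel()
model.add(Node("OrderService.createOrder", NodeKind.METHOD, cluster="create"))
model.add(Node("Order", NodeKind.ENTITY_CLASS))
model.add(Node("ORDER_HEADER", NodeKind.PERSISTENT_DATA))
model.relate("OrderService.createOrder", "uses", "Order")
model.relate("OrderService.createOrder", "accesses", "ORDER_HEADER")
```

Keeping nodes and edges as plain names and tuples makes it easy to compare subgraphs later by replacing each name with its kind or cluster label.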
Mining Process
[Figure: example graph with nodes labelled PosScreenprocessPay, PosPayCheck, PosPayGiftCard, PosClearPayment and PosScreen, each tagged <Method> or <Entity class>]
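The figure pairs concrete element names with type labels such as <Method> and <Entity class>. As a rough sketch of what such an abstraction step could look like (an assumption for illustration, not the mining algorithm from the talk), concrete subgraphs can be grouped by a signature in which every name is replaced by its label. The function names, relation labels, and element names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str, str]  # (source name, relation label, target name)
Signature = FrozenSet[Tuple[str, str, str]]


def abstract_signature(edges: List[Edge], kind_of: Dict[str, str]) -> Signature:
    """Replace each concrete name in the edges by its type label."""
    # Coarse on purpose: this is not a graph-isomorphism check, so two
    # differently shaped subgraphs can end up with the same signature.
    return frozenset((kind_of[s], rel, kind_of[t]) for s, rel, t in edges)


def group_by_signature(instances: List[List[Edge]],
                       kind_of: Dict[str, str]) -> Dict[Signature, List[List[Edge]]]:
    """Group concrete subgraphs that abstract to the same signature."""
    groups: Dict[Signature, List[List[Edge]]] = defaultdict(list)
    for instance in instances:
        groups[abstract_signature(instance, kind_of)].append(instance)
    # Keep only signatures that recur, i.e. have at least two instances.
    return {sig: found for sig, found in groups.items() if len(found) >= 2}


# Hypothetical elements, invented for illustration only.
kind_of = {
    "A.pay": "<Method>",
    "B.pay": "<Method>",
    "C.log": "<Method>",
    "Account": "<Entity class>",
    "Card": "<Entity class>",
}
instances = [
    [("A.pay", "uses", "Account")],
    [("B.pay", "uses", "Card")],
    [("C.log", "calls", "A.pay")],
]
groups = group_by_signature(instances, kind_of)
```

Here the first two instances share the signature "<Method> uses <Entity class>" and form one group, while the third has no second instance and is dropped.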
Tool: MiLoCo
Case Study
• Project: Opentaps 1.4.0
  – 14,351 classes & interfaces
  – 253,743 methods
• 1,690 logical clones mined
  – at least 3 nodes & 2 instances
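The reporting criterion above (at least 3 nodes and 2 instances) can be read as a simple filter over candidate clones. The following is a minimal sketch under that reading; `CandidateClone`, `keep_reportable`, and the sample data are hypothetical and do not reflect the tool's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CandidateClone:
    # Abstract node labels that make up the recurring pattern.
    pattern_nodes: Tuple[str, ...]
    # Each instance lists the concrete elements matched to the pattern.
    instances: Tuple[Tuple[str, ...], ...]


def keep_reportable(candidates: List[CandidateClone],
                    min_nodes: int = 3,
                    min_instances: int = 2) -> List[CandidateClone]:
    """Keep candidates that meet both the size and the recurrence threshold."""
    return [
        c for c in candidates
        if len(c.pattern_nodes) >= min_nodes and len(c.instances) >= min_instances
    ]


# Hypothetical candidates, invented for illustration only.
candidates = [
    CandidateClone(("<Method>", "<Method>", "<Entity class>"),
                   (("a1", "a2", "A"), ("b1", "b2", "B"))),
    CandidateClone(("<Method>", "<Entity class>"),
                   (("c1", "C"), ("d1", "D"))),             # too few nodes
    CandidateClone(("<Method>", "<Method>", "<Entity class>"),
                   (("e1", "e2", "E"),)),                   # only one instance
]
reportable = keep_reportable(candidates)
```

Only the first candidate passes both thresholds; the defaults mirror the numbers on the slide and can be raised to report fewer, larger clones.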
Categories of Logical Clones
• Categories of mined logical clones (classified manually):
  – Programming Convention (37%): similar ways to implement similar functions
  – Design Structure (24%): similar interaction structures
  – Business Task (23%): similar ways to implement similar business tasks
  – Business Process (16%): similar business processes or sub-processes
Human Study
• 5 senior graduate students, 2 questions:
  – Helpful for program understanding? YES
  – Helpful for reuse/evolution? YES
Discussion
• Logical clones help with reuse, even without knowledge of code details
• Developers with good domain knowledge can make better use of logical clones
• Integrating MiLoCo with IDEs would make logical clones more useful
Conclusion
• The concept of logical clones
• The approach for mining logical clones
• The tool: MiLoCo
• A case study showing that logical clones are helpful in software understanding, reuse and maintenance
Thanks for your attention!