
Detecting software clones in binaries

Zaharije Radivojević, Saša Stojanović, Miloš Cvetanović
School of Electrical Engineering, Belgrade University

14th Workshop “Software Engineering Education and Reverse Engineering”

Sinaia, Romania, 24-30 August 2014

14th Workshop SEE and RE 2/16

Agenda

• Clone detection
• Binary code clones
• Metrics approach
• Conclusions

Motivation (1)

• One motivating scenario is detecting reuse of a software library in source code without permission from the library's owner.

Code clones

• Type-1: Identical code (ignoring formatting)

• Type-2: Syntactically identical fragments (ignoring naming and formatting)

• Type-3: Copied fragments with further modifications (statements added, changed or removed, in addition to differences in naming and formatting)

• Type-4: Two or more code fragments that perform the same computation but are implemented differently
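Type-1 and Type-2 clones can be recognized by normalizing code before comparing it: discarding comments and whitespace is enough for Type-1, and additionally replacing identifiers and literals with placeholders exposes Type-2 clones. The Python sketch below illustrates this under simplifying assumptions; the regular-expression tokenizer, the keyword list and the `normalize`/`clone_type` helpers are hypothetical and cover only a small subset of C.

```python
import re

# Hypothetical, simplified C tokenizer: comments, identifiers/keywords,
# numeric literals, string literals, and any other single non-space character.
TOKEN_RE = re.compile(
    r"//[^\n]*|/\*.*?\*/"
    r"|[A-Za-z_]\w*"
    r"|\d+(?:\.\d+)?"
    r'|"(?:\\.|[^"\\])*"'
    r"|\S",
    re.DOTALL,
)
C_KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void", "float", "double"}


def normalize(code: str, rename: bool) -> list[str]:
    """Token stream without comments; with rename=True identifiers become ID and literals LIT."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok.startswith("//") or tok.startswith("/*"):
            continue  # comments and layout never distinguish clones
        if rename and re.fullmatch(r"[A-Za-z_]\w*", tok) and tok not in C_KEYWORDS:
            tokens.append("ID")
        elif rename and (tok[0].isdigit() or tok.startswith('"')):
            tokens.append("LIT")
        else:
            tokens.append(tok)
    return tokens


def clone_type(a: str, b: str) -> str:
    if normalize(a, rename=False) == normalize(b, rename=False):
        return "Type-1"
    if normalize(a, rename=True) == normalize(b, rename=True):
        return "Type-2"
    return "not Type-1/2"


a = "int sum(int n) { int s = 0; for (int i = 0; i < n; i++) s += i; return s; }"
b = "int total(int k) {\n  int acc = 0; // running total\n  for (int j = 0; j < k; j++) acc += j;\n  return acc;\n}"
print(clone_type(a, a.replace(" ", "  ")))  # Type-1
print(clone_type(a, b))                     # Type-2
```

Type-3 and Type-4 clones cannot be caught by exact comparison of normalized streams; they need similarity measures or semantic analysis.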

Existing tools

Tool     | Supported languages                     | Language in experiment | Comparison level | Clone detection technique | Types of detected clones
SimCad   | C, C#, Java, Python                     | C                      | block, procedure | text based                | 1, 2, and 3
CCFinder | C/C++, C#, Cobol, Java, VB, Text        | C                      | file             | token based               | 1, 2, and 3
Deckard  | C, Java, PHP                            | C                      | file             | AST based                 | 1, 2, and 3
ACD      | C/C++                                   | C                      | file             | text based                | 1, 2, and 3
Moss     | C/C++, C#, Cobol, Java, VB, MIPS, Text… | ASM (generated from C) | file             | text based                | 1, 2, and 3

All of these tools require source code, which is not available for a commercial product.
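Moss, the only tool in the table applied to assembly, is built on document fingerprinting by winnowing: hash every k-gram of the normalized text, keep the minimum hash in each sliding window, and compare the selected fingerprints. A simplified Python sketch under stated assumptions: whitespace-insensitive character k-grams, truncated MD5 hashes and Jaccard similarity of the fingerprint sets are illustrative choices, and the helper names and the defaults k = 5, w = 4 are hypothetical rather than Moss's own settings or interface.

```python
import hashlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-gram of the text after removing whitespace and lowercasing."""
    s = "".join(text.split()).lower()
    return [
        int.from_bytes(hashlib.md5(s[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(s) - k + 1)
    ]


def winnow(hashes: list[int], w: int) -> set[tuple[int, int]]:
    """Select the minimum hash in every window of w consecutive hashes
    (rightmost occurrence on ties) and record it with its position."""
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        m = min(window)
        offset = max(i for i, h in enumerate(window) if h == m)
        fingerprints.add((m, start + offset))
    return fingerprints


def fingerprint_similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard similarity of the two documents' selected fingerprint hashes."""
    fa = {h for h, _ in winnow(kgram_hashes(a, k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(b, k), w)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


x = "mov r0, #1\nadd r0, r0, r1\nbx lr\n"
print(fingerprint_similarity(x, x.replace("\n", "   ")))  # 1.0
```

Because only window minima are kept, the fingerprint is much smaller than the full set of k-gram hashes, yet any shared substring at least w + k - 1 characters long is guaranteed to contribute a shared fingerprint.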

Motivation (2)

• A second motivating scenario is detecting reuse of a software library in the binary of a commercial product without permission from the library's owner.

The source code has been transformed by a compiler, and which compiler was used is not known.

Target architecture: ARM

Approach

Approach

Metrics
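The slides do not list the metrics themselves. As a hedged illustration of what assembly-level metrics can look like, the sketch below counts a few hypothetical per-function features (instruction, branch, call and memory-access counts) in an ARM assembly listing written in GNU assembler syntax. The metric names, the mnemonic classification and the `asm_metrics` helper are assumptions, not the authors' metric set.

```python
import re
from collections import Counter

# Hypothetical classification rules for ARM mnemonics (GNU assembler syntax).
BRANCH_RE = re.compile(r"^(?:b(?:eq|ne|cs|hs|cc|lo|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)?|bx|cbn?z)$")
CALLS = {"bl", "blx"}
MEMORY_PREFIXES = ("ldr", "str", "ldm", "stm", "push", "pop")


def asm_metrics(listing: str) -> dict[str, int]:
    """Count simple per-function features in an ARM assembly listing."""
    mnemonics = []
    for raw in listing.splitlines():
        line = raw.split("@", 1)[0].strip()  # '@' starts a comment in GNU as for ARM
        if not line or line.endswith(":") or line.startswith("."):
            continue  # skip blank lines, label-only lines and directives
        # Keep only the mnemonic, dropping Thumb-2 width suffixes such as ".w".
        mnemonics.append(line.split()[0].lower().split(".")[0])
    counts = Counter(mnemonics)
    return {
        "instructions": len(mnemonics),
        "branches": sum(n for m, n in counts.items() if BRANCH_RE.match(m)),
        "calls": sum(n for m, n in counts.items() if m in CALLS),
        "memory_ops": sum(n for m, n in counts.items() if m.startswith(MEMORY_PREFIXES)),
        "distinct_mnemonics": len(counts),
    }


listing = """
sum_list:
    push {r4, lr}       @ save callee-saved register and return address
    mov r4, #0
.Lloop:
    ldr r3, [r0], #4
    add r4, r4, r3
    subs r1, r1, #1
    bne .Lloop
    bl log_result
    mov r0, r4
    pop {r4, pc}
"""
print(asm_metrics(listing))
```

Counts like these survive renaming and register reallocation, which is why metric vectors are attractive when the compiler and its options are unknown.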

Filters/Formulas

Filters:
- No filtering
- Adaptive filtering (based on previous knowledge)
- Interval filtering

Formulas:
- Arithmetic mean
- Geometric mean
- Harmonic mean
- Weighted functions (based on previous knowledge)

Results (STAMP + Busy Box)

Results (STAMP + Busy Box)

Support Vector Machines (SVM) and k-nearest neighbors (KNN) achieved much lower results.

Results (STAMP + Busy Box)

Conclusion

• Configurations that include the newly introduced metrics achieve up to 1.44 times higher recall than configurations that use only metrics adopted from high-level languages.

• Compared with several existing clone detection tools, the proposed approach achieves higher recall at an acceptable level of precision.

• Considering only the top-ranked candidate on the real-world example (Busy Box), the proposed approach achieves 43% recall and 43% precision.

Motivation (3) - final

• The final motivating scenario is detecting the use of a patent in the binary of a commercial product without permission from the patent owner.

Thank you!

Radivojevic Zaharije

