Detecting software clones in binaries

Zaharije Radivojević, Saša Stojanović, Miloš Cvetanović
School of Electrical Engineering, Belgrade University

14th Workshop “Software Engineering Education and Reverse Engineering”

Sinaia, Romania, 24-30 August 2014

14th Workshop SEE and RE 2/16

Agenda

• Clone detection
• Binary code clones
• Metrics approach
• Conclusions

Motivation (1)

• A motivating scenario is finding the reuse of a software library in source code without appropriate permission from the owner of the library.

Code clones

• Type-1: Identical code (ignoring formatting)

• Type-2: Syntactically identical fragments (ignoring naming and formatting)

• Type-3: Copied fragments with further modifications (ignoring some statements, naming and formatting)

• Type-4: Two or more code fragments that perform the same computation
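As an illustration of the first two clone types, the following minimal Python sketch compares toy C fragments after normalization. The tokenizer, the keyword set, and the function names are assumptions made for this example; they are not taken from any tool discussed in the talk.

```python
import re

# Hypothetical keyword set for a tiny C-like subset (assumption for this sketch).
KEYWORDS = {"int", "return", "if", "else", "for", "while"}

IDENT = r"[A-Za-z_]\w*"

def tokens(code):
    """Split code into identifiers, numbers, and single-character symbols."""
    return re.findall(IDENT + r"|\d+|\S", code)

def normalize_type1(code):
    """Type-1 view: formatting is ignored, only the token sequence matters."""
    return tokens(code)

def normalize_type2(code):
    """Type-2 view: additionally replace every non-keyword identifier by ID."""
    return [
        "ID" if re.fullmatch(IDENT, t) and t not in KEYWORDS else t
        for t in tokens(code)
    ]

a = "int sum(int a, int b) { return a + b; }"
b = "int sum(int a,int b)\n{\n    return a+b;\n}"   # same code, other layout
c = "int add(int x, int y) { return x + y; }"       # renamed identifiers

print(normalize_type1(a) == normalize_type1(b))  # True: Type-1 clone
print(normalize_type1(a) == normalize_type1(c))  # False: names differ
print(normalize_type2(a) == normalize_type2(c))  # True: Type-2 clone
```

Replacing every identifier with the same placeholder is looser than consistent renaming, so this sketch may report some fragments as Type-2 clones that a stricter tool would reject. Type-3 and Type-4 clones need approximate or semantic matching and are outside what this example shows.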

Existing tools

|                           | SimCad           | CCFinder                         | Deckard      | ACD                                | Moss                                     |
|---------------------------|------------------|----------------------------------|--------------|------------------------------------|------------------------------------------|
| Supported languages       | C, C#, Java, Py  | C/C++, C#, Cobol, Java, VB, Text | C, Java, Php | C/C++                              | C/C++, C#, Cobol, Java, VB, MIPS, Text…  |
| Language in experiment    | C                | C                                | C            | C                                  | ASM                                      |
| Comparison level          | block, procedure | file                             | file         | file                               | file                                     |
| Clone detection technique | text based       | token based                      | AST based    | text based (ASM generated from C)  | text based                               |
| Types of detected clones  | 1, 2, and 3      | 1, 2, and 3                      | 1, 2, and 3  | 1, 2, and 3                        | 1, 2, and 3                              |

Source code required: not available for a commercial product.

Motivation (2)

• A motivating scenario is finding the reuse of a software library in a commercial product binary without appropriate permission from the owner of the library.

Source code transformed by compiler (what compiler?)

ARM architecture

Approach

Metrics

Filters/Formulas

Filters:
• No filtering
• Adaptive filtering (based on previous knowledge)
• Interval filtering

Formulas:
• Arithmetic mean
• Geometric mean
• Harmonic mean
• Weighted functions (based on previous knowledge)
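The slides list the filters and formulas without defining them, so the following Python sketch is only one possible reading. It assumes each procedure is described by a vector of non-negative metric values, that interval filtering discards candidates whose metrics fall outside a relative interval around the query's values, and that a formula combines per-metric similarities into one score. The functions `interval_filter`, `metric_similarity`, and `combined`, the `tolerance` parameter, and the sample vectors are all hypothetical.

```python
from statistics import mean, geometric_mean, harmonic_mean

def interval_filter(query, candidate, tolerance=0.5):
    """Assumed rule: keep a candidate only if every metric lies within
    a relative interval (tolerance) around the query's metric value."""
    return all(
        abs(c - q) <= tolerance * max(q, 1)
        for q, c in zip(query, candidate)
    )

def metric_similarity(q, c):
    """Similarity of two non-negative metric values, in (0, 1].
    The +1 keeps the ratio defined and positive when a metric is 0."""
    return (min(q, c) + 1) / (max(q, c) + 1)

def combined(query, candidate, formula=mean):
    """Combine the per-metric similarities with the chosen mean."""
    return formula([metric_similarity(q, c) for q, c in zip(query, candidate)])

# Hypothetical metric vectors (for example: instructions, branches, calls).
query = [10, 4, 2]
candidates = {"f1": [10, 4, 2], "f2": [12, 5, 2], "f3": [40, 1, 9]}

# Filter first, then score the survivors with each formula.
kept = {n: v for n, v in candidates.items() if interval_filter(query, v)}
for name, vec in kept.items():
    scores = [combined(query, vec, f)
              for f in (mean, geometric_mean, harmonic_mean)]
    print(name, [round(s, 3) for s in scores])
```

For the same inputs the arithmetic mean is never smaller than the geometric mean, which is never smaller than the harmonic mean, so the harmonic mean penalizes a single poorly matching metric the most. A weighted function would replace `formula` with a mean whose weights come from previous knowledge; it is not shown because the slides do not specify the weights.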

Results (STAMP + Busy Box)

Support Vector Machines and K-Nearest Neighbors achieved much lower results!

• Configurations with the newly introduced metrics achieve up to 1.44 times better recall than configurations that use only metrics from high-level languages.

• Comparison of the proposed approach with some clone detection tools shows that it achieves a higher recall for an acceptable level of precision.

• Observing only the first position, for the real-world example (Busy Box), the proposed approach achieves a recall of 43% and a precision of 43%.
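The slides do not spell out how recall and precision at the first position are computed. The sketch below shows one common reading, stated here as an assumption: for each query procedure only the top-ranked candidate counts, precision is the share of returned answers that are correct, and recall is the share of queries whose true clone was found. The function name and the sample data are hypothetical and are not results from the talk.

```python
def first_position_scores(rankings, truth):
    """rankings: query -> ranked list of candidates (may be empty).
    truth:    query -> the candidate that really is its clone.
    Returns (precision, recall) using only the first ranked candidate."""
    answered = [q for q in truth if rankings.get(q)]
    correct = sum(1 for q in answered if rankings[q][0] == truth[q])
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical data: four queries, one of them returns no candidate.
truth = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}
rankings = {
    "q1": ["a", "x"],   # correct at first position
    "q2": ["y", "b"],   # true clone only at second position
    "q3": ["c"],        # correct at first position
    "q4": [],           # nothing returned
}

precision, recall = first_position_scores(rankings, truth)
print(precision, recall)
```

Under this reading, precision and recall coincide whenever every query returns at least one candidate, which would be consistent with the two equal values reported above.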

Conclusion

Motivation (3) - final

• A motivating scenario is finding the use of a patent in a commercial product binary without appropriate permission from the owner of the patent.

Thank you!

Radivojevic Zaharije