Simseer - A Software Similarity Web Service

Post on 18-Nov-2014

Silvio Cesare, Deakin University

silvio.cesare@gmail.com

Who am I and where did this talk come from?

PhD student at Deakin University.

Research focus includes malware detection and automated vulnerability detection.

Software similarity is the focus of this talk.

This talk gives an overview of the core topics, how they are approached in academia, and a web service that identifies software similarity.

Introduction

Many applications of software similarity and classification:

Malware Detection

Software Theft Detection

Plagiarism Detection

Software Clone Detection

Problem Formulation

Extract features, fingerprints, or 'birthmarks' from programs p and q.

If birthmark(p) is similar to birthmark(q), then the programs are similar.
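The formulation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the birthmark here is taken to be the set of n-grams over an instruction mnemonic sequence, similarity is Jaccard similarity, and the names `birthmark`, `similarity`, `is_similar` and the threshold of 0.6 are hypothetical choices, not something the talk specifies.

```python
def birthmark(mnemonics, n=2):
    """Hypothetical birthmark: the set of n-grams over a mnemonic sequence."""
    return {tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)}


def similarity(a, b):
    """Jaccard similarity of two birthmark sets, in the range [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_similar(p, q, threshold=0.6):
    """Programs count as similar when their birthmarks are similar enough.

    The threshold is an assumed value chosen for illustration only.
    """
    return similarity(birthmark(p), birthmark(q)) >= threshold


# Two made-up mnemonic sequences standing in for programs p and q.
p = ["push", "mov", "sub", "call", "mov", "jmp"]
q = ["push", "mov", "sub", "call", "add", "jmp"]
print(similarity(birthmark(p), birthmark(q)))
print(is_similar(p, q))
```

Any other feature from the taxonomy could be substituted for the n-gram set, as long as a matching similarity measure is available for it.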

Software Similarity Problem

Taxonomy of Program Features

Raw Code

Abstract Syntax Trees

Variables

Pointers

Instructions

Basic Blocks

Procedures

API Calls

Control Flow Graphs

Call Graphs

Data Flow

Procedure Dependency Graphs

System Dependency Graphs

Object Inheritance and Dependency

Program Features Examples: AST (left) and Control Flow (right)

[Figure, left: AST of an if statement with three children: condition (x == 0), then (return), and else (x = 1).]

[Figure, right: control flow graph of _main with the following basic blocks.]

lea 0x4(%esp),%ecx
and $0xfffffff0,%esp
pushl -0x4(%ecx)
push %ebp
mov %esp,%ebp
push %ecx
sub $0x24,%esp
call 4011b0 <___main>
movl $0x0,-0x8(%ebp)
jmp 40115f <_main+0x2f>

movl $0x4020a0,(%esp)
call 4011b8 <_puts>
addl $0x1,-0x8(%ebp)

cmpl $0x9,-0x8(%ebp)
jle 40114f <_main+0x1f>

add $0x24,%esp
pop %ecx
pop %ebp
lea -0x4(%ecx),%esp
ret

[Figure: graph over the procedures Proc_0, Proc_1, Proc_2, Proc_3, and Proc_4.]

Taxonomy of Features in Program Binaries

Headers

Object Code

Symbols

Debugging Information

Relocations

Dynamic Linking Information

Program Transformations

Compiler Optimisation and Recompilation

Program Obfuscation

Plagiarism, Software Theft, and Derivative Works

Malware packing, polymorphism and metamorphism

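Traditional Malware Packing

[Figure: three stages, Original Executable, Packed Executable, and Memory Image at Runtime.]

Packing: Hidden Code = f(Original Code). The Packed Executable holds the Hidden Code and a Restoration Routine.

Runtime: Original Code = g(Hidden Code). The Memory Image at Runtime holds the Original Code and the Remnant Restoration Routine.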
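The relationship between f and g can be illustrated with a toy round trip. This is a sketch under loud assumptions: a single-byte XOR stands in for f and g (real packers typically use compression or encryption), and the byte string, key, and function names are made up for illustration. The point for similarity is that the hidden bytes no longer resemble the original, so a birthmark should be taken after restoration.

```python
def pack(original: bytes, key: int) -> bytes:
    """f: turn the original code into hidden code (toy single-byte XOR)."""
    return bytes(b ^ key for b in original)


def restore(hidden: bytes, key: int) -> bytes:
    """g: the restoration routine, here simply the inverse of f."""
    return bytes(b ^ key for b in hidden)


# Made-up bytes standing in for the original code.
original = b"\x55\x89\xe5\x83\xec\x24\xc3"
key = 0x5A  # assumed key, must be in the range 0..255

hidden = pack(original, key)
restored = restore(hidden, key)

print(hidden != original)    # the packed form differs byte for byte
print(restored == original)  # the runtime image matches the original again
```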

Processing Program Features

Treat features or birthmarks as mathematical objects:

Strings

Vectors

Sets

Sets of Vectors

Trees

Graphs

Software Birthmark Similarity

Strings: edit distance, etc.

Vectors: cosine similarity, Euclidean distance, etc.

Sets: Jaccard distance, etc.

Sets of vectors: minimum matching distance

Trees and graphs: edit distances, etc.
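For reference, several of the measures listed above fit in a few lines of plain Python. This is a minimal sketch, not a tuned implementation; in particular `minimum_matching_distance` is a simplified brute-force variant that assumes both sets have the same size and uses Euclidean distance between vectors, which is an assumption made here for brevity.

```python
import math
from itertools import permutations


def edit_distance(s, t):
    """Levenshtein distance between two strings (or sequences)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute a with b
        prev = curr
    return prev[-1]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def minimum_matching_distance(us, vs):
    """Cheapest one-to-one pairing of two equal-sized lists of vectors.

    Brute force over all pairings, so only suitable for tiny inputs.
    """
    return min(sum(euclidean_distance(u, v) for u, v in zip(us, perm))
               for perm in permutations(vs))


print(edit_distance("kitten", "sitting"))
print(euclidean_distance([0, 0], [3, 4]))
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
```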

Software Indexing and Searching

The nearest neighbour is the program in the database closest to the query.

Closeness is based on a 'distance', a measure of dissimilarity between objects.

Distances that are 'metric' allow more efficient indexing and searching.
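One reason a metric helps is the triangle inequality, which lets an index rule out candidates without computing their distance to the query. The sketch below is a toy pivot-based index written for illustration; the class name `PivotIndex`, the single-pivot design, and the use of absolute difference on numbers as the metric are all assumptions, not the structure used by any particular system.

```python
class PivotIndex:
    """Toy metric index: stores each item's distance to a single pivot."""

    def __init__(self, items, distance):
        # Assumes a non-empty list; the first item serves as the pivot.
        self.distance = distance
        self.pivot = items[0]
        self.entries = [(distance(self.pivot, x), x) for x in items]

    def range_search(self, query, radius):
        """Return (matches, number of distance evaluations performed)."""
        d_q = self.distance(self.pivot, query)
        evaluations = 1
        matches = []
        for d_x, x in self.entries:
            # Triangle inequality: |d(pivot, q) - d(pivot, x)| <= d(q, x).
            # If this lower bound already exceeds the radius, x cannot match.
            if abs(d_q - d_x) > radius:
                continue
            evaluations += 1
            if self.distance(query, x) <= radius:
                matches.append(x)
        return matches, evaluations


# Numbers with absolute difference as a stand-in metric.
index = PivotIndex([0, 10, 20, 30, 40], lambda a, b: abs(a - b))
matches, evaluations = index.range_search(21, radius=2)
print(matches, evaluations)
```

In this small example only two distance evaluations are needed for five stored items; a linear scan would need five.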

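rNN (Range Nearest Neighbour)

[Figure: a range query of radius r around a query q. A database program p is returned when distance(p,q) falls within r. A query with known malware inside its range is labelled malicious; a query with none is labelled benign.]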
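A range query of this kind can be sketched as a linear scan over a database of birthmarks. Everything named here is hypothetical: the sample names, the API-call sets used as birthmarks, the Jaccard distance, and the radius of 0.5 are assumptions chosen only to make the idea concrete.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def range_query(query, database, r):
    """Names of all database entries whose birthmark is within distance r."""
    return sorted(name for name, bm in database.items()
                  if jaccard_distance(query, bm) <= r)


def classify(query, malware_db, r):
    """Label the query malicious if any known malware lies within radius r."""
    neighbours = range_query(query, malware_db, r)
    return ("malicious" if neighbours else "benign"), neighbours


# Made-up birthmarks: sets of API call names for two known malware samples.
malware_db = {
    "sample_a": {"open", "read", "send", "exec"},
    "sample_b": {"connect", "recv", "write"},
}

print(classify({"open", "read", "send", "fork"}, malware_db, r=0.5))
print(classify({"print", "exit"}, malware_db, r=0.5))
```

A linear scan is the simplest possible search; with a metric distance, an index can prune most of these comparisons.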

Wiki on Software Similarity and Classification

Book on Software Similarity and Classification

Simseer – A Software Similarity Web Service

Wiki on Software Similarity and Classification

Reviews of academic papers.

http://www.foocodechu.com/wiki

Book on ‘Software Similarity and Classification’

Academic-style survey of the topic.

Published by Springer.

100 pages.

Available in April.

http://www.springer.com/computer/security+and+cryptology/book/978-1-4471-2908-0

Simseer – A Software Similarity Web Service

An online service to identify similarity between programs.

Performs unpacking.

Renders an evolutionary tree to show program relationships.

Free to use!

http://www.foocodechu.com/?q=simseer-a-software-similarity-web-service

Conclusion

Presented a review of software similarity.

Demonstrated a new web service.

Try it!

http://www.foocodechu.com

Questions?