Post on 15-Jan-2016
Clone Detection by Exploiting Assembler
Ian Davis, Mike Godfrey
University of Waterloo
Ontario, Canada
IWSC May 2010 Clone Detection by Exploiting Assembler
The Original Assembler
.LC107:
        .string "merge "
…
        pushl   $.LC107
        pushl   command_buf+8
.LCFI378:
        call    prefixcmp
        addl    $16,%esp
        testl   %eax,%eax
        jne     .L485
        subl    $8,%esp
        pushl   $32
        pushl   command_buf+8
        call    strchr
        addl    $16,%esp
        incl    %eax
        movl    %eax,-16(%ebp)
        subl    $12,%esp
        pushl   $24
        call    xmalloc
        addl    $16,%esp
        movl    %eax,-8(%ebp)
        subl    $12,%esp
        pushl   -16(%ebp)
        call    lookup_branch
…
.L485
• Identify function boundaries
• Relate assembler back to source
• Remove comments, white space, etc.
• Normalize instruction set if needed
• Convert to relative addressing
• Inline string constants
• Reconstruct parameter names
• Reconstruct local variable names
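Two of the steps above (removing comments and white space, and inlining string constants) can be sketched as follows. This is a minimal illustration, not the authors' tool: the function name `normalize` is hypothetical, and it assumes GCC-style `.LCn:` labels immediately followed by a `.string` directive, `#` comments, and no `#` characters inside instruction operands.

```python
import re

LABEL_RE = re.compile(r"^(\.LC\d+):$")           # string-constant label, e.g. ".LC107:"
STRING_RE = re.compile(r'^\.string\s+(".*")$')   # the literal that follows the label
REF_RE = re.compile(r"\.LC\d+\b")                # a use of such a label in an operand

def normalize(asm_lines):
    """Strip comments/white space and inline .LCn string constants (sketch)."""
    constants = {}   # label -> quoted literal
    body = []        # surviving instruction lines
    pending = None   # .LCn label still waiting for its .string
    for raw in asm_lines:
        line = raw.strip()
        string = STRING_RE.match(line)
        if string and pending is not None:
            constants[pending] = string.group(1)
            pending = None
            continue
        if pending is not None:      # label not followed by .string: keep it
            body.append(pending + ":")
            pending = None
        label = LABEL_RE.match(line)
        if label:
            pending = label.group(1)
            continue
        # Drop a trailing comment, then collapse runs of white space.
        line = " ".join(line.split("#", 1)[0].split())
        if line:
            body.append(line)
    if pending is not None:
        body.append(pending + ":")
    # Second pass: replace each reference with the literal it names.
    return [REF_RE.sub(lambda m: constants.get(m.group(0), m.group(0)), ins)
            for ins in body]
```

Other labels such as `.LCFI378:` are left untouched here; removing them would be a separate step.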
The Annotated Assembler
        pushl   $"merge "
        pushl   command_buf+8
        call    prefixcmp
        addl    $16,%esp
        testl   %eax,%eax
        jne     +124
        subl    $8,%esp
        pushl   $32
        pushl   command_buf+8
        call    strchr
        addl    $16,%esp
        incl    %eax
        movl    %eax,from(%ebp)
        subl    $12,%esp
        pushl   $24
        call    xmalloc
        addl    $16,%esp
        movl    %eax,n(%ebp)
        subl    $12,%esp
        pushl   from(%ebp)
        call    lookup_branch
• Identify function boundaries
• Relate assembler to source
• Remove comments, white space, etc.
• Normalize instruction set if needed
• Convert to relative addressing
• Inline string constants
• Reconstruct parameter names
• Reconstruct local variable names
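The relative-addressing and name-reconstruction steps might look like the sketch below. Several things here are assumptions rather than facts from the slides: the function name `annotate` is hypothetical, branch distances are counted in instructions (the slides do not say what unit `+124` uses), and the mapping from frame offsets to variable names is passed in by hand, whereas a real tool would have to recover it, for example from debug information.

```python
import re

BRANCH_RE = re.compile(r"^(j\w+)\s+(\.L\w+)$")     # e.g. "jne .L485"
LABEL_RE = re.compile(r"^(\.L\w+):$")              # e.g. ".L485:"
FRAME_RE = re.compile(r"(-?\d+)\s*\(%ebp\)")       # e.g. "-16(%ebp)"

def annotate(lines, frame_names):
    """Rewrite label branches as relative offsets and name frame slots (sketch).

    frame_names maps an %ebp offset (int) to a variable name; it is supplied
    by the caller in this illustration.
    """
    # First pass: drop labels, remembering which instruction each one precedes.
    instructions, label_at = [], {}
    for line in lines:
        label = LABEL_RE.match(line)
        if label:
            label_at[label.group(1)] = len(instructions)
        else:
            instructions.append(line)

    def name_slot(match):
        offset = match.group(1)
        return f"{frame_names.get(int(offset), offset)}(%ebp)"

    # Second pass: relative branch targets and named frame slots.
    out = []
    for index, ins in enumerate(instructions):
        branch = BRANCH_RE.match(ins)
        if branch and branch.group(2) in label_at:
            distance = label_at[branch.group(2)] - index
            ins = f"{branch.group(1)} {distance:+d}"
        out.append(FRAME_RE.sub(name_slot, ins))
    return out
```

Because absolute labels and raw offsets disappear, two copies of the same code placed at different points in a file produce identical instruction text, which is what the matching stage needs.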
The Matching Algorithm
• Scan entire source once
• Use hashing to find first pairing
• Ignore pairings in identified clones
• Don’t cross function boundaries
• Terminate a clone before its later copy starts in the same function
• Weight matches (+) and mismatches (-)
• Special logic for matching branches
• Advance greedily while weight ≥ 0
• Then employ hill climbing
• Continue while improvement possible
• Accept if clones satisfy minimum length
• Alternative minimum for matching functions
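A much-simplified sketch of the seeding and greedy steps listed above follows. It is not the authors' algorithm: the weights, the names `extend` and `find_clones`, and the minimum length are assumptions, and the special branch logic, hill climbing, and the alternative minimum for whole functions are left out. Input is a list of functions, each a list of normalized instruction strings.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -2   # assumed weights, not taken from the slides

def extend(seq, i, j):
    """Length of the best-scoring run of pairings starting at (i, j), i < j."""
    start_j = j
    weight = best_weight = best_len = 0
    n = 0
    # The earlier copy must end before the later copy starts (no overlap).
    while j + n < len(seq) and i + n < start_j:
        (fa, a), (fb, b) = seq[i + n], seq[j + n]
        if fa != seq[i][0] or fb != seq[j][0]:   # don't cross function boundaries
            break
        weight += MATCH if a == b else MISMATCH
        if weight < 0:                           # advance only while weight >= 0
            break
        n += 1
        if a == b and weight > best_weight:      # remember the best end point
            best_weight, best_len = weight, n
    return best_len

def find_clones(functions, min_len=4):
    """Return (first start, second start, length) triples (sketch)."""
    # Flatten to (function index, instruction) so boundaries stay visible.
    seq = [(f, ins) for f, body in enumerate(functions) for ins in body]
    seen = defaultdict(list)   # instruction text -> earlier positions (the hash)
    covered = set()            # pairings already inside a reported clone
    clones = []
    for j, (_, ins) in enumerate(seq):           # single scan over the input
        for i in seen[ins]:
            if (i, j) in covered:                # ignore pairings in known clones
                continue
            length = extend(seq, i, j)
            if length >= min_len:
                clones.append((i, j, length))
                covered.update((i + k, j + k) for k in range(length))
        seen[ins].append(j)
    return clones
```

Replacing the two constants with a per-opcode table would be one way to weight different instructions differently.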
Source Clone 1
from = strchr(command_buf.buf, ' ') + 1;
n = xmalloc(sizeof(*n));
s = lookup_branch(from);
if (s)
        hashcpy(n->sha1, s->sha1);
else if (*from == ':') {
        uintmax_t idnum = strtoumax(from + 1, NULL, 10);
        struct object_entry *oe = find_mark(idnum);
        if (oe->type != OBJ_COMMIT)
                die("Mark :%" PRIuMAX " not a commit", idnum);
        hashcpy(n->sha1, oe->sha1);
} else if (!get_sha1(from, n->sha1)) {
        unsigned long size;
        char *buf = read_object_with_reference(n->sha1, commit_type, &size, n->sha1);
        if (!buf || size < 46)
                die("Not a valid commit: %s", from);
        free(buf);
} else
        die("Invalid ref name or SHA1 expression: %s", from);
Source Clone 2
from = strchr(command_buf.buf, ' ') + 1;
s = lookup_branch(from);
if (s)
        hashcpy(sha1, s->sha1);
else if (*from == ':') {
        struct object_entry *oe;
        from_mark = strtoumax(from + 1, NULL, 10);
        oe = find_mark(from_mark);
        if (oe->type != OBJ_COMMIT)
                die("Mark :%" PRIuMAX " not a commit", from_mark);
        hashcpy(sha1, oe->sha1);
} else if (!get_sha1(from, sha1)) {
        unsigned long size;
        char *buf;
        buf = read_object_with_reference(sha1, commit_type, &size, sha1);
        if (!buf || size < 46)
                die("Not a valid commit: %s", from);
        free(buf);
} else
        die("Invalid ref name or SHA1 expression: %s", from);
Benefits and Conclusions
Assembler easy to derive from source / object / executable
Complements other clone detection approaches
Compiler performs useful normalization of source for free
The analysis is semantic, not syntactic
• By function (forbidding overlapped clone pairs)
• Can handle branching sensibly
• Case statements easier to handle
• Can weight different assembler instructions differently
• Can reason about assembler when performing detection
Thank You