NLP-EYE: Detecting Memory Corruptions via Semantic-Aware Memory Operation Function Identification

Jianqiang [email protected]
Shanghai Jiao Tong University

Siqi Ma
[email protected] DATA61

Yuanyuan Zhang
[email protected] Jiao Tong University

Juanru Li
[email protected] Jiao Tong University

Zheyu [email protected]
Northwestern Polytechnical University

Long [email protected]

Tiancheng [email protected]

Dawu Gu [email protected]
Shanghai Jiao Tong University
Abstract

Memory corruption vulnerabilities are serious threats to software security and are often triggered by improper use of memory operation functions. Detecting memory corruptions relies on identifying memory operation functions and examining how they manipulate memory. Distinguishing memory operation functions is challenging because they usually come in various forms in real-world software. In this paper, we propose NLP-EYE, an NLP-based memory corruption detection system. NLP-EYE automatically identifies memory operation functions through a semantic-aware source code analysis. It first creates a programming-language-friendly corpus in order to parse function prototypes. Based on a similarity comparison that uses both semantic and syntactic information, NLP-EYE identifies and labels both standard and customized memory operation functions. Finally, it uses symbolic execution to check whether a memory operation causes incorrect memory usage.

Instead of analyzing data dependencies of the entire source code, NLP-EYE focuses only on the memory operation parts. We evaluated NLP-EYE on seven real-world libraries and programs, including Vim, Git and CPython. NLP-EYE successfully identified 27 null pointer de-references, two double-frees and three use-after-frees that had not been discovered before in the latest versions of the analysis targets.

1 Introduction

Memory-unsafe programming languages, such as C and C++, provide memory operation functions in the standard library (e.g., malloc and free) for manipulating memory. During development, developers may implement dynamic memory operation functions with their own memory management policies to achieve higher performance, or wrap the standard memory operation functions with additional operations to serve other purposes (e.g., printing debugging information).

Mistakes made by misusing memory operations lead to well-known memory corruption vulnerabilities, such as buffer overflow and double-free, in real-world software, and their number is steadily increasing. Among customized memory operation functions, some private ones are poorly implemented and thus carry memory vulnerabilities from birth. In addition, developers keep making common mistakes during development, such as using memory after it has been released (i.e., the use-after-free vulnerability). Both cases aggravate the emergence of memory corruption vulnerabilities, which give attackers a higher chance of compromising a computer system. A recent report from Microsoft showed that around 70 percent of the vulnerabilities in their products are memory safety issues [14].

To identify memory corruptions, various analysis methods based on different techniques have been proposed. For instance, code similarity detection and information flow analysis have been proposed to identify memory safety issues in source code [29] [44] [41]. Tools such as AddressSanitizer [40] and Dr. Memory [22] can also detect memory corruptions in binary code through instrumentation. These analyses need to abstract the usage of memory and then extract patterns that are related to memory corruption. Otherwise, analyzing a program with millions of lines of code is inefficient and error-prone.

Customized memory operations do nothing to reduce the chance of memory corruption; moreover, they make memory corruption analysis much more difficult. Previous works, such as CRED [44], Pinpoint [41] and Dr. Memory [22], only consider the memory operation functions defined in the standard library. They are unable to identify customized memory operation functions and thus disregard vulnerabilities caused by them. Such functions can be identified and labeled manually, but doing so is exhausting and time-consuming.

To address the above problems, we propose NLP-EYE, a source code-based security analysis system that adopts natural language processing (NLP) to detect memory corruptions. NLP-EYE parses only the function prototypes instead of analyzing the implementations of the functions. It then applies symbolic execution to check whether the corresponding memory usages are correct. Unlike other tools [1], the accuracy of NLP-EYE in memory operation function identification reduces the time cost, because only partial code snippets are analyzed, and leads to better detection performance.

USENIX Association 22nd International Symposium on Research in Attacks, Intrusions and Defenses 309

NLP-EYE reports typical memory corruption vulnerabilities, i.e., null pointer de-reference, double-free and use-after-free, in seven open source software projects, such as Vim and Git. NLP-EYE found 49 previously unknown vulnerabilities in their latest versions. For source code with more than 60 thousand function prototypes, NLP-EYE is able to parse ten thousand functions per minute and finish the memory operation checking within an hour.

Contributions. Major contributions of this paper include:

• We propose a source code-based analysis system that detects vulnerabilities by analyzing only a small part of each function, i.e., its prototype and comments, rather than its implementation. Since this information is usually available, the system helps analysts and developers build secure software with limited details.

• We implemented a vulnerability detection tool, NLP-EYE, that discovers memory corruption vulnerabilities effectively and efficiently. By combining NLP and symbolic execution, NLP-EYE labels both standard and customized memory operation functions and records the states of the corresponding memory regions.

• We analyzed the latest versions of seven libraries and programs with NLP-EYE and identified 49 unknown memory corruption vulnerabilities, 32 of which are caused by customized memory operation functions. This demonstrates that the semantic-aware identification of NLP-EYE helps find new vulnerabilities that were previously unseen.

Structure. The rest of the paper is organized as follows: Section 2 lists the challenges of identifying memory corruptions caused by customized memory operation functions and provides corresponding insights to solve these challenges. Section 3 details the design of NLP-EYE. In Section 4, we report new vulnerabilities found by NLP-EYE and present the experiment results, covering both vulnerability detection accuracy and a performance comparison with other tools. Section 5 discusses related work. We conclude this paper in Section 6.

2 Background

We give a concrete example of a memory corruption vulnerability in Figure 1. We then point out some challenges that hinder the detection of such vulnerabilities and give corresponding insights to address those challenges.

2.1 Running Example

Detecting a memory corruption vulnerability (e.g., use-after-free) requires three significant steps: 1) identifying memory operation functions and labeling dynamically allocated memory regions; 2) tracing the allocated memory regions to understand how they are operated on; and 3) detecting incorrect operations on allocated memory regions. However, existing vulnerability detection techniques barely consider customized memory operation functions and thus fail to detect vulnerabilities triggered by them.

 1  //functions are provided by TTL module to operate dynamic memory
 2  void TTLreleaseMem2Pool(Pool *pool, MemRegion p)
 3  {
 4      return pool->destroy_func(p);
 5  }
 6  MemRegion TTLretrieveMemFromPool(Pool *pool, size_t len)
 7  {
 8      return pool->alloc_func(len);
 9  }
10
11  //memory pool used to provide dynamic memory region manipulation
12  extern Pool globalPool;
13
14  int main(int argc, char **argv, char **env)
15  {
16      char content[100];
17      scanf("%s",content);
18      char* buf = (char*)TTLretrieveMemFromPool(&globalPool,1000);
19      int ret = processContent(content,buf);
20      if(!ret)
21      {
22          err("error occurs during process content!");
23          TTLreleaseMem2Pool(&globalPool,(MemRegion)buf);
24          goto clean;
25      }
26      ...
27  clean:
28      TTLreleaseMem2Pool(&globalPool,(MemRegion)buf);
29  }

Figure 1: Double-free vulnerability caused by the customized memory operation functions

The customized memory operation functions cause the memory corruption vulnerability in Figure 1. Instead of using the standard memory operation functions provided by the C standard library, the functions TTLretrieveMemFromPool and TTLreleaseMem2Pool are used to allocate dynamic memory (Line 18) and release the allocated memory (Line 23), respectively. During execution, TTLreleaseMem2Pool releases the memory if the function processContent returns a null (zero) value (Line 20); then, a duplicate release (Line 28) causes a double-free vulnerability. This double-free vulnerability cannot be detected by simply analyzing standard memory operation procedures, because it involves customized memory operation functions (i.e., TTLreleaseMem2Pool and TTLretrieveMemFromPool).¹

Generally, to decide whether a function is a memory operation function, we can observe whether it calls memory operation functions of the C standard library, or compare its similarity with the implementations of other memory operation functions. In either case, the function implementation is required, which is usually not available. For example, the declared memory operation function alloc_func() (Line 8) might be implemented externally, with only its binary available. Under such circumstances, the semantic information in a function prototype (i.e., function declaration) becomes the only reference for memory operation function identification.

¹ We applied typical tools such as Cppcheck [2] and VisualCodeGrepper [18] to this sample and found that none of them could detect the vulnerability.

TTLMem.h
  comments: //functions are provided by TTL module to operate dynamic memory
  function prototype: void TTLreleaseMem2Pool ( Pool * pool, MemRegion P )
    function type: void
    function name: TTLreleaseMem2Pool
    argument type: Pool *, argument name: pool
    argument type: MemRegion, argument name: P

Figure 2: A function prototype with comments

As Figure 2 depicts, a function prototype consists of a function type, a function name, argument types, and (optionally) argument names. When defining a function prototype, developers prefer to use a meaningful function name and proper data types. In addition, developers may add comments to describe the function in more detail.

In most cases, function prototypes and comments help us determine the semantics without knowing the function implementations. Therefore, we can analyze prototype structures to retrieve the meanings of the words they contain.

2.2 Challenges

Most challenges lie in understanding function semantics and identifying memory operation functions accurately.

Challenge I: Irregular Representations. Searching for specific words in the source code is a common strategy to identify functions, such as locating the keyword memory to identify memory operation functions. Because plenty of abbreviations and informal terms are used in function prototypes, it is difficult to extract the semantic information effectively with a keyword-based search alone.

Consider the function prototype TTLreleaseMem2Pool in Figure 1. The abbreviation Mem2 informally represents "memory to". The abbreviation Mem cannot be located by searching for the word memory, and the number 2 makes the semantics of the phrase even harder to understand.

Challenge II: Ambiguous Word Explanations. The context that can be collected from function prototypes is insufficient, which makes semantics extraction more challenging. Although some function prototypes may use the same word, the actual function semantics can differ because of their various naming formats.

Consider two function names, PyObject_Malloc and _PyObject_DebugMallocStats, in the source code of CPython². The former allocates dynamic memory, while the latter outputs debugging information about the memory allocator.

² CPython is the reference implementation of Python.

Even though lexical analysis with a specific dictionary can help split the word malloc from those two function names, the corresponding function semantics cannot be inferred precisely. In the function PyObject_Malloc, the word malloc does represent memory allocation; in the function _PyObject_DebugMallocStats, however, malloc qualifies the object, Stats, to indicate the status of the memory allocator. Therefore, we need to analyze not only the meaning of a word but also its position and role within the function prototype.

Challenge III: Diverse Type Declaration. The diversity of data type declarations in C/C++ programming makes it harder to compare two function prototypes. For instance, both short and unsigned short int are used to represent the Integer type. Besides, C/C++ provides a type re-definition feature (i.e., typedef) with which programmers can shorten the name of a complex type.

2.3 Insights

Fortunately, function prototypes are constructed by following certain formats in programming. We utilize these formats to extract the semantic information.

Adaptive Lexical Analysis. Irregular representations make function prototype segmentation harder. A natural language corpus is not suitable for word segmentation in computer programming. Thus, we construct an adaptive corpus to address the lexical analysis problem in Challenge I. The corpus consists of natural language used in computer science, common keywords in programming (e.g., proc, ttl) and comments in the source code. Common keywords reveal the words that are often used in programming, and comments in the source code suggest some semantic information about a function.

Grammar-free Comparison. By examining function prototypes manually, we observe that developers do not usually follow English grammar when naming a function. However, they still use similar words (e.g., get, acquire, alloc) in a similar order, such as AcquireVirtualMemory and getMemfromPool. We therefore propose a grammar-free analysis, which performs an NLP-based comparison, to solve Challenge II.

To identify the semantic information of each function prototype, we create a set of reference functions (e.g., standard memory operation functions) whose semantics are known. Then, we compare the function name and argument names of each function prototype with the corresponding names in the reference functions. If the similarity between a function prototype and a reference function is higher than a threshold, we label this function prototype as a potential memory operation function and proceed with the type comparison to confirm it.

Various Types Clustering. NLP-based comparison only helps decide whether a function prototype is a potential memory operation function. We design a type comparison scheme to handle its declared return type and argument types. Because of the diversity of function types, we first normalize type aliases (e.g., types defined by typedef) by replacing them with their original forms, which solves Challenge III. Given the pair of a function prototype and its matched reference function, we then compare their return types and argument types. We regard a function prototype as a memory operation function if both names and types match.

[Figure 3 diagram: Source Code → Preprocessing (1 Feature Extraction, 2 Corpus Generation; outputs Function Prototypes and Adaptive Corpus) → Semantics Extraction (3 Function Prototype Segmentation, 4 Function Prototype Matching of Func1, Func2, Func3 against Ref1, Ref2, ...; outputs Function Matching List) → Vulnerability Detection (Function Labeling, Symbolic Execution, Function Misuses) → Report]

Figure 3: System overview of NLP-EYE

3 Design of NLP-EYE

We propose NLP-EYE, a source code analysis system that utilizes NLP to retrofit the process of memory corruption vulnerability detection. NLP-EYE has three phases: preprocessing, semantics extraction, and vulnerability detection. Figure 3 gives an overview of NLP-EYE. It takes source code files, i.e., the analysis target, as inputs. The preprocessing phase extracts function prototypes and comments to generate an adaptive corpus. The semantics extraction phase uses the adaptive corpus to build a matching list by collecting all the possible memory operation functions in the analysis target. The vulnerability detection phase labels memory operation functions in the target and feeds them to symbolic execution to facilitate vulnerability detection. We introduce the working details of each phase below.

3.1 Preprocessing

NLP-EYE takes a batch of source code as input and generates function prototypes and an adaptive corpus for adaptive lexical analysis. First, NLP-EYE extracts function prototypes and comments from the source code. Then, it combines the comments with two other corpora to construct the adaptive corpus. Details are presented below.

3.1.1 Feature Extraction

The feature extraction component of NLP-EYE is built on top of a Clang Static Analyzer plugin [1], which provides an interface for users to scan the declaration of each function. Given the source code, NLP-EYE uses this plugin to extract all function prototypes in the format "Type@Name", including those of functions that are imported from other libraries. To collect comments from the source code, NLP-EYE uses regular expressions that match the comment symbols of the C language.
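As an illustration of regular-expression-based comment extraction, the following minimal sketch collects C comments from a source string. The pattern and the helper name extract_comments are our own assumptions and not the expressions used by NLP-EYE; in particular, comment markers that appear inside string literals are not handled.

```python
import re

# Naive pattern for C comments: "// ..." up to the end of the line,
# or "/* ... */" possibly spanning several lines.
# Assumption: comment markers inside string literals are not handled.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def extract_comments(source):
    """Return the text of every comment, with markers and extra whitespace removed."""
    comments = []
    for match in COMMENT_RE.finditer(source):
        text = match.group(0)
        # Strip "//" from line comments and "/*" ... "*/" from block comments.
        text = text[2:] if text.startswith("//") else text[2:-2]
        comments.append(" ".join(text.split()))
    return comments


sample = """
//functions are provided by TTL module to operate dynamic memory
void TTLreleaseMem2Pool(Pool *pool, MemRegion p);
/* memory pool used to provide
   dynamic memory region manipulation */
extern Pool globalPool;
"""
comments = extract_comments(sample)
```

A production extractor would more likely rely on a real lexer so that string literals and preprocessor directives are treated correctly.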

3.1.2 Corpus Generation

After collecting those comments, NLP-EYE constructs an adaptive corpus for adaptive lexical analysis. The adaptive corpus consists of three parts: the Google Web Trillion Word Corpus (GWTWC) [7], MSDN library API names [21] [20], and comments from the source code.

The GWTWC is a popular corpus created by Google, containing more than one trillion words extracted from public web pages. It can be applied to identify common words used in natural languages. With the help of MSDN library API names and comments, NLP-EYE can also process programming languages. The MSDN library provides normalized API names in camel case. Therefore, it is easy to divide each function name into words and abbreviations at capital letters. For example, the function name GetProcAddress can be divided into ["Get", "Proc", "Address"]. When processing comments from the source code, NLP-EYE first filters out symbol characters (e.g., #%!) and then splits the text by applying regular expressions. Numbers and words that appear in the GWTWC are excluded.
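The two splitting steps described above can be sketched as follows. This is a minimal illustration under our own assumptions: the regular expressions, the function names split_camel_case and comment_terms, and the tiny common_words set (a stand-in for the GWTWC vocabulary) are hypothetical and not taken from NLP-EYE.

```python
import re


def split_camel_case(name):
    """Split a camel-case API name at capital letters."""
    return re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+|\d+", name)


def comment_terms(comment, common_words):
    """Keep programming-specific terms from a comment.

    Symbol characters and numbers are dropped by only matching letters;
    words already present in the natural-language corpus are excluded.
    """
    words = re.findall(r"[A-Za-z]+", comment)
    return [w.lower() for w in words if w.lower() not in common_words]


# Hypothetical stand-in for the natural-language vocabulary.
common_words = {"functions", "are", "provided", "by", "module",
                "to", "operate", "dynamic", "memory"}

api_words = split_camel_case("GetProcAddress")
new_terms = comment_terms(
    "//functions are provided by TTL module to operate dynamic memory",
    common_words,
)
```

In this toy run the only term contributed by the comment is ttl, which matches the intuition that comments supply project-specific vocabulary missing from a natural language corpus.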

Since abbreviations are commonly used in programming, we set the appearance frequency of MSDN API words to be higher than that of words from comments, to give them a higher priority. We further assume that a word that is a substring of another word in MSDN API names should have a lower frequency than its parent word. For example, arm in the function name mallocWithAlarm is a substring of Alarm; Alarm should obviously be treated as a whole, so we assign a lower frequency to arm than to Alarm.

3.2 Semantics Extraction

NLP-EYE compares each function prototype with a set of reference functions (e.g., malloc, free) and generates a function matching list. When a match is found in the function matching list, we can infer that the function prototype has semantics similar to those of the matched reference function.

NLP-EYE processes the data types separately from the function name and argument names, and identifies memory operation functions in two steps. First, it divides the function name and argument names into a series of words. Next, it performs an NLP-based comparison to select the function prototypes that potentially have memory operation functionality, and confirms the results by applying a type comparison.

3.2.1 Function Prototype Segmentation

To perform function prototype segmentation, NLP-EYE applies Segaran et al.'s word segmentation algorithm [39] to select the segmentation list with the highest list frequency.

Given a function prototype (FP) with $n$ letters, NLP-EYE first enumerates the $2^{n-1}$ possible ways of splitting these letters and constructs $2^{n-1}$ segmentation lists. Each segmentation list preserves the original order in which the letters appear in the FP. NLP-EYE then computes the list frequency of each segmentation list. It looks up each word ($w_i$) of a segmentation list ($SL$) in the adaptive corpus and returns the following list frequency ($LF$):

$$LF = \prod_{i=1}^{|SL|} freq(w_i)$$

where $|SL|$ is the number of words contained in $SL$, and $freq(w_i)$ is the frequency of $w_i$ in the adaptive corpus. Finally, NLP-EYE takes the segmentation list with the highest list frequency as the segmentation result.
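The selection of the highest-scoring segmentation list can be sketched as below. This is a minimal illustration, not the WordSegment implementation: it uses relative frequencies, a length-based penalty for unseen words, and a memoized search that returns the same maximum as scoring all $2^{n-1}$ lists explicitly. The function name segment, the penalty scheme and the toy frequency tables are our assumptions.

```python
from functools import lru_cache


def segment(text, freq, unknown=1e-9):
    """Return the segmentation of `text` whose product of word scores is highest."""
    total = sum(freq.values())

    def score(word):
        # Relative corpus frequency; unseen words get a penalty that
        # shrinks with their length (an assumption of this sketch).
        if word in freq:
            return freq[word] / total
        return unknown / 10 ** len(word)

    @lru_cache(maxsize=None)
    def best(rest):
        # Best (list score, words) for the remaining suffix.
        if not rest:
            return (1.0, ())
        candidates = []
        for i in range(1, len(rest) + 1):
            head, tail = rest[:i], rest[i:]
            tail_score, tail_words = best(tail)
            candidates.append((score(head) * tail_score, (head,) + tail_words))
        return max(candidates)

    return list(best(text.lower())[1])


# Hypothetical toy frequencies standing in for the adaptive corpus.
toy_freq = {"get": 50, "mem": 30, "from": 40, "pool": 20, "me": 5, "m": 1, "poo": 1}
words = segment("getMemfromPool", toy_freq)

# "arm" is given a lower frequency than its parent word "alarm".
alarm_freq = {"malloc": 30, "with": 40, "alarm": 10, "arm": 2, "al": 1}
alarm_words = segment("mallocWithAlarm", alarm_freq)
```

With these toy tables, the second call keeps alarm whole instead of splitting it into al and arm, which is the effect the frequency adjustment in Section 3.1.2 aims for.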

3.2.2 Function Prototype Matching

Due to the diversity of type declarations, NLP-EYE processes names (i.e., function names and argument names) and types (i.e., return types and argument types) separately. It performs an NLP-based comparison to identify the names that are related to memory operation functionality. NLP-EYE then applies a type comparison to determine memory operation functions and generates a function matching list.

NLP-based Comparison. Natural language processing (NLP) has been widely used to identify the connection between two words for semantic similarity matching. To measure word similarity, a context corpus is required to extract the taxonomy information. Words in the context corpus are then represented as vectors by a Word2vec [34] model. The cosine similarity between the vectors of two words is positively related to their semantic similarity: a higher cosine similarity represents a higher similarity between the two words.

To extract the semantic meaning of an unknown name, we manually generate a set of reference functions, which contains the standard memory operation functions provided by C/C++ and other known memory operation functions. Given those reference functions, NLP-EYE compares the name of an unknown function with the names of the reference functions and calculates their similarity scores. If a similarity score is higher than a threshold, NLP-EYE labels the unknown function as similar to the reference function, that is, as a potential memory operation function.

We handle function names and argument names individually, since their comparison results may interfere with each other in the NLP-based comparison. Consider a function whose name uses only abbreviations but whose argument names use complete words: its overall similarity score may not reach the threshold. Even though the similarity score of the argument names is high, the total similarity score is dragged down by the low similarity of the function name. Therefore, we set separate similarity thresholds, fn-similarity and arg-similarity, for function names and argument names, respectively. NLP-EYE labels the function only when both fn-similarity and arg-similarity are satisfied. For function arguments, NLP-EYE compares each argument of the reference function with every argument of the target function and generates a similarity score for each pair. NLP-EYE then chooses the most similar target argument as the corresponding argument, regardless of the number of arguments.
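The two-threshold decision can be sketched as follows. This is only an illustration under several assumptions of ours: the two-dimensional toy vectors stand in for a trained Word2vec model, a multi-word name is represented by the mean of its known word vectors (the paper does not specify how multi-word names are combined), and the names is_candidate, fn_threshold and arg_threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(words, emb):
    """Average the vectors of the words that the embedding knows."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def name_similarity(words_a, words_b, emb):
    a, b = mean_vector(words_a, emb), mean_vector(words_b, emb)
    return cosine(a, b) if a and b else 0.0


def is_candidate(target, reference, emb, fn_threshold, arg_threshold):
    """Label `target` only if both the name and the argument thresholds are met."""
    if name_similarity(target["name"], reference["name"], emb) <= fn_threshold:
        return False
    for ref_arg in reference["args"]:
        # Pick the most similar target argument, whatever the argument count.
        best = max((name_similarity(ref_arg, t_arg, emb) for t_arg in target["args"]),
                   default=0.0)
        if best <= arg_threshold:
            return False
    return True


# Hypothetical toy embedding; real vectors would come from a trained model.
emb = {"malloc": [1.0, 0.0], "retrieve": [0.9, 0.1], "mem": [0.8, 0.2],
       "size": [0.0, 1.0], "len": [0.1, 0.9],
       "print": [-1.0, 0.2], "msg": [-0.5, -0.5]}

malloc_ref = {"name": ["malloc"], "args": [["size"]]}
retrieve_fn = {"name": ["ttl", "retrieve", "mem", "from", "pool"],
               "args": [["pool"], ["len"]]}
print_fn = {"name": ["print", "msg"], "args": [["msg"]]}

retrieve_ok = is_candidate(retrieve_fn, malloc_ref, emb, 0.8, 0.8)
print_ok = is_candidate(print_fn, malloc_ref, emb, 0.8, 0.8)
```

Keeping the two scores separate means that a strong argument match cannot compensate for a function name that falls below its own threshold, and vice versa.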

Type Comparison. Given the potential memory operation functions and their matched reference functions, NLP-EYE compares their data types correspondingly. We use Clang Static Analyzer to classify the data types into several categories to address type diversity.

First, NLP-EYE normalizes data types. Some data types are re-defined as aliases by typedef, so NLP-EYE replaces those aliases with the original data types. Second, we define some coarse-grained categories based on the basic data types in C programming. Finally, NLP-EYE assigns each data type to its category. For example, unsigned int and signed short are assigned to the category Integer, while void * and char * belong to the category Pointer.

We compare the return type and the corresponding argument types of the potential memory operation function with the data types of the matched reference function. If their types are assigned to the same categories, the unknown function is a memory operation function, and it is assumed to have the same semantics as the corresponding reference function. Each pair of a function prototype and its matched reference function is inserted into the function matching list.
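The normalization and categorization steps can be illustrated with a small string-based sketch. NLP-EYE obtains type information from Clang Static Analyzer; the version below only mimics the idea, and the alias table (including the definition of MemRegion as void *), the category names beyond Integer and Pointer, and the function names are our assumptions.

```python
QUALIFIERS = {"const", "volatile", "signed", "unsigned"}
INTEGER_WORDS = {"char", "short", "int", "long"}
FLOAT_WORDS = {"float", "double"}


def resolve_alias(type_name, aliases):
    """Follow typedef aliases until an original type is reached."""
    seen = set()
    while type_name in aliases and type_name not in seen:
        seen.add(type_name)
        type_name = aliases[type_name]
    return type_name


def category(type_name, aliases=None):
    """Map a C type spelling to a coarse-grained category."""
    resolved = resolve_alias(type_name.strip(), aliases or {})
    if "*" in resolved:
        return "Pointer"
    words = [w for w in resolved.split() if w not in QUALIFIERS]
    if not words:  # e.g. plain "unsigned" means unsigned int
        return "Integer"
    if any(w in FLOAT_WORDS for w in words):
        return "Float"
    if all(w in INTEGER_WORDS for w in words):
        return "Integer"
    if words == ["void"]:
        return "Void"
    return "Other"


# Hypothetical typedef table for the running example.
aliases = {"MemRegion": "void *", "size_t": "unsigned long"}

same_return_category = category("MemRegion", aliases) == category("void *")
```

Under these assumptions the return type of TTLretrieveMemFromPool falls into the same category as the return type of malloc, so the type comparison would confirm the NLP-based match.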


3.3 Vulnerability Detection

NLP-EYE creates a vulnerability report for each analysis target by comparing the usages of memory operation functions with pre-defined function misuses. NLP-EYE first labels memory operation functions in the source code and then checks whether any function misuse exists.

3.3.1 Function Labeling

NLP-EYE takes the function matching list as input to identify memory operation functions. It compares the functions in the source code with the functions in the function matching list. If a function appears in the function matching list, NLP-EYE labels it as a memory operation function.

3.3.2 Symbolic Execution

Code that can be compiled independently is regarded as a unit. NLP-EYE first generates the call graph of each unit and then executes the units one by one, from top to bottom, using symbolic execution.

The output of semantics extraction is a function matching list, which maps the standard memory operation functions to their corresponding customized memory operation functions. Given this function matching list, NLP-EYE instruments stubs in advance before memory operation function calls and memory access points, in order to record and update memory region states. NLP-EYE identifies memory operation function calls by simply comparing the name of the called function with the function names in the function matching list. The stubs are extra code snippets that are executed before the symbolic execution engine evaluates the instrumented statements. We manually compiled a coarse function misuse list that contains general function misuses, such as "a memory region cannot be released more than once" and "a memory region cannot be accessed after being released". Given this list, once symbolic execution reaches a memory access point or a call site of a memory operation function, NLP-EYE executes the instrumented stub and checks for misuses. If it encounters a misuse, NLP-EYE reports the misuse as a vulnerability. Otherwise, the corresponding memory state is updated (i.e., to allocated or released) according to the memory operation function call. For instance, for the source code in Figure 1, NLP-EYE instruments stubs before Lines 18, 23 and 28, since the called functions are identified as memory operation functions. During symbolic execution, NLP-EYE records that a memory region is allocated in Line 18 and released in Line 23. When symbolic execution reaches Line 28, it recognizes that a memory region (i.e., buf) is about to be released twice, which is one of the given function misuses; therefore, NLP-EYE reports a double-free vulnerability.
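The state tracking performed by the stubs can be illustrated on a single execution path. The sketch below is not symbolic execution and does not use the Clang Static Analyzer API: it replays a hand-written trace of (line, function, region) events for the error path of Figure 1. The trace format, the "access" event name and the function check_trace are our assumptions.

```python
# Function matching list for the running example: customized -> standard.
MATCHING = {
    "TTLretrieveMemFromPool": "malloc",
    "TTLreleaseMem2Pool": "free",
}


def check_trace(trace, matching):
    """Replay one path and report misuses of tracked memory regions."""
    state = {}     # region -> "allocated" or "released"
    reports = []
    for line, func, region in trace:
        role = matching.get(func, func)
        if role == "malloc":
            state[region] = "allocated"
        elif role == "free":
            if state.get(region) == "released":
                reports.append((line, "double-free", region))
            state[region] = "released"
        elif role == "access":
            if state.get(region) == "released":
                reports.append((line, "use-after-free", region))
    return reports


# Error path of Figure 1: allocation, release in the error branch,
# then the release after the "clean" label.
error_path = [
    (18, "TTLretrieveMemFromPool", "buf"),
    (23, "TTLreleaseMem2Pool", "buf"),
    (28, "TTLreleaseMem2Pool", "buf"),
]
reports = check_trace(error_path, MATCHING)
```

Without the matching list the two customized functions would be ignored and the trace would produce no report, which is why the identification step matters for detection.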

Library/program              Lines of code   # of functions   # of memory operation functions
Vim-8.1 [17]                       468,133           16,012                                73
ImageMagick-7.0.8-15 [9]           514,472           14,636                                79
CPython-3.8.0a0 [3]                556,950           12,000                                66
Git-2.21.0 [4]                     289,532            8,788                                32
GraphicsMagick-1.3.31 [8]          369,569            7,406                                29
GnuTLS-3.6.5 [6]                   488,654            5,433                                11
LibTIFF-4.0.10 [11]                 85,791            1,326                                 4
Total                            2,773,101           65,601                               294

Table 1: Lines of code, number of functions and number of memory operation functions collected from each library/program.

4 Evaluation

In this section, we report the results of four experiments. The first experiment assesses the performance of function prototype segmentation. The second evaluates the accuracy of NLP-EYE in identifying memory operation functions and whether the context corpus has any impact on the identification accuracy. The third experiment looks into the vulnerability detection ability of NLP-EYE, and the last experiment discusses its runtime performance.

4.1 Experiment Setup

Dataset. We collected the latest versions, as of December 2018, of seven popular open source libraries and programs written in C/C++, with a total of 65,601 functions (see Table 1 for more details).

Due to the lack of openly available labeled memory operation functions, we created our own benchmarks. To identify memory operation functions, we asked a team of annotators (three programmers), all with more than seven years of programming experience in C/C++, to examine the implementations of the functions. We first required team members to label memory operation functions independently, and then all members checked the results together. If a function received different labels, the team members discussed it until they agreed on a label before it was included in the dataset. Through this procedure, we found 294 memory operation functions in total.

Implementation. We evaluated NLP-EYE on an Ubuntu 16.04 x64 workstation with an Intel Core i7-6700 CPU (four cores, 3.40 GHz) and 64 GB RAM. For function prototype segmentation, we used NLTK [32], a natural language processing toolkit, to create the adaptive corpus, and the WordSegment [15] module in Python to split function prototypes. Gensim [37] is used for the NLP-based comparison, which conducts the similarity comparison based on the context corpus. Finally, we adopted Clang Static Analyzer [1] to perform type comparison and symbolic execution. Clang Static Analyzer is a source code analysis tool that uses symbolic execution to analyze each translation unit. It provides a framework with which developers can intercept the symbolic execution process at specific points, such as function calls and memory accesses. In addition, Clang Static Analyzer provides useful programming interfaces that developers can use to interact with data types.

4.1.1 Experiment Design

To evaluate the effectiveness and efficiency of NLP-EYE, we present the four experiments in detail below.

EX1 (Prototype Segmentation). To evaluate the effectiveness of prototype segmentation, we use the Levenshtein-inspired distance [45] [38] of the segmentation results as our evaluation metric. The distance between two segmentation lists $i$ and $j$ of a string $s$ is denoted $d_{ij}$ and is calculated as:

$$d_{ij} = \sum_{k=1}^{|s|-1} \left( vec_i[k] \oplus vec_j[k] \right)$$

where $|s|$ is the length of string $s$ and $\oplus$ denotes exclusive or (xor). Segmentation lists $i$ and $j$ are converted into vectors $vec_i$ and $vec_j$. In each vector, zero means "without split" and one means "split".
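The distance can be computed directly from two segmentation lists, as in the following minimal sketch. The helper names split_vector and segmentation_distance are ours; the sketch assumes that both lists contain only non-empty words and spell the same string.

```python
def split_vector(segments):
    """Encode a segmentation as |s| - 1 bits: 1 = split after this letter."""
    vec = []
    for seg in segments:
        vec.extend([0] * (len(seg) - 1))
        vec.append(1)
    return vec[:-1]  # no split position after the last letter


def segmentation_distance(seg_i, seg_j):
    """Number of split positions on which the two segmentations disagree."""
    if "".join(seg_i) != "".join(seg_j):
        raise ValueError("both segmentations must spell the same string")
    return sum(a ^ b for a, b in zip(split_vector(seg_i), split_vector(seg_j)))


d_merge = segmentation_distance(["get", "mem", "from", "pool"],
                                ["get", "memfrom", "pool"])
d_preview = segmentation_distance(["pre", "view"], ["preview"])
```

Each unit of distance corresponds to one split or merge needed to turn one segmentation into the other, so a result of zero means the two lists are identical.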

For the Levenshtein-inspired distance, a lower distance to the correct segmentation indicates that the segmentation list requires fewer edit operations (i.e., splits and merges) to be adjusted to the correct one. Thus, a lower distance indicates a better segmentation result.

EX2 (Memory Operation Function Identification). NLP-EYE identifies memory operation functions by using NLP-based comparison and type comparison. We evaluated the function identification performance using precision, recall and F-measure as the evaluation metrics.

EX3 (Vulnerability Detection). We target typical memory corruption vulnerabilities in this paper, i.e., double-free, use-after-free, and null pointer de-reference, in real-world software products such as Vim and CPython. To evaluate the effectiveness of NLP-EYE, we further compared it with four other vulnerability detection tools (MallocChecker [13], Cppcheck [2], Infer [10] and SVF [42]) and counted the number of vulnerabilities that are correctly detected.

EX4 (Runtime Performance). We evaluated the average time cost of each phase of NLP-EYE, including preprocessing, semantics extraction, and vulnerability detection.
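For reference, the metrics named in EX2 can be computed from the set of identified functions and the manually labeled ground truth as sketched below. We assume the balanced F-measure (F1), since the weighting is not stated; the function name and the example sets are hypothetical.

```python
def precision_recall_f(identified, ground_truth):
    """Precision, recall and balanced F-measure of an identification result."""
    identified, ground_truth = set(identified), set(ground_truth)
    true_positives = len(identified & ground_truth)
    precision = true_positives / len(identified) if identified else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: four functions reported, three truly memory-related.
identified = {"TTLretrieveMemFromPool", "TTLreleaseMem2Pool", "err", "processContent"}
ground_truth = {"TTLretrieveMemFromPool", "TTLreleaseMem2Pool", "pool_realloc"}
precision, recall, f_measure = precision_recall_f(identified, ground_truth)
```

Here two of the four reported functions are correct and two of the three labeled functions are found, which gives a precision of 0.5 and a recall of two thirds.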

4.2 Ex1: Prototype Segmentation

Before we started, we manually split the function names we collected to obtain the ground truth. We counted the number of function names that are correctly segmented, and then calculated the Levenshtein-inspired distance to evaluate the performance of each segmentation. Further, we compared the segmentation results generated with the adaptive corpus of NLP-EYE against the corresponding results generated with the Google Web Trillion Word Corpus (GWTWC). This comparison assesses the accuracy gained by applying the adaptive corpus.

NLP-EYE correctly segments 230 out of 350 function names; the Levenshtein-inspired distances of those function names are zero. Figure 4 shows the average distance for each library and program when using the adaptive corpus of NLP-EYE and GWTWC. With the adaptive corpus, the average Levenshtein-inspired distances for Vim, ImageMagick, CPython, Git, GraphicsMagick, GnuTLS, and LibTIFF are 0.96, 0.16, 0.3, 0.3, 0.42, 0.63, and 0.86, respectively. The results for LibTIFF and Vim are worse than the others because many of their function names involve single letters, and NLP-EYE cannot distinguish those letters from words.

[Figure 4 (bar chart). Y-axis: average distance, 0.0 to 1.5. X-axis: Vim, ImageMagick, CPython, Git, GraphicsMagick, GnuTLS, LibTIFF. GWTWC-based bars: 1, 0.16, 0.66, 0.46, 0.32, 1.2, 1.13. Adaptive corpus-based bars: 0.96, 0.16, 0.3, 0.3, 0.42, 0.63, 0.86.]

Figure 4: Segmentation results of function names by using NLP-EYE and GWTWC

Except for GraphicsMagick and ImageMagick, the adaptive corpus-based segmentation performs better than the GWTWC-based segmentation. Through manual inspection, we found that GWTWC is not a programming language-based corpus and cannot handle programming abbreviations. Thus, most of its segmentation results are worse than those of the adaptive corpus-based segmentation. However, this conclusion does not hold for GraphicsMagick, because some function names are incorrectly divided into abbreviations by the adaptive corpus. Taking the function name preview as an example, it is divided into [“pre”, “view”] instead of “preview”, because the frequencies of those two abbreviations are higher in comments. For ImageMagick, most function names are declared using normalized words, which both GWTWC and the adaptive corpus can easily separate.
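The effect of corpus frequencies on the preview example can be illustrated with a simplified unigram segmenter. This is only a sketch of frequency-based segmentation in general, not the WordSegment implementation or NLP-EYE's code. The function make_segmenter, the length penalty for unseen strings, and all counts below are hypothetical values chosen for illustration.

```python
import math
from collections import Counter
from functools import lru_cache


def make_segmenter(counts, max_word_len=20):
    """Build a segmenter from unigram counts (word -> frequency)."""
    total = sum(counts.values())

    def log_prob(word):
        if word in counts:
            return math.log(counts[word] / total)
        # Unseen strings get a penalty that grows with their length.
        return -math.log(total) - len(word) * math.log(10)

    @lru_cache(maxsize=None)
    def best(text):
        """Return (score, words) for the most probable split of text."""
        if not text:
            return 0.0, ()
        options = []
        for i in range(1, min(len(text), max_word_len) + 1):
            tail_score, tail_words = best(text[i:])
            options.append((log_prob(text[:i]) + tail_score,
                            (text[:i],) + tail_words))
        return max(options)

    return lambda name: list(best(name.lower())[1])


# Hypothetical counts where the abbreviations "pre" and "view" dominate.
comment_counts = Counter({"pre": 50, "view": 50, "preview": 1,
                          "mem": 30, "alloc": 30})
# Hypothetical counts where "preview" is itself a frequent word.
prose_counts = Counter({"pre": 50, "view": 50, "preview": 200,
                        "mem": 30, "alloc": 30})

print(make_segmenter(comment_counts)("preview"))  # ['pre', 'view']
print(make_segmenter(prose_counts)("preview"))    # ['preview']
```

With the first set of counts, the two short pieces are jointly more probable than the rare whole word, which reproduces the kind of over-segmentation described above.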

4.3 Ex2: Memory Operation Function Identification

We counted the number of memory operation functions that are correctly detected by NLP-EYE and computed the precision, recall, and F-measure on the entire dataset. To conduct this experiment, we set each of the two thresholds, fn-similarity (for function names) and arg-similarity (for argument names), to 0.3, 0.4, and 0.5 in turn, and found that NLP-EYE performs best when fn-similarity and arg-similarity are 0.4 and 0.5, respectively.
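The threshold selection can be sketched as a small grid search. The sketch assumes that a function is reported only when both similarities reach their thresholds and that the best pair is the one with the highest F-measure; neither rule is spelled out in this section. The function names and similarity scores are toy values invented for illustration, not measurements.

```python
from itertools import product


def evaluate(candidates, truth, fn_thr, arg_thr):
    """Return (precision, recall, f_measure) for one threshold pair."""
    identified = {name for name, fn_sim, arg_sim in candidates
                  if fn_sim >= fn_thr and arg_sim >= arg_thr}
    tp = len(identified & truth)
    precision = tp / len(identified) if identified else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_thresholds(candidates, truth, grid=(0.3, 0.4, 0.5)):
    """Pick the (fn-similarity, arg-similarity) pair with the highest F-measure."""
    return max(product(grid, grid),
               key=lambda pair: evaluate(candidates, truth, *pair)[2])


# Toy data: (function, name similarity, argument similarity). All hypothetical.
candidates = [
    ("xmalloc", 0.62, 0.58),
    ("buf_free", 0.45, 0.52),
    ("draw_line", 0.35, 0.55),
    ("parse_size", 0.55, 0.45),
]
truth = {"xmalloc", "buf_free"}  # hypothetical ground-truth memory functions

print(best_thresholds(candidates, truth))  # (0.4, 0.5)
```

On this toy data, lower thresholds admit unrelated functions and higher ones drop a true memory operation function, which is the trade-off the thresholds control.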


| Project | # of identified functions | # of correctly identified functions | Precision | Recall | F-measure |
|---|---|---|---|---|---|
| Vim | 304 | 42 | 13% | 57% | 21% |
| ImageMagick | 137 | 44 | 32% | 55% | 40% |
| CPython | 131 | 48 | 36% | 72% | 48% |
| Git | 46 | 8 | 17% | 25% | 20% |
| GraphicsMagick | 69 | 16 | 23% | 55% | 32% |
| GnuTLS | 74 | 4 | 5% | 36% | 8% |
| LibTIFF | 8 | 0 | 0 | 0 | 0 |
| Total | 769 | 162 | 21% | 55% | 30% |

Table 2: Memory operation function identification results of NLP-EYE

| Project | NPD detected | NPD confirmed | DF detected | DF confirmed | UAF detected | UAF confirmed |
|---|---|---|---|---|---|---|
| Vim | 17 | 17 | 2 | 1 | 8 | 2 |
| CPython | 10 | 4 | 1 | 1 | 8 | 1 |
| Git | 1 | 1 | 0 | 0 | 0 | 0 |
| GraphicsMagick | 6 | 5 | 0 | 0 | 0 | 0 |
| Total | 34 | 27 | 3 | 2 | 16 | 3 |

Table 3: Detection results of null pointer de-reference (NPD), double-free (DF) and use-after-free (UAF). Note that this result only shows the vulnerabilities caused by customized memory operation functions.

Function Identification Results. We applied the StackOverflow corpus for NLP-based comparison. The StackOverflow corpus includes all the posts from the StackOverflow forum [16]. Table 2 shows the best identification result, with the number of identified functions and the number of memory operation functions that are correctly identified. We also computed the precision, recall, and F-measure of NLP-EYE. NLP-EYE correctly identifies 162 memory operation functions out of the 769 identified functions, with precision, recall, and F-measure values of 21%, 55%, and 30%, respectively. For LibTIFF, NLP-EYE cannot detect any memory operation functions because many function arguments are named with single letters. For example, “s” is commonly used to express “size”, which makes the recognition of memory operation functions even harder if the thresholds are too high. We therefore chose the thresholds (i.e., fn-similarity and arg-similarity) to balance them against the identification accuracy.
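As a consistency check, the aggregate precision and F-measure can be recomputed from the reported totals, assuming the standard definition of F-measure as the harmonic mean of precision P and recall R (the definition is not restated in this section):

```latex
P = \frac{162}{769} \approx 0.21, \qquad
F = \frac{2 \cdot P \cdot R}{P + R}
  = \frac{2 \times 0.21 \times 0.55}{0.21 + 0.55}
  = \frac{0.231}{0.76}
  \approx 0.30
```

Both values agree with the 21% precision and 30% F-measure reported for the entire dataset.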

Among millions of functions, NLP-EYE narrows down the number of functions that need to be analyzed, and the total number of functions left for manual analysis is acceptable. Furthermore, the numbers of false positives and false negatives are reasonable.

Context Corpus Selection. We further applied NLP-based comparison with two extra context corpora (i.e., a Wikipedia corpus and a customized corpus) to assess the identification performance. The Wikipedia corpus contains all webpages from Wikipedia [19]. The customized corpus consists of: 1) Linux man pages [12]; 2) part of the GNU Manuals [5]; and 3) two programming tutorials, i.e., C++ Primer [31] and C Primer Plus [36].

Based on the Wikipedia corpus, NLP-EYE identifies no more than ten memory operation functions in each library and program, with a precision of 7% and an even worse recall. With the customized corpus as the context corpus, the precision and recall of NLP-EYE are 42% and 19%, respectively. Although this precision is acceptable, it still causes too many false negatives. By manually analyzing the results, we found that the Wikipedia corpus is insensitive to programming language, and most identified functions are unrelated to memory operations. The customized corpus fails to identify functions that use abbreviations, which cause exceptions when words are not found in the corpus.
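The out-of-vocabulary problem can be illustrated with a toy similarity computation. The sketch below does not use Gensim and does not reproduce NLP-EYE's comparison. It assumes that a name is scored by its best-matching word under cosine similarity, which is an assumption made only for this example. The three-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def name_similarity(words, reference, vectors):
    """Best cosine similarity between any known word and the reference word.

    Returns None when the reference or every word of the name is missing
    from the vocabulary, instead of raising an exception.
    """
    if reference not in vectors:
        return None
    known = [w for w in words if w in vectors]
    if not known:
        return None
    return max(cosine(vectors[w], vectors[reference]) for w in known)


# Hypothetical embeddings; a trained model would have far more dimensions.
vectors = {
    "allocate": [0.9, 0.1, 0.0],
    "memory": [0.8, 0.3, 0.1],
    "malloc": [0.85, 0.2, 0.05],
    "draw": [0.0, 0.2, 0.9],
}

print(name_similarity(["allocate", "memory"], "malloc", vectors))
print(name_similarity(["draw"], "malloc", vectors))
print(name_similarity(["xmlloc"], "malloc", vectors))  # None: word not in corpus
```

A corpus that lacks programming abbreviations behaves like the last call: the name cannot be scored at all, so the function is missed.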

4.4 Ex3: Vulnerability Detection

We tested NLP-EYE on the seven libraries and programs to examine whether they contain any unknown memory corruption vulnerabilities. Note that we used the latest versions of the seven libraries and programs (collected in December 2018).

Vulnerabilities Detected by NLP-EYE. NLP-EYE detects 49 vulnerabilities in these libraries and programs in total. When only vulnerabilities caused by customized memory operation functions are considered, four libraries and programs are involved. The detection results are shown in Table 3. Through manual verification of these results, we confirmed that NLP-EYE successfully detects 32 vulnerabilities, including 27 null pointer de-references, two double-frees, and three use-after-frees, that involve customized memory operation functions. To further verify the correctness of our results, we reported the manually confirmed vulnerabilities to developers, who have confirmed and patched ten null pointer de-reference vulnerabilities and all the double-free and use-after-free vulnerabilities. Any customized memory operation function may cause vulnerabilities; since NLP-EYE fails to identify some of them, the vulnerability detection results may contain false negatives. Besides the successfully detected vulnerabilities, NLP-EYE also produces false positives, as listed in Table 3. However, after manually inspecting the false positives, we found that none of them is caused by a wrong identification result.

| Project | NLP-EYE | MallocChecker | Cppcheck | Infer | SVF |
|---|---|---|---|---|---|
| Vim | 3.82 | 2.77 | 18.90 | 51.28 | 50.92 |
| ImageMagick | 6.16 | 5.00 | 28.00 | 64.25 | 0.25 |
| CPython | 8.31 | 7.70 | 1.47 | 23.43 | 0.26 |
| Git | 3.11 | 2.80 | 0.88 | 13.52 | 2.36 |
| GraphicsMagick | 2.08 | 1.45 | 11.83 | 8.75 | 0.15 |
| GnuTLS | 2.75 | 2.33 | 9.65 | 11.13 | 0.11 |
| LibTIFF | 0.91 | 0.87 | 0.93 | 3.55 | 0.04 |
| Total | 27.14 | 22.92 | 71.66 | 175.91 | 54.09 |

Table 4: Runtime performance comparison (minutes)

There are two causes of the false positives: 1) The symbolic execution engine treats an expression with an index in a loop as a static expression. For instance, the engine may report a double-free on an array accessed with different indexes in a loop, since it regards the array elements with different indexes as the same value. 2) While processing a conditional statement with complex logic, the symbolic execution engine executes every path without considering the constraints defined in the conditional statement.

Detection Effectiveness Assessment. To assess the detection effectiveness of NLP-EYE, we applied four detection tools, MallocChecker, Cppcheck, Infer, and SVF, to the entire dataset for comparison. MallocChecker and Infer claim to detect all three kinds of vulnerabilities. Cppcheck and SVF are designed to detect use-after-free and double-free vulnerabilities. For null pointer de-reference, MallocChecker and Infer correctly reported 11 and 30 vulnerabilities, respectively. However, they can only report misuses caused by standard memory allocation functions, while NLP-EYE can detect misuses of both standard and customized memory allocation functions. Even worse, none of these tools can detect use-after-free and double-free vulnerabilities correctly.

We analyzed the false positives produced by these tools. Similar to NLP-EYE, the symbolic execution engine of MallocChecker cannot identify the index of an array in a loop. Although Cppcheck can detect use-after-free vulnerabilities, it becomes inaccurate when many variables are declared to operate on dynamic memory. Infer checks all returned pointers, which causes many false positives. It even reported a use-after-free vulnerability in an integer statement. SVF performed the worst, reporting hundreds of double-free vulnerabilities, which results in many erroneous reports.

4.5 Ex4: Runtime Performance

We evaluated the time cost of each phase (i.e., preprocessing, semantics extraction, and vulnerability detection) of NLP-EYE. Additionally, we tested the runtime of the other detection tools to assess the efficiency of NLP-EYE.

Before vulnerability detection, we collected all the posts on the StackOverflow forum, 17 GB in total, to create the context corpus; generating the model file took 56 hours. This step runs only once, because the context corpus can be reused in further analyses.

Table 4 shows the total runtime cost of NLP-EYE and the other tools while analyzing our dataset. NLP-EYE preprocesses each library and program and constructs the corresponding adaptive corpus within one second. It then spends 36.601 s on average to identify memory operation functions in each library and program. NLP-EYE spends 70.917 s on ImageMagick but no more than 6 s on LibTIFF, because ImageMagick has 14,636 functions whereas LibTIFF includes only 1,326 functions.

Compared with the other tools, NLP-EYE and MallocChecker have similar runtime performance, since they use the same symbolic execution engine. SVF sacrifices detection accuracy to achieve higher runtime performance; unfortunately, this makes it unhelpful for programmers trying to pinpoint vulnerabilities. Cppcheck and Infer analyze the entire source code to ensure complete coverage, which costs much time.

4.6 Limitations

NLP-EYE successfully detects some memory corruption vulnerabilities that other tools cannot detect. The results of function identification and vulnerability detection indicate that NLP-EYE understands function semantics well with only limited information. However, the following limitations still cause detection failures.

1. When a function implementation is complex, the symbolic execution engine in NLP-EYE cannot correctly analyze the data flow and control flow.

2. NLP-EYE cannot handle single letters in function prototypes, which may cause false positives and false negatives.


File1: GraphicsMagick/magick/memory.c

    1  MagickExport void * MagickMalloc(const size_t size){
    2    if (size == 0)
    3      return ((void *) NULL);
    4    MEMORY_LIMIT_CHECK(GetCurrentFunction(),size);
    5    return (MallocFunc)(size);
    6  }

File2: GraphicsMagick/coders/pdb.c

    1  static Image *ReadPDBImage(const ImageInfo
    2      *image_info,ExceptionInfo *exception){
    3    ...
    4    comment=MagickAllocateMemory(char *,length+1);
    5    p=comment;
    6    p[0]='\0';
    7    ...
    8  }

Figure 5: A null pointer de-reference vulnerability in GraphicsMagick

4.7 Case Study

We discuss two representative vulnerabilities found in the GraphicsMagick library and the CPython interpreter, respectively.

GraphicsMagick is a library that was derived from the ImageMagick image processing utility in November 2002. GraphicsMagick is considered to be securely designed and implemented, as it has been tested with Memcheck and Helgrind (see footnote 3). In addition, testing with AddressSanitizer (ASAN) [40], the most mature redzone-based memory error detector, indicated that it is secure against memory errors. Nevertheless, NLP-EYE detects six null pointer de-reference vulnerabilities in its latest version. An example is presented in Figure 5. The function MagickAllocateMemory is declared to allocate memory. If dynamic memory is insufficient and this function returns a null pointer (Line 4 of File2), a segmentation fault will be triggered (Line 6 of File2).

To detect this vulnerability, a detector must recognize the customized memory allocation function MagickAllocateMemory, which is a macro defined in terms of the MagickMalloc function. The implementation of MagickMalloc is given in File1, and it in turn calls a customized memory allocation function, MallocFunc. Besides analyzing the standard memory operation functions, NLP-EYE first identifies the macro MagickAllocateMemory in Line 4 of File2 and replaces it with its underlying function MagickMalloc in File1. Through the preprocessing and semantics extraction phases, NLP-EYE labels these functions as memory operation functions and finally locates function misuses. In comparison, other detection tools (e.g., MallocChecker) cannot recognize these customized functions (i.e., MallocFunc, MagickMalloc, and MagickAllocateMemory) and thus fail to detect the flaw.
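The macro resolution step can be sketched as follows. This is a rough illustration of mapping a function-like macro to the function it wraps, not NLP-EYE's preprocessing code. The regular expressions, the helper macro_aliases, and the simplified single-line macro definition are all hypothetical; the real definition in GraphicsMagick may differ.

```python
import re

# Function-like macro on a single line: #define NAME(params) body
DEFINE_RE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+(\w+)\(([^)]*)\)[ \t]+(.*)$", re.MULTILINE)
# An identifier directly followed by an opening parenthesis, i.e. a call.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def macro_aliases(source, known_functions):
    """Map each function-like macro to the first known function its body calls."""
    aliases = {}
    for name, _params, body in DEFINE_RE.findall(source):
        for callee in CALL_RE.findall(body):
            if callee in known_functions:
                aliases[name] = callee
                break
    return aliases


# Simplified, hypothetical header text used only for this example.
header = """
#define MagickAllocateMemory(type,size) ((type) MagickMalloc((size_t) (size)))
#define MAX(a,b) ((a) > (b) ? (a) : (b))
"""

print(macro_aliases(header, {"MagickMalloc"}))
# {'MagickAllocateMemory': 'MagickMalloc'}
```

Once such a mapping exists, a call to the macro can be analyzed as a call to the underlying allocation function.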

3 Memcheck is a memory error detector for C and C++ programs. Helgrind is a tool for detecting synchronisation errors in C, C++ and Fortran programs that use the POSIX pthreads threading primitives. Both are based on Valgrind [35].

File1: CPython/Objects/obmalloc.c

    1  void PyMem_Free(void *ptr){
    2    _PyMem.free(_PyMem.ctx, ptr);
    3  }

File2: CPython/Modules/_randommodule.c

     1  static PyObject *
     2  random_seed(RandomObject *self, PyObject *args){
     3    ...
     4    res = _PyLong_AsByteArray((PyLongObject *)n,
     5                              (unsigned char *)key,
     6                              keyused * 4,
     7                              PY_LITTLE_ENDIAN,
     8                              0); /* unsigned */
     9    if (res == -1) {
    10      PyMem_Free(key);
    11      goto Done;
    12    }
    13    ...
    14  Done:
    15    PyMem_Free(key);
    16    return result;
    17  }

Figure 6: A double-free vulnerability in CPython

Another code snippet, containing a double-free vulnerability detected in the CPython interpreter, is shown in Figure 6. The function PyMem_Free in File1 is clearly a memory de-allocation function. If the variable res is -1, the variable key will be freed twice (Lines 10 and 15 of File2, respectively). To our surprise, this simple vulnerability had been found neither by manual audit nor by automated source code analysis. According to the feedback of CPython developers, the corresponding host function has been tested many times, yet the vulnerability still exists. Based on this feedback, we conclude that identifying customized memory operation functions is well suited to memory corruption detection, and NLP-EYE is very helpful in this scenario.
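The role of function identification in this case can be shown with a toy checker over a single execution path. It is a deliberately simplified stand-in for symbolic execution, not the Clang Static Analyzer logic. The trace format and the helper check_trace are hypothetical; the line numbers follow File2 of Figure 6 for the path where res is -1.

```python
def check_trace(trace, dealloc_funcs):
    """Scan one execution path for double-free and use-after-free events.

    trace is a list of (line, operation, variable) tuples; an operation is a
    called function name, "use" for a de-reference, or "assign" for a new value.
    """
    freed = set()
    findings = []
    for line, op, var in trace:
        if op in dealloc_funcs:
            if var in freed:
                findings.append(("double-free", var, line))
            freed.add(var)
        elif op == "use" and var in freed:
            findings.append(("use-after-free", var, line))
        elif op == "assign":
            freed.discard(var)  # the variable now holds a fresh value
    return findings


# Path taken when res == -1: free at Line 10, jump to Done, free at Line 15.
trace = [(10, "PyMem_Free", "key"), (15, "PyMem_Free", "key")]

print(check_trace(trace, {"free"}))                # []
print(check_trace(trace, {"free", "PyMem_Free"}))  # [('double-free', 'key', 15)]
```

With only the standard free in the set of de-allocation functions, the path looks clean; once PyMem_Free is labeled as a de-allocation function, the second call is flagged.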

5 Related Work

There are prior efforts on vulnerability detection. In this section, we introduce these works according to their analysis approaches, i.e., source code-based analysis and binary code-based analysis.

5.1 Source Code-based Analysis

Previous studies detect vulnerabilities by applying program analysis to source code to extract pointer information [41], [44] and data dependencies [33], [24], [29], [28].

CRED [44] detects use-after-free vulnerabilities in C programs. It extracts points-to information by applying a path-sensitive, demand-driven approach. To decrease false alarms, it uses a spatio-temporal context reduction technique to construct use-after-free pairs precisely. However, the pairing step is time-consuming, because every path in the source code must be analyzed and memorized. Instead of analyzing the entire source code, Pinpoint [41] applies sparse value-flow analysis to identify vulnerabilities in C programs, such as use-after-free and double-free. To reduce the cost of data dependency analysis, Pinpoint analyzes local data dependencies first and then performs symbolic execution to memorize the non-local data dependencies and path conditions.

Similarly, some other tools detect vulnerabilities by comparing data flows with pre-defined rules or violations. CBMC [28] is a C bounded model checker, which examines the safety of assertions under a given bound. It translates assertions and loops into a formula. If this formula satisfies any pre-defined violation, a violated assertion is identified. Coccinelle [29] finds specific bugs by comparing the code with a given pattern written in the Semantic Patch Language (SmPL).

Source code-based analysis has also been applied to the Linux kernel. Due to the large amount of kernel code in Linux, DR. CHECKER [33] and K-Miner [24] are designed to be more effective and efficient. DR. CHECKER employs a soundy approach based on program analysis. It is capable of conducting large-scale analysis and detecting numerous classes of bugs in Linux kernel drivers. K-Miner finds vulnerabilities by setting up a virtual kernel environment and processing syscalls separately.

These tools perform well at detecting vulnerabilities in code that follows standard programming styles, such as calling standard library APIs and following standard implementation steps. Unlike NLP-EYE, they cannot handle customized functions.

Instead of applying program analysis, both VulPecker [30] and VUDDY [27] detect vulnerabilities based on code similarity. VulPecker builds a vulnerability database by using diff hunk features collected from each piece of vulnerable code and its corresponding patch. VUDDY processes each vulnerable function as a unit, and then abstracts and normalizes vulnerable functions so that it can detect clones with modifications. However, similarity-based techniques require a massive database to learn from.

5.2 Binary Code-based Analysis

Instead of analyzing source code, binary code can also be analyzed to identify memory corruption vulnerabilities on stacks and in allocated memory [22], [35], [40], [26], [43], [25], [23].

Memory shadowing helps to track the memory status at runtime, but it also causes large memory consumption. Dr.Memory [22] conducts memory checking on Windows and Linux. It uses memory shadowing to track the memory status and identifies stack usage within heap memory. Dr.Memory is flexible and lightweight, using an encoding for callstacks to reduce memory consumption. AddressSanitizer [40] minimizes memory consumption by creating a compact shadow memory, which achieves a 128-to-1 mapping. By implementing a specialized memory allocator and code instrumentation in the compiler, AddressSanitizer detects vulnerabilities on the stack, on the heap, and in global variables. HOTracer [26] discovers heap overflow vulnerabilities by examining whether a heap access operation can be controlled by an attacker. HOTracer finds vulnerabilities by giving an accurate definition of buffer overflow, and it uses a heuristic method to find memory allocation functions. HOTracer is able to identify memory allocation functions with higher accuracy, and it has detected several unknown overflow vulnerabilities.

Unfortunately, detecting memory corruptions through binary code-based analysis requires proper inputs that can precisely trigger the corresponding memory operations. It may cause false negatives because of incomplete code coverage.

6 Conclusion

We propose an NLP-based automated approach to detect memory corruption vulnerabilities. A detection tool, NLP-EYE, is developed to identify null pointer de-reference, use-after-free, and double-free vulnerabilities. The novelty of our approach is that we retrieve function semantics accurately from little function information, i.e., function prototypes and comments, instead of using entire function implementations. With the help of NLP-based and type-based analyses, NLP-EYE identifies memory operation functions accurately. Our approach is also adaptable, since NLP-EYE generates an adaptive corpus for each dataset by extracting the comments from its source code, and thus adapts to various programming styles.

In this work, we focused only on memory corruption vulnerabilities. In the future, we plan to extend NLP-EYE with additional reference functions to identify other kinds of vulnerabilities. We also open-source NLP-EYE to help analysts and developers improve software security.

7 Acknowledgments

The authors would like to thank the anonymous reviewers for their feedback and our shepherd, Dongpeng Xu, for his valuable comments that helped improve this paper.

This work was supported by the General Program of National Natural Science Foundation of China (Grant No. 61872237), the Key Program of National Natural Science Foundation of China (Grant No. U1636217), and the National Key Research and Development Program of China (Grant No. 2016YFB0801200).

We especially thank Huawei Technologies, Inc. for the research grant that supported this work, Ant Financial Services Group for its support of this research within the SJTU-AntFinancial joint Institute of FinTech Security, and Nanjing Turing Artificial Intelligence Institute for the internship program.


References

[1] Clang Static Analyzer. http://clang-analyzer.llvm.org.

[2] Cppcheck. http://cppcheck.sourceforge.net.

[3] CPython. https://www.python.org.

[4] Git. https://git-scm.com.

[5] GNU Manuals Online. https://www.gnu.org/manual/manual.en.html.

[6] GnuTLS. https://www.gnutls.org.

[7] Google Web Trillion Word Corpus. http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html.

[8] GraphicsMagick. http://www.graphicsmagick.org.

[9] ImageMagick. https://www.imagemagick.org.

[10] Infer. https://fbinfer.com.

[11] LibTIFF. http://www.libtiff.org.

[12] Linux man pages online. http://man7.org/linux/man-pages/index.html.

[13] MallocChecker. https://clang-analyzer.llvm.org/.

[14] Microsoft: 70 percent of all security bugs are memory safety issues. https://www.zdnet.com/article/microsoft-70-percent-of-all-security-bugs-are-memory-safety-issues/.

[15] Python Wordsegment. https://pypi.org/project/wordsegment/.

[16] Stackoverflow. https://stackoverflow.com.

[17] Vim. https://www.vim.org.

[18] VisualCodeGrepper. https://github.com/nccgroup/VCG.

[19] Wikipedia. https://www.wikipedia.org.

[20] Windows 8 APIs References. https://docs.microsoft.com/en-us/windows/desktop/apiindex/windows-8-api-sets.

[21] Windows Driver API references. https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/.

[22] Derek Bruening and Qin Zhao. Practical Memory Checking with Dr. Memory. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2011.

[23] Dinakar Dhurjati and Vikram Adve. Efficiently Detecting All Dangling Pointer Uses in Production Servers. In Proceedings of International Conference on Dependable Systems and Networks (DSN), 2006.

[24] David Gens, Simon Schmitt, Lucas Davi, and Ahmad-Reza Sadeghi. K-Miner: Uncovering Memory Corruption in Linux. In Proceedings of 2018 Network and Distributed System Security Symposium (NDSS), 2018.

[25] Niranjan Hasabnis, Ashish Misra, and R Sekar. Light-weight Bounds Checking. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO), 2012.

[26] Xiangkun Jia, Chao Zhang, Purui Su, Yi Yang, Huafeng Huang, and Dengguo Feng. Towards Efficient Heap Overflow Discovery. In Proceedings of 26th USENIX Security Symposium (USENIX Security), 2017.

[27] Seulbae Kim, Seunghoon Woo, Heejo Lee, and Hakjoo Oh. VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery. In Proceedings of 2017 IEEE Symposium on Security and Privacy (SP), 2017.

[28] Daniel Kroening and Michael Tautschnig. CBMC – C Bounded Model Checker. In Proceedings of International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2014.

[29] Julia Lawall, Ben Laurie, René Rydhof Hansen, Nicolas Palix, and Gilles Muller. Finding Error Handling Bugs in OpenSSL Using Coccinelle. In Proceedings of 2010 European Dependable Computing Conference (EDCC), 2010.

[30] Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Hanchao Qi, and Jie Hu. VulPecker: an Automated Vulnerability Detection System Based on Code Similarity Analysis. In Proceedings of the 32nd Annual Conference on Computer Security Applications (ACSAC), 2016.

[31] Stanley B. Lippman. C++ Primer. 2012.

[32] Edward Loper and Steven Bird. NLTK: the Natural Language Toolkit. arXiv preprint cs/0205028, 2002.

[33] Aravind Machiry, Chad Spensky, Jake Corina, Nick Stephens, Christopher Kruegel, and Giovanni Vigna. DR.CHECKER: A Soundy Analysis for Linux Kernel Drivers. In Proceedings of 26th USENIX Security Symposium (USENIX Security), 2017.

[34] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781, 2013.

[35] Nicholas Nethercote and Julian Seward. Valgrind: a Framework for Heavyweight Dynamic Binary Instrumentation. In Proceedings of ACM SIGPLAN Notices, 2007.

[36] Stephen Prata. C Primer Plus. 2014.

[37] Radim Rehurek and Petr Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (LREC), 2010.

[38] Edward J Schwartz, Cory F Cohen, Michael Duggan, Jeffrey Gennari, Jeffrey S Havrilla, and Charles Hines. Using Logic Programming to Recover C++ Classes and Methods from Compiled Executables. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2018.

[39] Toby Segaran and Jeff Hammerbacher. Beautiful Data: the Stories Behind Elegant Data Solutions. 2009.

[40] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitriy Vyukov. AddressSanitizer: A Fast Address Sanity Checker. In Proceedings of the 2012 USENIX Annual Technical Conference (USENIX ATC), 2012.

[41] Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2018.

[42] Yulei Sui and Jingling Xue. SVF: Interprocedural Static Value-flow Analysis in LLVM. In Proceedings of the 25th International Conference on Compiler Construction (CC), 2016.

[43] Erik Van Der Kouwe, Vinod Nigade, and Cristiano Giuffrida. Dangsan: Scalable use-after-free Detection. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys), 2017.

[44] Hua Yan, Yulei Sui, Shiping Chen, and Jingling Xue. Spatio-Temporal Context Reduction: a Pointer-Analysis-Based Static Approach for Detecting Use-After-Free Vulnerabilities. In Proceedings of 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), 2018.

[45] Li Yujian and Liu Bo. A Normalized Levenshtein Distance Metric. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2007.