Post on 29-Jul-2015
A Recommender System for Refining Ekeko/X Transformations
Coen De Roover (cderoove@vub.ac.be)
Tim Molderez (mail@timmolderez.be)
(c) anonymous Vancouver street artist
Meta-programming library: Logic meta-programming in Clojure
Search-and-replace: Beyond syntactic templates
Recommender system: Genetic search for template mutations
Motivation: automating edits within a program
value of annotation to be used as type parameter
many
before:
public class BreakStatement extends Statement {
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier label;

    public EntityIdentifier getLabel() {
        return label;
    }

    public void setLabel(EntityIdentifier label) {
        this.label = label;
    }
}
after:
public class BreakStatement extends Statement {
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier<SimpleName> label;

    public EntityIdentifier<SimpleName> getLabel() {
        return label;
    }

    public void setLabel(EntityIdentifier<SimpleName> label) {
        this.label = label;
    }
}
Automating Edits: using existing tool support
Brussels’
logic meta-programming
Ekeko
Eclipse plugin
applications: program and corpus analysis
program transformation
meta-programming library for Clojure
causally connected
applicative meta-programming
script queries over workspace
specify code characteristics declaratively, leave search to logic engine
manipulate workspace
tool building
Building Development Tools Interactively using the EKEKO Meta-Programming Library
Coen De Roover, Software Languages Lab, Vrije Universiteit Brussel, Belgium. Email: cderoove@vub.ac.be
Reinout Stevens, Software Languages Lab, Vrije Universiteit Brussel, Belgium. Email: resteven@vub.ac.be
Abstract: EKEKO is a Clojure library for applicative logic meta-programming against an Eclipse workspace. EKEKO has been applied successfully to answering program queries (e.g., “does this bug pattern occur in my code?”), to analyzing project corpora (e.g., “how often does this API usage pattern occur in this corpus?”), and to transforming programs (e.g., “change occurrences of this pattern as follows”) in a declarative manner. These applications rely on a seamless embedding of logic queries in applicative expressions. While the former identify source code of interest, the latter associate error markers with, compute statistics about, or rewrite the identified source code snippets. In this paper, we detail the logic and applicative aspects of the EKEKO library. We also highlight key choices in their implementation. In particular, we demonstrate how a causal connection with the Eclipse infrastructure enables building development tools interactively on the Clojure read-eval-print loop.
I. INTRODUCTION
EKEKO is a Clojure library that enables querying and manipulating an Eclipse workspace using logic queries that are seamlessly embedded in functional expressions. Recent applications of EKEKO include the GASR tool for detecting suspicious aspect-oriented code [1] and the QWALKEKO tool for reasoning about fine-grained evolutions of versioned code [2]. In this paper, we describe the meta-programming facilities offered by EKEKO and highlight key choices in their implementation [footnote 1]. We also draw attention to the highly interactive manner of tool building these facilities enable.
II. RUNNING EXAMPLE: AN ECLIPSE PLUGIN
More concretely, we will demonstrate how to build a lightweight Eclipse plugin entirely on the Clojure read-eval-print loop. We will use this plugin as a running example throughout the rest of this paper. Our Eclipse plugin is to support developers in repeating similar changes throughout an entire class hierarchy. It is to associate problem markers with fields that have not yet been changed. In addition, it is to present developers a visualization of these problems. Finally, it is to provide a “quick fix” that applies the required changes correctly.
Figure 1 illustrates the particular changes that need to be repeated. The raw EntityIdentifier type of those fields within a subclass of be.ac.chaq.model.ast.java.ASTNode that carry an @EntityProperty annotation, is to receive a type parameter that corresponds to the annotation’s value key.

public class BreakStatement extends Statement {
    // Before changes:
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier label;

    // After changes:
    @EntityProperty(value = SimpleName.class)
    private EntityIdentifier<SimpleName> label;

    // ... (a.o., accessor methods change accordingly)
}

Fig. 1: Example changes to be repeated.

Footnote 1: The EKEKO library, its implementation, and all documentation is freely available from https://github.com/cderoove/damp.ekeko/.
III. ARCHITECTURAL OVERVIEW
The EKEKO library operates upon a central repository of project models. These models contain structural and behavioral information that is not readily available from the projects themselves. The models for Java projects include abstract syntax trees provided by the Eclipse JDT parser, but also control flow and data flow information computed by the SOOT program analysis framework [3].
An accompanying Eclipse plugin automatically maintains the EKEKO model repository. To this end, it subscribes to each workspace change and triggers incremental updates or complete rebuilds of project models. As a result, the information operated upon by the EKEKO library is always up-to-date. In addition, this plugin provides an extension point that enables registering additional kinds of project models. The KEKO extension, for instance, builds its project model from the results of a partial program analysis [4], enabling queries over compilation units that do not build correctly.
IV. LOGIC PROGRAM QUERYING
The EKEKO library enables querying and manipulating programs using logic queries and applicative expressions respectively. We detail the former first. Section V discusses the latter. The program querying facilities relieve tool builders from implementing an imperative search for source code that exhibits particular characteristics. Instead, developers can specify these characteristics declaratively through a logic query. The
[WCRE-CSMR14]
Think of it as querying a database of program information!
Logic meta-programming (LMP)
Logic relations: ast/2 and has/3 “tables”
(ast ?type ?node)
relation between a ?node of an Abstract Syntax Tree (AST) and its ?type
(has ?property ?node ?value)
relation of AST nodes and the values of their properties
LMP
Logic querying: AST relations
SQL query:
SELECT ast.node AS ?statement, has.value AS ?expression
FROM ast, has
WHERE ast.type = :ReturnStatement
  AND has.property = :expression
  AND has.node = ast.node;
equivalent logic query:
(ekeko [?statement ?expression]
  (ast :ReturnStatement ?statement)
  (has :expression ?statement ?expression))
relation of all AST nodes of type :ReturnStatement and the value of their :expression property
LMP
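The "database of program information" analogy can be made concrete with a minimal sketch. It assumes hypothetical in-memory stand-ins for the ast/2 and has/3 tables (the class `AstTables`, its row types, and the string-encoded nodes are all invented for illustration and are not part of Ekeko), and joins them on the node column in the way the SQL query does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the ast/2 and has/3 "tables"; not Ekeko's API.
final class AstTables {
    // One row of the ast table: (type, node).
    static final class AstRow {
        final String type;
        final String node;
        AstRow(String type, String node) { this.type = type; this.node = node; }
    }

    // One row of the has table: (property, node, value).
    static final class HasRow {
        final String property;
        final String node;
        final String value;
        HasRow(String property, String node, String value) {
            this.property = property; this.node = node; this.value = value;
        }
    }

    // Joins both tables on the node column and keeps the rows for
    // return statements and their :expression property.
    static List<String[]> returnedExpressions(List<AstRow> ast, List<HasRow> has) {
        List<String[]> result = new ArrayList<>();
        for (AstRow a : ast) {
            if (!a.type.equals(":ReturnStatement")) continue;
            for (HasRow h : has) {
                if (h.property.equals(":expression") && h.node.equals(a.node)) {
                    result.add(new String[] { a.node, h.value });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<AstRow> ast = List.of(
            new AstRow(":ReturnStatement", "return label;"),
            new AstRow(":ExpressionStatement", "this.label = label;"));
        List<HasRow> has = List.of(
            new HasRow(":expression", "return label;", "label"),
            new HasRow(":expression", "this.label = label;", "this.label = label"));
        for (String[] row : returnedExpressions(ast, has)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

Unlike this nested loop, a logic engine can run the same relation in several directions, for instance starting from a known expression to find its statement.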
Logic programming: defining relations
LMP
defining relation:
(defn expression|returned [?expression]
  (fresh [?statement] ; local variable
    (ast :ReturnStatement ?statement)
    (has :expression ?statement ?expression)))
using newly defined relation:
(ekeko* [?returned ?type]
  (expression|returned ?returned)
  (ast|expression-type ?returned ?type)
  (type|binary ?type))
returned expressions of a type defined in compiled rather than source code
Ekeko Library: relations for logic meta-programming
syntactic, e.g. for(Object i : collection)
  (ast ?kind ?ast)
  (has ?property ?ast ?value)
  (ast-encompassing|method+ ?ast ?m)
  (ast-encompassing|type+ ?ast ?t)
structural, e.g. class Ouch { int hashCode() { return ...; } }
  (classfile-type ?binaryfile ?type)
  (type-type|sub+ ?type ?subtype)
  (type-name|qualified ?type ?qname)
  (advice-shadow ?advice ?shadow)
control and data flow, e.g. scanner = new Scanner(); ... x.close(); ... scanner.next();
  (method|soot-cfg ?m ?cfg)
  (unit|soot-usebox ?u ?ub)
  (local|soot-pointstoset ?l ?p)
  (soot|may|alias ?l1 ?l2)
Ekeko Library: functions for applicative meta-programming
rewriting
  (remove-node node)
  (replace-node node newnode)
  (change-property node property value)
  (apply-and-reset-rewrites!)
visualizing
  (visualize nodes edges :layout layout :node|label labelfn :edge|label labelfn . . .)
tooling
  (add-problem-marker marker node)
  (register-quickfix marker rewritefn)
  (reduce-workspace fn initval)
  (wait-for-builds-to-finish)
Automating Edits: using Ekeko
Live Demo
Meta-programming library
Search-and-replace
Recommender system
Logic meta-programming in Clojure
Beyond syntactic templates
Genetic search for template mutations
Automating Edits: using existing tool support
requires specifying
where to apply a change
what changes to apply
carefully ensuring
no unwarranted changes are applied
no required changes are missed
Complexity: Ekeko vs IntelliJ
Ekeko
IntelliJ “replace structurally”
matches
Templates to the rescue
template = code + meta-vars & wildcards
template
wildcard, meta-var
limited to syntactic matching …
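To illustrate what "template = code + meta-vars & wildcards" means, here is a deliberately simplified sketch that matches over token lists rather than ASTs. The class `TokenTemplate`, the `?name` spelling for meta-variables, and the `...` token for wildcards are assumptions made for this example; Ekeko/X itself matches templates against AST nodes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy token-level template matcher; a hypothetical illustration only.
final class TokenTemplate {
    // Returns the meta-variable bindings if the template matches the
    // whole token list, or null if it does not match.
    static Map<String, String> match(List<String> template, List<String> code) {
        Map<String, String> bindings = new HashMap<>();
        return matchFrom(template, 0, code, 0, bindings) ? bindings : null;
    }

    private static boolean matchFrom(List<String> t, int i, List<String> c, int j,
                                     Map<String, String> bindings) {
        if (i == t.size()) return j == c.size();
        String token = t.get(i);
        if (token.equals("...")) {
            // Wildcard: try skipping zero or more code tokens.
            for (int k = j; k <= c.size(); k++) {
                Map<String, String> trial = new HashMap<>(bindings);
                if (matchFrom(t, i + 1, c, k, trial)) {
                    bindings.putAll(trial);
                    return true;
                }
            }
            return false;
        }
        if (j == c.size()) return false;
        if (token.startsWith("?")) {
            // Meta-variable: bind on first use, require equality afterwards.
            String bound = bindings.get(token);
            if (bound != null && !bound.equals(c.get(j))) return false;
            bindings.put(token, c.get(j));
            return matchFrom(t, i + 1, c, j + 1, bindings);
        }
        // Plain code token: must be identical.
        return token.equals(c.get(j)) && matchFrom(t, i + 1, c, j + 1, bindings);
    }

    public static void main(String[] args) {
        System.out.println(match(List.of("return", "?v", ";"),
                                 List.of("return", "age", ";")));
    }
}
```

A matcher of this kind only sees what is written down; it cannot tell whether two names resolve to the same type or field, which is the limitation the matching directives are meant to address.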
1/ directives for control over matching
2/ support for template composition
3/ directives for control over rewriting
advanced code templates in Ekeko/X
I’m a “structural search and replace” on steroids
1/ directives for control over matching
2/ formal operators for template mutation
3/ genetic search for mutation recommendations
template
matches
[<component>]@[<directive>]
directive | applies to | meaning
match | any | match is in source code, with matching type and properties
orsimple | qualified name | match is any name resolving to name in template
(equals ?var) | any | exposes match
child, child+, child* | any | match is corresponding child of parent match, nested within that child (+), or either (*)
match|set | list | match has at least given elements, in any order
match|regexp | list | match has elements described by regexp
(type ?type) | type/variable declaration/reference, expression | match resolves to, is of, or declares the type of its argument
(subtype ?type), (subtype+ ?type), (subtype* ?type) | type/variable declaration/reference | match resolves to a (transitive +, reflexive *) subtype of the given argument
(refers-to ?var) | expression | match lexically refers to local, parameter or field denoted by its argument
(invokes ?method) | invocation expression | match invokes given argument
[<component>]@[<directive>]
[….acceptVisitor(…)]@[(invokes ?method)]
[public void acceptVisitor(ComponentVisitor v)…]@[(equals ?method)]
constraining syntax, structure, data flow, control flow of matches
grouping of templates
2/ support for template composition
class template with field and methods
separate field, methods, class templates
3/ directives for control over rewriting
search templates => replacement templates
replaces argument with instantiation of template for each match
directive | applies to | effect
replace | any | replaces its operand with a template instantiation
add-element | lists | adds a template instantiation to its operand
…
not necessarily an entire template match!
Automating Edits: using Ekeko/X
Live Demo
Meta-programming library
Search-and-replace
Recommender system
Logic meta-programming in Clojure
Beyond syntactic templates
Genetic search for template mutations
But specifying templates is still hard…
often requires multiple iterations
no unwanted matches: refinement
no required matches are missed: generalization
no support for editing process
code template = source code + meta-variables + matching directives
no disciplined methods for generalizing/refining templates
no automated support in the form of recommender system
Specifying templates in Ekeko/X …
1/ formal operators for template mutation
2/ genetic search for mutation recommendations
atomic mutation: introduce-variable
return age; => return ?v;
composite mutation: generalize-aliases
public class Book {
    private Integer count;
    public Integer getCount() {
        return count;
    }
}
=>
public class Book {
    private Integer ?v1;
    public Integer getCount() {
        return [?v2]@[(refers-to ?v1)];
    }
}
(Operator.
  "add-directive-invokes"
  operators/add-directive-invokes
  :refinement
  "Add directive invokes."
  opscope-subject
  applicability|methodinvocation
  "Requires matches to invoke the binding for the meta-variable."
  [(make-operand "Meta-variable (e.g., ?v)" opscope-variable validity|variable)])
constraints on their subject
constraints on their operands
constraints enable checking applicability of operator, validity of its operands + generating possible values!
(Operator.
  "remove-node"
  operators/remove-node
  :destructive
  "Remove from template."
  opscope-subject
  applicability|deleteable
  "Removes its selection from the template."
  [])
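The role of the two kinds of constraints can be sketched as follows. This is an illustrative model only: the class `MutationOperator`, the string-encoded node kinds, and the concrete predicates (including the rule that a compilation unit cannot be removed) are assumptions for the example, not the Ekeko/X operator implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of an operator with constraints on subject and operand.
final class MutationOperator {
    final String name;
    final Predicate<String> applicability;   // constraint on the subject's node kind
    final Predicate<String> operandValidity; // constraint on the operand; null if none

    MutationOperator(String name, Predicate<String> applicability,
                     Predicate<String> operandValidity) {
        this.name = name;
        this.applicability = applicability;
        this.operandValidity = operandValidity;
    }

    // Names of the operators whose applicability constraint accepts the subject.
    static List<String> applicableTo(List<MutationOperator> operators, String subjectKind) {
        List<String> names = new ArrayList<>();
        for (MutationOperator op : operators) {
            if (op.applicability.test(subjectKind)) names.add(op.name);
        }
        return names;
    }

    // Generates possible operand values by filtering candidates for validity.
    List<String> possibleOperands(List<String> candidates) {
        List<String> valid = new ArrayList<>();
        if (operandValidity == null) return valid;
        for (String candidate : candidates) {
            if (operandValidity.test(candidate)) valid.add(candidate);
        }
        return valid;
    }

    // Two sample operators with assumed constraints.
    static List<MutationOperator> sampleOperators() {
        return List.of(
            new MutationOperator("add-directive-invokes",
                kind -> kind.equals("MethodInvocation"),
                operand -> operand.startsWith("?")),
            new MutationOperator("remove-node",
                kind -> !kind.equals("CompilationUnit"),
                null));
    }

    public static void main(String[] args) {
        System.out.println(applicableTo(sampleOperators(), "MethodInvocation"));
    }
}
```

Because both constraints are data rather than hard-coded checks, the same definitions can drive an editor menu (which operators to offer) and a search procedure (which operators and operands to try).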
1/ formal operators for template mutation
2/ genetic search for mutation recommendations
specification of which matches are desired
given as an enumeration of the desired matches
1 row = 1 individual = 1 template group specification
a) determine precision and recall of each individual
b) determine extent of partial match for individuals without matches
c) penalize excess use of directives
concurrently!
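One way to combine these three ingredients into a single fitness score is sketched below. The slides do not give the actual formula, so the F1 combination, the two weights, and the class `TemplateFitness` are assumptions chosen for illustration; matches are represented as plain strings.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical fitness function; formula and weights are assumed, not Ekeko/X's.
final class TemplateFitness {
    static final double PARTIAL_WEIGHT = 0.1;     // assumed weight of partial matches
    static final double DIRECTIVE_PENALTY = 0.01; // assumed cost per directive

    // Fraction of found matches that are desired.
    static double precision(Set<String> found, Set<String> desired) {
        return found.isEmpty() ? 0.0 : (double) hits(found, desired) / found.size();
    }

    // Fraction of desired matches that are found.
    static double recall(Set<String> found, Set<String> desired) {
        return desired.isEmpty() ? 0.0 : (double) hits(found, desired) / desired.size();
    }

    private static int hits(Set<String> found, Set<String> desired) {
        Set<String> common = new HashSet<>(found);
        common.retainAll(desired);
        return common.size();
    }

    // a) F1 of precision and recall, b) a small reward for partial matches
    // when nothing matched at all, c) a penalty per directive used.
    static double fitness(Set<String> found, Set<String> desired,
                          double partialMatch, int directives) {
        double p = precision(found, desired);
        double r = recall(found, desired);
        double f1 = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        double base = found.isEmpty() ? PARTIAL_WEIGHT * partialMatch : f1;
        return Math.max(0.0, base - DIRECTIVE_PENALTY * directives);
    }

    public static void main(String[] args) {
        System.out.println(fitness(Set.of("a", "b", "c"), Set.of("a", "b", "d"), 0.0, 2));
    }
}
```

Since the score of one individual does not depend on the others, the scores of a whole population can be computed in parallel.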
1/ formal operators for template mutation
2/ genetic search for mutation recommendations
suppressing local minima
[Figure: parents (x+y)+3 and (y+1)*(x/2), each with a marked crossover point; offspring (x/2)+3; discarded subtrees labelled GARBAGE]
Figure 2.5: Example of subtree crossover. Note that the trees on the left are actually copies of the parents. So, their genetic material can freely be used without altering the original individuals.
to crossover operations frequently exchanging only very small amounts of genetic material (i.e., small subtrees); many crossovers may in fact reduce to simply swapping two leaves. To counter this, Koza (1992) suggested the widely used approach of choosing functions 90% of the time and leaves 10% of the time. Many other types of crossover and mutation of GP trees are possible. They will be described in Sections 5.2 and 5.3, pages 42–46.
The most commonly used form of mutation in GP (which we will call subtree mutation) randomly selects a mutation point in a tree and substitutes the subtree rooted there with a randomly generated subtree. This is illustrated in Figure 2.6. Subtree mutation is sometimes implemented as crossover between a program and a newly generated random program; this operation is also known as “headless chicken” crossover (Angeline, 1997).
Another common form of mutation is point mutation, which is GP’s rough equivalent of the bit-flip mutation used in genetic algorithms (Goldberg, 1989). In point mutation, a random node is selected and the primitive stored there is replaced with a different random primitive of the same arity taken from the primitive set. If no other primitives with that arity exist, nothing happens to that node (but other nodes may still be mutated). When subtree mutation is applied, this involves the modification of exactly one subtree. Point mutation, on the other hand, is typically applied on a
[Genetic programming, a field guide]
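Subtree crossover as described in the excerpt can be sketched on small binary expression trees. The class `ExprTree` and its path-based addressing of crossover points are assumptions for this example; a real genetic search would pick the crossover points at random, whereas here they are passed in explicitly so the result is reproducible.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal expression tree with subtree crossover; illustrative only.
final class ExprTree {
    final String label;
    final List<ExprTree> children;

    ExprTree(String label, ExprTree... children) {
        this.label = label;
        this.children = new ArrayList<>(List.of(children));
    }

    // Deep copy, so that crossover never alters the parents.
    ExprTree copy() {
        ExprTree c = new ExprTree(label);
        for (ExprTree child : children) c.children.add(child.copy());
        return c;
    }

    // Follows child indices from this node down to a subtree.
    ExprTree at(int... path) {
        ExprTree t = this;
        for (int index : path) t = t.children.get(index);
        return t;
    }

    // Returns a copy of receiver in which the subtree at path is replaced
    // by a copy of donorSubtree.
    static ExprTree crossover(ExprTree receiver, int[] path, ExprTree donorSubtree) {
        if (path.length == 0) return donorSubtree.copy();
        ExprTree result = receiver.copy();
        ExprTree parent = result;
        for (int i = 0; i < path.length - 1; i++) parent = parent.children.get(path[i]);
        parent.children.set(path[path.length - 1], donorSubtree.copy());
        return result;
    }

    // Prints leaves as-is and binary nodes fully parenthesised.
    @Override public String toString() {
        if (children.isEmpty()) return label;
        return "(" + children.get(0) + label + children.get(1) + ")";
    }

    public static void main(String[] args) {
        ExprTree p1 = new ExprTree("+",
            new ExprTree("+", new ExprTree("x"), new ExprTree("y")), new ExprTree("3"));
        ExprTree p2 = new ExprTree("*",
            new ExprTree("+", new ExprTree("y"), new ExprTree("1")),
            new ExprTree("/", new ExprTree("x"), new ExprTree("2")));
        // Replace (x+y) in the first parent by (x/2) from the second parent.
        System.out.println(crossover(p1, new int[] { 0 }, p2.at(1)));
    }
}
```

With the parents from the figure, (x+y)+3 and (y+1)*(x/2), this yields the offspring (x/2)+3 while both parents stay unchanged.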
1/ formal operators for template mutation
2/ genetic search for mutation recommendations
crossover
mutation
probability finetuning…
best templates after 30 iterations
0.91:
[public void acceptVisitor(ComponentVisitor v);]@[(invoked-by ?v17892744)]
comp.acceptVisitor(v)
0.91:
[public void acceptVisitor(ComponentVisitor v){...}]@[(invoked-by ?v20420073)]
comp.acceptVisitor(v)
0.90:
[public void acceptVisitor(ComponentVisitor v) ??v23406365]@[(invoked-by ??v23499077)]
?v23184877(v)
desired:
[….acceptVisitor(…)]@[(invokes ?method)]
[public void acceptVisitor(ComponentVisitor v)…]@[(equals ?method)]
Example run: polymorphic invocations
Ongoing Experiment
RQ1: how effective is the search in finding template changes?
RQ2: do users find the recommended changes helpful?
RQ3: do composite, template-specific mutations converge more quickly to a solution than generic code mutations?
finetuning: very sensitive to probabilities of crossover and mutation, quality of RNG, diversity in population, …
Meta-programming library
Search-and-replace
Recommender system
Logic meta-programming in Clojure
Beyond syntactic templates
Genetic search for template mutations
Future work involving Ekeko/X
and possible collaborations
[Figure: system variants A, B and C. Variant A: buggy, PATCH, fixed. Variant B: clone B, evolved clone B, PATCH', fixed. Variant C: clone C, evolved clone C, PATCH''??, fixed.]
search for edits to a transformation such that it can be applied to a variant of the system it was intended for