Darren Lee

Generating IDE support for Dynamic Languages

B.Sc. Computer Science with Software Engineering

19th March, 2010

“I certify that the material contained in this dissertation is my own work and does not contain unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project and all associated documentation. Regarding the electronically submitted version of this submitted work, I consent to this being stored electronically and copied for assessment purposes, including the Department’s use of plagiarism detection systems in order to check the integrity of assessed work.

I agree to my dissertation being placed in the public domain, with my name explicitly included as the author of the work.”

Date: 19th March, 2009

Signed:

Abstract

This project introduces DLTGen, a system designed to generate IDE support for dynamic languages from a higher-level specification. DLTGen aims to simplify two currently difficult tasks: creating an IDE plug-in and modelling a dynamic language. The final outcome of the project was evaluated by attempting to specify IDEs for languages that had not previously been considered, and DLTGen was able to support a range of complicated functionality in these languages. The strength of DLTGen is that it is demonstrated creating tooling not only for smaller domain-specific languages but also for full general-purpose programming languages.

Contents

1 Introduction
1.1 Overview
1.2 Motivation
1.3 Aims of this project
1.4 Approach
1.5 Unique aspects of the project
1.6 Report Overview

2 Background
2.1 What is a dynamic language?
2.1.1 Dynamic type systems
2.1.2 Duck Typing
2.1.3 Runtime object alteration
2.1.4 Well defined global scope
2.1.5 Code generation
2.1.6 Closures
2.2 Existing Work
2.2.1 Dynamic Language Tool Kit (DLTK)
2.2.2 Xtext
2.2.3 EMFText
2.2.4 Visual Studio

3 Design
3.1 Design Alternatives
3.2 Goals
3.3 Model
3.4 Detecting there was a request
3.5 Determine sensible completions
3.5.1 Annotate Action
3.5.2 Find Action
3.5.3 Link Action
3.5.4 Algorithm Action
3.6 Surface the completion to the IDE
3.6.1 Insert Action
3.6.2 Propose Action & Styling

4 Implementation
4.1 Overview
4.2 Specification Language
4.2.1 Qualifier Task
4.2.2 Annotate Task
4.2.3 Member Access
4.2.4 Scope Searching Task
4.3 Framework
4.3.1 Special Algorithms
4.3.2 Eclipse Integration Points
4.4 Generator
4.4.1 Path Resolution
4.4.2 Code Safety

5 System in operation
5.1 Ruby Background
5.2 Getting Started
5.2.1 Prerequisites
5.2.2 Creating an Xtext project
5.2.3 Creating a DLTGen project
5.3 Specifying the language and IDE
5.3.1 Global Scope
5.3.2 Type System & Literals
5.3.3 Global Variable Accessor
5.3.4 Simple Type Inference
5.3.5 Advanced Type Inference on External Iterators

6 Testing
6.1 Framework
6.1.1 Testing Procedure
6.1.2 Features Tested
6.1.3 Results
6.2 Generator
6.2.1 Methodology
6.2.2 Features Tested
6.2.3 Results

7 Evaluation
7.1 Scala
7.1.1 C Style Syntax
7.1.2 Local Type Inference
7.1.3 Object Orientated & Type System
7.1.4 Higher-Order Functions
7.1.5 Polymorphic Methods
7.1.6 Scala Findings
7.2 EOL
7.2.1 Model Access
7.2.2 Implicit Typing
7.2.3 Operations
7.2.4 Model Writing
7.2.5 EOL Findings

8 Conclusion
8.1 Review of Aims
8.1.1 IDE Generation
8.1.2 IDE Feature Support
8.1.3 Dynamic Language Feature Support
Single focus type system
8.2 Future Work
8.2.1 Specification Language Format
8.2.2 Eclipse Features
8.2.3 IDE Interoperability
8.2.4 Dynamic Language Research
8.2.5 Expanding the Specification Language
8.3 Lessons Learned

Bibliography

References

Appendices

Appendix A – Model Visualisation
Appendix B – JavaScript Generics
Appendix C – Specification Language Section
Appendix D – JavaScript Runtime Object Alteration
Appendix E – Ruby DLTGen Specification
Appendix F – Ruby Grammar
Appendix G – Tests
Appendix H – Scala IDE Screenshots
Appendix I – Ruby IDE Screenshots
Appendix J – Original Project Proposal

Working documents can be found at:

http://www.lancs.ac.uk/ug/leed2/

1 Introduction

1.1 Overview

Programming languages are always evolving, but recently a lot of focus has been put on dynamic languages such as JavaScript, Ruby and Python. John Ousterhout (1) argued over a decade ago that programming tasks are becoming more connection focused and that dynamic languages are better suited to this. With web technologies in particular this has become the case.

Although a lot of work has been done, and is being done, on making dynamic languages better, far less work has gone into creating tooling for them. This project introduces DLTGen (Dynamic Language Tooling Generator) as a mechanism to simplify creating this tooling. Specifically, this is achieved by providing a higher-level description of the IDE (Integrated Development Environment). This benefits the creation of tooling by removing some of the complexity of processing a dynamic language and of building a sophisticated IDE.

In doing this there are a number of technical difficulties, because dynamic code tends to be fairly flexible. Unfortunately, definitions of what constitutes a dynamic language are rather ambiguous. An abstract definition often used is a language in which a program can change its own structure at runtime (2). This very open-ended idea is difficult to classify, but could include evaluating Strings as new code, updating the type system and extending object definitions, among others. Some definitions state that a dynamic language should also have a dynamic type system (3); many do not. Finally, another definition gaining popularity is any language which is easy to use; many dynamic languages make claims of productivity and learnability benefits (4) (1) (5).

The ultimate outcome of this project is a mechanism to specify and generate features of an IDE and dynamic language processing. The code generated will provide a complete IDE plug-in. It will also support manual modifications to the code; this will enable DLTGen to be used as a complete solution or just a starting point for creating sophisticated IDEs.

1.2 Motivation

There are two key motivations for this project: processing dynamic languages and building a sophisticated IDE are both difficult tasks. Processing a dynamic language is difficult because such languages are so flexible that it is sometimes impossible to achieve an accurate interpretation without fully executing the code. For example, a variable may change data type only under a certain path of execution; to know the type we would need to know which path was followed, which is too difficult to determine without interpreting the code. Chapter 2 of this report describes several of the trickier dynamic language features and what makes them difficult to process.

Today there are many great IDEs and most of them have a plug-in architecture. However, most of the better ones have come to support such a large set of features that developers get lost in details. Of course, for big languages like Ruby and ActionScript the producing companies hire large teams of developers to create great IDEs, but for a single developer, researcher or small team it is much harder. Eclipse (6) is an example of a particularly large IDE framework in which there are many ways to do the same thing. Much of the ‘getting-started’ documentation provided is too basic to create a sophisticated IDE, and referencing what others have done often leads towards a solution which is now deprecated. There is a need for a simpler way to create sophisticated tooling to aid language researchers and developers. For example, developers often embed dynamic languages into systems to provide extensibility mechanisms. There are some great resources to help with this; however, there are very few resources to help create tooling for their particular flavour of the source language. These toy languages are too small to justify investing substantial time and money in tooling. A higher-level generated solution could make this more economically feasible.

1.3 Aims of this project

The goals of this project fall into two areas: simplifying IDE creation and simplifying support for dynamic languages. The aims are listed below.

• Generate IDEs to simplify the process of tooling creation.
• Specify basic and sophisticated IDE features.
• Support dynamic language features.

Specifically, the aim of this project is to develop a specification language for IDE features and to build a generator that creates an IDE from it. This tool could then be used to evaluate the project's effectiveness in creating reasonably sophisticated IDEs for new dynamic languages.

1.4 Approach

The goals of this project are large and complex; it is beyond the scope and time constraints of the project to take a traditional approach of carefully researching dynamic languages and designing detailed abstractions. Instead this project took a different, iterative approach. First, hand-written IDEs for two popular dynamic languages, JavaScript and Ruby, were created. From these hand-crafted IDEs, abstractions for the specification language were created and the generator was built. To validate the generator, the two original hand-made IDEs were re-specified in the new specification language. Finally, the system was evaluated against new languages. The languages targeted in evaluation were intentionally not considered when designing the abstractions, in order to yield a better evaluation. However, researching the area of dynamic languages inevitably influenced how generic the abstractions were kept.

1.5 Unique aspects of the project

The core unique aspect of this project is its aim to specify sophisticated IDE support in a higher-level language. Being able to define an IDE abstractly significantly reduces the workload associated with creating tooling. By providing such a mechanism DLTGen could make tooling more economically viable for smaller research and toy languages.

Another key point in the uniqueness of this project is that it can be used for full scale general purpose languages and not just smaller DSLs. This flexibility shows that generated solutions are at least feasible for full scale languages.

The core feature that this project supports is code completion mechanisms. Other generated IDE solutions only provide simplistic code completions for syntax. DLTGen goes much further to provide complicated code completions which interact with knowledge of the language and source code.

1.6 Report Overview

This section provides a brief overview of the following chapters in this report. Chapter 2 describes some of the background research which took place in order to understand dynamic languages and IDE generation. It describes common characteristics of dynamic languages and the difficulties posed in supporting them in tooling. It also looks at existing solutions, how far they go and their relative merits. Language examples are JavaScript unless otherwise stated.

Next, the design chapter (Chapter 3) conceptualizes the problems DLTGen was required to solve. The solutions which were created are described at a high level, using conceptualizations and examples to demonstrate the benefits of the decisions made. This leads into Chapter 4, Implementation, which takes a closer look at how the abstract features from design operate together to create an IDE. This includes how they can be composed into a larger IDE description and interact with each other.

Chapter 5 provides a walkthrough of creating an IDE for the Ruby programming language. Particular language features are described and then decomposed into a solution represented in specification.

Chapter 6 describes the testing methodology for DLTGen and outlines some of the tests performed. The tests discussed in this chapter look purely at system robustness. The following chapter, Evaluation, looks at how effective the system is. It takes two new languages, Scala and EOL, and discusses to what extent DLTGen was able to support them. The focus is on which types of language feature could be supported, which could not, and how flexible the solution is.

Finally, chapter 8 summarizes the findings of the project and how successful it was against the original objectives. This chapter also discusses possible future work and research that could improve and add to DLTGen.

2 Background

Chapter 2 discusses the background of the problem to help understand where the technical difficulties for this system lie. This chapter also looks at relevant existing solutions.

2.1 What is a dynamic language?

In order to support dynamic languages we need a firm definition of what one is. As previously mentioned, this is a poorly defined area. This section describes some features commonly associated with dynamic programming languages. Each one is described generally, followed by a JavaScript example to help conceptualize the technical problem it poses.

2.1.1 Dynamic type systems

Dynamic typing is a mechanism whereby type checking is done during execution of the program and not at compile time as with traditional static languages. In a dynamic typing system “values have types but variables do not” (7). If I define the variable x as shown in figure 2.1 it has no data type. Only after the second line of code executes does it hold a value of data type String; when the third line executes it holds a Number.

var x;
x = "Hello World";
x = 42;

Figure 2.1, Value based typing example

The variable x has no type associated with it; its type is determined by its value. This makes type inference very difficult because it requires tracking the values put into x. In a static language we could just look at the definition; in a dynamic type system we cannot. While performing type inference on the variable x the system needs to be more aware of its lifetime: instead of looking for any declaration of x, the system needs to find the most relevant use of x for a given line of code.
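The idea of finding the most relevant use of x for a given line can be sketched as follows. This is a minimal illustration under stated assumptions, not DLTGen's implementation: the `assignments` record format and the `inferTypeAt` helper are hypothetical names introduced here, and the line numbers refer to the three statements of figure 2.1.

```javascript
// Hypothetical record of the assignments seen for one variable.
// Each entry notes the line of the assignment and the type of the assigned literal.
const assignments = [
    { line: 2, type: "String" }, // x = "Hello World";
    { line: 3, type: "Number" }  // x = 42;
];

// Return the type of the latest assignment strictly before the given line,
// or null when the variable has not been given a value yet.
function inferTypeAt(records, line) {
    let result = null;
    for (const record of records) {
        if (record.line < line && (result === null || record.line > result.line)) {
            result = record;
        }
    }
    return result === null ? null : result.type;
}
```

Under these assumptions, a completion requested on line 3 would see x as a String, while one requested on line 4 would see it as a Number. A realistic tool would obtain the records from a parse tree rather than from a hand-written list.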

if (true) {
    x = "Hello World"; // x here is a string
} else {
    x = 42; // x here is a number
}

Figure 2.2, Typing Alternatives

The lifetime of a variable can be even more complex. In figure 2.2, x has two different data types depending on the evaluation of the if statement. Similarly, the value of x could be different outside of the if statement. Providing code completion for this example is possible without interpretation: because the code we want to complete will be in one branch or the other, mechanisms that guide the search for variable usage are enough.

2.1.2 Duck Typing

Duck typing is a more flexible form of dynamic typing. Duck typing is only concerned with what can be done with the data. The name comes from a concept called the “duck test”:

“when I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.” (8)

We are less interested in what the data is; we only care that we can do with the data what we need to do. To continue the duck analogy we don’t care that it is a duck, we only care that it can quack (9). In figure 2.3 there is a function which pokes a given duck producing a quack. The parameter duck has no type, and it could be anything as long as it can quack.

var human = { quack: function () { print("Human imitates duck"); } };
var duck = { quack: function () { print("Quaackkk!"); } };

function pokeDuck(duck) {
    duck.quack();
}

Figure 2.3, Duck Typing Example

// Valid calls
pokeDuck(human);
pokeDuck(duck);

// Invalid calls
pokeDuck(new Object());
pokeDuck("A String");

Figure 2.4, Use of duck typing

Figure 2.4 demonstrates what would and would not be valid code for the ‘pokeDuck’ function. Neither a blank object nor a string can be used as the parameter ‘duck’ because neither has a quack function. In type inference, duck typing is often used to determine the members of parameters. The members of an object are the actual children of the data, and they can come from a range of sources. By matching a function call (caller) with a function (callee), an inference system can observe how the parameter is used and produce inferences from it. This mechanism can work both ways.
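As a rough sketch of how use of a parameter can drive inference, the code below collects the members a callee accesses on a parameter and then checks whether a candidate argument offers them. The helper names `membersUsedOn` and `satisfies` are hypothetical, and the regular expression is a deliberate simplification that assumes a plain alphanumeric parameter name; a real tool would inspect a parse tree instead.

```javascript
// Collect the member names accessed on a parameter inside a function's source text.
// A regular expression is only an approximation of walking a parse tree.
function membersUsedOn(paramName, functionSource) {
    const pattern = new RegExp("\\b" + paramName + "\\.([A-Za-z_$][\\w$]*)", "g");
    const members = new Set();
    let match;
    while ((match = pattern.exec(functionSource)) !== null) {
        members.add(match[1]);
    }
    return Array.from(members);
}

// Check whether a candidate value offers every member the callee relies on.
function satisfies(value, requiredMembers) {
    return requiredMembers.every(function (name) {
        return value !== null && value !== undefined && value[name] !== undefined;
    });
}

const source = "function pokeDuck(duck){ duck.quack(); }";
const required = membersUsedOn("duck", source);
```

Here `required` holds the single member name `quack`, so an object with a quack function satisfies the callee while a blank object or a string does not. The same information could be used in the other direction, to propose `quack` as a completion on the parameter inside the callee.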

2.1.3 Runtime object alteration

A common feature of a number of dynamic languages is runtime object alteration, which is related to dynamic type systems. It provides a means for a program to change which features are available on an object at runtime.

var bob = new Person();
var paul = new Person();
bob.drinksWine = true;

(paul.drinksWine == null) // True

Figure 2.5, Object Alteration Example

The variables bob and paul are both instances of the “Person” class. In JavaScript it is perfectly valid to add a new member to one and not the other. By the time line 3 of figure 2.5 has executed, bob has a “drinksWine” member. This member is not part of the Person class but has been added dynamically. As a result paul does not have an equivalent, as the member was not attached to paul. If we wanted to provide the member for both we could attach it to the class's “prototype”.

Person.prototype.drinksWine = true;
// Rest of the code

Figure 2.6, Class Alteration Example

Once the code in figure 2.6 has executed, every Person instance has a ‘drinksWine’ member. Because JavaScript looks members up through the prototype chain at the moment of access, this includes instances created before the line executed; what matters is whether a given use of the member happens before or after the alteration. Clearly this is a difficult concept to process by just looking at source code: it requires keeping track of when particular alterations take effect relative to each use of an object, and it is far beyond the capabilities of this project to support this level of complexity.
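The behaviour of the two kinds of alteration can be checked with a short sketch that runs in any modern JavaScript engine. The empty `Person` constructor is an assumption, since the figures do not show its definition, and the value `false` is used for the prototype member purely so that it can be told apart from bob's own member.

```javascript
// Assumed minimal constructor; the original figures do not show Person's definition.
function Person() {}

const bob = new Person();
const paul = new Person();

// Alteration of a single object: only bob gains the member.
bob.drinksWine = true;
const paulBefore = paul.drinksWine; // paul was not altered

// Alteration of the prototype: the member is found through the prototype chain
// at access time, so it becomes visible on paul from this point onwards.
Person.prototype.drinksWine = false;
const paulAfter = paul.drinksWine; // read from the prototype
const bobAfter = bob.drinksWine;   // bob's own member shadows the prototype
```

After these statements `paulBefore` is undefined, `paulAfter` is false and `bobAfter` is true. A tool that analyses source text therefore has to know which alterations have already taken effect at the point where a member is used.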

2.1.4 Well defined global scope

Dynamic languages rarely compile to a low level, and because of this many of them cannot interact directly at the system level. This restricts access to much-needed system-level tasks such as I/O. The solution provided by a number of dynamic languages is a well-defined global scope. In this context a scope refers to a place in the program runtime which stores variables, functions, classes and potentially instances of any object defined in the language. Each scope represents a place from which accessors and mutators can find or adjust these instances of language concepts such as types and variables. Core functionality is usually written in native code and linked into a well-defined scope so that the target language can interact with it. Languages which do this include ActionScript (10), PHP (11), Ruby (12) and many more. The problem is that this relies on external mechanisms, because the objects in the global scope are not defined in the target language. Given the heavy dependency on these well-defined global scopes, there needs to be a mechanism for hand-crafting a global scope and bringing it into the parse tree/model for analysis.
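One way to picture a hand-crafted global scope is as a plain data description that an analysis can query. The sketch below is illustrative only: the `nativeGlobals` layout, its field names and the `lookup` helper are assumptions made for this example and do not reflect the format DLTGen uses.

```javascript
// Hypothetical hand-written description of globals that are implemented natively
// and therefore have no definition in the target language's own source code.
const nativeGlobals = {
    print: { kind: "function", returns: "Undefined" },
    Math: {
        kind: "object",
        members: { PI: { kind: "variable", type: "Number" } }
    }
};

// Resolve a dotted name such as "Math.PI" against the described scope.
// Returns the matching description, or null when any part is unknown.
function lookup(scope, qualifiedName) {
    let current = { members: scope };
    for (const part of qualifiedName.split(".")) {
        const members = current.members;
        if (!members || !Object.prototype.hasOwnProperty.call(members, part)) {
            return null;
        }
        current = members[part];
    }
    return current;
}
```

With such a description in place, `lookup(nativeGlobals, "Math.PI")` yields an entry of type Number, while an unknown name such as `Math.E` yields null, which is the information a completion engine needs about objects it cannot find in the source.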

2.1.5 Code generation

A number of dynamic languages have a simple mechanism to generate code on the fly, turning Strings into evaluated code. It is potentially a very powerful feature, but an unwieldy one. The problem that code generation poses for a system processing source code is how complicated the process of building the code can be. Figure 2.7 shows two examples of how dynamic code generation might be used.

// Simple example
var code = "alert('" + msg + "');";
eval(code);

// Complicated example
function createClass(id, superType) {
    var code = "function " + id + "(){";
    code += "}";
    code += id + ".prototype = new " + superType + "()";
    eval(code);
}

Figure 2.7, Eval Example

The first example simply takes a message and generates new code which will invoke the alert function. Although not overly complicated, it still requires evaluating the code rather than analysing it. The second example shows a truly horrific use of eval: generating a new class on the fly. Earlier versions of ECMAScript (JavaScript) have no built-in syntax for performing polymorphism, so many web toolkits provide code similar to that shown above to generate classes with polymorphism on the fly. The KONtx framework (13) from Yahoo is an example of a toolkit which does this. Given the potential complexity of this feature, it is likely to be very difficult or nearly impossible to support without using a code evaluation technique.

However, Erik Meijer identifies (14) a set of common uses for executing Strings as programs, and numerous dynamic languages are starting to provide specific support for these uses, which reduces the significance of this particular dynamic language feature. The uses discussed include:

• A substitute for higher order functions.
• Deserialization of objects.
• Meta programming.
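The first two uses listed above can be illustrated without eval in current JavaScript. The sketch below is an assumption-laden example rather than something taken from (14): `makeNotifier` and `deserialize` are hypothetical helper names, and the use of the standard `JSON.parse` function for deserialization is this example's own choice.

```javascript
// Instead of building "alert('...')" as a String and evaluating it,
// return a function that remembers the message (a higher order function).
function makeNotifier(show, msg) {
    return function () {
        return show(msg);
    };
}

// Instead of evaluating a String to rebuild an object, parse it as data.
function deserialize(text) {
    return JSON.parse(text);
}

// Usage: the display function is passed in, so no code is built from Strings.
const shown = [];
const notify = makeNotifier(function (m) { shown.push(m); return m; }, "Hello");
notify();
const person = deserialize('{"name": "bob", "drinksWine": true}');
```

Both forms are visible to static analysis: the function passed to `makeNotifier` is ordinary source code, and the String given to `deserialize` is treated purely as data.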

2.1.6 Closures

A closure is a first-class¹ function which retains the context it was created in. It is related to functional programming but is often used in dynamic languages in general. The closure can access local variables from the scope it was created in, even though it has its own set of locals and its own local scope. Figure 2.8 shows an example of closures in JavaScript. A new function is created but it retains access to the ‘wordList’ argument from the creator’s local scope.

¹ A first-class function can be created, passed around and stored at run time.

function spell_checker(wordList) {
    var temp = function (input) {
        return checkSpelling(input, wordList);
    };
    return temp;
}

Figure 2.8, Closure Example

Like duck typing, constructs such as these impose a need for some way to guide name binding.
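A common way to guide name binding for nested functions is to search a chain of scopes from the innermost outwards. The sketch below shows that general technique for the names in figure 2.8; the `createScope` and `resolve` helpers and the scope record layout are hypothetical and are not taken from DLTGen.

```javascript
// Hypothetical scope record: the names declared in it and a link to the enclosing scope.
function createScope(names, parent) {
    return { names: names, parent: parent || null };
}

// Walk outwards from the innermost scope until a declaration of the name is found.
// Returns the declaring scope, or null when the name is not declared anywhere.
function resolve(scope, name) {
    for (let current = scope; current !== null; current = current.parent) {
        if (current.names.indexOf(name) !== -1) {
            return current;
        }
    }
    return null;
}

// Scopes for figure 2.8, written out by hand for illustration.
const topLevel = createScope(["spell_checker", "checkSpelling"], null);
const outer = createScope(["wordList", "temp"], topLevel); // body of spell_checker
const inner = createScope(["input"], outer);               // body of the closure
```

Resolving `wordList` from the closure's scope finds it in the scope of `spell_checker`, which is the binding the closure retains at run time, while `input` resolves to the closure's own scope.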

2.2 Existing Work

There are a number of projects (many of which are still experimental) which aim to make IDE creation and support for dynamic languages easier. The existing solutions fail to meet this project’s requirements by having high barriers to entry, not being generative, or providing only basic IDE features.

2.2.1 Dynamic Language Tool Kit (DLTK)

The Dynamic Language Tool Kit (15) is an Eclipse project aimed at reducing the complexity of creating a fully featured IDE for a dynamic language. Its main component is a mechanism that processes dynamic code using an interpreter and provides frameworks to access some of the resulting information.

Interpretation rather than analysis is likely to yield more data for dynamic languages. However, as with many of the IDE frameworks and libraries in Eclipse, there is no free lunch with the DLTK. The documentation (16) guides the developer through creating over 40 classes to interact with the DLTK, and the resulting IDE is still fairly simplistic. The project does, however, seem to hold a lot of promise for more experienced developers willing to invest the time.

2.2.2 Xtext

Xtext (17) provides a framework for developing DSLs and programming languages. From a specification in the Xtext language it generates a model (implemented in EMF) along with a source code editor. The code editor provides syntax highlighting, hyperlinking and basic content assist/code completion. The code completion provided is based on syntax rather than language features; for example, it suggests closing a bracket rather than listing the members of a variable. There are extensions which generate other IDE features such as an outline view, project wizards, etc. This makes Xtext more desirable to this project as it already has an extension mechanism in place. The main problem with Xtext is the poor developer experience, which raises two potential issues for developers and for this project. Firstly, it is poorly documented: there is a user guide (18), but there is very little help available for adding advanced features or extending Xtext. Much of the code, although shipped with source, has only generated comments. The second issue is how often it changes; it undergoes heavy changes between versions which can break previous work. This is understandable for an incubation project.


A technology related to Xtext is Xpand2. Xpand is a code generation framework, used by Xtext, which allows code to be generated against an EMF model. Code is defined in template files, and directives in the templates bind data from the EMF model. Xtend also supports calling Java code during generation; this provides a great deal of flexibility.

The Xpand language’s key feature is the ability to “expand” a template for a certain EMF object or collection of EMF objects. This works well with this project as it creates several reusable directives in the specification. The code generated for each directive can be put into one template and be expanded from other templates.

2.2.3 EMFText

EMFText (19) is an interesting system which allows the specification of a textual language for an EMF model. It is a great way to create a DSL that is easily parsed into an EMF model. As with Xtext, the DSL is specified and an IDE is generated. Although it is fit for its purpose, the IDE is fairly simplistic for the aims of this project. There is also less opportunity to extend EMFText than there is to interoperate with Xtext.

2.2.4 Visual Studio

Visual Studio has some of the best code completion. Its extension system is called “Language Services”; it uses a ‘lex’ lexer and a ‘yacc’ (20) grammar. Visual Studio 2010 provides good support for internal languages built on top of the DLR² (21) (e.g. C# 4.0). However, this is still in beta and it is unclear to what extent Language Services will make these features available for dynamic language integration. It is also unclear whether or not the language must run using the DLR.

² Dynamic Language Runtime


3 Design

The purpose of this chapter is to address what the system needs to be capable of doing and which mechanisms have been created to accommodate this. There are several pieces to this project: frameworks, generators and utilities. This chapter only outlines the solutions abstractly; Chapter 4 discusses some of the more interesting technical specifics in more detail. Below is an overall conceptualisation of the processes discussed in this chapter.

Figure 3.1, Overall Design

3.1 Design Alternatives

Before addressing the process and results of design, the features DLTGen will support need to be made clear. As shown in Chapter 2 there are a number of existing solutions related to generating IDEs. The one that was richest in features and most appropriate to this project was Xtext, and as a result DLTGen is implemented as extensions to Xtext. Xtext provides a number of IDE features including syntax highlighting, properties, wizards, hyperlinking, outlining and more; however, it does not provide sophisticated code completion (also known as autocomplete). The decision was made that this was the best place for DLTGen to add value. Code completion is a very rich IDE feature and also one of the most complicated to create. It is a complicated problem for any language, but dynamic languages add extra complexity by being harder to analyse. Therefore the majority of DLTGen’s features are aimed at supporting sophisticated code completion.

A fundamental decision taken by DLTGen is the way it analyses source code. DLTGen provides a simple mechanism whereby additional functionality can be attached to elements in the Xtext grammar. There are a number of other approaches to analysing source code; one such method is interpretation. Executing selected statements or complete code to draw inferences is clearly a very powerful idea with the potential to yield a lot of information. For this project’s purposes, however, the mechanisms of interpretation are too complex and language specific. Another method would have been to create workflow analysis mechanisms where language interactions are defined. Defining a complete workflow analysis, however, does not lend itself well to a simple definition.

3.2 Goals

Code completion can manifest in several forms. Generally it consists of showing a pop-up list of possible completions for the current context and allowing the user to select the correct one. It could also take the form of automatically inserting code when there is less of a choice for the end user. Code completion as a process can be divided into the following stages:

1. Continually build up metadata for language objects in the background and provide inferences.
2. Detect that there was a request/opportunity to provide code completion.
3. Determine sensible completions for the current context.
4. Surface the completion to the user.

The rest of this chapter will describe the solutions provided by DLTGen to handle these four requirements using a higher level specification.

3.3 Model

Behind any sophisticated IDE there is always a model of the code being worked with. In DLTGen the model is an index of metadata, derived from source code and other sources, that other IDE features can use later. Building the model is a multistage process, but it usually starts with detecting language features in the source code. Xtext already provides an opportunity to express language features in a grammar and DLTGen builds on this. Non-terminals in this grammar can be processed to become part of the model and the understanding of the source code. Three types of data need to be captured in the model to provide sophisticated code completion: static data, runtime data and type data. Any language feature (non-terminal) can make up these different data elements.

Static Data represents well defined Meta data that is loaded only once. This allows well known language features to be pre-defined in a hand written specification and included in the model. This makes providing a well-defined global scope possible, a key feature in many dynamic languages. The developer would declare their global scope statically in an XML file or in the target language which would then be referenced in the IDE features specification as a static scope.

The next type of data is runtime data, which describes constructs found in the current source code. The way this data is collected is defined in the specification language. Runtime data is thrown away periodically as the document changes and features are removed. Metadata in the runtime model has a time to live; when this time is up, the data is completely reprocessed. This improves efficiency, as not every language feature has to be processed every time the model is used.
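The time-to-live behaviour can be sketched as a small cache. This is only an illustration under stated assumptions: the RuntimeModel name, the millisecond TTL, the injectable clock and the use of the qualified path as a key are all hypothetical and are not taken from DLTGen itself.

```javascript
// Hypothetical runtime-model cache; names and the millisecond TTL are assumptions.
function RuntimeModel(ttlMs, now) {
  this.ttlMs = ttlMs;
  this.now = now || Date.now; // injectable clock, useful for testing
  this.entries = {};          // keyed by the qualified path of a feature
}

// Return cached metadata for a path, reprocessing it only once
// its time to live has expired (or if it was never processed).
RuntimeModel.prototype.get = function (path, reprocess) {
  var entry = this.entries[path];
  var t = this.now();
  if (!entry || t - entry.stamp >= this.ttlMs) {
    entry = { data: reprocess(path), stamp: t };
    this.entries[path] = entry;
  }
  return entry.data;
};

// Drop metadata for a feature that has been removed from the document.
RuntimeModel.prototype.remove = function (path) {
  delete this.entries[path];
};
```

Repeated lookups within the time to live reuse the stored metadata, which is the efficiency gain described above; only an expired or missing entry triggers reprocessing.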

Finally there is type data; this is information about data types in the type system. Not all languages have a type system; however, it is common enough to warrant support at this level in DLTGen. This data can be affected by, and contain metadata from, static or runtime sources. It maintains data such as the members of a data type, polymorphic information, etc. Data modified in the type model by the static model is non-volatile; data modified by the runtime model is volatile.

Creating the model is the job of “model builders”; a model builder is a group of language processing mechanisms defined by the user to contribute to the model for a particular non-terminal. It describes how each language feature contributes to the model and how certain aspects should be processed. This is described further in Chapter 4. Appendix A provides a visualization of the model sources and what data they might contain.

The reason for collecting this data is so that other IDE features can use it. As previously stated, the life cycle of an IDE feature is to detect the request, find the proposals and display them. No proposals can be found if there is no model. Similarly, it is unreasonable to do all this processing within the life cycle of an IDE feature, as it would be too inefficient. The remainder of this chapter describes the life cycle of an IDE feature, which is always assumed to have a model behind it.

3.4 Detecting there was a request

In order to detect a request, “activation characters” are typically registered with the IDE framework; when one of these characters is detected, the plug-in is given a chance to determine whether there is an opportunity to perform code completion. This is unnecessary boilerplate code for a generated solution; instead DLTGen provides “triggers”. A trigger is invoked when a certain prefix is found in the code. For example, if the user writes the key phrase “new” in JavaScript, it would be sensible to propose a list of instantiable classes. There are two types of trigger: static and model. A static trigger does something basic such as closing a bracket, for which there is no need to look inside the model. A model trigger requires access to the model to explore language features before it can complete its task. The reason for the distinction is efficiency: accessing the parse tree and the model requires them to be locked against other code, and it is senseless to do this when the model is not required for a simple task.

A common use for a static trigger is to automatically close balanced characters such as brackets, braces and string literals. In the following example, when an open bracket is detected a closing bracket is automatically inserted.

<staticTrigger sequence="(">
    <insert>
        <sequence>")"</sequence>
        <offsetSource>insertEnd</offsetSource>
        <offset>-1</offset>
    </insert>
</staticTrigger>

Figure 3.2, Static Trigger Example

The new character is inserted at the current cursor position; however, where the cursor ends up can be controlled. By providing an offset of -1 the cursor is moved back inside the brackets, which is useful in JavaScript where arguments are placed between brackets. Static triggers rarely get more complicated than this.
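The effect of the static trigger in Figure 3.2 on the editor can be sketched as follows. The object layout and the applyStaticTrigger function are hypothetical; they only mirror the elements of the XML specification and are not the code DLTGen generates.

```javascript
// Hypothetical in-memory form of the static trigger in Figure 3.2.
var closeBracket = {
  sequence: "(",
  insert: { sequence: ")", offsetSource: "insertEnd", offset: -1 }
};

// Apply a static trigger if the text before the cursor ends with its sequence.
// Returns the new text and cursor position, or null when the trigger does not fire.
function applyStaticTrigger(trigger, text, cursor) {
  var before = text.slice(0, cursor);
  if (before.slice(-trigger.sequence.length) !== trigger.sequence) {
    return null;
  }
  var ins = trigger.insert;
  var newText = before + ins.sequence + text.slice(cursor);
  // Only the "insertEnd" offset source from Figure 3.2 is modelled here.
  var insertEnd = cursor + ins.sequence.length;
  return { text: newText, cursor: insertEnd + ins.offset };
}
```

No model access is needed at any point, which is why a static trigger can avoid locking the parse tree and model.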

Model triggers are where the core functionality of DLTGen starts. The following example demonstrates displaying a list of classes when the new keyword is entered. In JavaScript, any function is theoretically instantiable.

<modelTrigger sequence="new ">
    <find var="funcs" in="model" quantity="*">
        <condition>funcs.class == JS_Function</condition>
    </find>
    <propose>funcs</propose>
</modelTrigger>

Figure 3.3, Model Trigger Example

Figure 3.4, Possible result of the trigger

When the “new ” key phrase is detected, the “actions” in the trigger begin to function. There are a number of actions which help provide the desired functionality; they are described in more detail later in this chapter. Future work could provide additional trigger invocations such as key combinations or menus; however, for this project’s purposes key phrase invocation is enough to demonstrate a wide range of language feature support.

3.5 Determine sensible completions

The most difficult aspect of providing code completion is deciding what completions are sensible in the current context, and most of this project’s efforts have been put into this problem. The problem can be decomposed into the two key tasks shown in Figure 3.5. The first task is transforming features from source code into data which is usable. When a language feature is detected it is automatically added to the model; however, a simple Xtext grammar parse tree is not enough information to perform code completion. DLTGen provides an opportunity to put additional information into the model against a non-terminal to help subsequently explore the model. The second task in determining sensible completions is interpreting the context and the model. For the example of the new keyword, the interpretation would be finding all instantiable classes.


Figure 3.5, Tasks and Actions

To achieve each of these two tasks there are “actions”. Actions are well-defined procedures which can be combined in an imperative way to produce functionality. Figure 3.5 shows some of the actions required to achieve the two tasks; however, the boundary between the tasks can blur, and often does. This section demonstrates how some of the key actions can be used to help determine sensible completions for different language features, and shows what they provide to the system as a whole.

3.5.1 Annotate Action

Annotations are a method of attaching metadata to a non-terminal object. The metadata is simply a key-value pair and can be anything. Fundamentally, annotations make it possible to do work ahead of time: other actions can be used to determine some characteristic of the language feature which will subsequently be useful. When storing metadata on a non-terminal, all of the non-terminal’s attributes remain accessible; this is an extension to the Xtext mechanism, not a replacement. Below is an example of an Xtext non-terminal.

JS_Return: 'return' (expression=JS_ValueStatement) (';'?);

Figure 3.6, JavaScript Return Statement Non-Terminal

The only attribute of a JavaScript return statement is “expression”. New attributes cannot be introduced into the grammar unless they represent something parseable in the source text; it is a grammar, not a model. In DLTGen any attribute from the Xtext grammar can be referenced, but new values can also be defined. This also allows DLTGen to expose special metadata variables such as ‘_path’, which represents the qualified path of a non-terminal. This is used to uniquely identify an object, to help track the life cycle of a feature and to determine which data can be cached or thrown away. Below is a scenario in which data can be worked out ahead of time to help support the dynamic language feature of object alteration. Imagine the following Xtext non-terminal:

JS_DynamicAssignment:
    (objectName=JS_TERMINAL_IDENTIFIER) '.'
    (memberName=JS_TERMINAL_IDENTIFIER) '='
    (expression=JS_ValueStatement) (';'?);


Figure 3.7, JavaScript Assignment Non-Terminal

It is intended to match runtime object alteration, that is to say there is a variable (objectName) and a new member is being added to it (memberName).

var o = new Object();
o.name = "Bob";
o.

Figure 3.8, Object Alteration Example

In the above example, ‘o’ is the objectName and ‘name’ is the memberName. When it comes to determining the members of ‘o’ it would be useful if the model already knew that the JS_DynamicAssignment ‘name’ is a member of the JS_Var ‘o’. The following annotation for a JS_DynamicAssignment could detect and store that meta data.

<annotations>
    <find var="obj" in="scope" quantity="1">
        <condition>obj._name == self.objectName</condition>
    </find>
    <annotate>self.ownerObject = obj</annotate>
</annotations>

Figure 3.9, Object Owner Annotation

When it comes to determining the members of ‘o’, the model can be searched for all objects with “ownerObject” set to the variable being processed. The keyword “self” in DLTGen refers to the current non-terminal. Any attribute on the non-terminal is accessible, as are some special attributes such as ‘_name’. The ‘_name’ attribute represents the qualified name; assigning it for an object makes searching by name possible. Whenever an action has a “var” attribute, the result of that action can be accessed by other actions using the name given; this can be seen on the last line of Figure 3.9.
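The two steps described here, annotating ahead of time and then searching on “ownerObject”, can be sketched for the code in Figure 3.8. The flat array used as the scope and both function names are hypothetical; only the attribute names come from Figures 3.7 and 3.9.

```javascript
// Hypothetical flat model for the code in Figure 3.8. The "class" and "_name"
// attributes follow the text; the array-based scope is an assumption.
var varO = { class: "JS_Var", _name: "o" };
var assignName = { class: "JS_DynamicAssignment", objectName: "o", memberName: "name" };
var scope = [varO, assignName];

// Ahead-of-time step (Figure 3.9): find the single object the assignment
// refers to and store it as metadata on the assignment.
function annotateOwner(self, scope) {
  var obj = scope.filter(function (n) { return n._name === self.objectName; })[0];
  if (obj) { self.ownerObject = obj; }
}

// Completion-time step: the dynamic members of a variable are the
// assignments whose ownerObject is that variable.
function dynamicMembers(variable, scope) {
  return scope
    .filter(function (n) { return n.ownerObject === variable; })
    .map(function (n) { return n.memberName; });
}

annotateOwner(assignName, scope);
var membersOfO = dynamicMembers(varO, scope);
```

Because the owner is resolved once when the assignment is parsed, the lookup at completion time is a simple filter rather than a fresh name-binding pass.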

Annotations can also append to collections. When building up a list of proposals, the following syntax is used:

<annotate>result.members += aListToAdd</annotate>

Figure 3.10, List Append Example

It takes the form collection += new member or members. The variable “aListToAdd” could be, for example, the results from searching the model.

These mechanisms of analysing the semantics of the model to transport information could be compared to a less specific attribute grammar (22). The mechanisms provided by DLTGen care less about the exact semantics of the model or parse tree. Constructs simply relate to something they know is somewhere in the model without the complexity of an attribute grammar.


3.5.2 Find Action

Once a model is being maintained, the data inside it needs to be accessible to be useful. This is the job of the Find action, which provides a mechanism to search the model and, among other things, filters to constrain the results. The Find action goes much further, however, by providing a wide range of “search sources” that constrain where the system can expect to find the results; most importantly this gives a consistent syntax, but it also improves efficiency. Providing several sources also keeps the syntax minimal, as there is no need to repeatedly specify additional conditions to capture the common functionality which the search sources represent. Figure 3.11 shows just a few of the more useful search sources.

Figure 3.11, Selection of Search Sources

Being able to query these data sources enables a wide range of language features to be supported because it is so flexible. Just a few of the things it can enable include name binding³, member resolution⁴ and much more.

The specification for a find action always requires two attributes. The first is called ‘var’; it gives the results a name by which they can be accessed. This variable can be referred to by subsequent actions; it could even be the source of another find. The second attribute is named ‘in’ and specifies which search source to use; this can be a well-known name such as “model” or a previously named variable (e.g. the result of a previous find action). Any find which returns non-terminals also requires a ‘quantity’ attribute, which can be ‘1’ or ‘*’ to represent one or many. Figure 3.12 shows the basic syntax of a find action; it attempts to find the class which owns the current feature.

<find var="jsClass" in="parents" quantity="1">
    <condition>jsClass.class == JS_Class</condition>
</find>

Figure 3.12, Find Parent Class Example

Conditions allow the results to be constrained; any of the attributes on the non-terminal, as well as metadata and special values, can be tested. The special attribute “class” is the type of the non-terminal; here it is used to pick out only JS_Class non-terminals. The result is referable as “jsClass”: within conditions this name refers to the non-terminal currently being filtered, and subsequently it represents the result value. The found value can be used like any non-terminal; all its properties are accessible, including metadata and special values.
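The semantics of a find action (a source, a quantity and a condition) can be sketched as a small function. This is an illustrative reading rather than DLTGen’s implementation: the find function, the nearest-first ordering of the parents and the sample nodes are all assumptions.

```javascript
// Hypothetical evaluator for a find action; "source" stands in for a search
// source such as "model" or "parents", supplied here as a plain array.
function find(source, quantity, condition) {
  var matches = source.filter(condition);
  if (quantity === "1") {
    return matches.length > 0 ? matches[0] : null; // first match or nothing
  }
  return matches; // quantity "*": every match
}

// Parents of the current feature, nearest first (assumed ordering).
var parents = [
  { class: "JS_Function", name: "render" },
  { class: "JS_Class", name: "Widget" },
  { class: "JS_Class", name: "Outer" }
];

// Equivalent of Figure 3.12: the class which owns the current feature.
var jsClass = find(parents, "1", function (jsClass) {
  return jsClass.class === "JS_Class";
});

// A result can be the source of another find, as with a named
// variable in the "in" attribute.
var allClasses = find(parents, "*", function (c) { return c.class === "JS_Class"; });
var named = find(allClasses, "1", function (c) { return c.name === "Outer"; });
```

Inside the condition the name given in ‘var’ refers to the candidate being tested, and afterwards the same name holds the result, which matches the dual use of “jsClass” described above.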

³ Ability to match an identifier to the language feature/definition it refers to.
⁴ Ability to find features owned/contained by another language feature.


When searching inside a non-terminal it is also possible to direct the search. This is useful for providing custom scoping rules. In JavaScript, when looking for the value of an identifier, there is a well-defined order in which scopes are searched:

1. The current function’s local scope; this includes variables and arguments inside the current function.
2. The parent class’s local scope.
3. The global scope.

This logic can be captured as search rules for a “JS_Function” and “JS_Class” in the specification. This feature is described and demonstrated more in Chapter 4.
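The search order listed above can be sketched as a lookup over an ordered list of scopes. The scope contents and the resolve function are hypothetical and only illustrate the ordering; the real search rules are written in the specification language.

```javascript
// Hypothetical scopes, each a map from identifier to the feature that declares it.
var globalScope = { window: "global window", count: "global count" };
var classScope = { count: "class field count", title: "class field title" };
var functionScope = { count: "local count", input: "argument input" };

// Search the scopes in the order given and return the first binding found.
function resolve(identifier, scopes) {
  for (var i = 0; i < scopes.length; i++) {
    if (Object.prototype.hasOwnProperty.call(scopes[i], identifier)) {
      return scopes[i][identifier];
    }
  }
  return null; // unbound identifier
}

// Order from the list above: function locals, then the parent class, then globals.
var searchOrder = [functionScope, classScope, globalScope];
```

An identifier declared at several levels, such as ‘count’ here, binds to the innermost declaration because that scope is searched first.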

3.5.3 Link Action

Linking is a mechanism for defining the relationship between non-terminals so that one can be turned into another. For example, there is a well-defined relationship between a generic parameter⁵ and a generic value⁶: when displaying a method which returns a generic parameter, what should really be displayed is the generic value. The problem, however, is that the generic parameter and the generic value belong to two completely separate language features.

A fundamental difference between analyzing dynamic code and analyzing static code is a greater need to observe use of language features rather than their definition. To determine a variable’s data type in Javascript, the system must look at its use (an assignment) rather than its definition. This is a recurring theme in a number of dynamic language features. To support more advanced uses of this DLTGen has a feature called the “inference stack”. While determining completions the system (as guided by the specification) will visit a number of non-terminals. Each non-terminal visited is put onto the inference stack to keep a history of all features involved in calculating the current completions. The inference stack is exposed in two forms, firstly as a search source and secondly through Linking.

To demonstrate this, below is an example of how this could work for the previous example of generic parameters as supported in newer implementations of JavaScript (ECMAScript fifth edition (23) & ECMAScript Harmony). Figure 3.13 shows the syntax of generics in this version of JavaScript.

var list = new Vector.<String>();
list.item(0).

Figure 3.13, Javascript Generics

The variable ‘list’ is assigned a Vector instance; the Vector has a generic parameter with the value ‘String’, which indicates that it is a Vector (list) of Strings. The second line of code in Figure 3.13 calls the item function; for the Vector class this should return a value with the same type as the generic parameter. Presume that the following three non-terminals exist with the given attributes (grammar definitions for these can be found in Appendix B).

⁵ Generics is a mechanism of allowing types to be determined later.
⁶ The value provided for the to-be-determined type.


JS_Instanciator (name:STRING, genericParam:JS_GenericValue)
JS_Class (name:STRING, genericParam:JS_GenericDef)
JS_Function (name:STRING, returnType:JS_ReturnType)

When proposing the functions for a Vector instance there is a JS_ReturnType which could be a generic parameter. This needs to be converted into a JS_GenericValue in order to display it. To resolve this value, linking can be used along with the inference stack; Figure 3.14 shows the contents of the inference stack for the given example.

Figure 3.14, Inference Stack

To invoke the linking mechanism the “Link” action is used; it is specified as shown in Figure 3.15. It states the aim of converting “self.returnType”, that is the JS_ReturnType, into a JS_GenericValue.

<link var="retType" type="JS_GenericValue">
    <from>self.returnType</from>
</link>

Figure 3.15, Link Action Example

In order for this link to work, the specification needs to define the links between the various non-terminals involved in this transformation; these are shown in Figure 3.16.

Figure 3.16, Transformation Route

The rules are as follows. A JS_ReturnType can be converted into a JS_GenericDef (found on a JS_Class) if they have the same name. That JS_GenericDef can then be converted into a JS_GenericValue (found on a JS_Instanciator) if they have the same index⁷.

⁷ The index value represents their index in their containing feature, for example a list of arguments.
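The two link rules above can be sketched against a simplified inference stack for the code in Figure 3.13. The shape of the stack entries, the generic parameter name “T” and both function names are hypothetical; only the non-terminal names and the two matching rules (same name, then same index) come from the text.

```javascript
// Hypothetical inference stack for "list.item(0)." in Figure 3.13 (shape assumed).
var inferenceStack = [
  { class: "JS_Instanciator", name: "Vector", genericValues: ["String"] },
  { class: "JS_Class", name: "Vector", genericDefs: ["T"] },
  { class: "JS_Function", name: "item", returnType: "T" }
];

// Most recently visited entry of the given non-terminal type, or null.
function findOnStack(stack, className) {
  for (var i = stack.length - 1; i >= 0; i--) {
    if (stack[i].class === className) { return stack[i]; }
  }
  return null;
}

// Follow the two link rules: return type -> generic definition (same name),
// then generic definition -> generic value (same index).
function linkReturnType(returnType, stack) {
  var cls = findOnStack(stack, "JS_Class");
  var inst = findOnStack(stack, "JS_Instanciator");
  if (!cls || !inst) { return returnType; }
  var index = cls.genericDefs.indexOf(returnType);
  if (index === -1 || index >= inst.genericValues.length) {
    return returnType; // not a generic parameter, so leave it unchanged
  }
  return inst.genericValues[index];
}

var retType = linkReturnType("T", inferenceStack);
```

The function never sees the instantiation directly; it relies on the history of visited features, which is the role the inference stack plays in linking.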


3.5.4 Algorithm Action

Dynamic languages have a wide range of features; DLTGen attempts to create generic constructs which can be used to support them. Sometimes these generic constructs are not enough, and adding further actions would clutter the specification too much. For these circumstances there are special algorithms: common algorithms which would be too complex to define in the specification are made available through the algorithm action. Some of the algorithms available are described in Chapter 4; below is an example of the syntax. It is essentially a set of attributes along with the data to use as input and where to put the output.

<algorithm var="duck" in="scope" input="self" type="DuckTyping">
    <attribute id="caller" value="JS_Call" />
    <attribute id="callerArg" value="JS_ArgumentValueListTail" />
    <attribute id="callee" value="JS_Function" />
    <attribute id="calleeArg" value="JS_ArgumentNameListTail" />
</algorithm>

Figure 3.17, Duck Typing Algorithm

Figure 3.17 is an example of achieving duck typing, a common way to resolve values for arguments which are otherwise unknown.
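One plausible reading of the attributes in Figure 3.17 is that the algorithm pairs each argument name of a function with the values passed at the same position by its callers. The sketch below illustrates that idea only; it is an assumption about the behaviour, not a description of the DuckTyping algorithm in DLTGen, and the data shapes and function name are hypothetical.

```javascript
// Hypothetical model fragments: one function definition and the calls found in scope.
var callee = { class: "JS_Function", name: "greet", args: ["person", "times"] };
var calls = [
  { class: "JS_Call", name: "greet", args: [{ type: "String" }, { type: "Number" }] },
  { class: "JS_Call", name: "other", args: [{ type: "Boolean" }] },
  { class: "JS_Call", name: "greet", args: [{ type: "String" }] }
];

// Collect the types observed at call sites for one named argument of the callee.
function inferArgumentTypes(callee, argName, calls) {
  var index = callee.args.indexOf(argName);
  var seen = [];
  if (index === -1) { return seen; }
  calls.forEach(function (call) {
    // Skip calls to other functions and calls that omit this argument.
    if (call.name !== callee.name || index >= call.args.length) { return; }
    var type = call.args[index].type;
    if (seen.indexOf(type) === -1) { seen.push(type); }
  });
  return seen;
}
```

This follows the theme of observing use rather than definition: the argument has no declared type, so the types seen at call sites are the best available evidence for completion.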

3.6 Surface the completion to the IDE

In order to take the completions found and present them to the user, there need to be mechanisms for interacting with the IDE. DLTGen provides three mechanisms to handle this, described below.

1. Insert Action – Used to insert text into the code editor.
2. Propose Action – Used to display a pop up list of completions.
3. Style Action – Used to style a completion.

These mechanisms are explored only briefly in the remainder of Section 3.6 as they are fairly simplistic.

3.6.1 Insert Action

The insert action allows the IDE to insert automatic textual completions into the current code editor. This could be used to insert static strings or the results processed using other actions. Figure 3.2 in Section 3.4 gives a simple example; for more information see the appendices for examples or the user manual for full instructions on using it.

3.6.2 Propose Action & Styling

Once a trigger has found a list of proposals, it needs to tell Eclipse to display them and how they should be displayed.

The style action is used to describe how a proposal for a particular non-terminal should look when it is displayed in Eclipse. Whenever a non-terminal is “proposed” by a content assist trigger this mechanism is invoked to determine how it should look. Determining the style is a simple task; it involves providing values for attributes such as which label and which icon to use. It can also specify a different value to insert when the proposal is selected and control where to move the cursor after insertion. The following example would be appropriate for a JS_Function non-terminal. The label should be the name of the function followed by the arguments list.

<proposal>
    <algorithm var="args" input="self.args" type="StringJoin">
        <attribute id="seperator" value=", " />
        <attribute id="value">input.name</attribute>
    </algorithm>
    <style>
        <label>self.name "(" args ")"</label>
        <value>self.name "()"</value>
        <offsetSource>insertEnd</offsetSource>
        <offset>-1</offset>
        <icon>ASSIGN_ICO</icon>
    </style>
</proposal>

Figure 3.18, Proposal Style Example

The first action uses a special algorithm to join the arguments into a comma-separated list, which is then used in the label of the proposal. The value element represents what will be inserted when this proposal is selected: the name of the function followed by a pair of parentheses. The -1 value used for “offset” moves the cursor back inside the bracket pair so the user can enter their arguments.
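What the proposal specification in Figure 3.18 computes can be sketched as follows. The stringJoin and styleProposal functions and the sample function node are hypothetical; they mirror the elements of the figure and are not DLTGen’s generated code.

```javascript
// Hypothetical JS_Function node; only "name" and "args" come from Figure 3.18.
var fn = { name: "checkSpelling", args: [{ name: "input" }, { name: "wordList" }] };

// Equivalent of the StringJoin algorithm: join one value taken from each input element.
function stringJoin(input, separator, value) {
  return input.map(value).join(separator);
}

// Build what the style block describes: label, inserted value and final cursor position.
function styleProposal(self, insertAt) {
  var args = stringJoin(self.args, ", ", function (a) { return a.name; });
  var value = self.name + "()";
  var insertEnd = insertAt + value.length;
  return {
    label: self.name + "(" + args + ")",
    value: value,
    cursor: insertEnd - 1 // offset of -1 from insertEnd: between the brackets
  };
}

var proposal = styleProposal(fn, 10);
```

The label shows the full argument list to help the user choose, while the inserted value contains only an empty bracket pair with the cursor placed inside it.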


4 Implementation

This chapter discusses, in more technical depth, how the various components of the system come together to generate an IDE from a specification. It looks at how the specification language can be composed to support language features and at some of the more useful algorithms provided by DLTGen. Finally there is a brief overview of the subsystem which generates the IDE code.

4.1 Overview

DLTGen is essentially made up of three separate components, each having several different tasks. Figure 4.1 shows these three components and what they provide. Not all of these will be discussed; more detail can be found in the working documents and user manual.

Figure 4.1, Core Components of DLTGen

Specification Language

Chapter 3 discussed some of the abstract concepts supported in the system and specification language. For these actions to be useful they need to be grouped together to perform certain tasks. DLTGen provides a number of tasks, which are outlined in this chapter.

Framework

The majority of DLTGen’s functionality has been put into a common library called ‘DLTFrameworkLib’. To keep the generated code clean and concise, as much functionality as possible was placed in this framework so that the generated code can simply invoke it. Some of the features in this framework will be described, including some of the Eclipse integration points and some special algorithms.

Generator

The generator is an Xpand2 generator, responsible for turning the specification into code. In order to keep the specification clean this project had to create a number of generator extensions. These manage the code as it is generated, keeping track of variables, their data types and whether they are nullable⁸. This ensures DLTGen generates efficient and safe code.

4.2 Specification Language

The specification language is split into six subsections where different language and IDE features can be described; these can be seen in Appendix C. Most of these are fairly simple and will not be covered outside of the user manual; the focus of this chapter is the “classes” section, which is where model builders are placed. A model builder is a definition of a series of common tasks which can be performed for a particular non-terminal. The syntax of a model builder is shown in Figure 4.2 for a JS_Var non-terminal.

<classes>
    <eclass class="JS_Var">
    </eclass>
</classes>

Figure 4.2, Model Builder for non-terminal JS_Var

By providing a set of common tasks for a given non-terminal the result of these well-known tasks can be exposed to other model builders and triggers. One of these tasks is the “members” task described in section 4.2.3; the purpose of this task is to return a list of members for the given non-terminal.

This works to the strength of the grammar, for example a variable might be defined as (name=ID) ‘=’ (value=Statement) where Statement is a non-terminal with several non-terminal alternatives. The alternatives could be ‘StringLiteral’, ‘NumericLiteral’, ‘Instantiator’ for example. The members task of a variable could simply return the members of its “value”, the Statement non-terminal. Depending on which alternative it happens to be a different members task will be invoked. This way of defining common tasks keeps the specification simple and reduces redundancy.
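The delegation described here, where a variable returns the members of its value and the alternative that was actually parsed decides which members task runs, can be sketched as a lookup keyed by non-terminal type. The task table, the member lists and the node shapes are all hypothetical illustrations.

```javascript
// Hypothetical members tasks, keyed by non-terminal class (member lists are illustrative).
var membersTasks = {
  StringLiteral: function () { return ["length", "charAt", "substring"]; },
  NumericLiteral: function () { return ["toFixed", "toString"]; },
  // A variable simply returns the members of its value.
  Variable: function (self) { return membersOf(self.value); }
};

// Invoke the members task registered for whichever alternative the node happens to be.
function membersOf(node) {
  var task = membersTasks[node.class];
  return task ? task(node) : [];
}

var greeting = { class: "Variable", name: "greeting", value: { class: "StringLiteral" } };
var total = { class: "Variable", name: "total", value: { class: "NumericLiteral" } };
```

The variable’s task contains no knowledge of strings or numbers; supporting a new alternative only requires registering a members task for it.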

4.2.1 Qualifier Task

Qualifying a non-terminal serves two purposes in DLTGen: it allows the user to provide values for an object’s name and for an object’s full path. The full path is used internally to uniquely identify objects to improve efficiency. Every time a non-terminal is parsed into the model it is qualified. The two special attributes are referable as ‘_name’ and ‘_path’.

<eclass class="JS_Var">
    <qualify>
        <annotate>self._name = self.name</annotate>
    </qualify>
</eclass>

Figure 4.3, Model Builder Qualifier

For any non-terminal which defines this behaviour, other actions can use its _name and _path values. This is useful as it provides a well-known location for the name of a particular language feature, which makes name binding, a very common task in code analysis, much simpler.

8 Characterised by being able to hold the value null, and therefore be unsafe to use without a test.

4.2.2 Annotate Task

The annotate task provides an opportunity to add to the model as soon as a new non-terminal is parsed. It is always invoked immediately for every non-terminal. This is where processing that can be done ahead of time, such as finding related non-terminals, should be done.

4.2.3 Member Access

A member is a child of a particular language feature; for example, an instance of a String in JavaScript has a number of static and instance functions and properties as members. The key value added by the members task is that it provides a simple way for each non-terminal to determine its own children. Other specification features can then access the members list that was returned and filter or sort it as required. In a statically typed language this is not required: typically all the system needs to know is what data type something is, and it can find the members itself. In a dynamic language, constructs such as runtime object alteration make this less predictable. Below is an example of how JavaScript’s particular flavour of runtime object alteration can be supported using the members task. Figure 4.4 shows some simple JavaScript statements: a variable of type Object is created and then two new members are added to it, so its children are no longer just the members of the class Object.

var details = new Object();
details.username = "test";
details.password = "test";

Figure 4.4, Runtime Object Alteration

To determine the members of the variable ‘details’, the IDE must look at both its type and its runtime-attached members; figure 4.5 shows this.

Figure 4.5, Member Resolution Visualization


First it uses the members of its expression, that is, what it was assigned to; this could be anything, including an identifier pointing at another variable. In the above example it was assigned to an Instantiator, so the Instantiator’s members task finds the Type and uses its members. The other source is runtime alterations for the variable. To retrieve these, a simple find action can be used. The specification for this support can be found in Appendix D.
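The two sources of members can be illustrated with a short Ruby sketch. It is not the Appendix D specification: the tables TYPE_MEMBERS and RUNTIME_ASSIGNMENTS and the method members_of are hypothetical names that stand in for the type system and for the result of a find action over assignments such as those in Figure 4.4.

```ruby
# Hypothetical type table: members every instance of a type starts with.
TYPE_MEMBERS = { "Object" => %w[toString valueOf hasOwnProperty] }.freeze

# Hypothetical record of "<target>.<member> = ..." statements found in the
# source text, as in Figure 4.4.
RUNTIME_ASSIGNMENTS = [
  %w[details username],
  %w[details password]
].freeze

# Members of a variable: the members of the type it was instantiated from,
# followed by any members attached to it at runtime.
def members_of(variable, type_name)
  from_type = TYPE_MEMBERS.fetch(type_name, [])
  attached = RUNTIME_ASSIGNMENTS.select { |target, _| target == variable }
                                .map { |_, member| member }
  (from_type + attached).uniq
end
```

For the variable ‘details’ this returns the Object members followed by username and password; a variable with no runtime alterations gets the type members only.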

4.2.4 Scope Searching Task

In many languages, dynamic and static, there is a structure to how a scope is searched; the scoping task allows the user to control this. In DLTGen any non-terminal can be a scope. Without search rules, looking inside the current scope will only ever find objects below the current object in the parse tree. The scoping task allows the specification of which other scopes can be searched and in what order.

Using the JavaScript scope search example from chapter 3, there are three stages to resolving an identifier in a function. As shown in figure 4.6, first it must look inside the function’s own scope. If it does not find the member there, it looks in the class which owns the function’s scope (ECMAScript 3rd Edition only). If it still cannot find the variable, it will look in the global scope.

Figure 4.6, Scope Searching Hierarchy

Figure 4.7 shows how the first stage of this process could be expressed in the specification. The JS_Function should search its local members first (_contents) and then search its parent class. A similar set of rules could be declared for the JS_Class non-terminal. This task has a special action, “<find-source>”, which is only usable in the scope task. It determines the next scope to search; there can be several of these in one scope definition and the scopes are searched in the order defined. Like a sub-routine call, any other scope used as a “find-source” will be fully explored before returning to the next given source.


<scope>
  <find-source>self._contents</find-source>
  <find var="jsClass" in="parents" quantity="1">
    <condition>jsClass.class == JS_Class</condition>
  </find>
  <find-source>jsClass</find-source>
</scope>

Figure 4.7, Example Scope Search Rule
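The search order that the scope task describes can be sketched as follows. This is a minimal Ruby illustration under stated assumptions: the ScopeSketch class and the three sample scopes are hypothetical, and local contents are always searched first here rather than being listed as an explicit find-source as in Figure 4.7.

```ruby
# Hypothetical scope: a name, its own members, and the ordered list of
# other scopes to consult (the role played by <find-source>).
class ScopeSketch
  attr_reader :name, :contents, :sources

  def initialize(name, contents, sources = [])
    @name = name
    @contents = contents
    @sources = sources
  end

  # Returns the name of the scope in which the identifier was found, or nil.
  # Local contents are searched first, then each source in declaration
  # order; a source is explored fully before the next one is tried.
  def resolve(identifier)
    return name if contents.include?(identifier)

    sources.each do |source|
      found = source.resolve(identifier)
      return found if found
    end
    nil
  end
end

GLOBAL_SCOPE = ScopeSketch.new("global", %w[window document])
CLASS_SCOPE = ScopeSketch.new("class", %w[title], [GLOBAL_SCOPE])
FUNCTION_SCOPE = ScopeSketch.new("function", %w[index], [CLASS_SCOPE])
```

Resolving "index" from FUNCTION_SCOPE stops in the function, "title" is found in the owning class, and "document" is only found once the global scope is reached, which mirrors the three stages of figure 4.6.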

4.3 Framework

The framework in DLTGen is where all core functionality is implemented. It contains code intended to be invoked by generated code, which helps to keep the generated code concise. This chapter looks briefly at a few of the more fundamental components in the framework which improve the overall understanding of how DLTGen works. The particular areas discussed are fundamental algorithms and Eclipse integration points.

4.3.1 Special Algorithms

There is some functionality which is too complicated to express using the generic constructs in DLTGen. To avoid cluttering the specification, specific algorithms can be exposed as “special algorithms”. The most common algorithm used during this project was dot typing; all four languages looked at have support for this. Dot typing is a mechanism for accessing members of objects, and subsequently the members of those members, by building a path separated with dots.

var str = "Hello world";
str.toUpperCase().toString().

Figure 4.8, Dot Typing Example

The above example performs dot typing on the variable “str”. This variable is a String; strings have a member called “toUpperCase” which returns another String. String also has a member called “toString” which returns a String. So, when dotting off the final member in Figure 4.8, the IDE should propose String type members. The dot typing algorithm deals with this as follows. In the following specification, assume “path” is the dot path string that has been previously resolved; in figure 4.8, path would be the complete second line of code.

<algorithm var="dotItem" in="scope" input="path" type="DotTyping">
  <attribute id="seperator" value="." />
</algorithm>

Figure 4.9, Dot Typing Invocation

The algorithm splits the input by the separator and then parses each piece on the fly. Using the tasks described in section 4.2 it can then determine the members of the first section. It then finds a member in that list with the same qualified name as the next section, and this is repeated for each remaining section. Figure 4.10 demonstrates this with the aid of a sequence diagram.


Figure 4.10, Dot Typing Algorithm
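The split-and-follow behaviour of dot typing can be sketched in a few lines of Ruby. This is an illustration of the technique and not the framework’s implementation: DOT_TYPES, DOT_SCOPE and dot_typing are hypothetical names, types are looked up in a fixed table rather than through members tasks, and call arguments that themselves contain the separator are not handled.

```ruby
# Hypothetical type table: member name => type returned by that member.
DOT_TYPES = {
  "String" => { "toUpperCase" => "String", "toString" => "String", "length" => "Number" },
  "Number" => { "toFixed" => "String", "toString" => "String" }
}.freeze

# Hypothetical types of the variables visible in the current scope.
DOT_SCOPE = { "str" => "String" }.freeze

# Resolve a dot path such as "str.toUpperCase().toString()." and return the
# member names to propose after the final separator, or [] on failure.
def dot_typing(path, separator = ".")
  # Split by the separator and drop any call brackets from each section.
  sections = path.split(separator).map { |s| s.sub(/\(.*\)\z/, "") }
  return [] if sections.empty?

  # The first section is resolved in the scope; every later section is
  # matched by name against the members of what was found so far.
  current = DOT_SCOPE[sections.first]
  sections.drop(1).each do |section|
    return [] if current.nil?
    current = DOT_TYPES.fetch(current, {})[section]
  end
  current ? DOT_TYPES.fetch(current, {}).keys : []
end
```

Applied to the second line of Figure 4.8, the sketch walks str, then toUpperCase, then toString, and proposes the String members; an unknown section anywhere in the path yields no proposals.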

4.3.2 Eclipse Integration Points

A key aim of DLTGen was to make creating an IDE possible without being familiar with the underlying IDE. This has been realized with a specification language that is neutral to the IDE framework. However, it is worth giving a brief overview of exactly what has been abstracted away and which specification features map to which Eclipse-specific features.

4.3.2.1 Builder

In Eclipse, background processing is performed in a construct called a “builder”. A builder is typically used to incrementally update models related to the source text as it changes. DLTGen uses builders to perform the same function. A number of features, including type system building, non-terminal annotation and qualification, are performed in the builder as the code changes. Non-terminals are processed and given a time to live; this ensures any invalid data (perhaps from parsing broken code) does not persist for too long. When runtime features need to use the model, it is requested from the builder and locked so no more updates can occur during processing. This removes a number of difficult problems developers face when creating a builder. The specifier does not need to be concerned with efficiency, model exclusion or mechanisms to determine what code changed to improve performance.
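The time-to-live and locking behaviour can be pictured with the following Ruby sketch. It is an assumption-laden illustration rather than the framework’s builder: the ModelStore class is hypothetical, and the time to live is measured here in build cycles, which the text above does not specify.

```ruby
# Hypothetical model store: entries carry a time to live measured in build
# cycles (an assumption), and readers lock the model while they use it.
class ModelStore
  def initialize(ttl)
    @ttl = ttl
    @entries = {} # path => remaining build cycles
    @lock = Mutex.new
  end

  # Called after each parse: age every entry, refresh the ones seen in
  # this parse, and drop entries whose time to live has run out.
  def build(seen_paths)
    @lock.synchronize do
      @entries.transform_values! { |remaining| remaining - 1 }
      seen_paths.each { |path| @entries[path] = @ttl }
      @entries.delete_if { |_, remaining| remaining <= 0 }
    end
  end

  # Runtime features read the model under the lock, so no update can
  # interleave with their processing.
  def with_model
    @lock.synchronize { yield @entries.keys }
  end
end
```

With a time to live of two cycles, an entry that disappears from the source (for example because the code is temporarily broken) survives one build and is dropped on the next.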


4.3.2.2 Content Assist Processor

Eclipse provides a lot of freedom for code completion invocation via a mechanism called the content assist processor. Eclipse simply provides a mechanism to be called back when certain “activation characters” are entered into the source text. It is then the developer’s responsibility not only to create proposals but also to filter them as the user continues to type; this is unnecessary boilerplate code. This construct has been replaced by triggers in DLTGen. Triggers are more flexible as they have activation strings, not only single characters. In addition, all sorting and subsequent filtering is dealt with automatically.
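A rough Ruby sketch of what a trigger takes off the developer’s hands is given below. The names TriggerSketch, TRIGGERS and proposals_for are hypothetical, the proposal lists are fixed rather than produced from a model, and the "::" trigger is invented purely to show a multi-character activation string.

```ruby
# Hypothetical trigger: an activation string (possibly longer than one
# character) and the proposals it would offer.
TriggerSketch = Struct.new(:sequence, :proposals)

TRIGGERS = [
  TriggerSketch.new("$", %w[stdout globalVar stderr]),
  TriggerSketch.new("::", %w[Math Kernel])
].freeze

# Find the trigger whose activation string is followed only by the prefix
# typed so far, then filter and sort its proposals by that prefix.
def proposals_for(text_before_caret)
  TRIGGERS.each do |trigger|
    index = text_before_caret.rindex(trigger.sequence)
    next if index.nil?

    typed = text_before_caret[(index + trigger.sequence.length)..-1]
    next unless typed.match?(/\A\w*\z/)

    return trigger.proposals.select { |p| p.start_with?(typed) }.sort
  end
  []
end
```

Typing "$" offers every proposal of the first trigger in sorted order, and continuing to type "st" narrows the same list without any extra code in the trigger definition.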

4.4 Generator

Part of what the generator does is expand a template for a given language feature in the specification. Most of the time, however, more advanced processing needs to be undertaken to make the code work efficiently and safely. The following two sections discuss these issues and the technical problems they pose.

4.4.1 Path Resolution

DLTGen’s specification is not as simple as defining simple values for simple attributes; many of the constructs are model interactions that depend on the particular language and grammar. For example, take the annotation “x._name = y.name”. The system needs to be certain what the most primitive types of x and y are before it can generate code to access their attributes. If the system knows what specific non-terminal ‘y’ is, then it can generate y.getName(); otherwise it needs to go via a general getter which takes the form y.getAnnotation("name"). It is in the interest of efficiency to call the getter directly, but this is sometimes not possible. If y was the result of a find action, all the system can be certain of is that it is a non-terminal (or a String for a document search). DLTGen paths can also be much more complicated and go many levels deep.

To support this, DLTGen provides a mechanism to keep track of variables as code is generated. From this contextual information, reflection is used to determine what can be directly accessed and what cannot. If there is no feasible way to get an attribute, an error is produced to guard against specification mistakes.
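The choice between a direct getter and the general getter can be sketched as below. This is a hedged Ruby illustration, not the generator itself: JsVarNode and access_code are hypothetical, Ruby’s method_defined? stands in for the reflection performed by the generator, and the error case for attributes that cannot be reached at all is left out.

```ruby
# Hypothetical non-terminal class with a typed getter, standing in for a
# class generated from the grammar.
class JsVarNode
  def getName
    "details"
  end
end

# Emit the accessor text for "variable.attribute". known_class is what has
# been tracked about the variable so far, or nil when all that is known is
# that the variable is some non-terminal (for example a find result).
def access_code(variable, attribute, known_class)
  getter = "get" + attribute[0].upcase + attribute[1..-1]
  if known_class && known_class.method_defined?(getter)
    "#{variable}.#{getter}()"
  else
    "#{variable}.getAnnotation(\"#{attribute}\")"
  end
end
```

When the class of y is known and has a matching getter, the sketch emits y.getName(); when nothing specific is known about y, it falls back to y.getAnnotation("name").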

4.4.2 Code Safety

DLTGen is a higher level specification and therefore removes the concern of code safety from the user. The generator determines whether the code it produces is safe and, if not, inserts appropriate statements to make it safe. For example, assume “self” represents a JS_Var non-terminal. It is legal to access “self.expression._name”, where expression is another non-terminal, JS_Expression. However, it is possible that expression could be null, making this statement unsafe. This is solved in DLTGen by keeping track of what is ‘nullable’; from this information safety can be determined and unsafe expressions can be wrapped in safety checks.
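The wrapping step can be sketched in Ruby as follows. The NULLABLE list and the guard method are hypothetical, and the emitted text is schematic: it keeps the specification-level path rather than showing the Java accessors that a real generator would produce.

```ruby
# Paths recorded as possibly null. This list is an assumption made for the
# sketch; it stands in for the nullable tracking described above.
NULLABLE = ["self.expression"].freeze

# Wrap a statement in null checks for every nullable prefix of the path
# that the statement dereferences.
def guard(path, statement)
  parts = path.split(".")
  prefixes = (1...parts.length).map { |n| parts.first(n).join(".") }
  checks = prefixes.select { |prefix| NULLABLE.include?(prefix) }
  return statement if checks.empty?

  condition = checks.map { |prefix| "#{prefix} != null" }.join(" && ")
  "if (#{condition}) { #{statement} }"
end
```

A statement that reads self.expression._name is wrapped in a check that self.expression is not null, while a statement that only reads self.name is returned unchanged.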


5 System in operation

This chapter demonstrates using the system to create an IDE for the language Ruby. It explains how to get started and goes through supporting some Ruby language features. Not everything is covered, as Ruby is a complicated programming language with too many features to cover here. A full specification and grammar can be found in appendices E and F respectively.

5.1 Ruby Background

Ruby is a dynamic general purpose programming language. It has recently gained a lot of popularity as a web technology through “Ruby on Rails”. Ruby supports a number of programming paradigms including functional, object-oriented and imperative. Ruby was chosen as one of the two original target languages because it is popular and introduces concepts JavaScript does not have. Below is a brief overview of some of the interesting features of Ruby which make it a good example of a dynamic language for this project. For more information about Ruby see the official website (24).

• Everything is an expression.
• Everything is imperatively executed including declarations.
• Unique block syntax and external iterators (discussed in more detail later).
• Everything is an object.

5.2 Getting Started

The following section provides a brief overview of how to get started using DLTGen.

5.2.1 Prerequisites

In order to build an IDE using DLTGen, Xtext must be installed. Xtext provides the grammar engine and the starting point for IDE integration. The Xtext version used for this project is 0.7.2, which can be downloaded from the Xtext distribution site (25). It is essential to use the correct version as Xtext features can change dramatically between versions.

5.2.2 Creating an Xtext project

The Xtext project is the starting point for DLTGen. For more information on setting up a project consult the user manual for this project. When a project is created a grammar file and a generator workflow will be created.

The first stage in developing an IDE is to develop this grammar. This document does not explain how to go about this; information can be found on the Xtext documentation website (26). Grammars for all the languages in this project can be found on the project website.

5.2.3 Creating a DLTGen project

All that is required to convert the Xtext project into a DLTGen project is to right click on it and choose the DLTGen -> Add DLTGen support menu option. This will include the necessary bundles, create a specification XML file and add the DLTGen fragment to the generator workflow. The created specification XML file is where the language specification referred to in the following section should be placed.

5.3 Specifying the language and IDE

There are several tasks involved when specifying an IDE. For the most part, a specification does not need to be complete; it is pointless to qualify a string literal, for example. In order to specify just what is required, the best approach is to pick a feature to support and add the required pieces in a methodical way.

5.3.1 Global Scope

A good starting point is usually to define any well-defined global scopes. The features added subsequently will most likely benefit from having some global data they can display. For example, showing a list of classes after an instantiator would be dull if there were no classes defined.

There are two ways to define a global scope: it can be hand written in XML for complete control, or provided in the target language so that the system can parse it. Figure 5.1 demonstrates the latter; it defines the String type and the signatures for its members. This Ruby code file is simply referenced in the specification under the static model, as shown in Figure 5.2.

class String
  def capitalize()
  end
  def casecmp(other_str)
  end
  def chomp()
  end
end

Figure 5.1, Ruby String Type

<staticModel>
  <scope id="global" src="global.rb" type="language" />
</staticModel>

Figure 5.2, Global Scope Loading

5.3.2 Type System & Literals

Ruby is a full general purpose language with a basic class system familiar to any Java programmer. In order to reference types in the model, ‘type specifiers’ need to be defined; these define which non-terminals represent class definitions and how to retrieve their members. A type specifier can also provide other information such as polymorphism descriptors.


<typeSystem>
  <typeSpecifier class="ClassDef">
    <members>
      <annotate>result.members += self.statements</annotate>
    </members>
  </typeSpecifier>
</typeSystem>

Figure 5.3, Type Specifier Example

In the above example, any ClassDef non-terminal represents a class; this non-terminal contains an attribute called “statements” which contains the children of the class. These will be used as the members of a ClassDef. Whenever the language parses a ClassDef non-terminal it will register a type using the qualified name and add its members to the type’s member list. For the example in figure 5.1 the type “String” would be registered with three MethodDefs as children.

In many programming languages there are “literals”, inline values which represent an instance of a class with a particular value. For example, in Ruby “Hello World” is a String literal; it represents an instance of the String class with the value “Hello World”. Once there are rules in the grammar to match a literal, it is usually necessary to define the members of a literal as the members of the type it is an instance of. There is a shorthand for defining this in DLTGen, shown in Figure 5.4.

<literals>
  <literal class="LiteralString" type="String" />
  <literal class="LiteralNumber" type="Number" />
</literals>

Figure 5.4, Literal Definition

5.3.3 Global Variable Accessor

The first complete feature this walkthrough will implement is suggesting variable names. In Ruby there are several scopes where variables can be stored; one of these is the “global” scope. This is not to be confused with DLTGen’s concept of a global scope. There is only one global scope: if a variable is put into it anywhere in the code, it can be read anywhere else. The syntax is very simple: the variable’s name is prefixed with a dollar symbol to indicate it is global.

$globalVar = "Hello World"

Figure 5.5, Ruby Global Variable Assignment

A reasonable IDE feature for Ruby would be when the user enters a dollar symbol the IDE proposes a list of all global variables it knows about. The following three stages are required to make this feature work.

1. Detect the dollar symbol has been entered.
2. Find all the global variables.
3. Display the variables found in a proposal window.


Firstly, to detect the dollar symbol a trigger needs to be defined. It must be a model trigger because it is going to search the model for global variables. The next task is to find all the global variables and propose them. This is achieved simply using a find action followed by a propose action.

<modelTrigger sequence="$" id="GlobalVars">
  <find var="globVars" in="model" quantity="*">
    <condition>globVars.class == GlobalVariable</condition>
  </find>
  <propose>globVars</propose>
</modelTrigger>

Figure 5.6, Trigger to find global variables

The find action searches in “model”, which means it will look everywhere. The quantity is “*” (many) because the goal is to find all the global variables. Finally, there is a condition which ensures it only finds GlobalVariable non-terminals. Once the variables are found, the trigger simply proposes what was found; DLTGen automatically removes duplicates. This represents the processing portion of the completion.
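The find-then-propose behaviour of this trigger can be mimicked with a short Ruby sketch. It is an illustration only: ModelNode, SAMPLE_MODEL, find_all and propose are hypothetical names, and the model is a flat list rather than a parse tree.

```ruby
# Hypothetical, flattened model: each node records which non-terminal it
# is and the name it was given in the source.
ModelNode = Struct.new(:kind, :name)

SAMPLE_MODEL = [
  ModelNode.new("GlobalVariable", '$globalVar'),
  ModelNode.new("LocalVariableRef", 'str'),
  ModelNode.new("GlobalVariable", '$counter'),
  ModelNode.new("GlobalVariable", '$globalVar')
].freeze

# Equivalent of a find with quantity "*" and a condition on the class.
def find_all(model, kind)
  model.select { |node| node.kind == kind }
end

# Equivalent of propose: one label per node, with duplicates removed.
def propose(nodes)
  nodes.map(&:name).uniq
end
```

Although $globalVar appears twice in the sample model, it is proposed once, alongside $counter; the local variable reference is filtered out by the condition.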

Before this feature will work, there needs to be a specification to guide the presentation of the proposal. Figure 5.7 shows a possible styling where the label is simply the name of the variable; a future expansion could include showing the variable’s data type. Figure 5.8 shows what this feature looks like when passed through the generator.

<eclass class="GlobalVariable">
  <proposal>
    <style>
      <label>self.name</label>
      <value>self.name</value>
    </style>
  </proposal>
</eclass>

Figure 5.7, Non-Terminal Proposal Style

Figure 5.8, Global Variable Proposals


5.3.4 Simple Type Inference

In Ruby, the type of a variable can be determined by looking at how it is assigned. For example, if a variable is assigned “Hello World” it is obviously a String. Like C-style languages, Ruby supports dot typing: members of a previously defined construct are accessed by providing its name followed by a dot.

str = "Hello World";
str.

Figure 5.9, Ruby Dot Typing Example

The above example defines a local variable with the name “str”. It is given a value of a String literal. When the second line is processed, the appropriate suggestion would be members of the String type. Figure 5.10 shows the stages in making the inference.

Figure 5.10, Process of Type Inference

The first task is to create the model trigger which will begin this process. This trigger must find the dot path, that is, the reference typed in by the user on which inference is to be performed. Then it uses the dot typing algorithm to find what the path refers to. Finally, it needs to propose what was found.

<modelTrigger sequence="." id="DotAccess">
  <find var="path" in="document" direction="reverse">
    <break>EOF</break> <!-- In reverse mode this is the start of the file -->
    <break>Block</break>
    <break>BraceBlock</break>
    <break>MethodDef</break>
    <break>ClassDef</break>
    <break>';'</break>
    <break>'|'</break>
  </find>
  <algorithm var="dotItem" input="path" type="DotTyping">
    <attribute id="seperator" value="." />
  </algorithm>
  <propose>dotItem.members</propose>
</modelTrigger>

Figure 5.11, Ruby Dot Typing Trigger

This trigger basically resolves the dot path and passes it into the dot typing mechanism; the result of this then determines the members shown.


In the Ruby grammar a non-terminal called “LocalVariableRef” is defined; it represents a reference to a local variable with no modifier. To make the dot typing algorithm work for a LocalVariableRef there needs to be a members task for it. All it needs to do is find an object with the same name in the same scope and use its members. To be able to do this, there needs to be a qualification task for any non-terminals involved so that their name value is available to match against.

<eclass class="LocalVariableRef">
  <members>
    <find var="obj" in="scope" quantity="1">
      <condition>obj._name == self.name</condition>
    </find>
    <annotate>result.members += obj.members</annotate>
  </members>
</eclass>

<eclass class="InstanceVariable">
  <qualify>
    <annotate>self._name = self.name</annotate>
  </qualify>
  <members>
    <annotate>result.members += self.tail.members</annotate>
  </members>
</eclass>

Figure 5.12, Example Identifier Resolver

The next stage, as shown in figure 5.10, is to use the value the variable was assigned to as its type. The members request can simply be forwarded to the variable’s value non-terminal; this is shown in figure 5.12 under the InstanceVariable’s members task. A similar handing off is defined for the non-terminal “AssignmentTail”, as this is what InstanceVariable is followed by in the grammar. The AssignmentTail class simply contains an operator followed by an expression, and the expression’s members are what will be used. In Figure 5.9 the expression is a string literal. Literals were defined in section 5.3.2, so the members of a String are automatically returned as the members of a string literal. When this is generated and run the IDE functions as shown.

Figure 5.13, Ruby Member Completion

Providing members tasks for different non-terminals will extend what can be inferred when dotting off a local variable.


5.3.5 Advanced Type Inference on External Iterators

A particularly interesting and popular feature of Ruby is external iterators; they allow a method call to provide a block of code which should execute when the method being called “yields” data. The concept can be unusual to Ruby newcomers; below is an example which illustrates how it works.

class NumericRange
  def each()
    # Dummy yield to infer numeric data is yielded
    yield 0;
  end
end

def sample()
  r = 1..4
  r.each { |x| print(x) }
end

Figure 5.14, Ruby External Iterator

In Ruby, a numeric range can be defined as X..Y. The numeric range class has a method called “each”, which iterates over each number between X and Y and yields it. The “sample” method shows how this can be used. When the each method yields data, the external iterator will execute and the value of x will be what was yielded. The code would print “1234” if executed.
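For readers new to Ruby, the yielding behaviour can be tried with the following self-contained sketch. The SimpleRange class and collect_digits method are hypothetical names introduced for illustration; unlike the dummy each of Figure 5.14, this each really iterates, and the output is collected into a string instead of being printed.

```ruby
# A small stand-in for a numeric range whose each method really iterates.
class SimpleRange
  def initialize(first, last)
    @first = first
    @last = last
  end

  # Yield every number from first to last to the block the caller supplied.
  def each
    current = @first
    while current <= @last
      yield current
      current += 1
    end
  end
end

# The block { |x| ... } runs once per yield, with x bound to the value
# that was yielded.
def collect_digits(range)
  digits = []
  range.each { |x| digits << x.to_s }
  digits.join
end
```

Calling collect_digits(SimpleRange.new(1, 4)) returns "1234", matching what the sample method in Figure 5.14 would print.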

Imagine code dotting off the x external iterator argument. A DLTGen IDE can infer x to be a number and propose members of the Number class. To support this, the problem must first be decomposed to understand which language features are interacting; figure 5.14 shows a possible chain of inferences which could support this feature. The ultimate goal is to take the Iterator argument non-terminal and find an equivalent YieldStatement non-terminal.

Figure 5.14, External Iterator Inference

This can be thought of as three fairly simple find actions as shown below.


<members>
  <find var="caller" in="parents" quantity="1">
    <condition>caller.class == MethodCall</condition>
  </find>
  <algorithm var="callee" input="caller.path" type="DotTyping">
    <attribute id="seperator" value="." />
  </algorithm>
  <find var="yield" in="callee" quantity="1">
    <condition>yield.class == YieldStatement</condition>
  </find>
  <annotate>result.members += yield.members</annotate>
</members>

Figure 5.15, Yield Resolver

First it finds the MethodCall. In the grammar the external iterator is a child of the method call, so the method call can be expected to be a containing feature. The method being referred to can have a complicated path (in the example it is “r.each”), so it must be processed through the dot typing mechanism. Finally, it just needs to find an appropriate yield in the callee. Below is a screenshot which shows the feature running for the example above.

Figure 5.16, Inferring Types of Iterator Arguments
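The three steps of the yield resolver can be traced with the Ruby sketch below. It is a simplified illustration under stated assumptions: all node types, KNOWN_METHODS, NUMBER_TYPE_TABLE and iterator_argument_members are hypothetical, the enclosing call is reached through a single parent link, and a fixed lookup table takes the place of the dot typing algorithm.

```ruby
# Hypothetical, heavily simplified model nodes.
YieldNode = Struct.new(:value_type)
MethodNode = Struct.new(:name, :statements)
CallNode = Struct.new(:path)
IteratorArg = Struct.new(:name, :parent)

# Methods known to the model, keyed by the dot path that reaches them.
# A fuller implementation would resolve the path with dot typing instead.
KNOWN_METHODS = {
  "r.each" => MethodNode.new("each", [YieldNode.new("Number")])
}.freeze

NUMBER_TYPE_TABLE = { "Number" => %w[abs round to_s] }.freeze

def iterator_argument_members(argument)
  # 1. Find the enclosing method call.
  caller_node = argument.parent
  return [] unless caller_node.is_a?(CallNode)

  # 2. Resolve the call's path to the method being called.
  callee = KNOWN_METHODS[caller_node.path]
  return [] if callee.nil?

  # 3. Find a yield in the callee and use the members of what it yields.
  yielded = callee.statements.find { |s| s.is_a?(YieldNode) }
  yielded ? NUMBER_TYPE_TABLE.fetch(yielded.value_type, []) : []
end
```

For the argument x of the block passed to r.each, the sketch reaches the dummy yield of a number and returns the Number members; if any step fails, no members are returned.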


6 Testing

This chapter focuses on the procedures undertaken to verify the stability and robustness of the implementation. Testing was split into two areas, the framework features and the code generator, each requiring a different testing strategy. All tests discussed in this chapter and their results can be found in Appendix G.

6.1 Framework

As previously discussed, most of DLTGen’s features are implemented as a library referred to as the framework. This framework is made up of several parts which represent the various processes the IDE has to perform at runtime. These include maintaining, updating and searching the model.

6.1.1 Testing Procedure

Tests for the framework were derived using a black box method; features were tested rather than individual units and components. Tests were defined to verify DLTGen features such as model management and the various actions. The tests employ a number of typical and non-typical uses to ensure a feature is robust; a feature would not be tested thoroughly enough if it were only validated in the way DLTGen’s generated code uses it.

Most of the features in the framework depend on having source data; for example, the framework cannot build a model or act upon it without a grammar or parsed grammar objects. To provide a basis for testing, the Ruby grammar was used along with a number of sample Ruby scripts. Automated JUnit (see footnote 9) tests were created in which the setup for each test establishes this test environment.

6.1.2 Features Tested

Four main features which present more complexity than other aspects of this project were tested. These are briefly discussed below.

Model Management: these tests are designed to ensure DLTGen accurately maintains the language models. This includes checking that model builders are used correctly, that appropriate caching takes place and that the life cycle of non-terminals is accurate.

Annotations: it is important that all methods of interacting with non-terminals and metadata are tested, as the generator will use them all, attempting to provide the most efficient mechanism for the context.

Find Action: there are many ways in DLTGen to search and filter the model, and the many different search sources add to this complexity. For this reason the constructs behind the Find action were chosen to be automatically tested.

Linking Action: although a less used feature, it has a reasonable amount of complexity which can benefit from automated testing.

9 http://www.junit.org/


6.1.3 Results

Although features mostly worked as expected, some problems did arise and were fixed. There were issues regarding security. Given that an Xtext and DLTGen project makes use of multiple Eclipse projects, there are security concerns when using features such as reflection. This was solved by providing interfaces for the project in the correct security domain to implement. The local project did the security-sensitive work and passed back the results. An example of this is creating an instance of a MetaEObject. These were not accessible outside of the Xtext project; therefore the model loader provides a factory for this.

Another issue which arose from testing was the difficulty of debugging a particular feature. Many of the constructs in DLTGen have several potential failure points, but the one that actually failed was never made obvious to the user. To aid this, a debug flag was added to the Model class. When this flag is turned on, extra debug information appears in the console. Consider the dot typing algorithm: it could fail on any part of the dot separated path. With the flag enabled, the system now shows the route taken. Similarly, a debug action was added to the specification. This allows users to create their own debug prints from within the specification, printing the state of non-terminals and the model to debug features.

6.2 Generator

The job of the generator is to take the specification and turn it into Java code. In doing this it has to perform a number of tasks, some of which warrant detailed testing. While processing a specification the generator needs to detect erroneous or nonsensical specification and report it appropriately to the user. The generator also needs to create efficient, accurate and safe Java code. This is more complicated than it may seem and involves keeping track of previously generated code and the context. The rest of this section discusses how some of these issues were dealt with.

6.2.1 Methodology

As with the testing of the runtime framework, the generator cannot operate without a grammar; once again the Ruby grammar was used to perform tests. For tests which validate erroneous specification, a project called “dltgen_tests” was created which defines a range of erroneous specifications. It would be too complicated to integrate specification generation testing into JUnit tests as it is so heavily built into the Xpand2 engine; therefore these tests must be executed manually. More atomic features of the generator, such as the path resolver, were tested with JUnit.

6.2.2 Features Tested

Below is a brief overview of the generator features that were tested.

Path Resolver: DLTGen does not blindly substitute text when it generates code; it attempts to understand the statements it is given in order to generate the most efficient constructs. This is the task of the Path Resolver; it keeps track of what is known about variables so that code can be generated as efficiently as possible.


Path Safety Checker: generating complicated statements without considering their safety would produce unreliable code. Because of this the generator has a feature which validates how safe generated paths are and wraps them in constructs to make them safe. This component is important for generating reliable code and warrants automated testing.

Erroneous Specification: the generator reports a number of errors which can be detected in the specification. To exercise these errors and ensure they are caught, a range of test specifications was created.

6.2.3 Results

The key problem found by testing the generator was that it sometimes failed to recognise attributes of a non-terminal’s child types. The mechanism for looking these up was to inspect the ECore model; however, attributes could be primitive types or Strings, which are not defined in the ECore model. As a result, the generator’s path resolver was rewritten to use reflection. This mechanism proved much more flexible as it resolves paths accurately based on their type. Other minor issues presented themselves from testing the generator, such as certain nullables not being null checked. Issues like this were resolved as they were found.


7 Evaluation

To evaluate DLTGen as a whole, new IDEs were created for languages not previously considered during design and implementation. It is important that these languages were not researched before evaluating the system because of the iterative approach this project took. While researching this project it became clear how diverse and flexible dynamic languages can be. For this reason attempts were made to keep solutions flexible, to give them a better chance of translating to other languages. This chapter looks at the two new languages and sees how far the system was able to go in supporting their language features. The two languages chosen were Scala and EOL. Screenshots of Scala and Ruby IDEs in operation can be seen in Appendices H and I respectively.

7.1 Scala

Scala is a dynamic and functional language built on top of the Java Virtual Machine. Scala is a very interesting language: it makes extensive use of implicit typing; however, unlike most implicitly typed dynamic languages, it constrains what can and cannot be implicitly typed. Scala supports many C style constructs to ease migration; however, in many cases it provides a more appropriate functional alternative. More information about Scala can be found on its website (27). This section looks at some of Scala’s features and observes which DLTGen can support and which it cannot. Below is an overview of the features which could be implemented or partially implemented, followed by more technical details of the supported features.

Local inferences
Compound types
Higher order functions
Duck typing
Polymorphic methods

7.1.1 C Style Syntax

Scala uses C style syntax and therefore there is an opportunity to provide some simple static completions. DLTGen was able to define a set of bracket, brace and string balancing static triggers. Auto-completing syntax in this way is likely to work for most languages with DLTGen as it is such a simple process.

7.1.2 Local Type Inference

Scala supports inferring the types of local variables and values. This is supportable in DLTGen: variables and values proxy their statement’s value. Figure 7.1 shows a range of mechanisms in place. Firstly, the types of values are determined by what they are assigned to. The system also uses the closest value to the line being completed to ensure the most appropriate inferences. Within this the IDE also understands that changes made within blocks (such as an if statement) only apply below the statement. More advanced mechanisms such as dot typing were used to resolve paths like those used to access methods in a call.

Figure 7.1, Scala Local Type Inference
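The "closest value" rule can be sketched as follows. This is an illustrative Python model rather than DLTGen’s implementation: the Assignment record and infer_type function are hypothetical, and the sketch assumes that an assignment made inside a block is only visible to lines within that block, which is one reading of the scoping behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Assignment:
    line: int
    name: str
    type_name: str
    # (first_line, last_line) of the enclosing block, or None at top level
    block: Optional[Tuple[int, int]] = None


def infer_type(assignments, name, at_line):
    """Pick the nearest visible assignment above the line being completed."""
    best = None
    for a in assignments:
        if a.name != name or a.line >= at_line:
            continue  # wrong identifier, or not above the completion line
        if a.block is not None and not (a.block[0] <= at_line <= a.block[1]):
            continue  # assigned inside a block the completion line is outside of
        if best is None or a.line > best.line:
            best = a
    return best.type_name if best else None


history = [Assignment(1, "x", "Int"), Assignment(3, "x", "String", block=(2, 4))]
print(infer_type(history, "x", 4), infer_type(history, "x", 6))
```

Inside the block the later String assignment wins; after the block the inference falls back to the earlier Int assignment.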

DLTGen was able to support this feature well; what is less known is how well it can cope with more complex paths. Due to the time constraints of this project the grammars are partially incomplete, so it is unknown whether DLTGen could infer a more complicated expression, e.g. “x + y”. Full expression evaluation would probably be beyond this version of DLTGen; however, expressions like the one above can be thought of in a similar way to the duck typing mechanism. The duck typing mechanism takes the atom non-terminals which make up a path and adds its understanding onto them. A special algorithm for expression evaluation could potentially be created (especially for languages supporting operator overloading, see footnote 10); however, this approach would likely never be as accurate as an interpreted approach.

Another type inference issue unsolved by DLTGen is branching. In Scala (and also in Javascript) it is possible to return different data types from a method depending on which path of execution is taken. In the following example, the method could return a string or an integer depending on the argument “idType”.

def getUserID(userID: Int, idType: String) = {
    if (idType == "String") {
        userID.toString()
    } else {
        userID
    }
}

Figure 7.2, Branching Type Inference

For a system analyzing code this is almost impossible to infer. The system needs to execute the code in order to know which path occurs for a particular invocation. However, what could be determined from code analysis is that it will return a String or an Int. Future work for this project could include surfacing the uncertainty to the developer by proposing members of both types. It would need to be clear in the UI that the proposals are two different eventualities rather than all available at any time.
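The static alternative suggested above, reporting every type a branch could produce, can be sketched in a few lines. This is a hypothetical Python model with a tiny tuple-based expression tree; the node kinds "var", "typed" and "if" and both function names are assumptions for illustration and do not correspond to DLTGen constructs.

```python
def infer_types(expr, env):
    """Return the set of type names an expression may evaluate to."""
    kind = expr[0]
    if kind == "var":      # ("var", name): look the name up in scope
        return {env[expr[1]]}
    if kind == "typed":    # ("typed", type_name): expression of known type
        return {expr[1]}
    if kind == "if":       # ("if", then_expr, else_expr): either may run
        return infer_types(expr[1], env) | infer_types(expr[2], env)
    raise ValueError(f"unknown expression kind: {kind}")


def proposals_by_type(types, members):
    """Group proposals per candidate type so a UI could label each eventuality."""
    return {t: members.get(t, []) for t in sorted(types)}


# Body of getUserID: one branch yields a String, the other the Int argument.
body = ("if", ("typed", "String"), ("var", "userID"))
possible = infer_types(body, {"userID": "Int"})
print(proposals_by_type(possible, {"Int": ["toString"], "String": ["length"]}))
```

Keeping the proposals grouped by candidate type, rather than merging them into one list, is what would let the IDE make clear that they are alternative eventualities.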

10 Operator overloading allows source code to change what operators such as + do for certain class types.

7.1.3 Object Orientated & Type System

Although many features of object orientation are associated more with statically typed languages, object orientation is still common in dynamic languages and is supportable in DLTGen. Scala supports a polymorphism mechanism similar to Java’s; a class can extend another class and implement a “trait” (see footnote 11). This behaviour is exposed by the members shown in a proposal and is controllable using the type system features exposed by DLTGen, as can be seen in figure 7.3. The figure also shows possible super types being suggested, with the class itself removed from the list.

Figure 7.3, Object Orientated Completions

Another Scala type system feature which was dealt with by DLTGen is “Inner Classes” (28). Unlike Java, when a class is defined inside another it becomes a member of the outer class. This feature helps to package up related functionality better.

Another supportable feature is Scala’s compound type mechanism. Types in Scala can be defined as “X with Y” to denote they can be treated as either class X or Y. DLTGen can support this because it gives full control over determining the members of everything. The grammar can be used to detect the sequence and the specification simply finds both types and uses their members.
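As a rough illustration of the compound type rule, the sketch below splits an “X with Y” type expression and merges the members of each part. It is written in Python, and the function name and the members_by_type table are hypothetical; in DLTGen the equivalent logic would be expressed in the grammar and the specification.

```python
def compound_members(type_expr, members_by_type):
    """Collect the members of every type named in an 'X with Y' expression."""
    names = [part.strip() for part in type_expr.split(" with ")]
    result = []
    for name in names:
        for member in members_by_type.get(name, []):
            if member not in result:  # keep first occurrence, avoid duplicates
                result.append(member)
    return result


known = {"Cloneable": ["clone"], "Resetable": ["reset"]}
print(compound_members("Cloneable with Resetable", known))
```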

The biggest problem faced by DLTGen while supporting Scala relates to name spacing. Scala has complete access to any Java library. DLTGen has no support for this, given how specific it is to JRE languages and given that Ruby and Javascript self-define their classes. What DLTGen does have is a concept of global scopes; a custom global scope loader could possibly be built to load a Java library’s contents into non-terminals. This alone would not be enough, however, as there is no way in the specification to determine which global scopes are loaded and which are not. For example, a Java library scope should only be loaded when an appropriate import statement is found in the source text. Therefore the only way to currently support it would be to load all library constructs at the start. This would be far too inefficient if done for the standard Java packages.

11 Similar to Java interfaces but allows partial implementation.

7.1.4 Higher-Order Functions

As previously stated, it is quite common for dynamic languages to contain functional constructs, and Scala is one of these languages. While evaluating Scala, support was created for higher-order functions: functions which can be returned from, or taken as a parameter by, another function (29). This works because the specification sets up a method to return its final statement, including other method definitions.

Figure 7.4, Higher-Order Function Screenshot

7.1.5 Polymorphic Methods

One of the more unusual features in Scala is a concept called “Polymorphic Methods” (30). Methods can be parameterized in a similar way to how a class can be parameterized using generics. This mechanism can be partially supported in DLTGen. Unlike generics in Java, the system needs to observe how the method is used rather than how it is defined. DLTGen’s inference stack proved to be a useful mechanism in supporting this. In the example below there is a class called “List”. One of its methods, “map”, is a polymorphic method; it defines a parameter called “Q” which also defines the generic parameter of the return type. As the second half of figure 7.5 shows, both generic parameters Q and T were inferred. The system was able to look at the inference stack to find the class instantiator and the method call which between them declare the values of Q and T.

Figure 7.5, Polymorphic Method Screenshot
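The way the inference stack supplies values for T and Q can be approximated with the sketch below. It is a Python illustration under stated assumptions: the stack is modelled as a list of dictionaries (one contributed by the class instantiator, one by the method call), and substitute and infer_call_type are hypothetical helpers rather than DLTGen features.

```python
import re


def substitute(type_expr, bindings):
    """Replace every bound generic parameter name in a type expression."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: bindings.get(m.group(0), m.group(0)),
                  type_expr)


def infer_call_type(return_type, inference_stack):
    """Merge the bindings found on the stack, then apply them to the return type."""
    bindings = {}
    for frame in inference_stack:  # outermost frame first
        bindings.update(frame)
    return substitute(return_type, bindings)


# The instantiator binds T and the call to map binds Q.
stack = [{"T": "Int"}, {"Q": "String"}]
print(infer_call_type("List[Q]", stack))
```

Names that are not bound on the stack are left untouched, so an identifier is treated as a generic parameter when a binding exists and as an ordinary data type name otherwise.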

The shortcoming of DLTGen’s solution is that it does not support mixing mechanisms like this well. For example, if the method returned “Q” instead of returning List[Q] then the system would need to first determine if Q was a generic parameter and, if so, use that; otherwise it would look for the data type “Q”. Although this is theoretically possible in the specification language, it is a very clumsy definition. DLTGen could benefit from a cleaner solution for providing alternative routes for the same inference.

7.1.6 Scala Findings

Overall a reasonable number of Scala features were supported; however, there were problems. Scala made it obvious that DLTGen, having been based on two fairly simple type systems (Javascript and Ruby), is not fully equipped to deal with complicated type systems. Scala has a lot of features from the JRE as well as some unique features of its own.

Scala also made it clear that the specification needs mechanisms to define alternatives. In a Scala method the type can come from several sources (as discussed in 7.1.5). Defining these alternatives is not powerful enough in DLTGen to fully support polymorphic methods.

7.2 EOL

The Epsilon (31) project provides a family of model interaction languages which are based on a core language called EOL. EOL (Epsilon Object Language) is a dynamic language which can be used to perform tasks based on models as well as update models; it uses a lot of implicit typing in its syntax. Using a model independent syntax, it provides access to a number of modeling systems including EMF, XML and MOF. EOL was chosen to evaluate DLTGen because, unlike the other languages tested up to now, it is not a general purpose programming language. It also brings some interesting and unique features of its own. In order to evaluate EOL a custom component, a model connector, needed to be created for the IDE; this is effectively a custom global scope loader. It is a justifiable hand-written addition to permit testing the core functionality of DLTGen with EOL. Below is a summary of what the EOL IDE was able to achieve.

Inclusion of foreign data (EMF model)
Scattered class members (operations)
Basic implicit typing
Duck typing

7.2.1 Model Access

As stated above, EOL executes with a set of user-specified, named models. In order to test additional functionality of DLTGen a model loader was created. As an interesting side note, it was reasonably simple to bring this foreign data into DLTGen’s model. This process is briefly discussed here to provide a glimpse into DLTGen’s extensibility.

EMF models were chosen as the model type which would be supported. To enable this functionality the constructs EOL uses, such as a model element, model element attribute and model element reference, were created in the EOL grammar. These non-terminals had all the required attributes but were not parsed as they were not included in the root node’s set of alternatives. Next a class called EMFScope was created which extended DLTGen’s StaticScope type. This class loaded an EMF model and iterated over its features, creating an equivalent non-terminal for each. The final task was to load this static scope. Classes in DLTGen are generated in pairs: the first represents a base implementation and the second is an empty subclass. The subclass is never regenerated and therefore functionality can be overridden in it. The model loading class has a method used for loading static scopes; this was overridden and the EMFScope loaded into the model.
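The shape of this extension can be sketched as follows. The sketch is in Python and is only a structural illustration: StaticScope and EMFScope are named after the classes mentioned above, but their methods, the ModelLoaderBase and ModelLoader classes, and the dictionary standing in for an EMF model are all assumptions; the real classes and the EMF API are not reproduced here.

```python
class StaticScope:
    """Stand-in for DLTGen's StaticScope: a flat list of non-terminals."""
    def __init__(self):
        self.non_terminals = []

    def add(self, kind, **attributes):
        self.non_terminals.append({"kind": kind, **attributes})


class EMFScope(StaticScope):
    """Creates one non-terminal per model element and per feature."""
    def __init__(self, model):
        super().__init__()
        for element, features in model.items():
            self.add("ModelElement", name=element)
            for feature, (feature_kind, target) in features.items():
                self.add(feature_kind, name=feature, owner=element, type=target)


class ModelLoaderBase:
    """Plays the role of the regenerated base implementation."""
    def load_static_scopes(self):
        return []


class ModelLoader(ModelLoaderBase):
    """Plays the role of the hand-editable half of the generated pair."""
    def __init__(self, model):
        self.model = model

    def load_static_scopes(self):
        return super().load_static_scopes() + [EMFScope(self.model)]


# A dictionary standing in for a loaded model (hypothetical contents).
oo_model = {"Class": {"name": ("ModelElementAttribute", "String"),
                      "features": ("ModelElementReference", "Feature")}}
scopes = ModelLoader(oo_model).load_static_scopes()
print([nt["name"] for nt in scopes[0].non_terminals])
```

Keeping the override in a class that the generator never rewrites is what allows hand-written additions such as this to survive regeneration.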

7.2.2 Implicit Typing

EOL is another example of a dynamic language making use of implicit typing. Variables can have types set in the definition; however, it is not required. As with other languages, DLTGen makes this fairly simple; it is a matter of linking members tasks together to support tracing a route from identifier to specifier. Below is an example loaded with a model called “OO” that represents object orientated programming. It shows that the type of the variable “feature” has been inferred by performing dot typing and name binding.

Figure 7.6, Implicit Typing

There are more advanced opportunities in EOL to provide implicit typing; however, some of these would be beyond DLTGen. Wherever a collection is present in EOL there is a range of sorting and filtering operations which take a condition to match against. Below is an example.

x.features.select(f | f.type = Operation)

Figure 7.7, Collection Filtering

Although x.features is defined as a list of “Feature” model elements, if the above code was evaluated using an interpreted mechanism it could determine the only values left were “Operation” model elements and display those members. Once again DLTGen cannot cope with this level of language complexity because it does not evaluate code. The best DLTGen could do is show the members of a “Feature” model element knowing the result will at least be that type.

7.2.3 Operations

An interesting feature of EOL is operations (32). They allow methods to be defined for any type without being defined within the class or model element’s definition. The code in figure 7.8 would make add2 a member of any integer value. This makes sense for a model manipulation language where members cannot be added directly to the model elements without changing the model.

operation Integer add2() : Integer {
    return self + 2;
}

Figure 7.8, Example Operation

DLTGen was able to support this mechanism because it gives the specifier the flexibility to define members. A type specifier’s members task, for example, returns all the members of the type and all operations for the type. What DLTGen could be considered to be lacking is a mechanism for determining if type X extends type Y; however, this is not necessary to implement this functionality. Assume there was an operation for “Object” types and Integer extended Object. Requesting the members of Integer would also request the members of Object as its super type, and that request would find the operations on Object. However, other languages may not be able to work around this.

Figure 7.9, Operation Members
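The members lookup described above, which finds inherited operations without ever asking whether one type extends another, can be sketched as below. This is a Python illustration with hypothetical names and sample data; in DLTGen the same behaviour comes from members tasks in the specification.

```python
def members_of(type_name, declared, operations, super_types):
    """Collect declared members and operations for a type and its super types."""
    result, seen = [], set()
    current = type_name
    while current is not None and current not in seen:
        seen.add(current)  # guards against a cyclic super type chain
        result.extend(declared.get(current, []))
        result.extend(operations.get(current, []))
        current = super_types.get(current)  # None once the chain ends
    return result


declared = {"Integer": ["abs"], "Object": ["toString"]}
operations = {"Integer": ["add2"], "Object": ["describe"]}  # 'describe' is made up
super_types = {"Integer": "Object"}
print(members_of("Integer", declared, operations, super_types))
```

Each type only needs to know its own operations and its direct super type, which is why no separate "X extends Y" test is required.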

7.2.4 Model Writing

EOL, unlike its inspiration OCL, allows not only reading of models but also writing of models. This is a powerful feature; it enables the generation of the model as it is worked with. This does however create a problem for DLTGen. The statements to generate a model element can be as flexible as the language as a whole. Model elements could be created in loops or other complicated constructs. It is beyond DLTGen to evaluate this code to determine what the model would be at a certain point of execution; this would require an interpreted solution.

7.2.5 EOL Findings

EOL has been good for demonstrating that foreign data can successfully be linked into the model. This is likely to be an important factor in supporting a number of DSLs which interact with an input.

However, the IDE also showed a fundamental failing of DLTGen and its approach. The inability to resolve complicated expressions completely shut out the possibility of supporting advanced local inferences, such as that in figure 7.7. DLTGen will struggle where implicit typing is heavily reliant on evaluating or resolving complicated expressions.

8 Conclusion

Chapter 8 summarizes how effective this project has been and what has been learned. This is done by revisiting the project’s aims and objectives in order to assess how well they have been met.

8.1 Review of Aims

The original aim of this project was to make creating tooling for dynamic languages easier. This was to be realized by being able to specify IDE features for dynamic languages in a higher level language. Below is a small dissection of the aims and goals of this project which will be used to look at how effective the solution is.

Generate IDEs to simplify the process of tooling creation
Specify basic and sophisticated IDE features
Support dynamic language features

8.1.1 IDE Generation

DLTGen meets the objective of allowing IDE features to be specified in a higher level language. The reason this was desirable was the overly complex nature of IDE development, which creates a high barrier to entry for small languages. DLTGen has made this process much simpler; IDE features are completely specified using a handful of generic constructs. Currently DLTGen generates about 10 lines of code for every line of specification, excluding comments. A lot of the complexity has also been wrapped into a framework, which also makes creating hand-written additions simpler.

DLTGen can also act as a method of bootstrapping an IDE by getting a developer part of the way there. All features in DLTGen can be tweaked and worked with while still incorporating new specification. DLTGen can be used to specify the more general features of a language and custom code can deal with the features which make the particular language unique.

8.1.2 IDE Feature Support

The original proposal for this project listed a number of IDE features which could possibly be supported. After researching the area it became obvious that existing solutions could already generate a number of basic IDE features. As a result DLTGen focused on code completion, a much less well-supported area in generated IDE solutions. DLTGen does provide support for basic code completion tasks such as terminating syntax; however, the majority of its efforts were put into more sophisticated features.

This project has demonstrated sophisticated, language aware, IDE features can be fully specified in a higher level system. DLTGen provides a range of features aimed at interpreting source code to create sophisticated code completion. From these generic constructs advanced features can be created with a fraction of the time investment. DLTGen also provides more specific sophisticated mechanisms such as duck and dot typing, features which by themselves can be fairly complex.

8.1.3 Dynamic Language Feature Support

A key factor in this project’s success is how well it supports dynamic language features. This document has explored a range of language features and how DLTGen can support them, including:

Implicit Typing
Duck Typing
Higher-Order Functions
Runtime Object Alteration
Dynamic Type Systems
Dot Typing
Object Orientation

Some of these features, such as dot typing and object orientation, are not limited to dynamic languages; however, dynamic languages and static languages are not two completely different entities. It was necessary to put these mechanisms in place as dynamic languages, of course, take inspiration from statically typed languages.

This project has seen four different dynamic languages, which is enough to draw some generalisations about what DLTGen should and should not be used for. If the target dynamic language has the following features there is a good chance it will be fairly well supported in DLTGen.

Self describing – All language features/objects and constructs can be declared within the source language itself. DLTGen can make better use of its inter-linked approach for this type of language.

Name bound local inference – The language primarily determines data types of identifiers by observing other named constructs.

Single focus type system – DLTGen is better equipped to handle type systems which are consistently dynamic or completely static. DLTGen does not cope well with providing alternative routes to the same inference.

Of course DLTGen has demonstrated it can handle many other features, but these represent a core set of language fundamentals which are well supported. Some of the features which are unlikely to be supported well for any language are shown below.

Evaluated local inference – Any language where local inference requires the use of complicated expressions is unlikely to be dealt with correctly in DLTGen. For this type of language an interpreted approach is favourable.

Constructs defined outside the language – Languages which support features from external data without making those features available to their own constructs tend to introduce processing which cannot be done internally. A specific example is supporting JRE libraries when the same features used in those libraries are not supported in the source language. Java has a range of features that Scala does not and yet they can interoperate because of the common execution layer.

8.2 Future Work

During this project a number of decisions have been made and a number of alternative research areas have become apparent. Below are a number of areas which DLTGen could benefit from exploring.

8.2.1 Specification Language Format

DLTGen provides a specification language in the form of XML. Although this has worked up to now, it is not the cleanest solution. XML is fundamentally a data definition language, not a logic definition language. DLTGen could benefit from using or creating a more specific language syntax. DLTGen could also benefit from taking some of its constructs and using a more standard representation of them. An example of where this would be useful is the Find action. Providing an already well-established querying mechanism or syntax could prove more useful to users.

8.2.2 Eclipse Features

This project has a range of features to derive information about source text but the only way to surface this information is with completion proposals. These features could be used to process data for other IDE features. For example, when hovering over a variable in Eclipse’s Java tooling a large tooltip window displays information known about this variable. All the data required to create such a feature could be gathered with existing constructs; all that is needed is the integration point between the specification and Eclipse.

DLTGen could also benefit from exploring how IDE features could be better used for dynamic languages. The example of surfacing uncertainty has previously been discussed in this report. When the system is not certain about the proposals, it could propose what it does know with an indication that not all proposals may be valid.

8.2.3 IDE Interoperability

As DLTGen is a generated solution to creating IDEs, it could be beneficial to the language development community to be able to take one specification and generate IDE plug-ins for a range of IDEs. IDEs providing good integration points include Visual Studio and NetBeans, and no doubt more. Doing this would of course involve a fairly large reworking of the existing system and may prove infeasible.

8.2.4 Dynamic Language Research

The time restrictions imposed on this project made it impossible to perform a full evaluation of a wide range of dynamic languages. DLTGen could most likely benefit from exploring more languages and attempting to derive a core set of features which support most cases. It may also be the case that this type of thinking is not valid for dynamic languages. There may not be enough commonalities between a suitably large collection of dynamic languages to suggest any feature over another. In this case DLTGen could benefit from being more modular; a language could pick a certain kind of type system or turn on functional language features, etc.

8.2.5 Expanding the Specification Language

The specification language could clearly have a much bigger range of features. Future work could look at what this project showed was missing or explore the issue in more detail. Some example features could include:

A more generic linking mechanism which supports going from type A to B rather than A to B on C.

A mechanism to include scopes when required, e.g. to support import statements.

More language shortcuts. “Literals” make defining the behaviour of a literal much simpler, and there are other very common tasks performed in specifications. One of these is proxying another non-terminal’s behaviour; this could be defined using much less specification.

8.3 Lessons Learned

Overall, DLTGen has provided a range of mechanisms which have enabled very functional IDEs for dynamic languages to be created. Although it is not perfect, it was never likely to be perfect. This project approached the problem knowing that dynamic languages are unpredictable. There is no way other than interpretation to provide a completely accurate representation of dynamic program state in an IDE, but DLTGen has shown it is possible to get a lot of the way there without writing any code.

What has also become very clear is how diverse dynamic languages can be. They represent a serious paradigm shift for everyday development but they are also a great testing bed for new features. As new languages are created, so will new unique features justifying each language’s creation. F# (33) for example introduces new ways to write code ready for parallelization; traditional constructs such as for loops have been replaced with parallel safe alternatives. It is impossible for any project such as DLTGen to be constantly playing catch-up with all the language exploration often channelled into dynamic languages.

This project has also shown that certain types of languages are better suited to this approach. Languages such as Scala which have a lot of infrastructure behind them (JRE Mechanisms) tend to be more difficult to support. Languages which could self-define all of their functionality are better for generated solutions because there is less magic behind the scenes.

Finally, it may also be too soon to try and group together and coerce dynamic languages. The re-birth of the dynamic language is still a fairly recent happening. Big companies such as Microsoft and Sun are starting to put real weight behind dynamic languages; we can no doubt expect a lot more innovation in the area in the near future.

I personally learnt a lot from this project and I have a much greater appreciation of the complexity behind my favourite IDE. I also found a great appreciation for dynamic languages. Staying in the static world for too long can convince anybody that language innovation is slow moving, but in the dynamic world there is a huge range of interesting features. This project also gave me an opportunity to learn a lot more about modelling and code generation; there are some truly fascinating projects around, such as EMF and Xpand, which provide very elegant solutions in the area.

Finally, my personal opinion is that an interpreted solution is the best way to go for big languages such as Javascript, Ruby and Scala which can invest the time and money in implementing this properly. For languages which do not have as many resources DLTGen would definitely be a good solution to create a whole IDE or just get started.

Bibliography

1. Scripting: Higher-Level Programming for the 21st Century. Ousterhout, John K. 1998, Computer, p. 30.

2. Wikipedia. Dynamic Language. Wikipedia. [Online] [Cited: March 2, 2010.] http://en.wikipedia.org/wiki/Dynamic_language.

3. Unveiling the Origins, Myths, Use and Benefits of Dynamic Languages. ActiveState. [Online] 2009. [Cited: January 6, 2010.] http://www.activestate.com/business_solutions/white_papers/dynamic_languages/ActiveState_Dynamic_Languages.pdf.

4. Interactive Python Scripting Example. Jim Hugunin's Thinking Dynamic. [Online] [Cited: February 21, 2010.] http://blogs.msdn.com/hugunin/archive/2004/08/23/219186.aspx.

5. Muziol, Lukasz. Teaching programming : Modern approaches using tools and dynamic languages. Google Code. [Online] [Cited: February 26, 2010.] http://gride.googlecode.com/files/lmuziol-teaching-summary.pdf.

6. Eclipse. Eclipse. [Online] [Cited: March 12, 2010.] http://www.eclipse.org.

7. Type System. Wikipedia. [Online] [Cited: March 2, 2010.] http://en.wikipedia.org/wiki/Type_system#Dynamic_typing.

8. Duck Typing. Wikipedia. [Online] [Cited: January 5, 2010.] http://en.wikipedia.org/wiki/Duck_typing.

9. comp.lang.python. Google Groups. [Online] July 26, 2000. [Cited: January 5, 2010.] http://groups.google.com/group/comp.lang.python/msg/e230ca916be58835?hl=en&pli=1.

10. Actionscript Language Reference. Macromedia. [Online] 2004. [Cited: March 1, 2010.] http://download.macromedia.com/pub/documentation/en/flash/mx2004/fl_actionscript_reference.pdf.

11. PHP Language Reference. PHP. [Online] http://php.net/manual/en/langref.php.

12. Ruby Standard Library. Ruby Documentation. [Online] [Cited: March 1, 2010.] http://www.ruby-doc.org/stdlib/.

13. Yahoo TV Widgets. Yahoo Developer. [Online] [Cited: March 13, 2010.] http://developer.yahoo.com/connectedtv/devguide/.

14. Meijer, Erik. Static Typing Where Possible, Dynamic Typing When Needed: The End of the Cold War Between Programming Languages. Microsoft Research. [Online] [Cited: March 1, 2010.] http://research.microsoft.com/en-us/um/people/emeijer/Papers/RDL04Meijer.pdf.

15. Dynamic Language Toolkit. Eclipse. [Online] Eclipse. [Cited: March 18, 2010.] http://www.eclipse.org/dltk/.

16. Building an IDE with DLTK. Eclipse. [Online] [Cited: March 1, 2010.] http://wiki.eclipse.org/A_guide_to_building_a_DLTK-based_language_IDE.

17. Xtext. Xtext. [Online] [Cited: March 16, 2010.] http://www.eclipse.org/Xtext/.

18. Xtext User Guide. Eclipse. [Online] [Cited: March 1, 2010.] http://www.eclipse.org/Xtext/documentation/0_7_2/xtext.html.

19. EMFText. EMFText. [Online] [Cited: March 17, 2010.] http://www.emftext.org.

20. YACC. Compiler Tools. [Online] [Cited: March 11, 2010.] http://dinosaur.compilertools.net/.

21. Dynamic Language Runtime. MSDN. [Online] http://msdn.microsoft.com/en-us/library/dd233052(VS.100).aspx.

22. Attribute Grammar. Indopedia. [Online] [Cited: March 14, 2010.] http://www.indopedia.org/Attribute_grammar.html.

23. ECMA-262. ECMA International. [Online] http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf.

24. Ruby Language. Ruby Language. [Online] [Cited: March 12, 2010.] http://www.ruby-lang.org/en/.

25. Xtext Distribution. Xtext. [Online] [Cited: March 10, 2010.] http://xtext.itemis.com/xtext/language=en/23947/downloads.

26. Xtext Documentation. Xtext Documentation. [Online] [Cited: March 10, 2010.] http://www.eclipse.org/Xtext/documentation/.

27. Scala Language. Scala Language. [Online] [Cited: March 10, 2010.] http://www.scala-lang.org/.

28. A Tour of Scala: Inner Classes. Scala Language. [Online] [Cited: March 9, 2010.] http://www.scala-lang.org/node/115.

29. A Tour of Scala: Higher-Order Functions. Scala Language. [Online] [Cited: March 10, 2010.] http://www.scala-lang.org/node/134.

30. A Tour of Scala: Polymorphic Methods. Scala Language. [Online] [Cited: March 10, 2010.] http://www.scala-lang.org/node/121.

31. Epsilon. Eclipse. [Online] [Cited: March 13, 2010.] http://www.eclipse.org/gmt/epsilon/.

32. The Epsilon Book. http://www.eclipse.org/gmt/epsilon/doc/book/. [Online] [Cited: March 10, 2010.] http://www.eclipse.org/gmt/epsilon/doc/book/.

33. F Sharp. Microsoft Research. [Online] [Cited: March 10, 2010.] http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/.

Appendix A

Visualization of the types of data stored in the Model. Shows non-terminals loaded into the different model data sources. The non-terminals within blue boxes represent the children of the non-terminal.

Appendix B

Grammar non-terminal definitions for Javascript generics.

JS_GenericValue: '.<' (type=IDENTIFIER) '>';

JS_Instanciator: 'new' (classID=IDENTIFIER) (genericParam=JS_GenericValue)? '('...

JS_GenericDef: '<' (name=IDENTIFIER) '>';

JS_Class: 'class' (name=IDENTIFIER) (genericParam=JS_GenericDef)?

JS_ReturnType: ':' type=IDENTIFIER;

JS_Function: 'function' (name=IDENTIFIER) '(' (args+=ArgName)* ')' (returnType=JS_ReturnType)? '{' (statements+=JS_Statement)* '}';

Appendix C

Sections of the specification language.

<?xml version="1.0" encoding="UTF-8"?>
<ide>
  <typeSystem>
  </typeSystem>
  <literals>
  </literals>
  <staticModel>
  </staticModel>
  <classes>
  </classes>
  <contentAssist>
  </contentAssist>
  <resources>
  </resources>
</ide>

Appendix D

Specification for supporting runtime object alteration in Javascript.

<members>
  <annotate>result.members += self.expression.members</annotate>
  <find var="additions" in="model" quantity="*">
    <condition>obj.owner == self</condition>
  </find>
  <annotate>result.members += additions</annotate>
</members>

JS_Variable Members Task

<annotations>
  <find var="obj" in="scope" quantity="1">
    <condition>obj._name == self.objectName</condition>
  </find>
  <annotate>self.owner = obj</annotate>
</annotations>

JS_DotAssignment ObjectAlteration

JS_DotAssignment: (objectName=IDENTIFIER)'.'(propertyName=IDENTIFIER) '=' (value=JS_Expression);

JS_DotAssignment Grammar
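The effect of the two specification fragments above can be paraphrased in a short Python sketch: each dot assignment is linked to the variable it alters, and the variable’s members are its expression’s members plus every linked addition. The function name and the dictionary representation are assumptions for illustration; only the attribute names objectName and propertyName come from the grammar above.

```python
def variable_members(variable_name, expression_members, dot_assignments):
    """Members of a variable: its expression's members plus runtime additions."""
    members = list(expression_members)
    for assignment in dot_assignments:
        # Mirrors the ObjectAlteration annotation: match the assignment to its owner.
        if assignment["objectName"] != variable_name:
            continue
        if assignment["propertyName"] not in members:
            members.append(assignment["propertyName"])
    return members


# Hypothetical source: var car = new Object(); car.wheels = 4; bike.wheels = 2;
assignments = [{"objectName": "car", "propertyName": "wheels"},
               {"objectName": "bike", "propertyName": "wheels"}]
print(variable_members("car", ["toString"], assignments))
```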

Appendix E

To follow this page is the complete XML DLTGen language specification for ruby.

Appendix F

To follow this page is the complete Xtext grammar for Ruby.

Appendix G

To follow this page is the complete set of tests undertaken while testing DLTGen.

Appendix H

Screenshots of Scala IDE in operation

Dot typing considering scoping of the if statement

Higher order functions

Scope resolution and inner classes shown as being owned by Graph

Duck Typing

Appendix I

Screenshots of Ruby IDE in operation

Scope suggestions

Dot typing

External Iterator Resolution

Duck Typing

Appendix J

The following pages are the original project proposal document.
