Dr Jekyll and Mr C
Rob Ennals, Intel Research Cambridge
13/3/06 Dr Jekyll and Mr C (SRG Talk)
C is holding us back
Much important software is currently written in C
  - Even if not the most lines of code, probably most of the cycles
Security problems
Hard to analyse
Hard to debug
Unreliable
Unsafe
Hard to understand
Hard to write
Hard to parallelise
Unexpressive
(These problems are connected.)
Functional Languages are Great!
Features:
Safety
Generic Types
Lambda Expressions
Controlled Effects
Type Classes
Benefits:
Easier to write
Easier to understand
More reliable
More secure
Easier to parallelise
So why does nobody use them?
A Problem: Language Switching Costs
Much important software is currently written in C
Moving to a new language incurs high switching costs
  - Programmers, tools, libraries, and existing code are all tied to C
[Diagram: C, with Programmers, Trust, Libraries, Existing Code, and Tools tied to it]
A Solution: Lossless Round Tripping
Jekyll is a high-level functional programming language
  - Featuring most of the features of Haskell + more
Jekyll can be translated losslessly to and from C
  - Preserving layout, formatting, comments, everything
  - C code is readable and editable
[Diagram: the C Programmer works on the C File, the Jekyll Programmer works on the Jekyll File, and each file is translated into the other]
Switching Costs are Reduced
Programmers and Tools can still use the C version.
Existing C code can stay in C
  - Although there may be benefit to be had from modifying it
If Jekyll ceases to be maintained, just use the C
[Diagram: Jekyll alongside C Programmers, C Trust, C Libraries, Existing C Code, and C Tools]
Jekyll is Transparent
C Programmers can edit programs without knowing about Jekyll.
This requires that:
  - C programmers can understand C produced by the Jekyll Translator
  - The Jekyll translator can understand edits made by C programmers
[Diagram: C File, C Programmer, Jekyll Translator]
Jekyll is very tolerant of edits to C code. This is essential.
We assume that C programmers do NOT KNOW ANYTHING about Jekyll.
But they still need to be able to edit Jekyll-encoded C files.
Jekyll-Encoded C files are Unannotated
No funny macros, No weird comments, No restrictive naming rules
Just good, readable, editable, C
All extra info is simply thrown away
  - It is retrieved from the previous Jekyll version when converted back
Jekyll:
struct<%a> Node{ %a *element; List<%a> *tail; };
Encoded C:
struct Node{ void *element; List *tail; };
Reconstruction based on previous version
There are many ways to decode a C file as Jekyll
  - Extra type info, different features being encoded, etc.
We choose the decoding that matches the previous version
  - Aiming to minimise the textual difference from the previous Jekyll file
This allows Jekyll to correctly decode unannotated C
[Diagram: New C File + Old Jekyll File -> New Jekyll File]
Encoding based on the previous version
There are many ways to encode a Jekyll feature as C
  - Temporary names, whitespace, different encodings, etc.
We choose the encoding that matches the previous C version
  - Aiming to minimise the textual difference from the previous file
This allows Jekyll to avoid modifying hand-edited C
[Diagram: New Jekyll File + Old C File -> New C File]
Jekyll is another view of C
Authoritative source code can stay as C
But programmers and tools can also view it as Jekyll
C Programmers need not know Jekyll is even being used.
[Diagram: a C Repository and a Jekyll Repository; the C Programmer works with C Files, the Jekyll Programmer with Jekyll Files]
Jekyll Features
Use of unsafe features causes a warning unless marked as “unsafe”
All of C:
  - Unsafe Features
  - Imperative Features
  - Low-Level Features
  - C Types
  - C Expressions
  - Pre-processor
Most of O'Caml + Haskell:
  - Algebraic Types
  - Type Classes
  - Lambda Expressions
  - Pattern Matching
  - Generic Types
  - Type Safety
  - Optional GC
  - NOT LAZY!
What is Jekyll
Jekyll & its C Encoding
Lossless Translation
Demo
Superset of C
All C programs are valid Jekyll programs, unless:
  - They use extensions that Jekyll does not understand
  - They use the pre-processor in a way that Jekyll does not understand
In future: Support everything GCC can compile
[Diagram: C as a subset of Jekyll]
A mix of Haskell, O'Caml, and Cyclone
Jekyll contains no original language features
  - All features are present in either Haskell, O'Caml or Cyclone
  - Features are usually implemented in the same way too
  - Although the combination can be interesting…
We will focus on the encoding, rather than the language itself
[Diagram: Haskell, O'Caml, Cyclone]
Generic Types
All extra type info is thrown away
  - type parameters
  - type variables
  - type constraints
The Jekyll translator restores them from the previous Jekyll file
Jekyll:
struct<%a> Node{ %a *element; List<%a> *tail; };
C:
struct Node{ void *element; List *tail; };
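As a rough illustration of what the erased C side looks like in use, the sketch below builds on the `void *` form of `Node`. The `tail` field is typed `struct Node *` rather than `List *` so that the fragment is self-contained, and `node_length` and `node_int` are hypothetical helpers; none of this comes from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* C view of the generic Node: the type parameter %a has been erased to void. */
struct Node {
    void *element;      /* was: %a *element */
    struct Node *tail;  /* assumption: stands in for List<%a> *tail */
};

/* Hypothetical helper: count the nodes in a NULL-terminated chain. */
size_t node_length(const struct Node *n) {
    size_t len = 0;
    for (; n != NULL; n = n->tail) {
        len++;
    }
    return len;
}

/* Hypothetical helper: read an element back as an int.
   The cast is unchecked in C; nothing here verifies the element type. */
int node_int(const struct Node *n) {
    assert(n != NULL && n->element != NULL);
    return *(int *)n->element;
}
```

The unchecked cast in `node_int` is the kind of information the C view no longer carries, which is why the translator has to recover the type parameters from the previous Jekyll file.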
Tagged Unions
No annotations here either
  - Jekyll will attempt to decode any struct that has _tag and _body fields
Jekyll:
tagged<%a> List{ Node<%a> NODE; void EMPTY; };
switch(*l){
  case EMPTY: return 0;
  case NODE n: return len(n);
};
C:
struct List{
  enum {NODE,EMPTY} _tag;
  union { Node NODE; void EMPTY; } _body;
};
switch(l->_tag){
  case EMPTY: return 0;
  case NODE: return len(l->_body.NODE);
};
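The `_tag`/`_body` shape can be tried out in plain C with a sketch like the following. It is a simplified variant, not the translator's output: the payload is fixed to `int`, the `EMPTY` case has no union member (standard C does not allow a `void` member), and `list_len` is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>

struct List;

struct Node {
    int head;            /* assumption: int payload for this sketch */
    struct List *tail;
};

/* Same shape as the encoding above: a _tag enum plus a _body union. */
struct List {
    enum { NODE, EMPTY } _tag;
    union { struct Node NODE; } _body;  /* EMPTY carries no data */
};

/* Hypothetical length function that dispatches on the tag. */
int list_len(const struct List *l) {
    assert(l != NULL);
    switch (l->_tag) {
    case EMPTY:
        return 0;
    case NODE:
        return 1 + list_len(l->_body.NODE.tail);
    }
    return -1; /* only reached if the tag holds an invalid value */
}
```

Nothing in C stops code from reading `_body.NODE` when the tag says `EMPTY`; the pattern-matching `switch` in the Jekyll view is what ties the two together.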
Unsafe Operations
All unsafe C operations are allowed
  - Pointer arithmetic
  - Unchecked array bounds
  - Unsafe casts, etc.
Must be marked with the "unsafe" keyword to avoid a warning
Jekyll:
unsafe *p++ = *q++;
C:
*p++ = *q++;
Lambda Expressions
Programmers are free to change all generated names
  - The fe and ft prefixes are the defaults, but are not required
  - They are just used to reduce incidence of name clashes
Jekyll:
int plusthree(int z){return foo(3, x : x + z;);}
C:
struct fe_env{ int *z;};
int ff_lam(struct fe_env *_env, int x){return x+*(_env->z);}
int plusthree(int z){
  struct fe_env ft0 = {&z};
  return foo(3,(void*)&ff_lam,&ft0);
}
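To see the closure encoding run end to end, the sketch below completes the fragment with a stand-in for `foo`. The talk does not define `foo`, so the version here (apply the function to the first argument) is purely an assumption, and it takes a typed function pointer instead of the `void *` cast used on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Environment for the lambda: one pointer per captured variable. */
struct fe_env {
    int *z;
};

/* The lambda body, lifted to a top-level function that takes its environment. */
int ff_lam(struct fe_env *_env, int x) {
    return x + *(_env->z);
}

/* Hypothetical stand-in for foo: applies the closure to arg. */
int foo(int arg, int (*fn)(struct fe_env *, int), struct fe_env *env) {
    assert(fn != NULL && env != NULL);
    return fn(env, arg);
}

int plusthree(int z) {
    struct fe_env ft0 = { &z };   /* capture z by reference */
    return foo(3, ff_lam, &ft0);
}
```

Because the environment lives on the stack of `plusthree`, this form is only safe while the closure does not outlive the call, which holds in this sketch since `foo` uses it immediately.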
Type Classes (Haskell-Style) (1/2)
Jekyll implements the full Haskell98 type class system
Any struct that contains only functions can be decoded as a type class
Type-classes are a good match for C code
  - They don't change the in-memory representation (unlike vtables)
  - One can add methods to existing types (unlike vtables)
Jekyll:
interface Print %a{ void print(%a *x);};
C:
struct Print { void (*print)(void* _env, _va *x);};
Type Classes (Haskell-Style) (2/2)
Defining a new type class instance creates a new dictionary struct.
implement Print int { void print(int *x){print_int(*x);};};
implement(Print int);
void int_print(void* _env,int *x){print_int(*x);};
struct Print Print_int = {(void*)&int_print};
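The dictionary struct becomes useful once a generic function receives it as an argument. The following sketch shows that pattern in plain C under some assumptions: the method takes `void *` and returns the number of characters written (the slide's version returns `void`), and `print_twice` is a hypothetical generic function that is not part of the talk.

```c
#include <assert.h>
#include <stdio.h>

/* Dictionary for the Print class: one function pointer per method. */
struct Print {
    int (*print)(void *_env, void *x);
};

/* Instance method for int; returns the character count from printf. */
int int_print(void *_env, void *x) {
    (void)_env;  /* no environment needed for this instance */
    return printf("%d", *(int *)x);
}

/* The instance is just a statically initialised dictionary. */
struct Print Print_int = { int_print };

/* Hypothetical generic function: works for any type that has a dictionary. */
int print_twice(const struct Print *dict, void *x) {
    assert(dict != NULL && dict->print != NULL);
    int first = dict->print(NULL, x);
    int second = dict->print(NULL, x);
    return first + second;
}
```

The value being printed is an ordinary `int` with no extra header, which matches the point above that the dictionary lives outside the data and leaves its in-memory representation unchanged.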
Initialiser Expressions
Safe, easy, creation of values.
One can of course rename all temporaries.
Jekyll:
return new Node{h,t}
C:
List *tmp;
tmp = (List*) jkl_GC_malloc(sizeof(List));
tmp->_tag = Node;
tmp->_body.Node.head = h;
tmp->_body.Node.tail = t;
return tmp;
Other Features
Fat pointers – allow safe pointer arithmetic (like Cyclone)
Macrotype – tell Jekyll how to interpret foreign macros (like Astec)
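For readers unfamiliar with fat pointers, a minimal sketch of the general idea follows. It is not Jekyll's or Cyclone's actual representation: the struct layout and the `fat_make`, `fat_add`, and `fat_read` names are all hypothetical. The pointer carries its bounds, so arithmetic can go anywhere but every access is checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: base address, element count, current index. */
struct fat_int_ptr {
    int *base;
    size_t count;
    ptrdiff_t index;   /* may move out of range; reads are checked */
};

struct fat_int_ptr fat_make(int *array, size_t count) {
    struct fat_int_ptr p = { array, count, 0 };
    return p;
}

/* Pointer arithmetic only adjusts the index; no memory is touched. */
struct fat_int_ptr fat_add(struct fat_int_ptr p, ptrdiff_t offset) {
    p.index += offset;
    return p;
}

/* Checked read: returns 1 and stores the value on success, 0 if out of bounds. */
int fat_read(struct fat_int_ptr p, int *out) {
    assert(out != NULL);
    if (p.index < 0 || (size_t)p.index >= p.count) {
        return 0;
    }
    *out = p.base[p.index];
    return 1;
}
```

Storing an index rather than a moved raw pointer keeps the sketch within defined C behaviour even when the position is out of range.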
Simplified C->Jekyll Translation
Ignoring parsing, transforms, analysis, typechecking, etc.
[Diagram: C File -> Decode -> Non-det Jekyll File; Non-det Jekyll File + Previous Jekyll File -> Select Closest -> Output Jekyll File]
Simplified Jekyll->C Translation
[Diagram: Jekyll File -> Encode -> Non-det C File; Non-det C File + Previous C File -> Select Closest -> Output C File]
Ignoring parsing, transforms, analysis, typechecking, etc.
Expanded Jekyll->C Translation
[Diagram: pipeline with the stages Parse, Analysis, Encode, Pretty Print, Select Closest, Whiteflow/Check, and Output; the data passed between them are JekyllTokens, Jekyll AST, Non-det Combined AST, Guesses, Possible CTokens, Possible JklTokens, Previous CTokens, and CTokens]
Encode/Decode: Non-deterministic
Produce a non-deterministic AST describing all possibilities
  - Encode: Produce C that could implement a Jekyll feature
  - Decode: Look for C code that might implement a Jekyll feature
Decode is very aggressive: it will even accept invalid encodings
  - If it seems that that might have been what was intended
  - User can be warned about these at check time
[Diagram: Jekyll AST -> Encode -> Non-det C AST; C AST -> Decode -> Non-det Jekyll AST]
Check: Ensure input was well formed
Decode stage will accept illegal encodings
  - By design: Makes converting mangled C easier
Check: can our output be translated back to our input?
  - If not, then warn the user to look at the diffs
[Diagram: CTokens and PossibleTokens feed into Check]
Degrees of Conformity
Degrees:
  - Cannot translate
  - Translates, but check fails
  - Translates, and check passes
  - Translates, and is canonical
Outcomes:
  - Encoding stays as C
  - Best match is a decoded feature, but encoding was invalid
  - Generate a file, but warn
  - All is good
Select Closest: Resolve Non-Determinism
Choose the encoding so as to minimise the textual difference from the previous file
If the AST did not change, the new file will be bit-for-bit identical to the old file
Now: Line-by-line comparison
  - Minimises differences as seen by "diff"
Future: Burrows-Wheeler longest common substring
[Diagram: Non-det File + Previous File -> Select Closest]
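The line-by-line selection can be pictured with the toy sketch below. It is not the Jekyll implementation: it assumes the non-deterministic output has already been expanded into a small array of complete candidate texts, and it scores each one by the number of added plus removed lines relative to the previous file, computed from a longest common subsequence of lines. `line_distance`, `select_closest`, and `MAX_LINES` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 64  /* sketch limit: lines beyond this are ignored */

/* Split text into lines; records start pointers and lengths, returns the count. */
static size_t split_lines(const char *text, const char *starts[], size_t lens[]) {
    size_t n = 0;
    while (*text != '\0' && n < MAX_LINES) {
        const char *end = strchr(text, '\n');
        size_t len = end ? (size_t)(end - text) : strlen(text);
        starts[n] = text;
        lens[n] = len;
        n++;
        text += len + (end ? 1 : 0);
    }
    return n;
}

/* Lines added plus lines removed between a and b: (na - lcs) + (nb - lcs). */
size_t line_distance(const char *a, const char *b) {
    const char *sa[MAX_LINES], *sb[MAX_LINES];
    size_t la[MAX_LINES], lb[MAX_LINES];
    static size_t lcs[MAX_LINES + 1][MAX_LINES + 1];
    size_t na = split_lines(a, sa, la);
    size_t nb = split_lines(b, sb, lb);
    for (size_t i = 0; i <= na; i++) {
        for (size_t j = 0; j <= nb; j++) {
            if (i == 0 || j == 0) {
                lcs[i][j] = 0;
            } else if (la[i - 1] == lb[j - 1] &&
                       memcmp(sa[i - 1], sb[j - 1], la[i - 1]) == 0) {
                lcs[i][j] = lcs[i - 1][j - 1] + 1;
            } else {
                lcs[i][j] = lcs[i - 1][j] > lcs[i][j - 1] ? lcs[i - 1][j]
                                                          : lcs[i][j - 1];
            }
        }
    }
    return (na - lcs[na][nb]) + (nb - lcs[na][nb]);
}

/* Index of the candidate closest to the previous file; ties go to the earliest. */
size_t select_closest(const char *previous, const char *candidates[], size_t count) {
    assert(count > 0);
    size_t best = 0;
    size_t best_dist = line_distance(previous, candidates[0]);
    for (size_t i = 1; i < count; i++) {
        size_t d = line_distance(previous, candidates[i]);
        if (d < best_dist) {
            best = i;
            best_dist = d;
        }
    }
    return best;
}
```

If one candidate is identical to the previous file its distance is zero, so it is always chosen, which is the bit-for-bit property mentioned above. Enumerating whole candidate files would not scale to a real non-deterministic AST; the sketch only illustrates the scoring.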
Twinned Token Printing
Carry whitespace and comments between Jekyll and C
  - Otherwise language comments would be entirely disconnected
Whitespace can come from input file or previous file
  - Twinned token: Whitespace from input token that matches the twin
  - Untwinned token: Whitespace from previous file version
[Diagram: Input Jekyll, Previous C, Jekyll AST, Twins, Printed Jekyll, Printed C]
Demo
Conclusions
• Jekyll is a powerful functional programming language
• Lossless translation makes it practical to migrate C code
• Non-Deterministic encoding makes it tolerant of C edits
Download Jekyll now:
http://jekyllc.sf.net