Lect Slides

Date post: 06-Jan-2016
Upload: vikasdalal

Transcript
  • Introduction to Compilers

  • Writing Cross Compilers

    [Diagram: two-step cross compilation]
    1- Mac C compiler source code (in Unix C) -> Unix C compiler -> Mac C compiler usable on Unix
    2- Mac C compiler source code (in Unix C) -> Mac C compiler usable on Unix -> Mac C compiler usable on Mac

  • Writing Retargetable Compilers

    Two methods:
    1- Make a strict distinction between front end and back end, then use different back ends.
    2- Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code.

  • Bootstrapping

    Process of writing a compiler (or assembler) in the target programming language which it is intended to compile.

    Applying this technique leads to a self-hosting compiler.

    Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala, Nimrod, Eiffel, and more.

  • Formal Languages

    Already studied

  • Roles of Scanner

    Removal of comments
    Case conversion
    Removal of white space: blanks, tabs, carriage returns and line feeds
    Interpretation of compiler directives: #include, #ifdef, #ifndef and #define are directives to redirect the input of the compiler. May be done by a precompiler.

  • Token: An element of the lexical definition of the language.

    Lexeme: A sequence of characters identified as a token.

    Pattern: A rule, associated with a token, that describes the set of strings the token can take.

  • Regular Languages and Regular Expressions

    Studied in Theory of Computation

  • Possible Implementations

    Lexical Analyzer Generator (e.g. Lex)
    + Safe, quick
    - Must learn the software; unable to handle unusual situations

    Table-Driven Lexical Analyzer
    + General and adaptable method; the same function can be used for all table-driven lexical analyzers
    - Building the transition table can be tedious and error-prone

  • Possible Implementations

    Hand-written
    + Can be optimized, can handle any unusual situation, easy to build for most languages
    - Error-prone, not adaptable or maintainable

  • Design of a Lexical Analyzer

    Steps
    1- Construct a set of regular expressions (REs) that define the form of all valid tokens
    2- Derive an NDFA from the REs
    3- Derive a DFA from the NDFA
    4- Translate to a state transition table
    5- Implement the table
    6- Implement the algorithm to interpret the table
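
    Steps 4 to 6 can be sketched for the identifier pattern alone. This is only a minimal illustration under stated assumptions, not the course's implementation: the state numbering, the character classes, and the names char_class and match_identifier are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Character classes: the columns of the transition table (assumed layout). */
enum { CC_LETTER, CC_DIGIT, CC_OTHER, CC_COUNT };

/* States of a DFA for letter(letter|digit)* (assumed numbering). */
enum { ST_START, ST_IN_ID, ST_ERROR, ST_COUNT };

/* Steps 4-5: the state transition table, indexed by [state][class]. */
static const int transition[ST_COUNT][CC_COUNT] = {
    /* ST_START */ { ST_IN_ID, ST_ERROR, ST_ERROR },
    /* ST_IN_ID */ { ST_IN_ID, ST_IN_ID, ST_ERROR },
    /* ST_ERROR */ { ST_ERROR, ST_ERROR, ST_ERROR },
};

/* Maps a character to its column in the table. */
static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return CC_LETTER;
    if (isdigit((unsigned char)c)) return CC_DIGIT;
    return CC_OTHER;
}

/* Step 6: interpret the table. Returns the length of the longest
   identifier at the start of s, or 0 if s does not start with one. */
size_t match_identifier(const char *s)
{
    assert(s != NULL);
    int state = ST_START;
    size_t len = 0;
    while (s[len] != '\0') {
        int next = transition[state][char_class(s[len])];
        if (next == ST_ERROR) break;   /* no edge: stop before this char */
        state = next;
        len++;
    }
    return state == ST_IN_ID ? len : 0;
}
```

    The interpreter loop does not depend on the token being recognized; only the table and the character classes would change for another pattern, which is the adaptability claimed for table-driven analyzers.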

  • Specification of tokens

    Regular expressions are an important notation for specifying patterns.

    Rules to define regular expressions

    Limitations of regular expressions
    Cannot describe balanced or nested constructs.
    Repeating strings cannot be described, e.g. {wcw | w is a string of a's and b's}
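
    To see why balanced constructs fall outside regular expressions, note that checking them needs a counter that can grow without bound, which a finite automaton does not have. The sketch below is an illustration only (is_balanced is a hypothetical name, and only round parentheses are considered); it shows the extra state such a check requires.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every '(' in s has a matching ')', 0 otherwise.
   The depth counter is the unbounded memory a DFA lacks. */
int is_balanced(const char *s)
{
    assert(s != NULL);
    long depth = 0;
    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (depth == 0) return 0;   /* closing with nothing open */
            depth--;
        }
    }
    return depth == 0;
}
```

    Checks of this kind are normally left to the syntactic analyzer rather than the scanner.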

  • Regular Expressions

    { } : { }
    s : {s | s in s^}
    a : {a}
    r | s : {r | r in r^} or {s | s in s^}
    s* : {s^n | s in s^ and n>=0}
    s+ : {s^n | s in s^ and n>=1}

    id -> letter(letter|digit)*
    Num -> digit+(.digit+)?(E(+|-)?digit+)?
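
    The Num pattern can also be read as a small hand-written recognizer, one branch per optional group. The following is a sketch under assumptions: match_num and skip_digits are hypothetical names, and the function reports the longest matching prefix rather than producing a token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Skips a run of digits starting at s[i]; returns the index just past it. */
static size_t skip_digits(const char *s, size_t i)
{
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* Returns the length of the longest prefix of s that matches
   digit+(.digit+)?(E(+|-)?digit+)?, or 0 if there is none. */
size_t match_num(const char *s)
{
    assert(s != NULL);
    size_t end = skip_digits(s, 0);            /* digit+ */
    if (end == 0) return 0;

    if (s[end] == '.') {                       /* (.digit+)? */
        size_t after = skip_digits(s, end + 1);
        if (after > end + 1) end = after;      /* accept only if a digit followed */
    }

    if (s[end] == 'E') {                       /* (E(+|-)?digit+)? */
        size_t i = end + 1;
        if (s[i] == '+' || s[i] == '-') i++;
        size_t after = skip_digits(s, i);
        if (after > i) end = after;            /* accept only if a digit followed */
    }
    return end;
}
```

    When an optional group starts but cannot be completed (for example "7." with no digit after the point), the sketch falls back to the shorter match, which corresponds to retracting the input as the transition-diagram code does.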

  • Recognition of tokens

    Transition diagrams:
    As an intermediate step in the construction of a lexical analyzer, we produce a stylized flowchart, called a transition diagram.

    [Transition diagram for identifiers and keywords: start -> state 9 -letter-> state 10 (loops on letter or digit) -other-> state 11: return(gettoken(), install_id())]

  • Implementing a transition diagram

    A sequence of transition diagrams can be converted into a program to look for the tokens specified by the diagrams.
    Program size is proportional to the number of states and edges in the diagrams.

    [Transition diagram for numbers: start -> state 25 -digit-> state 26 (loops on digit) -other-> state 27]

    C code for the lexical analyzer is:

  • token nexttoken()
    {
        while (1) {
            switch (state) {
            case 0:
                c = nextchar();  /* c is lookahead character */
                if (c == blank || c == tab || c == newline) {
                    state = 0;
                    lexeme_beginning++;  /* advance beginning of lexeme */
                }
                else if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else state = fail();
                break;
            /* cases 1-8 here */
            case 9:
                c = nextchar();
                if (isletter(c)) state = 10;
                else state = fail();
                break;
            case 10:
                c = nextchar();
                if (isletter(c)) state = 10;
                else if (isdigit(c)) state = 10;
                else state = 11;
                break;
            case 11:
                retract(1);  /* give back the lookahead character */
                install_id();
                return ( gettoken() );
            ...
            /* cases 12-24 here */
            case 25:
                c = nextchar();
                if (isdigit(c)) state = 26;
                else state = fail();
                break;
            case 26:
                c = nextchar();
                if (isdigit(c)) state = 26;
                else state = 27;
                break;
            case 27:
                retract(1);  /* give back the lookahead character */
                install_num();
                return ( NUM );
            }
        }
    }

  • gettoken()
    Looks for the lexeme in the symbol table. If the lexeme is a keyword, the corresponding token is returned; otherwise the token id is returned.

    install_id()
    Has access to the buffer where the identifier lexeme is located.
    The symbol table is examined, and if the lexeme is found marked as a keyword, it returns 0.
    If the lexeme is found and is a program variable, it returns a pointer to the symbol table entry.
    If the lexeme is not found in the symbol table, it is installed as a variable and a pointer to the newly created entry is returned.

    install_num()
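
    The behaviour described for gettoken() and install_id() can be sketched with a small fixed-size table. This is a hedged illustration, not the slides' code: the slides' routines read the lexeme from the input buffer, whereas here it is passed as a parameter for simplicity, and the table layout, the token codes and the lookup helper are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64   /* assumed capacity */
#define MAX_LEXEME  32   /* assumed maximum lexeme length, including '\0' */

enum { TOK_ID = 256, TOK_IF, TOK_WHILE };   /* assumed token codes */

struct entry {
    char lexeme[MAX_LEXEME];
    int  token;          /* TOK_ID for variables, a keyword token otherwise */
};

/* Keywords are preloaded so that lookups can tell them from variables. */
static struct entry symtab[MAX_SYMBOLS] = {
    { "if", TOK_IF }, { "while", TOK_WHILE },
};
static size_t symcount = 2;

/* Linear search; returns the entry for lexeme or NULL if absent. */
static struct entry *lookup(const char *lexeme)
{
    for (size_t i = 0; i < symcount; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0) return &symtab[i];
    return NULL;
}

/* Returns NULL (0) for a keyword; otherwise a pointer to the variable's
   entry, installing the variable first if it is not yet in the table. */
struct entry *install_id(const char *lexeme)
{
    assert(lexeme != NULL && strlen(lexeme) < MAX_LEXEME);
    struct entry *e = lookup(lexeme);
    if (e != NULL)
        return e->token == TOK_ID ? e : NULL;
    assert(symcount < MAX_SYMBOLS);
    e = &symtab[symcount++];
    strcpy(e->lexeme, lexeme);   /* fits: length checked above */
    e->token = TOK_ID;
    return e;
}

/* Returns the keyword's token if lexeme is a keyword, TOK_ID otherwise. */
int gettoken(const char *lexeme)
{
    struct entry *e = lookup(lexeme);
    return e != NULL ? e->token : TOK_ID;
}
```

    Preloading keywords into the same table keeps the transition diagram for identifiers and keywords a single diagram; a real symbol table would more likely use hashing than a linear search.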

  • Derive NDFA from REs

    Could derive a DFA from REs, but:
    Much easier to do the NDFA, then derive the DFA
    No standard way of deriving DFAs from REs
    Use Thompson's construction (Louden's)

    [Diagram: NDFA with edges labeled letter and digit]

  • Derive DFA from NDFA

    Use subset construction (Louden's)
    May be optimized
    Easier to implement:
    No ε edges
    Deterministic (no backtracking)

    [Diagram: DFA with edges labeled letter, digit and [other]]

  • Implementation Concerns

    Backtracking
    Principle: A token is normally recognized only when the next character is read.
    Problem: Maybe this character is part of the next token.
    Example: x

  • Implementation Concerns

    Ambiguity
    Problem: The lexemes of some tokens are subsets of other tokens.
    Example: n-1. Is the "-" a separate operator, or the sign prefix of the number 1?
    Solutions:
    Postpone the decision to the syntactic analyzer
    Do not allow a sign prefix to numbers in the lexical specification
    Interact with the syntactic analyzer to find a solution. (Induces coupling)

  • Example

    Alphabet: {:, *, =, (, ), <, >, {, }, [a..z], [0..9]}

    Simple tokens: {(, ), {, }, :, <, >}

    Composite tokens: {:=, >=, ..., (*, *)}

  • Example

    Ambiguity problems:

    Character    Possible tokens
    :            :, :=
    >            >, >=
    <
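
    One common way to settle such cases inside the scanner is to look one character ahead and prefer the longest match. The sketch below assumes this longest-match rule, which the slides do not state explicitly; the token names and scan_operator are hypothetical, and only the ":" and ">" rows of the table are covered.

```c
#include <assert.h>
#include <stddef.h>

enum token { T_COLON, T_ASSIGN, T_GT, T_GE, T_UNKNOWN };

/* Reads one operator token at s[*pos] and advances *pos past it.
   At the end of the input it returns T_UNKNOWN without advancing. */
enum token scan_operator(const char *s, size_t *pos)
{
    assert(s != NULL && pos != NULL);
    char c = s[*pos];
    if (c == '\0') return T_UNKNOWN;

    char next = s[*pos + 1];            /* lookahead; safe because c != '\0' */
    if (c == ':' || c == '>') {
        if (next == '=') {              /* longest match: ":=" or ">=" */
            *pos += 2;
            return c == ':' ? T_ASSIGN : T_GE;
        }
        *pos += 1;                      /* lookahead is not consumed */
        return c == ':' ? T_COLON : T_GT;
    }
    *pos += 1;
    return T_UNKNOWN;
}
```

    Leaving the lookahead character unconsumed when it does not extend the token plays the same role as retract(1) in the transition-diagram code.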

