
CSE309-2

Date post: 10-Apr-2018
Upload: shyamd4
  • 8/8/2019 CSE309-2

    1/62

    Chapter 2

    CSE309N

    Chapter 2: A Simple One Pass Compiler

    Dewan Tanvir Ahmed

    Computer Science & Engineering
    Bangladesh University of Engineering and Technology

    The Entire Compilation Process

    Grammars for Syntax Definition
    Syntax-Directed Translation
    Parsing - Top Down & Predictive
    Pulling Together the Pieces
    The Lexical Analysis Process
    Symbol Table Considerations
    A Brief Look at Code Generation
    Concluding Remarks/Looking Ahead

    Overview

    A programming language can be defined by describing:

    1. The syntax of the language
       1. What its programs look like
       2. We use a CFG or BNF (Backus-Naur Form)

    2. The semantics of the language
       1. What its programs mean
       2. Difficult to describe
       3. We use informal descriptions and suggestive examples

    Grammars for Syntax Definition

    A Context-free Grammar (CFG) Is Utilized to Describe the Syntactic Structure of a Language

    A CFG Is Characterized By:

    1. A Set of Tokens or Terminal Symbols
    2. A Set of Non-terminals
    3. A Set of Production Rules; Each Rule Has the Form
           NT → {T, NT}*
    4. A Non-terminal Designated As the Start Symbol

    Grammars for Syntax Definition: Example CFG

    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    (The | means OR, so we could have written
    list → list + digit | list - digit | digit)

    Information

    A string of tokens is a sequence of zero or more tokens.

    The string containing zero tokens, written as ε, is called the empty string.

    A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the right side of a production for that non-terminal.

    The token strings that can be derived from the start symbol form the language defined by the grammar.

    Grammars are Used to Derive Strings

    Using the CFG defined on the earlier slide, we can derive the string 9 - 5 + 2 as follows (the production applied at each step is shown on the right):

    list → list + digit             P1 : list → list + digit
         → list - digit + digit     P2 : list → list - digit
         → digit - digit + digit    P3 : list → digit
         → 9 - digit + digit        P4 : digit → 9
         → 9 - 5 + digit            P4 : digit → 5
         → 9 - 5 + 2                P4 : digit → 2

    Grammars are Used to Derive Strings

    This derivation could also be represented via a Parse Tree (parents on left, children on right).

    [Figure: parse tree for 9 - 5 + 2. The root list has the children list, + and digit (2); that child list has the children list, - and digit (5); the innermost list has the single child digit (9).]

    list → list + digit
         → list - digit + digit
         → digit - digit + digit
         → 9 - digit + digit
         → 9 - 5 + digit
         → 9 - 5 + 2

    A More Complex Grammar

    block → begin opt_stmts end
    opt_stmts → stmt_list | ε
    stmt_list → stmt_list ; stmt | stmt

    What is this grammar for?
    What does ε represent?
    What kind of production rule is this?

    Defining a Parse Tree

    A parse tree pictorially shows how the start symbol of a grammar derives a string in the language.

    More Formally, a Parse Tree for a CFG Has the Following Properties:

    Root Is Labeled With the Start Symbol
    Leaf Node Is a Token or ε
    Interior Node Is a Non-Terminal
    If A → x1 x2 ... xn, Then A Is an Interior Node; x1, x2, ..., xn Are Children of A and May Be Non-Terminals or Tokens

    Other Important Concepts: Ambiguity

    Grammar:

    string → string + string | string - string | 0 | 1 | ... | 9

    Two derivations (Parse Trees) for the same token string.

    [Figure: two parse trees for 9 - 5 + 2. One groups the string as (9 - 5) + 2, the other as 9 - (5 + 2).]

    Why is this a Problem?

    Other Important Concepts: Associativity of Operators

    Left vs. Right

    Left associative:
    list → list + digit | list - digit | digit
    digit → 0 | 1 | 2 | ... | 9

    Right associative:
    right → letter = right | letter
    letter → a | b | c | ... | z

    [Figure: two parse trees. The tree for 9 - 5 + 2 under the list grammar grows down to the left; the tree for a = b = c under the right grammar grows down to the right.]

    Embedding Associativity

    The language of arithmetic expressions with + and -

    (Ambiguous) grammar that does not enforce associativity:
    string → string + string | string - string | 0 | 1 | ... | 9

    Non-ambiguous grammar enforcing left associativity (parse tree will grow to the left):
    string → string + digit | string - digit | digit
    digit → 0 | 1 | 2 | ... | 9

    Non-ambiguous grammar enforcing right associativity (parse tree will grow to the right):
    string → digit + string | digit - string | digit
    digit → 0 | 1 | 2 | ... | 9
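The effect of the two unambiguous grammars can be seen by attaching a value to each parse. The following is a minimal C sketch, not part of the slides: the function names `eval_left` and `eval_right` are hypothetical, and the input is assumed to be a well-formed string of single digits and + or - signs with no white space.

```c
/* Left-associative reading, following string -> string + digit | string - digit | digit.
   The left recursion is unrolled into a loop: ((9 - 5) + 2). */
int eval_left(const char *s)
{
    int value = *s++ - '0';
    while (*s == '+' || *s == '-') {
        char op = *s++;
        int digit = *s++ - '0';
        value = (op == '+') ? value + digit : value - digit;
    }
    return value;
}

/* Right-associative reading, following string -> digit + string | digit - string | digit.
   The recursion on the right operand gives 9 - (5 + 2). */
int eval_right(const char *s)
{
    int digit = *s++ - '0';
    if (*s == '+') return digit + eval_right(s + 1);
    if (*s == '-') return digit - eval_right(s + 1);
    return digit;
}
```

For the input "9-5+2" the first function groups the operators as (9 - 5) + 2 and the second as 9 - (5 + 2), so the two grammars assign different meanings to the same token string.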

    Other Important Concepts: Operator Precedence

    What does 9 + 5 * 2 mean?

    Typically the precedence order, from highest to lowest, is:
    ( )
    * /
    + -

    This can be incorporated into a grammar via rules:

    expr → expr + term | expr - term | term
    term → term * factor | term / factor | factor
    factor → digit | ( expr )
    digit → 0 | 1 | 2 | 3 | ... | 9

    Precedence Achieved by:
    expr & term, one non-terminal for each precedence level
    Rules for each are left recursive, so the operators associate to the left
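As an illustration of how the expr / term / factor layering yields precedence, here is a small C evaluator. It is a sketch under assumptions that are not in the slides: the names `evaluate` and `p` are hypothetical, the left-recursive rules are replaced by loops (which keeps the left-to-right grouping), and the input is assumed to be well formed, free of white space, and free of division by zero.

```c
static const char *p;          /* cursor into the input string */

static int expr(void);

/* factor -> digit | ( expr ) */
static int factor(void)
{
    if (*p == '(') {
        p++;                   /* consume '(' */
        int v = expr();
        p++;                   /* consume ')' (assumed to be present) */
        return v;
    }
    return *p++ - '0';
}

/* term -> term * factor | term / factor | factor */
static int term(void)
{
    int v = factor();
    while (*p == '*' || *p == '/') {
        char op = *p++;
        int r = factor();
        v = (op == '*') ? v * r : v / r;
    }
    return v;
}

/* expr -> expr + term | expr - term | term */
static int expr(void)
{
    int v = term();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        int r = term();
        v = (op == '+') ? v + r : v - r;
    }
    return v;
}

int evaluate(const char *s)
{
    p = s;
    return expr();
}
```

Because `expr` can only combine complete terms, 5 * 2 is reduced before the addition in 9 + 5 * 2; parentheses re-enter `expr` through `factor` and so override that order.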

    Syntax for Statements

    stmt → id := expr
         | if expr then stmt
         | if expr then stmt else stmt
         | while expr do stmt
         | begin opt_stmts end

    Ambiguous Grammar?

    Syntax-Directed Translation

    Associate Attributes With Grammar Rules and Translate as Parsing occurs

    The translation will follow the parse tree structure (and as a result the structure and form of the parse tree will affect the translation).

    First example: Inductive Translation.

    Infix to Postfix Notation Translation for Expressions

    Translation defined inductively as: Postfix(E) where E is an Expression.

    Rules
    1. If E is a variable or constant then Postfix(E) = E
    2. If E is E1 op E2 then Postfix(E) = Postfix(E1 op E2) = Postfix(E1) Postfix(E2) op
    3. If E is (E1) then Postfix(E) = Postfix(E1)

    Examples

    Postfix( ( 9 - 5 ) + 2 )
    = Postfix( ( 9 - 5 ) ) Postfix( 2 ) +
    = Postfix( 9 - 5 ) Postfix( 2 ) +
    = Postfix( 9 ) Postfix( 5 ) - Postfix( 2 ) +
    = 9 5 - 2 +

    Postfix( 9 - ( 5 + 2 ) )
    = Postfix( 9 ) Postfix( ( 5 + 2 ) ) -
    = Postfix( 9 ) Postfix( 5 + 2 ) -
    = Postfix( 9 ) Postfix( 5 ) Postfix( 2 ) + -
    = 9 5 2 + -
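The inductive definition of Postfix(E) maps directly onto a recursive function over an expression tree. The C sketch below is an illustration only: `struct Expr` and `postfix` are hypothetical names, leaves are assumed to hold a single character, and rule 3 needs no case of its own because parentheses are already reflected in the shape of the tree.

```c
#include <stddef.h>

/* Hypothetical expression node. A leaf has both children NULL and
   stores its digit or variable name in op. */
struct Expr {
    char op;
    const struct Expr *left, *right;
};

/* Append Postfix(e) at out, keep the result NUL-terminated,
   and return a pointer to the terminator. */
char *postfix(const struct Expr *e, char *out)
{
    if (e->left == NULL) {              /* rule 1: variable or constant */
        *out++ = e->op;
    } else {                            /* rule 2: Postfix(E1) Postfix(E2) op */
        out = postfix(e->left, out);
        out = postfix(e->right, out);
        *out++ = e->op;
    }
    *out = '\0';
    return out;
}
```

The caller is assumed to supply a buffer with room for one character per node plus the terminator. Building the tree for (9 - 5) + 2 and the tree for 9 - (5 + 2) and passing each root to `postfix` reproduces the two worked examples.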

    Syntax-Directed Definition

    Each Production Has a Set of Semantic Rules
    Each Grammar Symbol Has a Set of Attributes

    For the Following Example, String Attribute t is Associated With Each Grammar Symbol

    expr → expr - term | expr + term | term
    term → 0 | 1 | 2 | 3 | ... | 9

    Recall: What is a Derivation for 9 + 5 - 2?

    list → list - digit → list + digit - digit → digit + digit - digit → 9 + digit - digit → 9 + 5 - digit → 9 + 5 - 2

    Syntax-Directed Definition (2)

    Each Production Rule of the CFG Has a Semantic Rule

    Production              Semantic Rule
    expr → expr + term      expr.t := expr.t || term.t || '+'
    expr → expr - term      expr.t := expr.t || term.t || '-'
    expr → term             expr.t := term.t
    term → 0                term.t := '0'
    term → 1                term.t := '1'
    ...                     ...
    term → 9                term.t := '9'

    Note: Semantic Rules for expr define t as a synthesized attribute, i.e., the various copies of t obtain their values from the t's of their children.

    Semantic Rules are Embedded in Parse Tree

    [Figure: annotated parse tree for 9 - 5 + 2. The term nodes carry term.t = 9, term.t = 5 and term.t = 2; the expr nodes carry expr.t = 9, expr.t = 95- and, at the root, expr.t = 95-2+.]

    The attribute evaluation starts at the root and recursively visits the children of each node in left-to-right order.

    The semantic rules at a given node are evaluated once all descendants of that node have been visited.

    A parse tree showing all the attribute values at each node is called an annotated parse tree.

    Translation Schemes

    Embedded Semantic Actions into the right sides of the productions.

    expr → expr + term {print('+')}
         → expr - term {print('-')}
         → term
    term → 0 {print('0')}
    term → 1 {print('1')}
    ...
    term → 9 {print('9')}

    [Figure: parse tree for 9 - 5 + 2 with the actions {print('9')}, {print('5')}, {print('-')}, {print('2')} and {print('+')} attached as extra leaves.]

    A translation scheme is like a syntax-directed definition except the order of evaluation of the semantic rules is explicitly shown.

    Parsing

    Parsing is the process of determining if a string of tokens can be generated by a grammar.

    A parser must be capable of constructing the parse tree.

    Two types of parser:

    Top-down:
        starts at the root
        proceeds towards the leaves

    Bottom-up:
        starts at the leaves
        proceeds towards the root

    Parsing: Top-Down & Predictive

    Top-Down Parsing: Parse tree / derivation of a token string occurs in a top down fashion.

    For Example, Consider:

    type → simple
         | ↑ id
         | array [ simple ] of type
    simple → integer
         | char
         | num dotdot num

    Suppose input is:

    array [ num dotdot num ] of integer

    Parsing would begin with the start symbol:

    type → ???

    Top-Down Parse (type = start symbol)

    Input: array [ num dotdot num ] of integer

    [Figure: three snapshots of the parse tree, each with the lookahead symbol marked in the input. (1) The root type with its children still unknown. (2) type expanded to array [ simple ] of type. (3) simple expanded to num dotdot num.]

    Top-Down Parse (type = start symbol)

    Input: array [ num dotdot num ] of integer

    [Figure: two further snapshots of the parse tree, each with the lookahead symbol marked in the input. (4) The rightmost type expanded to simple. (5) That simple expanded to integer.]

    The selection of a production for a non-terminal may involve trial and error.

    Top-Down Process: Recursive Descent or Predictive Parsing

    Parser Operates by Attempting to Match Tokens in the Input Stream

    Utilize both Grammar and Input Below to Motivate Code for Algorithm

    array [ num dotdot num ] of integer

    type → simple
         | ↑ id
         | array [ simple ] of type
    simple → integer
         | char
         | num dotdot num

    procedure match ( t : token ) ;
    begin
        if lookahead = t then
            lookahead := nexttoken
        else error
    end ;

    Top-Down Algorithm (Continued)

    procedure type ;
    begin
        if lookahead is in { integer, char, num } then
            simple
        else if lookahead = ↑ then begin
            match( ↑ ) ; match( id )
        end
        else if lookahead = array then begin
            match( array ) ; match( [ ) ; simple ; match( ] ) ; match( of ) ; type
        end
        else error
    end ;

    procedure simple ;
    begin
        if lookahead = integer then
            match( integer )
        else if lookahead = char then
            match( char )
        else if lookahead = num then begin
            match( num ) ; match( dotdot ) ; match( num )
        end
        else error
    end ;

    Tracing

    Input: array [ num dotdot num ] of integer

    To initialize the parser:
        set global variable: lookahead = array
        call procedure: type

    Procedure call to type with lookahead = array results in the actions:
        match( array ); match( [ ); simple; match( ] ); match( of ); type

    Procedure call to simple with lookahead = num results in the actions:
        match( num ); match( dotdot ); match( num )

    Procedure call to type with lookahead = integer results in the actions:
        simple

    Procedure call to simple with lookahead = integer results in the actions:
        match( integer )
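The procedures match, type and simple can be tried out on the traced input with a direct C transcription. This is a sketch under assumptions: the token names in `enum Token`, the `DONE` end marker, the `ok` error flag and the entry point `parse_type` are all hypothetical, and the input is assumed to arrive as an already tokenized array.

```c
enum Token { ARRAY, LBRACKET, RBRACKET, OF, INTEGER, CHAR, NUM, DOTDOT,
             UPARROW, ID, DONE };

static const enum Token *input;   /* remaining tokens, terminated by DONE */
static enum Token lookahead;
static int ok;                    /* cleared on the first syntax error */

static void match(enum Token t)
{
    if (lookahead == t) {
        if (t != DONE)
            lookahead = *++input; /* lookahead := nexttoken */
    } else {
        ok = 0;                   /* error */
    }
}

static void simple(void)
{
    if (lookahead == INTEGER)   match(INTEGER);
    else if (lookahead == CHAR) match(CHAR);
    else if (lookahead == NUM) { match(NUM); match(DOTDOT); match(NUM); }
    else ok = 0;
}

static void type(void)
{
    if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM) {
        simple();
    } else if (lookahead == UPARROW) {
        match(UPARROW); match(ID);
    } else if (lookahead == ARRAY) {
        match(ARRAY); match(LBRACKET); simple();
        match(RBRACKET); match(OF); type();
    } else {
        ok = 0;
    }
}

/* Returns 1 if the whole token sequence is a type, 0 otherwise. */
int parse_type(const enum Token *tokens)
{
    input = tokens;
    lookahead = *input;
    ok = 1;
    type();
    return ok && lookahead == DONE;
}
```

Unlike the pseudocode, which stops at the first error, this version records the error in a flag and keeps going; that choice only keeps the sketch short and is not taken from the slides.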

    Limitations

    Can we apply the previous technique to every grammar? NO:

    type → simple
         | array [ simple ] of type
    simple → integer
         | array digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    Consider the string "array 6". The predictive parser starts with type and lookahead = array. Both alternatives for type can begin with array, so should it apply the production type → simple OR type → array [ simple ] of type?

    When to Use ε-Productions

    The recursive descent parser will use ε-productions as a default when no other production can be used.

    stmt → begin opt_stmts end
    opt_stmts → stmt_list | ε

    While parsing opt_stmts, if the lookahead symbol is not in FIRST(stmt_list), then the ε-production is used.

    Designing a Predictive Parser

    Consider A → α. FIRST(α) = the set of leftmost tokens that appear in α or in strings generated by α.

    E.g. FIRST(type) = { ↑, array, integer, char, num }

    Consider productions of the form A → α, A → β: the sets FIRST(α) and FIRST(β) should be disjoint.

    Then we can implement predictive parsing:

    Starting with A → ?, we find the FIRST() set to which the lookahead symbol belongs and we use this production.

    Any non-terminal results in the corresponding procedure call.

    Terminals are matched.

    Problems with Top Down Parsing

    Left Recursion in a CFG May Cause the Parser to Loop Forever. Indeed, for the production A → Aα we write the program

    procedure A
    {
        if lookahead belongs to FIRST(Aα) then
            call the procedure A
    }

    Solution: Remove Left Recursion without changing the Language defined by the Grammar.

    Dealing with Left Recursion

    Solution: Algorithm to Remove Left Recursion:

    Left-recursive grammar:
    expr → expr + term | expr - term | term
    term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    Equivalent grammar without left recursion:
    expr → term rest
    rest → + term rest | - term rest | ε
    term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    BASIC IDEA

    A → Aα | β    becomes    A → βR
                             R → αR | ε

    What happens to semantic actions?

    With left recursion:
    expr → expr + term {print('+')}
         → expr - term {print('-')}
         → term
    term → 0 {print('0')}
    term → 1 {print('1')}
    ...
    term → 9 {print('9')}

    Without left recursion:
    expr → term rest
    rest → + term {print('+')} rest
         → - term {print('-')} rest
         → ε
    term → 0 {print('0')}
    term → 1 {print('1')}
    ...
    term → 9 {print('9')}

    Comparing Grammars with Left Recursion

    Notice Location of Semantic Actions in Tree

    [Figure: parse tree for 9 - 5 + 2 under the left-recursive grammar, with the actions {print('9')}, {print('5')}, {print('-')}, {print('2')} and {print('+')} attached as leaves.]

    What is Order of Processing?

    Comparing Grammars without Left Recursion

    Now, Notice Location of Semantic Actions in Tree for Revised Grammar

    [Figure: parse tree for 9 - 5 + 2 under the grammar expr → term rest. Each non-empty rest node has the children operator, term, action and rest; the actions shown are {print('9')}, {print('5')}, {print('-')}, {print('2')} and {print('+')}.]

    What is Order of Processing in this Case?

    Procedures for the Non-terminals expr, term, and rest

    expr()
    {
        term(); rest();
    }

    rest()
    {
        if (lookahead == '+') {
            match('+'); term(); putchar('+'); rest();
        }
        else if (lookahead == '-') {
            match('-'); term(); putchar('-'); rest();
        }
        else ;
    }

    Procedures for the Non-terminals expr, term, and rest (2)

    term()
    {
        if (isdigit(lookahead)) {
            putchar(lookahead); match(lookahead);
        }
        else error();
    }

    Optimizing the Translator

    Tail recursion

    When the last statement executed in a procedure body is a recursive call of the same procedure, the call is said to be tail recursive. Such a call can be replaced by a jump to the beginning of the procedure:

    rest()
    {
    L:  if (lookahead == '+') {
            match('+'); term(); putchar('+'); goto L;
        }
        else if (lookahead == '-') {
            match('-'); term(); putchar('-'); goto L;
        }
        else ;
    }

    Optimizing the Translator

    expr()
    {
        term();
        while (1)
            if (lookahead == '+') {
                match('+'); term(); putchar('+');
            }
            else if (lookahead == '-') {
                match('-'); term(); putchar('-');
            }
            else break;
    }
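Putting expr, term and match together gives a complete infix-to-postfix translator for single-digit operands. The version below is a sketch rather than the slides' program: it writes into a caller-supplied buffer instead of calling putchar, folds the + and - branches into one, and the names `to_postfix`, `src`, `dst` and `failed` are hypothetical.

```c
#include <ctype.h>

static const char *src;    /* input cursor */
static char *dst;          /* output cursor */
static int lookahead;
static int failed;         /* set on the first syntax error */

static void match(int t)
{
    if (lookahead == t)
        lookahead = (unsigned char)*++src;
    else
        failed = 1;
}

/* term -> digit {print(digit)} */
static void term(void)
{
    if (isdigit(lookahead)) {
        *dst++ = (char)lookahead;
        match(lookahead);
    } else {
        failed = 1;
    }
}

/* expr -> term rest, with the tail recursion in rest turned into a loop */
static void expr(void)
{
    term();
    while (!failed && (lookahead == '+' || lookahead == '-')) {
        int op = lookahead;
        match(op);
        term();
        *dst++ = (char)op;             /* {print(op)} */
    }
}

/* Translate in to postfix in out (out needs strlen(in) + 1 bytes).
   Returns 1 on success and 0 on a syntax error. */
int to_postfix(const char *in, char *out)
{
    src = in;
    dst = out;
    failed = 0;
    lookahead = (unsigned char)*src;
    expr();
    *dst = '\0';
    return !failed && lookahead == '\0';
}
```

Calling `to_postfix("9-5+2", buf)` fills `buf` with the postfix form of the input, and the return value tells the caller whether the whole input was consumed without error. White space is not skipped, which matches the slides up to this point, where lexical analysis has not yet been introduced.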

    Lexical Analysis

    A lexical analyzer reads and converts the input into a stream of tokens to be analyzed by the parser.

    A sequence of input characters that comprises a single token is called a lexeme.

    Functional Responsibilities

    1. White Space and Comments Are Filtered Out
       Blanks, new lines and tabs are removed.
       Modifying the grammar to incorporate white space into the syntax is difficult to implement.

    Functional Responsibilities (2)

    Constants

    The job of collecting digits into integers is generally given to a lexical analyzer because numbers can be treated as single units during translation.

    Let num be the token representing an integer.

    The value of the integer will be passed along as an attribute of the token num.

    Example: the input
        31 + 28 + 59
    is transformed into the sequence of tuples
        <num, 31> <+, > <num, 28> <+, > <num, 59>

    NB: The 2nd components of the tuples, the attributes, play no role during parsing, but are needed during translation.

    Functional Responsibilities (3)

    Recognizing Identifiers and Keywords

    Compilers use identifiers as names of
        Variables
        Arrays
        Functions

    A grammar for a language treats an identifier as a token.

    Example:
        credit = asset + goodwill;
    The lexical analyzer would convert it to
        id = id + id ;

    Functional Responsibilities (3)

    Recognizing Identifiers and Keywords (2)

    Languages use fixed character strings (if, while, extern) to identify certain constructs. We call them keywords.

    A mechanism is needed for deciding when a lexeme forms a keyword and when it forms an identifier.

    Solution
    1. Keywords are reserved.
    2. The character string forms an identifier only if it is not a keyword.

    Interface to the Lexical Analyzer

    [Figure: Input → Lexical Analyzer → Parser. The lexical analyzer reads characters from the input and may push a character back; it passes each token and its attributes to the parser. The input side is implemented with a buffer.]

    The lexical analyzer:
    1. Reads characters from the input
    2. Groups them into lexemes
    3. Passes the tokens together with their attribute values to the later stages

    Why push back?

    The Lexical Analysis Process: A Graphical Depiction

    [Figure: the lexical analyzer lexan( ) uses getchar( ) to read a character, pushes back c using ungetc(c, stdin), returns the token to the caller, and sets the global variable tokenval to the attribute value.]

    Example of a Lexical Analyzer

    function lexan : integer ;    [ Returns an integer encoding of token ]
    var lexbuf : array[ 0 .. 100 ] of char ;
        c : char ;
    begin
        loop begin
            read a character into c ;
            if c is a blank or a tab then
                do nothing
            else if c is a newline then
                lineno := lineno + 1
            else if c is a digit then begin
                set tokenval to the value of this and following digits ;
                return NUM
            end

    Algorithm for Lexical Analyzer

            else if c is a letter then begin
                place c and successive letters and digits into lexbuf ;
                p := lookup( lexbuf ) ;
                if p = 0 then
                    p := insert( lexbuf, ID ) ;
                tokenval := p ;
                return the token field of table entry p
            end
            else begin
                set tokenval to NONE ;    / * there is no attribute * /
                return integer encoding of character c
            end
        end
    end

    Note: Insert / Lookup operations occur against the Symbol Table!
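The lexan pseudocode can be approximated in C as follows. This is a simplified sketch with several assumptions that are not in the slides: the analyzer reads from a string through a hypothetical `struct Lexer` instead of using getchar and ungetc, "pushing back" a character is modelled by simply not advancing the pointer, the token codes `NUM`, `ID`, `DONE` and the value `NONE` are made up, and identifiers are returned without consulting a symbol table.

```c
#include <ctype.h>

enum { NONE = -1, NUM = 256, ID = 257, DONE = 258 };

struct Lexer {
    const char *p;       /* next unread character */
    int lineno;          /* current line number */
    int tokenval;        /* attribute of the most recent token */
    char lexbuf[101];    /* lexeme of the most recent ID */
};

int lexan(struct Lexer *lx)
{
    for (;;) {
        int c = (unsigned char)*lx->p;
        if (c == '\0')
            return DONE;
        lx->p++;
        if (c == ' ' || c == '\t')
            continue;                       /* strip white space */
        if (c == '\n') {
            lx->lineno++;
            continue;
        }
        if (isdigit(c)) {                   /* collect digits into one NUM */
            lx->tokenval = c - '0';
            while (isdigit((unsigned char)*lx->p))
                lx->tokenval = lx->tokenval * 10 + (*lx->p++ - '0');
            return NUM;
        }
        if (isalpha(c)) {                   /* letter followed by letters and digits */
            int n = 0;
            lx->lexbuf[n++] = (char)c;
            while (isalnum((unsigned char)*lx->p) && n < 100)
                lx->lexbuf[n++] = *lx->p++;
            lx->lexbuf[n] = '\0';
            lx->tokenval = NONE;            /* a full version would store lookup/insert's index */
            return ID;
        }
        lx->tokenval = NONE;                /* any other character is its own token */
        return c;
    }
}
```

A caller initialises `p` to the input text and `lineno` to 1, then calls `lexan` repeatedly until it returns `DONE`. Identifiers longer than 100 characters are split in this sketch, a limitation a real analyzer would report as an error.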

    Symbol Table Considerations

    ARRAY symtable

    index   lexptr       token   attributes
    0
    1       → div        div
    2       → mod        mod
    3       → count      id
    4       → i          id

    ARRAY lexemes

    d i v EOS m o d EOS c o u n t EOS i EOS

    OPERATIONS: Insert (string, token_ID)
                Lookup (string)

    NOTICE: Reserved words are placed into the symbol table for easy lookup.

    Attributes may be associated with each entry, e.g.,
        Semantic Actions
        Typing Info: id → integer
        etc.
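The two-array layout pictured above (symtable entries pointing into one shared lexemes array) can be sketched in C as below. The array sizes, the token codes and the linear search are assumptions made for illustration; only the operation names Insert and Lookup and the overall layout come from the slide.

```c
#include <string.h>

#define STRMAX 999                   /* size of the lexemes array (assumed) */
#define SYMMAX 100                   /* size of symtable (assumed) */

enum { DIV = 257, MOD = 258, ID = 259 };   /* hypothetical token codes */

struct entry {
    char *lexptr;                    /* points into lexemes */
    int token;
};

char lexemes[STRMAX];
int lastchar = -1;                   /* last used position in lexemes */
struct entry symtable[SYMMAX];
int lastentry = 0;                   /* entry 0 is left unused */

/* Return the index of s in symtable, or 0 if it is absent. */
int lookup(const char *s)
{
    for (int p = lastentry; p > 0; p--)
        if (strcmp(symtable[p].lexptr, s) == 0)
            return p;
    return 0;
}

/* Store s with the given token; return its index, or 0 if no room is left. */
int insert(const char *s, int tok)
{
    int len = (int)strlen(s);
    if (lastentry + 1 >= SYMMAX || lastchar + len + 1 >= STRMAX)
        return 0;
    lastentry++;
    symtable[lastentry].token = tok;
    symtable[lastentry].lexptr = &lexemes[lastchar + 1];
    lastchar += len + 1;             /* the lexeme plus its terminating EOS */
    strcpy(symtable[lastentry].lexptr, s);
    return lastentry;
}
```

Reserved words such as div and mod would be inserted once at start-up, so a later `lookup` on the same spelling returns their entry and the lexical analyzer can return the keyword token instead of ID. Returning 0 from `lookup` for "not found" is why entry 0 stays unused.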

    Abstract Stack Machines

    The front end of a compiler constructs an intermediate representation of the source program from which the back end generates the target program.

    One popular form of intermediate representation is code for an abstract stack machine.

    I will show you how code will be generated for it.

    The properties of the machine:
    1. Instruction memory
    2. Data memory
    3. All arithmetic operations are performed on values on a stack

    Instructions

    Instructions fall into three classes:
    1. Integer arithmetic
    2. Stack manipulation
    3. Control flow

    Instructions        Stack          Data
    1  push 5           16             1  0
    2  rvalue 2         7   ← top      2  11
    3  +                               3  7
    4  rvalue 3                        4  ...
    5  *
    6  . . .

    [Figure: the pointer pc marks the current instruction and top marks the top of the stack.]

    L-value and R-value

    What is the difference between left and right side identifier?

    L-value vs. R-value of an identifier:

    I := 5 ;        L - Location
    I := I + 1 ;    R - Contents

    The right side specifies an integer value, while the left side specifies where the value is to be stored.

    Usually,
        r-values are what we think of as values
        l-values are locations.

    Stack Manipulation

    push v      push v onto the stack
    rvalue l    push contents of data location l
    lvalue l    push address of data location l
    pop         throw away value on top of the stack
    :=          the r-value on top is placed in the l-value below it and both are popped
    copy        push a copy of the top value on the stack

    Translation of Expressions

    day := (1461*y) mod 4 + (153*m + 2) mod 5 + d

    lvalue day
    push 1461
    rvalue y
    *
    push 4
    mod
    push 153
    rvalue m
    *
    push 2
    +
    push 5
    mod
    +
    rvalue d
    +
    :=

    Data (location, name, value):
    2  day   0
    3  y     1
    4  m     2
    5  d     -3

    Translation of Expressions (2) to (4): stack contents after each instruction

    Instruction     Stack (top on the right)
    lvalue day      2
    push 1461       2 1461
    rvalue y        2 1461 1
    *               2 1461
    push 4          2 1461 4
    mod             2 1
    push 153        2 1 153
    rvalue m        2 1 153 2
    *               2 1 306
    push 2          2 1 306 2
    +               2 1 308
    push 5          2 1 308 5
    mod             2 1 3
    +               2 4
    rvalue d        2 4 -3
    +               2 1

    The final := stores the value 1 in data location 2 (day) and pops both entries.

    Data locations: 2 day, 3 y, 4 m, 5 d
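The trace can be reproduced with a very small interpreter for the instructions used so far. The following C sketch is only an illustration: the `enum Op` encoding, the fixed 64-entry stack, the absence of any overflow or bounds checking, and the added `HALT` terminator for the instruction array are all assumptions.

```c
enum Op { PUSH, RVALUE, LVALUE, POP, ASSIGN, COPY, ADD, MUL, MOD, HALT };

struct Instr {
    enum Op op;
    int arg;             /* constant or data location; unused by the other ops */
};

/* Execute code until HALT, reading and writing the data memory. */
void run(const struct Instr *code, int *data)
{
    int stack[64];
    int top = -1;                        /* index of the top of the stack */

    for (int pc = 0; code[pc].op != HALT; pc++) {
        int a = code[pc].arg;
        switch (code[pc].op) {
        case PUSH:   stack[++top] = a;                        break;
        case RVALUE: stack[++top] = data[a];                  break;
        case LVALUE: stack[++top] = a;                        break;
        case POP:    top--;                                   break;
        case COPY:   stack[top + 1] = stack[top]; top++;      break;
        case ADD:    stack[top - 1] += stack[top]; top--;     break;
        case MUL:    stack[top - 1] *= stack[top]; top--;     break;
        case MOD:    stack[top - 1] %= stack[top]; top--;     break;
        case ASSIGN: data[stack[top - 1]] = stack[top];       /* l-value below, r-value on top */
                     top -= 2;                                break;
        case HALT:                                            break;
        }
    }
}
```

Encoding the seventeen instructions of the day example as `struct Instr` values (with day, y, m and d at locations 2 to 5) and running them against a data array holding y = 1, m = 2 and d = -3 walks through the same stack states as the table and leaves the result in the location of day.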

    Control Flow

    The control flow instructions for the stack machine are:

    label l      target of jumps to l; has no other effect
    goto l       next instruction is taken from statement with label l
    gofalse l    pop the top value; jump if it is zero
    gotrue l     pop the top value; jump if it is nonzero
    halt         stop execution

    Translation of Statements

    If

    stmt → if expr then stmt1
        { out := newlabel;
          stmt.t := expr.t ||
                    gofalse out ||
                    stmt1.t ||
                    label out }

    Code layout:
        code for expr
        gofalse out
        code for stmt1
        label out

    Translation of Statements (2)

    While

    stmt → while expr do stmt1
        { test := newlabel;
          out := newlabel;
          stmt.t := label test ||
                    expr.t ||
                    gofalse out ||
                    stmt1.t ||
                    goto test ||
                    label out }

    Code layout:
        label test
        code for expr
        gofalse out
        code for stmt1
        goto test
        label out
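The two code layouts can be generated with a few lines of C. This sketch makes several assumptions beyond the slides: labels are numbered and written as L1, L2, ..., each instruction sits on its own line, the already translated pieces expr.t and stmt1.t are passed in as strings ending in a newline, and the helper names `gen_if` and `gen_while` are hypothetical.

```c
#include <stdio.h>

static int label_count = 0;

/* Return a fresh label number, playing the role of newlabel. */
int newlabel(void)
{
    return ++label_count;
}

/* stmt -> if expr then stmt1 */
int gen_if(char *buf, size_t size, const char *expr_code, const char *stmt_code)
{
    int out = newlabel();
    return snprintf(buf, size, "%sgofalse L%d\n%slabel L%d\n",
                    expr_code, out, stmt_code, out);
}

/* stmt -> while expr do stmt1 */
int gen_while(char *buf, size_t size, const char *expr_code, const char *stmt_code)
{
    int test = newlabel();
    int out = newlabel();
    return snprintf(buf, size,
                    "label L%d\n%sgofalse L%d\n%sgoto L%d\nlabel L%d\n",
                    test, expr_code, out, stmt_code, test, out);
}
```

Each call draws its own labels from `newlabel`, so nested or consecutive statements never share a jump target. The functions return what `snprintf` returns, so a caller can detect truncation by comparing the result with the buffer size.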

    Concluding Remarks / Looking Ahead

    We've Reviewed / Highlighted Entire Compilation Process

    Introduced Context-free Grammars (CFG) and Indicated / Illustrated Relationship to Compiler Theory

    Reviewed Many Different Versions of Parse Trees That Assist in Both Recognition and Translation

    We'll Return to Beginning - Lexical Analysis

    We'll Explore Close Relationship of Lexical Analysis to Regular Expressions, Grammars, and Finite Automata

    The End

