
unit2-compiler design

Date post: 09-Mar-2016
Category:
Upload: dhanasekar
Description:
compiler design


Transcript
  • 7/21/2019 unit2-compiler design

    1/24


    Chapter 1

    CSE309N

    LEXICAL ANALYZER

    Scan Input

    Remove WS, NL,

    Identify Tokens

    Create Symbol Table

    Insert Tokens into ST

    Generate Errors

    Send Tokens to Parser

    Introducing Basic Terminology

    What are Major Terms for Lexical Analysis?

    TOKEN

    Tokens represent a set of strings described by a pattern. Examples include (identifier), (number), etc.

    PATTERN

    The set of strings is described by a rule called a pattern associated with the token.

    LEXEME

    A lexeme is a sequence of characters in the source program that matches the pattern for a token. Identifiers: x, count, name, etc.
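The relationship between the three terms can be sketched in a few lines of Python. This is only an illustration: the token names and the two regular expressions below are assumptions chosen for the example, not patterns taken from the slides.

```python
import re

# Hypothetical patterns: each token name is paired with the rule (pattern)
# that describes its set of strings.
PATTERNS = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),
]

def classify(lexeme):
    """Return the name of the first token whose pattern matches the whole lexeme."""
    for token, pattern in PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return token
    return None  # no pattern matches: not a lexeme of any known token

for lexeme in ["x", "count", "name", "42"]:
    print(lexeme, "->", classify(lexeme))
```

Here `x`, `count` and `name` are lexemes of the token `identifier`, while `42` is a lexeme of the token `number`.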

    Introducing Basic Terminology (cont.)

    Token      Sample Lexemes          Informal Description of Pattern
    const      const                   const
    if         if                      if
    relation   <, <=, =, <>, >, >=     < or <= or = or <> or >= or >
    id         pi, count, D2           letter followed by letters and digits
    num        3.1416, 0, 6.02E23      any numeric constant
    literal    "core dumped"           any characters between " and " except "

    Classifies Pattern

    Actual values are critical. Info is:

    1. Stored in symbol table

    2. Returned to parser
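A minimal sketch of how a scanner might both store information in a symbol table and return it to the parser is given here. The token names, the regular expressions, and the helper `install_id` are assumptions made for this example; the symbol table is just a Python list whose index plays the role of a pointer to the table entry.

```python
import re

symbol_table = []  # identifier lexemes; the list index acts as the "pointer"

def install_id(lexeme):
    """Return the symbol-table index of lexeme, inserting it on first sight."""
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return symbol_table.index(lexeme)

# Hypothetical token specification; keywords come before ID so they win.
TOKEN_SPEC = [
    ("CONST", r"const\b"),
    ("IF", r"if\b"),
    ("RELATION", r"<=|<>|>=|<|=|>"),
    ("NUM", r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokens(text):
    """Return (token, attribute) pairs, as they would be handed to a parser."""
    out, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue                                 # white space yields no token
        if kind == "ID":
            out.append((kind, install_id(lexeme)))   # attribute: table index
        elif kind in ("NUM", "RELATION"):
            out.append((kind, lexeme))               # attribute: the actual value
        else:
            out.append((kind, None))                 # keywords need no attribute
    return out

print(tokens("if count >= 6.02E23"))
# [('IF', None), ('ID', 0), ('RELATION', '>='), ('NUM', '6.02E23')]
```

The token name alone is enough for keywords, but for `id`, `num` and `relation` the actual value matters, which is why it travels with the token as an attribute.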


    Example/ # = 2

    E

    Attributes for Tokens

    LEXEME

    TOKEN ATTRIBUTES Values

    Handling Lexical Errors

    Error handling is very localized, with respect to input source.

    For example: whil ( x = 0 ) do

    generates no lexical errors in PASCAL (whil is lexically a valid identifier).

    In what Situations do Errors Occur?

    Lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix of remaining input.

    Panic Mode Recovery

    Delete successive characters from the remaining input until the analyzer can find a well-formed token.

    May confuse the parser

    Possible error recovery actions:

    Deleting an extra character

    Inserting a missing input character

    Replacing an incorrect character by a correct character

    Transposing two adjacent characters
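Panic mode recovery can be sketched as follows. The token patterns are assumptions picked for the example; the only part that reflects the slide is the recovery step, which deletes successive characters until some pattern matches a prefix of the remaining input again.

```python
import re

# Hypothetical token patterns for this sketch.
TOKEN = re.compile(
    r"(?P<ID>[A-Za-z][A-Za-z0-9]*)|(?P<NUM>\d+)|(?P<OP>[=<>()])|(?P<WS>\s+)"
)

def scan(text):
    """Return (tokens, errors); errors are (position, deleted characters) pairs."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            # Panic mode: delete characters until a token can be found again.
            start = pos
            while pos < len(text) and TOKEN.match(text, pos) is None:
                pos += 1
            errors.append((start, text[start:pos]))
            continue
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors

print(scan("x = @#5"))
# ([('ID', 'x'), ('OP', '='), ('NUM', '5')], [(4, '@#')])
```

A misspelled keyword such as `whil` produces no entry in `errors`, because it matches the identifier pattern; only the parser can notice that something is wrong.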

    CS416 Compiler Design

    Regular Expressions

    Formal Specification for Tokens

    We use regular expressions to describe tokens of a programming language.

    Each regular expression denotes a language.

    A language denoted by a regular expression is called a regular set.

    Regular Expressions (Rules)

    Regular expressions over alphabet Σ

    Reg. Expr        Language it denotes
    ε                {ε}
    a ∈ Σ            {a}
    (r1) | (r2)      L(r1) ∪ L(r2)
    (r1) (r2)        L(r1) L(r2)
    (r)*             (L(r))*
    (r)              L(r)

    (r)+ = (r)(r)*
    (r)? = (r) | ε

    Regular Expressions (cont.)

    We may remove parentheses by using precedence rules.

    *               highest
    concatenation   next
    |               lowest

    ab*|c means (a(b)*)|(c)

    Ex:

    Σ = {0,1}

    0|1 => {0,1}

    (0|1)(0|1) => {00,01,10,11}

    0* => {ε,0,00,000,0000,....}

    (0|1)* => all strings with 0 and 1, including the empty string
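These examples can be checked mechanically. The sketch below uses Python's `re` module, whose `|`, `*` and grouping operators follow the same precedence rules; the helper `language` is a name invented for this example and simply enumerates the short strings over the alphabet that a regular expression matches in full.

```python
import re
from itertools import product

def language(regex, alphabet="01", max_len=2):
    """Strings over alphabet, up to max_len long, that the whole regex matches."""
    words = (
        "".join(p)
        for n in range(max_len + 1)
        for p in product(alphabet, repeat=n)
    )
    return [w for w in words if re.fullmatch(regex, w)]

print(language("0|1"))              # ['0', '1']
print(language("(0|1)(0|1)"))       # ['00', '01', '10', '11']
print(language("0*", max_len=4))    # ['', '0', '00', '000', '0000']
print(language("(0|1)*"))           # ['', '0', '1', '00', '01', '10', '11']

# Precedence: ab*|c denotes the same language as (a(b)*)|(c).
print(language("ab*|c", "abc", 3) == language("(a(b)*)|(c)", "abc", 3))  # True
```

The empty string `''` in the output plays the role of ε.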

    Regular Definitions

    Writing a regular expression for some languages can be difficult, because the expression can be quite complex. In those cases, we may use regular definitions.

    We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.

    Define regular expressions in terms of named regular expressions

    A regular definition is a sequence of definitions of the form:

    d1 → r1
    d2 → r2

    where di is a definition name and ri is a regular expression over symbols in Σ ∪ {d1, d2, ..., di-1}

    Regular Definitions (cont.)

    Ex: Identifiers in Pascal

    letter → A | B | ... | Z | a | b | ... | z
    digit → 0 | 1 | ... | 9
    id → letter (letter | digit)*

    If we try to write the regular expression representing identifiers without using regular definitions, that regular expression will be complex.

    (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*

    digits → digit digit*   or   digits → digit+
    opt-fraction → ( . digits )?
    opt-exponent → ( E (+|-)? digits )?
    unsigned-num → digits opt-fraction opt-exponent
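Regular definitions map naturally onto named string fragments that are substituted into later patterns. The sketch below assumes Python's `re` syntax and uses variable names that mirror the definition names; it is one possible encoding, not the notation of the slides.

```python
import re

# Each name is defined only in terms of the alphabet and earlier names.
letter = r"[A-Za-z]"
digit = r"[0-9]"
ident = rf"{letter}({letter}|{digit})*"

digits = rf"{digit}+"
opt_fraction = rf"(\.{digits})?"
opt_exponent = rf"(E[+-]?{digits})?"
unsigned_num = rf"{digits}{opt_fraction}{opt_exponent}"

def is_identifier(s):
    """True if s as a whole matches the id definition."""
    return re.fullmatch(ident, s) is not None

def is_unsigned_num(s):
    """True if s as a whole matches the unsigned-num definition."""
    return re.fullmatch(unsigned_num, s) is not None

print(is_identifier("D2"), is_identifier("2D"))              # True False
print(is_unsigned_num("6.02E23"), is_unsigned_num("1."))     # True False
```

Printing `unsigned_num` shows the fully expanded expression, which is already much harder to read than the five short definitions.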

    Recognition of tokens

    The next step is to formalize the patterns:

    digit -> [0-9]
    digits -> digit+
    number -> digits(.digits)? (E[+-]? digits)?
    letter -> [A-

    CHAPTER 3 LEXICAL ANALYSIS
    Section 3 Recognition of Tokens

    1 Task of recognition of token in a lexical analyzer

    Regular expression   Token   Attribute-value
    if                   if      -
    id                   id      Pointer to table entry
    <                    relop   LT

    Transition diagrams

    Transition diagram for relop
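The diagram itself is not reproduced in this transcript, so the sketch below follows the conventional relop diagram rather than the slide's exact drawing. It assumes the operators <, <=, <>, =, > and >=; the attribute codes other than LT are assumptions, as is the function name `relop`.

```python
def relop(text, pos=0):
    """Simulate a transition diagram for relop starting at text[pos].

    Returns (attribute, next_pos), or None if no relational operator starts here.
    """
    c = text[pos] if pos < len(text) else ""
    nxt = text[pos + 1] if pos + 1 < len(text) else ""
    if c == "<":
        if nxt == "=":
            return ("LE", pos + 2)
        if nxt == ">":
            return ("NE", pos + 2)
        return ("LT", pos + 1)   # retract: the lookahead character is not consumed
    if c == "=":
        return ("EQ", pos + 1)
    if c == ">":
        if nxt == "=":
            return ("GE", pos + 2)
        return ("GT", pos + 1)   # retract, as for LT
    return None                  # start state has no edge for this character

print(relop("<= 5"))   # ('LE', 2)
print(relop("<a"))     # ('LT', 1)
```

The two "retract" cases are the accepting states that are reached only after reading one character too many, so the input position is left just after the operator.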

    Transition diagrams (cont.)

    Transition diagram for reserved words and identifiers

    Transition diagram for unsigned numbers

    Transition diagram for whitespace

    Buffer Pairs

    Lexical analyzer needs to look ahead several characters beyond the lexeme for a pattern before a match can be announced.

    Use a function ungetc to push lookahead characters back into the input stream.

    Large amount of time can be consumed moving characters.

    Special Buffering Technique

    Use a buffer divided into two N-character halves

    N = Number of characters on one disk block

    One system command reads N characters

    Fewer than N characters => eof

    Buffer Pairs (2)

    Two pointers to the input buffer are maintained

    The string of characters between the pointers is the current lexeme

    Once the next lexeme is determined, the forward pointer is set to the character at its right end.

    [Figure: input buffer, with the lexeme beginning pointer and the forward pointer, which scans ahead to find a pattern match]

    Comments and white space can be treated as patterns that yield no token

    Code to advance forward pointer

    if forward at end of first half then begin
        reload second half ;
        forward := forward + 1 ;
    end
    else if forward at end of second half then begin
        reload first half ;
        move forward to beginning of first half
    end
    else forward := forward + 1 ;

    Pitfalls

    1. This buffering scheme works quite well most of the time, but with it the amount of lookahead is limited.

    2. Limited lookahead makes it impossible to recognize tokens in situations where the distance the forward pointer must travel is more than the length of the buffer.
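The pointer-advance logic for a buffer pair can be sketched in Python as follows. The class name `BufferPair`, the tiny half size `N = 4`, and the use of an empty string to mark eof are all assumptions made to keep the example small; a real scanner would set N to the size of a disk block and also track the lexeme-beginning pointer.

```python
import io

N = 4  # hypothetical half size, far smaller than a real disk block

class BufferPair:
    """Two N-character halves that are reloaded alternately."""

    def __init__(self, stream):
        self.stream = stream
        self.buf = [""] * (2 * N)
        self.load(0)           # fill the first half
        self.forward = 0

    def load(self, start):
        """Read up to N characters into the half beginning at start."""
        data = self.stream.read(N)
        for i in range(N):
            # Fewer than N characters read: the rest is marked eof ("").
            self.buf[start + i] = data[i] if i < len(data) else ""

    def advance(self):
        if self.forward == N - 1:          # forward at end of first half
            self.load(N)                   # reload second half
            self.forward += 1
        elif self.forward == 2 * N - 1:    # forward at end of second half
            self.load(0)                   # reload first half
            self.forward = 0               # move to beginning of first half
        else:
            self.forward += 1

    def current(self):
        return self.buf[self.forward]

def read_all(text):
    """Walk the forward pointer over the whole input and collect the characters."""
    bp = BufferPair(io.StringIO(text))
    out = []
    while bp.current() != "":
        out.append(bp.current())
        bp.advance()
    return "".join(out)

print(read_all("count = count + 1"))   # count = count + 1
```

Note that every call to `advance` performs up to two position tests before moving the pointer.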

    Algorithm: Buffered I/O with Sentinels

    [Figure: input buffer with an eof sentinel at the end of each half; the current token lies between the lexeme beginning pointer and the forward pointer, which scans ahead to find a pattern match]

    forward := forward + 1 ;
    if forward↑ = eof then begin
        if forward at end of first half then begin
            reload second half ;    (Block I/O)
            forward := forward + 1
        end
        else if forward at end of second half then begin
            reload first half ;     (Block I/O)
            move forward to beginning of first half
        end
        else
            eof within buffer signifying end of input
    end

    2nd eof: no more input

    Algorithm performs I/O's. We can still have get & ungetchar. Now these work on real memory buffers.
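A sketch of the sentinel variant is given below. The class name `SentinelBuffer`, the half size `N = 4`, and the choice of `"\0"` as the eof sentinel (assumed never to occur in the source text) are assumptions for illustration. The point of the technique is that the common case costs a single comparison per character.

```python
import io

N = 4          # hypothetical half size
EOF = "\0"     # sentinel character; assumed never to appear in the input

class SentinelBuffer:
    """Layout: [first half: N chars][EOF][second half: N chars][EOF]."""

    def __init__(self, stream):
        self.stream = stream
        self.buf = [EOF] * (2 * N + 2)
        self.load(0)
        self.forward = -1      # the first advance() moves onto position 0

    def load(self, start):
        """Read up to N characters into the half beginning at start."""
        data = self.stream.read(N)
        for i in range(N):
            self.buf[start + i] = data[i] if i < len(data) else EOF

    def advance(self):
        """Move forward one position; return that character, or None at end of input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:        # the only test in the common case
            if self.forward == N:                # sentinel ending the first half
                self.load(N + 1)                 # reload second half
                self.forward += 1
            elif self.forward == 2 * N + 1:      # sentinel ending the second half
                self.load(0)                     # reload first half
                self.forward = 0
            else:
                return None                      # eof within buffer: end of input
            if self.buf[self.forward] == EOF:    # the reloaded half was empty
                return None
        return self.buf[self.forward]

def read_all(text):
    sb = SentinelBuffer(io.StringIO(text))
    return "".join(iter(sb.advance, None))

print(read_all("abcdef"))   # abcdef
```

The extra check after a reload is an addition of this sketch: it covers inputs whose length is an exact multiple of N, where the freshly loaded half contains no characters at all.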

    Formalizing Token Definition

    EXAMPLES AND OTHER CONCEPTS:

    Suppose: S is the string banana

    Prefix: ban, banana

    Suffix: ana, banana

    Substring: nan, ban, ana, banana

    Subsequence: bnan, nn

    Proper prefix, suffix, or substring cannot be all of S
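The four notions can be stated as small predicates. The function names below are invented for this sketch; the definitions follow the examples given for the string banana.

```python
def is_prefix(x, s):
    """x is obtained by removing zero or more trailing characters of s."""
    return s.startswith(x)

def is_suffix(x, s):
    """x is obtained by removing zero or more leading characters of s."""
    return s.endswith(x)

def is_substring(x, s):
    """x is obtained by deleting a prefix and a suffix of s."""
    return x in s

def is_subsequence(x, s):
    """x is obtained by deleting zero or more, not necessarily adjacent, characters of s."""
    it = iter(s)
    return all(ch in it for ch in x)   # each search resumes where the last one stopped

def is_proper(x, s):
    """A proper prefix, suffix, or substring cannot be all of s."""
    return x != s

S = "banana"
print(is_prefix("ban", S), is_suffix("ana", S), is_substring("nan", S))   # True True True
print(is_subsequence("bnan", S), is_substring("bnan", S))                 # True False
```

Every substring is also a subsequence, but `bnan` shows that the converse does not hold.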
