Date post: | 09-Mar-2016 |
Category: |
Documents |
Upload: | dhanasekar |
7/21/2019 unit2-compiler design
Chapter 1
CSE309N
LEXICAL ANALYZER
Scan Input
Remove WS, NL,
Identify Tokens
Create Symbol Table
Insert Tokens into ST
Generate Errors
Send Tokens to Parser
Introducing Basic Terminology
What are Major Terms for Lexical Analysis?
TOKEN
A token represents a set of strings described by a pattern. Examples include <identifier>, <number>, etc.
PATTERN
The set of strings is described by a rule called a pattern associated with the token.
LEXEME
A lexeme is a sequence of characters in the source program that matches the pattern for a token. Identifiers: x, count, name, etc.
Introducing Basic Terminology
Token      Sample Lexemes          Informal Description of Pattern
const      const                   const
if         if                      if
relation   <, <=, =, <>, >, >=     < or <= or = or <> or >= or >
id         pi, count, D2           letter followed by letters and digits
num        3.1416, 0, 6.02E23      any numeric constant
literal    "core dumped"           any characters between " and " except "
The token classifies the pattern.
Actual values are critical. Info is:
1. Stored in symbol table
2. Returned to parser
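The token, pattern, and lexeme columns above can be sketched in code. The fragment below is a minimal illustration, not part of the slides: the names `TOKEN_PATTERNS` and `tokenize`, and the exact regular expressions chosen for each pattern, are assumptions.

```python
import re

# Hypothetical pattern table: token name -> regular expression.
# Keywords come before id so that "const" and "if" are not read as identifiers.
TOKEN_PATTERNS = [
    ("const",    r"const\b"),
    ("if",       r"if\b"),
    ("relation", r"<=|<>|>=|<|=|>"),
    ("id",       r"[A-Za-z][A-Za-z0-9]*"),
    ("num",      r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("literal",  r'"[^"]*"'),
]

def tokenize(text):
    """Return (token, lexeme) pairs; white space is skipped and yields no token."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern in TOKEN_PATTERNS:
            m = re.compile(pattern).match(text, pos)
            if m:
                out.append((name, m.group()))   # the matched text is the lexeme
                pos = m.end()
                break
        else:
            raise ValueError(f"no pattern matches at position {pos}")
    return out
```

For instance, `tokenize("if count >= 6.02E23")` classifies `count` as an `id` and `6.02E23` as a `num`; several different lexemes map to the same token.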
Attributes for Tokens
Example: E = M * C ** 2
LEXEME     TOKEN     ATTRIBUTE values
Handling Lexical Errors
Error handling is very localized with respect to the input source. For example, whil ( x := 0 ) do generates no lexical errors in PASCAL, because whil is a well-formed identifier.
In what situations do errors occur?
The lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix of the remaining input.
Panic mode recovery
Delete successive characters from the remaining input until the analyzer can find a well-formed token.
May confuse the parser.
Possible error recovery actions:
Deleting an extra character
Inserting a missing character
Replacing an incorrect character by a correct character
Transposing two adjacent characters
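Panic mode recovery as described above can be sketched as follows. This is only an illustration under assumptions: the token patterns in `TOKEN_RE` and the function name `scan_with_panic_mode` are hypothetical, and the sketch deletes one character at a time until some pattern matches again.

```python
import re

# Hypothetical token patterns for the sketch: identifiers, integers, a few operators.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*|\d+|:=|[()=+\-*]")

def scan_with_panic_mode(text):
    """Return (lexemes, skipped); skipped lists the characters deleted during recovery."""
    pos, lexemes, skipped = 0, [], []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m:
            lexemes.append(m.group())
            pos = m.end()
        else:
            # Panic mode: no pattern matches a prefix of the remaining input,
            # so delete one character and try again at the next position.
            skipped.append(text[pos])
            pos += 1
    return lexemes, skipped
```

Note that `whil ( x := 0 ) do` passes through with nothing skipped, which is the sense in which the misspelled keyword is not a lexical error.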
Regular Expressions
Formal specification for tokens
We use regular expressions to describe tokens of a programming language.
Each regular expression denotes a language.
A language denoted by a regular expression is called a regular set.
Regular Expressions (Rules)
Regular expressions over alphabet Σ
Reg. Expr          Language it denotes
ε                  {ε}
a ∈ Σ              {a}
(r1) | (r2)        L(r1) ∪ L(r2)
(r1) (r2)          L(r1) L(r2)
(r)*               (L(r))*
(r)                L(r)
(r)+ = (r)(r)*
(r)? = (r) | ε
Regular Expressions (cont.)
We may remove parentheses by using precedence rules: * highest, concatenation next, | lowest.
ab*|c means (a(b)*)|(c)
Ex:
Σ = {0,1}
0|1 => {0,1}
(0|1)(0|1) => {00,01,10,11}
0* => {ε,0,00,000,0000,....}
(0|1)* => all strings with 0 and 1, including the empty string
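The precedence rules and the small examples above can be checked mechanically. The sketch below uses Python's `re` module, whose `*`, concatenation, and `|` follow the same precedence order; the helper name `language` and the length bounds are assumptions made for illustration.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over `alphabet` up to length max_len that the pattern matches in full."""
    rx = re.compile(pattern)
    words = ("".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n))
    return sorted(w for w in words if rx.fullmatch(w))

# ab*|c and its fully parenthesized form denote the same strings.
same = language(r"ab*|c", "abc", 4) == language(r"(a(b)*)|(c)", "abc", 4)

# (0|1)(0|1) denotes exactly the four two-symbol strings.
pairs = language(r"(0|1)(0|1)", "01", 3)

# 0* restricted to strings of length at most 4; "" plays the role of ε.
zeros = language(r"0*", "01", 4)
```

Only a bounded slice of each language is enumerated, so this is a sanity check rather than a proof of equivalence.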
Regular Definitions
Writing a regular expression for some languages can be difficult, because the regular expression can be quite complex. In those cases, we may use regular definitions.
We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.
Define regular expressions in terms of named regular expressions.
A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
...
dn → rn
where di is a definition name and ri is a regular expression over symbols in Σ ∪ {d1, d2, ..., di-1}
Regular Definitions (cont.)
Ex: Identifiers in Pascal
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*
If we try to write the regular expression representing identifiers without using regular definitions, that regular expression will be complex:
(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
Ex: Unsigned numbers in Pascal
digit → 0 | 1 | ... | 9
digits → digit digit*   (or digits → digit+)
opt-fraction → ( . digits )?
opt-exponent → ( E (+|-)? digits )?
unsigned-num → digits opt-fraction opt-exponent
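A regular definition translates naturally into code that builds each named expression from the names defined before it. The sketch below is an illustration only: the dictionary `definitions`, the underscore spelling of the names, and the helper `matches` are assumptions, and Python `re` syntax stands in for the notation used on the slide.

```python
import re

# Each definition refers only to names defined before it, as a regular definition requires.
definitions = {}
definitions["digit"] = r"[0-9]"
definitions["digits"] = rf"(?:{definitions['digit']})+"
definitions["opt_fraction"] = rf"(?:\.{definitions['digits']})?"
definitions["opt_exponent"] = rf"(?:E[+-]?{definitions['digits']})?"
definitions["unsigned_num"] = (
    definitions["digits"] + definitions["opt_fraction"] + definitions["opt_exponent"]
)
definitions["letter"] = r"[A-Za-z]"
definitions["id"] = rf"{definitions['letter']}(?:{definitions['letter']}|{definitions['digit']})*"

def matches(name, text):
    """True if the whole of `text` belongs to the language of the named definition."""
    return re.fullmatch(definitions[name], text) is not None
```

Printing `definitions["unsigned_num"]` shows the fully expanded expression, which makes the point about complexity: the named form is far easier to read than the expansion.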
Recognition of tokens
The next step is to formalize the patterns:
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
CHAPTER 3 LEXICAL ANALYSIS
Section 3 Recognition of Tokens
1 Task of recognition of token in a lexical analyzer
Regular expression   Token   Attribute-value
if                   if      -
id                   id      Pointer to table entry
<                    relop   LT
Transition diagrams
Transition diagram for relop
[Figure: transition diagram for relop]
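Since the relop diagram itself did not survive extraction, the following is a hedged sketch of how such a diagram is typically simulated in code. The state numbers, the operator set (<, <=, <>, =, >, >=), and the attribute names other than LT are assumptions chosen for illustration.

```python
def relop(text, pos=0):
    """Simulate a relop transition diagram starting at text[pos].

    Returns (attribute, next_pos), or None if no relational operator starts at pos.
    """
    state = 0
    while True:
        c = text[pos] if pos < len(text) else ""   # "" stands for end of input
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return ("EQ", pos + 1)
            elif c == ">":
                state = 2
            else:
                return None
            pos += 1
        elif state == 1:             # seen "<"
            if c == "=":
                return ("LE", pos + 1)
            if c == ">":
                return ("NE", pos + 1)
            return ("LT", pos)       # other character: retract, it is not consumed
        else:                        # state 2: seen ">"
            if c == "=":
                return ("GE", pos + 1)
            return ("GT", pos)       # other character: retract, it is not consumed
```

The two retracting returns correspond to accepting states reached on "other" input, where the lookahead character belongs to the next lexeme.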
Transition diagrams (cont.)
Transition diagram for reserved words and identifiers
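One common way to handle reserved words with a single diagram is to scan the lexeme as an identifier and then look it up in a table of reserved words. The sketch below assumes that approach; the table contents `RESERVED` and the function name `scan_word` are hypothetical.

```python
RESERVED = {"if", "then", "else", "const"}   # hypothetical reserved-word table

def scan_word(text, pos=0):
    """Follow the letter (letter | digit)* diagram, then classify the lexeme.

    Returns (token, lexeme, next_pos), or None if text[pos] is not a letter.
    """
    if pos >= len(text) or not (text[pos].isascii() and text[pos].isalpha()):
        return None
    end = pos + 1
    while end < len(text) and text[end].isascii() and text[end].isalnum():
        end += 1
    lexeme = text[pos:end]           # scanning stopped on a non-letter/digit, which is not consumed
    token = lexeme if lexeme in RESERVED else "id"
    return (token, lexeme, end)
```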
Transition diagrams (cont.)
Transition diagram for unsigned numbers
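A diagram for unsigned numbers can be simulated along the following lines. This is a sketch assuming the pattern digits (. digits)? (E [+-]? digits)?; the function name `scan_number` is hypothetical. A fraction or exponent is accepted only if it is complete, otherwise the scanner retracts to the last accepting position.

```python
def scan_number(text, pos=0):
    """Recognize digits (. digits)? (E [+-]? digits)? at text[pos].

    Returns (lexeme, next_pos) for the longest match, or None if no digit starts at pos.
    """
    def digits(i):
        j = i
        while j < len(text) and text[j] in "0123456789":
            j += 1
        return j                      # j == i means no digit was found

    end = digits(pos)
    if end == pos:
        return None
    # Optional fraction: accept "." only when at least one digit follows.
    if end < len(text) and text[end] == ".":
        frac_end = digits(end + 1)
        if frac_end > end + 1:
            end = frac_end
    # Optional exponent: accept "E" only when digits follow the optional sign.
    if end < len(text) and text[end] == "E":
        k = end + 1
        if k < len(text) and text[k] in "+-":
            k += 1
        exp_end = digits(k)
        if exp_end > k:
            end = exp_end
    return (text[pos:end], end)
```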
Transition diagrams (cont.)
Transition diagram for whitespace
Buffer Pairs
The lexical analyzer needs to look ahead several characters beyond the lexeme for a pattern before a match can be announced.
Use a function ungetc to push lookahead characters back into the input stream.
A large amount of time can be consumed moving characters.
Special Buffering Technique
Use a buffer divided into two N-character halves
N = Number of characters on one disk block
One system command reads N characters
Fewer than N characters => eof
Buffer Pairs (2)
Two pointers to the input buffer are maintained
The string of characters between the pointers is the current lexeme
Once the next lexeme is determined, the forward pointer is set to the character at its right end.
[Figure: input buffer containing E = M * C * * 2 eof, with the lexeme_beginning pointer and the forward pointer, which scans ahead to find a pattern match]
Comments and white space can be treated as patterns that yield no token
Code to advance forward pointer
    if forward at end of first half then begin
        reload second half ;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half ;
        move forward to beginning of first half
    end
    else forward := forward + 1 ;
Pitfalls
1. This buffering scheme works quite well most of the time, but with it the amount of lookahead is limited.
2. Limited lookahead makes it impossible to recognize tokens in situations where the distance the forward pointer must travel is more than the length of the buffer.
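The three-way test for advancing the forward pointer can be sketched in runnable form. The class below is an illustration under assumptions: the name `BufferPair`, the marker `EOF = "\0"`, and the tiny half size `n=4` are all hypothetical, and the lexeme_beginning pointer is left out to keep the sketch short.

```python
import io

EOF = "\0"   # assumed end-of-input marker; must not occur in the source text

class BufferPair:
    def __init__(self, stream, n=4):
        self.stream, self.n = stream, n
        self.buf = [EOF] * (2 * n)
        self._reload(0)               # fill the first half
        self.forward = -1             # so that the first advance lands on index 0

    def _reload(self, start):
        """Read up to n characters into buf[start:start+n]; a short read is padded with EOF."""
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def advance(self):
        """Move forward by one character, reloading a half when its end is reached."""
        if self.forward == self.n - 1:          # at end of first half
            self._reload(self.n)
            self.forward += 1
        elif self.forward == 2 * self.n - 1:    # at end of second half
            self._reload(0)
            self.forward = 0
        else:
            self.forward += 1
        return self.buf[self.forward]

def read_all(text, n=4):
    """Drive the buffer until the EOF marker and return everything that was read."""
    bp, out = BufferPair(io.StringIO(text), n), []
    c = bp.advance()
    while c != EOF:
        out.append(c)
        c = bp.advance()
    return "".join(out)
```

Every call to `advance` performs two position tests before it even looks at the character, which is the cost that sentinels are meant to reduce.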
Algorithm: Buffered I/O with Sentinels
[Figure: input buffer E = M * eof | C * * 2 eof ... eof, with a sentinel eof closing each half; the lexeme_beginning pointer marks the start of the current token and the forward pointer scans ahead to find a pattern match]
    forward := forward + 1 ;
    if forward↑ = eof then begin
        if forward at end of first half then begin
            reload second half ;        (block I/O)
            forward := forward + 1
        end
        else if forward at end of second half then begin
            reload first half ;         (block I/O)
            move forward to beginning of first half
        end
        else (eof within buffer, signifying end of input)
            terminate lexical analysis
    end
2nd eof: no more input.
The algorithm performs the I/O. We can still have get and ungetchar; now these work on real memory buffers.
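The sentinel scheme can be sketched as follows. This is an illustration under assumptions: the class name `SentinelBuffer`, the sentinel `EOF = "\0"`, and the half size `n=4` are hypothetical. The point it shows is that the common path makes a single comparison per character, and the position tests run only when a sentinel is seen.

```python
import io

EOF = "\0"   # assumed sentinel character; must not occur in the source text

class SentinelBuffer:
    def __init__(self, stream, n=4):
        self.stream, self.n = stream, n
        # Layout: [first half (n)] [sentinel] [second half (n)] [sentinel]
        self.buf = [EOF] * (2 * n + 2)
        self._reload(0)
        self.forward = -1

    def _reload(self, start):
        """One block read into a half; a short read is padded with EOF."""
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def advance(self):
        """Return the next character, or None at end of input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:          # the only test on the common path
            if self.forward == self.n:             # sentinel ending the first half
                self._reload(self.n + 1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:   # sentinel ending the second half
                self._reload(0)
                self.forward = 0
            else:
                return None                        # eof inside a half: end of input
            if self.buf[self.forward] == EOF:
                return None                        # the reloaded half is empty
        return self.buf[self.forward]

def read_all(text, n=4):
    """Drive the buffer until end of input and return everything that was read."""
    sb, out = SentinelBuffer(io.StringIO(text), n), []
    c = sb.advance()
    while c is not None:
        out.append(c)
        c = sb.advance()
    return "".join(out)
```

The extra check after a reload is a design choice of this sketch: it reports end of input immediately when the freshly loaded half is empty, rather than on the following call.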
Formalizing Token Definition
EXAMPLES AND OTHER CONCEPTS:
Suppose: S is the string banana
Prefix: ban, banana
Suffix: ana, banana
Substring: nan, ban, ana, banana
Subsequence: bnan, nn
A proper prefix, suffix, or substring cannot be all of S
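The string terms above can be made concrete with a few small helpers. The function names below are assumptions for illustration; `proper` follows the slide's wording and excludes only the whole string S.

```python
def prefixes(s):
    """Every string obtained by removing zero or more trailing characters of s."""
    return [s[:i] for i in range(len(s) + 1)]

def suffixes(s):
    """Every string obtained by removing zero or more leading characters of s."""
    return [s[i:] for i in range(len(s) + 1)]

def substrings(s):
    """Every string obtained by removing a prefix and a suffix of s."""
    return sorted({s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)})

def is_subsequence(t, s):
    """True if t can be obtained from s by deleting zero or more characters."""
    it = iter(s)
    return all(ch in it for ch in t)   # each lookup resumes where the last one stopped

def proper(parts, s):
    """Drop s itself, leaving the proper prefixes, suffixes, or substrings."""
    return [p for p in parts if p != s]
```

With S = banana, `is_subsequence("bnan", "banana")` holds even though bnan is not a substring, since the deleted characters need not be contiguous.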