8/2/2019 Lexical Analyser Parser
LEXICAL ANALYZER AND PARSER
COMPILER
A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language.
source program --> COMPILER --> target program
The compiler also reports error messages.
Source program: normally a program written in a high-level programming language.
Target program: normally the equivalent program in machine code (a relocatable object file).
PHASES OF A COMPILER
Source Program --> Lexical Analyzer --> Syntax Analyzer --> Semantic Analyzer --> Intermediate Code Generator --> Code Optimizer --> Code Generator --> Target Program
Each phase transforms the source program from one representation into another representation.
They communicate with error handlers.
They communicate with the symbol table.
LEXICAL ANALYZER
INTRODUCTION
A lexical analyzer breaks an input stream of characters into tokens. A program that performs lexical analysis is called a lexical analyzer, or lexer.
A lexer consists of a scanner and a tokenizer.
Writing lexical analyzers by hand can be a tedious process, so software tools have been developed to ease this task.
Perhaps the best known such utility is Lex. Lex is a lexical analyzer generator for the UNIX operating system, targeted to the C programming language.
ROLE OF THE LEXICAL ANALYZER
INTRODUCING BASIC TERMINOLOGY
What are the major terms for lexical analysis?
TOKEN
A classification for a common set of strings
Examples Include , , etc.
PATTERN
The rules which characterize the set of strings for a token
Recall file and OS wildcards ([A-Z]*.*)
LEXEME
The actual sequence of characters that matches a pattern and is classified by a token
Identifiers: x, count, name, etc.
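The relationship between the three terms can be sketched in C. This is only an illustration: the token names (TOK_IDENTIFIER, TOK_NUMBER, TOK_UNKNOWN), the function classify(), and the two patterns used (a C-style identifier and an unsigned decimal integer) are assumptions made for the example, not definitions taken from the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token classes (hypothetical names chosen for this sketch). */
typedef enum { TOK_IDENTIFIER, TOK_NUMBER, TOK_UNKNOWN } TokenKind;

/* Pattern for identifiers: a letter or underscore, followed by
   letters, digits, or underscores. */
static int matches_identifier(const char *s) {
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return 0;
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_')) return 0;
    return 1;
}

/* Pattern for numbers: one or more decimal digits. */
static int matches_number(const char *s) {
    if (s[0] == '\0') return 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        if (!isdigit((unsigned char)s[i])) return 0;
    return 1;
}

/* Classify a lexeme: report the token whose pattern it matches. */
TokenKind classify(const char *lexeme) {
    assert(lexeme != NULL);
    if (matches_identifier(lexeme)) return TOK_IDENTIFIER;
    if (matches_number(lexeme)) return TOK_NUMBER;
    return TOK_UNKNOWN;
}
```

Here the lexemes x, count, and name all map to the single token TOK_IDENTIFIER, because each one matches the identifier pattern.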
The input program as you see it.
main ()
{
int i, sum;
sum = 0;
for (i=1; i
LEXICAL ANALYZER RESPONSIBILITIES
Lexical analyzer [Scanner]
Scan input
Remove white space, tabs, and newline characters
Remove comments
Manufacture tokens
Generate lexical errors
Pass token to parser
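The responsibilities listed above can be sketched as a small hand-written scanner in C. Everything named here (Kind, Token, next_token(), the set of punctuation characters, and the choice of C-style /* */ comments) is an assumption made for illustration; a real scanner would cover the full token set of its language.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_IDENT, T_NUMBER, T_PUNCT, T_ERROR, T_EOF } Kind;

typedef struct {
    Kind kind;
    const char *start;  /* first character of the lexeme in the input */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* Skip blanks, tabs, newlines, and comments.
   Returns 0 if a comment is never closed, 1 otherwise. */
static int skip_blanks_and_comments(const char **p) {
    for (;;) {
        while (isspace((unsigned char)**p)) (*p)++;
        if ((*p)[0] == '/' && (*p)[1] == '*') {
            const char *q = *p + 2;
            while (q[0] != '\0' && !(q[0] == '*' && q[1] == '/')) q++;
            if (q[0] == '\0') { *p = q; return 0; }
            *p = q + 2;
        } else {
            return 1;
        }
    }
}

/* Manufacture the next token and advance *p past it. The caller
   (normally the parser) calls this repeatedly until T_EOF. */
Token next_token(const char **p) {
    assert(p != NULL && *p != NULL);
    Token t;
    if (!skip_blanks_and_comments(p)) {
        t.kind = T_ERROR;   /* lexical error: unterminated comment */
        t.start = *p;
        t.length = 0;
        return t;
    }
    const char *s = *p;
    t.start = s;
    if (*s == '\0') {
        t.kind = T_EOF;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_') s++;
        t.kind = T_IDENT;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s)) s++;
        t.kind = T_NUMBER;
    } else if (strchr("(){};,=+<", *s) != NULL) {
        s++;
        t.kind = T_PUNCT;
    } else {
        s++;
        t.kind = T_ERROR;   /* lexical error: character starts no token */
    }
    t.length = (size_t)(s - t.start);
    *p = s;
    return t;
}
```

Each call returns one token, so white space and comments never reach the parser, and an unexpected character surfaces as a T_ERROR token instead of being silently dropped.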
LEX INTRODUCTION
Lex is a compiler-writing tool used to generate a lexical analyzer (scanner) from a description of the tokens of the programming language to be implemented.
Lex takes a specially formatted specification file containing the details of a lexical analyzer. The tool then creates a C source file for the associated table-driven lexer.
LEX SPECIFICATION
The input to Lex is a text file containing regular expressions, along with the actions to be taken by the generated scanner when each regular expression is matched.
The output is a file that contains C source code defining the procedure yylex(), which implements a DFA corresponding to the regular expressions given in the input file.
The output file is usually called lex.yy.c or lexyy.c. When compiled and linked with the main program, it acts as a scanner (lexical analyzer) that recognizes the tokens specified by the regular expressions in the input file.
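To give a feel for what a table-driven DFA looks like, here is a minimal hand-written sketch in C. It is not Lex output, and the code Lex generates is considerably more elaborate. The state names, the character classes, and the function longest_match() are assumptions made for this example; the DFA recognizes two illustrative patterns, [A-Za-z_][A-Za-z0-9_]* and [0-9]+.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { S_START, S_IDENT, S_NUMBER, S_DEAD, NUM_STATES };
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };

/* transition[state][character class] gives the next state. */
static const int transition[NUM_STATES][NUM_CLASSES] = {
    /* S_START  */ { S_IDENT, S_NUMBER, S_DEAD },
    /* S_IDENT  */ { S_IDENT, S_IDENT,  S_DEAD },
    /* S_NUMBER */ { S_DEAD,  S_NUMBER, S_DEAD },
    /* S_DEAD   */ { S_DEAD,  S_DEAD,   S_DEAD },
};

/* accepting[state] is nonzero when the state completes a token. */
static const int accepting[NUM_STATES] = { 0, 1, 1, 0 };

static int char_class(char c) {
    if (isalpha((unsigned char)c) || c == '_') return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Run the DFA over input and return the length of the longest
   prefix that ends in an accepting state (0 if there is none).
   *final_state receives that accepting state, or S_DEAD. */
size_t longest_match(const char *input, int *final_state) {
    assert(input != NULL && final_state != NULL);
    int state = S_START;
    size_t best_len = 0;
    *final_state = S_DEAD;
    for (size_t i = 0; input[i] != '\0'; i++) {
        state = transition[state][char_class(input[i])];
        if (state == S_DEAD) break;
        if (accepting[state]) {
            best_len = i + 1;
            *final_state = state;
        }
    }
    return best_len;
}
```

The matching logic never changes; only the tables do. That is why a generator can produce a scanner mechanically from regular expressions: it computes the tables and emits a fixed driver loop around them.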
LEX SPECIFICATIONS
A Lex input file consists of three parts: a collection of definitions, a collection of rules, and a collection of user subroutines. These three sections are separated by double-percent directives (%%).
A proper Lex specification has the following format.
{definition}
%%
{rules}
%%
{user subroutines}
The definitions and the user subroutines are often omitted. The second %% is optional, but the first is required to mark the beginning of the rules.
MAIN FEATURES
Simple implementation.
Fast lexical analysis.
Efficient resource utilization.
Portable.
APPLICATIONS AND FUTURE WORK
Text Editing
Text Processing
Pattern Matching
File Searching
PARSER
PARSING
Parsing (syntactic analysis) is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given (more or less) formal grammar.
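As a small illustration of determining grammatical structure, the following C sketch parses and evaluates arithmetic expressions by recursive descent. This is a different technique from the one Yacc uses (Yacc generates table-driven LALR parsers); it is chosen here only because it fits in a few lines. The grammar, the Parser struct, and the function parse() are assumptions made for the example, and for brevity the parser reads characters directly instead of taking tokens from a separate lexer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Grammar used in this sketch:
     expr   -> term   { ('+' | '-') term }
     term   -> factor { '*' factor }
     factor -> NUMBER | '(' expr ')'                      */

typedef struct {
    const char *pos;  /* next unread character */
    int ok;           /* cleared when a syntax error is found */
} Parser;

static long parse_expr(Parser *ps);

static void skip_spaces(Parser *ps) {
    while (isspace((unsigned char)*ps->pos)) ps->pos++;
}

static long parse_factor(Parser *ps) {
    skip_spaces(ps);
    if (isdigit((unsigned char)*ps->pos)) {
        long value = 0;
        while (isdigit((unsigned char)*ps->pos)) {
            value = value * 10 + (*ps->pos - '0');
            ps->pos++;
        }
        return value;
    }
    if (*ps->pos == '(') {
        ps->pos++;
        long value = parse_expr(ps);
        skip_spaces(ps);
        if (*ps->pos == ')') ps->pos++;
        else ps->ok = 0;          /* missing closing parenthesis */
        return value;
    }
    ps->ok = 0;                   /* neither a number nor '(' */
    return 0;
}

static long parse_term(Parser *ps) {
    long value = parse_factor(ps);
    for (;;) {
        skip_spaces(ps);
        if (*ps->pos != '*') return value;
        ps->pos++;
        value *= parse_factor(ps);
    }
}

static long parse_expr(Parser *ps) {
    long value = parse_term(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->pos;
        if (op != '+' && op != '-') return value;
        ps->pos++;
        long rhs = parse_term(ps);
        value = (op == '+') ? value + rhs : value - rhs;
    }
}

/* Returns 1 and stores the value in *result if text is a complete,
   well-formed expression; returns 0 on a syntax error. */
int parse(const char *text, long *result) {
    assert(text != NULL && result != NULL);
    Parser ps = { text, 1 };
    long value = parse_expr(&ps);
    skip_spaces(&ps);
    if (!ps.ok || *ps.pos != '\0') return 0;
    *result = value;
    return 1;
}
```

Each grammar rule becomes one function, so the call structure mirrors the parse tree. Because parse_term is called from inside parse_expr, multiplication binds tighter than addition without any explicit precedence table.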
YACC SPECIFICATION
Yacc (Yet Another Compiler-Compiler) is a parser generator: a program that takes as its input a specification of the syntax of a programming language and produces as its output a parse procedure for that language, named yyparse().
The notation used for preparing this specification is a context-free grammar (CFG).
The input to Yacc is a specification file, usually with a .y suffix, containing the grammar rules that specify the structure of the language to be implemented. The output is C source code for the parser, usually in a file named y.tab.c or ytab.c.
FORMAT OF SPECIFICATION FILE
{ definition }
%%
{ rules }
%%
{ programs }
The definition section contains information about tokens, data types, and grammar rules. It also includes any C code that must go directly into the output file at its beginning.
CREDITS
Credits goes out to
A special thanks goes out to
THANK YOU!