Post on 17-Jan-2015
LAB #1: INTRODUCTION & LEXICAL ANALYSIS
COMPILER ENGINEERING
University of Dammam, Girls’ College of Science, Department of Computer Science, Compiler Engineering Lab
Department of Computer Science - Compiler Engineering Lab
WHAT IS A COMPILER?
• A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
• An important part of this translation process is that the compiler reports to its user the presence of errors in the source program.
25-29/2/12
[Diagram: source program → Compiler → target program; the compiler also produces error messages]
COMPILER THEORY
COMPILER ENVIRONMENT TOOLS
• Many software tools that manipulate source programs first perform some kind of analysis.
• Some examples of such tools include:
1- STRUCTURE EDITOR
• Takes as input a sequence of commands to build a source program
• Performs the text creation and modification functions of a text editor
• Analyzes the program text, putting an appropriate hierarchical structure on the source program
• Checks that the input is correctly formed
• Can supply keywords automatically
• Can jump from a begin or left parenthesis to its matching end or right parenthesis
2- PRETTY PRINTERS
• Analyzes the program and prints it in such a way that the structure of the program becomes clearly visible.
3- STATIC CHECKERS
• Reads a program
• Analyzes it
• Attempts to discover potential bugs without running the program
• Can catch some logical errors
4 - INTERPRETERS
• Performs the operations implied by the source program, instead of producing a target program as a translation.
• What is the difference between a compiler and an interpreter?
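One way to picture the difference is a minimal sketch, assuming a toy "source language" of single digits and '+' in postfix order (for example "12+3+"). Neither function comes from the lab: the names interpret and compile and the PUSH/ADD instructions of the stack machine are hypothetical. The input is assumed to be well formed.

```cpp
#include <string>
#include <vector>

// Interpreter: performs the operations implied by the program and
// returns the result directly.
int interpret(const std::string& postfix) {
    std::vector<int> stack;
    for (char ch : postfix) {
        if (ch == '+') {
            int right = stack.back(); stack.pop_back();
            int left = stack.back(); stack.pop_back();
            stack.push_back(left + right);
        } else {
            stack.push_back(ch - '0');   // a single digit
        }
    }
    return stack.back();
}

// Compiler: translates the same program into instructions for a
// hypothetical stack machine, without executing anything.
std::vector<std::string> compile(const std::string& postfix) {
    std::vector<std::string> code;
    for (char ch : postfix) {
        if (ch == '+') code.push_back("ADD");
        else code.push_back(std::string("PUSH ") + ch);
    }
    return code;
}
```

Calling interpret("12+3+") computes a value immediately, while compile("12+3+") only yields a list of instructions that some other machine would have to run later.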
COMPILER PHASES
PARTS OF COMPILATION
1. Analysis
The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
2. Synthesis
The synthesis part constructs the desired target program from the intermediate representation.
PROCESSING ENDS OF A COMPILER
1. Front End
Consists of the phases that depend primarily on the source language and are largely independent of the target machine (lexical analysis, syntax analysis, symbol table management, semantic analysis, intermediate code generation).
2. Back End
Includes those portions of the compiler that depend on the target machine and do not depend on the source language (code optimization, code generation).
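COMPILER PHASES
[Diagram: Source Program → Compiler → Machine Language. The front end (analysis) performs lexical analysis (“scanning”), syntax or hierarchical analysis (“parsing”), and contextual analysis (“semantic analysis”), and produces intermediate code; the back end (synthesis) turns the intermediate code into object code.]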
Phases are important to simplify the compiler’s structure
COMPILER PHASES INTERACTION (VIA DATA STRUCTURE)
[Diagram: Source Program (text) → lexical analysis → tokens → syntax analysis → abstract syntax tree (AST) → contextual analysis → decorated AST + symbol table → intermediate code → object code (machine language). Lexical, syntax, and contextual analysis form the front end (analysis); producing object code from the intermediate code is the back end (synthesis).]
COMPILER PHASES
[Diagram: LEXICAL ANALYZER → SYNTAX ANALYZER → SEMANTIC ANALYZER → INTERMEDIATE CODE GENERATOR → CODE OPTIMIZER → CODE GENERATOR; the Symbol Table Manager and ERROR HANDLING interact with every phase.]
COMPILER CONSTRUCTION TOOLS
• A compiler can be written like any other program
• A programmer can use general software development tools such as:
  • Debuggers
  • Version managers
  • Profilers
• More specialized tools have been developed to help implement the various phases of a compiler
1- SCANNER GENERATORS
• Generate lexical analyzers from a specification based on regular expressions.
2- PARSER GENERATORS
• Produce syntax analyzers from input that is based on a context-free grammar.
• In early compilers, syntax analysis consumed a large fraction of the running time and a large fraction of the intellectual effort of writing a compiler.
• With a parser generator, this phase can be implemented in a few days.
3- SYNTAX-DIRECTED TRANSLATION ENGINE
• Produces a collection of routines that walk the parse tree, generating the intermediate code
4 - AUTOMATIC CODE GENERATOR
• Takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine
5 - DATA FLOW ENGINE
• Much of the information needed to perform good code optimization involves “data-flow analysis”:
• The gathering of information about how values are transmitted from one part of a program to each other part
LEXICAL ANALYSIS: FIRST PHASE OF A COMPILER
INSERTING A LEXICAL ANALYZER BETWEEN THE INPUT AND THE PARSER
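[Diagram: Input → Lexical Analyzer → Parser. The lexical analyzer reads characters from the input and can push a character back onto it; it passes each token and its attribute to the parser.]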
LEXICAL ANALYZER MECHANISM
• Read the characters from the input
• Group them into lexemes
• Pass the tokens formed by the lexemes, together with their attribute values, to the later stages
• In some situations the lexical analyzer has to read some more characters ahead before it can decide on the token to be returned to the parser
• The extra character then has to be pushed back onto the input, because it can be the beginning of the next lexeme
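The read-ahead and push-back steps can be illustrated with relational operators, where '<' alone is a complete lexeme but may also be the start of '<=' or '<>'. This is only a sketch: the function name read_relop, the RelopKind enum, and the choice to read from a FILE* parameter rather than directly from stdin are assumptions made for illustration.

```cpp
#include <cstdio>

// Attribute values for the relational operators; NOT_RELOP is a
// hypothetical extra value meaning "no operator here".
enum RelopKind { LT, LE, EQ, NE, GT, GE, NOT_RELOP };

// Reads one relational operator from `in`. After '<' or '>' it looks one
// character ahead; if that character does not extend the operator, it is
// pushed back so that it can start the next lexeme.
RelopKind read_relop(std::FILE* in) {
    int c = std::fgetc(in);
    if (c == '=') return EQ;
    if (c == '<') {
        int next = std::fgetc(in);
        if (next == '=') return LE;
        if (next == '>') return NE;
        if (next != EOF) std::ungetc(next, in);   // not part of this lexeme
        return LT;
    }
    if (c == '>') {
        int next = std::fgetc(in);
        if (next == '=') return GE;
        if (next != EOF) std::ungetc(next, in);   // not part of this lexeme
        return GT;
    }
    if (c != EOF) std::ungetc(c, in);             // leave the input untouched
    return NOT_RELOP;
}
```

On the input "<a", the call returns LT and the 'a' is still available to whoever reads the stream next.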
IMPLEMENTING THE INTERACTION
[Diagram: the lexical analyzer is implemented as the function lexan(). It reads characters using getchar(), pushes a character F back using ungetc(F, stdin), and passes each token and its attribute to its caller.]
LEX …
• A particular tool that has been widely used to specify lexical analyzers for a variety of languages
• Using such a tool lets us show how the specification of patterns using regular expressions can be combined with actions
REGULAR EXPRESSION PATTERNS FOR TOKENS
Regular expression    Token    Attribute value
ws                    -        -
if                    if       -
then                  then     -
else                  else     -
id                    id       pointer to table entry
num                   num      pointer to table entry
<                     relop    LT
<=                    relop    LE
=                     relop    EQ
<>                    relop    NE
>                     relop    GT
>=                    relop    GE
LEX SPECIFICATION
• A Lex program consists of three parts:
1. Declarations
2. Translation rules
3. Auxiliary procedures
1- DECLARATIONS SECTION
Includes declarations of variables, manifest constants, and regular definitions.
A manifest constant is an identifier that is declared to represent a constant.
DEFINITIONS OF MANIFEST CONSTANTS USED BY THE TRANSLATION RULES
LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP, AROP
REGULAR DEFINITIONS
delim     [ \t\n]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
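To see what the id and number definitions accept, they can be tried out with std::regex. This is a sketch rather than what Lex generates: the definitions are expanded by hand (Lex substitutes {letter} and {digit} itself), and the helper names is_id and is_number are hypothetical.

```cpp
#include <regex>
#include <string>

// The regular definitions above, expanded by hand into the ECMAScript
// syntax used by std::regex. "[+-]" is equivalent to the slide's "[+\-]".
const std::regex id_pattern("[A-Za-z][A-Za-z0-9]*");
const std::regex number_pattern("[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?");

// True when the whole lexeme matches the `id` definition.
bool is_id(const std::string& lexeme) {
    return std::regex_match(lexeme, id_pattern);
}

// True when the whole lexeme matches the `number` definition.
bool is_number(const std::string& lexeme) {
    return std::regex_match(lexeme, number_pattern);
}
```

For instance, "3.14E-2" is accepted as a number, while "3." is not, because the fraction part requires at least one digit after the point.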
2- TRANSLATION RULES
are statements of the form
p1 {action1}
p2 {action2}
...
pn {actionn}
• where each pi is a regular expression and each {actioni} is a program fragment describing what action the lexical analyzer should take when pattern pi matches a lexeme
2- TRANSLATION RULES
ws        no action and no return
if        return(IF)
then      return(THEN)
else      return(ELSE)
"<"       val = LT; return(RELOP)
          (and similarly for the other relational operators)
id        val = install_id(); return(ID)
number    val = install_num(); return(NUMBER)
3-AUXILIARY PROCEDURES
• Holds whatever auxiliary procedures are needed by the actions
• A lexical analyzer created by Lex behaves in concert with a parser in the following manner: when activated by the parser, the lexical analyzer begins reading its remaining input, one character at a time, until it has found the longest prefix of the input that is matched by one of the regular expressions pi. Then it executes actioni.
CONT..
• Typically the action will return control to the parser. If it does not, the lexical analyzer proceeds to find more lexemes until an action causes control to return to the parser
• The lexical analyzer returns a single quantity to the parser: the token
• To pass an attribute value with information about the lexeme, we can set a global variable called val (Lex itself names this variable yylval)
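The longest-prefix rule can be sketched by trying every pattern at the current input position and keeping the longest match, with earlier rules winning ties (which is why the keyword rules are listed before id). This is an illustration only, not how Lex is implemented: Lex compiles all patterns into a single automaton, whereas this sketch runs std::regex once per rule. The names Rule, Match, rules, and longest_match are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

enum Token { IF, THEN, ELSE, ID, NUMBER, RELOP, WS, NONE };

struct Rule {
    std::regex pattern;
    Token token;
};

struct Match {
    Token token;
    std::size_t length;
};

// Rules in priority order: an earlier rule wins when two matches tie.
const std::vector<Rule>& rules() {
    static const std::vector<Rule> table = {
        {std::regex("[ \\t\\n]+"), WS},
        {std::regex("if"), IF},
        {std::regex("then"), THEN},
        {std::regex("else"), ELSE},
        {std::regex("[A-Za-z][A-Za-z0-9]*"), ID},
        {std::regex("[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?"), NUMBER},
        {std::regex("<=|<>|>=|<|>|="), RELOP},
    };
    return table;
}

// Finds the rule that matches the longest prefix of input starting at pos.
Match longest_match(const std::string& input, std::size_t pos) {
    Match best{NONE, 0};
    auto start = input.begin() + static_cast<std::ptrdiff_t>(pos);
    for (const Rule& rule : rules()) {
        std::smatch m;
        // match_continuous forces the match to begin exactly at `start`.
        if (std::regex_search(start, input.end(), m, rule.pattern,
                              std::regex_constants::match_continuous)) {
            std::size_t len = static_cast<std::size_t>(m.length(0));
            if (len > best.length) best = Match{rule.token, len};
        }
    }
    return best;
}
```

With these rules, "if x" starts with the keyword IF, but "ifx" is a single ID, because the id pattern matches a longer prefix than the keyword pattern does.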
AUXILIARY PROCEDURES
• install_id()
  Procedure to install the lexeme in the symbol table and return a pointer to its entry
• install_num()
  Similar procedure to install a lexeme that is a number
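A possible shape for these two procedures is sketched below. The slides do not show their bodies, so everything here is an assumption: the SymbolTable type is hypothetical, an index into a vector stands in for the "pointer to table entry", and the lexeme is passed as an argument instead of being read from Lex's internal buffer.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// A toy symbol table: each distinct lexeme is stored once, and its index
// in `entries` plays the role of the pointer to the table entry.
struct SymbolTable {
    std::vector<std::string> entries;
    std::unordered_map<std::string, std::size_t> index_of;

    // Returns the index of `lexeme`, inserting it first if it is new.
    std::size_t install(const std::string& lexeme) {
        auto found = index_of.find(lexeme);
        if (found != index_of.end()) return found->second;
        entries.push_back(lexeme);
        index_of[lexeme] = entries.size() - 1;
        return entries.size() - 1;
    }
};

// Hypothetical counterparts of install_id() and install_num().
std::size_t install_id(SymbolTable& ids, const std::string& lexeme) {
    return ids.install(lexeme);
}

std::size_t install_num(SymbolTable& nums, const std::string& lexeme) {
    return nums.install(lexeme);
}
```

Installing the same identifier twice returns the same entry, so later phases can compare entries instead of comparing strings.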
WRITING A LEXICAL ANALYZER
• Write a lexical analyzer using the C++ language.
• Write it as a function called from inside main()
• Call that function Lexan
• The Lexan function returns the value of the token
THE LEXICAL ANALYZER WILL DO..
• Read a character from the user
• If the character is a blank (space) or a tab (written '\t'), no token is returned to the parser; exit the function
• If the character is a newline (written '\n'), the line number is incremented and no token is returned
• If the character is a single digit .. Tokenval
MORE THAN ONE DIGIT ..
• Allow the user to enter a sequence of characters
• While the user keeps entering digits after the first digit, the analyzer accepts more digits
• Each time, the analyzer recomputes Tokenval
• If the next character is not a digit, push the character back
• Each time, print the result of each part to see the output
TOKENVAL..
• First digit
  Tokenval = t - '0'
• Next digit
  Tokenval = Tokenval * 10 + t - '0'
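Putting the steps together, one possible sketch of the function is shown below; it is not the lab's reference solution. Three things differ from the slides and are assumptions: the function takes a FILE* (stdin by default) so that it can also be tried on a file, it skips blanks and tabs in a loop rather than exiting, and the token codes NUM and DONE are hypothetical names.

```cpp
#include <cctype>
#include <cstdio>

const int NUM = 256;     // token code for a number, above any character code
const int DONE = 257;    // token code for end of input

int tokenval = -1;       // attribute of the most recent NUM token
int lineno = 1;          // current input line

// Returns the next token read from `in`.
int lexan(std::FILE* in = stdin) {
    for (;;) {
        int t = std::fgetc(in);          // same as getchar() when in == stdin
        if (t == ' ' || t == '\t') {
            continue;                    // blanks and tabs produce no token
        } else if (t == '\n') {
            ++lineno;                    // count the line, produce no token
        } else if (t == EOF) {
            return DONE;
        } else if (std::isdigit(t)) {
            tokenval = t - '0';          // first digit
            t = std::fgetc(in);
            while (t != EOF && std::isdigit(t)) {
                tokenval = tokenval * 10 + t - '0';   // each further digit
                t = std::fgetc(in);
            }
            if (t != EOF) std::ungetc(t, in);         // push back the non-digit
            return NUM;
        } else {
            tokenval = -1;               // any other character is its own token
            return t;
        }
    }
}
```

A main() for the lab would call lexan() repeatedly and print each token together with tokenval until DONE is returned. The sketch does not guard against overflow of tokenval on very long digit sequences.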
READING CHARACTER FROM THE USER
#include <stdio.h>
int getchar(void);
• Gets a character from stdin.
• getchar, which may be implemented as a macro, returns the next character on the input stream stdin.
• On success, getchar returns the character read as an unsigned char converted to an int (that is, without sign extension); at end of file or on error it returns EOF.
PUSHING BACK CHARACTERS
#include <stdio.h>
ungetc(c, stdin)
• Pushes a character back into the input stream.
• ungetc pushes the character c back onto the named input stream, which must be open for reading. This character will be returned by the next read from that stream (for stdin, the next call to getchar). One character of pushback is guaranteed in all situations.
• On success, ungetc returns the character pushed back; on failure it returns EOF.
TEST CHARACTER IF (DIGIT) OR NOT
#include <ctype.h>
isdigit(t)
• Tests for a decimal-digit character.
• isdigit is a macro that classifies ASCII-coded integer values by table lookup.
• isdigit returns nonzero if t is a digit.
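One detail worth keeping in mind when using isdigit outside of getchar loops: its argument must be either EOF or a value representable as unsigned char. The sketch below, with the hypothetical helper count_digits, shows the usual cast when the characters come from a string rather than from getchar.

```cpp
#include <cctype>
#include <string>

// Counts the decimal digits in `text`. The cast to unsigned char matters:
// passing a negative char value (other than EOF) to isdigit is undefined.
int count_digits(const std::string& text) {
    int count = 0;
    for (char ch : text) {
        if (std::isdigit(static_cast<unsigned char>(ch))) ++count;
    }
    return count;
}
```

Values returned by getchar need no cast, because getchar already returns the character as an unsigned char converted to int.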
QUESTIONS?
Thank you for listening