Date post: | 14-Jan-2016 |
Upload: | oliver-butler |
What on Earth?
LEXEME            TOKEN      PATTERN
print             print      p,r,i,n,t
(                 leftpar    (
4                 number     4
*                 arith      *
5                 number     5
)                 rightpar   )
userAnswer        ID         Letter followed by letters and digits
“Game of Jones”   literal    Any string between “ and ”
Translators
Translators – Module Knowledge Areas
• Types of translators and their use
• Lexical analysis
• Syntax analysis
• Code generation and optimisation
• Library routines
Translators – Module Knowledge Areas
• Lexical analysis
• Describe what happens during lexical analysis
So, we need to know:
• What is meant by lexical analysis
• What the key language is
• What part lexical analysis plays in the translation process
• How lexical analysis works
• How to identify the key aspects of lexical analysis
Translators – Lexical Analysis
So far we have investigated the link between source code, assembly code and machine code
In reality there are many more steps involved in getting code to run
There are a number of compilation phases:
• Parsing the source code (lexical analysis)
• Syntax analysis
• Type checking
• Machine code generation
• Code block sequencing
• Register allocation
• Optimisation
• Linking of libraries
Translators - Parsing
Consider this flow diagram of the translation process:
The source code is parsed ……
Parsing is analysis of the source code
Each line, e.g. print(4*5), is read
The compiler assigns a token (a type) to each element, e.g. keyword/reserved word, variable, constant …
Translators - Parsing
In the example print(4*5)
print and * are recognised: in its simplest form, print matches the reserved word print, and * is given an arithmetic token (it is the multiplication operator)
If the example were written Print 4*5, lexical analysis would not flag the mistakes in syntax; that is the job of syntax analysis, the next stage
Because Print does not match the pattern for a keyword, the compiler will assume it is a variable and assign it the token for an identifier (often written ID)
4 and 5 are given the token number (specifically, integer) and * is given the token arith
In effect, a pair is created for each element, made up of the token and the lexeme
White space (e.g. blank lines and spaces between characters) and comments are stripped out, as they are not needed to translate the code into machine code
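The steps on this slide can be sketched in Python. This is only an illustration under stated assumptions, not how any particular compiler is written: the token names are taken from the parsing table, while the function name tokenize, the set KEYWORDS and the comment syntax (# to the end of the line) are made up for the example.

```python
import re

# Token names follow the parsing table. The comment syntax (# to end of
# line) is an assumption made for this sketch.
TOKEN_SPEC = [
    ("skip",     r"\s+|#[^\n]*"),           # white space and comments
    ("number",   r"\d+"),                   # integers such as 4 and 5
    ("ID",       r"[A-Za-z][A-Za-z0-9]*"),  # letter, then letters and digits
    ("arith",    r"[+\-*/]"),               # arithmetic operators
    ("leftpar",  r"\("),
    ("rightpar", r"\)"),
]
KEYWORDS = {"print"}  # reserved words (hypothetical, one-word list)
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token, lexeme) pairs for one line of source code."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        pos = match.end()
        token, lexeme = match.lastgroup, match.group()
        if token == "skip":
            continue              # stripped out: never reaches the next stage
        if token == "ID" and lexeme in KEYWORDS:
            token = lexeme        # a reserved word is its own token
        pairs.append((token, lexeme))
    return pairs

print(tokenize("print( 4 * 5 )  # multiply"))
# [('print', 'print'), ('leftpar', '('), ('number', '4'), ('arith', '*'), ('number', '5'), ('rightpar', ')')]
```

Calling tokenize("Print 4*5") raises no error: it returns an ID pair for Print followed by the number and arith pairs, leaving the syntax mistakes for the next stage to find.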
Translators - Parsing
Look again at the parsing table:
Each lexeme is a component of the source code
Each token specifies what type of element the lexeme is
The lexeme and token make a pair
When parsing the source code, each lexeme follows a pattern
E.g. print has the pattern p,r,i,n,t, whilst the left parenthesis has only one component in its pattern
LEXEME   TOKEN      PATTERN
print    print      p,r,i,n,t
(        leftpar    (
4        number     4
*        arith      *
5        number     5
)        rightpar   )
Translators - Parsing
Where key/reserved words are concerned, the pattern for the lexeme-token pair must match exactly
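The exact-match rule can be illustrated with a short Python sketch. The names KEYWORDS, ID_PATTERN and classify_word are hypothetical, chosen for this example only; the ID pattern is the one given in the first table (a letter followed by letters and digits).

```python
import re

KEYWORDS = {"print"}  # reserved words; each must be matched exactly
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # letter, then letters and digits

def classify_word(lexeme):
    """Return the (token, lexeme) pair for a single word."""
    if lexeme in KEYWORDS:
        return (lexeme, lexeme)   # exact match: the keyword is its own token
    if ID_PATTERN.fullmatch(lexeme):
        return ("ID", lexeme)     # anything else word-like is an identifier
    raise ValueError(f"{lexeme!r} is neither a keyword nor an identifier")

print(classify_word("print"))       # ('print', 'print')
print(classify_word("Print"))       # ('ID', 'Print')
print(classify_word("userAnswer"))  # ('ID', 'userAnswer')
```

Because the comparison is case-sensitive, Print falls through to the identifier pattern rather than being treated as a misspelt keyword.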
If we add the line print(“The answer to 4 * 5 is”), the lexeme, token and pattern for the content in the quotes would be:
The translator knows that text surrounded by quotation marks has the token literal and that the quotation marks themselves should be ignored
Once the source code has been analysed by the lexer it is ready for the next stage – syntax analysis
LEXEME                     TOKEN     PATTERN
“The answer to 4 * 5 is”   literal   Sequence of characters inside the quotes but not including the quotes
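As a rough Python sketch of the literal pattern, the function below finds the first quoted string in a line and returns its pair with the quotation marks dropped. The names LITERAL and literal_pair are invented for this example, and accepting both curly and straight quotation marks is an assumption.

```python
import re

# Assumption: either curly (“ ”) or straight (") quotation marks may
# surround a literal. The group captures only the characters inside them.
LITERAL = re.compile(r'[“"]([^“”"]*)[”"]')

def literal_pair(line):
    """Return ('literal', text) for the first quoted string, or None."""
    match = LITERAL.search(line)
    if match is None:
        return None
    return ("literal", match.group(1))  # quotation marks are not included

print(literal_pair('print("The answer to 4 * 5 is")'))
# ('literal', 'The answer to 4 * 5 is')
```

Note that the 4, * and 5 inside the quotation marks are not given number or arith tokens: everything between the quotes belongs to the one literal.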