What on Earth?

LEXEME             TOKEN       PATTERN
print              print       p,r,i,n,t
(                  leftpar     (
4                  number      4
*                  arith       *
5                  number      5
)                  rightpar    )
userAnswer         ID          Letter followed by letters and digits
"Game of Jones"    literal     Any string between " and "

Translators

Translators – Module Knowledge Areas

• Types of translators and their use

• Lexical analysis

• Syntax analysis

• Code generation and optimisation

• Library routines

Translators – Module Knowledge Areas

• Lexical analysis

• Describe what happens during lexical analysis

So, we need to know:

• What is meant by lexical analysis

• What the key language is

• What part lexical analysis plays in the translation process

• How lexical analysis works

• How to identify the key aspects of lexical analysis

Translators – Lexical Analysis

So far we have investigated the link between source code, assembly code and machine code

In reality there are many more steps involved in getting code to run

There are a number of compilation phases:

• Parsing the source code (lexical analysis)

• Syntax analysis

• Type checking

• Machine code generation

• Code block sequencing

• Register allocation

• Optimisation

• Linking of libraries

Translators – Parsing

Consider this flow diagram of the translation process:

The source code is parsed ……

Parsing is analysis of the source code

Each line, e.g. print(4*5), is read

The compiler allocates a token type to each element (tokenises it), e.g. keyword/reserved word, variable, constant …


In the example print(4*5)

print and * are recognised: in its simplest form, print is a reserved word and * is the multiplication operator, which is given an arithmetic token

If the example were written Print 4*5, the lexer would not recognise the mistakes in syntax – that is not the job of lexical analysis

Because Print does not match the pattern for a keyword, the compiler will assume it is a variable (often given the token ID) and will assign that token to it

4 and 5 will have the token number (specifically integer) applied, and * will have an arithmetic token applied

In effect, a pair is created, made up of the token and the lexeme

White space (e.g. extra blank lines in the source code and spaces between characters) and comments are stripped out, as they are not needed to translate the code into machine code
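The steps above can be sketched as a very small lexer. This is a minimal illustration, not how any particular compiler does it: the token names come from the table in this section, while the regular expressions, the '#' comment style and the helper names (TOKEN_SPEC, MASTER, KEYWORDS, tokenize) are assumptions made for the example.

```python
import re

# Token names follow the table in this section; the regular expressions
# and the '#' comment style are assumptions made for this sketch.
TOKEN_SPEC = [
    ("skip",     r"\s+|#[^\n]*"),           # white space and comments
    ("number",   r"\d+"),                   # one or more digits
    ("word",     r"[A-Za-z][A-Za-z0-9]*"),  # keyword or identifier
    ("arith",    r"[+\-*/]"),
    ("leftpar",  r"\("),
    ("rightpar", r"\)"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

# Hypothetical keyword list; keywords must match exactly, including case.
KEYWORDS = {"print"}

def tokenize(source):
    """Return a list of (token, lexeme) pairs for the given source text."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"Unrecognised character {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # white space and comments are stripped out
        if kind == "word":
            # A word is a keyword only on an exact match, otherwise an ID.
            kind = lexeme if lexeme in KEYWORDS else "ID"
        pairs.append((kind, lexeme))
    return pairs

print(tokenize("print(4*5)"))
print(tokenize("Print 4*5"))
```

Note that the second call raises no error: Print simply becomes an ID, and the missing brackets are left for syntax analysis to report.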


Look again at the parsing table:

Each lexeme is a component of the source code

Each token specifies the type of data the lexeme is

The lexeme and token make a pair

When parsing the source code, each lexeme follows a pattern

For example, print has the pattern p,r,i,n,t, whilst the left parenthesis has only one character in its pattern

LEXEME    TOKEN       PATTERN
print     print       p,r,i,n,t
(         leftpar     (
4         number      4
*         arith       *
5         number      5
)         rightpar    )


Where key/reserved words are concerned, the pattern for the lexeme-token pair must match exactly
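One common way to write patterns down is as regular expressions. The sketch below is only an illustration: the token names are taken from the table, but the expressions themselves (for example, that a number is one or more digits, or that arith covers + - * /) and the names PATTERNS and matches are assumptions.

```python
import re

# Assumed patterns for each token in the table, written as regular expressions.
PATTERNS = {
    "print":    r"print",                 # keyword: must match exactly
    "leftpar":  r"\(",
    "rightpar": r"\)",
    "number":   r"[0-9]+",
    "arith":    r"[+\-*/]",
    "ID":       r"[A-Za-z][A-Za-z0-9]*",  # letter followed by letters and digits
}

def matches(token, lexeme):
    """Return True if the whole lexeme fits the pattern for the token."""
    return re.fullmatch(PATTERNS[token], lexeme) is not None

print(matches("print", "print"))    # True
print(matches("print", "Print"))    # False: the keyword pattern is case-sensitive
print(matches("ID", "userAnswer"))  # True
print(matches("ID", "4you"))        # False: an ID must start with a letter
```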

If we add the line print("The answer to 4 * 5 is"), the lexeme, token and pattern for the content in the quotes would be:

LEXEME                      TOKEN      PATTERN
"The answer to 4 * 5 is"    literal    Sequence of characters inside the quotes but not including the quotes

The translator knows that text surrounded by quotation marks has the token literal and that the quotation marks themselves should be ignored

Once the source code has been analysed by the lexer it is ready for the next stage – syntax analysis
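The literal pattern described in this section can be sketched with a regular expression. This is a simplified illustration that assumes straight double quotes and ignores escape sequences; STRING_LITERAL and literal_pair are hypothetical names chosen for the example.

```python
import re

# Assumed pattern: a double quote, any characters that are not a double
# quote, then a closing double quote. Escape sequences are not handled.
STRING_LITERAL = re.compile(r'"([^"]*)"')

def literal_pair(text):
    """Return ('literal', contents) for the first quoted string, or None."""
    match = STRING_LITERAL.search(text)
    if match is None:
        return None
    # group(1) holds the characters inside the quotes, not the quotes themselves.
    return ("literal", match.group(1))

print(literal_pair('print("The answer to 4 * 5 is")'))
# ('literal', 'The answer to 4 * 5 is')
```

The 4, * and 5 inside the quotes are not tokenised separately, because the whole quoted sequence is matched as one lexeme.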

Date post:	14-Jan-2016
Upload:	oliver-butler