Chapter 9
Compilers and Language Translation
The Compilation Process
Phase I: Lexical analysis
Phase II: Parsing
Phase III: Semantics and code generation
Phase IV: Code optimization
Introduction
High-level languages are more difficult to “translate” than assembly languages.
Assembly language and machine language are related 1-to-1.
The relationship between a high-level language and machine language is 1-to-many.
Compiler
The piece of software that translates high-level programming language code into machine language code.
Two distinct goals of a compiler:
• Correctness
• Efficient and concise code
Example: 2x₀ + 2x₁ + … + 2x₅₀₀₀₀
The Compilation Process
[Figure: Scanner → Parser → Code Generator → Optimizer → Object file]
Lexical Analysis
The compiler examines the individual characters in the source program and groups them into syntactical units, called tokens, that will be analyzed in succeeding stages.
Analogous to grouping letters into words prior to analyzing text.
Parsing
During this stage the sequence of tokens formed by the scanner is checked to see whether it is syntactically correct according to the rules of the programming language.
Equivalent to checking whether the words in the text form grammatically correct sentences.
Semantic Analysis and Code Generation
If the high-level language statement is structurally correct, then the compiler analyzes its meaning and generates the proper sequence of machine language instructions to carry out these actions.
Code Optimization
The compiler takes the generated code and sees whether it can be made more efficient, either by making it run faster or by having it occupy less memory.
Phase I: Lexical Analysis
Scanner, or lexical analyzer, groups input characters into tokens.
Example: a = b + 319 - delta;
The scanner discards nonessential characters, such as blanks and tabs, and then groups the remaining characters into high-level syntactic units such as symbols, numbers, and operators.
Token Classifications
Token type    Classification number
symbol        1
number        2
=             3
+             4
-             5
;             6
==            7
if            8
else          9
(             10
)             11
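The classification table can be turned into a small scanner. The following is only a minimal sketch in Python, not the scanner of any real compiler: the function name `scan`, the regular expression, and the rule that a symbol is a letter followed by letters, digits, or underscores are assumptions made for illustration. The classification numbers are the ones from the table.

```python
import re

# Classification numbers copied from the table above.
KEYWORDS = {"if": 8, "else": 9}
OPERATORS = {"==": 7, "=": 3, "+": 4, "-": 5, ";": 6, "(": 10, ")": 11}

# One alternative per token kind. "==" is tried before "=" so the longer
# operator wins. Leading blanks and tabs are skipped and discarded.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>==|[=+\-;()]))"
)

def scan(source):
    """Return a list of (token text, classification number) pairs."""
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"cannot scan input at position {pos}")
        if match.group("word"):
            text = match.group("word")
            # Keywords have their own numbers; any other word is a symbol (1).
            tokens.append((text, KEYWORDS.get(text, 1)))
        elif match.group("number"):
            tokens.append((match.group("number"), 2))
        else:
            text = match.group("op")
            tokens.append((text, OPERATORS[text]))
        pos = match.end()
    return tokens

print(scan("a = b + 319 - delta;"))
```

For the example statement, the scanner produces eight tokens: the symbols a, b, and delta with class 1, the number 319 with class 2, and the operators =, +, -, and ; with classes 3, 4, 5, and 6.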
Phase II: Parsing
During the parsing phase, a compiler determines whether the tokens recognized by the scanner fit together in a grammatically meaningful way.
Analogous to the operation of “diagramming a sentence”.
Example
To prove that the sequence of words:
The man bit the dog
is a correctly formed sentence.
Another Example
The man bit the
Programming Language Example
Statement: a = b + c
Parse Tree
The structure shown in the previous example is called a parse tree.
It starts from the individual tokens a, =, b, +, c and shows how these tokens can be grouped together into predefined grammatical categories such as <symbol>, <addition operator> and <expression> until the desired goal is reached (in this case, <assignment statement>).
Grammars, Languages and BNF
How does a parser know how to construct the parse tree?
The parser must be given a formal description of the syntax, the grammatical structure, of the language that it is going to analyze.
The most widely used notation for representing the syntax of a programming language is called BNF, an acronym for Backus-Naur form.
BNF
The syntax of a language is specified as a set of rules, also called productions.
The entire collection of rules is called a grammar.
BNF rule:
left-hand side ::= “definition”
BNF Example
<assignment statement> ::= <symbol> = <expression>
The rule says that the syntactical construct called <assignment statement> is defined as a <symbol> followed by the token = followed by the syntactical construct called <expression>.
Terminals/Nonterminals
BNF uses two types of objects on the right-hand side of a production:
• Terminals: actual tokens of the language recognized and returned by a scanner.
• Nonterminals: intermediate grammatical categories used to help explain and organize the language.
Goal Symbol
The goal symbol is the highest-level nonterminal.
When the goal symbol has been produced, the parser has finished building the tree, and the statements have been successfully parsed.
The collection of all statements that can be successfully parsed is called the language defined by a grammar.
Meta-symbols
Meta-symbol: used to describe the characteristics of another language.
BNF has five meta-symbols:
• <
• >
• ::=
• | : OR
  Ex: <digit> ::= 0|1|2|3|4|5|6|7|8|9
• Λ : null string
  Ex: <signed integer> ::= <sign><number>
      <sign> ::= +|-|Λ
Fundamental Rule of Parsing
If, by repeated applications of the rules of the grammar, a parser can convert the sequence of input tokens into the goal symbol, then that sequence of tokens is a syntactically valid statement of the language.
Example
A three-rule grammar
1. <sentence> ::= <noun><verb>
2. <noun> ::= bees|dogs
3. <verb> ::= buzz|bite
• Example 1: Dogs bite.
• Example 2: Bees dogs.
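The three-rule grammar is small enough to check by hand, but a short recognizer makes the fundamental rule concrete. This is a minimal Python sketch; the function name `is_sentence` and the decision to ignore capitalization and the final period are assumptions, not part of the grammar.

```python
NOUNS = {"bees", "dogs"}   # rule 2: <noun> ::= bees|dogs
VERBS = {"buzz", "bite"}   # rule 3: <verb> ::= buzz|bite

def is_sentence(text):
    """Rule 1: <sentence> ::= <noun><verb>."""
    words = text.lower().rstrip(".").split()
    return len(words) == 2 and words[0] in NOUNS and words[1] in VERBS

print(is_sentence("Dogs bite."))  # True: <noun><verb> reduces to <sentence>
print(is_sentence("Bees dogs."))  # False: the second word is not a <verb>
```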
Another Example
Grammar for a simplified assignment statement
1. <assignment statement> ::= <variable> = <expression>
2. <expression> ::= <variable> | <variable> + <variable>
3. <variable> ::= x|y|z
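One way to see what this grammar accepts is to write one function per rule. The Python sketch below is illustrative only: the function names are invented, and it assumes every token is a single character so that no separate scanner is needed.

```python
VARIABLES = {"x", "y", "z"}  # rule 3: <variable> ::= x|y|z

def is_variable(tokens):
    return len(tokens) == 1 and tokens[0] in VARIABLES

def is_expression(tokens):
    # Rule 2: <expression> ::= <variable> | <variable> + <variable>
    if is_variable(tokens):
        return True
    return (len(tokens) == 3 and tokens[1] == "+"
            and is_variable(tokens[:1]) and is_variable(tokens[2:]))

def is_assignment(statement):
    # Rule 1: <assignment statement> ::= <variable> = <expression>
    tokens = [ch for ch in statement if not ch.isspace()]
    return (len(tokens) >= 3 and is_variable(tokens[:1])
            and tokens[1] == "=" and is_expression(tokens[2:]))

print(is_assignment("x = y + z"))      # True
print(is_assignment("x = x + y + z"))  # False: rule 2 allows at most one +
```

The second call shows the limitation of this grammar: a statement with two additions cannot be reduced to the goal symbol.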
Generated Parse Tree
Wrong Path
How to parse?
Parsing is a complex sequence of applying rules, building grammatical constructs, and seeing whether things are moving toward the correct answer (the goal symbol). If not, the parser must “undo” the rule just applied and try another.
Look-ahead parsing algorithm: “looking down the road” a few tokens to see what would happen if a certain choice were made.
Example
Not possible to build a parse tree with the grammar.
Major Challenge
Design a grammar that:
• Includes every valid statement that we want to be in the language
• Excludes every invalid statement that we do not want to be in the language
Assignment Statement (2nd try)
1. <assignment statement> ::= <variable> = <expression>
2. <expression> ::= <variable> | <expression> + <expression> (recursive definition)
3. <variable> ::= x|y|z
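The recursive rule 2 maps naturally onto a recursive function. The following Python sketch is one possible recognizer, not an efficient parsing algorithm: it simply tries every + as the point where an expression splits into two smaller expressions. The function names and the single-character tokens are assumptions.

```python
VARIABLES = {"x", "y", "z"}  # rule 3: <variable> ::= x|y|z

def is_expression(tokens):
    """Rule 2: <expression> ::= <variable> | <expression> + <expression>."""
    if len(tokens) == 1:
        return tokens[0] in VARIABLES
    # Try every "+" as the split between a left and a right expression.
    return any(
        tok == "+" and is_expression(tokens[:i]) and is_expression(tokens[i + 1:])
        for i, tok in enumerate(tokens)
    )

def is_assignment(statement):
    """Rule 1: <assignment statement> ::= <variable> = <expression>."""
    tokens = [ch for ch in statement if not ch.isspace()]
    return (len(tokens) >= 3 and tokens[0] in VARIABLES
            and tokens[1] == "=" and is_expression(tokens[2:]))

print(is_assignment("x = x + y + z"))  # True: the recursion handles any number of +
print(is_assignment("x = x +"))        # False: nothing follows the last +
```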
Resulting Parse Tree
Using Recursive Definition
Validity vs. Ambiguity
It is possible to construct two parse trees for x=x+y+z using the 2nd grammar.
Two different meanings:
x=(x+y)+z
x=x+(y+z)
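The ambiguity can be demonstrated by enumerating every parse tree instead of stopping at the first one. The Python sketch below is illustrative: it represents a tree as a nested tuple (left, "+", right), which is an assumed encoding, and covers only the <expression> rule of the grammar.

```python
VARIABLES = {"x", "y", "z"}

def parse_trees(tokens):
    """Return all parse trees for tokens under
    <expression> ::= <variable> | <expression> + <expression>."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] in VARIABLES else []
    trees = []
    for i, tok in enumerate(tokens):
        if tok == "+":
            # Combine every tree for the left part with every tree for the right.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, "+", right))
    return trees

for tree in parse_trees(list("x+y+z")):
    print(tree)
```

The loop prints two trees for x+y+z, one grouping the operands as x+(y+z) and one as (x+y)+z, which correspond to the two meanings listed above.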
If-else grammar
Parse Tree
Phase III: Semantics and Code Generation
1. <sentence> ::= <noun><verb>
2. <noun> ::= bees|dogs
3. <verb> ::= buzz|bite
Possible combinations:
• Dogs bite.
• Dogs buzz.
• Bees bite.
• Bees buzz.
Not all combinations make sense.
Semantics and Code Generation
A compiler examines the semantics of a programming language statement. It analyzes the meaning of the tokens and tries to understand the actions they perform.
If the statement is meaningless, it is semantically rejected. Otherwise it is translated into machine language.
Example
The statement
sum=a+b;
is syntactically correct.
But what if the variables are defined as follows:
char a;
double b;
int sum;
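A semantic check for this statement needs the declared types, which the parser alone does not look at. The Python sketch below assumes a hypothetical strict rule under which both operands and the target must share one type. This rule is an assumption chosen for illustration; a real C compiler would instead insert implicit conversions. The names `symbol_table` and `check_assignment` are also invented.

```python
# Declarations from the example: variable name -> declared type.
symbol_table = {"a": "char", "b": "double", "sum": "int"}

def check_assignment(target, left, right, symbols):
    """Return a list of semantic problems for: target = left + right.

    Assumed rule (hypothetical): operands and target must share one type.
    """
    problems = [f"{name} is not declared"
                for name in (target, left, right) if name not in symbols]
    if problems:
        return problems
    if symbols[left] != symbols[right]:
        problems.append(
            f"operands {left} ({symbols[left]}) and {right} ({symbols[right]}) differ in type")
    if symbols[target] not in (symbols[left], symbols[right]):
        problems.append(
            f"target {target} ({symbols[target]}) matches neither operand type")
    return problems

for problem in check_assignment("sum", "a", "b", symbol_table):
    print(problem)
```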
Semantic Records
Each nonterminal symbol is associated with a semantic record, a data structure that stores information about a nonterminal, such as the actual name of the object and its data type.
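A semantic record can be pictured as a small structure attached to a node of the parse tree. The Python sketch below is one possible layout; the class name `SemanticRecord` and its field names are assumptions, and a real compiler would typically store more information.

```python
from dataclasses import dataclass

@dataclass
class SemanticRecord:
    nonterminal: str  # grammatical category, e.g. "<variable>"
    name: str         # actual name of the object
    data_type: str    # data type of the object

# Records for two variables, as they might look after their declarations are seen.
a = SemanticRecord("<variable>", "a", "char")
b = SemanticRecord("<variable>", "b", "double")
print(a)
print(b)
```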
Semantic Records (II)
Grows gradually.
Another Situation
Two-Stage Process
Semantic analysis: a pass over the parse tree to determine whether all branches of the tree are semantically valid.
Code generation: the compiler makes a second pass over the parse tree to produce the translated code.
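The two passes can be sketched on a tiny parse tree. Everything in the Python example below is an assumption made for illustration: the tuple encoding of the tree, the semantic rule (every name must be declared), and the LOAD/ADD/STORE instruction names of a hypothetical one-accumulator machine. The generator handles only a single variable or a single addition on the right-hand side.

```python
def names_in(tree):
    """Yield every variable name that appears in the tree."""
    if isinstance(tree, str):
        yield tree
    else:
        _, left, right = tree
        yield from names_in(left)
        yield from names_in(right)

def semantic_pass(tree, declared):
    """Pass 1: every name used in the tree must have been declared."""
    return all(name in declared for name in names_in(tree))

def generate(tree):
    """Pass 2: emit instructions for ("=", target, expression)."""
    _, target, expr = tree
    if isinstance(expr, str):
        code = [f"LOAD {expr}"]
    else:
        _, left, right = expr
        code = [f"LOAD {left}", f"ADD {right}"]
    return code + [f"STORE {target}"]

tree = ("=", "a", ("+", "b", "c"))  # parse tree for: a = b + c
if semantic_pass(tree, {"a", "b", "c"}):
    print("\n".join(generate(tree)))
```

Code is generated only after the first pass succeeds, so a semantically invalid statement never reaches the second pass.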
Example
Example (cont’d)
Example (cont’d)
Example (cont’d)
Example (cont’d)
Code Optimization
To make the code more efficient:
• Local optimization
• Global optimization
Different from programmer optimization with compiler tools such as:
• Visual development environments
• On-line debuggers
• Reusable code libraries
Local Optimization
Look at a very small block of instructions and try to improve it.
Possible approaches
• Constant evaluation: x=1+1;
• Strength reduction: x=x*2;
• Eliminating unnecessary operations
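The three approaches can be sketched as rewrites on a single statement. In the Python example below, a statement is an assumed tuple (target, left, operator, right), and the function name `optimize` is invented. Rewriting x*2 as x+x relies on the assumption that addition is cheaper than multiplication on the target machine, and the "add 0 / multiply by 1" case is a hypothetical instance of an unnecessary operation, not one taken from the slides.

```python
def optimize(stmt):
    """Apply one local optimization to (target, left, op, right), if any fits."""
    target, left, op, right = stmt
    # Constant evaluation: both operands are known at compile time.
    if isinstance(left, int) and isinstance(right, int) and op in ("+", "*"):
        value = left + right if op == "+" else left * right
        return (target, value)
    # Strength reduction: replace multiplication by 2 with an addition.
    if op == "*" and right == 2:
        return (target, left, "+", left)
    # Unnecessary operation: adding 0 or multiplying by 1 changes nothing.
    if (op == "+" and right == 0) or (op == "*" and right == 1):
        return (target, left)
    return stmt

print(optimize(("x", 1, "+", 1)))    # constant evaluation
print(optimize(("x", "x", "*", 2)))  # strength reduction
print(optimize(("y", "x", "+", 0)))  # unnecessary operation removed
```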
Global Optimization
Look at large segments of the program and decide how to improve performance.
A much harder problem.