Date post: | 08-Aug-2018 |
Upload: | janhavi-vishwanath |
8/22/2019 1 Intro Crypto
Chapter 1 2301373: Introduction 1
Introduction
What is a Compiler?
A compiler is a computer program that translates a program in a source language into an equivalent program in a target language.
A source program/code is a program/code written in the source language, which is usually a high-level language.
A target program/code is a program/code written in the target language, which is often a machine language or an intermediate code.
[Figure: source program -> compiler -> target program; the compiler also outputs error messages]
Process of Compiling
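Scanner: stream of characters -> stream of tokens
Parser: stream of tokens -> parse/syntax tree
Semantic analyzer: parse/syntax tree -> annotated tree
Intermediate code generator: annotated tree -> intermediate code
Code optimization: intermediate code -> intermediate code
Code generator: intermediate code -> target code
Code optimization: target code -> target code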
Some Data Structures
Symbol table
Literal table
Parse tree
Symbol Table
Identifiers are names of variables, constants, functions, data types, etc.
The symbol table stores information associated with identifiers.
The information stored can differ between kinds of identifiers.
For variables: name, type, address, size (for arrays), etc.
For functions: name, type of return value, parameters, address, etc.
Symbol Table (cont'd)
The symbol table is accessed in every phase of a compiler.
The scanner, parser, and semantic analyzer put names of identifiers in the symbol table.
The semantic analyzer stores more information (e.g. data types) in the table.
The intermediate code generator, code optimizer, and code generator use information in the symbol table to generate appropriate code.
Symbol tables are mostly implemented as hash tables for efficiency.
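A minimal sketch of a hash-based symbol table, assuming Python's built-in dict as the hash table. The class names Symbol and SymbolTable and the field layout are hypothetical, chosen only to mirror the information listed on these slides.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """One symbol-table entry; the fields are illustrative, not a fixed layout."""
    name: str
    kind: str               # e.g. "variable" or "function"
    type: str = "unknown"   # filled in later, e.g. by the semantic analyzer
    address: int = -1       # -1 means "not assigned yet" in this sketch
    params: tuple = ()      # parameter names, used for functions only


class SymbolTable:
    def __init__(self):
        self._entries = {}  # Python dicts are hash tables

    def insert(self, name, kind):
        # Early phases record the name; an existing entry is kept as-is.
        return self._entries.setdefault(name, Symbol(name, kind))

    def lookup(self, name):
        # Returns None when the identifier has not been recorded.
        return self._entries.get(name)


table = SymbolTable()
table.insert("count", "variable")     # done while scanning/parsing
table.lookup("count").type = "int"    # added during semantic analysis
```

Later phases would call lookup to read the type or address when generating code.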
Literal table
The literal table stores constants and strings used in the program.
It reduces memory size by reusing constants and strings.
It can be combined with the symbol table.
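The reuse of constants can be sketched as follows. This is only an illustration: the class name LiteralTable and the method name intern are assumptions, not taken from the slides.

```python
class LiteralTable:
    """Stores each distinct constant or string once and hands out its index."""

    def __init__(self):
        self._index = {}   # (type, value) -> position in self.values
        self.values = []

    def intern(self, value):
        # Keying on the type as well keeps 1 and 1.0 as separate literals.
        key = (type(value), value)
        if key not in self._index:
            self._index[key] = len(self.values)
            self.values.append(value)
        return self._index[key]


literals = LiteralTable()
a = literals.intern("hello")
b = literals.intern(3.14)
c = literals.intern("hello")   # reuses the entry created for `a`
```

Both occurrences of "hello" share one slot, so the table holds two values rather than three.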
Parse tree
A parse tree is a dynamically allocated, pointer-based structure.
Information for the different data types related to parse trees needs to be stored somewhere.
Nodes are variant records, storing information for different types of data.
Nodes store pointers to information stored in other data structures, e.g. the symbol table.
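One way to picture variant-record nodes is a small set of node classes, one per variant. The sketch below assumes three hypothetical node kinds (Num, Var, BinOp) and a plain dict standing in for the symbol table; a Var node keeps only the name, which acts as the pointer into that table.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str          # key into the symbol table, not the information itself


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Var, BinOp]

# Stand-in symbol table for this sketch.
symbols = {"x": {"type": "int"}}

# Tree for the expression x + 2 * 3.
tree = BinOp("+", Var("x"), BinOp("*", Num(2), Num(3)))


def count_nodes(node):
    """Walk the tree, dispatching on the variant of each node."""
    if isinstance(node, BinOp):
        return 1 + count_nodes(node.left) + count_nodes(node.right)
    return 1
```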
Scanning
A scanner reads a stream of characters and puts them together into meaningful (with respect to the source language) units called tokens.
It produces a stream of tokens for the next phase of the compiler.
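A minimal scanner sketch, assuming a toy source language with integers, identifiers, and a few single-character operators. The token names and the function name scan are assumptions made for the example.

```python
import re

# Token classes for a toy language; order matters when patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Group a stream of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":      # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, scan("x = y + 42") yields the tokens IDENT x, OP =, IDENT y, OP +, NUMBER 42.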
Parsing
A parser gets a stream of tokens from the scanner, and determines if the syntax (structure) of the program is correct according to the (context-free) grammar of the source language.
Then, it produces a data structure, called a parse tree or an abstract syntax tree, which describes the syntactic structure of the program.
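As an illustration, here is a small recursive-descent parser for arithmetic expressions. It is a sketch under assumptions: the grammar (expr, term, factor), the (kind, text) token format, and the tuple-based tree shape are all invented for this example.

```python
def parse(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple syntax tree.

    Assumed grammar:
        expr   -> term (("+" | "-") term)*
        term   -> factor (("*" | "/") factor)*
        factor -> NUMBER | IDENT | "(" expr ")"
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, text = advance()
        if kind == "NUMBER":
            return ("num", int(text))
        if kind == "IDENT":
            return ("var", text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree


# Tokens for the expression a + 2 * (b - 1).
example_tokens = [
    ("IDENT", "a"), ("OP", "+"), ("NUMBER", "2"), ("OP", "*"),
    ("OP", "("), ("IDENT", "b"), ("OP", "-"), ("NUMBER", "1"), ("OP", ")"),
]
example_tree = parse(example_tokens)
```

A syntactically incorrect token stream, such as one ending in an operator, raises SyntaxError instead of producing a tree.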
Semantic analysis
It gets the parse tree from the parser together with information about some syntactic elements.
It determines if the semantics or meaning of the program is correct.
This part deals with static semantics:
the semantics of programs that can be checked by reading off from the program only,
the syntax of the language which cannot be described by a context-free grammar.
Mostly, a semantic analyzer does type checking.
It modifies the parse tree in order to get (static) semantically correct code.
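A toy type checker can make this concrete. The sketch below assumes a tuple-based expression tree with four hypothetical node kinds ("int", "str", "var", "+") and a dict mapping declared names to types. It checks two static properties: identifiers must be declared, and both operands of + must have the same type.

```python
def check(node, symbols):
    """Return the static type of an expression tree, or raise an error."""
    kind = node[0]
    if kind == "int":
        return "int"
    if kind == "str":
        return "str"
    if kind == "var":
        # Declared-before-use cannot be expressed in a context-free grammar.
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return symbols[node[1]]
    if kind == "+":
        left = check(node[1], symbols)
        right = check(node[2], symbols)
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")


declared = {"x": "int"}
result_type = check(("+", ("var", "x"), ("int", 1)), declared)
```

Nothing is executed here; the checker only reads the program, which is what makes these checks static.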
Intermediate code generation
An intermediate code generator takes a parse tree from the semantic analyzer and generates a program in the intermediate language.
In some compilers, a source program is translated into an intermediate code first and then the intermediate code is translated into the target language.
In other compilers, a source program is translated directly into the target language.
Intermediate code generation (cont'd)
Using intermediate code is beneficial when compilers that translate a single source language to many target languages are required.
The front-end of a compiler (scanner to intermediate code generator) can be used for every compiler.
A different back-end (code optimizer and code generator) is required for each target language.
One popular intermediate code is three-address code.
A three-address code instruction is in the form of x = y op z.
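The form x = y op z can be illustrated by flattening an expression tree into three-address instructions. This is a sketch: the tuple tree shape, the temporary names t1, t2, ..., and the function name to_three_address are assumptions.

```python
import itertools


def to_three_address(tree):
    """Flatten a nested-tuple expression tree into 'x = y op z' instructions."""
    code = []
    temps = itertools.count(1)

    def visit(node):
        if node[0] in ("num", "var"):
            return str(node[1])          # leaves need no instruction
        op, left, right = node
        y = visit(left)
        z = visit(right)
        x = f"t{next(temps)}"            # fresh temporary for this result
        code.append(f"{x} = {y} {op} {z}")
        return x

    result = visit(tree)
    return code, result


# Tree for the expression a + 2 * b.
code, result = to_three_address(("+", ("var", "a"), ("*", ("num", 2), ("var", "b"))))
# code is ["t1 = 2 * b", "t2 = a + t1"] and result is "t2"
```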
Code optimization
Code optimization replaces an inefficient sequence of instructions with a better sequence of instructions.
It is sometimes called code improvement.
Code optimization can be done:
after semantic analysis, performed on a parse tree
after intermediate code generation, performed on intermediate code
after code generation, performed on target code
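Constant folding is one simple improvement that can be performed on a tree. The slides do not name a specific optimization, so this choice is an assumption made for illustration, as are the tuple tree shape and the function name fold.

```python
import operator

# Division is left out of this sketch to avoid folding a division by zero.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace operations on two constants with the computed constant."""
    if node[0] in ("num", "var"):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[op](left[1], right[1]))
    return (op, left, right)


# x + 2 * 3 becomes x + 6, saving one multiplication at run time.
folded = fold(("+", ("var", "x"), ("*", ("num", 2), ("num", 3))))
```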
Code generation
A code generator takes either an intermediate code or a parse tree and produces a target program.
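A sketch of a code generator that takes three-address code as input. The target is a hypothetical single-accumulator machine; the mnemonics LOAD, ADD, SUB, MUL, and STORE are invented for this example and do not belong to any real instruction set.

```python
# Hypothetical mnemonics for an imaginary accumulator machine.
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(three_address_code):
    """Translate 'x = y op z' instructions into accumulator-style target code."""
    target = []
    for instruction in three_address_code:
        x, _, y, op, z = instruction.split()
        target.append(f"LOAD {y}")              # accumulator := y
        target.append(f"{MNEMONIC[op]} {z}")    # accumulator := accumulator op z
        target.append(f"STORE {x}")             # x := accumulator
    return target


target_code = generate(["t1 = 2 * b", "t2 = a + t1"])
```

The output is deliberately naive; removing the redundant STORE/LOAD pairs it produces would be a job for target-code optimization.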
Error Handling
Errors can be found in every phase of compilation.
Errors found during compilation are called static (or compile-time) errors.
Errors found during execution are called dynamic (or run-time) errors.
Compilers need to detect, report, and recover from errors found in source programs.
Error handlers are different in different phases of a compiler.
Cross Compiler
A cross compiler is a compiler which generates target code for a different machine from the one on which the compiler runs.
A host language is a language in which the compiler is written.
T-diagram
[Figure: T-diagram labeled S (source language), T (target language), and H (host language)]
Cross compilers are used very often in practice.
Cross Compilers (cont'd)
If we want a compiler from language A to language B on a machine with language E, we can:
write one with E
write one with D if we have a compiler from D to E on some machine
This is better than the former approach if D is a high-level language but E is a machine language.
write one from G to B with E if we have a compiler from A to G written in E
[Figures: T-diagrams for the three options above; labels A, B, D, E, G, and ?]
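The second and third options can be modeled by treating each T-diagram as a (source, target, host) triple. This is a sketch: the names Compiler, compile_compiler, and chain are hypothetical, and the host "M" of the D-to-E compiler is an assumption standing in for "some machine".

```python
from collections import namedtuple

# One T-diagram: translates `source` to `target`, and is written in `host`.
Compiler = namedtuple("Compiler", "source target host")


def compile_compiler(program, translator):
    """Run the compiler `program` through `translator`; only its host changes."""
    if program.host != translator.source:
        raise ValueError("translator does not accept the program's host language")
    return program._replace(host=translator.target)


def chain(first, second):
    """Use two compilers with the same host back to back as one translation."""
    if first.target != second.source or first.host != second.host:
        raise ValueError("compilers cannot be chained")
    return Compiler(first.source, second.target, first.host)


# Option 2: write A->B in D, then translate it with a D->E compiler.
a_to_b_in_d = Compiler("A", "B", "D")
d_to_e = Compiler("D", "E", "M")        # "M" is a placeholder host
a_to_b_in_e = compile_compiler(a_to_b_in_d, d_to_e)

# Option 3: an existing A->G compiler in E plus a new G->B compiler in E.
a_to_b_via_g = chain(Compiler("A", "G", "E"), Compiler("G", "B", "E"))
```

Both routes end with a compiler from A to B that runs on E.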
Porting
Porting: constructing a compiler between a source and a target language in one host language from a compiler written in another host language.
[Figure: T-diagrams illustrating porting; labels A, K, and H]
Bootstrapping
If we have to implement, from scratch, a compiler from a high-level language A to a machine language, which is also the host language, there are two approaches:
direct method
bootstrapping
[Figure: T-diagrams for the direct method and for bootstrapping; labels A, A1, A2, A3, and H]
Cousins of Compilers
Linkers
Loaders
Interpreters
Assemblers
History (1930s-40s)
1930s
John von Neumann invented the concept of the stored-program computer.
Alan Turing defined the Turing machine and computability.
1940s
Many electromechanical, stored-program computers were constructed.
ABC (Atanasoff-Berry Computer) at Iowa
Z1-4 (by Zuse) in Germany
ENIAC (programmed by a plug board)
History (1950s)
Many electronic, stored-program computers were designed.
EDVAC (by von Neumann)
ACE (by Turing)
Programs were written in machine languages.
Later, programs were written in assembly languages instead.
Assemblers translate symbolic code and memory addresses to machine code.
John Backus developed FORTRAN (no recursive calls) and the FORTRAN compiler.
Noam Chomsky studied the structure of languages and classified them into classes called the Chomsky hierarchy.
[Figure: machine code example]
0A 1F 83 90 4B
(op code, address, ...)
[Figure: assembly code example]
    LDI B, 4
    LDI C, 3
    LDI A, 0
ST: ADI A, C
    DEC B
    JNZ B, ST
    STO 0XF0, A
[Figure label: Grammar]
History (1960s)
Recursive-descent parsing was introduced.
Naur designed Algol 60, Pascal's ancestor, which allows recursive calls.
Backus-Naur form (BNF) was used to describe Algol 60.
LL(1) parsing was proposed by Lewis and Stearns.
General LR parsing was invented by Knuth.
SLR parsing was developed by DeRemer.
History (1970s)
LALR parsing was developed by DeRemer.
Aho and Ullman founded the theory of LR parsing techniques.
Yacc (Yet Another Compiler Compiler) was developed by Johnson.
Type inference was studied by Milner.
Reading Assignment
Louden, K.C., Compiler Construction: Principles and Practice, PWS Publishing, 1997. Chapter 1.