COMP 524, Spring 2014 Bryan Ward

Based in part on slides and notes by J. Erickson, S. Krishnan, B. Brandenburg, S. Olivier, A. Block and others

Lexical Analysis

The Big Picture

• Scanner (lexical analysis): character stream ➡ token stream
• Parser (syntax analysis): token stream ➡ parse tree
• Semantic analysis & intermediate code generation: parse tree ➡ abstract syntax tree
• Machine-independent optimization (optional): abstract syntax tree ➡ modified intermediate form
• Target code generation: modified intermediate form ➡ machine language
• Machine-specific optimization (optional): machine language ➡ modified target language

Lexical analysis: grouping consecutive characters that “belong together.”

Turn the stream of individual characters into a stream of tokens that have individual meaning.

Source Program

• The compiler reads the program from a file.
➡ Input as a character stream.

Source file: … = 1 2 3 - 7 5 * f o o ; …

Compilation requires analysis of program structure.
➡ Identify subroutines, classes, methods, etc.
➡ Thus, the first step is to find units of meaning.

Tokens

Not every character has an individual meaning.
➡ In Java, a ‘+’ can have two interpretations:
‣ A single ‘+’ means addition.
‣ A ‘+’ ‘+’ sequence means increment.
➡ A sequence of characters that has an atomic meaning is called a token.
➡ The compiler must identify all input tokens.

Human Analogy: To understand the meaning of an English sentence, we do not look at individual characters. Rather, we look at individual words.

Human word = Program token

Source file: … = 1 2 3 - 7 5 * f o o ; …

Tokens in this character stream:
• “=” ➡ Operator: Assignment
• “123” ➡ Integer Literal
• “-” ➡ Operator: Minus
• “75” ➡ Integer Literal
• “*” ➡ Operator: Multiplication
• “foo” ➡ Identifier: foo
• “;” ➡ Statement separator/terminator
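
The token labels above can be reproduced with a small scanner. The following is a minimal Python sketch, not the course's implementation: the token names, the `TOKEN_SPEC` table, the `tokenize` function, and the input string `= 123 - 75 * foo;` (the characters of the source-file figure written without the gaps) are assumptions chosen for illustration.

```python
import re

# Hypothetical token classes for the slide's example; the names are illustrative.
TOKEN_SPEC = [
    ("INT_LITERAL", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INCREMENT", r"\+\+"),   # listed before PLUS so that "++" is one token
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("ASSIGN", r"="),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),         # whitespace separates tokens but is not one
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a character stream into a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, lexeme in tokenize("= 123 - 75 * foo;"):
        print(kind, lexeme)
```

Note that the scanner decides between `+` and `++` purely from the characters it sees; whether a `+` is a legal addition at that point is left to the parser.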

Lexical vs. Syntactical Analysis

Why have a separate lexical analysis phase?

• In theory, token discovery (lexical analysis) could be done as part of the structure discovery (syntactical analysis, parsing).
• However, this is impractical.
• It is much easier (and much more efficient) to express the syntax rules in terms of tokens.
• Thus, lexical analysis is made a separate step because it greatly simplifies the subsequently performed syntactical analysis.

Example: Java Language Specification

Lexical Structure:

3.11 Separators

The following nine ASCII characters are the separators (punctuators):

Separator: one of
    ( ) { } [ ] ; , .

3.12 Operators

The following 37 tokens are the operators, formed from ASCII characters:

Operator: one of
    = > < ! ~ ? :
    == <= >= != && || ++ --
    + - * / & | ^ % << >> >>>
    += -= *= /= &= |= ^= %= <<= >>= >>>=

Syntactical Structure:

A variable that is declared final cannot be decremented (unless it is a definitely unassigned (§16) blank final variable (§4.12.4)), because when an access of such a final variable is used as an expression, the result is a value, not a variable. Thus, it cannot be used as the operand of a postfix decrement operator.

15.15 Unary Operators

The unary operators include +, -, ++, --, ~, !, and cast operators. Expressions with unary operators group right-to-left, so that -~x means the same as -(~x).

UnaryExpression:
    PreIncrementExpression
    PreDecrementExpression
    + UnaryExpression
    - UnaryExpression
    UnaryExpressionNotPlusMinus

PreIncrementExpression:
    ++ UnaryExpression

PreDecrementExpression:
    -- UnaryExpression

UnaryExpressionNotPlusMinus:
    PostfixExpression
    ~ UnaryExpression
    ! UnaryExpression
    CastExpression

The following productions from §15.16 are repeated here for convenience:

CastExpression:
    ( PrimitiveType ) UnaryExpression
    ( ReferenceType ) UnaryExpressionNotPlusMinus

15.15.1 Prefix Increment Operator ++

A unary expression preceded by a ++ operator is a prefix increment expression. The result of the unary expression must be a variable of a type that is convertible (§5.1.8) to a numeric type, or a compile-time error occurs. The type of the prefix increment expression is the type of the variable. The result of the prefix increment expression is not a variable, but a value.

Token Specification: These strings mean something, but knowledge of the exact meaning is not required to identify them.

Meaning is given by where they can occur in the program (grammar) and by the language semantics.
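
The point that tokens can be identified without knowing their meaning can be illustrated with the operator and separator lists quoted from the specification. The sketch below is an illustration only, not the Java compiler's scanner: the function name `split_symbols` is hypothetical, and it assumes a longest-match rule (the longest symbol that fits at the current position is taken), so that `>>>=` comes out as one token rather than four.

```python
# The 37 operators and 9 separators listed in the quoted excerpt.
OPERATORS = (
    "= > < ! ~ ? : "
    "== <= >= != && || ++ -- "
    "+ - * / & | ^ % << >> >>> "
    "+= -= *= /= &= |= ^= %= <<= >>= >>>="
).split()
SEPARATORS = "( ) { } [ ] ; , .".split()

# Try longer symbols first so that the longest match wins.
SYMBOLS = sorted(OPERATORS + SEPARATORS, key=len, reverse=True)


def split_symbols(text):
    """Split a string of operator/separator characters into tokens.

    Only the spelling is used here; what each token means is left to the
    grammar and the language semantics.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for symbol in SYMBOLS:
            if text.startswith(symbol, pos):
                kind = "Operator" if symbol in OPERATORS else "Separator"
                tokens.append((kind, symbol))
                pos += len(symbol)
                break
        else:
            raise ValueError(f"no operator or separator at position {pos}")
    return tokens
```

For example, `split_symbols("+++")` yields `++` followed by `+`, regardless of whether that sequence makes sense to the parser.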

Lexical Analysis

• The need to identify tokens raises two questions.
➡ How can we specify the tokens of a language?
➡ How can we recognize tokens in a character stream?

• Token specification (language design and specification): regular expressions
• Token recognition (language implementation): deterministic finite automata (DFA)
• DFA construction (several steps) leads from the regular expressions to the DFA.
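
To make the recognition side concrete, here is a minimal table-driven DFA in Python. It is only a sketch under stated assumptions: the token it recognizes (unsigned decimal integers without leading zeros, i.e. the regular expression 0 | (1|...|9)(0|...|9)*), the state names, and the function names are made up for illustration and do not come from the slides.

```python
# Hypothetical DFA for the regular expression 0 | (1|...|9)(0|...|9)*.
START, ZERO, NUMBER = "start", "zero", "number"
ACCEPTING = {ZERO, NUMBER}


def char_class(ch):
    """Map an input character to the symbol class used in the table."""
    if ch == "0":
        return "zero"
    if ch in "123456789":
        return "nonzero"
    return "other"


# Transition table: (state, symbol class) -> next state.
# Missing entries mean "no transition", i.e. the input is rejected.
TRANSITIONS = {
    (START, "zero"): ZERO,
    (START, "nonzero"): NUMBER,
    (NUMBER, "zero"): NUMBER,
    (NUMBER, "nonzero"): NUMBER,
}


def accepts(text):
    """Run the DFA over text; accept only if it ends in an accepting state."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

The DFA reads each character exactly once and keeps only the current state, which is what makes this form attractive for token recognition.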

Regular Expression Rules

Base case: a regular expression (RE) is either
➡ a character (e.g., ‘0’, ‘1’, ...), or
➡ the empty string (i.e., ‘ε’).

A compound RE is constructed by
➡ alternation: two REs separated by “|” (e.g., “1 | 0”),
➡ parentheses (in order to avoid ambiguity),
➡ concatenation: two REs next to each other (e.g., “(1)(0|1)”),
➡ repetition: an RE followed by “*”, meaning zero or more occurrences (used in the example on the next slide).
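
These construction rules map naturally onto a small data structure. The Python sketch below is an illustration only: the class names `Char`, `Empty`, `Alt`, and `Concat` and the helpers `ends` and `matches` are hypothetical. It covers the two base cases plus alternation and concatenation; parentheses need no class of their own because nesting the objects already fixes the grouping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """Base case: a single character."""
    ch: str


@dataclass(frozen=True)
class Empty:
    """Base case: the empty string (epsilon)."""


@dataclass(frozen=True)
class Alt:
    """Alternation: left | right."""
    left: object
    right: object


@dataclass(frozen=True)
class Concat:
    """Concatenation: first followed by second."""
    first: object
    second: object


def ends(regex, text, start):
    """Return every position where a match of regex beginning at start can end."""
    if isinstance(regex, Char):
        return {start + 1} if text[start:start + 1] == regex.ch else set()
    if isinstance(regex, Empty):
        return {start}
    if isinstance(regex, Alt):
        return ends(regex.left, text, start) | ends(regex.right, text, start)
    if isinstance(regex, Concat):
        result = set()
        for middle in ends(regex.first, text, start):
            result |= ends(regex.second, text, middle)
        return result
    raise TypeError(f"not a regular expression: {regex!r}")


def matches(regex, text):
    """True if regex matches the whole of text."""
    return len(text) in ends(regex, text, 0)


# The slide's example "(1)(0|1)".
ONE_THEN_BIT = Concat(Char("1"), Alt(Char("0"), Char("1")))
```

With these definitions, `matches(ONE_THEN_BIT, "10")` and `matches(ONE_THEN_BIT, "11")` hold, while `"01"` and `"1"` are rejected.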

What Does This Regular Expression Match?

• 0x(1|2|3|4|5|6|7|8|9|a|b|c|d|e|f)(0|1|2|3|4|5|6|7|8|9|a|b|c|d|e|f)*

0x1 ? Yes
0x0 ? No
0xdeadbeef ? Yes
0x ? No
0x01 ? No

Recognizes positive hexadecimal constants without leading zeros.
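
The answers above can be checked mechanically. This sketch writes the slide's expression in Python's `re` syntax, with the character classes `[1-9a-f]` and `[0-9a-f]` standing in for the long alternations (an equivalent shorthand, not the slide's notation), and uses `re.fullmatch` so that the whole string has to match. The name `HEX_CONSTANT` is made up for this example.

```python
import re

# Same language as 0x(1|2|...|f)(0|1|...|f)*, written with character classes.
HEX_CONSTANT = re.compile(r"0x[1-9a-f][0-9a-f]*")

for candidate in ["0x1", "0x0", "0xdeadbeef", "0x", "0x01"]:
    verdict = "Yes" if HEX_CONSTANT.fullmatch(candidate) else "No"
    print(f"{candidate} ? {verdict}")
```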


Example

• Can we create a regular expression corresponding to the “City, State ZIP-code” line in mailing addresses?

E.g.: Chapel Hill, NC 27599-3175
      Beverly Hills, CA 90210


Grammars and Languages

• A regular grammar is a kind of grammar.
➡ A grammar describes the structure of strings.
➡ A string that “matches” a grammar G’s structure is said to be in the language L(G) (which is a set).

• A grammar is a set of productions:
➡ Rules to obtain (produce) a string that is in L(G) via repeated substitutions.
➡ There are many grammar classes (see COMP 455).
➡ Two are commonly used to describe programming languages: regular grammars for tokens and context-free grammars for syntax.

Grammar 101

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε )

• Productions: “A → B” is called a production.
• Non-terminals: the “name” on the left is called a non-terminal symbol.
• Terminals: the symbols on the right are either terminal or non-terminal symbols. A terminal symbol is just a character.
• Definition: “→” means “is a” or “replace with”.
• Choice: “|” denotes “or”.
➡ Example: the first production means that a digit is a ‘0’ or ‘1’ or ‘2’ or … or ‘9’.
• Optional repetition: “*” denotes zero or more of a symbol.
• Sequence: two symbols next to each other mean “followed by.”
➡ Example: the third production means that a natural number is a non-zero digit followed by zero or more digits.
• Epsilon: “ε” is a special terminal that means empty. It corresponds to the empty string.

So, what does the last production mean? A non-negative number is a ‘0’ or a natural number, followed by either nothing or a ‘.’, followed by zero or more digits, followed by (exactly one) non-zero digit.
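To make the reading of the last production concrete, here is a small sketch that writes each production as a Python regex fragment and substitutes upward. The variable names mirror the non-terminals; `is_non_neg_number` is a hypothetical helper, and the optional group stands in for the “| ε” alternative.

```python
import re

# Each production from the grammar, written as a regex fragment.
digit = "[0-9]"
non_zero_digit = "[1-9]"
natural_number = f"{non_zero_digit}{digit}*"
# "( . digit* non_zero_digit ) | ε" becomes an optional group.
non_neg_number = f"(0|{natural_number})(\\.{digit}*{non_zero_digit})?"

NON_NEG_NUMBER = re.compile(non_neg_number)

def is_non_neg_number(text: str) -> bool:
    """Return True if the entire string is derivable from non_neg_number."""
    return NON_NEG_NUMBER.fullmatch(text) is not None
```

With this transcription, strings such as "3.14" and "10.05" are accepted, while "3.10" (trailing zero after the point), "007" (leading zeros), and "1." (nothing after the point) are not.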


Regular Grammar Rules

Very similar to regular expression rules!

Terminals: a terminal is either
➡ a character (e.g., ‘0’, ‘1’, ...), or
➡ the empty string (i.e., ‘ε’).

Non-Terminals: can be constructed using
➡ alternation: two REs or nonterminals separated by “|” (e.g., “letter | digit”),
➡ parentheses (in order to avoid ambiguity),
➡ concatenation: two REs or nonterminals next to each other (e.g., “letter letter”),
➡ optional repetition: an RE or nonterminal followed by “*” (the Kleene star) to denote zero or more occurrences (e.g., “digit*”).

A non-terminal is NEVER defined in terms of itself, not even indirectly!

Thus, regular grammars cannot define recursive statements.
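The no-recursion rule can be checked mechanically. The sketch below assumes a made-up encoding in which each non-terminal maps to the set of non-terminals used on its right-hand side; `find_recursive` is a hypothetical helper that reports every non-terminal that can reach itself, directly or indirectly.

```python
from __future__ import annotations

def find_recursive(grammar: dict[str, set[str]]) -> set[str]:
    """Return the non-terminals that are defined in terms of themselves."""
    recursive = set()
    for start in grammar:
        stack = list(grammar[start])  # non-terminals still to visit
        seen = set()
        while stack:
            name = stack.pop()
            if name == start:
                # We came back to where we started: recursion.
                recursive.add(start)
                break
            if name not in seen:
                seen.add(name)
                stack.extend(grammar.get(name, ()))
    return recursive

# The number grammar from "Grammar 101" (dependencies only).
number_grammar = {
    "digit": set(),
    "non_zero_digit": set(),
    "natural_number": {"non_zero_digit", "digit"},
    "non_neg_number": {"natural_number", "digit", "non_zero_digit"},
}

# A made-up recursive rule, which the rules above do not allow.
nested = {"expr": {"expr", "digit"}, "digit": set()}

print(find_recursive(number_grammar))  # empty: no recursion
print(find_recursive(nested))
```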


Example

Let’s create a regular grammar corresponding to the “City, State ZIP-code” line in mailing addresses.

E.g.: Chapel Hill, NC 27599-3175
      Beverly Hills, CA 90210

city_line → city ‘, ‘ state_abbrev ‘ ‘ zip_code
city → letter (letter | ‘ ‘ letter)*
state_abbrev → ‘AL’ | ‘AK’ | ‘AS’ | ‘AZ’ | … | ‘WY’
zip_code → digit digit digit digit digit (extra | ε )
extra → ‘-’ digit digit digit digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
letter → A | B | C | … | ö | …

Creating a regular expression from a regular grammar is mechanical and easy. Just take the most general non-terminal and keep substituting until you get down to terminals. The lack of recursion means that you won’t get into an infinite loop.
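The substitution process can be sketched in Python by building each non-terminal as a string from the ones below it. Two simplifications here are assumptions, not part of the grammar: `letter` is restricted to ASCII letters, and `state_abbrev` lists only a handful of abbreviations rather than the full set elided on the slide.

```python
import re

# Bottom-up substitution: every name is a regex fragment for one non-terminal.
letter = "[A-Za-z]"                      # assumption: ASCII only (the slide also allows e.g. 'ö')
digit = "[0-9]"
extra = f"-{digit}{{4}}"                 # '-' followed by four digits
zip_code = f"{digit}{{5}}({extra})?"     # five digits, then (extra | ε)
state_abbrev = "(AL|AK|AS|AZ|CA|NC|WY)"  # assumption: partial list for illustration
city = f"{letter}({letter}| {letter})*"
city_line = f"{city}, {state_abbrev} {zip_code}"

CITY_LINE = re.compile(city_line)

def is_city_line(text: str) -> bool:
    """Return True if the entire string matches the city_line pattern."""
    return CITY_LINE.fullmatch(text) is not None

print(is_city_line("Chapel Hill, NC 27599-3175"))
print(is_city_line("Beverly Hills, CA 90210"))
```

Because no fragment refers to itself, the chain of f-strings bottoms out after a fixed number of substitutions, which is the point made above.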


Regular Sets and Finite Automata

If a grammar G is a regular grammar, then the language L(G) is called a regular set.

Equivalently, the language accepted by a regular expression is a regular set.

Fundamental equivalence: For every regular set L(G), there exists a deterministic finite automaton (DFA) that accepts a string S if and only if S ∈ L(G). (See COMP 455 for proof.)

DFA 101

• Deterministic finite automaton:
➡ Has a finite number of states.
➡ Exactly one start state.
➡ One or more final states.
➡ Transitions: define how the automaton switches between states (given an input symbol).

[Figure: example DFA with states A (start), B, and C (final). A loops on ‘0’ and moves to B on ‘1’; B loops on ‘0’ and moves to C on ‘1’; C loops on ‘0’ and ‘1’.]

• Start state: A.
• Intermediate state (neither start nor final): B.
• Final state (indicated by double border): C.
• Transition: given an input of ‘1’, if the DFA is in state A, then transition to state B (and consume the input).
• Self transition: given an input of ‘0’, if the DFA is in state A, then stay in state A (and consume the input).
• Transitions must be unambiguous: for each state and each input, there exists only one transition. This is what makes the DFA deterministic.

[Figure: a state X with two transitions on ‘1’, leading to Y and Z. Not a legal DFA!]

• Multiple transitions: given an input of either ‘0’ or ‘1’, if the DFA is in state C, then stay in state C (and consume the input).


DFA String Processing

• String processing.
➡ Initially in start state.
➡ Sequentially make transitions for each character in the input string.

• A DFA either accepts or rejects a string.
➡ Reject if a character is encountered for which no transition is defined in the current state.
➡ Reject if end of input is reached and DFA is not in a final state.
➡ Accept if end of input is reached and DFA is in a final state.

[Figure: the example DFA from “DFA 101”, with states A (start), B, and C (final).]

DFA Example

Input: 10

[Figure: the example DFA from “DFA 101”, with markers for the current input character and the current state.]

➡ Initially, the DFA is in the start state A. The first input character is ‘1’. This causes a transition to State B.
➡ The next input character is ‘0’. This causes a self transition in State B.
➡ The end of the input is reached, but the DFA is not in a final state: the string ‘10’ is rejected!
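The walk-through can be reproduced with a few lines of Python. This is only a sketch: the dictionary encoding and the names `TRANSITIONS`, `START`, `FINAL`, and `trace` are assumptions made for illustration, with the transitions taken from the example DFA.

```python
from __future__ import annotations

# Transition function of the example DFA, keyed by (state, input symbol).
TRANSITIONS = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "B", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",
}
START, FINAL = "A", {"C"}

def trace(text: str) -> list[str]:
    """Return the states visited, beginning with the start state.

    Raises KeyError for symbols other than '0' and '1'.
    """
    states = [START]
    for ch in text:
        states.append(TRANSITIONS[(states[-1], ch)])
    return states

visited = trace("10")
verdict = "accepted" if visited[-1] in FINAL else "rejected"
print(" -> ".join(visited), verdict)  # A -> B -> B rejected
```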


DFA-Equivalent Regular Expression

[Figure: the example DFA from “DFA 101”, with states A (start), B, and C (final).]

What’s the RE such that the RE’s language is exactly the set of strings that is accepted by this DFA?

0*10*1(1|0)*
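One way to gain confidence in this answer (not a proof) is to compare the DFA and the regular expression on every binary string up to some length. The sketch below does that by brute force; `dfa_accepts` and `agree_up_to` are hypothetical helper names, and the transitions are those of the example DFA.

```python
import re
from itertools import product

# Transitions of the example DFA; A is the start state, C the final state.
TRANSITIONS = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "B", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",
}

def dfa_accepts(text: str) -> bool:
    """Run the DFA on a string over the alphabet {0, 1}."""
    state = "A"
    for ch in text:
        state = TRANSITIONS[(state, ch)]
    return state == "C"

PATTERN = re.compile(r"0*10*1(1|0)*")

def agree_up_to(max_len: int) -> bool:
    """Check that DFA and RE give the same verdict on all strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            text = "".join(chars)
            if dfa_accepts(text) != (PATTERN.fullmatch(text) is not None):
                return False
    return True

print(agree_up_to(10))
```

Both descriptions amount to "the string contains at least two 1s", which is why the exhaustive comparison finds no disagreement.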

Recognizing Tokens with a DFA

• Table-driven implementation.
➡ DFAs can be represented as a 2-dimensional table.

[Figure: the example DFA from “DFA 101”, with states A (start), B, and C (final).]

Current State   On ‘0’             On ‘1’             Note
A               transition to A    transition to B    start
B               transition to B    transition to C    -
C               transition to C    transition to C    final

currentState = start state;
while end of input not yet reached: {
    c = get next input character;
    if transitionTable[currentState][c] ≠ null:
        currentState = transitionTable[currentState][c]
    else:
        reject input
}
if currentState is final:
    accept input
else:
    reject input

This accepts exactly one token in the input. A real lexer must detect multiple successive tokens. This can be achieved by resetting to the start state. But what happens if the suffix of one token is the prefix of another? (See Chapter 2 for a solution.)
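The pseudocode translates almost line for line into Python. The following sketch assumes a nested-dictionary encoding of the table (one row per state, one column per input symbol); the names `TRANSITION_TABLE`, `START_STATE`, `FINAL_STATES`, and `accepts` are illustrative. Like the pseudocode, it recognizes a single token and does not address successive tokens.

```python
from typing import Optional

# The 2-dimensional table: TRANSITION_TABLE[state][symbol] -> next state.
TRANSITION_TABLE = {
    "A": {"0": "A", "1": "B"},
    "B": {"0": "B", "1": "C"},
    "C": {"0": "C", "1": "C"},
}
START_STATE = "A"
FINAL_STATES = {"C"}

def accepts(text: str) -> bool:
    """Return True if the DFA accepts the whole input string."""
    current = START_STATE
    for c in text:
        nxt: Optional[str] = TRANSITION_TABLE[current].get(c)
        if nxt is None:
            return False  # no transition defined for this symbol: reject
        current = nxt
    # End of input: accept only if we stopped in a final state.
    return current in FINAL_STATES

print(accepts("0110"))  # True
print(accepts("10"))    # False
```

A missing table entry plays the role of `null` in the pseudocode, so a symbol outside the alphabet (for example ‘2’) leads to rejection rather than an error.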


Lexical Analysis

• The need to identify tokens raises two questions.
➡ How can we specify the tokens of a language?
  ‣ With regular expressions.
➡ How can we recognize tokens in a character stream?
  ‣ With DFAs.

[Figure: Regular Expressions (token specification) → DFA Construction → Deterministic Finite Automata (DFA) (token recognition).]

No single-step algorithm: We first need to construct a Non-Deterministic Finite Automaton…


Non-Deterministic Finite Automaton (NFA)• Like a DFA, but less restrictive:➡Transitions do not have to be unique: each state may have multiple ambiguous transitions for the same input symbol. (Hence, it can be non-deterministic.)

➡Epsilon transitions do not consume any input. (They correspond to the empty string.)

➡Note that every DFA is also a NFA.

!60

Z Y

X

1 1

V

A legal NFA fragment.

ε

Acceptance rule:➡ Accepts an input string if there exists

a series of transitions such that the NFA is in a final state when the end of input is reached.

Page 118: 04 — Lexical Analysis-part1 - Computer Sciencebcw/comp524-sp14/slides/04lex1.pdfB. Ward — Spring 2014 The Big Picture Scanner (lexical analysis) Parser (syntax analysis) Semantic

B. Ward — Spring 2014

Non-Deterministic Finite Automaton (NFA)

• Like a DFA, but less restrictive:

➡ Transitions do not have to be unique: each state may have multiple ambiguous transitions for the same input symbol. (Hence, it can be non-deterministic.)

➡ Epsilon transitions do not consume any input. (They correspond to the empty string.)

➡ Note that every DFA is also an NFA.

[Figure: a legal NFA fragment with states V, X, Y, and Z, two transitions labeled 1, and one ε-transition.]

Acceptance rule:

➡ Accepts an input string if there exists a series of transitions such that the NFA is in a final state when the end of input is reached.

➡ Inherent parallelism: all possible paths are explored simultaneously.
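
The acceptance rule and the "all paths at once" view can be sketched in code by tracking the set of states the NFA could currently be in. The automaton below is a hypothetical example, not the fragment from the figure: it accepts binary strings ending in "01", and the state names, the `EPSILON` marker, and the functions `epsilon_closure` and `accepts` are assumptions made for this illustration.

```python
EPSILON = ""  # label used in this sketch for ε-transitions (an assumption)

# Hypothetical NFA over {0, 1} that accepts strings ending in "01".
TRANSITIONS = {
    ("q0", "0"): {"q0", "q1"},  # two possible successors for the same symbol
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
    ("q2", EPSILON): {"q3"},    # ε-transition: taken without consuming input
}
START = "q0"
FINAL = {"q3"}


def epsilon_closure(states):
    """Return all states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in TRANSITIONS.get((state, EPSILON), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def accepts(text):
    """Accept if at least one sequence of transitions ends in a final state."""
    current = epsilon_closure({START})
    for ch in text:
        moved = set()
        for state in current:
            moved |= TRANSITIONS.get((state, ch), set())
        current = epsilon_closure(moved)  # follow every path simultaneously
    return bool(current & FINAL)


if __name__ == "__main__":
    for sample in ["01", "0101", "10", "011", ""]:
        print(repr(sample), accepts(sample))
```

Keeping a set of current states is one common way to simulate the parallel exploration; a path that gets stuck simply drops out of the set, and the input is accepted as long as some surviving state is final at the end.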

