Lexical Analysis - Lecture 3 (kjleach.eecs.umich.edu/c18/l3.pdf · 2018-01-10) · Lexical Analysis...

Date post: 24-Jul-2020
Lexical Analysis Lecture 3 January 10, 2018
Transcript
Page 1: Lexical Analysis - Lecture 3 (kjleach.eecs.umich.edu/c18/l3.pdf · 2018-01-10) · Lexical Analysis Summary • Lexical analysis turns a stream of characters into a stream of tokens

Lexical Analysis
Lecture 3

January 10, 2018

Announcements

• PA1c due tonight at 11:50pm!
• Don’t forget about PA1, the Cool implementation!
• Use Monday’s lecture, the video guides, and Cool examples if you’re stuck with Cool!

Compiler Construction 2/39

Programming Assignments Going Forward

• C was allowed for PA1, but not for PA2 through PA6
• How comfortable are we with other languages?
  • Python, Haskell, Ruby, OCaml, and JavaScript

Lexical Analysis Summary

• Lexical analysis turns a stream of characters into a stream of tokens
• Regular expressions are a way to specify sets of strings, which we use to describe tokens

class Main { ...  →  Lexical Analyzer  →  CLASS, IDENT, LBRACE, ...

Cunning Plan

• Informal Sketch of Lexical Analysis
  • LA identifies tokens from the input string
  • List<Token> lexer(char[])
• Issues in Lexical Analysis
  • Lookahead
  • Ambiguity
• Specifying Lexers
  • Regular Expressions
  • Examples

Definitions

• Token: a set of strings defining an atomic element with a distinct meaning
  • a syntactic category
  • In English:
    • noun, verb, adjective
  • In programming:
    • identifier, integer, keyword, whitespace, ...
• Lexeme: a sequence of characters that can be categorized as a token
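
To make the token/lexeme split concrete, here is a minimal Python sketch. It assumes a hypothetical `Token` record pairing a token class (the syntactic category) with the lexeme that was matched and its line number; `Kind`, `Token`, and the field names are illustrative, not part of Cool or the course code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    """Token classes (syntactic categories); a hypothetical subset."""
    CLASS = auto()
    LT = auto()
    FALSE = auto()
    IDENT = auto()


@dataclass(frozen=True)
class Token:
    kind: Kind    # the token: which set of strings the lexeme belongs to
    lexeme: str   # the actual characters matched in the input
    line: int     # source line, kept for error reporting


a = Token(Kind.IDENT, "variable_name", 1)
b = Token(Kind.IDENT, "counter", 2)
# Same token, different lexemes:
print(a.kind == b.kind, a.lexeme == b.lexeme)  # True False
```

One token class covers many lexemes, while a keyword class such as CLASS covers exactly one.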

Tokens and Lexemes

Token    Lexeme
CLASS    class
LT       <
FALSE    false
IDENT    variable_name

By the way, what do you think of Cool’s fi, pool, esac, ...?

Context for Lexers

• Lexing and parsing go hand-in-hand
  • The parser uses distinctions between tokens
    • e.g., a keyword is treated differently from an identifier

Figure (input → Lexer → Parser): the Lexer calls get_char() on the input and gets back a character; the Parser calls get_token() on the Lexer.
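
A minimal sketch of the pull-style interface in the figure, written in Python. The method names `get_char()` and `get_token()` follow the diagram labels; the `Lexer` class, the `peek()` helper, the keyword set, and the `(kind, lexeme)` pairs are hypothetical. For brevity it only recognizes words, integers, and single-character punctuation.

```python
from typing import List, Optional, Tuple

KEYWORDS = {"class", "if", "then", "else", "fi"}  # hypothetical subset


class Lexer:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> Optional[str]:
        """Look at the next character without consuming it (None at end)."""
        return self.text[self.pos] if self.pos < len(self.text) else None

    def get_char(self) -> str:
        """Consume and return the next input character."""
        c = self.text[self.pos]
        self.pos += 1
        return c

    def get_token(self) -> Optional[Tuple[str, str]]:
        """Called by the parser: return the next (kind, lexeme), or None at end."""
        while (c := self.peek()) is not None and c.isspace():
            self.get_char()                  # whitespace is never handed to the parser
        c = self.peek()
        if c is None:
            return None
        lexeme = self.get_char()
        if c.isalpha():
            while (c := self.peek()) is not None and c.isalnum():
                lexeme += self.get_char()
            # The parser relies on this split: keyword vs identifier.
            return ("KEYWORD" if lexeme in KEYWORDS else "IDENT", lexeme)
        if c.isdigit():
            while (c := self.peek()) is not None and c.isdigit():
                lexeme += self.get_char()
            return ("INTEGER", lexeme)
        return ("PUNCT", lexeme)             # any other single character


# A stand-in "parser" that pulls tokens on demand:
lexer = Lexer("class Main {")
tokens: List[Tuple[str, str]] = []
while (tok := lexer.get_token()) is not None:
    tokens.append(tok)
print(tokens)  # [('KEYWORD', 'class'), ('IDENT', 'Main'), ('PUNCT', '{')]
```

The lexer does work only when the parser asks for a token, so the two phases can run interleaved over a single pass of the input.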

Lexical Analysis

• Consider this example:
    if(i=j) then
        z<-0
    else
        z<-1
• The input is simply a sequence of characters:
    if(i=j) then\n\tz<-0\nelse ...
• Goal: partition the input string into substrings
  • Then, classify them according to their role (tokenize!)

Tokens

• Tokens correspond to sets of strings
  • Identifier: strings of letters or digits, starting with a letter
  • Integer: a non-empty string of digits
  • Keyword: “else” or “class” or “let” ...
  • Whitespace: a non-empty sequence of blanks, newlines, and/or tabs
  • OpenParen: a left parenthesis (
  • CloseParen: a right parenthesis )

Building a Lexical Analyzer

• A lexer implementation must do three things:
  1. Recognize substrings corresponding to tokens
  2. Return the value or lexeme of the token
  3. Report errors intelligently (line numbers for Cool)

Lexical Analyzer Implementation

• The lexer usually discards “uninteresting” tokens that don’t contribute to parsing
  • Examples: whitespace, comments
  • Exceptions: which languages care about whitespace?
• Review: What would happen if we removed all whitespace and comments before lexing?
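
One way to picture the discarding step: the lexer, or a thin wrapper around it, drops token kinds the parser never needs. This Python sketch assumes tokens are `(kind, lexeme)` pairs and that the uninteresting kinds are named `WHITESPACE` and `COMMENT`. These names, the `significant` helper, and the sample token names are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Token = Tuple[str, str]  # (kind, lexeme)

# Token kinds that do not contribute to parsing (an assumption for this sketch).
IGNORED = {"WHITESPACE", "COMMENT"}


def significant(tokens: Iterable[Token]) -> Iterator[Token]:
    """Yield only the tokens the parser needs, dropping IGNORED kinds."""
    for kind, lexeme in tokens:
        if kind not in IGNORED:
            yield (kind, lexeme)


raw = [("IDENT", "x"), ("WHITESPACE", " "), ("ASSIGN", "<-"),
       ("WHITESPACE", " "), ("INTEGER", "1"), ("COMMENT", "-- set x")]
print(list(significant(raw)))  # [('IDENT', 'x'), ('ASSIGN', '<-'), ('INTEGER', '1')]
```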

Example

• Recall:
    if (i = j) then
        z<-0
    else
        z<-1
• Our Cool lexer would return token-lexeme-line-number tuples:
    <IF, “if”, 1>
    <WHITESPACE, “ ”, 1>
    <OPENPAREN, “(”, 1>
    <IDENTIFIER, “i”, 1>
    <WHITESPACE, “ ”, 1>
    <EQUALS, “=”, 1>
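
Tuples like these can come from a small regex-driven scanner. The Python sketch below uses the standard `re` module and a hand-picked rule list that covers only this example. Token names follow the slide where it gives them; `THEN`, `ELSE`, `ASSIGN`, `INTEGER`, `CLOSEPAREN`, and the rule order are assumptions. The scanner counts newlines to attach a line number to each lexeme.

```python
import re
from typing import List, Tuple

# (token, pattern) rules for this example only. Python's re tries alternatives
# left to right, so keywords come first and use \b so that e.g. "iffy" is not IF.
RULES = [
    ("IF", r"if\b"), ("THEN", r"then\b"), ("ELSE", r"else\b"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("INTEGER", r"[0-9]+"),
    ("ASSIGN", r"<-"), ("EQUALS", r"="),
    ("OPENPAREN", r"\("), ("CLOSEPAREN", r"\)"),
    ("WHITESPACE", r"[ \t\r\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))


def lex(text: str) -> List[Tuple[str, str, int]]:
    """Return (token, lexeme, line) tuples; raise ValueError on a bad character."""
    out, pos, line = [], 0, 1
    while pos < len(text):
        m = MASTER.match(text, pos)          # match anchored at pos
        if m is None:
            raise ValueError(f"line {line}: unexpected character {text[pos]!r}")
        out.append((m.lastgroup, m.group(), line))
        line += m.group().count("\n")         # newlines inside this lexeme
        pos = m.end()
    return out


src = "if (i = j) then\n\tz<-0\nelse\n\tz<-1"
for t in lex(src)[:6]:
    print(t)
```

Reporting `line` in the error message is one simple way to meet the "report errors intelligently" requirement.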

Lexing Considerations

• The goal is to partition the input string into meaningful tokens
  • Scan left to right (i.e., in order)
  • Recognize tokens
• We really need a way to describe the lexemes associated with each token
• And also a way to handle ambiguities
  • Is “if” two variables, “i” and “f”?

Regular Languages

• Sounds like we can use DFAs to recognize lexemes
  • With accepting states corresponding to tokens!

Example: Capture the word “class”
Figure: DFA with transitions c, l, a, s, s, then WS.

Example: Capture some variable name
Figure: DFA with transitions labelled A, AN, and WS.

A = letter, AN = alphanumeric, WS = whitespace
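
A DFA like the ones pictured can be written as a transition function plus an accepting state. This Python sketch hand-codes the variable-name DFA. It assumes the AN transition is a self-loop and uses `str.isalpha`, `str.isalnum`, and `str.isspace` for A, AN, and WS. The state names and helper functions are hypothetical.

```python
from typing import Optional

# Hypothetical identifier DFA: START --A--> IN_ID --WS--> ACCEPT,
# assuming IN_ID loops on AN (alphanumeric characters).
START, IN_ID, ACCEPT = "start", "in_id", "accept"


def step(state: str, ch: str) -> Optional[str]:
    """Transition function; None means there is no transition (reject)."""
    if state == START and ch.isalpha():      # A = letter
        return IN_ID
    if state == IN_ID and ch.isalnum():      # AN = alphanumeric
        return IN_ID
    if state == IN_ID and ch.isspace():      # WS = whitespace
        return ACCEPT
    return None


def accepts(text: str) -> bool:
    """Run the DFA over text; accept only if it ends in the accepting state."""
    state: Optional[str] = START
    for ch in text:
        state = step(state, ch)
        if state is None:
            return False
    return state == ACCEPT


print(accepts("x1 "), accepts("1x "), accepts("class "))  # True False True
```

Note that `"class "` is accepted as a variable name too. Telling the keyword apart requires capturing both “class” and variable names in one machine.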

Capturing Multiple Tokens

What about both “class” and variable names?

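Figure: a combined DFA with two accepting states, 1 and 2. One path reads c, l, a, s, s, then WS; the other transitions are labelled A-c (a letter other than c), AN, and WS.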
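
Combining both machines gives one DFA whose accepting state says which token was seen. This Python sketch assumes the layout suggested by the figure: the c-l-a-s-s path leads to accepting state 1 (the keyword), and other letter-initial words lead to accepting state 2 (an identifier). It also adds transitions that the figure may not show, for words that leave the “class” path partway, such as “cl” or “classy”. The state encoding and the `classify` helper are hypothetical.

```python
from typing import Optional

KEYWORD = "class"


def classify(text: str) -> Optional[str]:
    """Run a combined DFA over a lexeme followed by one whitespace character.

    Returns "CLASS" (accepting state 1), "IDENT" (accepting state 2), or None.
    States: "start"; "k1".."k5" (that many characters of "class" seen);
    "id" (some other identifier); or one of the two accepting results.
    """
    state = "start"
    for ch in text:
        if state == "start":
            if ch == "c":
                state = "k1"
            elif ch.isalpha():                  # A-c: any letter except c
                state = "id"
            else:
                return None
        elif state.startswith("k"):
            n = int(state[1:])
            if n < len(KEYWORD) and ch == KEYWORD[n]:
                state = f"k{n + 1}"             # still on the "class" path
            elif ch.isspace():
                state = "CLASS" if n == len(KEYWORD) else "IDENT"
            elif ch.isalnum():
                state = "id"                    # diverged, e.g. "cat" or "classy"
            else:
                return None
        elif state == "id":
            if ch.isspace():
                state = "IDENT"
            elif not ch.isalnum():
                return None
        else:                                   # input after an accepting state
            return None
    return state if state in ("CLASS", "IDENT") else None


print(classify("class "), classify("classy "), classify("cl "))  # CLASS IDENT IDENT
```

Building this by hand for every keyword quickly gets tedious. That is why the next step is to generate such machines from regular expressions.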

Lexical Analyzer Generators

• We like regular languages as a means to categorize lexemes into tokens
• We don’t like the complexity of implementing a DFA manually
• We use regular expressions to describe regular languages
  • And our tokens are recognizable as regular languages!
• Regular expressions can be automatically turned into a DFA for rapid lexing!

Languages Review

• Definition: Let Σ be a set of characters. A language over Σ is a set of strings of characters drawn from Σ. Σ is called the alphabet.

Examples of Languages

• Alphabet = English characters
  • Language = English sentences
  • Note: not every string of English characters is an English sentence
    • Example: adsfasdklg gdsajkl
• Alphabet = ASCII characters
  • Language = C programs
  • Note: the ASCII character set is different from the English character set

Notation

• Languages are sets of strings
• We need some notation for specifying which sets we want
  • i.e., which strings are in a set?
• For lexical analysis, we care about regular languages, which can be described using regular expressions

Regular Expressions

• Each regular expression is a notation for a regular language (a set of “words”)
  • Notation forthcoming!
• If A is a regular expression, we write L(A) to refer to the language denoted by A

Base Regular Expressions

• Single character: ‘c’
  • L(‘c’) = { ‘c’ }
• Concatenation: AB
  • A and B are both regular expressions
  • L(AB) = { ab | a ∈ L(A) and b ∈ L(B) }
  • Example: L(‘i’ ‘f’) = { ‘if’ }
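
Because a regular expression denotes a set of strings, the definitions can be checked directly with Python sets, as long as the languages are finite. This is a small sketch; the helper names `char` and `concat` are hypothetical.

```python
from typing import Set


def char(c: str) -> Set[str]:
    """L('c') = { 'c' }"""
    return {c}


def concat(la: Set[str], lb: Set[str]) -> Set[str]:
    """L(AB) = { ab | a in L(A) and b in L(B) }"""
    return {a + b for a in la for b in lb}


print(concat(char("i"), char("f")))   # {'if'}
```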

Compound Regular Expressions

• Union
  • L(A|B) = { s | s ∈ L(A) or s ∈ L(B) }
• Examples
  • L(‘if’ | ‘then’ | ‘else’) = { ‘if’, ‘then’, ‘else’ }
  • L(‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’) = what?
  • L( (‘0’|‘1’) (‘0’|‘1’) ) = what?
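
The same set-based view covers union. This Python sketch builds the three languages from the examples so you can run it and check your answers. The helpers `union` and `concat` are hypothetical names for the two operations.

```python
from functools import reduce
from typing import Set


def union(la: Set[str], lb: Set[str]) -> Set[str]:
    """L(A|B) = { s | s in L(A) or s in L(B) }"""
    return la | lb


def concat(la: Set[str], lb: Set[str]) -> Set[str]:
    """L(AB) = { ab | a in L(A) and b in L(B) }"""
    return {a + b for a in la for b in lb}


keywords = union(union({"if"}, {"then"}), {"else"})
digits = reduce(union, ({d} for d in "0123456789"))   # '0' | '1' | ... | '9'
bit = union({"0"}, {"1"})
two_bits = concat(bit, bit)                           # ('0'|'1') ('0'|'1')

print(sorted(keywords))
print(sorted(digits))
print(sorted(two_bits))
```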

Starz!

• So far, base and compound regular expressions only describe finite languages
• Iteration: A∗
  • L(A∗) = {“”} ∪ L(A) ∪ L(AA) ∪ L(AAA) ∪ ...
  • Examples
    • L(‘0’∗) = {“”, “0”, “00”, “000”, ...}
    • L(‘1’ ‘0’∗) = {“1”, “10”, “100”, “1000”, ...}
• Empty: ε
  • L(ε) = {“”}
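
L(A∗) is infinite, so it cannot be built as a Python set outright. A common trick is to enumerate it only up to a length bound. This sketch works under that assumption: `max_len` is a hypothetical parameter and is not part of the definition of ∗.

```python
from typing import Set


def concat(la: Set[str], lb: Set[str]) -> Set[str]:
    """L(AB) = { ab | a in L(A) and b in L(B) }"""
    return {a + b for a in la for b in lb}


def star(la: Set[str], max_len: int) -> Set[str]:
    """The strings of L(A*) of length <= max_len: {""} ∪ L(A) ∪ L(AA) ∪ ..."""
    result = {""}          # L(ε): zero repetitions
    frontier = {""}
    while frontier:
        # One more repetition of A, keeping only new strings within the bound.
        frontier = {s for s in concat(frontier, la) if len(s) <= max_len} - result
        result |= frontier
    return result


print(sorted(star({"0"}, 3), key=len))                  # ['', '0', '00', '000']
print(sorted(concat({"1"}, star({"0"}, 3)), key=len))   # ['1', '10', '100', '1000']
```

The loop stops because only finitely many strings fit under the bound, and each round must add a string not seen before.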

Example: Keyword

• Keywords: “else” or “if” or “fi”

‘else’ | ‘if’ | ‘fi’
(Recall that ‘else’ abbreviates the concatenation ‘e’ ‘l’ ‘s’ ‘e’.)
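
In a concrete regex syntax, here Python's `re` module, `|` is union and juxtaposition is concatenation. `Pattern.fullmatch` checks whether the whole string is in the language, which is the s ∈ L(R) question.

```python
import re

# 'else' | 'if' | 'fi'   (each quoted word is a concatenation of characters)
KEYWORD = re.compile(r"else|if|fi")

for s in ["else", "if", "fi", "iff", "elif"]:
    # fullmatch asks whether the whole string s is in L(R)
    print(s, KEYWORD.fullmatch(s) is not None)
```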

Example: Integer

• Integer: a non-empty string of digits

digit = ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’

number = digit digit*

• Abbreviation: A+ = AA*
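
In Python's `re` syntax, the character class `[0-9]` is shorthand for the ten-way union of digits. This sketch writes `number` both ways to show that A+ and AA* denote the same language on a few sample strings.

```python
import re

DIGIT = "[0-9]"                          # '0' | '1' | ... | '9'
NUMBER = re.compile(f"{DIGIT}{DIGIT}*")  # number = digit digit*
NUMBER_PLUS = re.compile(f"{DIGIT}+")    # the A+ abbreviation

for s in ["0", "42", "007", "", "4a"]:
    a = NUMBER.fullmatch(s) is not None
    b = NUMBER_PLUS.fullmatch(s) is not None
    assert a == b   # both spellings agree on every sample
    print(repr(s), a)
```

This sketch uses `[0-9]` rather than `\d`. For `str` patterns, `\d` also matches non-ASCII Unicode decimal digits.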

Example: Identifiers

• Identifier: string of letters or digits, starting with a letter

letter = ‘A’ | ... | ‘Z’ | ‘a’ | ... | ‘z’
ident = letter (letter | digit)*

• Is (letter* | digit*) the same?
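
This Python `re` sketch lets you probe the question on sample strings. It assumes ASCII letters and digits; the names `IDENT` and `OTHER` are hypothetical.

```python
import re

LETTER = "[A-Za-z]"
DIGIT = "[0-9]"
IDENT = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")   # letter (letter | digit)*
OTHER = re.compile(f"{LETTER}*|{DIGIT}*")            # (letter* | digit*)

for s in ["x", "x1", "abc", "123", "", "1x"]:
    print(repr(s), IDENT.fullmatch(s) is not None, OTHER.fullmatch(s) is not None)
```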

Example: Whitespace

• Whitespace: a non-empty sequence of blanks, tabs, newlines, and carriage returns

( ‘ ’ | ‘\t’ | ‘\n’ | ‘\r’ ) +
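
In Python's `re` syntax, a bracketed class such as `[ \t\n\r]` abbreviates a union of single characters, so the expression above becomes a one-liner:

```python
import re

WHITESPACE = re.compile(r"[ \t\n\r]+")   # ( ' ' | '\t' | '\n' | '\r' )+

print(WHITESPACE.fullmatch(" \t\n") is not None)   # True
print(WHITESPACE.fullmatch("") is not None)        # False: + requires at least one
```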

Example: Phone Numbers

Regular expressions are everywhere! Consider: (123)-234-4567

Σ = { 0, 1, 2, ..., 9, (, ), - }
area = digit digit digit
exch = digit digit digit
phone = digit digit digit digit
number = ‘(’ area ‘)’ ‘-’ exch ‘-’ phone
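
The same definitions translate almost word for word into Python's `re` syntax. The parentheses are escaped because they are grouping operators in `re`, and the digit repetitions are spelled out by string multiplication to mirror the slide.

```python
import re

DIGIT = "[0-9]"
AREA = DIGIT * 3     # area  = digit digit digit
EXCH = DIGIT * 3     # exch  = digit digit digit
PHONE = DIGIT * 4    # phone = digit digit digit digit
NUMBER = re.compile(rf"\({AREA}\)-{EXCH}-{PHONE}")   # '(' area ')' '-' exch '-' phone

print(NUMBER.fullmatch("(123)-234-4567") is not None)  # True
print(NUMBER.fullmatch("123-234-4567") is not None)    # False: parentheses required
```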

Example: Email addresses

Consider [email protected]

Σ = { a, b, c, ..., z, ‘.’, ‘@’ }
name = letter+
address = name ‘@’ name (‘.’ name)*
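
A Python `re` transcription of this grammar, restricted like Σ to lowercase letters, `.`, and `@`. The `.` must be escaped because an unescaped `.` matches any character in `re`. The sample addresses are made up for illustration.

```python
import re

NAME = "[a-z]+"                                      # name = letter+
ADDRESS = re.compile(rf"{NAME}@{NAME}(\.{NAME})*")   # name '@' name ('.' name)*

for s in ["alice@example.com", "a@b", "a@b.", "A@b.com", "a.b@c.d"]:
    print(s, ADDRESS.fullmatch(s) is not None)
```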

Regular Expression Summary

• Regular expressions describe many useful languages
• Given a string s and a regexp R, we can determine whether s ∈ L(R)
• Is this enough?
  • No! Recall that we need the original lexeme
  • We must adapt regular expressions to this goal

Next time

• Specifying lexical structure using regular expressions
• Finite automata
  • Deterministic Finite Automata
  • Nondeterministic Finite Automata
• Implementation of regular expressions
  • Regexp → NFA → DFA → lookup table
• Lexical analyzer generation (i.e., doing all of this automatically)

Lexical Specification (1)

• Start with a set of tokens (protip: PA2 lists them for Cool)
• Write a regular expression for the lexemes representing each token
  • Number = digit+
  • IF = “if”
  • ELSE = “else”
  • IDENT = letter (letter | digit)*
  • ...

Lexical Specification (2)

• Construct R, matching all lexemes for all tokens
  • R = Number | IF | ELSE | IDENT | ...
  • R = R₁ | R₂ | R₃ | ...
• If s ∈ L(R), then s is a lexeme
  • Also s ∈ L(Rⱼ) for some j
  • The particular j corresponds to the type of token reported by the lexer
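
Python's `re` can build R = R₁ | R₂ | ... directly by wrapping each Rⱼ in a named group. After a successful match, `Match.lastgroup` gives the name of the group that matched, which plays the role of j. This is a sketch with a hypothetical rule list and helper name.

```python
import re
from typing import Optional

RULES = [                      # (token name, regex) pairs: R1, R2, ...
    ("NUMBER", r"[0-9]+"),
    ("IF", r"if"),
    ("ELSE", r"else"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
]
R = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in RULES))


def token_type(s: str) -> Optional[str]:
    """Name of a rule j with s in L(Rj), or None if s is not a lexeme."""
    m = R.fullmatch(s)
    return m.lastgroup if m else None


for s in ["42", "if", "else", "x9", "9x"]:
    print(s, token_type(s))
```

Here `"if"` is in both L(IF) and L(IDENT); `re` reports the first alternative that matches, which previews the ambiguity question.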

Lexical Specification (3)

• For an input x₁, ..., xₙ
  • Each xᵢ ∈ Σ
• For each 1 ≤ i ≤ n, check
  • Is x₁...xᵢ ∈ L(R)?
• If so, it must be that x₁...xᵢ ∈ L(Rⱼ) for some j
  • Remove x₁...xᵢ from the input and restart
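
This is a direct Python transcription of these steps, reading “for each i” as “stop at the first i that works”. It checks prefixes x₁...xᵢ in order of increasing i and cuts at the first one in L(R), with `fullmatch` as the membership test. The rule list and the `naive_lex` name are hypothetical. In the output, the shortest matching prefix splits “foo” into three tokens, which is not what we want.

```python
import re
from typing import List, Tuple

RULES = [("WHITESPACE", r"[ \t\n]+"), ("INTEGER", r"[0-9]+"),
         ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"), ("PLUS", r"\+")]
R = re.compile("|".join(f"(?P<{n}>{rx})" for n, rx in RULES))


def naive_lex(x: str) -> List[Tuple[str, str]]:
    tokens = []
    while x:
        for i in range(1, len(x) + 1):               # for each 1 <= i <= n
            m = R.fullmatch(x[:i])                   # is x1...xi in L(R)?
            if m:
                tokens.append((m.lastgroup, x[:i]))  # x1...xi in L(Rj)
                x = x[i:]                            # remove x1...xi and restart
                break
        else:
            raise ValueError(f"no prefix of {x!r} is a lexeme")
    return tokens


print(naive_lex("foo+3"))
# [('IDENTIFIER', 'f'), ('IDENTIFIER', 'o'), ('IDENTIFIER', 'o'), ('PLUS', '+'), ('INTEGER', '3')]
```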

Example Lexing

R = Whitespace | Integer | Identifier | Plus
Lex “f + 3 + g”

• “f” matches R (more specifically, Identifier)
• “ ” matches R (more specifically, Whitespace)

...

What does the lexer output look like for this example?
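
To check your answer, here is a compact Python scanner for this R. It assumes the four rules are the named groups shown, and the `lex` name is hypothetical. For these four non-overlapping rules, `re`'s ordered alternation and greedy repetition happen to pick the same lexemes a longest-match lexer would.

```python
import re

R = re.compile(r"(?P<Whitespace>[ \t\r\n]+)|(?P<Integer>[0-9]+)"
               r"|(?P<Identifier>[A-Za-z][A-Za-z0-9]*)|(?P<Plus>\+)")


def lex(text: str):
    """Yield (token, lexeme) pairs; raise ValueError on an unexpected character."""
    pos = 0
    while pos < len(text):
        m = R.match(text, pos)              # match anchored at pos
        if not m:
            raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        yield (m.lastgroup, m.group())
        pos = m.end()


for tok in lex("f + 3 + g"):
    print(tok)
```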

Ambiguities

• Ambiguities arise in this algorithm
• R = Whitespace | Integer | Identifier | Plus
• Lex “foo+3”
  • “f”, “fo”, and “foo” all match R, but “foo+” does not
  • How much input do we consume?
• Maximal munch rule: pick the longest possible substring that matches R
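
Python's `re` alternation is ordered, not longest-match, so maximal munch has to be enforced explicitly when rules can overlap. This sketch tries every rule at the current position and keeps the longest match; on a tie it keeps the earlier rule. For these simple patterns, greedy repetition already gives each rule's longest match. The rule names and the `munch` helper are hypothetical.

```python
import re
from typing import List, Tuple

RULES = [("Whitespace", re.compile(r"[ \t\n]+")),
         ("Integer", re.compile(r"[0-9]+")),
         ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
         ("Plus", re.compile(r"\+"))]


def munch(text: str) -> List[Tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m and m.end() > best_end:   # strictly longer wins; ties keep earlier rule
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(munch("foo+3"))  # [('Identifier', 'foo'), ('Plus', '+'), ('Integer', '3')]
```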

Ambiguities (2)

R = Whitespace | ‘new’ | Integer | Identifier | Plus

• Lex “new foo”
  • “new” matches both the ‘new’ rule and the ‘Identifier’ rule; which one do you pick?
• Generally, pick the rule listed first
  • Arbitrary, but typical
  • Important for PA2!

‘new’ was listed before ‘Identifier’, so the token given is ‘new’.
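
This Python sketch combines both rules: the longest match wins, and ties go to the rule listed first. For `new`, the ‘new’ rule and Identifier both match three characters, so the earlier rule wins. For `newer`, Identifier matches more and wins by maximal munch. The rule names and the `lex` helper are hypothetical.

```python
import re
from typing import List, Tuple

RULES = [("Whitespace", re.compile(r"[ \t\n]+")),
         ("New", re.compile(r"new")),
         ("Integer", re.compile(r"[0-9]+")),
         ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
         ("Plus", re.compile(r"\+"))]


def lex(text: str) -> List[Tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(text):
        # Candidates ranked by (length, -rule_index): longest first, then earliest rule.
        candidates = []
        for idx, (name, rx) in enumerate(RULES):
            m = rx.match(text, pos)
            if m and m.end() > pos:
                candidates.append((m.end() - pos, -idx, name))
        if not candidates:
            raise ValueError(f"no rule matches at position {pos}")
        length, _, name = max(candidates)
        tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


print(lex("new foo"))  # [('New', 'new'), ('Whitespace', ' '), ('Identifier', 'foo')]
print(lex("newer"))    # [('Identifier', 'newer')]
```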

Summary

• Regular expressions provide a concise notation for string patterns
• We need to adapt them for lexical analysis to
  • Resolve ambiguities
  • Handle errors (report line numbers)
• Next time: Lexical Analyzer Generators
