OrganisationFormal Languages
Regular Expressions and Regular LanguagesConclusion
Radboud University Nijmegen
Formal Languages, Grammars and Automata
Helle Hvid Hansen
http://www.cs.ru.nl/~helle/
Foundations Group – Intelligent Systems Section
Institute for Computing and Information Sciences
Radboud University Nijmegen
25 April 2014
Helle Hvid Hansen 25 April 2014 FLGA 1 / 24
Outline
Organisation
Formal Languages
Regular Expressions and Regular Languages
Conclusion
Course Organisation
• Lectures: Fridays, 13:30 - 15:30.
• Register in Blackboard (to get email announcements).
• Course webpage (information, material, exercises):
www.ru.nl/foundations/education/courses/flga-2014/
Check there first, before asking/emailing me!
• Material (PDFs via webpage):
  • Languages and Automata, Lecture Notes (v2), by Alexandra Silva (used in Talen en Automaten, RU)
  • Lecture Notes on Regular Languages and Finite Automata, by Andrew Pitts (earlier version used last year)
• Lots of other material available via www.
Examination and Grading
• Final exam (closed-book!): Tue 24 June, 8:30-11:30.
• Midterm test: Friday 23 May 2014 (details TBD).
• Final grade F (given test grade T, exam grade E):
  • If E ≤ 5 (fail), then F = E.
  • If E > 5 (pass), then F = max{(T + E)/2, E}.
• Students entitled to extra time: please contact me as soon as possible.
Tutorials (Werkcolleges) and Homework
• Time: Fridays 15:30-17:30.
• Tutorial instructors: Lorena van Duuren and Emma Gerritse.
• Group 1: Last name starts with [A-L], instructor: Lorena van Duuren, room: HG00.633.
• Group 2: Last name starts with [M-Z], instructor: Emma Gerritse, room: HG00.065.
• No compulsory homework; option to hand in one exercise per week for feedback.
• Exercises will be made available on the webpage on Thursday evening (for the following day).
How to pass?
• Practice, practice, practice, ... Do the (non-compulsory) exercises!
  Yes, it takes discipline. But this is the only way to internalise the material.
• Go to tutorials to get feedback and solutions to exercises.
• Test and exam questions will be in line with exercises.
• 3 ec means 3 × 28 hours = 84 hours in total: 20 hours for the exam and 8 hours per week, of which 4 hours are lectures and tutorials ⇒ 4 hours of self-study and exercises per week.
• Re-examination: Tuesday 24 June 2014, 08:30 - 11:30.
Let’s get started
What is a language?
• Natural language (English, Dutch, Chinese, ...).
  Some words in the English language: “students”, “do”, “homework”.
• Programming language (C, Java, Python, ...).
  A string of the language C: “printf("Hello world.")”
• Mathematical language, e.g. “x − (y − x) = 2x − y”
• Logic languages, e.g. first-order logic: ∀x ∈ N ∃y ∈ N : y > x
• ...
A language consists of words (or strings).
Words are sequences of letters/symbols from an alphabet.
Alphabet
Def. An alphabet is a finite set, often denoted Σ. Elements of an alphabet are called letters or symbols.
Examples:
Σ_1 = {a}
Σ_2 = {0, 1}
Σ_3 = {A, C, G, T}
Σ_4 = {a, b, c, d, . . . , x, y, z}
Σ_5 = Chinese alphabet: approx. 40,000 symbols
Σ_6 = {+, ×, −, 0, 1, 2, 3, . . .}: mathematical “alphabet”, countably infinite, so not an alphabet.
Words/Strings
Def. Given an alphabet Σ,
• a word (or string) over Σ is a finite sequence of letters from Σ.
• the empty word (i.e. sequence of length 0) is denoted by λ.
• the set of all words over Σ is denoted by Σ∗.
Examples:
• x − (y − x) = 2x − y is a word of length 12 over the alphabet Σ = {x, y, −, +, (, ), =, 0, 1, 2}
• The students will do their homework is a word of length 11 (6 words and 5 spaces) over the alphabet Σ = {The, students, will, do, their, homework, ␣, a, b, c}
Inductive Definition of Words
Inductive definition of Σ∗:
Σ∗ is the smallest set satisfying the following rules:
1. λ ∈ Σ∗.
2. If w ∈ Σ∗ and a ∈ Σ, then wa ∈ Σ∗ (or equivalently, aw ∈ Σ∗).
(Why is “smallest set” important?)
Properties of Σ∗:
• Σ∗ ≠ ∅ (why?)
• If Σ ≠ ∅, then Σ∗ is infinite
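The two rules can be read as a recipe for generating words. The following is a minimal sketch, assuming words are represented as Python strings with "" standing for λ; the function name `words_up_to` is made up for this illustration. It applies rule 2 a bounded number of times, since Σ∗ itself is infinite for non-empty Σ.

```python
def words_up_to(alphabet, max_len):
    """Return all words over `alphabet` of length <= max_len, shortest first."""
    layer = [""]              # rule 1: the empty word λ is in Σ*
    result = list(layer)
    for _ in range(max_len):
        # rule 2: if w is in Σ* and a is in Σ, then wa is in Σ*
        layer = [w + a for w in layer for a in alphabet]
        result.extend(layer)
    return result


print(words_up_to(["a", "b"], 2))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

With an empty alphabet the result is just `[""]`, which matches the observation that Σ∗ always contains λ.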
Definitions by Induction on Words
• Definition of Σ∗ says: w is a word if and only if
w = λ or w = va for some word v and letter a.
• We can define a function f on words by defining f(w) by induction on w (distinguish cases for f(w)):
  Base case (w = λ): f(λ) = ...
  Inductive case (w = va): f(va) = ... (may use f(v))
• If f takes several arguments, we can choose one for the induction, for example, define f(u, w) by induction on w (u is fixed wrt induction).
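The definition scheme above can be sketched as a small higher-order function. This is only an illustration under the assumption that words are Python strings; `define_by_induction` and the example function `double` are hypothetical names, not part of the course material.

```python
def define_by_induction(base, step):
    """Build f with f(λ) = base and f(va) = step(f(v), v, a)."""
    def f(w):
        if w == "":                  # base case: w = λ
            return base
        v, a = w[:-1], w[-1]         # inductive case: w = va
        return step(f(v), v, a)      # the step may use f(v)
    return f


# Hypothetical example: double(λ) = λ and double(va) = double(v)aa
double = define_by_induction("", lambda fv, v, a: fv + a + a)
print(double("abc"))  # aabbcc
```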
Concatenation of Words
• Given words u = ab and w = bc over Σ = {a, b, c}.
  We can concatenate them to create new words:
u · w = abbc, w · u = bcab, u · u = abab
• Concatenation is a binary operation · on words.
• We define u · w by induction on w . For all u ∈ Σ∗,
  Base case: u · λ = u
  Inductive case: u · va = (u · v)a for all v ∈ Σ∗ and a ∈ Σ.
(We will often write uv instead of u · v)
• Some properties: u(vw) = (uv)w , λu = u
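The inductive definition of concatenation translates almost literally into a recursive function. A minimal sketch, assuming words are Python strings ("" for λ); the built-in `+` is used only to append the single letter a, and the name `concat` is chosen for this example.

```python
def concat(u, w):
    """Concatenation u · w, defined by induction on w (u is fixed)."""
    if w == "":
        return u                   # base case: u · λ = u
    v, a = w[:-1], w[-1]           # w = va
    return concat(u, v) + a        # inductive case: u · va = (u · v)a


u, w = "ab", "bc"
print(concat(u, w), concat(w, u), concat(u, u))  # abbc bcab abab
```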
More Operations
• Reversal of words, e.g. (abc)^R = cba.
  Define w^R by induction on w:
    Base case: λ^R = λ
    Inductive case: (va)^R = a · v^R for all v ∈ Σ∗ and a ∈ Σ
• Repeating a word: e.g. (ab)^2 = abab, (ab)^3 = ababab, etc.
  Define u^n by induction on n ∈ N (!):
    u^0 = λ and u^{n+1} = u · u^n
  (Base case: n = 0, Inductive case: n = n′ + 1.)
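Both definitions can be sketched as recursive functions, one recursing on the word and one on the number. This assumes words are Python strings; the names `reverse` and `power` are chosen for the example.

```python
def reverse(w):
    """Reversal w^R, by induction on the word w."""
    if w == "":
        return ""                  # base case: λ^R = λ
    v, a = w[:-1], w[-1]           # w = va
    return a + reverse(v)          # inductive case: (va)^R = a · v^R


def power(u, n):
    """Repetition u^n, by induction on the natural number n."""
    if n == 0:
        return ""                  # base case: u^0 = λ
    return u + power(u, n - 1)     # inductive case: u^(n+1) = u · u^n


print(reverse("abc"))              # cba
print(power("ab", 2), power("ab", 3))  # abab ababab
```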
Counting Occurrences and Length
• |w|_a is the number of occurrences of letter a in word w.
  E.g., |λ|_a = 0, |abb|_a = 1, |abb|_b = 2.
  Define by induction on w:
    |λ|_a = 0
    |vb|_a = |v|_a + 1   if a = b
    |vb|_a = |v|_a       if a ≠ b
• |w| is the length of the word w. E.g., |abb| = 3.
  (Exercise: define it by induction)
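The case distinction in the definition of the occurrence count can be sketched as follows, again assuming words are Python strings; the name `count` is chosen for the example. The length function is left out on purpose, since it is the exercise.

```python
def count(w, a):
    """Number of occurrences |w|_a of letter a in word w, by induction on w."""
    if w == "":
        return 0                   # |λ|_a = 0
    v, b = w[:-1], w[-1]           # w = vb
    if a == b:
        return count(v, a) + 1     # |vb|_a = |v|_a + 1 if a = b
    return count(v, a)             # |vb|_a = |v|_a     if a ≠ b


print(count("", "a"), count("abb", "a"), count("abb", "b"))  # 0 1 2
```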
Proof by Induction
Prove that some property P holds for all words.
For example, P(u, v) could be |uv| = |u| + |v|.
A proof by induction works as follows:
• Base case: Show P holds for λ (in the example: P(u, λ)).
• Induction Hypothesis (IH):
  Assume that P(u, v) holds for all words v of length < n.
• Show that P(u, w) holds for all words w of length n
  (you may use the IH).
We conclude by induction that P(u, w) holds for all words u, w.
See lecture notes by Silva for more examples.
See also exercises of this week.
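As a worked instance of this scheme, here is a sketch of the proof of the example property. It assumes that length is defined by |λ| = 0 and |va| = |v| + 1 (one natural definition, not taken from the slides) and uses the inductive definition of concatenation. Every word of length n > 0 has the form va with v of length n − 1, so the IH applies to v.

```latex
\textbf{Claim.} $|u \cdot w| = |u| + |w|$ for all $u, w \in \Sigma^*$.

\textbf{Base case} ($w = \lambda$):
\[ |u \cdot \lambda| = |u| = |u| + 0 = |u| + |\lambda|. \]

\textbf{Inductive case} ($w = va$), with IH $|u \cdot v| = |u| + |v|$:
\begin{align*}
|u \cdot va| &= |(u \cdot v)a|  && \text{definition of concatenation} \\
             &= |u \cdot v| + 1 && \text{definition of length} \\
             &= |u| + |v| + 1   && \text{IH} \\
             &= |u| + |va|      && \text{definition of length.}
\end{align*}
```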
Formal Language
Def. A language L over Σ is a set of words over Σ, that is, L ⊆ Σ∗.
Examples:
• ∅, {λ} are languages over any Σ.
• L_1 = {a^n ∈ {a, b}∗ | n ∈ N is even}
• L_2 = {a^n b^n ∈ {a, b}∗ | n ∈ N}
• L_3 = {a^n b^n c^n ∈ {a, b, c}∗ | n ∈ N}
• L_4 = {a^n ∈ {a}∗ | n ∈ N is prime}
• L_5 = {w ∈ {0, 1}∗ | w is binary representation of a prime}
• L_6 = {e | e is a well-formed arithmetical expression}
• L_7 = {P | P is a syntactically correct Java program}
• L_8 = {S | S is a grammatically correct English sentence}
Operations on Languages
Let L, L_1, L_2 ⊆ Σ∗.
Concatenation: L_1 L_2 = {uv ∈ Σ∗ | u ∈ L_1, v ∈ L_2}
Reversal: L^R = {u^R ∈ Σ∗ | u ∈ L}
Union: L_1 ∪ L_2 = {u ∈ Σ∗ | u ∈ L_1 or u ∈ L_2}
Intersection: L_1 ∩ L_2 = {u ∈ Σ∗ | u ∈ L_1 and u ∈ L_2}
Complement: L̄ = {u ∈ Σ∗ | u ∉ L}
Kleene star: L∗ = ⋃_{n∈N} L^n = L^0 ∪ L^1 ∪ L^2 ∪ L^3 ∪ . . .
  (where L^0 = {λ} and L^{n+1} = L L^n)
  = {u_1 · · · u_n | u_1, . . . , u_n ∈ L, n ∈ N}
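For finite languages these operations can be tried out directly. The following is a minimal sketch, assuming a language is a Python set of strings; the function names are made up for this illustration. Union and intersection are the built-in set operators `|` and `&`. The Kleene star is infinite as soon as L contains a non-empty word, so the sketch only computes the finite approximation L^0 ∪ ... ∪ L^n.

```python
def concat_lang(l1, l2):
    """L1 L2 = {uv | u in L1, v in L2}."""
    return {u + v for u in l1 for v in l2}


def reverse_lang(l):
    """L^R = {u^R | u in L}."""
    return {u[::-1] for u in l}


def star_up_to(l, n):
    """Finite approximation of L*: the union L^0 ∪ L^1 ∪ ... ∪ L^n."""
    result = set()
    power = {""}                        # L^0 = {λ}
    for _ in range(n + 1):
        result |= power
        power = concat_lang(l, power)   # L^(k+1) = L L^k
    return result


print(sorted(concat_lang({"a", "ab"}, {"", "b"})))  # ['a', 'ab', 'abb']
print(sorted(reverse_lang({"ab", "abc"})))          # ['ba', 'cba']
print(sorted(star_up_to({"ab"}, 2)))                # ['', 'ab', 'abab']
```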
Regular Expressions
Def. The set RegEx(Σ) of regular expressions over Σ is the smallest set satisfying:
1. 0, 1 and all a ∈ Σ are in RegEx(Σ).
2. If r, s ∈ RegEx(Σ) then also
   (r + s), rs, (r)∗
   are in RegEx(Σ).
• We assume 0, 1 are not in Σ.
• We will omit parentheses by using the convention that ∗ binds stronger than concatenation, which binds stronger than +.
  E.g., we write r + st∗ instead of (r + s(t)∗).
Regular Languages
Def. The language L(e) denoted by a regular expression e ∈ RegEx(Σ) is defined inductively by:
L(0) = ∅
L(1) = {λ}
L(a) = {a} for all a ∈ Σ
L(rs) = L(r)L(s)
L(r + s) = L(r) ∪ L(s)
L(r∗) = L(r)∗
Def. A language L ⊆ Σ∗ is regular if there exists a regular expression e ∈ RegEx(Σ) such that L = L(e).
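The inductive definition of L(e) can be turned into a small evaluator. This is a sketch under several assumptions: regular expressions are represented by hypothetical Python classes (`Zero`, `One`, `Sym`, `Plus`, `Cat`, `Star`), words are strings, and because L(e) may be infinite the function `lang` only returns the words of L(e) up to a given length n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zero:            # the expression 0
    pass

@dataclass(frozen=True)
class One:             # the expression 1
    pass

@dataclass(frozen=True)
class Sym:             # a letter a ∈ Σ
    a: str

@dataclass(frozen=True)
class Plus:            # r + s
    r: object
    s: object

@dataclass(frozen=True)
class Cat:             # rs
    r: object
    s: object

@dataclass(frozen=True)
class Star:            # r*
    r: object


def lang(e, n):
    """Return the words of L(e) that have length <= n."""
    if isinstance(e, Zero):
        return set()                                   # L(0) = ∅
    if isinstance(e, One):
        return {""}                                    # L(1) = {λ}
    if isinstance(e, Sym):
        return {e.a} if n >= 1 else set()              # L(a) = {a}
    if isinstance(e, Plus):
        return lang(e.r, n) | lang(e.s, n)             # L(r + s) = L(r) ∪ L(s)
    if isinstance(e, Cat):                             # L(rs) = L(r)L(s)
        return {u + v for u in lang(e.r, n) for v in lang(e.s, n)
                if len(u + v) <= n}
    if isinstance(e, Star):                            # L(r*) = L(r)*
        base = {v for v in lang(e.r, n) if v != ""}
        result, frontier = {""}, {""}
        while frontier:
            # extend the newest words by one more factor from L(r)
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")


a, b = Sym("a"), Sym("b")
print(sorted(lang(Cat(a, Star(b)), 3)))     # ['a', 'ab', 'abb']
print(sorted(lang(Star(Plus(a, b)), 1)))    # ['', 'a', 'b']
```

Bounding the length keeps every intermediate set finite; the star case stops once no new word of length at most n can be formed.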
Examples of Regular Languages
Let Σ = {a, b}.
regular expression e     language L(e)
a + b                    {a, b} = Σ
(a + b)∗                 all words over Σ (Σ∗)
a(a + b)∗                all words that begin with a
b∗(a + 1)b∗              all words that contain zero or one a
a(0 + 1 + b)∗            {a, ab, abb, abbb, . . .}
(ab∗)∗0                  the empty language (∅)
((a + b)(a + b))∗        all words of even length
(a∗b)∗a∗                 Σ∗
Def. Two regular expressions r and s are equivalent if L(r) = L(s).
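Some rows of the table can be sanity-checked mechanically. The sketch below uses Python's standard `re` module, whose pattern syntax differs from the notation used here: + is written `|`, and (a + 1) is written `a?`. It compares `re.fullmatch` against a direct description of the language on all words over {a, b} up to a fixed length; the helper names `words` and `agrees` are made up for this example. Agreement on short words is evidence, not a proof.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield all words over `alphabet` of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# (pattern in Python syntax, description of the language as a predicate)
rows = [
    ("a(a|b)*",       lambda w: w.startswith("a")),   # a(a + b)*
    ("b*a?b*",        lambda w: w.count("a") <= 1),   # b*(a + 1)b*
    ("((a|b)(a|b))*", lambda w: len(w) % 2 == 0),     # ((a + b)(a + b))*
]


def agrees(pattern, description, max_len=6):
    """Check that pattern and description accept the same short words."""
    return all((re.fullmatch(pattern, w) is not None) == description(w)
               for w in words("ab", max_len))


for pattern, description in rows:
    print(pattern, agrees(pattern, description))
```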
Some Questions
1. Given a word w and a regular expression e, is there an algorithm that computes whether w ∈ L(e)?
2. Given regular expressions e_1, e_2 over the same alphabet, is there an algorithm that computes whether L(e_1) = L(e_2)?
3. Are all languages regular? If not, then how can we prove that some L is not regular?
Summary
Learning goals of today:
• Notion of formal language
• Operations on words and languages
• Regular expressions for specifying regular languages
Remark on Notation in Lecture Notes
                         [Silva]        [Pitts]
alphabet                 A, B           Σ
empty word/string        λ              ε
regular expressions      0, 1, r + s    ∅, ε, r|s
What is this course (not) about?
Formal Languages
Grammars (generators)
Automata (acceptors)
This course: regular, context-free languages and their automata and grammars.
Other courses: context-sensitive, recursively enumerable languages, Turing machines.