Natural Language Processing · robotics.cs.tamu.edu/dshell/cs420/nlp.pdf · 2019-12-01

Date post: 13-Aug-2020
Natural Language Processing

Nov 19, 2019

Introduction
Regular Expression
Notations
Examples
Text Classification
Naive Bayes
Example
Language Modelling
Unigram, Bigram and N-gram

Who wrote the Federalist Papers?

1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution: Jay, Madison, Hamilton.

Authorship of 12 of the letters is in dispute.

1963: solved by Mosteller and Wallace using Bayesian methods.

By the end of this lecture we will see how to do that.

What makes it hard?

Formal languages are unambiguous.

Natural languages are ambiguous:

“He saw her duck.”
“Time flies like an arrow. Fruit flies like a banana.”

By the end of the class

By the end of the class we will see how to do:

1 Text Classification. E.g. Spam detection, Authorship identification.

2 Spell Correction. E.g. Auto-correct.

3 Word suggestion.

Regular Expressions

A formal language for specifying text strings.

Notations

• Disjunctions []:

Pattern         Matches
[Ww]oodchuck    woodchuck, Woodchuck
[0123456789]    Any single digit

• Disjunctions |:

Pattern    Matches
abc|def    Find ‘abc’ or ‘def’.
a|b|ab     Find ‘a’ or ‘b’ or ‘ab’. Example: ‘abc’

Notations

• Ranges:

Pattern    Matches
[A-Z]      An uppercase letter.
[a-z]      A lowercase letter.
[0-9]      A single digit.

• Negation ^. (Note: the caret means negation only when it is first in [].)

Pattern    Matches
[^A-Z]     Not an uppercase letter
[^Ss]      Neither ‘S’ nor ‘s’
[^e^]      Neither ‘e’ nor ‘^’
a^b        Search for the pattern ‘a^b’
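
The ranges and negations above can be tried out with a short sketch. It assumes Python's `re` module, which the slides do not name, and the sample strings are made up for illustration. One engine-specific caveat: in Python an unescaped caret outside brackets acts as the start anchor, so the literal text ‘a^b’ is found by escaping the caret.

```python
import re

# Hypothetical sample string, not taken from the slides.
sample = "Sam saw 3 ducks"

uppercase = re.findall(r"[A-Z]", sample)   # uppercase letters
digits = re.findall(r"[0-9]", sample)      # single digits
not_s = re.findall(r"[^Ss]", "Sss!")       # anything except 'S' or 's'

# Inside brackets, a caret that is not first is an ordinary character.
not_e_or_caret = re.findall(r"[^e^]", "e^x")

# Outside brackets, Python reads an unescaped caret as the start anchor,
# so the literal text 'a^b' needs an escaped caret.
literal = re.search(r"a\^b", "look up a^b here")
unescaped = re.search(r"a^b", "look up a^b here")

print(uppercase, digits, not_s, not_e_or_caret)
print(literal.group(), unescaped)
```

Other regular expression engines may treat a mid-pattern caret differently, so the last two lines are worth re-checking in whatever tool is actually used.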

Notations (? * . + ^ $)

?    0 or 1 of previous character
*    0 or more of previous character
+    1 or more of previous character
.    Any character
^    Start anchor
$    End anchor
\    Escape character

Examples

• Pattern: ^[A-Z]
  Which of them are matches? “Class”, “cSCE”, “420”.
  Answer: “Class”.
• Pattern: ^[^A-Z]
  Which of them are matches? “Class”, “cSCE”, “420”.
  Answer: “cSCE”, “420”.
• Pattern: .$
  Which of them are matches? “end”, “end?”, “end!”, “end.”.
  Answer: “end”, “end?”, “end!”, “end.”.
• Pattern: \.$
  Which of them are matches? “end”, “end?”, “end!”, “end.”.
  Answer: “end.”.
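
These anchor exercises can be checked mechanically. The sketch below assumes Python's `re` module and uses a small helper, `matches`, that is introduced here only for illustration. It reports which candidate strings contain a match for each pattern.

```python
import re

def matches(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [s for s in candidates if re.search(pattern, s)]

words = ["Class", "cSCE", "420"]
endings = ["end", "end?", "end!", "end."]

starts_upper = matches(r"^[A-Z]", words)          # first character is uppercase
starts_not_upper = matches(r"^[^A-Z]", words)     # first character is anything else
any_last_char = matches(r".$", endings)           # any character before the end
literal_period_last = matches(r"\.$", endings)    # an escaped, literal period

print(starts_upper, starts_not_upper, any_last_char, literal_period_last)
```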

Examples

• Pattern: colou?r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: “color”, “colour”.
• Pattern: colou+r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: “colour”, “colouur”.
• Pattern: colou*r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: “color”, “colour”, “colouur”.
• Pattern: colou.r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: “colouur”.
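
The quantifier exercises can be verified the same way. This sketch assumes Python's `re` module and uses `re.fullmatch`, so that a pattern has to account for the whole word. The helper name `whole_word_matches` is an illustrative choice, not something from the slides.

```python
import re

words = ["color", "colour", "colouur"]

def whole_word_matches(pattern):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

optional_u = whole_word_matches(r"colou?r")      # zero or one 'u'
one_or_more_u = whole_word_matches(r"colou+r")   # at least one 'u'
zero_or_more_u = whole_word_matches(r"colou*r")  # any number of 'u'
any_char = whole_word_matches(r"colou.r")        # 'u', then any one character

print(optional_u, one_or_more_u, zero_or_more_u, any_char)
```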

Example

We need to find instances of “the” in a text.

the                          × misses ‘The’
[Tt]he                       × also matches inside ‘Theology’
[^A-Za-z][Tt]he[^A-Za-z]
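
A small sketch makes the three attempts concrete. It assumes Python's `re` module and a made-up sentence. The last pattern, `\b[Tt]he\b`, is not from the slides. It is shown as one common alternative, because the bracketed version needs a non-letter on both sides and therefore skips a “The” at the very start of the text.

```python
import re

# Hypothetical test sentence, not taken from the slides.
text = "The theology student read the other book."

naive = re.findall(r"the", text)                          # lowercase only
either_case = re.findall(r"[Tt]he", text)                 # also hits 'theology', 'other'
bounded = re.findall(r"[^A-Za-z][Tt]he[^A-Za-z]", text)   # needs a non-letter on each side
word_boundary = re.findall(r"\b[Tt]he\b", text)           # assumption: \b as an alternative

print(naive, either_case, bounded, word_boundary)
```

Note that `bounded` returns the surrounding non-letter characters as part of each match, which is another practical difference from the `\b` version.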

Text classification

Assigning subject categories, topics, or genres.

Spam detection.

Authorship identification.

Age/gender identification.

Language Identification.

· · ·

Text Classification

Inputs:

A document $d$.
A fixed set of classes $C = \{c_1, c_2, \dots, c_n\}$.

Output:

A predicted class $c \in C$.

Naive Bayes

Relies on a simple representation of the document: bag of words.

For a document $d$ and a class $c$:

\[ P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)} \]

Naive Bayes Classifier

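\[ c_{\mathrm{MAP}} = \operatorname*{argmax}_{c \in C} P(c \mid d) \]

MAP: maximum a posteriori (most likely class).

Applying Bayes' rule:

\[ c_{\mathrm{MAP}} = \operatorname*{argmax}_{c \in C} \frac{P(d \mid c)\, P(c)}{P(d)} \]

The denominator $P(d)$ is the same for every class, so it can be dropped:

\[ c_{\mathrm{MAP}} = \operatorname*{argmax}_{c \in C} P(d \mid c)\, P(c) \]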

Naive Bayes Classifier

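\[ c_{\mathrm{MAP}} = \operatorname*{argmax}_{c \in C} P(d \mid c)\, P(c) \]

Let's say that the document is represented by $n$ features $x_1, x_2, \dots, x_n$:

\[ c_{\mathrm{MAP}} = \operatorname*{argmax}_{c \in C} P(x_1, x_2, \dots, x_n \mid c)\, P(c) \]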

Assumptions

Bag of words: Position of words does not matter.
Conditional independence: The feature probabilities $P(x_i \mid c)$ are independent given the class $c$.

\[ P(x_1, x_2, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c) \]

Bag of word representation
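
A bag of words can be sketched as a mapping from each word to its count, with word order discarded. The example below is a minimal illustration in Python. The tokenization (lowercasing and splitting on whitespace) and the function name `bag_of_words` are assumptions made here, not choices stated in the slides.

```python
from collections import Counter

def bag_of_words(text):
    """Map each lowercased, whitespace-separated token to its count."""
    return Counter(text.lower().split())

# Two word orders of the same tokens give the same bag.
bag_1 = bag_of_words("time flies like an arrow")
bag_2 = bag_of_words("arrow like an flies time")

print(bag_1 == bag_2)
print(bag_of_words("Carla Betty Carla"))
```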

Naive Bayes: Learning

What do we need?
A training set of $m$ hand-labeled documents $(d_1, c_1), \dots, (d_m, c_m)$.

Naive Bayes: Learning

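Let $N_D$ be the number of documents, and $N_{c_j}$ be the number of documents present in class $c_j$.
Let $V_{c_j}$ be the set of all words in the documents of class $c_j$.
Now we find the maximum likelihood estimates:

\[ \hat{P}(c_j) = \frac{N_{c_j}}{N_D} \]

\[ \hat{P}(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j)}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j)} \]

Now we can classify a document $d$ by:

\[ c_d = \operatorname*{argmax}_{c_j \in C} \hat{P}(c_j) \prod_{w_i \in d} \hat{P}(w_i \mid c_j) \]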
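
The estimates and the classification rule above can be sketched in a few lines of Python. The training documents, the class names, and the function names `train_mle` and `classify` are hypothetical and chosen only for illustration. Exact fractions are used so the numbers can be checked by hand. The sketch deliberately applies no smoothing, so a word never seen in a class drives that class's score to zero.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical hand-labeled training set: (document text, class label).
training = [
    ("cheap pills cheap", "spam"),
    ("meeting at noon", "ham"),
    ("cheap meeting", "ham"),
]

def train_mle(labeled_docs):
    """Return (priors, likelihoods) as unsmoothed maximum likelihood estimates."""
    doc_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    for text, label in labeled_docs:
        word_counts[label].update(text.split())
    priors = {c: Fraction(n, len(labeled_docs)) for c, n in doc_counts.items()}
    likelihoods = {
        c: {w: Fraction(k, sum(counts.values())) for w, k in counts.items()}
        for c, counts in word_counts.items()
    }
    return priors, likelihoods

def classify(text, priors, likelihoods):
    """Pick the class maximizing the prior times the product of word likelihoods."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for w in text.split():
            # A word unseen in class c has an estimate of zero.
            score *= likelihoods[c].get(w, Fraction(0))
        scores[c] = score
    return max(scores, key=scores.get), scores

priors, likelihoods = train_mle(training)
label, scores = classify("cheap pills", priors, likelihoods)
print(label, scores)
```

In practice the product is usually computed as a sum of logarithms to avoid numerical underflow on long documents. Exact fractions are used here only to keep the toy numbers readable.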

Naive Bayes: Learning

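What if we come across an unknown word in the document $d$?
Let $w_u$ be the unknown word. Then $\hat{P}(w_u \mid c_j) = 0$ for all $c_j$, so the product in the classification rule is zero for every class.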

Laplace smoothing

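Let $V$ be the set of all words in the training documents, i.e., $V = \bigcup_{c_j} V_{c_j}$.

Add one word for the unknown word in the vocabulary.

\[ \hat{P}(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j) + 1}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1} \]

So, for all unknown words, we have:

\[ \hat{P}(w_u \mid c_j) = \frac{1}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1} \]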
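
As a sanity check that is not part of the slides, the smoothed estimates still sum to one over the vocabulary plus the single unknown-word slot. The step uses the fact that $\mathrm{count}(w, c_j) = 0$ for any word of $V$ that does not occur in class $c_j$, so summing counts over $V$ gives the same total as summing over $V_{c_j}$:

\[
\sum_{w \in V} \hat{P}(w \mid c_j) + \hat{P}(w_u \mid c_j)
= \frac{\sum_{w \in V} \left( \mathrm{count}(w, c_j) + 1 \right) + 1}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1}
= \frac{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1}
= 1
\]

This is why the denominator carries $|V| + 1$ rather than $|V|$: the extra one accounts for the unknown-word slot.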

Example

Training set:

#    Text                    Class
1    Carla Betty Carla       a
2    Carla Carla Suzanne     a
3    Carla Matt              a
4    Taylor Jessica Carla    b

$\hat{P}(a) = 3/4$
$\hat{P}(b) = 1/4$

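Class a contains 8 word tokens, 5 of which are “Carla”, and the vocabulary has $|V| = 6$ words, so with Laplace smoothing:

\[ \hat{P}(\text{Carla} \mid a) = \frac{5 + 1}{8 + 6 + 1} = \frac{6}{15} \]

\[ \hat{P}(\text{Taylor} \mid a) = \frac{0 + 1}{8 + 6 + 1} = \frac{1}{15} \]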

Document d5: Carla Carla Carla Taylor Jessica

P̂(a) = 3/4 and P̂(b) = 1/4P̂(Carla|a) = 6/15, P̂(Carla|b) = 2/10P̂(Taylor |a) = 1/15, P̂(Taylor |b) = 2/10P̂(Jessica|a) = 1/15, P̂(Jessica|b) = 2/10

P(a|d5) = 3/4× (6/15)3 × 1/15× 1/15 ≈ 0.0002P(b|d5) = 1/4× (2/10)3 × 2/10× 2/10 ≈ 0.00008

27 / 35

Page 55: Natural Language Processingrobotics.cs.tamu.edu/dshell/cs420/nlp.pdf · 2019-12-01 · Natural Language Processing Introduction Regular Expression Notations Examples Text Classi cation

NaturalLanguageProcessing

Introduction

RegularExpression

Notations

Examples

TextClassification

Naive Bayes

Example

LanguageModelling

Unigram, Bigramand N-gram

Example

Document d5: Carla Carla Carla Taylor Jessica

P̂(a) = 3/4 and P̂(b) = 1/4P̂(Carla|a) = 6/15, P̂(Carla|b) = 2/10P̂(Taylor |a) = 1/15, P̂(Taylor |b) = 2/10P̂(Jessica|a) = 1/15, P̂(Jessica|b) = 2/10

P(a|d5) = 3/4× (6/15)3 × 1/15× 1/15 ≈ 0.0002P(b|d5) = 1/4× (2/10)3 × 2/10× 2/10 ≈ 0.00008

27 / 35

Page 56: Natural Language Processingrobotics.cs.tamu.edu/dshell/cs420/nlp.pdf · 2019-12-01 · Natural Language Processing Introduction Regular Expression Notations Examples Text Classi cation

NaturalLanguageProcessing

Introduction

RegularExpression

Notations

Examples

TextClassification

Naive Bayes

Example

LanguageModelling

Unigram, Bigramand N-gram

Example

Document d5: Carla Carla Carla Taylor Jessica

P̂(a) = 3/4 and P̂(b) = 1/4P̂(Carla|a) = 6/15, P̂(Carla|b) = 2/10P̂(Taylor |a) = 1/15, P̂(Taylor |b) = 2/10P̂(Jessica|a) = 1/15, P̂(Jessica|b) = 2/10

P(a|d5) =

3/4× (6/15)3 × 1/15× 1/15 ≈ 0.0002P(b|d5) = 1/4× (2/10)3 × 2/10× 2/10 ≈ 0.00008

27 / 35

Page 57: Natural Language Processingrobotics.cs.tamu.edu/dshell/cs420/nlp.pdf · 2019-12-01 · Natural Language Processing Introduction Regular Expression Notations Examples Text Classi cation

NaturalLanguageProcessing

Introduction

RegularExpression

Notations

Examples

TextClassification

Naive Bayes

Example

LanguageModelling

Unigram, Bigramand N-gram

Example

Document d5: Carla Carla Carla Taylor Jessica

P̂(a) = 3/4 and P̂(b) = 1/4P̂(Carla|a) = 6/15, P̂(Carla|b) = 2/10P̂(Taylor |a) = 1/15, P̂(Taylor |b) = 2/10P̂(Jessica|a) = 1/15, P̂(Jessica|b) = 2/10

P(a|d5) = 3/4× (6/15)3 × 1/15× 1/15 ≈ 0.0002P(b|d5) =

1/4× (2/10)3 × 2/10× 2/10 ≈ 0.00008

27 / 35

Page 58: Natural Language Processingrobotics.cs.tamu.edu/dshell/cs420/nlp.pdf · 2019-12-01 · Natural Language Processing Introduction Regular Expression Notations Examples Text Classi cation

NaturalLanguageProcessing

Introduction

RegularExpression

Notations

Examples

TextClassification

Naive Bayes

Example

LanguageModelling

Unigram, Bigramand N-gram

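Example

Document d5: Carla Carla Carla Taylor Jessica

$\hat{P}(a) = 3/4$ and $\hat{P}(b) = 1/4$
$\hat{P}(\text{Carla} \mid a) = 6/15$, $\hat{P}(\text{Carla} \mid b) = 2/10$
$\hat{P}(\text{Taylor} \mid a) = 1/15$, $\hat{P}(\text{Taylor} \mid b) = 2/10$
$\hat{P}(\text{Jessica} \mid a) = 1/15$, $\hat{P}(\text{Jessica} \mid b) = 2/10$

$P(a \mid d_5) \propto 3/4 \times (6/15)^3 \times 1/15 \times 1/15 \approx 0.0002$
$P(b \mid d_5) \propto 1/4 \times (2/10)^3 \times 2/10 \times 2/10 \approx 0.00008$

The score for class a is larger, so d5 is assigned to class a.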
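The scoring of d5 can be checked with a short script. This is a sketch under the assumption that the priors and likelihoods are exactly the values listed on the slide; the names `PRIOR`, `LIKELIHOOD` and `class_score` are illustrative.

```python
from fractions import Fraction

# Estimates copied from the slide.
PRIOR = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
LIKELIHOOD = {
    "a": {
        "Carla": Fraction(6, 15),
        "Taylor": Fraction(1, 15),
        "Jessica": Fraction(1, 15),
    },
    "b": {
        "Carla": Fraction(2, 10),
        "Taylor": Fraction(2, 10),
        "Jessica": Fraction(2, 10),
    },
}


def class_score(document, label):
    """Prior times the product of per-word likelihoods (unnormalized)."""
    score = PRIOR[label]
    for word in document.split():
        score *= LIKELIHOOD[label][word]
    return score


d5 = "Carla Carla Carla Taylor Jessica"
scores = {label: class_score(d5, label) for label in PRIOR}
predicted = max(scores, key=scores.get)

print({label: float(score) for label, score in scores.items()})
print(predicted)  # a
```

The scores are unnormalized; dividing each by their sum gives posteriors of 8/11 (about 0.73) for class a and 3/11 (about 0.27) for class b. For longer documents, summing log-probabilities instead of multiplying avoids numerical underflow.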

Naive Bayes

Naive Bayes is not so naive!

Robust to irrelevant features.

Optimal if the independence assumptions hold.

A good, dependable baseline for text classification, although other classifiers exist that give better accuracy.

Federalist Papers

Discussion: Federalist Papers. E.g., what training set do we need?

Language Modelling

Goal: Assign a probability to a sentence.

Why?

Machine Translation. P(high winds tonight) > P(large winds tonight)

Spell Correction. P(about fifteen minutes from) > P(about fifteen minuets from)

Speech Recognition. P(I saw a van) > P(eyes awe of an)

Language Modelling

Goal: compute the probability of a sentence or sequence of words: $P(W) = P(w_1, w_2, \ldots, w_n)$

Related task: compute the probability of an upcoming word: $P(w_i \mid w_1, w_2, \ldots, w_{i-1})$

Chain rule:

$$P(x_1, x_2, \ldots, x_n) = P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_1, x_2) \cdots P(x_n \mid x_1, x_2, \ldots, x_{n-1}) = \prod_{i=1}^{n} P(x_i \mid x_1, x_2, \ldots, x_{i-1})$$
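The chain rule can be checked numerically on a small joint distribution. The sketch below uses a made-up distribution over three binary variables (the weights are hypothetical) and confirms that the product of conditionals recovers each joint probability; `marginal` and `chain_rule_product` are illustrative names.

```python
from itertools import product

# Hypothetical joint distribution over three binary variables (sums to 1).
weights = [0.10, 0.05, 0.20, 0.15, 0.05, 0.25, 0.10, 0.10]
JOINT = dict(zip(product([0, 1], repeat=3), weights))


def marginal(prefix):
    """P(x1..xk = prefix), obtained by summing out the remaining variables."""
    k = len(prefix)
    return sum(p for outcome, p in JOINT.items() if outcome[:k] == prefix)


def chain_rule_product(outcome):
    """P(x1) * P(x2 | x1) * P(x3 | x1, x2), each factor a ratio of marginals."""
    result = 1.0
    for i in range(1, len(outcome) + 1):
        # P(x_i | x_1..x_{i-1}) = P(x_1..x_i) / P(x_1..x_{i-1})
        result *= marginal(outcome[:i]) / marginal(outcome[: i - 1])
    return result


for outcome, p in JOINT.items():
    print(outcome, p, round(chain_rule_product(outcome), 10))
```

Each conditional is a ratio of two marginals, so the product telescopes back to the joint probability.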

Example

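$P(\text{its water is so transparent that}) = P(\text{its}) \times P(\text{water} \mid \text{its}) \times P(\text{is} \mid \text{its, water}) \times P(\text{so} \mid \text{its, water, is}) \times P(\text{transparent} \mid \text{its, water, is, so}) \times P(\text{that} \mid \text{its, water, is, so, transparent})$

Can we estimate each factor by counting?

$$P(\text{that} \mid \text{its, water, is, so, transparent}) = \frac{P(\text{its, water, is, so, transparent, that})}{P(\text{its, water, is, so, transparent})}$$

No. There are too many possible word sequences to ever see enough examples of each.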
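A rough count shows why direct counting fails. The vocabulary size below is a hypothetical round number chosen for illustration, not a figure from the lecture.

```python
# Hypothetical vocabulary size; real vocabularies vary widely.
VOCABULARY_SIZE = 10_000
HISTORY_LENGTH = 5  # "its water is so transparent"

# Number of distinct word sequences of each length.
possible_histories = VOCABULARY_SIZE ** HISTORY_LENGTH
possible_sentences = VOCABULARY_SIZE ** (HISTORY_LENGTH + 1)

print(f"{possible_histories:.1e} possible 5-word histories")
print(f"{possible_sentences:.1e} possible 6-word sequences")
```

Under this assumption there are 10^20 possible five-word histories, so almost all of them will never appear in any training corpus.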

Markov Assumption

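Condition only on the k words preceding $w_i$:

$$P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-k}, \ldots, w_{i-1})$$

With k = 1:

$P(\text{that} \mid \text{its, water, is, so, transparent}) \approx P(\text{that} \mid \text{transparent})$

or, with k = 2:

$P(\text{that} \mid \text{its, water, is, so, transparent}) \approx P(\text{that} \mid \text{so, transparent})$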
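In code, the Markov assumption amounts to truncating the history before looking anything up. A minimal sketch, where `markov_context` is an illustrative helper name:

```python
def markov_context(history, k):
    """Return the last k words of the history (all of it if shorter than k)."""
    # history[-0:] would return the whole list, so k = 0 is handled separately.
    return tuple(history[-k:]) if k > 0 else ()


history = ["its", "water", "is", "so", "transparent"]
print(markov_context(history, 1))  # ('transparent',)
print(markov_context(history, 2))  # ('so', 'transparent')
```

With k = 0 the context is empty, which corresponds to ignoring the history entirely.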

Unigram, Bigram and N-gram

Unigram model:

$$P(w_1, w_2, \ldots, w_{n-1}, w_n) \approx \prod_{i} P(w_i)$$

Bigram model:

$$P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1})$$

N-gram model: the same idea extends to trigrams, 4-grams, 5-grams, etc.
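A bigram model can be estimated by counting adjacent word pairs. The sketch below assumes a tiny toy corpus (not from the slides), uses unsmoothed maximum-likelihood estimates, and marks sentence boundaries with `<s>` and `</s>`; all names are illustrative.

```python
from collections import Counter

# Hypothetical toy corpus; <s> and </s> mark sentence boundaries.
CORPUS = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in CORPUS:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))


def bigram_probability(word, previous):
    """Maximum-likelihood estimate: count(previous, word) / count(previous).

    Unsmoothed: `previous` must occur in the corpus, and unseen pairs get 0.
    """
    return bigram_counts[(previous, word)] / unigram_counts[previous]


def sentence_probability(sentence):
    """Product of bigram probabilities over consecutive word pairs."""
    tokens = sentence.split()
    probability = 1.0
    for previous, word in zip(tokens, tokens[1:]):
        probability *= bigram_probability(word, previous)
    return probability


print(bigram_probability("I", "<s>"))
print(sentence_probability("<s> I am Sam </s>"))
```

In this corpus two of the three sentences start with "I", so the estimate of P(I | <s>) is 2/3. Any sentence containing an unseen bigram gets probability zero here, which is why practical models add smoothing.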

Discussion

1. Spell correction.

2. Word suggestion.
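As one possible starting point for the word-suggestion discussion, bigram counts can rank candidate next words. This is a sketch on a hypothetical toy corpus, not a method prescribed by the lecture; `followers` and `suggest` are illustrative names.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, for illustration only.
CORPUS = [
    "high winds tonight",
    "high winds expected tonight",
    "high tide tonight",
]

# followers[w] counts the words observed immediately after w.
followers = defaultdict(Counter)
for sentence in CORPUS:
    tokens = sentence.split()
    for previous, word in zip(tokens, tokens[1:]):
        followers[previous][word] += 1


def suggest(previous, n=3):
    """Return up to n words most often seen after `previous`."""
    if previous not in followers:
        return []
    return [word for word, _ in followers[previous].most_common(n)]


print(suggest("high"))  # ['winds', 'tide']
```

Ranking by raw count is equivalent to ranking by the bigram estimate of P(word | previous), since the denominator is the same for every candidate.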
