Page 1

Neural Natural Language Processing

Lecture 1: Introduction to natural language processing and text categorization

Page 2

Plan of the lecture

● Part 1: About the course: logistics, organization, materials, etc.

● Part 2: Motivation for the course: neural NLP models, the “neural revolution” in NLP.

● Part 3: A short introduction to NLP.
● Part 4: The text classification task and a simple model to solve it: the Naive Bayes model.

Page 3

Lecture 1

Part 1: About the course: logistics, organization, materials, etc.

Page 4

Acknowledgments

● Based on the materials of the following courses:
– Lectures and assignments are adapted from the “Neural Networks for Natural Language Processing” course by Nikolay Arefyev (Samsung Moscow Research Center and Moscow State University).
– Seminars are adapted from various sources, notably the NLP course of the Yandex School of Data Analysis.

– Additional sources will be indicated as needed.

Page 5

Instructors

Lectures:

● Prof. Alexander Panchenko, Skoltech

● Dr. Nikolay Arefyev, Samsung / Moscow State University

Seminars, assignments:

● Dr. Artem Shelmanov, Skoltech

● Dr. Varvara Logacheva, Skoltech

● Olga Kozlova, MTS Innovation Center

● Viktoria Chekalina, Skoltech / Philips Innovation Center

● Irina Nikishina, Skoltech

● Daryna Dementieva, Skoltech

Final projects:

● Olga Kozlova, Alexander Panchenko, ...

Page 6

Tentative schedule of the class

Page 7

Assignments

● A Kaggle-style competition for the best F-score
● One task (sentiment analysis), different models

Page 8

Assignments

● Sentiment analysis using a Naive Bayes classifier.

● Sentiment analysis using Logistic Regression and a Feedforward Neural Network.

● Sentiment analysis using word and document embeddings.

● Sentiment analysis using RNNs.
● Sentiment analysis using BERT or ELMo.

Page 9

Assignments

● Sentiment analysis using a Naive Bayes classifier.

● Sentiment analysis using Logistic Regression and a Feedforward Neural Network.

● Sentiment analysis using word and document embeddings.

● Sentiment analysis using RNNs.
● Sentiment analysis using BERT or ELMo.

Model complexity, performance(?)

Page 10

Assignments

Evaluation criteria:
● Results: what was the rank of your solution among the other submissions?
● Reproducibility: can we obtain your results by running your script?
● Readability: how easy is it to understand your code?
● Timing: did you deliver on time?

Page 11

Final project

Various options:
● Find an interesting task and propose a (neural) NLP model to solve it.
● Propose a new NLP task, or a variant of an existing one, and come up with a baseline for its solution.
● Take a recently published NLP paper and replicate its results. Discuss the outcomes.

Page 12

Final project

● The list of topics can be found here: http://bit.ly/nnlp_topics
– To be further extended.
● Projects can be done in groups of up to 3 people.
● You can propose your own topic as well.
● To propose a topic, enter your name and topic here: http://bit.ly/nnlp_topics_distribution
● It is advisable to ask an instructor during a seminar about the suitability of a topic (but this is not a strict requirement).

Page 13

Final project

Requirements:
● The outcome of a project is a Jupyter notebook which describes the entire experiment:
– It should be readable (with supporting text: task, motivation, discussion).
– It should be executable: we should be able to reproduce your results on the first try.

● Due to time constraints: no oral presentation. Rather, communicate what you have done in code, text, formulas, tables, and plots.

● Deadline: 19.12.2019 EoD.
● Suggestion: start ASAP!

Page 14

Final project

Evaluation criteria:
● Relevance of the task: are you tackling a relevant research problem? Did you do something which has not been done yet (at least in some aspect), or was a solution already available on GitHub before you started?
● Readability: can we easily understand what has been done?
● Reproducibility: can we get the same numbers and plots?
● Results: did you manage to improve something (or gain some interesting insights from negative results)?
● Originality: how innovative was the approach you used?
● Timing: did you deliver on time?

Page 15

Exam

● An exam is not obvious to organize in our case:
– e.g., the Deep Learning course has no exam.

● Mostly questions about the various models:
– Structure,
– Applications,
– Training methods,
– Objectives.

Page 16

Cost of various activities

● Assignments: 40%
● Final project: 40%
● Exam: 20%

● If you have already completed a similar NLP course and/or have a publication of at least workshop level at a major NLP conference, you can do a final project worth 80% and skip the assignments.
– The topic will be provided by an instructor (less freedom in topic choice).
– The load is expected to be the same as Assignments + Final project.

Page 17

Prerequisites

● Basic concepts from Calculus, Linear Algebra, Probability, Statistics, and Computer Science.

● Fundamentals of Machine Learning:
– Recommended machine learning courses: https://www.coursera.org/learn/machine-learning, http://cs229.stanford.edu
– … or an analogous course on ML and DL at Skoltech!

● The Python programming language:
– Programming assignments are in Python;
– The de facto standard for ML/DL/NLP.

● This is NOT a generic machine learning / deep learning course:
– Some introductory lectures will give a reminder of the basics, though;
– We rather focus on specific architectures of neural networks for NLP.

Page 18

Outline of the course topics

Page 19

Lecture logistics

● 45 minutes of lecture
● 10 minutes break
● 45 minutes of lecture
● 10 minutes break
● 45 minutes of lecture

Page 20

Let us dive right in!

Image source: http://fastml.com/introduction-to-pointer-networks

Page 21

Lecture 1

● Part 1: About the course: logistics, organization, materials, etc.

● Part 2: Motivation for the course: neural NLP models, the “neural revolution” in NLP.

● Part 3: A short introduction to NLP.
● Part 4: The text classification task and a simple model to solve it: the Naive Bayes model.

Page 22

Natural Language

● Language is what makes us different from other living beings:
– It allows sharing and accumulating knowledge;
– It allows organizing society in complex ways;
– ...

Image source: Wikipedia

Page 23

Natural Language

Image source: Wikipedia

Page 24

Natural Language Processing (NLP)

● NLP is a subfield of Artificial Intelligence (AI) which relies on:
– Computer Science (recently, most notably, machine learning)
– Linguistics

● The goal is to make computers understand and generate natural language to perform useful tasks, like:
– Translating a text from one language to another, e.g. Yandex Translate
– Searching for and extracting information
● Search engines, e.g. Google
● Question answering systems, e.g. IBM Watson
– Dialogue systems
● Answer questions, execute voice commands, voice typing
● Samsung Bixby, Apple Siri, Google Assistant, etc.

● Language understanding is an “AI-complete” problem
– we hope to train computers to extract the signal relevant for a particular task

Page 25

More NLP Applications

● Dialog systems for customer support

● Sentiment analysis

● Topic categorization

● Spell checking

● Summarization

● Fact extraction

Page 26

Traditional NLP Pipeline

Source of the slide: Socher & Manning, cs224n

Page 27

A glance at the history of Natural Language Processing

A part of the table of contents of the Jurafsky & Martin (2009) textbook, augmented with points 1.6.7 and 1.6.8

Page 28

ML vs. DL: Function family F?

Source: Socher, Manning. CS224n, 2017

Page 29

Good old-fashioned ML

Source: Socher, Manning. CS224n, 2017

Page 30

Deep Learning

Source: Socher, Manning. CS224n, 2017

Page 31

Why Deep Learning?

Source: Socher, Manning. CS224n, 2017

Page 32

Why now?

Source: Socher, Manning. CS224n, 2017

Page 33

Speech recognition

Source: Hinton, Neural Networks for Machine Learning @ Coursera, 2012 (Lecture 1, slide 13)

>30% WER improvement

Page 34

Speech recognition

Source: Hinton, Bengio & LeCun, Deep Learning, NIPS’2015 Tutorial, slide 69

Page 35

ImageNet

● > 1.4M images from the web, 1000 classes

NVIDIA CES 2016 Press Conference, slide 10

● Krizhevsky, Sutskever, Hinton, 2012:
● 74.2% → 83.6% Top 5 accuracy
● 25.8% → 16.4% Top 5 error rate
● 36% error reduction (fixed every third error)

Page 36

ImageNet: Top 5 Error Rate

● Human error:
– 5.1% (trained and patient)
– 15% (non-trained, less patient)

● Best result in 2016: 3.08% (Inception-v4 + 3×ResNet ensemble)

[Fei-Fei Li & Justin Johnson & Serena Yeung, cs231n, 2017. Lecture 1]

[Andrej Karpathy, What I learned from competing against a ConvNet on ImageNet, 2014]

Page 37

ImageNet – Learnt features

Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks

Page 38

ImageNet – Learnt features

Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks

Page 39

What does BERT learn about the structure of language?

Source: Jawahar G., Sagot B., Seddah D. What does BERT learn about the structure of language? ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Jul 2019, Florence, Italy

Page 40

What does BERT learn about the structure of language?

Source: Jawahar G., Sagot B., Seddah D. What does BERT learn about the structure of language? ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Jul 2019, Florence, Italy

Page 41

The ongoing “neural revolution” in NLP: from Collobert to BERT

What problems Neural NLP is addressing:
● The need for feature engineering.
● The curse of dimensionality:
– SVD and NMF can be used to obtain embeddings, but these algorithms do not scale well to large datasets.
● The need to develop a custom algorithm/model for each task separately.
– Instead, the idea is to develop a single model for any NLP task.

Page 42

The ongoing “neural revolution” in NLP: from Collobert to BERT

What problems Neural NLP is addressing:
● The need for feature engineering.

Page 43

The ongoing “neural revolution” in NLP: from Collobert to BERT

What problems Neural NLP is addressing:
● The curse of dimensionality:

Page 44

A simpler and more generic NLP pipeline

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI

Page 45

A simpler and more generic NLP pipeline … which yields good results

Step 1: Embed

An embedding table maps long, sparse, binary vectors into shorter, dense, continuous vectors.

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI

Page 46

A simpler and more generic NLP pipeline … which yields good results

Step 2: Encode

Given a sequence of word vectors, the encode step computes a representation that I'll call a sentence matrix, where each row represents the meaning of each token in the context of the rest of the sentence.

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI

Page 47

A simpler and more generic NLP pipeline … which yields good results

Step 3: Attend

The attend step reduces the matrix representation produced by the encode step to a single vector, so that it can be passed on to a standard feed-forward network for prediction.

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI

Page 48

A simpler and more generic NLP pipeline … which yields good results

Step 4: Predict

Once the text or pair of texts has been reduced into a single vector, we can learn the target representation — a class label, a real value, a vector, etc.

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI
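To make the four steps concrete, here is a minimal sketch of the embed → encode → attend → predict formula in PyTorch (the framework used in this course); the layer choices and all sizes are illustrative assumptions, not code from the slides:

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    # A sketch of the generic pipeline; every hyperparameter is invented.
    def __init__(self, vocab_size=10000, emb_dim=100, hidden_dim=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)              # Step 1: Embed
        self.encode = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True, bidirectional=True) # Step 2: Encode
        self.attend = nn.Linear(2 * hidden_dim, 1)                  # Step 3: Attend
        self.predict = nn.Linear(2 * hidden_dim, n_classes)         # Step 4: Predict

    def forward(self, token_ids):                       # (batch, seq_len) word ids
        x = self.embed(token_ids)                       # sparse ids -> dense vectors
        h, _ = self.encode(x)                           # sentence matrix: a row per token
        weights = torch.softmax(self.attend(h), dim=1)  # attention weights over tokens
        summary = (weights * h).sum(dim=1)              # reduce the matrix to one vector
        return self.predict(summary)                    # class logits

logits = TextClassifier()(torch.randint(0, 10000, (4, 12)))
print(logits.shape)  # torch.Size([4, 2])
```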

Page 49

Page 50

Source: Socher, Manning. CS224n, 2017

Page 51

MT vs. Human translation

https://www.eff.org/ai/metrics#Translation

Page 52

Google Neural Machine Translation (NMT) System

Source: Socher, Manning. CS224n, 2017

Page 53

GLUE benchmark

Source: Wang et al. GLUE: A Multi-task benchmark and analysis platform for Natural Language Understanding, 2019

Page 54

GLUE leaderboard

Source: https://gluebenchmark.com/leaderboard

Page 55

SuperGLUE leaderboard

Source: https://super.gluebenchmark.com/leaderboard

Page 56

Lecture 1

● Part 1: About the course: logistics, organization, materials, etc.

● Part 2: Motivation for the course: neural NLP models, the “neural revolution” in NLP.

● Part 3: A short introduction to NLP.
● Part 4: The text classification task and a simple model to solve it: the Naive Bayes model.

Materials in this part are adapted from: Rao, D. & McMahan, B. (2019): Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning. O’Reilly, 1st Edition. ISBN-13: 978-1491978238.

Page 57

A Quick Tour of Traditional NLP

● Natural language processing (NLP) and computational linguistics (CL) are two areas of computational study of human language:
– NLP – how to build a technical system which knows something about (i.e., performs processing of) human language: solving practical problems involving language, such as:
● information extraction;
● automatic speech recognition;
● machine translation;
● sentiment analysis;
● question answering;
● summarization.
– CL – how to learn about some aspect of language using various mathematical and computational methods, models, and algorithms: it employs computational methods to understand the properties of human language.
● How do we understand language?
● How do we produce language?
● How do we learn languages?
● What relationships do languages have with one another?


Page 59

Corpora, Tokens, and Types

● NLP methods, be they classic or modern, begin with a text dataset, also called a corpus (plural: corpora).
– A corpus usually contains raw text (in ASCII or UTF-8) and any metadata associated with the text.

● The raw text is a sequence of characters (bytes), but most of the time it is useful to group those characters into contiguous units called tokens.

● Types are unique tokens present in a corpus. The set of all types in a corpus is its vocabulary or lexicon.
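The token/type distinction is easy to see in code; a minimal sketch, assuming simple whitespace tokenization:

```python
# Tokens vs. types: types are the unique tokens, i.e., the vocabulary.
corpus = "the green witch slapped the green witch"
tokens = corpus.split()          # 7 tokens, duplicates included
types = set(tokens)              # 4 types: {'the', 'green', 'witch', 'slapped'}
print(len(tokens), len(types))   # 7 4
```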

Page 60

Corpora, Tokens, and Types

Page 61

Tokenization

● The process of breaking a text down into tokens is called tokenization.
– There are six tokens in the sentence “Mary slapped the green witch.” – “.” is one of them.
– Tokenization can become more complicated than simply splitting text on non-alphanumeric characters.

Page 62

Tokenization: the case of Turkish

Page 63

Tokenization: Twitter data

● Tokenizing tweets involves preserving hashtags and @handles, and segmenting smilies such as :-) and URLs as one unit.

● Those decisions can significantly affect accuracy in practice!

Page 64

Tokenization

● Using SpaCy

● Using NLTK
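The code screenshots for this slide did not survive extraction; below is a hedged reconstruction of what the two calls typically look like (the model and resource names are the usual defaults, not taken from the slides):

```python
import spacy
from nltk.tokenize import TweetTokenizer

text = "Mary, don't slap the green witch."

# Using spaCy (assumes the small English model is installed:
# python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
print([token.text for token in nlp(text)])
# ['Mary', ',', 'do', "n't", 'slap', 'the', 'green', 'witch', '.']

# Using NLTK's TweetTokenizer, which keeps #hashtags, @handles,
# and smileys such as :-) as single tokens
tweet = "Snow White and the Seven Degrees #MakeAMovieCold @midnight :-)"
print(TweetTokenizer().tokenize(tweet.lower()))
```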

Page 65

Feature engineering

● Feature engineering is the process of understanding the linguistics of a language and applying it to solving an NLP problem.

● This is something that we keep to a minimum in neural NLP, for:
– portability of models across languages;
– applicability to more tasks;
– avoiding the need for expert knowledge.

● When building real-world production systems, feature engineering is indispensable, despite recent claims to the contrary.
– Will it change in the future?

Page 66

Unigrams, Bigrams, Trigrams, …, N-grams

● N-grams are fixed-length (n) consecutive token sequences occurring in the text:
– A bigram has two tokens;
– A unigram has one token, etc.
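A minimal sketch of generating word n-grams from a list of tokens (the helper name is illustrative):

```python
def n_grams(tokens, n):
    # Return all consecutive token subsequences of length n.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = ['mary', 'slapped', 'the', 'green', 'witch', '.']
print(n_grams(tokens, 3))
# [['mary', 'slapped', 'the'], ['slapped', 'the', 'green'], ...]
```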

Page 67

Unigrams, Bigrams, Trigrams, …, N-grams

● When subword information itself carries useful information, one might want to generate character N-grams:
– For example, the suffix “ol” in “methanol” indicates it is a kind of alcohol.

Page 68

Lemmas and Stems

● Lemmas are root forms of words.
● The verb fly can be inflected into many different word forms: flew, flies, flown, flying.
● Lemmatization is reducing tokens to their lemmas, e.g., to keep the dimensionality of the vector representation low.

Page 69

Lemmas and Stems

● Stemming is the use of handcrafted rules to strip the endings of words to reduce them to a common form called stems.
– Cons: quality; the “poor man’s lemmatization”.
– Pros: efficiency; it was (and is) popular in information retrieval for this reason.
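A hedged sketch contrasting the two with NLTK (assumes the WordNet resource has been downloaded via nltk.download('wordnet')):

```python
from nltk.stem import WordNetLemmatizer, PorterStemmer

lemmatizer, stemmer = WordNetLemmatizer(), PorterStemmer()
for word in ["flies", "flew", "flying", "flown"]:
    print(word, lemmatizer.lemmatize(word, pos="v"), stemmer.stem(word))
# e.g. "flies" -> lemma "fly" but stem "fli": a stem need not be a real word
```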

Page 70

Categorizing Sentences and Documents

● One of the earliest applications of NLP
– Topic categorization, predicting the sentiment of reviews, filtering spam emails, language identification, etc.

Page 71

Categorizing Sentences and Documents: TF representation

Page 72

TF-IDF representation: TF(w) ⋅ IDF(w)

● The TF representation weights a word w proportionally to its frequency:
– Common words do not add anything to understanding.
– A rare word is likely to be indicative.

● TF-IDF penalizes common tokens and rewards rare tokens in the vector representation:

$IDF(w) = \log \frac{N}{n_w}$

where $n_w$ is the number of documents containing the word w and N is the total number of documents.
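As a sketch, both representations are one call in scikit-learn (an assumed implementation choice; note that sklearn uses a smoothed variant of the IDF formula above):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["Time flies like an arrow.",
          "Fruit flies like a banana."]

tf = CountVectorizer().fit_transform(corpus)      # raw term-frequency counts
tfidf = TfidfVectorizer().fit_transform(corpus)   # TF * IDF weighting
print(tf.toarray())
print(tfidf.toarray().round(2))
```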

Page 73

TF-IDF representation: TF(w) ⋅ IDF(w)

Page 74

Categorizing Words: POS Tagging

● One can label not only documents but also individual words or tokens:
– Part-of-speech (POS) tagging
– Morphological analysis, etc.

Page 75

Categorizing Spans: Chunking and Named Entity Recognition

● Label a span of text – a contiguous multi-token sequence.
– Chunking:
[NP Mary] [VP slapped] [NP the green witch]
– Named entity recognition:
[PER Mary Johnson] slapped the green witch

Page 76

Categorizing Spans: Chunking and Named Entity Recognition

● Chunking:

● Named entity recognition:

Page 77

Structure of sentences: identifying relations between phrases

A constituent parse of the sentence “Mary slapped the green witch.”

Page 78

Structure of sentences: identifying relations between phrases

A dependency parse of the sentence “Mary slapped the green witch.”

Page 79

Word Senses and Semantics

● Words can have multiple senses:
– WordNet
– Automatic discovery of senses from context
– ...

Page 80

Lecture 1

● Part 1: About the course: logistics, organization, materials, etc.

● Part 2: Motivation for the course: neural NLP models, the “neural revolution” in NLP.

● Part 3: A short introduction to NLP.
● Part 4: The text classification task and a simple model to solve it: the Naive Bayes model.

Materials in this part are adapted from: Jurafsky & Martin (2019): Speech and Language Processing (3rd edition). https://web.stanford.edu/~jurafsky/slp3/

Page 81

Who wrote which Federalist papers?

● 1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution: Jay, Madison, Hamilton.
● The authorship of 12 of the letters is in dispute.
● 1963: solved by Mosteller and Wallace using Bayesian methods.

James Madison, Alexander Hamilton

Page 82

Positive or negative movie review?

● Unbelievably disappointing
● Full of zany characters and richly applied satire, and some great plot twists
● This is the greatest screwball comedy ever filmed
● It was pathetic. The worst part about it was the boxing scenes.

Page 83

What is the subject of this article?

• Antagonists and Inhibitors
• Blood Supply
• Chemistry
• Drug Therapy
• Embryology
• Epidemiology
• …

[Figure: a MEDLINE article mapped onto the MeSH Subject Category Hierarchy]

Page 84

Text Classification

● Assigning subject categories, topics, or genres
● Spam detection
● Authorship identification
● Age/gender identification
● Language identification
● Sentiment analysis
● …

Page 85

Text Classification: definition

Input:
• a document d
• a fixed set of classes C = {c1, c2, …, cJ}

Output: a predicted class c ∈ C

Page 86

Classification Methods: Hand-coded rules

● Rules based on combinations of words or other features
– spam: black-list-address OR (“dollars” AND “have been selected”)

● Accuracy can be high
– If the rules are carefully refined by an expert

● But building and maintaining these rules is expensive

Page 87

Classification Methods:Supervised Machine Learning

• Input:
• a document d
• a fixed set of classes C = {c1, c2, …, cJ}
• a training set of m hand-labeled documents (d1, c1), …, (dm, cm)

• Output:
• a learned classifier γ : d → c

Page 88

Classification Methods:Supervised Machine Learning

● Any kind of classifier:
– Naïve Bayes
– Logistic regression
– Support-vector machines
– k-Nearest Neighbors
– …
● Deep neural networks

Page 89

Naïve Bayes Intuition

● Simple (“naïve”) classification method based on Bayes rule

● Relies on a very simple representation of the document:
– Bag of words
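A minimal sketch of a bag of words, assuming whitespace tokenization (the example sentence reuses one of the earlier reviews):

```python
from collections import Counter

review = "it was pathetic the worst part was the boxing scenes"
bag = Counter(review.split())
print(bag)  # Counter({'was': 2, 'the': 2, 'it': 1, ...}) - word order is lost
```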

Page 90

The Bag of Words Representation

Page 91

The Bag of Words Representation

Page 92

Bayes’ Rule Applied to Documents and Classes

• For a document d and a class c:

$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$

Page 93

Naïve Bayes Classifier

MAP is “maximum a posteriori” = the most likely class.

Bayes rule:

$c_{MAP} = \arg\max_{c \in C} P(c \mid d) = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}$

Dropping the denominator:

$c_{MAP} = \arg\max_{c \in C} P(d \mid c)\,P(c)$

Representing the document d as features $x_1, \ldots, x_n$:

$c_{MAP} = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$

Page 94

Naïve Bayes Classifier

$c_{MAP} = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$

● $P(c)$: how often does this class occur? We can just count the relative frequencies in a corpus.
● $P(x_1, x_2, \ldots, x_n \mid c)$: $O(|X|^n \cdot |C|)$ parameters; could only be estimated if a very, very large number of training examples was available.

Page 95

Multinomial Naïve Bayes Independence Assumptions

$P(x_1, x_2, \ldots, x_n \mid c)$

• Bag-of-words assumption: assume position doesn’t matter.
• Conditional independence: assume the feature probabilities $P(x_i \mid c_j)$ are independent given the class $c$:

$P(x_1, \ldots, x_n \mid c) = P(x_1 \mid c) \cdot P(x_2 \mid c) \cdot P(x_3 \mid c) \cdots P(x_n \mid c)$

Page 96

Multinomial Naïve Bayes Classifier

$c_{MAP} = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$

$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{x \in X} P(x \mid c_j)$

Applying multinomial Naive Bayes classifiers to text classification, with positions = all word positions in the test document:

$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$

Page 97

Learning the Multinomial Naïve Bayes Model

• First attempt: maximum likelihood estimates
• simply use the frequencies in the data:

$\hat{P}(c_j) = \frac{doccount(C = c_j)}{N_{doc}}$

$\hat{P}(w_i \mid c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}$

i.e., the fraction of times word $w_i$ appears among all words in documents of topic $c_j$:
• Create a mega-document for topic j by concatenating all docs in this topic
• Use the frequency of w in the mega-document

Page 98

Problem with Maximum Likelihood

• What if we have seen no training documents with the word fantastic classified in the topic positive (thumbs-up)?

$\hat{P}(\textit{fantastic} \mid \textit{positive}) = \frac{count(\textit{fantastic}, \textit{positive})}{\sum_{w \in V} count(w, \textit{positive})} = 0$

• Zero probabilities cannot be conditioned away, no matter the other evidence!

$c_{MAP} = \arg\max_{c} \hat{P}(c) \prod_i \hat{P}(x_i \mid c)$

Page 99

Laplace (add-1) smoothing for Naïve Bayes

The maximum likelihood estimate

$\hat{P}(w_i \mid c) = \frac{count(w_i, c)}{\sum_{w \in V} count(w, c)}$

becomes, with add-1 smoothing,

$\hat{P}_{Laplace}(w_i \mid c) = \frac{count(w_i, c) + 1}{\left(\sum_{w \in V} count(w, c)\right) + |V|}$

Page 100

Multinomial Naïve Bayes: Learning

• From the training corpus, extract the Vocabulary
• Calculate the $P(c_j)$ terms
• For each $c_j$ in $C$ do:
– $docs_j \leftarrow$ all docs with class $= c_j$

$P(c_j) \leftarrow \frac{|docs_j|}{|\text{total \# documents}|}$

• Calculate the $P(w_k \mid c_j)$ terms with add-$\alpha$ smoothing, where $n_k$ is the count of word $w_k$ in the concatenation of all docs in $docs_j$ and $n$ is its total number of tokens:

$P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|Vocabulary|}$
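Putting the pieces together, a hedged end-to-end sketch with scikit-learn’s MultinomialNB (the tiny training set is invented; alpha=1.0 corresponds to the add-1 smoothing above):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["fantastic great plot twists", "richly applied satire",
              "unbelievably disappointing", "it was pathetic"]
train_labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_docs)   # bag-of-words count matrix
clf = MultinomialNB(alpha=1.0)             # Laplace (add-1) smoothing
clf.fit(X, train_labels)

print(clf.predict(vectorizer.transform(["a fantastic satire"])))  # ['pos']
```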

Page 101

Summary: Naive Bayes is Not So Naive

● Very fast, low storage requirements
● Robust to irrelevant features
– Irrelevant features cancel each other out without affecting results
● Very good in domains with many equally important features
● Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem
● A good, dependable baseline for text classification
● But we will see other classifiers that give better accuracy

Page 102

Evaluation: Precision and Recall

● The 2-by-2 contingency table:

               correct   not correct
selected       tp        fp
not selected   fn        tn

● Precision: % of selected items that are correct
● Recall: % of correct items that are selected

Page 103

Evaluation: F1 score

• A combined measure that assesses the P/R tradeoff is the F measure (a weighted harmonic mean):

$F = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}} = \frac{(\beta^2 + 1)\,P R}{\beta^2 P + R}$

• The harmonic mean is a very conservative average;
• People usually use the balanced F1 measure, i.e., with β = 1 (that is, α = ½):

$F_1 = \frac{2 P R}{P + R}$
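A small worked example with invented counts, to make the definitions concrete:

```python
tp, fp, fn = 40, 10, 20               # counts from the contingency table

precision = tp / (tp + fp)            # 0.8: selected items that are correct
recall = tp / (tp + fn)               # ~0.667: correct items that are selected
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.667 0.727
```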

Page 104

Evaluation: Confusion matrix c

• For each pair of classes <c1, c2>: how many documents from c1 were incorrectly assigned to c2?
• c3,2: 90 wheat documents incorrectly assigned to poultry

Docs in test set   Assigned UK   Assigned poultry   Assigned wheat   Assigned coffee   Assigned interest   Assigned trade
True UK            95            1                  13               0                 1                   0
True poultry       0             1                  0                0                 0                   0
True wheat         10            90                 0                1                 0                   0
True coffee        0             0                  0                34                3                   7
True interest      -             1                  2                13                26                  5
True trade         0             0                  2                14                5                   10

Page 105

Evaluation: per class measures

Recall: fraction of docs in class i classified correctly:

$Recall_i = \frac{c_{ii}}{\sum_j c_{ij}}$

Precision: fraction of docs assigned class i that are actually about class i:

$Precision_i = \frac{c_{ii}}{\sum_j c_{ji}}$

Accuracy (1 − error rate): fraction of docs classified correctly:

$Accuracy = \frac{\sum_i c_{ii}}{\sum_i \sum_j c_{ij}}$
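A sketch computing these per-class measures from a confusion matrix c[i][j] (here truncated to the first three classes and columns of the table above, so the numbers are illustrative only):

```python
c = [[95,  1, 13],    # true UK
     [ 0,  1,  0],    # true poultry
     [10, 90,  0]]    # true wheat

for i, name in enumerate(["UK", "poultry", "wheat"]):
    recall = c[i][i] / sum(c[i])                    # c_ii over its row
    precision = c[i][i] / sum(row[i] for row in c)  # c_ii over its column
    print(name, round(precision, 2), round(recall, 2))

accuracy = sum(c[i][i] for i in range(3)) / sum(map(sum, c))
print("accuracy", round(accuracy, 2))  # share of the diagonal in all docs
```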

Page 106

Development Test Sets and Cross-validation

• Metric: P/R/F1 or Accuracy
• Unseen test set
• avoids overfitting (‘tuning to the test set’)
• gives a more conservative estimate of performance
• Cross-validation over multiple splits
• handles sampling errors from different datasets
• Pool results over each split
• Compute pooled dev set performance

[Figure: the corpus is split into a Training Set, a Development Test Set, and a Test Set; in cross-validation, the Dev Test portion rotates over different splits of the training data while the Test Set is held out.]
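A hedged sketch of cross-validated evaluation with scikit-learn (the dataset, pipeline, and metric are illustrative assumptions):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

data = fetch_20newsgroups(subset="train", categories=["rec.autos", "sci.space"])
model = make_pipeline(TfidfVectorizer(), MultinomialNB())

# 10 splits; each fold serves once as the dev set, results are pooled
scores = cross_val_score(model, data.data, data.target, cv=10, scoring="f1_macro")
print(round(scores.mean(), 3), round(scores.std(), 3))
```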

