OVERVIEW OF STATISTICAL NATURAL LANGUAGE PROCESSING

May 18, 2016

Dr. Fei Song, School of Computer Science, University of Guelph
Outline

- What is Statistical Natural Language Processing (SNLP)?
- Language Models for Information Retrieval
- Text Classification and Sentiment Analysis
- Probabilistic Models (LDA, Bayesian HMM, and POSLDA) for language processing
- References
What is SNLP?

- Infer and rank the structures from text based on statistical language modeling:
  - Probability and statistics
  - Machine learning techniques

- Started in the late 1950s, but did not become popular until the early 1980s.

- Many applications: Information Retrieval, Information Extraction, Text Classification, Text Mining, and Biological Data Analysis.
Language Modeling

- A statistical language model estimates the probabilities of word sequences:

  P(w_{1,n}) = P(w_1, w_2, …, w_n)

- How do we assign probabilities to word sequences? Decompose with the chain rule:

  P(w_1 w_2 … w_n) = P(w_1) P(w_2|w_1) … P(w_n|w_1 w_2 … w_{n-1})

  e.g., Jack went to the {hospital, number, if, …}

- Left-context only?
  - The {big, pig} dog …
  - P(dog|the big) >> P(dog|the pig)
Noisy Channel Framework

- Through decoding, we want to find the most likely input for the given observation:

  î = argmax_{i∈I} p(i|o) = argmax_{i∈I} p(i) p(o|i) / p(o) = argmax_{i∈I} p(i) p(o|i)

  [Figure: the input I passes through a noisy channel p(o|i) to produce the observation O; the decoder recovers the most likely I.]

- Applications: machine translation, optical character recognition, speech recognition, spelling correction.
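To make the decoding rule concrete, here is a minimal Python sketch of noisy-channel decoding applied to toy spelling correction; the lexicon, the prior, and the channel model are hypothetical stand-ins, not part of the slides.

```python
# Toy noisy-channel decoder: pick the input i maximizing p(i) * p(o|i).
# The lexicon probabilities and the edit model below are made-up values.

def decode(observation, candidates, prior, channel):
    return max(candidates, key=lambda i: prior(i) * channel(observation, i))

lexicon = {"the": 0.07, "tea": 0.005, "ten": 0.004}   # p(i): unigram prior
def channel(o, i):                                    # p(o|i): crude edit model
    return 0.1 if sorted(o) == sorted(i) else 0.001   # favour transpositions

print(decode("teh", lexicon, lexicon.get, channel))   # -> 'the'
```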

Language Models for IR

- N-gram models:

  Unigram: P(w_{1,n}) = P(w_1) P(w_2) … P(w_n)

  Bigram: P(w_{1,n}) = P(w_1) P(w_2|w_1) … P(w_n|w_{n-1})

  Trigram: P(w_{1,n}) = P(w_1) P(w_2|w_1) … P(w_n|w_{n-2} w_{n-1})

- Documents as language samples:

  P(t_1, t_2, …, t_n|d) = ∏_{i=1}^{n} P(t_i|d)
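A minimal sketch of unigram and bigram maximum-likelihood estimates over a toy token stream; the corpus and test sequence are our own examples, and a real model needs the smoothing discussed on the following slides.

```python
import math
from collections import Counter

tokens = "the big dog saw the big cat".split()
unigram = Counter(tokens)                      # counts for P(w)
bigram = Counter(zip(tokens, tokens[1:]))      # counts for P(w|prev)
N = len(tokens)

def p_unigram(w):
    return unigram[w] / N

def p_bigram(w, prev):
    return bigram[(prev, w)] / unigram[prev]

# Bigram estimate of P(w_{1,n}) = P(w1) * prod_i P(wi|wi-1)
seq = "the big dog".split()
logp = math.log(p_unigram(seq[0])) + sum(
    math.log(p_bigram(w, prev)) for prev, w in zip(seq, seq[1:]))
print(math.exp(logp))   # 2/7 * 1 * 1/2 ≈ 0.143
```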

Language Models for IR

- Query as a generation process:

  P(d|t_1, t_2, …, t_m) = P(d) P(t_1, t_2, …, t_m|d) / P(t_1, t_2, …, t_m)   (Bayes' theorem)

  ⇒ P(d) P(t_1, t_2, …, t_m|d)   (the denominator is the same for every document)

  ⇒ P(t_1, t_2, …, t_m|d)   (uniform prior over documents)

  ⇒ ∏_{i=1}^{m} P(t_i|d)   (unigram terms)
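As a sketch of the resulting retrieval rule, the following ranks toy documents by query likelihood under the unigram MLE; the document contents are hypothetical, and the zero-probability flaw this scoring exposes is the subject of the next slides.

```python
from collections import Counter

docs = {
    "d1": "information retrieval with language models".split(),
    "d2": "statistical models of language and speech".split(),
}

def p_mle(t, d):
    return Counter(docs[d])[t] / len(docs[d])   # tf_{t,d} / dl_d

def query_likelihood(query, d):
    p = 1.0
    for t in query.split():
        p *= p_mle(t, d)                        # prod_i P(t_i|d)
    return p

# Rank documents by P(q|d); any unseen query term drives a score to zero.
print(sorted(docs, key=lambda d: query_likelihood("language models", d),
             reverse=True))
```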

A Naïve Solution

- Maximum likelihood estimate:

  P_mle(t|d) = tf_{t,d} / dl_d

  where tf_{t,d} is the raw term frequency of term t in document d, and dl_d is the total number of tokens in document d.
Sparse Data Problem

- A document is often too small a sample: with the MLE, any query term missing from the document zeroes out the whole product:

  P(t_1, …, t_m|d) = ∏_{i=1}^{m} P(t_i|d), so P(t_i|d) = 0 ⇒ P(t_1, …, t_m|d) = 0

- A document's size is fixed: even when P(information, retrieval|d) > 0 while keyword ∉ d and crocodile ∉ d, we would still expect P(keyword|d) >> P(crocodile|d).
Zipf's Law

- Given the frequency f of a word and its rank r in the list of words ordered by their frequencies:

  f ∝ 1/r, or f × r = k for a constant k

  [Figure: frequency vs. rank curve, showing a small number of very common words, a reasonable number of medium-frequency words, and a large number of rare words.]
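A quick way to eyeball Zipf's law on any corpus: rank words by frequency and check that f × r stays roughly constant. This is a sketch; 'corpus.txt' is a placeholder path for any plain-text file.

```python
from collections import Counter

words = open("corpus.txt").read().lower().split()
freqs = sorted(Counter(words).values(), reverse=True)
for r, f in enumerate(freqs[:10], start=1):
    print(r, f, f * r)   # products of similar magnitude support f ∝ 1/r
```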

Data Smoothing

- Laplace's law, where T is the number of distinct terms (the vocabulary size):

  P_LAP(t|d) = (tf_{t,d} + 1) / (dl_d + T)

- Extension to Laplace's law: Lidstone's law.

  P_LID(t|d) = (tf_{t,d} + λ) / (dl_d + λT) = μ P_mle(t|d) + (1 − μ)/T

  where μ = dl_d / (dl_d + λT)
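A small sketch of the Lidstone estimate as defined above (Laplace is the special case λ = 1); the document and vocabulary size are toy values.

```python
from collections import Counter

def p_lidstone(t, doc_tokens, T, lam=0.5):
    """(tf_{t,d} + lambda) / (dl_d + lambda * T), with T the vocabulary size."""
    tf = Counter(doc_tokens)[t]
    return (tf + lam) / (len(doc_tokens) + lam * T)

d = "the cat sat on the mat".split()
print(p_lidstone("cat", d, T=10_000, lam=1.0))   # Laplace
print(p_lidstone("crocodile", d, T=10_000))      # unseen term still gets mass
```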

Data Smoothing

- Smoothed with the collection model:

  P_combined(t|d) = ω × P_document(t|d) + (1 − ω) × P_collection(t)

  - The combined probability is still normalized, with values between 0 and 1.
  - It further differentiates between missing terms such as "keyword" and "crocodile".
  - The collection model can be made stable by adding more documents to the collection.
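A sketch of the interpolation above; the toy dictionaries stand in for unigram estimates from the document and from the whole collection, and the names and ω value are our own.

```python
p_doc = {"camera": 0.05}.get                      # toy document model P(t|d)
p_coll = {"camera": 0.01, "keyword": 0.002, "crocodile": 1e-6}.get

def p_combined(t, omega=0.8):
    # omega * P_document(t|d) + (1 - omega) * P_collection(t)
    return omega * (p_doc(t) or 0.0) + (1 - omega) * (p_coll(t) or 0.0)

# A term absent from d but common in the collection ("keyword") now
# outscores one that is rare everywhere ("crocodile").
print(p_combined("keyword") > p_combined("crocodile"))   # True
```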

Text Classification/Categorization

- Common classification problems:

  | Problem | Input | Categories |
  |---|---|---|
  | Tagging | context of a word | tag for the word |
  | Disambiguation | context of a word | sense for the word |
  | PP attachment | sentence | parse trees |
  | Author identification | document | author(s) |
  | Language identification | document | language(s) |
  | Text categorization | document | topic(s) |

- Common classification methods: decision trees, maximum entropy modeling, neural networks, and clustering.
What is Sentiment Analysis?

“… after a week of using the camera, I am very unhappy with the camera. The LCD screen is too small and the picture quality is poor. This camera is junk.”

Subjective Words

- A consumer is unlikely to write: “This camera is great. It takes great pictures. The LCD screen is great. I love this camera.”

- But is more likely to write: “This camera is great. It takes breathtaking pictures. The LCD screen is bright and clear. I love this camera.”

- Subjective words are used more diversely: infrequent within a document but frequent across documents.
Topic Models

- Topic modeling is a relatively new statistical approach to understanding the thematic structure in a collection of data:
  - Uncovering hidden topics in a corpus of documents
  - Reducing dimensionality from words down to topics

- Topic models treat document creation as a random process of determining a topic proportion and selecting words from the related topic distributions.
Discover Topics

Example topics discovered from news articles (top words per topic):

- Topic 1: study, found, drug, research, risk, drugs, researchers, dr, patients, disease, vioxx, health, increased, merck, brain, schizophrenia, studies, medical, effects
- Topic 2: charles, prince, london, marriage, parker, camilla, bowles, wedding, british, thursday, king, royal, married, marry, wales, queen, diana, april, relationship, couple
- Topic 3: bush, protest, texas, bushs, iraq, president, cindy, war, ranch, crawford, sheehan, son, casey, killed, antiwar, california, george, mother, road, peace
- Topic 4: surface, atmosphere, space, system, earth, probe, european, moon, huygens, titan, mission, friday, nasa, scientists, cassini, saturns, agency, data, titans
Discover Hierarchies
Topic Use Changing Through Time

[Figure: topic frequency over the years 1800-1900 for three topics: horses, ships, and vehicles.]
Each document is a random mixture of topics that are shared across the corpus. Documents exhibit multiple topics, and each word is randomly drawn from a topic. This is the generative model of LDA.

Example topics from the illustrated document*:

- election, voter, vote, president, ballot
- law, legal, lawyer, court, judge
- city, state, florida, united, california

*Harvard Law Review, Vol. 118, No. 7 (May 2005), pp. 2314-2335 (Note).
However, all of this thematic information is hidden; we only observe the words. Using probabilistic reasoning, we wish to infer the latent structure of the documents: the topic indices, the topics, and the topic proportions.
Bayesian Probability

- Bayes' theorem:

  P(θ|x) = p(x|θ) p(θ) / p(x)

  posterior ∝ likelihood × prior

- Subjective probability: model the prior with a given distribution.

  [Figure: panels showing a Beta prior, a linear likelihood, and the resulting posterior.]
Dirichlet Distribution

- Distribution over distributions:

  P(θ|α) = (1/B(α)) ∏_{i=1}^{K} θ_i^{α_i − 1}

  where B(α) = ∏_{i=1}^{K} Γ(α_i) / Γ(∑_{i=1}^{K} α_i)
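Each draw from a Dirichlet is itself a distribution over K outcomes, which is exactly how LDA draws topic proportions and topics. A quick numpy illustration with toy K and α values:

```python
import numpy as np

rng = np.random.default_rng(0)
print(rng.dirichlet([0.1] * 5))    # small alpha: sparse, mass on few entries
print(rng.dirichlet([10.0] * 5))   # large alpha: near-uniform
# Every sample is non-negative and sums to 1: a distribution over 5 outcomes.
```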

Latent Dirichlet Allocation (LDA)

- Initially proposed by Blei et al. (2003). Generative process:

  1. ϕ^(k) ~ Dir(β), for each topic k
  2. For each document d ∈ M:
     a. θ_d ~ Dir(α)
     b. For each word w ∈ d:
        i.  z ~ Discrete(θ_d)
        ii. w ~ Discrete(ϕ^(z))

  [Figure: LDA plate diagram with hyperparameters α and β, per-document topic proportions θ, per-word topic assignments z, and observed words w, over M documents of N words each.]

- Joint distribution:

  p(w, z, θ, ϕ|α, β) = ∏_{k=1}^{K} p(ϕ_k|β) × ∏_{m=1}^{M} p(θ_m|α) × ∏_{m=1}^{M} ∏_{n=1}^{N_m} p(z_{m,n}|θ_m) p(w_{m,n}|ϕ_{z_{m,n}})
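A direct transcription of this generative process in Python with numpy; all sizes and hyperparameters below are toy values, not from the slides.

```python
import numpy as np

K, V, M, N = 3, 50, 4, 20          # topics, vocab, docs, words per doc
alpha, beta = 0.5, 0.1
rng = np.random.default_rng(0)

phi = rng.dirichlet([beta] * V, size=K)        # 1. phi^(k) ~ Dir(beta)
docs = []
for m in range(M):                             # 2. for each document d
    theta = rng.dirichlet([alpha] * K)         #    a. theta_d ~ Dir(alpha)
    words = []
    for _ in range(N):                         #    b. for each word
        z = rng.choice(K, p=theta)             #       i.  z ~ Discrete(theta_d)
        w = rng.choice(V, p=phi[z])            #       ii. w ~ Discrete(phi^(z))
        words.append(w)
    docs.append(words)
print(docs[0])
```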

Inference

- We are interested in the posterior distributions for ϕ, z, and θ
- Computing these distributions exactly is intractable
- We therefore turn to approximate inference techniques:
  - Gibbs sampling, variational inference, …
- Collapsed Gibbs sampling:
  - The multinomial parameters are integrated out before sampling

Gibbs Sampling

- A popular MCMC (Markov Chain Monte Carlo) method that samples from the conditional distributions of the posterior variables

- For the joint distribution p(x) = p(x_1, x_2, …, x_m):

  1. Randomly initialize each x_i
  2. For t = 1, 2, …, T:
     2.1. x_1^(t+1) ~ p(x_1|x_2^(t), x_3^(t), …, x_m^(t))
     2.2. x_2^(t+1) ~ p(x_2|x_1^(t+1), x_3^(t), …, x_m^(t))
     …
     2.m. x_m^(t+1) ~ p(x_m|x_1^(t+1), x_2^(t+1), …, x_{m−1}^(t+1))
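A minimal Gibbs sampler on a toy target where the conditionals are known in closed form: a bivariate normal with correlation ρ, whose conditionals are x1|x2 ~ N(ρx2, 1−ρ²) and symmetrically. The target distribution is our own example, not from the slides.

```python
import math, random

rho = 0.8
x1 = x2 = 0.0                        # 1. arbitrary initialization
xs, ys = [], []
for t in range(20_000):              # 2. for t = 1..T
    x1 = random.gauss(rho * x2, math.sqrt(1 - rho ** 2))   # x1 ~ p(x1|x2)
    x2 = random.gauss(rho * x1, math.sqrt(1 - rho ** 2))   # x2 ~ p(x2|x1)
    xs.append(x1); ys.append(x2)

# The empirical correlation of the samples should approach rho.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
print(cov / (sx * sy))   # ≈ 0.8
```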

(Collapsed) Gibbs Sampling

- We integrate out the multinomial parameters ϕ and θ so that the Markov chain stabilizes more quickly and we have fewer variables to sample.

- Our sampling equation is given as follows:

  p(z_i|z_{−i}, w) ∝ (n_{d,z_i}^{−i} + α) × (n_{z_i,w_i}^{−i} + β) / (n_{z_i,·}^{−i} + Wβ)

  where the counts n_{d,z_i} (topic z_i in document d), n_{z_i,w_i} (word w_i in topic z_i), and n_{z_i,·} (all words in topic z_i) exclude the current assignment z_i, and W is the vocabulary size.

- GibbsLDA++: a free C/C++ implementation of LDA
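A sketch of one collapsed-Gibbs sweep implementing the update above, with our own variable names (this is not the GibbsLDA++ API). It assumes n_dk, n_kw, and n_k are the document-topic, topic-word, and topic-total counts, initialized consistently with the current assignments z.

```python
import numpy as np

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    K, W = n_kw.shape
    for d, words in enumerate(docs):          # docs: list of word-id lists
        for i, w in enumerate(words):
            k = z[d][i]                       # remove the current assignment
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # p(z_i=k | z_-i, w) ∝ (n_dk + α)(n_kw + β)/(n_k + Wβ)
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + W * beta)
            k = rng.choice(K, p=p / p.sum())  # sample the new topic
            z[d][i] = k                       # record and restore counts
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
```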

Syntax Models

- Hidden Markov Model (HMM): the probability distribution of the latent variable z_i follows the Markov property and depends on the value of the previous latent variable z_{i−1}

  [Figure: HMM chain z_1 → z_2 → z_3 → z_4 … with emitted observations x_1 … x_4, transition matrix A, and emission distributions ϕ.]

- Each latent state z has a unique emission probability
  - This is a mixture model like LDA

- Useful for unsupervised POS tagging
  - Language exhibits a structure due to syntax rules
  - State-of-the-art: “Bayesian” HMM where transition rows and emission probabilities are random variables drawn from Dirichlet distributions [3]
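A small sketch of sampling from an HMM to make the Markov dependency concrete; the transition matrix A and emissions ϕ are toy values (in the Bayesian HMM cited above, each row of A and ϕ would itself be a Dirichlet draw).

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.1],          # A[s] = p(z_i = . | z_{i-1} = s)
              [0.3, 0.7]])
phi = np.array([[0.7, 0.2, 0.1],   # phi[s] = p(x_i = . | z_i = s)
                [0.1, 0.2, 0.7]])

z, xs = 0, []
for _ in range(10):
    z = rng.choice(2, p=A[z])             # transition: z_i depends on z_{i-1}
    xs.append(rng.choice(3, p=phi[z]))    # emission from state z_i
print(xs)
```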

Combining Topic and Syntax Models?

- Considering both axes of information can help us model text more precisely and can thus aid in prediction, processing, and ultimately many NLP tasks

- Example 1:
  - Our favourite city during the trip was _________.
  - How do we reason about what the missing word might be?
  - An HMM should be able to predict that it's a noun
  - LDA might be able to predict that it's a travel word
  - A combined model could theoretically determine that it's a noun about travel

Combining Topic and Syntax Models?

- Example 2:
  - Is the word “book” a noun or a verb?
    - If we know that a “library” topic generated it, it's much more likely to be a noun
    - If we know that an “airline” topic generated it, it's more likely to be a verb (“to book a flight”)

- Example 3:
  - We know that the word “seal” is a noun; what is its topic?
    - More likely to be related to “marine mammals” than “construction” (“to seal a crack”)
POSLDA (Part-Of-Speech LDA) Model

- A “multi-faceted” topic model where word w depends on both topic z and class c when c is a “semantic” class:
  - w_i ~ p(w_i|c_i, z_i)

- When c is a “syntactic” class, the emitted word depends only on class c itself

- This model results in POS-specific topics and can automatically filter out “stop-words” that must be manually removed in LDA

- The model has T × S_SEM + S_SYN word distributions in total

  [Figure: POSLDA plate diagram with topic proportions θ (prior α), class-transition rows π (prior γ), word distributions ϕ (prior β), and per-token classes c_i, topics z_i, and words w_i over M documents.]
POSLDA Generative Process

1. For each row π_r ∈ π:
   a. Draw π_r ~ Dirichlet(γ)
2. For each word distribution ϕ_n ∈ ϕ:
   a. Draw ϕ_n ~ Dirichlet(β)
3. For each document d ∈ D:
   a. Draw θ_d ~ Dirichlet(α)
   b. For each token i ∈ d:
      i.   Draw c_i ~ π^(c_{i−1})
      ii.  If c_i ∈ C_SYN:
           A. Draw w_i ~ ϕ_SYN^(c_i)
      iii. Else (c_i ∈ C_SEM):
           A. Draw z_i ~ θ_d
           B. Draw w_i ~ ϕ_SEM^(c_i, z_i)
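A direct transcription of this generative process for a single document; sizes and hyperparameters are toy values, and class indices below S_SYN are treated as syntactic, the rest as semantic.

```python
import numpy as np

K, S_SYN, S_SEM, V = 3, 2, 2, 30
S = S_SYN + S_SEM
alpha, beta, gamma = 0.5, 0.1, 0.5
rng = np.random.default_rng(0)

pi = rng.dirichlet([gamma] * S, size=S)               # 1. transition rows
phi_syn = rng.dirichlet([beta] * V, size=S_SYN)       # 2. one dist per SYN class
phi_sem = rng.dirichlet([beta] * V, size=(S_SEM, K))  #    one per (SEM class, topic)

theta = rng.dirichlet([alpha] * K)                    # 3a. theta_d ~ Dir(alpha)
c, words = 0, []
for _ in range(15):                                   # 3b. for each token
    c = rng.choice(S, p=pi[c])                        # i.  c_i ~ pi^(c_{i-1})
    if c < S_SYN:                                     # ii. syntactic class
        w = rng.choice(V, p=phi_syn[c])
    else:                                             # iii. semantic class
        z = rng.choice(K, p=theta)                    #     z_i ~ theta_d
        w = rng.choice(V, p=phi_sem[c - S_SYN, z])    #     w_i ~ phi^(c_i, z_i)
    words.append(w)
print(words)
```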

POSLDA Interpretability

- Learned word distributions from the TREC AP corpus illustrate the resulting POS-specific topics.

Generalized Probabilistic Model

- POSLDA reduces to LDA when the number of classes S = 1.
- POSLDA reduces to a Bayesian HMM when the number of topics K = 1.
- POSLDA reduces to HMM-LDA when the number of semantic classes S_SEM = 1.

FS (Feature Selection) from Semantic Classes

- Research has shown that semantic classes such as adjectives, adverbs, and verbs are more useful for SA.

- Select representative words for a semantic class by picking the top-ranked words until their cumulative probability reaches a threshold θ (e.g., 75% or 90%), as sketched below.

- Merge all selected words into one set W_sem, and reduce it further by DF-cutoff if needed.
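A sketch of the cumulative-probability selection step; word_probs stands in for a learned distribution p(w|class), and the dictionary here is made up.

```python
def select_representative(word_probs, threshold=0.9):
    total, chosen = 0.0, []
    for w, p in sorted(word_probs.items(), key=lambda kv: -kv[1]):
        chosen.append(w)                 # take words in decreasing probability
        total += p
        if total >= threshold:           # stop once cumulative mass >= theta
            break
    return chosen

print(select_representative({"good": 0.4, "bad": 0.3, "fine": 0.2, "odd": 0.1}))
# -> ['good', 'bad', 'fine']
```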

FS from Semantic Classes with Tagging

- POSLDA is unsupervised, so its results do not usually match human-labeled answers.

- A tagging dictionary contains all the POS tags that can be used for the given words in a corpus.

- With a tagging dictionary, a word is only assigned to its related POS classes; a word not in the dictionary participates in all POS classes, the same as in the unsupervised POSLDA process.

FS with Automatic Stopword Removal

- Similar to W_sem, we can also build W_syn from the syntactic classes to extract topic-independent stopwords.

- Such a process is both automatic and corpus-specific, avoiding under- or over-removal of the related words.

- Although POSLDA can separate semantic and syntactic classes, removing stopwords explicitly helps reduce the noise in the dataset.

FS for Aspect-Based SA

- POSLDA associates each topic with its related semantic classes, such as “nouns about sports” and “verbs about travel”.

- By modeling topics as aspects, we can then select features from the corresponding semantic classes using the methods described earlier.

- To model aspects, we use manually prepared seed lists (possibly extended with a bootstrapping method) and pin them to the related aspects during the modeling process.

Questions?

References

- Chris Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.

- Chris Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008 (online copy available on the web).

- Daniel Jurafsky and James H. Martin. Speech and Language Processing. Second Edition. Pearson Education, 2008.

- David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, 2003.

- Sharon Goldwater and Thomas Griffiths. A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging. Proceedings of the 45th Annual Meeting of the ACL, pages 744-751, Prague, Czech Republic, June 2007.

- Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs Up? Sentiment Classification Using Machine Learning Techniques. Proceedings of EMNLP, 2002.

- William M. Darling. Generalized Probabilistic Topic and Syntax Models for Natural Language Processing. Ph.D. Thesis, University of Guelph, 2012.

- Haochen Zhou and Fei Song. Aspect-Level Sentiment Analysis Based on a Generalized Probabilistic Topic and Syntax Model. Proceedings of FLAIRS-28, 2015.

