Page 1: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Shuanhu Bai and Haizhou Li

Institute for Infocomm Research, Republic of Singapore

Page 2: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Outline

• Introduction

• N-gram Model
  – Bayesian Learning
  – QB Estimation for Incremental Learning

• Continuous N-gram Model
  – Bayesian Learning
  – QB Estimation for Incremental Learning

• Experimental Results

• Conclusions

Page 3: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Introduction

• Even with ample training data, n-gram language models are still far from optimal

• Studies show that they are extremely sensitive to changes in style, topic, or genre

• LM adaptation aims to bridge the mismatch between the model and the test domain

• A typical n-gram LM is trained under the maximum likelihood estimation (MLE) criterion

Page 4: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Introduction (cont.)

• One typical adaptation technique is deleted interpolation, which combines a flat, reliable general model (the baseline model) with a sharp but volatile domain-specific model
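
For reference, deleted interpolation in its standard form, with the weight $\lambda$ typically tuned on held-out data (the paper's exact variant may differ):

$$P(w \mid h) = \lambda\, P_{\text{general}}(w \mid h) + (1 - \lambda)\, P_{\text{domain}}(w \mid h)$$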

• In this paper, we study the Bayesian learning formulation for n-gram LM adaptation

• Under the Bayesian learning framework, an incremental adaptation procedure is also proposed for dynamically updating cache-based n-grams

Page 5: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

N-gram Model

• N-gram model: each word is predicted from its $n-1$ predecessors, $P(w_1^T) = \prod_{t=1}^{T} P(w_t \mid w_{t-n+1}^{t-1})$

• The quality of a given n-gram LM on a corpus $D$ of size $T$ is commonly assessed by the log-likelihood, $\log P(D) = \sum_{t=1}^{T} \log P(w_t \mid w_{t-n+1}^{t-1})$

• Unigram and bigram are the special cases $n = 1$ and $n = 2$
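
As a concrete illustration of the log-likelihood evaluation above, here is a minimal Python sketch for a bigram model; the add-one smoothing and the interpolation weight `alpha` are illustrative assumptions, not the paper's recipe:

```python
import math
from collections import Counter

def bigram_log_likelihood(tokens, unigram, bigram, vocab_size, alpha=0.5):
    """Log-likelihood of a token sequence under an interpolated bigram model.

    `unigram` and `bigram` are Counters of training counts; `alpha` is a
    hypothetical weight mixing the bigram and unigram terms.
    """
    total = sum(unigram.values())
    ll = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p_uni = (unigram[cur] + 1) / (total + vocab_size)        # add-one unigram
        p_bi = bigram[(prev, cur)] / unigram[prev] if unigram[prev] else 0.0
        ll += math.log(alpha * p_bi + (1 - alpha) * p_uni)       # interpolated term
    return ll

train = "the cat sat on the mat".split()
uni = Counter(train)
bi = Counter(zip(train, train[1:]))
print(bigram_log_likelihood("the cat sat".split(), uni, bi, vocab_size=len(uni)))
```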

Page 6: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

N-gram Model (cont.)

• MLE

• Smoothing
  – Backoff
  – Cache
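
To make the cache idea concrete, below is a hedged sketch of a cache-interpolated model; the class name, the window size, and the weight `lam` are hypothetical choices, not taken from the paper:

```python
from collections import Counter, deque

class CacheInterpolatedLM:
    """Sketch: mix a static (e.g. backoff) bigram estimate with a unigram
    cache built from the most recent words of the running text."""

    def __init__(self, static_prob, cache_size=500, lam=0.9):
        self.static_prob = static_prob         # callable: (prev, cur) -> probability
        self.cache = deque(maxlen=cache_size)  # sliding window of recent words
        self.lam = lam                         # weight on the static model

    def prob(self, prev, cur):
        if self.cache:
            p_cache = Counter(self.cache)[cur] / len(self.cache)
        else:
            p_cache = 0.0
        return self.lam * self.static_prob(prev, cur) + (1 - self.lam) * p_cache

    def observe(self, word):
        self.cache.append(word)                # update the cache as text arrives
```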

Page 7: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Bayesian Learning for N-gram Model

• Dirichlet conjugate prior over the multinomial parameters: $g(\theta) \propto \prod_{i \in V} \theta_i^{m_i - 1}$

• The probability of generating a text corpus is obtained by integrating over the parameter space: $P(D) = \int P(D \mid \theta)\, g(\theta)\, d\theta$

• MAP: $\hat{\theta} = \arg\max_{\theta}\, P(D \mid \theta)\, g(\theta)$
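
A minimal numeric sketch of the MAP estimate under a Dirichlet prior; the counts and hyperparameters below are made up for illustration:

```python
import numpy as np

def map_multinomial(counts, m):
    """MAP estimate under a Dirichlet(m) prior: theta_i ∝ C_i + m_i - 1.
    Assumes all m_i > 1 so the posterior mode is in the interior."""
    counts = np.asarray(counts, dtype=float)
    m = np.asarray(m, dtype=float)
    unnorm = counts + m - 1.0
    return unnorm / unnorm.sum()

# counts from the adaptation data, hyperparameters carrying the prior knowledge
print(map_multinomial(counts=[10, 3, 0, 1], m=[2.0, 2.0, 2.0, 2.0]))
```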

Page 8: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

QB Estimation for Incremental Learning of the N-gram Model

• It is of practical use to devise an incremental learning mechanism that adapts both the parameters and the prior knowledge over time

• The training data arrive as a sequence of sub-corpora, $D^n = \{D_1, D_2, \ldots, D_n\}$

• The updating of parameters can be iterated between the reproducible prior and posterior estimates

Page 9: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

• ML: $\hat{\theta}_i = \dfrac{C_i}{\sum_{i \in V} C_i}$

• MAP: $\hat{\theta}_i = \dfrac{C_i + m_i - 1}{\sum_{i \in V} (C_i + m_i - 1)}$

• QB: $\hat{\theta}_i^{(n)} = \dfrac{C_i^{(n)} + m_i^{(n)} - 1}{\sum_{i \in V} \bigl(C_i^{(n)} + m_i^{(n)} - 1\bigr)}$, with hyperparameter update $m_i^{(n+1)} = m_i^{(n)} + C_i^{(n)}$
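
The following sketch plays the estimators above through a quasi-Bayes loop; the initial hyperparameters and the sub-corpus counts are invented for illustration:

```python
import numpy as np

def qb_step(m_prev, counts):
    """One quasi-Bayes step: MAP estimate with the current hyperparameters,
    then fold the new counts into the reproducible Dirichlet prior."""
    counts = np.asarray(counts, dtype=float)
    unnorm = counts + m_prev - 1.0
    theta = unnorm / unnorm.sum()   # MAP estimate on sub-corpus D_n
    m_next = m_prev + counts        # hyperparameter update for the next sub-corpus
    return theta, m_next

m = np.full(4, 2.0)                 # initial prior, e.g. carried over from the baseline LM
for counts in ([5, 1, 0, 2], [0, 3, 4, 1]):
    theta, m = qb_step(m, counts)
print(theta, m)
```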

Page 10: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Continuous N-gram Model

• The continuous n-gram model is also called the aggregate Markov model

• We introduce a hidden variable z ranging over Z “soft” word classes

• Z = 1 reduces to the unigram model; Z = I recovers the full bigram model

• The continuous bigram model has two obvious advantages over the discrete bigram (see the EM sketch below):
  – Parameter count: I × I → I × Z × 2
  – EM can be applied to estimate the parameters under the MLE criterion
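
Below is a compact EM sketch for the aggregate Markov (continuous bigram) model under the MLE criterion; the random initialization, iteration count, and numerical guards are implementation choices, not from the paper:

```python
import numpy as np

def em_aggregate_markov(C, Z, iters=50, seed=0):
    """EM for P(w2|w1) = sum_z P(z|w1) P(w2|z); C is an I x I bigram count matrix."""
    I = C.shape[0]
    rng = np.random.default_rng(seed)
    p_z_w = rng.dirichlet(np.ones(Z), size=I)   # P(z|w1), shape I x Z
    p_w_z = rng.dirichlet(np.ones(I), size=Z)   # P(w2|z), shape Z x I
    for _ in range(iters):
        # E-step: posterior P(z | w1, w2) for every bigram
        joint = p_z_w[:, :, None] * p_w_z[None, :, :]               # I x Z x I
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # M-step: re-estimate both tables from expected class counts
        exp_c = C[:, None, :] * post                                # I x Z x I
        p_z_w = exp_c.sum(axis=2)
        p_z_w /= np.maximum(p_z_w.sum(axis=1, keepdims=True), 1e-12)
        p_w_z = exp_c.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
    return p_z_w, p_w_z

# toy 3-word vocabulary with Z = 2 soft classes
C = np.array([[2., 1., 0.], [0., 3., 1.], [1., 0., 2.]])
p_z_w, p_w_z = em_aggregate_markov(C, Z=2)
```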

Page 11: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Continuous N-gram Model (cont.)

• Parameters: the class-membership table $P(z \mid w)$ and the class-emission table $P(w \mid z)$, combined as $P(w_t \mid w_{t-1}) = \sum_{z=1}^{Z} P(z \mid w_{t-1})\, P(w_t \mid z)$

Page 12: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Bayesian Learning for Continuous N-gram Model

• Prior: Dirichlet priors over the model parameters

• After the EM algorithm, the MAP re-estimates take a “prior plus observed counts” form

• This can be interpreted as a smoothing between the known priors and the current observations, i.e. the cache corpus; see the decomposition below
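
In the discrete n-gram case this smoothing reading is an exact algebraic identity, sketched below; the continuous case smooths each parameter table analogously:

$$\hat{\theta}_i = \frac{C_i + m_i - 1}{\sum_j (C_j + m_j - 1)} = \lambda \,\frac{m_i - 1}{\sum_j (m_j - 1)} + (1 - \lambda)\,\frac{C_i}{\sum_j C_j}, \qquad \lambda = \frac{\sum_j (m_j - 1)}{\sum_j (C_j + m_j - 1)}$$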

Page 13: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

QB Estimation for Incremental Learning of the Continuous N-gram Model

• Updating of parameters

• Initial parameters

Page 14: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Experimental Results

• Corpus
  – A: 60 million words from LDC98T30, finance and business
  – B: 20 million words from LDC98T30, sports and fashion, for incremental training
  – C: A + B, for adaptation
  – D: 20 million words in the same domains as C (open test set)

• Vocabulary: 50,000 words from A + B

Page 15: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Experimental Results (cont.)

Page 16: BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING

Conclusions

• Proposed a Bayesian learning approach to n-gram modeling
  – An interpretation of language-model smoothing or adaptation as a weighting between prior knowledge and current observations
  – The Dirichlet conjugate prior leads not only to a batch adaptation procedure but also to a quasi-Bayes incremental learning strategy for on-line language modeling

