
Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Date post: 27-Jan-2015
Upload: tomonari-masada
Page 2: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Aim› Find trends in document collections (academic papers, patents, blog entries…)

Idea› Construct timestamp arrays as new observed data

Method› Modify latent Dirichlet allocation (LDA)

Page 3: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Timestamp array for each document

[Figure: example document with word tokens “test”, “group”, “effect”, “space” and a timestamp array holding t, t−1, and t+1]
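The figure on this slide suggests that a document stamped t receives an array mixing t with its neighbors t−1 and t+1. The slides do not say how the mix is chosen, so the following is only a sketch under an assumed scheme: `build_timestamp_array` is a hypothetical helper, and the half-own, half-neighbors split and the clipping at the boundaries are illustrative assumptions, not the author's rule.

```python
def build_timestamp_array(t, length, num_timestamps):
    """Return `length` timestamp tokens for a document stamped `t`.

    Assumed scheme (not from the slides): the larger half of the entries
    is t itself, the rest alternate between t-1 and t+1, clipped to the
    valid range [0, num_timestamps - 1].
    """
    n_own = (length + 1) // 2          # entries equal to the document's own timestamp
    array = [t] * n_own
    for i in range(length - n_own):
        neighbor = t - 1 if i % 2 == 0 else t + 1
        # clip so the first and last timestamps stay in range
        neighbor = min(max(neighbor, 0), num_timestamps - 1)
        array.append(neighbor)
    return array


example = build_timestamp_array(t=5, length=8, num_timestamps=10)
```

The `length` argument is the extra input parameter that Bag of Timestamps needs: a longer array gives the timestamp data more weight relative to the words.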

Page 4: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Modify LDA

› Draw a topic multinomial Multi(θd) from a Dirichlet prior

› For each word token

Draw a topic t from Multi(θd)

Draw a word from multinomial Multi(φt)

› For each timestamp token

Draw a topic t from Multi(θd)

Draw a timestamp from multinomial Multi(ψt)
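The generative process above can be sketched in NumPy as follows. The function name `generate_document`, the symmetric Dirichlet parameter `alpha`, and the toy sizes K, W, S are assumptions made for illustration; the topic-word and topic-timestamp multinomials are passed in as the matrices `phi` and `psi`.

```python
import numpy as np


def generate_document(n_words, n_stamps, alpha, phi, psi, rng):
    """Sample one document's word tokens and timestamp tokens.

    phi: (K, W) topic-word probabilities.
    psi: (K, S) topic-timestamp probabilities.
    """
    K = phi.shape[0]
    # Per-document topic multinomial, shared by words and timestamps.
    theta = rng.dirichlet(np.full(K, alpha))
    # Words: draw a topic, then a word from that topic's multinomial.
    word_topics = rng.choice(K, size=n_words, p=theta)
    words = np.array([rng.choice(phi.shape[1], p=phi[k]) for k in word_topics], dtype=int)
    # Timestamps: same two steps, but with psi instead of phi.
    stamp_topics = rng.choice(K, size=n_stamps, p=theta)
    stamps = np.array([rng.choice(psi.shape[1], p=psi[k]) for k in stamp_topics], dtype=int)
    return words, stamps


rng = np.random.default_rng(0)
K, W, S = 3, 20, 5                                # toy sizes (assumed)
phi = rng.dirichlet(np.full(W, 0.5), size=K)      # one word multinomial per topic
psi = rng.dirichlet(np.full(S, 0.5), size=K)      # one timestamp multinomial per topic
words, stamps = generate_document(8, 8, 0.5, phi, psi, rng)
```

Both token types draw their topics from the same θd, which is what ties a document's words to its timestamps.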

Page 5: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

[Figure: graphical model with nodes α, θ, z, w, t, β, φ, γ, ψ]

Page 6: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Different Dirichlet priors for word and timestamp multinomials

› Taking a Bayesian approach for timestamps as well

› Not just introducing new vocabulary
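To see how the separate priors enter inference, here is a sketch of the collapsed Gibbs conditionals one would expect for this model, derived the same way as for standard LDA. It assumes symmetric priors α, β, γ; the count symbols are introduced here for illustration and are not from the slides: n_{dk} counts all tokens (words and timestamps) of document d assigned to topic k, n_{kw} counts assignments of word w to topic k, m_{ks} counts assignments of timestamp s to topic k, W is the vocabulary size, and S is the number of distinct timestamps. All counts exclude the token being resampled.

```latex
% Word token i of document d, with observed word w:
p(z_{di} = k \mid \text{rest}) \;\propto\;
  (n_{dk} + \alpha)\,
  \frac{n_{kw} + \beta}{\sum_{w'} n_{kw'} + W\beta}

% Timestamp token j of document d, with observed timestamp s:
p(z'_{dj} = k \mid \text{rest}) \;\propto\;
  (n_{dk} + \alpha)\,
  \frac{m_{ks} + \gamma}{\sum_{s'} m_{ks'} + S\gamma}
```

The two formulas share the document-topic factor and differ only in the second factor, where β and γ smooth the word and timestamp counts separately. Merging timestamps into the vocabulary would force a single prior on both.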

Page 7: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Topics over Time

› Modification of LDA (Beta distribution for continuous timestamps)

› O(NK) time, O(N) space (N: number of word tokens)

› Non-Bayesian term in updating formula for Gibbs sampling

Bag of Timestamps

› Modification of LDA (Dirichlet-multinomial for discrete timestamps)

› O((N+L)K) time, O(N+L) space (L: sum of timestamp array lengths)

› Additional input parameter for timestamp array lengths
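The O((N+L)K) cost can be read off a collapsed Gibbs sweep: every word token and every timestamp token is resampled once, and each resampling touches K topics. The code below is a minimal sketch of such a sweep, not the author's implementation; `init_state`, `gibbs_sweep`, the toy data, and the hyperparameter values are all assumptions for illustration.

```python
import numpy as np


def init_state(docs, K, V, n_dk, rng):
    """Assign random topics to one token type and accumulate its counts."""
    z = [rng.integers(K, size=len(tokens)) for tokens in docs]
    n_kv = np.zeros((K, V))
    n_k = np.zeros(K)
    for d, tokens in enumerate(docs):
        for v, k in zip(tokens, z[d]):
            n_dk[d, k] += 1
            n_kv[k, v] += 1
            n_k[k] += 1
    return z, n_kv, n_k


def gibbs_sweep(docs, z, n_dk, n_kv, n_k, alpha, prior, rng):
    """Resample every token of one type once; each token costs O(K)."""
    K, V = n_kv.shape
    for d, tokens in enumerate(docs):
        for i, v in enumerate(tokens):
            k = z[d][i]
            # Remove the current assignment from all counts.
            n_dk[d, k] -= 1
            n_kv[k, v] -= 1
            n_k[k] -= 1
            # Unnormalized conditional over the K topics.
            p = (n_dk[d] + alpha) * (n_kv[:, v] + prior) / (n_k + V * prior)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            n_dk[d, k] += 1
            n_kv[k, v] += 1
            n_k[k] += 1


rng = np.random.default_rng(0)
K, W, S = 2, 6, 3                                  # toy sizes (assumed)
word_docs = [[0, 1, 1, 2], [3, 4, 5, 5]]           # N = 8 word tokens
stamp_docs = [[0, 0, 1], [2, 2, 1]]                # L = 6 timestamp tokens
n_dk = np.zeros((len(word_docs), K))               # shared by both token types
zw, n_kw, n_kw_sum = init_state(word_docs, K, W, n_dk, rng)
zt, m_ks, m_ks_sum = init_state(stamp_docs, K, S, n_dk, rng)
for _ in range(10):
    gibbs_sweep(word_docs, zw, n_dk, n_kw, n_kw_sum, alpha=0.5, prior=0.1, rng=rng)   # beta
    gibbs_sweep(stamp_docs, zt, n_dk, m_ks, m_ks_sum, alpha=0.5, prior=0.1, rng=rng)  # gamma
```

The same routine serves both token types because only the count tables and the prior differ. The document-topic counts `n_dk` are shared, and the stored assignments account for the O(N+L) space.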

Page 8: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

[Figure: graphical model with nodes α, θ, z, w, t, β, φ, ψ1, ψ2]

Page 16: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Pros

› Bayesian treatment of timestamps as well

› Simple update computations

Cons

› No principled way to determine timestamp array lengths

› Weak for fine-grained timestamps

Page 17: Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Determining timestamp array lengths

› Controlling strength of timestamp data

Parallelization

› OpenMP

› CUDA

› MPICH2

