Source slides: NOMAD: A Distributed Framework for Latent Variable Models, stanford.edu/~rezab/nips2014workshop/slides/inderjit.pdf

NOMAD: A Distributed Framework for Latent Variable Models

Inderjit S. Dhillon

Department of Computer Science, University of Texas at Austin

Joint work with H.-F. Yu, C.-J. Hsieh, H. Yun, and S.V.N. Vishwanathan

NIPS 2014 Workshop: Distributed Machine Learning and Matrix Computations

Inderjit Dhillon (UT Austin.) Dec 12, 2014 1 / 40


Outline

Challenges

Matrix Completion

Stochastic Gradient Method

Existing Distributed Approaches

Our Solution: NOMAD-MF

Latent Dirichlet Allocation (LDA)

Gibbs Sampling

Existing Distributed Solutions: AdLDA, Yahoo LDA

Our Solution: F+NOMAD-LDA


Large-scale Latent Variable Modeling

Latent Variable Models: very useful in many applications

Latent models for recommender systems (e.g., MF)

Topic models for document corpora (e.g., LDA)

Fast growth of data

Almost 2.5 × 10^18 bytes of data are added each day

90% of the world's data today was generated in the past two years


Challenges

Algorithmic as well as hardware level

Many effective algorithms involve fine-grain iterative computation ⇒ hard to parallelize

Many current parallel approaches:

bulk synchronization ⇒ wasted CPU power when communicating

complicated locking mechanism ⇒ hard to scale to many machines

asynchronous computation using parameter server ⇒ not serializable, danger of stale parameters

Proposed NOMAD Framework

access graph analysis to exploit parallelism

asynchronous computation, non-blocking communication, and lock-free

serializable (or almost serializable)

successful applications: MF and LDA


Matrix Factorization:

Recommender Systems


Recommender Systems


Matrix Factorization Approach A ≈ WH^T


Matrix Factorization Approach

min_{W ∈ R^{m×k}, H ∈ R^{n×k}}  ∑_{(i,j) ∈ Ω} (A_ij − w_i^T h_j)^2 + λ(‖W‖_F^2 + ‖H‖_F^2),

Ω = {(i, j) : A_ij is observed}

Regularization terms to avoid over-fitting

A transform maps users/items to the latent feature space R^k:

the i-th user ⇒ the i-th row of W, w_i^T

the j-th item ⇒ the j-th column of H^T, h_j

w_i^T h_j: measures the interaction.


SGM: Stochastic Gradient Method

SGM update: pick (i, j) ∈ Ω

R_ij ← A_ij − w_i^T h_j

w_i ← w_i − η( (λ/|Ω_i|) w_i − R_ij h_j )

h_j ← h_j − η( (λ/|Ω̄_j|) h_j − R_ij w_i )

Ω_i: observed ratings of the i-th row.

Ω̄_j: observed ratings of the j-th column.

[Figure: 3×3 rating matrix (A_11 … A_33) with row factors w_1^T, w_2^T, w_3^T and column factors h_1, h_2, h_3.]

An iteration : |Ω| updates

Time per update: O(k)

Time per iteration: O(|Ω|k),

better than O(|Ω|k^2) for ALS
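The per-rating update above can be sketched in Python (an illustrative sketch, not the paper's implementation; the function name `sgd_epoch` and the dense NumPy layout are assumptions):

```python
import numpy as np

def sgd_epoch(rows, cols, vals, W, H, lam, eta):
    """One SGD epoch for A ≈ W H^T: |Ω| updates, O(k) work each."""
    # observed-rating counts per row and per column, for the scaled regularizer
    omega_i = np.bincount(rows, minlength=W.shape[0])
    omega_j = np.bincount(cols, minlength=H.shape[0])
    for i, j, a in zip(rows, cols, vals):
        r = a - W[i] @ H[j]                      # residual R_ij
        w_old = W[i].copy()                      # use the pre-update w_i for h_j
        W[i] -= eta * ((lam / omega_i[i]) * W[i] - r * H[j])
        H[j] -= eta * ((lam / omega_j[j]) * H[j] - r * w_old)
    return W, H
```

Each pass touches only the observed entries (rows, cols, vals), which is what makes the per-iteration cost O(|Ω|k) rather than the O(|Ω|k^2) of ALS.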


Parallel Stochastic Gradient Descent for MF

Challenge: direct parallel updates ⇒ memory conflicts.

Multi-core parallelization

Hogwild [Niu 2011]

Jellyfish [Recht et al, 2011]

FPSGD** [Zhuang et al, 2013]

Multi-machine parallelization:

DSGD [Gemulla et al, 2011]

DSGD++ [Teflioudi et al, 2013]


DSGD/JellyFish [Gemulla et al, 2011; Recht et al, 2011]

[Figure: DSGD partitions A into p × p blocks; in each phase the p workers update a set of non-overlapping blocks in parallel, then "Synchronize and communicate" before moving on to the next set of blocks.]


Proposed Asynchronous Approach:

NOMAD-MF [Yun et al, 2014]


Motivation

Most existing parallel approaches require

Synchronization:

E.g., ALS, DSGD/JellyFish, DSGD++, CCD++
Computing power is wasted: interleaved computation and communication; curse of the last reducer

and/or Locking:

E.g., parallel SGD, FPSGD**
A standard way to avoid conflicts and guarantee serializability
Complicated remote locking slows down the computation
Hard to implement efficient locking on a distributed system

and/or Computation using stale values:

E.g., Hogwild, asynchronous SGD using a parameter server
Lack of serializability

Q: Can we avoid both synchronization and locking, but keep CPUs from being idle and guarantee serializability?


Our answer: NOMAD

A: Yes, NOMAD keeps CPU and network busy simultaneously

Stochastic gradient update rule

only a small set of variables involved

Nomadic token passing

widely used in the telecommunication area

avoids conflict without explicit remote locking

Idea: “owner computes”

NOMAD: multiple “active tokens” and nomadic passing

Features:

fully asynchronous computation

lock-free implementation

non-blocking communication

serializable update sequence
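The token-passing idea above can be simulated in a few lines (a toy single-process sketch, not the authors' implementation; round-robin scheduling stands in for asynchronous workers, and names like `run_nomad` are invented for illustration):

```python
from collections import deque
import numpy as np

def run_nomad(A, k=2, eta=0.02, lam=0.001, sweeps=300, p=2, seed=0):
    """Toy simulation of NOMAD-MF token passing: each worker owns a block of
    rows; column tokens (j, h_j) hop between worker queues, and only the
    current holder of a token may touch h_j ("owner computes")."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, k)) * 0.1
    H = rng.standard_normal((n, k)) * 0.1
    row_block = [range(w * m // p, (w + 1) * m // p) for w in range(p)]
    queues = [deque(j for j in range(n) if j % p == w) for w in range(p)]
    for _ in range(sweeps):
        for w in range(p):                     # round-robin stands in for async workers
            while queues[w]:
                j = queues[w].popleft()        # pop a token (j, h_j)
                for i in row_block[w]:         # update only locally owned rows
                    r = A[i, j] - W[i] @ H[j]
                    w_old = W[i].copy()
                    W[i] -= eta * (lam * W[i] - r * H[j])
                    H[j] -= eta * (lam * H[j] - r * w_old)
                queues[(w + 1) % p].append(j)  # forward the token: no locking
    return W, H
```

Because a column's token is held by exactly one worker at a time, no two concurrent updates share a variable, which is the source of the serializability claim.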


Access Graph for Stochastic Gradient

Access graph G = (V, E):

V = {w_i} ∪ {h_j}, E = {e_ij : (i, j) ∈ Ω}

Connection to SG:

each e_ij corresponds to an SG update

only w_i and h_j are accessed

Parallelism:

edges without a common node can be updated in parallel

identify a “matching” in the graph

Nomadic Token Passing:

a mechanism s.t. the active edges always form a “matching”

serializability guaranteed

[Figure: bipartite access graph — user nodes w_i on one side, item nodes h_j on the other.]


More Details

Nomadic tokens for {h_j}: n tokens

(j, h_j): O(k) space

Worker:

p workers

a computing unit + a concurrent token queue

a block of W : O(mk/p)

a block row of A: O(|Ω|/p)

[Figure: each worker owns a block of rows of A and the corresponding block of W; column tokens circulate among the workers.]


Illustration of NOMAD communication

[Figure (animation): column tokens circulate among the workers; at any moment each worker processes only the columns whose tokens sit in its queue, against the rows it owns.]


Comparison on a Multi-core System

On a 32-core processor with enough RAM.

Comparison: NOMAD, FPSGD**, and CCD++.

[Figure: test RMSE vs. seconds for NOMAD, FPSGD**, and CCD++.
Left: Netflix (100M ratings), machines=1, cores=30, λ = 0.05, k = 100.
Right: Yahoo! (250M ratings), machines=1, cores=30, λ = 1.00, k = 100.]


Comparison on a Distributed System

On a distributed system with 32 machines.

Comparison: NOMAD, DSGD, DSGD++, and CCD++.

[Figure: test RMSE vs. seconds for NOMAD, DSGD, DSGD++, and CCD++.
Left: Netflix (100M ratings), machines=32, cores=4, λ = 0.05, k = 100.
Right: Yahoo! (250M ratings), machines=32, cores=4, λ = 1.00, k = 100.]


Super Linear Scaling of NOMAD-MF

[Figure: test RMSE vs. seconds × machines × cores (×10^4) on Yahoo!, cores=4, λ = 1.00, k = 100, with curves for # machines = 1, 2, 4, 8, 16, 32.]


Topic Modeling:

Latent Dirichlet Allocation


Latent Dirichlet Allocation (LDA)

Each topic is a multinomial distribution over words

Each document is a multinomial distribution over topics

Each word is drawn from one of these topics

Source: http://www.cs.columbia.edu/~blei/papers/icml-2012-tutorial.pdf


Graphical Model for LDA

[Figure: plate diagram — α → θ_i → z_{i,j} → w_{i,j} ← φ_t ← β, with plates j = 1, …, n_i; i = 1, …, I; and T topics. Labels: w_{i,j} observed word, z_{i,j} topic assignment, θ_i topic proportion, α proportion parameter, φ_t topics, β topic parameter.]

Joint distribution

Pr(·) = ∏_{t=1}^{T} Pr(φ_t | β) ∏_{i=1}^{I} Pr(θ_i | α) ∏_{j=1}^{n_i} Pr(z_{i,j} | θ_i) Pr(w_{i,j} | φ_{z_{i,j}})

Pr(φ_t | β), Pr(θ_i | α): Dirichlet distributions

Pr(w | φ_t), Pr(z | θ_i): multinomial distributions


Inference for LDA

Only documents are observed

θ_i, φ_t, z_{i,j} are latent

Goal: infer these latent structures

Source: http://www.cs.columbia.edu/~blei/papers/icml-2012-tutorial.pdf


Posterior Inference for LDA

Task: Pr(θ_i, φ_t, z_{i,j} | {d_i}, α, β)

Given: a corpus of documents {d_i : i = 1, …, N}, α, β; each document d_i = {w_{i,j} : j = 1, …, n_i}

Exact inference for z_{i,j}, θ_i, φ_t is intractable: the latent variables are dependent when conditioned on the data.

Approximate inference approaches:

Variational Methods [Blei et al, 2003]: an optimization approach; runs faster but generates biased results

Gibbs Sampling [Griffiths & Steyvers, 2004]: an MCMC approach; more accurate but slower with a vanilla implementation

Goal: Design a scalable Gibbs sampler for LDA


Gibbs Sampling for LDA [Griffiths & Steyvers, 2004]

Count matrices for topic assignments {z_{i,j}}:

n_dt: # words of document d assigned to topic t
n_wt: # times word w is assigned to topic t
n_t := ∑_w n_wt = ∑_d n_dt

Gibbs sampling step:

1. choose w := w_{i,j} with old assignment t_o := z_{i,j} of document d := d_i
2. decrease n_{d t_o}, n_{w t_o}, n_{t_o} by 1
3. resample a new assignment t_n := z_{i,j} according to

Pr(z_{i,j} = t) ∝ (n_dt + α)(n_wt + β) / (n_t + β̄),  ∀t = 1, …, T

4. increase n_{d t_n}, n_{w t_n}, n_{t_n} by 1

Constants: J: vocabulary size; β̄ = β × J
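Steps 2-4 above can be sketched as follows (a hedged sketch, not the authors' code; count matrices are assumed to be NumPy arrays indexed `[d, t]` and `[w, t]`, and the helper name `gibbs_step` is an assumption):

```python
import numpy as np

def gibbs_step(d, w, t_old, n_dt, n_wt, n_t, alpha, beta, rng):
    """Resample the topic of one token (document d, word w)."""
    T = n_t.shape[0]
    beta_bar = beta * n_wt.shape[0]                  # β̄ = β × J (vocabulary size)
    # step 2: remove the old assignment from all three count tables
    n_dt[d, t_old] -= 1; n_wt[w, t_old] -= 1; n_t[t_old] -= 1
    # step 3: Pr(z = t) ∝ (n_dt + α)(n_wt + β)/(n_t + β̄)
    p = (n_dt[d] + alpha) * (n_wt[w] + beta) / (n_t + beta_bar)
    t_new = int(rng.choice(T, p=p / p.sum()))
    # step 4: add the new assignment back under the resampled topic
    n_dt[d, t_new] += 1; n_wt[w, t_new] += 1; n_t[t_new] += 1
    return t_new
```

Note the decrement-resample-increment structure preserves the count invariants (∑_w n_wt = ∑_d n_dt = n_t) across every step.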


Access Pattern for Gibbs Sampling

[Figure: Docs × Words occurrence matrix; resampling a token z_ij touches the document row counts n_dt, the word column counts n_wt, and the global topic counts n_t.]


Multinomial Sampling Techniques for p ∈ R^T_{>0}

                 Init Time   Init Space   Generation Time   Parameter Update Time
LSearch          Θ(T)        Θ(1)         Θ(T)              Θ(1)
BSearch          Θ(T)        Θ(1)         Θ(log T)          Θ(T)
Alias Method     Θ(T)        Θ(T)         Θ(1)              Θ(T)
F+tree Sampling  Θ(T)        Θ(1)         Θ(log T)          Θ(log T)

LSearch

maintain c_T = p^T 1 (the total mass)
linear search
Θ(1) update

BSearch

maintain c = cumsum(p)
binary search
no support for updates

Alias Method

alias table
construction has some overhead
no support for updates

F+tree

a variant of the Fenwick tree
construction has low overhead
logarithmic time for sampling and updates
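For instance, BSearch reduces to a cumulative-sum array plus a binary search (illustrative sketch; the function name `bsearch_draw` is an assumption):

```python
import bisect
import numpy as np

def bsearch_draw(p, u):
    """BSearch generation: the Θ(T) cumsum is the initialization, each draw is
    a Θ(log T) binary search, and any change to p forces a full rebuild of c,
    which is why BSearch offers no cheap per-entry update."""
    c = np.cumsum(p)
    return bisect.bisect_right(c, u * c[-1])   # u is uniform in [0, 1)
```

On p = [0.3, 1.5, 0.4, 0.3], a draw with u = 0.5 lands at 1.25 inside the cumulative array [0.3, 1.8, 2.2, 2.5] and returns index 1.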


F+Tree: Construction

Construction in Θ(T ) time

p = [0.3, 1.5, 0.4, 0.3]^T

[Figure: binary tree over the four leaves p_1 = 0.3, p_2 = 1.5, p_3 = 0.4, p_4 = 0.3; internal nodes store subtree sums: 1.8 = 0.3 + 1.5, 0.7 = 0.4 + 0.3, and the root 2.5 = 1.8 + 0.7.]


F+Tree: Sampling

Multinomial sampling in Θ(logT ) time

Initial u: a number drawn uniformly from [0, F[1])

[Figure: descent through the same tree with u = 2.1: at the root, u ≥ 1.8, so go right and set u ← 2.1 − 1.8 = 0.3; at the node 0.7, u = 0.3 < 0.4, so go left and output p_3.]


F+Tree: Update

Update in Θ(logT ) time

p_3 ← p_3 + δ

[Figure: with δ = 1.0, the leaf becomes 0.4 + δ = 1.4 and only the root-to-leaf path is touched: 0.7 + δ = 1.7 and 2.5 + δ = 3.5.]
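The three F+tree operations can be sketched with an array-backed binary tree (a sketch in the spirit of the slides, not the authors' code; leaves sit at indices T..2T−1 and internal node i stores the sum of its children 2i and 2i+1, matching the 2.5 / 1.8 / 0.7 example above):

```python
class FPlusTree:
    """Array-backed F+tree over T weights: Θ(T) build, Θ(log T) draw/update."""

    def __init__(self, p):
        self.T = len(p)
        self.tree = [0.0] * self.T + [float(x) for x in p]  # leaves at T..2T-1
        for i in range(self.T - 1, 0, -1):                   # fill sums bottom-up
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    @property
    def total(self):                                         # F[1], the root sum
        return self.tree[1]

    def sample(self, u):
        """Descend from the root with u drawn uniformly from [0, total)."""
        i = 1
        while i < self.T:
            if u < self.tree[2 * i]:        # go left
                i = 2 * i
            else:                           # go right, shift u past the left mass
                u -= self.tree[2 * i]
                i = 2 * i + 1
        return i - self.T                   # 0-based leaf index

    def update(self, t, value):
        """Set p_t = value and repair sums along the leaf-to-root path."""
        i = t + self.T
        delta = value - self.tree[i]
        while i >= 1:
            self.tree[i] += delta
            i //= 2
```

On the slides' example p = [0.3, 1.5, 0.4, 0.3], sample(2.1) walks right (u becomes 0.3), then left past 0.4, returning index 2, i.e. p_3.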


F+LDA = LDA with F+tree Sampling

Decomposition of p

p_t = (n_dt + α)(n_wt + β) / (n_t + β̄) = β·q_t + n_wt·q_t, where q_t = (n_dt + α)/(n_t + β̄) and r_t = n_wt·q_t, ∀t = 1, …, T.   (1)

p = β q + r: two-level sampling for p

q is dense
only 2 entries (q_{t_o}, q_{t_n}) change per Gibbs step within the same document
use an F+tree for q

r is sparse
nonzero entries: T_w := {t : n_wt ≠ 0}
the entire r changes at each Gibbs step
use BSearch for r

Can also work on word-by-word update sequence


F+LDA: Alternative Decomposition

Word-by-word Gibbs sampling sequence

Decomposition of p

p_t = (n_dt + α)(n_wt + β) / (n_t + β̄) = α·q_t + n_dt·q_t, where q_t = (n_wt + β)/(n_t + β̄) and r_t = n_dt·q_t, ∀t = 1, …, T.   (2)

p = α q + r

q: changes only slightly under this sequence ⇒ use an F+tree

r: |T_d| nonzeros, with T_d := {t : n_dt ≠ 0} ⇒ use BSearch
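The resulting two-level draw can be sketched as follows (illustrative; the dense part q is recomputed per call for clarity, whereas F+LDA would keep it in an F+tree, and the function name `draw_two_level` is an assumption):

```python
import numpy as np

def draw_two_level(n_dt_d, n_wt_w, n_t, alpha, beta, beta_bar, u):
    """Draw a topic from p = α·q + r under the word-by-word decomposition (2).
    n_dt_d: topic counts of the current document; n_wt_w: counts of word w."""
    q = (n_wt_w + beta) / (n_t + beta_bar)       # dense part, total mass α·Σq
    Q = alpha * q.sum()
    nz = np.nonzero(n_dt_d)[0]                   # T_d = {t : n_dt > 0}
    r = n_dt_d[nz] * q[nz]                       # sparse part
    x = u * (Q + r.sum())                        # u uniform in [0, 1)
    if x < Q:                                    # level 1: landed in the α·q mass
        return int(np.searchsorted(np.cumsum(alpha * q), x, side='right'))
    return int(nz[np.searchsorted(np.cumsum(r), x - Q, side='right')])
```

The two branches together reproduce p exactly, since α·q_t + r_t = (n_dt + α)(n_wt + β)/(n_t + β̄) term by term.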


Comparison to Other LDA Sampling

F+LDA (word-by-word): exact; decomposition α q + r with q_t = (n_wt + β)/(n_t + β̄), r_t = n_dt·q_t; structures F+tree / BSearch; fresh samples Yes / Yes; initialization Θ(log T) / Θ(|T_d|); sampling Θ(log T) / Θ(log |T_d|)

F+LDA (doc-by-doc): exact; decomposition β q + r with q_t = (n_dt + α)/(n_t + β̄), r_t = n_wt·q_t; structures F+tree / BSearch; fresh samples Yes / Yes; initialization Θ(log T) / Θ(|T_w|); sampling Θ(log T) / Θ(log |T_w|)

Sparse-LDA (doc-by-doc): exact; decomposition αβ/(n_t + β̄) + β n_dt/(n_t + β̄) + n_wt (n_dt + α)/(n_t + β̄); structures LSearch / LSearch / LSearch; fresh samples Yes / Yes / Yes; initialization Θ(1) / Θ(1) / Θ(|T_w|); sampling Θ(T) / Θ(|T_d|) / Θ(|T_w|)

Alias-LDA (doc-by-doc): not exact; decomposition α q + r with q_t = (n_wt + β)/(n_t + β̄), r_t = n_dt·q_t; structures Alias / Alias; fresh samples No / Yes; initialization Θ(1) / Θ(|T_d|); sampling Θ(#MH) / Θ(#MH)

F+LDA: word-by-word is faster than doc-by-doc for large I

|T_d| is bounded by n_i, but |T_w| approaches T
per-Gibbs-step cost: ρ_F log T + ρ_B |T_d|

SparseLDA:

per-Gibbs-step cost: Θ(T + |T_d| + |T_w|)
the Θ(T) term rarely triggers, but |T_w| → T for large I

AliasLDA:

per-Gibbs-step cost: ρ_A |T_d| + #MH
ρ_A ≈ 3 × ρ_B: construction overhead of the alias table
if (ρ_A − ρ_B) |T_d| > ρ_F log T ⇒ AliasLDA is slower than F+LDA
say |T_d| ≈ 100: F+LDA is still faster for T < 250


Comparison of various sampling methods

Single machine, single thread

y-axis: speedup over normal O(T ) multinomial sampling

Enron: 38K docs with 6M tokens

NyTimes: 0.3M docs with 100M tokens


Access Pattern for Gibbs Sampling

[Figure: Docs × Words occurrence matrix; resampling a token z_ij touches the document row counts n_dt, the word column counts n_wt, and the global topic counts n_t.]


Access Graph for Gibbs Sampling

G = (V, E): a hypergraph

V = {d_i} ∪ {w_j} ∪ {s}, E = {e_ij = (d_i, w_j, s)}

Connection to Gibbs sampling:

(d_i)_t := n_{d_i t}, (w_j)_t := n_{w_j t}, (s)_t := n_t
each e_ij is a Gibbs step for word w_j in d_i
it accesses (d_i, w_j, s)

Parallelism: more challenging

all edges are incident to s
all (s)_t are large in general ⇒ a slightly stale s is fine for accuracy
duplicate s for parallelism

[Figure: hypergraph with word nodes w_j, document nodes d_i, and a single summation node s.]


Nomadic Tokens for wj

Nomadic tokens for {w_j : j = 1, …, J}:

J tokens

(j, w_j): O(T) space

Worker:

p workers

a computing unit + a concurrent token queue

a subset of {d_i}: O(IT/p)

“x”: an occurrence of a word

bigger rectangle: a subset of the corpus

smaller rectangle: a unit subtask

[Figure: the corpus is split among workers; each “x” marks a word occurrence, and word tokens circulate among the workers.]


Nomadic Token for s: Circular Delta Update

A single global s travels among the machines as a messenger and broadcasts the local delta updates.

Every machine p keeps (s_p, s̄):

s_p: local working copy
s̄: snapshot version of the global s

When the messenger s reaches machine 3 (say):

s ← s + (s_3 − s̄)
s̄ ← s
s_3 ← s

[Figure: ring of machines holding (s_1, s̄), (s_2, s̄), (s_3, s̄), (s_4, s̄), with s circulating among them.]
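The visit rule can be sketched as follows (a toy sketch: a ring of four machines whose local working copies have each absorbed some private increments, with the invented helper `circulate` standing in for a messenger arrival):

```python
import numpy as np

def circulate(s_p, s_bar, s):
    """Messenger s arrives at a machine holding (s_p, s_bar):
    fold the local delta into s, then resync s_bar and s_p to the new s."""
    s = s + (s_p - s_bar)          # broadcast the local delta update
    return s.copy(), s.copy(), s   # new s_p, new s_bar; messenger moves on

# a ring of 4 machines; each working copy has absorbed private increments
T = 5
rng = np.random.default_rng(1)
deltas = rng.integers(0, 3, size=(4, T))
s_p = [d.copy() for d in deltas]                          # s_p = s̄ + local delta
s_bar = [np.zeros(T, dtype=deltas.dtype) for _ in range(4)]
s = np.zeros(T, dtype=deltas.dtype)
for _ in range(2):                                        # two trips around the ring
    for m in range(4):
        s_p[m], s_bar[m], s = circulate(s_p[m], s_bar[m], s)
# after the second trip, every machine agrees with the true global counts
```

One full circle collects every machine's delta into the travelling s; a second circle (with no new local updates) propagates the completed total back to every machine.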


Comparison on a single multi-core machine

On a machine with a 20-core processor

Comparison: F+NOMAD LDA, Yahoo! LDA

PubMed: 9M docs with 700M tokens

Amazon: 30M docs with 1.5B tokens


Comparison on a Multi-machine System

32 machines, each with a 20-core processor.

Comparison: F+NOMAD LDA, Yahoo! LDA

Amazon: 30M docs with 1.5B tokens

UMBC: 40M docs with 1.5B tokens


Conclusions

NOMAD framework uses nomadic tokens to provide

Asynchronous computation

Non-blocking communication

Lock-free implementation

Serializable or nearly serializable

Recommender System: Matrix factorization

scalable parallel stochastic gradient

serializability guarantee

Topic Modeling: Latent Dirichlet Allocation

Logarithmic F+tree sampling

Efficient Gibbs sampling

Duplicated nomadic tokens for the common node

Outperforms Yahoo! LDA


