Asynchronous Parallel Computing in Signal Processing and Machine Learning

Transcript

Page 1

Asynchronous Parallel Computing in Signal Processing and Machine Learning

Wotao Yin (UCLA Math)

joint with Zhimin Peng (UCLA), Yangyang Xu (IMA), Ming Yan (MSU)

Optimization and Parsimonious Modeling – IMA, Jan 25, 2016

Page 2

Do we need parallel computing?

Page 3

Back in 1993

Page 4

2006

Page 5

2015

Page 6

35 Years of CPU Trend

[Figure: CPU trend data, 1995–2015 — number of CPUs, performance per core, and cores per CPU]

D. Henty. Emerging Architectures and Programming Models for Parallel Computing, 2012.

• In May 2004, Intel cancelled its Tejas project (single-core) and announced a new multi-core project.

Page 7

Today: 4x AMD 16-core 3.5GHz CPUs (64 cores total)

Page 8

Today: Tesla K80 GPU (2496 cores)

Page 9

Today: Octa-Core Handsets

Page 10

The free lunch is over

• before 2005: a single-threaded algorithm automatically got faster with each new CPU

• now: new algorithms must be developed for higher speed by

• exploiting problem structure

• taking advantage of dataset properties

• using all the available cores

Page 11

How to use all the cores available?

Page 12

Parallel computing

[Figure: N agents working in parallel on one problem, finishing at times t_1, t_2, . . . , t_N]

Page 13

Parallel speedup

• definition: speedup = (serial time) / (parallel time), where time is measured in the wall-clock sense

• Amdahl's Law: N agents, no overhead, ρ = fraction of the computation that runs in parallel

ideal speedup = 1 / (ρ/N + (1 − ρ))

[Figure: ideal speedup vs. number of processors (10^0 to 10^8) for ρ = 25%, 50%, 90%, 95%]

Page 14

Parallel speedup

• ε := parallel overhead (startup, synchronization, collection)

• in the real world:

actual speedup = 1 / (ρ/N + (1 − ρ) + ε)

[Figure: actual speedup vs. number of processors (10^0 to 10^8) for ρ = 25%, 50%, 90%, 95%; left: when ε = N, right: when ε = log(N)]
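A quick worked example (numbers chosen for illustration, not from the slides): with ρ = 95% of the work parallelizable and N = 100 agents,

ideal speedup = 1 / (0.95/100 + 0.05) ≈ 16.8,

and even with unlimited agents the ideal speedup cannot exceed 1/(1 − ρ) = 20; the overhead ε only pushes these numbers down further.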

Page 15

Sync-parallel versus async-parallel

[Figure: timelines of Agents 1–3. Synchronous (wait for the slowest): each agent idles until every agent finishes. Asynchronous (non-stop, no wait): agents proceed without idling.]

Page 16

Async-parallel coordinate updates

Page 17

Fixed point iteration and its parallel version

• H = H_1 × · · · × H_m

• original iteration: x^{k+1} = Tx^k =: (I − ηS)x^k

• all agents do in parallel:

agent 1: x_1^{k+1} ← T_1(x^k) = x_1^k − η S_1(x^k)
agent 2: x_2^{k+1} ← T_2(x^k) = x_2^k − η S_2(x^k)
...
agent m: x_m^{k+1} ← T_m(x^k) = x_m^k − η S_m(x^k)

• assumptions:

1. coordinate friendliness: cost of S_i x ∼ (1/m) × cost of Sx

2. synchronization after each iteration

Page 18

Comparison

Synchronous: new iteration = all agents finish

Asynchronous: new iteration = any agent finishes

[Figure: timelines of Agents 1–3 over t_0, t_1, . . . , t_10 under the two rules]

Page 19

ARock1: Async-parallel coordinate update

• H = H_1 × · · · × H_m

• p agents, possibly p ≠ m

• each agent randomly picks i ∈ {1, . . . , m} and updates just x_i:

x_1^{k+1} ← x_1^k
...
x_i^{k+1} ← x_i^k − η_k S_i x^{k−d_k}
...
x_m^{k+1} ← x_m^k

• 0 ≤ d_k ≤ τ, the maximum delay

¹ Peng-Xu-Yan-Yin '15

Page 20

Two ways to model x^{k−d_k}

definitions: let x^0, . . . , x^k, . . . be the states of x in memory

1. x^{k−d_k} is consistent if d_k is a scalar

2. x^{k−d_k} is possibly inconsistent if d_k is a vector, i.e., different components are delayed by different amounts

ARock allows both consistent and inconsistent reads.

Page 21

Memory lock illustration

Agent 1 reads [0, 0, 0, 0]^T = x^0: a consistent read.

Agent 1 reads [0, 0, 0, 2]^T ∉ {x^0, x^1, x^2}: an inconsistent read. For example, if another agent overwrites the fourth coordinate after agent 1 has already copied the first three, agent 1 ends up holding a vector that was never an actual state of x in memory.

Page 22

History and recent literature

Page 23

Brief history of async-parallel algorithms (mostly worst-case analysis)

• 1969 – a linear equation solver by Chazan and Miranker;

• 1978 – extended to fixed-point problems by Baudet under the absolute-contraction² type of assumption

• for the following 20–30 years, applied mainly to linear, nonlinear, and differential equations by many authors

• 1989 – Parallel and Distributed Computation: Numerical Methods by Bertsekas and Tsitsiklis; 2000 – review by Frommer and Szyld

• 1991 – gradient-projection iteration assuming a local linear error bound, by Tseng

• 2001 – domain decomposition assuming strong convexity, by Tai & Tseng

² An operator T : R^n → R^n is absolute-contractive if |T(x) − T(y)| ≤ P|x − y| component-wise, where |x| denotes the vector with components |x_i|, i = 1, . . . , n, P ∈ R^{n×n}_+, and ρ(P) < 1.

Page 24

Absolute-contraction

• Absolute-contractive operator T : R^n → R^n:
|T(x) − T(y)| ≤ P|x − y| component-wise, where |x| denotes the vector with components |x_i|, i = 1, . . . , n, P ∈ R^{n×n}_+, and ρ(P) < 1.

• Interpretation: the iterates of x^{k+1} = Tx^k stay within a series of nested rectangular boxes

• Applications:

• diagonally dominant A for Ax = b

• diagonally dominant ∇²f for min_x f(x) (strong convexity alone is not enough)

• some network flow problems

Page 25

Recent work (stochastic analysis)

• AsySCD for convex smooth and composite minimization by Liu et al.'14 and Liu-Wright'14; async dual CD (regression problems) by Hsieh et al.'15

• async randomized (splitting/distributed/incremental) methods: Wei-Ozdaglar'13, Iutzeler et al.'13, Zhang-Kwok'14, Hong'14, Chang et al.'15

• async SGD: Hogwild!, Lian et al.'15, etc.

• async operator sampling and CD: SMART, Davis'15

Page 26

Random coordinate selection

• select x_i to update with probability p_i, where min_i p_i > 0

• drawbacks:

• agents cannot cache data

• either global memory or communication is required

• pseudo-random number generation takes time

• benefits:

• often faster than a fixed cyclic order

• automatic load balancing

• simplifies certain analysis

Page 27

Convergence summary

Page 28

Convergence guarantees

m is the number of coordinates, τ is the maximum delay, uniform selection p_i ≡ 1/m.

Theorem (almost sure convergence)
Assume that T is nonexpansive and has a fixed point. Use step sizes η_k ∈ [ε, 1/(2τ/√m + 1)) for all k. Then, with probability one, x^k ⇀ x* ∈ Fix T.

In addition, convergence rates can be derived.

Consequence: the step size is O(1) if τ ∼ √m. Assuming equal agents and update costs, linear speedup is attained with p = O(√m) agents; p can be larger if T is sparse.
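To see the scale of asynchrony this permits (an illustrative computation, not from the slides): with m = 10^6 coordinates and a maximum delay as large as τ = √m = 1000,

η_k < 1 / (2 · 1000/√(10^6) + 1) = 1/3,

so the admissible step size remains O(1) even when updates use thousand-iteration-old information.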

Page 29

Sketch of proof

• typical inequality:

‖x^{k+1} − x*‖² ≤ ‖x^k − x*‖² − c‖Tx^k − x^k‖² + harmful terms(x^{k−1}, . . . , x^{k−τ})

• descent inequality under a new metric:

E( ‖x^{k+1} − x*‖²_M | X^k ) ≤ ‖x^k − x*‖²_M − c‖Tx^k − x^k‖²

where

• X^k is the history up to iteration k

• x^k = (x^k, x^{k−1}, . . . , x^{k−τ}) ∈ H^{τ+1}, k ≥ 0

• x* = (x*, x*, . . . , x*) ∈ X* ⊆ H^{τ+1}

• M is a positive definite matrix

• c = c(η_k, m, τ)

Page 30

• apply the Robbins-Siegmund theorem:

E(α_{k+1} | F_k) + v_k ≤ (1 + ξ_k)α_k + η_k

where all quantities are nonnegative, α_k is random, and ξ_k, η_k are summable; then α_k converges a.s.

• prove that weak cluster points are fixed points;

• assume H is separable and apply the results of [Combettes, Pesquet 2014].

Page 31

Applications and numerical results

Page 32

Linear equations (asynchronous Jacobi)

• require: an invertible square matrix A with nonzero diagonal entries

• let D be the diagonal part of A; then

Ax = b ⇐⇒ Tx = x, where Tx := (I − D⁻¹A)x + D⁻¹b

• T is nonexpansive if ‖I − D⁻¹A‖₂ ≤ 1, e.g., when A is diagonally dominant

• x^{k+1} = Tx^k recovers the Jacobi algorithm

Page 33

Algorithm 1: ARock for linear equations

Input: shared variable x ∈ R^n, K > 0;
set the global iteration counter k = 0;
while k < K, every agent asynchronously and continuously does:
    sample i ∈ {1, . . . , m} uniformly at random;
    add −(η_k / a_ii)(Σ_j a_ij x_j^k − b_i) to the shared variable x_i;
    update the global counter k ← k + 1;

Page 34

Sample code

loadData(A, data_file_name);
loadData(b, label_file_name);
#pragma omp parallel num_threads(p) shared(A, b, x, para)
{   // A, b, x, and para are passed by reference
    Jacobi(A, b, x, para);   // or: ARock(A, b, x, para);
}

• p: the number of threads

• A, b, x: shared variables

• para: other parameters

Page 35

Jacobi worker function

for (int itr = 0; itr < max_itr; itr++)
{
    // compute the update for the assigned x[i]
    // ...
    #pragma omp barrier
    {
        // write x[i] to global memory
    }
    #pragma omp barrier
}

Jacobi needs the barrier directive for synchronization

Page 36

ARock worker function

for (int itr = 0; itr < max_itr; itr++)
{
    // pick i at random
    // compute the update for x[i]
    // ...
    // write x[i] to global memory
}

ARock has no synchronization barrier directive
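To make the contrast concrete, here is a minimal self-contained sketch of the asynchronous update loop applied to a tiny linear system. It is illustrative only: the 4×4 matrix, step size, thread count, and iteration budget are invented for this example, and this is not the authors' released ARock code.

#include <cstdio>
#include <random>
#include <omp.h>

int main() {
    const int n = 4, max_itr = 200000;
    const double eta = 0.9;  // step size (illustrative)
    // small diagonally dominant system; the exact solution is x = (1, 1, 1, 1)
    const double A[4][4] = {{10, 1, 1, 1}, {1, 10, 1, 1}, {1, 1, 10, 1}, {1, 1, 1, 10}};
    const double b[4]    = {13, 13, 13, 13};
    double x[4] = {0.0, 0.0, 0.0, 0.0};  // shared variable, no lock

    #pragma omp parallel num_threads(4)
    {
        // per-agent random coordinate selection
        std::mt19937 gen(omp_get_thread_num());
        std::uniform_int_distribution<int> pick(0, n - 1);
        for (int itr = 0; itr < max_itr; ++itr) {  // no barrier anywhere in this loop
            const int i = pick(gen);
            double r = -b[i];
            for (int j = 0; j < n; ++j)
                r += A[i][j] * x[j];  // reads may be stale or inconsistent; ARock allows this
            #pragma omp atomic
            x[i] -= eta / A[i][i] * r;  // lock-free write of one coordinate
        }
    }
    for (int i = 0; i < n; ++i) printf("x[%d] = %.6f\n", i, x[i]);
    return 0;
}

Compile with g++ -fopenmp. Strictly speaking, the unguarded reads of x[j] are a data race under the C++ memory model; tolerating such inconsistent reads is precisely the regime the ARock analysis covers.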

Page 37

Minimizing smooth functions

• require: a convex function f with Lipschitz-continuous gradient

• if ∇f is L-Lipschitz, then

minimize_x f(x) ⇐⇒ x = Tx, where T := I − (2/L)∇f is nonexpansive

• ARock will be efficient when ∇_{x_i} f(x) is easy to compute
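A standard instance (illustrative, not from the slides): for least squares f(x) = (1/2)‖Ax − b‖², one coordinate of the gradient is

∇_{x_i} f(x) = a_i^T (Ax − b),

where a_i is the i-th column of A. If the agents maintain the shared product Ax as they update coordinates, computing ∇_{x_i} f costs one (sparse) column pass rather than a full gradient evaluation — the coordinate friendliness assumed earlier.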

Page 38

Minimizing composite functions

• require: convex smooth g(·) and convex (possibly nonsmooth) f(·)

• proximal map: prox_{γf}(y) = argmin_x f(x) + (1/2γ)‖x − y‖²

minimize_x f(x) + g(x) ⇐⇒ x = Tx, where T := prox_{γf} ∘ (I − γ∇g)

• ARock will be fast if

• ∇_{x_i} g(x) is easy to compute

• f(·) is separable (e.g., ℓ1 and ℓ1,2) — see the example below

Page 39

Example: sparse logistic regression

• n features, N labeled samples

• each sample ai ∈ Rn has its label bi ∈ {1,−1}

• ℓ1-regularized logistic regression:

minimize_{x ∈ R^n}  λ‖x‖₁ + (1/N) Σ_{i=1}^N log(1 + exp(−b_i · a_i^T x))    (1)

• compare sync-parallel and ARock (async-parallel) on two datasets:

Name     N (# samples)   n (# features)   # nonzeros in {a_1, . . . , a_N}
rcv1     20,242          47,236           1,498,952
news20   19,996          1,355,191        9,097,916
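Why this problem is coordinate friendly (a standard computation, added for clarity): the smooth part g(x) = (1/N) Σ_i log(1 + exp(−b_i a_i^T x)) has coordinate gradient

∇_{x_j} g(x) = (1/N) Σ_{i=1}^N −b_i a_{ij} / (1 + exp(b_i a_i^T x)),

so if the agents maintain the inner products a_i^T x, updating one coordinate x_j only touches the samples whose feature j is nonzero — and both datasets above are very sparse.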

Page 40

Speedup tests

• implemented in C++ and OpenMP

• 32-core shared-memory machine

#cores |   rcv1 Time (s)   |  rcv1 Speedup  |  news20 Time (s)  | news20 Speedup
       |  async  |  sync   |  async |  sync |  async  |   sync  |  async |  sync
     1 |  122.0  |  122.0  |   1.0  |  1.0  |  591.1  |  591.3  |   1.0  |  1.0
     2 |   63.4  |  104.1  |   1.9  |  1.2  |  304.2  |  590.1  |   1.9  |  1.0
     4 |   32.7  |   83.7  |   3.7  |  1.5  |  150.4  |  557.0  |   3.9  |  1.1
     8 |   16.8  |   63.4  |   7.3  |  1.9  |   78.3  |  525.1  |   7.5  |  1.1
    16 |    9.1  |   45.4  |  13.5  |  2.7  |   41.6  |  493.2  |  14.2  |  1.2
    32 |    4.9  |   30.3  |  24.6  |  4.0  |   22.6  |  455.2  |  26.1  |  1.3

Page 41

More applications

Page 42

Minimizing composite functions

• require: both f and g are convex (possibly nonsmooth) functions

minimize f(x) + g(x) ⇐⇒ z = T_PRS(z), where T_PRS := refl_{γf} ∘ refl_{γg};

recover x = prox_{γg}(z)

• T_PRS is known as the Peaceman-Rachford splitting operator³

• the Douglas-Rachford splitting operator is (1/2)I + (1/2)T_PRS

• ARock runs fast when

• refl_{γf} is separable

• (refl_{γg})_i is easy to compute/maintain

³ reflective proximal map: refl_{γf} := 2 prox_{γf} − I. The maps refl_{γf}, refl_{γg}, and thus refl_{γf} ∘ refl_{γg}, are nonexpansive.

Page 43

Parallel/distributed ADMM

• require: m convex functions f_i

• consensus problem:

minimize_x  Σ_{i=1}^m f_i(x) + g(x)

⇐⇒  minimize_{x_i, y}  Σ_{i=1}^m f_i(x_i) + g(y)  subject to x_i − y = 0, i = 1, . . . , m

(in matrix form: a block-diagonal identity applied to (x_1, x_2, . . . , x_m) minus a stacked identity applied to y equals 0)

• Douglas-Rachford-ARock applied to the dual problem ⇒ async-parallel ADMM:

• the m subproblems are solved in an async-parallel fashion

• y and z_i (dual variables) are updated in global memory (no lock)

Page 44

Decentralized computing

• n agents in a connected network G = (V,E) with bi-directional links E

• each agent i has a private function fi

• problem: find a consensus solution x∗ to

minimize_{x ∈ R^p}  f(x) := Σ_{i=1}^n f_i(A_i x_i)  subject to x_i = x_j, ∀ i, j.

• challenges: no center, only between-neighbor communication

• benefits: fault tolerance, no long-dist communication, privacy

Page 45

Async-parallel decentralized ADMM

• a graph of connected agents: G = (V,E).

• decentralized consensus optimization problem:

minimize_{x_i ∈ R^d, i ∈ V}  f(x) := Σ_{i ∈ V} f_i(x_i)  subject to x_i = x_j, ∀ (i, j) ∈ E

• ADMM reformulation: constraints x_i = y_{ij}, x_j = y_{ij}, ∀ (i, j) ∈ E

• apply

• ARock version 1: nodes asynchronously activate

• ARock version 2: edges asynchronously activate

no global clock, no central controller; each agent keeps f_i private and talks only to its neighbors

Page 46

notation:

• N(i): the neighbors of agent i, N(i) = L(i) ∪ R(i)

• L(i): neighbors j of agent i with j < i

• R(i): neighbors j of agent i with j > i

Algorithm 2: ARock for the decentralized consensus problem

Input: each agent i sets x_i^0 ∈ R^d, dual variables z_{e,i}^0 for e ∈ E(i), K > 0.
while k < K, any activated agent i does:
    receive z_{li,l}^k from neighbors l ∈ L(i) and z_{ir,r}^k from neighbors r ∈ R(i);
    update the local x_i^k, z_{li,i}^{k+1}, and z_{ir,i}^{k+1} according to (2a)–(2c), respectively;
    send z_{li,i}^{k+1} to neighbors l ∈ L(i) and z_{ir,i}^{k+1} to neighbors r ∈ R(i).

x_i^k ∈ argmin_{x_i}  f_i(x_i) + (Σ_{l ∈ L(i)} z_{li,l}^k + Σ_{r ∈ R(i)} z_{ir,r}^k)^T x_i + (γ/2)|E(i)| ‖x_i‖²,    (2a)

z_{ir,i}^{k+1} = z_{ir,i}^k − η_k ((z_{ir,i}^k + z_{ir,r}^k)/2 + γ x_i^k),  ∀ r ∈ R(i),    (2b)

z_{li,i}^{k+1} = z_{li,i}^k − η_k ((z_{li,i}^k + z_{li,l}^k)/2 + γ x_i^k),  ∀ l ∈ L(i).    (2c)

Page 47

Summary

Page 48

Summary of async-parallel coordinate descent

benefits:

• eliminate idle time

• reduce communication / memory-access congestion

• random job selection: load balance

mathematics: analysis of disorderly (partial) updates

• asynchronous delay

• inconsistent read and write

Page 49

Thank you!

Acknowledgements: NSF DMS
Reference: Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin. UCLA CAM Report 15-37.
Website: http://www.math.ucla.edu/~wotaoyin/arock
