Johann M. Kraus and Hans A. Kestler
AG Bioinformatics and Systems Biology, Institute of Neural Information Processing
University of Ulm
Multi-core Parallelization in Clojure - a Case Study
29.06.2009
Outline
1. Concepts of parallel programming
2. Short introduction to Clojure
3. Multi-core parallel K-means - the case study
4. Analysis and Results
5. Summary
Parallel Programming
Definition: Parallel programming is a form of programming in which many calculations are performed simultaneously.
• Physical constraints prevent further frequency scaling of processors
• This has led to an increasing interest in parallel hardware and parallel programming
• Multi-core hardware is standard on desktop computers
• Parallel software can use this hardware to full capacity
• Large problems are divided into smaller ones, and the sub-problems are solved simultaneously
• Speedup S is limited by the fraction of parallelizable code P
• Amdahl's law: S = 1 / ((1 - P) + P/N), where N is the number of processors
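Amdahl's law can be checked numerically; a minimal Java sketch (class and method names are illustrative, not part of the talk):

```java
public class Amdahl {
    // Speedup predicted by Amdahl's law for parallelizable fraction p
    // and n processors: S = 1 / ((1 - p) + p / n)
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // Even with unlimited cores, p = 0.95 caps the speedup at 1/0.05 = 20
        System.out.printf("p=0.95, n=8:     %.2f%n", speedup(0.95, 8));
        System.out.printf("p=0.95, n=65536: %.2f%n", speedup(0.95, 65536));
    }
}
```

Note how the serial fraction dominates: going from 8 to 65536 cores only raises the speedup from about 5.9 to just under 20.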
Amdahl's law
Figure: speedup vs. number of processors (1 to 65536) for parallelizable fractions 0.95, 0.90, 0.75, and 0.50; even at P = 0.95 the speedup saturates at 20.
Concepts of Parallel Programming
Explicit vs. implicit parallelization
• Explicitly define communication and synchronization details for each task:
• MPI
• Java Threads
• Functional programming allows implicit parallelization:
• Parallel processing of functions
• Functions are free of side-effects
• Data is immutable
Distributed vs. local hardware
Figure: a master-slave setup (the master sends data to slaves 0-4 and receives their results) vs. a shared-memory setup (CPUs 0-4 read and write a common memory).
• Master - Slave parallelization (e.g. Message Passing Interface)
• Shared memory parallelization (e.g. Open Multi-Processing)
Thread programming
Figure: thread life cycle: new (start) → runnable (schedule) → running (end) → terminated; running moves to waiting on block and back to runnable on awake.
• Threads are refinements of a process that share the same memory and can be processed separately and simultaneously
• Available in many languages, e.g. PThreads (C), Java Threads (Java), OpenMP Threads (C, Fortran)
• Execution of threads is handled by a scheduler that manages the available processing time
• Communication between threads is faster than communication between processes
• Invoking threads is also faster than fork/join processes
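The life cycle above can be walked through with plain Java threads; a minimal sketch (names are illustrative):

```java
public class ThreadDemo {
    // Some CPU-bound work for a thread to run
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = new Runnable() {
            public void run() { // "running" state
                System.out.println(Thread.currentThread().getName()
                                   + ": " + work(1000000));
            }
        };
        Thread t1 = new Thread(task, "worker-1"); // "new" state
        Thread t2 = new Thread(task, "worker-2");
        t1.start(); // "runnable": the scheduler decides when each thread runs
        t2.start();
        t1.join();  // main thread waits until both workers have terminated
        t2.join();
    }
}
```

Both workers share the main thread's heap; only the scheduler decides the interleaving of their output.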
Concurrency control via locking and synchronizing
• Concurrency control ensures that threads can access shared memory without violating data integrity
• The most popular approach to concurrency is locking and synchronizing
• Problems might occur when using too many locks, too few locks, wrong locks, or locks in the wrong order
• Using locks can be fatally error-prone, e.g. deadlocks
public class Counter {
    private int value = 0;
    public synchronized void incr() {
        value = value + 1;
    }
}

Counter counter = new Counter();
counter.incr();
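The synchronized counter can be exercised from several threads; a self-contained sketch (the wrapper class and the added get method are illustrative):

```java
public class CounterDemo {
    // Same idea as the Counter on the slide: synchronized guards the shared field
    public static class Counter {
        private int value = 0;
        public synchronized void incr() { value = value + 1; }
        public synchronized int get() { return value; }
    }

    public static void main(String[] args) throws InterruptedException {
        final Counter counter = new Counter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 10000; j++) counter.incr();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        // Without synchronized, lost updates would make this total nondeterministic
        System.out.println(counter.get()); // 40000
    }
}
```

Removing the synchronized keyword turns `value = value + 1` into an unprotected read-modify-write and the final total becomes unpredictable.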
• Transactional memory offers a flexible alternative to lock-based concurrency control
• Functionality is analogous to concurrency control for simultaneous access in database management systems
• Transactions ensure properties:
• Atomicity: Either all changes of a transaction occur or none do
• Consistency: Only valid changes are committed
• Isolation: No transaction sees the effect of other transactions
• Durability: Changes from transactions will be persistent
Concurrency control via transactional memory
Figure: sequence diagram over time: two transactions each get the shared data and send back modified data; a commit succeeds only while the transaction's view of the data is still consistent, otherwise the transaction re-reads and retries.
• Software transactional memory maps transactional memory to concurrency control in parallel programming
Clojure
• Functional programming language hosted on the JVM
• Extends the code-as-data paradigm to maps and vectors
• Based on immutable data structures
• Provides built-in concurrency support via software transactional memory
• Completely symbiotic with Java, e.g. easy access to Java libraries
• Platform independent
• Java interaction
• Add type hints to speed up code
(defn da+ [#^doubles as #^doubles bs]
  (amap as i ret
        (+ (aget as i) (aget bs i))))
• Dynamic typing and multi-methods
• An object is defined as the sum of what it can do (methods), rather than the sum of what it is (type hierarchy)
(import '(cern.jet.random.sampling RandomSamplingAssistant))

(defn sample [n k]
  (seq (. RandomSamplingAssistant
          (sampleArray k (int-array (range n))))))
Transactional references and STM
• Transactional references ensure safe, coordinated, synchronous changes to mutable storage locations
• Are bound to a single storage location for their lifetime
• Only allow mutation of that location to occur within transactions
• Available operations are ref-set, alter, and commute
• No explicit locking is required

(def counter (ref 0))
(dosync (alter counter inc))
Agents
• Agents allow independent asynchronous change of mutable locations
• Are bound to a single storage location for their lifetime
• Only allow mutation of that location to a new state as the result of an action
• Actions are functions that are asynchronously applied to the state of an Agent
• The return value of an action becomes the new state of the Agent
• Agents are integrated with the STM

(def counter (agent 0))
(send counter inc)
Cluster analysis
• Given a data set X, compute a partition of X into k disjoint clusters C_i such that:
(1) ⋃_{i=1}^{k} C_i = X
(2) C_i ≠ ∅ and C_i ∩ C_j = ∅ for i ≠ j
• How many clusters are in the data set?
Figure: example data set partitioned into 3 clusters vs. 9 clusters.
Cluster algorithms
• For all possible partitions, evaluate the objective function f and search for the optimum.
• The cardinality of the set of all possible partitions is given by the Stirling numbers of the second kind:
S(N, k) = (1/k!) Σ_{i=0}^{k} (−1)^{k−i} (k choose i) i^N
• Cluster algorithms provide a heuristic for this search:
• Partitional clustering (K-means, Neuralgas, SOM, Fuzzy C-means, ...)
• Hierarchical clustering (Divisive/agglomerative, Complete linkage, ...)
• Graph-based clustering (Spectral clustering, NMF, Affinity propagation, ...)
• Model-based clustering, Biclustering, Semi-supervised clustering
Figure: Stirling numbers of the second kind: runtime of an exhaustive search over all partitions (nanoseconds) grows explosively with the number of clusters (up to 35) and the number of data points (up to 30).
K-means algorithm
Function KMeans
  Input:  X = {x_1, ..., x_n}  (data to be clustered)
          k                    (number of clusters)
  Output: C = {c_1, ..., c_k}  (cluster centroids)
          m: X -> C            (cluster assignments)

  Initialize C (e.g. random selection from X)
  While C has changed
    For each x_i in X
      m(x_i) = argmin_j distance(x_i, c_j)
    End
    For each c_j in C
      c_j = centroid({x_i | m(x_i) = j})
    End
  End
End
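The pseudocode translates almost line for line into a sequential Java sketch (deterministic initialization with the first k points is an assumption for illustration; the talk's McKmeans is the actual parallel Clojure implementation):

```java
import java.util.Arrays;

public class KMeans {
    // data[n][d]: points, k: number of clusters. Returns the assignments m.
    static int[] cluster(double[][] data, int k, int maxIter) {
        int n = data.length, d = data[0].length;
        double[][] c = new double[k][];
        for (int j = 0; j < k; j++) c[j] = data[j].clone(); // init: first k points
        int[] m = new int[n];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // assignment step: m(x_i) = argmin_j distance(x_i, c_j)
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = dist(data[i], c[0]);
                for (int j = 1; j < k; j++) {
                    double dj = dist(data[i], c[j]);
                    if (dj < bestDist) { bestDist = dj; best = j; }
                }
                if (m[i] != best) { m[i] = best; changed = true; }
            }
            if (!changed) break; // "While C has changed"
            // update step: c_j = centroid({x_i | m(x_i) = j})
            double[][] sum = new double[k][d];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                count[m[i]]++;
                for (int t = 0; t < d; t++) sum[m[i]][t] += data[i][t];
            }
            for (int j = 0; j < k; j++)
                if (count[j] > 0)
                    for (int t = 0; t < d; t++) c[j][t] = sum[j][t] / count[j];
        }
        return m;
    }

    // Squared Euclidean distance suffices for the argmin
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int t = 0; t < a.length; t++) s += (a[t] - b[t]) * (a[t] - b[t]);
        return s;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        System.out.println(Arrays.toString(cluster(data, 2, 100))); // [0, 0, 1, 1]
    }
}
```

The assignment loop is where the parallelization pays off: each point's nearest-centroid search is independent of every other point's.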
Cluster Validation
• MCA-index: mean proportion of samples being consistent over different clusterings
MCA = (1/n) max_σ Σ_{i=1}^{k} |A_i ∩ B_σ(i)|, where σ ranges over the permutations of the k cluster labels
• Evaluation requires repeated runs of clustering, e.g.:
• Resampled data sets
• Different parameters
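For small k, the MCA index can be computed by brute force over all label permutations; a Java sketch (names are illustrative):

```java
public class MCA {
    // MCA index between two label vectors a and b over k clusters:
    // (1/n) * max over permutations sigma of sum_i |A_i ∩ B_sigma(i)|
    static double mca(int[] a, int[] b, int k) {
        int n = a.length;
        int[][] overlap = new int[k][k]; // overlap[i][j] = |A_i ∩ B_j|
        for (int t = 0; t < n; t++) overlap[a[t]][b[t]]++;
        return best(overlap, new int[k], new boolean[k], 0, k) / (double) n;
    }

    // Recursively try every permutation of the k labels, keeping the best match
    static int best(int[][] ov, int[] perm, boolean[] used, int i, int k) {
        if (i == k) {
            int s = 0;
            for (int j = 0; j < k; j++) s += ov[j][perm[j]];
            return s;
        }
        int max = 0;
        for (int j = 0; j < k; j++) {
            if (!used[j]) {
                used[j] = true; perm[i] = j;
                max = Math.max(max, best(ov, perm, used, i + 1, k));
                used[j] = false;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 1, 1, 1};
        int[] b = {1, 1, 0, 0, 1}; // same partition up to a label swap, one disagreement
        System.out.println(mca(a, b, 2)); // 0.8
    }
}
```

The permutation maximum makes the index invariant to cluster relabeling, which is why two K-means runs with swapped labels still score 1.0.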
Figure: mean MCA index (0-1) vs. number of clusters (0-50), four panels.
Estimation of the expected value of a validation index
• Random label: randomly assign each item to one of the k clusters
• Random partition: choose a random partition
• Random prototype: assign each item to its nearest prototype
• Mean value from 100 runs
Multi-core K-means with Clojure
• Split the data set into smaller pieces that are handled by agents
• Each cluster is represented by an agent
• Add a commutative list of cluster members within a transactional reference to accelerate the centroid update step
Figure: n data agents hold the partitioned data; each data agent reads the k cluster agents (centroids) and writes cluster memberships into the member refs (0 to k), with reads and writes across agents happening simultaneously.
(defn assignment []
  (map #(send % update-dataagent) DataAgents))

(defn update-dataagent [datapoints]
  (map update-datapoint datapoints))

(defn update-datapoint [datapoint]
  (let [newass (nearest-cluster datapoint)]
    (dosync (commute (nth MemberRefs newass)
                     conj (:data datapoint)))
    (assoc datapoint :assignment newass)))
read: (nearest-cluster)
write: (commute) (assoc)
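The data-agent split can be approximated on plain Java threads with a fixed pool, where each task owns a disjoint slice of the data; a sketch of the assignment step only (names are illustrative, not the talk's implementation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelAssign {
    // Each task owns a disjoint slice of the data (like one data agent),
    // so the assignment writes into m need no locking.
    static int[] assign(final double[][] data, final double[][] centroids,
                        int nAgents) throws InterruptedException {
        final int[] m = new int[data.length];
        ExecutorService pool = Executors.newFixedThreadPool(nAgents);
        int chunk = (data.length + nAgents - 1) / nAgents;
        for (int a = 0; a < nAgents; a++) {
            final int lo = a * chunk;
            final int hi = Math.min(lo + chunk, data.length);
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = lo; i < hi; i++)
                        m[i] = nearestCluster(data[i], centroids);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return m;
    }

    // Index of the centroid closest to x (squared Euclidean distance)
    static int nearestCluster(double[] x, double[][] c) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int j = 0; j < c.length; j++) {
            double s = 0;
            for (int t = 0; t < x.length; t++)
                s += (x[t] - c[j][t]) * (x[t] - c[j][t]);
            if (s < bestDist) { bestDist = s; best = j; }
        }
        return best;
    }

    public static void main(String[] args) throws InterruptedException {
        double[][] data = {{0}, {1}, {9}, {10}};
        double[][] centroids = {{0.5}, {9.5}};
        System.out.println(java.util.Arrays.toString(assign(data, centroids, 2))); // [0, 0, 1, 1]
    }
}
```

What this cannot express as cheaply is the shared member lists: in the Clojure version, commute lets all agents append to the same ref concurrently without a lock ordering, which in Java would require explicit synchronization or concurrent collections.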
Benchmark results
• Each data point is sampled from N(0,1)
• Summary for 10 runs of K-means
Figure: runtimes of ParaKMeans, K-means (R), and McKmeans for 10,000 cases with 100 dimensions and 20 clusters (seconds, left), and of K-means (R) and McKmeans for 1,000,000 cases with 200 dimensions and 20 clusters (minutes, right).
Large data sets (artificial):
Figure: McKmeans runtime (seconds) on 100,000 × 500 data with 20 clusters, for 1, 4, and 8 computer cores (left) and for 4 to 10 data agents (right).
• Number of computer cores used • Number of data agents used
• Data sampled from a multi-variate normal distribution
• 100,000 samples, 200/500 dimensions, 10/20 clusters
Figure: runtimes (seconds) of K-means (R) and McKmeans for the dimensions/clusters combinations 200/10, 200/20, 500/10, and 500/20.
Large data sets with cluster structure
• Measured with the MCA index
• Red bars indicate the random-prototype baseline
Figure: MCA index (0-1) for McKmeans and K-means (R) on 100,000 × 200 and 100,000 × 500 data with 10 and 20 clusters.
Accuracy compared to the known grouping of data
• Microarray data (Radiation-induced changes in human gene expression)
• 22277 samples (genes) and 465 features (profiles)
Figure: runtimes (seconds) of K-means (R) and McKmeans for 2, 5, 10, and 20 clusters.
Real world data set
Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiation-induced changes in human gene expression. Nature 2009, 459:587–591
Application to Cluster Number Estimation
• Repeated clustering with different subsets of data
• Repeated for different number of clusters k
• Most stable clustering is produced for the ‘real’ cluster number
Figure: MCA index (0-1) vs. number of clusters (2-7).
• Jackknife resampling
• Evaluation with MCA index
• Data set: 100,000 samples, 100 features, 3 clusters
• 10 runs per cluster number
• 49.26 minutes on dual-quad core 3.2 GHz
Java GUI
(import '(javax.swing JFrame JLabel JTextField JButton)
        '(java.awt.event ActionListener)
        '(java.awt GridLayout))

(let [frame (new JFrame "Hello, World!")
      hello-button (new JButton "Say hello")
      hello-label (new JLabel "")]
  (. hello-button
     (addActionListener
      (proxy [ActionListener] []
        (actionPerformed [evt]
          (. hello-label (setText "Hello, World!"))))))
  (doto frame
    (. setLayout (new GridLayout 1 1 3 3))
    (. add hello-button)
    (. add hello-label)
    (. setSize 300 80)
    (. setVisible true)))
Summary
• Writing parallel programs usually requires careful software design and deep knowledge of thread-safe programming
• Concurrency control via transactional memory circumvents problems of lock-based concurrency strategies
• Immutable data structures play a key role in software transactional memory
• Clojure combines Lisp, Java and a powerful STM system
• This enables fast parallelization of algorithms, even for rapid prototyping
• Our simulations show a good performance of the parallelized code
Thank you for your attention.
Statistical computing library
• http://wiki.github.com/liebke/incanter
• Clojure-based statistical computing
• R-like semantics
• COLT library for numerical computation
• JFreeChart library for graphics