Clojure concurrency

Post on 12-May-2015


description

Concurrency concepts (STM) in Clojure, compared with existing approaches to concurrency in imperative languages

transcript

(do "Concurrency in Clojure")

(by (and "Alex" "Nithya"))

Agenda

– Introduction
– Features
– Concurrency, Locks, Shared state
– STM
– Vars, Atoms, Refs, Agents
– Stock picker example
– Q&A

Introduction

What is Clojure?

– Lisp style

– Runs on JVM/CLR

Why Clojure?

– Immutable Persistent Data structures

– FP aspects

– Concurrency

– Open Source

– Short and sweet

[Diagram: Clojure/REPL, Evaluator, byte code, JVM, Java libraries]

public class StringUtils {
    public static boolean isBlank(String str) {
        int strLen;
        if (str == null || (strLen = str.length()) == 0) {
            return true;
        }
        for (int i = 0; i < strLen; i++) {
            if ((Character.isWhitespace(str.charAt(i)) == false)) {
                return false;
            }
        }
        return true;
    }
}

(defn blank? [s] (every? #(Character/isWhitespace %) s))

Fn name

parameters

body

Immutable data structures

Functions as first class objects, closures

Java Interop

Tail Recursion

Features

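; names prefixed with my- so they do not shadow clojure.core functions
(def my-vector [1 2 3])
(def my-list '(1 2 3))
(def my-map {:A "A"})
(def my-set #{"A"})
(defn add [x] (fn [y] (+ x y))) ; returns a closure over x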

Lazy evaluation - abstract sequences + library
– "cons cell" - (cons 4 '(1 2 3))

Features

[Diagram: a seq cell made of an Item (First) and the instructions to generate the next component (Rest)]

(defn lazy-counter-iterate [base increment] ( iterate (fn [n] (+ n increment)) base))

user=> (def iterate-counter (lazy-counter-iterate 2 3))
#'user/iterate-counter
user=> (nth iterate-counter 1000000)
3000002
user=> (nth (lazy-counter-iterate 2 3) 1000000)
3000002
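Because the sequence is lazy, only the elements that are actually requested get computed. A minimal sketch reusing the slide's lazy-counter-iterate (the name first-five is introduced here purely for illustration):

```clojure
(defn lazy-counter-iterate [base increment]
  (iterate (fn [n] (+ n increment)) base))

;; The underlying sequence is infinite; take realizes just five elements.
(def first-five (take 5 (lazy-counter-iterate 2 3)))

(println first-five) ; prints (2 5 8 11 14)
```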

Mutable objects are the new spaghetti code

– Hard to understand, test, reason about

– Concurrency disaster

– Default architecture (Java/C#/Python/Ruby/Groovy)

State – You are doing it wrong

[Diagram: Object and Object 2, each bundling Data and Behaviour, with Mutable Variables]

An identity points to a different state after each update; this is supported via atomic references to values.

location:Chennai location:Bangalore

Values are constants; they never change

(def location (ref ""))

Identity - has different states at different points in time

State - the value of an identity at a point in time
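The distinction between identity, state and value can be sketched with a ref. This is an illustrative example only; the name earlier-state is an assumption made here, not something from the slides:

```clojure
;; The identity: a ref that currently points at the value "Chennai".
(def location (ref "Chennai"))

;; Capture the current state; this value never changes afterwards.
(def earlier-state @location)

;; Point the identity at a new value inside a transaction.
(dosync (ref-set location "Bangalore"))

(println earlier-state) ; prints Chennai
(println @location)     ; prints Bangalore
```

The old value is still intact after the update; only the identity moved on to a new state.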

Interleaving / parallel coordinated execution

Usual Issues

– Deadlock

– Livelock

– Race condition

UI should be functional with tasks running

Techniques - Locking, CAS, TM, Actors

Concurrency

One thread per lock - blocking

lock/synchronized (resource) { .. }

Cons
– Reduces concurrency
– Readers block readers
– What order? Deadlock, livelock
– Overlapping/partial operations
– Priority inversion

// excerpt
public class LinkedBlockingQueue<E> {
    public E peek() {
        final ReentrantLock takeLock = this.takeLock;
        takeLock.lock();
        try {
            Node<E> first = head.next;
            if (first == null)
                return null;
            else
                return first.item;
        } finally {
            takeLock.unlock();
        }
    }
}

Locks

A CAS operation takes three operands: a memory location (V), the expected old value (A), and a new value (B)
– Wait-free algorithms
– Deadlocks are avoided

Cons
– Complicated to implement
– JSR 166: not intended to be used directly by most developers

// excerpt
public class AtomicInteger extends Number {
    public final int getAndSet(int newValue) {
        for (;;) {
            int current = get();
            if (compareAndSet(current, newValue))
                return current;
        }
    }

    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }
}

Compare And Swap
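The same compare-and-swap retry loop can be sketched in Clojure with compare-and-set! on an atom. The helper cas-increment! is hypothetical and written only to make the loop visible; in practice swap! performs this retry for you:

```clojure
(def counter (atom 0))

(defn cas-increment!
  "Hypothetical helper: increments atom a using an explicit CAS retry loop."
  [a]
  (loop []
    (let [current @a
          updated (inc current)]
      ;; Succeeds only if a still holds the value we just read.
      (if (compare-and-set! a current updated)
        updated
        (recur)))))

(cas-increment! counter)
(cas-increment! counter)
(println @counter) ; prints 2
```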

Enhancing Read Parallelism

Multi-reader/Single-writer locks

– Readers don't block each other

– Writers wait for readers

CopyOnWrite Collections

– Read snapshot

– Copy & Atomic writes

– Expensive

– Multi-step writes still require locks

public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}

– Threads modify shared memory
– A thread doesn't bother about other threads
– Every read/write is recorded in a log
– Analogous to database transactions

ACI(!D)

– Atomic -> All changes commit or rollback

– Consistency -> if validation fn fails transaction fails

– Isolation -> Partial changes in txn won't be visible to other threads

– Not durable -> changes are lost if s/w crashes or h/w fails
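The atomicity property can be illustrated with a transaction that is aborted part-way through. This is a minimal sketch; the refs a and b and the simulated failure are assumptions made for the example:

```clojure
(def a (ref 10))
(def b (ref 20))

(def outcome
  (try
    (dosync
      (alter a + 5)
      ;; Simulated failure: the transaction aborts before b is touched.
      (throw (ex-info "simulated failure" {}))
      (alter b - 5))
    (catch Exception _ :rolled-back)))

;; The change to a was never committed, so both refs keep their old values.
(println outcome @a @b) ; prints :rolled-back 10 20
```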

Software Transactional Memory


Clojure Transactions

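(defstruct stock :name :quantity)
(struct stock "CSK" 40)

[Diagram: Adam, Mr B and Minion run concurrent transactions (Buy 10, Buy 5, Sell 10) against the CSK entry in StocksList. Each transaction reads (R) and updates (U) the quantity, e.g. R ~ 40, U ~ 30. A transaction that conflicts with another fails and retries (TxnFail & Retry) with a fresh read.]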

Transaction Creation - (dosync (...))
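A stock-list scenario of this kind might look as follows. This is a sketch under assumptions: it uses a plain map instead of the slide's defstruct, the function names buy! and sell! are hypothetical, and buying is taken to reduce the available quantity:

```clojure
;; Hypothetical stock list: ticker symbol -> available quantity.
(def stocks (ref {"CSK" 40}))

(defn buy!  [ticker n] (dosync (alter stocks update-in [ticker] - n)))
(defn sell! [ticker n] (dosync (alter stocks update-in [ticker] + n)))

;; Three traders act at the same time; conflicting transactions retry.
(let [trades [(Thread. #(buy! "CSK" 10))
              (Thread. #(buy! "CSK" 5))
              (Thread. #(sell! "CSK" 10))]]
  (doseq [t trades] (.start t))
  (doseq [t trades] (.join t)))

(println @stocks) ; prints {CSK 35}
```

Whatever order the threads run in, every committed transaction sees a consistent quantity, so the final result is 40 - 10 - 5 + 10 = 35.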

Clojure STM

Concurrency semantics for references

• Automatic/enforced

• No locks!

Clojure does not replace the Java thread system, rather it works with it.

Clojure functions (IFn) implement java.util.concurrent.Callable and java.lang.Runnable
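Because functions are Callable and Runnable, they can be handed directly to the Java concurrency utilities. A minimal sketch, where the names add-task and answer are illustrative and the type hints are there to select the Callable overload of submit:

```clojure
(import '(java.util.concurrent Callable Executors ExecutorService))

(def add-task (fn [] (+ 1 2)))

(println (instance? Callable add-task)) ; prints true
(println (instance? Runnable add-task)) ; prints true

(def answer
  (let [^ExecutorService pool (Executors/newFixedThreadPool 2)
        ^Callable task add-task]
    (try
      ;; submit returns a java.util.concurrent.Future; get blocks for the result.
      (.get (.submit pool task))
      (finally (.shutdown pool)))))

(println answer) ; prints 3
```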

STM

Pros
– Optimistic, increased concurrency: no thread waiting
– Deadlock/livelock is prevented/handled by the transaction manager
– Data is consistent
– Simplifies conceptual understanding: less effort

Cons
– Overhead of transaction retrying
– Performance hit (<4 processors) from maintaining committed values, storing in-transaction values, and locks for committing transactions
– Cannot perform any operation that cannot be undone, including most I/O (solved using queues: Agents in Clojure)
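One way to combine side effects with transactions is to send them to an agent from inside dosync: such sends are held until the transaction commits, so a retried transaction does not repeat them. A minimal sketch, assuming a hypothetical balance ref and audit-log agent:

```clojure
(def balance (ref 100))
(def audit-log (agent []))

(dosync
  (alter balance - 30)
  ;; Dispatched only if and when this transaction commits.
  (send audit-log conj [:withdraw 30]))

;; Wait for the queued action to run before reading the agent.
(await audit-log)

(println @balance @audit-log) ; prints 70 [[:withdraw 30]]

;; Only needed in a standalone script, so the JVM can exit promptly.
(shutdown-agents)
```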

Persistent Data Structures

– Immutable + maintain old versions
– Structure sharing, not full copies
– Thread/iteration safe

Clojure data structures are persistent
– Hash map and vector: array mapped hash tries (Bagwell)
– Sorted map: red-black tree
– MVCC: multi-version concurrency control
– Support sequencing, metadata
– Pretty fast: near constant time read access for maps and vectors (actually O(log32 n))
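Persistence is easy to observe at the REPL: "modifying" a collection yields a new version while the old one stays valid. A small illustrative sketch (the names v1, v2, m1 and m2 are arbitrary):

```clojure
(def v1 [1 2 3])
(def v2 (conj v1 4))      ; new version; v1 is untouched

(def m1 {:a 1 :b 2})
(def m2 (assoc m1 :b 20)) ; new version; m1 is untouched

(println v1 v2) ; prints [1 2 3] [1 2 3 4]
(println m1 m2) ; prints {:a 1, :b 2} {:a 1, :b 20}
```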

PersistentHashMap

32 children per node, so O(log32 n)

static interface INode {
    INode assoc(int shift, int hash, Object key, Object val, Box addedLeaf);
    LeafNode find(int hash, Object key);
}

BitMapIndexedNode

Concurrency Library

Coordinating multiple activities happening simultaneously

Reference Types

Refs

Atoms

Agents

Vars

              Uncoordinated   Coordinated
Synchronous   Var, Atom       Ref
Asynchronous  Agent

Vars

Vars - per-thread mutables, atomic read/write

(def) creates the shared root binding, which can be unbound

(binding) sets up a per-thread override

Bindings can only be used when the def is at the top level

(set!) works only if there is a per-thread binding
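These rules can be sketched as follows. The var *max-users* and the helper report are hypothetical names; note that in Clojure 1.3 and later a var must be marked ^:dynamic before binding can override it:

```clojure
(def ^:dynamic *max-users* 10) ; root binding, shared by all threads

(defn report [] *max-users*)

;; Inside binding, this thread sees its own value and may set! it.
(def seen-in-binding
  (binding [*max-users* 20]
    (set! *max-users* 25)
    (report)))

;; A plain Thread started inside binding still sees the root value.
(def seen-in-other-thread (atom nil))
(binding [*max-users* 20]
  (doto (Thread. #(reset! seen-in-other-thread (report)))
    (.start)
    (.join)))

(println seen-in-binding @seen-in-other-thread (report)) ; prints 25 10 10
```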

T1

T2

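(def ^:dynamic x 10) ; global root binding, shared by all threads
(def y 1) ; y is not defined on the slide; assumed here so the example runs
(defn get-val [] (+ x y))
(defn show [] (println x) (binding [x 2] (get-val)))

Other threads can't see the bound value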

Vars

Safe use of a mutable storage location via thread isolation

Thread specific Values

Setting thread local dynamic binding

Scenarios:

Used for constants and configuration variables such as *in*, *out*, *err*

Manually changing a program while running (def max-users 10)

Functions defined with defn are stored in Vars, which enables re-definition of functions (AOP-like, e.g. enabling logging)

user=> (def ^:dynamic variable 1)
#'user/variable
user=> (.start (Thread. (fn [] (println variable))))
nil
1
user=> (defn print-variable [] (println variable))
#'user/print-variable
user=> (.start (Thread. (fn [] (binding [variable 42] (print-variable)))))
nil
42

(set! var-symbol value)

(defn ^:dynamic say-hello [] (println "Hello"))
(binding [say-hello #(println "Goodbye")] (say-hello))

Vars...

Augmenting the behavior

– Memoization – to wrap functions

Has great power

Should be used sparingly

Not pure functions

(ns test-memoization)
(defn ^:dynamic triple [n] (Thread/sleep 100) (* n 3))
(defn invoke_triple [] (map triple [1 2 3 4 4 3 2 1]))

(time (dorun (invoke_triple))) -> "Elapsed time: 801.084578 msecs"

(time (dorun (binding [triple (memoize triple)] (invoke_triple)))) -> "Elapsed time: 401.87119 msecs"

Atoms

Single value shared across threads

Reads are atomic

Writes are atomic

Coordinated updates of multiple atoms are not possible

(def current-track (atom "Ooh la la la"))

(deref current-track) or @current-track

(reset! current-track "Humma Humma")

(reset! current-track {:title "Humma Humma", :composer "What???"})

(def current-track (atom {:title "Ooh la la la", :composer "ARR"}))

(swap! current-track assoc :title "Hosana")
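The atomicity of swap! can be checked by hammering one atom from several threads. A minimal sketch, where the counter name hits and the thread and iteration counts are arbitrary choices:

```clojure
(def hits (atom 0))

;; Ten threads each increment the shared atom 1000 times.
(let [threads (vec (repeatedly 10
                     #(Thread. (fn [] (dotimes [_ 1000] (swap! hits inc))))))]
  (doseq [t threads] (.start t))
  (doseq [t threads] (.join t)))

;; No update is lost, because swap! retries when another thread got in first.
(println @hits) ; prints 10000
```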

Refs

Mutable reference to an immutable state

Shared use of mutable storage location via STM

ACI and retry properties

Reads are atomic

Writes inside an STM txn

Refs in Txn

In-transaction values

• Maintained by each txn

• Only visible to code running in the txn

• Committed at end of txn if successful

• Cleared after each txn try

Committed values

• Maintained by each Ref in a circular linked-list (tvals field)

• Each has a commit "timestamp" (point field in TVal objects)

Changing Ref (a conflicting change causes a txn retry)

(ref-set ref new-value)

(alter ref function arg*)

Commute

( commute ref function arg*)

Order of changes doesn't matter

Another txn change will not invoke retry

Commit -> all commute fns invoked using latest commit values

Example: Adding objects to a collection
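Adding to a set is a typical commutative update, since the final set is the same whatever the order of insertion. A sketch under assumptions (the ref attendees and the function register! are hypothetical names):

```clojure
(def attendees (ref #{}))

(defn register! [person]
  ;; conj on a set commutes, so concurrent registrations need not retry.
  (dosync (commute attendees conj person)))

(let [threads (mapv (fn [person] (Thread. #(register! person)))
                    ["Alex" "Nithya" "Adam"])]
  (doseq [t threads] (.start t))
  (doseq [t threads] (.join t)))

(println (count @attendees)) ; prints 3
```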

(def account1 (ref 1000))
(def account2 (ref 2000))

(defn transfer
  "transfers amount of money from a to b"
  [a b amount]
  (dosync
    (alter a - amount)
    (alter b + amount)))

(transfer account1 account2 300)
(transfer account2 account1 50)

; @account1 -> 750
; @account2 -> 2250

Validators

Validators:

Invoked when the transaction is to commit

When validation fails -> IllegalStateException is thrown

(ref initial-value :validator validator-fn)

user=> (def my-ref (ref 5))
#'user/my-ref
user=> (set-validator! my-ref (fn [x] (< 0 x)))
nil
user=> (dosync (alter my-ref - 10))
#<CompilerException java.lang.IllegalStateException: Invalid Reference State>
user=> (dosync (alter my-ref - 10) (alter my-ref + 15))
10
user=> @my-ref
10
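A validator can also be supplied when the ref is created, using the :validator option. A minimal sketch; the ref stock-level and the rule that it must not go negative are assumptions for the example:

```clojure
;; The validator must return true for every value that gets committed.
(def stock-level (ref 5 :validator (fn [x] (>= x 0))))

(def outcome
  (try
    (dosync (alter stock-level - 10))
    :committed
    (catch IllegalStateException _ :rejected)))

;; The invalid change was never committed.
(println outcome @stock-level) ; prints :rejected 5
```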

Watches

Called when state changes

Called on an identity

Example:

( add-watch identity key watch-function)

(defn function-name [key identity old-val new-val] expressions)

(remove-watch identity key)

user=> (defn my-watch [key identity old-val new-val]
         (println (str "Old: " old-val))
         (println (str "New: " new-val)))
#'user/my-watch
user=> (def my-ref (ref 5))
#'user/my-ref
user=> (add-watch my-ref "watch1" my-watch)
#<Ref 5>
user=> (dosync (alter my-ref inc))
Old: 5
New: 6
6
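Watches work on the other reference types too. The following sketch records every change to an atom and then removes the watch; the names temperature, history and :recorder are made up for the example:

```clojure
(def temperature (atom 20))
(def history (atom []))

;; The watch function receives the key, the identity, and the old and new values.
(add-watch temperature :recorder
           (fn [_key _identity old-val new-val]
             (swap! history conj [old-val new-val])))

(reset! temperature 25)
(swap! temperature inc)

;; After removal, further changes are no longer recorded.
(remove-watch temperature :recorder)
(reset! temperature 0)

(println @history) ; prints [[20 25] [25 26]]
```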

Other features...

Write Skew

Ensure
– Doesn't change the state of the ref
– Forces a txn retry if the ref changes
– Ensures that the ref is not changed during the txn
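Write skew arises when a transaction bases a decision on a ref it only reads. A sketch of guarding such a read with ensure, assuming a hypothetical rule that the combined balance of two accounts must stay non-negative (run sequentially here, so it shows the usage rather than an actual race):

```clojure
(def checking (ref 50))
(def savings  (ref 50))

(defn withdraw!
  "Hypothetical rule: the combined balance of both accounts must stay >= 0."
  [from other amount]
  (dosync
    ;; ensure protects `other`, which is read but not written in this txn.
    (when (>= (- (+ @from (ensure other)) amount) 0)
      (alter from - amount))))

(withdraw! checking savings 80) ; allowed: 50 + 50 - 80 = 20
(withdraw! savings checking 80) ; refused: 50 - 30 - 80 < 0

(println @checking @savings) ; prints -30 50
```

Without ensure, two such withdrawals running concurrently could each see the other account's old balance and both commit.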

Agents

Agents share asynchronous independent changes between threads

State changes through actions (functions)

Actions are sent through send, send-off

Agents run in thread pools

- send uses a thread pool tuned to the number of processors

- send-off is for long-running or potentially blocking operations (e.g. I/O); its thread pool grows as needed

Agents

Only one action per agent happens at a time

Actions of all Agents get interleaved amongst threads in a thread pool

Agents are reactive - no imperative message loop and no blocking receive
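The dispatch functions can be sketched as follows. The agent name total is arbitrary, and the sleep merely stands in for a blocking operation:

```clojure
(def total (agent 0))

;; send: for short, CPU-bound actions.
(send total + 3)

;; send-off: for actions that may block; the sleep simulates I/O.
(send-off total (fn [n] (Thread/sleep 50) (* n 2)))

;; Both calls return immediately; await blocks until the queued actions ran.
(await total)

(println @total) ; prints 6

;; Only needed in a standalone script, so the JVM can exit promptly.
(shutdown-agents)
```

Actions sent to one agent from the same thread run one at a time in dispatch order, which is why the result is (0 + 3) * 2.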

Agents

(def my-agent (agent 5))

(send my-agent + 3)

(send my-agent / 0)

(send my-agent + 1)

java.lang.RuntimeException: Agent is failed, needs restart

(agent-error my-agent)

(restart-agent my-agent 5 :clear-actions true)
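The failure and recovery cycle can be run end to end as a script. A sketch, assuming a hypothetical agent named fragile; the polling loop is just one simple way to wait until the failure has been recorded:

```clojure
(def fragile (agent 5))

;; (/ 5 0) throws, which puts the agent into a failed state.
(send fragile / 0)

;; Wait until the failure has been recorded on the agent.
(while (nil? (agent-error fragile))
  (Thread/sleep 10))

(println (class (agent-error fragile))) ; prints java.lang.ArithmeticException

;; Reset the state and drop any queued actions, then carry on as usual.
(restart-agent fragile 5 :clear-actions true)
(send fragile + 1)
(await fragile)

(println @fragile) ; prints 6

;; Only needed in a standalone script, so the JVM can exit promptly.
(shutdown-agents)
```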

Concurrency

Parallel Programming

(defn heavy [f] (fn [& args] (Thread/sleep 1000) (apply f args)))

(time (+ 5 5))
;>>> "Elapsed time: 0.035009 msecs"
(time ((heavy +) 5 5))
;>>> "Elapsed time: 1000.691607 msecs"

pmap

(time (doall (map (heavy inc) [1 2 3 4 5])))
;>>> "Elapsed time: 5001.055055 msecs"
(time (doall (pmap (heavy inc) [1 2 3 4 5])))
;>>> "Elapsed time: 1004.219896 msecs"

(pvalues (+ 5 5) (- 5 3) (* 2 4))
(pcalls #(+ 5 2) #(* 2 5))

– Process

Pid = spawn(fun() -> loop(0) end)

Pid ! Message,

.....

– Receiving Process

receive

Message1 ->

Actions1;

Message2 ->

Actions2;

...

after Time ->

TimeOutActions

end

[Diagram: processes on two machines exchanging immutable messages]

Erlang

Actor - a process that executes a function.

Process - a lightweight user-space thread.

Mailbox - essentially a queue with multiple producers.

Actor Model

In an actor model, state is encapsulated in an actor (identity) and can only be affected/seen via the passing of messages (values).

In an asynchronous system like Erlang’s, reading some aspect of an actor’s state requires sending a request message, waiting for a response, and the actor sending a response.

Principles

* No shared state

* Lightweight processes

* Asynchronous message-passing

* Mailboxes to buffer incoming messages

* Mailbox processing with pattern matching

Actor Model

Advantages

Lots of computers (= fault tolerant scalable ...)

No locks

Location Transparency

Not for Clojure

Actor model was designed for distributed programs – location transparency

Complex programming model involving a two-message conversation for simple reads

Potential for deadlock, since waiting for a message blocks

Copy structures to be sent

Coordinating between multiple actors is difficult

References

http://clojure.org/concurrent_programming

http://www.cis.upenn.edu/~matuszek/cis554-2010/Pages/clojure-cheat-sheet.txt

http://blip.tv/file/812787