Author: Ryan Stout
Concurrency
An overview of concurrent processing models
Date: June 22nd, 2013
Wednesday, July 10, 13
Concurrency Models
No More Free Lunch
Single-core performance isn’t increasing anymore
Question: How do we take advantage of the extra cores?
Answer: Concurrency & Parallelism
[Diagram: tasks A and B shown three ways: Sequential, Concurrent, Parallel]
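To make the sequential versus concurrent distinction concrete, here is a minimal sketch using Ruby fibers. The `make_task` helper and the step labels are made up for illustration. Everything runs on one thread, so this shows interleaving only; true parallel execution would need the two tasks on different cores, which this sketch does not attempt.

```ruby
# Hypothetical helper: a task that logs its steps and pauses after each one.
def make_task(name, steps, log)
  Fiber.new do
    steps.times do |i|
      log << "#{name}#{i + 1}"
      Fiber.yield # hand control back so another task can run
    end
    :done
  end
end

# Sequential: finish all of A, then all of B.
sequential_log = []
[make_task('A', 2, sequential_log), make_task('B', 2, sequential_log)].each do |task|
  loop { break if task.resume == :done }
end

# Concurrent: a single thread of control alternates between A and B.
concurrent_log = []
tasks = [make_task('A', 2, concurrent_log), make_task('B', 2, concurrent_log)]
tasks.reject! { |task| task.resume == :done } until tasks.empty?

puts "sequential: #{sequential_log.join(' ')}"
puts "concurrent: #{concurrent_log.join(' ')}"
```

Both runs do the same work; only the order of the steps differs.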
Problem: Parallel can be difficult
• Only difficult when sharing state
• Hard to reason about
• Race conditions
• Deadlock
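A race condition on shared state can be sketched as a lost update. In the illustrative code below, two queues (`have_read`, `may_write`, names chosen here) force both threads to read before either writes, so the bad interleaving happens on every run rather than by chance. The second method shows the same work protected by a `Mutex`.

```ruby
# Forces the "both read before either writes" interleaving.
def racy_increments
  state = { count: 0 }
  have_read = Queue.new
  may_write = Queue.new

  threads = 2.times.map do
    Thread.new do
      local = state[:count]       # read
      have_read << true
      may_write.pop               # wait until both threads have read
      state[:count] = local + 1   # write a stale value
    end
  end

  2.times { have_read.pop }
  2.times { may_write << true }
  threads.each(&:join)
  state[:count]
end

# Same work, but read-modify-write happens under a lock.
def locked_increments
  state = { count: 0 }
  lock = Mutex.new
  threads = 2.times.map do
    Thread.new { lock.synchronize { state[:count] += 1 } }
  end
  threads.each(&:join)
  state[:count]
end

racy_result = racy_increments
locked_result = locked_increments
puts "without lock: #{racy_result}"
puts "with lock:    #{locked_result}"
```

Two increments ran in both cases, but without the lock one of them is overwritten.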
Solution (sort of): Better Concurrency Models
Different ways to do things at the same time:
1. Processes
2. Threads and Locks
3. Evented-io
4. Vectorization/SIMD
5. Actor Model - Erlang, Celluloid, Akka
6. CSP - Google Go
7. Software Transactional Memory
1. Processes
Separate memory model
The way Unix intended
• Pros:
  • Easy
  • OS handles everything
  • Good for stateless operations (responding to web requests)
  • Memory leaks are contained (the OS reclaims everything when a process exits)
• Cons:
  • Communication between processes is difficult
  • Each process uses its own RAM
[Diagram: Process A containing Thread A, and Process B containing Thread B]
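A minimal sketch of the separate memory model, assuming a Unix-like system where `Kernel#fork` is available. The variable names are illustrative. The child changes its own copy of `counter`, and the only way the parent learns anything is through an explicit pipe.

```ruby
counter = 0
reader, writer = IO.pipe

pid = fork do
  reader.close
  counter += 100         # changes only the child's copy of the variable
  writer.puts(counter)   # explicit communication back to the parent
  writer.close
end

writer.close
child_value = reader.gets.to_i
reader.close
Process.wait(pid)

puts "parent counter: #{counter}"
puts "child reported: #{child_value}"
```

The parent's `counter` is untouched, which is why processes avoid shared-state bugs and also why communication between them takes extra work.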
2. Threads
[Diagram: one Process containing Thread A and Thread B]
2. Threaded/Mutex Example

require 'thread'

class ThreadExample
  def initialize
    @count = 0
    @count_mutex = Mutex.new

    # Start a few readers
    threads = []
    3.times do
      threads << Thread.new do
        read_values
      end
    end

    # The readers loop forever, so this blocks until the program is interrupted
    threads.each(&:join)
  end

  def read_values
    loop do
      # Only one thread at a time may read and update @count
      @count_mutex.synchronize do
        local_count = @count

        # Get the "value"
        local_count += rand(10)
        @count = local_count

        puts "Value: #{local_count}"
      end
    end
  end
end

ThreadExample.new
Shared memory model
The way Java intended
• Pros:
  • Less RAM than processes
• Cons:
  • Manually handling locks
  • Deadlocks
  • Non-deterministic
  • Hard to debug
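One common way to keep manual locking from deadlocking is to always acquire locks in a fixed order. The sketch below is an illustration of that idea, not something from the talk; the `Account` struct and `transfer` method are hypothetical.

```ruby
Account = Struct.new(:id, :balance, :lock)

def transfer(from, to, amount)
  # Always take the lock of the lower id first, whichever way the money
  # flows, so two opposite transfers cannot each hold one lock while
  # waiting for the other.
  first, second = [from, to].sort_by(&:id)
  first.lock.synchronize do
    second.lock.synchronize do
      from.balance -= amount
      to.balance += amount
    end
  end
end

a = Account.new(1, 1_000, Mutex.new)
b = Account.new(2, 1_000, Mutex.new)

threads = [
  Thread.new { 500.times { transfer(a, b, 1) } },
  Thread.new { 500.times { transfer(b, a, 2) } }
]
threads.each(&:join)

puts "a: #{a.balance}, b: #{b.balance}, total: #{a.balance + b.balance}"
```

If each transfer instead locked `from` first and `to` second, the two threads could end up waiting on each other forever.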
3. Evented-io
[Diagram: one Process running a Reactor that interleaves tasks A, B, A]
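Threaded Echo Server

require 'socket'

# host and port are assumed to be defined elsewhere
server = TCPServer.open(host, port)

while (client = server.accept)
  # Pass the socket in as a block argument so each thread keeps its own
  # connection even after the loop accepts the next client
  Thread.new(client) do |conn|
    line = conn.gets
    conn.puts(line)
    conn.close
  end
end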
Evented-io - Echo Server

require 'eventmachine'

class Echo < EventMachine::Connection
  def receive_data(data)
    send_data(data)
  end
end

EventMachine.run {
  # Listen on 127.0.0.1:8081 and echo back whatever each client sends
  EventMachine.start_server '127.0.0.1', 8081, Echo
}
Evented-io - Proxy Server

require 'eventmachine'

# Handles the outgoing connection: relays remote data back to the local client
module ProxyConnection
  def initialize(local_connection)
    @local_connection = local_connection
  end

  def receive_data(data)
    @local_connection.send_data(data)
  end
end

# Handles each incoming client: relays its data to the remote server
module ProxyServer
  def post_init
    # Make a connection to the remote server
    @connection = EventMachine.connect 'www.zeebly.com', 80, ProxyConnection, self
  end

  def receive_data(data)
    @connection.send_data(data)
  end
end

EventMachine::run do
  EventMachine::start_server "127.0.0.1", 8080, ProxyServer
end
A form of cooperative multitasking
The way nginx intended - concurrent, not parallel
• Pros:
  • Great for heavy IO work
  • Saves thread switching cost
  • Good OS level support (epoll, kqueue)
• Cons:
  • Long running operations block all other operations
  • Does not use multiple cores
  • Somewhat difficult to reason about
Libraries: EventMachine (Ruby), Tornado (Python), nodejs, libev (C)
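The reactor idea can be sketched with nothing but the standard library. This is a simplified illustration, not how EventMachine is implemented: two pipes stand in for client connections (an assumption made so the example runs without a network), and one loop reacts to whichever IO `IO.select` reports as ready.

```ruby
# Two pipes stand in for two client connections.
connections = { 'A' => IO.pipe, 'B' => IO.pipe }

# Pretend each client sent a line and then hung up.
connections['A'][1].write("hello from A\n")
connections['B'][1].write("hello from B\n")
connections.each_value { |_reader, writer| writer.close }

names = {}
connections.each { |name, (reader, _writer)| names[reader] = name }
received = Hash.new { |hash, key| hash[key] = String.new }

# The reactor: one thread, one loop, reacting to whichever IO is ready.
until names.empty?
  ready, = IO.select(names.keys)
  ready.each do |io|
    begin
      received[names[io]] << io.read_nonblock(1024)
    rescue EOFError
      # The other end closed: stop watching this connection.
      io.close
      names.delete(io)
    end
  end
end

received.each { |name, data| puts "#{name}: #{data}" }
```

Because there is only one loop, any slow work done inside `ready.each` would stall every other connection, which is the main con listed above.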
4. Vectorization/SIMD
[Diagram: one Process applying a single instruction to two data vectors]
Data A: 0.51 0.98 1.5 0.29 0.75 0.11
Data B: 0.39 0.19 1.22 1.6 0.84 0.90
Instruction: Multiply(A,B)
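As a rough sketch of what "one instruction, multiple data" means, the snippet below applies a single multiply to every pair of elements from the two vectors on the slide (assuming the first row of numbers is Data A and the second is Data B). Plain Ruby still processes the pairs one at a time, so this only illustrates the shape of the operation, not real SIMD hardware execution.

```ruby
data_a = [0.51, 0.98, 1.5, 0.29, 0.75, 0.11]
data_b = [0.39, 0.19, 1.22, 1.6, 0.84, 0.90]

# One operation, described once, applied to every pair of elements.
def multiply(a, b)
  a.zip(b).map { |x, y| x * y }
end

product = multiply(data_a, data_b)
puts product.map { |value| value.round(4) }.inspect
```

No element depends on any other, which is what lets SIMD units and GPUs compute them all at once.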
Example Matrix Multiplication - C

void MatrixMulOnHost(float* M, float* N, float* P, int Width) {
  for (int i = 0; i < Width; ++i) {
    for (int j = 0; j < Width; ++j) {
      float sum = 0;
      for (int k = 0; k < Width; ++k) {
        float a = M[i * Width + k];
        float b = N[k * Width + j];
        sum += a * b;
      }
      P[i * Width + j] = sum;
    }
  }
}
Example Matrix Multiplication - CUDA

// Each thread computes one element of P. Only threadIdx is used,
// so this assumes the whole matrix fits in a single thread block.
__global__ void MatrixMulKernel(float* d_M, float* d_N, float* d_P, int Width) {
  int row = threadIdx.y;
  int col = threadIdx.x;
  float P_val = 0;
  for (int k = 0; k < Width; ++k) {
    float M_elem = d_M[row * Width + k];
    float N_elem = d_N[k * Width + col];
    P_val += M_elem * N_elem;
  }
  d_P[row * Width + col] = P_val;
}
SIMD = Single Instruction Multiple Data
The way Nvidia intended
• Pros:
  • Very fast for some operations (~50x speed improvement in some cases)
• Cons:
  • Only works when working with vector data
  • No standard way to program (CUDA, OpenCL, SSE, BLAS)
  • Time to copy data between CPU and GPU adds up
5. & 6. Actor Model & CSP
“Don’t communicate by sharing state; share state by communicating”
http://www.igvita.com/2010/12/02/concurrency-with-actors-goroutines-ruby/
5. Actor Model
The way Erlang intended
• State is encapsulated in “Actors”
• Actors send messages to communicate
• Messages are asynchronous
• Messages buffer in a “mailbox”
Libraries: Akka (Scala), Celluloid (Ruby), Erlang (built in)
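The four points above can be sketched with a thread and a queue. `CounterActor` and its message names are hypothetical, chosen for this illustration; real actor libraries such as Celluloid offer much more (supervision, linking, and so on).

```ruby
# A hypothetical minimal actor: private state plus a mailbox, both owned
# by a single thread.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @total = 0 # touched only by the actor's own thread
    @thread = Thread.new { run }
  end

  # Asynchronous send: enqueue the message and return immediately.
  def <<(message)
    @mailbox << message
    self
  end

  # Wait for the actor to drain its mailbox and stop.
  def join
    @thread.join
  end

  private

  def run
    loop do
      kind, payload = @mailbox.pop
      case kind
      when :add    then @total += payload
      when :report then payload << @total # reply by message, not by sharing state
      when :stop   then break
      end
    end
  end
end

counter = CounterActor.new
replies = Queue.new
counter << [:add, 5] << [:add, 7] << [:report, replies] << [:stop]
counter.join
total = replies.pop
puts "total: #{total}"
```

No lock is needed around `@total` because only the actor's thread ever reads or writes it.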
[Diagram: one Process containing Actor A, Actor B, and Actor C]
Actor Example

require 'actor'

# pong must exist before ping's block is created so the block can capture it
pong = nil

ping = Actor.spawn do
  loop do
    msg = Actor.receive
    puts "Ping #{msg}"
    pong << (msg + 1)
  end
end

pong = Actor.spawn do
  loop do
    msg = Actor.receive
    puts "Pong #{msg}"
    break if msg > 10000
    ping << (msg + 1)
  end
end

ping << 0
• Pros:
  • No deadlocks (if you follow the rules)
  • Fairly easy to reason about
• Cons:
  • Connecting actors is difficult
  • Error handling is tricky
6. Communicating Sequential Processes
[Diagram: one Process in which A, B, and C communicate through Channel A]
The way Google intended
• Google Go’s model
• Variables are always thread local
• State is shared by sized channels
• Functions can be run as a “goroutine”
  • Anonymous thread of execution
  • Cannot be referenced or joined
• Communication takes place over channels
  • Channels are sized synchronized buffers, which can be “closed”
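Ruby's built-in `SizedQueue` is close enough to a sized, closable channel to sketch the idea. This is an approximation of the CSP style in Ruby, not Go's actual channel semantics; the producer and consumer roles are illustrative.

```ruby
channel = SizedQueue.new(2) # a sized, synchronized buffer
results = Queue.new

# Producer: blocks whenever the channel already holds two items.
producer = Thread.new do
  (1..5).each { |n| channel << n }
  channel.close # signal that no more values are coming
end

# Consumer: pop returns nil once the channel is closed and drained.
consumer = Thread.new do
  while (n = channel.pop)
    results << n * n
  end
end

[producer, consumer].each(&:join)

squares = []
squares << results.pop until results.empty?
puts squares.inspect
```

The two threads never touch each other's variables; everything they share passes through the channel.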
• Pros:
  • Easy to reason about
  • Simple model for distributing data processing work
  • Simple to implement by hand (sized queues)
• Cons:
  • Can be extra code for simple synchronization
Libraries: Google Go (language), Agent (Ruby)
Difference between CSP and Actor?
• In Actor we name the Actors
• In CSP we name the communication channels
• Both allow us to keep out non-determinism (if we follow the rules)
7. Software Transactional Memory
[Diagram sequence: within one Process, Block A and Block B each read Memory Variable A, process, and attempt a transactional write; one write is marked SUCCESS and the other FAIL]
[Diagram sequence: Block B alone reads Memory Variable A, processes, and performs a transactional write, marked SUCCESS]
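The read / process / transactional write cycle can be sketched with a version number that must still match at commit time. `TVar`, `commit`, and `atomically` are hypothetical names for this single-variable illustration, which is far simpler than a real STM such as Clojure's. Which block wins the first round is an arbitrary choice here.

```ruby
# Hypothetical single-variable transactional cell; not a full STM.
class TVar
  attr_reader :value

  def initialize(value)
    @value = value
    @version = 0
    @commit_lock = Mutex.new
  end

  # Returns [value, version] as one consistent snapshot.
  def read
    @commit_lock.synchronize { [@value, @version] }
  end

  # Transactional write: succeeds only if nobody has committed since
  # seen_version was read.
  def commit(new_value, seen_version)
    @commit_lock.synchronize do
      return false unless @version == seen_version
      @value = new_value
      @version += 1
      true
    end
  end
end

# Retry the whole read/process/write cycle until the commit succeeds.
def atomically(tvar)
  attempts = 0
  loop do
    attempts += 1
    value, version = tvar.read
    return attempts if tvar.commit(yield(value), version)
  end
end

variable_a = TVar.new(10)

# Block A and Block B both read the same version...
a_value, a_version = variable_a.read
b_value, b_version = variable_a.read

# ...and both process, but only the first transactional write commits.
a_ok = variable_a.commit(a_value * 2, a_version) # SUCCESS
b_ok = variable_a.commit(b_value + 1, b_version) # FAIL: the version moved on

# The failed block starts over: read, process, transactional write.
b_attempts = atomically(variable_a) { |value| value + 1 }

puts "A committed: #{a_ok}, B committed: #{b_ok}, final value: #{variable_a.value}"
```

The retry loop is where contention costs show up: the more blocks compete for the same variable, the more work gets thrown away and redone.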
The way Clojure intended
• Pros:
  • Easy to reason about
• Cons:
  • More resource contention = worse performance
  • In order to prevent contention, we must use immutable data structures, which makes it more difficult to reason about again
8. Bonus Model
• Functional Reactive Programming
Which Model is Best?
• Depends on the problem
• Depends on the language
More Info
• http://www.igvita.com/2010/12/02/concurrency-with-actors-goroutines-ruby/
• Dining Philosophers Problem (Wikipedia) - http://en.wikipedia.org/wiki/Dining_philosophers_problem
• Dining Philosophers Problem (Rosetta Code) - http://rosettacode.org/wiki/Dining_philosophers
Thanks! Any questions?