Posted: 09-Aug-2015
Category: Engineering
Uploaded by: patrick-charrier
Multithreading with Boost and TBB
Patrick Charrier, TU Darmstadt
Partner:
Multithreading in Computer Animation
Data-parallel programming: OpenMP, OpenCL
Task-parallel programming: Boost.Atomic, Boost.Thread, Boost.Lockfree, Intel Threading Building Blocks (TBB)
Why Boost.Thread?
Platform-independent abstraction over pthreads and the WinAPI, a modern C++ API, C++03 support, and syntax compatible with C++11 std::thread.
Boost.Thread Overview
Threads, mutexes, locks, condition variables, futures, extensions
Boost.Thread "Hello World"
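Notes: boost::thread::interrupt() requests interruption of the thread; the thread is actually interrupted (boost::thread_interrupted is thrown) the next time it reaches an interruption point. A boost::scoped_thread<> (template parameter CallableThread, default join_if_joinable) would join automatically at the end of its scope.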
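#include <boost/thread.hpp>
#include <cstddef>
#include <iostream>

void f()
{
    for (std::size_t i = 0; i < 100; ++i)
        std::cout << "Hello number " << i << std::endl;
}

int main()
{
    boost::thread thread(f); // start f() on a new thread
    thread.join();           // wait until f() has finished
    return 0;
}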
Boost.Thread this_thread
Sleeping, exit functions, interruption points
namespace boost {
namespace this_thread {
    thread::id get_id() noexcept;
    template <class Clock, class Duration>
    void sleep_until(const chrono::time_point<Clock, Duration>& abs_time);
    template <class Rep, class Period>
    void sleep_for(const chrono::duration<Rep, Period>& rel_time);

    template <typename Callable>
    void at_thread_exit(Callable func);     // EXTENSION

    void interruption_point();              // EXTENSION
    bool interruption_requested() noexcept; // EXTENSION
    bool interruption_enabled() noexcept;   // EXTENSION
    class disable_interruption;             // EXTENSION
    class restore_interruption;             // EXTENSION
    // ...
}
}
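Interruption points are a Boost extension with no direct counterpart in C++11 std::thread. As a rough illustration of the idea, here is a sketch that uses the standard library and a cooperative stop flag instead of boost::thread::interrupt() (a swapped-in technique; the function name run_until_stopped is hypothetical). The worker sleeps with std::this_thread::sleep_for() and checks the flag after each sleep, roughly where a Boost thread would hit an interruption point.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Counts how many loop iterations a worker completes before it is asked to stop.
// The std::atomic<bool> flag stands in for boost::thread::interrupt(); checking it
// after each sleep is a hand-written "interruption point".
int run_until_stopped(std::chrono::milliseconds run_time)
{
    std::atomic<bool> stop(false);
    int iterations = 0; // written only by the worker; read after join()

    std::thread worker([&] {
        do {
            ++iterations;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        } while (!stop.load()); // cooperative cancellation check
    });

    std::this_thread::sleep_for(run_time); // let the worker run for a while
    stop.store(true);                      // request cancellation
    worker.join();                         // join() makes reading iterations safe
    return iterations;
}
```

The exact count depends on the scheduler. With Boost, the flag would be replaced by calling interrupt() on the thread and catching boost::thread_interrupted inside the worker.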
Boost.Thread and Joe’s Bank Account
BankAccount JoesAccount;

void bankAgent()
{
    for (int i = 10; i > 0; --i) {
        //...
        JoesAccount.Deposit(500);
        //...
    }
}

void Joe()
{
    for (int i = 10; i > 0; --i) {
        //...
        JoesAccount.Withdraw(100);
        int myPocket = JoesAccount.GetBalance();
        std::cout << myPocket << std::endl;
        //...
    }
}

int main()
{
    //...
    boost::thread thread1(bankAgent); // start concurrent execution of bankAgent
    boost::thread thread2(Joe);       // start concurrent execution of Joe
    thread1.join();
    thread2.join();
    return 0;
}
Boost.Thread and Joe’s Bank Account
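Joe will have problems with his balance: Deposit() and Withdraw() perform unsynchronized read-modify-write operations on balance_, so concurrent updates can be lost (a data race).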
class BankAccount {
    int balance_;
public:
    BankAccount() : balance_(0) {}
    void Deposit(int amount) { balance_ += amount; }
    void Withdraw(int amount) { balance_ -= amount; }
    int GetBalance() { int b = balance_; return b; }
};
Boost.Thread, Joe's Bank Account and Internal Locks on Mutexes
class BankAccount {
    boost::mutex mtx_;
    int balance_;
public:
    BankAccount() : balance_(0) {}
    void Deposit(int amount) { mtx_.lock(); balance_ += amount; mtx_.unlock(); }
    void Withdraw(int amount) { mtx_.lock(); balance_ -= amount; mtx_.unlock(); }
    int GetBalance() { mtx_.lock(); int b = balance_; mtx_.unlock(); return b; }
};
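Boost.Thread, Joe's Bank Account and Internal Locks through Lock Guards
Lock guards simplify locking: the guard locks the mutex in its constructor and unlocks it in its destructor, even if an exception is thrown.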
class BankAccount {
    boost::mutex mtx_; // explicit mutex declaration
    int balance_;
public:
    BankAccount() : balance_(0) {}
    void Deposit(int amount) {
        boost::lock_guard<boost::mutex> guard(mtx_);
        balance_ += amount;
    }
    void Withdraw(int amount) {
        boost::lock_guard<boost::mutex> guard(mtx_);
        balance_ -= amount;
    }
    int GetBalance() {
        boost::lock_guard<boost::mutex> guard(mtx_);
        return balance_;
    }
};
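To check that the guards really remove the race, the sketch below drives the lock_guard version of BankAccount from two threads, as bankAgent() and Joe() do, and returns the final balance. It uses the C++11 counterparts std::thread, std::mutex and std::lock_guard so that it builds without Boost; the function name run_joes_month is made up for this example.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <thread>

// The lock_guard version of BankAccount, written with std:: types.
class BankAccount {
    std::mutex mtx_;
    int balance_ = 0;
public:
    void Deposit(int amount)  { std::lock_guard<std::mutex> guard(mtx_); balance_ += amount; }
    void Withdraw(int amount) { std::lock_guard<std::mutex> guard(mtx_); balance_ -= amount; }
    int GetBalance()          { std::lock_guard<std::mutex> guard(mtx_); return balance_; }
};

// Runs the bank agent and Joe concurrently and returns the final balance.
int run_joes_month()
{
    BankAccount joesAccount;
    std::thread agent([&] {
        for (int i = 10; i > 0; --i) joesAccount.Deposit(500);
    });
    std::thread joe([&] {
        for (int i = 10; i > 0; --i) {
            joesAccount.Withdraw(100);
            // Intermediate balances depend on thread interleaving.
            std::cout << joesAccount.GetBalance() << '\n';
        }
    });
    agent.join();
    joe.join();
    return joesAccount.GetBalance(); // 10 * 500 - 10 * 100
}
```

The printed intermediate balances may differ from run to run, but with every access guarded no update is lost, so the final balance is always 4000.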
Boost.Thread Mutex Types
Mutex type              Description
mutex                   plain mutex, no extras
timed_mutex             mutex whose lock attempts can time out
recursive_mutex         may be locked repeatedly by the thread that already owns it
recursive_timed_mutex   recursive mutex with timeout
shared_mutex            "fair" reader/writer mutex
upgrade_mutex           reader/writer mutex with upgradable ownership
null_mutex              "convenience" mutex whose operations do nothing
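The timeout variants differ from a plain mutex in that a lock attempt can give up. A minimal sketch using std::timed_mutex, the C++11 counterpart of boost::timed_mutex; the helper try_lock_while_held is hypothetical. The calling thread holds the mutex for the entire attempt, so the second thread's try_lock_for() always times out and returns false.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Returns whether a second thread manages to acquire the mutex within `timeout`
// while the calling thread keeps it locked for the whole attempt.
bool try_lock_while_held(std::chrono::milliseconds timeout)
{
    std::timed_mutex m;
    bool acquired = true;

    std::lock_guard<std::timed_mutex> hold(m); // this thread owns m from here on
    std::thread other([&] {
        acquired = m.try_lock_for(timeout);    // gives up after `timeout`
        if (acquired) m.unlock();
    });
    other.join();                              // m is still held by us here
    return acquired;
}
```

A plain mutex offers only lock() (wait forever) or try_lock() (do not wait at all); try_lock_for() sits in between.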
Boost.Atomic as an alternative to Mutexes
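Atomics guarantee that additions, subtractions and similar operations on a single variable are indivisible. The memory order argument controls how other memory accesses may be reordered around them.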
boost::atomic<int> balance(0);
thread1: balance.fetch_add(500, boost::memory_order_release);
thread2: balance.fetch_sub(100, boost::memory_order_release);
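A runnable version of the two-thread scenario above, sketched with C++11 std::atomic (which mirrors boost::atomic) so that it builds without Boost; run_atomic_balance is a hypothetical name. The sketch keeps the default, sequentially consistent ordering instead of the slide's memory_order_release, which is the safer choice when in doubt. Note that an atomic protects a single variable only: it cannot make a multi-step transaction such as "withdraw, then charge a fee" atomic.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Joe's balance as a single atomic integer: no mutex, yet no lost updates,
// because fetch_add/fetch_sub are indivisible read-modify-write operations.
int run_atomic_balance()
{
    std::atomic<int> balance(0);
    std::thread agent([&] { for (int i = 0; i < 10; ++i) balance.fetch_add(500); });
    std::thread joe([&]   { for (int i = 0; i < 10; ++i) balance.fetch_sub(100); });
    agent.join();
    joe.join();
    return balance.load(); // 10 * 500 - 10 * 100
}
```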
Boost.Thread and External Locks
Joe is charged 2 euros per credit card withdrawal.
The whole transaction should be "atomic". Why is his account suddenly locked?
void ATMWithdrawal(BankAccount& acct, int sum)
{
    // assumes mtx_ is accessible here (it is private in the class above)
    boost::lock_guard<boost::mutex> guard(acct.mtx_);
    acct.Withdraw(sum);
    acct.Withdraw(2);
}
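Joe's account locks up because ATMWithdrawal() already holds acct.mtx_ when Withdraw() tries to lock the same non-recursive mutex again, so the thread waits for itself (for std::mutex this is even undefined behavior). One way out, offered here as an alternative rather than as the approach the slides go on to present, is a recursive mutex that the owning thread may lock again. The sketch uses std::recursive_mutex and makes mtx_ public purely so that ATMWithdrawal() can lock it from outside.

```cpp
#include <cassert>
#include <mutex>

// BankAccount whose mutex may be locked several times by the same thread.
class BankAccount {
    int balance_ = 0;
public:
    std::recursive_mutex mtx_; // public only for ATMWithdrawal() below
    void Deposit(int amount)  { std::lock_guard<std::recursive_mutex> guard(mtx_); balance_ += amount; }
    void Withdraw(int amount) { std::lock_guard<std::recursive_mutex> guard(mtx_); balance_ -= amount; }
    int GetBalance()          { std::lock_guard<std::recursive_mutex> guard(mtx_); return balance_; }
};

// Withdrawal plus fee as one transaction: the outer guard keeps other threads
// out between the two calls; the inner guards in Withdraw() re-lock the same
// recursive mutex without deadlocking.
void ATMWithdrawal(BankAccount& acct, int sum)
{
    std::lock_guard<std::recursive_mutex> guard(acct.mtx_);
    acct.Withdraw(sum);
    acct.Withdraw(2);
}
```

Exposing the mutex couples callers to the account's locking scheme; that drawback is what motivates treating locks as permits.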
Boost.Thread - Lock Types
Lock type                Description
strict_lock<Lockable>    non-copyable lock that owns the mutex for its whole lifetime
unique_lock<Lockable>    lock with many features (try, timeout, deferred locking)
upgrade_lock<Lockable>   upgrade ownership that can be converted to exclusive (unique) ownership
reverse_lock<Lockable>   unlocks on construction, relocks on destruction
spinlock                 busy-waits instead of sleeping (Boost.Atomic)
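The spinlock row can be made concrete with a hand-written lock built on std::atomic_flag. This is an illustrative sketch, not a Boost class; the names Spinlock and count_with_spinlock are hypothetical. Because Spinlock provides lock() and unlock(), it works with std::lock_guard like any mutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// A minimal spinlock: lock() busy-waits instead of putting the thread to sleep.
class Spinlock {
    std::atomic_flag locked_ = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        // test_and_set returns the previous value: keep spinning while another
        // thread holds the lock. acquire pairs with the release in unlock().
        while (locked_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield(); // hint to the scheduler while spinning
        }
    }
    void unlock() { locked_.clear(std::memory_order_release); }
};

// `threads` threads each increment a shared counter under the spinlock.
long count_with_spinlock(int threads, int increments_per_thread)
{
    Spinlock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<Spinlock> guard(lock);
                ++counter;
            }
        });
    for (auto& w : workers) w.join();
    return counter;
}
```

Spinning only pays off when critical sections are very short; otherwise a mutex that puts waiting threads to sleep wastes less CPU time.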
Boost.Thread - Locks as Permits
Observation: Whenever an account is modified, a lock must be acquired.
Rephrased: Whenever an account is modified, the modification must be permitted.
The helper class externally_locked<T, Lockable> treats locks as permits.
Boost.Thread - externally_locked<T,Lockable>
class BankAccount {
    int balance_;
public:
    BankAccount() : balance_(0) {}
    void Deposit(int amount) { balance_ += amount; }
    void Withdraw(int amount) { balance_ -= amount; }
};

class AccountManager : public basic_lockable_adapter<thread_mutex>
{
public:
    typedef basic_lockable_adapter<thread_mutex> lockable_base_type;
    AccountManager()
        : checkingAcct_(*this)
        , savingsAcct_(*this)
    {}
    inline void Checking2Savings(int amount);
    inline void AMoreComplicatedChecking2Savings(int amount);
private:
    externally_locked<BankAccount, AccountManager> checkingAcct_;
    externally_locked<BankAccount, AccountManager> savingsAcct_;
};
Boost.Thread - externally_locked<T,Lockable>
externally_locked<T, Lockable> can protect arbitrary objects of type T.
Access requires a strict_lock<Lockable>!
void AccountManager::Checking2Savings(int amount)
{
    strict_lock<AccountManager> guard(*this);
    checkingAcct_.get(guard).Withdraw(amount);
    savingsAcct_.get(guard).Deposit(amount);
}
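The permit idea does not depend on Boost. The sketch below is a hypothetical, much simplified stand-in for externally_locked written with the standard library: ExternallyLocked, Deposit2Checking, Savings and Total are names made up for this example, and the permit is a std::unique_lock whose ownership is checked at run time with assert rather than enforced through a strict_lock type.

```cpp
#include <cassert>
#include <mutex>

// The wrapped object is reachable only through get(), and get() demands a
// lock on the right mutex as a "permit".
template <class T>
class ExternallyLocked {
    T obj_;
    std::mutex& mtx_;
public:
    explicit ExternallyLocked(std::mutex& mtx) : obj_(), mtx_(mtx) {}
    T& get(std::unique_lock<std::mutex>& permit)
    {
        // Checked at run time in this sketch.
        assert(permit.owns_lock() && permit.mutex() == &mtx_);
        return obj_;
    }
};

class BankAccount { // no internal locking at all
    int balance_ = 0;
public:
    void Deposit(int amount)  { balance_ += amount; }
    void Withdraw(int amount) { balance_ -= amount; }
    int GetBalance() const    { return balance_; }
};

class AccountManager {
    std::mutex mtx_; // one mutex guards both accounts
    ExternallyLocked<BankAccount> checkingAcct_{mtx_};
    ExternallyLocked<BankAccount> savingsAcct_{mtx_};
public:
    void Deposit2Checking(int amount)
    {
        std::unique_lock<std::mutex> permit(mtx_);
        checkingAcct_.get(permit).Deposit(amount);
    }
    void Checking2Savings(int amount)
    {
        std::unique_lock<std::mutex> permit(mtx_); // a single permit covers both steps
        checkingAcct_.get(permit).Withdraw(amount);
        savingsAcct_.get(permit).Deposit(amount);
    }
    int Savings()
    {
        std::unique_lock<std::mutex> permit(mtx_);
        return savingsAcct_.get(permit).GetBalance();
    }
    int Total()
    {
        std::unique_lock<std::mutex> permit(mtx_);
        return checkingAcct_.get(permit).GetBalance() + savingsAcct_.get(permit).GetBalance();
    }
};
```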
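Boost.Thread - Lock Options
Specify in the lock constructor:
x_lock(Lockable& m, boost::adopt_lock_t)
x_lock(Lockable& m, boost::defer_lock_t)
x_lock(Lockable& m, boost::try_to_lock_t)
No tag (default): lock immediately. Adopt lock: the calling thread has already locked the mutex; the lock object only takes over ownership. Defer lock: do not lock yet; the lock must be acquired later. Try lock: the lock is acquired only if the mutex is currently unlocked, without blocking. (unique_lock accepts all three tags; lock_guard accepts only adopt_lock.)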
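The three tags exist with the same meaning in the C++11 standard library (std::defer_lock, std::try_to_lock, std::adopt_lock), so the sketch uses those; the Account struct and the function names are made up for illustration. It also shows the typical use of defer_lock: pairing it with a function that locks several mutexes at once, std::lock here (boost::lock in Boost).

```cpp
#include <cassert>
#include <mutex>

struct Account {
    std::mutex mtx;
    int balance = 0;
};

// defer_lock: construct both locks without locking, then let std::lock()
// acquire them together with a deadlock-avoidance algorithm, so two transfers
// in opposite directions cannot deadlock. Assumes &from != &to.
void transfer(Account& from, Account& to, int amount)
{
    std::unique_lock<std::mutex> lock_from(from.mtx, std::defer_lock);
    std::unique_lock<std::mutex> lock_to(to.mtx, std::defer_lock);
    std::lock(lock_from, lock_to);
    from.balance -= amount;
    to.balance += amount;
}

// adopt_lock: the mutex is ALREADY locked by this thread; the guard does not
// lock it again, it only takes over responsibility for unlocking it.
int read_balance_adopting(Account& acct)
{
    acct.mtx.lock();
    std::lock_guard<std::mutex> guard(acct.mtx, std::adopt_lock);
    return acct.balance;
}

// try_to_lock: attempt once, never block.
bool try_deposit(Account& acct, int amount)
{
    std::unique_lock<std::mutex> lock(acct.mtx, std::try_to_lock);
    if (!lock.owns_lock()) return false; // another thread holds the mutex
    acct.balance += amount;
    return true;
}
```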
Boost.Thread and another deadlock
#include <boost/thread.hpp>

boost::mutex mut;
bool data_ready;

void process_data();

void wait_for_data_to_process()
{
    boost::unique_lock<boost::mutex> lock(mut);
    // Deadlock: this loop busy-waits while holding mut, so the other thread
    // can never lock mut to set data_ready.
    while (!data_ready) {}
    process_data();
}

void retrieve_data();
void prepare_data();

void prepare_data_for_processing()
{
    retrieve_data();
    prepare_data();
    {
        boost::lock_guard<boost::mutex> lock(mut); // blocks forever if the waiter holds mut
        data_ready = true;
    }
}
Boost.Thread and Condition Variables
Goal: wait until the data is ready. On wait(), the condition variable unlocks the mutex and sleeps until it is notified; it relocks the mutex before wait() returns.
#include <boost/thread.hpp>

boost::condition_variable cond;
boost::mutex mut;
bool data_ready;

void process_data();

void wait_for_data_to_process()
{
    boost::unique_lock<boost::mutex> lock(mut);
    while (!data_ready) {  // the loop also guards against spurious wake-ups
        cond.wait(lock);   // unlocks mut while sleeping, relocks before returning
    }
    process_data();
}
Boost.Thread and Condition Variables
Condition variables are more than atomics! At the time of this talk, atomics offered no way to put a waiting thread to sleep and wake it up again.
notify_one() wakes one waiting thread; notify_all() wakes all waiting threads.
void retrieve_data();
void prepare_data();

void prepare_data_for_processing()
{
    retrieve_data();
    prepare_data();
    {
        boost::lock_guard<boost::mutex> lock(mut);
        data_ready = true;
    }
    cond.notify_one(); // wake one thread blocked in cond.wait()
}
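Putting both halves together gives a complete producer/consumer pair. The sketch below uses the standard-library equivalents std::condition_variable and std::mutex; the integers data and processed and the function run_condition_variable_demo are hypothetical stand-ins for the slides' retrieve_data(), prepare_data() and process_data().

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

std::condition_variable cond;
std::mutex mut;
bool data_ready = false;
int data = 0;      // stands in for retrieve_data()/prepare_data()
int processed = 0; // stands in for process_data()

void wait_for_data_to_process()
{
    std::unique_lock<std::mutex> lock(mut);
    while (!data_ready)   // guards against spurious wake-ups
        cond.wait(lock);  // atomically unlocks mut and sleeps; relocks on wake-up
    processed = data * 2;
}

void prepare_data_for_processing()
{
    {
        std::lock_guard<std::mutex> lock(mut);
        data = 21;
        data_ready = true;
    }
    cond.notify_one();    // mutex already released, the woken thread can proceed
}

int run_condition_variable_demo()
{
    data_ready = false;
    processed = 0;
    std::thread consumer(wait_for_data_to_process);
    std::thread producer(prepare_data_for_processing);
    producer.join();
    consumer.join();
    return processed;
}
```

If the producer happens to run first, the consumer finds data_ready already true and never waits, which is why the flag is checked before calling wait().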
Boost.Thread and Futures
Suppose we want to run a computation asynchronously.
"The computation" is implemented in a single functor (function or function object).
That functor returns a single object of an arbitrary type (int, float, double, ...).
At some point in the future we need this value.
Boost.Thread and Packaged Tasks
#include <boost/thread/future.hpp>
#include <boost/thread/thread.hpp>
#include <cassert>
#include <iostream>

int calculate_the_answer_to_life_the_universe_and_everything()
{
    return 42;
}

int main()
{
    boost::packaged_task<int> pt(calculate_the_answer_to_life_the_universe_and_everything);
    boost::unique_future<int> fi = pt.get_future();

    boost::thread task(boost::move(pt)); // launch task on a thread

    fi.wait(); // wait for it to finish

    assert(fi.is_ready());
    assert(fi.has_value());
    assert(!fi.has_exception());
    assert(fi.get_state() == boost::future_state::ready);

    int answer = fi.get(); // call get() once and keep the value
    assert(answer == 42);
    std::cout << answer << std::endl;

    task.join();
    std::cin.get(); // keep the console window open
    return 0;
}
Boost.Thread and Promises
Promises are like packaged tasks, except that there is no functor whose return value becomes the result.
Instead, a value is simply set from somewhere else, at some point in the future that is not known in advance.
#include <boost/thread/future.hpp>
#include <cassert>

int main()
{
    boost::promise<int> pi;
    boost::unique_future<int> fi;
    fi = pi.get_future();

    pi.set_value(42);

    assert(fi.is_ready());
    assert(fi.has_value());
    assert(!fi.has_exception());
    assert(fi.get_state() == boost::future_state::ready);
    assert(fi.get() == 42);

    return 0;
}
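In the example above the promise is fulfilled on the same thread, so the future is ready immediately. The point of a promise is that the value arrives from elsewhere. A sketch with std::promise and std::future, the C++11 counterparts of boost::promise and boost::unique_future, in which a second thread sets the value; the function name answer_from_another_thread is hypothetical.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

int answer_from_another_thread()
{
    std::promise<int> pi;
    std::future<int> fi = pi.get_future();

    // The promise is moved into the producer thread, which fulfils it later.
    std::thread producer([](std::promise<int> p) { p.set_value(42); }, std::move(pi));

    int answer = fi.get(); // blocks until the producer has called set_value()
    producer.join();
    return answer;
}
```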
Boost.Thread and Futures: The Future Is Now!
If the result is available immediately, returning an already-ready future is much cheaper than launching an asynchronous computation.
Note: boost::async() creates futures.
boost::unique_future<int> compute(int x)
{
    if (x == 0)
        return boost::make_ready_future(0);
    if (x < 0)
        return boost::make_ready_future<int>(std::logic_error("Error"));
    boost::unique_future<int> f1 = boost::async(calculate_the_answer_to_life_the_universe_and_everything);
    return f1;
}
Boost.Thread and Futures: Futures Can Be Piped!
Not yet, but perhaps in the future (Boost 1.56)?
using namespace boost;

std::string interpret_the_answer_to_life_the_universe_and_everything(unique_future<int> fi)
{
    if (42 == fi.get())
        return "I do not understand.";
    else
        return "That is even more confusing.";
}

int main()
{
    unique_future<int> f1 = boost::async(calculate_the_answer_to_life_the_universe_and_everything);
    unique_future<std::string> f2 = f1.then(interpret_the_answer_to_life_the_universe_and_everything);

    std::cout << f2.get() << std::endl;
    return 0;
}
Boost.Thread - Advanced Topics
Locking multiple mutexes at once (boost::lock), thread groups, one-time initialization, barriers, synchronized values, other mutex types, other lock types
Boost.Lockfree
Provides a number of lock-free data structures: queue, single-producer/single-consumer queue (SPSC), stack.
Not just concurrent, but really lock-free!
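To see why such a queue can work without any lock, here is a minimal single-producer/single-consumer ring buffer built only on std::atomic. It is an illustrative sketch of the general technique, not Boost.Lockfree's implementation; the names SpscRing and spsc_sum are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Exactly one thread may call push() and exactly one other thread may call
// pop(). No locks: head_ is written only by the consumer, tail_ only by the producer.
template <class T, std::size_t Capacity>
class SpscRing {
    T buf_[Capacity + 1];              // one slot stays empty to tell "full" from "empty"
    std::atomic<std::size_t> head_{0}; // next slot to read
    std::atomic<std::size_t> tail_{0}; // next slot to write
    static std::size_t next(std::size_t i) { return (i + 1) % (Capacity + 1); }
public:
    bool push(const T& v)
    {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (next(t) == head_.load(std::memory_order_acquire)) return false; // full
        buf_[t] = v;
        tail_.store(next(t), std::memory_order_release); // publish the element
        return true;
    }
    bool pop(T& out)
    {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false; // empty
        out = buf_[h];
        head_.store(next(h), std::memory_order_release); // hand the slot back
        return true;
    }
};

// The producer sends 1..n, the consumer (calling thread) sums them.
long long spsc_sum(int n)
{
    SpscRing<int, 1024> q;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i)
            while (!q.push(i)) {} // spin while the ring is full
    });
    long long sum = 0;
    for (int received = 0; received < n; ) {
        int v;
        if (q.pop(v)) { sum += v; ++received; }
    }
    producer.join();
    return sum;
}
```

Because each index has a single writer, no compare-and-swap is needed; a multi-producer queue such as boost::lockfree::queue requires considerably more machinery.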
Boost.Lockfree Queue Example
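#include <boost/atomic.hpp>
#include <boost/lockfree/queue.hpp>

const int iterations = 100000;
boost::atomic<int> producer_count(0);
boost::atomic<int> consumer_count(0);

// Without a compile-time capacity<> the queue needs an initial node count.
boost::lockfree::queue<int> queue(128);

void producer(void)
{
    for (int i = 0; i != iterations; ++i) {
        int value = ++producer_count;
        while (!queue.push(value))
            ;
    }
}

boost::atomic<bool> done(false); // set to true once all producers have finished

void consumer(void)
{
    int value;
    while (!done) {
        while (queue.pop(value))
            ++consumer_count;
    }

    while (queue.pop(value)) // drain what is left after the producers stopped
        ++consumer_count;
}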
Boost.Lockfree SPSC Queue Example
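#include <boost/atomic.hpp>
#include <boost/lockfree/spsc_queue.hpp>

const int iterations = 100000;
int producer_count = 0; // touched only by the single producer
boost::atomic<int> consumer_count(0);

boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024> > spsc_queue;

void producer(void)
{
    for (int i = 0; i != iterations; ++i) {
        int value = ++producer_count;
        while (!spsc_queue.push(value))
            ;
    }
}

boost::atomic<bool> done(false); // set to true once the producer has finished

void consumer(void)
{
    int value;
    while (!done) {
        while (spsc_queue.pop(value))
            ++consumer_count;
    }

    while (spsc_queue.pop(value)) // drain what is left
        ++consumer_count;
}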
Boost.Lockfree Stack Example
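#include <boost/atomic.hpp>
#include <boost/lockfree/stack.hpp>

const int iterations = 100000;
boost::atomic<int> producer_count(0);
boost::atomic<int> consumer_count(0);

boost::lockfree::stack<int> stack(128);

void producer(void)
{
    for (int i = 0; i != iterations; ++i) {
        int value = ++producer_count;
        while (!stack.push(value))
            ;
    }
}

boost::atomic<bool> done(false); // set to true once all producers have finished

void consumer(void)
{
    int value;
    while (!done) {
        while (stack.pop(value))
            ++consumer_count;
    }

    while (stack.pop(value)) // drain what is left
        ++consumer_count;
}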
Intel Threading Building Blocks
Not as low-level as Boost.Thread, and offers more data structures than Boost.Lockfree.
Not strictly task-parallel: it also supports data parallelism, but is more flexible than OpenMP or OpenCL.
A specialist library for "difficult cases".
Intel TBB – parallel_do
The number of elements/iterations is not known in advance: cook until done.
New elements can be inserted dynamically and concurrently, even by the functor itself.
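#include <list>
#include <tbb/parallel_do.h>

// Item and Foo(Item&) are placeholders defined elsewhere.
class ApplyFoo {
public:
    void operator()(Item& item) const { Foo(item); }
};

// The list is taken by non-const reference so that each element binds to Item&.
void ParallelApplyFooToList(std::list<Item>& list)
{
    tbb::parallel_do(list.begin(), list.end(), ApplyFoo());
}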
Intel TBB – pipeline
Parallel assembly lines: multiple sequential steps for each element.
void RunPipeline(int ntoken, FILE* input_file, FILE* output_file)
{
    // Stage 1: read input serially, in order
    tbb::filter_t<void, TextSlice*> f1(tbb::filter::serial_in_order, MyInputFunc(input_file));
    // Stage 2: transform slices in parallel
    tbb::filter_t<TextSlice*, TextSlice*> f2(tbb::filter::parallel, MyTransformFunc());
    // Stage 3: write output serially, in the original order
    tbb::filter_t<TextSlice*, void> f3(tbb::filter::serial_in_order, MyOutputFunc(output_file));
    tbb::filter_t<void, void> f = f1 & f2 & f3;
    tbb::parallel_pipeline(ntoken, f); // at most ntoken items in flight at once
}
Intel TBB – Concurrent Containers
concurrent_hash_map<Key, T, HashCompare>, concurrent_vector<T>, concurrent_queue<T>
// A concurrent hash table that maps strings to ints.
typedef concurrent_hash_map<string, int, MyHashCompare> StringTable;

void CountOccurrences()
{
    // Construct empty table.
    StringTable table;

    // Put occurrences into the table
    parallel_for(blocked_range<string*>(Data, Data + N, 1000), Tally(table));

    // Display the occurrences
    for (StringTable::iterator i = table.begin(); i != table.end(); ++i)
        printf("%s %d\n", i->first.c_str(), i->second);
}
Intel TBB – Mutual Exclusion
Much like Boost.Thread mutexes
Mutex              Scalable      Fair          Recursive  Long wait
mutex              OS dependent  OS dependent  no         blocks
recursive_mutex    OS dependent  OS dependent  yes        blocks
spin_mutex         no            no            no         yields
queuing_mutex      yes           yes           no         yields
spin_rw_mutex      no            no            no         yields
queuing_rw_mutex   yes           yes           no         yields
null_mutex         moot          yes           yes        never
null_rw_mutex      moot          yes           yes        never
Intel TBB – Atomic Operations
Much like Boost.Atomic
atomic<int> x(1);
int old = x.fetch_and_add<release>(-1);
Memory orders (description; operations that use it by default):
acquire: operations after the atomic operation never move before it. Default for reads.
release: operations before the atomic operation never move after it. Default for writes.
sequentially consistent: operations on either side never move across the atomic operation, and all sequentially consistent atomic operations have a single global order. Default for fetch_and_store, fetch_and_add, compare_and_swap.
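The acquire and release orders are easiest to see in the classic message-passing pattern. The sketch uses C++11 std::atomic rather than tbb::atomic, with the same meaning for acquire and release; the function name message_passing is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// The writer fills `payload` (a plain, non-atomic int) and then sets `ready`
// with release semantics. A reader that observes ready == true with acquire
// semantics is guaranteed to see the payload written before it.
int message_passing()
{
    int payload = 0;
    std::atomic<bool> ready(false);

    std::thread writer([&] {
        payload = 42;                                 // ordinary write...
        ready.store(true, std::memory_order_release); // ...cannot move below this store
    });

    int seen = 0;
    std::thread reader([&] {
        while (!ready.load(std::memory_order_acquire)) {} // later reads cannot move above this load
        seen = payload;                                    // therefore sees 42, never 0
    });

    writer.join();
    reader.join();
    return seen;
}
```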
Intel TBB – Cache Aligned Allocation
Cache efficiency is very important on today's hardware, often more important than the number of instructions an algorithm executes.
cache_aligned_allocator places each allocation on a cache-line boundary, which avoids false sharing between objects used by different threads.
Cache-aligned allocator example:
#include <tbb/cache_aligned_allocator.h>
#include <vector>

std::vector<int, tbb::cache_aligned_allocator<int> > v;
Intel TBB – The Task Scheduler
Reasons for task-based programming: matching parallelism to available resources, faster task startup and shutdown, more efficient evaluation order, improved load balancing, higher-level thinking.
But: more potential for errors!
Intel TBB – The Task Scheduler Task Example
class FibTask : public task {
public:
    const long n;
    long* const sum;
    FibTask(long n_, long* sum_) : n(n_), sum(sum_) {}
    task* execute() { // Overrides virtual function task::execute
        if (n < CutOff) {
            *sum = SerialFib(n); // CutOff and SerialFib are defined elsewhere
        } else {
            long x, y;
            FibTask& a = *new(allocate_child()) FibTask(n - 1, &x);
            FibTask& b = *new(allocate_child()) FibTask(n - 2, &y);
            // Set ref_count to "two children plus one for the wait".
            set_ref_count(3);
            // Start b running.
            spawn(b);
            // Start a running and wait for all children (a and b).
            spawn_and_wait_for_all(a);
            // Do the sum
            *sum = x + y;
        }
        return NULL;
    }
};
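For comparison, the same divide-and-conquer shape written with std::async instead of the TBB task scheduler (a swapped-in technique, not TBB; the value of CutOff here is an assumption). std::launch::async may start a new OS thread for every task, which is exactly the startup overhead the scheduler slide warns about; the cutoff keeps the number of tasks small.

```cpp
#include <cassert>
#include <future>

const long CutOff = 16; // below this, plain recursion is cheaper than a new task

long SerialFib(long n) { return n < 2 ? n : SerialFib(n - 1) + SerialFib(n - 2); }

// Above the cutoff, fib(n-1) runs as a separate task while the current
// thread computes fib(n-2); get() then waits for the child, like
// spawn_and_wait_for_all() in FibTask.
long ParallelFib(long n)
{
    if (n < CutOff) return SerialFib(n);
    std::future<long> x = std::async(std::launch::async, ParallelFib, n - 1);
    long y = ParallelFib(n - 2);
    return x.get() + y;
}
```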
Intel TBB – Design Patterns
The hidden strength of TBB lies in the many parallel design patterns it supports and documents: Agglomeration, Element-wise, Odd-even communication, Wavefront, Compare and Swap Loop, and more.
Questions?
Thank you!
I look forward to your feedback!
Boost.Thread barrier
#include <boost/atomic.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <boost/thread/barrier.hpp>
#include <boost/thread/scoped_thread.hpp>
#include <iostream>

boost::atomic<int> current(0);
boost::mutex io_mutex;

void thread_fun(boost::barrier& cur_barrier)
{
    current.fetch_add(1, boost::memory_order_relaxed);
    cur_barrier.wait(); // block until all three threads have arrived

    boost::lock_guard<boost::mutex> locker(io_mutex);
    std::cout << current << std::endl;
}

int main()
{
    boost::barrier bar(3);
    boost::scoped_thread<> thr1(boost::thread(boost::bind(&thread_fun, boost::ref(bar))));
    boost::scoped_thread<> thr2(boost::thread(boost::bind(&thread_fun, boost::ref(bar))));
    boost::scoped_thread<> thr3(boost::thread(boost::bind(&thread_fun, boost::ref(bar))));

    std::cin.get();
    return 0;
}