I want to run two concurrent threads, both of which call a function (e.g. void channel()), and that function needs access to a few objects, namely a std::random_device, a PRNG engine, and two std::uniform_int_distribution objects. However, I am not sure where I should define each of these objects. Three different options come to mind:
Globally
Inside the channel function as thread_local (this one seems more natural to me)
In the body of each thread function (i.e. thread1 and thread2) as automatic variables.
Which one is the most efficient and least problematic? And is there any other option that might be better?
Here's an MRE:
#include <chrono>
#include <random>
#include <thread>
#include <fmt/core.h>
#include <fmt/std.h>

using std::chrono_literals::operator""ms;

// globally defined
// extern thread_local std::random_device rand_dev;
// extern thread_local std::random_device rand_dev { };

void
channel( /*std::mt19937& mtgen*/ )
{
    // inside the function as thread_local
    thread_local std::random_device rand_dev { };
    thread_local std::mt19937 mtgen { rand_dev( ) };
    thread_local std::uniform_int_distribution uniform_50_50_dist { 1, 2 };

    if ( uniform_50_50_dist( mtgen ) == 1 )
    {
        thread_local std::uniform_int_distribution<size_t> uniform_dist_for_bit_select { 0, 10 };
        const auto random_index { uniform_dist_for_bit_select( mtgen ) };
        std::this_thread::sleep_for( 100ms );
        fmt::print( "thread: {} produced {}\n", std::this_thread::get_id( ),
                    random_index );
    }
}

void thread1( )
{
    // inside the actual thread as automatic storage duration
    // std::random_device rand_dev { };
    // std::mt19937 mtgen { rand_dev( ) };
    for ( size_t count { }; count < 10; ++count )
    {
        channel( /*mtgen*/ );
    }
}

void thread2( )
{
    // inside the actual thread as automatic storage duration
    // std::random_device rand_dev { };
    // std::mt19937 mtgen { rand_dev( ) };
    for ( size_t count { }; count < 5; ++count )
    {
        channel( /*mtgen*/ );
    }
}

int main( )
{
    std::jthread th1 { thread1 };
    std::jthread th2 { thread2 };
}
Now I should mention that I want each thread to have a separate std::random_device and a separate std::mt19937 engine so that I don't have to share them between threads (otherwise I'd have to synchronize access, e.g. with mutexes). That said, sharing a single std::random_device should be possible by locking a mutex with a std::scoped_lock before accessing it via its operator().
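For illustration, here is a minimal sketch of that shared-seeding idea (my own addition, not part of the MRE; the helper name make_seeded_engine is made up): a single process-wide std::random_device, protected by a mutex, is used only to seed each thread's private engine, so the engines themselves never need synchronization:

#include <mutex>
#include <random>

// Sketch: one shared std::random_device, guarded by a mutex, used only
// for seeding; each thread keeps its own unsynchronized engine.
std::mt19937 make_seeded_engine( )
{
    static std::random_device shared_rand_dev { };
    static std::mutex rand_dev_mutex;
    std::scoped_lock lock { rand_dev_mutex };
    return std::mt19937 { shared_rand_dev( ) };
}

// possible usage inside channel():
//     thread_local std::mt19937 mtgen { make_seeded_engine( ) };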
Related
When trying to learn threads, most examples suggest that I should make the std::mutex, std::condition_variable and std::queue global when sharing data between two different threads, and that works perfectly fine for simple scenarios. However, in real-world, bigger applications this may soon get complicated, as I may lose track of the global variables, and since I am using C++ this does not seem to be an appropriate option (maybe I am wrong).
My question is: if I have a producer/consumer problem and I want to put both in separate classes, since they will be sharing data I would need to pass them the same mutex and queue. How do I share these two variables between them without defining them as globals, and what is the best practice for creating the threads?
Here is a working example of my basic code using global variables.
#include <chrono>
#include <iostream>
#include <thread>
#include <mutex>
#include <queue>
#include <condition_variable>

std::queue<int> buffer;
std::mutex mtx;
std::condition_variable cond;
const int MAX_BUFFER_SIZE = 50;

class Producer
{
public:
    void run(int val)
    {
        while(true) {
            std::unique_lock locker(mtx);
            cond.wait(locker, []() {
                return buffer.size() < MAX_BUFFER_SIZE;
            });
            buffer.push(val);
            std::cout << "Produced " << val << std::endl;
            val--;
            locker.unlock();
            // std::this_thread::sleep_for(std::chrono::seconds(2));
            cond.notify_one();
        }
    }
};

class Consumer
{
public:
    void run()
    {
        while(true) {
            std::unique_lock locker(mtx);
            cond.wait(locker, []() {
                return buffer.size() > 0;
            });
            int val = buffer.front();
            buffer.pop();
            std::cout << "Consumed " << val << std::endl;
            locker.unlock();
            std::this_thread::sleep_for(std::chrono::seconds(1));
            cond.notify_one();
        }
    }
};

int main()
{
    std::thread t1(&Producer::run, Producer(), MAX_BUFFER_SIZE);
    std::thread t2(&Consumer::run, Consumer());
    t1.join();
    t2.join();
    return 0;
}
Typically, you want to have synchronisation objects packaged alongside the resource(s) they are protecting.
A simple way to do that in your case would be a class that contains the buffer, the mutex, and the condition variable. All you really need is to share a reference to one of those to both the Consumer and the Producer.
Here's one way to go about it while keeping most of your code as-is:
class Channel {
    std::queue<int> buffer;
    std::mutex mtx;
    std::condition_variable cond;

    // Since we know `Consumer` and `Producer` are the only entities
    // that will ever access buffer, mtx and cond, it's better to
    // not provide *any* public (direct or indirect) interface to
    // them, and use `friend` to grant access.
    friend class Producer;
    friend class Consumer;

public:
    // ...
};

class Producer {
    Channel* chan_;
public:
    explicit Producer(Channel* chan) : chan_(chan) {}
    // ...
};

class Consumer {
    Channel* chan_;
public:
    explicit Consumer(Channel* chan) : chan_(chan) {}
    // ...
};

int main() {
    Channel channel;

    std::thread t1(&Producer::run, Producer(&channel), MAX_BUFFER_SIZE);
    std::thread t2(&Consumer::run, Consumer(&channel));
    t1.join();
    t2.join();
}
However (thanks for the prompt, @Ext3h), a better way to go about this would be to encapsulate access to the synchronisation objects as well, i.e. keep them hidden in the class. At that point, Channel becomes what is commonly known as a synchronised queue.
Here's what I'd subjectively consider a nicer-looking implementation of your example code, with a few misc improvements thrown in as well:
#include <cassert>
#include <iostream>
#include <thread>
#include <mutex>
#include <queue>
#include <optional>
#include <condition_variable>

template<typename T>
class Channel {
    static constexpr std::size_t default_max_length = 10;

public:
    using value_type = T;

    explicit Channel(std::size_t max_length = default_max_length)
        : max_length_(max_length) {}

    std::optional<value_type> next() {
        std::unique_lock locker(mtx_);
        cond_.wait(locker, [this]() {
            return !buffer_.empty() || closed_;
        });
        if (buffer_.empty()) {
            assert(closed_);
            return std::nullopt;
        }
        value_type val = buffer_.front();
        buffer_.pop();
        cond_.notify_one();
        return val;
    }

    void put(value_type val) {
        std::unique_lock locker(mtx_);
        cond_.wait(locker, [this]() {
            return buffer_.size() < max_length_;
        });
        buffer_.push(std::move(val));
        cond_.notify_one();
    }

    void close() {
        std::scoped_lock locker(mtx_);
        closed_ = true;
        cond_.notify_all();
    }

private:
    std::size_t max_length_;
    std::queue<value_type> buffer_;
    bool closed_ = false;
    std::mutex mtx_;
    std::condition_variable cond_;
};

void producer_main(Channel<int>& chan, int val) {
    // A bounded loop that terminates once val goes negative. (Note: an
    // infinite loop with no observable side effects would be Undefined
    // Behavior, so prefer loops with a clear termination condition.)
    while (val >= 0) {
        chan.put(val);
        std::cout << "Produced " << val << std::endl;
        val--;
    }
}

void consumer_main(Channel<int>& chan) {
    bool running = true;
    while (running) {
        auto val = chan.next();
        if (!val) {
            running = false;
            continue;
        }
        std::cout << "Consumed " << *val << std::endl;
    }
}

int main()
{
    // You are responsible for ensuring the channel outlives both threads.
    Channel<int> channel;

    std::thread producer_thread(producer_main, std::ref(channel), 13);
    std::thread consumer_thread(consumer_main, std::ref(channel));

    producer_thread.join();
    channel.close();
    consumer_thread.join();

    return 0;
}
I'm rephrasing this question more simply, and with a simpler MCVE after a prior version didn't gain much traction.
I had formed the impression that after main() ends, all that happens is global object destruction, then static object destruction, in that order.
I'd never considered the implications of other "stuff" happening in the period between the end of main() and the end of the process. But I've recently been working with Linux timers, and experimentally it appears that timers' callbacks can be invoked during this "late phase" of a process, after main() exits and even after static global objects have been destroyed.
Question: Is that assessment correct? Can a timer callback be invoked after static global objects have been destroyed?
I'd never given much thought to what happens this "late" in a process' lifetime. I suppose I'd naively assumed "something" "prevented" "stuff happening" after main() exited.
Question: My timer callback uses a static global object -- the intent being that the object would "always" be around, regardless of when the callback was invoked. But if timer callbacks can be invoked after static global objects have been destroyed, then that strategy isn't safe. Is there a well-known/correct way to handle this: i.e. prevent timer callbacks from ever accessing invalid objects/memory?
The code below creates "many" timers set to expire 2 seconds in the future, whose callback references a static global object. main() exits right around the middle of when the timer callbacks are being invoked. The couts show that the static global object is destroyed while timer callbacks are still being invoked.
// main.cpp
#include <algorithm>
#include <cerrno>
#include <csignal>
#include <cstdint>   // uint32_t
#include <cstdlib>   // rand, srand
#include <cstring>
#include <ctime>     // time, timer_create, timer_settime
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <unistd.h>

using namespace std;

static int tmp = ((srand( time( NULL ) )), 0);

class Foo { // Encapsulates a random-sized, random-content string.
public:
    Foo() {
        uint32_t size = (rand() % 24) + 1;
        std::generate_n( std::back_inserter( s_ ), size, randChar );
    }
    void operator=( const Foo& other ) { s_ = other.s_; }
    std::string s_;
private:
    static char randChar() { return ('a' + rand() % 26); }
};

class GlobalObj { // Encapsulates a map<timer_t, Foo*>.
public:
    ~GlobalObj() { std::cout << __FUNCTION__ << std::endl; }

    Foo* getFoo( const timer_t& timer ) {
        Foo* ret = NULL;
        {
            std::lock_guard<std::mutex> l( mutex_ );
            std::map<timer_t, Foo*>::iterator i = map_.find( timer );
            if ( map_.end() != i ) {
                ret = i->second;
                map_.erase( i );
            }
        }
        return ret;
    }

    void setFoo( const timer_t& timer, Foo* foo ) {
        std::lock_guard<std::mutex> l( mutex_ );
        map_[timer] = foo;
    }

private:
    std::mutex mutex_;
    std::map<timer_t, Foo*> map_;
};

static GlobalObj global_obj; // static global GlobalObj instance.

void osTimerCallback( union sigval sv ) { // The timer callback
    timer_t* timer = (timer_t*)(sv.sival_ptr);
    if ( timer ) {
        Foo* foo = global_obj.getFoo(*timer);
        if ( foo ) {
            cout << "timer[" << *timer << "]: " << foo->s_ << endl;
            delete foo;
        }
        delete timer;
    }
}

bool createTimer( const struct timespec& when ) { // Creates an armed timer.
    timer_t* timer = new timer_t;
    struct sigevent se;
    static clockid_t clock_id =
#ifdef CLOCK_MONOTONIC
        CLOCK_MONOTONIC;
#else
        CLOCK_REALTIME;
#endif

    memset( &se, 0, sizeof se );
    se.sigev_notify          = SIGEV_THREAD;
    se.sigev_value.sival_ptr = timer;
    se.sigev_notify_function = osTimerCallback;

    if ( timer_create( clock_id, &se, timer ) ) {
        cerr << "timer_create() err " << errno << " " << strerror( errno ) << endl;
        return false;
    }

    {
        struct itimerspec its;
        memset( &its, 0, sizeof its );
        its.it_value.tv_sec  = when.tv_sec;
        its.it_value.tv_nsec = when.tv_nsec;
        if ( timer_settime( *timer, 0, &its, NULL ) ) {
            cerr << "timer_settime err " << errno << " " << strerror( errno ) << endl;
            return false;
        }
        global_obj.setFoo( *timer, new Foo );
    }

    return true;
}

int main( int argc, char* argv[] ) { // Creates many armed timers, then exits.
    static const struct timespec when = { 2, 0 };
    for ( uint32_t i = 0; i < 100; ++i ) {
        createTimer( when );
    }
    usleep( 2000010 );
    return 0;
}
Example error:
$ g++ --version && g++ -g ./main.cpp -lrt && ./a.out
g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
timer[timer[~GlobalObj0x55b34c17bd700x55b34c17be60
]: gx
*** Error in `./a.out': double free or corruption (fasttop): 0xtimer[0x55b34c17bf50]: wsngolhdjvhx
]: npscgelwujjfp
Aborted
Note that the error mentions "double free"; the code above has two delete statements, but removing them does not seem to impact the reproducibility of the problem. I believe the error message is a red herring caused by accessing invalidated memory.
Increasing the usleep() in main() enough to allow all timer callback invocations to occur before the static global object is destroyed results in consistently successful execution.
No, there is no magic preventing a timer from firing after the end of main.
The common way to prevent things like this from happening in C++ is to create a small resource-owning class for each type of resource that needs to be manually released. See RAII and the rule of three/five/zero.
The foundation for such a class could look like this:
#include <cerrno>    // errno
#include <csignal>   // sigevent
#include <cstring>   // std::strerror
#include <ctime>     // timer_create, timer_delete
#include <stdexcept> // std::runtime_error
#include <string>    // std::string
#include <utility>   // std::exchange

class Timer {
public:
    Timer(clockid_t clockid, sigevent& sev) {
        if(timer_create(clockid, &sev, &timerid))
            throw std::runtime_error(std::string("timer_create: ") +
                                     std::strerror(errno));
    }

    // rule of 5
    Timer(const Timer&) = delete;            // no copy construction
    Timer(Timer&& rhs) :                     // move construction ok
        timerid(std::exchange(rhs.timerid, nullptr)) {}

    Timer& operator=(const Timer&) = delete; // no copy assignment
    Timer& operator=(Timer&& rhs) {          // move assignment ok
        if(this != &rhs) {
            if(timerid) timer_delete(timerid);
            timerid = std::exchange(rhs.timerid, nullptr);
        }
        return *this;
    }

    ~Timer() {
        if(timerid) timer_delete(timerid);
    }

private:
    timer_t timerid;
};
You can now store your Timers in a container and they will be properly deleted when the container goes out of scope.
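For example, a minimal usage sketch (my own, assuming a POSIX system; link with -lrt on older glibc). SIGEV_NONE and the unarmed timers are used only to keep the example inert:

#include <csignal>
#include <ctime>
#include <vector>

// Assumes the Timer class above. Every timer is deleted when `timers`
// goes out of scope at the end of main(), before static destruction.
int main()
{
    std::vector<Timer> timers;
    sigevent se {};
    se.sigev_notify = SIGEV_NONE; // no notification; keeps the sketch inert
    for(int i = 0; i < 100; ++i)
        timers.emplace_back(CLOCK_REALTIME, se);
} // all timer_delete() calls happen here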
Applying this approach whenever you have to deal with one of the create/delete pairs commonly found in C APIs usually limits the number of surprises like the one you got.
Also read about the Static Initialization Order Fiasco to avoid other potential pitfalls.
Note: This implementation makes use of the fact that timer_t is a pointer type on my system; I don't know if that's always the case.
On Ubuntu, I have a shared library mylibrary.so with a function AlphaFunction. I want to load this function in C++ using dlopen, and then call it in two different threads. However, this gives me run-time errors, presumably because the two threads are both trying to access the same memory where the function is stored.
The library itself controls a robot arm via USB, and the actual run-time error I get is LIBUSB_ERROR_NO_DEVICE, returned by the Write operation.
I know how to use std::atomic for dealing with shared variables, but what about a shared function?
For example:
#include <dlfcn.h> // dlopen, dlsym
#include <thread>

void Foo(int (*FooFunction)())
{
    while(true)
    {
        FooFunction();
    }
}

void Bar(int (*BarFunction)())
{
    while(true)
    {
        BarFunction();
    }
}

int main()
{
    void* api_handle = dlopen("mylibrary.so", RTLD_NOW|RTLD_GLOBAL);
    int (*MoveRobot)() = (int (*)()) dlsym(api_handle, "Move");

    std::thread t1(Foo, MoveRobot);
    std::thread t2(Bar, MoveRobot);

    t1.join();
    t2.join();
    return 0;
}
I've had a look at the comments. Here's a solution that covers all concerns:
the robot library is not thread safe, and
all calls to the robot library must be on the same thread
This answer proposes a solution in which a third thread is started up which acts as the robot request marshaller. The other threads post tasks to this thread's queue, which are executed one at a time, with the result of the call being returned via a future on which the caller can wait.
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <future>
#include <functional>

// these definitions here just to make the example compile
#define RTLD_NOW 1
#define RTLD_GLOBAL 2
extern "C" void* dlopen(const char*, int);
extern "C" void* dlsym(void*, const char*);

struct RobotCaller final
{
    RobotCaller()
    {
        _library_handle = dlopen("mylibrary.so", RTLD_NOW|RTLD_GLOBAL);
        _Move = (int (*)()) dlsym(_library_handle, "Move");
        // caution - thread starts. do not derive from this class
        start();
    }

    void start()
    {
        _robot_thread = std::thread([this]{
            consume_queue();
        });
    }

    ~RobotCaller() {
        if (_robot_thread.joinable()) {
            std::unique_lock<std::mutex> lock(_queue_mutex);
            _should_quit = true;
            lock.unlock();
            _queue_condition.notify_all();
            _robot_thread.join();
        }
        // close library code goes here
    }

    std::future<int> Move()
    {
        return queue_task(_Move);
    }

private:

    void consume_queue() {
        for(std::unique_lock<std::mutex> lock(_queue_mutex) ; !_should_quit ; lock.lock()) {
            _queue_condition.wait(lock, [this]{
                return _should_quit || (!_task_queue.empty());
            });
            if (!_task_queue.empty()) {
                auto task = std::move(_task_queue.front());
                _task_queue.pop();
                lock.unlock();
                task();
            }
        }
    }

    std::future<int> queue_task(int (*f)())
    {
        std::packaged_task<int()> task(f);
        auto fut = task.get_future();
        std::unique_lock<std::mutex> lock(_queue_mutex);
        _task_queue.push(std::move(task));
        lock.unlock();
        _queue_condition.notify_one(); // wake the marshalling thread
        return fut;
    }

private:
    // library management
    void* _library_handle = nullptr;
    int (*_Move)() = nullptr;

    // queue management
    std::thread _robot_thread;
    std::queue<std::packaged_task<int()>> _task_queue;
    bool _should_quit = false;
    std::mutex _queue_mutex;
    std::condition_variable _queue_condition;
};

void Foo(std::function<std::future<int>()> FooFunction)
{
    while(true)
    {
        // marshal the call onto the robot queue and wait for a result
        auto result = FooFunction().get();
    }
}

void Bar(std::function<std::future<int>()> BarFunction)
{
    while(true)
    {
        // marshal the call onto the robot queue and wait for a result
        auto result = BarFunction().get();
    }
}

int main()
{
    RobotCaller robot_caller;

    std::thread t1(Foo, std::bind(&RobotCaller::Move, &robot_caller));
    std::thread t2(Bar, std::bind(&RobotCaller::Move, &robot_caller));

    t1.join();
    t2.join();
    return 0;
}
I find that Visual Studio 2012 makes the std::mutex copy constructor private, so I think a mutex can only be passed by reference or pointer. I tested both, and to my surprise the pointer style passes, while the reference style is rejected by the compiler with:
"error C2248: 'std::mutex::mutex' : cannot access private member declared in class 'std::mutex'". So the compiler assumes that I am trying to copy the std::mutex, but in fact I pass it by reference! Does anyone have experience with this? I list my code here:
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>

struct SharedMemory
{
public:
    int s;
    std::mutex mutex;

public:
    SharedMemory( int s = 1 ) : s(s) {}

    void write( int s )
    {
        mutex.lock();
        this->s = s;
        mutex.unlock();
    }

    int read()
    {
        int tmp;
        mutex.lock();
        tmp = this->s;
        mutex.unlock();
        return tmp;
    }

    void print()
    {
        std::cout << read() << std::endl;
    }
};

void modify( SharedMemory& sm, int i ) // must use a pointer to the shared memory
{
    //sm->write(i);
    sm.write(i);
}

int main( int argc, char* argv[] )
{
    SharedMemory sm;
    SharedMemory& tmp = sm;
    std::vector< std::thread > vec;

    for( int i = 0; i < 10; ++i )
    {
        vec.push_back( std::thread( modify, sm, i ) );
        sm.print();
    }

    for( auto& it : vec ) it.join();
    return 0;
}
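For context, a likely explanation and a sketch of the usual fix (my addition, not part of the original post): std::thread copies its arguments into the new thread's storage, so std::thread( modify, sm, i ) tries to copy sm, and copying SharedMemory would require copying its std::mutex. Wrapping the argument in a std::reference_wrapper via std::ref makes the thread constructor copy a cheap wrapper instead, so modify() receives a genuine reference:

#include <functional> // std::ref
#include <iostream>
#include <thread>
#include <vector>

// Sketch reusing SharedMemory and modify() from above: std::ref(sm)
// passes a reference_wrapper, which std::thread copies cheaply.
int main()
{
    SharedMemory sm;
    std::vector< std::thread > vec;

    for( int i = 0; i < 10; ++i )
    {
        vec.push_back( std::thread( modify, std::ref( sm ), i ) );
        sm.print();
    }

    for( auto& t : vec ) t.join();
    return 0;
}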
Can someone give me a TBB example of how to:
set the maximum number of active threads.
execute tasks that are independent from each other and presented in the form of a class, not static functions.
Here's a couple of complete examples, one using parallel_for, the other using parallel_for_each.
Update 2014-04-12: These show what I'd consider to be a pretty old fashioned way of using TBB now; I've added a separate answer using parallel_for with a C++11 lambda.
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/task_scheduler_init.h"
#include <iostream>
#include <vector>
struct mytask {
mytask(size_t n)
:_n(n)
{}
void operator()() {
for (int i=0;i<1000000;++i) {} // Deliberately run slow
std::cerr << "[" << _n << "]";
}
size_t _n;
};
struct executor
{
executor(std::vector<mytask>& t)
:_tasks(t)
{}
executor(executor& e,tbb::split)
:_tasks(e._tasks)
{}
void operator()(const tbb::blocked_range<size_t>& r) const {
for (size_t i=r.begin();i!=r.end();++i)
_tasks[i]();
}
std::vector<mytask>& _tasks;
};
int main(int,char**) {
tbb::task_scheduler_init init; // Automatic number of threads
// tbb::task_scheduler_init init(2); // Explicit number of threads
std::vector<mytask> tasks;
for (int i=0;i<1000;++i)
tasks.push_back(mytask(i));
executor exec(tasks);
tbb::parallel_for(tbb::blocked_range<size_t>(0,tasks.size()),exec);
std::cerr << std::endl;
return 0;
}
and
#include "tbb/parallel_for_each.h"
#include "tbb/task_scheduler_init.h"
#include <iostream>
#include <vector>
struct mytask {
mytask(size_t n)
:_n(n)
{}
void operator()() {
for (int i=0;i<1000000;++i) {} // Deliberately run slow
std::cerr << "[" << _n << "]";
}
size_t _n;
};
template <typename T> struct invoker {
void operator()(T& it) const {it();}
};
int main(int,char**) {
tbb::task_scheduler_init init; // Automatic number of threads
// tbb::task_scheduler_init init(4); // Explicit number of threads
std::vector<mytask> tasks;
for (int i=0;i<1000;++i)
tasks.push_back(mytask(i));
tbb::parallel_for_each(tasks.begin(),tasks.end(),invoker<mytask>());
std::cerr << std::endl;
return 0;
}
Both compile on a Debian/Wheezy (g++ 4.7) system with g++ tbb_example.cpp -ltbb (then run with ./a.out)
(See this question for replacing that "invoker" thing with a std::mem_fun_ref or boost::bind).
Here's a more modern use of parallel_for with a lambda; compiles and runs on Debian/Wheezy with g++ -std=c++11 tbb_example.cpp -ltbb && ./a.out:
#include "tbb/parallel_for.h"
#include "tbb/task_scheduler_init.h"
#include <iostream>
#include <vector>
struct mytask {
mytask(size_t n)
:_n(n)
{}
void operator()() {
for (int i=0;i<1000000;++i) {} // Deliberately run slow
std::cerr << "[" << _n << "]";
}
size_t _n;
};
int main(int,char**) {
//tbb::task_scheduler_init init; // Automatic number of threads
tbb::task_scheduler_init init(tbb::task_scheduler_init::default_num_threads()); // Explicit number of threads
std::vector<mytask> tasks;
for (int i=0;i<1000;++i)
tasks.push_back(mytask(i));
tbb::parallel_for(
tbb::blocked_range<size_t>(0,tasks.size()),
[&tasks](const tbb::blocked_range<size_t>& r) {
for (size_t i=r.begin();i<r.end();++i) tasks[i]();
}
);
std::cerr << std::endl;
return 0;
}
If you just want to run a couple of tasks concurrently, it might be easier to just use a tbb::task_group. Example taken from tbb:
#include "tbb/task_group.h"
using namespace tbb;
int Fib(int n) {
if( n<2 ) {
return n;
} else {
int x, y;
task_group g;
g.run([&]{x=Fib(n-1);}); // spawn a task
g.run([&]{y=Fib(n-2);}); // spawn another task
g.wait(); // wait for both tasks to complete
return x+y;
}
}
Note however that
Creating a large number of tasks for a single task_group is not scalable, because task creation becomes a serial bottleneck.
In those cases, use timday's examples with a parallel_for or alike.
1-
//!
//! Get the default number of threads
//!
int nDefThreads = tbb::task_scheduler_init::default_num_threads();
//!
//! Init the task scheduler with the wanted number of threads
//!
tbb::task_scheduler_init init(nDefThreads);
2-
Maybe, if your code permits it, the best way to run independent tasks with TBB is parallel_invoke. On the Intel Developer Zone blog there is a post explaining some cases of how helpful parallel_invoke can be.
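For reference, here is a minimal sketch of tbb::parallel_invoke (the task functions are placeholders of my own):

#include "tbb/parallel_invoke.h"
#include <iostream>

// Placeholder tasks; in real code these would be your independent jobs.
void task_a() { std::cout << "task A\n"; }
void task_b() { std::cout << "task B\n"; }
void task_c() { std::cout << "task C\n"; }

int main() {
    // Runs the three tasks potentially in parallel and waits for all
    // of them to finish before returning.
    tbb::parallel_invoke(task_a, task_b, task_c);
    return 0;
}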