I'm facing a rather basic problem that I'd like to solve in C++ (C++11 is OK) using only the standard library.
Assume a function "Message()" that can be called any number of times. Once it has not been called for a given time, I want to trigger an action. For example:
Message();
Message();
Message();
Message();
sleep(10); <-- the idle time triggers an action
Message();
Hope this makes sense.
The way I implemented this so far is using a combination of an async while loop and a condition variable which I pulse every time a message comes in. Once the wait for the CV times out I take action.
In pseudo-code:
void Message(){
unique_lock ul(M);
new_message = true; // a message arrived; the waiting loop checks for true
CV.notify();
}
...
future = std::async([]{
do {
unique_lock ul(M);
new_message = false;
got_message_within_time = CV.wait_for(ul, 20ms, new_message==true);
} while got_message_within_time;
// Got a timeout here...
time_to_take_action();
});
...
Message()
Message()
Message()
sleep(10)
I'm not convinced this is the most elegant solution out there. Does anyone have a better suggestion? All in all, the fundamental statement I want to implement is: "once I stop calling you, do something".
Any help/suggestion is welcome
Thanks !
I use a method similar to what you do, but I leave a monitoring thread running.
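#include <atomic>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std::chrono;
std::mutex lastMessageMutex; // guards lastMessage, which two threads touch
time_point<steady_clock> lastMessage(steady_clock::now());
std::atomic<bool> shutdown(false); // written by main, read by the monitor thread
void InactiveMessageLoop(){
while(!shutdown){
time_point<steady_clock> last;
{
std::lock_guard<std::mutex> lock(lastMessageMutex);
last = lastMessage; // copy under the lock, compare outside it
}
if((steady_clock::now() - last) > milliseconds(1000)){
std::cout << "Are you still there?" << std::endl;
}
std::this_thread::sleep_for(milliseconds(1000));
}
std::cout << "No hard feelings." << std::endl;
}
void Message(){
std::lock_guard<std::mutex> lock(lastMessageMutex);
lastMessage = steady_clock::now();
// do stuff;
}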
int main(){
std::thread MessageThread(InactiveMessageLoop);
Message();
Message();
std::this_thread::sleep_for(milliseconds(5000));
Message();
Message();
shutdown = true;
if(MessageThread.joinable()) MessageThread.join();
return 0;
}
Alternatively, you could look into sending a signal, but it's probably just as effective to interlace your code with calls to a message check function, since it's considered bad practice to put business logic inside a signal handler.
Edit: I added the join at the end.
I am currently using Boost 1.70 and was trying to implement an io_service loop with a custom call between each invoked handler, but I couldn't get it to work. After some examination, I began to suspect that multiple handlers are executed in one call of the "run_one" function. So I wrote some test code:
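#include <boost/asio/io_service.hpp>
#include <boost/asio/strand.hpp>
#include <boost/asio/post.hpp>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <mutex>
namespace { // opens the anonymous namespace that is closed after the class
class StrandPost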
{
private:
boost::asio::io_service service_;
boost::asio::io_service::work work_;
boost::asio::io_service::strand strand_;
std::thread module_thread_;
void Run() {
auto run_one = [this]() {
std::cout << " ---- Running one ----" << std::endl;
auto retval = service_.run_one();
return retval;
};
while (run_one());
std::cout << " ---- Ending run ----" << std::endl;
}
public:
StrandPost()
: service_()
, work_(service_)
, strand_(service_)
, module_thread_(&StrandPost::Run, this)
{}
~StrandPost() {
service_.stop();
if (module_thread_.joinable()) {
module_thread_.join();
}
}
void PlanOutput(const std::string& string) {
boost::asio::post(strand_,[string](){
std::cout << string <<std::endl;
});
// boost::asio::post(service_,[string](){
// std::cout << string <<std::endl;
// });
}
};
} // ----- end anonymous namespace -----
int main() {
StrandPost strand;
strand.PlanOutput("First message");
strand.PlanOutput("Second message");
strand.PlanOutput("Third message");
strand.PlanOutput("Fourth message");
std::this_thread::sleep_for(std::chrono::seconds(1));
return 0;
}
And the output of that code confirmed my theory, because it was:
---- Running one ----
First message
---- Running one ----
Second message
Third message
Fourth message
---- Running one ----
---- Ending run ----
When posting to the "io_service" directly, it works as expected, but when posting through the "strand", several handlers are executed in a single "run_one" call after the first one.
So the strand effectively merged several handlers into one.
My questions are:
Is this a bug, or is it intentional? Am I doing something wrong?
If this is a bug, has it been reported? I could not find a mention of it anywhere.
I'm fairly certain this is intentional. The strand itself is a queue of jobs that only one thread at a time can run.
When io_service::run_one runs, it causes the thread to run the strand's ready queue. I believe the 'only one' logic isn't passed through to the strand's processing loop. Think of it this way: the io_service is told to run one handler, but the strand's handler runs several jobs in sequence before returning.
If you're going to have your own io_service in your class, the best fix is to not use the strand at all and to post directly to the io_service. Then you'll have the behavior you desire.
This is, indeed, as intended. The strand_executor_service pops all ready handlers on the same strand:
void strand_executor_service::run_ready_handlers(implementation_type& impl)
{
// Indicate that this strand is executing on the current thread.
call_stack<strand_impl>::context ctx(impl.get());
// Run all ready handlers. No lock is required since the ready queue is
// accessed only within the strand.
boost::system::error_code ec;
while (scheduler_operation* o = impl->ready_queue_.front())
{
impl->ready_queue_.pop();
o->complete(impl.get(), ec, 0);
}
}
Draining the whole ready queue in one go saves re-queuing the strand for every handler, which can improve performance considerably.
Well, it's not that easy, since I also need a guarantee that handlers posted for execution from a given thread will be executed in the order of posting. Preserving order between posts from different threads is irrelevant; however, the order of posts from a given thread must be preserved, and as far as I know, "io_service" does not guarantee this. But thanks for the answer; looking further into the Boost implementation, it looks like you are completely right. – TStancek, 6 hours ago
io_service does have the ordering guarantees of a strand (in fact, the strand's guarantees derive from that). In your case, there is - by definition - only one thread, so everything on the service will be in an implicit strand (see Why do I need strand per connection when using boost::asio?).
Summary
You can do without the strand for the example code in your question.
If your situation is more involved and you need the one-by-one message processing control, you would do better to have a task queue that implements this explicitly, instead of depending on implementation details.
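To make the "explicit task queue" suggestion concrete, here is a minimal sketch using only the standard library. TaskQueue, post() and run_one() are hypothetical names chosen for illustration and are not Boost.Asio API. Unlike io_service::run_one, this run_one() does not block when the queue is empty; it simply returns false.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical minimal FIFO task queue; the names are illustrative only.
class TaskQueue {
public:
    // Tasks posted from one thread are stored, and later run, in posting order.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }

    // Runs at most one task; returns false if the queue was empty.
    bool run_one() {
        std::function<void()> task;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (tasks_.empty()) return false;
            task = std::move(tasks_.front());
            tasks_.pop_front();
        }
        task(); // run outside the lock so a task may post more work
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> tasks_;
};
```

A consumer loop such as while (queue.run_one()) { custom_call(); } then gets exactly one custom call per task, independent of how any library batches its handlers (custom_call is a placeholder for whatever needs to run in between).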
I have been working on an idea for a system where many workers are triggered on a regular basis by a central timer class. The part I'm concerned about here is a TriggeredWorker which, in a loop, uses the mutex and condition variable approach to wait to be told to do work. It has a method, trigger, that is called by a different thread to signal that work should be done. It is an abstract class that has to be subclassed for the actual work method to be implemented.
I have a test that shows that this mechanism works. However, as I increase the load by reducing the trigger interval, the test starts to fail. With a 20 microsecond delay between triggers, the test is 100% reliable. As I reduce the delay to 1 microsecond, I start to get failures: the count of work performed drops from the expected 1000 to values like 986, 933, 999, etc.
My questions are: (1) What is going wrong, and how can I capture it so I can report it or do something about it? (2) Is there a better approach I could use? I have to admit that my experience with C++ is limited to the last 3 months, although I have worked with other languages for several years.
Many thanks for reading...
Here are the key bits of code:
Triggered worker header file:
#ifndef TIMER_TRIGGERED_WORKER_H
#define TIMER_TRIGGERED_WORKER_H
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <plog/Log.h>
class TriggeredWorker {
private:
std::mutex mutex_;
std::condition_variable condVar_;
std::atomic<bool> running_{false};
std::atomic<bool> ready_{false};
void workLoop();
protected:
virtual void work() {};
public:
void start();
void stop();
void trigger();
};
#endif //TIMER_TRIGGERED_WORKER_H
Triggered worker implementation:
#include "TriggeredWorker.h"
void TriggeredWorker::workLoop() {
PLOGD << "workLoop started...";
while(true) {
std::unique_lock<std::mutex> lock(mutex_);
condVar_.wait(lock, [this]{
bool ready = this->ready_;
bool running = this->running_;
return ready || !running; });
this->ready_ = false;
if (!this->running_) {
break;
}
PLOGD << "Calling work()...";
work();
lock.unlock();
condVar_.notify_one();
}
PLOGD << "Worker thread completed.";
}
void TriggeredWorker::start() {
PLOGD << "Worker start...";
this->running_ = true;
auto thread = std::thread(&TriggeredWorker::workLoop, this);
thread.detach();
}
void TriggeredWorker::stop() {
PLOGD << "Worker stop.";
this->running_ = false;
}
void TriggeredWorker::trigger() {
PLOGD << "Trigger.";
std::unique_lock<std::mutex> lock(mutex_);
ready_ = true;
lock.unlock();
condVar_.notify_one();
}
and the test:
#include "catch.hpp"
#include "TriggeredWorker.h"
#include <atomic>
#include <chrono>
#include <thread>
TEST_CASE("Simple worker performs work when triggered") {
static std::atomic<int> twt_count{0};
class SimpleTriggeredWorker : public TriggeredWorker {
protected:
void work() override {
PLOGD << "Incrementing counter.";
twt_count.fetch_add(1);
}
};
SimpleTriggeredWorker worker;
worker.start();
for (int i = 0; i < 1000; i++) {
worker.trigger();
std::this_thread::sleep_for(std::chrono::microseconds(20));
}
std::this_thread::sleep_for(std::chrono::seconds(1));
CHECK(twt_count == 1000);
std::this_thread::sleep_for(std::chrono::seconds(1));
worker.stop();
}
What happens when worker.trigger() is called twice before workLoop acquires the lock? You lose one of those "triggers". A smaller time gap means a higher probability of test failure, because it raises the probability of multiple consecutive worker.trigger() calls before workLoop wakes up. Note that nothing guarantees that workLoop will acquire the lock after one worker.trigger() but before the next one, even when those calls happen one after another (i.e. not in parallel). This is governed by the OS scheduler, and we have no control over it.
Anyway, the core problem is that setting ready_ = true twice loses information, unlike incrementing an integer twice. So the simplest solution is to replace the bool with an int and do increments and decrements with == 0 checks. This construct is also known as a semaphore. A more advanced (and potentially better, especially when you need to pass some data to the worker) approach is to use a (bounded?) thread-safe queue. That depends on what exactly you are trying to achieve.
BTW 1: all your reads and updates, except for those in stop() (and start(), but that isn't really relevant), happen under the lock. I suggest you make stop() take the lock as well (it is rarely called anyway) and turn the atomics into non-atomics. At the moment the atomics add unnecessary overhead.
BTW 2: I suggest not using thread.detach(). You should store the std::thread object in TriggeredWorker and add a destructor that stops the worker and joins it. The worker and its thread are not independent beings, so dropping detach() makes your code safer (one should never die without the other).
I have a vector of objects, std::vector<fo>, and each fo object has a method start() that creates a thread specific to that object. Depending on a variable of the object, I want to put that thread to sleep.
For example, if my object is f1 and the variable is bool sleep = false;, then when sleep becomes true I want the thread to go to sleep.
I have tried this method, but it doesn't seem to work. I think the if
class fo {
public :
thread* t ;
bool bedient = false , spazieren = false;
void start(){
t = new thread([this]() {
while (!this->bedient){
if (this->spazieren == true){
std::this_thread::sleep_for(std::chrono::seconds(10));
this->spazieren = false ;
}
}
this->join();
});
}
void join(){
t->join(); delete t;
}
};
You have "generated" a lot of problems in your code:
1)
A change made to a plain variable in one thread is potentially invisible to any other thread. If you want other threads to see changes made by the first thread, you have to synchronize your memory. That can be done by using a std::mutex with lock and unlock around every change of the data, by using std::atomic variables, which do the synchronization themselves, or by a number of other methods. Please read a book about multithreaded programming!
2)
You try to join your own thread. That is not the correct usage at all. A thread can wait for another thread's execution to end by joining it, but it cannot join itself. That makes no sense!
3)
If you do not manually set the "sleep" variable, your thread runs a loop that simply does nothing. A good method to heat up your core and the planet ;)
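#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
class fo {
public :
std::thread* t=nullptr ; // prevent corrupt delete if no start() called!
std::atomic<bool> bedient{false} ; // brace-init: copy-initializing an atomic is ill-formed before C++17
std::atomic<bool> spazieren{false};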
void start()
{
t = new std::thread([this]()
{
while (!this->bedient)
{
if (this->spazieren == true)
{
std::cout << "We are going to sleep" << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(3));
this->spazieren = false ;
}
}
std::cout << "End loop" << std::endl;
});
}
~fo() { delete t; }
void join()
{
std::cout << "wait for thread ends" << std::endl;
t->join();
}
};
int main()
{
fo f1;
f1.start();
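std::this_thread::sleep_for(std::chrono::seconds(1)); // portable replacement for POSIX sleep()
f1.spazieren = true;
std::this_thread::sleep_for(std::chrono::seconds(1));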
f1.bedient = true;
f1.join();
}
BTW:
Please do not use using namespace std!
Your design seems to be problematic. Setting variables from external threads to control the execution of a thread is typically an abuse. You should think about your design again!
Manually using new/delete can result in memory leaks.
Creating something with a start() method that will later be deleted is mysterious. You should create all objects in the constructor.
I would try refactoring your code to use std::future instead of std::thread. Furthermore, there are a few issues which I believe you'll run into in the short term.
You shouldn't try to join a thread from within that same thread. That is, the code as you have it will never terminate. The lambda you've defined will attempt to call join; however, the lambda will never return, since it's waiting on a join that will only return when the lambda does. In other words, you're telling it to wait on itself. (Standard library implementations typically detect this and throw std::system_error with resource_deadlock_would_occur instead of hanging.)
You're revealing too much information about the functionality of your class to the outside world. I would suggest moving implementation details into a .cc file rather than putting them in the class declaration. Short of that, however, you're providing immediate access to your control variables spazieren and bedient. This is a problem because it complicates control flow and makes for weak abstraction.
Your bools are not atomic. If you modify them from a thread other than the one reading them, you have a data race, which is undefined behavior. In some environments the resulting failures are sporadic and very hard to debug.
Only sleeping when asked can be useful if you absolutely need to finish a task as soon as possible, but be aware that it's going to max out a core and, if deployed to the wrong environment, can cause major problems and slowdowns. I don't know what the end goal is for this program, but I would suggest replacing the yield in the following code example with a sleep for -some- period of time; 10 ms should be sufficient to avoid putting too much stress on your CPU.
Whether your thread is actively running is unclear with your implementation. I'd suggest an additional bool that indicates whether it's running, so you can properly decide what to do if start() is called more than once.
When this object is destroyed, it's going to crash if the thread is still running. You need to be sure to join before your destructor finishes running, too.
I would consider the following refactorings:
#include <atomic>
#include <chrono>
#include <future>
#include <memory>
#include <thread>
class fo
{
public:
~fo()
{
this->_bedient = true;
if (_workThread.valid()) _workThread.wait(); // wait() on a future that was never started is undefined behavior
}
void start()
{
_workThread = std::async(std::launch::async, [this]() -> bool
{
while(!this->_bedient)
{
if(true == this->_spazieren)
{
std::this_thread::sleep_for(std::chrono::seconds(10));
this->_spazieren = false;
}
else
{
std::this_thread::yield();
}
}
return true;
});
}
void ShouldSleep(bool shouldSleep)
{
this->_spazieren = shouldSleep;
}
void ShouldStop(bool shouldStop)
{
this->_bedient = shouldStop; // the loop runs while !_bedient, so true stops it
}
private:
std::future<bool> _workThread = {};
std::atomic<bool> _bedient{ false };
std::atomic<bool> _spazieren{ false };
};
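If the busy loop itself is the concern, the thread can block on a condition variable until there is something to do, which avoids both the spinning and the yield. The following is only a sketch under assumptions: SleepyWorker, request_nap() and naps_taken() are hypothetical names, and the "work" is reduced to counting naps.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical variant of fo: the thread blocks instead of spinning.
class SleepyWorker {
public:
    SleepyWorker() : thread_([this] { run(); }) {}

    ~SleepyWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join(); // always join before the members are destroyed
    }

    // Ask the worker thread to nap for the given time.
    void request_nap(std::chrono::milliseconds duration) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            nap_ = duration;
            nap_requested_ = true;
        }
        cv_.notify_one();
    }

    int naps_taken() {
        std::lock_guard<std::mutex> lock(mutex_);
        return naps_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            // Uses no CPU while there is nothing to do.
            cv_.wait(lock, [this] { return stop_ || nap_requested_; });
            if (stop_) break;
            nap_requested_ = false;
            // Interruptible nap: ends early if stop_ is set in the meantime.
            cv_.wait_for(lock, nap_, [this] { return stop_; });
            ++naps_;
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::chrono::milliseconds nap_{0};
    bool nap_requested_ = false;
    bool stop_ = false;
    int naps_ = 0;
    std::thread thread_; // declared last so it starts after the other members
};
```

Because the nap is a timed wait rather than sleep_for, destroying the object does not have to wait for a long nap to run its full length.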
I have this race condition in an audio playback class: every time I start playback I set keepRunning to true, and I set it to false when I stop.
The problem happens when I call stop() immediately after starting: the keepRunning flag is set to false and then reset to true again.
I could put a delay in stop(), but I don't think that's a very good solution. Should I use a condition variable to make stop() wait until keepRunning is true?
How would you normally solve this problem?
#include <iostream>
#include <thread>
using namespace std;
class AudioPlayer
{
bool keepRunning;
thread thread_play;
public:
AudioPlayer(){ keepRunning = false; }
~AudioPlayer(){ stop(); }
void play()
{
stop();
// keepRunning = true; // A: this works OK
thread_play = thread(&AudioPlayer::_play, this);
}
void stop()
{
keepRunning = false;
if (thread_play.joinable()) thread_play.join();
}
void _play()
{
cout << "Playing: started\n";
keepRunning = true; // B: this causes problem
while(keepRunning)
{
this_thread::sleep_for(chrono::milliseconds(100));
}
cout << "Playing: stopped\n";
}
};
int main()
{
AudioPlayer ap;
ap.play();
ap.play();
ap.play();
return 0;
}
Output:
$ ./test
Playing: started
(pause indefinitely...)
Here is my suggestion, combining many comments from below as well:
1) Briefly synchronized the keepRunning flag with a mutex so that it cannot be modified while a previous thread is still changing state.
2) Changed the flag to atomic_bool, as it is also modified while the mutex is not used.
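#include <atomic>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;
class AudioPlayer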
{
thread thread_play;
public:
AudioPlayer(){ }
~AudioPlayer()
{
keepRunning = false;
if ( thread_play.joinable() ) thread_play.join(); // join() on a thread that was never started throws
}
void play()
{
unique_lock<mutex> l(_mutex);
keepRunning = false;
if ( thread_play.joinable() )
thread_play.join();
keepRunning = true;
thread_play = thread(&AudioPlayer::_play, this);
}
void stop()
{
unique_lock<mutex> l(_mutex);
keepRunning = false;
}
private:
void _play()
{
cout << "Playing: started\n";
while ( keepRunning == true )
{
this_thread::sleep_for(chrono::milliseconds(10));
}
cout << "Playing: stopped\n";
}
atomic_bool keepRunning { false };
std::mutex _mutex;
};
int main()
{
AudioPlayer ap;
ap.play();
ap.play();
ap.play();
this_thread::sleep_for(chrono::milliseconds(100));
ap.stop();
return 0;
}
To answer the question directly.
Setting keepRunning = true at point A is synchronous with the main thread, but setting it at point B is asynchronous to the main thread.
Because it is asynchronous, the call to ap.stop() in the main thread (and the one in the destructor) might take place before point B is reached by the spawned thread, so the last thread runs forever.
You should also make keepRunning atomic; that makes sure the value is communicated between the threads correctly. Without some synchronization there's no guarantee of when, or if, the sub-thread will 'see' the value set by the main thread. You could also use a std::mutex.
Other answers don't like the .join() in stop(). I would say that's a design decision. You certainly need to make sure the thread has stopped before leaving main() (*), but that could take place in the destructor (as other answers suggest).
As a final note, the more conventional design wouldn't keep re-creating the 'play' thread but would wake and sleep a single thread. Creating a thread has an overhead, and the 'classic' model treats this as a producer/consumer pattern.
#include <iostream>
#include <thread>
#include <atomic>
#include <chrono>
class AudioPlayer
{
std::atomic<bool> keepRunning;
std::thread thread_play;
public:
AudioPlayer():keepRunning(false){
}
~AudioPlayer(){ stop(); }
void play()
{
stop();
keepRunning = true; // A: this works OK
thread_play = std::thread(&AudioPlayer::_play, this);
}
void stop()
{
keepRunning=false;
if (thread_play.joinable()){
thread_play.join();
}
}
void _play()
{
std::cout<<"Playing: started\n";
while(keepRunning)
{
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
std::cout<<"Playing: stopped\n";
}
};
int main()
{
AudioPlayer ap;
ap.play();
ap.play();
ap.play();
ap.stop();
return 0;
}
(*) You can also detach() but that's not recommended.
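The single long-lived thread mentioned in the final note could be sketched as follows. This is an illustration under assumptions, not the code from the answer: PersistentPlayer, is_playing() and chunks_played() are hypothetical names, and incrementing a counter stands in for producing one chunk of audio.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch: one thread for the player's whole lifetime, woken by play().
class PersistentPlayer {
public:
    PersistentPlayer() : thread_([this] { run(); }) {}

    ~PersistentPlayer() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    void play() { set_playing(true); }
    void stop() { set_playing(false); }

    bool is_playing() {
        std::lock_guard<std::mutex> lock(mutex_);
        return playing_;
    }

    unsigned long chunks_played() {
        std::lock_guard<std::mutex> lock(mutex_);
        return chunks_;
    }

private:
    void set_playing(bool value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            playing_ = value; // the flag only changes under the mutex
        }
        cv_.notify_one();
    }

    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!quit_) {
            // Idle: block until play() or shutdown.
            cv_.wait(lock, [this] { return quit_ || playing_; });
            while (playing_ && !quit_) {
                ++chunks_; // placeholder for producing one chunk of audio
                // Pace the loop, but wake at once on stop() or shutdown.
                cv_.wait_for(lock, std::chrono::milliseconds(10),
                             [this] { return quit_ || !playing_; });
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool playing_ = false;
    bool quit_ = false;
    unsigned long chunks_ = 0;
    std::thread thread_; // declared last so it starts after the other members
};
```

Since play() and stop() only flip a flag under the mutex, calling them in any order or in quick succession cannot leave a thread that missed its stop request. Real audio output would release the lock while producing a chunk, which this sketch does not show.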
First, what you have here is indeed the definition of a data race - one thread is writing to a non-atomic variable keepRunning and another is reading from it. So even if you uncomment the line in play, you'd still have a data race. To avoid that, make keepRunning a std::atomic<bool>.
Now, the fundamental problem is the lack of symmetry between play and stop - play does the actual work in a spawned thread, while stop does it in the main thread. To make the flow easier to reason about, increase symmetry:
set keepRunning in play, or
have play wait for the thread to be up and running and done with any setup (also eliminating the need for the if in stop).
As a side note, one way to handle cases where a flag is set and reset in possibly uneven order is to replace it with a counter. You then stall until you see the expected value, and only then apply the change (using CAS).
Ideally, you'd just set keepRunning before starting the thread (as in the commented-out line in your play() function). That's the neatest solution, and it skips the race completely.
If you want to be fancier, you can also use a condition_variable: signal the playing thread with notify_one or notify_all, and in the loop call wait_for with a duration of 0. If the result is not cv_status::timeout, you should stop playing.
Don't make stop pause and wait for state to settle down. That would work here, but is a bad habit to get into for later.
As noted in the comment, it is undefined behavior to write to a variable while simultaneously reading from it. atomic<bool> solves this, but wouldn't fix your race on its own, it just makes the reads and writes well defined.
I modified your program a bit and it works now. Let's discuss problems first:
Problem 1: using plain bool variable in 2 threads
Here both threads update the variable, which might lead to a race condition, because the outcome depends heavily on which thread comes first, and it can even end up in undefined behaviour. Undefined behaviour might occur in particular when a write from one thread is interrupted by another. Snps brought up links to the following SO answers:
When do I really need to use atomic<bool> instead of bool?
trap representation
In addition, I searched for whether a bool write can be interrupted on x86 platforms and came across this answer:
Can a bool read/write operation be not atomic on x86?
Problem 2: Caching as compiler optimization
Another problem is that variables are allowed to be cached. This means that the «playing thread» might cache the value of keepRunning and thus never terminate, or terminate only after a considerable amount of time. In previous C++ versions (98, 2003), the volatile modifier was the only construct for marking variables to prevent this caching optimization and force the compiler to always read the variable from its actual memory location. Note that volatile never guaranteed atomicity or ordering between threads. So once the «playing thread» enters the while loop, keepRunning might be cached and never re-read, or re-read only with considerable delay, no matter when stop() modifies it.
With C++11, the std::atomic template and the atomic_bool specialization were introduced to make such variables non-cacheable and to have them read and written in an uninterruptible manner, thus addressing Problems 1 & 2.
Side note: volatile and caching explained by Andrei Alexandrescu in the Dr. Dobbs article which addresses exactly this situation:
Caching variables in registers is a very valuable optimization that applies most of the time, so it would be a pity to waste it. C and C++ give you the chance to explicitly disable such caching. If you use the volatile modifier on a variable, the compiler won't cache that variable in registers — each access will hit the actual memory location of that variable.
Problem 3: stop was called before _play() function was even started
The problem here is that in a multi-threaded OS the scheduler grants each thread a time slice to run. If the thread can make progress and its time slice is not over, it continues to run. In the «main thread», all play() calls were executed even before the «play threads» started to run. Thus the object destruction took place before the _play() function started running, and that is where you set the variable keepRunning to true.
How I fixed this problem
We need to ensure that play() returns only once the _play() function has started running. A condition_variable helps here: play() blocks until _play() notifies it that it has started executing.
Here is the code:
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;
class AudioPlayer
{
atomic_bool keepRunning;
thread thread_play;
std::mutex mutex;
std::condition_variable play_started;
public:
AudioPlayer()
: keepRunning{false}
{}
~AudioPlayer(){ stop(); }
void play()
{
stop();
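std::unique_lock<std::mutex> lock(mutex);
thread_play = thread(&AudioPlayer::_play, this);
// The predicate guards against spurious wakeups and against a notify that arrives early.
play_started.wait(lock, [this]{ return keepRunning.load(); });
}
void stop()
{
keepRunning = false;
cout << "stop called" << endl;
if (thread_play.joinable()) thread_play.join();
}
void _play()
{
cout << "Playing: started\n";
{
std::lock_guard<std::mutex> guard(mutex); // pairs with the wait in play()
keepRunning = true; // B: safe now, because play() waits for this
}
play_started.notify_one();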
while(keepRunning)
{
this_thread::sleep_for(chrono::milliseconds(100));
}
cout << "Playing: stopped\n";
}
};
int main()
{
AudioPlayer ap;
ap.play();
ap.play();
ap.play();
return 0;
}
Your solution A is actually almost correct. It is still undefined behavior to have one thread read from a non-atomic variable that another thread is writing to, so keepRunning must be made an atomic<bool>. Once you do that, in conjunction with your fix from A, your code will be fine. That is because stop now has a correct postcondition: no thread will be active (in particular, no _play call) after it exits.
Note that no mutex is necessary. However, play and stop are not themselves thread-safe. As long as the client of AudioPlayer does not use the same AudioPlayer instance from multiple threads, that shouldn't matter.
I have a program which spawns multiple threads, each of which executes a long-running task. The main thread then waits for all worker threads to join, collects results, and exits.
If an error occurs in one of the workers, I want the remaining workers to stop gracefully, so that the main thread can exit shortly afterwards.
My question is how best to do this, when the implementation of the long-running task is provided by a library whose code I cannot modify.
Here is a simple sketch of the system, with no error handling:
void threadFunc()
{
// Do long-running stuff
}
void mainFunc()
{
std::vector<std::thread> threads;
for (int i = 0; i < 3; ++i) {
threads.push_back(std::thread(&threadFunc));
}
for (auto &t : threads) {
t.join();
}
}
If the long-running function executes a loop and I have access to the code, then
execution can be aborted simply by checking a shared "keep on running" flag at the top of each iteration.
std::mutex mutex;
bool error;
void threadFunc()
{
try {
for (...) {
{
std::unique_lock<std::mutex> lock(mutex);
if (error) {
break;
}
}
}
} catch (std::exception &) {
std::unique_lock<std::mutex> lock(mutex);
error = true;
}
}
Now consider the case when the long-running operation is provided by a library:
std::mutex mutex;
bool error;
class Task
{
public:
// Blocks until completion, error, or stop() is called
void run();
void stop();
};
void threadFunc(Task &task)
{
try {
task.run();
} catch (std::exception &) {
std::unique_lock<std::mutex> lock(mutex);
error = true;
}
}
In this case, the main thread has to handle the error, and call stop() on
the still-running tasks. As such, it cannot simply wait for each worker to
join() as in the original implementation.
The approach I have used so far is to share the following structure between
the main thread and each worker:
struct SharedData
{
std::mutex mutex;
std::condition_variable condVar;
bool error;
int running;
};
When a worker completes successfully, it decrements the running count. If
an exception is caught, the worker sets the error flag. In both cases, it
then calls condVar.notify_one().
The main thread then waits on the condition variable, waking up if either
error is set or running reaches zero. On waking up, the main thread
calls stop() on all tasks if error has been set.
This approach works, but I feel there should be a cleaner solution using some
of the higher-level primitives in the standard concurrency library. Can
anyone suggest an improved implementation?
Here is the complete code for my current solution:
// main.cpp
#include <chrono>
#include <condition_variable>
#include <exception>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>
#include "utils.h"
// Class which encapsulates long-running task, and provides a mechanism for aborting it
class Task
{
public:
Task(int tidx, bool fail)
: tidx(tidx)
, fail(fail)
, m_run(true)
{
}
void run()
{
static const int NUM_ITERATIONS = 10;
for (int iter = 0; iter < NUM_ITERATIONS; ++iter) {
{
std::unique_lock<std::mutex> lock(m_mutex);
if (!m_run) {
out() << "thread " << tidx << " aborting";
break;
}
}
out() << "thread " << tidx << " iter " << iter;
std::this_thread::sleep_for(std::chrono::milliseconds(100));
if (fail) {
throw std::exception();
}
}
}
void stop()
{
std::unique_lock<std::mutex> lock(m_mutex);
m_run = false;
}
const int tidx;
const bool fail;
private:
std::mutex m_mutex;
bool m_run;
};
// Data shared between all threads
struct SharedData
{
std::mutex mutex;
std::condition_variable condVar;
bool error;
int running;
SharedData(int count)
: error(false)
, running(count)
{
}
};
void threadFunc(Task &task, SharedData &shared)
{
try {
out() << "thread " << task.tidx << " starting";
task.run(); // Blocks until task completes or is aborted by main thread
out() << "thread " << task.tidx << " ended";
} catch (std::exception &) {
out() << "thread " << task.tidx << " failed";
std::unique_lock<std::mutex> lock(shared.mutex);
shared.error = true;
}
{
std::unique_lock<std::mutex> lock(shared.mutex);
--shared.running;
}
shared.condVar.notify_one();
}
int main(int argc, char **argv)
{
static const int NUM_THREADS = 3;
std::vector<std::unique_ptr<Task>> tasks(NUM_THREADS);
std::vector<std::thread> threads(NUM_THREADS);
SharedData shared(NUM_THREADS);
for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
const bool fail = (tidx == 1);
tasks[tidx] = std::make_unique<Task>(tidx, fail);
threads[tidx] = std::thread(&threadFunc, std::ref(*tasks[tidx]), std::ref(shared));
}
{
std::unique_lock<std::mutex> lock(shared.mutex);
// Wake up when either all tasks have completed, or any one has failed
shared.condVar.wait(lock, [&shared](){
return shared.error || !shared.running;
});
if (shared.error) {
out() << "error occurred - terminating remaining tasks";
for (auto &t : tasks) {
t->stop();
}
}
}
for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
out() << "waiting for thread " << tidx << " to join";
threads[tidx].join();
out() << "thread " << tidx << " joined";
}
out() << "program complete";
return 0;
}
Some utility functions are defined here:
// utils.h
#include <iostream>
#include <mutex>
#include <thread>
#ifndef UTILS_H
#define UTILS_H
#if __cplusplus <= 201103L
// Backport std::make_unique from C++14
#include <memory>
namespace std {
template<typename T, typename ...Args>
std::unique_ptr<T> make_unique(
Args&& ...args)
{
return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
} // namespace std
#endif // __cplusplus <= 201103L
// Thread-safe wrapper around std::cout
class ThreadSafeStdOut
{
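public:
ThreadSafeStdOut()
: m_lock(m_mutex)
{
}
// Needed before C++17 so that out() can return by value: the user-declared
// destructor suppresses the implicit move constructor
ThreadSafeStdOut(ThreadSafeStdOut &&) = default;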
~ThreadSafeStdOut()
{
std::cout << std::endl;
}
template <typename T>
ThreadSafeStdOut &operator<<(const T &obj)
{
std::cout << obj;
return *this;
}
private:
static std::mutex m_mutex;
std::unique_lock<std::mutex> m_lock;
};
std::mutex ThreadSafeStdOut::m_mutex;
// Convenience function for performing thread-safe output
ThreadSafeStdOut out()
{
return ThreadSafeStdOut();
}
#endif // UTILS_H
I've been thinking about your situation for some time, and this may be of some help to you. There are a few approaches you could try to achieve your goal, used separately or in combination. I will show at least the first one, since I'm still learning and trying to master template specialization and lambdas.
Using a Manager Class
Using Template Specialization Encapsulation
Using Lambdas.
Pseudo code of a Manager Class would look something like this:
class ThreadManager {
private:
std::unique_ptr<MainThread> mainThread_;
std::list<std::shared_ptr<WorkerThread>> lWorkers_; // List to hold finished workers
std::queue<std::shared_ptr<WorkerThread>> qWorkers_; // Queue to hold inactive and waiting threads.
std::map<unsigned, std::shared_ptr<WorkerThread>> mThreadIds_; // Map to associate a WorkerThread with an ID value.
std::map<unsigned, bool> mFinishedThreads_; // A map to keep track of finished and unfinished threads.
bool threadError_; // Not needed if using exception handling
public:
explicit ThreadManager( const MainThread& main_thread );
void shutdownThread( const unsigned& threadId );
void shutdownAllThreads();
void addWorker( const WorkerThread& worker_thread );
bool isThreadDone( const unsigned& threadId );
void spawnMainThread() const; // Method to start main thread's work.
void spawnWorkerThread( unsigned threadId, bool& error );
bool getThreadError( unsigned& threadId ); // Returns true if a thread encountered an error, and passes back the ID of that thread.
};
I used a bool to indicate that a thread failed only to keep the structure simple for demonstration; you can of course substitute exceptions, invalid unsigned values, or whatever you prefer.
A class of this sort would be used something like the following. Note that a class of this type is better made a Singleton, since you wouldn't want more than one manager instance when you are working with shared pointers.
SomeClass::SomeClass( ... ) {
// This class could contain a private static smart pointer of this Manager Class
// Initialize the smart pointer giving it new memory for the Manager Class and by passing it a pointer of the Main Thread object
threadManager_ = new ThreadManager( main_thread ); // Wouldn't actually use raw pointers here unless if you had a need to, but just shown for simplicity
}
SomeClass::addThreads( ... ) {
for ( unsigned u = 1; u <= threadCount; u++ ) {
threadManager_->addWorker( some_worker_thread );
}
}
SomeClass::someFunctionThatSpawnsThreads( ... ) {
threadManager_->spawnMainThread();
bool error = false;
for ( unsigned u = 1; u <= threadCount; u++ ) {
threadManager_->spawnWorkerThread( u, error );
if ( error ) { // This Thread Failed To Start, Shutdown All Threads
threadManager_->shutdownAllThreads();
}
}
// If all threads spawn successfully we can do a while loop here to listen if one fails.
unsigned threadId;
while ( threadManager_->getThreadError( threadId ) ) {
// If the function passed to this while loop returns true and we end up here, it will pass the id value of the failed thread.
// We can now go through a for loop and stop all active threads.
for ( unsigned u = threadId + 1; u <= threadCount; u++ ) {
threadManager_->shutdownThread( u );
}
// We have successfully shutdown all threads
break;
}
}
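The pseudo code above leaves the worker side open. As a rough, much-reduced sketch of the same idea using only the standard library, the following assumes that each worker can poll a stop flag, which is not the case for a library call you cannot modify. The class name `SimpleThreadManager` and its members `waitForAll()` and `joinAll()` are hypothetical; only `spawnWorkerThread` and `shutdownAllThreads` borrow their names from the pseudo code.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical, reduced manager: every worker receives a stop flag that it
// is expected to poll. A worker's ID is its position in the thread list.
class SimpleThreadManager {
public:
    using Work = std::function<void(const std::atomic<bool> &)>;

    SimpleThreadManager() : stop_(false), failedId_(-1) {}
    ~SimpleThreadManager() { shutdownAllThreads(); }

    // Starts a worker. If it throws, the first failing ID is remembered
    // and all other workers are asked to stop.
    void spawnWorkerThread(Work work)
    {
        const int id = static_cast<int>(threads_.size());
        threads_.emplace_back([this, id, work] {
            try {
                work(stop_);
            } catch (...) {
                int expected = -1;
                failedId_.compare_exchange_strong(expected, id);
                stop_ = true;
            }
        });
    }

    // Requests a stop and joins every worker.
    void shutdownAllThreads()
    {
        stop_ = true;
        joinAll();
    }

    // Joins every worker and returns the ID of the first one that failed,
    // or -1 if none did.
    int waitForAll()
    {
        joinAll();
        return failedId_;
    }

private:
    void joinAll()
    {
        for (auto &t : threads_) {
            if (t.joinable()) t.join();
        }
    }

    std::atomic<bool> stop_;
    std::atomic<int> failedId_;
    std::vector<std::thread> threads_;
};
```

With three workers where the second one throws, `waitForAll()` would return 1 once the other two have noticed the flag and returned.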
I like the manager class design; I have used it in other projects, and it comes in handy especially in a code base with many kinds of resources, such as a game engine with sprites, textures, audio files, maps, game items, etc. A manager class helps to keep track of and maintain all of those assets. The same concept can be applied to managing active, inactive, and waiting threads, so that one object knows how to handle and shut down all threads properly. If your code base and libraries support exceptions, including thread-safe exception handling, I would recommend using an ExceptionHandler instead of passing bools around for errors. A Logger class is also useful: it can write an explicit message to a log file and/or a console window, saying which function threw the exception and what caused it. A log message might look like this:
Exception Thrown: someFunctionNamedThis in ThisFile on Line# (x)
threadID 021342 failed to execute.
This way you can look at the log file and find out very quickly which thread is causing the exception, instead of passing bool variables around.
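As one possible illustration of reporting failures through exceptions rather than bools, here is a minimal sketch that captures each worker's exception in a `std::exception_ptr` and turns it into a log line after the join. The names `WorkerResult`, `runWorker`, `describeFailure` and `runAll` are hypothetical, and the message format only loosely imitates the log sample above.

```cpp
#include <exception>
#include <functional>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical record of how one worker ended.
struct WorkerResult {
    unsigned threadId;
    std::exception_ptr error;  // null if the worker finished normally
};

// Runs `work` and captures any exception instead of letting it escape the
// thread function (an escaping exception would call std::terminate).
void runWorker(unsigned id, const std::function<void()> &work, WorkerResult &result)
{
    result.threadId = id;
    try {
        work();
    } catch (...) {
        result.error = std::current_exception();
    }
}

// Turns a captured exception into a log line; returns "" if there was none.
std::string describeFailure(const WorkerResult &result)
{
    if (!result.error) return std::string();
    std::ostringstream msg;
    msg << "threadID " << result.threadId << " failed to execute: ";
    try {
        std::rethrow_exception(result.error);
    } catch (const std::exception &e) {
        msg << e.what();
    } catch (...) {
        msg << "unknown exception";
    }
    return msg.str();
}

// Runs one thread per job, joins them all, and returns the failure lines.
std::vector<std::string> runAll(const std::vector<std::function<void()>> &jobs)
{
    std::vector<WorkerResult> results(jobs.size());
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < jobs.size(); ++i) {
        threads.emplace_back(runWorker, i, std::cref(jobs[i]), std::ref(results[i]));
    }
    for (auto &t : threads) t.join();

    std::vector<std::string> log;
    for (const auto &r : results) {
        const std::string line = describeFailure(r);
        if (!line.empty()) log.push_back(line);
    }
    return log;
}
```

Each result slot is written by exactly one worker and read only after the join, so no extra locking is needed in this sketch.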
The implementation of the long-running task is provided by a library whose code I cannot modify.
That means you have no way to synchronize the work done by the worker threads.
If an error occurs in one of the workers,
Let's suppose that you really can detect worker errors. Some of them are easily detected if the library reports them; others are not, e.g.:
the library code loops forever.
the library code exits prematurely with an uncaught exception.
I want the remaining workers to stop **gracefully**
That's just not possible.
The best you can do is write a thread manager that checks on the worker threads' status and, if an error condition is detected, just (ungracefully) "kills" all the worker threads and exits.
You should also consider detecting a looping worker thread (by timeout) and offering the user the option to kill it or to continue waiting for the process to finish.
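The timeout part of this can be sketched with a condition variable and `wait_for`. This is only an illustration under the assumption that each worker thread reports back when its call returns; the class `CompletionWatch` and its members are hypothetical names, and what to do on a timeout (ask the user, kill the process) is left to the caller.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical completion tracker shared by the workers and the manager.
class CompletionWatch {
public:
    enum class Outcome { AllDone, Error, TimedOut };

    explicit CompletionWatch(int workers) : running_(workers), error_(false) {}

    // Called by a worker when its task ends; `failed` reports an error.
    void workerDone(bool failed)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            --running_;
            error_ = error_ || failed;
        }
        cv_.notify_all();
    }

    // Blocks until all workers are done, one has failed, or `timeout`
    // elapses. A timeout suggests a worker may be stuck in a loop.
    template <typename Rep, typename Period>
    Outcome waitFor(const std::chrono::duration<Rep, Period> &timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool woke = cv_.wait_for(lock, timeout, [this] {
            return error_ || running_ == 0;
        });
        if (!woke) return Outcome::TimedOut;
        return error_ ? Outcome::Error : Outcome::AllDone;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int running_;
    bool error_;
};
```

The manager thread could call `waitFor` repeatedly, for example once per minute, and offer the kill-or-wait choice each time it returns `TimedOut`.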
Your problem is that the long running function is not your code, and you say you cannot modify it. Consequently you cannot make it pay any attention whatsoever to any kind of external synchronisation primitive (condition variables, semaphores, mutexes, pipes, etc), unless the library developer has done that for you.
Therefore your only option is to do something that wrestles control away from any code no matter what it's doing. This is what signals do. For that, you're going to have to use pthread_kill(), or whatever the equivalent is these days.
The pattern would be:
The thread that detects an error needs to communicate that error back to the main thread in some manner.
The main thread then needs to call pthread_kill() for all the other remaining threads. Don't be confused by the name: pthread_kill() is simply a way of delivering an arbitrary signal to a thread. Note that signals whose action is to stop, continue or terminate (e.g. SIGSTOP, SIGCONT, SIGTERM) affect the whole process even if raised with pthread_kill(); they are not thread-specific, so don't use those.
In each of those threads you'll need a signal handler. On delivery of the signal to a thread the execution path in that thread will jump to the handler no matter what the long running function was doing.
You are now back in (limited) control, and can (probably, well, maybe) do some limited cleanup and terminate the thread.
In the meantime the main thread will have been calling pthread_join() on all the threads it's signaled, and those will now return.
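A minimal sketch of these steps, for POSIX systems only, might look like the following. It rests on several assumptions: `std::thread::native_handle()` yields a `pthread_t` (true for the common pthread-based standard libraries), SIGUSR1 is free for this use, and the long-running call is replaced by a stand-in, `fakeLongRunningCall`, built on `sleep()`, which returns early when a handled signal arrives. A real library function may simply resume after the handler returns, in which case the handler itself would have to end the thread, and that part is not shown here. All function names are hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <unistd.h>

// Written by the handler; volatile sig_atomic_t is safe to set from one.
static volatile sig_atomic_t g_interrupted = 0;

extern "C" void onInterrupt(int) { g_interrupted = 1; }

// Installs the SIGUSR1 handler without SA_RESTART, so that a blocking call
// in the signalled thread returns early instead of being restarted.
bool installInterruptHandler()
{
    struct sigaction sa = {};
    sa.sa_handler = onInterrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// Stand-in for the library call: sleep() returns the unslept seconds
// (non-zero) if a handled signal interrupts it.
bool fakeLongRunningCall(unsigned seconds)
{
    return sleep(seconds) == 0;  // true means it ran to completion
}

// Starts one worker, interrupts it with pthread_kill(), and joins it.
// Returns true if the worker was cut short by the signal.
bool interruptWorkerDemo()
{
    if (!installInterruptHandler()) return false;

    std::atomic<bool> done(false);
    std::atomic<bool> completed(true);
    std::thread worker([&] {
        completed = fakeLongRunningCall(30);
        done = true;
    });

    // Re-send until the worker reports back: a signal that arrives before
    // sleep() has started is handled without interrupting anything.
    while (!done) {
        pthread_kill(worker.native_handle(), SIGUSR1);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    worker.join();
    return g_interrupted == 1 && !completed;
}
```

The handler only sets a flag because very little else is safe to do inside a signal handler; any cleanup is better done by the thread once the interrupted call has returned.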
My thoughts:
This is a really ugly way of doing it (and signals / pthreads are notoriously difficult to get right and I'm no expert), but I don't really see what other choice you have.
It'll be a long way from looking 'graceful' in source code, though the end user experience will be OK.
You will be aborting execution part way through running that library function, so if there's any clean up it would normally do (e.g. freeing up memory it has allocated) that won't get done and you'll have a memory leak. Running under something like valgrind is a way of working out if this is happening.
The only way of getting the library function to clean up (if it needs to) would be for your signal handler to return control to the function and let it run to completion, which is just what you don't want to do.
And of course, this won't work on Windows (no pthreads, at least none worth speaking of, though there may be an equivalent mechanism).
Really the best way is going to be to re-implement (if at all possible) that library function.