How to structure an object in C++ that does asynchronous background tasks

I'd like to have an object in C++ that does an asynchronous task in the background, but can still be requested to be freed by code using it without breaking things.
Let's say I have an object like this:
#include <thread>

bool doSomething();  // the background task, defined elsewhere

class MyObj {
    std::thread* asyncTask;
    bool result;
public:
    MyObj() {
        asyncTask = new std::thread([this](){
            result = doSomething();
        });
    }
    bool isResult() {
        return result;
    }
};
How would you go about making sure that the object can still be freed without terminating the process (due to the thread still being joinable/running at the time of destruction)? I've thought about something involving delaying the destructor with a running-thread counter, but that doesn't seem like the right solution. Part of the complexity is that the thread normally needs to access members of the class, so it can't just detach either.

The only way to do this in general is to create a new process to handle the task (expensive, lots of marshalling and busywork), or have the thread cooperate.
A thread that cooperates regularly checks if it should abort. When it detects it should abort, it does so. It has to do this even when it is blocking on some resource.
For simple tasks, this is simple. For general tasks, next to impossible.
C++ compilers basically assume that each thread gets to act single-threaded unless you explicitly synchronize operations. This permits certain important optimizations. The cost is that the state of a C++ thread need not make any sense at any given point, so killing it or suspending it externally cannot be made safe (without cooperation).
In short, write your doSomething with cooperation and abort in mind.
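To make that concrete for the MyObj in the question, here is a minimal sketch, assuming doSomething() can be split into small steps with an abort check between them (the 100-step loop below is a hypothetical stand-in for it). The destructor requests a stop and then joins, so the object can be freed at any time without std::terminate() and without the thread touching freed members:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Sketch: the background task polls an atomic stop flag between small steps,
// and the destructor requests a stop and then joins, so the thread can never
// outlive the members it uses.
class MyObj {
public:
    MyObj() : worker_([this] { run(); }) {}

    ~MyObj() {
        stop_.store(true);  // ask the task to give up early
        worker_.join();     // wait until it no longer touches *this
    }

    MyObj(const MyObj&) = delete;
    MyObj& operator=(const MyObj&) = delete;

    bool isDone() const { return done_.load(); }
    bool result() const { return result_.load(); }

private:
    // Hypothetical stand-in for doSomething(): 100 steps with an abort check
    // (a "cooperation point") before each one.
    void run() {
        for (int step = 0; step < 100; ++step) {
            if (stop_.load()) return;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        result_.store(true);
        done_.store(true);
    }

    // Members are initialized in declaration order, so worker_ is declared
    // last: the thread starts only after the flags it reads exist.
    std::atomic<bool> stop_{false};
    std::atomic<bool> done_{false};
    std::atomic<bool> result_{false};
    std::thread worker_;
};
```

The destructor still joins; cooperation only makes that join quick. If the real task blocks on I/O or a lock, the stop request also has to wake that wait (for example through a condition variable), otherwise the destructor blocks until the wait ends.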

Related

Why do I need to explicitly detach a short term variable?

Let's say I have a small operation which I want to perform in a separate thread. I do not need to know when it completes, nor do I need to wait for its completion, but I do not want the operation blocking my current thread. When I write the following code, I will get a crash:
void myFunction() {
// do other stuff
std::thread([]()
{
// do thread stuff
});
}
This crash is solved by assigning the thread to a variable, and detaching it:
void myFunction() {
// do other stuff
std::thread t([]()
{
// do thread stuff
});
t.detach();
}
Why is this step necessary? Or is there a better way to create a small single-use thread?

Because the std::thread::~thread() specification says so:
A thread object does not have an associated thread (and is safe to destroy) after
it was default-constructed
it was moved from
join() has been called
detach() has been called
It looks like detach() is the only one of these that makes sense in your case, unless you want to return the thread object (by moving) to the caller.
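A minimal sketch of the other option, returning the thread by move so the caller becomes responsible for joining it (startTask and the counter are hypothetical, just to have something observable):

```cpp
#include <atomic>
#include <thread>

// Start a small task and hand ownership of the std::thread to the caller,
// who then decides when to join it.
std::thread startTask(std::atomic<int>& counter) {
    std::thread t([&counter] { counter.fetch_add(1); });
    return t;  // moved out: the local t no longer has an associated thread
}
```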
Why is this step necessary?
Consider that the thread object represents a long-running "thread" of execution (a lightweight process or kernel schedulable entity or similar).
Allowing you to destroy the object while the thread is still executing would leave you no way to subsequently join (and find the result of) that thread. This may be a logical error, but it can also make it hard even to exit your program correctly.
Or is there a better way to create a small single-use thread?
Not obviously, but it's frequently better to use a thread pool for running tasks in the background, instead of starting and stopping lots of short-lived threads.
You might be able to use std::async() instead, but the future it returns may block in the destructor in some circumstances, if you try to discard it.
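A small sketch of that std::async caveat, assuming the std::launch::async policy: the future is dropped without calling get(), yet its destructor still waits for the task, so the function cannot return before the task's 100 ms sleep is over (discardAsyncFuture is a hypothetical name):

```cpp
#include <chrono>
#include <future>
#include <thread>

// Returns how long (in ms) it took to "fire and forget" a 100 ms task.
long long discardAsyncFuture() {
    auto start = std::chrono::steady_clock::now();
    {
        auto f = std::async(std::launch::async, [] {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        });
        // f is discarded here without get() or wait()...
    }   // ...but its destructor blocks until the task has finished
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

So std::async only gives you a real background task if the future is kept alive somewhere and waited on later.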

See the documentation of the destructor of std::thread:
If *this has an associated thread (joinable() == true), std::terminate() is called.
You should explicitly say that you don't care what's going to happen with the thread, and that you're OK with losing any control over it. And that is what detach is for.
In general, this looks like a design problem, so crashing makes sense: it's hard to propose a general, unsurprising rule about what should happen in such a case (e.g. your program might simply reach its normal end while the thread is still running; what should happen to the thread then?).

Basically, your use case requires a call to detach() because your use case is pretty weird, and not what C++ is trying to make easy.
While Java and .Net blithely let you toss away a Thread object whose associated thread is still running, in the C++ model the Thread is closer to being the thread, in the sense that the existence of the Thread object coincides with the lifetime, or at least joinability, of the execution it refers to. Note how it's not possible to create a Thread without starting it (except in the case of the default constructor, which is really just there in the service of move semantics), or to copy it or to make one from a thread id. C++ wants Thread to outlive the thread.
Maintaining that condition has various benefits. Final cleanup of a thread's control data doesn't have to be done automagically by the OS, because once a Thread goes away, nothing can ever try to join it. It's easier to ensure that variables with thread storage get destroyed in time, since the main thread is the last to exit (barring some move shenanigans). And a missing join -- which is an extremely common type of bug -- gets properly flagged at runtime.
Letting some thread wander off into the distance, in contrast, is allowed, but it's an unusual thing to do. Unless it's interacting with your other threads through sync objects, there's no way to ensure it's done whatever it was meant to do. A detached thread is on the level of reinterpret_cast: You're allowed to tell the compiler that you know something it doesn't, but that has to be explicit, not just the consequence of the function you didn't call.

Consider this: thread A creates thread B, and then thread A leaves its scope of execution. The handle for thread B is about to be lost. What should happen now? There are several possibilities, the most obvious being:
1. Thread B is detached and continues its execution independently.
2. Thread A waits for (joins) thread B before quitting its own scope.
Now you can argue which is better: 1 or 2? How should we (the compiler) decide which one of these is better?
So what the designers did was something different: terminate the program so that the developer picks one of these solutions explicitly, in order to avoid implicit (perhaps unwanted) behaviour. It's a signal for you: "hey, pay attention now, this piece of code is important and I (the compiler) don't want to decide for you".
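If you want option 2 as your default, you can state it once in a small RAII wrapper instead of remembering join() on every exit path. A minimal sketch (JoiningThread is a hypothetical name; C++20's std::jthread does something similar in the library, requesting a stop and then joining in its destructor):

```cpp
#include <thread>
#include <utility>

// Owns a std::thread and joins it on destruction instead of letting
// ~thread() call std::terminate().
class JoiningThread {
public:
    explicit JoiningThread(std::thread t) : t_(std::move(t)) {}
    ~JoiningThread() {
        if (t_.joinable()) t_.join();  // option 2: wait for thread B
    }
    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

private:
    std::thread t_;
};
```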

Is there a reliable way to force a thread to stop in C++? (especially detached ones)

I have recently been working with threads in C++11. Now I am thinking about how to force a thread to stop. I couldn't find an answer on Stack Overflow, and I have also tried these:
One variable per thread: not so reliable
return in the main thread: I have to force-quit only one thread, not all of them
and I have no more ideas. I have heard about the WinAPI, but I want a portable solution (that also means I won't use fork()).
Can you please give me a solution for this? I really want to do it.

One of the biggest problems with force closing a thread in C++ is the RAII violation.
When a function (and consequently, a thread) finishes gracefully, everything it held is cleaned up by the destructors of the objects the function/thread created:
Memory gets freed,
OS resources (handles, file descriptors etc.) are closed and returned to the OS
Locks are getting unlocked so other threads can use the shared resources they protect.
other important tasks are performed (such as updating counters, logging, etc.).
If you brutally kill a thread (e.g. with TerminateThread on Windows), none of these actually happen, and the program is left in a very dangerous state.
A (not-so-)common pattern that can be used is to register a "cancellation token" that the thread monitors, so it can shut itself down gracefully if another thread asks it to (a la TPL/PPL). Something like:
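#include <atomic>
#include <exception>
#include <memory>
#include <thread>

class ThreadTerminator : public std::exception {/*...*/};

// inside some function:
auto cancellationToken = std::make_shared<std::atomic_bool>(false);
std::thread thread([cancellationToken]{
    try {
        //... do things
        if (cancellationToken->load()) {
            // someone asked the thread to close
            throw ThreadTerminator();
        }
        //do other things...
        if (cancellationToken->load()) {
            // someone asked the thread to close
            throw ThreadTerminator();
        }
        //...
    } catch (const ThreadTerminator&) {
        return;  // locals were already destroyed by stack unwinding
    }
});
// later, from the thread that wants it stopped:
cancellationToken->store(true);
thread.join();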
Usually, one doesn't even open a new thread for a small task; it's better to think of a multithreaded application as a collection of concurrent tasks and parallel algorithms. One opens a new thread for some long-running background task, which is usually performed in some sort of loop (such as accepting incoming connections).
So cases where a small task needs to be cancelled are rare anyway.
tldr:
Is there a reliable way to force a thread to stop in C++?
No.

Here is my approach for most of my designs:
Think of two kinds of threads:
1) primary - I call it main.
2) subsequent - any thread launched by main or by any subsequent thread
When I launch std::threads in C++ (or POSIX threads in C++):
a) I give all subsequent threads access to a boolean "done", initialized to false. This flag can be passed directly from main (or indirectly through other mechanisms). With C++11 threads it should be a std::atomic<bool>: a plain bool written by one thread and read by another without synchronization is a data race.
b) All my threads have a regular 'heartbeat', typically with a posix semaphore or std::mutex, sometimes with just a timer, and sometimes simply during normal thread operation.
Note that a 'heartbeat' is not polling.
Also note that checking a boolean is really cheap.
Thus, whenever main wants to shut down, it merely sets done to true and 'join's with the subsequent threads.
On occasion main will also signal any semaphore (prior to join) that a subsequent thread might be waiting on.
And sometimes, a subsequent thread has to let its own subsequent thread know it is time to end.
Here is an example -
main launching a subsequent thread:
std::thread* thrd =
new std::thread(&MyClass_t::threadStart, this, id);
assert(nullptr != thrd);
Note that I pass the this pointer to this launch ... within this class instance is a boolean m_done.
Main Commanding shutdown:
In main thread, of course, all I do is
m_done = true;
In a subsequent thread (and in this design, all are using the same critical section):
void threadStart(unsigned int id) {
    std::cout << id << " " << std::flush; // thread announce
    do {
        doOnce(id); // the critical section is in this method
    } while (!m_done); // exit when done
}
And finally, at an outer scope, main invokes the join.
Perhaps the takeaway is: when designing a threaded system, you should also design the system shutdown, not just add it on.
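Here is a self-contained sketch of the same shutdown design, assuming m_done becomes a std::atomic<bool> and using a hypothetical Workers class whose doOnce() is a trivial placeholder:

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// "done flag + heartbeat" shutdown with hypothetical names.
class Workers {
public:
    ~Workers() {
        if (!threads_.empty()) shutdown();  // never destroy joinable threads
    }

    void start(unsigned count) {
        for (unsigned id = 0; id < count; ++id)
            threads_.emplace_back(&Workers::threadStart, this, id);
    }

    // Main commanding shutdown: set the flag, then join every worker.
    void shutdown() {
        m_done = true;
        for (auto& t : threads_) t.join();
        threads_.clear();
    }

    unsigned long iterations() const { return iterations_.load(); }

private:
    void threadStart(unsigned id) {
        do {
            doOnce(id);        // one "heartbeat" worth of work
        } while (!m_done);     // exit when done
    }

    void doOnce(unsigned /*id*/) {
        iterations_.fetch_add(1);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    std::atomic<bool> m_done{false};
    std::atomic<unsigned long> iterations_{0};
    std::vector<std::thread> threads_;
};
```

Because threadStart() is a do/while, every worker runs doOnce() at least once, and the shutdown latency is roughly one heartbeat.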

C++0x thread interruption

According to the C++0x final draft, there's no way to request a thread to terminate. That said, if required we need to implement a do-it-yourself solution.
On the other hand boost::thread provides a mechanism to interrupt a thread in a safe manner.
In your opinion, what's the best solution? Designing your own cooperative 'interruption mechanism' or going native?

All the language specification says is that the support isn't built into the language.
boost::thread::interrupt needs some support from the thread function, too:
When the interrupted thread next executes one of the specified interruption points (or if it is currently blocked whilst executing one)
i.e. when the thread function doesn't give the caller a chance to interrupt, you are still stuck.
I'm not sure what you mean by "going native": there is no native support, unless you are tied to boost::thread.
Still, I'd use an explicit mechanism. You have to think about having enough interruption points anyway, why not make them explicit? The extra code is usually marginal in my experience, though you may need to change some waits from single-object to multiple-objects, which - depending on your library - may look uglier.
One could also raise the "don't use exceptions for control flow" objection, but compared to messing around with threads, this is just a guideline.

Using the native handle to cancel a thread is a bad option in C++, as you need to destroy all the stack-allocated objects. This was the main reason a cancel operation was not included.
Boost.Thread provides an interruption mechanism that needs to poll on every waiting primitive. As this can be expensive as a general mechanism, the standard has not included it.
You will need to implement it yourself. See my answer here to a similar question on how to implement this yourself. To complete the solution, an interruption exception should be thrown when interrupted is true, and the thread should catch it and finish.

Here is my humble implementation of a thread canceller (for C++0x).
I hope it will be useful.
// Class cancellation_point
#include <chrono>
#include <condition_variable>
#include <mutex>

struct cancelled_error {};

class cancellation_point
{
public:
    cancellation_point(): stop_(false) {}

    void cancel() {
        std::unique_lock<std::mutex> lock(mutex_);
        stop_ = true;
        cond_.notify_all();
    }

    // Waits for up to `period`; throws cancelled_error if cancel() was called
    // before or during the wait. The predicate guards against spurious
    // wakeups, which a plain no_timeout check would mistake for cancellation.
    template <typename P>
    void wait(const P& period) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (cond_.wait_for(lock, period, [this] { return stop_; })) {
            stop_ = false;
            throw cancelled_error();
        }
    }

private:
    bool stop_;
    std::mutex mutex_;
    std::condition_variable cond_;
};

// Usage example
#include <iostream>
#include <memory>
#include <thread>

class ThreadExample
{
public:
    void start() {
        thread_.reset(new std::thread(&ThreadExample::run, this));
    }

    void stop() {
        cpoint_.cancel();
        if (thread_ && thread_->joinable())
            thread_->join();
    }

private:
    void run() {
        std::cout << "thread started\n";
        try {
            while (true) {
                cpoint_.wait(std::chrono::seconds(1));
            }
        } catch (const cancelled_error&) {
            std::cout << "thread cancelled\n";
        }
    }

    std::unique_ptr<std::thread> thread_;
    cancellation_point cpoint_;
};

int main() {
    ThreadExample ex;
    ex.start();
    ex.stop();
    return 0;
}

It is unsafe to terminate a thread preemptively because the state of the entire process becomes indeterminate after that point. The thread might have acquired a critical section prior to being terminated. That critical section will now never be released. The heap could become permanently locked, and so on.
The boost::thread::interrupt solution works by asking nicely. It will only interrupt a thread doing something that's interruptible, like waiting on a Boost.Thread condition variable, or if the thread does one of these things after interrupt is called. Even then, the thread isn't unceremoniously put through the meat grinder as, say, Win32's TerminateThread function does; it simply induces an exception, which, if you've been a well-behaved coder and used RAII everywhere, will clean up after itself and gracefully exit the thread.

Implementing a do-it-yourself solution makes the most sense, and it really should not be that hard to do. You will need a shared variable that you read/write synchronously, indicating whether the thread is being asked to terminate, and your thread periodically reads from this variable when it is in a state where it can safely be interrupted. When you want to interrupt a thread, you simply write synchronously to this variable, and then you join the thread. Assuming it cooperates appropriately, it should notice that the variable has been written and shut down, resulting in the join function no longer blocking.
If you were to go native, you would not gain anything by it; you would simply throw out all the benefits of a standard and cross-platform OOP threading mechanism. In order for your code to be correct, the thread would need to shut down cooperatively, which implies the communication described above.

It's unsafe to terminate a thread, since you would have no control over the state of any data structures it was working on at that moment.
If you want to interrupt a running thread, you have to implement your own mechanism. IMHO if you need that, your design is not prepared for multiple threads.
If you just want to wait for a thread to finish, use join() or a future.
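For the second case, a minimal sketch with std::async: the future's get() waits for the task like join() does, and also delivers its result (or rethrows its exception). sumTo is just an illustrative function:

```cpp
#include <future>

int sumTo(int n) {
    std::future<int> f = std::async(std::launch::async, [n] {
        int total = 0;
        for (int i = 1; i <= n; ++i) total += i;
        return total;
    });
    // ... the calling thread could do other work here ...
    return f.get();  // waits for completion, then yields the value
}
```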

My implementation of threads uses the pimpl idiom, and in the Impl class I have one version for each OS I support and also one that uses boost, so I can decide which one to use when building the project.
I decided to make two classes: one is Thread, which has only the basic, OS-provided services; the other is SafeThread, which inherits from Thread and has a method for collaborative interruption.
Thread has a terminate() method that does an intrusive termination. It is a virtual method that is overridden in SafeThread, where it signals an event object. There's a (static) yeld() method which the running thread should call from time to time; this method checks whether the event object is signaled and, if so, throws an exception that is caught at the caller of the thread entry point, thereby terminating the thread. When it does so, it signals a second event object so the caller of terminate() can know that the thread was safely stopped.
For cases in which there's a risk of deadlock, SafeThread::terminate() can accept a timeout parameter. If the timeout expires, it calls Thread::terminate(), thus intrusively killing the thread. This is a last resort when you have something you can't control (like a third-party API), or in situations in which a deadlock does more damage than resource leaks and the like.
Hope this'll be useful for your decision and will give you a clear enough picture about my design choices. If not, I can post code fragments to clarify if you want.
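A rough sketch of the cooperative half of such a design in standard C++11, with hypothetical names (CooperativeThread, checkpoint(), ThreadTerminated) rather than the Thread/SafeThread classes described above. Since portable C++ has no way to kill a thread, terminate() here only reports that the timeout expired instead of falling back to an intrusive kill:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Thrown by checkpoint() and caught around the entry point.
struct ThreadTerminated {};

class CooperativeThread {
public:
    explicit CooperativeThread(std::function<void(CooperativeThread&)> body)
        : thread_([this, body] { entry(body); }) {}

    ~CooperativeThread() {
        stop_ = true;
        if (thread_.joinable()) thread_.join();
    }

    // Called by the running thread from time to time (the role of yeld()).
    void checkpoint() {
        if (stop_) throw ThreadTerminated();
    }

    // Signal the stop request, then wait up to `timeout` for the thread to
    // confirm it has left its entry point. Returns false on timeout.
    bool terminate(std::chrono::milliseconds timeout) {
        stop_ = true;
        std::unique_lock<std::mutex> lock(mutex_);
        return stopped_cv_.wait_for(lock, timeout, [this] { return stopped_; });
    }

private:
    void entry(const std::function<void(CooperativeThread&)>& body) {
        try {
            body(*this);
        } catch (const ThreadTerminated&) {
            // unwound through RAII destructors; nothing else to do
        }
        std::lock_guard<std::mutex> lock(mutex_);
        stopped_ = true;             // the "second event object"
        stopped_cv_.notify_all();
    }

    std::atomic<bool> stop_{false};
    std::mutex mutex_;
    std::condition_variable stopped_cv_;
    bool stopped_ = false;           // protected by mutex_
    std::thread thread_;             // declared last: starts after the rest
};
```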

I agree with this decision. For example, .NET allows you to abort any worker thread, and I never use this feature and don't recommend it to any professional programmer. I want to decide myself when a worker thread may be interrupted, and how. It is different for hardware, I/O, UI and other threads. If a thread may be stopped anywhere, this may cause undefined program behavior with resource management, transactions, etc.

Is it okay to use "delete this;" on an object that inherits from a Thread class?

In general, if you have a class that inherits from a Thread class, and you want instances of that class to automatically deallocate after they are finished running, is it okay to delete this?
Specific Example:
In my application I have a Timer class with one static method called schedule. Users call it like so:
Timer::schedule((void*)obj, &callbackFunction, 15); // call callbackFunction(obj) in 15 seconds
The schedule method creates a Task object (which is similar in purpose to a Java TimerTask object). The Task class is private to the Timer class and inherits from the Thread class (which is implemented with pthreads). So the schedule method does this:
Task *task = new Task(obj, callback, seconds);
task->start(); // fork a thread, and call the task's run method
The Task constructor saves the arguments for use in the new thread. In the new thread, the task's run method is called, which looks like this:
void Timer::Task::run() {
Thread::sleep(this->seconds);
this->callback(this->obj);
delete this;
}
Note that I can't make the task object a stack allocated object because the new thread needs it. Also, I've made the Task class private to the Timer class to prevent others from using it.
I am particularly worried because deleting the Task object means deleting the underlying Thread object. The only state in the Thread object is a pthread_t variable. Is there any way this could come back to bite me? Keep in mind that I do not use the pthread_t variable after the run method finishes.
I could bypass calling delete this by introducing some sort of state (either through an argument to the Thread::start method or something in the Thread constructor) signifying that the method that is forked to should delete the object that it is calling the run method on. However, the code seems to work as is.
Any thoughts?

I think the 'delete this' is safe, as long as you don't do anything else afterwards in the run() method (because all of the Task's object's member variables, etc, will be freed memory at that point).
I do wonder about your design though... do you really want to be spawning a new thread every time someone schedules a timer callback? That seems rather inefficient to me. You might look into using a thread pool (or even just a single persistent timer thread, which is really just a thread pool of size one), at least as an optimization for later. (or better yet, implement the timer functionality without spawning extra threads at all... if you're using an event loop with a timeout feature (like select() or WaitForMultipleObjects()) it is possible to multiplex an arbitrary number of independent timer events inside a single thread's event loop)
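A minimal sketch of the single persistent timer thread idea, using a priority queue ordered by deadline and a condition variable (TimerThread and schedule() are hypothetical names, not the Timer class from the question):

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// One worker sleeps until the earliest deadline and runs due callbacks
// itself, so scheduling a callback never spawns a new thread.
class TimerThread {
public:
    using Clock = std::chrono::steady_clock;

    TimerThread() : worker_([this] { loop(); }) {}

    ~TimerThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // timers that have not fired yet are dropped
    }

    void schedule(std::function<void()> callback, std::chrono::milliseconds delay) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            timers_.push(Entry{Clock::now() + delay, std::move(callback)});
        }
        cv_.notify_one();  // the new timer may be earlier than the current wait
    }

private:
    struct Entry {
        Clock::time_point due;
        std::function<void()> callback;
        // Reversed so std::priority_queue keeps the earliest deadline on top.
        bool operator<(const Entry& other) const { return due > other.due; }
    };

    void loop() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            if (timers_.empty()) {
                cv_.wait(lock);
            } else if (Clock::now() >= timers_.top().due) {
                Entry e = timers_.top();
                timers_.pop();
                lock.unlock();
                e.callback();  // run without holding the lock
                lock.lock();
            } else {
                // Copy the deadline: the queue may change while we wait.
                Clock::time_point due = timers_.top().due;
                cv_.wait_until(lock, due);
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::priority_queue<Entry> timers_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: starts after the members above
};
```

Callbacks run one after another on the timer thread, so a slow callback delays later timers; that is the usual trade-off against one thread per timer.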

There's nothing particularly horrible about delete this; as long as you ensure that:
the object is always dynamically allocated, and
no member of the object is ever used after it's deleted.
The first of these is the difficult one. There are steps you can take (e.g. making the ctor private) that help, but nearly anything you do can be bypassed if somebody tries hard enough.
That said, you'd probably be better off with some sort of thread pool. It tends to be more efficient and scalable.
Edit: When I talked about being bypassed, I was thinking of code like this:
class HeapOnly {
private:
HeapOnly () {} // Private Constructor.
~HeapOnly () {} // A Private, non-virtual destructor.
public:
static HeapOnly * instance () { return new HeapOnly(); }
void destroy () { delete this; } // Reclaim memory.
};
That's about as good of protection as we can provide, but getting around it is trivial:
int main() {
char buffer[sizeof(HeapOnly)];
HeapOnly *h = reinterpret_cast<HeapOnly *>(buffer);
h->destroy(); // undefined behavior...
return 0;
}
When it's direct like this, this situation's pretty obvious. When it's spread out over a larger system, with (for example) an object factory actually producing the objects, and code somewhere else entirely allocating the memory, etc., it can become much more difficult to track down.
I originally said "there's nothing particularly horrible about delete this;", and I stand by that -- I'm not going back on that and saying it shouldn't be used. I am trying to warn about the kind of problem that can arise with it if other code "Doesn't play well with others."

delete this frees the memory you have explicitly allocated for the thread to use, but what about the resources allocated by the OS or pthreads library, such as the thread's call stack and kernel thread/process structure (if applicable)? If you never call pthread_join() or pthread_detach() and you never set the detachstate, I think you still have a memory leak.
It also depends on how your Thread class is designed to be used. If it calls pthread_join() in its destructor, that's a problem.
If you use pthread_detach() (which your Thread object might already be doing), and you're careful not to dereference this after deleting this, I think this approach should be workable, but others' suggestions to use a longer-lived thread (or thread pool) are well worth considering.

If all you ever do with a Task object is new it, start it, and then delete it, why would you need an object for it anyway? Why not simply implement a function which does what start does (minus object creation and deletion)?

C++ Thread question - setting a value to indicate the thread has finished

Is the following safe?
I am new to threading and I want to delegate a time consuming process to a separate thread in my C++ program.
Using the boost libraries I have written code something like this:
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
Where finished_flag is a boolean member of my class. When the thread is finished it sets the value and the main loop of my program checks for a change in that value.
I assume that this is okay because I only ever start one thread, and that thread is the only thing that changes the value (except for when it is initialised before I start the thread)
So is this okay, or am I missing something and need to use locks, mutexes, etc.?

You never mentioned the type of finished_flag...
If it's a straight bool, then it might work, but it's certainly bad practice, for several reasons. First, some compilers will cache the reads of the finished_flag variable, since the compiler doesn't always pick up the fact that it's being written to by another thread. You can get around this by declaring the bool volatile, but that's taking us in the wrong direction. Even if reads and writes are happening as you'd expect, there's nothing to stop the OS scheduler from interleaving the two threads halfway through a read or write. That might not be such a problem here, where you have one read and one write op in separate threads, but it's a good idea to start as you mean to carry on.
If, on the other hand, it's a thread-safe type, like a CEvent in MFC (or equivalent in Boost), then you should be fine. This is the best approach: use thread-safe synchronization objects for inter-thread communication, even for simple flags.
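In C++11 the standard thread-safe type for a simple flag is std::atomic<bool>. A minimal sketch, assuming the arithmetic stands in for the time-consuming process (runWithFinishedFlag is a hypothetical name):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

int runWithFinishedFlag() {
    std::atomic<bool> finished_flag(false);
    int result = 0;
    std::thread worker([&] {
        result = 6 * 7;             // stand-in for the time-consuming process
        finished_flag.store(true);  // seq_cst store: publishes `result` too
    });
    while (!finished_flag.load()) { // the main loop's check
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    worker.join();  // still required before the std::thread is destroyed
    return result;
}
```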

Instead of using a member variable to signal that the thread is done, why not use a condition? You are already using the Boost libraries, and condition is part of the thread library.
Check it out. It allows the worker thread to 'signal' that it has finished, and the main thread can check during execution whether the condition has been signaled and then do whatever it needs to do with the completed work. There are examples in the link.
As a general rule, I would never assume that a resource will only be modified by the thread. You might know what it is for, but someone else might not, causing no end of grief as the main thread thinks that the work is done and tries to access data that is not correct! It might even delete it while the worker thread is still using it, causing the app to crash. Using a condition will help with this.
Looking at the thread documentation, you could also call thread.timed_join in the main thread. timed_join will wait a specified amount of time for the thread to 'join' (join means that the thread has finished).
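A sketch of the condition-based approach with the standard std::condition_variable rather than boost::condition, waiting with a timeout so the main thread gets control back periodically, much like timed_join (the 50 ms sleep stands in for the real work; waitForWorker is a hypothetical name):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns how many timed waits expired before the worker finished.
int waitForWorker() {
    std::mutex m;
    std::condition_variable cv;
    bool finished = false;  // protected by m

    std::thread worker([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));  // the work
        {
            std::lock_guard<std::mutex> lock(m);
            finished = true;
        }
        cv.notify_one();
    });

    int timeouts = 0;
    {
        std::unique_lock<std::mutex> lock(m);
        // wait_for with a predicate returns false if the timeout expired first.
        while (!cv.wait_for(lock, std::chrono::milliseconds(10), [&] { return finished; })) {
            ++timeouts;  // the main thread could do other work between waits
        }
    }
    worker.join();
    return timeouts;
}
```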

I don't mean to be presumptive, but it seems like the purpose of your finished_flag variable is to pause the main thread (at some point) until the thread thrd has completed.
The easiest way to do this is to use boost::thread::join
// launch the thread...
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
// ... do other things maybe ...
// wait for the thread to complete
thrd->join();

If you really want to get into the details of communication between threads via shared memory, even declaring a variable volatile won't be enough, even if the compiler does use appropriate access semantics to ensure that it won't get a stale version of data after checking the flag. The CPU can issue reads and writes out of order (x86 usually doesn't, but PPC definitely does), and there is nothing in C++98/03 that allows the compiler to generate code to order memory accesses appropriately.
Herb Sutter's Effective Concurrency series has an extremely in depth look at how the C++ world intersects the multicore/multiprocessor world.

Having the thread set a flag (or signal an event) before it exits is a race condition. The thread has not necessarily returned to the OS yet, and may still be executing.
For example, consider a program that loads a dynamic library (pseudocode):
lib = loadLibrary("someLibrary");
fun = getFunction("someFunction");
fun();
unloadLibrary(lib);
And let's suppose that this library uses your thread:
void someFunction() {
    myclass obj;
    volatile bool finished_flag = false;
    boost::thread* thrd = new boost::thread(boost::bind(&myclass::mymethod, &obj, &finished_flag));
    while (!finished_flag) { // ignore the polling loop, it's beside the point
        sleep();
    }
    delete thrd;
}
void myclass::mymethod(volatile bool* finished_flag) {
    // do stuff
    *finished_flag = true;
}
When myclass::mymethod() sets finished_flag to true, myclass::mymethod() hasn't returned yet. At the very least, it still has to execute a "return" instruction of some sort (if not much more: destructors, exception handler management, etc.). If the thread executing myclass::mymethod() gets pre-empted before that point, someFunction() will return to the calling program, and the calling program will unload the library. When the thread executing myclass::mymethod() gets scheduled to run again, the address containing the "return" instruction is no longer valid, and the program crashes.
The solution would be for someFunction() to call thrd->join() before returning. This would ensure that the thread has returned to the OS and is no longer executing.
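A minimal sketch of that fix with std::thread instead of boost::thread: the function does not return until join() has returned, and join() only returns once the thread has completed, so no flag is needed at all (the local `value` stands in for whatever the thread works on):

```cpp
#include <thread>

int someFunction() {
    int value = 0;
    std::thread thrd([&value] {
        value = 1;  // "do stuff" with data owned by this function
    });
    thrd.join();   // the thread has fully completed after this returns
    return value;  // safe to read, and safe for the caller to unload/destroy
}
```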
