I'm developing an SDK as a dynamic library (DLL/.so). The user can set a lot of parameters prior to running the computation. But I would like to offer a way to change the parameters dynamically, which should stop the current computation and relaunch it with the new parameters. So general usage should look like this:
Client Caller thread -----> Call my SDK -----> Computation code
                                                      ^
                                                      |
                                                      |
Client UI Thread ----> Request cancelation ------------
I have a lot of questions about the mechanics, and I'm wondering what the good practices are for doing this.
1) How should I handle the interrupt?
Should I run my computation in an async thread and just drop the results of that thread?
Should I use a std::atomic<bool> that the computation thread checks periodically so that it can return to the starting point?
2) If I use the second option, what is the best way to return to the launch point?
Is it okay to use C++ exceptions in that case? (NOTE: I already use exceptions for really rare cases in the computation code.)
Should I instead use error codes all the way through, with early checks to avoid further computation?
Could longjmp or something similar be used?
For proper object cleanup and resource reclamation, you need to either throw an exception or just stop the calculation and let the calculation functions return normally.
You should not abort the thread or use longjmp, as they will not destroy the objects your calculation has created, leading to leaks of memory and whatever other resources (like file handles) you may be using.
Using a std::atomic (that is easily accessible from everywhere) that your calculations poll periodically is one way to achieve this. You'll need to check this regularly, so the check will need to be in or near any loops you have. Short, quick loops don't need to check while they are looping, but there should be some sort of check at least several times a second. Once you detect the cancellation request, you can either throw your exception, or return from the current function (so the parent function would also need to check for cancellation).
One downside to all that is if you miss a check in a loop someplace, your cancellation may not happen right away.
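Below is a minimal sketch of the polled-flag approach combined with an exception that unwinds back to the launch point. The names CancelToken, ComputationCancelled, compute and run_once are hypothetical, and the inner loop is only a stand-in for real work. The point is that the check sits in the outer loop and that unwinding runs destructors, unlike longjmp.

```cpp
#include <atomic>
#include <cstdint>
#include <exception>

// Hypothetical exception used only to unwind from deep inside the
// computation back to the launch point.
struct ComputationCancelled : std::exception {
    const char* what() const noexcept override { return "computation cancelled"; }
};

// Flag set by the UI thread and polled by the computation thread.
class CancelToken {
public:
    void request() noexcept { flag_.store(true); }
    void reset() noexcept { flag_.store(false); }
    bool requested() const noexcept { return flag_.load(); }
    void throw_if_requested() const {
        if (requested()) throw ComputationCancelled();
    }
private:
    std::atomic<bool> flag_{false};
};

// Stand-in for the real computation: it polls once per outer iteration,
// so the short inner loop runs without any checks.
std::uint64_t compute(std::uint64_t n, const CancelToken& token) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        token.throw_if_requested();
        for (std::uint64_t j = 0; j < 1000; ++j) acc += (i ^ j) & 1u;
    }
    return acc;
}

// Launch point. Objects created inside compute() are destroyed during
// unwinding, so a cancelled run leaks nothing.
bool run_once(std::uint64_t n, const CancelToken& token, std::uint64_t& result) {
    try {
        result = compute(n, token);
        return true;
    } catch (const ComputationCancelled&) {
        return false;  // caller resets the token and relaunches with new parameters
    }
}
```

In the SDK scenario, the caller thread would loop on run_once. The UI thread calls request(). Once run_once returns false, the caller applies the new parameters, calls reset(), and launches again.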
Related
Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time constraint 'guarantees' (say cancellation within 0.1 execution seconds of the thread in question for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitation is compatibility with C++14, and ability to work on at least BSD and Linux based OS.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, so some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions to use boost::this_thread::interruption_point, but I can wrap them in lambdas/other constructs if that helps. I feel like I'm caught between a rock and a hard place, so alternative suggestions are welcome. However, they need to be minimally intrusive to existing functionality, though they can be dramatic in their scope for the feature set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that the two existing answers are based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints", since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional-programming-style immutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting the thread to behave well).
I also mis-used the term "arbitrary" since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some, which download from the internet, already use a condition variable
do not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutex, but those can't exist outside the function body
output is via atomic<shared_ptr> passed as argument
pure functions (no shared state with outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is usually the case, since the state is typically one or two atomic<built-in type> variables). Usually the internal state is queried from an external DB (an architecture similar to cookie + web server, where the tab/browser can be closed at any time)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (one that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process, then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread, then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
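As an illustration of what cooperation looks like with POSIX deferred cancellation, here is a minimal sketch. It assumes the task code can be edited to push a cleanup handler, which the opaque tasks in the question do not allow, so it shows the mechanism rather than solving that case. The cleanup handler releases a mutex that a bare pthread_cancel() would otherwise leave locked. The names g_lock, unlock_on_cancel, worker and cancel_releases_lock are hypothetical.

```cpp
#include <pthread.h>
#include <atomic>

// Hypothetical shared resource: a mutex the worker holds while it works.
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> g_cleanup_ran{false};

// Cleanup handler: runs if the thread is cancelled while it is pushed.
static void unlock_on_cancel(void* m) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t*>(m));
    g_cleanup_ran = true;
}

static void* worker(void*) {
    int old_type = 0;
    // Deferred is the default: cancellation only acts at cancellation points.
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);
    pthread_mutex_lock(&g_lock);  // not a cancellation point
    pthread_cleanup_push(unlock_on_cancel, &g_lock);
    for (;;) {
        // ... one bounded slice of work would go here ...
        pthread_testcancel();  // explicit cancellation point
    }
    pthread_cleanup_pop(1);  // pairs with the push; unreachable in this sketch
    return nullptr;
}

// Cancels the worker and reports whether the handler released its mutex.
bool cancel_releases_lock() {
    pthread_t t;
    if (pthread_create(&t, nullptr, worker, nullptr) != 0) return false;
    pthread_cancel(t);  // takes effect at the next pthread_testcancel()
    void* status = nullptr;
    pthread_join(t, &status);
    const bool free_now = pthread_mutex_trylock(&g_lock) == 0;
    if (free_now) pthread_mutex_unlock(&g_lock);
    return status == PTHREAD_CANCELED && g_cleanup_ran && free_now;
}
```

Only the code between push and pop is covered. Memory allocated by the task, or locks taken inside a third-party library, would still need their own handlers.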
Cancelling a non-cooperative component that was not designed to be cancelled is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the component should be managed externally (the system knows which component uses what resources)
all accesses should be indirect
modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resource, stop operations, cancel incomplete changes...
None of these properties are cheap; all the properties of threads are the exact opposite of these properties.
Threads only have an implied concept of ownership apparent in the running thread: for a deleted thread, determining what was owned by the thread is not possible.
Threads access shared objects directly. A thread can start modifying shared objects; if it is cancelled in the middle of an operation, those modifications would be left partial, ineffective, or incoherent.
Cancelled threads could leave locked mutexes behind. At the very least, other threads that later try to access the shared object through these mutexes would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non-cooperative threads is not doable even with very large-scale changes to thread synchronization objects, not even with a complete redesign of the thread primitives.
You would have to make thread almost like full processes to be able to do that; but it wouldn't be called a thread then!
I have a problem with long running boost::regex_match(...) invocation in a threaded process environment. But it could be another lib (API call) having the same problem.
Is there a generic way to set up a watchdog for such?
For non-threaded process alarm() can be used to detect timeout.
However, signals don't play nicely with threads. I could avoid using alarm() directly in the thread and delegate timer management to a dedicated separate thread, which would use pthread_kill(...) to target the correct thread (this is just an idea; I haven't verified that part yet).
However, this too only interrupts and detects the situation; it cannot gracefully stop boost::regex_match(...).
I played around with Throwing an exception from within a signal handler using sigsetjmp() and siglongjmp() for the thread using boost::regex_match(..).
But it causes memory leaks in boost::regex_match(...) because siglongjmp() bypasses destructors.
How can I gracefully stop a 3rd-party API call, presuming that it's implemented in an exception-safe way?
Or does it have to be supported by some "stoppable" feature actively implemented in the 3rd party API? (is there some for the boost library?)
This may be a strange idea, but:
Code can be implemented to be "thread-safe" and/or "exception-safe".
Would it be an option to define "longjmp-safe"? This could be done by passing an additional token to a library to let it associate all resource allocations with that token. After longjmp(), the client software could ask the API separately to release those resources.
Simpler still might be a central init()/release() or register()/unregister() API call, by which the API could clean up after itself.
In a case where you have to:
monitor exceeding execution time
stop execution of processing
you should simply think in terms of tasks (separate processes) instead of threads.
Using threads sounds like "state of the art", but in practice tasks are very often the better implementation choice. This is especially true for controlling memory leaks when execution ends in an "undefined" way, confining unwanted memory growth, controlling stack overruns, etc.
In the case you have mentioned I tend to implement that as tasks. IPC works well on all known platforms but is not portable. If portability is no problem, changing to a task based solution is not a big deal.
A hanging task can be killed by an OS call, and all locks, memory and other resources like IPC/shared memory/pipes etc. will be removed automatically. So this fits your problem much better, and it does not depend on your external and possibly unchangeable third-party components.
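Here is a minimal sketch of the task-as-process idea using the POSIX calls fork(), waitpid() and kill(), which exist on Linux and BSD but not on Windows. The names run_with_timeout and TaskOutcome are hypothetical. The task runs in a child process, the parent polls for completion, and on timeout the child is killed with SIGKILL so the OS reclaims everything it owned.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>
#include <chrono>
#include <functional>
#include <thread>

enum class TaskOutcome { Finished, Failed, Killed };

// Runs `task` in a child process and kills the child if it exceeds `timeout`.
// The task's int return value becomes the child's exit status.
TaskOutcome run_with_timeout(const std::function<int()>& task,
                             std::chrono::milliseconds timeout) {
    const pid_t pid = fork();
    if (pid < 0) return TaskOutcome::Failed;
    if (pid == 0) {
        // Child: run the task and exit without the parent's atexit handlers.
        _exit(task());
    }
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int status = 0;
    for (;;) {
        const pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                       ? TaskOutcome::Finished
                       : TaskOutcome::Failed;
        }
        if (r < 0) return TaskOutcome::Failed;
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(pid, SIGKILL);        // cannot be caught or ignored
            waitpid(pid, &status, 0);  // reap the child to avoid a zombie
            return TaskOutcome::Killed;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}
```

Only the exit status comes back here. A real result would have to travel through a pipe or shared memory, which is the IPC mentioned above. Also, fork() in a multithreaded parent leaves only the forking thread in the child, so the child should avoid locks held by other threads.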
I know you cannot kill a boost thread, but can you change its task?
Currently I have an array of 8 threads. When a button is pressed, these threads are assigned a task. The task they are assigned is completely independent of the main thread and the other threads. None of the threads have to wait or anything like that, so an interruption point is never reached.
What I need is to be able to change, at any time, the task that each thread is doing. Is this possible? I have tried looping through the array of threads and changing what each thread object points to, but of course that doesn't do anything to the old threads.
I know you can interrupt pThreads, but I cannot find a working link to download the library to check it out.
A thread is not some sort of magical object that can be made to do things. It is a separate path of execution through your code. Your code cannot be made to jump arbitrarily around its codebase unless you specifically program it to do so. And even then, it can only be done within the rules of C++ (ie: calling functions).
You cannot kill a boost::thread because killing a thread would utterly wreck some of the most fundamental assumptions a programmer makes. You now have to take into account the possibility that the next line doesn't execute for reasons that you can neither predict nor prevent.
This isn't like exception handling, where C++ specifically requires destructors to be called, and you have the ability to catch exceptions and do special cleanup. You're talking about executing one piece of code, then suddenly inserting a call to some random function in the middle of already compiled code. That's not going to work.
If you want to be able to change the "task" of a thread, then you need to build that thread with "tasks" in mind. It needs to check every so often that it hasn't been given a new task, and if it has, then it switches to doing that. You will have to define when this switching is done, and what state the world is in when switching happens.
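A minimal sketch of a thread built "with tasks in mind" might look like the following. It uses std::thread rather than boost::thread, and the class name SwitchableWorker is hypothetical. A task is modelled as a function performing one short step, and the worker only looks for a replacement between steps. That defines exactly when switching happens: never in the middle of a step.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical worker: a "task" is a callable performing one short step.
class SwitchableWorker {
public:
    using Step = std::function<void()>;

    SwitchableWorker() : thread_([this] { run(); }) {}
    ~SwitchableWorker() {
        stop_ = true;
        thread_.join();
    }
    // May be called from any thread; takes effect before the next step.
    void assign(Step next) {
        std::lock_guard<std::mutex> lk(m_);
        pending_ = std::move(next);
        has_pending_ = true;
    }

private:
    void run() {
        Step current;
        while (!stop_) {
            {
                std::lock_guard<std::mutex> lk(m_);
                if (has_pending_) {  // the only switch point
                    current = std::move(pending_);
                    has_pending_ = false;
                }
            }
            if (current) current();
            else std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::mutex m_;
    Step pending_;
    bool has_pending_ = false;
    std::atomic<bool> stop_{false};
    std::thread thread_;  // declared last so the members above exist first
};
```

With 8 such workers in an array, the button handler would call assign() on each one instead of replacing the thread objects.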
Our (Windows native C++) app is composed of threaded objects and managers. It is pretty well written, with a design that sees Manager objects controlling the lifecycle of their minions. Various objects dispatch and receive events; some events come from Windows, some are home-grown.
In general, we have to be very aware of thread interoperability so we use hand-rolled synchronization techniques using Win32 critical sections, semaphores and the like. However, occasionally we suffer thread deadlock during shut-down due to things like event handler re-entrancy.
Now I wonder if there is a decent app shut-down strategy we could implement to make this easier to develop for - something like every object registering for a shutdown event from a central controller and changing its execution behaviour accordingly? Is this too naive or brittle?
I would prefer strategies that don't stipulate rewriting the entire app to use Microsoft's Parallel Patterns Library or similar. ;-)
Thanks.
EDIT:
I guess I am asking for an approach to controlling object life cycles in a complex app where many threads and events are firing all the time. Giovanni's suggestion is the obvious one (hand-roll our own), but I am convinced there must be various off-the-shelf strategies or frameworks, for cleanly shutting down active objects in the correct order. For example, if you want to base your C++ app on an IoC paradigm you might use PocoCapsule instead of trying to develop your own container. Is there something similar for controlling object lifecycles in an app?
This seems like a special case of the more general question, "how do I avoid deadlocks in my multithreaded application?"
And the answer to that is, as always: make sure that any time your threads have to acquire more than one lock at a time, that they all acquire the locks in the same order, and make sure all threads release their locks in a finite amount of time. This rule applies just as much at shutdown as at any other time. Nothing less is good enough; nothing more is necessary. (See here for a relevant discussion)
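Here is a minimal illustration of the "same order everywhere" rule. The Account type and transfer function are hypothetical. Every caller that needs both locks takes them in one global order, here by address (std::less gives a total order on pointers), so two transfers in opposite directions cannot each hold one lock while waiting for the other.

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical account; each has its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Always lock the lower-addressed account first, whatever the direction.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;
    Account* first = std::less<Account*>()(&from, &to) ? &from : &to;
    Account* second = (first == &from) ? &to : &from;
    std::lock_guard<std::mutex> l1(first->m);
    std::lock_guard<std::mutex> l2(second->m);
    from.balance -= amount;
    to.balance += amount;
}

// Usage: opposite-direction transfers from two threads always terminate.
bool opposite_transfers_ok(int n) {
    Account x, y;
    x.balance = y.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < n; ++i) transfer(x, y, 1); });
    std::thread t2([&] { for (int i = 0; i < n; ++i) transfer(y, x, 1); });
    t1.join();
    t2.join();
    return x.balance == 1000 && y.balance == 1000;
}
```

If locking by address feels fragile, the standard library's std::lock(m1, m2) acquires several mutexes with a deadlock-avoidance algorithm instead of a fixed order. That is a different technique from the one described above.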
As for how to best do this... the best way (if possible) is to simplify your program as much as you can, and avoid holding more than one lock at a time if you can possibly help it.
If you absolutely must hold more than one lock at a time, you must verify your program to be sure that every thread that holds multiple locks locks them in the same order. Programs like helgrind or Intel Thread Checker can help with this, but it often comes down to simply eyeballing the code until you've proved to yourself that it satisfies this constraint. Also, if you are able to reproduce the deadlocks easily, you can examine (using a debugger) the stack trace of each deadlocked thread, which will show where the deadlocked threads are blocked forever. With that information, you can then start to figure out where the lock-ordering inconsistencies are in your code. Yes, it's a major pain, but I don't think there is any good way around it (other than avoiding holding multiple locks at once). :(
One possible general strategy would be to send an "I am shutting down" event to every manager, which would cause the managers to do one of three things (depending on how long running your event-handlers are, and how much latency you want between the user initiating shutdown, and the app actually exiting).
1) Stop accepting new events, and run the handlers for all events received before the "I am shutting down" event. To avoid deadlocks you may need to accept events that are critical to the completion of other event handlers. These could be signaled by a flag in the event or the type of the event (for example). If you have such events then you should also consider restructuring your code so that those actions are not performed through event handlers (as dependent events would be prone to deadlocks in ordinary operation too.)
2) Stop accepting new events, and discard all events that were received after the event that the handler is currently running. Similar comments about dependent events apply in this case too.
3) Interrupt the currently running event (with a function similar to boost::thread::interrupt()), and run no further events. This requires your handler code to be exception safe (which it should already be, if you care about resource leaks), and to enter interruption points at fairly regular intervals, but it leads to the minimum latency.
Of course you could mix these three strategies together, depending on the particular latency and data corruption requirements of each of your managers.
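Strategies 1 and 2 can be sketched for a single manager with one event thread, as below. The class name Manager and its Mode values are hypothetical. Both modes refuse new events once shutdown starts. Drain runs everything already queued, while Discard drops whatever has not started. Strategy 3 would additionally need the handlers themselves to check an interruption flag.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical manager owning one event thread.
class Manager {
public:
    enum class Mode { Drain, Discard };
    using Event = std::function<void()>;

    Manager() : worker_([this] { run(); }) {}
    ~Manager() { shutdown(Mode::Discard); }

    // Returns false once shutdown has begun: new events are refused.
    bool post(Event e) {
        {
            std::lock_guard<std::mutex> lk(m_);
            if (closing_) return false;
            q_.push_back(std::move(e));
        }
        cv_.notify_one();
        return true;
    }

    // Must not be called from inside an event handler (it joins the event thread).
    void shutdown(Mode mode) {
        {
            std::lock_guard<std::mutex> lk(m_);
            closing_ = true;
            if (mode == Mode::Discard) q_.clear();  // drop events not yet started
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        for (;;) {
            Event e;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return closing_ || !q_.empty(); });
                if (q_.empty()) return;  // closing and nothing left to run
                e = std::move(q_.front());
                q_.pop_front();
            }
            e();  // run the handler without holding the queue lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Event> q_;
    bool closing_ = false;
    std::thread worker_;  // declared last: starts after the members above exist
};
```

Running handlers outside the queue lock matters here. A handler that posts to another manager then never holds two locks at once, which keeps the lock-ordering concerns from the previous answer out of the shutdown path.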
As a general method, use an atomic boolean to indicate "I am shutting down", then have every thread check this boolean before acquiring each lock, handling each event, etc. I can't give a more detailed answer unless you give us a more detailed question.
I have several thread pools and I want my application to handle a cancel operation.
To do this I implemented a shared operation controller object which I poll at various spots in each thread pool worker function that is called.
Is this a good model, or is there a better way to do it?
I just worry about having all of these operationController.checkState() littered throughout the code.
Yes it's a good approach. Herb Sutter has a nice article comparing it with the alternatives (which are worse).
With any kind of asynchronous cancellation, you're going to have to periodically poll some sort of flag. There's a fundamental issue of having to keep things in a consistent state. If you just kill a thread in the middle of whatever it's doing, bad things will happen sooner or later.
Depending on what you are actually doing, you may be able to just ignore the result of the operation instead of cancelling it. You let the operation continue on, but just don't wait for it to complete and never check the result.
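Ignoring the result can be sketched with a generation counter, as below. The class LatestResult and its stand-in slow_compute are hypothetical. Each launch bumps the generation, and a finished computation publishes its value only if no newer launch happened in the meantime. Stale computations still run to completion and still consume CPU; their output is simply dropped.

```cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

class LatestResult {
public:
    ~LatestResult() { wait_all(); }

    // Call from a single controlling thread (threads_ is not synchronized).
    void launch(int param) {
        unsigned gen;
        {
            std::lock_guard<std::mutex> lk(m_);
            gen = ++generation_;
            has_result_ = false;  // any earlier result is now stale
        }
        threads_.emplace_back([this, gen, param] {
            const int value = slow_compute(param);  // not cancellable
            std::lock_guard<std::mutex> lk(m_);
            if (gen == generation_) {  // publish only if still the latest launch
                result_ = value;
                has_result_ = true;
            }
        });
    }

    bool get(int& out) {
        std::lock_guard<std::mutex> lk(m_);
        if (has_result_) out = result_;
        return has_result_;
    }

    void wait_all() {
        for (auto& t : threads_)
            if (t.joinable()) t.join();
    }

private:
    // Stand-in for an opaque computation that takes `param` milliseconds.
    static int slow_compute(int param) {
        std::this_thread::sleep_for(std::chrono::milliseconds(param));
        return param * 10;
    }

    std::mutex m_;
    unsigned generation_ = 0;
    int result_ = 0;
    bool has_result_ = false;
    std::vector<std::thread> threads_;
};
```

This sketch keeps one std::thread per launch so it can join them all at destruction. A long-running program would reuse pool threads instead of letting the vector grow.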
If you actually need to stop the operation, then you're going to have to poll at appropriate points, and do whatever cleanup is necessary.
It's a good way to do it.
Another possible way to do it is, if there's some other subroutine[s] which the threads call regularly anyway, to check within that subroutine and throw an exception (to be caught at the top of the thread), assuming that "cancel" may be considered exceptional and assuming that the code being executed by the thread is exception-safe.
I wouldn't do it that way, checking a shared object.
I most likely will provide each thread object with a way to cancel the execution inside the own thread, be it an event, a threadsafe state variable or whatever.
The problem with the shared operation controller is that, from my point of view, the logic is reversed: why are you calling it a "controller" when it doesn't control anything?
For me, the Operation Controller should receive a cancellation order and then, in turn, select the appropriate threads and signal them to stop. That would be a correct "chain of command", if you know what I mean. The way you do it, you introduce unnatural behaviour in the thread, which doesn't "obey" orders to stop; instead it checks each time whether its "superior" has "written the order somewhere". Somehow it just doesn't feel right.
In addition, what if in the future you want only some of the threads to stop? What if you want to include some advanced logic so that threads will only stop given a condition? Then you'll have to rewrite the code in each and every thread to handle that condition.
So I will provide a way, for each thread to be able to handle signals to them, for example by using a Command Pattern with a FIFO structure.
(By the way, I realize they're thread pool workers, not actual Thread Classes but still, I think each worker must be signaled to stop separately, not the other way around).
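The per-worker Command Pattern with a FIFO could be sketched roughly as follows. The classes Worker and stop_where are hypothetical. Each worker owns its own command queue and executes commands on its own thread between units of work. The controller receives the cancellation order and forwards a stop command only to the workers it selects, so adding new kinds of orders does not require touching the work loop.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class Worker {
public:
    using Command = std::function<void(Worker&)>;

    Worker() : thread_([this] { run(); }) {}
    ~Worker() {
        send([](Worker& w) { w.request_stop(); });
        thread_.join();
    }
    // Thread-safe: enqueue an order for this worker.
    void send(Command c) {
        std::lock_guard<std::mutex> lk(m_);
        q_.push_back(std::move(c));
    }
    // Only meant to be called from inside a command (i.e. on the worker thread).
    void request_stop() { running_ = false; }
    bool stopped() const { return stopped_.load(); }

private:
    void run() {
        while (running_) {
            std::deque<Command> batch;
            {
                std::lock_guard<std::mutex> lk(m_);
                batch.swap(q_);
            }
            for (auto& c : batch) c(*this);  // obey orders in FIFO order
            if (running_) do_unit_of_work();
        }
        stopped_ = true;
    }
    static void do_unit_of_work() {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // stand-in
    }

    std::mutex m_;
    std::deque<Command> q_;
    bool running_ = true;  // touched only by the worker thread
    std::atomic<bool> stopped_{false};
    std::thread thread_;  // declared last: starts after the members above exist
};

// Controller: forwards a stop command only to the workers selected by `pred`.
void stop_where(std::vector<std::unique_ptr<Worker>>& workers,
                const std::function<bool(std::size_t)>& pred) {
    for (std::size_t i = 0; i < workers.size(); ++i)
        if (pred(i)) workers[i]->send([](Worker& w) { w.request_stop(); });
}
```

The worker still has to drain its queue between units of work. The difference from a shared flag lies in who decides which workers stop, not in avoiding the check altogether.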
In similar situations I have used an event, non-auto-reset, all threads can look at that event. Quite similar to polling except that if your threads block at times, they can sleep for the "stop"-event as well. (Easier on Windows.)
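A portable approximation of that manual-reset event can be built from std::condition_variable instead of the Windows event API. The names StopEvent, polling_worker and time_to_stop are hypothetical. Once set, the event stays set and wakes every waiter, so a worker that would otherwise sleep between polls returns immediately.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Manual-reset ("non-auto-reset") event: once set, every waiter sees it.
class StopEvent {
public:
    void set() {
        { std::lock_guard<std::mutex> lk(m_); set_ = true; }
        cv_.notify_all();
    }
    bool is_set() {
        std::lock_guard<std::mutex> lk(m_);
        return set_;
    }
    // Sleeps up to `d`, but returns true as soon as the event is set.
    template <class Rep, class Period>
    bool wait_for(std::chrono::duration<Rep, Period> d) {
        std::unique_lock<std::mutex> lk(m_);
        return cv_.wait_for(lk, d, [this] { return set_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool set_ = false;
};

// A worker that does one unit of work, then idles for up to 10 s.
int polling_worker(StopEvent& stop) {
    int rounds = 0;
    while (!stop.is_set()) {
        ++rounds;  // stand-in for one unit of work
        if (stop.wait_for(std::chrono::seconds(10))) break;  // interruptible idle
    }
    return rounds;
}

// Measures how long the worker takes to exit after the event is set.
std::chrono::milliseconds time_to_stop(std::chrono::milliseconds delay) {
    StopEvent ev;
    std::thread t([&ev] { polling_worker(ev); });
    std::this_thread::sleep_for(delay);
    const auto t0 = std::chrono::steady_clock::now();
    ev.set();
    t.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - t0);
}
```

The stop latency is bounded by the current unit of work, not by the 10 s idle period.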