I have two versions of a function in a C++ library that do the same task. One is synchronous; the other is asynchronous and lets me register a callback function.
Which of the strategies below is preferable in terms of memory use and performance?
1) Call the synchronous function in a worker thread, and use mutex synchronization to wait until I get the result
2) Do not create a thread, but call the asynchronous version and get the result in the callback
I am aware that creating the worker thread in option 1 adds overhead. I want to know about the overhead caused by thread synchronization objects, and how it compares to the overhead of the asynchronous call. Does the asynchronous version of a function internally spin off a thread and use a synchronization object, or does it use some other technique, such as talking directly to the kernel?
"Profile, don't speculate." (DJB)
The answer to this question depends on too many things, and there is no general answer. The role of the developer is to be able to make these decisions. If you don't know, try the options and measure. In many cases, the difference won't matter and non-performance concerns will dominate.
"Premature optimisation is the root of all evil, say 97% of the time" (DEK)
Update in response to the question edit:
C++ libraries, in general, don't get to use magic to avoid synchronisation primitives. The asynchronous vs. synchronous interfaces are likely to be wrappers around things you would do anyway. Processing must happen in a context, and if completion is to be signalled to another context, a synchronisation primitive will be necessary to do that.
Of course, there might be other considerations. If your C++ library is talking to some piece of hardware that can do processing, things might be different. But you haven't told us about anything like that.
The answer to this question depends on context you haven't given us, including information about the library interface and the structure of your code.
Use the asynchronous function: it will probably do what you would otherwise do manually with the synchronous one, and it is less error-prone.
Asynchronous: will create a thread, do the work, and call the callback when done.
Synchronous: create an event to wait for, create a thread for the work, and wait for the event; on the thread, call the sync version, transfer the result, and signal the event.
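To make the comparison concrete, here is a minimal sketch of both options from the caller's side. The names compute_sync and compute_async are hypothetical stand-ins for the two library functions (the question does not name them), and the stand-in for the asynchronous version simply assumes the library spins up a thread internally, which may not be what a real library does.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for the library's synchronous function.
int compute_sync(int x) { return x * 2; }

// Hypothetical stand-in for the library's asynchronous function. ASSUMPTION:
// it runs the work on its own thread and then invokes the callback there.
void compute_async(int x, std::function<void(int)> callback) {
    assert(callback);
    std::thread([x, cb = std::move(callback)] { cb(compute_sync(x)); }).detach();
}

// Option 1: run the synchronous version on a worker thread and wait for it.
// std::future hides the mutex/condition variable you would otherwise write.
int call_via_worker_thread(int x) {
    std::future<int> result = std::async(std::launch::async, compute_sync, x);
    return result.get();
}

// Option 2: no thread of our own; the callback hands the result back.
// The promise is shared so it stays alive until the callback has finished.
int call_via_callback(int x) {
    auto done = std::make_shared<std::promise<int>>();
    std::future<int> result = done->get_future();
    compute_async(x, [done](int value) { done->set_value(value); });
    return result.get();
}
```

In both cases a synchronization primitive is still involved whenever the caller wants to wait for the result; the difference is only who creates the thread. Measuring both against the real library is the only way to know which is cheaper.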
You might consider that threads each have their own environment so they use more memory than a non threaded solution when all other things are equal.
Depending on your threading library there can also be significant overhead to starting and stopping threads.
If you need interprocess synchronization there can also be a lot of pain debugging threaded code.
If you're comfortable writing non threaded code (i.e. you won't burn a lot of time writing and debugging it) then that might be the best choice.
What would be a smart way to implement something like the following?
// Plain C function for example purposes.
void sleep_async(delay_t delay, void (* callback)(void *), void * data);
That is, a means of asynchronously executing a callback after a delay. POSIX, for example, has a few functions that do something like this, but they are mostly for asynchronous I/O (see this for what I mean). What interests me about those functions is how they are executed "as if" on a new thread, according to that manual page, where an implementation may choose to spawn "a single thread...to receive all notifications". I am aware that some may nonetheless choose to spawn a whole thread for each of them, and that stuff like this may require support from the OS itself, so this is just an example.
I already have a couple of ways I could implement this (e.g. a priority queue of events sorted by wake time on a timer loop, with no need to start a thread at all), but I am wondering whether there already exist smart[er] or [more] complete implementations of what I want to accomplish. For example, maybe implementations of Task.Delay() from C♯ (and coroutines like it in other language environments) do something smart to minimize the amount of thread spawning needed for asynchronous delays.
Why am I looking for something like this? As implied by the title, I'm looking for something asynchronous. The above signature is just a simple C example to illustrate roughly what POSIX does. I am implementing some C++20 coroutines for use with co_await and friends, with thread pools and whatnot. Scheduling anything that would end up synchronously waiting on something is probably a bad idea, as it would prevent otherwise free threads from doing any work. Spawning [and potentially immediately detaching] a new thread just to add in an asynchronous delay doesn't seem like a very smart idea, either. My timer loop idea could be okay, but that implies needing a predefined timer granularity, and overhead from the priority queue.
Edit
I neglected to mention any real set of target platforms, as a commenter mentioned. I don't expect to target anything outside the "usual" desktop platforms, so the quirks of embedded development are ignored. The way I plan to use asynchronous delays themselves this way does not necessarily require threading support (everything could just be on a timer loop), but threading will nonetheless be required and used in accord (namely thread pools on which coroutines would be scheduled).
The simple but inefficient way would be to spawn a thread, have it sleep for delay, and then call the callback. This can be done in just a few lines using std::async():
auto delayed_call = std::async(std::launch::async, [&]{
    std::this_thread::sleep_for(delay);
    callback(data);
});
As mentioned by Thomas Matthews, this requires support for threads. While it's fine for a one-off call, it's not efficient if you have many such delayed calls. Having a priority queue and an event loop or a dedicated thread to handle events in this queue, as you already mentioned, is probably the most efficient way to do it. If you are looking for a library that implements this, then have a look at boost::asio.
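For illustration, the priority-queue approach with a dedicated thread could be sketched roughly as follows. TimerQueue and its members are hypothetical names, not part of any library, and the callback type is std::function rather than the plain C signature from the question. The thread sleeps until the earliest due time with wait_until, so no fixed timer granularity is assumed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: one background thread serves every delayed call.
class TimerQueue {
public:
    using Clock = std::chrono::steady_clock;

    TimerQueue() : worker_([this] { run(); }) {}

    ~TimerQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_one();
        worker_.join();  // callbacks that are not yet due are dropped
    }

    // Schedule `callback` to run on the timer thread after `delay`.
    void sleep_async(Clock::duration delay, std::function<void()> callback) {
        assert(callback);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push(Event{Clock::now() + delay, std::move(callback)});
        }
        wakeup_.notify_one();  // the new event may now be the earliest one
    }

private:
    struct Event {
        Clock::time_point due;
        std::function<void()> callback;
        // Reversed comparison: std::priority_queue then yields the earliest event.
        bool operator<(const Event& other) const { return due > other.due; }
    };

    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            if (events_.empty()) {
                wakeup_.wait(lock);
            } else if (events_.top().due <= Clock::now()) {
                auto callback = events_.top().callback;
                events_.pop();
                lock.unlock();
                callback();  // run without holding the lock
                lock.lock();
            } else {
                const auto due = events_.top().due;  // copy: the queue may change
                wakeup_.wait_until(lock, due);       // woken early by new events
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::priority_queue<Event> events_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist first
};
```

A long-running callback delays every later one in this sketch; handing the callback to a thread pool instead of calling it inline would be one way around that.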
As for using C++20 coroutines, I do not think that this will make something like your sleep_async() any easier. However, an event loop could be implemented on top of it.
A smart way? You mean really, really smart? That would be my own implementation, of course. You know about POSIX timers, you probably know about Linux timers and the various hacks involving std::thread. But, more seriously, what you need sounds a lot like libeio or libuv; both of these provide callbacks. It depends on what you can afford in binary size and whether you like the particular abstractions a library offers. The 2 libraries seem to be evolved versions of libevent and libev, libevent being the progenitor of them all.
Creating a std::thread instance involves allocating a stack for the new thread, at the very least, which is by no means cheap.
Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time constraint 'guarantees' (say cancellation within 0.1 execution seconds of the thread in question for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitation is compatibility with C++14, and ability to work on at least BSD and Linux based OS.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions using boost::this_thread::interrupt_point, but can wrap them in lambdas/other constructs if that helps. I feel like this is a rock and hard place situation, so alternate suggestions are welcome, but they need to be minimally intrusive in existing functionality, and can be dramatic in their scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that the two existing answers are based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints", since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional programming style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting thread to behave well).
I also mis-used the term "arbitrary" since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some which download from "internet" already use a "condition variable"
not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutex, but those can't exist outside the function body
output is via atomic<shared_ptr> passed as argument
pure functions (no shared state with outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is the case, as the state is usually 1 or 2 atomic<inbuilt-type>). Usually the internal state is queried from an external db (an architecture similar to cookie + web-server, where the tab/browser can be closed anytime)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
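As one illustration of what cooperation can look like, here is a minimal sketch in which the task polls an atomic flag between chunks of work. The function names and the polling interval are assumptions made up for this example, and the approach only applies to code you are able to modify, which the question says is not the case for the opaque library calls.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical task: sums the integers below `count`, polling `cancel`
// every kCheckEvery iterations. Returns true if it finished, false if it
// stopped early because cancellation was requested.
bool sum_with_cancellation(const std::atomic<bool>& cancel,
                           unsigned long long count,
                           unsigned long long& result) {
    constexpr unsigned long long kCheckEvery = 1024;  // assumed polling interval
    result = 0;
    for (unsigned long long i = 0; i < count; ++i) {
        if (i % kCheckEvery == 0 && cancel.load(std::memory_order_relaxed)) {
            return false;  // normal return: destructors run, locks are released
        }
        result += i;
    }
    return true;
}

// Usage sketch: start the task on a worker, request cancellation, then join.
void demo_cancel() {
    std::atomic<bool> cancel{false};
    unsigned long long result = 0;
    bool finished = true;
    std::thread worker([&] {
        finished = sum_with_cancellation(cancel, ~0ULL, result);
    });
    cancel.store(true);
    worker.join();
    assert(!finished);  // the loop sees the flag long before 2^64 - 1 iterations
}
```

The latency of cancellation is bounded by the time between two polls, which is the kind of time-constraint guarantee the question asks about, but only for code that contains such checks.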
Cancelling a non-cooperative component that was not designed to be cancelled is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the component should be managed externally (the system knows which component uses which resources)
all accesses should be indirect
modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resources, stop operations, cancel incomplete changes...
None of these properties are cheap; all the properties of threads are the exact opposite of these properties.
Threads only have an implied concept of ownership apparent in the running thread: for a deleted thread, determining what was owned by the thread is not possible.
Threads access shared objects directly. A thread can start modifying shared objects; if it is cancelled in the middle of an operation, those modifications are left partial, ineffective and incoherent.
Cancelled threads could leave locked mutexes around. At least subsequent accesses to these mutexes by other threads trying to access the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non cooperative threads is not doable even with very large scale changes to thread synchronization objects. Not even by a complete redesign of the thread primitives.
You would have to make threads almost like full processes to be able to do that; but then they wouldn't be called threads!
I have a problem with long running boost::regex_match(...) invocation in a threaded process environment. But it could be another lib (API call) having the same problem.
Is there a generic way to set up a watchdog for such?
For non-threaded process alarm() can be used to detect timeout.
However, signals don't play nicely with threads. I can avoid direct use of alarm() in the thread, delegate timer management to a dedicated separate thread, and let that one use pthread_kill(...) to address the correct threads (this is just an idea; I haven't verified that part yet).
However, this too only interrupts and detects the situation; it cannot gracefully stop boost::regex_match(...).
I played around with throwing an exception from within a signal handler using sigsetjmp() and siglongjmp() for the thread using boost::regex_match(..).
But it causes memory leaks in boost::regex_match(...) because siglongjmp() bypasses destructors.
How can I gracefully stop a 3rd party API call, presuming that it's implemented exception-safe?
Or does it have to be supported by some "stoppable" feature actively implemented in the 3rd party API? (is there some for the boost library?)
Maybe some strange idea, but:
Code can be implemented to be "thread-safe" and/or "exception-safe".
Would it be an option to define "longjmp-safe"? This could be done by passing an additional token to a lib to let it associate all resource allocations with that token. After longjmp() the client SW could ask the API separately to release those resources.
Simpler, maybe, would be some central init()/release() or register()/unregister() API call, by which the API could clean up after itself.
In a case where you have to:
monitor exceeding execution time
stop execution of processing
you should simply think in terms of tasks instead of threads.
Using threads sounds like the "state of the art", but in practice tasks are very often the better implementation, especially for controlling memory leaks after an "undefined" end of execution, confining excessive memory use, controlling stack overruns, etc.
In the case you have mentioned I would tend to implement that as tasks. IPC works well on all known platforms but is not portable. If portability is no problem, changing to a task-based solution is not a big deal.
A hanging task can be killed by an OS call, and all locks, memory and other resources like IPC/shared memory/pipes etc. will be removed automatically. So this fits your problem much better, and it does not depend on your external and maybe unchangeable third-party components.
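Reading "task" here as a separate OS process, a POSIX-only sketch of the idea might look like the following. The helper name run_with_timeout and the 5 ms polling interval are assumptions for this example; passing results back from the child (via a pipe or shared memory) is left out.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `work` in a child process. Returns true if the child was seen to exit
// on its own before `timeout` elapsed, false if it had to be killed (or if
// fork() failed).
bool run_with_timeout(const std::function<void()>& work,
                      std::chrono::milliseconds timeout) {
    assert(work);
    const pid_t child = fork();
    if (child < 0) {
        return false;  // could not create the process
    }
    if (child == 0) {
        work();
        _exit(0);  // leave without running the parent's atexit handlers
    }
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int status = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        if (waitpid(child, &status, WNOHANG) == child) {
            return true;  // finished by itself
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    kill(child, SIGKILL);        // the OS reclaims the child's memory and locks
    waitpid(child, &status, 0);  // reap the child so it does not stay a zombie
    return false;
}
```

Note that calling fork() from a process that already has several threads is restricted: only the calling thread exists in the child, so a real implementation would more likely fork and then exec a small worker program.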
Our (Windows native C++) app is composed of threaded objects and managers. It is pretty well written, with a design that sees Manager objects controlling the lifecycle of their minions. Various objects dispatch and receive events; some events come from Windows, some are home-grown.
In general, we have to be very aware of thread interoperability so we use hand-rolled synchronization techniques using Win32 critical sections, semaphores and the like. However, occasionally we suffer thread deadlock during shut-down due to things like event handler re-entrancy.
Now I wonder if there is a decent app shut-down strategy we could implement to make this easier to develop for - something like every object registering for a shutdown event from a central controller and changing its execution behaviour accordingly? Is this too naive or brittle?
I would prefer strategies that don't stipulate rewriting the entire app to use Microsoft's Parallel Patterns Library or similar. ;-)
Thanks.
EDIT:
I guess I am asking for an approach to controlling object life cycles in a complex app where many threads and events are firing all the time. Giovanni's suggestion is the obvious one (hand-roll our own), but I am convinced there must be various off-the-shelf strategies or frameworks, for cleanly shutting down active objects in the correct order. For example, if you want to base your C++ app on an IoC paradigm you might use PocoCapsule instead of trying to develop your own container. Is there something similar for controlling object lifecycles in an app?
This seems like a special case of the more general question, "how do I avoid deadlocks in my multithreaded application?"
And the answer to that is, as always: make sure that any time your threads have to acquire more than one lock at a time, that they all acquire the locks in the same order, and make sure all threads release their locks in a finite amount of time. This rule applies just as much at shutdown as at any other time. Nothing less is good enough; nothing more is necessary. (See here for a relevant discussion)
As for how to best do this... the best way (if possible) is to simplify your program as much as you can, and avoid holding more than one lock at a time if you can possibly help it.
If you absolutely must hold more than one lock at a time, you must verify your program to be sure that every thread that holds multiple locks locks them in the same order. Programs like helgrind or Intel thread checker can help with this, but it often comes down to simply eyeballing the code until you've proved to yourself that it satisfies this constraint. Also, if you are able to reproduce the deadlocks easily, you can examine (using a debugger) the stack trace of each deadlocked thread, which will show where the deadlocked threads are blocked forever, and with that information you can then start to figure out where the lock-ordering inconsistencies are in your code. Yes, it's a major pain, but I don't think there is any good way around it (other than avoiding holding multiple locks at once). :(
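A small sketch of the "same order everywhere" rule, written with standard C++ mutexes rather than the Win32 primitives the question uses. The Account type and the choice of ordering by address are assumptions made for the example; any fixed global order works.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared object guarded by its own mutex.
struct Account {
    std::mutex mutex;
    long balance = 0;
};

// Both locks are always taken in the same global order (lower address first),
// so transfer(a, b) and transfer(b, a) running concurrently cannot deadlock.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same std::mutex twice is undefined
    Account* first = std::less<Account*>()(&from, &to) ? &from : &to;
    Account* second = (first == &from) ? &to : &from;
    std::lock_guard<std::mutex> first_lock(first->mutex);
    std::lock_guard<std::mutex> second_lock(second->mutex);
    from.balance -= amount;
    to.balance += amount;
}
```

When both mutexes are known at the same call site, std::lock(m1, m2) is an alternative that acquires them with a built-in deadlock-avoidance algorithm instead of a hand-picked order.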
One possible general strategy would be to send an "I am shutting down" event to every manager, which would cause the managers to do one of three things (depending on how long running your event-handlers are, and how much latency you want between the user initiating shutdown, and the app actually exiting).
1) Stop accepting new events, and run the handlers for all events received before the "I am shutting down" event. To avoid deadlocks you may need to accept events that are critical to the completion of other event handlers. These could be signaled by a flag in the event or the type of the event (for example). If you have such events then you should also consider restructuring your code so that those actions are not performed through event handlers (as dependent events would be prone to deadlocks in ordinary operation too.)
2) Stop accepting new events, and discard all events that were received after the event that the handler is currently running. Similar comments about dependent events apply in this case too.
3) Interrupt the currently running event (with a function similar to boost::thread::interrupt()), and run no further events. This requires your handler code to be exception safe (which it should already be, if you care about resource leaks), and to enter interruption points at fairly regular intervals, but it leads to the minimum latency.
Of course you could mix these three strategies together, depending on the particular latency and data corruption requirements of each of your managers.
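Strategies 1 and 2 can be sketched with a single-threaded event queue. Everything here (the Manager class, its Shutdown enum, the member names) is hypothetical and uses standard C++ threads instead of Win32 primitives; strategy 3 is left out because it needs interruption points inside the handlers.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical manager: one thread runs queued event handlers in order.
class Manager {
public:
    enum class Shutdown { Drain, Discard };

    Manager() : worker_([this] { run(); }) {}
    ~Manager() { shutdown(Shutdown::Drain); }

    // Returns false once shutdown has begun: new events are not accepted.
    bool post(std::function<void()> handler) {
        assert(handler);
        std::lock_guard<std::mutex> lock(mutex_);
        if (stopping_) return false;
        events_.push_back(std::move(handler));
        ready_.notify_one();
        return true;
    }

    // Drain (strategy 1) runs everything already queued; Discard (strategy 2)
    // drops queued events and only lets the running handler finish.
    // Call from one controlling thread only, never from inside a handler.
    void shutdown(Shutdown mode) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
            if (mode == Shutdown::Discard) events_.clear();
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            ready_.wait(lock, [this] { return stopping_ || !events_.empty(); });
            if (events_.empty()) return;  // only reachable once stopping_ is set
            auto handler = std::move(events_.front());
            events_.pop_front();
            lock.unlock();
            handler();  // run the handler without holding the queue lock
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> events_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist first
};
```

Handlers run without the queue lock held, so a handler may post further events during normal operation; after shutdown starts, post() returns false and the caller has to decide what to do with the rejected event.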
As a general method, use an atomic boolean to indicate "I am shutting down"; every thread then checks this boolean before acquiring each lock, handling each event, etc. I can't give a more detailed answer unless you give us a more detailed question.
I'm using SQLite3 in a Windows application. I have the source code (so-called SQLite amalgamation).
Sometimes I have to execute heavy queries. That is, I call sqlite3_step on a prepared statement, and it takes a lot of time to complete (due to the heavy I/O load).
I wonder if there's a possibility to abort such a call. I would also be glad if there was an ability to do some background processing in the middle of the call within the same thread (since most of the time is spent in waiting for the I/O to complete).
I thought about modifying the SQLite code myself. In the simplest scenario I could check some condition (like an abort event handle for instance) before every invocation of either ReadFile/WriteFile, and return an error code appropriately. And in order to allow the background processing the file should be opened in the overlapped mode (this enables asynchronous ReadFile/WriteFile).
Is there a chance that interruption of WriteFile may in some circumstances leave the database in the inconsistent state, even with the journal enabled? I guess not, since the whole idea of the journal file is to be prepared for any error of any kind. But I'd like to hear more opinions about this.
Also, has anyone tried something similar?
Thanks in advance.
EDIT:
Thanks to ereOn. I wasn't aware of the existence of sqlite3_interrupt. This probably answers my question.
Now, for all of you who wonder how (and why) one expects to do some background processing during the I/O within the same thread.
Unfortunately not many people are familiar with so-called "Overlapped I/O".
http://en.wikipedia.org/wiki/Overlapped_I/O
Using it one issues an I/O operation asynchronously, and the calling thread is not blocked. Then one receives the I/O completion status using one of the completion mechanisms: waitable event, new routine queued into the APC, or the completion port.
Using this technique one doesn't have to create extra threads. Actually the only real justification for creating threads is when your bottleneck is the computation time (i.e. CPU load), and the machine has several CPUs (or cores).
And creating a thread just to let it be blocked by the OS most of the time doesn't make sense. It wastes OS resources without justification and complicates the program (need for synchronization, etc.).
Unfortunately not all libraries/APIs allow an asynchronous mode of operation, which makes creating extra threads a necessary evil.
EDIT2:
I've already found the solution, thanks to ereOn.
For all those who nevertheless insist that it's not worth doing things "in background" while "waiting" for the I/O to complete using overlapped I/O. I disagree, and I think there's no point to argue about this. At least this is not related to the subject.
I'm a Windows programmer (as you may have noticed), and I have very extensive experience in all kinds of multitasking. Plus I'm also a driver writer, so I also know how things work "behind the scenes".
I know that it's a "common practice" to create several threads to do several things "in parallel". But this doesn't mean that this is a good practice. Please allow me not to follow the "common practice".
I don't understand why you want the interruption to come from the same thread, and I don't even understand how that would be possible: if the current thread is blocked, waiting for some IO, you can't execute any other code. (Yeah, that's what "blocked" means)
Perhaps if you give us more hints about why you want this, we might help further.
Usually, I use sqlite3_interrupt() to cancel calls. But this, obviously, requires that the interrupting call be made from another thread.
By default, SQLite is threadsafe. It sounds to me like the easiest thing to do would be to start the SQLite command on a background thread, and let SQLite do the necessary locking to make that work.
From your perspective then, the sqlite call looks like an asynchronous bit of I/O, and you can continue normal processing on this thread, such as e.g. using a loop including interruptible sleep and a bit of occasional background processing (e.g. to update a liveness indicator). When the SQLite statement completes, the background thread should set a state variable to indicate this, wake the main thread (if necessary), and terminate.
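The loop described above could be sketched like this with standard C++ threads. The function name and its parameters are hypothetical; blocking_call stands in for whatever runs the SQLite statement, and no SQLite API is called in the sketch itself.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Runs `blocking_call` on a background thread while the calling thread runs
// `background_work` once per `tick`. Returns how many ticks were performed.
int run_with_background_work(const std::function<void()>& blocking_call,
                             const std::function<void()>& background_work,
                             std::chrono::milliseconds tick) {
    assert(blocking_call && background_work);
    std::mutex mutex;
    std::condition_variable done_cv;
    bool done = false;  // the state variable set by the background thread

    std::thread worker([&] {
        blocking_call();  // e.g. a loop around sqlite3_step()
        {
            std::lock_guard<std::mutex> lock(mutex);
            done = true;
        }
        done_cv.notify_one();  // wake the main thread if it is sleeping
    });

    int ticks = 0;
    std::unique_lock<std::mutex> lock(mutex);
    // wait_for is the interruptible sleep: it returns early once `done` is set.
    while (!done_cv.wait_for(lock, tick, [&] { return done; })) {
        lock.unlock();
        background_work();  // e.g. update a liveness indicator
        ++ticks;
        lock.lock();
    }
    lock.unlock();
    worker.join();
    return ticks;
}
```

To support cancellation as well, the main loop could call sqlite3_interrupt() on the connection when the user asks to abort; that part is omitted here.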