I am working on a project that includes event handling. I have access to g++-9 with C++17 (C++20 is also possible).
I require the behavior of a semaphore. My event handler pushes the event into a queue, to be processed by another thread (the event processor). The event handler needs to be extremely lightweight so that it does not miss fast-occurring events. So I plan to just enqueue and increment the semaphore in the event handler, then do the heavy work in the event processor, decrementing the semaphore. (This avoids busy waiting in the event processor, which will always be running.)
This is very easy using a POSIX semaphore, but I have also read that semaphores can be implemented in C++ using condition variables, counters, unique_locks and mutexes. I wonder whether it is worth the trouble to write it C++-style just to achieve simple POSIX semaphore behavior. More importantly, which one is faster? Which is the better option for me?
Thanks in advance.
Just grab an off-the-shelf C++ "thread-safe queue" object and let it do the dirty work for you. Correct implementations of this sort of thing have already been done (to death ...).
Read discussions like this: https://juanchopanzacpp.wordpress.com/2013/02/26/concurrent-queue-c11/.
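For reference, the condition-variable construction mentioned in the question is only a few lines. A minimal sketch (note that on C++20 you could use std::counting_semaphore instead of rolling your own):

#include <condition_variable>
#include <mutex>

// Minimal counting semaphore built from a mutex, a counter and a
// condition variable.
class semaphore {
public:
    void post() {                               // analogous to sem_post
        std::lock_guard<std::mutex> lk(m_);
        ++count_;
        cv_.notify_one();
    }
    void wait() {                               // analogous to sem_wait
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this]{ return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_ = 0;
};

Performance-wise there is rarely much to choose between this and sem_t: on Linux both typically end up in futex calls only when a thread actually has to sleep. The queue's own lock usually dominates, which is another argument for a ready-made thread-safe queue that bundles the counter and the queue into one lock.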
Related
What would be a smart way to implement something like the following?
// Plain C function for example purposes.
void sleep_async(delay_t delay, void (* callback)(void *), void * data);
That is, a means of asynchronously executing a callback after a delay. POSIX, for example, has a few functions that do something like this, but they are mostly for asynchronous I/O (see this for what I mean). What interests me about those functions is how they are executed "as if" on a new thread, according to that manual page, where an implementation may choose to spawn "a single thread...to receive all notifications". I am aware that some may nonetheless choose to spawn a whole thread for each of them, and that stuff like this may require support from the OS itself, so this is just an example.
I already have a couple of ways I could implement this (e.g. a priority queue of events sorted by wake time on a timer loop, with no need to start a thread at all), but I am wondering whether there already exist smart[er] or [more] complete implementations of what I want to accomplish. For example, maybe implementations of Task.Delay() from C♯ (and coroutines like it in other language environments) do something smart in minimizing the amount of thread spawning for getting asynchronous delays.
Why am I looking for something like this? As implied by the title, I'm looking for something asynchronous. The above signature is just a simple C example to illustrate roughly what POSIX does. I am implementing some C++20 coroutines for use with co_await and friends, with thread pools and whatnot. Scheduling anything that would end up synchronously waiting on something is probably a bad idea, as it would prevent otherwise free threads from doing any work. Spawning [and potentially immediately detaching] a new thread just to add in an asynchronous delay doesn't seem like a very smart idea, either. My timer loop idea could be okay, but that implies needing a predefined timer granularity, and overhead from the priority queue.
Edit
I neglected to mention any real set of target platforms, as a commenter pointed out. I don't expect to target anything outside the "usual" desktop platforms, so the quirks of embedded development can be ignored. The way I plan to use asynchronous delays does not itself require threading support (everything could just be on a timer loop), but threading will nonetheless be required and used alongside it (namely thread pools on which coroutines would be scheduled).
The simple but inefficient way would be to spawn a thread, have it sleep for delay, and then call the callback. This can be done in just a few lines using std::async():
auto delayed_call = std::async(std::launch::async, [=] {  // capture by value:
    std::this_thread::sleep_for(delay);                   // the lambda may
    callback(data);                                       // outlive the caller
});
// Note: the returned std::future blocks in its destructor until the callback
// has run, so keep delayed_call alive for the call to remain asynchronous.
As mentioned by Thomas Matthews, this requires support for threads. While it's fine for a one-off call, it's not efficient if you have many such delayed calls. Having a priority queue and an event loop or a dedicated thread to handle events in this queue, as you already mentioned, is probably the most efficient way to do it. If you are looking for a library that implements this, then have a look at boost::asio.
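To make the priority-queue idea concrete, a dedicated timer thread might look like the following sketch (all names are illustrative, not from any particular library):

#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

struct timed_call {
    Clock::time_point when;
    std::function<void()> fn;
    bool operator>(const timed_call& o) const { return when > o.when; }
};

class timer_loop {
public:
    timer_loop() : worker_([this] { run(); }) {}
    ~timer_loop() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void call_after(Clock::duration delay, std::function<void()> fn) {
        { std::lock_guard<std::mutex> lk(m_);
          q_.push({Clock::now() + delay, std::move(fn)}); }
        cv_.notify_one();     // the new entry may be the earliest deadline
    }
private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        while (!done_) {
            if (q_.empty()) {
                cv_.wait(lk);                       // nothing scheduled
            } else if (Clock::now() >= q_.top().when) {
                auto fn = q_.top().fn;              // due: run it
                q_.pop();
                lk.unlock();
                fn();                               // call outside the lock
                lk.lock();
            } else {
                cv_.wait_until(lk, q_.top().when);  // sleep until next deadline
            }
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::priority_queue<timed_call, std::vector<timed_call>,
                        std::greater<>> q_;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members are ready first
};

Usage is then just timer.call_after(std::chrono::seconds(2), []{ /* ... */ });. Note that wait_until gives you arbitrary deadlines, so no fixed timer granularity is needed.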
As for using C++20 coroutines, I do not think that this will make something like your sleep_async() any easier. However, an event loop could be implemented on top of it.
A smart way? You mean really, really smart? That would be my own implementation, of course. You know about POSIX timers, you probably know about Linux timers and the various hacks involving std::thread. But, more seriously, what you require sounds mostly to the tune of something like libeio or libuv - both of these provide callbacks. It depends on what you can afford in binary size and whether you like the particular abstractions a library offers. The two libraries seem to be evolved versions of libevent and libev, libevent being the progenitor of them all.
Creating a std::thread instance involves allocating a stack, at the very least, which is by no means cheap.
I'm working on a project where a primary server thread needs to dispatch events to a series of worker threads. The work that goes on in the worker threads relies on polling (i.e. epoll or kqueue depending on the UNIX system in question), with timeouts on these operations needing to be handled. This means that a normal condition variable or semaphore structure is not viable for this dispatch, as it would make one or the other block, resulting in unwanted latency in handling either the events coming from polling or the events originating from the server thread.
So I'm wondering: what is the optimal construct for dispatching such events between threads in a pollable fashion? Essentially, all that needs to be delivered is a pollable "signal" that tells the worker thread that it has more events to fetch. I've looked at using UNIX pipes (unnamed ones, as it's internal to the process), which seems like a decent solution given that a single byte can be written to the pipe and read back out when the queue is cleared -- but I'm wondering if this is the best approach available? Or the fastest?
Alternatively, there is the possibility of using signalfd(2) on Linux, but as this is not available on BSD systems, I'd rather avoid this construct. I'm also wondering how great the overhead of using system signals actually is?
Jan Hudec's answer is correct, although I wouldn't recommend using signals for a few reasons:
Older versions of glibc emulated pselect and ppoll in a non-atomic fashion, making them basically worthless. Even when you used the mask correctly, signals could get "lost" between the pthread_sigprocmask and select calls, meaning they don't cause EINTR.
I'm not sure signalfd is any more efficient than the pipe. (Haven't tested it, but I don't have any particular reason to believe it is.)
signals are generally a pain to get right. I've spent a lot of effort on them (see my sigsafe library) and I'd recommend avoiding them if you can.
Since you're trying to have asynchronous handling portable to several systems, I'd recommend looking at libevent. It will abstract epoll or kqueue for you, and it will even wake up workers on your behalf when you add a new event. See event.c
2058 static inline int
2059 event_add_internal(struct event *ev, const struct timeval *tv,
2060 int tv_is_absolute)
2061 {
...
2189 /* if we are not in the right thread, we need to wake up the loop */
2190 if (res != -1 && notify && EVBASE_NEED_NOTIFY(base))
2191 evthread_notify_base(base);
...
2196 }
Also,
The worker thread deals with both socket I/O and asynchronous disk I/O, which means that it is optimally always waiting for the event queuing mechanism (epoll/kqueue).
You're likely to be disappointed here. These event queueing mechanisms don't really support asynchronous disk I/O. See this recent thread for more details.
As far as performance goes, the cost of a system call is huge compared to other operations, so it's the number of system calls that matters. There are two options:
Use the pipes as you wrote. If you have any useful payload for the message, you get one system call to send, one system call to wait and one system call to receive. Try to pass any relevant data down the pipe instead of reading it from a shared structure, to avoid additional overhead from locking.
select and poll have variants that also wait for signals (pselect, ppoll). Linux epoll can do the same using signalfd, so the remaining question is whether kqueue can wait for signals, which I don't know. If it can, then you could use them (you are using different mechanisms on Linux and *BSD anyway). It would save you the syscall for reading if you don't have a good use for the passed data.
I would expect passing the data over a socket to be more efficient if it allows you to do away with any other locking.
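For reference, the pipe-based wake-up is only a handful of lines (a sketch, error handling elided; the worker simply adds the read end to the descriptor set it already polls):

#include <poll.h>
#include <unistd.h>

static int wake_fds[2];   // [0] = read end (worker), [1] = write end (server)

void init_wake_pipe() { pipe(wake_fds); }

// Server thread: called after pushing events onto the shared queue.
void notify_worker() {
    char one = 1;
    write(wake_fds[1], &one, 1);
}

// Worker thread: the read end sits in the same poll/epoll/kqueue set
// as the sockets the worker already watches.
void worker_wait() {
    pollfd pfd{wake_fds[0], POLLIN, 0};
    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
        char buf[64];
        read(wake_fds[0], buf, sizeof buf);   // drain all pending wake-ups
        // ...now fetch every queued event...
    }
}

On Linux, eventfd(2) does the same job with a single descriptor, and kqueue on FreeBSD and macOS offers EVFILT_USER; the pipe is simply the portable common denominator.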
The MSDN documentation for SHGetFileInfo says, quite rightly:
You should call this function from a background thread. Failure to do so could cause the UI to stop responding.
so I'm trying to figure out a good way to do this where I have a large list (80+) of them to do, and would like to parallelize the underlying I/O. I could use a thread-pool, but I'm not an expert Windows programmer, so I was wondering if there was a better technique for this.
Create a queue of file names to process (a linked list would suffice; consider STL's std::list). Create a lock for that queue (a critical section would do). Spawn a bunch of threads (2-4). Each thread would, in a loop, acquire the lock, take the head of the queue, release the lock and retrieve the icon. If there are no more items in the queue, the thread quits. Something like the sketch below.
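A sketch of that scheme using standard C++ primitives standing in for the Win32 ones (std::mutex for the critical section, std::thread for the workers; the icon retrieval itself is left as a placeholder):

#include <list>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::list<std::wstring> files;   // the 80+ paths, filled in beforehand
std::mutex files_lock;           // stands in for a Win32 CRITICAL_SECTION

void icon_worker() {
    for (;;) {
        std::wstring path;
        {
            std::lock_guard<std::mutex> lk(files_lock);
            if (files.empty()) return;          // queue drained: thread quits
            path = std::move(files.front());
            files.pop_front();
        }
        // ...retrieve the icon for 'path' here, e.g. with SHGetFileInfo...
    }
}

int main() {
    // ...fill 'files'...
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) pool.emplace_back(icon_worker);
    for (auto& t : pool) t.join();
}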
Making parallel I/O calls might not give you any speedup (likely only overhead). You can experiment with the GBL library and see if threading helps you. Write handler functions to respond to events. Then you can easily switch between AsyncHandler, which uses a thread pool, and Handler, which is synchronous, to see if you can achieve any speedup.
In the Concurrency Runtime introduced in VS2010, there is a concurrent_queue class. It has a non-blocking try_pop() function.
Similarly, in Intel Threading Building Blocks (TBB), the blocking pop() call was removed when going from version 2.1 to 2.2.
I wonder what the problem is with a blocking call. Why was it removed from TBB? And why is there no blocking concurrent_queue?
I'm in a situation where I need a blocking concurrent queue, and I don't want a busy wait.
Apart from writing a queue myself, is there another possibility in the concurrency runtime?
From a comment from Arch Robison, and it doesn't get much more "horse's mouth" than that (a):
PPL's concurrent_queue has no blocking pop, hence neither does tbb::strict_ppl::concurrent_queue. The blocking pop is available in tbb::concurrent_bounded_queue.
The design argument for omitting blocking pop is that in many cases, the synchronization for blocking is provided outside of the queue, in which case the implementation of blocking inside the queue becomes unnecessary overhead.
On the other hand, the blocking pop of the old tbb::concurrent_queue was popular among users who did not have outside synchronization.
So we split the functionality. Use cases that do not need blocking or boundedness can use the new tbb::concurrent_queue, and use cases that do need it can use tbb::concurrent_bounded_queue.
(a) Arch is the architect of Threading Building Blocks.
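For completeness, the blocking variant is used like this (a minimal sketch, assuming TBB is available and linked):

#include <tbb/concurrent_bounded_queue.h>

tbb::concurrent_bounded_queue<int> q;

void producer() {
    q.push(42);     // blocks only if a capacity was set and the queue is full
}

void consumer() {
    int item;
    q.pop(item);    // blocks until an item is available -- no busy wait
}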
If you need a blocking pop without a busy wait, you need a method of signaling. This implies synchronization between pusher and popper, and the queue is no longer free of (expensive) synchronization primitives. You basically get a normal synchronized queue with a condition variable being used to notify poppers of pushes, which is not in the spirit of the concurrent_* collections.
The question was whether there is another option in the Concurrency Runtime that provides blocking queue functionality, because concurrent_queue does not - and there is one in VS2010.
Arch's comment is of course completely correct; blocking queues and non-blocking queues are separate use cases, and this is why they are different in VS2010 and in TBB.
In VS2010 you can use the template class unbounded_buffer, located in <agents.h>; the appropriate methods are called enqueue and dequeue.
-Rick
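A minimal usage sketch of that class (assuming the Asynchronous Agents Library header from VS2010):

#include <agents.h>

Concurrency::unbounded_buffer<int> buf;

void producer() {
    buf.enqueue(42);             // adds a message; does not block
}

void consumer() {
    int value = buf.dequeue();   // blocks until a message is available
}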
There is no situation, from the queue's standpoint, in which it should need to block for an insert or remove. The fact that you may need to block and wait for an insert is immaterial.
You can achieve the functionality you desire by using a condition variable, or a counting semaphore, or something along those lines (whatever your specific API provides). Your trouble isn't with blocking/non-blocking; it sounds like a classic producer-consumer.
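To make that concrete, here is a sketch that pairs the non-blocking queue with C++20's std::counting_semaphore to get a blocking pop without a busy wait (the same shape works with a condition variable on older toolchains):

#include <concurrent_queue.h>   // Concurrency::concurrent_queue (VS)
#include <semaphore>            // std::counting_semaphore (C++20)

Concurrency::concurrent_queue<int> q;
std::counting_semaphore<> items{0};

void push(int v) {
    q.push(v);
    items.release();            // signal: one more item available
}

int blocking_pop() {
    items.acquire();            // sleeps until a matching release()
    int v;
    while (!q.try_pop(v)) {}    // should succeed right away; loop is defensive
    return v;
}

Since every release() is preceded by a push(), a thread that gets past acquire() is guaranteed a matching item in the queue.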
I have a main process that uses a single-threaded library, and I can only call the library functions from the main process. I have a thread, spawned by the parent process, that puts info it receives from the network into a queue.
I need to be able to tell the main process that something is on the queue. Then it can access the queue and process the objects. The thread cannot process those objects because the library can only be called by one process.
I guess I need to use pipes and signals. I have also read in various newsgroups that I need to use the 'self-pipe' trick.
How should this scenario be implemented?
A more specific case of the following post:
How can unix pipes be used between main process and thread?
Why not use a simple FIFO (named pipe)? The main process will automatically block until it can read something.
If it must not block, polling should be possible instead, but that may waste CPU. There probably exists an efficient library for this purpose.
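A sketch of the FIFO approach (POSIX; the path is illustrative and error handling is elided):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Run once at startup; EEXIST on later runs is harmless.
void setup() { mkfifo("/tmp/app_events", 0600); }

// Network thread: one byte per queued item.
void network_thread() {
    int wfd = open("/tmp/app_events", O_WRONLY);  // blocks until a reader opens
    char c = 1;
    write(wfd, &c, 1);
}

// Main process: read() blocks until the network thread writes.
void main_loop() {
    int rfd = open("/tmp/app_events", O_RDONLY);
    for (;;) {
        char c;
        if (read(rfd, &c, 1) == 1) {
            // ...pop and process everything on the shared queue...
        }
    }
}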
I wouldn't recommend using signals because they are easy to get wrong. If you want to use them anyway, the easiest way I've found is:
Mask all signals in every thread,
A special thread handles signals with sigwait(). It may have to wake up another thread which will handle the signal, e.g. using condition variables.
The advantage is that you don't have to worry anymore about which function is safe to call from the handler.
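In code, the sigwait() pattern looks roughly like this (a sketch showing only the signal plumbing):

#include <signal.h>
#include <thread>

void signal_thread(sigset_t set) {
    for (;;) {
        int sig = 0;
        sigwait(&set, &sig);   // sleeps until one of the masked signals arrives
        // ...ordinary code here: push an event, notify a condition variable...
    }
}

int main() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    // Block before spawning any threads so every thread inherits the mask.
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    std::thread t(signal_thread, set);
    // ...rest of the program...
    t.join();
}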
The "optimal" solution depends quite a bit on your concrete setup. Do you have one process with a main thread and a child thread or do you have one parent process and a child process? Which OS and which thread library do you use?
The reason for the last question is that the current C++03 standard has no notion of a 'thread'. This means in particular that whatever solution your OS and your thread library offer are platform specific. The most portable solutions will only hide these specifics from you in their implementation.
In particular, C++ has no notion of threads in its memory model, nor does it have a notion of atomic operations, synchronization, ordered memory accesses, race conditions etc.
Chances are, however, that whatever library you are using already provides a solution for your problem on your platform.
I highly suggest you use a thread-safe queue such as this one (article and source code). I have personally used it and it's very simple to use. The API consists of simple methods such as push(), try_pop(), wait_and_pop() and empty().
Note that it is based on Boost.Thread.
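Typical producer/consumer usage with that interface looks like this (a sketch, assuming the concurrent_queue<T> class from the linked article is available):

concurrent_queue<int> q;

void producer() {
    q.push(42);             // locks, enqueues, notifies one waiting consumer
}

void consumer() {
    int value;
    q.wait_and_pop(value);  // blocks on a condition variable until data arrives
}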