pthread-win32 extension sem_post_multiple - c++

I am currently building a thin C++ wrapper around pthreads for in-house use. Windows as well as QNX are targeted, and fortunately the pthreads-win32 port seems to work very well, while QNX is conformant to POSIX for our practical purposes.
Now, while implementing semaphores, I hit the function
sem_post_multiple(sem_t*, int)
which is apparently only available on pthreads-win32 but is missing from QNX. As the name suggests, the function is supposed to increment the semaphore by the count given as the second argument. As far as I can tell, the function is part of neither POSIX 1b nor POSIX 1c.
Although there is currently no requirement for said function, I am still wondering why pthreads-win32 provides it and whether it might be of use. I could try to mimic it for QNX with something similar to the following:
#include <semaphore.h>

/* Non-atomic emulation: posts count times in a loop. */
int sem_post_multiple_qnx(sem_t* sem, int count)
{
    for (; count > 0; --count)
    {
        if (sem_post(sem) != 0)
            return -1;  /* errno is set by sem_post */
    }
    return 0;
}
What I am asking for are suggestions/advice on how to proceed. If the consensus is that I should implement the function for QNX, I would also appreciate comments on whether the suggested code snippet is a viable solution.
Thanks in advance.
PS: I deliberately left out my fancy C++ class for clarity. For all folks suggesting boost to the rescue: it is not an option in my current project due to management reasons.

In any case, semaphores are an optional extension in POSIX. E.g. OS X doesn't seem to implement them fully. So if you are concerned with portability, you'd have to provide wrappers for the functionality you need anyhow.
Your approach of emulating an atomic increment by iterated sem_post certainly has downsides:
It might perform badly, whereas sem_t is usually used in performance-critical contexts.
The operation would not be atomic, so confusing things might happen before you finish the loop.
I would stick to just what is necessary and strictly POSIX-conforming. Beware that sem_timedwait is yet another optional part of the semaphore option.

Your proposed implementation of sem_post_multiple doesn't play nicely with sem_getvalue, since sem_post_multiple is an atomic increase and therefore it's not possible for a "simultaneous" call to sem_getvalue to return any of the intermediate values.
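To see why, consider that with the loop emulation a concurrent reader can legally observe intermediate counts (a contrived sketch; the function names are mine, not from any library):

#include <semaphore.h>
#include <cstdio>

void poster(sem_t* sem)
{
    for (int i = 0; i < 5; ++i)
        sem_post(sem);             // five separate atomic increments
}

void observer(sem_t* sem)
{
    int value = 0;
    sem_getvalue(sem, &value);     // may observe 0, 1, 2, 3, 4 or 5
    std::printf("observed %d\n", value);
}

With a genuinely atomic sem_post_multiple(sem, 5), the observer could only ever see the "before" or "after" value.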
Personally I'd want to leave them both out: trying to add fundamental synchronization operations to a system which lacks them is a mug's game, and your wrapper might soon cease to be "thin". So don't get into it unless you have code that uses sem_post_multiple, that you absolutely have to port.

sem_post_multiple() is a non-standard helper function introduced by the win32-pthreads maintainers. Your implementation isn't the same as theirs because the multiple increments aren't atomic. Whether or not this is a problem depends on the intended use. (Personally, I wouldn't try to implement this function unless/until the need arises.)

This is an interesting question. +1.
I agree with the current prevailing consensus here that it is probably not a good idea to implement that function. While your proposed implementation would probably work just fine in most situations, there are definitely conditions in which the results could be dramatically different due to the non-atomicity. The following is one (extremely) contrived situation:
Start thread A which calls sem_post_multiple( s, 10 )
Thread B waiting on s is released. Thread B kills thread A.
In the above unfriendly scenario, the atomic version would have incremented the semaphore by 10. With the non-atomic version, it may only be incremented once. This example is certainly not likely in the real world. For example, killing a thread is almost always a bad idea, not to mention the fact that it could leave the semaphore object in an invalid state. The Win32 implementation could leave a mutex lock on the semaphore - see this for why.

Related

Is C++ std::atomic compatible with pthreads?

I have 2 pthread threads where one is writing a bool value and another is reading it.
I don't care about portability. It's x86 architecture. The only thing which concerns me is this: the writing thread sets the bool to true and starts doing its own work (which happens once a day at midnight), closing a file, while the other thread has read the bool as false and proceeds with its work (writing to the file) at the same time. It's very difficult to reproduce this scenario, so I'd better get the best possible theoretical solution.
Can I use std::atomic in case of pthreads?
Can I use std::atomic in case of pthreads?
Yes, that's what std::atomic is for.
It works with std::thread, POSIX threads, and any other kind of threads. Behind the scenes it uses "magical" compiler annotations to prevent certain thread-incompatible optimizations, and processor-specific locking instructions to guarantee that thread-safe code is generated [1].
It makes (almost) no sense to use std::atomic without threads (you could use std::atomic instead of volatile for signal handlers, but there is no advantage in doing so).
The only thing which concerns me ...
The rest of your question makes no sense to me.
[1] When used correctly, which is often a non-trivial thing to do, and which is why you generally should try not to use std::atomic unless you are an expert.
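For illustration, here is a minimal sketch of the scenario in the question, using std::atomic<bool> with raw pthreads (all names here are hypothetical):

#include <atomic>
#include <pthread.h>

std::atomic<bool> g_closing(false);   // shared flag; no data race on it

void* midnight_thread(void*)          // sets the flag, then closes the file
{
    g_closing.store(true);            // sequentially consistent by default
    // ... close the file ...
    return nullptr;
}

void* writer_thread(void*)            // checks the flag before writing
{
    if (!g_closing.load())
    {
        // ... write to the file ...
    }
    return nullptr;
}

int main()
{
    pthread_t a, b;
    pthread_create(&a, nullptr, midnight_thread, nullptr);
    pthread_create(&b, nullptr, writer_thread, nullptr);
    pthread_join(a, nullptr);
    pthread_join(b, nullptr);
}

Note that the atomic only removes the data race on the flag itself; whether the check-then-write sequence as a whole is race-free is a separate, logical question.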

Deduce if a program is going to use threads

Thread-safe or thread-compatible code is good.
However there are cases in which one could implement things differently (more simply or more efficiently) if one knows that the program will not be using threads.
For example, I once heard that things like std::shared_ptr could use different implementations to optimize the non-threaded case (but I can't find a reference).
I think historically std::string in some implementation could use Copy-on-write in non-threaded code.
I am not in favor of or against these techniques, but I would like to know if there is a way (at least a nominal way) to determine at compile time whether the code is being compiled with the intention of using threads.
The closest I could get is to realize that threaded code is usually (?) compiled with the -pthread (not -lpthread) compiler option.
(Not sure if it is a hard requirement or just recommended.)
In turn, -pthread defines some macros, like _REENTRANT or _THREAD_SAFE, at least in gcc and clang.
In some answers on SO, I also read that they are obsolete.
Are these macros the right way to determine if the program is intended to be used with threads (e.g. threads launched from that same program)? Are there other mechanisms to detect this at compile time? How confident would the detection method be?
EDIT: since the question can be applied to many contexts apparently, let me give a concrete case:
I am writing a header-only library that uses another 3rd-party library inside. I would like to know if I should initialize that library to be thread-safe (or at least give a certain level of thread support). If I assume the maximum level of thread support but the user of the library will not be using threads, then there will be a cost paid for nothing. Since the 3rd-party library is an implementation detail, I thought I could make a decision about the level of thread safety requested based on a guess.
EDIT2 (2021): By chance I found this historical (but influential) library Blitz++ which in the documentation says (emphasis mine)
8.1 Blitz++ and thread safety
To enable thread-safety in Blitz++, you need to do one of these
things:
Compile with gcc -pthread, or CC -mt under Solaris. (These options define _REENTRANT, which tells Blitz++ to generate thread-safe code.)
Compile with -DBZ_THREADSAFE, or #define BZ_THREADSAFE before including any Blitz++ headers.
In threadsafe mode, Blitz++ array reference counts are safeguarded by a mutex. By default, pthread mutexes are used. If you would prefer a different mutex implementation, add the appropriate BZ_MUTEX macros to <blitz/blitz.h> and send them to blitz-dev#oonumerics.org for incorporation. Blitz++ does not do locking for every array element access; this would result in terrible performance. It is the job of the library user to ensure that appropriate synchronization is used.
So it seems that at some point _REENTRANT was used as a clue for the need of multi-threading code.
Though it may be too old a reference to take seriously.
I support the other answer in that thread-safety decisions ideally should not be made on a whole-program basis; rather, they should be made for specific areas.
Note that boost::shared_ptr has a thread-unsafe version called boost::local_shared_ptr. boost::intrusive_ptr has safe and unsafe counter implementations.
Some libraries use the "null mutex" pattern, that is, a mutex which does nothing on lock/unlock. See boost or Intel TBB null_mutex, or ATL CComFakeCriticalSection. This exists specifically to substitute a real mutex for thread-safe code and a fake one for thread-unsafe code.
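A minimal sketch of the pattern (illustrative names, not the actual Boost/TBB classes):

#include <mutex>

// A "mutex" that meets the Lockable requirements but does nothing.
struct null_mutex
{
    void lock() {}
    void unlock() {}
    bool try_lock() { return true; }
};

// The class chooses real or no-op locking at compile time.
template <typename Mutex = std::mutex>
class counter
{
    Mutex m_;
    long  n_ = 0;
public:
    void increment()
    {
        std::lock_guard<Mutex> lock(m_);  // a no-op when Mutex is null_mutex
        ++n_;
    }
};

counter<std::mutex> shared_counter;  // thread-safe
counter<null_mutex> local_counter;   // single-threaded, zero locking cost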
Even more, sometimes it may make sense to use the same objects in a thread-safe and a thread-unsafe way, depending on the current phase of execution. There's also atomic_ref, which serves the purpose of providing thread-safe access to the underlying type while still letting you work with it in a thread-unsafe way.
I know a good example of runtime switches between thread-safe and thread-unsafe. See HeapCreate with HEAP_NO_SERIALIZE, and HeapAlloc with HEAP_NO_SERIALIZE.
I know also a questionable example of the same. Delphi recommends calling its BeginThread wrapper instead of CreateThread API function. The wrapper sets a global variable telling that from now on Delphi Memory Manager should be thread-safe. Not sure if this behavior is still in place, but it was there for Delphi 7.
Fun fact: in Windows 10, there are virtually no single-threaded programs. Before the first statement in main is executed, static DLL dependencies are loaded. Current Windows versions parallelize this DLL loading where possible by using a thread pool. Once the program is loaded, thread pool threads wait for other tasks that could be issued by the use of Windows API calls or std::async. Sure, if the program itself does not use threads or TLS, it will not notice, but technically it is multi-threaded from the OS perspective.
How confident would the detection method be?
Not really. Even if you can unambiguously detect whether code is compiled to be used with multiple threads, not everything must be thread-safe.
Making everything thread-safe by default, even though it is only ever used by a single thread, would defeat the purpose of your approach. You need more fine-grained control to turn thread safety on/off if you do not want to pay for what you do not use.
If you have a class that has a thread-safe and a non-thread-safe version, then you could use a template parameter
template <bool isThreadSafe> class Foo;
and let the user decide on a case-by-case basis.
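A minimal sketch of that idea (illustrative; uses if constexpr from C++17):

#include <mutex>

template <bool isThreadSafe>
class Foo
{
    std::mutex m_;    // only used when isThreadSafe is true
    int value_ = 0;
public:
    void set(int v)
    {
        if constexpr (isThreadSafe)
        {
            std::lock_guard<std::mutex> lock(m_);
            value_ = v;
        }
        else
        {
            value_ = v;  // no locking cost for single-threaded users
        }
    }
};

Foo<true>  shared_foo;  // pays for the mutex
Foo<false> local_foo;   // does not lock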

Does std::future spin-wait?

Before explaining the question in more detail, I'll note that the answer is obviously implementation dependent, so I'm primarily asking about libstdc++, but I would also be interested in hearing about libc++. The operating system is Linux.
Calling wait() or get() on a std::future blocks until the result is set by an asynchronous operation -- either a std::promise, a std::packaged_task, or a std::async function. The availability of the result is communicated through a shared state, which is basically an atomic variable: the future waits for the shared state to be marked as ready by the promise (or async task). This waiting and notification is implemented in libstdc++ via a futex system call. Granted that futexes are highly performant, in situations where a future expects to wait for only an extremely brief period (on the order of single microseconds), it would seem that a performance gain could be made by spinning on the shared state for a short time before proceeding to wait on the futex.
I have not found any evidence of such spinning in the current implementation, however, I did find a comment in atomic_futex.h at line 161 where I would expect to find such spinning:
// TODO Spin-wait first.
So my question is rather the following: Are there really plans to implement a spin-wait, and if so, how will the duration be decided? Additionally, is this the type of functionality that could eventually be specified through a policy to the future?
I will answer the question: Does std::future::get() perform a spin-wait?
The answer for all of C++ is: it is an implementation detail. A conforming standard library might spin or it might not (in the same vein, std::mutex::lock() is allowed to spin). Will there be a mechanism for specifying if and how to spin in the future? The places to look are std::experimental::future (coming to a full version of the Standard Library soon), boost::future (a proving ground for what might later go into the standard), and hpx::future (a performance-focused library with advanced facilities for future management). None of these has a mechanism for explicit specification of spinning, nor has there been any discussion that I know of in meeting minutes or on the ISO C++ mailing list. It is safe to say that something like a get_with_spins function is not in the pipeline.
To answer for libstdc++ (and libc++): They also do not spin. Aside from the TODO which came from the original patch, it does not look like there is any plan to change this. I've searched the GCC mailing list for mentions of changing this behavior, but found none. Doing a pre-sleep spin can hurt in the general case (if none of the get()s have a value, you've wasted a lot of CPU cycles), so a change here could have negative impacts.
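For illustration, the kind of spin-then-block wait the TODO hints at might look roughly like this (a sketch only, not libstdc++ code; the spin count is arbitrary, and a real implementation would use the futex directly rather than a condition variable):

#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

struct shared_state
{
    std::atomic<bool>       ready{false};
    std::mutex              m;
    std::condition_variable cv;

    void wait()
    {
        for (int i = 0; i < 1000; ++i)              // spin phase
        {
            if (ready.load(std::memory_order_acquire))
                return;                             // result arrived cheaply
            std::this_thread::yield();
        }
        std::unique_lock<std::mutex> lock(m);       // slow path: kernel sleep
        cv.wait(lock, [this] { return ready.load(); });
    }

    void set()
    {
        ready.store(true, std::memory_order_release);
        std::lock_guard<std::mutex> lock(m);        // avoids lost wakeups
        cv.notify_all();
    }
};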
To summarize: Implementations do not appear to spin now and there appear to be no plans to change the behavior in the near future, but that can change at any moment.

Does Boost have support for Windows EnterCriticalSection API?

I know Boost has support for mutexes and lock_guard, which can be used to implement critical sections.
But Windows has a special API for critical sections (see EnterCriticalSection and LeaveCriticalSection) which is a LOT faster than a mutex (for rarely contended, short sections of code).
Hence my question - is it possible in Boost to take advantage of this API, and fall back to a spinlock/mutex/futex-based implementation on other platforms?
The simple answer is no.
Here's some relevant background from an old mailing list thread:
BTW. I am agree that mutex is more universal solution from a performance point of view. But to be fair - CS are faster in simple design. I believe that possibility to support them should be at least taken in account.

This was the article that someone pointed me to. The conclusion was that CS are only faster if:

There are less than 8 threads total in the process.
You weren't running in the background.
You weren't on a dual-processor machine.

To me this means that simple testing yields good CS performance results, but any real world program is better off with a full blown mutex.

I'm not adverse to supporting a CS implementation. However, I originally chose not to for the following reasons:

You get either construction and destruction hits from using a PIMPL idiom or you must include Windows.h in the Boost.Threads headers, which I simply don't want to do. (This can be worked around by emulating a CS a la OPTEX from the MSDN.)
According to this research paper most programs won't benefit from a CS design.
It's trivial to code a (non-portable) critical_section class that follows the Mutex model if you truly can make use of this.

For now I think I've made the right choice, though down the road we may change the implementation to use a critical section or OPTEX.

Bill Kempf
Speaking as someone who helps out maintaining Boost.Thread, and as someone who failed to get an event object into Boost.Thread, I don't think critical sections have ever been added, nor would they be added to Boost, for these reasons:
A Win32 critical section is trivially easy to build using a boost::atomic and a boost::condition_variable, so much so it isn't really worth having an official one. Here is probably the most complex one you could imagine, but extremely configurable including being constexpr ready (don't ask!): https://github.com/ned14/boost.outcome/blob/master/include/boost/outcome/v1/spinlock.hpp#L331
You can build your own simply by matching (Basic)Lockable concept and using atomic compare_exchange (non-x86/x64) or atomic exchange (x86/x64) and then grab it using a lock_guard around the critical section.
Some may object that a win32 critical section is not this. I am afraid it is: it simply spins on an atomic for a spin count, and then lazily tries to allocate a win32 event object which it then waits upon. Nothing special.
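As a minimal sketch of that build-your-own route (illustrative, not the Boost code): a class matching BasicLockable, built on atomic exchange, that you grab with a lock_guard:

#include <atomic>
#include <mutex>   // std::lock_guard
#include <thread>

class spinlock
{
    std::atomic<bool> locked_{false};
public:
    void lock()
    {
        while (locked_.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();  // a real CS would spin, then sleep
    }
    void unlock()
    {
        locked_.store(false, std::memory_order_release);
    }
};

spinlock section_lock;

void critical_work()
{
    std::lock_guard<spinlock> lock(section_lock);
    // ... short, rarely contended critical section ...
}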
As much as you might think critical sections (really user mode mutexes) are better/faster/whatever, they probably are not as great as you might think. boost::mutex is a big vast heavyweight thing on Windows internally using a win32 semaphore as the kernel wait object because of the need to emulate thread cancellation and to behave well in a general purpose use context. It's easy to write a concurrency structure which is faster than another for some single use case, but it is very very hard to write a concurrency structure which is all of:
Faster than a standard implementation in the uncontended case.
Faster than a standard implementation in the lightly contended case.
Faster than a standard implementation in the heavily contended case.
Even if you manage all three of the above, that still isn't enough: you also need some guarantees on worst case progression ordering, i.e. whether certain patterns of locks, waits and unlocks produce predictable outcomes. This is why threading facilities can appear to look slow in narrow use case scenarios, and why Boost.Thread, much like the STL, can appear to be much slower than hand-rolled locking code in, say, an uncontended use case.
Boost.Thread already does substantial work in user mode to avoid going to kernel sleep on Windows. On POSIX any of the major pthreads implementations also does substantial work to avoid kernel sleeps and hence Boost.Thread doesn't replicate that work. In other words, critical sections don't gain you anything in terms of scaling to load behaviours, though inevitably Boost.Thread v4 especially on Windows does a ton load of work a naive implementation does not (the planned rewrite of Boost.Thread is vastly more efficient on Windows as it can assume Windows Vista or above).
So, it looks like the default Boost mutex doesn't support it, but asio::detail::mutex does.
So I ended up using that:
#include <boost/asio/detail/mutex.hpp>
#include <boost/thread.hpp>

using boost::asio::detail::mutex;
using boost::lock_guard;

int myFunc()
{
    static mutex mtx;
    lock_guard<mutex> lock(mtx);
    // . . .
}

Is it smart to replace boost::thread and boost::mutex with c++11 equivalents?

Motivation: the reason why I'm considering it is that my genius project manager thinks that boost is another dependency and that it is horrible because "you depend on it" (I tried explaining the quality of boost, then gave up after some time :( ). A smaller reason why I would like to do it is that I would like to learn C++11 features, because people will start writing code in it.
So:
Is there a 1:1 mapping between #include <thread> / #include <mutex> and the boost equivalents?
Would you consider it a good idea to replace boost stuff with C++11 stuff? My usage is primitive, but are there examples when std doesn't offer what boost does? Or (blasphemy) vice versa?
P.S.
I use GCC so headers are there.
There are several differences between Boost.Thread and the C++11 standard thread library:
Boost supports thread cancellation, C++11 threads do not
C++11 supports std::async, but Boost does not
Boost has a boost::shared_mutex for multiple-reader/single-writer locking. The analogous std::shared_timed_mutex is available only since C++14 (N3891), while std::shared_mutex is available only since C++17 (N4508).
C++11 timeouts are different to Boost timeouts (though this should soon change now Boost.Chrono has been accepted).
Some of the names are different (e.g. boost::unique_future vs std::future)
The argument-passing semantics of std::thread are different to boost::thread --- Boost uses boost::bind, which requires copyable arguments. std::thread allows move-only types such as std::unique_ptr to be passed as arguments. Due to the use of boost::bind, the semantics of placeholders such as _1 in nested bind expressions can be different too.
If you don't explicitly call join() or detach() then the boost::thread destructor and assignment operator will call detach() on the thread object being destroyed/assigned to. With a C++11 std::thread object, this will result in a call to std::terminate() and abort the application.
To clarify the point about move-only parameters, the following is valid C++11, and transfers the ownership of the int from the temporary std::unique_ptr to the parameter of f1 when the new thread is started. However, if you use boost::thread then it won't work, as it uses boost::bind internally, and std::unique_ptr cannot be copied. There is also a bug in the C++11 thread library provided with GCC that prevents this working, as it uses std::bind in the implementation there too.
#include <memory>
#include <thread>
void f1(std::unique_ptr<int>);
std::thread t1(f1, std::unique_ptr<int>(new int(42)));
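The destructor difference mentioned above can be sketched like this (illustrative):

#include <thread>

void worker() {}  // placeholder work

void spawn()
{
    std::thread t(worker);
    // no join() or detach() before t goes out of scope:
}   // std::thread calls std::terminate() here;
    // boost::thread, as described above, would detach() instead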
If you are using Boost then you can probably switch to C++11 threads relatively painlessly if your compiler supports it (e.g. recent versions of GCC on linux have a mostly-complete implementation of the C++11 thread library available in -std=c++0x mode).
If your compiler doesn't support C++11 threads then you may be able to get a third-party implementation such as Just::Thread, but this is still a dependency.
std::thread is largely modelled after boost::thread, with a few differences:
boost's non-copyable, one-handle-maps-to-one-os-thread semantics are retained. But this thread is movable, to allow returning a thread from factory functions and placing it into containers.
This proposal adds cancellation to the boost::thread, which is a significant complication. This change has a large impact not only on thread but the rest of the C++ threading library as well. It is believed this large change is justifiable because of the benefit.
The thread destructor must now call cancel prior to detaching to avoid accidentally leaking child threads when parent threads are canceled.
An explicit detach member is now required to enable detaching without canceling.
The concepts of thread handle and thread identity have been separated into two classes (they are the same class in boost::thread). This is to support easier manipulation and storage of thread identity.
The ability to create a thread id which is guaranteed to compare equal to no other joinable thread has been added (boost::thread does not have this). This is handy for code which wants to know if it is being executed by the same thread as a previous call (recursive mutexes are a concrete example).
There exists a "back door" to get the native thread handle so that clients can manipulate threads using the underlying OS if desired.
This is from 2007, so some points are no longer valid: boost::thread has a native_handle function now, and, as commenters point out, std::thread doesn't have cancellation anymore.
I could not find any significant differences between boost::mutex and std::mutex.
Enterprise Case
If you are writing software for the enterprise that needs to run on a moderate to large variety of operating systems and consequently build with a variety of compilers and compiler versions (especially relatively old ones) on those operating systems, my suggestion is to stay away from C++11 altogether for now. That means that you cannot use std::thread, and I would recommend using boost::thread.
Basic / Tech Startup Case
If you are writing for one or two operating systems, you know for sure that you will only ever need to build with a modern compiler that mostly supports C++11 (e.g. VS2015, GCC 5.3, Xcode 7), and you are not already dependent on the boost library, then std::thread could be a good option.
My Experience
I am personally partial to hardened, heavily used, highly compatible, highly consistent libraries such as boost versus a very modern alternative. This is especially true for complicated programming subjects such as threading. Also, I have long experienced great success with boost::thread (and boost in general) across a vast array of environments, compilers, threading models, etc. When it's my choice, I choose boost.
There is one reason not to migrate to std::thread.
If you are using static linking, std::thread becomes unusable due to these gcc bugs/features:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52590
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57740
Namely, if you call std::thread::detach or std::thread::join it will lead to either an exception or a crash, while boost::thread works fine in these cases.
With Visual Studio 2013 the std::mutex seems to behave differently than the boost::mutex, which caused me some problems (see this question).
With regards to std::shared_mutex added in C++17
The other answers here provide a very good overview of the differences in general. However, there are several issues with std::shared_mutex that boost solves.
Upgradable mutexes. These are absent from the standard library. They allow a reader to be upgraded to a writer without allowing any other writers to get in before you. These allow you to do things like pre-process a large computation (for example, reindexing a data structure) while in read mode, then upgrade to write to apply the reindex while only holding the write lock for a short time. (A sketch follows this list.)
Fairness. If you have constant read activity with a std::shared_mutex, your writers will be softlocked indefinitely. This is because if another reader comes along, they will always be given priority. With boost::shared_mutex, all threads will eventually be given priority.(1) Neither readers nor writers will be starved.
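A sketch of the upgradable-mutex point (illustrative names):

#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

boost::shared_mutex mtx;  // guards some hypothetical index

void reindex()
{
    boost::upgrade_lock<boost::shared_mutex> read_lock(mtx);
    // ... heavy read-only pre-processing; other readers still allowed ...

    boost::upgrade_to_unique_lock<boost::shared_mutex> write_lock(read_lock);
    // ... brief exclusive phase: swap in the new index ...
}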
The tl;dr of this is that if you have a very high-throughput system with no downtime and very high contention, std::shared_mutex will never work for you without manually building a priority system on top of it. boost::shared_mutex will work out of the box, although you might need to tinker with it in certain cases. I'd argue that std::shared_mutex's behavior is a latent bug waiting to happen in most code that uses it.
(1) The actual algorithm it uses is based on the OS thread scheduler. In my experience, when reads are saturated, there are longer pauses (when obtaining a write lock) on Windows than on OSX/Linux.
I tried to use shared_ptr from std instead of boost and I actually found a bug in the gcc implementation of this class. My application was crashing because of the destructor being called twice (this class should be thread-safe and shouldn't generate such problems). After moving to boost::shared_ptr all problems disappeared. Current implementations of C++11 are still not mature.
Boost also has more features. For example, the std version of the <chrono> header doesn't provide a serializer to a stream (i.e. cout << duration). Boost has many libraries that use their own <chrono>, etc. equivalents, but these do not cooperate with the std versions.
To sum up - if you already have an application written using boost, it is safer to keep your code as it is instead of putting some effort into moving to the C++11 standard.