I find that neither the Boost nor the TBB library's condition variable has an interface for working with a reader-writer lock (i.e., a shared mutex in Boost). condition_variable::wait() only accepts a mutex lock. But I think it's quite reasonable to have it work with a reader-writer lock. Can anyone tell me why they don't support that, or why people don't do that?
Thanks,
Cui
The underlying platform's native threading API might not be able to support it easily. For example, on a POSIX platform where a condition variable is implemented in terms of pthread_cond_t, it can only be used with pthread_mutex_t. In order to get maximum performance, the basic condition variable type is a lightweight wrapper over the native types, with no additional overhead.
If you want to use other types of mutex you should use std::condition_variable_any or boost::condition_variable_any, which work with any type of mutex. This has a small additional overhead due to using an internal mutex of the native platform's type in addition to the user-supplied mutex. (I don't know if TBB offers an equivalent type.)
It's a design trade-off that allows either performance or flexibility. If you want maximum performance you get it with condition_variable but can only use simple mutexes. If you want more flexibility you can get that with condition_variable_any but you must sacrifice a little performance.
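For illustration, here is a minimal sketch (the flag and function names are my own invention, not from the question) of a reader waiting on a boost::shared_mutex through boost::condition_variable_any:

#include <boost/thread/condition_variable.hpp>
#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

boost::shared_mutex rw_mutex;
boost::condition_variable_any cond;  // the "_any" variant accepts any lock type
bool data_ready = false;

void reader()
{
    // A shared (reader) lock is fine here; plain condition_variable
    // would have required a unique_lock over a plain mutex.
    boost::shared_lock<boost::shared_mutex> lock(rw_mutex);
    cond.wait(lock, [] { return data_ready; });
    // ... read the shared state while still holding the shared lock ...
}

void writer()
{
    {
        boost::unique_lock<boost::shared_mutex> lock(rw_mutex);
        data_ready = true;
    }
    cond.notify_all();
}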
Related
I have 2 pthread threads, where one is writing a bool value and another is reading it.
I don't care about portability. It's x86 architecture. The only thing that concerns me is this: the writing thread sets the bool to true and starts doing its own work, closing a file (which happens once a day at midnight), while the other thread has read the bool as false and proceeds with its own work (writing to that file) at the same time. It's very difficult to reproduce this scenario, so I'd better get the best possible theoretical solution.
Can I use std::atomic in case of pthreads?
Can I use std::atomic in case of pthreads?
Yes, that's what std::atomic is for.
It works with std::thread, POSIX threads, and any other kind of threads. Behind the scenes it uses "magical" compiler annotations to prevent certain thread-incompatible optimizations, and processor-specific locking instructions to guarantee that thread-safe code is generated¹.
It makes (almost) no sense to use std::atomic without threads (you could use std::atomic instead of volatile for signal handlers, but there is no advantage in doing so).
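For the scenario in the question, a minimal sketch (the flag and function names are my own) might look like this:

#include <atomic>

std::atomic<bool> rollover_in_progress{false};

// Runs once a day at midnight, on the writer thread.
void midnight_rollover()
{
    rollover_in_progress.store(true);   // sequentially consistent by default
    // ... close and rotate the file ...
    rollover_in_progress.store(false);
}

// Runs on the other thread.
void try_write()
{
    if (!rollover_in_progress.load()) {
        // ... write to the file ...
    }
}

This works the same whether the threads were created with pthread_create or std::thread. Note, though, that the atomic only makes the flag itself race-free: the writer could still set the flag between the reader's load and its write to the file, so if the two operations must genuinely exclude each other, a mutex (or pthread_rwlock) around the whole check-and-write is still needed.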
The only thing that concerns me ...
The rest of your question makes no sense to me.
¹ When used correctly, which is often a non-trivial thing to do, and which is why you generally should try not to use std::atomic unless you are an expert.
C++20 introduces std::atomic::wait (https://en.cppreference.com/w/cpp/atomic/atomic/wait) and std::atomic::notify_one (https://en.cppreference.com/w/cpp/atomic/atomic/notify_one), which provide atomic waiting and notification functionality.
I'd like to use this but my compiler doesn't support it yet.
Are there any reliable library implementations of this functionality?
I'm envisioning using this functionality in a lock-free producer-consumer situation where the consumer goes to sleep and is woken up when there's something to consume. There are no mutexes in my current implementation. Am I correct that such wait/notify functionality would allow me to do what I want to do (i.e., let my consumer thread go to sleep and let the producer efficiently wake it up without acquiring a mutex, something that would be necessary, I believe, with a condition variable)?
Actually, is there an even better way to do this that I'm not currently thinking of?
Are there any reliable library implementations of this functionality?
You can use Boost.Atomic if your compiler's standard library implementation does not support it yet.
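For example, a minimal sketch, assuming Boost 1.73 or later (the release that, to my knowledge, added waiting and notifying operations to Boost.Atomic); the variable names are mine:

#include <boost/atomic.hpp>

boost::atomic<int> ready{0};

void waiter()
{
    ready.wait(0);        // blocks while the value is still 0
}

void notifier()
{
    ready.store(1);
    ready.notify_one();   // wakes the waiter
}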
Am I correct that such wait/notify functionality would allow me to do what I want to do (i.e., let my consumer thread go to sleep and let the producer efficiently wake it up without acquiring a mutex, something that would be necessary, I believe, with a condition variable)?
Correct.
Actually, is there an even better way to do this that I'm not currently thinking of?
Atomic wait is a good way. You may also want to use it in the best way:
Avoid notifying when you know that the other party does not wait. Some implementations will check this for you; some won't.
Use the same atomic size as the underlying OS primitive uses. This means using 32-bit queue counters on Linux, where the native primitive is the futex, which takes a 32-bit integer. On Windows, WaitOnAddress can take any size of 8, 16, 32, or 64 bits. On other systems, consider their implementation of atomic wait.
As another good option, you can consider C++20 semaphores.
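As a minimal single-producer/single-consumer sketch (my own; the counter name is invented) that applies both points above, assuming a C++20 standard library:

#include <atomic>
#include <cstdint>

// 32-bit counter, matching the 32-bit futex that Linux uses natively.
std::atomic<std::uint32_t> items{0};

void producer()
{
    // Notify only on the empty -> non-empty transition; the consumer
    // can only be asleep when it observed a count of 0 (first point above).
    if (items.fetch_add(1, std::memory_order_release) == 0)
        items.notify_one();
}

void consumer()
{
    for (;;) {
        std::uint32_t n = items.load(std::memory_order_acquire);
        if (n == 0) {
            items.wait(0);   // blocks while the value is still 0
            continue;
        }
        // ... consume one item ...
        items.fetch_sub(1, std::memory_order_release);
    }
}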
I know Boost has support for mutexes and lock_guard, which can be used to implement critical sections.
But Windows has a special API for critical sections (see EnterCriticalSection and LeaveCriticalSection) which is a LOT faster than a mutex (for rarely contended, short sections of code).
Hence my question: is it possible in Boost to take advantage of this API, and fall back to a spinlock/mutex/futex-based implementation on other platforms?
The simple answer is no.
Here's some relevant background from an old mailing list thread:
BTW, I agree that the mutex is the more universal solution from a performance point of view. But to be fair, CS are faster in a simple design. I believe that the possibility of supporting them should at least be taken into account.
This was the article that someone pointed me to. The conclusion was that CS are only faster if:
There are fewer than 8 threads total in the process.
You weren't running in the background.
You weren't on a dual-processor machine.
To me this means that simple testing yields good CS performance results, but any real-world program is better off with a full-blown mutex.
I'm not averse to supporting a CS implementation. However, I originally chose not to for the following reasons:
You get either construction and destruction hits from using a PIMPL idiom, or you must include Windows.h in the Boost.Threads headers, which I simply don't want to do. (This can be worked around by emulating a CS à la OPTEX from MSDN.)
According to this research paper, most programs won't benefit from a CS design.
It's trivial to code a (non-portable) critical_section class that follows the Mutex model if you truly can make use of this.
For now I think I've made the right choice, though down the road we may change the implementation to use a critical section or OPTEX.
Bill Kempf
Speaking as someone who helps maintain Boost.Thread, and as someone who failed to get an event object into Boost.Thread, I don't think critical sections have ever been added to Boost, nor would they be, for these reasons:
A Win32 critical section is trivially easy to build using a boost::atomic and a boost::condition_variable, so much so that it isn't really worth having an official one. Here is probably the most complex one you could imagine, but extremely configurable, including being constexpr-ready (don't ask!): https://github.com/ned14/boost.outcome/blob/master/include/boost/outcome/v1/spinlock.hpp#L331
You can build your own simply by matching (Basic)Lockable concept and using atomic compare_exchange (non-x86/x64) or atomic exchange (x86/x64) and then grab it using a lock_guard around the critical section.
Some may object that a Win32 critical section is not this. I am afraid it is: it simply spins on an atomic for a spin count, and then lazily tries to allocate a Win32 event object which it then waits upon. Nothing special.
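For example, a bare-bones sketch of such a class (mine, far simpler than the spinlock.hpp linked above), built on atomic exchange as described:

#include <boost/atomic.hpp>
#include <boost/thread/locks.hpp>

class spin_mutex   // models BasicLockable, so boost::lock_guard works with it
{
    boost::atomic<bool> locked_;
public:
    spin_mutex() : locked_(false) {}

    void lock()
    {
        // On x86/x64 a plain exchange suffices, as noted above; a real
        // implementation would spin a bounded count and then back off.
        while (locked_.exchange(true, boost::memory_order_acquire))
        {
        }
    }

    void unlock()
    {
        locked_.store(false, boost::memory_order_release);
    }
};

void example()
{
    static spin_mutex m;
    boost::lock_guard<spin_mutex> guard(m);
    // ... critical section ...
}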
As much as you might think critical sections (really user-mode mutexes) are better/faster/whatever, they probably are not as great as you might think. boost::mutex is a big, vast, heavyweight thing on Windows, internally using a Win32 semaphore as the kernel wait object because of the need to emulate thread cancellation and to behave well in a general-purpose context. It's easy to write a concurrency structure which is faster than another for some single use case, but it is very, very hard to write a concurrency structure which is all of:
Faster than a standard implementation in the uncontended case.
Faster than a standard implementation in the lightly contended case.
Faster than a standard implementation in the heavily contended case.
Even if you manage all three of the above, that still isn't enough: you also need some guarantees on worst-case progression ordering, i.e., whether certain patterns of locks, waits, and unlocks produce predictable outcomes. This is why threading facilities can appear to be slow in narrow use-case scenarios: Boost.Thread, much like the STL, can appear to be much slower than hand-rolled locking code in, say, an uncontended use case.
Boost.Thread already does substantial work in user mode to avoid going to a kernel sleep on Windows. On POSIX, any of the major pthreads implementations also does substantial work to avoid kernel sleeps, and hence Boost.Thread doesn't replicate that work. In other words, critical sections don't gain you anything in terms of scaling-to-load behaviour, though inevitably Boost.Thread v4, especially on Windows, does a ton of work that a naive implementation does not (the planned rewrite of Boost.Thread is vastly more efficient on Windows, as it can assume Windows Vista or above).
So, it looks like the default Boost mutex doesn't support it, but asio::detail::mutex does.
So I ended up using that:
#include <boost/asio/detail/mutex.hpp>
#include <boost/thread.hpp>
using boost::asio::detail::mutex;
using boost::lock_guard;
int myFunc()
{
    static mutex mtx;
    lock_guard<mutex> lock(mtx);
    . . .
}
I have a hash table data structure that I wish to make thread-safe by use of a reader/writer lock (my read:write ratio is likely somewhere in the region of 100:1).
I have been looking around for how to implement this lock using C++11 (such as the method here), but it has come to my attention that it should be possible to use C++14's shared_lock to accomplish the same thing. However, while I found both std::shared_lock and std::unique_lock on cppreference, I don't understand how to use them together (compared to the Boost way, which has simple method calls for locking both uniquely and in shared mode).
How can I recreate this relatively simple reader/writer lock interface in C++14 using only the standard library?
C++14 has the reader/writer lock implementation std::shared_timed_mutex.
Side-note: C++17 added the simpler class std::shared_mutex, which you can use if you don't need the extra timing functions (like shared_timed_mutex::try_lock_for and shared_timed_mutex::try_lock_until).
However, before using a read/writer lock, be aware of the potentially harmful performance implications. Depending on the situation, a simple std::mutex might be faster.
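If a reader/writer lock does win for your workload, the pattern is simply std::shared_lock for readers and std::unique_lock (or std::lock_guard) for writers, both over the same mutex. A minimal sketch (the class and member names are my own):

#include <shared_mutex>
#include <mutex>
#include <string>
#include <unordered_map>

class ThreadSafeTable
{
    mutable std::shared_timed_mutex mutex_;
    std::unordered_map<std::string, int> table_;
public:
    // Many readers may hold the shared lock concurrently.
    bool find(const std::string& key, int& out) const
    {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        auto it = table_.find(key);
        if (it == table_.end())
            return false;
        out = it->second;
        return true;
    }

    // A writer holds the lock exclusively.
    void insert(const std::string& key, int value)
    {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        table_[key] = value;
    }
};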
Note that I could conduct the research inside the Boost source code, and may do so to satisfy my own curiosity if there isn't anyone out there with an answer.
I do ask however because maybe someone has already done this comparison and can answer authoritatively?
It would seem that by creating a shared memory-mapped file between processes and building on InterlockedIncrement(), one could create a largely user-mode mutex akin to a CRITICAL_SECTION, which would be considerably more performant than the Win32 mutex for interprocess synchronisation.
So my expectation is that the Win32 implementation of boost::interprocess_mutex has probably been implemented in this manner, and that it is substantially quicker than the native API offering.
However, this is only a supposition: I don't know from field testing what the performance of boost::interprocess_mutex is for interprocess synchronisation, nor have I deeply investigated its implementation.
Does anyone have experience using it or profiling its relative performance, or can they comment on the safety of using InterlockedIncrement() across processes on shared memory?
In Boost 1.39.0, there is only specific support for pthreads. On all other platforms, it becomes a busy loop with a yield call in the middle (essentially the same scheme that you describe). See boost/interprocess/sync/emulation/interprocess_mutex.hpp. For example, here's the implementation of lock():
inline void interprocess_mutex::lock(void)
{
    do {
        boost::uint32_t prev_s = detail::atomic_cas32(const_cast<boost::uint32_t*>(&m_s), 1, 0);
        if (m_s == 1 && prev_s == 0) {
            break;
        }
        // relinquish current timeslice
        detail::thread_yield();
    } while (true);
}
What this means is that a contended boost::interprocess::mutex on Windows is VERY expensive, although the uncontended case is almost free. This could potentially be improved by adding an event object or similar to sleep on, but this would not fit well with boost::interprocess's API, as there would be nowhere to put the per-process HANDLE needed to access the mutex.
It would seem that by creating a shared memory-mapped file between processes and building on InterlockedIncrement(), one could create a largely user-mode mutex akin to a CRITICAL_SECTION, which would be considerably more performant than the Win32 mutex for interprocess synchronisation.
CRITICAL_SECTION internally can use a synchronization primitive when there's contention. I forget if it's an event, semaphore, or mutex.
You can "safely" use Interlocked functions on memory, so there's no reason why you couldn't use it for cross-process synchronization, other than that would be really crazy and you should probably either use threads or a real synchronization primitive.
But officially, you can.
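To make that concrete, here is a hedged sketch (all names are mine, and error handling is omitted) of a cross-process spinlock built on InterlockedCompareExchange over a LONG in a shared file mapping:

#include <windows.h>

// Both processes map the same named file mapping and treat the first
// LONG of the view as the lock word.
volatile LONG* map_lock_word()
{
    HANDLE mapping = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr,
                                        PAGE_READWRITE, 0, sizeof(LONG),
                                        L"Local\\MySharedLock");  // hypothetical name
    return static_cast<volatile LONG*>(
        MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, sizeof(LONG)));
}

void acquire(volatile LONG* lock_word)
{
    // Atomically change 0 -> 1; the return value is the previous value.
    while (InterlockedCompareExchange(lock_word, 1, 0) != 0)
        Sleep(0);   // yield the timeslice, like the boost emulation above
}

void release(volatile LONG* lock_word)
{
    InterlockedExchange(lock_word, 0);
}

Like the boost emulation shown earlier, a contended acquire here burns a timeslice per spin; a real cross-process mutex would park the thread on a kernel object instead.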