Concurrency: Mutual Exclusion only?

Is providing mutual exclusion (i.e., via a spinlock mechanism) enough to ensure an effective implementation of concurrency? Or do we have to explicitly implement some synchronization method as well?
In sum:
Provision of concurrency = effective mutual exclusion implementation
OR
Provision of concurrency = effective mutual exclusion implementation + effective synchronization implementation
?
Thanks.

Concurrency includes both concepts: "mutual exclusion and synchronization".
Concurrency is the expression of a state.
Mutual exclusion is the expression of a state within concurrency.
Mutual exclusion is a technique for achieving synchronization within concurrency.

All you really need (though other things are often helpful for performance reasons) is mutual exclusion and some mechanism to ensure that operations can't 'move' across the mutual exclusion barriers.
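
To make that concrete, here is a minimal spinlock sketch (the class name is illustrative, not taken from any library): the busy-wait loop provides the mutual exclusion, and the acquire/release memory orderings are the mechanism that keeps operations inside the critical section from "moving" across the lock boundaries.

    #include <atomic>

    // Minimal spinlock sketch: mutual exclusion plus memory ordering.
    class Spinlock {
        std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    public:
        void lock() {
            // acquire: reads/writes after lock() cannot be hoisted above it
            while (flag_.test_and_set(std::memory_order_acquire)) {
                // spin (in C++20 one could call flag_.wait(true) here)
            }
        }
        void unlock() {
            // release: reads/writes before unlock() cannot sink below it
            flag_.clear(std::memory_order_release);
        }
    };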

Related

When and why is std::binary_semaphore more performant than std::condition_variable?

According to the C++20 semaphore docs, semaphores can be used in a similar manner to condition variables:
Semaphores are also often used for the semantics of
signalling/notifying rather than mutual exclusion, by initializing the
semaphore with 0 and thus blocking the receiver(s) that try to
acquire(), until the notifier "signals" by invoking release(n). In
this respect semaphores can be considered alternatives to
std::condition_variables, often with better performance.
Emphasis mine. I've used semaphores in this manner in Java and Swift in the past, but in C++, I've normally resorted to using std::condition_variable for this signal/notify pattern. With C++20, I now have access to std::binary_semaphore, and I'm wondering what the difference is.
When does a semaphore (in particular, std::binary_semaphore) have better performance than using std::condition_variable in the same manner, and why?
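For reference, a minimal sketch of the signalling pattern the docs describe, assuming C++20 (names such as ready and worker are illustrative):

    #include <semaphore>
    #include <thread>
    #include <iostream>

    int main() {
        // Initialized with 0, so the receiver blocks in acquire() until
        // the notifier "signals" by calling release().
        std::binary_semaphore ready{0};

        std::jthread worker([&] {
            ready.acquire();                 // blocks until main() signals
            std::cout << "worker woke up\n";
        });

        // ... prepare data ...
        ready.release();                     // signal the worker
    }

The equivalent condition_variable version needs a mutex, a boolean flag, and a predicate in wait(); whether the semaphore is actually faster depends on the implementation.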

std::scoped_lock and mutex ordering

I'm trying to determine whether std::scoped_lock establishes an ordering or mutex ID so that it acquires locks in a prescribed order. It is not clear to me that it does, from a somewhat brief look at a browsable implementation I found by googling around.
In case it is not doing that, what would be the closest-to-standard way of acquiring an ordered set of locks?
Typically the cleanest way to avoid deadlocks is to always acquire a group of locks in the same order (and yes, sure, always release them all before trying to acquire new locks again, but perhaps 2PL is a little beyond the scope of what std::scoped_lock should aim to do).
The order for std::lock isn't defined until run-time, and it is not fixed. It is discovered experimentally by the algorithm for each individual call to std::lock. The second call to std::lock could lock the mutexes in a different order than the first, even though both calls might use the same list of mutexes in the same order at the call site.
Here is a detailed performance analysis of several possible implementations of std::lock: http://howardhinnant.github.io/dining_philosophers.html
Using a fixed ordering of the mutexes is one of the algorithms that is performance-compared in the above link. It is not the best performing algorithm for the experiments conducted.
The libstdc++ implementation the OP points to is a high-quality implementation of what the analysis labels "Smart & Polite".
scoped_lock's constructor is stated to call std::lock on its mutexes, so its behavior is governed by this function. And std::lock is specifically defined to avoid deadlocks on what it locks. The order it locks the mutexes in is undefined, but it won't result in a deadlock.
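A minimal sketch of what that guarantee means in practice: the two functions below name the same mutexes in opposite orders, yet they cannot deadlock, because scoped_lock defers to the std::lock algorithm.

    #include <mutex>

    std::mutex a, b;

    void f() {
        std::scoped_lock lk(a, b);  // acquisition order chosen by std::lock
        // ... critical section ...
    }

    void g() {
        std::scoped_lock lk(b, a);  // different argument order, still no deadlock
        // ... critical section ...
    }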

Is shared_future<void> a legitimate replacement for a condition_variable?

Josuttis states ["Standard Library", 2nd ed, pg 1003]:
Futures allow you to block until data by another thread is provided or another thread is done. However, a future can pass data from one thread to another only once. In fact, a future's major purpose is to deal with return values or exceptions of threads.
On the other hand, a shared_future<void> can be used by multiple threads, to identify when another thread has done its job.
Also, in general, high-level concurrency features (such as futures) should be preferred to low-level ones (such as condition_variables).
Therefore, I'd like to ask: Is there any situation (requiring synchronization of multiple threads) in which a shared_future<void> won't suffice and a condition_variable is essential?
As already pointed out in the comments by @T.C. and @hlt, the use of futures/shared_futures is mostly limited in the sense that they can only be used once. So for every communication task you have to have a new future. The pros and cons are nicely explained by Scott Meyers in:
Item 39: Consider void futures for one-shot event communication.
Scott Meyers, Effective Modern C++ (emphasis mine)
His conclusion is that using promise/future pairs dodges many of the problems with the use of condition_variables, providing a nicer way of communicating one-shot events. The price to pay is that you are using dynamically allocated memory for the shared states and, more importantly, that you have to have one promise/future pair for every event that you want to communicate.
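A minimal sketch of that one-shot pattern, broadcasting a void event to several threads through a shared_future (illustrative names only):

    #include <future>
    #include <thread>
    #include <vector>
    #include <iostream>

    int main() {
        std::promise<void> go;
        std::shared_future<void> ready = go.get_future().share();

        std::vector<std::thread> workers;
        for (int i = 0; i < 3; ++i) {
            workers.emplace_back([ready, i] {
                ready.wait();                 // all threads released together
                std::cout << "worker " << i << " started\n";
            });
        }

        go.set_value();                       // the one-shot event
        for (auto& t : workers) t.join();
    }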
While the notion of using high-level abstractions instead of low-level ones is laudable, there is a misconception here. std::future is not a high-level replacement for std::condition_variable. Instead, it is a specific high-level construct built for a specific use case of std::condition_variable - namely, a one-time return of a value.
Obviously, not all uses of condition variables are for this scenario. For example, a message queue cannot be implemented with std::future, no matter how much you try. Such a queue is another high-level construct built on low-level building blocks. So yes, shoot for high-level constructs, but do not expect a one-to-one mapping between high and low level.
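To illustrate the kind of multi-shot construct a future cannot express, here is a sketch of a simple blocking queue built on a mutex and a condition_variable (a sketch only, not a production implementation):

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    template <typename T>
    class BlockingQueue {
        std::queue<T> items_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void push(T value) {
            {
                std::lock_guard<std::mutex> lk(m_);
                items_.push(std::move(value));
            }
            cv_.notify_one();                 // each push is a separate event
        }
        T pop() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return !items_.empty(); });
            T value = std::move(items_.front());
            items_.pop();
            return value;
        }
    };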

Reader-writer lock with condition variable

I find that neither Boost's nor TBB's condition variable has an interface for working with a reader-writer lock (i.e., a shared mutex in Boost). condition_variable::wait() only accepts a mutex lock. But I think it's quite reasonable to have it work with a reader-writer lock. Can anyone tell me the reason why they don't support that, or why people don't do that?
Thanks,
Cui
The underlying platform's native threading API might not be able to support it easily. For example, on a POSIX platform where a condition variable is implemented in terms of pthread_cond_t, it can only be used with pthread_mutex_t. In order to get maximum performance, the basic condition variable type is a lightweight wrapper over the native types, with no additional overhead.
If you want to use other types of mutex you should use std::condition_variable_any or boost::condition_variable_any, which work with any type of mutex. This has a small additional overhead due to using an internal mutex of the native platform's type in addition to the user-supplied mutex. (I don't know if TBB offers an equivalent type.)
It's a design trade-off that allows either performance or flexibility. If you want maximum performance you get it with condition_variable but can only use simple mutexes. If you want more flexibility you can get that with condition_variable_any but you must sacrifice a little performance.
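A minimal sketch of the flexible variant, using the standard-library names (std::condition_variable_any with std::shared_mutex; the Boost equivalents work the same way):

    #include <condition_variable>
    #include <shared_mutex>

    std::shared_mutex rw;
    std::condition_variable_any cv;
    bool data_ready = false;

    void reader() {
        std::shared_lock<std::shared_mutex> lk(rw);   // shared (reader) lock
        cv.wait(lk, [] { return data_ready; });       // lk is released while waiting
        // ... read the shared data ...
    }

    void writer() {
        {
            std::unique_lock<std::shared_mutex> lk(rw);  // exclusive (writer) lock
            data_ready = true;
        }
        cv.notify_all();
    }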

Which STL container has a thread-safe insertion process?

Which STL container has a thread-safe insertion process? I want several threads to simultaneously insert into the same container. Any implementation other than the STL (e.g., Boost) is welcome!
The STL containers are not thread safe. You have to impose that yourself, should you so wish, with your own synchronisation.
I am trying to avoid the critical region in multi-threading because it degrades performance!
On the contrary, it improves performance, because the kind of locking a container class can do is only the very fine-grained kind, having to acquire the lock for each simple operation. That's expensive. When you take care of locking yourself, you have the luxury of acquiring the lock once and performing many operations. That does not improve the odds for concurrency but greatly reduces the locking overhead. You can choose the strategy that makes the most sense for your app; it isn't forced on you.
Add to this that it is next to impossible to write a thread-safe container implementation that isn't either prone to deadlock or very expensive. Iterators are the problem. The library writer has to choose between taking a lock for the lifetime of the iterator (risking deadlock) and updating all live iterators when another thread changes the collection (expensive). Only the expensive choice is safe. Again, you choose the strategy that makes the most sense; the expensive choice is not forced on you.
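A minimal sketch of the coarse-grained approach described above (names are illustrative): take the lock once and perform a whole batch of insertions under it, instead of paying for a lock/unlock around every single push_back.

    #include <mutex>
    #include <vector>

    std::vector<int> data;
    std::mutex data_mutex;

    void insert_batch(const std::vector<int>& batch) {
        std::lock_guard<std::mutex> lk(data_mutex);  // one acquisition
        for (int v : batch) {
            data.push_back(v);                       // many operations under it
        }
    }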
The Standard does not require any STL containers to be thread safe. An implementation could be thread safe, although I'm not sure how they could pull it off with the current API; and changing the API would make them no longer compatible with the Standard.
If the LGPL is acceptable, Intel TBB has thread safe containers (these containers use locks internally, which does affect their performance).
Take a look at Boost.Lockfree (http://www.boost.org/doc/libs/1_53_0/doc/html/lockfree.html). It provides thread-safe implementations of the following (see the usage sketch after this list):
boost::lockfree::queue
a lock-free multi-producer/multi-consumer queue
boost::lockfree::stack
a lock-free multi-producer/multi-consumer stack
boost::lockfree::spsc_queue
a wait-free single-producer/single-consumer queue (commonly known as ringbuffer)
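A minimal usage sketch, assuming Boost.Lockfree is available (the values and sizes are arbitrary): several producers push into the same queue concurrently with no explicit locking; push() and pop() return false when they cannot make progress.

    #include <boost/lockfree/queue.hpp>
    #include <thread>
    #include <iostream>

    int main() {
        boost::lockfree::queue<int> q(1024);         // initial capacity

        std::thread producers[4];
        for (int t = 0; t < 4; ++t) {
            producers[t] = std::thread([&q, t] {
                for (int i = 0; i < 100; ++i) {
                    while (!q.push(t * 100 + i)) { } // retry if push fails
                }
            });
        }
        for (auto& p : producers) p.join();

        int value, count = 0;
        while (q.pop(value)) ++count;
        std::cout << count << " items\n";            // expect 400
    }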
Containers follow the KISS principle (Keep It Simple) and therefore do not have synchronization features. Most of the time this hypothetical embedded synchronization would not be enough anyway, because most of the time access to some other objects must be synchronized with the access to the container. Combine your container with one lock, and that's it, really.
Since you said any other (non-STL) implementation is welcome, I suggest Intel's Threading Building Blocks (TBB). It has thread-safe concurrent containers with really good performance characteristics.