I have a hash table data structure that I wish to make thread safe by use of a reader/writer lock (my read:write ratio is likely somewhere in the region of 100:1).
I have been looking around for how to implement this lock using C++11 (such as the method here), but it has come to my attention that it should be possible to use C++14's shared_lock to accomplish the same thing. However, after looking on cppreference I have found both std::shared_lock and std::unique_lock but I don't understand how to use them together (compared to the Boost way which has simple method calls for locking both uniquely and in shared mode).
How can I recreate this relatively simple reader/writer lock interface in C++14 using only the standard library?
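C++14 has the reader/writer lock implementation std::shared_timed_mutex. Readers lock it through std::shared_lock and writers lock it through std::unique_lock (or std::lock_guard).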
Side-note: C++17 added the simpler class std::shared_mutex, which you can use if you don't need the extra timing functions (like shared_timed_mutex::try_lock_for and shared_timed_mutex::try_lock_until).
However, before using a reader/writer lock, be aware of the potentially harmful performance implications. Depending on the situation, a simple std::mutex might be faster.
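To show how std::shared_lock and std::unique_lock fit together on one mutex, here is a minimal sketch. The SharedTable class and its find/insert methods are made up for illustration, and std::unordered_map only stands in for your own hash table:

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical wrapper; std::unordered_map stands in for the real hash table.
class SharedTable {
public:
    // Readers take the mutex in shared mode, so many lookups can run at once.
    bool find(const std::string& key, int& out) const {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;
        return true;
    }

    // Writers take the same mutex exclusively: no readers or other writers.
    void insert(const std::string& key, int value) {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        map_[key] = value;
    }

private:
    mutable std::shared_timed_mutex mutex_;  // mutable so find() can stay const
    std::unordered_map<std::string, int> map_;
};
```

Both lock types are RAII wrappers, so the mutex is released when the function returns or throws. With C++17 you could swap std::shared_timed_mutex for std::shared_mutex without touching the rest.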
I know Boost has support for mutexes and lock_guard, which can be used to implement critical sections.
But Windows has a special API for critical sections (see EnterCriticalSection and LeaveCriticalSection) which is a LOT faster than a mutex (for rarely contended, short sections of code).
Hence my question: is it possible in Boost to take advantage of this API, and fall back to a spinlock/mutex/futex-based implementation on other platforms?
The simple answer is no.
Here's some relevant background from an old mailing list thread:
BTW. I am agree that mutex is more universal solution from a
performance point of view. But to be fair - CS are faster in simple
design. I believe that possibility to support them should be at
least taken in account.
This was the article that someone pointed me to. The conclusion was
that CS are only faster if:
There are less than 8 threads total in the process.
You weren't running in the background.
You weren't on an dual processor machine.
To me this means that simple testing yields good CS performance
results, but any real world program is better off with a full blown
mutex.
I'm not adverse to supporting a CS implementation. However, I
originally chose not to for the following reasons:
You get either construction and destruction hits from using a PIMPL
idiom or you must include Windows.h in the Boost.Threads headers,
which I simply don't want to do. (This can be worked around by
emulating a CS ala OPTEX from the MSDN.)
According to this research paper most programs won't benefit from
a CS design.
It's trivial to code a (non-portable) critical_section class that
follows the Mutex model if you truly can make use of this.
For now I think I've made the right choice, though down the road we
may change the implementation to use a critical section or OPTEX.
Bill Kempf
Speaking as someone who helps out maintaining Boost.Thread, and as someone who failed to get an event object into Boost.Thread, I don't think critical sections have ever been added nor would be added to Boost for these reasons:
A Win32 critical section is trivially easy to build using a boost::atomic and a boost::condition_variable, so much so it isn't really worth having an official one. Here is probably the most complex one you could imagine, but extremely configurable including being constexpr ready (don't ask!): https://github.com/ned14/boost.outcome/blob/master/include/boost/outcome/v1/spinlock.hpp#L331
You can build your own simply by matching the (Basic)Lockable concept and using atomic compare_exchange (non-x86/x64) or atomic exchange (x86/x64), and then grab it using a lock_guard around the critical section.
Some may object that a win32 critical section is not this. I am afraid it is: it simply spins on an atomic for a spin count, and then lazily tries to allocate a win32 event object which it then waits upon. Nothing special.
As much as you might think critical sections (really user mode mutexes) are better/faster/whatever, they probably are not as great as you might think. boost::mutex is a big vast heavyweight thing on Windows internally using a win32 semaphore as the kernel wait object because of the need to emulate thread cancellation and to behave well in a general purpose use context. It's easy to write a concurrency structure which is faster than another for some single use case, but it is very very hard to write a concurrency structure which is all of:
Faster than a standard implementation in the uncontended case.
Faster than a standard implementation in the lightly contended case.
Faster than a standard implementation in the heavily contended case.
Even if you manage all three of the above, that still isn't enough: you also need some guarantees on worst case progression ordering, so whether certain patterns of locks, waits and unlocks produce predictable outcomes. This is why threading facilities can appear to look slow in narrow use case scenarios, so Boost.Thread much as the STL can appear to be much slower than hand rolled locking code in say an uncontended use case.
Boost.Thread already does substantial work in user mode to avoid going to kernel sleep on Windows. On POSIX any of the major pthreads implementations also does substantial work to avoid kernel sleeps and hence Boost.Thread doesn't replicate that work. In other words, critical sections don't gain you anything in terms of scaling to load behaviours, though inevitably Boost.Thread v4 especially on Windows does a ton load of work a naive implementation does not (the planned rewrite of Boost.Thread is vastly more efficient on Windows as it can assume Windows Vista or above).
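As a rough illustration of the "build your own by matching BasicLockable" point above, here is a minimal sketch. It uses std::atomic rather than boost::atomic, and the SpinLock, counter_lock and next_id names are invented for the example. It only spins and yields; it has neither the spin count nor the lazily created kernel wait object of a real Win32 critical section:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical minimal BasicLockable type: just lock() and unlock().
class SpinLock {
public:
    void lock() {
        // exchange() returns the previous value; true means the lock is taken.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite while waiting
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

SpinLock counter_lock;  // hypothetical shared state for the example
int counter = 0;

int next_id() {
    // lock_guard works with any BasicLockable type, including this one.
    std::lock_guard<SpinLock> guard(counter_lock);
    return ++counter;
}
```

Something this simple can look fast in an uncontended benchmark, which is the narrow case described above; it gives no fairness or progress guarantees under contention.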
So, it looks like the default Boost mutex doesn't support it, but asio::detail::mutex does.
So I ended up using that:
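#include <boost/asio/detail/mutex.hpp>
#include <boost/thread.hpp>

// Note: the detail namespace is internal to Asio, not a public interface.
using boost::asio::detail::mutex;
using boost::lock_guard;

int myFunc()
{
    static mutex mtx;
    lock_guard<mutex> lock(mtx);  // released automatically at scope exit
    . . .
}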
I find that neither Boost's nor TBB's condition variable has an interface for working with a reader-writer lock (i.e. a shared mutex in Boost). condition_variable::wait() only accepts a mutex lock. But I think it's quite reasonable to have it work with a reader-writer lock. Can anyone tell me the reason why they don't support that, or why people don't do that?
Thanks,
Cui
The underlying platform's native threading API might not be able to support it easily. For example, on a POSIX platform where a condition variable is implemented in terms of pthread_cond_t it can only be used with pthread_mutex_t. In order to get maximum performance the basic condition variable type is a lightweight wrapper over the native types, with no additional overhead.
If you want to use other types of mutex you should use std::condition_variable_any or boost::condition_variable_any, which work with any type of mutex. This has a small additional overhead due to using an internal mutex of the native platform's type in addition to the user-supplied mutex. (I don't know if TBB offers an equivalent type.)
It's a design trade-off that allows either performance or flexibility. If you want maximum performance you get it with condition_variable but can only use simple mutexes. If you want more flexibility you can get that with condition_variable_any but you must sacrifice a little performance.
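For what it's worth, here is a small sketch of condition_variable_any waiting on a shared (reader) lock, using the C++14 standard types. All of the names (state_mutex, state_changed, ready, wait_until_ready, mark_ready) are made up for the example:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>

// Hypothetical shared state guarded by a reader/writer mutex.
std::shared_timed_mutex state_mutex;
std::condition_variable_any state_changed;
bool ready = false;

// Reader: holds the mutex in shared mode and waits until a writer sets `ready`.
// condition_variable_any only needs lock()/unlock() on the lock object,
// which std::shared_lock maps to lock_shared()/unlock_shared().
bool wait_until_ready() {
    std::shared_lock<std::shared_timed_mutex> lock(state_mutex);
    state_changed.wait(lock, [] { return ready; });
    return ready;
}

// Writer: changes the state under an exclusive lock, then wakes every waiter.
void mark_ready() {
    {
        std::unique_lock<std::shared_timed_mutex> lock(state_mutex);
        ready = true;
    }
    state_changed.notify_all();
}
```

The same shape should work with boost::condition_variable_any and boost::shared_mutex, but I have only written it against the standard types here.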
I am looking for the optimal strategy to use STL containers (like std::map and std::vector) and pthreads.
What is the canonical way to go? A simple example:
std::map<string, vector<string>> myMap;
How do we guarantee concurrency?
mutex_lock;
write at myMap;
mutex_unlock;
Additionally, I would like to know if pthreads and STL face performance issues when used together.
System: Linux, g++, pthreads, no boost, no Intel TBB
The C++03 Standard does not talk about concurrency at all, so the concurrency aspect is left as an implementation detail for compilers. The documentation that comes with your compiler is therefore where you should look for answers related to concurrency.
Most of the STL implementations are not thread safe as such.
Since STL containers do not provide any explicit thread safety, yes, you will have to use your own synchronization mechanism. And while you are at it, you should use RAII rather than managing the synchronization resource (mutex unlock etc.) manually.
You can refer to the documentation here:
MSDN:
If a single object is being written to by one thread, then all reads and writes to that object on the same or other threads must be protected. For example, given an object A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.
GCC Documentation says:
We currently use the SGI STL definition of thread safety, which states:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
Point to Note: GCC's Standard Library is a derivative of SGI's STL code.
The canonical way to provide concurrency is to hold a lock while accessing the collection.
That works in 90% of the cases where access to the collection isn't performance-critical anyway. If you're accessing a shared collection so much that locking around it harms performance, you should rethink your design. (And odds are, your design is okay and it won't affect performance anywhere near as much as you might suspect.)
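Since the question rules out Boost and TBB, here is a sketch of "hold a lock while accessing the collection" using raw pthreads and a small RAII guard. ScopedLock, map_mutex, my_map, add_value and count_values are names I made up; the code is written in C++03 style so it does not depend on C++11 support:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <pthread.h>
#include <string>
#include <vector>

// Hypothetical RAII guard: locks in the constructor, unlocks in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t& m) : m_(m) { pthread_mutex_lock(&m_); }
    ~ScopedLock() { pthread_mutex_unlock(&m_); }

private:
    ScopedLock(const ScopedLock&);             // non-copyable (C++03 style)
    ScopedLock& operator=(const ScopedLock&);
    pthread_mutex_t& m_;
};

pthread_mutex_t map_mutex = PTHREAD_MUTEX_INITIALIZER;
std::map<std::string, std::vector<std::string> > my_map;

// Every access, read or write, goes through the same mutex.
void add_value(const std::string& key, const std::string& value) {
    ScopedLock lock(map_mutex);
    my_map[key].push_back(value);
}

std::size_t count_values(const std::string& key) {
    ScopedLock lock(map_mutex);
    std::map<std::string, std::vector<std::string> >::const_iterator it =
        my_map.find(key);
    return it == my_map.end() ? 0 : it->second.size();
}
```

Note that count_values returns a copy of the size rather than a reference into the map, so nothing protected by the mutex is touched after the guard is destroyed.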
You should take a look at Intel Threading Building Blocks (TBB) ( http://threadingbuildingblocks.org/ ). It has a few highly optimized data structures that handle concurrency internally using non-blocking strategies.
Motivation: reason why I'm considering it is that my genius project manager thinks that boost is another dependency and that it is horrible because "you depend on it"(I tried explaining the quality of boost, then gave up after some time :( ). Smaller reason why I would like to do it is that I would like to learn c++11 features, because people will start writing code in it.
So:
Is there a 1:1 mapping between #include <thread>, #include <mutex> and their boost equivalents?
Would you consider it a good idea to replace boost stuff with c++11 stuff? My usage is primitive, but are there examples when std doesn't offer what boost does? Or (blasphemy) vice versa?
P.S.
I use GCC so headers are there.
There are several differences between Boost.Thread and the C++11 standard thread library:
Boost supports thread cancellation, C++11 threads do not
C++11 supports std::async, but Boost does not
Boost has a boost::shared_mutex for multiple-reader/single-writer locking. The analogous std::shared_timed_mutex is available only since C++14 (N3891), while std::shared_mutex is available only since C++17 (N4508).
C++11 timeouts are different to Boost timeouts (though this should soon change now Boost.Chrono has been accepted).
Some of the names are different (e.g. boost::unique_future vs std::future)
The argument-passing semantics of std::thread are different to boost::thread --- Boost uses boost::bind, which requires copyable arguments. std::thread allows move-only types such as std::unique_ptr to be passed as arguments. Due to the use of boost::bind, the semantics of placeholders such as _1 in nested bind expressions can be different too.
If you don't explicitly call join() or detach() then the boost::thread destructor and assignment operator will call detach() on the thread object being destroyed/assigned to. With a C++11 std::thread object, this will result in a call to std::terminate() and abort the application.
To clarify the point about move-only parameters, the following is valid C++11, and transfers the ownership of the int from the temporary std::unique_ptr to the parameter of f1 when the new thread is started. However, if you use boost::thread then it won't work, as it uses boost::bind internally, and std::unique_ptr cannot be copied. There is also a bug in the C++11 thread library provided with GCC that prevents this working, as it uses std::bind in the implementation there too.
void f1(std::unique_ptr<int>);
std::thread t1(f1,std::unique_ptr<int>(new int(42)));
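Putting the two points together (move-only arguments, and the destructor calling std::terminate() on a joinable thread), a fuller sketch of the snippet above might look like this. The received variable and the run_worker function are hypothetical additions for the example:

```cpp
#include <cassert>
#include <memory>
#include <thread>

int received = 0;  // hypothetical: written only by the worker thread

// Takes ownership of the int through a move-only std::unique_ptr.
void f1(std::unique_ptr<int> p) { received = *p; }

int run_worker() {
    // The temporary unique_ptr is moved into the new thread's argument.
    std::thread t1(f1, std::unique_ptr<int>(new int(42)));

    // Unlike boost::thread, a joinable std::thread must be joined or detached
    // before it is destroyed; otherwise std::terminate() is called.
    t1.join();

    return received;  // safe to read: join() synchronizes with the worker
}
```

This assumes a standard library where the move-only argument bug mentioned above has been fixed.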
If you are using Boost then you can probably switch to C++11 threads relatively painlessly if your compiler supports it (e.g. recent versions of GCC on Linux have a mostly-complete implementation of the C++11 thread library available in -std=c++0x mode).
If your compiler doesn't support C++11 threads then you may be able to get a third-party implementation such as Just::Thread, but this is still a dependency.
std::thread is largely modelled after boost::thread, with a few differences:
boost's non-copyable, one-handle-maps-to-one-os-thread, semantics are retained. But this thread is movable to allow returning thread from factory functions and placing into containers.
This proposal adds cancellation to the boost::thread, which is a significant complication. This change has a large impact not only on thread but the rest of the C++ threading library as well. It is believed this large change is justifiable because of the benefit.
The thread destructor must now call cancel prior to detaching to avoid accidently leaking child threads when parent threads are canceled.
An explicit detach member is now required to enable detaching without canceling.
The concepts of thread handle and thread identity have been separated into two classes (they are the same class in boost::thread). This is to support easier manipulation and storage of thread identity.
The ability to create a thread id which is guaranteed to compare equal to no other joinable thread has been added (boost::thread does not have this). This is handy for code which wants to know if it is being executed by the same thread as a previous call (recursive mutexes are a concrete example).
There exists a "back door" to get the native thread handle so that clients can manipulate threads using the underlying OS if desired.
This is from 2007, so some points are no longer valid: boost::thread has a native_handle function now, and, as commenters point out, std::thread doesn't have cancellation anymore.
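The separate thread-identity class from that list did make it into C++11 as std::thread::id, and a default-constructed id represents no thread. A small sketch of the "was I called by the same thread as last time" use case follows. The last_caller variable and the helper function are hypothetical, and the helper itself is not synchronized, so treat it as an illustration only:

```cpp
#include <cassert>
#include <thread>

// Default-constructed id: does not identify any thread of execution.
std::thread::id last_caller;

// Hypothetical helper: reports whether the previous call came from this thread.
// Not thread-safe as written; a real version would guard last_caller.
bool same_thread_as_last_call() {
    const std::thread::id me = std::this_thread::get_id();
    const bool same = (last_caller == me);
    last_caller = me;
    return same;
}
```

On the first call the stored id still represents no thread, so the comparison is false; a second call from the same thread then compares equal.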
I could not find any significant differences between boost::mutex and std::mutex.
Enterprise Case
If you are writing software for the enterprise that needs to run on a moderate to large variety of operating systems and consequently build with a variety of compilers and compiler versions (especially relatively old ones) on those operating systems, my suggestion is to stay away from C++11 altogether for now. That means that you cannot use std::thread, and I would recommend using boost::thread.
Basic / Tech Startup Case
If you are writing for one or two operating systems, you know for sure that you will only ever need to build with a modern compiler that mostly supports C++11 (e.g. VS2015, GCC 5.3, Xcode 7), and you are not already dependent on the boost library, then std::thread could be a good option.
My Experience
I am personally partial to hardened, heavily used, highly compatible, highly consistent libraries such as boost versus a very modern alternative. This is especially true for complicated programming subjects such as threading. Also, I have long experienced great success with boost::thread (and boost in general) across a vast array of environments, compilers, threading models, etc. When it's my choice, I choose boost.
There is one reason not to migrate to std::thread.
If you are using static linking, std::thread becomes unusable due to these gcc bugs/features:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52590
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57740
Namely, if you call std::thread::detach or std::thread::join it will lead to either an exception or a crash, while boost::thread works OK in these cases.
With Visual Studio 2013 the std::mutex seems to behave differently than the boost::mutex, which caused me some problems (see this question).
With regards to std::shared_mutex added in C++17
The other answers here provide a very good overview of the differences in general. However, there are several issues with std::shared_mutex that boost solves.
Upgradable mutexes. These are absent from the standard library. They allow a reader to be upgraded to a writer without allowing any other writers to get in before you. These allow you to do things like pre-process a large computation (for example, reindexing a data structure) when in read mode, then upgrade to write to apply the reindex while only holding the write lock for a short time.
Fairness. If you have constant read activity with a std::shared_mutex, your writers can be softlocked indefinitely (the standard does not mandate a scheduling policy, so this depends on the implementation). This happens when each newly arriving reader is given priority over a waiting writer. With boost::shared_mutex, all threads will eventually be given priority. (1) Neither readers nor writers will be starved.
The tl;dr of this is that if you have a very high-throughput system with no downtime and very high contention, std::shared_mutex will never work for you without manually building a priority system on top of it. boost::shared_mutex will work out of the box, although you might need to tinker with it in certain cases. I'd argue that std::shared_mutex's behavior is a latent bug waiting to happen in most code that uses it.
(1) The actual algorithm it uses is based on the OS thread scheduler. In my experience, when reads are saturated, there are longer pauses (when obtaining a write lock) on Windows than on OSX/Linux.
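To give an idea of what "manually building a priority system" can mean, here is a sketch of a write-preferring reader/writer lock built from std::mutex and std::condition_variable. This is my own illustration, not the algorithm Boost uses, and the class name is made up. It has the opposite weakness: under a constant stream of writers, readers can starve.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>

// Hypothetical write-preferring lock: new readers wait while a writer is queued.
class WritePreferringLock {
public:
    void lock_shared() {
        std::unique_lock<std::mutex> lk(m_);
        // Readers yield to both active and waiting writers.
        cv_.wait(lk, [this] { return !writer_active_ && waiting_writers_ == 0; });
        ++active_readers_;
    }
    void unlock_shared() {
        std::unique_lock<std::mutex> lk(m_);
        if (--active_readers_ == 0) cv_.notify_all();
    }
    void lock() {
        std::unique_lock<std::mutex> lk(m_);
        ++waiting_writers_;  // from now on, new readers are held back
        cv_.wait(lk, [this] { return !writer_active_ && active_readers_ == 0; });
        --waiting_writers_;
        writer_active_ = true;
    }
    void unlock() {
        std::unique_lock<std::mutex> lk(m_);
        writer_active_ = false;
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int active_readers_ = 0;
    int waiting_writers_ = 0;
    bool writer_active_ = false;
};
```

Because it provides lock()/unlock() and lock_shared()/unlock_shared(), it can be used with std::lock_guard for writers and std::shared_lock for readers, as long as the timed and try_ variants are not needed.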
I tried to use shared_ptr from std instead of boost and I actually found a bug in the gcc implementation of this class. My application was crashing because of a destructor being called twice (this class should be thread-safe and shouldn't generate such problems). After moving to boost::shared_ptr all problems disappeared. Current implementations of C++11 are still not mature.
Boost also has more features. For example, the <chrono> header in the std version doesn't provide a serializer to a stream (i.e. cout << duration). Many Boost libraries use Boost's own equivalents of the standard headers, and these do not cooperate with the std versions.
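If the missing stream output for std durations is the only obstacle, a small workaround is to stream the tick count yourself. The to_text helper below is a hypothetical name, and the " ms" suffix is just my choice of format:

```cpp
#include <cassert>
#include <chrono>
#include <sstream>
#include <string>

// Hypothetical helper: formats a duration by streaming its tick count.
// Coarser durations such as std::chrono::seconds convert implicitly.
std::string to_text(std::chrono::milliseconds d) {
    std::ostringstream out;
    out << d.count() << " ms";
    return out.str();
}
```

C++20 later added operator<< for std::chrono::duration, so this is only needed with older standards.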
To sum up - if you already have an application written using boost, it is safer to keep your code as it is instead of putting some effort in moving to C++11 standard.
I need several STL containers, threadsafe.
Basically I was thinking I just need 2 methods added to each of the STL container objects,
.lock()
.unlock()
I could also break it into
.lockForReading()
.unlockForReading()
.lockForWriting()
.unlockForWriting()
The way that would work is any number of locks for parallel reading are acceptable, but if there's a lock for writing then reading AND writing are blocked.
An attempt to lock for writing waits until the lockForReading semaphore drops to 0.
Is there a standard way to do this?
Is how I'm planning on doing this wrong or shortsighted?
This is really kind of bad. External code will not recognize or understand your threading semantics, and the ease of availability of aliases to objects in the containers makes them poor thread-safe interfaces.
Thread-safety occurs at design time. You can't solve thread safety by throwing locks at the problem. You solve thread safety by not having two threads writing to the same data at the same time- in the general case, of course. However, it is not the responsibility of a specific object to handle thread safety, except direct threading synchronization primitives.
You can have concurrent containers, designed to allow concurrent use. However, their interfaces are vastly different from what's offered by the Standard containers: fewer aliases to objects in the container, for example, and each individual operation is encapsulated.
The standard way to do this is to acquire the lock in a constructor and release it in the destructor. This is more commonly known as Resource Acquisition Is Initialization, or RAII. I strongly suggest you use this methodology rather than
.lock()
.unlock()
which is not exception safe. You can easily forget to unlock the mutex prior to throwing, resulting in a deadlock the next time a lock is attempted.
There are several synchronization types in the Boost.Thread library that will be useful to you, notably boost::mutex::scoped_lock. Rather than add lock() and unlock() methods to whatever container you wish to access from multiple threads, I suggest you use a boost::mutex or equivalent and instantiate a boost::mutex::scoped_lock whenever accessing the container.
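To illustrate the exception-safety argument, here is a sketch using std::mutex and std::lock_guard as the "or equivalent" of boost::mutex and its scoped_lock. The items_mutex, items and add_positive names are invented for the example:

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <vector>

std::mutex items_mutex;  // hypothetical names for the example
std::vector<int> items;

void add_positive(int value) {
    // Locked here; unlocked by the guard's destructor on every exit path.
    std::lock_guard<std::mutex> lock(items_mutex);
    if (value <= 0) {
        // Stack unwinding destroys `lock`, so the mutex is released anyway.
        throw std::invalid_argument("value must be positive");
    }
    items.push_back(value);
}
```

With explicit lock()/unlock() calls the throw above would leave the mutex locked unless you remembered to unlock before every throw and return.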
Is there a standard way to do this?
No, and there's a reason for that.
Is how I'm planning on doing this wrong or shortsighted?
It's not necessarily wrong to want to synchronize access to a single container object, but the interface of the container class is very often the wrong place to put the synchronization (like DeadMG says: object aliases, etc.).
Personally I think both TBB and stuff like concurrent_vector may either be overkill or still the wrong tools for a "simple" synchronization problem.
I find that ofttimes just adding a (private) Lock object (to the class holding the container) and wrapping up the 2 or 3 access patterns to the one container object will suffice and will be much easier to grasp and maintain for others down the road.
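A sketch of what I mean by a private lock plus a handful of wrapped access patterns is below. The NameRegistry class and its methods are hypothetical, and I use std::mutex here, though any mutex type with a scoped guard would do:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical owner class: the vector and its lock are both private, and
// callers only see whole operations, never references into the container.
class NameRegistry {
public:
    void add(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        names_.push_back(name);
    }

    bool contains(const std::string& name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const std::string& n : names_) {
            if (n == name) return true;
        }
        return false;
    }

    // Returns a copy so that no alias into the container escapes the lock.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return names_;
    }

private:
    mutable std::mutex mutex_;  // mutable so const methods can lock
    std::vector<std::string> names_;
};
```

The synchronization lives in the class that owns the container rather than in the container's interface, which avoids the object-alias problem mentioned above.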
Sam: You don't want a .lock() method because something could go awry that prevents calling the .unlock() method at the end of the block, but if .unlock() is called as a consequence of object destruction of a stack allocated variable then any kind of early return from the function that calls .lock() will be guaranteed to free the lock.
DeadMG:
Intel's Threading Building Blocks (open source) may be what you're looking for.
There are also Microsoft's concurrent_vector and concurrent_queue, which already come with Visual Studio 2010.