Data race with std::unordered_map, despite locking insertions with mutex - c++

I have a C++11 program that does some computations and uses a std::unordered_map to cache results of those computations. The program uses multiple threads and they use a shared unordered_map to store and share the results of the computations.
Based on my reading of unordered_map and STL container specs, as well as unordered_map thread safety, it seems that an unordered_map, shared by multiple threads, can handle one thread writing at a time, but many readers at a time.
Therefore, I'm using a std::mutex to wrap my insert() calls to the map, so that at most only one thread is inserting at a time.
However, my find() calls do not have a mutex as, from my reading, it seems that many threads should be able to read at once. However, I'm occasionally getting data races (as detected by TSAN), manifesting themselves in a SEGV. The data race clearly points to the insert() and find() calls that I mentioned above.
When I wrap the find() calls in a mutex, the problem goes away. However, I don't want to serialize the concurrent reads, as I'm trying to make this program as fast as possible. (FYI: I'm running using gcc 5.4.)
Why is this happening? Is my understanding of the concurrency guarantees of std::unordered_map incorrect?

You still need a mutex for your readers to keep the writers out, but you need a shared one. C++14 has a std::shared_timed_mutex that you can use along with scoped locks std::unique_lock and std::shared_lock like this:
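#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
using mutex_type = std::shared_timed_mutex;
using read_only_lock = std::shared_lock<mutex_type>;
using updatable_lock = std::unique_lock<mutex_type>;
mutex_type mtx;
std::unordered_map<int, std::string> m;
// code to update map
{
    updatable_lock lock(mtx);
    m[1] = "one";
}
// code to read from map
{
    read_only_lock lock(mtx);
    // at() is a const lookup; operator[] may insert, so it must not be used under a shared lock
    std::cout << m.at(1) << '\n';
}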

There are several problems with that approach.
First, std::unordered_map has two overloads of find: one that is const and one that is not. I don't believe the non-const version of find mutates the map, but I would still treat invoking a non-const member function from multiple threads as a potential data race, and some compilers exploit undefined behavior for nasty optimizations. (The standard's container data-race rules do list find among the member functions that implementations must treat as const for this purpose, so this is a precaution rather than a strict requirement.)
So, first thing: make sure that when multiple threads invoke std::unordered_map::find, they use the const version. You can achieve that by referring to the map through a const reference and invoking find on that.
Second, you are missing the point that many threads may invoke const find on your map only as long as no other thread invokes a non-const member function on the same object. I can definitely imagine many threads calling find while some call insert at the same time, causing a data race. Imagine, for example, that insert makes the map reallocate its internal buffer while some other thread is iterating over it to find the wanted pair.
A solution is to use a shared mutex (std::shared_timed_mutex in C++14, std::shared_mutex in C++17), which has exclusive and shared locking modes. When a thread calls find, it takes the lock in shared mode; when a thread calls insert, it takes it in exclusive mode.
If your compiler does not support a shared mutex, you can use platform-specific synchronization objects, like pthread_rwlock_t on Linux and SRWLock on Windows.
Another possibility is to use a concurrent hash map, like the one provided by Intel's Threading Building Blocks library, or concurrent_unordered_map in the MSVC Concurrency Runtime. These containers use fine-grained or lock-free synchronization internally, so access is thread-safe and fast at the same time.
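To make the shared/exclusive idea concrete, here is a minimal sketch of a cache wrapper, assuming C++14. The class name ResultCache and its member functions are made up for illustration; they are not from the question. Lookups go through a const member function, so the const overload of find is used, and they only take a shared lock.

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache type; all names here are illustrative assumptions.
class ResultCache {
public:
    // Returns true and fills `out` if `key` is cached.
    // Takes a shared lock, so any number of readers may run concurrently.
    bool lookup(int key, std::string& out) const {
        std::shared_lock<std::shared_timed_mutex> lock(mtx_);
        auto it = map_.find(key);  // const overload, because *this is const
        if (it == map_.end()) return false;
        out = it->second;          // copy while the lock is still held
        return true;
    }

    // Takes an exclusive lock, so no reader or other writer is active.
    // emplace keeps an existing entry if the key is already present.
    void store(int key, std::string value) {
        std::unique_lock<std::shared_timed_mutex> lock(mtx_);
        map_.emplace(key, std::move(value));
    }

private:
    mutable std::shared_timed_mutex mtx_;  // mutable so lookup() can stay const
    std::unordered_map<int, std::string> map_;
};
```

Returning a copy rather than an iterator or reference matters here: anything that points into the map could be invalidated by a writer once the shared lock is released.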

Related

Multithreading - synchronised value vs mutexes?

When writing multithreaded code, I often need to read / write to shared memory. To prevent data races, the go-to solution would be to use something like lock_guard. However, I recently came across the concept of "synchronised values", which are usually implemented along the lines of:
template <typename T>
class SynchronizedValue {
T value;
std::mutex lock;
/* Public helper functions to read/write to a value, making sure the lock is locked when the value is written to*/
};
This SynchronizedValue class will have a method SetValueTo which will lock the mutex, write to the value, and unlock the mutex, making sure that you can write to a value safely without any data races.
This makes writing multithreaded code so much easier! However, are there any drawbacks / performance overhead of using these synchronised values in contrast to mutexes / lock_guard?
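For illustration, a minimal sketch of what such a class might look like. The member names set, get and update are assumptions made for this example (the question only mentions SetValueTo), and the real implementations the question refers to may differ.

```cpp
#include <mutex>
#include <utility>

// Hypothetical sketch of a "synchronised value"; not a specific library's API.
template <typename T>
class SynchronizedValue {
public:
    explicit SynchronizedValue(T initial = T()) : value_(std::move(initial)) {}

    // Replaces the stored value while holding the lock.
    void set(T v) {
        std::lock_guard<std::mutex> guard(lock_);
        value_ = std::move(v);
    }

    // Returns a copy, so the caller never holds a reference past the lock.
    T get() const {
        std::lock_guard<std::mutex> guard(lock_);
        return value_;
    }

    // Runs f(value) with the lock held, for read-modify-write updates.
    template <typename F>
    void update(F f) {
        std::lock_guard<std::mutex> guard(lock_);
        f(value_);
    }

private:
    T value_;
    mutable std::mutex lock_;  // mutable so get() can be const
};
```

Note that a separate get() followed by set() is two critical sections, not one; anything that must happen atomically has to go through something like update().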
are there any drawbacks / performance overhead of using these SynchronisedValues...?
Before you ask whether there is any drawback, you first ought to ask whether there is any benefit. The standard C++ library already defines std::atomic<T>. You didn't say what /* public helper functions...*/ you had in mind, but if they're just getters and setters for value, then what does your SynchronizedValue<T> class offer that you don't already get from std::atomic<T>?
There's an important reason why "atomic" variables don't eliminate the need for mutexes, B.T.W. Mutexes aren't just about ensuring "visibility" of memory updates: The most important way to think about mutexes is that they can protect relationships between data in a program.
E.g., Imagine a program that has multiple containers for some class of object, imagine that the program needs to move objects from container to container, and imagine that it is important for some thread to occasionally count all of the objects, and be guaranteed to get an accurate count.
The program can use a mutex to make that possible. It just has to obey two simple rules; (1) No thread may remove an object from any container unless it has the mutex locked, and (2) no thread may release the mutex until every object is in a container. If all of the threads obey those two rules, then the thread that counts the objects can be guaranteed to find all of them if it locks the mutex before it starts counting.
The thing is, you can't guarantee that just by making all of the variables atomic, because atomic doesn't protect any relationship between the variable in question and any other variable. At most, it only protects relationships between the value of the variable before and after some "atomic" operation such as an atomic increment.
When there's more than one variable participating in the relationship, then you must have a mutex (or something equivalent to a mutex.)
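The two rules above can be sketched roughly as follows. The TwoBins class and its members are hypothetical names chosen for this example; it uses two containers and a single mutex that protects the relationship between them.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical example: two containers whose combined contents form one invariant.
class TwoBins {
public:
    explicit TwoBins(std::vector<int> items) : a_(std::move(items)) {}

    // Moves one item from bin A to bin B. The item is "in flight"
    // (in neither bin) only while the mutex is held.
    bool move_one() {
        std::lock_guard<std::mutex> guard(mtx_);
        if (a_.empty()) return false;
        int item = a_.back();
        a_.pop_back();
        b_.push_back(item);
        return true;
    }

    // Counts every item. Because it holds the same mutex, it can never
    // observe an item that has left A but not yet arrived in B.
    std::size_t count_all() const {
        std::lock_guard<std::mutex> guard(mtx_);
        return a_.size() + b_.size();
    }

private:
    mutable std::mutex mtx_;
    std::vector<int> a_;
    std::vector<int> b_;
};
```

Making the two sizes individually atomic would not help here: a counter could still read one size before a move and the other after it.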
If you look under the hood at what is actually happening in each case you just find different ways of saying and doing the same thing.

C++ std::vector access using software transactional memory

I'm currently trying to wrap my head around the problem of thread-safety using C++ STL containers. I recently tried to implement a thread safe std::vector by using a std::mutex as a member variable, just to then realize that although I could make member functions thread-safe by locking the lock, I couldn't make lib functions like std::sort thread-safe, since they only get the begin()/end() iterators, which is a result of the fundamental split between containers and algorithms in the STL in general.
So then I thought, if I can't use locks, how about software transactional memory (STM)?
So now I'm stuck with this:
#include <atomic>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <vector>
#define LIMIT 10
std::atomic<bool> start{false};
std::vector<char> vec;
void thread(char c)
{
while (!start)
std::this_thread::yield();
for (int i = 0; i < LIMIT; ++i) {
__transaction_atomic {
vec.push_back(c);
}
}
}
int main()
{
std::thread t1(thread, '*');
std::thread t2(thread, '#');
start.store(true);
t1.join();
t2.join();
for (auto i = vec.begin(); i != vec.end(); ++i)
std::cout << *i;
std::cout << std::endl;
return EXIT_SUCCESS;
}
Which I compile with:
g++ -std=c++11 -fgnu-tm -Wall
using g++ 4.8.2 and which gives me the following error:
error: unsafe function call to push_back within atomic transaction
Which I kinda get, since push_back or sort or whatever isn't declared transaction_safe but which leaves me with the following questions:
a) How can I fix that error?
b) If I can't fix that error, then what are these transactional blocks usually used for?
c) How would one implement a lock-free thread-safe vector?!
Thanks in advance!
Edit:
Thanks for the answers so far but they don't really scratch my itch. Let me give you an example:
Imagine I have a global vector and access to this vector shall be shared amongst multiple threads. All threads try to do sorted inserts, so they generate a random number and try to insert this number into the vector in a sorted manner, so the vector stays sorted all the time (including duplicates ofc). To do the sorted insert they use std::lower_bound to find the "index" where to insert and then do the insert using vector.insert().
If I write a wrapper for the std::vector that contains a std::mutex as a member, then I can write wrapper functions, e.g. insert, which locks the mutex using std::lock_guard and then does the actual std::vector.insert() call. But std::lower_bound doesn't give a damn about the member mutex. Which is a feature, not a bug afaik.
This leaves my threads in quite a pickle because other threads can change the vector while someone's doing his lower_bound thing.
The only fix I can think of: forget the wrapper and have a global mutex for the vector instead. Whenever anybody wants to do anything on/with/to this vector, they need that lock.
THAT'S the problem. What alternatives are there to using this global mutex?
And THAT'S where software transactional memory came to mind.
So now: how to use STMs on STL containers? (and a), b), c) from above).
I believe that the only way you can make an STL container 100% thread safe is to wrap it in your own object (keeping the actual container private) and use appropriate locking (mutexes, whatever) in your object in order to prevent multi-thread access to the STL container.
This is the moral equivalent of just locking a mutex in the caller around every container operation.
In order to make the container truly thread safe, you'd have to muck about with the container code, which there's no provision for.
Edit: One more note - be careful about the interface you give to your wrapper object. You can't very well go handing out references to stored objects, as that would allow the caller to get around the locking of the wrapper. So you can't just duplicate vector's interface with mutexes and expect things to work.
I'm not sure I understand why you cannot use mutexes. If you lock the mutex each time you are accessing the vector then no matter what operation you are doing you are certain that only a single thread at a time is using it. There is certainly space for improvement depending on your needs for the safe vector, but mutexes should be perfectly viable.
lock mutex -> call std::sort or whatever you need -> unlock mutex
If on the other side what you want is to use std::sort on your class, then again it is a matter of providing thread-safe access and reading methods through the iterators of your container, as those are the ones that std::sort needs to use anyway in order to sort a vector, since it is not a friend of containers or anything of the sort.
You can use simple mutexes to make your class thread safe. As stated in another answer, you need to use a mutex to lock the vector before use and then unlock after use.
CAUTION! All of the STL functions can throw exceptions. If you use simple mutexes, you will have a problem if any function throws because the mutex will not be released. To avoid this problem, wrap the mutex in a class that releases it in the destructor. This is a good programming practice to learn about: http://c2.com/cgi/wiki?ResourceAcquisitionIsInitialization
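As a rough sketch of that advice applied to the sorted-insert scenario from the question: std::lock_guard is the standard RAII wrapper, and it releases the mutex in its destructor whether the function returns normally or throws. The SortedVector class and its member names are assumptions made for this example. The key point is that lower_bound and insert run inside the same critical section.

```cpp
#include <algorithm>
#include <mutex>
#include <vector>

// Hypothetical wrapper; exposes whole operations rather than iterators.
class SortedVector {
public:
    // Finds the position and inserts under one lock, so no other thread
    // can modify the vector between lower_bound and insert.
    void insert_sorted(int value) {
        std::lock_guard<std::mutex> guard(mtx_);  // released on return or throw
        auto pos = std::lower_bound(data_.begin(), data_.end(), value);
        data_.insert(pos, value);
    }

    // Returns a copy so callers never hold references into the guarded vector.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> guard(mtx_);
        return data_;
    }

private:
    mutable std::mutex mtx_;
    std::vector<int> data_;
};
```

This is still one lock for the whole vector; it does not answer the lock-free part of the question, it only shows that algorithms can be used safely when they are called from inside the wrapper.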

C++ Access to vector from multiple threads

In my program I've some threads running. Each thread gets a pointer to some object (in my program - vector). And each thread modifies the vector.
And sometimes my program fails with a segm-fault. I thought it occurred because thread A begins doing something with the vector while thread B hasn't finished operating with it? Can it be true?
How am I supposed to fix it? Thread synchronization? Or maybe make a flag VectorIsInUse and set this flag to true while operating with it?
vector, like all STL containers, is not thread-safe. You have to explicitly manage the synchronization yourself. A std::mutex or boost::mutex could be used to synchronize access to the vector.
Do not use a flag as this is not thread-safe:
Thread A checks value of isInUse flag and it is false
Thread A is suspended
Thread B checks value of isInUse flag and it is false
Thread B sets isInUse to true
Thread B is suspended
Thread A is resumed
Thread A still thinks isInUse is false and sets it true
Thread A and Thread B now both have access to the vector
Note that each thread will have to lock the vector for the entire time it needs to use it. This includes modifying the vector and using the vector's iterators, as iterators can become invalidated if the element they refer to is erased or the vector undergoes an internal reallocation. For example, do not do this:
mtx.lock();
std::vector<std::string>::iterator i = the_vector.begin();
mtx.unlock();
// 'i' can become invalid if the `vector` is modified.
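By contrast, a minimal sketch of the safe pattern keeps the lock held for as long as any iterator is alive. The helper functions append and join_all are hypothetical names introduced for this example; mtx and the_vector mirror the names used above.

```cpp
#include <mutex>
#include <string>
#include <vector>

std::mutex mtx;                       // guards the_vector
std::vector<std::string> the_vector;

// Every modification happens with the lock held.
void append(const std::string& s) {
    std::lock_guard<std::mutex> guard(mtx);
    the_vector.push_back(s);
}

// The iterator never outlives the lock: it is created, used and
// destroyed inside the same critical section.
std::string join_all() {
    std::lock_guard<std::mutex> guard(mtx);
    std::string result;
    for (auto i = the_vector.begin(); i != the_vector.end(); ++i)
        result += *i;
    return result;
}
```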
If you want a container that is safe to use from many threads, you need to use a container that is explicitly designed for the purpose. The interface of the Standard containers is not designed for concurrent mutation or any kind of concurrency, and you cannot just throw a lock at the problem.
You need something like TBB or PPL which has concurrent_vector in it.
That's why pretty much every class library that offers threads also has synchronization primitives such as mutexes/locks. You need to set up one of these, and acquire/release the lock around every operation on the shared item (read AND write operations, since you need to prevent reads from occurring during a write too, not just prevent multiple writes from happening concurrently).

Multiple mutex locking strategies and why libraries don't use address comparison

There is a widely known way of locking multiple locks, which relies on choosing a fixed linear ordering and acquiring locks according to this ordering.
That was proposed, for example, in the answer for "Acquire a lock on two mutexes and avoid deadlock". Especially, the solution based on address comparison seems to be quite elegant and obvious.
When I tried to check how it is actually implemented, I found, to my surprise, that this solution is not widely used.
To quote the Kernel Docs - Unreliable Guide To Locking:
Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000 lock hierarchy it
will fit.
PThreads doesn't seem to have such a mechanism built in at all.
Boost.Thread came up with a completely different solution: lock() for multiple (2 to 5) mutexes is based on trying and locking as many mutexes as is possible at the moment.
This is the fragment of the Boost.Thread source code (Boost 1.48.0, boost/thread/locks.hpp:1291):
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3)
{
unsigned const lock_count=3;
unsigned lock_first=0;
for(;;)
{
switch(lock_first)
{
case 0:
lock_first=detail::lock_helper(m1,m2,m3);
if(!lock_first)
return;
break;
case 1:
lock_first=detail::lock_helper(m2,m3,m1);
if(!lock_first)
return;
lock_first=(lock_first+1)%lock_count;
break;
case 2:
lock_first=detail::lock_helper(m3,m1,m2);
if(!lock_first)
return;
lock_first=(lock_first+2)%lock_count;
break;
}
}
}
where lock_helper returns 0 on success and number of mutexes that weren't successfully locked otherwise.
Why is this solution better, than comparing addresses or any other kind of ids? I don't see any problems with pointer comparison, which can be avoided using this kind of "blind" locking.
Are there any other ideas on how to solve this problem on a library level?
From the bounty text:
I'm not even sure if I can prove correctness of the presented Boost solution, which seems more tricky than the one with linear order.
The Boost solution cannot deadlock because it never waits while already holding a lock. All locks but the first are acquired with try_lock. If any try_lock call fails to acquire its lock, all previously acquired locks are freed. Also, in the Boost implementation the new attempt will start from the lock failed to acquire the previous time, and will first wait till it is available; it's a smart design decision.
As a general rule, it's always better to avoid blocking calls while holding a lock. Therefore, the solution with try-lock, if possible, is preferred (in my opinion). As a particular consequence of lock ordering, the system as a whole might get stuck. Imagine the very last lock (e.g. the one with the biggest address) was acquired by a thread which was then blocked. Now imagine some other thread needs the last lock and another lock, and due to ordering it will first get the other one and will wait on the last lock. The same can happen with all other locks, and the whole system makes no progress until the last lock is released. Of course it's an extreme and rather unlikely case, but it illustrates the inherent problem with lock ordering: the higher a lock's number, the more indirect impact the lock has when acquired.
The shortcoming of the try-lock-based solution is that it can cause livelock, and in extreme cases the whole system might also get stuck for at least some time. Therefore it is important to have some back-off scheme that makes the pauses between locking attempts longer with time, and perhaps randomized.
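For two mutexes, the try-and-back-off idea can be sketched as below. This is a simplified illustration, not the Boost source: lock_both and run_opposite_order_demo are hypothetical names, and the back-off is just a yield rather than a growing or randomized pause. In real code, std::lock (C++11) already provides deadlock-free locking of several mutexes.

```cpp
#include <mutex>
#include <thread>

// Locks both mutexes without ever blocking while holding one of them.
// On return, both a and b are locked by the calling thread.
template <typename M1, typename M2>
void lock_both(M1& a, M2& b) {
    for (;;) {
        {
            std::unique_lock<M1> first(a);  // block on a while holding nothing
            if (b.try_lock()) {
                first.release();            // keep a locked past this scope
                return;
            }
        }                                   // try_lock failed: a is released here
        std::this_thread::yield();          // simple back-off
        {
            std::unique_lock<M2> first(b);  // now start from the lock that failed
            if (a.try_lock()) {
                first.release();
                return;
            }
        }
        std::this_thread::yield();
    }
}

// Two threads take the same pair of mutexes in opposite argument order.
// Returns the shared counter, which both threads increment under the locks.
int run_opposite_order_demo(int iterations) {
    std::mutex m1, m2;
    int counter = 0;
    auto worker = [&](bool flip) {
        for (int i = 0; i < iterations; ++i) {
            if (flip) lock_both(m2, m1); else lock_both(m1, m2);
            ++counter;                      // protected by both mutexes
            m1.unlock();
            m2.unlock();
        }
    };
    std::thread t1(worker, false);
    std::thread t2(worker, true);
    t1.join();
    t2.join();
    return counter;
}
```

With plain blocking lock() calls in opposite order, the two workers could deadlock; here a thread that fails to get the second mutex always lets go of the first before waiting again.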
Sometimes, lock A needs to be acquired before lock B does. Lock B might have either a lower or a higher address, so you can't use address comparison in this case.
Example: When you have a tree data structure, and threads try to read and update nodes, you can protect the tree using a reader-writer lock per node. This only works if your threads always acquire locks top-down, root-to-leaf. The address of the locks does not matter in this case.
You can only use address comparison if it does not matter at all which lock gets acquired first. If this is the case, address comparison is a good solution. But if this is not the case you can't do it.
I guess the Linux kernel requires certain subsystems to be locked before others are. This cannot be done using address comparison.
The "address comparison" and similar approaches, although used quite often, are special cases. They work fine if you have
a lock-free mechanism to get
two (or more) "items" of the same kind or hierarchy level
any stable ordering schema between those items
For example: You have a mechanism to get two "accounts" from a list. Assume that the access to the list is lock-free. Now you have pointers to both items and want to lock them. Since they are "siblings" you have to choose which one to lock first. Here the approach using addresses (or any other stable ordering schema like "account id") is OK.
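A minimal sketch of that accounts case, assuming a hypothetical Account struct with its own mutex. The order is taken from the object addresses via std::less, which is guaranteed to give a total order over pointers even when the built-in < is not.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical "sibling" items, each with its own lock.
struct Account {
    std::mutex mtx;
    long balance = 0;
};

// Locks both accounts in address order, so every thread that needs the
// same pair acquires the two mutexes in the same sequence.
bool transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return false;  // same account: locking twice would deadlock

    Account* first = &from;
    Account* second = &to;
    if (std::less<Account*>()(second, first)) std::swap(first, second);

    std::lock_guard<std::mutex> g1(first->mtx);
    std::lock_guard<std::mutex> g2(second->mtx);

    if (from.balance < amount) return false;
    from.balance -= amount;
    to.balance += amount;
    return true;
}
```

An account id would work just as well as the address, as long as it is unique and never changes while the accounts are in use.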
But the linked Linux text talks about "lock hierarchies". This means locking not between "siblings" (of the same kind) but between "parent" and "children" which might be from different types. This may happen in actual tree structures as well in other scenarios.
Contrived example: To load a program you must
lock the file inode,
lock the process table
lock the destination memory
These three locks are neither "siblings" nor in a clear hierarchy. The locks are also not taken directly one after the other; each subsystem takes its locks at will. If you consider all use cases where those three (and more) subsystems interact, you see that there is no clear, stable ordering you can think of.
The Boost library is in the same situation: It strives to provide generic solutions. So they cannot assume the points from above and must fall back to a more complicated strategy.
One scenario when address compare will fail is if you use the proxy pattern.
You can delegate the locks to the same object and the addresses will be different.
Consider the following example
template<typename MutexType>
class MutexHelper
{
public:
    explicit MutexHelper(MutexType &m) : _m(m) {}
    // Forwards to the wrapped mutex; several helpers may wrap the same one.
    void lock()
    {
        std::cout << "locking ";
        _m.lock();
    }
    void unlock()
    {
        std::cout << "unlocking ";
        _m.unlock();
    }
private:
    MutexType &_m;
};
if the function
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3);
were to actually use address comparison, the following code could produce a deadlock:
Mutex m1;
Mutex m2;
thread1:
MutexHelper hm1(m1);
MutexHelper hm2(m2);
lock(hm1, hm2);
thread2:
MutexHelper hm2(m2);
MutexHelper hm1(m1);
lock(hm1, hm2);
EDIT:
This is an interesting thread that sheds some light on the boost::lock implementation:
thread-best-practice-to-lock-multiple-mutexes
Address compare does not work for inter-process shared mutexes (named synchronization objects).

Thread safety of std::map for read-only operations

I have a std::map that I use to map values (field ID's) to a human readable string. This map is initialised once when my program starts before any other threads are started, and after that it is never modified again. Right now, I give every thread its own copy of this (rather large) map but this is obviously inefficient use of memory and it slows program startup. So I was thinking of giving each thread a pointer to the map, but that raises a thread-safety issue.
If all I'm doing is reading from the map using the following code:
std::string name;
//here N is the field id for which I want the human readable name
unsigned field_id = N;
std::map<unsigned,std::string>::const_iterator map_it;
// fields_p is a const std::map<unsigned, std::string>* to the map concerned.
// multiple threads will share this.
map_it = fields_p->find(field_id);
if (map_it != fields_p->end())
{
name = map_it->second;
}
else
{
name = "";
}
Will this work or are there issues with reading a std::map from multiple threads?
Note: I'm working with Visual Studio 2008 currently, but I'd like this to work across most major STL implementations.
Update: Edited code sample for const correctness.
This will work from multiple threads as long as your map remains the same. The map you use is immutable de facto so any find will actually do a find in a map which does not change.
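A small sketch of that setup, under the assumption stated in the question that the map is fully built before any thread starts and is never modified afterwards. The function names field_name and lookup_in_parallel are made up for this example, and it uses C++11 threads rather than anything available in Visual Studio 2008.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Read-only lookup through a const reference, so only const members are used.
std::string field_name(const std::map<unsigned, std::string>& fields, unsigned id) {
    auto it = fields.find(id);
    return it != fields.end() ? it->second : std::string();
}

// Starts one thread per id; all threads share the same map without a lock.
// Each thread writes only to its own slot in `results`.
std::vector<std::string> lookup_in_parallel(const std::map<unsigned, std::string>& fields,
                                            const std::vector<unsigned>& ids) {
    std::vector<std::string> results(ids.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        threads.emplace_back([&fields, &ids, &results, i] {
            results[i] = field_name(fields, ids[i]);
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

The moment any thread needs to insert or erase, this no longer holds and every access, reads included, has to be synchronized.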
Here is a relevant link: http://www.sgi.com/tech/stl/thread_safety.html
The SGI implementation of STL is
thread-safe only in the sense that
simultaneous accesses to distinct
containers are safe, and simultaneous
read accesses to shared containers
are safe. If multiple threads access a
single container, and at least one
thread may potentially write, then the
user is responsible for ensuring
mutual exclusion between the threads
during the container accesses.
You fall into the "simultaneous read accesses to shared containers" category.
Note: this is true for the SGI implementation. You need to check if you use another implementation. Of the two implementations which seem widely used as an alternative, STLPort has built-in thread safety as far as I know. I don't know about the Apache implementation though.
It should be fine.
You can use const references to it if you want to document/enforce read-only behaviour.
Note that correctness isn't guaranteed (in principle the map could choose to rebalance itself on a call to find), even if you do use const methods only (a really perverse implementation could declare the tree mutable). However, this seems pretty unlikely in practise.
Yes it is.
See related post with same question about std::set:
Is the C++ std::set thread-safe?
For MS STL implementation
Thread Safety in the C++ Standard Library
The following thread safety rules apply to all classes in the C++ Standard Library—this includes shared_ptr, as described below. Stronger guarantees are sometimes provided—for example, the standard iostream objects, as described below, and types specifically intended for multithreading, like those in <atomic>.
An object is thread-safe for reading from multiple threads. For example, given an object A, it is safe to read A from thread 1 and from thread 2 simultaneously.