Multithreaded access to one std::map: will it cause unsafe behavior? - c++

If more than one thread accesses one map object, but I can make sure that no two threads ever access the same key, and each access looks like this:
//find value by key
//if found
//    erase the object or change the value
//else
//    add new object for the key
Will these operations cause a synchronization problem?

Yes, doing concurrent updates without proper synchronization may cause crashes, even if your threads access different keys: the std::map is based on trees, trees get rebalanced, so you can cause a write to a parent of a node with a seemingly unrelated key.
Moreover, it is not safe to perform read-only access concurrently with writing, or to search without a lock and lock only for the write: if you have threads that may update or delete nodes, you must lock out all readers before you write.

You will have concurrency problems if any of the threads inserts into the tree. STL map is implemented using a red-black tree (or at least that's what I'm familiar with — I don't know whether the Standard mandates red-black tree). Red-black trees may be rebalanced upon insert, which would lead to all sorts of races between threads.
Read-only access (absolutely no writers) would be fine, but keep in mind operator[] is not read-only; it potentially adds a new element. You'd need to use the find() method, get the iterator, and dereference it yourself.
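To make the operator[] point concrete, here is a minimal sketch. The helper names lookup and size_after_missing_lookups are made up for illustration; only std::map::find and std::map::operator[] come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Read-only lookup (hypothetical helper). The map is taken by const
// reference, so operator[] would not even compile here.
// Returns nullptr when the key is absent.
const std::string* lookup(const std::map<int, std::string>& m, int key) {
    auto it = m.find(key);  // const member function, never inserts
    return it == m.end() ? nullptr : &it->second;
}

// Shows that a miss through operator[] grows the map, while a miss
// through find() leaves it untouched.
std::size_t size_after_missing_lookups() {
    std::map<int, std::string> m{{1, "one"}};
    assert(lookup(m, 2) == nullptr);  // miss via find(): size is still 1
    (void)m[2];                       // miss via operator[]: inserts {2, ""}
    return m.size();
}
```

Passing the map around as a const reference in reader code is a cheap way to have the compiler reject accidental operator[] calls.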

Unless the docs (i.e., the ISO C++11 standard) say it's thread-safe (and they don't), then that's it. Period. It's not thread-safe.
There may be implementations of a std::map that would allow this but it's by no means portable.
Maps are often built on red-black trees or other auto-balancing data structures so that a modification to the structure (such as inserting or deleting a key) will cause rebalancing.
You should wrap read and write operations on the map with something like a mutex semaphore, to ensure that synchronisation is done correctly.
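A minimal sketch of that wrapping, assuming a plain std::mutex is acceptable for your workload. GuardedMap, toggle and run_disjoint_writers are hypothetical names, and the erase-or-insert logic only mirrors the pattern from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical wrapper: one mutex serializes every access to the map.
class GuardedMap {
public:
    // If the key exists, erase it; otherwise insert it.
    // The whole find-then-modify sequence runs under a single lock.
    void toggle(int key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it != map_.end()) {
            map_.erase(it);
        } else {
            map_.emplace(key, value);
        }
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<int, std::string> map_;
};

// Each thread works on its own disjoint key range, as in the question.
// The keys never collide, but the lock is still needed because every
// insert may rebalance the shared tree.
std::size_t run_disjoint_writers(int threads, int keys_per_thread) {
    assert(threads > 0 && keys_per_thread > 0);
    GuardedMap gm;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&gm, t, keys_per_thread] {
            for (int i = 0; i < keys_per_thread; ++i) {
                gm.toggle(t * keys_per_thread + i, "v");
            }
        });
    }
    for (auto& th : pool) th.join();
    return gm.size();
}
```

Note that the lock covers the lookup as well as the modification; locking only around the erase or insert would still leave the find racing with other writers.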

Related

Concurrent hash table in C++

In my application I basically have multiple threads that perform inserts and mostly one thread that iterates through a map and removes items that meet certain criteria. I wanted to use a concurrent structure because it would provide finer-grained locking in the code that removes items from the queue. That code currently looks similar to the following, which is not ideal for various reasons, including that the thread could get pre-empted while holding the lock.
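void Function_reap()
{
    while (timetaken != timeoutTime)
    {
        // Hold the lock for one full pass over the map.
        my_map_mutex.lock();
        auto iter = my_unordered_map.begin();
        while (iter != my_unordered_map.end())
        {
            if (status_completed == iter->second.status)
            {
                // erase() returns the iterator to the next element
                iter = my_unordered_map.erase(iter);
            }
            else
            {
                // not completed: advance, otherwise the loop never ends
                ++iter;
            }
        }
        my_map_mutex.unlock();
    }
}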
I was going through the documentation for Intel TBB (Threading Building Blocks), and more specifically the concurrent_unordered_map documentation (https://software.intel.com/en-us/node/506171), to see if this is a good fit for my application, and came across this excerpt:
Description: concurrent_unordered_map and concurrent_unordered_multimap support concurrent insertion and traversal, but not concurrent erasure. The interfaces have no visible locking. They may hold locks internally, but never while calling user-defined code. They have semantics similar to std::unordered_map and std::unordered_multimap respectively, except as follows:
The erase and extract methods are prefixed with unsafe_, to indicate that they are not concurrency safe.
Why does TBB not provide safe synchronized deletion from the map? What is the technical reason for this?
What other options, if any, do I have here? Ideally something that definitely works on Linux and, if possible, is portable to Windows.
Well, it is difficult to design a solution that (efficiently) supports all operations. TBB has the concurrent_unordered_map which supports concurrent insert, find and iteration, but no erase - and the concurrent_hash_map which supports concurrent insert, find and erase, but no iteration.
There are several other libraries that provide concurrent hash maps like libcds, or my own one called xenium.
ATM xenium contains two concurrent hash map implementations:
harris_michael_hash_map - fully lock-free; supports concurrent insert, erase, find and iteration. However, the number of buckets has to be defined at construction time and cannot be adapted afterwards. Each bucket contains a linked list of items, which is not very cache friendly.
vyukov_hash_map - a very fast hash map that uses fine-grained locking for insert, erase and iteration; find operations are lock-free. However, if you are using iterators you have to be careful to avoid deadlocks (i.e., a thread should not try to insert or erase a key while holding an iterator). There is, however, an erase overload that takes an iterator, so you can safely remove the item the iterator points to.
I am currently working to make xenium fully Windows compatible.
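If pulling in a library is not an option, a simpler way to get finer-grained locking is lock striping: split the table into a fixed number of shards, each with its own mutex. The following is only a rough sketch of that idea, not how TBB, libcds or xenium work internally; ShardedMap, erase_if and reap_demo are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_map>

// Hypothetical lock-striped map: keys are spread over a fixed number of
// shards, so the reaper blocks inserters on one shard at a time only.
template <typename K, typename V, std::size_t Shards = 16>
class ShardedMap {
public:
    void insert(const K& key, const V& value) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        s.map[key] = value;
    }

    // Erases every element whose value satisfies pred; returns the count.
    // pred runs while a shard lock is held, so it must not call back
    // into this map.
    template <typename Pred>
    std::size_t erase_if(Pred pred) {
        std::size_t erased = 0;
        for (Shard& s : shards_) {
            std::lock_guard<std::mutex> lock(s.mutex);
            for (auto it = s.map.begin(); it != s.map.end();) {
                if (pred(it->second)) {
                    it = s.map.erase(it);
                    ++erased;
                } else {
                    ++it;
                }
            }
        }
        return erased;
    }

    // Sums the shard sizes one lock at a time; this is not an atomic
    // snapshot if other threads are modifying the map meanwhile.
    std::size_t size() {
        std::size_t total = 0;
        for (Shard& s : shards_) {
            std::lock_guard<std::mutex> lock(s.mutex);
            total += s.map.size();
        }
        return total;
    }

private:
    struct Shard {
        std::mutex mutex;
        std::unordered_map<K, V> map;
    };

    Shard& shard_for(const K& key) {
        return shards_[std::hash<K>{}(key) % Shards];
    }

    std::array<Shard, Shards> shards_;
};

// Usage sketch: status 1 stands in for "completed" entries.
std::size_t reap_demo() {
    ShardedMap<int, int> m;
    for (int i = 0; i < 100; ++i) m.insert(i, i % 2);
    std::size_t reaped = m.erase_if([](int status) { return status == 1; });
    assert(reaped == 50);
    return m.size();
}
```

The trade-off is that no operation ever sees the whole table at once, and a pre-empted reaper still stalls inserts into the one shard it holds.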

One occasional writer, multiple frequent readers for std::map

I am facing a concurrency problem here. I have a std::map with one occasional writer and multiple frequent readers on different threads. The writer will occasionally add keys (the key is a std::string) to the map, and I cannot guarantee when exactly the readers start and stop reading. I don't want to put locks around the readers, since reading is very frequent and checking locks that often will hurt performance.
If the readers always access the map by key (not through map iterators), will it always be thread-safe? If not, any idea how to design the code so that the readers always access valid keys (or map iterators)?
Other approaches using different containers solving this problem are also welcome.
I have to disagree with the previous answer. When they talk about "concurrently accessing existing elements" (when talking about insert()), that presumes that you already have a pointer/reference/iterator to the existing element. This is basically acknowledging that the map isn't going to move the elements around in memory after the insertion. It also acknowledges that iterating the map is not safe during an insert.
Thus, as soon as you have an insert, attempting to do an at() on the same container (at the same time) is a data race. During the insert the map must change some sort of internal state (pointers to tree nodes, perhaps). If the at() catches the container during that manipulation, the pointers may not be in a consistent state.
You need some sort of external synchronization (such as a reader-writer lock) as soon as you have the possibility of both an insert() and at() (or operator[]) occurring at the same time.
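A reader-writer lock along those lines could look like the sketch below. It assumes C++17 for std::shared_mutex and std::optional (on C++14, std::shared_timed_mutex fills the same role); Registry, add, get and registry_demo are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical registry: many frequent readers, one occasional writer.
class Registry {
public:
    // Writer: exclusive lock, waits until all current readers have left.
    void add(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        map_[key] = value;
    }

    // Reader: shared lock, so readers do not block each other.
    // Returns a copy so nothing refers into the map after unlocking.
    std::optional<int> get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> map_;
};

// Usage sketch: a reader gets either a value or an empty optional.
bool registry_demo() {
    Registry r;
    r.add("alpha", 1);
    assert(r.get("alpha").has_value());
    return r.get("alpha").value() == 1 && !r.get("beta").has_value();
}
```

Whether a shared lock is actually cheaper than a plain mutex for very short lookups depends on the platform, so it is worth measuring before committing to it.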
Attention: fundamentally edited answer
As a reflex I would put a lock.
At first sight, it seems not required to put a lock in your case:
For insert(), it's said that "Concurrently accessing existing elements is safe, although iterating ranges in the container is not."
For at() , it's said that: "Concurrently accessing or modifying other elements is safe."
The standard library addresses thread-safe aspects:
23.2.2. Container data races
1) For purposes of avoiding data races (17.6.5.9), implementations
shall consider the following functions to be const: begin, end,
rbegin, rend, front, back, data, find, lower_bound, upper_bound,
equal_range, at and, except in associative or unordered associative
containers, operator[].
2) Notwithstanding (17.6.5.9),
implementations are required to avoid data races when the contents of
the contained object in different elements in the same sequence,
excepting vector<bool>, are modified concurrently.
There are several other SO answers which interpret this as thread-safe guarantee, as I originally did.
Nevertheless, we know that iterating ranges in the container is not safe while an insert is in progress, and accessing an element by key first requires traversing the container to find it. So, while the standard clarifies safety for concurrent access to different elements when you already have their address, the wording leaves potential container concurrency issues open.
I have tried a simulation scenario with multiple readers and a single writer on MSVC, and it never failed. But this is not enough to make the point: implementations are allowed to avoid more data races than what is foreseen in the standard (see 17.6.5.9) (or maybe I was simply lucky many times).
Finally, I have found two serious (post C++11) references stating unambiguously that a user lock is required to be safe:
GNU document on concurrency in the standard library: "The standard places requirements on the library to ensure that no data races are caused by the library itself (...) The user code must guard against concurrent function calls which access any particular library object's state when one or more of those accesses modifies the state."
GotW #95 Solution: Thread Safety and Synchronization, by Herb Sutter : "Is the code correctly synchronized (...) ? No. The code has one thread reading (via const operations) from some_obj, and a second thread writing to the same variable. If those threads can execute at the same time, that’s a race and a direct non-stop ticket to undefined behavior land."
Based on these two almost authoritative interpretations, I revise my first answer and come back to my initial reflex: you'll have to lock your concurrent accesses.
Alternatively, you could use non-standard libraries with concurrent map implementations, such as Microsoft's concurrent_unordered_map from the Parallel Patterns Library, Intel's concurrent_unordered_map from the Threading Building Blocks (TBB), or a lock-free library as described in this SO answer.

std::map write/read from multiple threads

I want to be able to read and write in a std::map from multiple threads. Is there a way to do that without mutex (maybe with std::atomic)?
If not, what's the simplest way to do that in C++11?
If the values are std::atomic<>, you can modify and read them from arbitrary threads, but you'll need to have saved their addresses somehow - pointers, references or iterators - and you'll need to know no other thread will call erase on them....
Still, even atomic values won't make it safe to modify the container (e.g. inserting or erasing elements), while other threads are also doing modifications or lookups or iteration. Even the "const" functions - like size(), empty(), begin(), end(), count(), and iterator movement - are unsafe, because the mutating operations may be in the middle of rewiring the inter-node links or updating the same data.
For anything more than the above, you will need a mutex.
For a concrete example, say you've inserted a node with std::string key "client_counter" - you could start a thread that gets an iterator to that element and does atomic updates to the counter, while other threads can find the element and read from it but must not erase it. You could still have other nodes inserted in the map, with other updaters and readers, without any extra synchronisation with the client_counter-updating thread.
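A small sketch of that client_counter scenario follows. It assumes the node is inserted before any worker thread starts and is never erased while they run; the function name count_with_atomic_values is made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <string>
#include <thread>
#include <vector>

// The key is inserted before any thread starts, so the workers never
// touch the tree structure; they only update the atomic value.
int count_with_atomic_values(int threads, int increments) {
    std::map<std::string, std::atomic<int>> counters;

    // operator[] constructs the node in place (std::atomic cannot be copied).
    counters["client_counter"].store(0);

    // Take the reference once, up front; workers never search the map.
    std::atomic<int>& counter = counters.at("client_counter");

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();

    assert(counters.size() == 1);  // the container itself was never modified
    return counter.load();
}
```

If the workers looked the key up themselves with find() or at() while another thread inserted or erased nodes, a mutex would be needed again, exactly as described above.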
If you don't want to use a mutex, then you need to wait for concurrent containers (C++17?). If you want to use std::atomic operations inside a std::map, then you probably want to write, or find on the Internet, a full implementation of a concurrent std::map.
If you want to use a std::map of std::atomic values, be aware that this protects only the elements inside the std::map, not the std::map itself.

C++ do we need mutex when using map?

I have two threads: one is inserting and the other is deleting an entry in the map. I am wondering whether I need a mutex around these function calls. Also, one thread is incrementing a counter inside this map and the other is decrementing that counter. Do I need a mutex for that as well?
Thanks,
Changes to the map itself (insertions, deletions) need to be synchronized. The same is true for traversal and lookup (i.e. begin(), find(), [], etc.).
Multiple threads can access different elements safely, though.
If you are incrementing and decrementing the SAME element in the map (or what may be the same element and you can't say for sure), then you need to have some sort of synchronisation. You could use an std::atomic<int> to avoid having to use a mutex tho'.
Any insert or remove in the tree will need to be protected with a mutex or similar, and of course that also means that any access to the content of the tree needs to be protected in the same way: a std::map<T>::iterator you are using will be invalidated (at least by an erase of its element). So you really need to ensure that no erase happens while you use any other access to the tree. This includes "ready-made" functions such as find.

Thread-safety of c++ maps

This is about thread safety of std::map. Now, simultaneous reads are thread-safe but writes are not. My question is: if I add a unique element to the map every time, will that be thread-safe?
So, for example, if I have a map like this: std::map<int, std::string> myMap
and I always add new keys and never modify an existing key-value pair, will that be thread-safe?
More importantly, will that give me any random run-time behavior?
Is adding new keys also considered modification? If the keys are always different while adding, shouldn't it be thread-safe as it modifies an independent part of the memory?
Thanks
Shiv
1) Of course not
2) Yes, I hope you'll encounter it during testing, not later
3) Yes, it is. The new element is added in a different location, but many pointers are modified during that.
The map is implemented by some sort of tree in most if not all implementations. Inserting a new element into a tree modifies it by rearranging nodes, which means resetting pointers to point to different nodes. So it is not thread-safe.
No, yes, yes. You need to obtain an exclusive lock when modifying the container (including insertion of new keys), though while there's no modification going on you can, of course, safely read concurrently.
edit: http://www.sgi.com/tech/stl/thread_safety.html might be of interest for you.