Iterating over the contents of concurrent_hash_map - concurrency

I am using tbb::concurrent_hash_map. I understand that insertion and deletion are safe operations. Is iterating over the hash map safe with respect to concurrent insertion and deletion? If not, are there data structures I can use for safe iteration?

In the absence of memory-reclamation support (GC) in C++ and TBB, it is not possible to make both deletion and iteration safe at the same time without a significant performance impact. Thus TBB provides two concurrent hash-table containers:
concurrent_hash_map, with safe erase() and element-level access synchronization, but without safe iteration.
concurrent_unordered_* (e.g. concurrent_unordered_map), without safe erase() or element access synchronization, but with thread-safe iteration support.


Concurrent hash table in C++

In my application I basically have multiple threads that perform inserts, and mostly one thread that iterates through the map and removes items that meet certain criteria. The reason I wanted to use a concurrent structure is that it would provide finer-grained locking than code like the following, which is not ideal for various reasons, including that the thread could get pre-empted while holding the lock.
void Function_reap()
{
    while (timetaken != timeoutTime)
    {
        my_map_mutex.lock();
        auto iter = my_unordered_map.begin();
        while (iter != my_unordered_map.end())
        {
            if (status_completed == iter->second.status)
                iter = my_unordered_map.erase(iter);
            else
                ++iter;  // advance past items we keep (otherwise this loops forever)
        }
        my_map_mutex.unlock();
    }
}
I was going through the documentation for Intel TBB (Threading Building Blocks), specifically the concurrent_unordered_map documentation (https://software.intel.com/en-us/node/506171), to see if it is a good fit for my application, and came across this excerpt:
Description
concurrent_unordered_map and concurrent_unordered_multimap support concurrent insertion and traversal, but not concurrent erasure. The interfaces have no visible locking. They may hold locks internally, but never while calling user-defined code. They have semantics similar to std::unordered_map and std::unordered_multimap respectively, except as follows:
The erase and extract methods are prefixed with unsafe_, to indicate that they are not concurrency safe.
Why does TBB not provide safe synchronized deletion from the map? What is the technical reason for this?
What other options, if any, do I have here? Ideally something that definitely works on Linux and, if possible, is portable to Windows.
Well, it is difficult to design a solution that (efficiently) supports all operations. TBB has the concurrent_unordered_map which supports concurrent insert, find and iteration, but no erase - and the concurrent_hash_map which supports concurrent insert, find and erase, but no iteration.
There are several other libraries that provide concurrent hash maps like libcds, or my own one called xenium.
ATM xenium contains two concurrent hash map implementations:
harris_michael_hash_map - fully lock-free; supports concurrent insert, erase, find and iteration. However, the number of buckets has to be defined at construction time and cannot be adapted afterwards. Each bucket contains a linked list of items, which is not very cache friendly.
vyukov_hash_map - a very fast hash map that uses fine-grained locking for insert, erase and iteration; find operations are lock-free. If you are using iterators you have to be careful to avoid deadlocks (i.e., a thread should not try to insert or erase a key while holding an iterator). However, there is an erase overload that takes an iterator, so you can safely remove the item the iterator points to.
I am currently working on making xenium fully Windows compatible.

One occasional writer, multiple frequent readers for std::map

I am facing a concurrency problem here. I have a std::map with one occasional writer and multiple frequent readers on different threads. The writer will occasionally add keys (the key is a std::string) to the map, and I cannot guarantee exactly when the readers start and stop reading. I don't want to use locks for the readers, since reading is very frequent and taking a lock on every read would hurt performance.
If the readers always access the map by key (not by map iterator), will it always be thread-safe? If not, any idea how to design the code so that the readers always access valid keys (or map iterators)?
Other approaches using different containers solving this problem are also welcome.
I have to disagree with the previous answer. When they talk about "concurrently accessing existing elements" (when talking about insert()), that presumes that you already have a pointer/reference/iterator to the existing element. This is basically acknowledging that the map isn't going to move the elements around in memory after the insertion. It also acknowledges that iterating the map is not safe during an insert.
Thus, as soon as you have an insert, attempting to do an at() on the same container (at the same time) is a data race. During the insert the map must change some sort of internal state (pointers to tree nodes, perhaps). If the at() catches the container during that manipulation, the pointers may not be in a consistent state.
You need some sort of external synchronization (such as a reader-writer lock) as soon as you have the possibility of both an insert() and at() (or operator[]) occurring at the same time.
Attention: fundamentally edited answer
As a reflex I would put a lock.
At first sight, it seems that no lock is required in your case:
For insert(), it's said that "Concurrently accessing existing elements is safe, although iterating ranges in the container is not."
For at() , it's said that: "Concurrently accessing or modifying other elements is safe."
The standard library addresses thread-safe aspects:
23.2.2 Container data races
1) For purposes of avoiding data races (17.6.5.9), implementations shall consider the following functions to be const: begin, end, rbegin, rend, front, back, data, find, lower_bound, upper_bound, equal_range, at and, except in associative or unordered associative containers, operator[].
2) Notwithstanding (17.6.5.9), implementations are required to avoid data races when the contents of the contained object in different elements in the same sequence, excepting vector<bool>, are modified concurrently.
There are several other SO answers which interpret this as thread-safe guarantee, as I originally did.
Nevertheless, we know that iterating ranges in the container is not safe while an insert is done. And accessing an element first requires somehow iterating to find it. So, while the standard clarifies safety for concurrent access to different elements when you already have their addresses, the wording leaves potential container-level concurrency issues open.
I have tried a simulation scenario with multiple reads and a single write on MSVC, and it never failed. But this is not enough to make the point: implementations are allowed to avoid more data races than what is foreseen in the standard (see 17.6.5.9) (or maybe I was simply lucky many times).
Finally, I have found two serious (post C++11) references stating unambiguously that a user lock is required to be safe :
GNU document on concurrency in the standard library: "The standard places requirements on the library to ensure that no data races are caused by the library itself (...) The user code must guard against concurrent function calls which access any particular library object's state when one or more of those accesses modifies the state."
GotW #95 Solution: Thread Safety and Synchronization, by Herb Sutter : "Is the code correctly synchronized (...) ? No. The code has one thread reading (via const operations) from some_obj, and a second thread writing to the same variable. If those threads can execute at the same time, that’s a race and a direct non-stop ticket to undefined behavior land."
Based on these two almost authoritative interpretations, I revise my first answer and come back to my initial reflex : you'll have to lock your concurrent accesses.
Alternatively, you could use non-standard libraries with concurrent map implementations, for example Microsoft's concurrent_unordered_map from the Parallel Patterns Library (PPL), Intel's concurrent_unordered_map from Threading Building Blocks (TBB), or a lock-free library as described in this SO answer.

How lockfree containers react to concurrent partitioning and sorting?

How are two standard algorithms like partition and sort, performed from two different threads at the same time, handled by concurrent containers (for example the Boost or TBB implementations)?
Boost has lockfree queues and a stack. One doesn't sort or partition these.
On superficial inspection of the documentation, TBB has concurrent_hash_map, and queue classes for which the same goes.
Only concurrent_vector from TBB would raise this question. The docs describe it as follows:
A concurrent_vector<T> is a dynamically growable array of T
However, only the storage (re)allocation is lock-free thread-safe, not the elements themselves:
A concurrent_vector never moves an element until the array is cleared, which can be an advantage over the STL std::vector even for single-threaded code
And
Operations on concurrent_vector are concurrency safe with respect to growing, not for clearing or destroying a vector. Never invoke method clear() if there are other operations in flight on the concurrent_vector.
Hence if you want to sort a concurrent_vector you might:
want to mutually exclude access; if latency is crucial you might use an atomic spinlock instead of a full-blown mutex, but either way you need synchronization
want to consider copying into a sorted range, leaving the source entries unmodified; this can be done without further locking (assuming reads of the vector elements are thread-safe), see e.g. std::partial_sort_copy

Mutithreading accessing one std::map , will cause unsafe behavior?

If more than one thread accesses the same map object, but I can make sure that no two threads ever access the same key, and each access looks like:
//find value by key
//if find
// erase the object or change the value
//else
// add new object of the key
Will the operation cause synchronization problem?
Yes, doing concurrent updates without proper synchronization may cause crashes, even if your threads access different keys: the std::map is based on trees, trees get rebalanced, so you can cause a write to a parent of a node with a seemingly unrelated key.
Moreover, it is not safe to perform read-only access concurrently with writing, or to search unlocked and then lock only for the write: if you have threads that may update or delete nodes, you must lock out all readers before you write.
You will have concurrency problems if any of the threads inserts into the tree. STL map is implemented using a red-black tree (or at least that's what I'm familiar with — I don't know whether the Standard mandates red-black tree). Red-black trees may be rebalanced upon insert, which would lead to all sorts of races between threads.
Read-only access (absolutely no writers) would be fine, but keep in mind operator[] is not read-only; it potentially adds a new element. You'd need to use the find() method, get the iterator, and dereference it yourself.
Unless the docs (ie, the ISO C++11 standard) say it's thread-safe (and they don't), then that's it. Period. It's not thread-safe.
There may be implementations of a std::map that would allow this but it's by no means portable.
Maps are often built on red-black trees or other auto-balancing data structures so that a modification to the structure (such as inserting or deleting a key) will cause rebalancing.
You should wrap read and write operations on the map with something like a mutex semaphore, to ensure that synchronisation is done correctly.

Quickest Queue Container (C++)

I'm using queue's and priority queues, through which I plan on pumping a lot of data quite quickly.
Therefore, I want my queues and priority queues to be responsive to additions and removals.
What are the relative merits of using a vector, list, or deque as the underlying container?
Update:
At the time of writing, both Mike Seymour and Steve Townsend's answers below are worth reading. Thanks both!
The only way to be sure how the choice affects performance is to measure it, in a situation similar to your expected use cases. That said, here are some observations:
For std::queue:
std::deque is usually the best choice; it supports all the necessary operations in constant time, and allocates memory in chunks as it grows.
std::list also supports the necessary operations, but may be slower due to more memory allocations; in special circumstances, you might be able to get good results by allocating from a dedicated object pool, but that's not entirely straightforward.
std::vector can't be used as it doesn't have a pop_front() operation; such an operation would be slow, as it would have to move all the remaining elements.
A potentially faster, but less flexible, alternative is to implement a circular buffer over a fixed-size array (e.g. std::array, or a std::vector that you don't resize). You'll need to deal with the case of it filling up, either by reporting an error, or allocating a larger buffer and copying all the data.
For std::priority_queue:
std::vector is usually the best choice; it grows exponentially (reducing the number of memory allocations), and is a simple data structure that's very fast to access - an iterator can be implemented simply as a wrapper around a pointer.
std::deque may be slower as it typically grows linearly (requiring more memory allocation), and access is more complicated than with a vector.
std::list can't be used as it doesn't provide random access.
To summarise - the defaults are usually the best choice, but if speed really is important, then measure the alternatives.
I would use std::queue for your basic queue which is (by default at least) a wrapper on deque. Do something more special-purpose if that does not work for you.
std::priority_queue also exists (over vector by default) but the added semantics make it more likely that you could have to roll your own here, depending on perf observed for your particular access pattern.
vector has storage characteristics which make it very ill-suited to removal from the front of the dataset: a lot of shuffling down has to be done every time you pop_front. For a simple queue, avoid this.
list is likely to be too expensive for any high-traffic queue, because by contract it has to offer functionality you don't need. It could be a candidate for use as a priority queue, but my instinct is always to trust the STL.
vector would implement a stack as your fast insertion is at the end and fast removal is also at the end. If you want a FIFO queue, vector would be the wrong implementation to use.
deque or list both provide constant time insertion at either end. list is good for LRU caches where you want to move elements out of the middle fast and where you want your iterators to remain valid no matter how much you move them about. deque is generally used when insertions and deletions are at the end.
The main thing I need to ask about your collection is whether they are accessed by multiple threads. I sort-of assume they are, in which case one of your primary aims is to reduce locking. This is best done if you at least have a multi_push and multi_get feature so that you can put more than one element on at a time without any locking.
There are also lock-free containers or semi-lock-free containers.
You will probably find that your locking strategy is more important than any performance in the collection itself as long as your operations are all constant-time.