C++ concurrent associative containers?

C++ concurrent associative containers? - c++

I'm looking for a associative container of some sort that provides safe concurrent read & write access provided you're never simultaneously reading and writing the same element.
Basically I have this setup:
Thread 1: Create A, Write A to container, Send A over the network.
Thread 2: Receive response to A, Read A from container, do some processing.
I can guarantee that we only ever write A once, though we may receive multiple responses for A which will be processed serially. This also guarantees that we never read and write A at the same time, since we can only receive a response to A after sending it.
So basically I'm looking for a container where writing to an element doesn't mess with any other elements. For example, std::map (or any other tree-based implementation) does not satisfy this condition because its underlying implementation is a red-black tree, so any given write may rebalance the tree and blow up any concurrent read operations.
I think that std::hash_map or boost::unordered_set may work for this, just based on my assumption that a normal hash table implementation would satisfy my criteria, but I'm not positive and I can't find any documentation that would tell me. Has anybody else tried using these similarly?

The STL won't provide any solid guarantees about threads, since the C++ standard doesn't mention threads at all. I don't know about boost, but I'd be surprised if its containers made any concurrency guarantees.
What about concurrent_hash_map from TBB? I found this in this related SO question.

Common hash table implementation have rehashing when the number of stored element increase, so that's probably not an option excepted if you know that this doesn't happen.
I'd look at structures used for functional languages (for instance look at http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf) but note that those I'm currently thinking about depend on garbage collection.

Related

C++ HashMap with multi-threading support [duplicate]

This question already has answers here:
ConcurrentHashMap for c++
(4 answers)
Closed 9 years ago.
I have a need to use a HashMap/ HashTable implementation in C++ and i have the following requirements
1- When new data is being inserted in the hashmap the complete hashmap is not locked, and other threads are allowed to read and also update other keys/ values in the hashmap
2- Multiple keys/ values should be updateable at the same time. i.e. one thread updating key x while the other thread updating key y.
Does such an implementation exist in C++ stl or any other libraries out there? or do i need to write something of my own?

I believe that both the Microsoft PPL implementation of concurrent_unordered_map and the Intel TBB implementation currently have a lock per hash bin. TBB also has a concurrent_hash_map with slightly different semantics. None of them guarantees any amount of concurrency in their spec. The only thing the specs guarantee is lack of data races.
If your algorithm is going to be this performance sensitive to the performance of concurrent hash table writes, you are probably in trouble, though. The cost of acquiring and releasing the locks on every access is similar to the cost of doing the hash-table insert. So you are going to lose half your performance to locking overhead, and need that much more parallelism to recover it. Are you sure you can't have a hash-table per thread, and then merge all the hash tables when you are done? (Some algorithms will let you get away with this, others not.)
Edit: I just noticed you are asking to be able to update keys concurrently. This is simply not possible under any concurrent hash table implementation that I'm aware of. The reason is that this is a read-modify-update operation, which, as #JoopEggen pointed out, just don't work with concurrent container types. In fact, this is a read-modify-update-modify-update operation. In order to modify keys you would need to make the whole sequence of operations atomic. Concurrent container types are monitors: each individual method call can be atomic, but sequences of them are not.

Is it possible to implement lock free map in C++

We are developing a network application based C/S, we find there are too many locks adding to std::map that the performance of server became poor.
I wonder if it is possible to implement a lock-free map, if yes, how? Is there any open source code there?
EDIT:
Actually we use the std::map to store sockets information, we did encapsulation based on the socket file description to include some other necessary information such as ip address, port, socket type, tcp or udp, etc.
To summary, we have a global map say it's
map<int fileDescriptor, socketInfor*> SocketsMap,
then every thread which is used to send data needs to access SocketsMap, and they have to add mutex before reading from SocketsMap or writing to SocketsMap, thus the concurrency level of the whole application would be greatly decreased because of so many locks addding to SocketsMap.
To avoid the concurrency level problem, we have two solutions: 1. store each socketInfor* separately 2. use some kind of lock-free map.
I would like to find some kind of lock-free map, because codes changes required by this solution are much less than that of solution 1.

Actually there's a way, although I haven't implemented it myself there's a paper on a lock free map using hazard pointers from eminent C++ expert Andrei Alexandrescu.

Yes, I have implemented a Lock-Free Unordered Map (docs) in C++ using the "Split-Ordered Lists" concept. It's an auto-expanding container and supports millions of elements on a 64-bit CAS without ABA issues. Performance-wise, it's a beast (see page 5). It's been extensively tested with millions of random ops.

HashMap would suit? Have a look at Intel Threading Building Blocks, they have an interesting concurrent map. I'm not sure it's lock-free, but hopefully you're interested in good multithreading performance, not particularly in lock-freeness. Also you can check CityHash lib
EDIT:
Actually TBB's hash map is not lock-free

I'm surprised nobody has mentioned it, but Click Cliff has implemented a wait-free hashmap in Java, which I believe could be ported to C++,

If you use C++11, you can have a look at AtomicHashMap of facebook/folly

You can implement the map using optimistic design or transactional memory.
This approach is especially effective if the chance of two operations concurrently addressing the map and one is changing the structure of it is relatively small - and you do not want the overhead of locking every time.
However, from time to time - collision will occur, and you will have to result it somehow (usually by rolling back to the last stable state and retrying the operations).
If your hardware support good enough atomic operations - this can be easily done with Compare And Swap (CAS) - where you change the reference alone (and whenever you change the map, you work on a copy of the map, and not the original, and set it as the primary only when you commit).

Design Problem: Thread safety of std::map

I am using std::map to implement my local hash table, which will be accessed by multiple threads at the same time.
I did some research and found that std::map is not thread safe.
So I will use a mutex for insert and delete operations on the map.
I plan to have separate mutex(es), one for each map entry so that they can be modified independently.
Do I need to put find operation also under critical section?
Will find operation be affected by insert/delete operations?
Is there any better implementation than using std::map that can take care of everything?

Binary trees are not particularly suited to Multi-Threading because the rebalancing can degenerate in a tree-wide modification. Furthermore, a global mutex will very negatively access the performance.
I would strongly suggest using an already written thread-safe containers. For example, Intel TBB contains a concurrent_hash_map.
If you wish to learn however, here are some hints on building a concurrent sorted associative container (I believe a full introduction to be not only out of my reach but also out of place, here).
Reader/Writer
Rather than a regular Mutex, you may want to use a Reader/Writer Mutex. This means parallelizing Reads, while Writes remain strictly sequential.
Own Tree
You can also build your own red-black or AVL tree. By augmenting the tree structure with a Reader/Writer Mutex per node. This allows you to only block part of the tree, rather than the whole structure, even when rebalancing. eg inserts with keys far enough apart can be parallel.
Skip Lists
Linked lists are much more amenable to concurrent manipulations, because you can easily isolate the modified zone.
A Skip List builds on this strength, but augments the structure to provide O(log N) access by key.
The typical way to walk a list is using the hand over hand idiom, that is, you grab the mutex of the next node before releasing the one of the current node. Skip Lists add a 2nd dimension as you can dive between two nodes, thus releasing both of them (and letting other walkers go ahead of you).
Implementations are much simpler than for binary search trees.
Persistent
Another interesting piece is the idea of persistent (or semi-persistent) data-structures, often found in functional programming. Binary Search Tree are particularly amenable for it.
The basic idea is to never change a node (or its content) once it exists. You do so by sharing a mutable head, that will point to the later version.
To Read: you copy the current head, then use it without worry (the information is immutable)
To Write: each node that you would modify in a regular tree is instead copied and the copy modified, therefore you rebuild part of the tree (up to the root) each time, and update the head to point to the new root. There are efficient ways to rebalance on descending the tree. Writes are sequential
The main advantage is that a version of the map is always available. That is, you can always read even when another thread is performing an insert or delete. Furthermore, because read access only require a single concurrent read (when copying the root pointer), they are near lock-free, and thus have excellent performance.
Reference counting (intrinsic) is your friend for those nodes.
Note: copies of the tree are very cheap :)
I do not know any implementation in C++ of either a concurrent Skip List or a concurrent Semi-Persistent Binary Search Tree.

You will in deed need to put find in a critical section, but you might want to have two different locks, one for writing and one for reading. The write lock is exclusive but if no thread holds the write lock several threads may read concurrently with no problems.
Such an implementation would work with most STL implementations but it would not be standards compliant, however. std::map is usually implemented using a red-black tree which doesn't change when elements are read. If the map was implemented using a splay tree instead, the tree would change during lookup and only one thread could read at a time.
For most purposes I would recommend using two locks.

Yes, if the insert or delete results in a rebalance I believe that find could be affected too.

Yes - You would need to put insert, delete and find in a critical section. There are techniques to enable multiple finds at the same time.

From what I can see, a similar question has been answered here, and the answer includes the explanation for this question also, as well as a link explaining the thread safety in more details.
Thread safety of std::map for read-only operations

Sequential container adaptors in C++

What is the purpose of the sequential container adaptors (i.e; stack, queue) in C++?
Thanks.

They provide a narrower interface that enforces additional invariants, and are therefore safer to use when you want those invariants to be kept.

they keep you from doing things that you decided to be unlegal (eg. if the order of processing elements is important you can use a stack for the proper order)
they point out the proper usage of a container to the user of your code (eg. prevent the user from accessing data he shouldn't be)
they allow to implement the same structure using different underlying types (eg. depending on your exact problem you may be better off implementing a stack on top of a deque or on top of a vector)

The best answer to your question would be to read the following book:
Effective STL
However if you want a quick and dirty answer: Not only do stacks and queues model real world objects such as program stacks and process queues, they also are optimal for random insert and delete operations.

Synchronized unordered_map in C++

I am using unordered_map from Boost. Are there any synchronized version of unordered_map? This is because I have quite a large number of unordered_map and manually synchronizing it using lock would be very messy.
Thanks.

It's impossible to usefully encapsulate containers offering STL-like interfaces (which unordered_map also does) with automatic locking because there are race conditions associated with retrieving iterators and positions inside the string then trying to use them in later operations. If you can find some less flexible interface that suits your needs, perhaps putting any complex operations into single locked function calls, then you can easily wrap a thread-safe class around the container to simplify your usage.

Are you sure that is what you need ?
while (!stack.empty())
{
Element const e = stack.top();
stack.pop();
}
In a single thread, this code looks right. If you wish to go multi-thread however, simply having a synchronized stack just doesn't cut it.
What happens if anyone else pops the last element AFTER you tested for emptiness ?
There is more than container synchronization to go multi-thread. That said, you could try TBB out.

Use Folly's AtomicHashmap.
From Folly's documentation on Github
folly/AtomicHashmap.h introduces a synchronized UnorderedAssociativeContainer implementation designed for extreme performance in heavily multithreaded environments (about 2-5x faster than tbb::concurrent_hash_map) and good memory usage properties. Find and iteration are wait-free, insert has key-level lock granularity, there is minimal memory overhead, and permanent 32-bit ids can be used to reference each element.
It comes with some limitations though.

Intel's Thread Building Blocks library has a class tbb::concurrent_hash_map that is an unordered map, allowing concurrent access. Internally it is implemented using a fine-grained locking scheme, but the basic outcome is that you can access it without race conditions.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js