I am using std::map to implement my local hash table, which will be accessed by multiple threads at the same time.
I did some research and found that std::map is not thread safe.
So I will use a mutex for insert and delete operations on the map.
I plan to have separate mutex(es), one for each map entry so that they can be modified independently.
Do I also need to put the find operation in a critical section?
Will the find operation be affected by insert/delete operations?
Is there any better implementation than using std::map that can take care of everything?
Binary trees are not particularly suited to multithreading because rebalancing can degenerate into a tree-wide modification. Furthermore, a global mutex will hurt performance badly.
I would strongly suggest using an already written thread-safe container. For example, Intel TBB contains a concurrent_hash_map.
If you wish to learn however, here are some hints on building a concurrent sorted associative container (I believe a full introduction to be not only out of my reach but also out of place, here).
Reader/Writer
Rather than a regular Mutex, you may want to use a Reader/Writer Mutex. This means parallelizing Reads, while Writes remain strictly sequential.
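As a minimal sketch of that idea, assuming C++17 and its std::shared_mutex: the wrapper below lets any number of finds run in parallel while inserts and erases take the lock exclusively. The class name RwLockedMap and its method names are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical wrapper (names are illustrative): one reader/writer
// mutex guards the whole std::map.
template <typename Key, typename Value>
class RwLockedMap {
public:
    // Writers take the mutex exclusively.
    void insert_or_assign(const Key& key, Value value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        map_.insert_or_assign(key, std::move(value));
    }

    bool erase(const Key& key) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return map_.erase(key) > 0;
    }

    // Readers share the mutex, so several finds can run at once.
    // A copy is returned because a reference or iterator could be
    // invalidated by a writer as soon as the lock is released.
    std::optional<Value> find(const Key& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<Key, Value> map_;
};
```

Returning a copy is a deliberate choice here: it costs a copy per lookup, but nothing that points into the map escapes the critical section.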
Own Tree
You can also build your own red-black or AVL tree, augmenting the tree structure with a Reader/Writer Mutex per node. This allows you to block only part of the tree, rather than the whole structure, even when rebalancing. For example, inserts with keys far enough apart can run in parallel.
Skip Lists
Linked lists are much more amenable to concurrent manipulations, because you can easily isolate the modified zone.
A Skip List builds on this strength, but augments the structure to provide O(log N) access by key.
The typical way to walk a list is using the hand over hand idiom, that is, you grab the mutex of the next node before releasing the one of the current node. Skip Lists add a 2nd dimension as you can dive between two nodes, thus releasing both of them (and letting other walkers go ahead of you).
Implementations are much simpler than for binary search trees.
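To make the hand over hand idiom concrete, here is a rough sketch on a plain sorted singly linked list (not a full Skip List) with one mutex per node. The class LockCoupledList and its int keys are assumptions made for the example; a real Skip List would add the extra levels on top of the same locking pattern.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical sorted list of ints with one mutex per node.
class LockCoupledList {
    struct Node {
        int key;
        std::unique_ptr<Node> next;
        std::mutex m;
        explicit Node(int k) : key(k) {}
    };
    Node head_{0};  // sentinel node; its key is never compared

public:
    bool insert(int key) {
        Node* prev = &head_;
        std::unique_lock<std::mutex> prev_lock(prev->m);
        while (Node* cur = prev->next.get()) {
            std::unique_lock<std::mutex> cur_lock(cur->m);  // grab the next node...
            if (cur->key == key) return false;              // already present
            if (cur->key > key) break;                      // insert between prev and cur
            prev = cur;
            prev_lock = std::move(cur_lock);                // ...then release the previous one
        }
        // Only prev->next changes, and prev is still locked.
        auto node = std::make_unique<Node>(key);
        node->next = std::move(prev->next);
        prev->next = std::move(node);
        return true;
    }

    bool contains(int key) {
        Node* prev = &head_;
        std::unique_lock<std::mutex> prev_lock(prev->m);
        while (Node* cur = prev->next.get()) {
            std::unique_lock<std::mutex> cur_lock(cur->m);
            if (cur->key == key) return true;
            if (cur->key > key) return false;
            prev = cur;
            prev_lock = std::move(cur_lock);
        }
        return false;
    }

    bool erase(int key) {
        Node* prev = &head_;
        std::unique_lock<std::mutex> prev_lock(prev->m);
        while (Node* cur = prev->next.get()) {
            std::unique_lock<std::mutex> cur_lock(cur->m);
            if (cur->key == key) {
                // Unlink while holding both locks; nobody else can reach
                // the victim because the only way in is through prev.
                std::unique_ptr<Node> victim = std::move(prev->next);
                prev->next = std::move(victim->next);
                cur_lock.unlock();  // unlock before the node is destroyed
                return true;
            }
            if (cur->key > key) return false;
            prev = cur;
            prev_lock = std::move(cur_lock);
        }
        return false;
    }
};
```

Move-assigning a std::unique_lock first unlocks the mutex it currently owns, so the previous node is released only after the next one has been locked.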
Persistent
Another interesting piece is the idea of persistent (or semi-persistent) data structures, often found in functional programming. Binary Search Trees are particularly amenable to it.
The basic idea is to never change a node (or its content) once it exists. You do so by sharing a mutable head that points to the latest version.
To Read: you copy the current head, then use it without worry (the information is immutable).
To Write: each node that you would modify in a regular tree is instead copied and the copy modified, therefore you rebuild part of the tree (up to the root) each time, and update the head to point to the new root. There are efficient ways to rebalance while descending the tree. Writes are sequential.
The main advantage is that a version of the map is always available. That is, you can always read even when another thread is performing an insert or delete. Furthermore, because read accesses require only a single concurrent read (when copying the root pointer), they are near lock-free and thus perform very well.
Reference counting (intrinsic) is your friend for those nodes.
Note: copies of the tree are very cheap :)
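A small sketch of the persistent approach, under some loud assumptions: the tree is a plain unbalanced BST (rebalancing is left out to keep it short), keys and values are ints, std::shared_ptr provides the reference counting, and a tiny mutex guards only the copy of the root pointer (C++20's std::atomic<std::shared_ptr> could replace it). PersistentMap and its members are names invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical persistent (path-copying) map from int to int.
class PersistentMap {
public:
    struct Node {
        int key;
        int value;
        std::shared_ptr<const Node> left;
        std::shared_ptr<const Node> right;
    };
    using Ptr = std::shared_ptr<const Node>;

    // To Read: copy the head; the version it points to never changes.
    Ptr snapshot() const {
        std::lock_guard<std::mutex> lock(head_mutex_);
        return head_;
    }

    // To Write: rebuild the path to the root off to the side, then publish.
    void put(int key, int value) {
        std::lock_guard<std::mutex> writer(write_mutex_);  // writes are sequential
        Ptr new_root = insert_copy(snapshot(), key, value);
        std::lock_guard<std::mutex> lock(head_mutex_);
        head_ = std::move(new_root);
    }

    // Lookup inside one fixed version of the map.
    static std::optional<int> find(const Ptr& root, int key) {
        const Node* n = root.get();
        while (n) {
            if (key < n->key) n = n->left.get();
            else if (n->key < key) n = n->right.get();
            else return n->value;
        }
        return std::nullopt;
    }

    std::optional<int> get(int key) const { return find(snapshot(), key); }

private:
    // Copies only the nodes on the search path; everything else is shared.
    static Ptr insert_copy(const Ptr& n, int key, int value) {
        if (!n) return std::make_shared<Node>(Node{key, value, nullptr, nullptr});
        if (key < n->key)
            return std::make_shared<Node>(
                Node{n->key, n->value, insert_copy(n->left, key, value), n->right});
        if (n->key < key)
            return std::make_shared<Node>(
                Node{n->key, n->value, n->left, insert_copy(n->right, key, value)});
        return std::make_shared<Node>(Node{key, value, n->left, n->right});
    }

    mutable std::mutex head_mutex_;  // guards only the root pointer
    std::mutex write_mutex_;         // serializes writers
    Ptr head_;
};
```

A reader that took a snapshot before a write keeps seeing the old version for as long as it holds the pointer; the nodes of that version are freed when the last reference goes away.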
I do not know any implementation in C++ of either a concurrent Skip List or a concurrent Semi-Persistent Binary Search Tree.
You will indeed need to put find in a critical section, but you might want to have two different locks, one for writing and one for reading. The write lock is exclusive, but if no thread holds the write lock, several threads may read concurrently with no problems.
Such an implementation would work with most STL implementations, but it would not be standards compliant. std::map is usually implemented using a red-black tree, which doesn't change when elements are read. If the map were implemented using a splay tree instead, the tree would change during lookup and only one thread could read at a time.
For most purposes I would recommend using two locks.
Yes, if the insert or delete results in a rebalance I believe that find could be affected too.
Yes - You would need to put insert, delete and find in a critical section. There are techniques to enable multiple finds at the same time.
From what I can see, a similar question has been answered in "Thread safety of std::map for read-only operations". The answer there also covers this question, and includes a link explaining the thread safety in more detail.
I have an assignment that requires setting up a data structure with concurrent reads/writes (an order book for a matching engine in a trading exchange), and I have settled on concurrent linked/skip lists. I've looked at several of the following articles/reports, where some are lock-free, and many are fine-grained locked (listed below in no particular order):
Practical concurrent unrolled lists using lazy synchronisation
Practical lock-freedom page 53
A Contention-Friendly, Non-Blocking Skip List
A Provably Correct Concurrent Skip List
A Simple Optimistic Skiplist Algorithm
All of these have fairly detailed algorithm pseudocode listings, but there are two issues I note with all of them:
They are all maps—they associate some key to some value, whereas I need the Node class in all these algorithms to simply contain some struct T (more particularly, I need the number of units in that order, the unit selling/buying price, the order ID, and an insertion timestamp). MSDN has a very nice C# implementation of a skip list containing some T (albeit not concurrent, which is a strict requirement), and is straightforward to adapt to C++ (ergo the tag).
They don't have update and get operations—what they do have are find (which returns a boolean value, and not the node value itself), insert, and delete operations. I am wondering if I can somehow compose and modify the latter three to create the former two, but I am a little lost.
How might I implement get/update so that concurrency and thread ordering is maintained?
I am leaning towards fine-grained locking (which is easier to reason about, even if slower) rather than lock-free algorithms (which I don't fully understand).
I have big C++/STL data structures (myStructType) with nested lists and maps. I have many objects of this type that I want to LRU-cache with a key. I can reload objects from disk when needed. Moreover, the cache has to be shared in a multiprocessing high-performance application running on a BSD platform.
I can see several solutions:
I can consider a lifetime-sorted list of pair<size_t lifeTime, myStructType v> plus a map giving O(1) access to the index of the desired object in the list from its key. I can use shm and mmap to store everything, and a lock to manage access (cf here).
I can use a redis server configured for LRU, and redesign my data structures to redis key/value and key/lists pairs.
I can use a redis server configured for LRU, and serialise my data structures (myStructType) to have a simple key/value to manage with redis.
There may be other solutions of course. How would you do that, or better, how have you successfully done that, keeping high performance in mind?
In addition, I would like to avoid heavy dependencies like Boost.
I actually built caches (not only LRU) recently.
Options 2 and 3 are quite likely not faster than re-reading from disk. That's effectively no cache at all. Also, this would be a far heavier dependency than Boost.
Option 1 can be challenging. For instance, you suggest "a lock". That would be quite a contended lock, as it must protect each and every lifetime update, plus all LRU operations. Since your objects are already heavy, it may be worthwhile to have a unique lock per object. There are intermediate variants of this solution, where there is more than one lock, but also more than one object per lock. (You still need a lock to protect the whole map, but that's for replacement only.)
You can also consider whether you really need strict LRU. That strategy assumes that the chances of an object being reused decrease over time. If that's not actually true, random replacement is just as good. You can also consider evicting more than one element at a time. One of the challenges is that when an element needs removing, every thread will see that need, but it's sufficient if one thread removes it. That's why a batch removal helps: if a thread tries to take a lock for batch removal and fails, it can continue under the assumption that the cache will have free space soon.
One quick win is to not update the LRU time of the last used element. It was already the newest, making it any newer won't help. This of course only has an effect if you often use that element quickly again, but (as noted above) otherwise you'd just use random eviction.
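For reference, a bare-bones in-process LRU sketch showing that quick win. It assumes the simplest variant discussed above (a single std::mutex, one process, no shm/mmap), and the class name LruCache is invented for the example. With one global lock the check only saves a list splice; in a finer-grained design it is the point where you would skip taking the LRU lock altogether.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical single-lock LRU cache: list for recency order,
// hash map for O(1) access to the list position of a key.
template <typename Key, typename Value>
class LruCache {
    using Entry = std::pair<Key, Value>;
    using Position = typename std::list<Entry>::iterator;

    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<Key, Position> index_;
    std::mutex mutex_;

    void touch(Position pos) {
        // Quick win: the newest element is already at the front,
        // so there is nothing to update for it.
        if (pos != order_.begin())
            order_.splice(order_.begin(), order_, pos);
    }

public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    std::optional<Value> get(const Key& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        touch(it->second);
        return it->second->second;  // copy of the cached value
    }

    void put(const Key& key, Value value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            touch(it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (order_.size() >= capacity_) {  // evict the least recently used entry
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }
};
```

std::list::splice moves the node without invalidating iterators, which is what allows the hash map to keep storing list positions.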
I am using unordered_map from Boost. Is there a synchronized version of unordered_map? I ask because I have quite a large number of unordered_maps, and manually synchronizing each of them with a lock would be very messy.
Thanks.
It's impossible to usefully encapsulate containers offering STL-like interfaces (which unordered_map also does) with automatic locking, because there are race conditions associated with retrieving iterators and positions inside the container and then trying to use them in later operations. If you can find some less flexible interface that suits your needs, perhaps putting any complex operations into single locked function calls, then you can easily wrap a thread-safe class around the container to simplify your usage.
Are you sure that is what you need?
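A sketch of what such a less flexible interface might look like. It uses std::unordered_map rather than the Boost one, and SynchronizedMap with its get/set/erase/update methods is a made-up interface, not an existing library class. No iterator or reference ever leaves the lock, and a compound read-modify-write goes through a single locked call.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical narrow, thread-safe facade over an unordered_map.
template <typename Key, typename Value>
class SynchronizedMap {
    std::unordered_map<Key, Value> map_;
    mutable std::mutex mutex_;

public:
    // Returns a copy, so nothing pointing into the container escapes.
    std::optional<Value> get(const Key& key) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

    void set(const Key& key, Value value) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[key] = std::move(value);
    }

    bool erase(const Key& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.erase(key) > 0;
    }

    // Runs fn on the value for key (value-initialized if absent) while
    // the lock is held, making the whole read-modify-write one step.
    template <typename Fn>
    void update(const Key& key, Fn&& fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        fn(map_[key]);
    }
};
```

The callback passed to update must not call back into the same map, since std::mutex is not recursive.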
while (!stack.empty())
{
    Element const e = stack.top();
    stack.pop();
}
In a single thread, this code looks right. If you wish to go multi-threaded however, simply having a synchronized stack just doesn't cut it.
What happens if anyone else pops the last element AFTER you tested for emptiness?
There is more to going multi-threaded than container synchronization. That said, you could try TBB out.
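One common way around the empty/top/pop race is to change the interface so that the test and the removal happen in one critical section. A minimal sketch, with LockedStack and try_pop as illustrative names of my own:

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <stack>
#include <utility>

// Hypothetical stack whose only way to remove an element is an
// operation that checks for emptiness and pops under the same lock.
template <typename T>
class LockedStack {
    std::stack<T> stack_;
    std::mutex mutex_;

public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        stack_.push(std::move(value));
    }

    // Returns the top element, or an empty optional if there was none.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (stack_.empty()) return std::nullopt;
        std::optional<T> value(std::move(stack_.top()));
        stack_.pop();
        return value;
    }
};
```

The loop above then becomes `while (auto e = stack.try_pop()) { /* use *e */ }`, and no other thread can slip in between the emptiness test and the pop.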
Use Folly's AtomicHashmap.
From Folly's documentation on Github
folly/AtomicHashmap.h introduces a synchronized UnorderedAssociativeContainer implementation designed for extreme performance in heavily multithreaded environments (about 2-5x faster than tbb::concurrent_hash_map) and good memory usage properties. Find and iteration are wait-free, insert has key-level lock granularity, there is minimal memory overhead, and permanent 32-bit ids can be used to reference each element.
It comes with some limitations though.
Intel's Threading Building Blocks library has a class tbb::concurrent_hash_map that is an unordered map, allowing concurrent access. Internally it is implemented using a fine-grained locking scheme, but the basic outcome is that you can access it without race conditions.
This is just for a kind of concurrency refresher...
Imagine I have a B+ tree data structure in memory - multiple items per node, only leaf nodes contain items, leaf nodes also form a linked list for easy sequential access. Inserts and deletes mostly only affect a leaf node, but can cause nodes to split or merge in a process that may propagate to the root.
I have a single-threaded implementation, and the updates follow a kind of pre-planning approach. A recursion steps up the tree from leaf level as far as nodes need to change, building a linked list (linking local variables in different recursions) that describes the changes needed. When it knows what's needed, it can check whether it can allocate all needed nodes, and apply all needed changes (or not) by referencing this plan before falling out of the recursion.
This implementation also "maintains" iterators on updates, so iterators aren't invalidated by inserts/deletes unless the specific item they point to is deleted. Inserts/deletes within the same node cause the iterators pointing into that node to be updated.
Trouble is, I need to make it multithreaded - supporting potentially many readers and writers at once.
I want multiple threads to be able to read and write at the same time, so long as there is no risk of corruption as a result. So for reading, I don't want mutually exclusive access at all, even to a single node. For writing, I want to lock the minimum number of nodes needed for the change. And I want to avoid deadlock, of course.
Thankfully, it isn't something I actually need to do - but since I've neglected my concurrency skills, this just seems like a good thought experiment.
This is obviously similar to the kinds of problems that databases and filesystems have to handle, so I'm guessing I might get some references to that kind of thing, which would be great.
So - how would I handle the thread synchronisation for this? I can vaguely see a role for mutexes and/or semaphores on nodes, but what strategies would I use to work with them?
Definitely a challenging task! I see that you are a C++ programmer; however, I believe C++ has concepts similar to Java's, so I'll try to help from a Java standpoint.
So for reading, I don't want mutually exclusive access at all, even to a single node
You could use a ReadWriteLock. The read lock can be held simultaneously by multiple reader threads, so long as there are no writers. The write lock is exclusive. You just have to take exclusive access when writing. Is there an analogue in C++?
And I want to avoid deadlock, of course.
Just lock multiple nodes in order of levels (e.g. from top to bottom). That will protect you from deadlocks (that would be something similar to Lamport's Bakery Algorithm).
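In C++ terms, the ordering rule could be sketched like this. TreeNode, its level field and the two helper functions are hypothetical names for the example; ties within a level are broken by address, which is my own addition so that every thread agrees on one total order.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical node: level 0 is the root, larger levels are closer to the leaves.
struct TreeNode {
    int level;
    std::mutex m;
    explicit TreeNode(int l) : level(l) {}
};

// Puts the nodes into the global locking order (top to bottom, then by
// address) and drops duplicates so that no mutex is locked twice.
inline void sort_for_locking(std::vector<TreeNode*>& nodes) {
    std::sort(nodes.begin(), nodes.end(),
              [](const TreeNode* a, const TreeNode* b) {
                  if (a->level != b->level) return a->level < b->level;
                  return std::less<const TreeNode*>{}(a, b);
              });
    nodes.erase(std::unique(nodes.begin(), nodes.end()), nodes.end());
}

// Locks every node in that order; the returned guards unlock on destruction.
inline std::vector<std::unique_lock<std::mutex>> lock_all(std::vector<TreeNode*> nodes) {
    sort_for_locking(nodes);
    std::vector<std::unique_lock<std::mutex>> guards;
    guards.reserve(nodes.size());
    for (TreeNode* n : nodes) guards.emplace_back(n->m);
    return guards;
}
```

Because every thread acquires in the same order, no cycle of threads waiting on each other can form. The catch is that the full set of nodes must be known before the first lock is taken, which fits the "plan first, then apply" update described in the question.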
As for databases - they resolve deadlocks by killing one process :-).
One more strategy is to implement a non-blocking tree structure in a similar manner to how Cliff Click implemented his non-blocking hash map (a state machine with all cases covered):
video
Cheers
I'm looking for an associative container of some sort that provides safe concurrent read & write access provided you're never simultaneously reading and writing the same element.
Basically I have this setup:
Thread 1: Create A, Write A to container, Send A over the network.
Thread 2: Receive response to A, Read A from container, do some processing.
I can guarantee that we only ever write A once, though we may receive multiple responses for A which will be processed serially. This also guarantees that we never read and write A at the same time, since we can only receive a response to A after sending it.
So basically I'm looking for a container where writing to an element doesn't mess with any other elements. For example, std::map (or any other tree-based implementation) does not satisfy this condition because its underlying implementation is a red-black tree, so any given write may rebalance the tree and blow up any concurrent read operations.
I think that std::hash_map or boost::unordered_set may work for this, just based on my assumption that a normal hash table implementation would satisfy my criteria, but I'm not positive and I can't find any documentation that would tell me. Has anybody else tried using these similarly?
The STL won't provide any solid guarantees about threads, since the C++ standard (prior to C++11) doesn't mention threads at all. I don't know about boost, but I'd be surprised if its containers made any concurrency guarantees.
What about concurrent_hash_map from TBB? I found this in this related SO question.
Common hash table implementations rehash when the number of stored elements increases, so that's probably not an option unless you know that rehashing won't happen.
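The rehashing itself is easy to observe single-threaded with std::unordered_map. The helper below (rehashes_while_filling is a name made up for this sketch) reports whether the bucket array changed while inserting; calling reserve up front avoids the rehash. Note this only illustrates the rehash: even when no rehash occurs, std::unordered_map gives no guarantee that an insert running concurrently with a lookup is safe.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: inserts keys 0..count-1 and returns true if the
// bucket count changed, i.e. at least one rehash happened on the way.
inline bool rehashes_while_filling(std::unordered_map<int, int>& table, int count) {
    const std::size_t buckets_before = table.bucket_count();
    for (int i = 0; i < count; ++i) table.emplace(i, i);
    return table.bucket_count() != buckets_before;
}
```

Calling `table.reserve(1000)` before inserting 1000 keys sizes the bucket array in advance, whereas a default-constructed table has to grow (and rehash) several times along the way.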
I'd look at structures used for functional languages (for instance look at http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf) but note that those I'm currently thinking about depend on garbage collection.