thread-safe alternative for std::map? - c++

I have a parallelized loop with write access to a std::map. I would like to access different parts of the map at the same time, i.e. I want to access map[a] and map[b] for different a and b. I have found out that this is not possible, so I wonder if there is a good alternative, or how to achieve this in a different way!

I could be wrong, but I believe that modifying existing elements of a map is safe as long as you're not touching the same elements (as this does not modify the underlying structure of the map). So if you insert map[a] and map[b] ahead of time, your separate threads should be able to modify those existing elements.
That said, it's probably cleaner and safer just to use normal synchronization techniques such as mutexes to protect access to the map.

It is quite possible to mutate map[a] and map[b] separately, as long as you do not mutate the underlying map.
If you wish to mutate an associative container concurrently, check out concurrent_unordered_map from PPL or TBB.

If possible, you could try giving each worker its own copy of the map and then merging the results. This way no locking would be needed at all.

Related

std::map write/read from multiple threads

I want to be able to read and write a std::map from multiple threads. Is there a way to do that without a mutex (maybe with std::atomic)?
If not, what's the simplest way to do that in C++11?
If the values are std::atomic<>, you can modify and read them from arbitrary threads, but you'll need to have saved their addresses somehow - pointers, references or iterators - and you'll need to know no other thread will call erase on them....
Still, even atomic values won't make it safe to modify the container (e.g. inserting or erasing elements), while other threads are also doing modifications or lookups or iteration. Even the "const" functions - like size(), empty(), begin(), end(), count(), and iterator movement - are unsafe, because the mutating operations may be in the middle of rewiring the inter-node links or updating the same data.
For anything more than the above, you will need a mutex.
For a concrete example, say you've inserted a node with std::string key "client_counter" - you could start a thread that gets an iterator to that element and does atomic updates to the counter, while other threads can find the element and read from it but must not erase it. You could still have other nodes inserted in the map, with other updaters and readers, without any extra synchronisation with the client_counter-updating thread.
If you don't want to use a mutex, you will need to wait for concurrent containers (C++17?). If you want to use std::atomic operations inside a std::map, you will probably have to write, or find on the Internet, a full implementation of a concurrent atomic std::map.
If you want a std::map of std::atomic, be aware that this protects only the elements inside the std::map, not the std::map itself.

Associative container with queue-like properties on the values

I have data in the form of instances of a POD struct. Under "normal" conditions, I need to access them via a unique ID, currently via a std::map. If something goes wrong, however, I need to traverse the data in an order provided by a specific member of the POD struct.
I don't want to copy all data sets of the map to a priority queue in case of error -- this seems much too expensive.
I tried running std::make_heap on the std::map, but that doesn't even compile, because the map's iterators cannot be subtracted.
The sort keys will change on a regular basis, so keeping the data in a priority queue and just storing pointers in the map seems not to be feasible, especially as access via the map (the typical use-case) becomes more expensive by the indirection.
The other way around, i.e. storing pointers in a separate data structure that can be heapified on demand seems feasible, but the synchronization might be error-prone.
Is there anything in the std libraries, boost or tbb that would accomplish what I want?
It looks like this is a job for the Boost Multi-Index Containers Library.

Thread-safety of c++ maps

This is about the thread safety of std::map. Simultaneous reads are thread-safe, but writes are not. My question is: if I add a unique element to the map every time, will that be thread-safe?
So, for an example, If I have a map like this std:map<int, std::string> myMap
and I always add new keys and never modify the existing key-value, will that be thread-safe?
More importantly, will that give me any random run-time behavior?
Is adding new keys also considered modification? If the keys are always different while adding, shouldn't it be thread-safe as it modifies an independent part of the memory?
Thanks
Shiv
1) Of course not
2) Yes, I hope you'll encounter it during testing, not later
3) Yes, it is. The new element is added in a different location, but many pointers are modified during that.
The map is implemented by some sort of tree in most if not all implementations. Inserting a new element in a tree modifies it by rearranging nodes by means of resetting pointers to point to different nodes. So it is not thread safe
No, yes, yes. You need to obtain an exclusive lock when modifying the container (including insertion of new keys); while there is no modification going on you can, of course, safely read concurrently.
edit: http://www.sgi.com/tech/stl/thread_safety.html might be of interest for you.

Least Recently Used cache using C++

I am trying to implement an LRU cache using C++. I would like to know what the best design for implementing one is. I know an LRU cache should provide find(), adding an element, and removing an element, where remove evicts the least recently used element. What are the best ADTs to implement this?
For example: if I use a map with the element as value and a time counter as key, I can search in O(log n) time; inserting is O(n) and deleting is O(log n).
One major issue with LRU caches is that there are few "const" operations; most will change the underlying representation (if only because they bump the element accessed).
This is of course very inconvenient, because it means it's not a traditional STL container, and therefore any idea of exhibiting iterators is quite complicated: when the iterator is dereferenced this is an access, which should modify the list we are iterating on... oh my.
And there are the performance considerations, both in terms of speed and memory consumption.
It is unfortunate, but you'll need some way to organize your data in a queue (LRU) (with the possibility to remove elements from the middle), and this means your elements will have to be independent from one another. A std::list fits, of course, but it's more than you need. A singly-linked list is sufficient here, since you don't need to iterate the list backward (you just want a queue, after all).
However one major drawback of those is their poor locality of reference, if you need more speed you'll need to provide your own custom (pool ?) allocator for the nodes, so that they are kept as close together as possible. This will also alleviate heap fragmentation somewhat.
Next, you obviously need an index structure (for the cache bit). The most natural is to turn toward a hash map. std::tr1::unordered_map, std::unordered_map and boost::unordered_map are normally good-quality implementations; some should be available to you. They also allocate extra nodes for hash-collision handling; you might prefer other kinds of hash maps, so check out Wikipedia's article on the subject and read about the characteristics of the various implementation techniques.
Continuing, there is the (obvious) threading support. If you don't need thread support, then it's fine, if you do however, it's a bit more complicated:
As I said, there is little const operation on such a structure, thus you don't really need to differentiate Read/Write accesses
Internal locking is fine, but you might find that it doesn't play nicely with your uses. The issue with internal locking is that it doesn't support the concept of a "transaction", since it relinquishes the lock between each call. If this is your case, transform your object into a mutex and provide a std::unique_ptr<Lock> lock() method (in debug, you can assert that the lock is taken at the entry point of each method)
There is (among locking strategies) the issue of reentrance, i.e. the ability to "relock" the mutex from within the same thread; check Boost.Thread for more information about the various locks and mutexes available
Finally, there is the issue of error reporting. Since it is expected that a cache may not be able to retrieve the data you put in, I would consider using an exception "poor taste". Consider either pointers (Value*) or Boost.Optional (boost::optional<Value&>). I would prefer Boost.Optional because its semantics are clear.
The best way to implement an LRU is to use the combination of a std::list and a hash map such as stdext::hash_map (if you want to use only the standard library, std::map).
Store the data in the list so that the least recently used element is at the back, and use the map to point to the list items.
For "get": use the map to find the list node and retrieve the data, then move that node to the front (since it was just used) and update the map.
For "insert": remove the last element from the list, add the new data to the front, and update the map.
This is the fastest you can get. If you are using a hash_map, almost all operations are done in O(1); if using std::map, everything takes O(log n).
A very good implementation is available here
This article describes a couple of C++ LRU cache implementations (one using STL, one using boost::bimap).
When you say priority, I think "heap" which naturally leads to increase-key and delete-min.
I would not make the cache visible to the outside world at all if I could avoid it. I'd just have a collection (of whatever) and handle the caching invisibly, adding and removing items as needed, but the external interface would be exactly that of the underlying collection.
As far as the implementation goes, a heap is probably the most obvious. It has complexities roughly similar to a map, but instead of building a tree from linked nodes, it arranges items in an array and the "links" are implicit based on array indices. This increases the storage density of your cache and improves locality in the "real" (physical) processor cache.
I suggest a heap and maybe a Fibonacci Heap
I'd go with a normal heap in C++.
With std::make_heap (guaranteed by the standard to be O(n)), std::pop_heap, and std::push_heap from <algorithm>, implementing it would be absolute cake. You only have to worry about increase-key.

Am I going to be OK for threading with STL given these conditions?

I have a collection of the form:
map<key, list<object> >
I only ever insert at the back of the list and sometimes I read from the entire map (but I never write to the map, except at initialization).
As I understand it, none of the STL containers are thread safe, but I can only really have a maximum of one thread per key. Am I missing anything in assuming I'll be pretty safe with this arrangement?
If the map is never modified at all during the multi-threaded scenario, then you're fine. If each thread looks at its own list, then that's thread-private data, so you're also fine.
Take care not to try to look up keys with [], because that will insert (i.e. modify) if the key doesn't exist in the map yet.
However, I'm curious as to why you'd need this structure - why not keep a pointer/reference or the actual list object itself on the stack of each thread, given that it's private to each thread?
(If it's not, then you need proper synchronisation on the list.)
In fact you say you "read from the entire map" - presumably meaning that any random thread may try to iterate through any of the lists. So you definitely need to synchronise operations on the lists.
TBH, as long as you put a critical section around every write and read, it will work fine.