I am facing a concurrency problem. I have a std::map with one occasional writer and multiple frequent readers on different threads. The writer occasionally adds keys (each key is a std::string) to the map, and I cannot guarantee when exactly the readers start and stop reading. I don't want to put locks on the readers, since reading is very frequent and acquiring a lock on every read would hurt performance.
If the readers always access the map by key (not by iterator), will it always be thread-safe? If not, how can I design the code so that the readers always access valid keys (or valid iterators)?
Other approaches that solve this problem with different containers are also welcome.
I have to disagree with the previous answer. When the documentation for insert() talks about "concurrently accessing existing elements", it presumes that you already have a pointer, reference, or iterator to the existing element. That is essentially an acknowledgement that the map won't move elements around in memory after an insertion. It also acknowledges that iterating the map is not safe during an insert.
Thus, as soon as you have an insert, attempting to do an at() on the same container (at the same time) is a data race. During the insert the map must change some sort of internal state (pointers to tree nodes, perhaps). If the at() catches the container during that manipulation, the pointers may not be in a consistent state.
You need some sort of external synchronization (such as a reader-writer lock) as soon as you have the possibility of both an insert() and at() (or operator[]) occurring at the same time.
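To make that concrete, here is a minimal sketch of such a reader-writer wrapper, assuming C++17 for std::shared_mutex (the class and member names are illustrative): readers take a shared lock and run in parallel, while the occasional writer takes an exclusive lock.

#include <map>
#include <optional>
#include <shared_mutex>
#include <string>

class SynchronizedMap {
    std::map<std::string, int> map_;
    mutable std::shared_mutex mutex_;        // allows many readers, one writer
public:
    void insert(const std::string& key, int value) {
        std::unique_lock lock(mutex_);       // exclusive: blocks all readers
        map_[key] = value;
    }
    std::optional<int> find(const std::string& key) const {
        std::shared_lock lock(mutex_);       // shared: readers proceed in parallel
        auto it = map_.find(key);
        if (it == map_.end())
            return std::nullopt;
        return it->second;
    }
};

Under a read-mostly workload an uncontended shared lock is cheap, which addresses the question's performance concern without sacrificing safety.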
Attention: fundamentally edited answer
As a reflex I would put a lock.
At first sight, it seems that a lock is not required in your case:
For insert(), it's said that "Concurrently accessing existing elements is safe, although iterating ranges in the container is not."
For at(), it's said that: "Concurrently accessing or modifying other elements is safe."
The standard addresses the thread-safety aspects of containers:
23.2.2 Container data races
1) For purposes of avoiding data races (17.6.5.9), implementations shall consider the following functions to be const: begin, end, rbegin, rend, front, back, data, find, lower_bound, upper_bound, equal_range, at and, except in associative or unordered associative containers, operator[].
2) Notwithstanding (17.6.5.9), implementations are required to avoid data races when the contents of the contained object in different elements in the same sequence, excepting vector<bool>, are modified concurrently.
Several other SO answers interpret this as a thread-safety guarantee, as I originally did.
Nevertheless, we know that iterating ranges in the container is not safe while an insert is performed. And accessing an element by key first requires somehow iterating the container to find that element. So, while the standard clarifies safety for concurrent access to different elements when you already have their address, the wording leaves potential container-level concurrency issues open.
I tried a simulation scenario with multiple readers and a single writer on MSVC, and it never failed. But this is not enough to make the point: implementations are allowed to avoid more data races than the standard foresees (see 17.6.5.9), or maybe I was simply lucky many times in a row.
Finally, I have found two serious (post-C++11) references stating unambiguously that a user lock is required to be safe:
GNU document on concurrency in the standard library: "The standard places requirements on the library to ensure that no data races are caused by the library itself (...) The user code must guard against concurrent function calls which access any particular library object's state when one or more of those accesses modifies the state."
GotW #95 Solution: Thread Safety and Synchronization, by Herb Sutter: "Is the code correctly synchronized (...)? No. The code has one thread reading (via const operations) from some_obj, and a second thread writing to the same variable. If those threads can execute at the same time, that’s a race and a direct non-stop ticket to undefined behavior land."
Based on these two almost authoritative interpretations, I revise my first answer and come back to my initial reflex: you'll have to lock your concurrent accesses.
Alternatively, you could use a non-standard library with a concurrent map implementation, such as Microsoft's concurrent_unordered_map from the Parallel Patterns Library (PPL), Intel's concurrent_unordered_map from Threading Building Blocks (TBB), or a lock-free library as described in this SO answer.
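For illustration, a minimal sketch of the TBB variant, assuming the TBB headers are available (the key and value are placeholders); concurrent insertion and lookup need no user lock as long as nothing erases concurrently:

#include <tbb/concurrent_unordered_map.h>
#include <string>

tbb::concurrent_unordered_map<std::string, int> shared_map;

void writer() {
    shared_map.insert({"some-key", 42});      // safe concurrently with readers
}

void reader() {
    auto it = shared_map.find("some-key");    // safe concurrently with insert
    if (it != shared_map.end()) {
        int value = it->second;               // existing elements are never moved
        (void)value;
    }
}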
Related
In my application I basically have multiple threads that perform inserts, and mostly one thread that iterates through a map and removes items that meet certain criteria. The reason I wanted a concurrent structure is that it would provide finer-grained locking in the code that removes items, which currently looks like the snippet below. That code is not ideal for various reasons, including that the thread could get pre-empted while holding the lock.
void Function_reap()
{
    while (timetaken != timeoutTime)
    {
        my_map_mutex.lock();
        auto iter = my_unordered_map.begin();
        while (iter != my_unordered_map.end())
        {
            if (status_completed == iter->second.status)
            {
                // erase() returns the iterator following the removed element
                iter = my_unordered_map.erase(iter);
            }
            else
            {
                ++iter;  // advance past kept elements; without this the
                         // original loop never terminates
            }
        }
        my_map_mutex.unlock();
    }
}
I was going through the documentation for Intel TBB (Threading Building Blocks), more specifically the concurrent_unordered_map documentation (https://software.intel.com/en-us/node/506171), to see if it is a good fit for my application, and came across this excerpt:
concurrent_unordered_map and concurrent_unordered_multimap support concurrent insertion and traversal, but not concurrent erasure. The interfaces have no visible locking. They may hold locks internally, but never while calling user-defined code. They have semantics similar to std::unordered_map and std::unordered_multimap respectively, except as follows:
The erase and extract methods are prefixed with unsafe_, to indicate that they are not concurrency safe.
Why does TBB not provide safe synchronized deletion from the map? What is the technical reason for this?
What other options, if any, do I have here? Ideally something that definitely works on Linux and, if possible, is portable to Windows.
Well, it is difficult to design a solution that (efficiently) supports all operations. TBB has the concurrent_unordered_map, which supports concurrent insert, find and iteration, but no erase; and the concurrent_hash_map, which supports concurrent insert, find and erase, but no iteration.
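To illustrate the second option, here is a minimal sketch with concurrent_hash_map, assuming the TBB headers (the Entry type and function name are placeholders); erase by key is concurrency-safe, and per-element accessors provide the locking:

#include <tbb/concurrent_hash_map.h>
#include <string>

struct Entry { int status = 0; };
using JobMap = tbb::concurrent_hash_map<std::string, Entry>;

void complete_and_reap(JobMap& jobs, const std::string& key) {
    {
        JobMap::accessor a;       // write-locks this element while held
        if (jobs.insert(a, key))  // true if the key was newly inserted
            a->second.status = 1;
    }                             // accessor released here
    jobs.erase(key);              // concurrency-safe erase by key
}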
There are several other libraries that provide concurrent hash maps like libcds, or my own one called xenium.
At the moment, xenium contains two concurrent hash map implementations:
harris_michael_hash_map - fully lock-free; supports concurrent insert, erase, find and iteration. However, the number of buckets has to be defined at construction time and cannot be adapted afterwards. Each bucket contains a linked list of items, which is not very cache friendly.
vyukov_hash_map - a very fast hash map that uses fine-grained locking for insert, erase and iteration; find operations are lock-free. If you are using iterators you have to be careful to avoid deadlocks (i.e., a thread should not try to insert or erase a key while holding an iterator). However, there is an erase overload that takes an iterator, so you can safely remove the item the iterator points to.
I am currently working to make xenium fully windows compatible.
How are two standard algorithms like partition and sort, performed from two different threads at the same time, handled by concurrent containers (for example in the Boost or TBB implementations)?
Boost has lockfree queues and a stack. One doesn't sort or partition these.
On superficial inspection of the documentation, TBB has concurrent_hash_map, and queue classes for which the same goes.
Only concurrent_vector from TBB would raise this question. The docs describe it as follows:
A concurrent_vector<T> is a dynamically growable array of T
However, only the storage (re)allocation is lock-free and thread-safe, not the elements themselves:
A concurrent_vector never moves an element until the array is cleared, which can be an advantage over the STL std::vector even for single-threaded code
And
Operations on concurrent_vector are concurrency safe with respect to growing, not for clearing or destroying a vector. Never invoke method clear() if there are other operations in flight on the concurrent_vector.
Hence, if you want to sort a concurrent_vector you might
want to mutually exclude access; if latency is crucial you could use an atomic spinlock instead of a full-blown mutex, but either way you need synchronization
want to consider copying to a sorted range, leaving the source entries unmodified; this can be done without further locking (assuming the read operations on the vector elements are thread-safe), see e.g. std::partial_sort_copy and the sketch below
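A minimal sketch of the second option, assuming TBB's concurrent_vector and that concurrently added elements are fully constructed before being read (the function name is illustrative):

#include <tbb/concurrent_vector.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Copy the k smallest elements into a private, sorted buffer. The source
// entries are only read, never moved, so concurrent growth elsewhere in
// the vector does not invalidate what we read.
std::vector<int> sorted_copy(const tbb::concurrent_vector<int>& source, std::size_t k) {
    std::vector<int> dest(std::min<std::size_t>(k, source.size()));
    std::partial_sort_copy(source.begin(), source.end(), dest.begin(), dest.end());
    return dest;
}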
When inserting elements into a std::unordered_set, is it worth calling std::unordered_set::find prior to std::unordered_set::insert? From my understanding, I should always just call insert, as it returns a std::pair which contains a bool that tells whether the insertion succeeded.
Calling find before insert is essentially an anti-pattern, which is typically observed in poorly designed custom set implementations. Namely, it might be necessary in implementations that do not tell the caller whether the insertion actually occurred. std::set does provide you with this information, meaning that it is normally not necessary to perform this find-before-insert dance.
A typical implementation of insert contains the full implementation of find, meaning that the find-before-insert approach performs the search twice for no meaningful reason.
However, some other shortcomings of std::set design do sometimes call for a find-before-insert sequence. For example, if your set elements contain some fields that need to be modified if (and only if) the actual insertion occurred. For example, you might have to allocate "permanent" memory for some pointer fields instead of the "temporary" (local) memory these fields were pointing to before the insertion. Unfortunately, this is impossible to do after the insertion, since std::set only provides you with non-modifying access to its elements. One workaround is to do a find first, thus "predicting" whether an actual insertion will occur, and then setting up the new element accordingly (like allocating "permanent" memory for all fields) before doing the insert. This is ugly from the performance point of view, but it is acceptable in non-performance-critical code. That's just how things are with standard containers.
It's best to just attempt the insert, otherwise the effort of hashing and iterating over any elements that have collided in the hash bucket is unnecessarily repeated.
If your set is thread-safe and accessed concurrently, then calling find first achieves very little: insert would be atomic, but a check-then-act sequence would be susceptible to a race condition.
So in general and especially in a multithreaded context, just insert.
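A short sketch of the "just insert" approach (the container and function names are illustrative):

#include <string>
#include <unordered_set>

void record(std::unordered_set<std::string>& seen, const std::string& key) {
    // one hash lookup; the bool reports whether insertion happened
    auto [it, inserted] = seen.insert(key);
    if (inserted) {
        // first occurrence of key; *it is the stored element
    } else {
        // key was already present; nothing was modified
    }
    (void)it;
}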
If more than one thread accesses one map object, but I can make sure these threads will never access the same key, and each access looks like this:
//find value by key
//if find
// erase the object or change the value
//else
// add new object of the key
Will the operation cause synchronization problem?
Yes, doing concurrent updates without proper synchronization may cause crashes, even if your threads access different keys: the std::map is based on trees, trees get rebalanced, so you can cause a write to a parent of a node with a seemingly unrelated key.
Moreover, it is not safe to perform read-only access concurrently with writing, or searching unlocked + locking on write: if you have threads that may update or delete nodes, you must lock out all readers before you write.
You will have concurrency problems if any of the threads inserts into the tree. The STL map is implemented using a red-black tree (or at least that's what I'm familiar with — I don't know whether the Standard mandates a red-black tree). Red-black trees may be rebalanced upon insert, which would lead to all sorts of races between threads.
Read-only access (absolutely no writers) would be fine, but keep in mind operator[] is not read-only; it potentially adds a new element. You'd need to use the find() method, get the iterator, and dereference it yourself.
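For example, a read-only lookup could look like this (the fallback value is illustrative):

#include <map>
#include <string>

// find() never modifies the map; operator[] would insert a default-constructed
// value for a missing key (and does not even compile on a const map).
int lookup(const std::map<std::string, int>& m, const std::string& key) {
    auto it = m.find(key);
    return it != m.end() ? it->second : -1;   // -1 as an illustrative "not found"
}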
Unless the docs (ie, the ISO C++11 standard) say it's thread-safe (and they don't), then that's it. Period. It's not thread-safe.
There may be implementations of std::map that would allow this, but it's by no means portable.
Maps are often built on red-black trees or other self-balancing data structures, so a modification to the structure (such as inserting or deleting a key) can trigger rebalancing.
You should wrap read and write operations on the map with something like a mutex or semaphore, to ensure that synchronisation is done correctly.
I am trying to implement an LRU cache in C++. I would like to know the best design for implementing one. I know an LRU cache should provide find(), add an element, and remove an element, where remove evicts the least recently used element. What are the best ADTs to implement this?
For example: if I use a map with the element as value and a time counter as key, I can search in O(log n) time, inserting is O(n), deleting is O(log n).
One major issue with LRU caches is that there are few "const" operations; most will change the underlying representation (if only because they bump the element accessed).
This is of course very inconvenient, because it means it's not a traditional STL container, and therefore any idea of exhibiting iterators is quite complicated: when the iterator is dereferenced this is an access, which should modify the list we are iterating on... oh my.
And there are performance considerations, both in terms of speed and memory consumption.
It is unfortunate, but you'll need some way to organize your data in a queue (LRU) (with the possibility to remove elements from the middle), and this means your elements will have to be independent from one another. A std::list fits, of course, but it's more than you need. A singly-linked list is sufficient here, since you don't need to iterate the list backward (you just want a queue, after all).
However, one major drawback of those is their poor locality of reference; if you need more speed you'll need to provide your own custom (pool?) allocator for the nodes, so that they are kept as close together as possible. This will also alleviate heap fragmentation somewhat.
Next, you obviously need an index structure (for the cache bit). The most natural is to turn toward a hash map. std::tr1::unordered_map, std::unordered_map or boost::unordered_map are normally good-quality implementations; some should be available to you. They also allocate extra nodes for hash-collision handling; you might prefer other kinds of hash maps, so check out Wikipedia's article on the subject and read about the characteristics of the various implementation techniques.
Continuing, there is the (obvious) threading support. If you don't need thread support, it's fine; if you do, however, it's a bit more complicated:
As I said, there are few const operations on such a structure, thus you don't really need to differentiate read/write accesses
Internal locking is fine, but you might find that it doesn't play nice with your uses. The issue with internal locking is that it doesn't support the concept of a "transaction", since it relinquishes the lock between each call. If this is your case, transform your object into a mutex and provide a std::unique_ptr<Lock> lock() method, as sketched after this list (in debug, you can assert that the lock is taken at the entry point of each method)
There is (in locking strategies) the issue of reentrance, i.e. the ability to "relock" the mutex from within the same thread; check Boost.Thread for more information about the various locks and mutexes available
Finally, there is the issue of error reporting. Since it is expected that a cache may not be able to retrieve the data you put in, I would consider using an exception "poor taste". Consider either pointers (Value*) or Boost.Optional (boost::optional<Value&>). I would prefer Boost.Optional because its semantics are clear.
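As a minimal sketch of the "transform your object into a mutex" idea above (all names are illustrative), the cache can hand out a scoped lock so the caller groups several calls into one transaction:

#include <memory>
#include <mutex>

class Cache {
    std::mutex mutex_;
public:
    using Lock = std::lock_guard<std::mutex>;

    // the caller holds the returned lock for the duration of the "transaction"
    std::unique_ptr<Lock> lock() {
        return std::make_unique<Lock>(mutex_);
    }

    // ... find/insert/evict methods, which in a debug build could assert
    // that the lock is held on entry ...
};

Usage would be auto guard = cache.lock(); followed by several cache calls that together form one atomic unit.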
The best way to implement an LRU is to use the combination of a std::list and stdext::hash_map (if you want to use only the standard library, then std::map).
Store the data in the list so that the least recently used item is at the back, and use the map to point to the list items.
For "get" use the map to get the
list addr and retrieve the data
and move the current node to the
first(since this was used now) and update the map.
For "insert" remove the last element
from the list and add the new data
to the front and update the map.
This is the fastest you can get. If you are using a hash_map, almost all operations are done in O(1); if using std::map, they take O(log n) in all cases.
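Here is a minimal sketch of that design using only the standard library, std::list plus std::unordered_map (class and member names are illustrative; a string-to-int cache is assumed for brevity):

#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

class LruCache {
    using Item = std::pair<std::string, int>;
    std::size_t capacity_;
    std::list<Item> items_;                   // front = most recently used
    std::unordered_map<std::string, std::list<Item>::iterator> index_;
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    std::optional<int> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end())
            return std::nullopt;
        // splice moves the node to the front in O(1); iterators stay valid
        items_.splice(items_.begin(), items_, it->second);
        return it->second->second;
    }

    void put(const std::string& key, int value) {
        if (auto it = index_.find(key); it != index_.end()) {
            it->second->second = value;                   // update in place
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {                 // evict the LRU item
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(key, value);
        index_[key] = items_.begin();
    }
};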
A very good implementation is available here
This article describes a couple of C++ LRU cache implementations (one using STL, one using boost::bimap).
When you say priority, I think "heap", which naturally leads to increase-key and delete-min.
I would not make the cache visible to the outside world at all if I could avoid it. I'd just have a collection (of whatever) and handle the caching invisibly, adding and removing items as needed, but the external interface would be exactly that of the underlying collection.
As far as the implementation goes, a heap is probably the most obvious. It has complexities roughly similar to a map, but instead of building a tree from linked nodes, it arranges items in an array and the "links" are implicit based on array indices. This increases the storage density of your cache and improves locality in the "real" (physical) processor cache.
I suggest a heap and maybe a Fibonacci Heap
I'd go with a normal heap in C++.
With std::make_heap (guaranteed by the standard to be O(n)), std::pop_heap, and std::push_heap from <algorithm>, implementing it would be absolutely cake. You only have to worry about increase-key.
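A short sketch of those calls, using std::greater so the front of the vector is the minimum (delete-min); the values are illustrative:

#include <algorithm>
#include <functional>
#include <vector>

int main() {
    std::vector<int> h = {5, 2, 8, 1, 9};

    // O(n) heapify; with std::greater the smallest element sits at h.front()
    std::make_heap(h.begin(), h.end(), std::greater<>{});

    h.push_back(0);                                       // add an element...
    std::push_heap(h.begin(), h.end(), std::greater<>{}); // ...restore the heap, O(log n)

    std::pop_heap(h.begin(), h.end(), std::greater<>{});  // move the min to the back
    int smallest = h.back();                              // delete-min
    h.pop_back();
    (void)smallest;
}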