Is there a container working like a queue with direct key access?

I was wondering about a queue-like container that also has key access, like a map.
My goal is simple: I want a FIFO queue, but if I insert an element whose key is already in the queue, I want the new element to replace the one already there. For example, a map ordered by insertion time would work.
If there is no container like that, do you think it can be implemented by using both a queue and a map?

Boost.MultiIndex provides this kind of container.
To implement it myself, I'd probably go for a map whose values consist of a linked-list node plus a payload. The list node could be hand-rolled, or could come from Boost.Intrusive.
Note that the main point of the std::queue adaptor is to hide most of the interface of the underlying sequence, but you want to mess with exactly the details it hides. So I think you should aim to reproduce the interface of queue (slightly modified, with your altered semantics for push) rather than actually use it.
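For illustration, here is a minimal sketch of the Boost.MultiIndex approach: a sequenced index preserves FIFO order while a hashed unique index on the key gives direct access. The element type Item and its int key id are hypothetical, and whether a replacement keeps the old queue position or moves to the back is your design choice (this sketch keeps the position):

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <string>

struct Item {
    int id;              // the key
    std::string payload;
};

namespace bmi = boost::multi_index;

using ItemQueue = bmi::multi_index_container<
    Item,
    bmi::indexed_by<
        bmi::sequenced<>,                                      // FIFO order
        bmi::hashed_unique<bmi::member<Item, int, &Item::id>>  // key access
    >>;

// Push to the back; if the key already exists, replace that element in place.
void push_or_replace(ItemQueue &q, const Item &item) {
    auto &byKey = q.get<1>();
    auto it = byKey.find(item.id);
    if (it != byKey.end())
        byKey.replace(it, item);  // keeps the element's queue position
    else
        q.push_back(item);        // normal FIFO push via the sequenced index
}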

Obviously what you want can be done with just the queue-like container, but then you have to spend O(n) time on every insertion to determine whether the element is already present. If you base your queue on something like a sorted std::vector you could use binary search and bring the duplicate check down to O(log n) (insertion itself would still require O(n) operations to shift elements, and occasionally to reallocate memory).
If this is fine, just stick to it. The variant with an additional container might give you a performance boost, but it's also likely to be error-prone to write, so if the first solution is sufficient, just use it.
In the second scenario you store your elements twice, in different containers: the original queue and something like a map (sometimes a hash map may perform better). The map is used only to determine whether the element is already present in the container, and if it is, you have to update it in your queue.
Basically that gives you O(1) average complexity for the hash-map lookup (in the real world this may degrade because of collisions), O(1) insertion time when no update is required, and O(n) insertion time when an update is needed (to locate the element in the queue).
Depending on the percentage of actual update operations, the effective insertion cost varies between O(1) and O(n), but this scheme will definitely outperform the first one if the number of updates is small enough.
Still, you have to insert your elements into both containers simultaneously, and the same applies when an element is deleted, so I would think twice: "do I really need that performance boost?"

I see an easy way of doing this with a queue and, optionally, a map.
Define some sort of == operator for your elements.
Then simply keep a queue and search it for your element every time you want to insert.
You could optimize this by keeping a map from elements (or their keys) to their locations in the queue instead of searching the queue every time.
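A minimal sketch of the linear-search variant, assuming the element type provides operator== (the ReplacingQueue name is illustrative):

#include <algorithm>
#include <deque>

template <typename T>
class ReplacingQueue {
    std::deque<T> q_;
public:
    void push(const T &value) {
        auto it = std::find(q_.begin(), q_.end(), value);  // O(n) scan via ==
        if (it != q_.end())
            *it = value;          // replace in place, keeping queue position
        else
            q_.push_back(value);  // otherwise a normal FIFO push
    }
    T pop() {                     // precondition: !empty()
        T front = q_.front();
        q_.pop_front();
        return front;
    }
    bool empty() const { return q_.empty(); }
};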

Related

iterate ordered versus unordered containers

I want to know which data structures are more efficient for iterating through their elements: std::set and std::map, or std::unordered_set and std::unordered_map.
I searched through SO and I found this question. The answers either propose to copy the elements into a std::vector or to use Boost.Container, which IMHO doesn't answer my question.
My purpose is to keep a big number of unique elements in a container, which most of the time I want to iterate through. Insertions and removals are rarer. I want to avoid std::vector in combination with std::unique.
Let's consider set vs unordered_set.
The main difference here is the 'nature' of the iteration: traversing the set will give you the elements in order, while traversing a range in an unordered set will give you a bunch of values in no particular order.
Suppose you want to traverse a range [it1, it2]. If we exclude the lookup time needed to find the elements it1 and it2, there can be no direct mapping from one case to the other, since the elements in between are not guaranteed to be the same even if you've used the same elements to construct the container.
There are cases however where something like this has meaning, e.g. when you want to traverse a fixed number of elements (regardless of what they are) or when you need to traverse the whole container. In such cases you need to consider implementation mechanics:
Sets are usually implemented as red-black trees (a form of binary search tree). Like all binary search trees, they allow efficient in-order traversal (left, root, right) of their elements. That is, to traverse you pay the cost of pointer chasing (just like traversing a list).
Unordered sets on the other hand are hash tables, and to my knowledge the STL implementations use hashing with chaining. That means (at a very high level) that the structure is backed by a (contiguous) buffer where each element is the head of a chain (list) containing the elements. The way the elements are laid out across those chains (buckets) and across the buffer will affect the traversal time; however, you'll be chasing pointers once again, this time jumping across different lists as well. I don't think it'll differ significantly from the tree case, but it certainly won't be any better.
In any case micro tuning and benchmarking will give you the answer for your particular application.
The difference does not lie in the ordering or the lack of it, but in the backing container. If it's contiguous memory it should be fast to iterate over, due to the simple implementation of the iterators and to cache friendliness.
Unordered containers are usually stored as a vector of vectors (or something similar), while ordered containers are implemented using trees, though all of this is ultimately left to the implementation. That would suggest that iterating over the unordered version should be faster; however, I have seen implementations (which bent the rules a little, to be fair) with different behaviour.
Generally speaking, container performance is quite a complex topic and usually has to be tested in the actual application to get a reliable answer. There is plenty of implementation-defined stuff that might affect the performance. I'd go with hash_set if I had to go in blind. Copying into a vector might also turn out to be a good option.
EDIT: As #TonyD said in his comment, there is a rule that disallows invalidating iterators on insertion as long as max_load_factor() is not exceeded; this practically rules out backing containers which are contiguous in memory.
Thus, copying everything into a vector seems like an even more reasonable option. If you need to remove duplicates, a feasible option might be to use http://en.cppreference.com/w/cpp/algorithm/sort and then have dupes easily ignored. I have heard that using a vector plus sort to get a sorted array (or vector) is quite a common option when you need a container that has to be sorted and is iterated over more often than it is modified.
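As a minimal sketch of that idiom: copy into a vector, sort, and drop adjacent duplicates with std::unique plus erase:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> values = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
    std::sort(values.begin(), values.end());                 // O(n log n)
    values.erase(std::unique(values.begin(), values.end()),  // drop adjacent dupes
                 values.end());
    // values is now sorted and unique, and iteration is cache-friendly
}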
Iterating from fastest to slowest should be: set > map > unordered_set > unordered_map.
set is a little lighter than map, and both are ordered following the binary-tree rule, so they should be faster to iterate than the unordered_ containers.

Best data structure/container in C++ for insertion and deletion

I am looking for the best data structure for C++ in which insertion and deletion can take place very efficiently and fast.
Traversal should also be very easy for this data structure. Which one should I go with?
What about std::set in C++?
A linked list provides efficient insertion and deletion of arbitrary elements. Deletion here is deletion by iterator, not by value. Traversal is quite fast.
A deque provides efficient insertion and deletion only at the ends, but those are faster than for a linked list, and traversal is faster as well.
A set only makes sense if you want to find elements by their value, e.g. to remove them. Otherwise the overhead of checking for duplicates, as well as that of keeping things sorted, will be wasted.
It depends on what you want to put into this data structure. If the items are unordered, or you only care about their insertion order, list<> could be used. If you want them in sorted order, set<> or multiset<> (the latter allows multiple identical elements) could be an alternative.
list<> is typically a doubly-linked list, so insertion and deletion can be done in constant time, provided you know the position. Traversal over all elements is also fast, but accessing a specific element (either by value or by position) can be slow.
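A minimal sketch of those list<> properties; holding an iterator makes both insertion and deletion O(1):

#include <iostream>
#include <iterator>
#include <list>

int main() {
    std::list<int> items = {1, 2, 3, 4};

    auto it = items.insert(std::next(items.begin()), 42);  // O(1) insert at a known position
    items.erase(it);                                       // O(1) erase via the stored iterator

    for (int v : items)                                    // traversal over all elements
        std::cout << v << ' ';
}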
set<> and its family are typically binary trees, so insertion, deletion and searching for elements mostly take logarithmic time (when you know where to insert/delete, it can be constant time). Traversal over all elements is also fast.
(Note: boost and C++11 both have data structures based on hash-tables, which could also be an option)
I would say a linked list, depending on whether or not your deletions are specific and frequent.
It occurs to me that you need a tree.
I'm not sure about the exact structure (since you didn't provide detailed info), but if you can put your data into a binary tree, you can achieve decent speed at searching, deleting and inserting elements (O(log n) average and O(n) worst case).
Note that I'm talking about the data structure here, you can implement it in different ways.

Queue-like data structure with random access element removal

Is there a data structure like a queue which also supports removal of elements at arbitrary points? Enqueueing and dequeueing occur most frequently, but mid-queue element removal must be similar in speed terms, since there may be periods where that is the most common operation. Consistency of performance is more important than absolute speed. Time is more important than memory. Queue length is small, under 1,000 elements at absolute peak load. In case it's not obvious I'll state it explicitly: random insertion is not required.
I have tagged this C++ since that is my implementation language, but I'm not using (and don't want to use) any STL or Boost. Pure C or C++ only (I will convert C solutions to a C++ class).
Edit: I think what I want is a kind of dictionary that also has a queue interface (or a queue that also has a dictionary interface) so that I can do things like this:
Container.enqueue(myObjPtr1);
MyObj *myObjPtr2 = Container.dequeue();
Container.remove(myObjPtr3);
I think that a doubly-linked list is exactly what you want (assuming you do not want a priority queue):
Easy and fast adding elements to both ends
Easy and fast removal of elements from anywhere
You could use the std::list container, but (in your case) it is difficult to remove an element from the middle of the list if you only have a pointer (or reference) to the element (wrapped in the list's node type) but no iterator. If using iterators (e.g. storing them) is not an option, then implementing a doubly-linked list (even with an element counter) should be pretty easy. If you implement your own list, you can operate directly on pointers to the nodes (each of them holds pointers to both of its neighbours). If you do not want to use Boost or the STL this is probably the best option (and the simplest), and you have control over everything (you can even write your own block allocator for list nodes to speed things up).
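A minimal sketch of that hand-rolled approach, with no STL or Boost (the Node, MyObj and Queue names are just illustrative); holding the raw Node* returned by enqueue is enough for O(1) removal from anywhere:

struct MyObj { int value; };

struct Node {
    MyObj payload;
    Node *prev;
    Node *next;
};

struct Queue {
    Node *head = nullptr;
    Node *tail = nullptr;

    Node *enqueue(const MyObj &obj) {              // O(1) push at the tail
        Node *n = new Node{obj, tail, nullptr};
        if (tail) tail->next = n; else head = n;
        tail = n;
        return n;                                  // keep this for remove()
    }

    void remove(Node *n) {                         // O(1) unlink from anywhere
        if (n->prev) n->prev->next = n->next; else head = n->next;
        if (n->next) n->next->prev = n->prev; else tail = n->prev;
        delete n;
    }

    bool dequeue(MyObj &out) {                     // O(1) pop from the head
        if (!head) return false;
        out = head->payload;
        remove(head);
        return true;
    }
};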
One option is to use an order statistic tree, an augmented tree structure that supports O(log n) random access to each element, along with O(log n) insertion and deletion at arbitrary points. Internally, an order statistic tree is a balanced binary search tree with extra information associated with each node. As a result, lookups are slower than in a standard dynamic array, but insertions are much faster.
Hope this helps!
You can use a combination of a linked list and a hash table. In Java it is called a LinkedHashSet.
The idea is simple: have a linked list of elements, and also maintain a hash map of (key, node) pairs, where node is a pointer to the relevant node in the linked list, and key is the key representing this node.
Note that the basic implementation is a set, and some extra work will be needed to make this data structure allow dupes.
This data structure gives you both O(1) head/tail access and O(1) access to any element in the list (all on average, amortized).
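For illustration only, here is a minimal sketch of that structure using STL containers (the asker wants to avoid the STL, but the same layout can be hand-rolled); the LinkedHashQueue name is hypothetical, and keys are kept unique as in a set:

#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class LinkedHashQueue {
    using Entry = std::pair<Key, Value>;
    std::list<Entry> order_;                                       // FIFO order
    std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
public:
    void enqueue(const Key &k, const Value &v) {
        remove(k);                                   // keep keys unique (set semantics)
        order_.emplace_back(k, v);
        index_[k] = std::prev(order_.end());
    }
    bool dequeue(Value &out) {                       // O(1)
        if (order_.empty()) return false;
        out = order_.front().second;
        index_.erase(order_.front().first);
        order_.pop_front();
        return true;
    }
    void remove(const Key &k) {                      // O(1) on average
        auto it = index_.find(k);
        if (it == index_.end()) return;
        order_.erase(it->second);
        index_.erase(it);
    }
};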

std::list or std::multimap

Hey, right now I have a list of a struct that I made; I sort this list every time I add a new object, using the std::list sort method.
I want to know what would be faster: using a std::multimap for this, or std::list,
since I'm iterating over the whole list every frame (I am making a game).
I would like to hear your opinion on what I should use in this case.
std::multimap will probably be faster, as it is O(log n) per insertion, whereas an insert and sort of the list is O(n log n).
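A minimal sketch of the multimap approach, with hypothetical key and struct names; each insert lands in sorted position in O(log n), and the per-frame iteration visits elements in key order:

#include <map>
#include <string>
#include <utility>

struct GameObject { std::string name; };

std::multimap<int, GameObject> objects;  // sorted by some key, e.g. draw depth

void add(int depth, GameObject obj) {
    objects.emplace(depth, std::move(obj));  // O(log n); container stays sorted
}

// Per-frame iteration, in ascending key order:
// for (const auto &entry : objects) { /* use entry.second */ }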
Depending on your usage pattern, you might be better off with sorted vectors. If you insert a whole bunch of items at once and then do a bunch of reads -- i.e. reads and writes aren't interleaved -- then you'll have better performance with vector, std::sort, and std::binary_search.
You might consider using the lower_bound algorithm to find where to insert into your list. http://stdcxx.apache.org/doc/stdlibref/lower-bound.html
Edit: In light of Neil's comment, note that this will work with any sequence container (vector, deque, etc.)
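A minimal sketch of that idea with a vector (any sequence container works, as noted above); the insert_sorted helper is hypothetical:

#include <algorithm>
#include <vector>

std::vector<int> sorted;  // kept sorted at all times

void insert_sorted(int value) {
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), value);
    sorted.insert(pos, value);  // O(log n) search + O(n) shift for a vector
}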
If you do not need key/value pairs, std::set or std::multiset is probably better than using std::multimap.
Reference for std::set:
http://www.cplusplus.com/reference/stl/set/
Reference for std::multiset:
http://www.cplusplus.com/reference/stl/multiset/
Edit: (it seems this was unclear before)
It is in general better to use a container like std::(multi)set or std::(multi)map than to use std::list and sort it every time an element is inserted, because std::list does not perform very well when inserting elements in the middle of the container.
Generally speaking, iterating over a container is likely to take about as much time as iterating over another, so if you keep adding to a container and then iterating over it, it's mainly a question of picking a container that avoids constantly having to reallocate memory and inserts the way you want quickly.
Both list and multimap will avoid having to reallocate themselves simply from adding an element (like you could get with a vector), so it's primarily a question of how long it takes to insert. Adding to the end of a list will be O(1) while adding to a multimap will be O(log n). However, the multimap will insert the elements in sorted order, while if you want to have the list be sorted, you're going to have to either sort the list in O(n log n) or insert the element in a sorted manner with something like lower_bound which would be O(n). In either case, it will be far worse (in the worst case at least) to use the list.
Generally, if you're maintaining a container in sorted order and continually adding to it rather than creating it and sorting it once, sets and maps are more efficient since they're designed to be kept sorted. Of course, as always, if you really care about performance, profiling your specific application and seeing which works better is what you need to do. However, in this case, I'd say it's almost a guarantee that multimap will be faster (especially if you have many elements).

What is better, an STL list or an STL map for 20 entries, considering order of insertion is as important as search speed?

I have the following scenario. The implementation is required for a real-time application.
1) I need to store at most 20 entries in a container (STL map, STL list, etc.).
2) If a new entry comes and 20 entries are already present, I have to overwrite the oldest entry with the new entry.
Considering point 2, I feel that if the container is full (max 20 entries), 'list' is the best bet, as I can always remove the first entry in the list and add the new one at the end (push_back). However, search won't be as efficient.
For only 20 entries, does it really make a big difference in terms of search efficiency if I use a list in place of a map?
Also, considering the cost of insertion in a map, I feel I should go for a list?
Could you please tell what is a better bet for me ?
1) I need to store at most 20 entries in a container (STL map, STL list, etc.). 2) If a new entry comes and 20 entries are already present, I have to overwrite the oldest entry with the new entry.
This seems to me the job for boost::circular_buffer.
In general the term circular buffer refers to an area in memory which is used to store incoming data. When the buffer is filled, new data is written starting at the beginning of the buffer and overwriting the old.
The circular_buffer is an STL-compliant container. It is a kind of sequence similar to std::list or std::deque. It supports random access iterators, constant-time insert and erase operations at the beginning or the end of the buffer, and interoperability with std algorithms. The circular_buffer is especially designed to provide fixed-capacity storage. When its capacity is exhausted, newly inserted elements cause elements either at the beginning or at the end of the buffer (depending on which insert operation is used) to be overwritten.
The circular_buffer only allocates memory when created, when the capacity is adjusted explicitly, or as necessary to accommodate resizing or assign operations. On the other hand, there is also a circular_buffer_space_optimized available. It is an adaptor of the circular_buffer which does not allocate memory at once when created; rather, it allocates memory as needed.
For the fast search, I think that with just 20 elements (if their comparison isn't too complicated) you're OK with a "low-cost" container like this and a plain linear search; in my opinion it would be difficult to achieve better performance with other STL containers.
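A minimal sketch of the boost::circular_buffer suggestion, with capacity 20 and a linear search over the handful of entries:

#include <boost/circular_buffer.hpp>
#include <algorithm>
#include <iostream>

int main() {
    boost::circular_buffer<int> entries(20);  // fixed capacity of 20

    for (int i = 0; i < 25; ++i)
        entries.push_back(i);                 // the oldest entries 0..4 get overwritten

    // Linear search is fine for 20 elements.
    bool found = std::find(entries.begin(), entries.end(), 7) != entries.end();
    std::cout << (found ? "found" : "not found") << '\n';  // prints "found": 7 survived
}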
Maintain order of insertion, or allow fast searching: choose one.
std::map is not an option here because it doesn't maintain the order of insertion. Besides, it's an associative container. You should choose between a list, a deque and a vector. In terms of performance your best bet is a list, since you can pop an element off one end and push a new one onto the other without any shifting or performance penalty.
The cost of insertion into a map, just as a side note, isn't expensive at all: it's on the order of O(log n). Practically irrelevant in the case of 20 elements. The same holds for a std::set.
With only 20 elements, I would not worry much about which container you use. If you determine that the chosen container is in fact a detriment to the performance of your application, it should be relatively easy to swap it out for a more efficient one later.
That said, for a larger number of elements, std::deque would probably give you the best all-around efficiency for what you are trying to accomplish. Unlike std::vector, std::deque allows removal from the front without moving all of the other elements. Unlike std::list, std::deque allows random access to its elements.
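A minimal sketch of the deque-based approach (the kMaxEntries constant and add helper are hypothetical): keep at most 20 entries, evicting the oldest from the front when full:

#include <cstddef>
#include <deque>
#include <string>

std::deque<std::string> entries;
const std::size_t kMaxEntries = 20;  // the cap from the question

void add(const std::string &e) {
    if (entries.size() == kMaxEntries)
        entries.pop_front();         // overwrite-oldest semantics
    entries.push_back(e);
}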
You just need to implement a priority queue. An STL map doesn't work here.
It depends on the size of the elements.
I know from my own experience that for five integers, an unordered array searched with linear search is faster than a set, a list, or insertion sort plus binary search on an ordered array.
The big-O complexity of the unordered array may look much worse than that of any of the other options, but the constant factors hidden by the notation are so much smaller.
A list, set or map (anything that uses dynamic memory and is linked by pointers) will be dominated by cache misses, memory allocations and indirect reference penalties.
You need a Priority Queue implemented on an array.
See the Binary Heap for an implementation.
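As an illustration (using the STL rather than a hand-rolled array heap, since the question allows STL containers): std::priority_queue is an array-backed binary heap, and ordering entries by an insertion counter keeps the oldest on top for eviction. The Entry, Older and add names are hypothetical:

#include <queue>
#include <string>
#include <vector>

struct Entry {
    unsigned long seq;   // insertion counter: smaller means older
    std::string data;
};

struct Older {
    bool operator()(const Entry &a, const Entry &b) const {
        return a.seq > b.seq;  // min-heap on seq: oldest entry at the top
    }
};

std::priority_queue<Entry, std::vector<Entry>, Older> heap;
unsigned long next_seq = 0;

void add(const std::string &data) {
    if (heap.size() == 20)
        heap.pop();                      // evict the oldest entry
    heap.push(Entry{next_seq++, data});  // O(log n)
}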
Do you already know that this is a bottleneck?
My advice would be to first use whatever is most natural to read while programming, and to optimize only when you see that the performance is not what you need.
My suggestion would be to make a circular buffer. But that only works if "old" is determined by when it was inserted, and not some field.
If you need to have a proper LRU, then you should probably go and look at something like http://www.codeproject.com/KB/recipes/LRUCache.aspx?fid=1000025&df=90&mpp=25&noise=3&sort=Position&view=Quick&fr=15
But with 20 entries as your max, it will be very hard for you to find a complex algorithm that is actually faster than a trivial linear check of every element.