Unique membership FIFO container - c++

I need a first-in first-out queue which holds IDs with the catch that an ID should only be added if it is not already in the queue.
The best way I can think of is this:
#include <queue>
#include <set>

typedef unsigned ID;
struct UniqueFifo
{
private:
    std::set<ID> ids;     // keep track of what is already in
    std::queue<ID> fifo;  // keep in fifo order
public:
    void push(ID x) {
        // only add to queue if not already in it
        if (ids.find(x) == ids.end()) {
            fifo.push(x);
            ids.insert(x);
        }
    }
    // precondition: the queue is not empty
    ID pop() {
        // pop from queue
        ID x = fifo.front();
        fifo.pop();
        // and also remove from set
        ids.erase(x);
        return x;
    }
};
Is there a more elegant way of doing this with C++ STL containers?

Using a second data structure like that, optimised for insertion and searching, is the most scalable solution; but if the queue never gets particularly large, then it might be more efficient (and certainly simpler) to do a linear search in the queue itself. You'll need deque itself, rather than the queue adapter, in order to access the contents.
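A minimal sketch of the linear-search variant, assuming the queue stays short. The class name SmallUniqueFifo and the bool return value of push are illustrative choices, not part of the original code:

```cpp
#include <algorithm>
#include <deque>

typedef unsigned ID;

// Unique FIFO that searches the deque itself instead of keeping a
// separate index. push is O(n), which is acceptable for short queues.
class SmallUniqueFifo {
    std::deque<ID> fifo;
public:
    // Returns true if x was added, false if it was already queued.
    bool push(ID x) {
        if (std::find(fifo.begin(), fifo.end(), x) != fifo.end())
            return false;
        fifo.push_back(x);
        return true;
    }
    // Precondition: the queue is not empty.
    ID pop() {
        ID x = fifo.front();
        fifo.pop_front();
        return x;
    }
    bool empty() const { return fifo.empty(); }
};
```

Once an ID has been popped it can be pushed again, since only the current contents are searched.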
If it is large enough to justify a searchable "index", then consider using unordered_set if available, or some other hash-based set if not; that's likely to be faster, if you only require uniqueness and not any particular ordering.
Your code needs to insert the ID into the set as well as the queue. Conveniently, you can combine this with the check for uniqueness, so that only a single search is needed:
if (ids.insert(x).second) {
fifo.push(x);
}
You might also consider storing set iterators, rather than values, in the queue, to make erasing more efficient.
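As a rough sketch of that last suggestion, the queue below holds std::set iterators, so pop erases by iterator without searching again. This relies on std::set iterators staying valid across insertions and across erasure of other elements. The class name IterUniqueFifo and the size() member are illustrative additions:

```cpp
#include <cstddef>
#include <queue>
#include <set>

typedef unsigned ID;

class IterUniqueFifo {
    std::set<ID> ids;                         // owns the IDs
    std::queue<std::set<ID>::iterator> fifo;  // FIFO order, as iterators into ids
public:
    void push(ID x) {
        // insert() reports whether x was new, so one search is enough
        auto result = ids.insert(x);
        if (result.second)
            fifo.push(result.first);
    }
    // Precondition: the queue is not empty.
    ID pop() {
        auto it = fifo.front();
        fifo.pop();
        ID x = *it;      // copy the value before the element is destroyed
        ids.erase(it);   // erase by iterator: no second lookup
        return x;
    }
    std::size_t size() const { return fifo.size(); }
};
```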

Your solution is not bad, but you need to use a std::set<ID>, not a std::map (a map is used to map keys to values, while you only care about values here). Also consider using std::unordered_set if C++11 is available.

It's not bad at all. I can see other ways to do this, mostly using one container (because the only "problem" that I see is that you're using 2 containers and so have to make sure all the time that they are consistent with each other), but they are not more elegant and may cost even more.
IMHO keep this design (at least this interface), using a std::unordered_set instead of a std::set, until you can benchmark your performance. After that you may find there is no problem, or that the extra lookup on erasure costs too much for you. In that case you may try to keep a std::queue<std::pair<ID, std::set<ID>::const_iterator>> to ease the erasure (this benefits from the fact that std::set iterators are not invalidated by insertion, or by removal of other elements). Note that std::unordered_set iterators do not give the same guarantee: an insertion that triggers a rehash invalidates them, although pointers and references to the elements remain valid.
But do not do this unless you need better performance. Your code is simple and readable.

Related

Data structure for FIFO behaviour and fast lookup by value

So I am looking for a data structure which needs a FIFO behaviour but should also have a quick look up time by value.
In my current code I have some data duplication. I use a std::unordered_set and a std::queue to achieve the behaviour I want, but there's probably a better way of achieving this that I'm not thinking of at the moment. I have a function that adds my new entry to both the set and the queue when a new entry comes up. To search if an entry exists in the queue I use find() in the set. Lastly, I have a timer that is set off after an insertion to the queue. After a minute I get the entry at the front of the queue with queue.front(), then I use this value to erase from the set, and finally I do a pop on the queue.
This all works as expected and gives me both the FIFO behaviour and the constant time complexity for the lookup, but I have data duplication and I was wondering if there is a data structure (maybe something from Boost?) which does what I want without the duplication.
A solution is to use two containers: store the elements in an unordered set for fast lookup, and upon insertion, store an iterator to the element in a queue. When you pop the queue, erase the corresponding element from the set.
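A sketch of this two-container idea, with one deliberate change that is my assumption rather than part of the answer: the queue stores pointers to the set's elements instead of iterators, because an insertion that rehashes a std::unordered_set invalidates its iterators while pointers to elements stay valid. The class name LookupFifo and the std::string element type are illustrative:

```cpp
#include <queue>
#include <string>
#include <unordered_set>

class LookupFifo {
    std::unordered_set<std::string> items;  // single copy of each element
    std::queue<const std::string*> order;   // FIFO order, pointing into items
public:
    // Returns false if s is already queued.
    bool push(const std::string& s) {
        auto result = items.insert(s);
        if (!result.second)
            return false;
        order.push(&*result.first);  // address of the stored element
        return true;
    }
    bool contains(const std::string& s) const {
        return items.count(s) != 0;
    }
    // Precondition: the queue is not empty.
    std::string pop() {
        const std::string* p = order.front();
        order.pop();
        std::string value = *p;  // copy out before erasing the element
        items.erase(value);
        return value;
    }
    bool empty() const { return order.empty(); }
};
```

Each element is stored once; the queue only carries pointer-sized handles. Erasing by value on pop still costs one hash lookup.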
A more structured approach is to use a multi-index container. The standard library doesn't provide such, but boost does. More specifically, you could use a combination of hashed and sequence indices.
This answer mostly concerns corner cases of the problem as presented.
If your problem is a practical one, and you are able to store the elements in a std::vector, and if you have fewer than roughly 10-100 elements in the queue, then you could just use:
std::queue<T, std::vector<T> > q;
That is a queue using vector as the underlying container. When you have that small a number of elements (only 10-100), using advanced lookup methods is not worth it. Be aware that std::queue::pop() calls pop_front() on the underlying container, which std::vector does not provide, so for popping you would work on the vector directly (or use a std::deque).
You then only need to check for duplicates when you pop the queue, not on every insertion. Again, that might or might not be useful depending on your specific case. I can imagine cases where this method is superior, e.g. a webserver serving pages that gets a lot of hits to just one or a few pages. Then it might be faster to just add, say, 100,000 elements to the vector and then go and remove the duplicates all in one go when popping.
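One way to read "remove the duplicates all in one go" is sketched below. The function name drain_unique is hypothetical, and the temporary std::unordered_set is my choice for keeping first-seen order; the answer does not say how the duplicates should be removed:

```cpp
#include <unordered_set>
#include <vector>

// Empties `pending` and returns its elements in first-seen (FIFO) order,
// with later duplicates dropped. Pushing stays a plain push_back.
std::vector<int> drain_unique(std::vector<int>& pending) {
    std::vector<int> result;
    std::unordered_set<int> seen;
    for (int x : pending) {
        if (seen.insert(x).second)  // true only the first time x is seen
            result.push_back(x);
    }
    pending.clear();
    return result;
}
```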
How about defining your own data structure which can act as a BST (for lookups) and as a min heap which you can use to impose fifo?
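class node {
public:
    static int autoIncrement; // shared counter, defined after the class
    int order; // this will be auto-incremented to impose FIFO
    int data;
    node* left_Bst;
    node* right_Bst;
    node* left_Heap;
    node* right_Heap;
    node()
        : order(autoIncrement++), data(0),
          left_Bst(nullptr), right_Bst(nullptr),
          left_Heap(nullptr), right_Heap(nullptr) {}
};
int node::autoIncrement = 0; // a non-const static member needs an out-of-class definition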
By doing this you are basically creating two data structures sharing the same nodes. BST's partial order is imposed via data, and heap's can be maintained via order variable.
During an insertion you can traverse via BST pointers and insert your element if it doesn't exist already and also modify the heap pointers accordingly after insertion.

Best STL container for fast lookups

I need to store a list of integers, and very quickly determine if an integer is already in the list. No duplicates will be stored.
I won't be inserting or deleting any values, but I will be appending values (which might be implemented as inserting).
What is the best STL container class for this? I found std::multimap on Google, but that seems to require a key/value pair, which I don't have.
I'm fairly new to STL. Can I get some recommendations?
Instead of a map, you can use a set when the value and the key aren't separate.
Instead of a multiset/-map, you can use the non-multi version which doesn't store duplicates.
Instead of a set, you have the std::unordered_set as an alternative. It may be faster for your use case.
There are other, less generic, data structures that can be used to represent sets of integers, and some of those may be more efficient depending on the use case. But those other data structures aren't necessarily provided for you by the standard library.
But I'm not clear which have the fastest lookup.
Unordered set has better asymptotic complexity for lookups than the ordered one. Whether it is faster for your use case, you can find out by measuring.
not likely to exceed a thousand or so
In that case, asymptotic complexity is not necessarily relevant.
Especially for small-ish sets like this, a sorted vector can be quite efficient. Given that you "won't be inserting or deleting any values", the vector shouldn't have significant downsides either. The standard library doesn't provide a set container implemented internally using a sorted vector, but it does provide a vector container as well as all necessary algorithms.
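A short sketch of the sorted-vector approach using only standard algorithms. The helper names sorted_contains and sorted_insert_unique are illustrative:

```cpp
#include <algorithm>
#include <vector>

// Lookup in a vector that is kept sorted: O(log n) comparisons.
bool sorted_contains(const std::vector<int>& sorted, int value) {
    return std::binary_search(sorted.begin(), sorted.end(), value);
}

// Adds value at its sorted position unless it is already present.
// Returns true if the value was added. Insertion is O(n) because
// later elements have to be shifted.
bool sorted_insert_unique(std::vector<int>& sorted, int value) {
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (pos != sorted.end() && *pos == value)
        return false;
    sorted.insert(pos, value);
    return true;
}
```

If all values are known up front, an alternative is to push_back everything, then call std::sort once before the first lookup.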
I don't know how the containers compute hashes.
You can provide a hash function for the unordered container. By default it uses std::hash, which uses an implementation-defined hash function.
std::unordered_set<int> is a good choice for keeping track of duplicates of ints, since both insertion and lookup can be achieved in constant time on average.
Insertion
Inserting a new int into the collection can be achieved with the insert() member function:
std::unordered_set<int> s;
s.insert(7);
Lookup
Checking whether a given int is present in the collection can be done with the find() member function:
bool is_present(const std::unordered_set<int>& s, int value) {
return s.find(value) != s.end();
}

Datastructure for quick access with more than one key or with key and priority

Thanks to std::map and similar data structures, it's easy to do quick insertion, access and deletion of data elements based on a key.
Thanks to std::make_heap and its colleagues, it's easy to maintain a priority queue based on a value.
But very often, the algorithm needs a combination of both. For example, one has the following struct:
struct entry {
    int id;
    char name[20];
    double value;
};
The algorithm needs to quickly find and remove the entry with the highest value. That calls for a priority queue with std's heap functions. It also needs to quickly remove some elements based on name and/or id. That calls for a std::map.
When programming that kind of algorithm, I often end up just using a good data structure for the operation that is most needed (for example, priority access), and then use a linear search through that structure for the less frequently needed operation, for example removal by key.
But is it possible to implement that kind of algorithm maintaining quick access for priority and access over two keys?
One way is boost multi index.
Another is to create two data structures whose value is a shared_ptr<const entry> and who use a different ordering, then a wrapping class that ensures adding/removing occurs in both. When you want to edit you naturally have to remove then reinsert.
Boost's multi-index is more complex to set up, but claims faster performance as the two data structures are intertwined, causing better cache performance and less memory usage.
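A sketch of the second option (two standard containers sharing shared_ptr<const Entry> values behind one wrapper). Everything named here is illustrative: the wrapper EntryIndex, the comparator ByValueDesc, and the use of std::string for the name. Only the id is indexed as a key; a lookup by name would need a third container maintained the same way:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

struct Entry {
    int id;
    std::string name;
    double value;
};
using EntryPtr = std::shared_ptr<const Entry>;

// Orders by value, highest first; ties are broken by id so that
// distinct entries never compare equivalent.
struct ByValueDesc {
    bool operator()(const EntryPtr& a, const EntryPtr& b) const {
        if (a->value != b->value) return a->value > b->value;
        return a->id < b->id;
    }
};

class EntryIndex {
    std::map<int, EntryPtr> by_id;
    std::set<EntryPtr, ByValueDesc> by_value;
public:
    // Returns false if an entry with the same id already exists.
    bool add(Entry e) {
        if (by_id.count(e.id) != 0) return false;
        EntryPtr p = std::make_shared<Entry>(std::move(e));
        by_id.emplace(p->id, p);
        by_value.insert(p);
        return true;
    }
    bool remove(int id) {
        auto it = by_id.find(id);
        if (it == by_id.end()) return false;
        by_value.erase(it->second);  // keep both containers in step
        by_id.erase(it);
        return true;
    }
    // Removes and returns the entry with the highest value, or nullptr.
    EntryPtr pop_highest() {
        if (by_value.empty()) return nullptr;
        EntryPtr top = *by_value.begin();
        by_value.erase(by_value.begin());
        by_id.erase(top->id);
        return top;
    }
    std::size_t size() const { return by_id.size(); }
};
```

Because the stored entries are const, changing a value means calling remove and then add, as the answer notes.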

which std container to use in A* algorithm's openSet?

I'm implementing the A* algorithm using std::priority_queue for the openSet. At some point in the algorithm, as in the Wikipedia pseudo-code:
else if tentative_g_score < g_score[neighbor]
tentative_is_better := true
followed by
if tentative_is_better = true
came_from[neighbor] := current
g_score[neighbor] := tentative_g_score
f_score[neighbor] := g_score[neighbor] + h_score[neighbor]
means that one has to perform a search on the priority_queue and change a value of one of its elements, which is not possible (as far as I understood).
Also, on this line:
if neighbor not in openset
one cannot search on a priority-queue and so this if cannot be implemented on a priority_queue, which I solved by creating a std::set which only tell us which elements are on the openSet (so that when I add/remove one element to the openSet, I add/remove to both std::set and std::priority_queue).
So, I wonder how can I avoid the first problem, or which std::container should one really use for this particular (yet general A*) implementation.
More generically, I wonder which is an efficient approach to A* using std containers?
I implemented A* algorithm with the STL before and got roughly through the same situation.
I ended up just working with std::vector only, using standard algorithms like push_heap and pop_heap (which are what priority_queue uses) to keep them in order.
To be clear: you should implement it with vectors and use algorithms to manipulate the vectors and keep them in a good state. It's far easier and potentially more efficient than using some alternatives to do it that way.
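A rough sketch of an open set built on a plain std::vector with the heap algorithms. The names OpenNode, HigherF and OpenSet are illustrative, and the O(n) search plus std::make_heap in decrease_f is one simple way to restore the heap after a change, not necessarily what the answer's author did:

```cpp
#include <algorithm>
#include <vector>

struct OpenNode {
    int id;
    double f;  // f score: lower is better
};

// Comparator that puts the node with the lowest f at the heap front.
struct HigherF {
    bool operator()(const OpenNode& a, const OpenNode& b) const {
        return a.f > b.f;
    }
};

class OpenSet {
    std::vector<OpenNode> heap;
public:
    void push(OpenNode n) {
        heap.push_back(n);
        std::push_heap(heap.begin(), heap.end(), HigherF());
    }
    // Precondition: the set is not empty.
    OpenNode pop_best() {
        std::pop_heap(heap.begin(), heap.end(), HigherF());
        OpenNode n = heap.back();
        heap.pop_back();
        return n;
    }
    // Linear search: possible because the vector is directly accessible.
    bool contains(int id) const {
        return std::any_of(heap.begin(), heap.end(),
                           [id](const OpenNode& n) { return n.id == id; });
    }
    // Lowers the f score of an existing node, then rebuilds the heap.
    bool decrease_f(int id, double new_f) {
        auto it = std::find_if(heap.begin(), heap.end(),
                               [id](const OpenNode& n) { return n.id == id; });
        if (it == heap.end() || new_f >= it->f) return false;
        it->f = new_f;
        std::make_heap(heap.begin(), heap.end(), HigherF());
        return true;
    }
    bool empty() const { return heap.empty(); }
};
```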
Update:
Today I would certainly try some of the Boost containers, like these ones: http://www.boost.org/doc/libs/1_55_0/doc/html/heap.html But only if I'm allowed to use Boost (like for my own code for example).
You can solve this by relying on the algorithm's behavior. Use a standard priority_queue, but instead of the increase/decrease_key operations, insert a new node into the priority queue. Both entries for that node now live in the priority queue. The one with the better priority will be taken first, expanded, and added to the closed list. When the other entry, with the worse priority, is taken out later, the node is already closed and the entry is discarded.
Unfortunately, the std:: containers don't currently support the operations you require - what's really needed is an "indexed" priority queue that supports decrease/increase_key style operations.
One option is to roll your own container (based on an augmented binary heap) that does this. If this sounds like too much work, you can almost fake it by making use of an augmented std::set data structure - both options are discussed in more detail here.
As others have said, another option is to just remove the priority queue entirely and try to maintain a sorted std::vector. This approach will work for sure, and might require the least coding on your part, but it does have significant implications for the asymptotic scaling of the overall algorithm - it will no longer be possible to achieve the fast O(log(n)) updates of the queue while maintaining sorted order.
Hope this helps.
Without decrease_key, you can instead just re-add the node to the open set. Whenever you pop a node off the open set, check to see whether its key was greater than that node's current score; if so, continue without processing the node. That compromises the efficiency proof of A*, but in practice it isn't a serious issue.
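The re-add-and-skip idea can be sketched as follows. This is an illustration under my own assumptions: the graph is an adjacency list, the heuristic is passed in as a callable and is assumed admissible, and the function only returns the path cost. The names Graph, QueueItem and a_star_cost are hypothetical:

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// graph[u] lists (neighbor, edge cost) pairs.
using Graph = std::vector<std::vector<std::pair<int, double>>>;

struct QueueItem {
    double f;  // g + h at the time of insertion
    double g;  // g score at the time of insertion
    int node;
};
struct WorseF {
    bool operator()(const QueueItem& a, const QueueItem& b) const {
        return a.f > b.f;  // lowest f on top
    }
};

// Returns the cost of a cheapest path from start to goal, or infinity.
double a_star_cost(const Graph& graph, int start, int goal,
                   const std::function<double(int)>& h) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> g(graph.size(), inf);
    std::priority_queue<QueueItem, std::vector<QueueItem>, WorseF> open;
    g[start] = 0.0;
    open.push(QueueItem{h(start), 0.0, start});
    while (!open.empty()) {
        QueueItem item = open.top();
        open.pop();
        // Stale entry: a better score was recorded after this was pushed.
        if (item.g > g[item.node]) continue;
        if (item.node == goal) return g[goal];
        for (const auto& edge : graph[item.node]) {
            double tentative = g[item.node] + edge.second;
            if (tentative < g[edge.first]) {
                g[edge.first] = tentative;
                // No decrease_key: push a new entry and let the old one go stale.
                open.push(QueueItem{tentative + h(edge.first), tentative, edge.first});
            }
        }
    }
    return inf;
}
```

The queue can hold several entries per node, so its size is bounded by the number of edge relaxations rather than the number of nodes.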

C++ boost - Is there a container working like a queue with direct key access?

I was wondering about a queue-like container which has key access, like a map.
My goal is simple: I want a FIFO queue, but if I insert an element and an element with the same key is already in the queue, I want the new element to replace the one already in the queue. For example, a map ordered by insertion time would work.
If there is no container like that, do you think it can be implemented by using both a queue and a map ?
Boost multi-index provides this kind of container.
To implement it myself, I'd probably go for a map whose values consist of a linked list node plus a payload. The list node could be hand-rolled, or could be Boost intrusive.
Note that the main point of the queue adaptor is to hide most of the interface of Sequence, but you want to mess with the details it hides. So I think you should aim to reproduce the interface of queue (slightly modified with your altered semantics for push) rather than actually use it.
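A standard-library approximation of that idea, sketched with a std::list for the FIFO order and a std::unordered_map from key to list iterator (std::list iterators stay valid until their own element is erased). The class name KeyedFifo, the int key and the std::string payload are illustrative, and replacing in place (keeping the original queue position) is my reading of the question:

```cpp
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class KeyedFifo {
    typedef std::list<std::pair<int, std::string>> List;
    List items;                                      // FIFO order
    std::unordered_map<int, List::iterator> index;   // key -> position in items
public:
    // Appends a new element, or replaces the payload of the queued
    // element with the same key, keeping its position in the queue.
    void push(int key, std::string value) {
        auto found = index.find(key);
        if (found != index.end()) {
            found->second->second = std::move(value);
            return;
        }
        items.emplace_back(key, std::move(value));
        index[key] = std::prev(items.end());
    }
    // Precondition: the queue is not empty.
    std::pair<int, std::string> pop() {
        std::pair<int, std::string> front = std::move(items.front());
        index.erase(front.first);
        items.pop_front();
        return front;
    }
    bool empty() const { return items.empty(); }
};
```

Both push and pop are O(1) on average here, including the replacement case.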
Obviously what you want can be done simply with the queue-like container, but you would have to spend O(n) time on every insertion to determine if the element is already present. If you implement your queue based on something like std::vector you could use the binary search and basically speed up your insertion to O(log n) (that would still require O(n) operations when the memory reallocation is done).
If this is fine, just stick to it. The variant with additional container might give you a performance boost, but it's also likely to be error-prone to write and if the first solution is sufficient, just use it.
In the second scenario you might want to store your elements twice in different containers - the original queue and something like a map (or sometimes a hashmap may perform better). The map is used only to determine if the element is already present in the container or not - and if YES, you will have to update it in your queue.
Basically that gives us O(1) complexity for hashmap lookups (in real world this might get uglier because of the collisions - hashmaps aren't really good for determining element existence) and O(1) insertion time for the case when no update is required and O(n) insertion time for the case update is needed.
Based on the percentage of the actual update operations, the actual insertion performance may vary from O(1) to O(n), but this scheme will definitely outperform the first one if the number of updates is small enough.
Still, you have to insert your elements into two containers simultaneously, and the same should be done if the element is deleted, so I would think twice: "do I really need that performance boost?".
I see an easy way of doing this with a queue and optionally a map.
Define some sort of == operator for your elements.
Then simply have a queue and search for your element every time you want to insert it.
You could optimize this by having a map of element locations to elements instead of searching the queue every time.