So I am looking for a data structure which needs a FIFO behaviour but should also have a quick look up time by value.
In my current code I have some data duplication. I use a std::unordered_set and std::queue for achieving the behaviour I want but there's probably a better way of achieving this that I'm not thinking of at the moment. I have a function that adds my new entry to both the set and the queue when a new entry comes up. To search if an entry exists in the queue I use find() in the set. Laslty, I have a timer that is set off after an insertion to the queue. After a minute I get the entry in the front of the queue with queue.front(), then I use this value to erase from the set, and finally I do a pop on the queue.
This all works as expected and gives me both the FIFO behaviour and the constant time complexity for the look up but I have data duplication and I was wondering if there is a data structure (maybe something form boost?) which does what I want without the duplication.
Data structure for FIFO behaviour and fast lookup by value
A solution is to use two containers: Store the elements in an unordered set for fast lookup, and upon insertion, store iterator to the element in a queue. When you pop the queue, erase the corresponding element from the set.
A more structured approach is to use a multi-index container. The standard library doesn't provide such, but boost does. More specifically, you could use a combination of hashed and sequence indices.
This answer is mostly concerning corner cases of the problem as presented
If you problem is a practical one, and you are able store the elements with a std::vector - and if you have less than in the ballpark of some ~10-100 elements in the queue, then you could just use:
std::queue<T, std::vector<T> > q;
That is a queue using vector as the underlying container. When you have that small number of elements (only 10-100) then using advanced lookup methods is not worth it.
You then only needs to check for duplicates when you pop the queue not on every insertion. Again, that might or might not be usefull depending on your specific case. I can imagine cases where this method is superior. Eg. a webserver serving pages that gets a lot of hits to just one or a few pages. Then it might be faster to just add say 100,000 elements to the vector and then go and remove the duplicates all in one go when popping.
How about defining your own data structure which can act as a BST (for lookups) and as a min heap which you can use to impose fifo?
class node {
public:
static int autoIncrement = 0;
int order; // this will be auto-incremented to impose FIFO
int data;
node* left_Bst;
node* right_Bst;
node* left_Heap;
node* right_Heap;
node() {
order = autoIncrement;
autoIncrement++;
}
}
By doing this you are basically creating two data structures sharing the same nodes. BST's partial order is imposed via data, and heap's can be maintained via order variable.
During an insertion you can traverse via BST pointers and insert your element if it doesn't exist already and also modify the heap pointers accordingly after insertion.
Related
Thanks to std::map and similar data structures, it's easy to do quick insertion, access and deletion of data elements based on a key.
Thanks to std::make_heap and it's colleages, it's easy to maintain a priority queue based on a value.
But very often, the algorithm needs a combination of both. For example, one has the following struct:
struct entry{
int id;
char name[20];
double value;
}
The algorithm needs to quickly find and remove the entry with the highest value. That calls for a priority queue with std's heap functions. It also needs to quickly remove some elements based on name and/or id. That calls for a std::map.
When programming that kind of algorithms, I often end up just using a good datastructure for the operation that is most needed (for example, priority access), and then use a linear search through that structure for the lesser needed operation, for example removal of a key.
But is it possible to implement that kind of algorithm maintaining quick access for priority and access over two keys?
One way is boost multi index.
Another is to create two data structures whose value is a shared_ptr<const entry> and who use a different ordering, then a wrapping class that ensures adding/removing occurs in both. When you want to edit you naturally have to remove then reinsert.
Boost's multi-index is more complex to set up, but claims faster performance as the two data structures are intertwined, causing better cache performance and less memory usage.
Interview question: There is a stream of Integers that arrives at specified intervals (say every 20 sec). Which Container of STL would you use to store them so that the Integers look sorted? My reply was map/set when there is no duplicate or multimap/multiset when there is duplicate. Any better answer if exists?
Use a multiset if you want to preserve duplicates. If you don't want to preserve duplicates, use a set.
If it's only being updated every 20 seconds, it probably doesn't matter a whole lot (unless it goes for so long that the set of integers becomes tremendously huge).
If you had data coming in a lot faster, there are alternatives that might be worth considering. One would be to use a couple of vectors. As data arrives, just push it onto one of the vectors. When you need to do an in-order traversal, sort that newly arrived data, and merge with the other vector of existing (already-sorted data). That'll give you results in order, which you can then write out to another vector, and start the same cycle again.
The big advantage here is that you're dealing with contiguous data instead of individually allocated nodes. Even with a possibility of three vectors in use at a time, your total memory usage is likely to be about equal (or possibly even less than) that of using a set or multiset.
Another possibility to consider (that's a bit of a hybrid between the two) would be something like a B+ tree. This is still a tree, so you can do in-order insertions with logarithmic complexity, but you have all the data in the leaf nodes (which are fairly large) so you get at least a reasonable amount of contiguous access as well.
To maintain a sorted list of integers streaming I would use std::priority_queue with any underlying container (vector or deque depending on the particular use).
You can keep push() ing to the priority_queue and use top() and pop() to retrieve in the sorted order.
Answer should be std::set . std::map<key, value> has to consider when there is a pairs of data as <key, value> and it need to be sorted according to the value of key
In same way if you have to consider duplicates, use std::multiset and std::multimap according to type of data.
Is there a data structure like a queue which also supports removal of elements at arbitrary points? Enqueueing and dequeueing occur most frequently, but mid-queue element removal must be similar in speed terms since there may be periods where that is the most common operation. Consistency of performance is more important than absolute speed. Time is more important than memory. Queue length is small, under 1,000 elements at absolute peak load.In case it's not obvious I'll state it explicitly: random insertion is not required.
Have tagged C++ since that is my implementation language, but I'm not using (and don't want to use) any STL or Boost. Pure C or C++ only (I will convert C solutions to a C++ class.)
Edit: I think what I want is a kind of dictionary that also has a queue interface (or a queue that also has a dictionary interface) so that I can do things like this:
Container.enqueue(myObjPtr1);
MyObj *myObjPtr2 = Container.dequeue();
Container.remove(myObjPtr3);
I think that double-link list is exactly what you want (assuming you do not want a priority queue):
Easy and fast adding elements to both ends
Easy and fast removal of elements from anywhere
You can use std::list container, but (in your case) it is difficult to remove an element
from the middle of the list if you only have a pointer (or reference) to the element (wrapped in STL's list element), but
you do not have an iterator. If using iterators (e.g. storing them) is not an option - then implementing a double linked list (even with element counter) should be pretty easy. If you implement your own list - you can directly operate on pointers to elements (each of them contains pointers to both of its neighbours). If you do not want to use Boost or STL this is probably the best option (and the simplest), and you have control of everything (you can even write your own block allocator for list elements to speed up things).
One option is to use an order statistic tree, an augmented tree structure that supports O(log n) random access to each element, along with O(log n) insertion and deletion at arbitrary points. Internally, the order statistic tree is implemented as a balanced binary search treewith extra information associated with it. As a result, lookups are a slower than in a standard dynamic array, but the insertions are much faster.
Hope this helps!
You can use a combination of a linked list and a hash table. In java it is called a LinkedHashSet.
The idea is simple, have a linked list of elements, and also maintain a hash map of (key,nodes), where node is a pointer to the relevant node in the linked list, and key is the key representing this node.
Note that the basic implementation is a set, and some extra work will be needed to make this data structure allow dupes.
This data structure allows you both O(1) head/tail access, and both O(1) access to any element in the list. [all on average armotorized]
I am trying to implement my own heap with the method removing any number (not only the min or max) but I can't solve one problem. To write that removing function I need the pointers to the elements in my heap (to have O(logn) time of removing given element). But when I have tried do it this way:
vector<int*> ptr(n);
it of course did not work.
Maybe I should insert into my heap another class or structure containing int, but for now I would like to find any solution with int (because I have already implemented it using int)?
When you need to remove (or change the priority of) other objects than the root, a d-heap isn't necessarily the ideal data structure: the nodes keep changing their position and you need to keep track of various moves. It is doable, however. To use a heap like this you would return a handle to the newly inserted object which identifies some sort of node which stays put. Since the d-heap algorithm relies on the tree being a perfectly balanced tree, you effectively need to implement it using an array. Since these two requirements (using an array and having nodes stay put) are mutually exclusive you need to do both and have an index from the nodes into the array (so you can find the position of the object in the array) and a pointer from the array to the node (so you can update the node when the position changes). Almost certainly you don't want to move your nodes a lot, i.e. you rather accept finding the proper direction to move a nodes by searching multiple nodes, i.e. you want to use a d > 2.
There are alternative approach to implement a heap which are inherently nodes based. In particular Fibonacci heaps which yield for certain usage patterns a better amortized complexity than the usual O(ln(n)) complexity. However, they are somewhat harder to implement and the actual efficiency only pays off if you either need to change the priority of a node frequently or you have fairly large data sets.
A heap is a particular sort of data structure; the elements are stored in a binary tree, and there are well-established procedures for adding or removing elements. Many implementations use an array to hold the tree nodes, and removing an element involved moving log(n) elements around. Normally the way the array is used, the children of the node at array location n are stored at locations 2n and 2n+1; element 0 is left empty.
This Wikipedia page does a fine job of explaining the algorithms.
I was wonndering about a queue-like container but which has key-access, like a map.
My goal is simple : I want a FIFO queue, but, if I insert an element and an element with a given key is already in the queue, I want it the new element to replaced the one already in the queue. For example, a map ordered by insertion time would work .
If there is no container like that, do you think it can be implemented by using both a queue and a map ?
Boost multi-index provides this kind of container.
To implement it myself, I'd probably go for a map whose values consist of a linked list node plus a payload. The list node could be hand-rolled, or could be Boost intrusive.
Note that the main point of the queue adaptor is to hide most of the interface of Sequence, but you want to mess with the details it hides. So I think you should aim to reproduce the interface of queue (slightly modified with your altered semantics for push) rather than actually use it.
Obviously what you want can be done simply with the queue-like container, but you would have to spend O(n) time on every insertion to determine if the element is already present. If you implement your queue based on something like std::vector you could use the binary search and basically speed up your insertion to O(log n) (that would still require O(n) operations when the memory reallocation is done).
If this is fine, just stick to it. The variant with additional container might give you a performance boost, but it's also likely to be error-prone to write and if the first solution is sufficient, just use it.
In the second scenario you might want to store your elements twice in different containers - the original queue and something like a map (or sometimes a hashmap may perform better). The map is used only to determine if the element is already present in the container or not - and if YES, you will have to update it in your queue.
Basically that gives us O(1) complexity for hashmap lookups (in real world this might get uglier because of the collisions - hashmaps aren't really good for determining element existence) and O(1) insertion time for the case when no update is required and O(n) insertion time for the case update is needed.
Based on the percentage of the actual update operations, the actual insertion performance may vary from O(1) to O(n), but this scheme will definitely outperform the first one if the number of updates is small enough.
Still, you have to insert your elements in two containers simultaneosly and the same should be done if the element is deleted and I would think twice "do I really need that performance boost?".
I see easy way of doing this with a queue and optionally a map.
Define some sort of == operator for your elements.
Then simply have a queue and search for your element every time you want to insert it.
You could optimize this by having a map of element locations to elements instead of searching the queue every time.