Container for a stack of unique elements

Container for a stack of unique elements - c++

What I'm trying to do is to construct a stack which contains unique elements.
And if an element is pushed to which is already in stack the element is not pushed but the existing elemetn should be moved to the top of the stack, i.e.
ABCD + B > ACDB
I would like to here from you which container will be the best choice to have this functionality.
I decided to user stack adapter over list, because
list does provide constant time for element move
list is one of the natively supported containers for the stack.
The drawback of my choice is that I have to manually check for the duplicate elements.
P.S. My compiler is not so recent so please don't suggest unordered_set.

Basically you have to chose between constant time moving + long search, or constant time search + long moving.
It's hard to say which would be more time-consuming, but consider this:
You will have to search for if the element exists every time you try to add an element
You will not have to move elements every time, since obviously at some times you will be adding elements that are "new" for the container.
I'd suggest you to store elements and their stack positions in different containers. Store elements in a way that provides fast search, store stack positions in a way that provides fast movement. Connect both with pointers (so you can know which element is on which position, and which position holds which element <-- messy phrase, I know it!), and you will perform stuff rather fast.

From your requirements, it seems to me that the structure you want could be derived from a Max Heap.
If instead of storing just the item, you store a pair (generation, item), with generation coming from a monotonically increasing counter, then the "root" of the heap is always the last seen element (and the other elements do not really matter, do they ?).
Pop: typical pop operation on the heap (delete-max operation)
Push: modified operation, to account for uniqueness of "item" within the structure
look for "item", if found update its generation (increase-key operation)
if not, insert it (insert operation)
Given the number of elements (20), building the heap on a vector seems a natural choice.

Related

How do you update values in priority_queue, or is there another way to update keys in heaps in c++

I was going through Djikstra's algorithm when I noticed, I could update keys in heap(with n keys) in O(logn) time (last line in the pseudocode). How do I update keys in heaps in C++, is there any method in priority_queues to do this? Or do I have to write my own heap class to do achieve updates in O(logn) like this?
Edit 1:
Clarifying my need - for a binary heap with n elements -
1) Should insert new values and find & pop minimum values in O(logn)
2) Should update already present keys in O(logn)
I tried to come up with a way to implement this using make_heap, push_heap, pop_heap, and a custom function for update as John Ding suggested.
However I am facing a problem in making the function, I first need to find the location of the key in the heap. Doing this under O(logn) in a heap requires a lookup array for position of keys in heap, see here (I don't know of any other way). However these lookup tables won't be updated when I call push_heap or pop_heap.

You can optimize dijktra algorithm with priority_queue. It is implemented by a binary heap, where you can pop the top or push in a element in O(logN) time. However, due to the encapsulation of priority_queue， you cannot modify the key(more pricisely, decrease the key) of any element.
So our method is to push multiple elements into the heap regardless of whether we have multiple elements refering to the same node.
for example, when
Node N : distance = 30, GraphNode = A(where A refers to one node in the graph, while N is one node in the heap)
is already in the heap, then using the priority_queue cannot help you do such a operation when we try to relax Node N:
decrease_key_to(N, 20)
by decreasing key can make the heap always include less than N elements, but it's cannot be implemented by priority_queue
What we can do with it is to add another node in the heap:
Node N2 : distance = 20, GraphNode = A
push N2 into the heap
That's corresponding to priority_queue::push
So you may need to implement a binary heap supporting decrease_key yourself or find an implementation online, and store a table of pointers pointing to every element in a heap to know access elements through nodes in the graph.
As an extension, using Fibonacci heap can even make decrease_key faster, that's the ultimate level of Dijkstra, Haha :)
Problem of last version of my answer:
We cannot locate the element pushed in to the heap using push_heap.

In order to do this, you need more than the priority_queue provides: you need to know where in the underlying container the element to be updated is stored. In a binary heap, for example, you need to know the position of the element for which you want to change the priority. With this you can change the priority of the element and then restore the heap property in O(log n) by bubbling the modified element up or down.
For Dijkstra, if I remember correctly, something called Fibonacci heap is more efficient than a binary heap.

Unfortunately, std::priority_queue doesn't support updates of entries in the heap, but you may be able to invalidate entries in the heap and wait for them to percolate up to the root from where they can eventually be deleted. So instead of changing an existing entry, you invalidate it and insert another one with the new priority. Whether you can live with the consequences of having invalid entries filling up the heap, is for you to judge.
For an idea how this might work in practice, see here.

Which one to use? Vector or List

I have to store 10 elements of type card (user defined class). I cannot decide whether to go with vector or list. Following are the operations I would be performing on the structure:
Appending or Inserting at the end of the structure
(better to go with vector).
Ramdom access(element to be accessed can be at end, begining or any position in the structure) (again vector is a better choice).
To delete the random accessed element i.e. Erasing an element from begining or end or any position
(Vector only good for end positions, elsewhere list preferred).
Move element from one position to other such that the element is not swapped with the element at desired position but it gets
inserted within (List are much better here).
To move more than one elements in the same manner as point 4.
(Again I would prefer list)
So can you please guide me which one to pick.
Thanks a ton!

Sounds like you have done your research on vector and list, as you can see there are some conflicting requirements. One other thing to consider is perhaps the frequent of those operations. I.e how often are you expecting to be inserting or deleting from the middle of the collection. Another consideration is the size of the collection, 10 elements is a very small collection so copying 10 elements around isn't a big deal unless you are doing it very frequently. My default choice would be vector, but you can profile both to see which one performs better.

c++ heap with removing any element method

I am trying to implement my own heap with the method removing any number (not only the min or max) but I can't solve one problem. To write that removing function I need the pointers to the elements in my heap (to have O(logn) time of removing given element). But when I have tried do it this way:
vector<int*> ptr(n);
it of course did not work.
Maybe I should insert into my heap another class or structure containing int, but for now I would like to find any solution with int (because I have already implemented it using int)?

When you need to remove (or change the priority of) other objects than the root, a d-heap isn't necessarily the ideal data structure: the nodes keep changing their position and you need to keep track of various moves. It is doable, however. To use a heap like this you would return a handle to the newly inserted object which identifies some sort of node which stays put. Since the d-heap algorithm relies on the tree being a perfectly balanced tree, you effectively need to implement it using an array. Since these two requirements (using an array and having nodes stay put) are mutually exclusive you need to do both and have an index from the nodes into the array (so you can find the position of the object in the array) and a pointer from the array to the node (so you can update the node when the position changes). Almost certainly you don't want to move your nodes a lot, i.e. you rather accept finding the proper direction to move a nodes by searching multiple nodes, i.e. you want to use a d > 2.
There are alternative approach to implement a heap which are inherently nodes based. In particular Fibonacci heaps which yield for certain usage patterns a better amortized complexity than the usual O(ln(n)) complexity. However, they are somewhat harder to implement and the actual efficiency only pays off if you either need to change the priority of a node frequently or you have fairly large data sets.

A heap is a particular sort of data structure; the elements are stored in a binary tree, and there are well-established procedures for adding or removing elements. Many implementations use an array to hold the tree nodes, and removing an element involved moving log(n) elements around. Normally the way the array is used, the children of the node at array location n are stored at locations 2n and 2n+1; element 0 is left empty.
This Wikipedia page does a fine job of explaining the algorithms.

Difference between lists and arrays

It seems a list in lisp can use push to add another element to it, while an array can use vector-push-extend to do the same thing (if you use :adjustable t, except add an element at the end. Similarly, pop removes the first item in a list, while vector-pop removes the last item from a vector.
So what is the difference between a list and a vector in lisp?

a list is made of cons cells and nil. One has to move sequential through the list to access an element. Adding and removing an element at the front is very cheap. Operations at other list positions are more costly. Lists may have a space overhead, since each element is stored in a cons cell.
a vector is a one-dimensional array of a certain length. If an array is adjustable, it might change its size, but possibly the elements have to be copied. Adding an element is costly, when the array has to be adjusted. Adjustable arrays have space overhead, since arrays are usually not adjusted by increments of one. One can access all elements directly via an index.

A vector is tangible thing, but the list you're thinking of is a name for a way to view several loosely-connected but separate things.
The vector is like an egg carton—it's a box with some fixed number of slots, each of which may or may not have a thing inside of it. By contrast, what you're thinking of a list is more like taking a few individual shoes and tying their laces together. The "list" is really a view of one cons cell that holds a value—like a slot in the egg carton—and points to another cons cell, which also holds a value and points to another cons cell, which holds a value and possibly points to nothing else, making it the end of the list. What you think of as a list is not really one thing there, but rather it's a relationship between several distinct cons cells.
You are correct that there are some operations you can perform on both structures, but their time complexity is different. For a list, pushing an item onto the front does not actually modify anything about the existing list; instead, it means making a new cons cell to represent the new head of the list, and pointing that cons cell at the existing list as the new tail. That's an O(1) operation; it doesn't matter how long the list was, as putting a new cell in front of it always takes the same amount of time. Similarly, popping an item off the front of the list doesn't change the existing list; it just means shifting your view one cell over from the current head to see what was the second item as the new head.
By contrast, with a vector, pushing a new item onto the front requires first moving all the existing items over by one space to leave an empty space at the beginning—assuming there's enough space in the vector to hold all the existing items and one more. This shifting over has time complexity O(n), meaning that the more items there are in the vector, the longer it takes to shift them over and make room for the new item. Similarly, popping from the front of a vector requires shifting all the existing items but the first one down toward the front, so that what was the second becomes the first, and what was the third becomes the second, and so on. Again, that's of time complexity O(n).
It's sometimes possible to fiddle with the bookkeeping in a vector so that popping an item off the front doesn't require shifting all the existing items over; instead, just change the record of which item counts as the first. You can imagine that doing so has many challenges: indices used to access items need to be adjusted, the vector's capacity is no longer simple to interpret, and reallocating space for the vector to accommodate new items will now need to consider where to leave the newly-available empty space after copying the existing data over to new storage. The "deque" structure addresses these concerns.
Choosing between a list and a vector requires thinking about what you intend to do with it, and how much you know in advance about the nature and size of the content. They each sing in different scenarios, despite the apparent overlap in their purpose.
At the time I wrote the answer above, the question was asking about pushing and popping elements from the front of both a vector and a list. Operating on the end of a vector as vector-push and vector-pop do is much more similar to manipulating the head of a list than the distinction made in my comparison above. Depending on whether the vector's capacity can accommodate another element without reallocating, pushing and popping elements at the end of a vector take constant (O(1)) time.

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[] and a linked list .
stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?

Use STL vector. It provides an equally rich interface as list and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.

Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.

It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
List contains elements which have the address of a previous and next element stored in them. This means that you can INSERT or DELETE an element anywhere in the list with constant speed O(1) regardless of the list size. You also splice (insert another list) into the existing list anywhere with constant speed as well. The reason is that list only needs to change two pointers (the previous and next) for the element we are inserting into the list.
Lists are not good if you need random access. So if one plans to access nth element in the list - one has to traverse the list one by one - O(n) speed
- Vectors -
Vector contains elements in sequence, just like an array. This is very convenient for random access. Accessing the "nth" element in a vector is a simple pointer calculation (O(1) speed). Adding elements to a vector is, however, different. If one wants to add an element in the middle of a vector - all the elements that come after that element will have to be re allocated down to make room for the new entry. The speed will depend on the vector size and on the position of the new element. The worst case scenario is inserting an element at position 2 in a vector, the best one is appending a new element. Therefore - insert works with speed O(n), where "n" is the number of elements that need to be moved - not necessarily the size of a vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.

Prefer an std::vector over and array. Some advantages of vector are:
They allocate memory from the free space when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.

You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but you nor I can know until we profile it.
If you do, you'll find there is no overhead at all, when optimizations are turned on. The compiler inlines the function calls. There is a difference in memory performance. An array is statically allocated, while a vector dynamically allocates. A list allocates per node, which can throttle cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.

First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations which you would essntially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.

I am maintaining a fixed-length table of 10 entries. Each item is a
structure of like 4 fields. There will be insert, update and delete
operations, specified by numeric position. I am wondering which is the
best data structure to use to maintain this table of information:
Based on this description it seems like list might be the better choice since its O(1) when inserting and deleting in the middle of the data structure. Unfortunately you cannot use numeric positions when using lists to do inserts and deletes like you can for arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision of which structure may be best to use. This structure can later be changed if testing clearly shows its the wrong choice.
The questions you need to ask are three fold. The first is how often are you planning on doing deletes/inserts in the middle relative to random reads. The second is how important is using a numeric position compared to an iterator. Finally, is order in your structure important.
If the answer to the first question is random reads will be more prevalent than a vector/array will probably work well. Note iterating through a data structure is not considered a random read even if the operator[] notation is used. For the second question, if you absolutely require numeric position than a vector/array will be required even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally if order is unimportant you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
template <class T>
void erase(vector<T> & vect, int index) //note: vector cannot be const since you are changing vector
{
vect[index]= vect.back();//move the item in the back to the index
vect.pop_back(); //delete the item in the back
}
template <class T>
void insert(vector<T> & vect, int index, T value) //note: vector cannot be const since you are changing vector
{
vect.push_back(vect[index]);//insert the item at index to the back of the vector
vect[index] = value; //replace the item at index with value
}

I Believe it's as per your need if one needs more insert/to delete in starting or middle use list(doubly-linked internally) if one needs to access data randomly and addition to last element use array ( vector have dynamic allocation but if you require more operation as a sort, resize, etc use vector)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js