keeping track of changing pointers - c++

I have a red-black tree implementation that works fine. When a node is inserted into the tree, the insert() method returns to the caller a pointer to the node that was inserted. I store all such pointers in an STL vector.
The problem is that, during the operation of the RB tree, these pointers are sometimes invalidated. For instance, there is a method called during a rotateleft/right that copies the values of node A into the current node and then deletes node A. Well, I had a pointer to node A in that vector, and it is now invalid.
I thought about making a way to update the pointers in the vector as follows,
1) keep a multimap which maps node pointers to the vector indices that hold those pointers.
2) Before deleting a node, check this multimap to find all the spots in the vector that will be affected
3) iterate over the vector and change the old pointer to the new pointer
4) Update the key value in the multimap to reflect the new pointer as well.
Problem is, you can't update a key value of a map collection for obvious reasons. Also this seems like a horrible solution both for complexity and implementation reasons. Any ideas on how I can accomplish this dynamic updating of pointers?

It seems more reasonable to keep the data in some opaque data structure pointed to by the node, and to keep external pointers to these structures instead of to the nodes.
Basically, it means adding a level of indirection between the tree and the actual data.
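A minimal sketch of that indirection, assuming hypothetical `Payload` and `Node` types (neither comes from the question's code). The node owns its payload through a pointer, so the "copy node A into the current node, then delete A" step only moves ownership and the payload itself never changes address:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload type: external code stores Payload*, never Node*.
struct Payload {
    int key;
    std::string value;
};

// Hypothetical tree node that owns its payload through a pointer.
struct Node {
    std::unique_ptr<Payload> data;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Mimics "copy node A into the current node, then delete node A".
// Only ownership moves; the Payload object that A held stays where it is.
// The payload previously held by `current` is destroyed, which matches the
// case where `current` held the value being removed from the tree.
void absorb_and_delete(Node& current, Node* a) {
    assert(a != nullptr);
    current.data = std::move(a->data);
    delete a;
}
```

With this layout the vector would hold the `Payload*` values returned by insert(), and no multimap bookkeeping is needed for this particular case.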

I'm not sure if this is exactly what you're trying to do, but to keep track of items added to tree/heap data structures, the following has worked for me in the past:
Store two "index" vectors in addition to the underlying tree data:
std::vector<int> item_to_tree;
std::vector<int> tree_to_item;
So, to find the index in the tree of the ith item, use item_to_tree[i]. To find the item at a particular jth tree index, use tree_to_item[j]. This is similar to storing explicit pointers, as you've done, but by making use of indices you can essentially get a bi-directional map with O(1) lookup.
Obviously, within the tree operations, you have to make sure that the mappings stay consistent. I've not thought about this for an RB tree, but definitely for other tree-like structures this just adds some O(1) complexity to each operation.
In the case of the ith item "removed" from the tree, tree_to_item no longer contains the ith item index, and I usually set item_to_tree[i] = -1, or some "empty" flag.
Hope this helps.
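To make the bookkeeping concrete, here is a rough sketch of the two index vectors for an array-backed structure. The `IndexMaps` wrapper and its member functions are names I made up for illustration; the idea is that the tree code calls `swap_slots` every time it exchanges two slots, so both directions of the mapping stay in sync:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper holding the two mappings described above.
struct IndexMaps {
    std::vector<int> item_to_tree;  // item id -> slot in the tree array, -1 if removed
    std::vector<int> tree_to_item;  // slot in the tree array -> item id

    // Register a new item in the next free slot and return its id.
    int add() {
        int id = static_cast<int>(item_to_tree.size());
        item_to_tree.push_back(static_cast<int>(tree_to_item.size()));
        tree_to_item.push_back(id);
        return id;
    }

    // Call this whenever the tree swaps the contents of slots a and b.
    void swap_slots(int a, int b) {
        std::swap(tree_to_item[a], tree_to_item[b]);
        item_to_tree[tree_to_item[a]] = a;
        item_to_tree[tree_to_item[b]] = b;
    }

    // Drop the item that currently sits in the last slot.
    void remove_last() {
        assert(!tree_to_item.empty());
        item_to_tree[tree_to_item.back()] = -1;  // "empty" flag
        tree_to_item.pop_back();
    }
};
```

Each call is O(1), which is where the "adds some O(1) complexity to each operation" estimate comes from.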

Related

Should I use std::list or is there a better method?

I am currently coding in a 2D geometry editor in c++. I am having the user place nodes. Lines and arcs can be drawn by selecting 2 nodes.
Right now, I am storing the nodes in a std::deque container (same thing for the lines and arcs) because I would like to store the address of the node in each line/arc. This makes things very convenient coding-wise when I implement a feature to move a node. If I were to store the actual node inside each line/arc, then whenever a node moved I would have to iterate through the entire line and arc structure to find the node that I just moved and update its parameters. That isn't an option. Hence the need to store the address of the node inside each line/arc.
However, I am running into an issue when I need to delete a node. According to the reference manual, all pointers and references into a deque are invalidated when you erase an element (unless that element is at the beginning or the end; for the sake of discussion, I will not consider that case). This breaks erasing: once a node is erased, my lines/arcs reconnect themselves to different nodes or are not drawn at all, and the program eventually crashes.
Continuing to look online, I come across std::list which (from my understanding of reading the documentation) does not invalidate any pointers or references when one of the elements is erased. This seems to be a very nice solution to my problem.
However, I have been looking on Stack Overflow to see what the benefits/disadvantages are of using a list vs. a deque, and there seems to be more of a preference for a deque than for a list. It seems as though the list is slower to access than the deque. This is not good because I am not sure how many nodes a user would like to draw. For all I know, there could be 10,000+ nodes in the geometry, and if the user wants to move a node, I don't want the user to have to wait 30 sec for the program to iterate through all of the elements to find the node(s) to erase.
So on one hand, deques are a lot faster, but as soon as an element is removed, all of the pointers and references are invalidated. On the other hand, std::list allows me to erase whatever element I want without invalidating any of the other pointers and references, but is slower compared to a deque.
I am considering to switch to a list because even if the list is slower, if I can't erase an element without invalidating the pointers and references, then there isn't much of a benefit speed wise if the program doesn't work.
However, is using a list the best choice in my situation? Is there any way to use a deque? Or is there a third option that I haven't considered?
Edit:
I forgot to mention: one thing that I am not too fond of with lists is the inability to get an element's data directly (in std::deque and vector, I can use the at function to access elements). This isn't a huge deal breaker for my code, but it does make things convenient. For example, when a user selects a node in order to create a line/arc, the code iterates over the entire node list to find out which one was selected and, for the first selection, stores the index in a variable (called firstNodeIndex). For the second node, it does the same thing, but once both variables (firstNodeIndex and secondNodeIndex) hold valid numbers, the function for creating the line/arc is called, and it uses the two stored indexes to re-access the node list and grab the address of each node. If I were to use the list, I would have to store the addresses of the two nodes in variables and then add some logic to make sure that the two variables hold valid addresses.
Another alternate solution would be to reiterate through the entire node list again to grab the nodes that are selected (I would have a variable inside each node to indicate that it is selected). But I am afraid that this might not be a good idea given std::list limitations.
I am kind of in favor of my first way but I am open to change if need be or if there is a better method
So your problem is that you don't want your iterators invalidated when you insert or erase element, but you want your data structure to be fast.
A linked list is only slow when you have to iterate over all elements frequently. It does not take advantage of contiguous memory access the way vector or deque do. Linear search in a list is also slow.
I had similar situations. Here are some options:
Use list and try to avoid linear searches. See if the memory access speed of a linked list affects your performance significantly, and if it doesn't, use it.
Use map or set. Same cons as list except search, which is O(logn). Or you can use unordered versions if you don't care about sorting elements.
Use non-standard data structure like plf::colony. If you don't care about order of insertion, this is probably your best option.
Create your own deque-like data structure that does not invalidate iterators (using skipfields or storing free elements somewhere). I wouldn't recommend it since you will probably end up writing something like plf::colony anyway.
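As an illustration of the first option, here is a small sketch with std::list. The `Geometry`, `Node` and `Line` types are hypothetical stand-ins for the editor's classes. Lines keep raw `Node*` addresses, and nodes are referred to by list iterator, so erasing one node is O(1) in the list and leaves every other pointer and iterator valid:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

struct Node { double x; double y; };  // hypothetical editor node
struct Line { Node* a; Node* b; };    // a line stores the addresses of its end nodes

// Hypothetical container for the editor's geometry.
struct Geometry {
    using NodeHandle = std::list<Node>::iterator;
    std::list<Node> nodes;
    std::vector<Line> lines;

    NodeHandle add_node(double x, double y) {
        return nodes.insert(nodes.end(), Node{x, y});
    }

    void add_line(NodeHandle a, NodeHandle b) {
        assert(a != b);  // a line needs two distinct nodes
        lines.push_back(Line{&*a, &*b});
    }

    // Erase a node and every line attached to it.
    // Pointers and iterators to all other nodes remain valid.
    void erase_node(NodeHandle h) {
        Node* dead = &*h;
        lines.erase(std::remove_if(lines.begin(), lines.end(),
                                   [dead](const Line& l) {
                                       return l.a == dead || l.b == dead;
                                   }),
                    lines.end());
        nodes.erase(h);
    }
};
```

Keeping the `NodeHandle` of the selected nodes (instead of firstNodeIndex/secondNodeIndex) also avoids walking the list a second time when the line is created.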
A rule of thumb:
will I want to add and delete items at random?
set, list, map, multimap and unordered versions of same
will I want to be able to name individual items and find them quickly?
map, set, multimap and unordered versions of same
does the thing I am storing have mutable data, or is it more detailed than just its name (key)?
map, multimap, unordered versions thereof
do I need the items to stay in order?
yes: map, no: unordered_map

C++/STL Structure for Indexed Linked List (Indices in Hash Table)

I'm looking for a way to remember locations in a doubly-linked list (in hash tables or other data structures).
In C, I would add prev and next pointers to my struct. Then, I could store references to elements of my struct wherever I wanted, and refer to them later. I need only maintain these prev/next pointers to manipulate my linked list, and stored references to locations in the list will stay updated.
What is the C++ approach to this problem?
The end goal is a data structure which is sequenced but not ordered (i.e. no comparison function exists, but the elements are sequenced relative to each other based on where they are inserted). I need to cheaply insert, delete, and move objects as the structure grows. But I also need to cheaply look up each element by some key unrelated to the ordering, and to look up meaningful locations (like head, tail, and various checkpoints in the structure called slices). I need to be able to traverse the sequenced list after looking up a starting place by key or by slice.
Head and tail will be free. I was planning a hash table that maps the keys to list elements, and another hash table that maps slices to list elements.
I asked a more specific question related to this here:
Using Both Map and List for Same Objects
The conclusion I made was that I would need to maintain both a List and various Maps pointing to the same data to get the performance I need. But doing this by storing iterators in C++ seemed subpar. Instead it seemed easier to reimplement linked list (building it into my class) and using STL maps to point to data.
I was hoping for some input about which is a more fruitful route, or if there is some third plan that better meets my needs. My assumption is that the STL implementation of unordered_map is faster than anything I would implement, but I could match or beat the performance of list since I'm only using a subset of its functionality.
Thanks!
More precise description of my data/performance requirements:
Data will come in with a unique key. I will add it into a queue.
I'll need to update/move/remove/delete this data in O(1) based on its unique key.
I'll need to insert new data/read data based on metadata stored in other data structures.
I was speaking imprecisely when I said very large list above. The list will definitely fit into memory. Space is cheap enough that it is worth using other data structures to index this list.
I understand your requirements as being:
the data has a unique key
update/move/remove/delete this data in constant time, using its unique key
Given this, the best fit would be unordered_map: it works with a key and uses a hash table to access the elements. On average, insert, find, and update take constant time (thanks to the hash table), unless the hash function is inappropriate (in the worst case, if all elements yield the same hash value, you get linear time, as in a list, due to collisions).
This seems also to match your initial intention:
"Head and tail will be free. I was planning a hash table that maps the keys to list elements, and another hash table that maps slices to list elements."
Edit: If you also need to control the sequencing of elements independently of their key, you'd need to build a combined container, based on a list and an unordered_map which associates the key with an iterator to the element in the list. You'd then have to manage synchronisation, for example:
insert element: get iterator by inserting element into list, then add the iterator to the unordered_map using the element's key.
remove element: find iterator to element by searching for the key in the unordered_map, erase element in the list using this iterator, and finally erase the key in the unordered_map.
find element: find iterator to element by searching for the key in the unordered_map
sequential iteration: use the iterator to the begin of the list.
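The synchronisation steps above could be sketched roughly as follows. `KeyedSequence` and its member names are made up for illustration, and int keys with std::string values are just an assumption. Since std::list iterators stay valid across insertions, erasures of other elements and splice, they are safe to keep in the map:

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical combined container: a list for sequence, a hash map for lookup.
class KeyedSequence {
public:
    using Item = std::pair<int, std::string>;
    using Iter = std::list<Item>::iterator;

    // Insert at the tail; fails if the key already exists.
    bool push_back(int key, std::string value) {
        if (index_.count(key)) return false;
        index_.emplace(key, items_.insert(items_.end(), Item{key, std::move(value)}));
        return true;
    }

    // Insert in front of the element that has key `before`.
    bool insert_before(int before, int key, std::string value) {
        auto pos = index_.find(before);
        if (pos == index_.end() || index_.count(key)) return false;
        index_.emplace(key, items_.insert(pos->second, Item{key, std::move(value)}));
        return true;
    }

    // Erase from the list first, then drop the key from the map.
    bool erase(int key) {
        auto pos = index_.find(key);
        if (pos == index_.end()) return false;
        items_.erase(pos->second);
        index_.erase(pos);
        return true;
    }

    // splice relinks the element; the stored iterator stays valid.
    bool move_to_back(int key) {
        auto pos = index_.find(key);
        if (pos == index_.end()) return false;
        items_.splice(items_.end(), items_, pos->second);
        return true;
    }

    // Lookup by key; the key must be present.
    const std::string& at(int key) const {
        auto pos = index_.find(key);
        assert(pos != index_.end());
        return pos->second->second;
    }

    // Keys in sequence order (sequential iteration from the head).
    std::vector<int> keys() const {
        std::vector<int> out;
        for (const Item& item : items_) out.push_back(item.first);
        return out;
    }

private:
    std::list<Item> items_;
    std::unordered_map<int, Iter> index_;
};
```

All operations except `keys()` are O(1) on average. A second map from slice id to `Iter` could be added the same way for the checkpoints.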
I'd point you to the STL containers to browse, but when you write the words 'very large' (and I currently work in Big Data), everything changes.
Nobody usually gives you good advice on scalability, but here are some points.
What is 'very large' in your case? Does std::list fit your needs? Up to your third paragraph, everything looks suitable as long as the data is not too large. Does your structure fit in memory?
How well does your structure work with the memory manager? A simple C-like list with 'prev' and 'next' pointers has a serious disadvantage: every element is usually allocated separately from the memory manager. At large sizes this matters and leads to memory overhead.
What do you expect to use as an external reference to an element? If you use pointers, you lose the ability to optimize the layout of your structure. But you probably don't need that.
Actually, you definitely need to consider some kind of 'pool' management if the data is really large, and indices into such pools can be pretty good references if you modify your structure intensively.
Please think twice about what 'large' means. If you mean really large, you need a special solution, especially if your data is larger than your memory. If it is not that large, why not start with just std::list? Once you answer this question, your life will probably be much easier ;-).

Priority queue and Prim's Algorithm

I have gone through the C++ reference manual and am still unclear on how to use the priority_queue data structure in STL.
So, basically I have been trying to implement my own using heaps.
I am doing this for implementing Prim's algorithm.
Vector <int, int> pq;
This is my priority queue. The first field is the node and the second field the weight to the existing tree.
I plan to modify the values of weight in pq every time a new node is added to the tree by updating the weights of its neighbour nodes.
How do I access the individual elements of this vector? I also need to be able to delete elements at will.
Is this a good way to implement a priority queue? what if I want to add another field to the container, namely
Vector<int, int, int> MST
How would I access the third element? I want to store the resulting MST this way such that the first two fields represent the vertices forming the edge, and the third the weight.
It would also help if someone could tell me how to assign elements to this vector using push_back.
Also, would the conventional C++ STL priority queue help in this as I need to update the priority values each time a new element is added to the MST? Would it self-correct itself according to the priority when values are modified?
One other question, these Vectors, when I pass them to a function, and try to make changes, is it a pass by value or pass by reference - Or, are the changes reflected outside the function?
In Prim's algorithm, random access to the elements is not needed. You just need to skip any edge popped from the queue that connects two already-connected nodes and move on.
So the algorithm looks as follows:
1) choose a node N
2) add all edges from N to the PQ
3) pop a minimal cost edge from PQ
4) if it connects nodes which are already in the tree, skip it and go back to step 3
5) otherwise add this edge to the tree, call the new node N and go back to step 2.
After adding an edge, check whether the tree already has (number of graph nodes - 1) edges. If so, finish.
Note that the only operations on PQ are add_element and pop_minimum - thus std::priority_queue will work.
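Here is one way the steps above might look with std::priority_queue. It is only a sketch: the `Graph` and `Edge` aliases and the `prim_mst` name are my own, the graph is assumed to be undirected and connected, and node 0 is picked as the start. std::greater turns the default max-heap into a min-heap on the first tuple field, which is the weight:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

using Edge = std::tuple<int, int, int>;                       // (weight, from, to)
using Graph = std::vector<std::vector<std::pair<int, int>>>;  // adj[u] = {(v, weight)}

// Returns the MST edges as (weight, from, to), starting from node 0.
std::vector<Edge> prim_mst(const Graph& adj) {
    std::vector<Edge> mst;
    if (adj.empty()) return mst;

    std::vector<bool> in_tree(adj.size(), false);
    std::priority_queue<Edge, std::vector<Edge>, std::greater<Edge>> pq;

    // Step 2: mark n as connected and queue its edges to unconnected nodes.
    auto add_node = [&](int n) {
        in_tree[n] = true;
        for (const auto& nb : adj[n]) {
            assert(nb.first >= 0 && static_cast<std::size_t>(nb.first) < adj.size());
            if (!in_tree[nb.first]) pq.emplace(nb.second, n, nb.first);
        }
    };

    add_node(0);
    while (!pq.empty() && mst.size() + 1 < adj.size()) {
        Edge e = pq.top();  // step 3: cheapest queued edge
        pq.pop();
        if (in_tree[std::get<2>(e)]) continue;  // step 4: both ends already connected
        mst.push_back(e);                       // step 5: accept the edge
        add_node(std::get<2>(e));
    }
    return mst;
}
```

Stale entries are never updated in place. They just stay in the queue until they are popped and skipped, which is why no decrease-key operation is required.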
Firstly, std::vector<int,int> isn't valid - the second type argument is an (optional) allocator, and int is not an allocator. If you're using a different underlying container, please say what it is. I'll assume you want to work with std::vector for now.
Secondly, std::priority_queue doesn't support the operations you want (access and delete arbitrary elements), so you can't use that.
You can use the underlying vector directly, and the heap algorithms (std::make_heap etc.) to keep it in heap order:
random access will work (although it's not clear what you expect the index to be once your vector is in heap order)
deleting an arbitrary element will require erasing it from the vector and re-running make_heap, or you can implement your own siftDown
Oh, and you can make some value type to store in your vector, such as
std::vector<std::pair<int,int>>
for your first example, or perhaps more clearly:
struct Node {
int node;
int weight;
};
// ...
std::vector<Node>
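For the "use the vector plus the heap algorithms" route, a rough sketch could look like this. The helper names `push`, `pop_min` and `erase_at` are mine, and a std::pair of (weight, node) is used so that the default pair comparison orders by weight first. Erasing from the middle rebuilds the heap with std::make_heap, so it costs O(n) rather than O(log n):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Entry = std::pair<int, int>;  // (weight, node)

// The std heap algorithms build a max-heap with respect to the comparator,
// so std::greater gives a min-heap on weight.
void push(std::vector<Entry>& heap, Entry e) {
    heap.push_back(e);
    std::push_heap(heap.begin(), heap.end(), std::greater<Entry>());
}

Entry pop_min(std::vector<Entry>& heap) {
    assert(!heap.empty());
    std::pop_heap(heap.begin(), heap.end(), std::greater<Entry>());
    Entry e = heap.back();
    heap.pop_back();
    return e;
}

// Remove the element at an arbitrary position, then restore heap order.
void erase_at(std::vector<Entry>& heap, std::size_t pos) {
    assert(pos < heap.size());
    heap[pos] = heap.back();  // overwrite with the last element
    heap.pop_back();
    std::make_heap(heap.begin(), heap.end(), std::greater<Entry>());
}
```

Finding `pos` for a given node is still a linear scan unless you track positions separately, which is the part a hand-written siftDown with an index table would address.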

Is it possible to directly access a position (not a key) in a multimap

Question: How do you access a value at a specified position in a key range without loops?
The only possible way I know to acquire this data is by incrementing the iterator however many times the position is from the beginning or end of a key range.
edit The reason I am reluctant in using loops is to reduce processing time by getting the wanted value when the values position in an index is known.
As properly stated in the comments, you basically cannot do this on multimap. Or on map. Or on any container that does not support random access. The simple answer to the question "Why I cannot do this?" is "Because it is not in the interface".
The longer answer requires a minimal understanding of the implementation of different containers. Elements of, say, vector are stored in memory contiguously. Knowing the address of the i-th element, you can add k and obtain the address of the (i+k)-th element.
Maps (and multimaps) are different. To allow efficient associative access, they use some kind of tree as the underlying data structure. The simplest is the binary search tree. All nodes of the tree are allocated on the heap, so you don't know where they are before you actually access them, and you can reach a node only through other nodes.
What you can do is go through all the elements and store their addresses in a vector, so they can then be accessed "randomly". However, the positions recorded in this vector go stale once an element is added to or removed from the map. There is no magic data structure which gives you both efficient associative access and efficient random access.
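Both approaches can be sketched in a few lines. The function names and the int/std::string element types are assumptions for illustration. `nth_in_range` still walks the range (std::next is linear on bidirectional iterators), while `snapshot_range` pays the walk once and then offers O(1) positional access until the multimap is modified:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

using MM = std::multimap<int, std::string>;

// Walk to the n-th value of a key range. No explicit loop in user code,
// but std::next still advances one node at a time.
const std::string& nth_in_range(const MM& m, int key, std::size_t n) {
    auto range = m.equal_range(key);
    assert(n < static_cast<std::size_t>(std::distance(range.first, range.second)));
    return std::next(range.first, static_cast<std::ptrdiff_t>(n))->second;
}

// Record an iterator to every element of a key range, in order.
// Index i of the result is position i within the range at snapshot time.
std::vector<MM::const_iterator> snapshot_range(const MM& m, int key) {
    auto range = m.equal_range(key);
    std::vector<MM::const_iterator> out;
    for (auto it = range.first; it != range.second; ++it) out.push_back(it);
    return out;
}
```

The snapshot has to be rebuilt after any insertion or erasure that touches the key range, because the recorded positions no longer match.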

c++ heap with removing any element method

I am trying to implement my own heap with a method for removing any number (not only the min or max), but I can't solve one problem. To write that removal function I need pointers to the elements in my heap (to remove a given element in O(log n) time). But when I tried to do it this way:
vector<int*> ptr(n);
it of course did not work.
Maybe I should insert into my heap another class or structure containing int, but for now I would like to find any solution with int (because I have already implemented it using int)?
When you need to remove (or change the priority of) objects other than the root, a d-heap isn't necessarily the ideal data structure: the nodes keep changing their position and you need to keep track of the various moves. It is doable, however. To use a heap like this you would return a handle to the newly inserted object, identifying some sort of node which stays put. Since the d-heap algorithm relies on the tree being perfectly balanced, you effectively need to implement it using an array. Since these two requirements (using an array and having nodes stay put) are mutually exclusive, you need to do both: keep an index from each node into the array (so you can find the position of the object in the array) and a pointer from the array to the node (so you can update the node when the position changes). Almost certainly you don't want to move your nodes a lot, i.e. you would rather accept comparing more children to find the direction in which to move a node, i.e. you want to use d > 2.
There are alternative approaches to implementing a heap which are inherently node-based, in particular Fibonacci heaps, which for certain usage patterns yield a better amortized complexity than the usual O(log n). However, they are somewhat harder to implement, and the actual efficiency only pays off if you either need to change the priority of a node frequently or you have fairly large data sets.
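A rough sketch of the array-plus-handle arrangement, using a plain binary heap (d = 2) of int to keep it short. The class name `IndexedMinHeap` and its interface are my own invention. push() returns a handle that never changes, `pos_` maps each handle to its current array slot, and each slot remembers its handle so that `pos_` can be updated on every swap:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

const std::size_t kRemoved = static_cast<std::size_t>(-1);  // "no slot" marker

class IndexedMinHeap {
public:
    using Handle = std::size_t;

    Handle push(int value) {
        Handle h = pos_.size();
        pos_.push_back(heap_.size());
        heap_.push_back(Slot{value, h});
        sift_up(heap_.size() - 1);
        return h;
    }
    bool empty() const { return heap_.empty(); }
    bool contains(Handle h) const { return h < pos_.size() && pos_[h] != kRemoved; }
    int top() const { assert(!heap_.empty()); return heap_.front().value; }
    void pop() { assert(!heap_.empty()); remove(heap_.front().handle); }

    // Remove the element identified by h in O(log n).
    void remove(Handle h) {
        assert(contains(h));
        std::size_t i = pos_[h];
        swap_slots(i, heap_.size() - 1);  // move the victim to the end
        heap_.pop_back();
        pos_[h] = kRemoved;
        if (i < heap_.size()) {  // the element moved into slot i may go either way
            sift_up(i);
            sift_down(i);
        }
    }

private:
    struct Slot { int value; Handle handle; };

    void swap_slots(std::size_t a, std::size_t b) {
        std::swap(heap_[a], heap_[b]);
        pos_[heap_[a].handle] = a;
        pos_[heap_[b].handle] = b;
    }
    void sift_up(std::size_t i) {
        while (i > 0) {
            std::size_t p = (i - 1) / 2;
            if (heap_[p].value <= heap_[i].value) break;
            swap_slots(i, p);
            i = p;
        }
    }
    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t l = 2 * i + 1, r = l + 1, s = i;
            if (l < heap_.size() && heap_[l].value < heap_[s].value) s = l;
            if (r < heap_.size() && heap_[r].value < heap_[s].value) s = r;
            if (s == i) break;
            swap_slots(i, s);
            i = s;
        }
    }

    std::vector<Slot> heap_;        // array-backed tree, root at index 0
    std::vector<std::size_t> pos_;  // handle -> current index in heap_
};
```

Handles are never reused in this sketch, so `pos_` grows with the total number of insertions. A free list of handles would fix that at the cost of a little more code.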
A heap is a particular sort of data structure; the elements are stored in a binary tree, and there are well-established procedures for adding or removing elements. Many implementations use an array to hold the tree nodes, and removing an element involves moving O(log n) elements around. Normally, the way the array is used, the children of the node at array location n are stored at locations 2n and 2n+1; element 0 is left empty.
This Wikipedia page does a fine job of explaining the algorithms.
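The 1-based layout mentioned above boils down to three index formulas. As a small sketch (the function names are made up), here they are together with a check that an array with an unused element 0 satisfies the min-heap property:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index arithmetic for a binary heap stored from index 1 (index 0 unused).
inline std::size_t left_child(std::size_t n) { return 2 * n; }
inline std::size_t right_child(std::size_t n) { return 2 * n + 1; }
inline std::size_t parent(std::size_t n) {
    assert(n > 1);  // the root at index 1 has no parent
    return n / 2;
}

// True if a[1..a.size()-1] is a min-heap; a[0] is ignored.
bool is_min_heap(const std::vector<int>& a) {
    for (std::size_t n = 2; n < a.size(); ++n) {
        if (a[parent(n)] > a[n]) return false;
    }
    return true;
}
```

With a 0-based array the same relations become 2n+1 and 2n+2 for the children and (n-1)/2 for the parent, which is why some implementations prefer to leave slot 0 empty.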