Loop in std::set by index - c++

I'm making a program that handles dynamic graphs. Nodes and arcs are two classes; they are stored in arrays in the graph object and are all indexed by custom ids (which are that item's position in the array).
Each arc has the ids of the 2 nodes it connects; each node has a set of the ids of all arcs it's connected to.
An arc's destructor removes its id from the arc sets of the two nodes it connected.
Now I'm writing the node's destructor. It should destroy each of its arcs until its set is empty. I cannot iterate through the set with an iterator, since at every step the arc's destructor removes its own id from the very set I'm iterating over.
Hence I'd need to keep accessing the last element until the set is empty; but std::set does not allow indexing like arrays and vectors, and it doesn't have a back() like lists and stacks. How can I do that?
relevant code:
graph::arc::~arc()
{
    owner->list_node[n1]->remove_arc(id);
    owner->list_node[n2]->remove_arc(id);
    owner->list_arc[id] = nullptr;
}
graph::node::~node()
{
    while (!list_arc.empty())
    {
        owner->remove_arc(list_arc[list_arc.size()-1]); // invalid, that's roughly what I want to achieve
    }
    owner->list_node[id] = nullptr;
}
Notes
owner is the graph object. owner->list_(node/arc) holds the actual pointers. Each item's id is equal to its position in graph's list.

This feels like an error-prone cleanup strategy, but an improvement would probably require rewriting significantly more than what is provided in the question. (I suspect that shared and weak pointers could simplify things.) Since I lack enough information to suggest a better data structure:
For a set, it's easier to access the first element, *list_arc.begin(), than the last. (Not a huge difference, but still easier.)
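A minimal sketch of what the node destructor could look like with that approach, assuming (as the question describes) that owner->remove_arc(id) destroys the arc, whose destructor in turn erases that id from this node's list_arc:
graph::node::~node()
{
    while (!list_arc.empty())
    {
        // Destroy whichever arc happens to be first; its destructor erases
        // its id from list_arc, so the set shrinks on every iteration.
        owner->remove_arc(*list_arc.begin());
    }
    owner->list_node[id] = nullptr;
}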

Related

C++ How to update member value via map value pointers

I have an unusual graph structure that's made up of a few classes, and I'm trying to set a boolean member value in one of them for the sake of traversal. Let's say the classes are Graph, Node, and Edge. Graph holds an unordered map with string labels as keys and Nodes as values. The graph has bounded degree, so fixed sized arrays of pointers to Edges are kept at each node, and each Edge also has pointers to the Nodes at each end.
My aim is to visit every Edge once, so I maintain a boolean 'marked' flag inside every Edge initially set to false. Since the map in the Graph lets me iterate over Nodes, I wish to iterate over all Edges from every Node and mark each to avoid repeated visits from opposite ends. However, I find that the marks are failing to be recorded and can't seem to get it to work.
My iteration code looks like this:
for (auto it = nodeMap.begin(); it != nodeMap.end(); ++it){
    Node* node = &it->second;
    for (size_t i = 0; i < node->EdgeArray.size(); i++){
        if (node->EdgeArray[i]){
            Edge& edge = *(node->EdgeArray[i]);
            if (edge.getMark()) continue;
            [...do needed processing...]
            edge.setMark(true);
        }
    }
}
I am more comfortable with pointers than with references, so my original version had 'edge' as a pointer into EdgeArray without the dereferencing. However, some digging led me to understand that passing by reference is how you effect changes on the caller's values. My suspicion was that some similar tweak is needed here, but in this case all of the iteration occurs in a method of the Graph class where the nodeMap is stored. I've tried basically every variation of pointers (dereferenced or not) and references I could think of, but can't get the marked values to persist outside the loop. I.e., if I add a print that depends on the second if conditional, I never see any output from it.
If your previous version worked, have you tried replacing only edge.setMark(true) with node->EdgeArray[i]->setMark(true)?

Copy elements from vector based on condition C++

I'm using C++ to create Hopcroft's algorithm for DFA Minimization.
Part of Hopcroft's algorithm is to - initially - divide two sets (P with accept and non-accept states and Q with non-accept states only). I already have group P, and from P I'm trying to extract Q. I'm using the following code to do it:
for (int i = 0; i < groupP.size(); i++)
    if (groupP[i]->final)
        groupQ.push_back(groupP[i]);
in which groupP and groupQ are:
vector<node*> groupQ;
vector<node*> groupP;
and node is a structure that I've created to represent a node of my automata. It's guaranteed that the boolean attribute "final" is already correctly set (false for non-final states, true for final states).
Finally, my question is: is it correct to copy one element from a vector to another by doing what I've done? If I modify the content of a copied element from groupP, will this same element be modified in groupQ as well?
Right now, you have vectors of pointers. When you copy from one vector to another, you're copying the pointers, not the elements themselves.
Since you have two pointers referring to the same node, any modification made to a node will be visible in the other group--i.e., if you make a change to groupP[i]->foo, then the same change will be visible in groupQ[j]->foo (provided that groupP[i] is one of the elements you copied from groupP to groupQ).
If you don't want that, you have a couple of choices. One would be to leave groupP and groupQ in the same vector, but partition the vector based on the state of an element's final member:
auto P_end = std::partition(groupP.begin(), groupP.end(),
                            [](node* n) { return n->final; });
Then [groupP.begin(), P_end) is groupP (i.e., final==true) and [P_end, groupP.end()) is groupQ (i.e., final==false).
This moves the pointers around (and gives you an iterator so you know the dividing line between the two) so you have exactly one pointer to each element, but they're separated into the two relevant groups.
As a final possibility, you might want to actually copy elements from groupP to groupQ, creating a new element in the process, so that after the copy each copied item exists in two places--one element in groupP and one in groupQ. The two are separate: either one can be modified, but a modification to one has no effect on the other.
The most obvious way to achieve that would be to just use vectors of nodes:
vector<node> groupQ;
vector<node> groupP;
This way, when you copy from one group to the other, you're copying the nodes themselves rather than pointers to nodes, so each copy creates a new, independent node with the same value as an existing node.
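A minimal sketch of the value-based version, assuming node is copyable and keeping the same selection loop as in the question:
std::vector<node> groupP;  // filled elsewhere
std::vector<node> groupQ;

for (size_t i = 0; i < groupP.size(); i++)
    if (groupP[i].final)
        groupQ.push_back(groupP[i]);  // copies the node itself

// Modifying groupQ.back() now leaves the corresponding groupP element untouched.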
You could use std::copy_if which does the same thing:
std::copy_if(groupP.cbegin(), groupP.cend(),
std::back_inserter(groupQ),
[](node* n){ return n->final; });
Since you are manipulating pointers, the elements themselves are shared, so modifying a node through one of the containers can be seen from the other.
Note that manipulating raw pointers like you are doing is very error prone, and you may want to use shared pointers for instance.
Edit: Adding missing std::back_inserter.

Graph data structure memory management

I would like to implement a custom graph data structure for my project and I had a question about proper memory management.
Essentially, the data structure will contain nodes that have two vectors: one for edges coming into the node and one for edges coming out of the node (no looped edges). The graph is connected. The graph will also contain one 'entry' node that will have no edges coming into it. An edge is simply a pointer to a node.
My question here is: What would be the best method of clearing up memory for this type of data structure? I understand how to do it if there were only one entry edge (at which point this structure degenerates to an n-ary tree), but I'm not sure what to do in the case where there are multiple nodes that have edges going into a single node. I can't just call delete from an arbitrary entry node because this will likely result in 'double free' bugs later on.
For example, suppose I had this subgraph:
C <-- B
^
|
A
If I were to call delete from node B, I would remove the memory allocated for C, but A would still have a pointer to it. So if I wanted to clear all the nodes A had connections to, I would get a double free error.
When you remove a node, you will need to perform a search to figure out which nodes are still connected to the entry node. If you end up with more than one connected group, you will need to figure out which one of them contains the entry node and remove all the others.
No greedy (local) algorithm for this can exist, which can be shown by a simple thought experiment:
Let A, B be subgraphs connected only through the node n, which shall be removed. We are left with two unconnected subgraphs. There is no way of knowing (without a whole bunch of state per node) if we have just removed the only route to the entry node for A or B. And, it is necessary to figure that out, so that the appropriate choice of removing either A or B can be made.
Even if every node stored every single route to the entry node, it would mean you have to clean up all routes in all nodes whenever you remove a single node.
Solution Sketch
Let us talk about a graphical representation of what we need to do:
First, Color the node that is being deleted black. Then perform the following for every node we encounter:
For uncolored nodes:
If the node we came from is black, give this node a new color
If the node we came from is colored, give this node the same color
Travel through every outgoing edge
For colored nodes:
If the node we came from is black, just return
If the node we came from is the same color, just return
If the node we came from has a different color, merge the two colors (e.g. by remembering that green and blue are the same, or by painting every green node blue)
Travel through every outgoing edge
At the end we will know which connected components will exist after we delete the current node. All connected components (plus our original to be deleted node) which do not contain the entry node must be deleted (Note: This may delete every single node, if our to-be-deleted node was the entry node...)
Implementation
You will need a data structure like the following:
struct cleanup {
    vector<set<node*>> colors;
    node* to_be_deleted;
    size_t entry_component;
};
The index into the vector of sets will be your "color". The "color black" will be represented by usage of to_be_deleted. Finally, the entry_component will contain the index of the color that has the entry node.
Now, the previous algorithm can be implemented. There are quite a few things to consider, and the implementation may end up being different, depending on what kind of support structures you already keep for other operations.
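What the coloring ultimately computes is which nodes remain connected to the entry node once the target is removed, so a compact way to get the same end state is a plain reachability sweep from the entry node. A minimal sketch under simplified, hypothetical assumptions: each Node keeps only a vector of outgoing Node* edges, and the Graph owns every node in a std::set:
#include <algorithm>
#include <set>
#include <stack>
#include <vector>

struct Node {
    std::vector<Node*> out;  // outgoing edges
};

struct Graph {
    std::set<Node*> nodes;   // the graph owns every node
    Node* entry = nullptr;

    // Remove 'victim' plus everything that is no longer reachable
    // from the entry node afterwards.
    void remove(Node* victim) {
        nodes.erase(victim);

        // Flood fill from the entry node, pretending 'victim' is already gone.
        std::set<Node*> reachable;
        if (victim != entry) {
            std::stack<Node*> work;
            work.push(entry);
            reachable.insert(entry);
            while (!work.empty()) {
                Node* n = work.top(); work.pop();
                for (Node* next : n->out)
                    if (next != victim && reachable.insert(next).second)
                        work.push(next);
            }
        } else {
            entry = nullptr;  // deleting the entry node empties the graph
        }

        // Survivors drop their edges into the doomed region...
        for (Node* n : reachable)
            n->out.erase(std::remove_if(n->out.begin(), n->out.end(),
                             [&](Node* t) { return !reachable.count(t); }),
                         n->out.end());

        // ...and everything unreachable (plus the victim) is freed exactly once.
        for (auto it = nodes.begin(); it != nodes.end(); ) {
            if (!reachable.count(*it)) { delete *it; it = nodes.erase(it); }
            else { ++it; }
        }
        delete victim;
    }
};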
The answer depends on the complexity of the graph:
If the graph is a tree, each parent can own its children and delete them in its destructor.
If the graph is a directed acyclic graph, an easy and performant way to handle it is to do reference counting on the nodes (see the sketch after this list).
If the graph can be cyclic, you are out of luck. You will need to keep track of each and every node in your graph, and then do garbage collection. Depending on your use case, you can either do the collection by
cleaning up everything when you are done with the complete graph, or by
repeatedly marking all connected nodes and cleaning up all the unreachable ones.
If there is any possibility to get away with option 1 or 2 (possibly tweaking the problem to ensure that the graph fulfills the constraint), you should do so; option 3 implies significant overheads in terms of code complexity and runtime.
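A minimal sketch of option 2, reference counting on a DAG via std::shared_ptr (the node layout is hypothetical): each edge owns its target, so a node is freed as soon as the last edge into it, and the external handle to it, go away. This only works because a DAG has no cycles; a cycle would keep its members alive forever.
#include <memory>
#include <vector>

struct Node {
    int value = 0;
    std::vector<std::shared_ptr<Node>> out;  // each edge owns its target
};

int main() {
    auto entry  = std::make_shared<Node>();
    auto a      = std::make_shared<Node>();
    auto shared = std::make_shared<Node>();

    entry->out = {a, shared};  // two edges lead into 'shared' ...
    a->out     = {shared};     // ... which is fine, its use count just grows

    a.reset();                 // drop the local handles; the graph keeps
    shared.reset();            // both nodes alive through 'entry'

    entry.reset();             // releasing the entry node frees the whole DAG,
                               // each node exactly once
}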
There are a couple of ways. One way is to make your nodes know which other nodes have edges to them. So, if you delete C from B, C will need to remove A's edge to it. Then, when you later remove/delete A, it won't try to delete C.
std::shared_ptr or some other type of reference counting may also work for you.
Here's a simple way to avoiding memory problems when implementing a graph: Don't use pointers to represent edges.
Instead, give each node a unique ID number (an incrementing integer counter will suffice). Keep a global unordered_map<int, shared_ptr<Node> > so that you can quickly look up any Node by its ID number. Then each Node can represent its edges as a set of integer Node IDs.
After you delete a Node (i.e. remove it from the global map of Nodes), it's possible that some other Nodes will now have "dangling edges", but that will be easy to detect and handle because when you go to look up the now-removed Node's ID in your global map, the lookup will fail. You can then gracefully respond by ignoring that edge, or by removing that edge from its source Node, etc.
The advantages of doing it this way: The code remains very simple, and there is no need to worry about reference-cycles, memory leaks, or double-frees.
The disadvantages: It's a little bit less efficient to traverse the graph (since doing a map lookup takes more cycles than a simple pointer dereference) and (depending on what you are doing) the 'dangling edges' might require occasional cleanup sweeps (but those are easy enough to do... just iterate over the global map, and for each node, check each edge in its edge-set and remove the ones with IDs that aren't present in the global map)
Update: If you don't like doing a lot of unordered_map lookups, you could alternatively get very similar functionality by representing your edges using weak_ptr instead. A weak_ptr will automagically become NULL/invalid when the object it is pointing at goes away.
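A minimal sketch of the ID-based scheme (Graph, addNode, removeNode, forEachNeighbor are hypothetical names): edges are plain integer IDs, and a failed map lookup is how a dangling edge announces itself.
#include <memory>
#include <set>
#include <unordered_map>

struct Node {
    std::set<int> edges;  // IDs of the nodes this node points at
};

struct Graph {
    std::unordered_map<int, std::shared_ptr<Node>> nodes;
    int next_id = 0;

    int addNode() {
        nodes[next_id] = std::make_shared<Node>();
        return next_id++;
    }

    void removeNode(int id) { nodes.erase(id); }  // edges elsewhere may now dangle

    // Visit the live neighbors of 'id', pruning dangling edges as we find them.
    template <typename Visit>
    void forEachNeighbor(int id, Visit visit) {
        auto it = nodes.find(id);
        if (it == nodes.end()) return;
        auto& edges = it->second->edges;
        for (auto e = edges.begin(); e != edges.end(); ) {
            auto target = nodes.find(*e);
            if (target == nodes.end())
                e = edges.erase(e);            // stale ID: the target was removed
            else {
                visit(*e, *target->second);
                ++e;
            }
        }
    }
};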

C++ Deleting objects from memory

Let's say I have allocated some memory and have filled it with a set of objects of the same type; we'll call these components.
Say one of these components needs to be removed, what is a good way of doing this such that the "hole" created by the component can be tested for and skipped by a loop iterating over the set of objects?
The inverse should also be true, I would like to be able to test for a hole in order to store new components in the space.
I'm thinking memclear & checking for 0...
boost::optional<component> seems to fit your needs exactly. Put those in your storage, whatever that happens to be. For example, with std::vector
// initialize the vector with 100 non-components
std::vector<boost::optional<component>> components(100);
// adding a component at position 15
components[15] = component(x, y, z);
// deleting a component at position 82
components[82].reset();
// looping through and checking for existence
for (auto& opt : components)
{
    if (opt) // component exists
    {
        operate_on_component(*opt);
    }
    else // component does not exist
    {
        // whatever
    }
}
// move components to the front, non-components to the back
std::partition(components.begin(), components.end(),
    [](boost::optional<component> const& opt) -> bool { return static_cast<bool>(opt); });
The short answer is it depends on how you store it in memory.
For example, the C++ standard requires that a vector's elements be stored contiguously.
If you know the size of the object, you may be able to use sizeof and pointer arithmetic to predict an element's location in memory.
Good luck.
There are at least two solutions:
1) Mark the hole with some flag and then skip it when processing. Benefit: 'deletion' is very fast (you only set a flag). If the object is not that small, even adding a "bool alive" member costs little.
2) Move the hole to the end of the pool by swapping it with a live object (see the sketch after this answer).
This problem comes up when storing and processing particle systems; you can find some suggestions in that literature.
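A minimal sketch of option 2 (swap-and-pop), assuming the components live in a std::vector and their order does not matter:
#include <utility>
#include <vector>

struct component { /* ... */ };

// Remove the element at 'index' in O(1) by overwriting it with the last
// live element and shrinking the vector; element order is not preserved.
void remove_component(std::vector<component>& pool, std::size_t index)
{
    if (index + 1 != pool.size())
        pool[index] = std::move(pool.back());
    pool.pop_back();
}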
If it is not possible to move the "live" components up, or reorder them such that there is no hole in the middle of the sequence, then the best option if to give the component objects a "deleted" flag/state that can be tested through a member function.
Such a "deleted" state does not cause the object to be removed from memory (that is just not possible in the middle of a larger block), but it does make it possible to mark the spot as not being in use for a component.
When you say you have "allocated some memory" you are likely talking about an array. Arrays are great because they have virtually no overhead and extremely fast access by index. But the bad thing about arrays is that they aren't very friendly for resizing. When you remove an element in the middle, all following elements have to be shifted back by one position.
But fortunately there are other data structures you can use, like a linked list or a binary tree, which allow quick removal of elements. C++ even implements these in the container classes std::list and std::set.
A list is great when you don't know beforehand how many elements you need, because it can shrink and grow dynamically without wasting any memory when you remove or add any elements. Also, adding and removing elements is very fast, no matter if you insert them at the beginning, in the end, or even somewhere in the middle.
A set is great for quick lookup. When you have an object and you want to know if it's already in the set, checking it is very quick. A set also automatically discards duplicates which is really useful in many situations (when you need duplicates, there is the std::multiset). Just like a list it adapts dynamically, but adding new objects isn't as fast as in a list (not as expensive as in an array, though).
Two suggestions:
1) You can use a Linked List to store your components, and then not worry about holes.
Or if you need these holes:
2) You can wrap your component into an object with a pointer to the component like so:
class ComponentWrap
{
public:
    Component* component;
};
and use componentWrap.component == nullptr to find out whether the component is deleted.
Exception way:
3) Put your code in a try catch block in case you hit a null pointer error.

Fast bucket implementation

In a graph class I need to handle nodes with integer values (1-1000 mostly). In every step I want to remove a node and all its neighbors from the graph. Also I want to always begin with the node of the minimal value. I thought long about how to do this in the fastest possible manner and decided to do the following:
The graph is stored using adjacency lists
There is a huge array std::vector<Node*> bucket[1000] to store the nodes by their value
The index of the lowest nonempty bucket is always stored and kept track of
I can find the node of minimal value very fast by picking a random element of that bucket or, if the bucket is already empty, by increasing the index
Removing the selected node from the bucket can clearly be done in O(1); the problem is that to remove the neighbors I first need to search the bucket bucket[value of neighbor] for each neighbor node, which is not really fast.
Is there a more efficient approach to this?
I thought of using something like std::list<Node*> bucket[1000], and assigning every node a pointer to its "list element", so that I can remove the node from the list in O(1). Is this possible with STL lists? Clearly it can be done with a normal doubly linked list that I could implement by hand.
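(For reference, the iterator idea from the question does work: std::list iterators stay valid until their own element is erased, so storing one iterator per node gives O(1) removal. A minimal sketch with a hypothetical Node:)
#include <list>

struct Node {
    int value;
    std::list<Node*>::iterator bucket_pos;  // where this node sits in its bucket
};

std::list<Node*> bucket[1000];

void insert(Node* n) {
    bucket[n->value].push_front(n);
    n->bucket_pos = bucket[n->value].begin();  // remember the position
}

void remove(Node* n) {
    bucket[n->value].erase(n->bucket_pos);     // O(1), no search needed
}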
I recently did something similar to this for a priority queue implementation using buckets.
What I did was use a hash table (unordered_map); that way, you don't need to store 1000 empty vectors and you still get O(1) random access (in the general case, not guaranteed). Now, if you only need to store/create this graph class one time, it probably doesn't matter. In my case I needed to create the priority queue tens/hundreds of times per second, and using the hash map made a huge difference (because I only created unordered_sets when I actually had an element of that priority, so there was no need to initialize 1000 empty hash sets). Hash sets and maps are new in C++11, but have been available in std::tr1 for a while now, or you could use the Boost libraries.
The only difference that I can see between your use case and mine is that you also need to be able to remove neighboring nodes. I'm assuming every node contains a list of pointers to its neighbors. If so, deletion of the neighbors should take O(k), with k the number of neighbors (again, O(1) per removal in general, not guaranteed; the worst case is O(n) in an unordered_map/set). You just go over every neighboring node and get its priority; that gives you the correct index into the hash map. Then you find the pointer in the hash set which that priority maps to; this search will in general be O(1), and removing the element is again O(1) in general.
All in all, I think you got a pretty good idea of what to do, but I believe that using hash maps/sets will speed up your code by quite a lot (depends on the exact usage of course). For me, the speed improvement of an implementation with unordered_map<int, unordered_set> versus vector<set> was around 50x.
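A minimal sketch of that layout, assuming node values index the buckets and each Node knows its own value (names are hypothetical):
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct Node {
    int value;
    std::vector<Node*> neighbors;
};

std::unordered_map<int, std::unordered_set<Node*>> buckets;  // value -> nodes

void insert(Node* n) {
    buckets[n->value].insert(n);
}

// Remove a node and all of its neighbors from the buckets,
// O(1) on average per removed node.
void remove_with_neighbors(Node* n) {
    for (Node* m : n->neighbors) {
        auto it = buckets.find(m->value);
        if (it != buckets.end()) it->second.erase(m);
    }
    auto it = buckets.find(n->value);
    if (it != buckets.end()) it->second.erase(n);
}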
Here's what I would do. Node structure:
struct Node {
    std::vector<Node*>::const_iterator first_neighbor;
    std::vector<Node*>::const_iterator last_neighbor;
    int value;
    bool deleted;
};
Concatenate the adjacency lists and put them in a single std::vector<Node*> to lower the overhead of memory management. I'm using soft deletes so update speed is not important.
Sort pointers to the nodes by value into another std::vector<Node*> with a counting sort. Mark all nodes as not deleted.
Iterate through the nodes in sorted order. If the node under consideration has been deleted, go to the next one. Otherwise, mark it deleted and iterate through its neighbors and mark them deleted.
If your nodes are stored contiguously in memory, then you can omit last_neighbor at the cost of an extra sentinel node at the end of the structure, because last_neighbor of a node is first_neighbor of the succeeding node.
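A minimal sketch of that sweep, using the Node struct above and assuming sorted_nodes already holds pointers to the nodes ordered by value (e.g. from the counting sort):
#include <vector>

// Process nodes in increasing value order; each chosen node soft-deletes
// itself and all of its neighbors so they are skipped later.
void process(std::vector<Node*>& sorted_nodes)
{
    for (Node* n : sorted_nodes) {
        if (n->deleted) continue;       // already removed by an earlier pick
        n->deleted = true;              // "remove" the chosen node
        for (auto it = n->first_neighbor; it != n->last_neighbor; ++it)
            (*it)->deleted = true;      // and all of its neighbors
    }
}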