C++ How to update member value via map value pointers - c++

I have an unusual graph structure that's made up of a few classes, and I'm trying to set a boolean member value in one of them for the sake of traversal. Let's say the classes are Graph, Node, and Edge. Graph holds an unordered map with string labels as keys and Nodes as values. The graph has bounded degree, so fixed sized arrays of pointers to Edges are kept at each node, and each Edge also has pointers to the Nodes at each end.
My aim is to visit every Edge once, so I maintain a boolean 'marked' flag inside every Edge initially set to false. Since the map in the Graph lets me iterate over Nodes, I wish to iterate over all Edges from every Node and mark each to avoid repeated visits from opposite ends. However, I find that the marks are failing to be recorded and can't seem to get it to work.
My iteration code looks like this:
for (auto it = nodeMap.begin(); it != nodeMap.end(); ++it) {
    Node* node = &it->second;
    for (std::size_t i = 0; i < node->EdgeArray.size(); i++) {
        if (node->EdgeArray[i]) {
            Edge& edge = *(node->EdgeArray[i]);
            if (edge.getMark()) continue;
            [...do needed processing...]
            edge.setMark(true);
        }
    }
}
I am more comfortable with pointers than I am with references, so my original version had 'edge' as a pointer into EdgeArray without the dereferencing. However, some digging led me to understand that passing by reference is used to effect changes on function caller's values. My suspicion was that some kind of similar tweak is needed here, but in this case all of the iteration occurs in a method in the Graph class where the nodeMap is stored. I've tried basically all variations of pointers (dereferenced or not) and references I could think of, but can't seem to get the marked values to persist outside the loop. I.e., if I add a print that depends on the second if conditional, I never see a result from it.

If your previous version worked, have you tried replacing only edge.setMark(true) with node->EdgeArray[i]->setMark(true)?
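For reference, marking through a pointer and marking through a reference both write to the same Edge object, provided both endpoints hold a pointer to that one Edge rather than to separate copies. A minimal sketch, assuming simplified Edge and Node types (hypothetical stand-ins, not the asker's classes) and a std::array of four slots in place of the fixed-size edge array:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for the classes in the question.
struct Edge {
    bool marked = false;
    bool getMark() const { return marked; }
    void setMark(bool m) { marked = m; }
};

struct Node {
    std::array<Edge*, 4> EdgeArray{};  // unused slots stay nullptr
};

// Visits every edge once and returns how many distinct edges were processed.
int visitEdgesOnce(std::unordered_map<std::string, Node>& nodeMap) {
    int processed = 0;
    for (auto& entry : nodeMap) {  // auto& avoids copying the Node
        Node& node = entry.second;
        for (std::size_t i = 0; i < node.EdgeArray.size(); ++i) {
            Edge* edge = node.EdgeArray[i];
            if (!edge || edge->getMark()) continue;  // empty slot or already seen
            ++processed;
            edge->setMark(true);  // writes to the Edge shared by both endpoints
        }
    }
    return processed;
}
```

If marks still fail to persist with a loop shaped like this, one thing worth checking is how the graph is built: if Nodes or Edges are copied during construction, the two endpoints may end up pointing at different Edge objects.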

Related

Loop in std::set by index

I'm making a program that handles dynamic graphs. Nodes and arcs are two classes; they are stored in arrays in the graph object and are indexed by custom ids (each id is that item's position in the array).
Each arc has the ids of the 2 nodes it connects, and each node has a list of the ids of all arcs it's connected to (all stored in sets).
An arc's destructor removes its id from the arc sets of the nodes it connected.
Now I'm writing the node's destructor. It should call each arc's destructor until its set is empty. I cannot iterate through the set with an iterator, since at every step the arc destructor removes its id from that same set.
Hence I'd need to always access the last element until the set is empty; but std::set does not allow indexing like arrays and vectors, and it doesn't have a "back" like lists and stacks. How can I do that?
relevant code:
graph::arc::~arc()
{
    owner->list_node[n1]->remove_arc(id);
    owner->list_node[n2]->remove_arc(id);
    owner->list_arc[id] = nullptr;
}
graph::node::~node()
{
    while (!list_arc.empty())
    {
        owner->remove_arc(list_arc[list_arc.size()-1]); //invalid, that's roughly what i want to achieve
    }
    owner->list_node[id] = nullptr;
}
Notes
owner is the graph object. owner->list_(node/arc) holds the actual pointers. Each item's id is equal to its position in graph's list.
This feels like an error-prone cleanup strategy, but an improvement would probably require rewriting significantly more than what is provided in the question. (I suspect that shared and weak pointers could simplify things.) Since I lack enough information to suggest a better data structure:
For a set, it's easier to access the first element, *list_arc.begin(), than the last. (Not a huge difference, but still easier.)
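A minimal sketch of that loop, assuming list_arc is a std::set<int> of arc ids and using a hypothetical remove_arc callback in place of owner->remove_arc. Each pass reads the first id and lets the removal erase it, so no iterator is held across the erase. If the last element is specifically wanted, *list_arc.rbegin() reads it.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Drains a set whose elements are erased as a side effect of remove_arc.
// remove_arc is a hypothetical stand-in for owner->remove_arc(id); it must
// erase `id` from `list_arc`, otherwise the loop would never terminate.
std::vector<int> drain(std::set<int>& list_arc,
                       const std::function<void(int)>& remove_arc) {
    std::vector<int> removed;
    while (!list_arc.empty()) {
        int id = *list_arc.begin();  // copy the id before the set changes
        remove_arc(id);              // expected to erase id from list_arc
        assert(list_arc.count(id) == 0);
        removed.push_back(id);
    }
    return removed;
}
```

Copying the id into a local before calling the removal matters: a reference into the set would dangle as soon as the element is erased.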

Copy elements from vector based on condition C++

I'm using C++ to create Hopcroft's algorithm for DFA Minimization.
Part of Hopcroft's algorithm is to - initially - divide two sets (P with accept and non-accept states and Q with non-accept states only). I already have group P, and from P I'm trying to extract Q. I'm using the following code to do it:
for(int i=0; i<groupP.size(); i++)
    if(groupP[i]->final)
        groupQ.push_back(groupP[i]);
in which groupP and groupQ are:
vector<node*> groupQ;
vector<node*> groupP;
and node is a structure that I've created to represent a node of my automata. It's guaranteed that the boolean attribute "final" is already correctly set (false for non-final states, true for final states).
Finally, my question is: is it correct to copy one element from a vector to another by doing what I've done? If I modify the content of a copied element from groupP, will this same element be modified in groupQ as well?
Right now, you have vectors of pointers. When you copy from one vector to another, you're copying the pointers, not the elements themselves.
Since you have two pointers referring to the same node, any modification made to a node through one group will be visible through the other, i.e., if you make a change to groupP[i]->foo, then the same change will be visible in groupQ[j]->foo (provided that groupP[i] is one of the elements you copied from groupP to groupQ).
If you don't want that, you have a couple of choices. One would be to leave groupP and groupQ in the same vector, but partition the vector based on the state of an element's final member:
auto P_end = std::partition(groupP.begin(), groupP.end(),
                            [](node *n) { return n->final;});
Then [groupP.begin(), P_end) is groupP (i.e., final==true) and [P_end, groupP.end()) is groupQ (i.e., final==false).
This moves the pointers around (and gives you an iterator so you know the dividing line between the two) so you have exactly one pointer to each element, but they're separated into the two relevant groups.
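A self-contained sketch of the partitioning approach, assuming a minimal node type with just an id and the final flag (the id field and the splitByFinal helper are hypothetical, added for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal node type; only the `final` flag matters here.
struct node {
    int id;
    bool final;
};

// Reorders `states` so that final states come first and returns the index of
// the first non-final state. std::partition does not preserve relative order;
// std::stable_partition does, at some extra cost.
std::size_t splitByFinal(std::vector<node*>& states) {
    auto p_end = std::partition(states.begin(), states.end(),
                                [](const node* n) { return n->final; });
    return static_cast<std::size_t>(p_end - states.begin());
}
```

After the call, indices below the returned value hold the final states and the remaining indices hold the non-final ones, with exactly one pointer per node.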
As a final possibility, you might want to actually copy the elements from groupP to groupQ, creating a new element in the process, so that each copied item exists in two places: once in groupP and once in groupQ. The two copies are separate from each other, so either can be modified without affecting the other.
The most obvious way to achieve that would be to just use vectors of nodes:
vector<node> groupQ;
vector<node> groupP;
This way, when you copy from one group to the other, you're copying the nodes themselves rather than pointers to nodes, so each copy creates a new, independent node with the same value as an existing node.
You could use std::copy_if which does the same thing:
std::copy_if(groupP.cbegin(), groupP.cend(),
std::back_inserter(groupQ),
[](node* n){ return n->final; });
Since you are manipulating pointers, the elements themselves are shared, so a modification made to a node through one container is visible through the other.
Note that manipulating raw pointers the way you are doing is very error prone; you may want to use shared pointers, for instance.
Edit: Adding missing std::back_inserter.
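To make the difference between copying pointers and copying nodes concrete, here is a small sketch. The node type and the two helper functions (finalPointers, finalCopies) are hypothetical and exist only for this illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical minimal node type.
struct node {
    int id;
    bool final;
};

// Copies the pointers to final states; the nodes themselves stay shared.
std::vector<node*> finalPointers(const std::vector<node*>& groupP) {
    std::vector<node*> groupQ;
    std::copy_if(groupP.cbegin(), groupP.cend(), std::back_inserter(groupQ),
                 [](const node* n) { return n->final; });
    return groupQ;
}

// Copies the final states by value; the results are independent objects.
std::vector<node> finalCopies(const std::vector<node*>& groupP) {
    std::vector<node> groupQ;
    for (const node* n : groupP)
        if (n->final) groupQ.push_back(*n);
    return groupQ;
}
```

Changing a node after the call shows up through the result of finalPointers but not in the result of finalCopies, since the latter holds separate node objects.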

Transforming recursive DFS-based topological sort into a non-recursive algorithm (without losing cycle detection)

Here is a pseudocode for topological sort from Wikipedia:
L ← Empty list that will contain the sorted nodes
while there are unmarked nodes do
select an unmarked node n
visit(n)
function visit(node n)
if n has a temporary mark then stop (not a DAG)
if n is not marked (i.e. has not been visited yet) then
mark n temporarily
for each node m with an edge from n to m do
visit(m)
mark n permanently
unmark n temporarily
add n to head of L
I want to write it non-recursively without losing cycle detection.
The problem is I don't know how to do that, and I have thought of many approaches already. Basically the problem is to do DFS while remembering the "current path" (it corresponds to "temporary marking" certain nodes in the pseudocode above). The traditional approach with a stack gives me nothing, because when I put the neighbors of every node on the stack, I'm putting nodes there that I will only see "in the undetermined future", whereas I only want to keep track of nodes "on my current path". (I see it as walking through a maze with a thread I'm leaving behind me: when I see a dead end, I turn back and "wrap the thread" as I go, and at any point in time I want to remember the nodes "with thread lying on them" and the nodes on which the thread has been at least once.) Any tips that would point me in the right direction? I mean, should I think of using 2 stacks instead of 1, or maybe some other data structure?
Or maybe this algorithm is OK and I should leave it in its recursive form. I'm only worrying about exceeding the "recursion depth" for sufficiently large graphs.
You'd use a stack, but you wouldn't push all adjacent nodes onto it: that would yield a DFS with the wrong space complexity (it would be quadratic in the number of nodes assuming no parallel edges, otherwise potentially worse). Instead, you'd store the current node together with a state indicating the next node to be visited. You'd always work off the stack's top, i.e., something like this:
std::stack<std::pair<node, iterator>> stack;
stack.push(std::make_pair(root, root.begin()));
while (!stack.empty()) {
    std::pair<node, iterator>& top = stack.top();
    if (top.second == top.first.begin()) {
        mark(top.first);
        // do whatever needs to be done upon first visit
    }
    while (top.second != top.first.end() && is_marked(*top.second)) {
        ++top.second;
    }
    if (top.second != top.first.end()) {
        node next = *top.second;
        ++top.second;
        stack.push(std::make_pair(next, next.begin()));
    }
    else {
        stack.pop();
    }
}
This code assumes that nodes have a begin() and end() yielding suitable iterators to iterate over adjacent nodes. Something along those lines, possibly with an indirection via edges, will certainly exist. It also assumes that there are functions available to access a node's mark. A more realistic implementation would probably use something like a BGL property map. Whether a std::stack<T> can be used to represent the stack depends on whether the nodes currently on the stack need to be accessed: std::stack doesn't provide such access. However, it is trivial to create a suitable stack implementation based on any of the STL sequence containers.
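For the topological sort in the question specifically, the same idea (a stack of node plus next-edge position) can carry the temporary and permanent marks from the pseudocode. The following is a sketch under stated assumptions, not the answerer's code: the graph is an adjacency list indexed by integer node ids, and topoSort is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// adj[u] lists the nodes v with an edge u -> v.
// Fills `order` with a topological order and returns true, or returns false
// if a cycle is found (in which case `order` is left incomplete).
bool topoSort(const std::vector<std::vector<int>>& adj, std::vector<int>& order) {
    enum Mark { None, Temporary, Permanent };
    std::vector<Mark> mark(adj.size(), None);
    std::vector<std::pair<int, std::size_t>> stack;  // (node, next edge index)
    order.clear();

    for (int root = 0; root < static_cast<int>(adj.size()); ++root) {
        if (mark[root] != None) continue;
        mark[root] = Temporary;
        stack.emplace_back(root, 0);
        while (!stack.empty()) {
            const int u = stack.back().first;
            std::size_t& next = stack.back().second;
            if (next < adj[u].size()) {
                const int v = adj[u][next++];
                if (mark[v] == Temporary) return false;  // v is on the current path
                if (mark[v] == None) {
                    mark[v] = Temporary;
                    stack.emplace_back(v, 0);  // `next` is not used after this
                }
            } else {
                mark[u] = Permanent;  // all successors done: leave the path
                order.push_back(u);
                stack.pop_back();
            }
        }
    }
    std::reverse(order.begin(), order.end());  // finish order reversed
    return true;
}
```

The nodes carrying the Temporary mark are exactly the ones on the stack, which is the "current path" the question asks about, so a single stack is enough.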

Graph data structure memory management

I would like to implement a custom graph data structure for my project and I had a question about proper memory management.
Essentially, the data structure will contain nodes that have two vectors: one for edges coming into the node and one for edges coming out of the node (no looped edges). The graph is connected. The graph will also contain one 'entry' node that will have no edges coming into it. An edge is simply a pointer to a node.
My question here is: What would be the best method of clearing up memory for this type of data structure? I understand how to do it if there was only one entry edge (at which point this structure degenerates to a n-ary tree), but I'm not sure what to do in the case where there are multiple nodes that have edges going into a single node. I can't just call delete from an arbitrary entry node because this will likely result in 'double free' bugs later on.
For example, suppose I had this subgraph:
C <-- B
^
|
A
If I were to call delete from node B, I would remove the memory allocated for C, but A would still have a pointer to it. So if I wanted to clear all the nodes A had connections to, I would get a double free error.
You will need to perform a search to figure out which node is still connected to the input edge, when you remove a component. If you end up with more than one connected group, you will need to figure out which one of these contains the entry node and remove all others.
No greedy (local) algorithm for this can exist, which can be shown by a simple thought experiment:
Let A, B be subgraphs connected only through the node n, which shall be removed. We are left with two unconnected subgraphs. There is no way of knowing (without a whole bunch of state per node) if we have just removed the only route to the entry node for A or B. And, it is necessary to figure that out, so that the appropriate choice of removing either A or B can be made.
Even if every node stored every single route to the entry node, it would mean you have to clean up all routes in all nodes whenever you remove a single node.
Solution Sketch
Let us talk about a graphical representation of what we need to do:
First, Color the node that is being deleted black. Then perform the following for every node we encounter:
For uncolored nodes:
If the node we came from is black, give this node a new color
If the node we came from is colored, give this node the same color
Travel through every outgoing edge
For colored nodes:
If the node we came from is black, just return
If the node we came from is the same color, just return
If the node we came from has a different color, merge the two colors (e.g. by remembering that green and blue are the same, or by painting every green node blue)
Travel through every outgoing edge
At the end we will know which connected components will exist after we delete the current node. All connected components (plus our original to be deleted node) which do not contain the entry node must be deleted (Note: This may delete every single node, if our to-be-deleted node was the entry node...)
Implementation
You will need a data structure like the following:
struct cleanup {
vector<set<node*>> colors;
node* to_be_deleted;
size_t entry_component;
};
The index into the vector of sets will be your "color". The "color black" will be represented by usage of to_be_deleted. Finally, the entry_component will contain the index of the color that has the entry node.
Now, the previous algorithm can be implemented. There are quite a few things to consider, and the implementation may end up being different, depending on what kind of support structures you already keep for other operations.
The answer depends on the complexity of the graph:
If the graph is a tree, each parent can own its children and delete them in its destructor.
If the graph is a directed acyclic graph, an easy and performant way to handle it is to do reference counting on the nodes.
If the graph can be cyclic, you are out of luck. You will need to keep track of each and every node in your graph, and then do garbage collection. Depending on your use case, you can either do the collection by
cleaning up everything when you are done with the complete graph, or by
repeatedly marking all connected nodes and cleaning up all the unreachable ones.
If there is any possibility to get away with option 1 or 2 (possibly tweaking the problem to ensure that the graph fulfills the constraint), you should do so; option 3 implies significant overheads in terms of code complexity and runtime.
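As an illustration of the third case, here is a minimal mark-and-sweep sketch. It assumes a Graph object that is the sole owner of all nodes and models only outgoing edges as non-owning raw pointers; the Graph and Node layouts and the collect function are hypothetical, not taken from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

struct Node {
    std::vector<Node*> out;  // non-owning outgoing edges
};

struct Graph {
    std::vector<std::unique_ptr<Node>> nodes;  // sole owner of every node
    Node* entry = nullptr;

    Node* add() {
        nodes.push_back(std::make_unique<Node>());
        return nodes.back().get();
    }

    // Frees every node that cannot be reached from `entry`.
    // Returns the number of nodes freed.
    std::size_t collect() {
        std::unordered_set<Node*> reachable;
        std::vector<Node*> work;
        if (entry) {
            reachable.insert(entry);
            work.push_back(entry);
        }
        while (!work.empty()) {  // mark phase
            Node* n = work.back();
            work.pop_back();
            for (Node* m : n->out)
                if (reachable.insert(m).second) work.push_back(m);
        }
        const std::size_t before = nodes.size();  // sweep phase
        nodes.erase(std::remove_if(nodes.begin(), nodes.end(),
                                   [&](const std::unique_ptr<Node>& p) {
                                       return reachable.count(p.get()) == 0;
                                   }),
                    nodes.end());
        return before - nodes.size();
    }
};
```

Because a surviving node's successors are reachable by definition, its outgoing edges never dangle after a sweep. If nodes also store incoming edges, as in the question, those lists would need to be pruned separately.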
There are a couple of ways. One way is to make your nodes know what other nodes have edges to it. So, if you delete C from B, C will need to remove the edge to it from A. So later when you remove/delete A, it won't try to delete C.
std::shared_ptr or some other type of reference counting may also work for you.
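A minimal sketch of the reference-counting idea, assuming an acyclic graph. Outgoing edges own their targets through std::shared_ptr, while incoming edges are std::weak_ptr back-references so that they do not create ownership cycles. The Node layout and the connect helper are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Node {
    std::vector<std::shared_ptr<Node>> out;  // owning: keeps targets alive
    std::vector<std::weak_ptr<Node>> in;     // non-owning back-references
};

// Adds an edge from -> to. Only suitable for acyclic graphs: a cycle of
// shared_ptr edges would keep its nodes alive indefinitely.
void connect(const std::shared_ptr<Node>& from, const std::shared_ptr<Node>& to) {
    from->out.push_back(to);
    to->in.push_back(from);
}
```

With the subgraph from the question (A and B both pointing at C), destroying B just drops one reference to C; C is freed only once A has gone as well, so no double free can occur.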
Here's a simple way to avoid memory problems when implementing a graph: Don't use pointers to represent edges.
Instead, give each node a unique ID number (an incrementing integer counter will suffice). Keep a global unordered_map<int, shared_ptr<Node> > so that you can quickly look up any Node by its ID number. Then each Node can represent its edges as a set of integer Node IDs.
After you delete a Node (i.e. remove it from the global map of Nodes), it's possible that some other Nodes will now have "dangling edges", but that will be easy to detect and handle because when you go to look up the now-removed Node's ID in your global map, the lookup will fail. You can then gracefully respond by ignoring that edge, by removing that edge from its source Node, and so on.
The advantages of doing it this way: The code remains very simple, and there is no need to worry about reference-cycles, memory leaks, or double-frees.
The disadvantages: It's a little bit less efficient to traverse the graph (since doing a map lookup takes more cycles than a simple pointer dereference) and (depending on what you are doing) the 'dangling edges' might require occasional cleanup sweeps (but those are easy enough to do... just iterate over the global map, and for each node, check each edge in its edge-set and remove the ones with IDs that aren't present in the global map)
Update: If you don't like doing a lot of unordered_map lookups, you could alternatively get very similar functionality by representing your edges using weak_ptr instead. A weak_ptr will automagically become NULL/invalid when the object it is pointing at goes away.
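A minimal sketch of the ID-based layout, including the cleanup sweep described above. The Graph wrapper and its member names (add, remove, sweep, nextId) are hypothetical; the answer itself only specifies the map of IDs to shared_ptr<Node> and the per-node set of IDs.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <unordered_set>

struct Node {
    std::unordered_set<int> edges;  // IDs of target nodes, not pointers
};

struct Graph {
    std::unordered_map<int, std::shared_ptr<Node>> nodes;
    int nextId = 0;  // incrementing counter used as the unique ID

    int add() {
        nodes.emplace(nextId, std::make_shared<Node>());
        return nextId++;
    }

    // Removing a node may leave dangling edge IDs in other nodes.
    void remove(int id) { nodes.erase(id); }

    // Cleanup sweep: drops every edge whose target ID is no longer in the map.
    // Returns the number of edges removed.
    int sweep() {
        int removed = 0;
        for (auto& entry : nodes) {
            auto& edges = entry.second->edges;
            for (auto it = edges.begin(); it != edges.end();) {
                if (nodes.count(*it) == 0) {
                    it = edges.erase(it);
                    ++removed;
                } else {
                    ++it;
                }
            }
        }
        return removed;
    }
};
```

Since IDs come from a counter that never goes backwards, a removed node's ID is not reused, so a stale edge cannot silently start pointing at a different node.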

How to make QList<Type*> work with indexOf() and custom operator==()?

Given the following code:
QList<Vertex*> _vertices; //this gets filled
//at some other point i want to check if there's already
//a vertex with the same payload inside the list
Vertex* v = new Vertex(payload);
int result = _vertices.indexOf(v);
if (result == -1){
//add the vertex to the list
} else {
//discard v and return pointer to match
}
//overloaded Vertex::operator==
bool Vertex::operator==(Vertex* other){
//i return true if my payload and the others
//payload are the same
}
As far as I can see, indexOf() never ends up calling my operator==. I assume this is because QList holds a pointer type, so indexOf() compares the pointers themselves. Is there a way of keeping pointers in the QList and still using my own operator==()?
Like Vertex*::operator==(Vertex* other)
Related Questions: removing in pointer type Qlists | not working because of pointer type
Edit: Intention.
Two vertices are considered equal iff. identifiers carried by their payload are equal.
Vertex is part of a Graph class. I want clients of that class to be able to call Graph::addEdge(Payload,Payload) to populate the graph. Graph objects then take care of wrapping up Payloads in Vertex objects and building Edges. Hence Graph needs to check if a Vertex encapsulating a given payload doesn't already exist. Using QList seemed like the "simplest thing that might work" at the time of writing the code.
Is there a way of keeping pointers in the QList and still using my own operator==()?
No, you would need QList to dereference the pointer first, which it doesn't. You would have to subclass QList in order to do that. But as indexOf() just iterates through and compares using operator==() there's nothing stopping you doing the same thing manually.
However, all of this looks like a code smell. Finding something in an unordered, non-hashed container takes linear time, which is very slow compared to a lookup in a QMap or QHash. Please edit your question to describe why you want to do this and what data Vertex contains, and we'll see if the community can come up with a better-performing mechanism.
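A sketch of the manual search, written as a template so it works with any container of Vertex* that offers begin() and end(). QList provides STL-style iterators, so it should fit, but the sketch is exercised here with std::vector only. The Vertex type, its payloadId field, and the indexOfPointee name are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical vertex: equality is decided by an integer payload identifier.
struct Vertex {
    int payloadId;
    bool operator==(const Vertex& other) const {
        return payloadId == other.payloadId;
    }
};

// Returns the index of the first stored pointer whose pointee equals `wanted`,
// or -1 if there is none, mirroring the return convention of indexOf().
template <typename Container>
int indexOfPointee(const Container& vertices, const Vertex& wanted) {
    auto it = std::find_if(vertices.begin(), vertices.end(),
                           [&](const Vertex* v) { return *v == wanted; });
    if (it == vertices.end()) return -1;
    return static_cast<int>(std::distance(vertices.begin(), it));
}
```

Taking the probe by const reference also means the lookup can use a stack-allocated Vertex, so nothing has to be allocated with new and then discarded when a match is found.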