Kruskal’s Minimum Spanning Tree Algorithm (C++) - c++

I'm working though an assignment on the Stanford CS106B C++ course, but I'm massively stuck implementing Kruskal's algorithm to find a minimum spanning tree.
To be more specific, I can't figure out the logic to determine whether to add an arc/ vertex to the tree. These are the instructions I've been given:
"The strategy you will use is based on tracking connected sets. For each node, maintain the set of the nodes that are connected to it. At
the start, each node is connected only to itself. When a new arc is
added, you merge the sets of the two endpoints into one larger combined set that both nodes are now connected to. When considering an
arc, if its two endpoints already belong to the same connected set,
there is no point in adding that arc and thus you skip it."
void getMinSpanTree(graphT *&graph)
{
Map<Set <nodeT *> > connections;
// Create set of arcs in decreasing order
Set<arcT *> arcs(costCmp);
Set<arcT *>::Iterator gItr = graph->arcs.iterator();
while (gItr.hasNext()) {
arcT *arc = gItr.next();
arcs.add(arc);
}
// Initialise map with initial node connections
Set<nodeT *>::Iterator nItr = graph->nodes.iterator();
while (nItr.hasNext()) {
nodeT *node = nItr.next();
Set<nodeT *> nodes;
nodes.add(node);
connections.add(node->name, nodes);
}
// Iterate through arcs
Set<arcT *>::Iterator aItr = arcs.iterator();
while (aItr.hasNext()) {
arcT *arc = aItr.next();
if (connections[arc->start->name].equals(connections[arc->finish->name])) {
Set<nodeT *> nodes = connections[arc->start->name];
nodes.unionWith(connections[arc->finish->name]);
connections[arc->start->name] = nodes;
connections[arc->finish->name] = nodes;
// Update display with arc
coordT start = {arc->start->x, arc->start->y};
coordT finish = {arc->finish->x, arc->finish->y};
DrawLineBetween(start, finish, HIGHLIGHT_COLOR);
}
}
}
I know the line:
if (connections[arc->start->name].equals(connections[arc->finish->name])) {
needs to be changed. Does anyone know what it should be? :)

One simple solution would be to iterate over nodes in
connections[arc->start->name]
and see if they match
arc->finish->name
If so, arc->start->name and arc->finish->name are connected and there's no point in merging the two sets.

Related

Segmentation fault in recursive function when using smart pointers

I get a segmentation fault in the call to
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
after a few recursive calls. Strange thing is that it's always at the same point in time. Can anyone spot the problem?
This is an implementation for a dynamic programming problem and here I'm accumulating the costs of a path. I have simplified the cost function but in this example the problem still occurs.
void HorizonLineDetector::dp(std::shared_ptr<Node> n)
{
n->cost= 1 + n->prev->cost;
//Check if we reached the last column(done!)
if (n->x==current_edges.cols-1)
{
//Save the info in the last node if it's the cheapest path
if (last_node->cost > n->cost)
{
last_node->cost=n->cost;
last_node->prev=n;
}
}
else
{
//Check for neighboring pixels to see if they are edges, launch dp with all the ones that are
for (int i=0;i<2;i++)
{
for (int j=-1;j<2;j++)
{
if (i==0 && j==0) continue;
if (n->x+i >= current_edges.cols || n->x+i < 0 ||
n->y+j >= current_edges.rows || n->y+j < 0) continue;
if (current_edges.at<char>(n->y+j,n->x+i)!=0)
{
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
//n->next.push_back(n1);
nlist.push_back(n1);
dp(n1);
}
}
}
}
}
class Node
{
public:
Node(){}
Node(std::shared_ptr<Node> p,int x_,int y_){prev=p;x=x_;y=y_;lost=0;}
Node(Node &n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;}//next=n1.next;}
std::shared_ptr<Node> prev; //Previous and next nodes
int cost; //Total cost until now
int lost; //Number of steps taken without a clear path
int x,y;
Node& operator=(const Node &n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;}//next=n1.next;}
Node& operator=(Node &&n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;n1.prev=nullptr;}//next=n1.next;n1.next.clear();}
};
Your code looks like a pathological path search, in that it checks almost every path and doesn't keep track of paths it has already checked you can get to more than one way.
This will build recursive depth equal to the length of the longest path, and then the next longest path, and ... down to the shortest one. Ie, something like O(# of pixels) depth.
This is bad. And, as call stack depth is limited, will crash you.
The easy solution is to modify dp into dp_internal, and have dp_internal return a vector of nodes to process next. Then write dp, which calls dp_internal and repeats on its return value.
std::vector<std::shared_ptr<Node>>
HorizonLineDetector::dp_internal(std::shared_ptr<Node> n)
{
std::vector<std::shared_ptr<Node>> retval;
...
if (current_edges.at<char>(n->y+j,n->x+i)!=0)
{
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
//n->next.push_back(n1);
nlist.push_back(n1);
retval.push_back(n1);
}
...
return retval;
}
then dp becomes:
void HorizonLineDetector::dp(std::shared_ptr<Node> n)
{
std::vector<std::shared_ptr<Node>> nodes={n};
while (!nodes.empty()) {
auto node = nodes.back();
nodes.pop_back();
auto new_nodes = dp_internal(node);
nodes.insert(nodes.end(), new_nodes.begin(), new_nodes.end());
}
}
but (A) this will probably just crash when the number of queued-up nodes gets ridiculously large, and (B) this just patches over the recursion-causes-crash, doesn't make your algorithm suck less.
Use A*.
This involves keeping track of which nodes you have visited and what nodes to process next with their current path cost.
You then use heuristics to figure out which of the ones to process next you should check first. If you are on a grid of some sort, the heuristic is to use the shortest possible distance if nothing was in the way.
Add the cost to get to the node-to-process, plus the heuristic distance from that node to the destination. Find the node-to-process that has the least total. Process that one: you mark it as visited, and add all of its adjacent nodes to the list of nodes to process.
Never add a node to the list of nodes to process that you have already visited (as that is redundant work).
Once you have a solution, prune the list of nodes to process against any node whose current path value is greater than or equal to your solution. If you know your heuristic is a strong one (that it is impossible to get to the destination faster), you can even prune based off of the total of heuristic and current cost. Similarly, don't add to the list of nodes to process if it would be pruned by this paragraph.
The result is that your algorithm searches in a relatively strait line towards the target, and then expands outwards trying to find a way around any barriers. If there is a relatively direct route, it is used and the rest of the universe isn't even touched.
There are many optimizations on A* you can do, and even alternative solutions that don't rely on heuristics. But start with A*.

Finding the number of edges and performing a topo sort in my graph implementation

I have been working on a graph implementation for the last few days. All of this is really new to me, and I am stuck on two parts of my implementation. I am implementing a digraph of courses from an input file. From the file, I can determine which courses are prereqs for other courses. I then create a digraph with courses as nodes, and edges connecting courses that are prereqs. I also want to find the total number of nodes and edges, and perform a topological sort on the graph (I will later be adding weights to the edges). Here is my implementation.
Digraph.h
class vertex{
public:
typedef std::pair<int, vertex*> ve;
std::vector<ve> adjacency;
std::string course;
vertex(std::string c){
course = c;
}
};
class Digraph{
public:
void addVertex(std::string&);
void addEdge(std::string& from, std::string& to, int cost);
typedef std::map<std::string, vertex *> vmap;
vmap work;
int getNumVertices();
int getNumEdges();
void getTopoSort();
};
Digraph.cpp
void Digraph::addVertex(std::string& course){
vmap::iterator iter = work.begin();
iter = work.find(course);
if(iter == work.end()){
vertex *v;
v = new vertex(course);
work[course] = v;
return;
}
}
void Digraph::addEdge(std::string& from, std::string& to, int cost){
vertex *f = (work.find(from)->second);
vertex *t = (work.find(to)->second);
std::pair<int, vertex *> edge = std::make_pair(cost, t);
f->adjacency.push_back(edge);
}
Finding the number of nodes was easy just return work.size. I have confirmed this is working properly. I am lost on how I would return the number of edges in my graph. It seems it would be simple, but everything I tried doesn't work. Secondly, I am completely lost on how to perform a topological sort on this graph. Any assistance is appreciated.
A simple way would be to iterate through all vertices in your graph, add up their neighbor counts and then divide by two:
int Digraph::getNumEdges(){
int count = 0;
for (const auto & v : work) {
count += v.second->adjacency.size();
}
return count / 2;
}
To use the range based for loop, you need to use c++11. With g++ that would be --std=c++11 on the command line.
EDIT:
I just realized you have a directed graph, and you probably want to count one for each direction. In such case: don't divide by two!
int Digraph::getNumEdges(){
int count = 0;
for (const auto & v : work) {
count += v.second->adjacency.size();
}
return count;
}
First, for the number of edges, it would be simpler to count them directly when you build the graph (just add a counter in your Digraph class and increment it each time you add an edge … )
For the topological sort, first I have a question: your edges are from prereqs to dependant courses ? That is you have a link A -> B if A is a prereq of B ? If this not the case, you need to invert your graph.
You to main algorithm in order to build a topological sort: one based on a simple DFS (http://en.wikipedia.org/wiki/Depth-first_search) and the other relying on in-degrees (http://en.wikipedia.org/wiki/Directed_graph#Indegree_and_outdegree) of your vertices (courses in your case.)
Normally, you need to verify that your graph doesn't contain any cycle, which will normally be the case if your data are coherent.
Let's consider the DFS based algorithm: a DFS traverses each vertices from a given root following edges as they appear. We can easily prove that order of last encounter of a vertex forms a reverse topological order. So, all we need is to push in a stack the current vertex after the calls on its successors.
I made a quick and dirty implementation for you, using C++11 again.
First, add the following to the Digraph class:
typedef std::unordered_set<vertex*> marks_set;
marks_set marks;
typedef std::deque<vertex*> stack;
stack topo;
void dfs(vertex* vcur);
Then here comes the code:
void Digraph::dfs(vertex* vcur) {
marks.insert(vcur);
for (const auto & adj : vcur->adjacency) {
vertex* suc = adj.second;
if (marks.find(suc) == marks.end()) {
this->dfs(suc);
} // you can detect cycle in the else statement
}
topo.push_back(vcur);
}
void Digraph::getTopoSort() {
// It should be a good idea to separate this algorithm from the graph itself
// You probably don't want the inner state of it in your graph,
// but that's your part.
// Be sure marks and topo are empty
marks.clear();
topo.clear();
// Run the DFS on all connected components
for (const auto & v : work) {
if (marks.find(v.second) == marks.end()) {
this->dfs(v.second);
}
}
// Display it
for (const auto v : topo) {
std::cout << v->course << "\n";
}
}
The code compiles but I haven't tested. If for any reasons you have an issue with the recursive algorithm (the function Digraph::dfs), it can be derecursified using a stack containing the parent of the target vertex and the iterator to the current successor, the iterator reach the end of the adjacency list, you can push the parent in the topological sort.
The other algorithm is almost as simple: for each vertices you need to count the number of predecessor (in-degree) which can be done while building the graph. In order to compute the topological sort, you look for the first vertex with a in-degree of 0 (no predecessor), you then decrease the in-degree of all its successors and continue with the next vertex with 0. If the graph has no cycle, there will always be a vertex with a in-degree of 0 (at beginning of course, but also during the algorithm run as you decrease it) until all vertices have been seen. The order of vertices encounter form a topological sort (this is related to the Bellman shortest-path algorithm.)
Note that these 2 algorithms are listed here: http://en.wikipedia.org/wiki/Topological_sorting. The one using in-degree is described in terms of removing edges which we simply simulate by decreasing the in-degree (a far less destructive approach … )

BFS using adjacency lists in STL

I am trying to write a program for implementing BFS in C++ using STL. I am representing the adjacency list using nested vector where each cell in vector contains a list of nodes connected to a particular vertex.
while(myQ.size()!=0)
{
int j=myQ.front();
myQ.pop();
int len=((sizeof(adjList[j]))/(sizeof(*adjList[j])));
for (int i=0;i<len;i++)
{
if (arr[adjList[j][i]]==0)
{
myQ.push(adjList[j][i]);
arr[adjList[j][i]]=1;
dist(v)=dist(w)+1;
}
}
}
myQ is the queue i am using to keep the nodes along whose edges i will be exploring the graph. In the notation adjList[j] represents the vector pointing to the list and adjList[j][i] represents a particular node in that list. I am storing whether i have explored a particular node by inputting 1 in the array arr. Also dist(v)=dist(w)+1 is not a part of the code but i want to know how i can write it in the correct syntax where my v is the new vertex and w is the old one which discovers v i.e w=myQ.front().
If I have understood your problem, then you want a data structure to store the distances of the graph nodes.
This can be easily done using map.
Use this:
typedef std::map <GraphNode*, int> NodeDist;
NodeDist node_dist;
Replace dist(v)=dist(w)+1; with:
NodeDist::iterator fi = node_dist.find (w);
if (fi == node_dist.end())
{
// Assuming 0 distance of node w.
node_dist[v] = 1;
}
else
{
int w_dist = (*fi).second;
node_dist[v] = w_dist + 1;
}
Please let me if I have misunderstood your problem or the given solution does not work for you. We can work on that.

B-Tree Node Splitting Techniques

I've stumbled upon a problem whilst doing my DSA (Data Structures and Algorithms) homework. I'm said to implement a B-Tree with Insertion and Search algorithms. As far as it goes, the search is working correctly, but I'm having trouble implementing the insertion function. Specifically the logic behind the B-Tree node-splitting algorithm. A pseudocode/C-style I could come up with is the following:
#define D 2
#define DD 2*D
typedef btreenode* btree;
typedef struct node
{
int keys[DD]; //D == 2 and DD == 2*D;
btree pointers[DD+1];
int index; //used to iterate throught the "keys" array
}btreenode;
void splitNode(btree* parent, btree* child1, btree* child2)
{
//Copies the content from the splitted node to the children
(*child1)->key[0] = (*parent)->key[0];
(*child1)->key[1] = (*parent)->key[1];
(*child2)->key[0] = (*parent)->key[2];
(*child2)->key[1] = (*parent)->key[3];
(*child1)->index = 1;
(*child2)->index = 1;
//"Clears" the parent node from any data
for(int i = 0; i<DD; i++) (*parent)->key[i] = -1;
for(int i = 0; i<DD+1; i++) (*parent)->pointers[i] = NULL
//Fixed the pointers to the children
(*parent)->index = 0;
//the line bellow was taken out for creating a new node that didn't have to be there.
//(*parent)->key[(*parent)->index] = newNode(); // The newNode() function allocs and inserts a the new key that I need to insert.
(*parent)->pointers[index] = (*child1);
(*parent)->pointers[index+1] = (*child2);
}
I'm almost sure that I'm messing up something with the pointers, but I'm not sure what. Any help is appreciated. Maybe I need a little bit more study on the B-Tree subject? I must add that while I can use basic input/output from C++, I need to use C-style structs.
You don't need to create a new node here. You've apparently already created the two new child nodes. All you have to do here after populating the children is make the parent now point to the two children, via a copy of the first key in each of them, and adjust its key count to two. You don't need to set the parent keys to -1 either.

priority_queue becomes extremely slow in debug mode

I am currently writing an A* pathfinding algorithm for a game and came across a very strange performance problem regarding priority_queue's.
I am using a typical 'open nodes list', where I store found, but yet unprocessed nodes. This is implemented as an STL priority_queue (openList) of pointers to PathNodeRecord objects, which store information about a visited node. They are sorted by the estimated cost to get there (estimatedTotalCost).
Now I noticed that whenever the pathfinding method is called, the respective AI thread gets completely stuck and takes several (~5) seconds to process the algorithm and calculate the path. Subsequently I used the VS2013 profiler to see, why and where it was taking so long.
As it turns out, the pushing to and popping from the open list (the priority_queue) takes up a very large amount of time. I am no expert in STL containers, but I never had problems with their efficiency before and this is just weird to me.
The strange thing is that this only occurs while using VS's 'Debug' build configuration. The 'Release' conf. works fine for me and the times are back to normal.
Am I doing something fundamentally wrong here or why is the priority_queue performing so badly for me? The current situation is unacceptable to me, so if I cannot resolve it soon, I will need to fall back to using a simpler container and inserting it to the right place manually.
Any pointers to why this might be occuring would be very helpful!
.
Here is a snippet of what the profiler shows me:
http://i.stack.imgur.com/gEyD3.jpg
.
Code parts:
Here is the relevant part of the pathfinding algorithm, where it loops the open list until there are no open nodes:
// set up arrays and other variables
PathNodeRecord** records = new PathNodeRecord*[graph->getNodeAmount()]; // holds records for all nodes
std::priority_queue<PathNodeRecord*> openList; // holds records of open nodes, sorted by estimated rest cost (most promising node first)
// null all record pointers
memset(records, NULL, sizeof(PathNodeRecord*) * graph->getNodeAmount());
// set up record for start node and put into open list
PathNodeRecord* startNodeRecord = new PathNodeRecord();
startNodeRecord->node = startNode;
startNodeRecord->connection = NULL;
startNodeRecord->closed = false;
startNodeRecord->costToHere = 0.f;
startNodeRecord->estimatedTotalCost = heuristic->estimate(startNode, goalNode);
records[startNode] = startNodeRecord;
openList.push(startNodeRecord);
// ### pathfind algorithm ###
// declare current node variable
PathNodeRecord* currentNode = NULL;
// loop-process open nodes
while (openList.size() > 0) // while there are open nodes to process
{
// retrieve most promising node and immediately remove from open list
currentNode = openList.top();
openList.pop(); // ### THIS IS, WHERE IT GETS STUCK
// if current node is the goal node, end the search here
if (currentNode->node == goalNode)
break;
// look at connections outgoing from this node
for (auto connection : graph->getConnections(currentNode->node))
{
// get end node
PathNodeRecord* toNodeRecord = records[connection->toNode];
if (toNodeRecord == NULL) // UNVISITED -> path record needs to be created and put into open list
{
// set up path node record
toNodeRecord = new PathNodeRecord();
toNodeRecord->node = connection->toNode;
toNodeRecord->connection = connection;
toNodeRecord->closed = false;
toNodeRecord->costToHere = currentNode->costToHere + connection->cost;
toNodeRecord->estimatedTotalCost = toNodeRecord->costToHere + heuristic->estimate(connection->toNode, goalNode);
// store in record array
records[connection->toNode] = toNodeRecord;
// put into open list for future processing
openList.push(toNodeRecord);
}
else if (!toNodeRecord->closed) // OPEN -> evaluate new cost to here and, if better, update open list entry; otherwise skip
{
float newCostToHere = currentNode->costToHere + connection->cost;
if (newCostToHere < toNodeRecord->costToHere)
{
// update record
toNodeRecord->connection = connection;
toNodeRecord->estimatedTotalCost = newCostToHere + (toNodeRecord->estimatedTotalCost - toNodeRecord->costToHere);
toNodeRecord->costToHere = newCostToHere;
}
}
else // CLOSED -> evaluate new cost to here and, if better, put back on open list and reset closed status; otherwise skip
{
float newCostToHere = currentNode->costToHere + connection->cost;
if (newCostToHere < toNodeRecord->costToHere)
{
// update record
toNodeRecord->connection = connection;
toNodeRecord->estimatedTotalCost = newCostToHere + (toNodeRecord->estimatedTotalCost - toNodeRecord->costToHere);
toNodeRecord->costToHere = newCostToHere;
// reset node to open and push into open list
toNodeRecord->closed = false;
openList.push(toNodeRecord); // ### THIS IS, WHERE IT GETS STUCK
}
}
}
// set node to closed
currentNode->closed = true;
}
Here is my PathNodeRecord with the 'less' operator overloading to enable sorting in priority_queue:
namespace AI
{
struct PathNodeRecord
{
Node node;
NodeConnection* connection;
float costToHere;
float estimatedTotalCost;
bool closed;
// overload less operator comparing estimated total cost; used by priority queue
// nodes with a higher estimated total cost are considered "less"
bool operator < (const PathNodeRecord &otherRecord)
{
return this->estimatedTotalCost > otherRecord.estimatedTotalCost;
}
};
}
std::priority_queue<PathNodeRecord*> openList
I think the reason is that you have a priority_queue of pointers to PathNodeRecord.
and there is no ordering defined for the pointers.
try changing it to std::priority_queue<PathNodeRecord> first, if it makes a difference then all you need is passing on your own comparator that knows how to compare pointers to PathNodeRecord, it will just dereference the pointers first and then do the comparison.
EDIT:
taking a wild guess about why did you get an extremely slow execution time, I think the pointers were compared based on their address. and the addresses were allocated starting from one point in memory and going up.
and so that resulted in the extreme case of your heap (the heap as in data structure not the memory part), so your heap was actually a list, (a tree where each node had one children node and so on).
and so you operation took a linear time, again just a guess.
You cannot expect a debug build to be as fast as a release optimized one, but you seems to do a lot of dynamic allocation that may interact badly with the debug runtime.
I suggest you to add _NO_DEBUG_HEAP=1 in the environment setting of the debug property page of your project.