Recursive Depth First Search (DFS) algorithm in C++

I've implemented the graph in the class Graph as an adjacency matrix, with all the functions required to access and modify it. These are the ones I needed in the DFS algorithm:
// for a Graph x, node v
string x.get_node_value(v) // returns the label of the node
queue x.neighbors(v) // returns a queue with the nodes adjacent to node v (node indices in the graph start from 1)
Now I tried to implement a recursive DFS, but it always gets stuck at some point: it never recurses back after it calls itself. It works and finds the goal if the goal lies on its path before it reaches a leaf node, but it stops after reaching a leaf node.
It keeps track of the nodes with colors: an unvisited node is WHITE, a node in progress is GREY, and a node that is done (visited, and all its children visited) is BLACK.
Here's the kickoff function:
int Search::DFSr(const std::string search_key, Graph& x, int starting_node){
Color * visited_nodes = new Color[x.size()];
for(int i=0; i<x.size(); i++){visited_nodes[i] = WHITE;}
bool goal_f = 0;
int goal = DFSUtil(search_key, x, starting_node, visited_nodes, goal_f);
if(goal_f) return goal;
else return -1;
}
and here's the visit function:
int Search::DFSUtil(std::string search_key, Graph& x, int current_node, Color(visited_nodes)[], bool& goal_f){
visited_nodes[current_node-1] = GREY; //-1 because array index start from 0 but nodes index on the graph starts from 1
if(x.get_node_value(current_node) == search_key ){
goal_f = 1;
return current_node;
}
else{
std::queue <int> childs = x.neighbors(current_node);
while(!childs.empty() && !goal_f){
if(visited_nodes[childs.front()-1] == WHITE){
return DFSUtil(search_key, x, childs.front(), visited_nodes, goal_f);
}
childs.pop();
}
visited_nodes[current_node-1] = BLACK;
}
}
Tested it on this graph:
It only finds the goal if it was within A, B, or D, otherwise it exits normally without errors

The following change to your code should help:
int Search::DFSUtil(std::string search_key, Graph& x, int current_node, Color(visited_nodes)[], bool& goal_f){
visited_nodes[current_node-1] = GREY; //-1 because array index start from 0 but nodes index on the graph starts from 1
if(x.get_node_value(current_node) == search_key ){
goal_f = 1;
return current_node;
}
else{
std::queue <int> childs = x.neighbors(current_node);
while(!childs.empty() && !goal_f){
if(visited_nodes[childs.front()-1] == WHITE){
int result = DFSUtil(search_key, x, childs.front(), visited_nodes, goal_f);
if( result >= 0 ) {
return result;
}
}
childs.pop();
}
visited_nodes[current_node-1] = BLACK;
}
return -1;
}
You can further remove goal_f variable from parameters and statements involving it. A return value is sufficient.
EDIT: the problem was in this line of code
return DFSUtil(search_key, x, childs.front(), visited_nodes, goal_f);
Here the function was returning even if the goal had not been found, so the remaining neighbors in the queue were never visited. The fix makes the function return only if the goal has been reached. The fix also adds a "return -1" statement at the end of the function, which indicates that the function finished without reaching the goal.
For an assessment of code logic, memory use, and readability, and for suggestions of best practices, you can post your code here: https://codereview.stackexchange.com/
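As a rough illustration of the suggestion to drop goal_f, here is a minimal sketch in which the return value alone signals success. SimpleGraph is a hypothetical stand-in for the question's Graph class (its real interface is not shown in full), and dfs_visit and dfs_search are names made up for this example. It also uses std::vector for the colors, so nothing has to be freed by hand.

```cpp
#include <string>
#include <vector>

enum Color { WHITE, GREY, BLACK };

// Hypothetical stand-in for the question's Graph class: node ids are 1-based,
// labels and adjacency lists are stored in plain vectors.
struct SimpleGraph {
    std::vector<std::string> labels;         // labels[i] belongs to node i + 1
    std::vector<std::vector<int>> adjacent;  // adjacent[i] lists the neighbours of node i + 1
};

// Returns the id of the first node found whose label equals key,
// or -1 if no such node is reachable from node.
int dfs_visit(const SimpleGraph& g, const std::string& key, int node,
              std::vector<Color>& color) {
    color[node - 1] = GREY;                  // node is in progress
    if (g.labels[node - 1] == key) return node;
    for (int next : g.adjacent[node - 1]) {
        if (color[next - 1] == WHITE) {
            int found = dfs_visit(g, key, next, color);
            if (found != -1) return found;   // return early only when the goal was found
        }
    }
    color[node - 1] = BLACK;                 // all neighbours explored
    return -1;
}

// Kickoff function: sets every node to WHITE and starts the search.
int dfs_search(const SimpleGraph& g, const std::string& key, int start) {
    std::vector<Color> color(g.labels.size(), WHITE);
    return dfs_visit(g, key, start, color);
}
```

Because -1 is never a valid node id here (ids start from 1), it can safely double as the "not found" marker.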

Related

How to find if there's a cycle within selected nodes in a directed graph?(C++)

I'm currently working on the problem of finding cycles consisting of selected nodes in a directed graph.
For the instance described here:
there's a cycle within node 1, 2, 3, and no cycle is found within 1, 2, 4.
I've tried to implement the algorithm myself with the following operation:
Start with the first node within the selected nodes.
Mark current node as "visited".
Check if adjacent nodes are within selected nodes.
Recursive call if the node hasn't been visited, return true if it's visited.
At the end of the function: return false.
My implementation is as follows (the function is called for each selected node, and the array storing visited nodes is reinitialized every time):
bool hasLoop(const int startNode, const bool directions[][MAX_DOT_NUM], const int nodesLen, bool nodesVisited[], const int selectedNodes[], const int selectedNum){
nodesVisited[startNode] = true;
for(int i = 0; i < nodesLen; i++){ //loop through all nodes
if(withinSelected(i, selectedNodes, selectedNum) == false) continue; //check loop only for selected nodes
if(directions[startNode][i] == 1){ //connected and is within selected nodes
if(nodesVisited[i] == true){
return true;
}else{
if(hasLoop(i, directions, nodesLen, nodesVisited, selectedNodes, selectedNum)){
return true;
}
}
}
}
return false;
}
However, this implementation doesn't work for all testing data from the online judge I'm using.
I found that my algorithm is different from the usual Depth First Search, which uses white, grey, and black marks for nodes that are unvisited, being visited, or finished. I wonder if that is what causes the problem.
Hopefully, I can find the bug causing this implementation not to work for all circumstances with your help!
Thank you so much for reading this!
Edited: it's a directed graph! Sorry for that.
Edited: Thanks so much for your help! I revised my implementation so that the function returns true only when it finds a node pointing to the node where the search started.
Here's the final implementation accepted by the online judge I use:
bool hasLoop(const int currentNode, const bool directions[][MAX_DOT_NUM], const int nodesLen, bool nodesVisited[], const int selectedNodes[], const int selectedNum, const int startNode){
// cout << currentNode << " -> ";
nodesVisited[currentNode] = true;
for(int i = 0; i < nodesLen; i++){
if(withinSelected(i, selectedNodes, selectedNum) == false) continue;
if(directions[currentNode][i] == 1){ //connected and is within selected nodes
if(nodesVisited[i] == true){
if(i == startNode) return true;
}else{
if(hasLoop(i, directions, nodesLen, nodesVisited, selectedNodes, selectedNum, startNode)){
return true;
}
}
}
}
return false;
}
Your implementation is a DFS, but will fail for "side nodes" that do not create a cycle:
Consider the graph with 3 nodes (A,B,C):
    A
   / \
  /   \
 V     V
B <---- C
Your algorithm will tell that the graph has a cycle, while in fact - it does not!
You can solve it by finding Strongly Connected Components, and seeing if there are non trivial (size>1) components.
Another solution would be to use Topological Sort - which returns an error if and only if the graph has a cycle.
In both solutions, you apply the algorithm only on the subgraph containing the "selected nodes". Both solutions are O(|V|+|E|) time, and O(|V|) space.
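A third option, not named in the answer above but in the same spirit as the white/grey/black scheme the question mentions, is a three-color DFS restricted to the selected nodes. The sketch below is only an illustration under assumed types: the adjacency matrix is a vector of vector<bool>, selected is a membership flag per node, and the function names are invented for this example.

```cpp
#include <vector>

enum class Mark { White, Grey, Black };

// adj[u][v] == true means there is a directed edge u -> v.
// selected[u] == true means node u belongs to the subgraph being checked.
bool dfs_cycle(int u, const std::vector<std::vector<bool>>& adj,
               const std::vector<bool>& selected, std::vector<Mark>& mark) {
    mark[u] = Mark::Grey;                        // u is on the current DFS path
    for (int v = 0; v < static_cast<int>(adj.size()); ++v) {
        if (!adj[u][v] || !selected[v]) continue;
        if (mark[v] == Mark::Grey) return true;  // edge back into the current path: a cycle
        if (mark[v] == Mark::White && dfs_cycle(v, adj, selected, mark)) return true;
    }
    mark[u] = Mark::Black;                       // fully explored and off the path
    return false;
}

// Returns true if the subgraph induced by the selected nodes contains a cycle.
bool has_cycle_in_selected(const std::vector<std::vector<bool>>& adj,
                           const std::vector<bool>& selected) {
    std::vector<Mark> mark(adj.size(), Mark::White);
    for (int u = 0; u < static_cast<int>(adj.size()); ++u) {
        if (selected[u] && mark[u] == Mark::White &&
            dfs_cycle(u, adj, selected, mark)) {
            return true;
        }
    }
    return false;
}
```

The difference from a plain visited array is that a Black node (such as B in the three-node example) no longer counts as evidence of a cycle; only an edge into a Grey node does.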

Checking for a cycle in an undirected graph using DFS?

So, I made the following code for DFS:
void dfs (graph * mygraph, int foo, bool arr[]) // here, foo is the source vertex
{
if (arr[foo] == true)
return;
else
{
cout<<foo<<"\t";
arr[foo] = true;
auto it = mygraph->edges[foo].begin();
while (it != mygraph->edges[foo].end())
{
int k = *it;
if (arr[k] == false)
{
//cout<<k<<"\n";
dfs(mygraph,k,arr);
//cout<<k<<"\t";
}
it++;
}
}
//cout<<"\n";
}
Now, I read that in an undirected graph, if a DFS reaches a vertex that has already been visited, there is a cycle. Therefore, what I did was this:
bool checkcycle( graph * mygraph, int foo, bool arr[] )
{
bool result = false;
if (arr[foo] == true)
{
result = true;
}
else
{
arr[foo] = true;
auto it = mygraph->edges[foo].begin();
while (it != mygraph->edges[foo].end())
{
int k = *it;
result = checkcycle(mygraph,k,arr);
it++;
}
}
return result;
}
But my checkcycle function returns true even if there is no cycle. Why is that? Is there something wrong with my function? There is no execution problem, otherwise I would have debugged it, but there seems to be something wrong in my logic.
Notice that your function doesn't quite do what you think it does. Let me try to step through what's happening here. Assume the following relationships: (1,2), (1,3), (2,3). I'm not assuming symmetry (that is, (1,2) does not imply (2,1)); the relationships are directed.
1. Start with node 1. Flag it as visited.
2. Iterate its children (2 and 3).
3. When in node 2, recursively call checkcycle. At this point 2 is also flagged as visited.
4. The recursive call now visits 3 (DEPTH search). 3 is also flagged as visited.
5. The call for step 4 dies, returning false.
6. The call for step 3 dies, returning false.
7. We're back at step 2. Now we'll iterate node 3, which has already been flagged in step 4. It just returns true.
You need a stack of visited nodes or you ONLY search for the original node. The stack will detect sub-cycles as well (cycles that do not include the original node), but it also takes more memory.
Edit: the stack of nodes is not just a bunch of true/false values, but instead a stack of node numbers. A node has been visited in the current stack trace if it's present in the stack.
However, there's a more memory-friendly way: set arr[foo] = false; as the calls die. Something like this:
bool checkcycle( graph * mygraph, int foo, bool arr[], int previousFoo=-1 )
{
bool result = false;
if (arr[foo] == true)
{
result = true;
}
else
{
arr[foo] = true;
auto it = mygraph->edges[foo].begin();
while (it != mygraph->edges[foo].end())
{
int k = *it;
// This should prevent going back to the previous node
if (k != previousFoo) {
result = checkcycle(mygraph,k,arr, foo);
}
it++;
}
// Add this
arr[foo] = false;
}
return result;
}
I think it should be enough.
Edit: should now support undirected graphs.
Note: this code is not tested.
Edit: for more elaborate solutions see Strongly Connected Components.
Edit: this answer is marked as accepted although the concrete solution was given in the comments. Read the comments for details.
Are all of the bools in arr[] set to false before checkcycle begins?
Are you sure your iterator for the nodes isn't doubling back on edges it has already traversed (and thus seeing the starting node multiple times regardless of cycles)?
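To make the "do not go back to the previous node" idea concrete, here is a small sketch for undirected graphs. It assumes a plain adjacency-list representation (vector of vector<int>) rather than the question's graph struct, assumes there are no parallel edges, and uses invented function names. Unlike the snippet above, it returns as soon as a cycle is found, so a later neighbour cannot overwrite the result.

```cpp
#include <vector>

// adj[u] lists the neighbours of u; every undirected edge {u, v} appears
// once in adj[u] and once in adj[v]. Assumes no parallel edges.
bool dfs_undirected(int u, int parent, const std::vector<std::vector<int>>& adj,
                    std::vector<bool>& visited) {
    visited[u] = true;
    for (int v : adj[u]) {
        if (v == parent) continue;        // skip the edge we arrived through
        if (visited[v]) return true;      // reached a visited node by a different edge
        if (dfs_undirected(v, u, adj, visited)) return true;
    }
    return false;
}

// Checks every connected component, starting each DFS with no parent (-1).
bool has_cycle_undirected(const std::vector<std::vector<int>>& adj) {
    std::vector<bool> visited(adj.size(), false);
    for (int s = 0; s < static_cast<int>(adj.size()); ++s) {
        if (!visited[s] && dfs_undirected(s, -1, adj, visited)) return true;
    }
    return false;
}
```

Here the visited flags are never reset, which is fine for an undirected graph: any visited neighbour other than the parent closes a cycle.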

BFS implementation

I was recently solving a BFS problem where each node is a different arrangement of the elements of an array, but I was unable to come up with a suitable data structure to keep track of the visited nodes in the expanded tree. Generally the nodes are different strings, so we can just use a map to mark a node as visited, but what data structure should I use in the above case?
Consider the following pseudocode:
type Node; // information pertaining to a node
type Path; // an ordered list of nodes
type Area; // an area containing linked neighboring nodes
type Queue; // a FIFO queue structure
function Traverse(Area a, Node start, Node end) returns Path:
Queue q;
Node n;
// traverse backwards, from finish to start
q.push(end); // add initial node to queue
end.parent = end; // set first node's parent to itself
while (not q.empty()):
n = q.pop(); // remove first element
if (n == start) // if element is the final element, we're done
break;
for (Node neighbor in a.neighbors(n)): // for each neighboring node
if (neighbor.parent != Null): // if already visited, skip
continue;
neighbor.parent = n; // otherwise, visit
q.push(neighbor); // then add to queue
Path p; // prepare to build path from visited list
for (Node previous = Null, current = n;
previous != current;
previous = current, current = current.parent):
p.add(current); // for each node from start to end, add node to p
// Note that the first node's parent is itself
// thus dissatisfying the loop condition
return p;
The "visited list" is stored as the node's parent. Coding this to C++, you would probably handle most of the nodes as references or pointers since this pseudocode relies on referential behavior.
You start with an Area, which is a field of Nodes. The area knows where each node is in relation to the others. You start at one specific Node, the "start" node, and push it into a queue.
Traversing the area is as simple as getting the list of neighboring nodes from the Area, skipping them if they're already visited, and setting their parent and adding them to the queue otherwise. Traversal ends when a node removed from the queue equals the destination node. You could speed up the algorithm a little by doing this check during the neighbor loop, when the node is initially encountered.
NOTE: You do not need to generate every possible node within the area before beginning the traversal, the Area requires only that once it has created a node, it keeps track of it. This might help your situation where it appears you use permutations of strings or arrays: you could push the starting and ending nodes into the Area, and it could generate and cache neighbor nodes on the fly. You might store them as vectors, which can be compared for equality based on their order and contents with the == operator. See this example.
The traversal goes backwards rather than forwards because it makes rebuilding the path easier (rather than ending up at the end node, with each parent the node before it, you end up at the start node, with each parent the node after it)
Data Structure Summary
Node would need to keep track of enough information for Area to identify it uniquely (via an array index or a name or something), as well as a parent node. The parent nodes should be set to NULL before the traversal to avoid weird behavior, since traversal will ignore any node with its parent set. This keeps track of the visited state too: visited is equivalent to (parent != NULL). Doing it this way also keeps you from having to keep track of the entire path in the queue, which would be very computationally intensive.
Area needs to maintain a list of Node, and needs a neighbor map, or a mapping of which nodes neighbor which other nodes. It's possible that this mapping could be generated on the fly with a function rather than being looked up from a table or some more typical approach. It should be able to provide the neighbors of a node to a caller. It might help to have a helper method that clears the parents of every node as well.
Path is basically a list type, containing an ordered list of nodes.
Queue is whatever FIFO queue is available. You could do it with a linked list.
I like how the syntax highlighting worked on my Wuggythovasp++.
At least as a start, you could try using/implementing something like Java's Arrays.toString() and using a map. Each arrangement would result in a different string, and thus it'll at least get somewhere.
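In C++ the string conversion can be skipped, because std::vector already supports == and <, so an arrangement can be used directly as a key in an ordered container. The following is only a sketch of that idea on a made-up problem (the minimum number of adjacent swaps to reach a goal arrangement); the function name and the choice of neighbours are assumptions for illustration, not part of the original question.

```cpp
#include <cstddef>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Returns the minimum number of adjacent swaps needed to turn start into goal,
// or -1 if goal cannot be reached (for example, if it is not a rearrangement of start).
int min_adjacent_swaps(const std::vector<int>& start, const std::vector<int>& goal) {
    std::set<std::vector<int>> visited;              // each arrangement is its own key
    std::queue<std::pair<std::vector<int>, int>> q;  // (arrangement, distance from start)
    visited.insert(start);
    q.push({start, 0});
    while (!q.empty()) {
        std::vector<int> state = q.front().first;
        int dist = q.front().second;
        q.pop();
        if (state == goal) return dist;
        for (std::size_t i = 0; i + 1 < state.size(); ++i) {
            std::vector<int> next = state;
            std::swap(next[i], next[i + 1]);         // neighbour: one adjacent swap
            if (visited.insert(next).second) {       // true only for unseen arrangements
                q.push({next, dist + 1});
            }
        }
    }
    return -1;
}
```

std::set costs a logarithmic number of vector comparisons per lookup. If that becomes a bottleneck, an unordered container with a custom hash for the arrangement is a possible alternative.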
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
/**
*
* @author VAISAKH N
*/
public class BFSME {
public static String path = "";
public static String add = "";
public static void findrec(String temp, String end, String[][] m, int j) {
if (temp.equals(m[j][1])) {
add = m[j][0] + temp + end + "/";
end = temp + end;
System.out.println(end);
path = path + add;
temp = "" + add.charAt(0);
System.out.println("Temp" + temp);
for (int k = 0; k < m.length; k++) {
findrec(temp, end, m, k);
}
}
}
public static void main(String[] args) {
String[][] data = new String[][]{{"a", "b"}, {"b", "c"}, {"b", "d"}, {"a", "d"}};
String[][] m = new String[data.length][2];
for (int i = 0; i < data.length; i++) {
String temp = data[i][0];
String end = data[i][1];
m[i][0] = temp;
m[i][1] = end;
path = path + temp + end + "/";
for (int j = 0; j < m.length; j++) {
findrec(temp, end, m, j);
}
}
System.out.println(path);
}
}
Just for the purpose of understanding, I have provided my sample code here (it's in C#):
private void Breadth_First_Travers(Node node)
{
// First Initialize a queue -
// it's retrieval mechanism works as FIFO - (First in First Out)
Queue<Node> myQueue = new Queue<Node>();
// Add the root node of your graph into the Queue
myQueue.Enqueue(node);
// Now iterate through the queue till it is empty
while (myQueue.Count != 0)
{
// now, retrieve the first element from the queue
Node item = myQueue.Dequeue();
Console.WriteLine("item is " + item.data);
// Check if it has any left child
if (item.left != null)
{
// If left child found - Insert/Enqueue into the Queue
myQueue.Enqueue(item.left);
}
// Check if it has right child
if (item.right != null)
{
// If right child found Insert/Enqueue into the Queue
myQueue.Enqueue(item.right);
}
// repeat the process till the Queue is empty
}
}
This sample code is given with reference to http://en.wikipedia.org/wiki/Binary_tree, as a tree is itself a type of graph.
Here is a BFS implementation for a graph using the C++ STL (adjacency lists). Three arrays and a queue are used for the complete implementation.
#include<iostream>
#include<bits/stdc++.h>
using namespace std;
//Adding node pair of a Edge in Undirected Graph
void addEdge( vector<int> adj[], int u, int v){
adj[u].push_back(v); // 1st push_back
adj[v].push_back(u); //2nd push_back
//for Directed Graph use only one push_back i.e., 1st push_back() rest is same
}
//Traversing through Graph from Node 0 in Adjacency lists way
void showGraph( vector<int>adj[], int size){
cout<<"Graph:\n";
for(int i=0; i<size ; i++){
cout<<i;
for( vector<int>::iterator itr= adj[i].begin() ; itr!=adj[i].end(); itr++){
cout<<" -> "<<*itr;
}
cout<<endl;
}
}
//Prints the first n array elements
void showArray(int A[], int n){
for(int i=0; i<n; i++){
cout<<A[i]<<" ";
}
}
void BFS( vector<int>adj[], int sNode, int N){
// Initialization
list<int>queue; //Queue declaration
int color[N]; //1:White, 2:Grey, 3:Black
int parentNode[N]; //Stores the parent node of each node while traversing, so that you can reach the parent from the child
int distLevel[N]; //Stores the no. of edges required to reach the node, i.e. the length of the path
//Initialization
for(int i=0; i<N; i++){
color[i] = 1; //Setting all nodes as white(1) unvisited
parentNode[i] = -1; //setting parent node as null(-1)
distLevel[i] = 0; //initializing dist as 0
}
color[sNode] = 2; //since start node is visited 1st so its color is grey(2)
parentNode[sNode] = -1; //parent node of start node is null(-1)
distLevel[sNode] = 0; //distance is 0 since it's the start node
queue.push_back(sNode); //pushing start node(sNode) into the queue
// Loop runs until the queue is empty; once it is empty, all reachable nodes are visited
while( !queue.empty()){
int v = queue.front(); //storing queue's front(Node) to v
//Visiting all nodes connected with v-node in adjacency list
for(int i=0; i<(int)adj[v].size() ;i++){
if( color[ adj[v][i] ] == 1){// if node is not visited, color[node]==1 which is white
queue.push_back(adj[v][i]); //pushing that node to queue
color[adj[v][i]]=2; //setting as grey(2)
parentNode[ adj[v][i] ] = v; //parent node is stored
distLevel[ adj[v][i] ] = distLevel[v]+1; //level(dist) is one more than dist(parentNode)
}
}//end of for
color[v]=3; //all neighbours of v are processed, so v is black(3)
queue.pop_front();//Dequeue
}
printf("\nColor: \n");showArray(color, N);
printf("\nDistLevel:\n");showArray(distLevel, N);
printf("\nParentNode:\n");showArray(parentNode, N);
}
int main(){
int N,E,u,v;//no of nodes, No of Edges, Node pair for edge
cout<<"Enter no of nodes"<<endl;
cin>>N;
vector<int> adj[N]; //vector adjacency lists
cout<<"No. of edges"<<endl;
cin>>E;
cout<<"Enter the node pair for edges\n";
for( int i=0; i<E;i++){
cin>>u>>v;
addEdge(adj, u, v); //invoking addEdge function
}
showGraph(adj,N); //Printing Graph in Adjacency list format
BFS(adj,0,N); //invoking BFS Traversal
}

Find nth smallest element in Binary Search Tree

I have written an algorithm for finding nth smallest element in BST but it returns root node instead of the nth smallest one. So if you input nodes in order 7 4 3 13 21 15, this algorithm after call find(root, 0) returns Node with value 7 instead of 3, and for call find(root, 1) it returns 13 instead of 4. Any thoughts ?
Binode* Tree::find(Binode* bn, int n) const
{
if(bn != NULL)
{
find(bn->l, n);
if(n-- == 0)
return bn;
find(bn->r, n);
}
else
return NULL;
}
and definition of Binode
class Binode
{
public:
int n;
Binode* l, *r;
Binode(int x) : n(x), l(NULL), r(NULL) {}
};
It is not possible to efficiently retrieve the n-th smallest element in a binary search tree by itself. However, this does become possible if you keep in each node an integer indicating the number of nodes in its entire subtree. From my generic AVL tree implementation:
static BAVLNode * BAVL_GetAt (const BAVL *o, uint64_t index)
{
if (index >= BAVL_Count(o)) {
return NULL;
}
BAVLNode *c = o->root;
while (1) {
ASSERT(c)
ASSERT(index < c->count)
uint64_t left_count = (c->link[0] ? c->link[0]->count : 0);
if (index == left_count) {
return c;
}
if (index < left_count) {
c = c->link[0];
} else {
c = c->link[1];
index -= left_count + 1;
}
}
}
In the above code, node->link[0] and node->link[1] are the left and right child of node, and node->count is the number of nodes in the entire subtree of node.
The above algorithm has O(log n) time complexity, assuming the tree is balanced. Also, if you keep these counts, another operation becomes possible: given a pointer to a node, you can efficiently determine its index (the inverse of what you asked for). In the code I linked, this operation is called BAVL_IndexOf().
Be aware that the node counts need to be updated as the tree is changed; this can be done with no (asymptotic) change in time complexity.
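For readers without the AVL implementation at hand, the same lookup can be sketched on a plain, unbalanced BST that stores a subtree count in each node. CountedNode, insert_counted and select_kth are hypothetical names for this example only, and the tree is not rebalanced, so the O(log n) bound holds only if the insertion order happens to keep it shallow. Nodes are never freed here, to keep the sketch short.

```cpp
#include <cstddef>

struct CountedNode {
    int value;
    std::size_t count = 1;          // number of nodes in this subtree, including this one
    CountedNode* left = nullptr;
    CountedNode* right = nullptr;
    explicit CountedNode(int v) : value(v) {}
};

// Plain (unbalanced) BST insert that keeps the subtree counts up to date.
CountedNode* insert_counted(CountedNode* root, int value) {
    if (!root) return new CountedNode(value);
    if (value < root->value) root->left = insert_counted(root->left, value);
    else root->right = insert_counted(root->right, value);
    ++root->count;                  // one more node somewhere below root
    return root;
}

// Returns the node holding the index-th smallest value (0-based),
// or nullptr if index is out of range.
const CountedNode* select_kth(const CountedNode* node, std::size_t index) {
    while (node) {
        std::size_t left_count = node->left ? node->left->count : 0;
        if (index == left_count) return node;
        if (index < left_count) {
            node = node->left;
        } else {
            index -= left_count + 1;  // skip the left subtree and the current node
            node = node->right;
        }
    }
    return nullptr;
}
```

With the insertion order from the question (7 4 3 13 21 15), index 0 leads to the node holding 3 and index 1 to the node holding 4.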
There are a few problems with your code:
1) find() returns a value (the correct node, assuming the function is working as intended), but you don't propagate that value up the call chain, so top-level calls don't know about the (possibly) found element.
Binode* elem = NULL;
elem = find(bn->l, n);
if (elem) return elem;
if(n-- == 0)
return bn;
elem = find(bn->r, n);
return elem; // here we don't need to test: we need to return regardless of the result
2) Even though you do the decrement of n at the right place, the change does not propagate upward in the call chain. You need to pass the parameter by reference (note the & after int in the function signature), so the change is made on the original value, not on a copy of it.
Binode* Tree::find(Binode* bn, int& n) const
I have not tested the suggested changes, but they should put you in the right direction for progress
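Putting both suggestions together gives roughly the following. This is a sketch rather than the asker's final code: it is written as a free function named find_nth instead of the Tree::find member, and nth_smallest is a hypothetical wrapper added so callers can pass a literal.

```cpp
#include <cstddef>

struct Binode {
    int n;
    Binode* l;
    Binode* r;
    explicit Binode(int x) : n(x), l(NULL), r(NULL) {}
};

// In-order search. n counts how many nodes still have to be skipped and is
// shared by all recursive calls because it is passed by reference.
Binode* find_nth(Binode* bn, int& n) {
    if (bn == NULL) return NULL;
    Binode* found = find_nth(bn->l, n);  // smaller values first
    if (found != NULL) return found;     // propagate a hit up the call chain
    if (n-- == 0) return bn;             // this is the node we were counting down to
    return find_nth(bn->r, n);           // otherwise continue in the right subtree
}

// Hypothetical convenience wrapper: takes n by value so a literal can be passed.
Binode* nth_smallest(Binode* root, int n) {
    return find_nth(root, n);
}
```

This still visits nodes in order until it reaches the target, so it takes time proportional to n plus the height of the tree.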

c++ directed graph depth first search

I am attempting to write a DFS method for a directed graph. Right now I am running into a segmentation fault, and I am really unsure where it is. From what I understand of directed graphs I believe that my logic is right, but a fresh set of eyes would be a very nice help.
Here is my function:
void wdigraph::depth_first (int v) const {
static int fVertex = -1;
static bool* visited = NULL;
if( fVertex == -1 ) {
fVertex = v;
visited = new bool[size];
for( int x = 0; x < size; x++ ) {
visited[x] = false;
}
}
cout << label[v];
visited[v] = true;
for (int v = 0; v < adj_matrix.size(); v++) {
for( int x = 0; x < adj_matrix.size(); x++) {
if( adj_matrix[v][x] != 0 && visited[x] != false ) {
cout << " -> ";
depth_first(x);
}
if ( v == fVertex ) {
fVertex = -1;
delete [] visited;
visited = NULL;
}
}
}
}
class definition:
class wdigraph {
public:
wdigraph(int =NO_NODES); // default constructor
~wdigraph() {}; // destructor
int get_size() { return size; } // returns size of digraph
void depth_first(int) const;// traverses graph using depth-first search
void print_graph() const; // prints adjacency matrix of digraph
private:
int size; // size of digraph
vector<char> label; // node labels
vector< vector<int> > adj_matrix; // adjacency matrix
};
thanks!
You are deleting visited before the end of the program.
Coming back to the starting vertex doesn't mean you finished.
For example, for the graph of V = {1,2,3}, E={(1,2),(2,1),(1,3)}.
Also, notice you are using v as the input parameter and also as the for-loop variable.
There are a few things you might want to consider. The first is that function level static variables are not usually a good idea, you can probably redesign and make those either regular variables (at the cost of extra allocations) or instance members and keep them alive.
The function assumes that the adjacency matrix is square, but the initialization code is not shown, so it should be checked. The assumption can be removed by making the inner loop condition adj_matrix[v].size() (given a node v), or else, if that is an invariant, add an assert before that inner loop: assert( adj_matrix[v].size() == adj_matrix.size() && "adj_matrix is not square!" ); The same goes for the member size and the size of the adj_matrix itself.
The whole algorithm seems more complex than it should, a DFS starting at node v has the general shape of:
dfs( v )
set visited[ v ]
operate on node (print node label...)
for each node reachable from v:
if not visited[ node ]:
dfs( node )
Your algorithm seems to be (incorrectly, by the way) traversing the graph in the opposite direction. You set the given node as visited, and then try to locate any node that is the start point of an edge to that node. That is, instead of following nodes reachable from v, you are trying to get nodes from which v is reachable. If that is the case (i.e. if the objective is printing all paths that converge in v) then you must be careful not to hit the same edge twice, or you will end up in infinite recursion and a stack overflow.
To see that you will end up with a stack overflow, consider this example. The start node is 1. You create the visited vector and mark position 1 as visited. You find that there is an edge (0,1) in the graph, and that triggers the if: adj_matrix[0][1] != 0 && visited[1], so you enter recursively with the start node being 1 again. This time you don't construct the auxiliary data, but re-mark visited[1], enter the loop, find the same edge and call recursively...
I see a couple of problems:
The following line
if( adj_matrix[v][x] != 0 && visited[x] != false ) {
should be changed to
if( adj_matrix[v][x] != 0 && visited[x] == false ) {
(You want to recurse only on vertices that have not been visited already.)
Also, you're creating a new variable v in the for loop that hides the parameter v: that's legal C++, but it's almost always a terrible idea.
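Taken together, the answers point towards something like the sketch below. It is written as free functions over a bare adjacency matrix instead of the wdigraph member, it collects the visit order in a vector rather than printing labels, and the names dfs_matrix and depth_first_order are invented for this example. The visited vector is a local of the kickoff function, so no static state or manual delete is needed.

```cpp
#include <cstddef>
#include <vector>

// Appends the vertices reachable from v to order, in DFS preorder.
// adj_matrix[v][x] != 0 means there is an edge v -> x.
void dfs_matrix(const std::vector<std::vector<int>>& adj_matrix, std::size_t v,
                std::vector<bool>& visited, std::vector<std::size_t>& order) {
    visited[v] = true;
    order.push_back(v);
    // Only row v is scanned: these are the vertices reachable from v in one step.
    for (std::size_t x = 0; x < adj_matrix[v].size(); ++x) {
        if (adj_matrix[v][x] != 0 && !visited[x]) {
            dfs_matrix(adj_matrix, x, visited, order);
        }
    }
}

// Kickoff function: owns the visited flags for a single traversal.
std::vector<std::size_t> depth_first_order(const std::vector<std::vector<int>>& adj_matrix,
                                           std::size_t start) {
    std::vector<bool> visited(adj_matrix.size(), false);
    std::vector<std::size_t> order;
    dfs_matrix(adj_matrix, start, visited, order);
    return order;
}
```

The loop variable is named x and the parameter stays v, so nothing is shadowed, and coming back to the start vertex through an edge such as (2,1) simply finds it already visited.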