Minimizing memory usage of a breadth first search - c++

In the following code, I am traversing a graph using breadth-first search. The code constructs the graph as it traverses. The graph is very large, with a fan-out of 12. Because of this, each time the depth of the search increases, I want to destroy the layer above it to minimize memory usage. How could I design an algorithm to do so?
string Node::bfs(Node * rootNode) {
    QQueue<Node *> q;  // was QQueue<Cube *>; the queue holds Node pointers
    q.enqueue(rootNode);
    while (!(q.empty())) {
        Node * currNode = q.dequeue();
        currNode->constructChildren();
        foreach (Node * child, currNode->getListOfChildren()) {
            q.enqueue(child);
        }
        if (currNode->isGoalNode()) {
            return currNode->path;
        }
    }
    return string();  // no goal node was reached
}

With constant fanout and assuming a tree-like graph, the number of nodes already visited by a BFS is of the same order as the number of nodes on the fringe (e.g., in a tree with fanout K, level n has K^n nodes, and the number of nodes with depth less than n is also Theta(K^n)).
Hence, storing the fringe alone will already take up a lot of memory. So if memory is a very big problem, an "equivalent" technique such as iterative deepening DFS may be much better.
But if you want to destroy the "visited" nodes, you need some way of tracking what has been visited (in the case of a general graph; if it is a tree, there's no problem). For that, more information about the graph is needed.
EDIT on why iterative deepening DFS is better.
The fringe (the unvisited nodes adjacent to visited nodes) in a BFS is O(K^n) in size, n being the current depth. The fringe for DFS is O(n) in size for constant fanout K.
Iterative deepening DFS has the same fringe size as DFS and gives the same result as BFS, because it "simulates" BFS by rerunning a depth-limited DFS with an increasing depth limit.
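A minimal sketch of iterative deepening DFS, assuming the graph is tree-like and children can be regenerated on demand. `State`, `expand`, and the heap-style child numbering are hypothetical stand-ins for the asker's `Node` and `constructChildren()`; only the current root-to-node path is kept alive at any time.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the asker's Node: children are generated on
// demand and never stored beyond the current search path.
struct State {
    std::uint64_t id;
};

const int kFanout = 12;  // fan-out taken from the question

// Children in an implicit complete 12-ary tree (heap-style numbering).
std::vector<State> expand(const State& s) {
    std::vector<State> out;
    for (int i = 1; i <= kFanout; ++i) {
        out.push_back(State{s.id * kFanout + i});
    }
    return out;
}

// Depth-limited DFS: true if a goal lies within `limit` edges of `s`.
// On success, `path` holds the states from the root to the goal.
bool depthLimited(const State& s, int limit,
                  const std::function<bool(const State&)>& isGoal,
                  std::vector<State>& path) {
    path.push_back(s);
    if (isGoal(s)) return true;
    if (limit > 0) {
        for (const State& c : expand(s)) {
            if (depthLimited(c, limit - 1, isGoal, path)) return true;
        }
    }
    path.pop_back();
    return false;
}

// Iterative deepening: returns the path to a shallowest goal, or an
// empty vector if no goal exists within maxDepth edges of the root.
std::vector<State> iddfs(const State& root, int maxDepth,
                         const std::function<bool(const State&)>& isGoal) {
    assert(maxDepth >= 0);
    for (int limit = 0; limit <= maxDepth; ++limit) {
        std::vector<State> path;
        if (depthLimited(root, limit, isGoal, path)) return path;
    }
    return {};
}
```

Shallow levels are re-expanded on every pass, but with fanout K the deepest level dominates, so the total work stays Theta(K^n). On a general graph with cycles, this sketch would revisit repeated states unless some duplicate detection is added.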

Breadth-first search inherently has space complexity that is exponential in the search depth. Any tricks will have only a marginal impact on the memory requirements for large graphs. You're better off using depth-first search if you want tractable space complexity.

Related

Is the time complexity of pre-order traversal and DFS on a balanced binary tree the same?

I read in an answer that pre-order traversal is a type of DFS.
Yet for a balanced binary tree, the time complexity of a tree traversal is said to be O(log n), while DFS has a time complexity of O(N).
So, is pre-order traversal not a type of DFS, or have I misunderstood the concept?
Thanks.
The time complexity for a preorder, inorder, or postorder traversal of a binary search tree is always Θ(n), where n is the number of nodes in the tree. One way to see this is that in each case, each node is visited exactly once, and each edge is traversed exactly twice (once descending and once ascending).
You had mentioned in your question that the time complexity of a tree traversal on a balanced tree is O(log n). The O(log n) here actually refers to the space complexity (how much auxiliary memory is needed) rather than the time complexity (how many operations will be performed). The reason for this is that all of these tree traversals, in their typical implementation, need to store a stack of the nodes that have been visited so far so that the traversal can back up higher in the tree when necessary. This means that the auxiliary space needed is proportional to the height of the tree, which in a balanced tree is O(log n) and in an arbitrary BST will be O(n).
So in that sense, the best answer to your question is probably "DFS, inorder traversals, preorder traversals, and postorder traversals of a BST always take the same amount of time (Θ(n)), and the space complexity depends on the height of the tree, which can range between Θ(log n) and Θ(n)."
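To make the time/space distinction concrete, here is a small sketch that counts both quantities during a recursive preorder traversal. `TreeNode` and `Stats` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical binary tree node.
struct TreeNode {
    int value;
    TreeNode* left;
    TreeNode* right;
};

struct Stats {
    std::size_t visited = 0;   // nodes visited: always n (the time cost)
    std::size_t maxDepth = 0;  // deepest recursion level (the space cost)
};

// Preorder DFS. Call with depth = 1 for the root.
void preorder(const TreeNode* node, std::size_t depth, Stats& stats) {
    if (node == nullptr) return;
    ++stats.visited;
    stats.maxDepth = std::max(stats.maxDepth, depth);
    preorder(node->left, depth + 1, stats);
    preorder(node->right, depth + 1, stats);
}
```

On a balanced tree and on a degenerate chain with the same number of nodes, `visited` is identical; only `maxDepth` (and so the call stack size) differs.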

Understanding Time Complexity for tree traversal using BFS

I am trying to understand the time complexity when I traverse a tree with n nodes (not necessarily a binary tree) using BFS.
As I understand it, it should be O(n^2), since my outer loop runs n times (until the queue is empty), because the tree contains n nodes.
My inner for loop keeps adding the children of the current node to the queue. (Every node has a dict containing the addresses of all its children.)
So, for example, if the root node has n-1 children (and those nodes have no children of their own), wouldn't the time complexity be n*(n-1) = O(n^2)?
Is my understanding correct?
Is there any way that this can be done in O(n) ? Please explain.
It's often more useful to describe the complexity of graph algorithms in terms of both the number of nodes and edges. Typically |V| is used to represent the number of nodes, and |E| to represent the number of edges.
In BFS, we visit each of the |V| nodes once and add all of their neighbors to a queue. And, by the end of the algorithm, each edge in the graph has been processed exactly once. Therefore we can say BFS is O(|V| + |E|).
In a fully connected graph, |E| = |V|(|V| - 1)/2. So you are correct that the complexity is O(|V|^2) for fully connected graphs; however, O(|V| + |E|) is considered a tighter analysis for graphs that are known to be sparse.
Big-O notation gives an upper bound on the time complexity. You can of course say that the time complexity of BFS is O(n^2), but it's not a tight upper bound.
To get a tight upper bound, think of BFS like this: each node is added to the queue exactly once and removed from the queue exactly once. Each add and remove operation costs only O(1) time, so the time complexity is O(n).
EDIT
To implement the O(n) BFS on a tree, you can try to implement the following pseudo code.
procedure bfs(root: root of the tree)
    q := an empty queue
    push root into q
    while q is not empty
        v := the element at the head of q
        for u := children of v
            push u into q
        pop v out of q
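The pseudocode could be sketched in C++ roughly as follows. `NaryNode` is a hypothetical node type (the question stores children in a dict; a vector is assumed here), and the `pushes` counter is added only to show that the total number of enqueue operations is n, even when one node has n-1 children.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical n-ary tree node.
struct NaryNode {
    int value;
    std::vector<NaryNode*> children;
};

// Returns node values in BFS order; `pushes` counts enqueue operations.
std::vector<int> bfs(NaryNode* root, std::size_t& pushes) {
    std::vector<int> order;
    pushes = 0;
    if (root == nullptr) return order;

    std::queue<NaryNode*> q;
    q.push(root);
    ++pushes;
    while (!q.empty()) {
        NaryNode* v = q.front();
        q.pop();
        order.push_back(v->value);
        // The inner loop runs once per child of v. Summed over all nodes,
        // that is n - 1 iterations in total, not n per node.
        for (NaryNode* u : v->children) {
            q.push(u);
            ++pushes;
        }
    }
    return order;
}
```

For the star-shaped tree from the question (a root with n-1 leaf children), the inner loop does n-1 iterations for the root and zero for every other node.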

Find the median of binary search tree, C++

Once I was interviewed by "One well known company" and the interviewer asked me to find the median of BST.
int median(treeNode* root)
{
}
I started to implement the first brute-force solution that I came up with. I fill all the data into a std::vector<int> with inorder traversal (to get everything sorted in the vector) and got the middle element.
So my algorithm takes O(N) time to insert every element into the vector plus O(1) to query the middle element, and O(N) memory.
Is there a more efficient way (in terms of memory or time complexity) to do the same thing?
Thanks in advance.
It can be done in O(n) time and O(h) space (h being the height of the tree, so O(log N) for a balanced tree) by doing an in-order traversal and stopping when you reach the (n/2)-th node. Just carry a counter that tells you how many nodes have already been traversed; there is no need to actually populate a vector.
If you can modify your tree into a rank tree (where each node also stores the number of nodes in the subtree it is the root of), you can easily solve it in O(log N) time on a balanced tree by simply moving in the direction of the (n/2)-th element.
Since you know that the median is the middle element of a sorted list of elements, you can just take the middle element of your inorder traversal and stop there, without storing the values in a vector. You might need two traversals if you don't know the number of nodes, but it will make the solution use less memory (O(h) where h is the height of your tree; h = O(log n) for balanced search trees).
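A minimal sketch of the two-traversal idea, assuming the node type has `data`, `left`, and `right` members (the question does not show `treeNode`, so these are guesses) and that the lower median is wanted when the node count is even.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout of the interviewer's treeNode.
struct treeNode {
    int data;
    treeNode* left;
    treeNode* right;
};

// First traversal: count the nodes.
std::size_t countNodes(const treeNode* node) {
    if (node == nullptr) return 0;
    return 1 + countNodes(node->left) + countNodes(node->right);
}

// Second traversal: in-order walk that stops at the k-th smallest node
// (0-based). `seen` counts the nodes visited so far.
const treeNode* kthInorder(const treeNode* node, std::size_t k,
                           std::size_t& seen) {
    if (node == nullptr) return nullptr;
    if (const treeNode* hit = kthInorder(node->left, k, seen)) return hit;
    if (seen++ == k) return node;
    return kthInorder(node->right, k, seen);
}

// Returns the lower median for an even number of nodes (an assumption).
int median(treeNode* root) {
    assert(root != nullptr);
    std::size_t n = countNodes(root);
    std::size_t seen = 0;
    return kthInorder(root, (n - 1) / 2, seen)->data;
}
```

No vector is filled; the only extra memory is the recursion stack, which is proportional to the height of the tree.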
If you can augment the tree, you can use the solution I gave here to get an O(log n) algorithm.
The binary tree offers a sorted view of your data, but in order to take advantage of it, you need to know how many elements are in each subtree. Without this knowledge, your O(n) algorithm is already about as fast as it can be.
If you know the size of each subtree, you select each time to visit the left or the right subtree, and this gives an O(log n) algorithm if the binary tree is balanced.
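A sketch of the subtree-size approach, assuming each node carries a `size` field that is kept up to date on every insert and erase (maintaining it is not shown). `SizedNode`, `selectKth`, and `medianSized` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical augmented node: `size` is the number of nodes in the
// subtree rooted at this node, assumed to be maintained elsewhere.
struct SizedNode {
    int data;
    std::size_t size;
    SizedNode* left;
    SizedNode* right;
};

std::size_t sizeOf(const SizedNode* n) { return n ? n->size : 0; }

// Returns the k-th smallest value (0-based) in O(height) steps.
int selectKth(const SizedNode* node, std::size_t k) {
    assert(node != nullptr && k < node->size);
    while (true) {
        std::size_t leftSize = sizeOf(node->left);
        if (k < leftSize) {
            node = node->left;        // target is in the left subtree
        } else if (k == leftSize) {
            return node->data;        // this node is the k-th smallest
        } else {
            k -= leftSize + 1;        // skip left subtree and this node
            node = node->right;
        }
    }
}

// Lower median for an even number of nodes (an assumption).
int medianSized(const SizedNode* root) {
    return selectKth(root, (root->size - 1) / 2);
}
```

Each step descends one level, so the cost is O(log n) only if the tree is balanced.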

Does the time complexity of Dijkstra's algorithm for shortest paths depend on the data structure used?

One way to store the graph is to implement nodes as structures, like
struct node {
int vertex; node* next;
};
where vertex stores the vertex number and next contains link to the other node.
Another way I can think of is to implement it as vectors, like
vector<vector<pair<int,int> > > G;
Now, when applying Dijkstra's algorithm for shortest paths, we need to build a priority queue and the other required data structures in both cases.
Will there be any difference in complexity between the two graph representations above? Which one is preferable?
EDIT:
In first case, every node is associated with a linked list of nodes which are directly accessible from the given node. In second case,
G.size() is the number of vertices in our graph
G[i].size() is the number of vertices directly reachable from vertex with index i
G[i][j].first is the index of j-th vertex reachable from vertex i
G[i][j].second is the length of the edge heading from vertex i to vertex G[i][j].first
Both are adjacency list representations. If implemented correctly, that would be expected to result in the same time complexity. You'd get a different time complexity if you use an adjacency matrix representation.
In more detail - this comes down to the difference between an array (vector) and a linked-list. When all you're doing is iterating through the entire collection (i.e. the neighbours of a vertex), as you do in Dijkstra's algorithm, this takes linear time (O(n)) regardless of whether you're using an array or linked-list.
The resulting complexity for running Dijkstra's algorithm, as noted on Wikipedia, would be O(|E| log |V|) with a binary heap in either case.
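For illustration, here is a sketch of Dijkstra's algorithm over the vector representation from the question, using `std::priority_queue` as the binary heap. It assumes non-negative edge lengths; the `Graph` alias and the use of stale-entry skipping instead of a decrease-key operation are choices made here, not something the question specifies.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// G[i][j] = (index of j-th neighbour of i, length of that edge),
// matching the layout described in the question.
using Graph = std::vector<std::vector<std::pair<int, int>>>;

// Returns shortest distances from `source`; unreachable vertices keep
// the maximum long long value. Assumes non-negative edge lengths.
std::vector<long long> dijkstra(const Graph& G, int source) {
    const long long INF = std::numeric_limits<long long>::max();
    using Item = std::pair<long long, int>;  // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    std::vector<long long> dist(G.size(), INF);
    dist[source] = 0;
    pq.push(Item(0, source));

    while (!pq.empty()) {
        Item top = pq.top();
        pq.pop();
        long long d = top.first;
        int u = top.second;
        if (d > dist[u]) continue;  // stale entry, a shorter path was found
        // Iterating over the neighbours is linear in their number, whether
        // they are stored in a vector or in a linked list.
        for (const std::pair<int, int>& edge : G[u]) {
            long long nd = d + edge.second;
            if (nd < dist[edge.first]) {
                dist[edge.first] = nd;
                pq.push(Item(nd, edge.first));
            }
        }
    }
    return dist;
}
```

Swapping the inner `std::vector` for a linked list would change only the neighbour loop, not the asymptotic cost.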

augmenting/index priority_queue in STL

I am using the STL priority_queue as a data structure in my graph application. You can safely think of it as an advanced version of Prim's spanning tree algorithm.
Within the algorithm, I want to find a node in the priority queue (not just the minimum node) efficiently. [This is needed because the cost of a node might change and must then be fixed in the priority_queue.]
All I have to do is augment the priority_queue and also index it by my node keys. I can't find any way to do this in the STL. Does anyone have a better idea of how to do it in the STL?
The std::priority_queue<T> doesn't support efficient look-up of nodes: it uses a d-ary heap, typically with d == 2. This representation doesn't keep nodes put. If you really want to use a std::priority_queue<T> with Prim's algorithm, the only way is to just add nodes with their current shortest distance and possibly add each node multiple times. This turns the size of the queue into O(E) instead of O(N), though, so for graphs with many edges the queue grows much larger.
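A sketch of that workaround: Prim's algorithm with a plain `std::priority_queue`, where a vertex may be pushed several times and outdated entries are skipped when popped. It assumes a connected undirected graph stored as adjacency lists of (neighbour, weight) pairs with each edge listed in both directions; `Graph` and `primLazy` are hypothetical names.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// G[u] holds (neighbour, weight) pairs; each undirected edge is assumed
// to appear in both endpoints' lists.
using Graph = std::vector<std::vector<std::pair<int, int>>>;

// Returns the total weight of a minimum spanning tree of a connected graph.
long long primLazy(const Graph& G, int start = 0) {
    if (G.empty()) return 0;
    using Item = std::pair<int, int>;  // (edge weight, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    std::vector<bool> inTree(G.size(), false);
    long long total = 0;

    pq.push(Item(0, start));
    while (!pq.empty()) {
        Item top = pq.top();
        pq.pop();
        int u = top.second;
        if (inTree[u]) continue;  // duplicate entry for a finished vertex
        inTree[u] = true;
        total += top.first;
        for (const std::pair<int, int>& e : G[u]) {
            // No decrease-key: push a new entry and leave any old one
            // in the queue to be skipped later.
            if (!inTree[e.first]) pq.push(Item(e.second, e.first));
        }
    }
    return total;
}
```

The queue can hold up to one entry per edge direction, which is the O(E) size mentioned above.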
You can use something like std::map<...> but that really suffers from pretty much the same problem: you can either locate the next node to extract efficiently or you can locate the nodes to update efficiently.
The "proper" approach is to use a node-based priority queue, e.g., a Fibonacci heap: since the nodes stay put, you can get a handle from the heap when inserting a node and efficiently update the distance of a node through the handle. Access to the closest node is efficient using the few top nodes in the heap's set of trees. The overall performance of the basic heap operations (push(), top(), and pop()) is slower for Fibonacci heaps than for d-ary heaps, but the efficient update of individual nodes makes their use worthwhile. I seem to recall that Prim's algorithm actually requires Fibonacci heaps anyway to achieve the tight complexity bound.
I know that there is an implementation of Fibonacci heaps in Boost. An efficient implementation of Fibonacci heaps isn't entirely trivial, but they are efficient enough in practice to be of more than theoretical interest.