Understanding Time Complexity for tree traversal using BFS - c++

I am trying to understand the time complexity when I traverse a tree with n nodes (not necessarily a binary tree) using BFS.
As per my understanding it should be O(n^2), since my outer loop runs n times, i.e. until the queue is empty, and the tree contains n nodes.
And my inner for loop has to keep adding the children associated with a particular node to the queue. (Every node has a dict which contains the addresses of all its children.)
So, for example, if the root node has n-1 children (and those children in turn have no children of their own), wouldn't the time complexity be n*(n-1) = O(n^2)?
Is my understanding correct?
Is there any way that this can be done in O(n) ? Please explain.

It's often more useful to describe the complexity of graph algorithms in terms of both the number of nodes and edges. Typically |V| is used to represent the number of nodes, and |E| to represent the number of edges.
In BFS, we visit each of the |V| nodes once and add all of their neighbors to a queue. And, by the end of the algorithm, each edge in the graph has been processed exactly once. Therefore we can say BFS is O(|V| + |E|).
In a complete ("fully connected") graph, |E| = |V|(|V| - 1)/2. So you are correct that the complexity is O(|V|^2) for such graphs; however, O(|V| + |E|) is a tighter analysis for graphs that are known to be sparse.

Big-O notation gives an upper bound on the time complexity. You can of course say that the time complexity of BFS is O(n^2), but that is not a tight bound.
To get the tight bound, you can look at BFS like this: each node is added to the queue exactly once and removed from the queue exactly once, and each add and remove costs O(1). On a tree, scanning the children across all iterations touches each of the n - 1 edges once, so the total time complexity is O(n).
EDIT
To implement the O(n) BFS on a tree, you can follow this pseudo code.
procedure bfs(root: root of the tree)
    q := an empty queue
    push root into q
    while q is not empty
        v := the element at the head of q
        pop v out of q
        for each u in children of v
            push u into q

Related

Directed Acyclic Graph and its Topological Sorting (assign priorities)

I have a directed acyclic graph like the one shown in the image below.
The thing I want to achieve is to obtain a **topological ordering** where every node has a priority p, later used for scheduling (where a higher value means a higher priority).
A topological sorting like the linear list in the visualization below can be computed in linear time, O(|V| + |E|), by visiting each node and edge once (https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search).
However, when the task of assigning priorities comes into play, I have not found a better algorithm so far than the one below.
Assign to every node n in graph g the priority p = 0.
for every node n in graph g:
    start a depth-first recursion from node n
    while traversing down the subgraph from n:
        if a parent node already has a higher priority than the current node, there is no need to go deeper; continue with other nodes in the depth-first recursion.
        if the priority is equal or lower, set the priority of the parent nodes to the priority of the current node + 1 and visit the parent.
(There are some small optimizations: ignoring subgraphs from which we already started a DFS and only increasing the subgraph's priority offset; each node's priority is then calculated as its local priority + its subgraph's priority offset.)
Each depth-first recursion starting from n cannot be simplified by ignoring already-visited nodes inside the for loop above, as can be done for a simple topological sort (the linear list in the image; https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search).
Is it better to first compute the topological ordering (O(|V| + |E|)) and then iterate over the linear list and assign priorities (another O(|V| + |E|) pass)?
I would like as few distinct priorities as possible to allow for parallel scheduling (nodes D and E can be scheduled at the same time in the image).
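For what it's worth, the two-pass idea at the end of the question can be sketched like this (the names and the adjacency-list representation are illustrative, not from the original post): a topological sort via Kahn's algorithm, then a backwards sweep that gives every node a priority equal to its longest path to a sink, so independent nodes at the same "level" share a priority.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Edge u -> v means u must run before v. Both passes are O(|V| + |E|).
std::vector<int> assignPriorities(const std::vector<std::vector<int>>& adj) {
    int n = adj.size();
    std::vector<int> indeg(n, 0);
    for (const auto& out : adj)
        for (int v : out) indeg[v]++;

    // Pass 1: Kahn's algorithm produces a topological order.
    std::vector<int> order;
    std::queue<int> q;
    for (int v = 0; v < n; ++v)
        if (indeg[v] == 0) q.push(v);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        for (int v : adj[u])
            if (--indeg[v] == 0) q.push(v);
    }

    // Pass 2: walk the order backwards; a node's priority is one more than
    // the highest priority among its successors (sinks get 0).
    std::vector<int> prio(n, 0);
    for (auto it = order.rbegin(); it != order.rend(); ++it)
        for (int v : adj[*it])
            prio[*it] = std::max(prio[*it], prio[v] + 1);
    return prio;
}
```

Nodes with equal priority have no path between them, so they are candidates for parallel scheduling.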

How to compress a part of graph into a single node and be able to find shortest path from one node to all?

I have a graph with the following features -
Has total N nodes (1,2...N).
All edges are bidirectional.
Nodes from 1 to K (K <=N ) are connected to each other with same weight W.
And there are M edges. These M edges may have different weights, and they can be between any pair of nodes (between two of the other N-K nodes, or from one of the first K nodes to one of the other N-K nodes).
There exists a path to reach from one node to another for sure.
An example graph having these features.
I need to find the shortest path distances from a given node to all other nodes.
Now, instead of directly using Dijkstra's algorithm to find the shortest paths from a given node to all other nodes, I was thinking it would be more efficient to compress the subgraph of nodes 1 to K into a single node: since each of the first K nodes is connected to each of the others with the same weight W, finding the distance between any two of them takes just O(1) time.
But I am not able to think of how to code it or modify my Dijkstra's algorithm. I want to know how to go about solving this problem and also if possible is there any better solution available?
Let's use a standard Dijkstra's algorithm with one twist: we'll keep a segment tree that supports three operations:
Get minimum in a range
Set a value in the given position to +INF
Make a range update (setting a[i] = min(a[i], new_val) for all l <= i <= r)
A standard segment tree can handle all these operations in O(log N) time.
We can take care of all "other" M edges in the standard fashion (we read the child node's current value, relax it, and write it back into the tree if it improved).
The edges between the first K nodes can be handled like this: if the current node v is among the first K, we make a range update to the [1, K] segment with a value dist[v] + W.
That's it. There are at most K <= N range updates and M point updates (as in a standard Dijkstra's algorithm), so the total time complexity is O((M + N) log N), no matter how large K is.
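A sketch of a segment tree supporting the three operations from the answer (illustrative code, not from the original post): the surrounding Dijkstra loop would repeatedly query the minimum, set the extracted node's slot to +INF, and push relaxed distances back in via the updates.

```cpp
#include <limits>
#include <vector>

// Segment tree over positions 0..n-1 with:
//   rangeMin(l, r)   - minimum over a range
//   setInf(pos)      - set one position to +INF (node extracted by Dijkstra)
//   chmin(l, r, v)   - lazy range update a[i] = min(a[i], v)
// All three run in O(log n).
struct SegTree {
    static constexpr long long INF = std::numeric_limits<long long>::max();
    int n;
    std::vector<long long> mn, lz;  // node minimum, pending chmin value

    SegTree(int n) : n(n), mn(4 * n, INF), lz(4 * n, INF) {}

    void apply(int x, long long v) {
        mn[x] = std::min(mn[x], v);
        lz[x] = std::min(lz[x], v);
    }
    void push(int x) {  // pass the pending chmin down to the children
        if (lz[x] != INF) {
            apply(2 * x, lz[x]);
            apply(2 * x + 1, lz[x]);
            lz[x] = INF;
        }
    }
    void chmin(int ql, int qr, long long v, int x = 1, int l = 0, int r = -1) {
        if (r < 0) r = n - 1;
        if (qr < l || r < ql) return;
        if (ql <= l && r <= qr) { apply(x, v); return; }
        push(x);
        int m = (l + r) / 2;
        chmin(ql, qr, v, 2 * x, l, m);
        chmin(ql, qr, v, 2 * x + 1, m + 1, r);
        mn[x] = std::min(mn[2 * x], mn[2 * x + 1]);
    }
    void setInf(int pos, int x = 1, int l = 0, int r = -1) {
        if (r < 0) r = n - 1;
        if (l == r) { mn[x] = INF; return; }
        push(x);
        int m = (l + r) / 2;
        if (pos <= m) setInf(pos, 2 * x, l, m);
        else setInf(pos, 2 * x + 1, m + 1, r);
        mn[x] = std::min(mn[2 * x], mn[2 * x + 1]);
    }
    long long rangeMin(int ql, int qr, int x = 1, int l = 0, int r = -1) {
        if (r < 0) r = n - 1;
        if (qr < l || r < ql) return INF;
        if (ql <= l && r <= qr) return mn[x];
        push(x);
        int m = (l + r) / 2;
        return std::min(rangeMin(ql, qr, 2 * x, l, m),
                        rangeMin(ql, qr, 2 * x + 1, m + 1, r));
    }
};
```

When the extracted node v is among the first K, a single `chmin(0, K - 1, dist[v] + W)` replaces all K - 1 individual relaxations.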

Binary Tree Questions

Currently studying for an exam, and whilst reading through some notes, I had a few questions.
I know that the height of a Binary Search Tree is O(log n). Does this mean the depth is also O(log n)?
What is the maximum depth of a node in a full binary tree with n nodes? This is related to the first question; if the height of a binary tree is O(log n), would the maximum depth also be O(log n)?
I know that the time complexity of searching for a node in a binary search tree is O(log n), which I understand. However, I read that the worst-case time complexity is O(n). In what scenario would it take O(n) time to find an element?
THIS IS A PRIORITY QUEUE/ HEAP QUESTION. In my lecture notes, it says the following statement:
If we use an array for Priority Queues, en-queuing takes O(1) and de-queuing takes O(n). In a sorted Array, en-queue takes O(N) and de-queue takes O(1).
I'm having a hard time understanding this. Can anyone explain?
Sorry for all the questions, really need some clarity on a few of these topics.
Caveat: I'm a little rusty, but here goes ...
Height and depth of a binary tree are synonymous--more or less. Height is the maximum depth along any path from root to leaf. But when you traverse a tree, you have a concept of current depth: the root node has depth 0, its children depth 1, its grandchildren depth 2. If we stop here, the tree has 3 levels, and both the maximum depth [we visited] and the height are 2 (counting edges; some texts count levels and would say 3). Otherwise, the two terms are often interchanged when talking about the tree overall.
Before we get to some more of your questions, it's important to note that binary trees come in various flavors: balanced or unbalanced. In a perfectly balanced tree, all nodes except those at the maximum depth have their left/right links non-null. For example, with n = 1024 nodes in a perfectly balanced tree, the height is log2(n), which is 10 (since 1024 == 2^10).
When you search a perfectly balanced tree, the search is O(log2(n)) because starting from the root node, you choose to follow either left or right, and each time you do, you eliminate 1/2 of the nodes. In such a tree with 1024 elements, the depth is 10 and you make 10 such left/right decisions.
Most tree algorithms, when you add a new node, will rebalance the tree on the fly (e.g. AVL or RB (red black)) trees. So, you get a perfectly balanced tree, all the time, more or less.
But ...
Let's consider a really bad algorithm. When you add a new node, it just appends it to the left link on the child with the greatest depth [or the new node becomes the new root]. The idea is fast append, and "we'll rebalance later".
If we search this "bad" tree after adding n nodes, the tree looks like a doubly linked list using the parent link and the left link [remember all right links are NULL]. Searching it is linear time, O(n).
We did this deliberately, but it can still happen with some tree algorithms and/or combinations of data. That is, the data is such that each new node gets naturally placed on the left link, because that's where the algorithm's placement function says it belongs.
Priority queues are like regular queues except each piece of data has a priority number associated with it.
In an ordinary queue, you just push/append onto the end. When you dequeue, you shift/pop from the front. You never need to insert anything in the middle. Thus, enqueue and dequeue are both O(1) operations.
The O(n) comes from the fact that if you have to do an insertion into the middle of an array, you have to "part the waters" to make space for the element you want to insert. For example, if you need to insert after the first element [which is array[0]], you will be placing the new element at array[1], but first you have to move array[1] to array[2], array[2] to array[3], ... For an array of n, this is O(n) effort.
When removing an element from an array, it is similar, but in reverse. If you want to remove array[1], you grab it, then you must "close the gap" left by your removal by array[1] = array[2], array[2] = array[3], ... Once again, an O(n) operation.
In a sorted array, you just pop off the end; it's the one you want already. Hence O(1). To add an element, it's an insertion into the correct place. If your array is 1,2,3,7,9,12,17 and you want to add 6, that's the new value for array[3], and you have to move 7,9,12,17 out of the way as above.
An unsorted-array priority queue just appends to the array, hence O(1). But to find the correct element to dequeue, you scan array[0], array[1], ..., remembering the position of the best priority seen so far. When you hit the end, you know which element you need; say it's at index j. Now you have to remove j from the array, and that's an O(n) operation as above.
It's slightly more complex than all that, but not by too much.

Find the median of binary search tree, C++

Once I was interviewed by "One well known company" and the interviewer asked me to find the median of BST.
int median(treeNode* root)
{
}
I started to implement the first brute-force solution that I came up with: I fill all the data into a std::vector<int> with an in-order traversal (to get everything sorted in the vector) and take the middle element.
So my algorithm is O(N) time for inserting every element into the vector plus O(1) for querying the middle element, and O(N) memory.
So is there a more efficient way (in terms of memory or complexity) to do the same thing?
Thanks in advance.
It can be done in O(N) time and O(h) space, where h is the tree height (O(log N) for a balanced tree), by doing an in-order traversal and stopping when you reach the N/2-th node; just carry a counter that tells you how many nodes have already been traversed. There is no need to actually populate any vector.
If you can modify your tree into an order-statistic tree (each node also stores the number of nodes in the subtree it roots), you can easily solve it in O(log N) time by simply descending toward the side that contains the N/2-th element.
Since you know that the median is the middle element of a sorted list of elements, you can just take the middle element of your inorder traversal and stop there, without storing the values in a vector. You might need two traversals if you don't know the number of nodes, but it will make the solution use less memory (O(h) where h is the height of your tree; h = O(log n) for balanced search trees).
If you can augment the tree, you can use the solution I gave here to get an O(log n) algorithm.
The binary tree offers a sorted view of your data, but in order to take advantage of it, you need to know how many elements are in each subtree. Without that knowledge, your algorithm is as fast as it can be.
If you know the size of each subtree, you select each time to visit the left or the right subtree, and this gives an O(log n) algorithm if the binary tree is balanced.
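The counting in-order traversal suggested in the answers above might look like this in C++ (a sketch assuming the node count n is known; for even n it returns the lower of the two middle elements, where an interviewer might instead want their average):

```cpp
// In-order traversal that stops at the middle node, so no vector is filled.
// Uses O(h) stack space, h being the tree height.
struct TreeNode {
    int val;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    TreeNode(int v) : val(v) {}
};

// Decrement `remaining` at each node visited in sorted order; record the
// value when the counter reaches zero.
void visit(TreeNode* node, int& remaining, int& result) {
    if (!node || remaining == 0) return;
    visit(node->left, remaining, result);
    if (remaining > 0 && --remaining == 0) result = node->val;
    visit(node->right, remaining, result);
}

int median(TreeNode* root, int n) {
    int remaining = (n + 1) / 2;  // stop at the middle node
    int result = 0;
    visit(root, remaining, result);
    return result;
}
```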

Minimizing memory usage of a breadth first search

In the following code, I am traversing a graph through breadth first search. The code constructs the graph while traversing it. This is a very large graph, with a fan-out of 12. Because of this, any time the depth of the breadth first search increases, I want to destruct the layer above it in an attempt to minimize memory usage. How could I design an algorithm to do so?
string Node::bfs(Node * rootNode) {
    QQueue<Node *> q;  // was QQueue<Cube *>, which does not match the Node pointers below
    q.enqueue(rootNode);
    while (!(q.empty())) {
        Node * currNode = q.dequeue();
        currNode->constructChildren();
        foreach (Node * child, currNode->getListOfChildren()) {
            q.enqueue(child);
        }
        if (currNode->isGoalNode()) {
            return currNode->path;
        }
    }
    return string();  // goal not found
}
With constant fanout and assuming a tree-like graph, the number of nodes that have been visited by a BFS is almost the same as the number of nodes on the fringe. (e.g. in a tree with fanout K, each level n has K^n nodes, and the number of nodes with lower depth than n is also Theta(K^n)).
Hence, storing the fringe will already take up a lot of memory. So if memory is a very big problem, an "equivalent" technique such as iterative deepening DFS may be much better.
But if you want to destroy the "visited" nodes, then some way of tracking what has been visited (in the case of a general graph; if it is a tree then there's no problem) needs to be devised. In which case more information on the graph is needed.
EDIT on why iterative deepening DFS is better.
The fringe (unvisited nodes adjacent to the visited nodes) in a BFS is O(K^n) in size, n being the current depth. The fringe of a DFS is O(n) in size.
Iterative deepening DFS has the same fringe size as DFS, and gives the same result as BFS, because it "simulates" BFS.
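A sketch of iterative deepening DFS on a tree-shaped graph (the adjacency-list representation and names are illustrative, not from the question's Qt code): each pass is a depth-limited DFS, so only one root-to-node path is in memory at a time, yet nodes are first reached in the same depth order as BFS.

```cpp
#include <vector>

// Iterative deepening DFS: O(depth) memory instead of an O(K^depth) fringe.
struct IDDFS {
    const std::vector<std::vector<int>>& adj; // children of each node (a tree)
    int goal;

    // Depth-limited DFS: true if `goal` is reachable within `limit` edges.
    bool dls(int v, int limit) {
        if (v == goal) return true;
        if (limit == 0) return false;
        for (int child : adj[v])
            if (dls(child, limit - 1)) return true;
        return false;
    }

    // Deepen one level at a time until the goal turns up; returns the depth
    // at which it was found, or -1 if it is deeper than maxDepth.
    int search(int root, int maxDepth) {
        for (int limit = 0; limit <= maxDepth; ++limit)
            if (dls(root, limit)) return limit;
        return -1;
    }
};
```

Inner levels are re-expanded on every pass, but with fan-out 12 the final level dominates the cost, so the total work is only a constant factor worse than one BFS.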
Breadth-first search inherently has exponential space complexity. Any tricks will make only marginal impacts in the memory requirements for large graphs. You're better off using depth-first search if you want tractable space complexity.