How to calculate the total distance between various vertices in a graph? - c++

Let's say I have a weighted, undirected, acyclic graph with no negative edge weights, consisting of n vertices and n-1 edges. If I want to calculate the distance (using edge weights) between every single pair of vertices and then add it all up, which algorithm should I use? If for example a graph has 4 vertices, connected like a-b, a-c, c-d, then the program should output the total distance needed to go from a-d, a-c, a-b, b-c, b-d and so on. You could call it every possible path between the given vertices. The language I am using is C++.
I have tried using Dijkstra's and Prim's algorithms, but neither has worked for me. I have thought about using normal or multi-source DFS, but I have been struggling with it for some time now. Is there really a fast way to calculate it, or have I misunderstood the problem entirely?

Since your graph is connected and acyclic (n vertices, n-1 edges), it is a tree, so there is exactly one path between any two vertices. This makes things a lot simpler to compute, and you don't need any real pathfinding algorithms.
Let's say we have an edge E that connects nodes A and B. Calculate how many nodes can be reached from node A, not using edge E (including A). Multiply that by the number of nodes that can be reached from node B, not using edge E (including B). Now you have the number of paths that travel through edge E. Multiply this by the weight of edge E, and you have the total contribution of edge E to the sum.
Do the same thing for every edge and add up the results.
To make the algorithm more efficient, each edge can store cached values that say the number of nodes that are reachable on each side of the edge.
You don't have to run a full depth-first search from scratch for every edge. Here is some pseudocode showing how to calculate the number of nodes reachable on one side of edge E very quickly by taking advantage of caching:
int count_nodes_reachable_on_edge_side(Edge e, Node a) {
    // assume edge e directly connects to node a
    if (answer is already cached in e) { return the answer; }
    answer = 1; // a itself is reachable
    for each edge f connected to node a {
        if (f is not e) {
            let b be the other node f touches (not a)
            answer += count_nodes_reachable_on_edge_side(f, b)
        }
    }
    cache the answer in edge e;
    return answer;
}

I already presented an O(N^2) algorithm in my other answer, but I think you can actually do this in O(N) time with this pseudocode:
let root be an arbitrary node on the graph;
let total_count be the total number of nodes;
let total_cost be 0;
process(root, null);

// Returns the number of nodes reachable from node n without going
// through edge p. Also adds to total_cost the contribution from
// all edges touching node n, except for edge p.
int process(Node n, Edge p)
{
    count = 1
    for each edge q that touches node n {
        if (q != p) {
            let m be the other node connected to q (not n)
            sub_count = process(m, q)
            total_cost += weight(q) * sub_count * (total_count - sub_count)
            count += sub_count
        }
    }
    return count
}
The run time of this is O(N), where N is the number of nodes, because process will be called exactly once for each node.
(For the detail-oriented readers: the loop inside process does not change this. There are O(N) loop iterations that call process in total, because process is called on each node exactly once, and there are O(N) iterations that do nothing (because q == p), since that can happen at most once per call to process.)
Every edge will also be visited. After we recursively count the number of nodes on one side of the edge, we can do a simple subtraction (total_count - sub_count) to get the number of nodes on the other side of the edge. With these two node counts, we can multiply them together to get the total number of paths going through the edge, then multiply that by the weight and add it to the total cost.
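The O(N) pseudocode above could be translated to C++ roughly like this; the adjacency-list representation, the struct, and all names are my own illustration, not from the question. Since the graph is a tree, passing the parent node is enough to identify the parent edge.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Rough C++ sketch of the process() pseudocode above. The tree is an
// adjacency list: adj[n] holds (neighbor, weight) pairs.
struct TreePathSum {
    std::vector<std::vector<std::pair<int, long long>>> adj;
    long long total_count = 0;
    long long total_cost = 0;

    // Returns the number of nodes in the subtree rooted at n,
    // not going back through the parent.
    long long process(int n, int parent) {
        long long count = 1;
        for (auto [m, w] : adj[n]) {
            if (m == parent) continue;
            long long sub_count = process(m, n);
            // sub_count nodes lie on one side of this edge,
            // total_count - sub_count on the other:
            total_cost += w * sub_count * (total_count - sub_count);
            count += sub_count;
        }
        return count;
    }

    long long solve() {
        total_count = (long long)adj.size();
        total_cost = 0;
        process(0, -1);
        return total_cost;
    }
};
```

For the question's example graph a-b, a-c, c-d with weights 1, 2 and 3, the six pairwise distances (1, 2, 5, 3, 6, 3) sum to 20.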

How does this Dijkstra code return minimum value (and not maximum)?

I am solving this question on LeetCode.com called Path With Minimum Effort:
You are given heights, a 2D array of size rows x columns, where heights[row][col] represents the height of cell (row, col). The aim is to go from the top left to the bottom right. You can move up, down, left, or right, and you wish to find a route that requires the minimum effort. A route's effort is the maximum absolute difference in heights between two consecutive cells of the route. Return the minimum effort required to travel from the top-left cell to the bottom-right cell. For example, if heights = [[1,2,2],[3,8,2],[5,3,5]], the answer is 2 (the green path in the figure from the original problem).
The code I have is:
class Solution {
public:
    vector<pair<int,int>> getNeighbors(vector<vector<int>>& h, int r, int c) {
        vector<pair<int,int>> n;
        if (r+1 < h.size())    n.push_back({r+1, c});
        if (c+1 < h[0].size()) n.push_back({r, c+1});
        if (r-1 >= 0) n.push_back({r-1, c});
        if (c-1 >= 0) n.push_back({r, c-1});
        return n;
    }
    int minimumEffortPath(vector<vector<int>>& heights) {
        int rows = heights.size(), cols = heights[0].size();
        using arr = array<int, 3>;
        priority_queue<arr, vector<arr>, greater<arr>> pq;
        vector<vector<int>> dist(rows, vector<int>(cols, INT_MAX));
        pq.push({0, 0, 0}); // r, c, weight
        dist[0][0] = 0;
        // Dijkstra
        while (pq.size()) {
            auto [r, c, wt] = pq.top();
            pq.pop();
            if (wt > dist[r][c]) continue;
            vector<pair<int,int>> neighbors = getNeighbors(heights, r, c);
            for (auto n : neighbors) {
                int u = n.first, v = n.second;
                int curr_cost = abs(heights[u][v] - heights[r][c]);
                if (dist[u][v] > max(curr_cost, wt)) {
                    dist[u][v] = max(curr_cost, wt);
                    pq.push({u, v, dist[u][v]});
                }
            }
        }
        return dist[rows-1][cols-1];
    }
};
This gets accepted, but I have two questions:
a. Since we update dist[u][v] if it is greater than max(curr_cost,wt), how does it guarantee that in the end we return the minimum effort required? That is, why don't we end up returning the effort of the one in red above?
b. Some solutions such as this one, short-circuit and return immediately when we reach the bottom right the first time (ie, if(r==rows-1 and c==cols-1) return wt;) - how does this work? Can't we possibly get a shorter dist when we revisit the bottom right node in future?
The problem statement requires that we find the path with the minimum "effort".
And "effort" is defined as the maximum difference in heights between adjacent cells on a path.
The expression max(curr_cost, wt) takes care of the maximum part of the problem statement. When moving from one cell to another, the distance to the new cell is either the same as the distance to the old cell, or it's the difference in heights, whichever is greater. Hence max(difference_in_heights, distance_to_old_cell).
And Dijkstra's algorithm takes care of the minimum part of the problem statement, where instead of using a distance from the start node, we're using the "effort" needed to get from the start node to any given node. Dijkstra's attempts to minimize the distance, and hence it minimizes the effort.
Dijkstra's has two closely related concepts: visited and explored. A node is visited when any incoming edge is used to arrive at the node. A node is explored when its outgoing edges are used to visit its neighbors. The key design feature of Dijkstra's is that after a node has been explored, additional visits to that node will never improve the distance to that node. That's the reason for the priority queue. The priority queue guarantees that the node being explored has the smallest distance of any unexplored nodes.
In the sample grid, the red path will be explored before the green path because the red path has effort 1 until the last move, whereas the green path has effort 2. So the red path will set the distance to the bottom right cell to 3, i.e. dist[2][2] = 3.
But when the green path is explored, and we arrive at the 3 at row=2, col=1, we have
dist[2][2] = 3
curr_cost = 2
wt = 2
So dist[2][2] > max(curr_cost, wt), and dist[2][2] gets reduced to 2.
The answers to the questions:
a. The red path does set the bottom right cell to a distance of 3, temporarily. But the result of the red path is discarded in favor of the result from the green path. This is the natural result of Dijkstra's algorithm searching for the minimum.
b. When the bottom right node is ready to be explored, i.e. it's at the head of the priority queue, then it has the best distance it will ever have, so the algorithm can stop at that point. This is also a natural result of Dijkstra's algorithm. The priority queue guarantees that after a node has been explored, no later visit to that node will reduce its distance.
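To illustrate (b), here is a sketch of the early-exit variant. One caveat: for the early exit to be valid, the priority queue must order entries by effort, so this version stores {effort, r, c} with the effort first. Everything here is my own illustration, not the code from the linked solution.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <climits>
#include <cstdlib>
#include <queue>
#include <vector>

// Dijkstra on the grid, minimizing the maximum step difference ("effort"),
// returning as soon as the bottom-right cell is popped from the queue.
int minimumEffortPathEarlyExit(const std::vector<std::vector<int>>& h) {
    int rows = h.size(), cols = h[0].size();
    using Item = std::array<int, 3>; // {effort, r, c} so the queue orders by effort
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    std::vector<std::vector<int>> dist(rows, std::vector<int>(cols, INT_MAX));
    dist[0][0] = 0;
    pq.push({0, 0, 0});
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
    while (!pq.empty()) {
        auto [wt, r, c] = pq.top();
        pq.pop();
        if (r == rows - 1 && c == cols - 1) return wt; // early exit: effort is final
        if (wt > dist[r][c]) continue; // stale queue entry
        for (int k = 0; k < 4; ++k) {
            int u = r + dr[k], v = c + dc[k];
            if (u < 0 || u >= rows || v < 0 || v >= cols) continue;
            int cost = std::max(wt, std::abs(h[u][v] - h[r][c]));
            if (cost < dist[u][v]) {
                dist[u][v] = cost;
                pq.push({cost, u, v});
            }
        }
    }
    return dist[rows - 1][cols - 1];
}
```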

If edges are not inserted in the deque in sorted order of weights, does 0-1 BFS produce the right answer?

The general rule of 0-1 BFS is: if the edge encountered has weight 0, then the node is pushed to the front of the deque, and if the edge's weight is 1, then it is pushed to the back of the deque.
If we push the edges in arbitrary order, can 0-1 BFS still calculate the right answer? What happens if the edges entered into the deque are not in sorted order of their weights?
Below is the general 0-1 BFS algorithm. If I leave out the last if and else parts and push the edges in arbitrary order, what will happen?
To me it seems it should still work, but then why is the algorithm designed this way?
void bfs(int start)
{
    std::deque<int> Q; // double-ended queue
    Q.push_back(start);
    distance[start] = 0;
    while (!Q.empty())
    {
        int v = Q.front();
        Q.pop_front();
        for (int i = 0; i < edges[v].size(); i++)
        {
            // if the distance of v's neighbour from the start node is greater
            // than the distance of v from the start node plus the weight of
            // the edge between v and that neighbour, then update it
            if (distance[edges[v][i].first] > distance[v] + edges[v][i].second)
            {
                distance[edges[v][i].first] = distance[v] + edges[v][i].second;
                // if the edge weight between v and its neighbour is 0,
                // push the neighbour to the front of the deque,
                // otherwise push it to the back
                if (edges[v][i].second == 0)
                {
                    Q.push_front(edges[v][i].first);
                }
                else
                {
                    Q.push_back(edges[v][i].first);
                }
            }
        }
    }
}
It is all a matter of performance. While random insertion still finds the shortest path, you may have to consider a lot more paths (exponentially many in the size of the graph). So basically, the structured insertion guarantees linear time complexity. Let's start with why 0-1 BFS guarantees this complexity.
The basic idea is the same as the one of Dijkstra's algorithm. You visit nodes ordered by their distance from the start node. This ensures that you won't discover an edge that would decrease the distance to a node observed so far (which would require you to compute the entire subgraph again).
In 0-1 BFS, you start with the start node and the distances in the queue are just:
d = [ 0 ]
Then you consider all neighbors. If the edge weight is zero, you push it to the front, if it is one, then to the back. So you get a queue like this:
d = [ 0 0 0 1 1]
Now you take the first node. It may have neighbors over zero-weight edges and neighbors over one-weight edges. So you do the same and end up with a queue like this (new nodes are marked with *):
d = [ 0* 0* 0 0 1 1 1*]
So as you see, the nodes are still ordered by their distance, which is essential. Eventually, you will arrive at this state:
d = [ 1 1 1 1 1 ]
Going from the first node over a zero-weight edge produces a total path length of 1. Going over a one-weight edge results in two. So doing 0-1 BFS, you will get:
d = [ 1* 1* 1 1 1 1 2* 2*]
And so on... So concluding, the procedure is required to make sure that you visit nodes in order of their distance to the start node. If you do this, you will consider every edge only twice (once in the forward direction, once in the backward direction). This is because when visiting a node, you know that you cannot get to the node again with a smaller distance. And you only consider the edges emanating from a node when you visit it. So even if the node is added to the queue again by one of its neighbors, you will not visit it because the resulting distance will not be smaller than the current distance. This guarantees the time complexity of O(E), where E is the number of edges.
So what would happen if you did not visit nodes ordered by their distance from the start node? Actually, the algorithm would still find the shortest path. But it will consider a lot more paths. So assume that you have visited a node and that node is put in the queue again by one of its neighbors. This time, we cannot guarantee that the resulting distance will not be smaller. Thus, we might need to visit it again and put all its neighbors in the queue again. And the same applies to the neighbors, so in the worst case this might propagate through the entire graph and you end up visiting nodes over and over again. You will find a solution eventually because you always decrease the distance. But the time needed is far more than for the smart BFS.
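For reference, here is a self-contained version of the ordered 0-1 BFS shown above, with the deque discipline that keeps the queue sorted by distance. The graph is passed in as an adjacency list of (neighbor, weight) pairs with every weight 0 or 1; the function name and signature are mine.

```cpp
#include <cassert>
#include <climits>
#include <deque>
#include <utility>
#include <vector>

// 0-1 BFS: returns the distance from `start` to every node.
std::vector<int> zeroOneBfs(
        const std::vector<std::vector<std::pair<int, int>>>& adj, int start) {
    std::vector<int> dist(adj.size(), INT_MAX);
    std::deque<int> q;
    dist[start] = 0;
    q.push_back(start);
    while (!q.empty()) {
        int v = q.front();
        q.pop_front();
        for (auto [u, w] : adj[v]) {
            if (dist[v] + w < dist[u]) {
                dist[u] = dist[v] + w;
                // A weight-0 edge keeps the distance, so the node goes to
                // the front; a weight-1 edge increases it, so to the back.
                if (w == 0) q.push_front(u);
                else q.push_back(u);
            }
        }
    }
    return dist;
}
```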

C++ : Storing weight for larger Graph

I was solving a question on graphs. It requires storing weights for N nodes (N <= 50000). I can't use a matrix to store the weights of the graph (as 50000x50000 entries can't be allocated). Do you know any other way? Thanks.
My preferred way of storing not too dense graphs is using adjacency lists.
The downside of using adjacency lists, however, is that you can't directly check whether node i is connected to node j. Instead you traverse all neighbors of node i (in which j would eventually show up if it is connected with node i). It's also not practical to remove edges. I use adjacency lists when doing breadth-first or depth-first searches on a graph, since there one is only interested in the set of neighbors, not in whether two specific nodes are connected.
In summary:
Takes only as much memory as you have edges (which is what you wanted), but at least as much memory as you have nodes.
Easy to traverse the edges of any node, i.e. always constant time per neighbor.
To check whether two nodes i and j are connected, you need to traverse the whole neighborhood list of node i or j, which is bad if one node is connected to almost all other nodes and cheap if it's connected to only a few.
Removing edges is likewise expensive for large neighborhoods (at worst linear time in the number of neighbors of a node) and cheap for small ones.
Inserting edges is very cheap (constant time).
To give you an example (first with all weights 1)
using Graph = std::vector<std::vector<int>>;
now you can create a graph with n nodes with:
Graph mygraph(n);
And if you want to connect node i and j just do
mygraph[i].push_back(j);
mygraph[j].push_back(i);
And to traverse all edges of some node, you can simply do
for (int neighbor : mygraph[i]) {
    std::cout << i << " is connected with " << neighbor << std::endl;
}
And now for the harder problem with general weights:
using Graph = std::vector<std::vector<std::pair<int, double>>>;
Graph myWeightedgraph(n);
Now you can insert edges very easily
double weight = 123.32424;
myWeightedgraph[i].push_back({j, weight});
myWeightedgraph[j].push_back({i, weight});
And for traversal:
for (auto& neighbor : myWeightedgraph[i]) {
    std::cout << i << " is connected with " << neighbor.first
              << " with weight " << neighbor.second << std::endl;
}
If two nodes can't have multiple edges between them:
First think of some system how to give each existing edge an unique number.
E.g. for N nodes with node numbers between 0 and N-1, an edge between node A and node B could simply get the number A*N+B (e.g. stored in a uint64_t variable).
Then make a std::map of edges, with the calculated number as key and the weight as value. Most operations there take logarithmic time, which is not as good as the 2D array but still good, and you need much less memory.
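A minimal sketch of that numbering scheme, assuming N = 50000 as in the question (the function name is illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Encode the directed edge (a, b) as the unique 64-bit key a*N + b,
// and keep the weights in a std::map keyed by that number.
const std::uint64_t N = 50000;

std::uint64_t edgeKey(std::uint64_t a, std::uint64_t b) {
    return a * N + b;
}
```

Insertion and lookup in the map are then O(log E). For an undirected graph you would normalize the pair first, e.g. always put the smaller node number in `a`.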
There are generally two ways to represent graphs. As you stated, the first one is to use an adjacency matrix. The pros are that you can easily see if two nodes i and j are connected. The downside is the space complexity (O(V²) where V is the number of vertices).
The other one is the adjacency list: for each vertex, you store an adjacency list that contains every edge coming out of that vertex. Obviously, the spatial complexity is O(V + E) where V is the number of vertices and E the number of edges.
Note that you can store the edges in adjacency maps instead of lists. Say you give each edge a unique integer key. If your graph is sparse, an std::unordered_map would fit well, since the odds of collisions will be low. This gives you average O(1) lookup and insertion complexity for a given edge.
If your graph can have a huge number of edges, then just use a regular std::map, which relies on red-black trees. You then have logarithmic complexity for both inserting and looking up an edge.
Here is some sample code:
struct Edge {
    int weight;
    int start, end;
};
struct Vertex {
    int key;
    std::unordered_map<int, Edge> adjacency_map;
};
struct Graph {
    std::vector<Edge> edges;
};
You can't allocate an array on the order of 10^9 elements in static memory; you would have to allocate it dynamically (e.g. with malloc or new) instead. Better still, you can use an adjacency list to store the graph.

Code Output on Graph and some claims on Local Contest?

I ran into a question as follows:
We have code on a weighted, acyclic graph G(V, E) with positive and negative edges. We change the weights of this graph with the following code, to give a graph G' without negative edges. Let V = {1, 2, ..., n} and let c_ij be the weight of the edge from vertex i to vertex j.
Change_weight(G)
    for i = 1 to n
        for j = 1 to n
            c_i = min c_ij for all j
            if c_i < 0
                c_ij = c_ij - c_i for all j
                c_ki = c_ki + c_i for all k
We have two claims:
1) The shortest path between every two vertices in G is the same as in G'.
2) The length of the shortest path between every two vertices in G is the same as in G'.
We want to verify these two statements: which one is true and which one is false? Can anyone give a hint as to why they are true or false?
My Solution:
I think claim 2 is false, with the following counterexample: the original graph is given on the left, and the result after the algorithm is run is on the right. The shortest path from 1 to 3 changed: it used to pass through vertex 2, but after the algorithm is run it no longer does.
Assumptions:
There are a few problems with your presentation of the question; I made some assumptions, which I clarify here. The answer to your question, given that these assumptions are correct, is in the section below.
First, as @amit said, your use of j is not clear. It seems that you meant this:
Change_weight(G)
    for i = 1 to n
        c_i = min(c_ij) for all j
        if c_i < 0
            c_ij = c_ij - c_i for all j
            c_ki = c_ki + c_i for all k
That is, for every vertex i, if the smallest outgoing edge c_i is negative, then increase the weights of all outgoing edges by -c_i and decrease the weights of all incoming edges by -c_i. Then the smallest outgoing edge will have weight of 0.
Second, by itself, this algorithm will not guarantee that G' has no negative edges! Consider the following graph:
Here, the value of edge (1,2) is pushed up to 0 by the operation on 1, but it is pushed back to -1 by the operation on 2. You must specify that the graph is in reverse topological order, so that edge (i,j) will always be operated on by j before being operated on by i. (Alternatively, you could sort it in topological order and iterate from n to 1.)
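Under these assumptions, the corrected procedure could be sketched like this. The adjacency-matrix representation, the NONE sentinel, and the precomputed reverse topological order passed in as `order` are all my own choices, not from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// c is an adjacency matrix with NONE marking absent edges; `order` lists the
// vertices in reverse topological order, so that vertex j is processed
// before vertex i for every edge (i, j), as argued above.
const int NONE = 1 << 30;

void changeWeight(std::vector<std::vector<int>>& c, const std::vector<int>& order) {
    int n = (int)c.size();
    for (int i : order) {
        int ci = NONE; // minimum outgoing weight of vertex i
        for (int j = 0; j < n; ++j)
            if (c[i][j] != NONE) ci = std::min(ci, c[i][j]);
        if (ci == NONE || ci >= 0) continue; // nothing to fix
        for (int j = 0; j < n; ++j) // raise all outgoing edges by -ci
            if (c[i][j] != NONE) c[i][j] -= ci;
        for (int k = 0; k < n; ++k) // lower all incoming edges by -ci
            if (c[k][i] != NONE) c[k][i] += ci;
    }
}
```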
Answer to your question:
1) The shortest path between every two vertices in G is the same as in G'.
This is true. Consider a path not as a tuple of edges but as a tuple of nodes. For vertices s and t, a path is a tuple of nodes (s, v_1, v_2, ..., t) where there is an edge between every two subsequent elements. For every vertex u, u decreased the cost of incoming edges at the same rate that it increased the cost of outgoing edges; therefore, the relative cost of including u in the path is unchanged.
2) The weight of the shortest path between every two vertices in G is the same as in G'.
This is false. The source s increases its outgoing weight by -c_s, while the destination t decreases its incoming weight by -c_t. If c_s != c_t, then the weight of the path will not be the same.
To reiterate, the weight of every path from s to t will be increased by (c_t-c_s). Therefore, the shortest path for a given s and t pair will still be the shortest (since all paths between this pair change by the same amount). However, the weight will obviously not necessarily be the same.

How can I find a negative weighted cycle of 3 edges in a graph?

I have a directed graph with about 10,000 nodes. All edges are weighted. I want to find a negative cycle containing only 3 edges. Is there any algorithm quicker than O(n^3)?
Sample code (g is my graph):
if (DETAILS) std::printf("Calculating cycle of length 3.\n");
for (int i = 0; i < NObjects; i++)
{
    for (int j = i+1; j < NObjects; j++)
    {
        for (int k = j+1; k < NObjects; k++)
        {
            if ((d = g[i][j] + g[j][k] + g[k][i]) < 0)
            {
                results[count][0] = i;
                results[count][1] = j;
                results[count][2] = k;
                results[count][3] = d;
                count++;
                if (count >= MAX_OUTPUT_SIZE3)
                    goto finish3;
            }
            if ((d = g[i][k] + g[k][j] + g[j][i]) < 0)
            {
                results[count][0] = j;
                results[count][1] = i;
                results[count][2] = k;
                results[count][3] = d;
                count++;
                if (count >= MAX_OUTPUT_SIZE3)
                    goto finish3;
            }
        }
    }
}
finish3:
I cannot think of any algorithm with a worst-case complexity definitely lower than O(n^3), but the constant factor is also important in practice. The following algorithm allows pruning to speed up finding a cycle of length 3 with a negative sum of weights - or checking that there is no such cycle.
1. Sort the (directed) edges according to their weight.
2. Take the edge with the lowest weight as the starting edge.
3. Try all edges connected to the end vertex of the starting edge that have a weight not lower than that of the starting edge (1st pruning), and check the sum of weights when you close the cycle. If you find a cycle with a negative sum, you are done.
4. Continue with the edge with the next-lowest weight as the starting edge. If its weight is negative, go to step 3 - otherwise you are done (2nd pruning).
The idea is that at least one of the edges of a cycle with a negative sum must have a negative weight, and that we can start the cycle at the edge with the lowest weight in the cycle.
If you know that the number of edges with negative weights is O(n), then this algorithm will be O(n^2 log n), since it will then be dominated by step 1 (sorting the edges according to their weight).
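The pruned search above could be sketched in C++ like this. It uses an adjacency matrix with a NONE sentinel for missing edges, and only reports whether some negative triangle exists (unlike the question's code, which collects all of them); all names here are my own.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Sort the directed edges by weight, use each negative edge in turn as the
// starting (lowest-weight) edge of a candidate cycle, and only extend with
// edges at least as heavy as the starting edge.
const long long NONE = 1LL << 60;

bool hasNegativeTriangle(const std::vector<std::vector<long long>>& g) {
    int n = (int)g.size();
    std::vector<std::tuple<long long, int, int>> edges; // (weight, from, to)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (g[i][j] != NONE) edges.emplace_back(g[i][j], i, j);
    std::sort(edges.begin(), edges.end()); // step 1: sort by weight
    for (auto [w, i, j] : edges) {
        if (w >= 0) break; // 2nd pruning: a negative cycle needs a negative edge
        for (int k = 0; k < n; ++k) { // try to close the cycle i -> j -> k -> i
            if (g[j][k] == NONE || g[k][i] == NONE) continue;
            if (g[j][k] < w || g[k][i] < w) continue; // 1st pruning
            if (w + g[j][k] + g[k][i] < 0) return true;
        }
    }
    return false;
}
```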