Code output on a graph, and some claims from a local contest - c++

I ran into a question as follows:
We have a weighted, acyclic graph G(V, E) with positive and negative edge weights. We change the weights of this graph with the following code to obtain a graph G' with no negative edges. Let V = {1, 2, ..., n}, and let c_ij be the weight of the edge from vertex i to vertex j.
Change_weight(G)
  for i = 1 to n
    for j = 1 to n
      c_i = min c_ij for all j
      if c_i < 0
        c_ij = c_ij - c_i for all j
        c_ki = c_ki + c_i for all k
We are given two claims:
1) The shortest path between every two vertices in G is the same as in G'.
2) The length of the shortest path between every two vertices in G is the same as in G'.
We want to verify these two statements: which one is true and which one is false? Can anyone give a hint as to why each is true or false?
My solution:
I think claim 2 is false, by the following counterexample: the original graph is shown on the left, and the result after the algorithm runs is on the right. The shortest path from 1 to 3 changed: it used to pass through vertex 2, but after the algorithm runs it no longer passes through vertex 2.

Assumptions:
There are a few problems with your presentation of the question; I made some assumptions, which I clarify here. The answer to your question, given that these assumptions are correct, is in the section below.
First, as #amit said, your use of j is not clear. It seems that you meant this:
Change_weight(G)
  for i = 1 to n
    c_i = min(c_ij) for all j
    if c_i < 0
      c_ij = c_ij - c_i for all j
      c_ki = c_ki + c_i for all k
That is, for every vertex i, if the smallest outgoing edge weight c_i is negative, then increase the weights of all outgoing edges by -c_i and decrease the weights of all incoming edges by -c_i. Then the smallest outgoing edge will have a weight of 0.
Second, by itself, this algorithm will not guarantee that G' has no negative edges! Consider the following graph:
Here, the value of edge (1,2) is pushed up to 0 by the operation on vertex 1, but it is pushed back to -1 by the operation on vertex 2. You must specify that the vertices are numbered in reverse topological order, so that edge (i,j) will always be operated on by j before being operated on by i. (Alternatively, you could number the vertices in topological order and iterate from n down to 1.)
Answer to your question:
1) The shortest path between every two vertices in G is the same as in G'.
This is true. Consider a path not as a tuple of edges but as a tuple of nodes. For vertices s and t, a path is a tuple of nodes (s, v_1, v_2, ..., t) where there is an edge between every two subsequent elements. For every vertex u, u decreased the cost of incoming edges at the same rate that it increased the cost of outgoing edges; therefore, the relative cost of including u in the path is unchanged.
2) The weight of the shortest path between every two vertices in G is the same as in G'.
This is false. The source s increases its outgoing weight by -c_s, while the destination t decreases its incoming weight by -c_t. If c_s != c_t, then the weight of the path will not be the same.
To reiterate, the weight of every path from s to t will be increased by (c_t-c_s). Therefore, the shortest path for a given s and t pair will still be the shortest (since all paths between this pair change by the same amount). However, the weight will obviously not necessarily be the same.
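For concreteness, here is a minimal C++ sketch of the corrected pass, assuming the vertices are numbered 1..n in topological order and iterating from n down to 1 (the alternative mentioned above). The function name, the INF sentinel, and the adjacency-matrix layout are illustrative assumptions, not part of the original question.

#include <algorithm>
#include <vector>

const int INF = 1000000000; // sentinel meaning "no edge"

// c is an (n+1) x (n+1) matrix, 1-indexed: c[i][j] is the weight of edge
// (i, j), or INF if the edge is absent. Since the vertices are assumed to be
// numbered in topological order, counting i down from n processes them in
// reverse topological order.
void change_weight(std::vector<std::vector<int>>& c, int n) {
    for (int i = n; i >= 1; --i) {
        int ci = 0; // minimum outgoing weight, capped at 0
        for (int j = 1; j <= n; ++j)
            if (c[i][j] != INF) ci = std::min(ci, c[i][j]);
        if (ci < 0) {
            for (int j = 1; j <= n; ++j) // outgoing: c_ij = c_ij - c_i
                if (c[i][j] != INF) c[i][j] -= ci;
            for (int k = 1; k <= n; ++k) // incoming: c_ki = c_ki + c_i
                if (c[k][i] != INF) c[k][i] += ci;
        }
    }
}

Running this on any small DAG and comparing shortest paths before and after makes both answers visible: the minimizing vertex sequences are unchanged, but every s-to-t path weight has shifted by the same amount c_t - c_s, so the lengths generally differ.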

How to calculate the total distance between various vertices in a graph?

Let's say I have a weighted, undirected, acyclic graph with no negative edge weights, comprised of n vertices and n-1 edges. I want to calculate the distance between every pair of vertices (using edge weights) and add it all up. Which algorithm should I use? For example, if a graph has 4 vertices connected like a-b, a-c, c-d, then the program should output the total distance needed to go a-d, a-c, a-b, b-c, b-d and so on: every possible pair of the given vertices. The language I am using is C++.
I have tried using Dijkstra's and Prim's algorithms, but neither has worked for me. I have thought about using a normal or multi-source DFS, but I have been struggling with it for some time now. Is there really a fast way to calculate this, or have I misunderstood the problem entirely?
Since you have an acyclic graph, there is only one possible path between any two points. This makes things a lot simpler to compute and you don't need to use any real pathfinding algorithms.
Let's say we have an edge E that connects nodes A and B. Calculate how many nodes can be reached from node A, not using edge E (including A). Multiply that by the number of nodes that can be reached from node B, not using edge E (including B). Now you have the number of paths that travel through edge E. Multiply this by the weight of edge E, and you have the total contribution of edge E to the sum.
Do the same thing for every edge and add up the results. For the a-b, a-c, c-d example above: edge a-c separates {a, b} from {c, d}, so 2 * 2 = 4 paths cross it, and it contributes 4 times its weight to the sum.
To make the algorithm more efficient, each edge can store cached values that say the number of nodes that are reachable on each side of the edge.
You don't have to use a depth first search. Here is some pseudocode showing how you calculate the number of nodes reachable on a side of edge E very fast taking advantage of caching:
int count_nodes_reachable_on_edge_side(Edge e, Node a) {
  // assume edge e directly connects to node a
  if (answer is already cached in e) { return the answer; }
  answer = 1; // a is reachable
  for each edge f connected to node a {
    if (f is not e) {
      let b be the other node f touches (not a)
      answer += count_nodes_reachable_on_edge_side(f, b)
    }
  }
  cache the answer in edge e;
  return answer;
}
I already presented an O(N^2) algorithm in my other answer, but I think you can actually do this in O(N) time with the following pseudocode:
let root be an arbitrary node of the graph;
let total_count be the total number of nodes;
let total_cost be 0;
process(root, null);

// Returns the number of nodes reachable from node n without going
// through edge p. Also adds to total_cost the contribution from
// all edges touching node n, except for edge p.
int process(Node n, Edge p)
{
  count = 1
  for each edge q that touches node n {
    if (q != p) {
      let m be the other node connected to q (not n)
      sub_count = process(m, q)
      total_cost += weight(q) * sub_count * (total_count - sub_count)
      count += sub_count
    }
  }
  return count
}
The run time of this is O(N), where N is the number of nodes, because process will be called exactly once for each node.
(For the detail-oriented readers: the loop inside process does not change this bound. There are O(N) iterations that call process, because process is called on each node exactly once, and there are O(N) iterations that do nothing (because q == p), because that can only happen once per process call.)
Every edge will also be visited. After we recursively count the number of nodes on one side of the edge, we can do a simple subtraction (total_count - sub_count) to get the number of nodes on the other side of the edge. When we have these two node counts, we can just multiply them together to get the total number of paths going through the edge, then multiply that by the weight, and add it to the total cost.
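Here is a runnable C++ sketch of that pseudocode, under the assumption that the tree lives in a global adjacency list of (neighbor, weight) pairs; identifiers like process and total_cost follow the pseudocode, everything else is illustrative. The main function wires up the a-b, a-c, c-d example from the question with unit weights.

#include <cstdio>
#include <utility>
#include <vector>

std::vector<std::vector<std::pair<int, long long>>> adj; // adj[u] = {(v, w)}
long long total_cost = 0;
int total_count = 0;

// Returns the number of nodes reachable from n without going back through
// parent, and adds the contribution of every edge below n to total_cost.
int process(int n, int parent) {
    int count = 1;
    for (auto [m, w] : adj[n]) {
        if (m == parent) continue;
        long long sub = process(m, n);
        total_cost += w * sub * (total_count - sub);
        count += (int)sub;
    }
    return count;
}

int main() {
    // The a-b, a-c, c-d example: 0=a, 1=b, 2=c, 3=d, all weights 1.
    total_count = 4;
    adj.assign(total_count, {});
    auto add_edge = [](int u, int v, long long w) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    };
    add_edge(0, 1, 1); // a-b
    add_edge(0, 2, 1); // a-c
    add_edge(2, 3, 1); // c-d
    process(0, -1);
    std::printf("%lld\n", total_cost); // prints 10
}

As a sanity check, the six pairwise distances in that example are 1+1+2+2+3+1 = 10, which matches the edge-contribution sum (3 + 4 + 3).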

Is it possible to achieve desired permutation of values for vertices with adjacent swaps?

Consider an arbitrary connected (acyclic or cyclic) undirected graph with N vertices, numbered from 1 to N. Each vertex has some value assigned to it. Let the values be denoted by A1, A2, A3, ..., AN, where A[i] denotes the value of the ith vertex. Let P be a permutation of A. In each operation, we can swap the values of two adjacent vertices. Is it possible to achieve A = P, i.e. after all the swap operations, A[i] = P[i] for all 1 <= i <= N? In other words, each vertex i should have value P[i] after the operations.
P.S - I was confused about where to ask this - stack overflow or math.stack exchange. Apologies in advance.
Edit 1: I think the answer should be yes, but I am only saying this based on a case analysis of the different types of graphs on 5 vertices. I tried relabeling so that the target permutation Q satisfies Q1 < Q2 < ...; this rephrases the problem a bit: the final state should then be A1 < A2 < ... < AN, so the question becomes whether the graph can be sorted. Please correct me if my assumption is wrong.
Indeed this is possible. Since we've got a connected graph, we can remove edges until we've got a tree (a spanning tree). Removing an edge simply means we won't use it for adjacent swaps; "removing a node" (in step 2 below) simply means we'll never swap the value of that node again.
Now we can use the following algorithm to produce the permutation:
1. Choose a leaf and determine the position of the value intended to be located there after the permutation. Repeatedly swap that value with the next one on the path to the leaf until the value reaches the leaf.
2. Remove the leaf from the tree; the resulting graph is still a tree.
3. Continue with step 1 if there are any nodes left.
In each iteration we reduce the size of the graph by 1 while doing a number of swaps that is bounded from above by the number of nodes, so with a finite number of swaps we're able to produce the permutation. The algorithm may not yield a solution using the optimal number of swaps, but it shows that it can be done.
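To make the procedure concrete, here is a hedged C++ sketch on a tree stored as an adjacency list (for a general connected graph, first extract any spanning tree). All names are illustrative, and the values are assumed distinct, so "the position of the value intended for the leaf" is unique.

#include <algorithm>
#include <utility>
#include <vector>

// A[v] holds the current value at vertex v; P[v] is the value that must end
// up at v. Both are permutations of the same distinct values.
void realize_permutation(const std::vector<std::vector<int>>& adj,
                         std::vector<int>& A, const std::vector<int>& P) {
    int n = (int)adj.size();
    std::vector<bool> removed(n, false);
    for (int round = 0; round < n; ++round) {
        // Step 1: pick any remaining leaf (<= 1 remaining neighbor).
        int leaf = -1;
        for (int v = 0; v < n && leaf < 0; ++v) {
            if (removed[v]) continue;
            int deg = 0;
            for (int u : adj[v]) if (!removed[u]) ++deg;
            if (deg <= 1) leaf = v;
        }
        // Root the remaining tree at the leaf so parent[] points toward it.
        std::vector<int> parent(n, -1), stack{leaf};
        while (!stack.empty()) {
            int v = stack.back(); stack.pop_back();
            for (int u : adj[v])
                if (!removed[u] && u != parent[v]) {
                    parent[u] = v;
                    stack.push_back(u);
                }
        }
        // Walk the value P[leaf] to the leaf by adjacent swaps.
        int src = (int)(std::find(A.begin(), A.end(), P[leaf]) - A.begin());
        for (int v = src; v != leaf; v = parent[v])
            std::swap(A[v], A[parent[v]]);
        // Step 2: remove the leaf; the remaining graph is still a tree.
        removed[leaf] = true;
    }
}

Each round does at most n swaps and removes one vertex, matching the bound above: at most n^2 adjacent swaps in total, with no claim of optimality.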

What would be the fastest algorithm to randomly select N items from a list based on weights distribution?

I have a large list of items, and each item has a weight.
I'd like to select N items randomly without replacement, where the items with more weight are more likely to be selected.
I'm looking for the best-performing approach; performance is paramount. Any ideas?
If you want to sample items without replacement, you have lots of options.
Use a weighted-choice-with-replacement algorithm to choose random indices. There are many algorithms like this. One of them is WeightedChoice, described later in this answer, and another is rejection sampling, described as follows (a C++ sketch appears after this list). Assume that the highest weight is max, there are n weights, and each weight is 0 or greater. To choose an index in [0, n) using rejection sampling:
1. Choose a uniform random integer i in [0, n).
2. With probability weights[i]/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max]; if that number is weights[i] or less, return i, otherwise go to step 1.)
Each time the weighted choice algorithm chooses an index, set the weight for the chosen index to 0 to keep it from being chosen again. Or...
Assign each index an exponentially distributed random number (with a rate equal to that index's weight), make a list of pairs assigning each number to an index, then sort that list by those numbers. Then take each item from first to last, in ascending order; a C++ sketch of this appears below, after the references. This sorting can be done on-line using a priority queue data structure (a technique that leads to weighted reservoir sampling). Notice that the naïve way to generate the random number, -ln(1-RNDU01())/weight, where RNDU01() is a uniform random number in [0, 1], is not robust, however ("Index of Non-Uniform Distributions", under "Exponential distribution").
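Here is a minimal C++ sketch of the rejection-sampling step from the first option above. The function name and the choice of std::mt19937 are assumptions, and at least one weight is assumed to be positive (otherwise the loop never terminates).

#include <algorithm>
#include <random>
#include <vector>

// Returns an index in [0, n) with probability proportional to weights[i].
int rejection_sample(const std::vector<double>& weights, std::mt19937& rng) {
    double max_w = *std::max_element(weights.begin(), weights.end());
    std::uniform_int_distribution<int> pick(0, (int)weights.size() - 1);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (;;) {
        int i = pick(rng);                          // step 1: uniform index
        if (u(rng) * max_w < weights[i]) return i;  // step 2: accept w.p. weights[i]/max
    }
}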
Tim Vieira gives additional options in his blog.
A paper by Bram van de Klundert compares various algorithms.
EDIT (Aug. 19): Note that for these solutions, the weight expresses how likely a given item will appear first in the sample. This weight is not necessarily the chance that a given sample of n items will include that item (that is, an inclusion probability). The methods given above will not necessarily ensure that a given item will appear in a random sample with probability proportional to its weight; for that, see "Algorithms of sampling with equal or unequal probabilities".
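And here is a sketch of the exponentially-distributed-numbers option from the list above: item i gets the key -ln(1-U)/weights[i], and the n smallest keys form the sample. For brevity it uses exactly the naïve key formula that the answer warns is not numerically robust, and it assumes all weights are positive and n is at most the number of items.

#include <algorithm>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

std::vector<int> weighted_sample_no_replacement(
        const std::vector<double>& weights, int n, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<std::pair<double, int>> keys;
    for (int i = 0; i < (int)weights.size(); ++i)
        keys.push_back({-std::log(1.0 - u(rng)) / weights[i], i});
    // Taking items in ascending order of key yields a weighted sample
    // without replacement (a priority queue gives the on-line variant).
    std::partial_sort(keys.begin(), keys.begin() + n, keys.end());
    std::vector<int> sample;
    for (int i = 0; i < n; ++i) sample.push_back(keys[i].second);
    return sample;
}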
Assuming you want to choose items at random with replacement, here is pseudocode implementing this kind of choice. Given a list of weights, it returns a random index (starting at 0), chosen with a probability proportional to its weight. This algorithm is a straightforward way to implement weighted choice. But if it's too slow for you, see my section "Weighted Choice With Replacement" for a survey of other algorithms.
METHOD WChoose(weights, value)
  // Choose the index according to the given value
  lastItem = size(weights) - 1
  runningValue = 0
  for i in 0...size(weights) - 1
    if weights[i] > 0
      newValue = runningValue + weights[i]
      lastItem = i
      // NOTE: Includes start, excludes end
      if value < newValue: break
      runningValue = newValue
    end
  end
  // If we didn't break above, this is a last
  // resort (might happen because a rounding
  // error happened somehow)
  return lastItem
END METHOD

METHOD WeightedChoice(weights)
  return WChoose(weights, RNDINTEXC(Sum(weights)))
END METHOD
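For readers who want it in C++, here is a direct translation of the two methods above. The double weights and std::mt19937 are assumptions, RNDINTEXC is replaced by a uniform draw in [0, total), and the weights are assumed to sum to a positive total.

#include <numeric>
#include <random>
#include <vector>

// Direct translation of WChoose: scan the running sum until it passes value.
int wchoose(const std::vector<double>& weights, double value) {
    int last_item = (int)weights.size() - 1;
    double running_value = 0.0;
    for (int i = 0; i < (int)weights.size(); ++i) {
        if (weights[i] > 0) {
            double new_value = running_value + weights[i];
            last_item = i;
            if (value < new_value) return i; // includes start, excludes end
            running_value = new_value;
        }
    }
    return last_item; // last resort, in case of rounding error
}

int weighted_choice(const std::vector<double>& weights, std::mt19937& rng) {
    double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    std::uniform_real_distribution<double> value(0.0, total);
    return wchoose(weights, value(rng));
}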
Let A be the item array with x items. The complexity of each method is given as
< preprocessing_time, querying_time >
If sorting is possible: < O(x lg x), O(n) >
  Sort A by the weight of the items.
  Create an index array B, for example:
  B = [ 0, 0, 0, x/2, x/2, x/2, x/2, x/2 ].
  It's clear to see that B gives a bigger probability of choosing index x/2.
  While you haven't picked n elements yet, choose a random element e from B,
  then pick a random element from A within the interval [e, x-1].
If iterating through the items is possible: < O(x), O(tn) >
  Iterate through A and find the average weight w of the elements.
  Define the maximum number of tries t.
  Try (at most t times) to pick a random element of A whose weight is bigger than w.
  Test for some t that gives you good/satisfactory results.
If nothing above is possible: < O(1), O(tn) >
  Define the maximum number of tries t.
  While you haven't picked n elements yet, take t random elements from A
  and pick the one with the biggest weight (see the sketch below).
  Test for some t that gives you good/satisfactory results.
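A C++ sketch of that last strategy, with the caveat that it is a biased heuristic rather than an exact weighted sampler; the names and the handling of duplicates (simply re-drawing) are assumptions, and n is assumed to be at most the number of items.

#include <random>
#include <set>
#include <vector>

// Picks n distinct indices; each pick draws t random candidates and keeps
// the one with the biggest weight.
std::vector<int> pick_n_heuristic(const std::vector<double>& weights,
                                  int n, int t, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, (int)weights.size() - 1);
    std::set<int> chosen;
    while ((int)chosen.size() < n) {
        int best = pick(rng);
        for (int tries = 1; tries < t; ++tries) {
            int i = pick(rng);
            if (weights[i] > weights[best]) best = i;
        }
        chosen.insert(best); // if best was already chosen, just draw again
    }
    return std::vector<int>(chosen.begin(), chosen.end());
}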

If edges are not inserted in the deque in sorted order of weights, does 0-1 BFS produce the right answer?

The general idea of 0-1 BFS is: if the edge encountered has weight 0, then the node is pushed to the front of the deque, and if the edge's weight is 1, then it is pushed to the back of the deque.
If we push the nodes randomly instead, can 0-1 BFS still calculate the right answer? What if entries in the deque are not in sorted order of their weights?
Below is the general 0-1 BFS algorithm. If I leave out the last if and else parts and push the nodes in arbitrary order, what will happen?
It seems to me that it should still work, but then why is the algorithm designed this way?
void bfs(int start)
{
    std::deque<int> Q; // double-ended queue
    Q.push_back(start);
    distance[start] = 0;
    while (!Q.empty())
    {
        int v = Q.front();
        Q.pop_front();
        for (int i = 0; i < edges[v].size(); i++)
        {
            // if the distance of v's neighbour from the start node is
            // greater than the distance of v from the start node plus the
            // weight of the edge between v and that neighbour, update it
            if (distance[edges[v][i].first] > distance[v] + edges[v][i].second)
            {
                distance[edges[v][i].first] = distance[v] + edges[v][i].second;
                // if the edge weight between v and its neighbour is 0,
                // push the neighbour to the front of the deque,
                // else push it to the back
                if (edges[v][i].second == 0)
                {
                    Q.push_front(edges[v][i].first);
                }
                else
                {
                    Q.push_back(edges[v][i].first);
                }
            }
        }
    }
}
It is all a matter of performance. While random insertion still finds the shortest path, you have to consider a lot more paths (exponential in the size of the graph). So basically, the structured insertion guarantees a linear time complexity. Let's start with why the 0-1 BFS guarantees this complexity.
The basic idea is the same as the one of Dijkstra's algorithm. You visit nodes ordered by their distance from the start node. This ensures that you won't discover an edge that would decrease the distance to a node observed so far (which would require you to compute the entire subgraph again).
In 0-1 BFS, you start with the start node and the distances in the queue are just:
d = [ 0 ]
Then you consider all neighbors. If the edge weight is zero, you push it to the front, if it is one, then to the back. So you get a queue like this:
d = [ 0 0 0 1 1]
Now you take the first node. It may have neighbors over zero-weight edges and neighbors over one-weight edges. So you do the same and end up with a queue like this (new nodes are marked with *):
d = [ 0* 0* 0 0 1 1 1*]
So as you see, the nodes are still ordered by their distance, which is essential. Eventually, you will arrive at this state:
d = [ 1 1 1 1 1 ]
Going from the first node over a zero-weight edge produces a total path length of 1. Going over a one-weight edge results in two. So doing 0-1 BFS, you will get:
d = [ 1* 1* 1 1 1 1 2* 2*]
And so on... So concluding, the procedure is required to make sure that you visit nodes in order of their distance to the start node. If you do this, you will consider every edge only twice (once in the forward direction, once in the backward direction). This is because when visiting a node, you know that you cannot get to the node again with a smaller distance. And you only consider the edges emanating from a node when you visit it. So even if the node is added to the queue again by one of its neighbors, you will not visit it because the resulting distance will not be smaller than the current distance. This guarantees the time complexity of O(E), where E is the number of edges.
So what would happen if you did not visit nodes ordered by their distance from the start node? Actually, the algorithm would still find the shortest path. But it will consider a lot more paths. So assume that you have visited a node and that node is put in the queue again by one of its neighbors. This time, we cannot guarantee that the resulting distance will not be smaller. Thus, we might need to visit it again and put all its neighbors in the queue again. And the same applies to the neighbors, so in the worst case this might propagate through the entire graph and you end up visiting nodes over and over again. You will find a solution eventually because you always decrease the distance. But the time needed is far more than for the smart BFS.

What Probabilistic Function to use

I want to probabilistically choose n edges from the e edges, held in sorted order of weight in a vector. But I want to use probability in choosing, and I also don't want to choose big edges at the start. So it's like giving more weight to the smaller edges at the start, and, as I take edges, giving more and more weight to the bigger remaining edges too. What probabilistic function of n and e should I choose?
while (edgesTaken < n) {
  for each edge i, while edgesTaken < n:
    probability = pdf(edgesTaken, i)
    if (probability > THRESHOLD)
      take the edge
}
You need the quantile function for the distribution you want. Draw a random number using a standard generator to get a value q uniformly distributed in [0, 1), then call the quantile function with q as the parameter. The resulting random set will have the required distribution.
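As an illustration of that inverse-transform idea, here is a C++ sketch that maps a uniform q to an index biased toward small (light) edges, using the quantile function of a truncated exponential distribution. The choice of distribution and the rate parameter lambda are assumptions to be tuned, not part of the answer.

#include <cmath>
#include <random>

// Maps q in [0, 1) to an index in [0, e) using the quantile function of an
// exponential distribution truncated to [0, e); a small lambda is nearly
// uniform, a large lambda strongly favors small indices.
int biased_index(double q, int e, double lambda) {
    double scale = 1.0 - std::exp(-lambda * e);
    return (int)(-std::log(1.0 - q * scale) / lambda);
}

int draw_edge_index(int e, double lambda, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return biased_index(u(rng), e, lambda);
}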
The probability that the first edge is 1 is choose(n-1,e-1)/choose(n,e).
More generally, the probability that the first edge is k is
[choose(n-k,e-1)/choose(n,e)] * 1/k
You also might want the probability that there is exactly one edge in {1, ..., k}:
[choose(n-k,e-1)/choose(n,e)]
From here I think you can wrap things up!
P.S. Just to explain, the three functions give the ratio of the number of ways to pick edges that satisfy their condition, to choose(n,e) which is the number of ways to pick e edges from n.