Breadth-first limiting on each node within a result in Gremlin? - amazon-web-services

Background
Looking at the below image, we're facing an issue with how we want to do our limiting at a breadth-level.
The goal is to ensure that, for each neighbor, we never read more than X edges off the current node, to avoid timeouts on nodes with a large number of edges.
Example
We have a max-breadth limit of X where X is the number of neighbors we aggregate off a single node. We begin a BFS traversal from 0 and aggregate 3, 1 and 2.
Assuming our max-breadth limit is 3, the problem that can occur is that we first pull 1 and immediately begin reading all of 1's neighbors. As a result, we completely disregard the neighbors that could exist off of nodes 3 and 2, because 1 alone would fulfill the max-breadth.
Question
How can we, in Gremlin (in a single query), say that we want an edge limit for each node in my list of neighbors?
In other words, I want X neighbors from node 3, X neighbors from node 1 and X neighbors from node 2. This idea should hold true recursively up until some depth D.
Attempt
g.V(idsList).outE().limit(50).inV().dedup().by(T.id).fold()
The issue with the above is that we blindly limit all edges from all neighbors to X which can favor a single subgraph.
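To make the intended behaviour concrete, here is a minimal Python sketch of a BFS that caps the fan-out per node rather than globally. It runs on a plain in-memory adjacency dict, not against Gremlin; the names `limited_bfs`, `max_breadth` and `max_depth` are made up for illustration. In Gremlin itself, the `local()` step applies a child traversal to each incoming traverser separately, so wrapping the limit as `local(outE().limit(X))` is probably the closest equivalent, though that is untested here.

```python
def limited_bfs(adj, start_ids, max_breadth, max_depth):
    """Breadth-first expansion reading at most `max_breadth` edges per node.

    adj: dict mapping a node id to a list of neighbour ids (hypothetical input shape).
    Returns the set of all node ids reached within `max_depth` hops.
    """
    seen = set(start_ids)
    frontier = list(start_ids)
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            # The limit is applied to each node on its own, so one
            # high-degree node cannot use up the budget of its siblings.
            for neighbour in adj.get(node, [])[:max_breadth]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen


if __name__ == "__main__":
    graph = {0: [1, 2, 3], 1: [10, 11, 12, 13], 2: [20, 21], 3: [30]}
    print(sorted(limited_bfs(graph, [0], max_breadth=2, max_depth=2)))
```

With a limit of 2, node 1 contributes only two of its four neighbours, and node 2 still gets to contribute its own.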

Related

Find minimum steps to reach the end of graph

Given a directed graph which has nodes from 1 to N where each node has a color(blue/red) which is given to us. Now nodes are connected and each edge has some weight. We have to reach N from 1 such that in the path abs(x-y)<=k where x is the number of red nodes and y is the number of blue nodes and k is some integer value.
I tried solving it with Dijkstra's shortest path algorithm, but that only works when there is no condition on the number of blue and red nodes. So how should I proceed with Dijkstra, or does this require something else?

how to find S or less nodes in tree with minimum distance? [duplicate]

Given an undirected tree with unweighted edges, N vertices and N-1 edges, and a number K, find K nodes so that every node of the tree is within distance S of at least one of the K nodes. Also, S has to be the smallest possible, so that for any S' < S at least one node would be unreachable in S' steps.
I tried solving this problem, however, I feel that my supposed solution is not very fast.
My solution:
set x=1
find nodes which are x distance from every node
let the node which has the most nodes in its distance be one of the K nodes.
recompute for every node whilst not counting already covered nodes.
do this until I have chosen K nodes. Then if every node is covered we are done, else increase x.
This problem is called p-center, and you can find several papers online about it such as this. It is indeed NP-hard for general graphs, but polynomial on trees, both weighted and unweighted.
To me it looks like a clustering problem. Try it with the k-Means (wikipedia) algorithm, where k equals your K. Since you have a tree and all vertices are connected, you can use the number of edges between two vertices as the distance measure.
When the algorithm converges you get the K nodes you are looking for. Then you can determine S by iterating through all k clusters: in each one, calculate the maximum distance from any node in the cluster to the center node. The overall maximum should be S.
Update: But actually I see that the k-means algorithm does not produce a global optimum, so this algorithm would not necessarily produce the best result either ...
You say N nodes and N-1 edges, so your graph is a tree. You are actually looking for a connected K-subset of nodes minimizing the longest edge.
A polynomial algorithm may be:
Sort all your edges increasing distance.
Then loop on edges:
if neither of the 2 nodes is in a group, create a new group.
else if one node is in an existing group, add the other to that group
else both nodes are in 2 different groups, so fuse the groups
When a group reaches K, break the loop and you have your connected K-subset.
Nevertheless, note that your group can contain more than K nodes. Imagine 4 nodes that are close to each other in pairs: there would be no exact 3-subset solution to your problem.

Efficient algorithm to find weights of all cycles in an undirected weighted graph

My aim is to find all the cycles and their respective weights in a weighted undirected graph. The weight of a cycle is defined as the sum of the weights of the edges that constitute the cycle. My present algorithm does the following:
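// Follows edges from `now` and records a weight whenever the walk returns to `start`.
// `prev` is the vertex we arrived from, so the same edge is not walked straight back.
void dfs(int start, int now, int prev, int val)
{
    if (now == start)
    {
        v.push_back(val); // v is the vector of all weights
        return;
    }
    if (visited[now])
        return;
    visited[now] = true;
    // dfs through all nodes neighbouring now
    for (const auto& edge : adj[now])
        if (edge.first != prev)
            dfs(start, edge.first, now, val + edge.second);
}
I call dfs() from each start point:
for (int i = 0; i < V; ++i)
{
    std::fill(visited.begin(), visited.end(), false); // initialise visited[]; assumes a std::vector<bool>
    for (int j = 0; j < adj[i].size(); ++j) // adj is the adjacency list
        dfs(i, adj[i][j].first, i, adj[i][j].second);
    // adj is a vector of vector of pairs
    // The first element of the pair is the neighbour index and the second element is the weight
}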
So the overall complexity of this algorithm is O(V*E)(I think so). Can anyone suggest a better approach?
Since not everyone defines it the same way, I assume...
the weights are on the edges, not the vertices
the only vertex in a cycle that is visited more than once is the start/end vertex
a single vertex, or two connected vertices, do not form a cycle, i.e. a cycle needs at least three vertices
between two vertices in your graph, there can't be more than one edge (no "multigraph")
Following steps can determine if (at least) one odd-weighted cycle exists:
Remove all vertices that have only 0 or 1 connected edges (not really necessary, but it might be faster with it).
Split every even-weighted edge (only those, not the odd-weighted ones!) by inserting a new vertex. E.g. if the edge between vertex A and B has weight 4, it should become A-Z 2 and Z-B 2, or A-Z 3 and Z-B 1, or something like that.
The actual weight distribution is not important, you don't even need to save it. Because, starting after this step, all weights are not necessary anymore.
What did this actually do? Think of every odd weight as 1 and every even one as 2. (This doesn't change whether there is an odd-weighted cycle: if 3+4+8 is odd then 1+2+2 is too.) Now you're splitting every 2 into two 1s. Since the only remaining weight is 1, determining whether the sum is odd is the same as determining whether the edge "count" is odd.
Now, for checking bipartiteness / 2coloring:
You can use a modified DFS here
A vertex can be unknown, 0, or 1. When starting, assign 0 to a single vertex, all others are unknown. The unknown neighbors of a 0-vertex always get 1, and the ones of a 1-vertex always get 0.
While checking the neighbors of a vertex, if a neighbor was already visited, also check that its number is different from that of the vertex you're processing now. If not, you just found out that your graph has odd-weighted cycles and you can stop everything.
If you reach the end of the DFS without finding that, there are no odd-weighted cycles.
For the implementation, note that you could reach the "end" of DFS while there are still unvisited vertices, namely if you have a disconnected graph. If so, you'll need to set one of the remaining vertices to a known number (0) and continue DFS from there on.
Complexity O(V + E) (this time really, instead of an exponential thing or not-working solutions).
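The steps above could be sketched in Python roughly as follows. This is only an illustration under the stated assumptions (integer edge weights, no multigraph); the function name `has_odd_weight_cycle` and the `(u, v, weight)` edge format are invented for the example, and an iterative DFS is used instead of a recursive one.

```python
from collections import defaultdict


def has_odd_weight_cycle(num_vertices, edges):
    """Return True if the undirected graph contains a cycle of odd total weight.

    edges: list of (u, v, weight) with integer weights; vertices are 0..num_vertices-1.
    """
    adj = defaultdict(list)
    next_id = num_vertices  # ids for the vertices inserted into even edges
    for u, v, weight in edges:
        if weight % 2 == 0:
            # Split an even-weighted edge u-v into u-z and z-v.
            z = next_id
            next_id += 1
            adj[u].append(z)
            adj[z].append(u)
            adj[z].append(v)
            adj[v].append(z)
        else:
            adj[u].append(v)
            adj[v].append(u)

    color = {}  # vertex -> 0 or 1; a missing key means "unknown"
    for start in range(next_id):  # restart on every unvisited component
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adj[node]:
                if neighbour not in color:
                    color[neighbour] = 1 - color[node]
                    stack.append(neighbour)
                elif color[neighbour] == color[node]:
                    return True  # same color on both ends: odd cycle
    return False
```

For instance, a triangle with weights 3, 4 and 8 sums to 15 and is reported as odd, while a triangle with weights 1, 1 and 2 is not.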

Disconnected node during Graph traversal

I have been going through Breadth First Traversal at this link
Breadth First Traversal
Now what if the graph structure is changed to this
The node 3 is now disconnected from the graph.
When the traversal program is now used, it doesn't display vertex 3.
Is there a way we can display this vertex as well?
To my understanding, BFS would keep looking for unvisited nodes as long as they exist; however, if this is not done, BFS only visits nodes in the connected component of the initial vertex. This seems to be more a matter of definition than an actual programming problem; simply restart the BFS implementation on unvisited nodes as long as they exist - if visiting of all connected components is desired.
Many implementations of BFS/DFS assume implicitly that the graph is connected.
Is there a way we can display this vertex as well?
Yes, there is. If there are still some unvisited vertices after BFS has finished, enqueue them into the queue.
If you have a list of all nodes, your graph search algorithm of choice (DFS/BFS) will discover the connected components one at a time.
You could do this in the following way.
For example, consider your example graph, in which there are 4 nodes with edges (0, 2), (2, 0) and (1, 2), and node 3 has no incoming or outgoing edges.
You would have a list of nodes {0, 1, 2, 3}
And to discover all connected components, you would do the following:
Initialize visited array. Set all nodes to false
for node in list:
    if not visited[node]: dfs(node)
where dfs is implemented in the usual way. Here when you run the code on our list {0,1,2,3}, the nodes {0,1,2} will be visited by the first dfs call and 0,1,2 will be marked visited. Then when we come across 3, since it is not visited, there will be another dfs call.
Hope you get the idea.
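As a rough runnable version of the same idea, here is a Python sketch that uses BFS, since that is what the question is about. The function name `bfs_all_components` and the adjacency-dict input are just assumptions for this example, and the 4-node graph is treated as undirected.

```python
from collections import deque


def bfs_all_components(nodes, adj):
    """Visit every node, restarting BFS from each node that is still unvisited.

    nodes: iterable of all node ids; adj: dict mapping a node id to its neighbours.
    Returns the nodes in the order they were visited.
    """
    visited = set()
    order = []
    for start in nodes:
        if start in visited:
            continue  # already reached from an earlier start node
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adj.get(node, []):
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
    return order


if __name__ == "__main__":
    # Nodes 0, 1 and 2 are connected; node 3 is isolated.
    graph = {0: [2], 1: [2], 2: [0, 1], 3: []}
    print(bfs_all_components([0, 1, 2, 3], graph))
```

The outer loop is the only addition over a plain BFS: it is what picks up node 3 once the first component has been exhausted.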

Subset of vertices

I have a homework problem and I don't know how to solve it. If you could give me an idea I would be very grateful.
This is the problem:
"You are given a connected undirected graph which has N vertices and N edges. Each vertex has a cost. You have to find a subset of vertices so that the total cost of the vertices in the subset is minimum, and each edge is incident with at least one vertex from the subset."
Thank you in advance!
P.S: I have thought about a solution for a long time, and the only ideas I came up with are backtracking or a minimum cost matching in a bipartite graph, but both ideas are too slow for N=100000.
This may be solved in linear time using dynamic programming.
A connected graph with N vertices and N edges contains exactly one cycle. Start with detecting this cycle (with the help of depth-first search).
Then remove any edge on this cycle. Two vertices incident to this edge are u and v. After this edge removal, we have a tree. Interpret it as a rooted tree with the root u.
Dynamic programming recurrence for this tree may be defined this way:
w0[k] = 0 (for leaf nodes)
w1[k] = vertex_cost (for leaf nodes)
w0[k] = w1[k+1] (for nodes with one descendant)
w1[k] = vertex_cost + min(w0[k+1], w1[k+1]) (for nodes with one descendant)
w0[k] = sum(w1[k+1], x1[k+1], ...) (for branch nodes)
w1[k] = vertex_cost + sum(min(w0[k+1], w1[k+1]), min(x0[k+1], x1[k+1]), ...) (for branch nodes)
Here k is the node depth (distance from root), w0 is cost of the sub-tree starting from node w when w is not in the "subset", w1 is cost of the sub-tree starting from node w when w is in the "subset".
For each node only two values should be calculated: w0 and w1. But for nodes that were on the cycle we need 4 values: w_{i,j}, where i=0 if node v is not in the "subset", i=1 if node v is in the "subset", j=0 if the current node is not in the "subset", j=1 if the current node is in the "subset".
The optimal cost of the "subset" is determined as min(u_{0,1}, u_{1,0}, u_{1,1}). To get the "subset" itself, store back-pointers along with each sub-tree cost, and use them to reconstruct the subset.
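A possible Python sketch of this idea is below. Note that it swaps in a simplification rather than following the four-value bookkeeping above: after removing the cycle edge (u, v), it runs the plain two-value tree DP twice, once rooted at u with u forced into the subset and once rooted at v with v forced in, because the removed edge needs at least one of them. The function name, the union-find cycle detection and the input format are all assumptions made for this example, and only the cost is returned, not the subset.

```python
def min_vertex_cover_unicyclic(costs, edges):
    """Minimum-cost vertex cover of a connected graph with N vertices and N edges.

    costs: list of vertex costs (index = vertex id); edges: list of (a, b) pairs.
    """
    n = len(costs)
    parent = list(range(n))  # union-find forest, used to spot the cycle edge

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    adj = [[] for _ in range(n)]
    extra = None
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            extra = (a, b)  # this edge closes the unique cycle; leave it out
        else:
            parent[ra] = rb
            adj[a].append(b)
            adj[b].append(a)
    u, v = extra

    def cost_with_root_included(root):
        out = [0] * n       # best subtree cost when the node is NOT in the subset
        inc = list(costs)   # best subtree cost when the node IS in the subset
        par = [-1] * n
        seen = [False] * n
        seen[root] = True
        order, stack = [], [root]
        while stack:  # iterative DFS; parents appear before their children
            x = stack.pop()
            order.append(x)
            for y in adj[x]:
                if not seen[y]:
                    seen[y] = True
                    par[y] = x
                    stack.append(y)
        for x in reversed(order):  # children are finished before their parent
            p = par[x]
            if p != -1:
                out[p] += inc[x]               # p excluded: child must be included
                inc[p] += min(out[x], inc[x])  # p included: child is free to choose
        return inc[root]

    return min(cost_with_root_included(u), cost_with_root_included(v))
```

Each of the two passes is linear, so the whole thing stays O(N) apart from the near-constant union-find overhead.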
Because the number of edges is constrained to equal the number of vertices, this is not the general Vertex cover problem, which is NP-Complete. I think there's a polynomial solution here:
A graph with N vertices and (N-1) edges is a tree. Your graph has N vertices and N edges. First find the extra edge causing a cycle, and turn the graph into a tree. You could use DFS to find the cycle (O(N)). Removing any one of the edges in the cycle would give a possible tree. In the extreme case you would get N possible trees (when the raw graph is a single cycle).
Apply a simple dynamic programming algorithm (O(N)) to each possible tree (O(N^2) in total), then pick the one with the least cost.