How to find the nonidentical elements from multiple vectors? - c++

Given several vectors/sets, each containing multiple integers that are distinct within that vector, I want to check whether there exists a set formed by extracting exactly one element from each given vector/set such that the extracted numbers are all different from each other.
For example, given sets a, b, c, d as:
a <- (1,3,5);
b <- (3,6,8);
c <- (2,3,4);
d <- (2,4,6)
I can find sets like (1, 8, 4, 6) or (3, 6, 2, 4), and so on. Actually, I only need to find one such set to prove existence.
With a brute-force search, there can be up to m^k combinations to check, where m is the size of each given set and k is the number of given sets.
Are there any cleverer ways?
Thank you!

You can reformulate your problem as a matching in a bipartite graph:
the nodes on the left side are your sets,
the nodes on the right side are the integers appearing in the sets.
There is an edge between a "set" node and an "integer" node if the set contains the given integer. Then, you are trying to find a matching in this bipartite graph: each set will be associated with one integer and no integer will be used twice. The running time of a simple algorithm to find such a matching is O(|V||E|); here |V| is at most (m+1)k and |E| is equal to mk. So you have a solution in O(m^2 k^2). See: Matching in bipartite graphs.
Algorithm for bipartite matching:
The algorithm works on oriented graphs. At the beginning, all edges are oriented from left to right. Two nodes are matched if the edge between them is oriented from right to left, so at the beginning, the matching is empty. The goal of the algorithm is to find "augmenting paths" (or alternating paths), i.e. paths that increase the size of the matching.
An augmenting path is a path in the directed graph starting from an unmatched left node and ending at an unmatched right node. Once you have an augmenting path, you just have to flip all the edges along the path to increase the size of the matching by one. (The size of the matching increases because the path contains one more edge not belonging to the matching than edges belonging to it, and flipping swaps the two kinds. It is also called an alternating path because it alternates between edges not belonging to the matching, oriented left to right, and edges belonging to the matching, oriented right to left.)
Here is how you find an augmenting path:
1. all the nodes are marked as unvisited,
2. you pick an unvisited and unmatched left node,
3. you do a depth-first search until you find an unmatched right node (then you have an augmenting path). If you cannot find an unmatched right node, you go to 2.
If you cannot find an augmenting path, then the matching is optimal.
Finding an augmenting path is of complexity O(|E|), and you do this at most k times, since the size of the best matching is bounded by k, the number of sets (it is also bounded by the number of distinct integers, which can be as large as mk). So for your problem, the complexity will be O(m k^2).
You can also see this reference, section 1., for a more complete explanation with proofs.
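The augmenting-path search described above can be sketched in C++ as follows. This is a minimal illustration, not a tuned implementation: the names `pickDistinct` and `tryAugment` are hypothetical, the integers are first compressed to ids so they can index arrays, and the recursion assumes the input is small enough not to exhaust the stack.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: depth-first search for an augmenting path starting at
// set `s`. owner[v] is the set currently matched to value id v, or -1 if free.
static bool tryAugment(int s, const std::vector<std::vector<int>>& adj,
                       std::vector<int>& owner, std::vector<char>& visited) {
    for (int v : adj[s]) {
        if (visited[v]) continue;
        visited[v] = 1;
        // Either v is free, or the set holding v can be moved to another value.
        if (owner[v] == -1 || tryAugment(owner[v], adj, owner, visited)) {
            owner[v] = s;  // "flip" the edge: s is now matched to v
            return true;
        }
    }
    return false;
}

// Fills `out` with one distinct element per set and returns true,
// or returns false if no such selection exists.
bool pickDistinct(const std::vector<std::vector<int>>& sets, std::vector<int>& out) {
    out.clear();
    std::unordered_map<int, int> id;        // integer -> right-node id
    std::vector<int> value;                 // right-node id -> integer
    std::vector<std::vector<int>> adj(sets.size());
    for (std::size_t s = 0; s < sets.size(); ++s) {
        for (int x : sets[s]) {
            auto it = id.find(x);
            if (it == id.end()) {
                it = id.emplace(x, static_cast<int>(value.size())).first;
                value.push_back(x);
            }
            adj[s].push_back(it->second);
        }
    }
    std::vector<int> owner(value.size(), -1);
    for (std::size_t s = 0; s < sets.size(); ++s) {
        std::vector<char> visited(value.size(), 0);  // fresh marks per search
        if (!tryAugment(static_cast<int>(s), adj, owner, visited)) return false;
    }
    out.assign(sets.size(), 0);
    for (std::size_t v = 0; v < value.size(); ++v)
        if (owner[v] != -1) out[owner[v]] = value[v];
    return true;
}
```

Calling `pickDistinct` on the four sets a, b, c, d from the question should fill `out` with one valid selection; which one depends on the order in which elements are listed.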

Related

how to find S or less nodes in tree with minimum distance? [duplicate]

Given an undirected, unweighted tree with N vertices and N-1 edges and a number K, find K nodes so that every node of the tree is within distance S of at least one of the K nodes. Also, S has to be as small as possible, so that for any S' < S at least one node would be unreachable within S' steps.
I tried solving this problem, however, I feel that my supposed solution is not very fast.
My solution:
set x=1
for every node, find the nodes within distance x of it
let the node which has the most nodes within that distance be one of the K nodes.
recompute for every node, not counting already covered nodes.
repeat until K nodes are chosen. Then, if every node is covered, we are done; otherwise increase x.

This problem is called p-center, and you can find several papers online about it such as this. It is indeed NP-hard for general graphs, but polynomial on trees, both weighted and unweighted.

To me it looks like a clustering problem. Try the k-means (wikipedia) algorithm with k equal to your K. Since you have a tree and all vertices are connected, you can use the number of edges between two vertices as the distance measure.
When the algorithm converges, you get the K nodes you are looking for. Then you can determine S by iterating through all k clusters: for each cluster, calculate the maximum distance from any node in the cluster to its center node. The overall maximum is S.
Update: I now see that the k-means algorithm does not produce a global optimum, so this approach would not necessarily produce the best result either ...
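Whichever method proposes the K nodes, the S they achieve can be checked directly. The following is a minimal sketch, assuming the tree is given as adjacency lists with vertices numbered from 0; the function name `coveringRadius` is hypothetical. It runs one breadth-first search started from all chosen nodes at once, so each vertex receives its distance to the nearest chosen node.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Returns the smallest S such that every vertex is within S edges of some
// center, or -1 if a vertex cannot be reached from any center.
int coveringRadius(const std::vector<std::vector<int>>& adj,
                   const std::vector<int>& centers) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    for (int c : centers) {
        if (dist[c] == -1) {  // ignore duplicate centers
            dist[c] = 0;
            q.push(c);
        }
    }
    int radius = 0;
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        radius = std::max(radius, dist[u]);
        for (int v : adj[u]) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;  // first visit gives the nearest center
                q.push(v);
            }
        }
    }
    for (int d : dist)
        if (d == -1) return -1;
    return radius;
}
```

This only evaluates a given choice of nodes in O(N) time; it does not by itself find the optimal choice.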

You say N nodes and N-1 edges, so your graph is a tree. You are actually looking for a connected K-subset of nodes minimizing the longest edge.
A polynomial algorithm may be:
Sort all your edges by increasing distance.
Then loop on edges:
if neither of the 2 nodes is in a group, create a new group.
else if one node is in an existing group, add the other to that group
else both nodes are in 2 different groups, so merge the groups
When a group reaches K nodes, break the loop and you have your connected K-subset.
Nevertheless, note that your group can contain more than K nodes. Imagine 4 nodes that are close together in pairs: there would be no exact 3-subset solution to your problem.

Extracting operations from Damerau-Levenshtein

The Damerau-Levenshtein distance tells you the number of additions, deletions, substitutions and transpositions between two words (the latter is what differentiates DL from Levenshtein distance).
The algo is on wikipedia and relatively straightforward. However I want more than just the distance; I want the actual operations.
Eg a function that takes AABBCC, compares it to ABZ, and returns:
Remove A at index 0 -> ABBCC
Remove B at index 2 -> ABCC
Remove C at index 4 -> ABC
Substitute C at index 5 for Z -> ABZ
(ignore how the indices are affected by removals for now)
It seems you can do something with the matrix produced by the DL calculation. This site produces the output above. The text below says you should walk from the bottom right of the matrix, following each lowest cost operation in each cell (follow the bold cells):
If Delete is lowest cost, go up one cell
For Insert, go left one cell
Otherwise for Substitute, Transpose or Equality go up and left
It seems to prioritise equality or substitution over anything else if there's a tie, so in the example I provided, when the bottom-right cell is 4 for both substitution and removal, it picks substitution.
However once it reaches the top left cell, equality is the lowest scoring operation, with 0. But it has picked deletion, with score 2.
This seems to be the right answer, because if you strictly pick the lowest score, you end up with too many As at the start of the string.
But what are the real steps for picking an operation, if not the lowest score? Are there other ways to pick operations out of a DL matrix, and if so, do you have a reference?

I missed a vital part of fuzzy-string's explanation of how to reconstruct the operations:
But when you want to see the simplest path, it is determined by working backwards from bottom-right to top-left, following the direction of the minimum Change in each cell. (If the top or left edge is reached before the top-left cell, then the type of Change in the remaining cells is overwritten, with Inserts or Deletes respectively.)
...which explains why the equality operation in cell [1,1] is ignored and the delete is used instead!
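A backwards walk of this kind can be sketched as follows. This is an illustration under stated assumptions, not the site's implementation: it uses the restricted (optimal string alignment) form of Damerau-Levenshtein, breaks ties in favour of diagonal moves, and relies on the loop conditions to produce pure deletes or inserts once the top or left edge is reached. The function name `editScript` and the wording of the operation strings are hypothetical; indices refer to positions in the original first string.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the operations that turn `a` into `b`, listed left to right.
std::vector<std::string> editScript(const std::string& a, const std::string& b) {
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 0; i <= n; ++i) d[i][0] = i;
    for (int j = 0; j <= m; ++j) d[0][j] = j;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // delete
                                d[i][j - 1] + 1,          // insert
                                d[i - 1][j - 1] + cost}); // equal / substitute
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1); // transpose
        }
    }
    std::vector<std::string> ops;
    int i = n, j = m;
    while (i > 0 || j > 0) {  // walk from bottom-right back to top-left
        if (i > 0 && j > 0 && a[i - 1] == b[j - 1] && d[i][j] == d[i - 1][j - 1]) {
            --i; --j;  // equality: no operation recorded
        } else if (i > 0 && j > 0 && a[i - 1] != b[j - 1] &&
                   d[i][j] == d[i - 1][j - 1] + 1) {
            ops.push_back("substitute " + std::string(1, a[i - 1]) + " at index " +
                          std::to_string(i - 1) + " with " + std::string(1, b[j - 1]));
            --i; --j;
        } else if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] &&
                   d[i][j] == d[i - 2][j - 2] + 1) {
            ops.push_back("transpose " + a.substr(i - 2, 2) + " at index " +
                          std::to_string(i - 2));
            i -= 2; j -= 2;
        } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
            ops.push_back("remove " + std::string(1, a[i - 1]) + " at index " +
                          std::to_string(i - 1));
            --i;
        } else {  // only an insert can explain this cell
            ops.push_back("insert " + std::string(1, b[j - 1]) + " before index " +
                          std::to_string(i));
            --j;
        }
    }
    std::reverse(ops.begin(), ops.end());
    return ops;
}
```

Because every recorded step accounts for exactly one unit of cost, the number of returned operations equals the distance in the bottom-right cell. A different tie-breaking order would give a different but equally short script.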

BGL - determine all mincuts

Given a graph, I want to find all edges (if any) that, if removed, split the graph into two components.
An initial idea would have been to assign a weight of 1 to all edges, then calculate the mincut of the graph. mincut > 1 implies there is no single edge that when removed causes a split.
For mincut == 1, it would have been nice if the algorithm would provide for each mincut the edges it consists of.
Unfortunately, BGL does not seem to support that kind of thing:
The stoer_wagner_min_cut function determines exactly one of the min-cuts as well as its weight.
(http://www.boost.org/doc/libs/1_59_0/libs/graph/doc/stoer_wagner_min_cut.html)
Is there a way to make this work (i.e. to determine more than one mincut) with the BGL or will I have to come up with something different?

This may come a little bit late...
From what I see, you only need to find all edges that don't belong to any cycle in the graph (assuming the graph is already connected).
For the tree-like parts of the graph, this can be done by iteratively removing leaf nodes (and the edges connected to them), much like what you do in topological sorting, until there is no leaf node left. All edges removed during the process are ones you want. Note that an edge joining two cyclic parts (for instance, one connecting two triangles) also lies on no cycle but never has a leaf endpoint, so leaf removal does not find it; a DFS-based bridge search is needed for such edges.
In pseudocode, for a connected undirected graph G=(V,E), you can do this:
S = Ø
while (there exists a node n∈V s.t. degree(n)==1)
    e = edge connected to n
    S = S∪{e}
    E = E-{e}
    V = V-{n}
return S
which can be done in O(|V|+|E|) time
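For graphs where a cut edge may join two cyclic parts, a common alternative to leaf removal is a depth-first search with low-link values. The following is a minimal sketch of that different technique, written against plain adjacency lists rather than BGL; the names `BridgeFinder` and `findBridges` are hypothetical, and the recursion assumes the graph is small enough for the call stack.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper holding the DFS state for the bridge search.
struct BridgeFinder {
    const std::vector<std::vector<int>>& adj;
    std::vector<int> tin;  // DFS entry time of each vertex, -1 if unvisited
    std::vector<int> low;  // earliest entry time reachable from the subtree
    std::vector<std::pair<int, int>> bridges;
    int timer = 0;

    explicit BridgeFinder(const std::vector<std::vector<int>>& g)
        : adj(g), tin(g.size(), -1), low(g.size(), -1) {}

    void dfs(int u, int parent) {
        tin[u] = low[u] = timer++;
        bool skippedParent = false;
        for (int v : adj[u]) {
            // Skip the tree edge back to the parent once; a second copy is a
            // parallel edge and counts as a genuine back edge.
            if (v == parent && !skippedParent) {
                skippedParent = true;
                continue;
            }
            if (tin[v] != -1) {
                low[u] = std::min(low[u], tin[v]);  // back edge
            } else {
                dfs(v, u);
                low[u] = std::min(low[u], low[v]);
                // Nothing below v reaches u or above: (u, v) is a bridge.
                if (low[v] > tin[u]) bridges.emplace_back(u, v);
            }
        }
    }
};

// Returns every edge whose removal disconnects its component.
std::vector<std::pair<int, int>> findBridges(const std::vector<std::vector<int>>& adj) {
    BridgeFinder finder(adj);
    for (int u = 0; u < static_cast<int>(adj.size()); ++u)
        if (finder.tin[u] == -1) finder.dfs(u, -1);
    return finder.bridges;
}
```

Like leaf removal, this runs in O(|V|+|E|) time. On two triangles joined by a single edge it reports that joining edge, which has no leaf endpoint.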

Graph theory: Breadth First Search

There are n vertices connected by m edges. Some of the vertices are special and others are ordinary. There is at most one path from one vertex to another.
First Query:
I need to find out how many pairs of special vertices exists which are connected directly or indirectly.
My approach:
I'll apply BFS (via a queue) to see which nodes are connected to each other. Let the number of special vertices I discover in one such search be s; then that search contributes sC2 to the answer. I'll repeat this until all vertices are visited.
Second Query:
How many vertices lie on path between any two special vertices.
My approach:
In my approach for query 1, I'll apply BFS to find out path between any two special vertices and then backtrack and mark the vertices lying on the path.
Problem:
The number of vertices can be as high as 50,000. So I guess applying BFS and then backtracking would be too slow for my time constraint (2 seconds).
I have a list of all vertices and their adjacency lists. Now, while pushing vertices into my queue during BFS, can I somehow calculate the answer to query 2 as well? Is there a better approach to solve the problem? The input format is such that I'll be told one by one whether each vertex is special, and then I'll be given info about the i-th pathway, which connects two vertices. There is at most one path from one vertex to another.

The first query is solved by splitting your forest in trees.
Starting with the full set of vertices, pick one, then visit every node you can reach from there, until you cannot visit any more vertices. This is one tree. Repeat for each tree.
You now have K bags of vertices, each containing 0-j special ones. That answers the first question.
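Counting the pairs from these bags can be sketched as below, assuming the forest is stored as adjacency lists with vertices numbered from 0 and a flag per vertex saying whether it is special; the name `countSpecialPairs` is hypothetical. A bag with s special vertices contributes s(s-1)/2 pairs.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Counts pairs of special vertices that lie in the same tree of the forest.
long long countSpecialPairs(const std::vector<std::vector<int>>& adj,
                            const std::vector<bool>& special) {
    std::vector<char> seen(adj.size(), 0);
    long long pairs = 0;
    for (std::size_t start = 0; start < adj.size(); ++start) {
        if (seen[start]) continue;
        // Breadth-first search over one tree, counting its special vertices.
        long long s = 0;
        std::queue<int> q;
        q.push(static_cast<int>(start));
        seen[start] = 1;
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            if (special[u]) ++s;
            for (int v : adj[u]) {
                if (!seen[v]) {
                    seen[v] = 1;
                    q.push(v);
                }
            }
        }
        pairs += s * (s - 1) / 2;  // s choose 2
    }
    return pairs;
}
```

Every vertex and edge is visited once, so the whole count takes O(n + m) time.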
For the second question, I suppose a trivial solution is indeed to BFS the path between a vertex to another for each pair in their sub-graph.
You could also take advantage of the tree nature of your sub-graph. This question: How to find the shortest simple path in a Tree in a linear time? mentions it. (I have not really dug into this yet, though)

For the first query, one round of BFS and some simple calculation as you have described is optimal.
For the second query, assuming the worst case where all vertices are special and the graph is a tree, doing a BFS per query is going to give O(Q|V|) complexity, where Q is the number of queries. You are going to run into trouble if Q is larger than 10^4 and |V| is also larger than 10^4.
In the worst case, we are basically solving the all pairs shortest path problem, but on a tree/forest. When |V| is small, we can do BFS on all nodes, which results in an O(|V|^2) algorithm. However, there is a faster algorithm:
Read all second type queries and store all the pairs in a set S.
For each tree in the forest:
  Choose a root node for the current tree. Calculate the distance from the root node to all the other nodes in the current tree (regardless of whether they are special or not).
  Calculate the lowest common ancestor (LCA) for all pairs of nodes being queried (which are stored in set S). This can be done with Tarjan's offline LCA algorithm.
  Calculate the distance between a pair of nodes by: dist(root, a) + dist(root, b) - 2 * dist(root, lca(a,b))
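The distance formula can be illustrated with the sketch below. To keep it short it does not use Tarjan's offline algorithm; instead it finds the LCA by climbing parent pointers, which costs time proportional to the tree height per query. The struct name `RootedTree` is hypothetical, and both queried vertices are assumed to lie in the tree containing the chosen root.

```cpp
#include <queue>
#include <vector>

// Hypothetical helper: one tree of the forest, rooted at `root`.
struct RootedTree {
    std::vector<int> parent;  // parent[v], -1 for the root or unreached vertices
    std::vector<int> depth;   // depth[v] = dist(root, v), -1 if unreached

    RootedTree(const std::vector<std::vector<int>>& adj, int root)
        : parent(adj.size(), -1), depth(adj.size(), -1) {
        std::queue<int> q;
        depth[root] = 0;
        q.push(root);
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int v : adj[u]) {
                if (depth[v] == -1) {
                    depth[v] = depth[u] + 1;
                    parent[v] = u;
                    q.push(v);
                }
            }
        }
    }

    // Naive LCA: lift the deeper vertex, then lift both until they meet.
    int lca(int a, int b) const {
        while (depth[a] > depth[b]) a = parent[a];
        while (depth[b] > depth[a]) b = parent[b];
        while (a != b) {
            a = parent[a];
            b = parent[b];
        }
        return a;
    }

    // dist(a, b) = dist(root, a) + dist(root, b) - 2 * dist(root, lca(a, b))
    int dist(int a, int b) const {
        return depth[a] + depth[b] - 2 * depth[lca(a, b)];
    }
};
```

The factor 2 appears because the stretch from the root to the LCA is counted once in each of the two root distances and belongs to neither half of the a-to-b path.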

Let arr be a bool array where arr[i] is 1 if vertex i is special and 0 otherwise.
find_set(i) returns the root node of the tree containing i, so any two nodes lying in the same tree return the same number.
for (int i = 1; i < n; i++) {
    for (int j = i + 1; j <= n; j++) {
        if (arr[i] == 1 && arr[j] == 1) {        // both are special
            if (find_set(i) == find_set(j)) {    // and both belong to the same tree
                // k++, where k is the answer to the first query.
                // bfs(i, j): find the intermediate vertices and set ver[v] = 1 for each intermediate vertex v.
            }
        }
    }
}
Finally, count the number of 1's in the ver array, which is the answer to the second query.

Is there any way to quickly find all the edges that are part of a cycle(back edges) in an undirected/directed graph?

I've got a minimum spanning tree. I add an edge to it, so a cycle is certainly formed. I need to find all the edges that are part of that cycle, i.e. all the back edges. How quickly can that be done? My solution:
For example, if the edge is (1,4), add 4 to Adj(1) at every position and run DFS each time. E.g. if Adj(1) had 2,3,5, first add 4 before 2 and run DFS; I'll get a back edge. Then add 4 between 2 and 3 and run DFS; I get another back edge. Then between 3 and 5, and so on. Is there any faster way to do this?

In a tree you have a single (simple) route between any pair of vertices. If you are adding an edge (i,j), first find the route in the tree between i and j and then you will have your cycle: it consists of all the vertices on that route (and turns into a cycle once you add (i,j) as an edge).
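Finding that route takes a single traversal of the tree. The following is a minimal sketch, assuming the tree is stored as adjacency lists with vertices numbered from 0 and that the new edge (i,j) has not yet been inserted; the name `treePath` is hypothetical.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Returns the vertices on the unique tree path from i to j (inclusive),
// or an empty vector if j is not reachable from i.
std::vector<int> treePath(const std::vector<std::vector<int>>& tree, int i, int j) {
    std::vector<int> parent(tree.size(), -1);
    std::vector<char> seen(tree.size(), 0);
    std::queue<int> q;
    q.push(i);
    seen[i] = 1;
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        if (u == j) break;  // the path to j is now fixed by the parent links
        for (int v : tree[u]) {
            if (!seen[v]) {
                seen[v] = 1;
                parent[v] = u;
                q.push(v);
            }
        }
    }
    std::vector<int> path;
    if (!seen[j]) return path;
    for (int v = j; v != -1; v = parent[v]) path.push_back(v);  // walk j back to i
    std::reverse(path.begin(), path.end());
    return path;
}
```

The cycle's edges are the consecutive pairs along the returned path plus the new edge (i,j), all found in O(|V|) time.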

You are looking for the strongly connected components of the graph, which can be found using Tarjan's algorithm (among others).
