Maximum Bipartite Matching C++ - c++

I'm solving a matching problem with two vectors of a class
class matching
{
public:
int n;
char match;
};
This is the algorithm I'm trying to implement:
int augment(vector<matching> &left, vector<matching> &right)
{
while(there's no augmenting path)
if(condition for matching)
<augment>
return "number of matching";
}
For the rough matching, if left[i] matches with right[j], then left[i].n = j, left[i].match ='M' , right[j].n = i and right[j].match = 'M' and the unmatched ones have members n = -1 and match = 'U'
While finding the augmenting paths, if one exists for another (i, j), then we change the member match of the one being unmatched from 'M' to 'U' and its n = -1 and the two matched with the augmenting path have their members match changed to 'A' while we change their members n according to their indices.
I don't know if this is the right approach to solving this, this is my first attempt on maximum matching and I've read a lot of articles and watched tutorials online and I can't get my 'code' to function appropriately.
I do not need a code, I can write my code. I just want to understand this algorithm step by step. If someone can give me an algorithm like the one I was trying above, I would appreciate it. Also, if I have been going the wrong direction since, please correct me.

I am not sure if you are finding the augmenting paths correctly. I suggest the following approach.
1. Find an initial matching greedily: go through every vertex on the left side and try to match it with some free (unmatched) adjacent vertex on the right side.
2. Try to find an augmenting path P in the graph. To do this, run a breadth-first search starting from all the free vertices on the left side, alternating between unmatched and matched edges (i.e. the second level contains all the right-side vertices adjacent to level-1 vertices, the third level contains all the left-side vertices that are matched to level-2 vertices, the fourth level contains all the right-side vertices adjacent to level-3 vertices, etc.). Stop the search when you visit a free vertex in any later level (necessarily a right-side vertex) and recover the augmenting path P from the breadth-first search tree computed so far.
3. If an augmenting path P was found in step 2: change the matched edges in P to unmatched and the unmatched edges in P to matched, then go to step 2. Otherwise, the matching obtained so far is maximum.
This algorithm requires a breadth-first search for every augmentation, so its worst-case complexity is O(nm) (n vertices, m edges). Although the Hopcroft-Karp algorithm can perform multiple augmentations for each breadth-first search and has a better worst-case complexity, it seems (from the Wikipedia article) that it isn't faster in practice.
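The steps above can be sketched in C++ roughly as follows. This is only a minimal sketch under some assumptions: the graph is given as an adjacency list `adj` from left vertices to right vertices, and the names `maxBipartiteMatching`, `matchLeft` and `matchRight` are made up for illustration. A partner index of -1 plays the role of the asker's `n = -1` / `'U'` convention.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// adj[u] lists the right-side vertices adjacent to left vertex u.
// Returns the size of a maximum matching. matchLeft[u] / matchRight[v]
// hold the partner index, or -1 for a free (unmatched) vertex.
int maxBipartiteMatching(const std::vector<std::vector<int>>& adj, int rightCount,
                         std::vector<int>& matchLeft, std::vector<int>& matchRight) {
    const int leftCount = static_cast<int>(adj.size());
    matchLeft.assign(leftCount, -1);
    matchRight.assign(rightCount, -1);
    int matched = 0;

    // Step 1: greedy initial matching.
    for (int u = 0; u < leftCount; ++u) {
        for (int v : adj[u]) {
            assert(v >= 0 && v < rightCount);
            if (matchRight[v] == -1) {
                matchLeft[u] = v;
                matchRight[v] = u;
                ++matched;
                break;
            }
        }
    }

    // Steps 2 and 3: one BFS per augmentation, until no augmenting path exists.
    while (true) {
        std::vector<int> parent(rightCount, -1);      // left vertex that discovered right v
        std::vector<bool> seenRight(rightCount, false);
        std::queue<int> q;
        for (int u = 0; u < leftCount; ++u)
            if (matchLeft[u] == -1) q.push(u);        // level 1: all free left vertices

        int freeRight = -1;
        while (!q.empty() && freeRight == -1) {
            const int u = q.front();
            q.pop();
            for (int v : adj[u]) {                    // unmatched edge: left -> right
                if (seenRight[v]) continue;
                seenRight[v] = true;
                parent[v] = u;
                if (matchRight[v] == -1) { freeRight = v; break; }
                q.push(matchRight[v]);                // matched edge: right -> left
            }
        }
        if (freeRight == -1) break;                   // no augmenting path: maximum

        // Flip the edges along the path, walking back to the free left vertex.
        int v = freeRight;
        while (v != -1) {
            const int u = parent[v];
            const int previous = matchLeft[u];        // u's old partner, -1 at the path start
            matchLeft[u] = v;
            matchRight[v] = u;
            v = previous;
        }
        ++matched;
    }
    return matched;
}
```

Each right vertex is marked as seen at most once per search, so every matched left vertex enters the queue at most once, and one search costs O(m).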

Related

How to Solve this Modified Word Ladder Problem?

Here is the word ladder problem:
Given two words (beginWord and endWord), and a dictionary's word list, find the length of the shortest transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time.
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Now along with the modification, we are allowed to delete or add an element.
We have to find minimum steps if possible to convert string1 to string2.
This problem has a nice BFS structure. Let's illustrate this using the example in the problem statement.
beginWord = "hit",
endWord = "cog",
wordList = "hot","dot","dog","lot","log","cog"
Since only one letter can be changed at a time, if we start from "hit", we can only change to those words which have exactly one letter different from it (in this case, "hot"). Putting in graph-theoretic terms, "hot" is a neighbor of "hit". The idea is simply to start from the beginWord, then visit its neighbors, then the non-visited neighbors of its neighbors until we arrive at the endWord. This is a typical BFS structure.
But now since we are allowed to add/delete also how should I proceed further?
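For reference, the BFS described above for the original rules (substitutions only) might look like the sketch below. The function name `ladderLength` and the restriction to lowercase letters 'a'..'z' are assumptions made for this example. Allowing insertions and deletions would presumably only change how the neighbors of a word are generated, not the BFS itself.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Returns the number of words in the shortest transformation sequence
// (beginWord and endWord included), or 0 if no sequence exists.
// Assumes lowercase words; only single-letter substitutions are tried.
int ladderLength(const std::string& beginWord, const std::string& endWord,
                 const std::vector<std::string>& wordList) {
    assert(beginWord.size() == endWord.size());  // substitutions keep the length
    const std::unordered_set<std::string> dict(wordList.begin(), wordList.end());
    if (dict.count(endWord) == 0) return 0;

    std::unordered_map<std::string, int> dist;   // word -> sequence length so far
    dist[beginWord] = 1;
    std::queue<std::string> q;
    q.push(beginWord);

    while (!q.empty()) {
        const std::string word = q.front();
        q.pop();
        const int d = dist[word];
        if (word == endWord) return d;
        // Neighbors: every dictionary word that differs in exactly one position.
        for (std::size_t i = 0; i < word.size(); ++i) {
            std::string next = word;
            for (char c = 'a'; c <= 'z'; ++c) {
                next[i] = c;
                if (dict.count(next) != 0 && dist.count(next) == 0) {
                    dist[next] = d + 1;
                    q.push(next);
                }
            }
        }
    }
    return 0;
}
```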

how to find the indexes of all matching substring using suffix tree?

I created a suffix tree from this amazing answer. It works like a charm!
For now, if I look for "cat" in "This cat is a pretty cat", it returns 5, the starting index of the first appearance of "cat".
But I can't find a way to keep track of all the suffixes in the algorithm to create. So basically, I can find the index of the first match, but not all the different occurrences.
For now, I have:
class Edge
{
int textIndexFrom;
Node* nodefrom;
Node* nodeTo;
int textIndexTo;
}
class Node
{
std::map<char,Edge*> m_childEdges;
Edge* m_pParentEdge;
Node* m_pLinkedNode;
}
I just put the relevant variables in the code above. To store the different starting positions, I imagine a std::vector is needed in Edge, but I don't see when to add a new index. We might use the active point but with the suffix links, it becomes tricky.
Could someone explain?
I assume you constructed a suffix tree for the string S$ where $ is some special character not present in S. The $ char ensures that each suffix has its own leaf in the tree. The number of occurrences of word w in S is the number of leaves in the subtree of w.
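I think that storing all starting positions in each edge/node would require quadratic memory in the worst case. For example, if T happens to be a perfectly balanced binary tree, then on level 0 (root) you have n entries, on level 1 you have 2 * n/2 entries and so on, which sums to about n log n. For a degenerate tree, however (e.g. the suffix tree of aaa...a$, where the internal node at depth k has n - k leaves below it), the sum is on the order of n^2. It requires proving, so please correct me if I'm wrong.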
Instead I think it's better to keep all the leaves in a list, in the order they appear in a DFS traversal of the tree (left to right if you draw a picture of the tree), and in every node v keep 2 pointers to elements of that list. The first should point to the first leaf of v's subtree and the second to the last leaf of v's subtree. All of that can be computed by a simple DFS procedure. Now if, for example, 'cat' happens to end in the middle of some edge, then follow that edge down to its node v and get the leaves from that node. In addition, in every leaf you should store the length of the path from the root to that leaf. It will help you find the position of that particular 'cat' occurrence.
Walk the entire cat subtree. Each leaf in that subtree corresponds to a suffix that begins with cat. If you keep track of the length of the path from the root as you descend, then each time you encounter a leaf you can subtract that path length from the length of the whole string to find the index of the corresponding occurrence of cat.
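To illustrate the leaf-walking idea in isolation, here is a small sketch that uses a naive, uncompressed suffix trie (quadratic size, so only suitable for short texts) instead of the asker's real suffix tree. The names `TrieNode`, `buildSuffixTrie`, `collectLeaves` and `findAll` are invented for this example. The subtraction at each leaf is the same one you would do in a proper suffix tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    int depth = 0;  // number of characters on the path from the root
};

// Naive suffix trie of text + '$'. The terminator guarantees that every
// suffix ends in its own leaf.
std::unique_ptr<TrieNode> buildSuffixTrie(const std::string& text) {
    assert(text.find('$') == std::string::npos);  // '$' must not occur in text
    const std::string s = text + '$';
    auto root = std::make_unique<TrieNode>();
    for (std::size_t start = 0; start < s.size(); ++start) {
        TrieNode* node = root.get();
        for (std::size_t i = start; i < s.size(); ++i) {
            std::unique_ptr<TrieNode>& child = node->children[s[i]];
            if (!child) {
                child = std::make_unique<TrieNode>();
                child->depth = node->depth + 1;
            }
            node = child.get();
        }
    }
    return root;
}

// A leaf at depth d is a suffix of length d, so it starts at totalLength - d.
void collectLeaves(const TrieNode* node, int totalLength, std::vector<int>& out) {
    if (node->children.empty()) {
        out.push_back(totalLength - node->depth);
        return;
    }
    for (const auto& entry : node->children)
        collectLeaves(entry.second.get(), totalLength, out);
}

// Returns the sorted start indices of every occurrence of pattern in text.
std::vector<int> findAll(const TrieNode* root, const std::string& text,
                         const std::string& pattern) {
    const TrieNode* node = root;
    for (char c : pattern) {
        auto it = node->children.find(c);
        if (it == node->children.end()) return {};  // pattern does not occur
        node = it->second.get();
    }
    std::vector<int> positions;
    collectLeaves(node, static_cast<int>(text.size()) + 1, positions);  // +1 for '$'
    std::sort(positions.begin(), positions.end());
    return positions;
}
```

For the text "This cat is a pretty cat", calling `findAll` with the pattern "cat" yields the indices 5 and 21.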

Finding connected components in an implicit representation of a graph

I have a CS problem for school that I just can't seem to wrap my head around.
We're given a vector of strings, which composes an "image". This vector of strings effectively represents a 2 dimensional matrix, and, in each of the spaces, there can be one of 4 characters: 'K' - A knight, 'D' - a dragon, '#' - a castle wall, and ' ' - empty space.
The problem requires us to write several functions, for instance: safe_knights - which finds how many knights are positioned such that dragons cannot move to them, castle_count - which counts the castles in the image, and knights_in_castles - which counts how many knights reside within castles. (see definitions below)
The assignment is somewhat vague, and I'm confused as to where to begin with this. The one hint that I have is that we should be looking for connected components of spaces - if I knew which spaces are connected, I'd be able to determine whether a dragon has a path to a knight (safe_knights needs this), or if a region of spaces is enclosed by castle walls (would be used for castle_count).
I know that I can use one of the graph traversal algorithms to help find connected components, but I'm scratching my head over how to implement this to work on the graph which is represented implicitly in the vector of strings. The pieces just aren't coming together for me.
Any hints or ideas to point me in the right direction would be appreciated!
//Definitions of terms above:
A castle is a region of at least one space enclosed by walls. Walls are connected if they are orthogonal or on the same line, not diagonal.
Castles: || Not Castles:
#### ### ###### || ## ### #
# # # # # #### || # # # # #
#### # # # # || ### # ####
### ######### ||
A knight is safe if there is no dragon that can move to it. Dragons cannot move diagonally.
Safe Knights: || Tasty Knights:
### ### || ##### ### D
#K# D # K# || #K D# # # K
### ### D || ##### ####
Example question to clarify what I don't understand:
The vector of strings is called image.
Say that (i, j) is equivalent to the character at (image[i])[j] (i.e. the jth char in the string at image[i].)
Then, how can I look at (i,j) and say "this is in a castle" or "this space can be reached by the dragon at (m,n)"?
Do I need some kind of tracking, like storing known connected components as a member of the class?
I'm assuming that each point (i,j) is a vertex in the graph, so I think I need some way to look at (i,j) and decide which other vertices it's adjacent to. I was told by the instructor that I don't need to represent the graph separately (e.g. scan through the vector and construct an adjacency matrix.) So I need to operate on the graph implicitly as opposed to having an explicit representation.
Edit:
So, I've thought about this some more, and it seems like the basis of each function is going to be doing a traversal, with slightly different stipulations on what constitutes adjacency. For example:
safe_knights - starting with a knight, DFS to legal spaces until a dragon is found or no more moves can be made. legal spaces are those in cardinal directions not blocked by a wall or the edge of the board.
castle_count - starting with a wall, DFS to spaces in cardinal directions containing walls, until this can't be done anymore. I think I'm going to also need some way to tell whether I've made it all the way back to where I started - maybe I can remember that starting node within that function. I might also have to check if there's a space in the middle.
knights_in_castles - this one is a little confusing - maybe starting with each knight, check the spaces around him until a wall is found, then check if that wall is part of a castle?
I guess your problem is how to use an incidence matrix (storing edge length in each cell) to represent the chessboard, before you apply ordinary graph traversal algorithm on it. Yet actually there is no need to do this.
Normally on a chessboard, we directly use the board matrix to represent this problem, no need to store edge length in a cell. If we want to find connected components, we use depth-first search (DFS) or breadth-first search (BFS) or A* search to find them.
A recursive DFS algorithm is quite simple to implement, but a little hard to understand at first.
DFS(node v)
{
    visit(v);
    for each child w of v {
        // On a chessboard, they are right, left, up, and down neighbor cells
        DFS(w);
    }
}
In most cases, we use a global flag matrix to mark which cells have been visited, so that no cell is visited twice.
Before 'visit(v)', we always need to check that v is a legal cell (by checking that its coordinates are within bounds).
For example, for 'safe_knights', we start at a knight and recursively DFS through the legal neighbor cells (within bounds, not a wall) until we encounter a dragon, at which point we return early.
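A minimal sketch of that idea, working directly on the vector of strings, could look like this. It uses an explicit stack instead of recursion, and it assumes that only '#' blocks movement (so a dragon may pass through cells holding knights or other dragons), which the assignment text does not spell out. The names `dragonCanReach` and `safeKnights` are made up here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// DFS from (startRow, startCol) over the implicit grid graph.
// Returns true if some dragon 'D' is reachable through non-wall cells.
bool dragonCanReach(const std::vector<std::string>& image, int startRow, int startCol) {
    assert(startRow >= 0 && startRow < static_cast<int>(image.size()));
    std::vector<std::vector<bool>> visited;          // the "flag matrix"
    for (const std::string& row : image) visited.emplace_back(row.size(), false);

    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};
    std::vector<std::pair<int, int>> stack;
    stack.push_back(std::make_pair(startRow, startCol));

    while (!stack.empty()) {
        const int r = stack.back().first;
        const int c = stack.back().second;
        stack.pop_back();
        // Legal-cell checks: inside the image, not a wall, not seen before.
        if (r < 0 || r >= static_cast<int>(image.size())) continue;
        if (c < 0 || c >= static_cast<int>(image[r].size())) continue;
        if (visited[r][c] || image[r][c] == '#') continue;
        visited[r][c] = true;
        if (image[r][c] == 'D') return true;         // early return
        for (int k = 0; k < 4; ++k)                  // up, down, left, right
            stack.push_back(std::make_pair(r + dr[k], c + dc[k]));
    }
    return false;
}

// Counts the knights that no dragon can reach.
int safeKnights(const std::vector<std::string>& image) {
    int safe = 0;
    for (int r = 0; r < static_cast<int>(image.size()); ++r)
        for (int c = 0; c < static_cast<int>(image[r].size()); ++c)
            if (image[r][c] == 'K' && !dragonCanReach(image, r, c)) ++safe;
    return safe;
}
```

Running one search per knight is simple but repeats work. Labeling each connected component once and remembering whether it contains a dragon would avoid that.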

Checking if a list of strings can be chained

Question
Implement a function bool chainable(vector<string> v), which takes a set of strings as parameters and returns true if they can be chained. Two strings can be chained if the first one ends with the same character the second starts with, e.g.:
ship->petal->lion->nick = true; (original input is list not LL)
ship->petal axe->elf = false;
My Solution:
My logic is that if it's chainable, there will be only one start and one end that don't match. So I create a list of starts and a list of ends, like so:
starts:s,p,l,n
ends: p,l,n,k
If I remove the common elements, each list should contain at most one item, namely s and k. If so, the list is chainable. If the list is cyclic, the final lists are empty.
But I think I am missing some cases here.
EDIT:
Okay clearly I had faults in my solution. Can we conclude the best algorithm for this ?
The problem is to check if a Eulerian path exists in the directed graph whose vertices are the letters occurring as first or last letter of at least one of the supplied words and whose edges are the supplied words (each word is the edge from its first letter to its last).
Some necessary conditions for the existence of Eulerian paths in such graphs:
The graph has to be connected.
All vertices with at most two exceptions have equally many incoming and outgoing edges. If exceptional vertices exist, there are exactly two, one of them has one more outgoing edge than incoming, the other has one more incoming edge than outgoing.
The necessity is easily seen: If a graph has Eulerian paths, any such path meets all vertices except the isolated vertices (neither outgoing nor incoming edges). By construction, there are no isolated vertices in the graph under consideration here. In a Eulerian path, every time a vertex is visited, except the start and end, one incoming edge and one outgoing edge is used, so each vertex with the possible exception of the starting and ending vertex has equally many incoming and outgoing edges. The starting vertex has one more outgoing edge than incoming and the ending vertex one more incoming edge than outgoing unless the Eulerian path is a cycle, in which case all vertices have equally many incoming and outgoing edges.
Now the important thing is that these conditions are also sufficient. One can prove that by induction on the number of edges.
That allows for a very efficient check:
record all edges and vertices as obtained from the words
use a union find structure/algorithm to count the connected components of the graph
record indegree - outdegree for all vertices
If number of components > 1 or there is (at least) one vertex with |indegree - outdegree| > 1 or there are more than two vertices with indegree != outdegree, the words are not chainable, otherwise they are.
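The check described above might be sketched as follows. It assumes every word is non-empty and treats each `char` value as a potential vertex. The name `chainable` comes from the question, but the signature here takes a const reference, and the empty input is (arbitrarily) treated as chainable.

```cpp
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// True if the words can be ordered into one chain, i.e. if the letter graph
// (one edge first letter -> last letter per word) has an Eulerian path.
bool chainable(const std::vector<std::string>& words) {
    std::array<int, 256> parent;                 // union-find over letters
    std::iota(parent.begin(), parent.end(), 0);
    std::array<int, 256> balance{};              // outdegree - indegree
    std::array<bool, 256> used{};                // letter occurs as a vertex

    auto find = [&parent](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];       // path halving
            x = parent[x];
        }
        return x;
    };

    for (const std::string& word : words) {
        assert(!word.empty());                   // assumption: no empty words
        const int a = static_cast<unsigned char>(word.front());
        const int b = static_cast<unsigned char>(word.back());
        used[a] = used[b] = true;
        ++balance[a];
        --balance[b];
        parent[find(a)] = find(b);               // a and b are connected
    }

    int components = 0, starts = 0, ends = 0;
    for (int v = 0; v < 256; ++v) {
        if (!used[v]) continue;
        if (find(v) == v) ++components;
        if (balance[v] == 1) ++starts;
        else if (balance[v] == -1) ++ends;
        else if (balance[v] != 0) return false;  // |indegree - outdegree| > 1
    }
    if (components > 1) return false;
    // Either all vertices are balanced (cycle) or there is exactly one
    // start vertex and one end vertex.
    return (starts == 0 && ends == 0) || (starts == 1 && ends == 1);
}
```

With the inputs from this thread, ship/petal/lion/nick is accepted, while ship/petal/axe/elf, ship/pass/lion/nail and ship/pawn/label/nick are all rejected because their letter graphs fall into more than one component.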
Isn't that similar to the infamous traveling salesman problem?
If you have n strings, you can construct a graph out of them, where each node corresponds to one string. You construct the edges the following way:
If string (resp. node) a and b are chainable, you introduce an edge a -> b with weight 1.
For all unchainable strings (resp. nodes) a and b, you introduce an edge a -> b with weight n.
Then, all your strings are chainable (without repetition) if and only if you can find an optimal TSP route in the graph whose weight is less than 2n.
Note: Your problem is actually simpler than TSP, since you always can transform string chaining into TSP, but not necessarily the other way around.
Here's a case where your algorithm doesn't work:
ship
pass
lion
nail
Your start and end lists are both s, p, l, n, but you can't make a single chain (you get two chains - ship->pass and lion->nail).
A recursive search is probably going to be best - pick a starting word (1), and, for each word that can follow it (2), try to solve the smaller problem of creating a chain starting with (2) that contains all of the words except (1).
As phimuemue pointed out, this is a graph problem. You have a set of strings (vertices), with (directed) edges. Clearly, the graph must be connected to be chainable -- this is easy to check. Unfortunately, the rules beyond this are a little unclear:
If strings may be used more than once, but links can't, then the problem is to find an Eulerian path, which can be done efficiently. An Eulerian path uses each edge once, but may use vertices more than once.
// this can form a valid Eulerian path
yard
dog
god
glitter
yard -> dog -> god -> dog -> glitter
If the strings may not be used more than once, then the problem is to find a Hamiltonian path. Since the Hamiltonian path problem is NP-complete, no exact efficient solution is known. Of course, for small n, efficiency isn't really important and a brute force solution will work fine.
However, things are not quite so simple, because the set of graphs that can occur as inputs to this problem are limited. For example, the following is a valid directed graph (in dot notation) (*).
digraph G {
alpha -> beta;
beta -> gamma;
gamma -> beta;
gamma -> delta;
}
However, this graph cannot be constructed from strings using the rules of this puzzle: Since alpha and gamma both connect to beta, they must end with the same character (let's assume they end with 'x'), but gamma also connects to delta, so delta must also start with 'x'. But delta cannot start with 'x', because if it did, then there would be an edge alpha -> delta, which is not in the original graph.
Therefore, this is not quite the same as the Hamiltonian path problem, because the set of inputs is more restricted. It is possible that an efficient algorithm exists to solve the string chaining problem even if no efficient algorithm exists to solve the Hamiltonian path problem.
But... I don't know what that algorithm would be. Maybe someone else will come up with a real solution, but in the mean time I hope someone finds this answer interesting.
(*) It also happens to have a Hamiltonian path: alpha -> beta -> gamma -> delta, but that's irrelevant for what follows.
If you replace petal and lion with pawn and label, you still have:
starts:s,p,l,n
ends: p,l,n,k
Your algorithm decides it's chainable, but they aren't.
The problem is that you are disconnecting the first and last letters of each word.
A recursive backtracking or dynamic programming algorithm should solve this problem.
Check separately for "is chainable" and "is cyclic".
If it's to be cyclic, it must be chainable first. You could do something like this:
if (IsChainable())
{
    if (IsCyclic()) { ... }
}
Note: That's the case if you check only the first and last element of the chain for "cyclic".
This can be solved by a reduction to the Eulerian path problem by considering a digraph G with N(G) = Σ and E(G) = a->e for words aWe.
Here's a simple program to do this iteratively:
#include <string>
#include <vector>
#include <iostream>
using std::vector;
using std::string;
bool isChained(vector<string> const& strngs)
{
    if (strngs.size() < 2) return false; //- make sure we have at least two strings
    if (strngs.front().empty()) return false; //- make sure 1st string is not empty
    for (vector<string>::size_type i = 1; i < strngs.size(); ++i)
    {
        string const& head = strngs.at(i-1);
        string const& tail = strngs.at(i);
        if (tail.empty()) return false;
        if (head[head.size()-1] != tail[0]) return false;
    }
    return true;
}
int main()
{
    vector<string> chained;
    chained.push_back("ship");
    chained.push_back("petal");
    chained.push_back("lion");
    chained.push_back("nick");
    vector<string> notChained;
    notChained.push_back("ship");
    notChained.push_back("petal");
    notChained.push_back("axe");
    notChained.push_back("elf");
    std::cout << (isChained(chained) ? "true" : "false") << "\n"; //- prints 'true'
    std::cout << (isChained(notChained) ? "true" : "false") << "\n"; //- prints 'false'
    return 0;
}

number of paths in graph

How can the number of paths in a directed graph be calculated? Are there any algorithms for this purpose?
Best wishes
EDIT: The graph is not a tree.
Let A be the adjacency matrix of a graph G. Then A^n (i.e. A multiplied n times with itself) has the following interesting property:
The entry at position (i,j) of A^n equals the number of different paths of length n from vertex i to vertex j (strictly speaking walks, since vertices may be repeated).
Hence:
represent the graph as an adjacency matrix A
multiply A with itself repeatedly until you get bored (for an acyclic graph with N vertices, A^N is all zeros, so N - 1 multiplications suffice)
in each step: compute the sum of all matrix elements and add it to the result, starting at 0
It might be wise to first check whether G contains a cycle, because in this case it contains infinitely many paths. In order to detect cycles, set all edge weights to -1 and use Bellman-Ford.
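As a rough sketch of the matrix approach, the following sums the entries of A, A^2, ..., A^maxLength. The names `Matrix`, `multiply` and `countWalksUpTo` are invented for this example, and overflow of `long long` is ignored. For an acyclic graph with N vertices, passing maxLength = N - 1 counts every path with at least one edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Plain cubic-time product of two square matrices of the same size.
Matrix multiply(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Total number of walks with 1..maxLength edges, over all vertex pairs.
long long countWalksUpTo(const Matrix& adjacency, int maxLength) {
    long long total = 0;
    Matrix power = adjacency;                       // A^1
    for (int length = 1; length <= maxLength; ++length) {
        for (const auto& row : power)
            for (long long entry : row) total += entry;
        if (length < maxLength) power = multiply(power, adjacency);
    }
    return total;
}
```

For the small graph with edges 0->1, 0->2 and 1->2, the sum over A and A^2 is 3 + 1 = 4, matching the four paths 0->1, 0->2, 1->2 and 0->1->2.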
All the search hits I see are for the number of paths from a given node to another given node. But here's an algorithm that should find the total number of paths anywhere in the graph, for any acyclic digraph. (If there are cycles, there are an infinite number of paths unless you specify that certain repetitive paths are excluded.)
Label each node with the number of paths which end at that node:
While not all nodes are labeled:
    Choose an unlabeled node with no unlabeled ancestors.
    (An implementation might here choose any node, and recursively
    process any unlabeled ancestors of that node first.)
    Label the node with one plus the sum of the labels on all its direct predecessors (parents).
    (If a node has no predecessors, its label is simply 1.)
Now just add the labels on all nodes.
If you don't want to count "length zero" paths, subtract the number of nodes.
You can use depth-first search. However, you don't terminate the search when you find a path from start to destination, the way depth-first search normally does. Instead, you just add to the count of paths and return from that node as if it were a dead end. This is probably not the fastest method, but it should work.
You could also potentially use breadth-first search, but then you need to work out a way to pass information on path counts forward (or backwards) through the tree as you search it. If you could do that, it'd probably be much faster.
Assuming the graph is acyclic (a DAG), you can make a topological sorting of the vertices and then do dynamic programming to compute the number of distinct paths. If you want to print all the paths, there is not much use in discussing big O notation since the number of paths can be exponential in the number of vertices.
Pseudo-code:
paths := 0
dp[i] := 0, for all 0 <= i < n
compute topological sorting and store on ts
for i from n - 1 to 0
    for all edges (ts[i], v) // outbound edges from ts[i]
        dp[ts[i]] := 1 + dp[ts[i]] + dp[v]
    paths := paths + dp[ts[i]]
print paths
Edit: Bug on the code
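A possible C++ rendering of the pseudo-code, as a sketch only: the function name `countPathsInDag` and the choice of Kahn's algorithm for the topological sort are assumptions of this example, as is returning -1 when a cycle is found. It counts paths with at least one edge, so single vertices are not counted.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// adj[u] lists the vertices v with an edge u -> v.
// Returns the number of paths with at least one edge, or -1 if the graph
// contains a cycle (in which case the number of paths is unbounded).
long long countPathsInDag(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> indegree(n, 0);
    for (const auto& out : adj) {
        for (int v : out) {
            assert(v >= 0 && v < n);
            ++indegree[v];
        }
    }

    // Kahn's algorithm: repeatedly remove vertices without incoming edges.
    std::vector<int> ts;
    std::queue<int> ready;
    for (int u = 0; u < n; ++u)
        if (indegree[u] == 0) ready.push(u);
    while (!ready.empty()) {
        const int u = ready.front();
        ready.pop();
        ts.push_back(u);
        for (int v : adj[u])
            if (--indegree[v] == 0) ready.push(v);
    }
    if (static_cast<int>(ts.size()) != n) return -1;  // some cycle remains

    // dp[u] = number of paths that start at u and use at least one edge.
    std::vector<long long> dp(n, 0);
    long long paths = 0;
    for (int i = n - 1; i >= 0; --i) {
        const int u = ts[i];
        for (int v : adj[u]) dp[u] += 1 + dp[v];      // the edge itself, plus extensions
        paths += dp[u];
    }
    return paths;
}
```

The accumulation into `paths` happens once per vertex, after all of its outgoing edges have been processed.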
I don't believe there's anything faster than traversing the graph, starting at the root.
In pseudo-code -
visit(node) {
node.visited = true;
for(int i = 0; i < node.paths.length; ++i) {
++pathCount;
if (!node.paths[i].visited)
visit(node.paths[i]);
}
}
If it is really a tree, the number of paths equals the number of nodes-1 if you count paths to internal nodes. If you only count paths to leaves, the number of paths equals the number of leaves. So the fact that we're talking about trees simplifies matters to just counting nodes or leaves. A simple BFS or DFS algorithm will suffice.
admat gives the length 1 paths between vertices;
admat^2 gives the length 2 paths between vertices;
admat^3 gives the length 3 paths between vertices;
Spot the pattern yet ?
If the graph contains a cycle, there will be infinitely many paths: just walk the loop any number of times.