Finding a path in a char array - C++

I am working on a project and I've found myself in a position where I have an n x n char array of the characters 'a', 'b', or 'c'. I have to check whether there is a path of 'b's between the first and the last row.
Example YES input:
I am stuck at this point. Should I adapt some well-known graph-search algorithm, or is there a better way of solving this problem? Should I add a bool array to mark which cells I have visited?
Thanks in advance for your time!

Yes, you should adapt a graph algorithm for finding a path from a source to a target. In your case you have multiple sources (all the 'b's in the first row) and multiple targets (the 'b's in the last row).
Shortest paths on an unweighted graph can be found efficiently with BFS, which is easy to implement. The only difference needed to handle multiple sources is to initialize the queue with all the 'b's on the first row (rather than with a single node).
In your graph, every 'b' cell is a node, and there is an edge between every two adjacent 'b' cells.
Note that BFS is complete (it always finds a solution if one exists) and optimal (it finds a shortest path).
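For concreteness, here is a minimal C++ sketch of that multi-source BFS. The grid representation and all names are my assumptions; the algorithm itself is the one described above, and the visited array also answers the question about marking cells.

```cpp
#include <queue>
#include <utility>
#include <vector>

bool hasPathOfBs(const std::vector<std::vector<char>>& grid) {
    const int n = static_cast<int>(grid.size());
    if (n == 0) return false;
    std::vector<std::vector<bool>> visited(n, std::vector<bool>(n, false));
    std::queue<std::pair<int, int>> q;

    // Multiple sources: every 'b' on the first row enters the queue.
    for (int col = 0; col < n; ++col) {
        if (grid[0][col] == 'b') {
            visited[0][col] = true;
            q.push({0, col});
        }
    }

    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [r, c] = q.front();
        q.pop();
        if (r == n - 1) return true;  // reached a target: a 'b' on the last row
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr >= 0 && nr < n && nc >= 0 && nc < n &&
                !visited[nr][nc] && grid[nr][nc] == 'b') {
                visited[nr][nc] = true;
                q.push({nr, nc});
            }
        }
    }
    return false;
}
```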

The easiest way to do this is to allocate an equally sized, zero-filled 2D array, mark the start points, and do a flood fill using the char array as a guide. When the flood fill terminates, you can easily check whether an end point has been marked.
A flood fill may be implemented in several ways; how you do it doesn't really matter as long as your problem size is small.
Generally, the easiest way is to do it recursively. The only problem with a recursive flood fill is the huge recursion depth that can result, so whether a recursive version is applicable really depends on the problem size.
If time is not important, you may simply do it iteratively, going through the entire array several times, marking points that have marked neighbours and are 'b's, until an iteration does not mark any new point.
If you need to handle huge arrays efficiently, you should go for a breadth-first flood fill, keeping a queue of frontier pixels which you process in a first-in-first-out manner.
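For the small-problem case, a minimal recursive sketch might look like the following; the int mark array plays the role of the zero-filled array mentioned above, and all names are my choices.

```cpp
#include <vector>

void floodFill(const std::vector<std::vector<char>>& grid,
               std::vector<std::vector<int>>& mark, int r, int c) {
    const int n = static_cast<int>(grid.size());
    if (r < 0 || r >= n || c < 0 || c >= n) return;    // outside the grid
    if (mark[r][c] != 0 || grid[r][c] != 'b') return;  // already marked, or not a 'b'
    mark[r][c] = 1;
    floodFill(grid, mark, r + 1, c);
    floodFill(grid, mark, r - 1, c);
    floodFill(grid, mark, r, c + 1);
    floodFill(grid, mark, r, c - 1);
}
// Seed with floodFill(grid, mark, 0, col) for every 'b' in row 0,
// then check whether any mark[n-1][col] is set.
```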

Related

Obtain lowest sum from matrix, picking one value per row and column

Essentially I have a matrix of floats ranging from 0 to 1, and I need to find the combination of values with the lowest sum. The kicker is that once a value is selected, no other values from that row or column may be used. All the columns must be used.
If the matrix's width is greater than its height, it will be padded with 1s to make the matrix square. If the height is greater than the width, simply not all of the rows will be used, but all of the columns must ALWAYS be used.
I have looked into binary trees and Dijkstra's algorithm for this task, but both seem to get far too complex with larger matrices. Ideally I'm looking for an algorithm or implementation which will provide a very good guess in a relatively short amount of time. Anything optimized for C++ would be great!
I think a greedy approach should work here for the good guess/optimized part.
Put all the elements in an array as tuples <value, row, column>.
Sort the array by the value component of the tuples.
Greedily pick elements from the beginning, keeping track of the used columns/rows using either a bitset or a boolean matrix, as @Thomas Mathews suggested. (A sketch follows below.)
The total complexity will be O(NM log(NM)), where N is the number of rows and M the number of columns.
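For concreteness, a sketch of that greedy heuristic in C++, assuming the matrix is a vector<vector<double>>; all names are mine. Note this is only the "good guess" part the question asks for; as the next answer notes, an exact optimum needs the Hungarian algorithm.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

double greedyLowestSum(const std::vector<std::vector<double>>& m) {
    const int rows = static_cast<int>(m.size());
    const int cols = static_cast<int>(m[0].size());
    std::vector<std::tuple<double, int, int>> cells;  // <value, row, column>
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            cells.emplace_back(m[r][c], r, c);
    std::sort(cells.begin(), cells.end());  // tuples compare by value first

    std::vector<bool> rowUsed(rows, false), colUsed(cols, false);
    double sum = 0.0;
    for (const auto& [value, r, c] : cells) {
        if (!rowUsed[r] && !colUsed[c]) {  // greedily take the cheapest free cell
            rowUsed[r] = colUsed[c] = true;
            sum += value;
        }
    }
    return sum;
}
```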
Amit's suggestion to change the title actually led me to finding the solution. It is an "Assignment Problem" and the solution is to use the Hungarian algorithm. I knew there had to be something out there already, I just wasn't sure how to phrase it to find the answer until now. Thanks for all the help.
You can follow the Dijkstra algorithm for the shortest path, assuming you are constructing a tree. In the root node you select a length of 0, and for each node you select the next accessible element that gives you the shortest path from the root node, and store that length (from the root) in the node. At each iteration you add, for all the leaves, the arc that makes the total length smallest, and you continue until you get an N-node path (or a bitmask of 0, see below). The first branch of N nodes from the root will be the shortest path. At each node, you can store a bitmap of the already visited nodes (or you can determine it by looking at the parents), as the only possible nodes from it are the unvisited ones. Or you can keep a bitmap of the non-visited ones; this will make the search easier, as you can stop as soon as no bits are set in the mask.
You have not shown any code or any intent to solve the problem, so I'll do the same thing (it seems to be some kind of homework, and you seem not to have worked on it at all so far). This is an academic problem, already covered in many programming courses in relation to the Simplex method and operations research, in object/resource assignment, so there must be plenty of literature about it.

Shortest path in a directed weighted graph that uses at most k vertices

I am trying to solve a SSSP problem in a connected directed weighted cyclic graph with non-negative weights. The catch here is, this problem asks for the SSSP that uses at most k vertices.
I tried using a modified Dijkstra's algorithm to solve this problem, keeping a 3-tuple in my priority queue, i.e. (vertex weight, number of vertices in the path to this vertex (inclusive), vertex index). My algorithm prevents nodes that are more than k vertices away from being pushed into the priority queue and thus from being considered in the shortest path.
Somehow, my algorithm is getting the wrong answer. One reason is that if an initially smaller-weighted edge leads to an invalid path and an initially larger-weighted edge leads to a valid path, then my algorithm (being greedy) will report that it cannot find a valid path to the destination.
Edit: Solution code redacted as it is not helpful.
I've found it hard to read your code - so maybe you're already doing this: give each vertex a collection of best paths (edit: actually each vertex stores only the previous step of each of the possible paths), storing the least expensive one for each number of visited vertices. Once a path goes over the maximum vertex count you can discard it, but you can't discard a more expensive (in terms of total edge length) path until you know that the cheaper paths will eventually reach the target in an acceptable number of vertices.
At the end you may have more than one complete path, and you just choose the least edge-wise expensive one, regardless of its number of vertices (you'd already have discarded it if there were too many).
(Your code would be easier to read if you create a class/struct for some of the things you're storing as pairs of pairs etc, then you can give the members clearer names than second.first etc. Even if you are OK with the current naming, the extra clarity may help you get some other answers if the above hasn't helped.)
Edit, to answer: "How do I keep the more expensive path until I know that the cheaper path will lead to a solution?"
Your priority queue is nearly doing this already - it's not that each vertex (n) has a complete path stored, as I originally implied; currently you just store the best previous vertex (n-1) to use to get to vertex n, along with its cost and its vertex count. I'm saying that instead of storing that one best choice for vertex n-1, you store several options, e.g. the best route to A using 3 previous vertices is from vertex B and costs 12, and the best route using 4 is from vertex C and costs 10.
(In all of the above, "best" means best found so far in the search.)
You only need to store the cheapest route for a given number of vertices. You keep a route if (but only if) it's better on either the cost or the vertex count.
In my example above you need to keep both, because the cheaper route to this vertex uses more previous vertices and so might result in too many vertices before reaching the target - it's not clear at that stage which path will be best in the end.
So you need to change your collection type, and your rule for discarding options.
You could, for example, use a std::map where the previous-vertex count is the key and the total edge cost and previous vertex ID are stored in the value, or an array of total costs where the index is the count.
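To make that concrete, here is a sketch that runs Dijkstra over (vertex, vertex-count) states, which is one way to keep the cheaper-but-longer and pricier-but-shorter options alive at the same time. The adjacency-list type and all names are my assumptions; 'adj[v]' holds (neighbor, weight) pairs and k is the maximum number of vertices a path may use.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <tuple>
#include <vector>

long long shortestWithAtMostK(
        const std::vector<std::vector<std::pair<int, long long>>>& adj,
        int src, int dst, int k) {
    const long long INF = std::numeric_limits<long long>::max();
    const int n = static_cast<int>(adj.size());
    // best[v][h] = cheapest cost found so far to reach v using h vertices
    std::vector<std::vector<long long>> best(n, std::vector<long long>(k + 1, INF));
    using State = std::tuple<long long, int, int>;  // (cost, vertex, vertices used)
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    best[src][1] = 0;  // the path to the source uses one vertex: itself
    pq.push({0, src, 1});
    while (!pq.empty()) {
        auto [cost, v, used] = pq.top();
        pq.pop();
        if (cost > best[v][used]) continue;  // stale entry
        if (used == k) continue;             // cannot extend without exceeding k
        for (auto [w, len] : adj[v]) {
            if (cost + len < best[w][used + 1]) {
                best[w][used + 1] = cost + len;
                pq.push({cost + len, w, used + 1});
            }
        }
    }
    long long ans = INF;
    for (int h = 1; h <= k; ++h) ans = std::min(ans, best[dst][h]);
    return ans;  // INF means no valid path exists
}
```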
I think you want to store two counters with each node: one for the node count and one for the weighted distance. You use the node count as an early terminator to discard those paths from the set of potential options. You use the weighted distance to choose the next node to iterate on, and discard based on the node count. In this way, if you fully terminate all the nodes on the periphery as discardable, then you know there's no eligible path to the destination with at most the required number of hops. If you reach the destination within your list of periphery nodes, then you automatically know it uses no more than the restricted number of nodes, and by induction you know it's already the shortest way of getting there, because every other path that could be found from then on must already be longer.

Suffix tree vs Suffix array for LCS

I'm working on a program to find the longest common substring between multiple strings. I've narrowed my approach down to using either suffix arrays or a suffix tree. I want to see which is the better approach (if there is one) and why. Also, for suffix arrays I've seen a few algorithms for two strings but not any for more than two strings. Any solid examples would be appreciated; thanks again for the advice!
Note: I didn't see any other questions that specifically addressed this issue, but if they exist please point me in that direction!
If you have a substring that occurs in all sequences, then in a suffix array, the pointers to each occurrence of that substring must sort close together. So you could attempt to find them by moving a window along the suffix array, where the window is just large enough to contain at least one occurrence of each sequence. You could do this in linear time by maintaining a table that tells you, for each sequence, how many times that sequence occurs within the window. Then, when you move the rear end of the window forwards, decrement the count for the sequence associated with the pointer you have just skipped over and, if necessary, move the forward end of the window just far enough to pick up a new occurrence of this sequence, updating the table as you go.
Now you need to be able to find the length of the common prefix shared by all suffixes starting at the pointers in the window. This is the minimum LCP value occurring between the pointers in the window. If you use a red-black tree, such as a Java TreeSet, with a key which consists of the LCP value as the most significant component and some tie-breaker such as the pointer itself as a less significant component, then you can maintain the minimum LCP value within the window at a cost of roughly O(log window size) per window adjustment.
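In C++ the same trick can be done with a std::multiset in place of the TreeSet. A minimal sketch, assuming the usual LCP array where lcp[i] is the longest common prefix of suffix-array entries i and i+1; all names are my choices.

```cpp
#include <set>
#include <vector>

// Maintains the minimum LCP value over a sliding window [lo, hi] of
// suffix-array positions; each update costs O(log window size).
struct WindowMinLcp {
    const std::vector<int>& lcp;  // lcp[i] = LCP of suffix-array entries i and i+1
    std::multiset<int> inWindow;
    explicit WindowMinLcp(const std::vector<int>& l) : lcp(l) {}

    void grow(int hi)   { inWindow.insert(lcp[hi - 1]); }            // window gained position hi
    void shrink(int lo) { inWindow.erase(inWindow.find(lcp[lo])); }  // window lost position lo
    int  commonPrefix() const { return inWindow.empty() ? 0 : *inWindow.begin(); }
};
```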

Extract the upper and lower boundaries from a list (vector) of 2d coordinates

My program contains polygons stored as vectors of points (2-dimensional double coordinates, stored in a self-made structure). I'm looking for a quick way of finding the smallest square containing my polygon (i.e., knowing the maximal and minimal coordinates of all the points).
Is there a quicker way than just parsing all the points and storing the minimum and maximum values?
The algorithm you are describing is straightforward: iterate over all your points and find the minimum and maximum for each coordinate. This is an O(n) algorithm, n being the number of points you have.
You can't do better, since you will need to check at least all your points once, otherwise the last one could be outside the square you found.
Now, the complexity is at best O(n), so you just have to minimize the constant factors, and in that case they're already pretty small: only one loop over your vector, looking for two maximums and two minimums.
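A minimal sketch of that single pass, assuming a hypothetical Point struct with double members x and y (and a non-empty input):

```cpp
#include <vector>

struct Point { double x, y; };
struct Box { double minX, maxX, minY, maxY; };

// One loop over the points, tracking two minimums and two maximums.
Box boundingBox(const std::vector<Point>& pts) {
    Box b{pts[0].x, pts[0].x, pts[0].y, pts[0].y};  // assumes pts is non-empty
    for (const Point& p : pts) {
        if (p.x < b.minX) b.minX = p.x;
        if (p.x > b.maxX) b.maxX = p.x;
        if (p.y < b.minY) b.minY = p.y;
        if (p.y > b.maxY) b.maxY = p.y;
    }
    return b;
}
```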
You can either iterate through all points and find the max and min values, or do some preprocessing, for example, store your points in a treap (http://en.wikipedia.org/wiki/Treap).
There is no way, without some preprocessing, to do better than just iterating over all points.
I'm not sure there can be any faster way to find the min & max values in an array of values than linear time. The only 'optimization' I can think of is to find these values on one of the other occasions you're iterating over the array (filling it/performing a function on all points), and then to perform checks on any data update.

Prim's algorithm for dynamic locations

Suppose you have an input file:
<total vertices>
<x-coordinate 1st location> <y-coordinate 1st location>
<x-coordinate 2nd location> <y-coordinate 2nd location>
<x-coordinate 3rd location> <y-coordinate 3rd location>
...
How can Prim's algorithm be used to find the MST for these locations? I understand this problem is typically solved using an adjacency matrix. Any references would be great if applicable.
If you already know Prim's algorithm, it is easy: create an adjacency matrix with adj[i][j] = the distance between location i and location j.
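A sketch of that construction, assuming the weights are Euclidean distances between the coordinate pairs read from the file; all names are mine.

```cpp
#include <cmath>
#include <vector>

std::vector<std::vector<double>> buildAdjacency(const std::vector<double>& xs,
                                                const std::vector<double>& ys) {
    const int n = static_cast<int>(xs.size());
    std::vector<std::vector<double>> adj(n, std::vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            adj[i][j] = std::hypot(xs[i] - xs[j], ys[i] - ys[j]);  // planar distance
    return adj;
}
```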
I'm just going to describe some implementations of Prim's and hopefully that gets you somewhere.
First off, your question doesn't specify how edges are input to the program. You have a total number of vertices and the locations of those vertices. How do you know which ones are connected?
Assuming you have the edges (and the weights of those edges; like @doomster said above, it may be the planar distance between the points, since they are coordinates), we can start thinking about our implementation. Wikipedia describes three different data structures that result in three different run times: http://en.wikipedia.org/wiki/Prim's_algorithm#Time_complexity
The simplest is the adjacency matrix. As you might guess from the name, the matrix describes nodes that are "adjacent". To be precise, there are |v| rows and columns (where |v| is the number of vertices). The value at adjacencyMatrix[i][j] varies depending on the usage; in our case it's the weight of the edge (i.e. the distance) between nodes i and j. (This means that you need to index the vertices in some way. For instance, you might add the vertices to a list and use their position in the list.)
Now using this adjacency matrix our algorithm is as follows:
1. Create a dictionary which contains all of the vertices and is keyed by "distance". Initially the distance of all of the nodes is infinity.
2. Create another dictionary to keep track of "parents". We use this to generate the MST. It's more natural to keep track of edges, but it's actually easier to implement by keeping track of "parents". Note that if you root a tree (i.e. designate some node as the root), then every node (other than the root) has precisely one parent. So by producing this dictionary of parents we'll have our MST!
3. Create a new list with a randomly chosen node v from the original list.
4. Remove v from the distance dictionary and add it to the parent dictionary with a null as its parent (i.e. it's the "root").
5. Go through the row in the adjacency matrix for that node. For any node w that is connected (for non-connected nodes you have to set their adjacency matrix value to some special value: 0, -1, int max, etc.), update its "distance" in the dictionary to adjacencyMatrix[v][w]. The idea is that it's not "infinitely far away" anymore... we know we can get there from v.
6. While the dictionary is not empty (i.e. while there are nodes we still need to connect to):
   - Look over the dictionary and find the vertex with the smallest distance, x.
   - Add it to our new list of vertices.
   - For each of its neighbors, update their distance to min(adjacencyMatrix[x][neighbor], distance[neighbor]) and also update their parent to x. Basically, if there is a faster way to get to neighbor then the distance dictionary should be updated to reflect that; and if we then add neighbor to the new list we know which edge we actually added (because the parent dictionary says that its parent was x).
7. We're done. Output the MST however you want (everything you need is contained in the parents dictionary); a code sketch follows below.
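Here is a compact C++ sketch of the steps above. The "distance dictionary" becomes a plain array that is scanned for the minimum, which keeps the O(|V|^2) bound discussed below; node 0 stands in for the randomly chosen root, and all names are mine.

```cpp
#include <limits>
#include <vector>

std::vector<int> primMst(const std::vector<std::vector<double>>& adj) {
    const int n = static_cast<int>(adj.size());
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> dist(n, INF);  // step 1: every node starts "infinitely far away"
    std::vector<int> parent(n, -1);    // step 2: parent[v] == -1 marks the root
    std::vector<bool> inTree(n, false);
    dist[0] = 0.0;                     // steps 3-4: node 0 is the root
    for (int iter = 0; iter < n; ++iter) {
        // step 6: find the untaken vertex with the smallest distance...
        int x = -1;
        for (int w = 0; w < n; ++w)
            if (!inTree[w] && (x == -1 || dist[w] < dist[x])) x = w;
        inTree[x] = true;              // ...and add it to the tree
        // steps 5/6: relax x's row of the adjacency matrix
        for (int w = 0; w < n; ++w) {
            if (!inTree[w] && adj[x][w] < dist[w]) {
                dist[w] = adj[x][w];   // a faster way to reach w was found,
                parent[w] = x;         // so remember which edge we'd use
            }
        }
    }
    return parent;  // step 7: the MST is one edge (parent[w], w) per non-root node
}
```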
I admit there is a bit of a leap from the wikipedia page to the actual implementation as outlined above. I think the best way to approach this gap is to just brute force the code. By that I mean, if the pseudocode says "find the min [blah] such that [foo] is true" then write whatever code you need to perform that, and stick it in a separate method. It'll definitely be inefficient, but it'll be a valid implementation. The issue with graph algorithms is that there are 30 ways to implement them and they are all very different in performance; the wikipedia page can only describe the algorithm conceptually. The good thing is that once you implement it some way, you can find optimizations quickly ("oh, if I keep track of this state in this separate data structure, I can make this lookup way faster!").

By the way, the runtime of this is O(|V|^2). I'm too lazy to detail that analysis, but loosely it's because:
1. All initialization is O(|V|) at worst.
2. We do the loop O(|V|) times, and each iteration takes O(|V|) time to look over the dictionary and find the minimum node. So the total time spent finding the minimum node is O(|V|^2).
3. The time it takes to update the distance dictionary is O(|E|), because we only process each edge once. Since |E| is O(|V|^2), this is also O(|V|^2).
4. Keeping track of the parents is O(|V|).
5. Outputting the tree is O(|V| + |E|) = O(|E|) at worst.
Adding all of these up (none of them should be multiplied except within (2)), we get O(|V|^2).
The implementation with a heap is O(|E| log |V|), and it's very, very similar to the above. The only difference is that updating the distance is O(log |V|) instead of O(1) (because it's a heap), BUT finding/removing the min element is O(log |V|) instead of O(|V|) (because it's a heap). The complexity analysis is quite similar, and you end up with something like O(|V| log |V| + |E| log |V|) = O(|E| log |V|), as desired.
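A sketch of that heap-based variant using std::priority_queue. The STL heap has no decrease-key, so this uses the common workaround of pushing a fresh entry on every improvement and skipping stale entries when they are popped; 'adjList[v]' holds (neighbor, weight) pairs, and all names are assumptions.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

std::vector<int> primMstHeap(
        const std::vector<std::vector<std::pair<int, double>>>& adjList) {
    const int n = static_cast<int>(adjList.size());
    std::vector<double> dist(n, 1e18);
    std::vector<int> parent(n, -1);
    std::vector<bool> inTree(n, false);
    using Entry = std::pair<double, int>;  // (distance, vertex), min-heap by distance
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    dist[0] = 0.0;
    pq.push({0.0, 0});
    while (!pq.empty()) {
        auto [d, v] = pq.top();
        pq.pop();
        if (inTree[v]) continue;  // stale entry: v was finished earlier
        inTree[v] = true;
        for (auto [w, len] : adjList[v]) {
            if (!inTree[w] && len < dist[w]) {  // the O(log|V|) "update": push a new entry
                dist[w] = len;
                parent[w] = v;
                pq.push({len, w});
            }
        }
    }
    return parent;
}
```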
Actually... I'm a bit confused about why the adjacency matrix implementation cares that it is an adjacency matrix. It could just as well be implemented using an adjacency list. I think the key part is how you store the distances. I could be way off in my implementation outlined above, but I am pretty sure it implements Prim's algorithm and satisfies the time complexity constraints outlined by wikipedia.