Shouldn't this be using a backtracking algorithm? - c++

I am solving some questions on LeetCode. One of the questions is:
Given a m x n grid filled with non-negative numbers, find a path from top left to bottom right which minimizes the sum of all numbers along its path.You can only move either down or right at any point in time.
The editorial as well as the solutions posted all use dynamic programming. One of the most upvoted solution is as follows:
class Solution {
public:
    int minPathSum(vector<vector<int>>& grid) {
        int m = grid.size();
        int n = grid[0].size();
        vector<vector<int>> sum(m, vector<int>(n, grid[0][0]));
        for (int i = 1; i < m; i++)
            sum[i][0] = sum[i - 1][0] + grid[i][0];
        for (int j = 1; j < n; j++)
            sum[0][j] = sum[0][j - 1] + grid[0][j];
        for (int i = 1; i < m; i++)
            for (int j = 1; j < n; j++)
                sum[i][j] = min(sum[i - 1][j], sum[i][j - 1]) + grid[i][j];
        return sum[m - 1][n - 1];
    }
};
My question is simple: shouldn't this be solved using backtracking? Suppose the input matrix is something like:
[
[1,2,500]
[100,500,500]
[1,3,4]
]
My doubt is because in DP, the solutions to subproblems are part of the global solution (optimal substructure). However, as can be seen above, when we make the local choice of 2 over 100, we might be wrong, since the paths that follow might be too expensive (all the numbers surrounding the 2 are 500s). So how is using dynamic programming justified in this case?
To summarize:
Shouldn't we use backtracking, since we might have to retract our path if we have made an incorrect choice previously (having looked only at the local optimum)?
How is this a dynamic programming question?
P.S.: The above solution definitely runs.

The example you illustrated above shows that a greedy solution to the problem will not necessarily produce an optimal solution, and you're absolutely right about that.
However, the DP solution to this problem doesn't quite use this strategy. The idea behind the DP solution is to compute, for each location, the cost of the shortest path ending at that location. In the course of solving the overall problem, the DP algorithm will end up computing the length of some shortest paths that pass through the 2 in your grid, but it won't necessarily use those intermediate shortest paths when determining the overall shortest path to return. Try tracing through the above code on your example - do you see how it computes and then doesn't end up using those other path options?

Shouldn't we use backtracking, since we might have to retract our path if we have made an incorrect choice previously (having looked only at the local optimum)?
In a real-world scenario, there will be quite a few factors that will determine which algorithm will be better suited to solve this problem.
This DP solution is alright in the sense that it will give you the best performance/memory usage when handling worst-case scenarios.
Any backtracking/Dijkstra/A* algorithm will need to maintain a full matrix as well as a list of open nodes. This DP solution just assumes every node will end up being visited, so it can ditch the open-node list and maintain only the costs buffer.
By assuming every node will be visited, it also gets rid of the "which node do I open next" part of the algorithm.
So if optimal worst-case performance is what we are looking for, then this algorithm is actually going to be very hard to beat. But whether that's what we want or not is a different matter.
How is this a dynamic programming question?
This is only a dynamic programming question in the sense that there exists a dynamic programming solution for it. But by no means is DP the only way to tackle it.
Edit: Before I get dunked on, yes there are more memory-efficient solutions, but at very high CPU costs in the worst-case scenarios.

For your input
[
[ 1, 2, 500]
[100, 500, 500]
[ 1, 3, 4]
]
the sum array works out to
[
[ 1, 3, 503]
[101, 503, 1003]
[102, 105, 109]
]
And we can even retrace the shortest path:
109, 105, 102, 101, 1
The algorithm doesn't check every path; it relies on the property that it can take the previous optimal sub-path to compute the current cost:
sum[i][j] = min(sum[i - 1][j],  // take the better of the path arriving from above
                sum[i][j - 1]) // or the path arriving from the left
            + grid[i][j];      // plus the current cell's cost
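A minimal sketch of that retracing step, assuming the sum table has already been filled in exactly as above: starting from the bottom-right cell, repeatedly step to whichever neighbour (up or left) holds the smaller accumulated cost.

// Sketch: reconstruct the path from a filled-in `sum` table.
// Assumes sum[i][j] already holds the minimal cost of reaching cell (i, j).
#include <utility>
#include <vector>
using namespace std;

vector<pair<int,int>> retracePath(const vector<vector<int>>& sum) {
    int i = sum.size() - 1, j = sum[0].size() - 1;
    vector<pair<int,int>> path{{i, j}};
    while (i > 0 || j > 0) {
        // Move toward whichever predecessor has the smaller accumulated cost.
        if (i == 0)                             --j;
        else if (j == 0)                        --i;
        else if (sum[i - 1][j] < sum[i][j - 1]) --i;
        else                                    --j;
        path.push_back({i, j});
    }
    return path;  // bottom-right ... top-left; reverse if needed
}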

Backtracking, in itself, doesn't fit this problem particularly well.
Backtracking works well for problems like eight queens, where a proposed solution either works, or it doesn't. We try a possible route to a solution, and if it fails, we backtrack and try another possible route, until we find one that works.
In this case, however, every possible route gets us from the beginning to the end. We can't just try different possibilities until we find one that works. Instead, we basically have to try every route from beginning to end, until we find the one that works best (the lowest weight, in this case).
Now, it's certainly true that with backtracking and pruning, we could (perhaps) improve our approach to this solution, at least to some degree. In particular, let's assume you did a search that started by looking downward (if possible) and then to the side. In this case, with the input you gave, its first attempt would end up being the optimal route.
The question is whether it can recognize that, and prune some branches of the tree without traversing them entirely. The answer is that yes, it can. To do that, it keeps track of the best route it's found so far, and based upon that, it can reject entire sub-trees. In this case its first route gives a total weight of 109. Then it tries to the right of the first node, which is a 2, for a total weight of 3 so far. That's smaller than 109, so it proceeds. From there, it looks downward and gets to the 500. That gives a weight of 503, so without doing any further looking, it knows no route from there can be suitable, so it stops and prunes off all the branches that start from that 500. Then it tries rightward from the 2 and finds another 500. This lets it prune that entire branch as well. So, in these cases, it never looks at the third 500, or the 3 and 4 at all--just by looking at the 500 nodes, we can determine that those can't possibly yield an optimal solution.
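To make that concrete, here is a minimal sketch of such a pruned depth-first search (a branch-and-bound illustration of the idea above, not the DP code and not anything from the original post): a partial path is abandoned as soon as its cost reaches the best complete cost found so far, which is valid because all cell values are non-negative.

// Sketch of backtracking with pruning (branch and bound), for illustration only.
#include <climits>
#include <vector>
using namespace std;

void dfs(const vector<vector<int>>& grid, int i, int j, int costSoFar, int& best) {
    costSoFar += grid[i][j];
    if (costSoFar >= best) return;              // prune: cannot beat the best complete path found so far
    int m = grid.size(), n = grid[0].size();
    if (i == m - 1 && j == n - 1) { best = costSoFar; return; }
    if (i + 1 < m) dfs(grid, i + 1, j, costSoFar, best);  // try downward first
    if (j + 1 < n) dfs(grid, i, j + 1, costSoFar, best);  // then rightward
}

int minPathSumBacktracking(const vector<vector<int>>& grid) {
    int best = INT_MAX;
    dfs(grid, 0, 0, 0, best);
    return best;
}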
Whether that's really an improvement on the DP strategy largely comes down to a question of what operations cost how much. For the task at hand, it probably doesn't make much difference either way. If, however, your input matrix was a lot larger, it might. For example, we might have a large input stored in tiles. With a DP solution, we evaluate all the possibilities, so we always load all the tiles. With a tree-trimming approach, we might be able to completely avoid loading some tiles at all, because the routes including those tiles have already been eliminated.

Is this connected-component labeling algorithm new?

A long time ago, I made a game in which a sort of connected-component labeling was required to implement the AI part. I used the two-pass algorithm unknowingly at that time.
Recently, I learned that I can make it faster using a bit-scan based method instead. It uses 1-bit-per-pixel data as input, instead of the typical bytes-per-pixel input. It then finds every linear chunk in each scan-line using the BSF instruction. Please see the code below. Cut is a struct which stores the information of one linear chunk of 1-bits in a scan-line.
Cut* get_cuts_in_row(const u32* bits, const u32* bit_final, Cut* cuts) {
    u32 working_bits = *bits;
    u32 basepos = 0, bitpos = 0;
    for (;; cuts++) {
        //find starting position
        while (!_BitScanForward(&bitpos, working_bits)) {
            bits++, basepos += 32;
            if (bits == bit_final) {
                cuts->start_pos = (short)0xFFFF;
                cuts->end_pos = (short)0xFFFF;
                return cuts + 1;
            }
            working_bits = *bits;
        }
        cuts->start_pos = short(basepos + bitpos);
        //find ending position
        working_bits = (~working_bits) & (0xFFFFFFFF << bitpos);
        while (!_BitScanForward(&bitpos, working_bits)) {
            bits++, basepos += 32;
            working_bits = ~(*bits);
        }
        working_bits = (~working_bits) & (0xFFFFFFFF << bitpos);
        cuts->end_pos = short(basepos + bitpos);
    }
}
First, it uses the BSF instruction to find the first position where a 1-bit appears. Once that is found, it finds the first position where a 0-bit appears after it, using bit inversion and bit masking, and then repeats the process.
After getting the starting and ending positions of all linear chunks of 1s (I prefer to refer to them as 'cuts') in every scan-line, it assigns labels to them in the usual CCL manner. In the first row, every cut gets a different label.
For each cut in the remaining rows, it first checks whether any upper cuts are connected to it. If no upper cut is connected, it gets a new label. If exactly one upper cut is connected, it copies that cut's label. If several upper cuts are connected, their labels are merged and it gets the merged label. This can be done easily with two advancing pointers over the upper and lower chunks. Here is the full code doing that part.
Label* get_labels_8c(Cut* cuts, Cut* cuts_end, Label* label_next) {
    Cut* cuts_up = cuts;
    //generate labels for the first row
    for (; cuts->start_pos != 0xFFFF; cuts++) cuts->label = [GET NEW LABEL FROM THE POOL];
    cuts++;
    //generate labels for the rests
    for (; cuts != cuts_end; cuts++) {
        Cut* cuts_save = cuts;
        for (;; cuts++) {
            u32 start_pos = cuts->start_pos;
            if (start_pos == 0xFFFF) break;
            //Skip upper slices ends before this slice starts
            for (; cuts_up->end_pos < start_pos; cuts_up++);
            //No upper slice meets this
            u32 end_pos = cuts->end_pos;
            if (cuts_up->start_pos > end_pos) {
                cuts->label = [GET NEW LABEL FROM THE POOL];
                continue;
            };
            Label* label = label_equiv_recursion(cuts_up->label);
            //Next upper slice can not meet this
            if (end_pos <= cuts_up->end_pos) {
                cuts->label = label;
                continue;
            }
            //Find next upper slices meet this
            for (; cuts_up->start_pos <= end_pos; cuts_up++) {
                Label* label_other = label_equiv_recursion(cuts_up->label);
                if (label != label_other) [MERGE TWO LABELS]
                if (end_pos <= cuts_up->end_pos) break;
            }
            cuts->label = label;
        }
        cuts_up = cuts_save;
    }
    return label_next;
}
After this, one can use this per-scan-line information to build the array of labels, or any other output, directly.
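As an aside, the [MERGE TWO LABELS] placeholder and the label_equiv_recursion call suggest a union-find style equivalence structure. Purely as a hypothetical illustration (this is not the author's code, and the Label layout is an assumption), the two placeholders could be filled in along these lines:

// Hypothetical sketch of the label-equivalence bookkeeping the placeholders hint at.
// Assumes a Label node that points at its representative (a simple union-find).
struct Label {
    Label* parent;   // points to itself when this label is a representative
    int    id;
};

// Counterpart of label_equiv_recursion: follow parents up to the representative.
Label* find_representative(Label* label) {
    while (label->parent != label)
        label = label->parent;
    return label;
}

// Counterpart of [MERGE TWO LABELS]: make one representative point at the other.
void merge_labels(Label* a, Label* b) {
    a = find_representative(a);
    b = find_representative(b);
    if (a != b)
        b->parent = a;
}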
I measured the execution time of this method and found that it's much faster than the two-scan method I used previously. Surprisingly, it turned out to be much faster than the two-scan one even when the input data is random. Apparently the bit-scanning algorithm works best on data with relatively simple structure, where the chunks in each scan-line are large; it wasn't designed to be used on random images.
What baffles me is that nobody seems to talk about this method. Frankly, it doesn't seem like an idea that's hard to come up with, so it's hard to believe I'm the first one who tried it.
Perhaps my method is better than the primitive two-scan method but worse than the more developed ones based on the two-scan idea, so it isn't worth mentioning anyway.
However, if the two-scan method can be improved, the bit-scan method can be too. I found a nice improvement for 8-connectivity myself: it analyses two neighboring scan-lines at once by merging them with a bitwise OR. You can find the full code and a detailed explanation of how it works here.
I got to know that there is a benchmark for CCL algorithms named YACCLAB. I'll test my algorithms there against the best CCL algorithms to see how good they really are. Before that, I want to ask several things here.
My questions are:
Are the algorithms I found really new? It's still hard to believe that nobody has ever thought of a CCL algorithm using bit-scanning. If it's already a thing, why can't I find anyone talking about it? Were bit-scan based algorithms proven to be bad and then forgotten?
If I really did find a new algorithm, what should I do next? Of course I'll test it in a more reliable framework like YACCLAB; I'm asking about what comes after that. What should I do to establish these algorithms as mine and spread them?
So far, I'm a bit sceptical.
My reasoning was getting too long for a comment, so here we are. There is a lot to unpack. I like the question quite a bit even though it might be better suited for a computer science site.
The thing is, there are two layers to this question:
Was a new algorithm discovered?
What about the bit scanning part?
You are combining these two so first I will explain why I would like to think about them separately:
An algorithm is a set of steps (see the more formal definition) that is language-agnostic. As such, it should work even without the bit scanning.
The bit scanning, on the other hand, I would consider an optimization technique - we are using a structure that the computer is comfortable with, which can bring us performance gains.
Unless we separate these two, the question gets a bit fuzzy since there are several possible scenarios that can be happening:
1. The algorithm is new and improved, and bit scanning makes it even faster. That would be awesome.
2. The algorithm is just a new way of saying "two pass" or something similar. That would still be good if it beats the benchmarks; in that case it might be worth adding to a CCL library.
3. The algorithm is a good fit for some cases but somehow fails in others (speed-wise, not correctness-wise). The bit scanning here makes the comparison difficult.
4. The algorithm is a good fit for some cases but completely fails in others (produces incorrect results). You just haven't found a counterexample yet.
Let us assume that 4 isn't the case and we want to decide among 1 to 3. In each case, the bit scanning is making things fuzzy, since it most likely speeds things up even more - so in some cases even a slower algorithm could outperform a better one.
So first I would try to remove the bit scanning and re-evaluate the performance. After a quick look, it seems that CCL algorithms have linear complexity in the image size - you need to check every pixel at least once. The rest is a fight to lower the constant as much as possible (number of passes, number of neighbors to check, etc.). I think it is safe to assume that you can't do better than linear - so the first question is: does your algorithm improve on the complexity by a multiplicative constant? Since the algorithm is linear, that factor translates directly into performance, which is nice.
Second question would then be: Does bit scanning further improve the performance of the algorithm?
Also, since I already started thinking about it, what about a chess-board pattern and 4-connectivity? Or alternatively, a chessboard of 3x3 crosses for the 8-connectivity.

Enumerate some extreme points near optimum solution

I am looking for a simple way to obtain many "good" solutions to an LP problem (not a MIP) with CPLEX, not just (one of) the optimal basic solution(s). By "good" solutions I mean solutions whose objective values are not far from the true optimal value. Such a pool of solutions could help the decision-maker...
More precisely, given a polyhedron Ax <= b with x >= 0 and an objective function z = cx that I want to maximize, after solving the LP I obtain the optimal value z*. I then want to enumerate all the extreme points of the polyhedron given by the set of constraints
Ax <= b
cx >= z* - epsilon
x >= 0
where epsilon is a given tolerance.
I know that CPLEX offers a way to generate a solution pool (see here), but it will not work here because that feature is for MIPs: it enumerates all the solutions of an IP (or one solution for every given set of fixed integer variables if the problem is a MIP).
An interesting and efficient approach is to visit the solutions adjacent to the optimal basic solution, i.e. all the adjacent extreme points: assuming the polyhedron is not degenerate, for each pair of a basic variable x_B and a non-basic variable x_N, I compute the basic solution obtained when x_B leaves the basis and x_N enters it. Then I discard the solutions with cx < z* - epsilon, and for the others I repeat the procedure. [I know I could improve this algorithm, but this is the general idea.]
The routine CPXpivot of the Callable Library could help with this pivoting operation, but I did not find an equivalent in the C++ API (Concert Technology). Does anyone know whether such an equivalent exists, or could you propose another way to answer my original problem?
Thanks a lot :) !
RĂ©mi L.
There is one interesting way to make this suitable for use with the Cplex solution pool. Use binary variables to encode the current basis: basis[k] = 0 means variable (or row) k is nonbasic, and basis[k] = 1 means it is basic. Of course we have sum(k, basis[k]) = m (the number of rows). Finally, we have x[k] <= basis[k] * upperbound[k] (i.e. nonbasic implies zero -- assuming nonnegative variables with finite upper bounds). When we add this to the LP model we end up with a MIP and can enumerate (all or some, optimal or near-optimal) bases using the Cplex solution pool. See here and here.
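As a rough sketch of how that encoding might look in Concert Technology (my own illustration, not from the answer above): the function name, ub, n and m are placeholders, the original constraints Ax <= b and cx >= z* - epsilon still have to be added, and the exact solution-pool accessors may vary by CPLEX version.

// Rough sketch of the basis-encoding idea in the Concert Technology C++ API.
// `enumerateBases`, `ub`, `n`, `m` are placeholder names for this illustration.
#include <ilcplex/ilocplex.h>

void enumerateBases(int n, int m, const double* ub) {
    IloEnv env;
    IloModel model(env);

    IloNumVarArray x(env, n, 0.0, IloInfinity);   // the original LP variables
    IloBoolVarArray basis(env, n);                // basis[k] == 1  <=>  variable k is basic

    // Exactly m variables are basic.
    IloExpr nbBasic(env);
    for (int k = 0; k < n; ++k) nbBasic += basis[k];
    model.add(nbBasic == m);

    // Nonbasic variables are forced to zero (nonnegative variables, finite upper bounds).
    for (int k = 0; k < n; ++k)
        model.add(x[k] <= ub[k] * basis[k]);

    // ... add Ax <= b and cx >= z* - epsilon here ...

    IloCplex cplex(model);
    cplex.populate();                             // fill the solution pool with feasible bases
    int nSolutions = cplex.getSolnPoolNsolns();
    // Query each pool member's variable values to recover the corresponding extreme point.
    env.end();
}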

all solutions to change making with dynamic programming

I was reviewing my handouts for our algorithm class and I started to think about this question:
Given different types of coins with different values, find all coin configurations that add up to a certain sum, without duplicates.
During class, we solved the problems of finding the number of all possible ways to reach a sum and the least number of coins needed for a sum. However, we never tried to actually enumerate the solutions.
I was thinking about solving this problem with dynamic programming.
I came up with the recursive version (for simplicity I only print the solutions):
#include <sstream>
#include <string>
#include <vector>
using namespace std;

void solve(vector<string>& result, string& currSoln, int index, int target, vector<int>& coins)
{
    if (target < 0)
    {
        return;
    }
    if (target == 0)
    {
        result.push_back(currSoln);
    }
    for (int i = index; i < (int)coins.size(); ++i)
    {
        stringstream ss;
        ss << coins[i];
        string newCurrSoln = currSoln + ss.str() + " ";
        solve(result, newCurrSoln, i, target - coins[i], coins);
    }
}
However, I got stuck when trying to use DP to solve the problem.
I have 2 major obstacles:
I don't know what data structure I should use to store previous answers
I don't know what my bottom-up procedure (using loops instead of recursion) should look like.
Any help is welcome, and some code would be appreciated!
Thank you for your time.
In a DP solution you generate a set of intermediate states and count how many ways there are to get to each. Your answer is then the count that wound up in the success state.
So, for change counting, the states are that you got to a specific amount of change. The counts are the number of ways of making change. And the success state is that you made the correct amount of change.
To go from counting solutions to enumerating them you need to keep those intermediate states, and also keep a record in each state of all of the states that transitioned to that one - and information about how. (In the case of change counting, the how would be which coin you added.)
Now with that information you can start from the success state and recursively go backwards through the dp data structures to actually find the solutions rather than the count. The good news is that all of your recursive work is efficient - you're always only looking at paths that succeed so waste no time on things that won't work. But if there are a billion solutions, then there is no royal shortcut that makes printing out a billion solutions fast.
If you wish to be a little clever, though, you can turn this into a usable enumeration. You can, for instance, say "I know there are 4323431 solutions, what is the 432134'th one?" And finding that solution will be quick.
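Here is a minimal bottom-up sketch of that idea (my own illustration, with invented names like lastCoin and rebuild): every reachable amount records which coins can be the final step, and the actual combinations are then rebuilt by walking backwards through those records, keeping the coins in non-increasing order so each combination appears exactly once.

// Sketch: bottom-up DP that records, for every reachable amount, which coin
// could have been added last, then rebuilds the actual combinations from that.
#include <vector>
using namespace std;

void rebuild(int amount, const vector<vector<int>>& lastCoin,
             vector<int>& current, vector<vector<int>>& out) {
    if (amount == 0) { out.push_back(current); return; }
    for (int c : lastCoin[amount]) {
        // only extend with coins <= the last coin chosen, to avoid duplicate orderings
        if (!current.empty() && c > current.back()) continue;
        current.push_back(c);
        rebuild(amount - c, lastCoin, current, out);
        current.pop_back();
    }
}

vector<vector<int>> allChangeCombinations(const vector<int>& coins, int target) {
    vector<vector<int>> lastCoin(target + 1);   // lastCoin[a] = coins usable as the final step to reach a
    vector<char> reachable(target + 1, 0);
    reachable[0] = 1;
    for (int a = 1; a <= target; ++a)
        for (int c : coins)
            if (c <= a && reachable[a - c]) {
                reachable[a] = 1;
                lastCoin[a].push_back(c);
            }
    vector<vector<int>> out;
    vector<int> current;
    if (reachable[target]) rebuild(target, lastCoin, current, out);
    return out;
}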
It is immediately obvious that you can take a dynamic programming approach. What isn't obvious is that in most cases (depending on the denominations of the coins) you can use the greedy algorithm, which is likely to be more efficient. See Cormen, Leiserson, Rivest, Stein: Introduction to Algorithms, 2nd ed., Problem 16-1.
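For completeness, a quick sketch of that greedy idea (my own, not taken from CLRS): repeatedly take the largest coin that still fits. This is only guaranteed optimal for canonical coin systems such as {1, 5, 10, 25}; for denominations like {1, 3, 4} and a target of 6 it picks 4+1+1 instead of the better 3+3.

// Sketch of greedy change making: always take the largest coin that still fits.
#include <algorithm>
#include <vector>
using namespace std;

vector<int> greedyChange(vector<int> coins, int target) {
    sort(coins.rbegin(), coins.rend());   // largest denomination first
    vector<int> picked;
    for (int c : coins)
        while (target >= c) {
            picked.push_back(c);
            target -= c;
        }
    // If target is not 0 here, exact change wasn't possible with these denominations.
    return picked;
}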

Finding an optimal solution to a system of linear equations in c++

Here's the problem:
I am currently trying to create a control system which is required to find a solution to a series of complex linear equations without a unique solution.
My problem arises because there will only ever be six equations, while there may be upwards of 20 unknowns (usually many more than six). Of course, such a system will not yield a unique solution through standard Gaussian elimination or by reducing the matrix to reduced row echelon form.
However, I think I may be able to optimize things further and get a more accurate solution, because I know that each of the unknowns cannot have a value smaller than zero or greater than one, but is free to take any value in between.
Of course, I am trying to create code that finds a correct solution, but in the case that multiple combinations yield satisfactory results, I want to minimize the sum over all unknowns of (value of unknown * efficiency constant), i.e. Sigma[x_i * e_i] from i = 0 to n; finding an accurate solution, however, has the higher priority.
Performance is also important, due to the fact that this algorithm may need to be run several times per second.
So, does anyone have any ideas to help me on implementing this?
Edit: You might just want to stick to linear programming with equality and inequality constraints, but here's an interesting exact solution that does not incorporate the constraint that your unknowns are between 0 and 1.
Here's a powerpoint discussing your problem: http://see.stanford.edu/materials/lsoeldsee263/08-min-norm.pdf
I'll translate your problem into math to make things a bit easier to figure out:
you have a 6x20 matrix A and a vector x with 20 elements. You want to minimize (x^T)e subject to Ax = y. According to the slides, if you were just minimizing the sum of x, then the answer is A^T (A A^T)^(-1) y. I'll take another look at this as soon as I get the chance and see what the solution is to minimizing (x^T)e (i.e. your specific problem).
Edit: I looked in the powerpoint some more and near the end there's a slide entitled "General norm minimization with equality constraints". I am going to switch the notation to match the slide's:
Your problem is that you want to minimize ||Ax - b||, where b = 0, A is your e vector, and x is the vector of 20 unknowns. This is subject to Cx = d. Apparently the answer is:
x = (A^T A)^-1 (A^T b - C^T (C (A^T A)^-1 C^T)^-1 (C (A^T A)^-1 A^T b - d))
It's not pretty, but it's not as bad as you might think. There really aren't that many calculations. For example, (A^T A)^-1 only needs to be calculated once, and then you can reuse the answer. And your matrices aren't that big.
Note that I didn't incorporate the constraint that the elements of x are within [0,1].
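If it helps, here is a small sketch of the first closed form mentioned above, x = A^T (A A^T)^(-1) y, using the Eigen library (my choice of library, not something either post specifies). As noted, it does not enforce the 0 <= x <= 1 bounds or the e-weighting.

// Sketch: minimum-norm solution x = A^T (A A^T)^(-1) y for an underdetermined
// system (e.g. 6 equations, 20 unknowns). Does NOT enforce 0 <= x <= 1.
#include <Eigen/Dense>

Eigen::VectorXd minNormSolution(const Eigen::MatrixXd& A, const Eigen::VectorXd& y) {
    // Solve (A A^T) w = y, then x = A^T w; avoids forming an explicit inverse.
    Eigen::VectorXd w = (A * A.transpose()).ldlt().solve(y);
    return A.transpose() * w;
}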
It looks like the solution to what I am doing is linear programming. It is starting to come back to me, but if I have other problems I will post them in their own dedicated questions instead of turning this one into an encyclopedia.

parallel calculation of infinite series

I just have a quick question, on how to speed up calculations of infinite series.
This is just one of the examples:
arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + ....
Let's say you have some library which allows you to work with big numbers. The first obvious solution would be to start adding/subtracting each element of the sequence until you reach some target N.
You can also pre-save x^n, so for each next element, instead of calculating x^(n+2), you can do lastX * (x^2).
But overall it seems to be a very sequential task, so what can you do to utilize multiple processors (8+)?
Thanks a lot!
EDIT:
I will need to calculate something from 100k to 1m iterations. This is a C++ based application, but I am looking for an abstract solution, so it shouldn't matter.
Thanks for the reply.
You need to break the problem down to match the number of processors or threads you have. In your case you could have for example one processor working on the even terms and another working on the odd terms. Instead of precalculating x^2 and using lastX*(x^2), you use lastX*(x^4) to skip every other term. To use 8 processors, multiply the previous term by x^16 to skip 8 terms.
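A small sketch of that stride idea using standard threads (my own illustration with plain doubles just to show the structure; a big-number type would slot in the same way): thread t handles terms i = t, t+T, t+2T, ... and gets each new power of x from its previous one by multiplying with x^(2T).

// Sketch: arctan series split across T threads by stride.
// Term i is (-1)^i * x^(2i+1) / (2i+1); thread t sums i = t, t+T, t+2T, ...
#include <cmath>
#include <thread>
#include <vector>

double arctanParallel(double x, int iterations, int T) {
    std::vector<double> partial(T, 0.0);
    std::vector<std::thread> threads;
    const double step = std::pow(x, 2 * T);           // x^(2T): jump over T terms at once
    for (int t = 0; t < T; ++t) {
        threads.emplace_back([&, t] {
            double power = std::pow(x, 2 * t + 1);    // x^(2i+1) for this thread's first term
            double sign  = (t % 2 == 0) ? 1.0 : -1.0; // (-1)^t
            double sum   = 0.0;
            for (int i = t; i < iterations; i += T) {
                sum   += sign * power / (2 * i + 1);
                power *= step;
                sign  *= (T % 2 == 0) ? 1.0 : -1.0;   // sign flips only if the stride is odd
            }
            partial[t] = sum;                         // each thread writes its own slot
        });
    }
    for (auto& th : threads) th.join();
    double result = 0.0;
    for (double p : partial) result += p;             // combine the partial sums
    return result;
}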
P.S. Most of the time when presented with a problem like this, it's worthwhile to look for a more efficient way of calculating the result. Better algorithms beat more horsepower most of the time.
If you're trying to calculate the value of pi to millions of places or something, you first want to pay close attention to choosing a series that converges quickly and is amenable to parallelization. Then, if you have enough digits, it will eventually become cost-effective to split them across multiple processors; you will have to find or write a bignum library that can do this.
Note that you can factor out the variables in various ways; e.g.:
atan(x)= x - x^3/3 + x^5/5 - x^7/7 + x^9/9 ...
= x*(1 - x^2*(1/3 - x^2*(1/5 - x^2*(1/7 - x^2*(1/9 ...
Although the second line is more efficient than a naive implementation of the first line, it still has a linear chain of dependencies from beginning to end. You can improve the parallelism by combining terms in pairs:
= x*(1-x^2/3) + x^3*(1/5-x^2/7) + x^5*(1/9 ...
= x*( (1-x^2/3) + x^2*((1/5-x^2/7) + x^2*(1/9 ...
= [yet more recursive computation...]
However, this speedup is not as simple as you might think, since the time taken by each computation depends on the precision needed to hold it. In designing your algorithm, you need to take this into account; your algebra is also intimately involved, i.e., for the above case you'll get infinitely repeating fractions if you do regular divisions by your constant numbers, so you need to figure out some way to deal with that, one way or another.
Well, for this example, you might sum the series (if I've got the brackets in the right places):
(-1)^i * (x^(2i + 1))/(2i + 1)
Then on processor 1 of 8 compute the sum of the terms for i = 1, 9, 17, 25, ...
Then on processor 2 of 8 compute the sum of the terms for i = 2, 10, 18, 26, ...
and so on, finally adding up the partial sums.
Or, you could do as you (nearly) suggest: give i = 1..16 (say) to processor 1, i = 17..32 to processor 2, and so on, and each can compute successive powers of x from the previous one. If you want more than 8x16 elements in the series, assign more to each processor in the first place.
I doubt whether, for this example, it is worth parallelising at all; I suspect that you will reach double-precision accuracy on 1 processor while the parallel threads are still waking up. But that's just a guess for this example, and you can probably find many series for which parallelisation is worth the effort.
And, as @Mark Ransom has already said, a better algorithm ought to beat brute force and a lot of processors every time.