Maximum contiguous subsequence -- dynamic programming or greedy algorithm? - c++

Given an array vector<int> arr with positive and negative entries, the maximum contiguous subsequence problem requires to find a (contiguous) segment of the array arr with maximum sum. Sum of empty segment is zero. The C++ code of the algorithm I'm using is as follows:
int MaxContSum(const vector<int>& arr){
int i,sum=0,max=0;
for(i=0;i<arr.size();i++){
if(arr[i]>=0) {if(sum<0) sum=0;}
else {if(sum>max) max=sum;}
sum+=arr[i];
}
if(sum>max) max=sum; return max;
}
Is this algorithm a greedy algorithm or dynamic programming? It looks like it's just scanning the entries one by one and applying different strategies based on whether arr[i] is potive or negative, a locally checkable condition. Why does this problem appear in the dynamic programming chapter, then?

This is Kadane's algorithm for the maximum subarray problem. It scans through the sequence and keeps track of the maximum subarray sum found up to this iteration in general, and the maximum subarray sum ending exactly at this point. How does it know the starting position of the subarray leading to the best sum up to exactly this point? Whenever 1) the previous sum is negative, and 2) a positive element is encountered, it pays to start at the positive element and continue from there. The proof that it works is by simple induction.
This algorithm is not greedy, but it can be viewed as dynamic programming.
A greedy algorithm makes a locally-optimum guess, and sticks with it (just continuing it further and further). Here, conversely, the algorithm can guess to check a subsequence starting at some point (where the sum ending at a positive element is negative), and later discard it and try a subsequence starting at some other point (again, because the sum becomes negative and the element is positive).
Conversely, it can be viewed as a dynamic programming problem. As the Wikipedia entry puts it:
Because of the way this algorithm uses optimal substructures (the maximum subarray ending at each position is calculated in a simple way from a related but smaller and overlapping subproblem: the maximum subarray ending at the previous position) this algorithm can be viewed as a simple example of dynamic programming.

Two main properties that problem should have in order to be eligible for solving with DP are:
Overlapping subproblems
Optimal substructure
From what you presented, first property is definitely missing and therefore I wouldn't classify this algorithm as DP. On the other hand, you use the result of the calculation for the smaller problem to get a final result - so we have Optimal substructure and that is probably the reason why you found this algorithm in the dynamic programming chapter, even though it should not belong there.

Related

number of subarrays whose product is even?

given an array of size n, n<=10^5 what is efficient approach to count number of sub arrays whose product is even ?
i am using naive approach with (On^3) time complexity ?
please suggest some efficient approach?
Be careful: from your explanation I have the impression that you are taking all sub-arrays, calculate the product and check if it is even.
However there's one very important mathematical rule: when you have a series of natural numbers, as soon as there's one even number, the product will be even.
So, I'd advise you to program following algorithm:
Search in your array for an even number.
Count the amount of sub-arrays, containing that even number.
Search in your array for the next even number.
Count the amount of sub-arrays, containing that next even number, but not containing the previous even number.
Continue until you've processed all even numbers in your array.

Obtain lowest sum from matrix, picking one value per row and column

Essentially I have a matrix of floats ranging from 0-1, and I need to find the combination of values with the lowest sum. The kicker is that once a value is selected, no other values from that row or column may be used. All the columns must be used.
In the case the matrix's width is greater than height, it will be padded with 1's to make the matrix square. In the case the height is greater than width, simply not all the rows will be used, but all of the columns must ALWAYS be used.
I have looked into binary trees and Dijkstra's algorithm for this task, but both seem to get far too complex with larger matrices. Ideally I'm looking for an algorithm or implementation which will provide a very good guess in a relatively short amount of time. Anything optimized for c++ would be great!
I think Greedy Approach should work here for the good guess/optimized part.
Put all the elements in an array as a tuple < value, row, column >
Sort the list with <value> parameter of the tuple.
Greedily pick the elements from beginning, and keep track of the used column/row with either using bitset or boolean matrix as suggested #Thomas Mathews.
The total Complexity will be NMlog(NM) where N is the number of rows, and M no. of columns.
Amit's suggestion to change the title actually led me to finding the solution. It is an "Assignment Problem" and the solution is to use the Hungarian algorithm. I knew there had to be something out there already, I just wasn't sure how to phrase it to find the answer until now. Thanks for all the help.
You can follow the Dijkstra algorithm for the shortest path, assuming you are constructing a tree. In the root node you select a length of 0, and for each node you select the next accesible element that gives you the shortest path from the root node, and store that length (from the root) in the node. You'll add at each iteration, for all the leaves, the arc that makes the total length lesser, and will continue until you get a N nodes path (or a bitmask of 0, see below). The first branch of N nodes from the root will be the shortest path. At each node, you can store a bitmap of the already visited nodes (or you can determine it, looking at the parents) as the possible nodes from it are the unvisited ones only. Or you can have a bitmap of the non-visited ones. This will make the search easier, as you'll stop as soon as no bits are on in the mask.
You have not shown any code or intent to solve the problem, so I'll do that same thing (it seems to be some kind of homework, and you seem not have work on it at all by now) This is an academic problem, already shown in many programming courses in relation with Simplex and operations investigation, in object/resource assignment, so there must be plenty literature about it.

Which is the best algorithm to find sum of all subarray?

for example if
a[3]={1,2,3}
1,2,3,1+2,2+3,1+2+3
so my program should print
1
2
3
3
5
8
I know there exits a formula for calculating the total sum.
But what i want is efficient method to calculate individual sums.
A seg tree advisable?
Assuming that by subarray you mean what some authors conventionally call subarray being a contiguous slice of an array, you are looking for Kadane's algorithm.
It works by incrementally finding the biggest subarray. At any given point of the search, the maximum subarray on that index is either the empty array (sum == zero) or consists of one more element than the maximum subarray that ended at the previous position. You keep track of what is the best you've ever found so you can compare subarrays with the best so far and return the actual best solution.
It may also be extended to multiple dimensions.

Finding shortest path in a graph, with additional restrictions

I have a graph with 2n vertices where every edge has a defined length. It looks like **
**.
I'm trying to find the length of the shortest path from u to v (smallest sum of edge lengths), with 2 additional restrictions:
The number of blue edges that the path contains is the same as the number of red edges.
The number of black edges that the path contains is not greater than p.
I have come up with an exponential-time algorithm that I think would work. It iterates through all binary combinations of length n - 1 that represent the path starting from u in the following way:
0 is a blue edge
1 is a red edge
There's a black edge whenever
the combination starts with 1. The first edge (from u) is then the first black one on the left.
the combination ends with 0. Then last edge (to v) is then the last black one on the right.
adjacent digits are different. That means we went from a blue edge to a red edge (or vice versa), so there's a black one in between.
This algorithm would ignore the paths that don't meet the 2 requirements mentioned earlier and calculate the length for the ones that do, and then find the shortest one. However doing it this way would probably be awfully slow and I'm looking for some tips to come up with a faster algorithm. I suspect it's possible to achieve with dynamic programming, but I don't really know where to start. Any help would be very appreciated. Thanks.
Seems like Dynamic Programming problem to me.
In here, v,u are arbitrary nodes.
Source node: s
Target node: t
For a node v, such that its outgoing edges are (v,u1) [red/blue], (v,u2) [black].
D(v,i,k) = min { ((v,u1) is red ? D(u1,i+1,k) : D(u1,i-1,k)) + w(v,u1) ,
D(u2,i,k-1) + w(v,u2) }
D(t,0,k) = 0 k <= p
D(v,i,k) = infinity k > p //note, for any v
D(t,i,k) = infinity i != 0
Explanation:
v - the current node
i - #reds_traversed - #blues_traversed
k - #black_edges_left
The stop clauses are at the target node, you end when reaching it, and allow reaching it only with i=0, and with k<=p
The recursive call is checking at each point "what is better? going through black or going though red/blue", and choosing the best solution out of both options.
The idea is, D(v,i,k) is the optimal result to go from v to the target (t), #reds-#blues used is i, and you can use up to k black edges.
From this, we can conclude D(s,0,p) is the optimal result to reach the target from the source.
Since |i| <= n, k<=p<=n - the total run time of the algorithm is O(n^3), assuming implemented in Dynamic Programming.
Edit: Somehow I looked at the "Finding shortest path" phrase in the question and ignored the "length of" phrase where the original question later clarified intent. So both my answers below store lots of extra data in order to easily backtrack the correct path once you have computed its length. If you don't need to backtrack after computing the length, my crude version can change its first dimension from N to 2 and just store one odd J and one even J, overwriting anything older. My faster version can drop all the complexity of managing J,R interactions and also just store its outer level as [0..1][0..H] None of that changes the time much, but it changes the storage a lot.
To understand my answer, first understand a crude N^3 answer: (I can't figure out whether my actual answer has better worst case than crude N^3 but it has much better average case).
Note that N must be odd, represent that as N=2H+1. (P also must be odd. Just decrement P if given an even P. But reject the input if N is even.)
Store costs using 3 real coordinates and one implied coordinate:
J = column 0 to N
R = count of red edges 0 to H
B = count of black edges 0 to P
S = side odd or even (S is just B%1)
We will compute/store cost[J][R][B] as the lowest cost way to reach column J using exactly R red edges and exactly B black edges. (We also used J-R blue edges, but that fact is redundant).
For convenience write to cost directly but read it through an accessor c(j,r,b) that returns BIG when r<0 || b<0 and returns cost[j][r][b] otherwise.
Then the innermost step is just:
If (S)
cost[J+1][R][B] = red[J]+min( c(J,R-1,B), c(J,R-1,B-1)+black[J] );
else
cost[J+1][R][B] = blue[J]+min( c(J,R,B), c(J,R,B-1)+black[J] );
Initialize cost[0][0][0] to zero and for the super crude version initialize all other cost[0][R][B] to BIG.
You could super crudely just loop through in increasing J sequence and whatever R,B sequence you like computing all of those.
At the end, we can find the answer as:
min( min(cost[N][H][all odd]), black[N]+min(cost[N][H][all even]) )
But half the R values aren't really part of the problem. In the first half any R>J are impossible and in the second half any R<J+H-N are useless. You can easily avoid computing those. With a slightly smarter accessor function, you could avoid using the positions you never computed in the boundary cases of ones you do need to compute.
If any new cost[J][R][B] is not smaller than a cost of the same J, R, and S but lower B, that new cost is useless data. If the last dim of the structure were map instead of array, we could easily compute in a sequence that drops that useless data from both the storage space and the time. But that reduced time is then multiplied by log of the average size (up to P) of those maps. So probably a win on average case, but likely a loss on worst case.
Give a little thought to the data type needed for cost and the value needed for BIG. If some precise value in that data type is both as big as the longest path and as small as half the max value that can be stored in that data type, then that is a trivial choice for BIG. Otherwise you need a more careful choice to avoid any rounding or truncation.
If you followed all that, you probably will understand one of the better ways that I thought was too hard to explain: This will double the element size but cut the element count to less than half. It will get all the benefits of the std::map tweak to the basic design without the log(P) cost. It will cut the average time way down without hurting the time of pathological cases.
Define a struct CB that contains cost and black count. The main storage is a vector<vector<CB>>. The outer vector has one position for every valid J,R combination. Those are in a regular pattern so we could easily compute the position in the vector of a given J,R or the J,R of a given position. But it is faster to keep those incrementally so J and R are implied rather than directly used. The vector should be reserved to its final size, which is approx N^2/4. It may be best if you pre compute the index for H,0
Each inner vector has C,B pairs in strictly increasing B sequence and within each S, strictly decreasing C sequence . Inner vectors are generated one at a time (in a temp vector) then copied to their final location and only read (not modified) after that. Within generation of each inner vector, candidate C,B pairs will be generated in increasing B sequence. So keep the position of bestOdd and bestEven while building the temp vector. Then each candidate is pushed into the vector only if it has a lower C than best (or best doesn't exist yet). We can also treat all B<P+J-N as if B==S so lower C in that range replaces rather than pushing.
The implied (never stored) J,R pairs of the outer vector start with (0,0) (1,0) (1,1) (2,0) and end with (N-1,H-1) (N-1,H) (N,H). It is fastest to work with those indexes incrementally, so while we are computing the vector for implied position J,R, we would have V as the actual position of J,R and U as the actual position of J-1,R and minU as the first position of J-1,? and minV as the first position of J,? and minW as the first position of J+1,?
In the outer loop, we trivially copy minV to minU and minW to both minV and V, and pretty easily compute the new minW and decide whether U starts at minU or minU+1.
The loop inside that advances V up to (but not including) minW, while advancing U each time V is advanced, and in typical positions using the vector at position U-1 and the vector at position U together to compute the vector for position V. But you must cover the special case of U==minU in which you don't use the vector at U-1 and the special case of U==minV in which you use only the vector at U-1.
When combining two vectors, you walk through them in sync by B value, using one, or the other to generate a candidate (see above) based on which B values you encounter.
Concept: Assuming you understand how a value with implied J,R and explicit C,B is stored: Its meaning is that there exists a path to column J at cost C using exactly R red branches and exactly B black branches and there does not exist exists a path to column J using exactly R red branches and the same S in which one of C' or B' is better and the other not worse.
Your exponential algorithm is essentially a depth-first search tree, where you keep track of the cost as you descend.
You could make it branch-and-bound by keeping track of the best solution seen so far, and pruning any branches that would go beyond the best so far.
Or, you could make it a breadth-first search, ordered by cost, so as soon as you find any solution, it is among the best.
The way I've done this in the past is depth-first, but with a budget.
I prune any branches that would go beyond the budget.
Then I run if with budget 0.
If it doesn't find any solutions, I run it with budget 1.
I keep incrementing the budget until I get a solution.
This might seem like a lot of repetition, but since each run visits many more nodes than the previous one, the previous runs are not significant.
This is exponential in the cost of the solution, not in the size of the network.

Linear algorithm to find minimum subset sum over a threshold

I have a collection of N positive integers, each bounded by a (relatively small) constant C. I want to find a subset of these numbers with the smallest sum greater than (or equal to) a value K.
The numbers involved aren't terribly large (<100), but I need good performance even in the worst case. I thought maybe I could adapt Pisinger's dynamic programming algorithm to the task; it runs in O(NC) time, and I happen to meet the requirements of bounded, positive numbers.
[Edit]: The numbers are not sorted and there may be duplicates.
However, I don't understand the algorithm well enough to do this myself. In fact, I'm not even certain if the assumptions it is based on still hold...
-Is it possible to adapt this algorithm to my needs?
-Or is there another linear algorithm I could use that is similarly efficient?
-Could anyone provide pseudocode or a detailed explanation?
Thanks.
Link to the Subset-Sum code I was investigating:
Fast solution to Subset sum algorithm by Pisinger
(Apologies if this is poorly worded/formatted/etc. I'm still new to StackOverflow...)
Pisinger's algorithm gives you the largest sum less than or equal to the capacity of the knapsack. To solve your problem, use Pisinger to figure out what not to put in the subset. Formally, let the items be w_1, ..., w_n and the minimum be K. Give w_1, ..., w_n and w_1 + ... + w_n - K to Pisinger, then take every item that Pisinger does not.
Well one solution is to:
T = {0}
for x in V
for t in T
T.insert(x+t)
for i in K to max(T)
if (T.contains(i))
return i
fail
This gives you the size of the subset, but you can adapt to output the members.
The maximum size of T is O(N) (because of C bound), so the running time is O(N^2) and the space is O(N). You can use a bit array of length NC as the backing store of T.