Is prefix sum included in dynamic programming?

I've been solving algorithm problems, and I'm a bit confused about the terms.
When we want to calculate prefix sum (or cumulative sum) like the code below, can we say that we are using dynamic programming?
def calc_prefix_sum(nums):
    N = len(nums)
    prefix_sum = [0] * (N + 1)
    for i in range(1, N + 1):
        prefix_sum[i] = prefix_sum[i - 1] + nums[i - 1]
    return prefix_sum

nums = [1, 3, 0, -2, 1]
print(calc_prefix_sum(nums))
# output: [0, 1, 4, 4, 2, 3]
According to the definition on this page,
Dynamic programming is used where we have problems, which can be divided into similar sub-problems so that their results can be re-used.
In my prefix_sum algorithm, the current calculation (prefix_sum[i]) is divided into similar sub-problems (prefix_sum[i - 1] + nums[i - 1]) so that the previous result (prefix_sum[i - 1]) can be re-used. So I am assuming that calculating prefix sum is one of the applications of dynamic programming.
Can I say it's dynamic programming, or should I use different terms? (Especially, I am thinking about the situation in coding interviews.)

No, the correct term is memoization, not dynamic programming. Dynamic programming requires the problem to have optimal substructure as well as overlapping subproblems. Prefix sum has optimal substructure but it does not have overlapping subproblems. Therefore, this optimization should be called memoization.

Yes, prefix sums can be considered a form of dynamic programming. It is the simplest way to calculate the sum of a range in a static array, by using a prefix array which stores data based on previous sums.
Prefix Sum Array Construction Runtime = O(n)
Prefix Sum Query Runtime = O(1)
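For example, once the prefix array from the question has been built, the sum of any slice nums[l:r] is a single subtraction. A minimal sketch, reusing the asker's calc_prefix_sum from above:

def range_sum(prefix_sum, l, r):
    # Sum of nums[l:r] (half-open range), using the (N+1)-length prefix array
    return prefix_sum[r] - prefix_sum[l]

prefix_sum = calc_prefix_sum([1, 3, 0, -2, 1])
print(range_sum(prefix_sum, 1, 4))  # 3 + 0 + (-2) = 1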

People often say that Kadane's algorithm is DP, and Kadane's is only 1 if statement away from a prefix sum.
from typing import List

def maxSubArray(nums: List[int]) -> int:
    for i in range(1, len(nums)):
        if nums[i-1] > 0:
            nums[i] = nums[i-1] + nums[i]
    return max(nums)
If you tried to calculate a prefix sum recursively, you would end up with an O(n^2) algorithm without memoization but an O(n) algorithm with memoization. This is because of overlapping subproblems.
nums = [1, 3, 0, -2, 1]

def cumsum(i):
    if i < 0:
        return 0
    return nums[i] + cumsum(i - 1)

prefix_sum = [cumsum(i) for i in range(len(nums))]
We can see that cumsum(0) is called 5 times, since the recursion must hit the base case before returning, and we call the function 5 times. cumsum(1) is called 4 times, cumsum(2) is called 3 times, and so on.
This is why I would say that prefix sum has both optimal substructure and overlapping subproblems.
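For reference, a minimal sketch of the memoized version of that recursion, using functools.lru_cache; the cache means each cumsum(i) is computed only once, which brings the total work down to O(n):

from functools import lru_cache

nums = [1, 3, 0, -2, 1]

@lru_cache(maxsize=None)
def cumsum(i):
    if i < 0:
        return 0
    return nums[i] + cumsum(i - 1)

prefix_sum = [cumsum(i) for i in range(len(nums))]
print(prefix_sum)  # [1, 4, 4, 2, 3]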

Related

Perfect sum problem with fixed subset size

I am looking for a least time-complex algorithm that would solve a variant of the perfect sum problem (initially: finding all variable-size subset combinations from an array [*] of integers of size n that sum to a specific number x) where the subset size is fixed to k, returning the possible combinations without direct duplicates and without indirect duplicates (a combination containing the exact same elements as another, just in a different order).
I'm aware this problem is NP-hard, so I am not expecting a perfect general solution, but something that could at least run in a reasonable time in my case, with n close to 1000 and k around 10.
Things I have tried so far:
Finding a combination, then doing successive modifications on it and its modifications
Let's assume I have an array such as:
s = [1,2,3,3,4,5,6,9]
So I have n = 8, and I'd like x = 10 for k = 3
I found, thanks to some obscure method (brute force?), a subset [3,3,4].
From this subset I find other possible combinations by taking two elements out of it and replacing them with other elements that sum to the same value, i.e. (3, 3) can be replaced by (1, 5) since both have the same sum and the replacing numbers are not already in use. So I obtain another subset [1,5,4], then I repeat the process for all the obtained subsets... indefinitely?
The main issue, as suggested here, is that it's hard to determine when it's done, and this method is rather chaotic. I imagined some variants of this method but they really are works in progress.
Iterating through the set to list all k-long combinations that sum to x
Pretty self-explanatory. This is a naive method that does not work well in my case, since I have a pretty large n and a k that is not small enough to avoid a catastrophically big number of combinations (the magnitude of the number of combinations is 10^27!).
I experimented with several mechanisms for setting an area of research instead of stupidly iterating through all possibilities, but it's rather complicated and still a work in progress.
What would you suggest? (Snippets can be in any language, but I prefer C++)
[*] To clear the doubt about whether or not the base collection can contain duplicates, I used the term "array" instead of "set" to be more precise. The collection can contain duplicate integers in my case, and quite a lot of them: around 70 distinct integers per 1000 elements (rounded counts), for example.
With a reasonable sum limit this problem might be solved using an extension of the dynamic programming approach for the subset sum problem or the coin change problem with a predetermined number of coins. Note that we can count all variants in pseudopolynomial time O(x*n), but the output size might grow exponentially, so generating all variants might be a problem.
Make a 3D array, list, or vector with outer dimension x+1, for example A[][][]. Every element A[p] of this list contains a list of possible subsets with sum p.
We can walk through all elements (call the current element item) of the initial "set" (I noticed repeating elements in your example, so it is not a true set).
Now scan the A[] list from the last entry to the beginning. (This trick helps to avoid repeated usage of the same item.)
If A[i - item] contains subsets of size < k, we can add all these subsets to A[i], appending item.
After the full scan, A[x] will contain all subsets of size k and less having sum x, and we can filter only those of size k.
Example output of my quickly made Delphi program for the following data:
Lst := [1,2,3,3,4,5,6,7];
k := 3;
sum := 10;
3 3 4
2 3 5 //distinct 3's
2 3 5
1 4 5
1 3 6
1 3 6 //distinct 3's
1 2 7
To exclude variants with distinct repeated elements (if needed), we can use a non-first occurrence only for subsets already containing the first occurrence of item (so 3 3 4 will be valid while the second 2 3 5 won't be generated).
I literally translated my Delphi code into C++ (it looks weird, I think :)
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<vector<vector<int>>> A;
    vector<int> Lst = { 1, 2, 3, 3, 4, 5, 6, 7 };
    int k = 3;
    int sum = 10;
    A.push_back({ {0} }); // fictive entry to make a non-empty variant
    for (int i = 0; i < sum; i++)
        A.push_back({{}});
    for (int item : Lst) {
        for (int i = sum; i >= item; i--) {
            for (int j = 0; j < A[i - item].size(); j++)
                if (A[i - item][j].size() < k + 1 &&
                    A[i - item][j].size() > 0) {
                    vector<int> t = A[i - item][j];
                    t.push_back(item);
                    A[i].push_back(t); // add new variant including current item
                }
        }
    }
    // output needed variants
    for (int i = 0; i < A[sum].size(); i++)
        if (A[sum][i].size() == k + 1) {
            for (int j = 1; j < A[sum][i].size(); j++) // excluding fictive 0
                cout << A[sum][i][j] << " ";
            cout << endl;
        }
}
Here is a complete solution in Python. Translation to C++ is left to the reader.
Like the usual subset sum, generation of the doubly linked summary of the solutions is pseudo-polynomial: it is O(count_values * distinct_sums * depths_of_sums). However, actually iterating through the solutions can be exponential. But using generators the way I did avoids using a lot of memory to generate that list, even if it can take a long time to run.
from collections import namedtuple

# This is a doubly linked list.
# (value, tail) will be one group of solutions. (next_answer) is another.
SumPath = namedtuple('SumPath', 'value tail next_answer')

def fixed_sum_paths(array, target, count):
    # First find counts of values to handle duplications.
    value_repeats = {}
    for value in array:
        if value in value_repeats:
            value_repeats[value] += 1
        else:
            value_repeats[value] = 1

    # paths[depth][x] will be all subsets of size depth that sum to x.
    paths = [{} for i in range(count + 1)]

    # First we add the empty set.
    paths[0][0] = SumPath(value=None, tail=None, next_answer=None)

    # Now we start adding values to it.
    for value, repeats in value_repeats.items():
        # Reversed depth avoids seeing paths we will find using this value.
        for depth in reversed(range(len(paths))):
            for result, path in paths[depth].items():
                for i in range(1, repeats + 1):
                    if count < i + depth:
                        # Do not fill in too deep.
                        break
                    result += value
                    if result in paths[depth + i]:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=paths[depth + i][result]
                        )
                    else:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=None
                        )
                    paths[depth + i][result] = path
                    # Subtle bug fix, a path for value, value
                    # should not lead to value, other_value because
                    # we already inserted that first.
                    path = SumPath(
                        value=value,
                        tail=path.tail,
                        next_answer=None
                    )
    return paths[count][target]

def path_iter(paths):
    if paths.value is None:
        # We are the tail
        yield []
    else:
        while paths is not None:
            value = paths.value
            for answer in path_iter(paths.tail):
                answer.append(value)
                yield answer
            paths = paths.next_answer

def fixed_sums(array, target, count):
    paths = fixed_sum_paths(array, target, count)
    return path_iter(paths)

for path in fixed_sums([1, 2, 3, 3, 4, 5, 6, 9], 10, 3):
    print(path)
Incidentally for your example, here are the solutions:
[1, 3, 6]
[1, 4, 5]
[2, 3, 5]
[3, 3, 4]
You should first sort the so-called array. Second, you should determine whether the problem is actually solvable at all, to save time: take the largest k elements and see if their sum is larger than or equal to x. If it is smaller, you are done, no such combination is possible. If it is exactly equal, you are also done, there are no other combinations. That check is O(n), which feels nice, doesn't it?
If it is larger, you have a lot of work to do. You need to store all the found combinations in a separate array. Then you replace the smallest of the k numbers with the smallest element in the array; if the sum is still larger than x, you do the same for the second smallest, the third, and so on, until you get something smaller than x. Once you reach a point where the sum is smaller than x, you can start increasing the value at the last position you stopped at until you hit x; once you hit x, that is your combination. Then you can go back to a previous element: if you had 1, 1, 5, 6 in your candidate, you can grab the 1, add it to your smallest element 5 to get 6, and then check whether you can write this number 6 as a combination of two values, stopping once you hit the value. Then you can repeat for the others.
Your problem can be solved in O(n!) time in the worst case. As for those 10^27 combinations: storing them means storing more than 10^27 elements, and do you even have that much space? At 3 bits for the header and 8 bits for each integer, you would need 9.8765*10^25 terabytes just to store that colossal array, more memory than a supercomputer has. You should worry about whether your computer can even store the output rather than whether you can solve the problem; with that many combinations, even a quadratic solution would crash your computer, and quadratic is a long way off from O(n!).
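A minimal Python sketch of just the sort-and-check feasibility step described above (the function name is mine):

def feasibility(arr, x, k):
    s = sorted(arr)
    top_k = sum(s[-k:])          # sum of the k largest elements
    if top_k < x:
        return 'impossible'      # even the k largest elements fall short
    if top_k == x:
        return s[-k:]            # the only candidate multiset
    return 'search needed'       # x is reachable, enumeration still required

print(feasibility([1, 2, 3, 3, 4, 5, 6, 9], 10, 3))  # search needed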
A brute force method using recursion might look like this...
For example, given variables set, x, k, the following pseudo code might work:
setSumStructure find(int[] set, int x, int k, int setIdx)
{
    int sz = set.length - setIdx;
    if (sz < k) return null; // not enough elements left for a size-k subset
    if (sz == k) check whether set[setIdx] + ... + set[set.length - 1] == x;
                 if it does, return the set together with the sum, else return null;
    for (int i = setIdx; i < set.length - (k - 1); i++)
        filter(find(set, x - set[i], k - 1, i + 1));
    return filteredSets;
}

Does this recursive algorithm for finding the largest sum in a continuous subarray have any advantages?

Objective: evaluating the algorithm below for finding the largest sum in a continuous subarray.
Note: written in C++
As I was looking into the problem that Kadane successfully solved using dynamic programming, I thought I would find my own way of solving it. I did so by using a series of recursive calls depending on whether the sum can be made larger by shortening the ends of the array. See below.
int corbins_largest_sum_continuous_subarray(int n, int* array){
    int sum = 0; // calculate the sum of the current array given
    for(int i=0; i<n; i++){sum += array[i];}
    if(sum-array[0]>sum && sum-array[n-1]>sum){
        return corbins_largest_sum_continuous_subarray(n-2, array+1);
    }else if(sum-array[0]<sum && sum-array[n-1]>sum){
        return corbins_largest_sum_continuous_subarray(n-1, array);
    }else if(sum-array[0]>sum && sum-array[n-1]<sum){
        return corbins_largest_sum_continuous_subarray(n-1, array+1);
    }else{
        return sum; // this is the largest subarray sum, can not increase any further
    }
}
I understand that Kadane's algorithm takes O(n) time. I am having trouble calculating the Big O of my algorithm. Would it also be O(n), since it calculates the sum in O(n) and all calls after that take similar time? Does my algorithm provide any advantage over Kadane's? In what ways is Kadane's algorithm better?
First of all, the expression sum-array[0]>sum is equivalent to array[0]<0. A similar observation applies to those other conditions you have in your code.
Your algorithm is incorrect. The comment you have here is not true:
}else{
return sum // this is the largest subarray sum, can not increase any further
}
When you get to that point you know that the outer two values are both non-negative, but there might be a negative-sum subarray somewhere else in the array, which -- when removed -- would give two remaining subarrays, of which one (or both) could have a sum that is greater than the total sum.
For instance, the following input would be such a case:
[1, -4, 1]
Your algorithm will conclude that the maximum sum is achieved by taking the complete array (sum is -2), yet the subarray [1] represents a greater sum.
Other counter examples:
[1, 2, -2, 1]
[1, -3, -3, 1, 1]
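For contrast, here is a minimal Python sketch of Kadane's algorithm (my own translation, not the asker's code), which handles all of these cases correctly in a single O(n) pass:

def kadane(nums):
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)    # extend the running subarray or restart at x
        best = max(best, cur)
    return best

for case in ([1, -4, 1], [1, 2, -2, 1], [1, -3, -3, 1, 1]):
    print(kadane(case))  # 1, 3, 2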

Seems very hard to find out the time complexity of this simple program

I have the code below, which mimics the recursive behavior of an algorithm whose time complexity I failed to figure out:
int M(int n)
{
    int result = 1;
    for (int i = n-1; i >= 0; --i)
    {
        result += M(i);
    }
    return result;
}
According to my understanding, I have drawn the tree below to illustrate the algorithm :
(The input n is 3 in the picture).
I think the number of nodes in the tree is the complexity of the algorithm. If the input is n, what would the time complexity be? Thanks!
My background is not CS, but I can offer an easy way to look at this problem.
I took pen and paper and worked out different values of n:
n = 2, cycles = 4
n = 3, cycles = 8
n = 4, cycles = 16
n = 5, cycles = 32
You can clearly see that cycles = 2^N, and therefore we can conclude that the time complexity of this problem is O(2^N).
Another way to look at this:
We know that
f(0) = 1
f(1) = f(0) + 1 = 2
f(2) = f(1) + f(0) + 1 = 4
...
f(N) = f(N-1) + f(N-2) + ... + f(0) + 1 = 2^N
(Notice that f(N-2) + ... + f(0) + 1 is exactly f(N-1), so f(N) = 2*f(N-1), which gives 2^N.)
So now that you have a recurrence relation, similar to how you calculate a factorial, you can do the maths or write a program to measure the time complexity of the problem.
I hope my answer helps you understand the theory behind calculating time complexity.
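A quick sketch of that counting experiment in Python (my translation of the C++ function, with a global call counter):

calls = 0

def M(n):
    global calls
    calls += 1
    result = 1
    for i in range(n - 1, -1, -1):
        result += M(i)
    return result

for n in range(2, 6):
    calls = 0
    M(n)
    print(n, calls)  # 2 4, 3 8, 4 16, 5 32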
Your tree diagram is illuminating. Can you see the line of symmetry? The tree for M(n) looks like two copies of the tree for M(n-1). Thus the number of nodes in the tree is 2**n, and the complexity of the algorithm is O(2**n).

Python3 how to create a list of partial products

I have a very long list (of big numbers), let's say for example:
a=[4,6,7,2,8,2]
I need to get this output:
b=[4,24,168,336,2688,5376]
where each b[i]=a[0]*a[1]...*a[i]
I'm trying to do this recursively in this way:
b=[4] + [ a[i-1]*a[i] for i in range(1,6)]
but the (wrong) result is: [4, 24, 42, 14, 16, 16]
I don't want to compute all the products each time; I need an efficient way (if possible), because the list is very long.
At the moment this works for me:
b=[0]*6
b[0]=4
for i in range(1,6): b[i]=a[i]*b[i-1]
but it's too slow. Any ideas? Is it possible to avoid "for" or to speedup it in other way?
You can calculate the products step by step, since each calculation depends directly on the previous one.
What I mean is:
1) Compute the product of the first i - 1 numbers
2) The i-th product will be equal to a[i] * (the product of the first i - 1 numbers)
This method is called dynamic programming:
Dynamic programming (also known as dynamic optimization) is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions.
This is the implementation:
a = [4, 6, 7, 2, 8, 2]
b = []
product_so_far = 1

for i in range(len(a)):
    product_so_far *= a[i]
    b.append(product_so_far)

print(b)
This algorithm works in linear time (O(n)), which is the most efficient complexity you'll get for such a task.
If you want a little optimization, you could create the b list at the predefined length (b = [0] * len(a)) and, instead of appending, do this in the loop:
b[i] = product_so_far
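As a side note, the standard library already expresses this running product directly; a sketch using itertools.accumulate with operator.mul:

from itertools import accumulate
from operator import mul

a = [4, 6, 7, 2, 8, 2]
b = list(accumulate(a, mul))
print(b)  # [4, 24, 168, 336, 2688, 5376]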

Recursive Algorithm Time Complexity: Coin Change

I'm going through some algorithms, and came across the coin change problem.
When thinking about the problem I came up with this naive recursive solution:
int coinChange(const vector<int>& coins, int start, int n) {
    if (n == 0) return 1;
    if (n < 0) return 0;
    int total = 0;
    for (int i = start; i < coins.size(); ++i) {
        if (coins[i] <= n) total += coinChange(coins, i, n-coins[i]);
    }
    return total;
}
I then realized the "accepted" solution was as follows:
int count( int S[], int m, int n )
{
    // If n is 0 then there is 1 solution (do not include any coin)
    if (n == 0)
        return 1;
    // If n is less than 0 then no solution exists
    if (n < 0)
        return 0;
    // If there are no coins and n is greater than 0, then no solution exists
    if (m <= 0 && n >= 1)
        return 0;
    // count is sum of solutions (i) including S[m-1] (ii) excluding S[m-1]
    return count( S, m - 1, n ) + count( S, m, n-S[m-1] );
}
At first I thought the two were essentially the same. It was clear to me that my recursion tree was much wider, but it seemed that this was only because my algorithm was doing more work at each level, so it evened out. It looks like both algorithms consider the number of ways to make change with the current coin (given it is <= the current sum) and the number of ways to make change without the current coin (thus with all the elements in the coin array minus the current coin). Therefore the parameter start in my algorithm does essentially the same thing as m does in the second algorithm.
The more I look at it, though, the more it seems that, regardless of the previous text, my algorithm is O(n^n) and the second one is O(2^n). I've been looking at this for too long, but if someone could explain what extra work my algorithm is doing compared to the second one, that would be great.
EDIT
I understand the dynamic programming solution to this problem, this question is purely a complexity based question.
The two pieces of code are the same except that the second uses recursion instead of a for loop to iterate over the coins. That makes their runtime complexity the same (although the second piece of code probably has worse memory complexity because of the extra recursive calls, but that may get lost in the wash).
For example, here's partial evaluation of the second count in the case where S = [1, 5, 10] and m=3. On each line, I expand the left-most definition of count.
count(S, 3, 100)
= count(S, 2, 100) + count(S, 3, 90)
= count(S, 1, 100) + count(S, 2, 95) + count(S, 3, 90)
= count(S, 0, 100) + count(S, 1, 99) + count(S, 2, 95) + count(S, 3, 90)
= 0 + count(S, 1, 99) + count(S, 2, 95) + count(S, 3, 90)
You can see that this is the same calculation as your for-loop that sums up total.
Both algorithms are terrible because they run in exponential time. Here is an answer (of mine) that uses a neat dynamic programming method that runs in O(nm) time and uses O(n) memory, and is extremely concise -- comparable in size to your naive recursive solution. https://stackoverflow.com/a/20743780/1400793 . It's in Python, but it's trivially convertible to C++.
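The linked answer is not reproduced here, but a sketch of the standard bottom-up counting DP with those bounds (not necessarily the linked code) looks roughly like this:

def count_ways(coins, n):
    # ways[a] = number of ways to make amount a with the coins seen so far
    ways = [1] + [0] * n
    for coin in coins:                      # m coins
        for amount in range(coin, n + 1):   # n amounts
            ways[amount] += ways[amount - coin]
    return ways[n]

print(count_ways([1, 5, 10], 26))  # 12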
You didn't read the whole article(?).
The idea behind dynamic programming is that you store some values you have already computed, so that you don't need to calculate them again. At the end of the article you can see the actual correct solution.
As for why your solution is n^n and their original one is 2^n: both solutions are actually 2^(n+#coins). They just call the function with m-1 instead of having a loop that goes through every coin. While your solution tries every coin at the start and then fewer and fewer, theirs tries to take one coin of type m, then another, then another, until at some point it switches to type m-1 and does the same with it, and so on. Basically both solutions are the same.
Another way to see that they have the same complexity is this:
Both solutions are correct, so they will reach all possible solutions, and both stop growing a particular branch of the recursion the moment it reaches a negative n. Therefore, they have the same complexity.
And if you are not convinced, just try each solution, but add a counter and increment it every time you enter the function. Do this for each solution and you will see that the counts grow at the same rate.
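A sketch of that experiment in Python (my translations of both C++ functions, sharing a call counter; the raw counts differ by a roughly constant factor, since the loop version tests coins[i] <= n before recursing, but both grow at the same exponential rate):

calls = 0

def coin_change(coins, start, n):     # the asker's loop version
    global calls
    calls += 1
    if n == 0: return 1
    if n < 0: return 0
    return sum(coin_change(coins, i, n - coins[i])
               for i in range(start, len(coins)) if coins[i] <= n)

def count(S, m, n):                   # the "accepted" recursive version
    global calls
    calls += 1
    if n == 0: return 1
    if n < 0 or m <= 0: return 0
    return count(S, m - 1, n) + count(S, m, n - S[m - 1])

coins = [1, 5, 10, 15, 20, 25]
calls = 0
print(coin_change(coins, 0, 100), calls)
calls = 0
print(count(coins, len(coins), 100), calls)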
Benchmark
On my computer, the benchmarks are as follows:
coinChange(v, 0, 500);// v=[1, 5, 10, 15, 20, 25]
took 1.84649s to complete.
But
count(s, 6, 500); //s = [1, 5, 10, 15, 20, 25]
took 0.853075s to execute.
EDIT
I interpret the result as indicating that the time complexities of the two algorithms are the same.