Does this recursive algorithm for finding the largest sum in a continuous sub array have any advantages? - c++

Objective: Evaluating the algorithm for finding the largest sum in a continuous subarray below.
Note: written in C++
As I was looking into the problem that Kadane successfully solved using dynamic programming, I thought I would find my own way of solving it. I did so by using a series of recursive calls depending on whether the sum can be made larger by shortening the ends of the array. See below.
int corbins_largest_sum_continuous_subarray(int n, int* array){
int sum = 0; // calculate the sum of the current array given
for(int i=0; i<n; i++){sum += array[i];}
if(sum-array[0]>sum && sum-array[n-1]>sum){
return corbins_largest_sum_continuous_subarray(n-2, array+1);
}else if(sum-array[0]<sum && sum-array[n-1]>sum){
return corbins_largest_sum_continuous_subarray(n-1, array);
}else if(sum-array[0]>sum && sum-array[n-1]<sum){
return corbins_largest_sum_continuous_subarray(n-1, array+1);
}else{
return sum; // this is the largest subarray sum, can not increase any further
}
}
I understand that Kadane's algorithm takes O(n) time. I am having trouble calculating the Big O of my algorithm. Would it also be O(n)? Since it calculates the sum using O(n) and all calls after that use the same time. Does my algorithm provide any advantage over Kadane's? In what ways is Kadane's algorithm better?

First of all, the expression sum-array[0]>sum is equivalent to array[0]<0. A similar observation applies to those other conditions you have in your code.
Your algorithm is incorrect. The comment you have here is not true:
}else{
return sum; // this is the largest subarray sum, can not increase any further
}
When you reach that point you know that the outer two values are both positive, but there might be a negative-sum subarray somewhere else in the array, which -- when removed -- would give two remaining subarrays, of which one (or both) could have a sum that is greater than the total sum.
For instance, the following input would be such a case:
[1, -4, 1]
Your algorithm will conclude that the maximum sum is achieved by taking the complete array (sum is -2), yet the subarray [1] represents a greater sum.
Other counter examples:
[1, 2, -2, 1]
[1, -3, -3, 1, 1]
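For comparison, here is a minimal sketch of Kadane's algorithm, which handles the counterexamples above in a single O(n) pass. The function name max_subarray_sum is chosen for this sketch only, and the input is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Kadane's algorithm: largest sum over all non-empty contiguous subarrays.
// Assumption for this sketch: `values` contains at least one element.
int max_subarray_sum(const std::vector<int>& values) {
    int best_ending_here = values[0];  // best sum of a subarray ending at index i
    int best_overall = values[0];      // best sum seen anywhere so far
    for (std::size_t i = 1; i < values.size(); ++i) {
        // Either extend the previous subarray or start a new one at values[i].
        best_ending_here = std::max(values[i], best_ending_here + values[i]);
        best_overall = std::max(best_overall, best_ending_here);
    }
    return best_overall;
}
```

Calling max_subarray_sum({1, -4, 1}) gives 1, where the recursive version returns -2, because the running sum is allowed to restart in the middle of the array instead of only shrinking from the ends.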

Related

3-sum alternative approach

I tried an alternative approach to the 3sum problem: given an array find all triplets that sum up to a given number.
Basically the approach is this: Sort the array. Once a pair of elements (say A[i] and A[j]) is selected, a binary search is done for the third element [using the equal_range function]. The index one past the last of the matching elements is saved in a variable 'c'. Since A[j+1] >= A[j], we need to search only up to and excluding index c (since numbers at index c and beyond would definitely sum to more than the target sum). For the case j=i+1, we save the end index as 'd' instead and make c=d. For the next value of i, when j=i+1, we need to search only up to and excluding index d.
C++ implementation:
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;
int sum3(vector<int>& A,int sum)
{
int count=0, n=A.size();
sort(A.begin(),A.end());
int c=n, d=n; //initialize c and d to array length
pair < vector<int>::iterator, vector<int>::iterator > p;
for (int i=0; i<n-2; i++)
{
for (int j=i+1; j<n-1; j++)
{
if(j == i+1)
{
p=equal_range (A.begin()+j+1, A.begin()+d, sum-A[i]-A[j]);
d = p.second - A.begin();
if(d==n+1) d--;
c=d;
}
else
{
p=equal_range (A.begin()+j+1, A.begin()+c, sum-A[i]-A[j]);
c = p.second - A.begin();
if(c==n+1) c--;
}
count += p.second-p.first;
for (auto it=p.first; it != p.second; ++it)
cout<<A[i]<<' '<<A[j]<<' '<<*it<<'\n';
}
}
return count;
}
int main() //driver function for testing
{
vector <int> A = {4,3,2,6,4,3,2,6,4,5,7,3,4,6,2,3,4,5};
int sum = 17;
cout << sum3(A,sum) << endl;
return 0;
}
I am unable to work out the upper bound time needed for this algorithm. I understand that the worst case scenario will be when the target sum is unachievably large.
My calculations yield something like:
For i=0, no. of binary searches is lg(n-2) + lg(n-3) + ... +lg(1)
For i=1, lg(n-3) + lg(n-4) + ... + lg(1)
...
...
...
For i=n-3, lg(1)
So totally, lg((n-2)!) + lg((n-3)!) + ... + lg(1!)
= lg(1^(n-2) * 2^(n-3) * 3^(n-4) * ... * (n-3)^2 * (n-2)^1)
But how do I deduce a Big-O bound in terms of n from this expression?
In addition to James' good answer I would like to point out that this can actually go up to O(n^3) in the worst case because you are running 3 nested for loops. Consider the case
{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}
and the demanded sum is 3.
When computing complexity, I'll start by referring to the Big-O Cheat sheet. I use this sheet to classify smaller sections of the code to get their runtime performance.
E.g. if I had a simple loop it would be O(n). BinSearch (according to the cheat sheet) is O(log(n)), etc..
Next, I use the Properties of Big-O notation to composite the smaller pieces together.
So for instance if I had two loops independent of each other it would be O(n) + O(n) or O(2n) => O(n). If one of my loops were inside the other, I would multiply them. So g( f(x) ) turns into O(n^2).
Now, I know you're saying: "hey, wait, I'm changing the upper and lower bounds of the inner loop" but I don't think that really matters...here's a university level example.
So my back-of-the-napkin calculation of your runtime is O(n^2) * O(Log(n)) or O(n^2 Log(n)).
But this need not be the case. I could've done something horribly wrong. So my next step would be to start graphing the runtimes of your worst possible case. Set sum to the impossibly large value and generate larger and larger arrays. You can avoid integer overflow by using lots and lots of repeated smaller numbers.
Also, compare it to the Quadratic 3Sum Solution. That's a known O(n^2) solution. Be sure to compare worst cases, or at least the same array on both. Do both timed tests at the same time so you can start getting a feel for which is faster while you are empirically testing the runtime.
Release builds, optimized for speed.
1. For your analysis, note that
log(1) + log(2) + ... + log(k) = Theta(k log(k)).
Indeed, the upper half of this sum is log(k/2) + log(k/2+1) + ... + log(k),
so it is at least log(k/2)*k/2, which is asymptotically the same as log(k)*k already.
Similarly, we can conclude that
log(n-1) + log(n-2) + log(n-3) + ... + log(1) + // Theta((n-1) log(n-1))
log(n-2) + log(n-3) + ... + log(1) + // Theta((n-2) log(n-2))
log(n-3) + ... + log(1) + // Theta((n-3) log(n-3))
... +
log(1) = Theta(n^2 log(n))
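Indeed, if we consider the logarithms which are at least log(n/2), it's the half-triangle (thus ~1/2) of the upper left quadrant (thus ~n^2/4) of the above sum, so there are Theta(n^2/8) such terms. Each of them is at least log(n/2) = Theta(log(n)), which gives the Omega(n^2 log(n)) lower bound; the O(n^2 log(n)) upper bound holds because there are fewer than n^2 terms, each at most log(n).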
2. As noted by satvik in another answer, your output loop can take up to Theta(n^3) steps when the number of outputs itself is Theta(n^3), which is when they are all equal.
3. There are O(n^2) solutions to the 3-sum problem, which are therefore asymptotically faster than this one.
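As an illustration of point 3, here is a minimal sketch of one quadratic approach. It is a hash-map variant (expected O(n^2) time), not the sorted two-pointer method, and it only counts triplets rather than printing them, since printing is bounded by the output size as noted in point 2. The name count_triplets is an assumption made for this sketch.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Counts index triplets i < j < k with a[i] + a[j] + a[k] == target.
// Expected O(n^2) time: for every pair (j, k) one hash lookup finds how many
// earlier elements a[i] (i < j) complete the sum.
long long count_triplets(const std::vector<int>& a, int target) {
    std::unordered_map<long long, long long> seen;  // value -> occurrences among a[0..j-1]
    long long count = 0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        for (std::size_t k = j + 1; k < a.size(); ++k) {
            const long long needed = static_cast<long long>(target) - a[j] - a[k];
            const auto it = seen.find(needed);
            if (it != seen.end()) {
                count += it->second;
            }
        }
        ++seen[a[j]];  // a[j] becomes available as a first element for later pairs
    }
    return count;
}
```

For the all-ones array of length 17 with target 3, this returns 680, which is the number of ways to choose 3 positions out of 17, without ever enumerating the triplets themselves.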

Random generation algorithm in C++

Suppose you need to generate a random permutation of the first N integers. For example, {4, 3, 1, 5, 2} and {3, 1, 4, 2, 5} are legal permutations, but {5, 4, 1, 2, 1} is not, because one number (1) is duplicated and another (3) is missing. This routine is often used in simulation of algorithms. We assume the existence of a random number generator, RandInt(i,j), that generates integers between i and j with equal probability. Here is the algorithm:
Fill the array A from A[0] to A[N-1] as follows: To fill A[i], generate random numbers until you get one that is not already in A[0], A[1],…, A[i-1].
Implement this algorithm in C++ and find the complexity. This is my code:
int a;
bool b = false;
A[0] = RandInt(1,n);
for (int i=1;i<n;i++) {
do {
b = false;
a = RandInt(1,n);
for (int j=0;j<i;j++)
if(A[j] == a)
b = true;
} while(b);
A[i] = a;
}
Is this code correct? And how can I find the complexity of the algorithm? Since, RandInt(i,j) generates random numbers, I don't know how many times the do while loop will be repeated.
This algorithm will produce correct results, selecting a permutation uniformly at random from all possible permutations.
The running time is not bounded above by any deterministic function since, as you point out, it could run literally forever. In the best case, this algorithm runs in O(n^2) and selects a random permutation without having to repeat any selection. On average, you'd expect to have to try n/n=1 time to get the first unique random number, n/(n-1) times to get the second, and so on down to an expected value of n/1=n times to get the last one. Adding those together gives you n*H(n) expected calls to RandInt, where H(n) is the nth harmonic number. It turns out H(n) is Theta(log n), so the expected number of calls is Theta(n log n). Each call is followed by a linear scan of up to n already-placed elements, so this algorithm is O(n^2 log n) in the average case.
There is a better way to do what you're trying to do: you can start with any permutation and shuffle it into another one using an algorithm that is O(n) in the worst case. The algorithm is the Fisher-Yates algorithm and works as follows:
FisherYates(array[1...n])
1. if n == 1 then return
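2. r = random(1, n) (r may equal 1, which leaves the first element in place; excluding 1 would generate only cyclic permutations)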
3. temp = array[1]
4. array[1] = array[r]
5. array[r] = temp
6. FisherYates(array[2...n])
This is a recursive formulation but an iterative one is straightforward. It calls random exactly n-1 times, where n is the size of the array at the topmost invocation.

Given an integer n, return the number of ways it can be represented as a sum of 1s and 2s

For example:
5 = 1+1+1+1+1
5 = 1+1+1+2
5 = 1+1+2+1
5 = 1+2+1+1
5 = 2+1+1+1
5 = 1+2+2
5 = 2+2+1
5 = 2+1+2
Can anyone give a hint for a pseudo code on how this can be done please.
Honestly have no clue how to even start.
Also this looks like an exponential problem can it be done in linear time?
Thank you.
In the example you have provided, the order of addends is important. (See the last two lines in your example.) With this in mind, the answer seems to be related to Fibonacci numbers. Let F(n) be the number of ways n can be written as a sum of 1s and 2s. Then the last addend is either 1 or 2. So F(n) = F(n-1) + F(n-2). These are the initial values:
F(1) = 1 (1 = 1)
F(2) = 2 (2 = 1 + 1, 2 = 2)
This is actually the (n+1)th Fibonacci number. Here's why:
Let's call f(n) the number of ways to represent n. If you have n, then you can represent it as (n-1)+1 or (n-2)+2. Thus the number of ways to represent it is f(n-1) + f(n-2). This is the same recurrence as the Fibonacci numbers. Furthermore, we see that if n=1 then we have 1 way, and if n=2 then we have 2 ways. Thus the (n+1)th Fibonacci number is your answer. There are algorithms out there to compute enormous Fibonacci numbers very quickly.
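The recurrence translates directly into a linear-time loop. Here is a minimal sketch, where the function name count_sums_of_1_and_2 and the choice to return 0 for n <= 0 are assumptions of this example. With 64-bit unsigned arithmetic the result wraps around once n reaches the low 90s.

```cpp
#include <cstdint>

// Number of ordered ways to write n as a sum of 1s and 2s,
// using f(n) = f(n-1) + f(n-2) with f(1) = 1 and f(2) = 2.
// Assumption for this sketch: non-positive n yields 0.
std::uint64_t count_sums_of_1_and_2(int n) {
    if (n <= 0) {
        return 0;
    }
    std::uint64_t prev = 1;  // f(0): the empty sum, used only to seed the loop
    std::uint64_t curr = 1;  // f(1)
    for (int i = 2; i <= n; ++i) {
        const std::uint64_t next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

For n = 5 this returns 8, matching the eight sums listed in the question.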
Permutations
If we want to know how many possible orderings there are in some set of size n without repetition (i.e., elements selected are removed from the available pool), the factorial of n (or n!) gives the answer:
double factorial(int n)
{
if (n <= 0)
return 1;
else
return n * factorial(n - 1);
}
Note: This also has an iterative solution and can even be approximated using the gamma function:
std::round(std::tgamma(n + 1)); // where n >= 0
The problem set starts with all 1s. Each time the set changes, two 1s are replaced by one 2. We want to find the number of ways k items (the 2s) can be arranged in a set of size n. We can query the number of possible permutations by computing:
double permutation(int n, int k)
{
return factorial(n) / factorial(n - k);
}
However, this is not quite the result we want. The problem is, permutations consider ordering, e.g., the sequence 2,2,2 would count as six distinct variations.
Combinations
These are essentially permutations which ignore ordering. Since the order no longer matters, many permutations are redundant. Redundancy per permutation can be found by computing k!. Dividing the number of permutations by this value gives the number of combinations:
Note: This is known as the binomial coefficient and should be read as "n choose k."
double combination(int n, int k)
{
return permutation(n, k) / factorial(k);
}
int solve(int n)
{
double result = 0;
if (n > 0) {
for ( int k = 0; k <= n; k += 1, n -= 1 )
result += combination(n, k);
}
return std::round(result);
}
This is a general solution. For example, if the problem were instead to find the number of ways an integer can be represented as a sum of 1s and 3s, we would only need to adjust the decrement of the set size (n-2) at each iteration.
Fibonacci numbers
The reason the solution using Fibonacci numbers works, has to do with their relation to the binomial coefficients. The binomial coefficients can be arranged to form Pascal's triangle, which when stored as a lower-triangular matrix, can be accessed using n and k as row/column indices to locate the element equal to combination(n,k).
The values of n and k, as they change over the lifetime of solve, trace a diagonal when viewed as coordinates on a 2-D grid. The result of summing values along a diagonal of Pascal's triangle is a Fibonacci number. If the pattern changes (e.g., when finding sums of 1s and 3s), this will no longer be the case and this solution will fail.
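Interestingly, Fibonacci numbers can be computed with a closed-form expression (Binet's formula). This means we can solve this problem in a constant number of floating-point operations simply by finding the (n+1)th Fibonacci number, as long as n is small enough for double precision to give the exact value after rounding.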
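#include <cmath>
int fibonacci(int n)
{
    // std::sqrt is not required to be constexpr by older standards,
    // so plain const is the portable choice here
    const double SQRT_5 = std::sqrt(5.0);
    const double GOLDEN_RATIO = (SQRT_5 + 1.0) / 2.0;
    return static_cast<int>(std::round(std::pow(GOLDEN_RATIO, n) / SQRT_5));
}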
int solve(int n)
{
if (n > 0)
return fibonacci(n + 1);
return 0;
}
As a final note, the numbers generated by both the factorial and fibonacci functions can be extremely large. Therefore, a large-maths library may be needed if n will be large.
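If exact integer results are preferred over floating point, one option is the fast-doubling method, which needs only O(log n) arithmetic steps. The following is a minimal sketch, not part of the answer above. The names fib_pair and exact_fibonacci are chosen for illustration, and results are exact only while F(n) fits in 64 bits (up to n = 93).

```cpp
#include <cstdint>
#include <utility>

// Fast doubling: returns {F(n), F(n+1)} using the identities
//   F(2k)   = F(k) * (2*F(k+1) - F(k))
//   F(2k+1) = F(k)^2 + F(k+1)^2
// Arithmetic is modulo 2^64, so values are exact only while they fit in 64 bits.
std::pair<std::uint64_t, std::uint64_t> fib_pair(unsigned n) {
    if (n == 0) {
        return {0, 1};
    }
    const std::pair<std::uint64_t, std::uint64_t> half = fib_pair(n / 2);
    const std::uint64_t a = half.first;     // F(k), with k = n / 2
    const std::uint64_t b = half.second;    // F(k+1)
    const std::uint64_t c = a * (2 * b - a);  // F(2k)
    const std::uint64_t d = a * a + b * b;    // F(2k+1)
    if (n % 2 == 0) {
        return {c, d};
    }
    return {d, c + d};
}

// Convenience wrapper returning F(n) with F(0) = 0, F(1) = 1.
std::uint64_t exact_fibonacci(unsigned n) {
    return fib_pair(n).first;
}
```

With this helper, the number of ways to write n as a sum of 1s and 2s would be exact_fibonacci(n + 1) for n > 0, for example exact_fibonacci(6) == 8 for n = 5.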
Here is code using backtracking that solves your problem. At each step it remembers the numbers used to get the sum so far (using vectors here). It makes a copy of that vector, appends 1 to the copy, and recurses with n-1 and the copy; when n reaches 0 it prints the sequence. After returning, it repeats the same for 2, which is essentially backtracking.
#include <stdio.h>
#include <vector>
#include <iostream>
using namespace std;
int n;
void print(vector<int> vect){
cout << n <<" = ";
for(int i=0;i<vect.size(); ++i){
if(i>0)
cout <<"+" <<vect[i];
else cout << vect[i];
}
cout << endl;
}
void gen(int n, vector<int> vect){
if(!n)
print(vect);
else{
for(int i=1;i<=2;++i){
if(n-i>=0){
std::vector<int> vect2(vect);
vect2.push_back(i);
gen(n-i,vect2);
}
}
}
}
int main(){
scanf("%d",&n);
vector<int> vect;
gen(n,vect);
}
This problem can be easily visualized as follows:
Consider a frog, that is present in front of a stairway. It needs to reach the n-th stair, but he can only jump 1 or 2 steps on the stairway at a time. Find the number of ways in which he can reach the n-th stair?
Let T(n) denote the number of ways to reach the n-th stair.
So, T(1) = 1 and T(2) = 2(2 one-step jumps or 1 two-step jump, so 2 ways)
In order to reach the n-th stair, we already know the number of ways to reach the (n-1)th stair and the (n-2)th stair.
So, one can simply reach the n-th stair by a 1-step jump from the (n-1)th stair or a 2-step jump from the (n-2)th stair.
Hence, T(n) = T(n-1) + T(n-2)
Hope it helps!!!
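The stair recurrence can be sketched as a small table-filling loop. In this sketch the set of allowed jump lengths is a parameter (a hypothetical generalization, defaulting to {1, 2}), so the same code also covers variants such as sums of 1s and 3s. The name count_ways and the convention of returning 0 for n <= 0 are assumptions of this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ways[s] = number of ordered jump sequences that land exactly on stair s.
// Assumption for this sketch: jump lengths below 1 are ignored, n <= 0 yields 0.
std::uint64_t count_ways(int n, const std::vector<int>& jumps = {1, 2}) {
    if (n <= 0) {
        return 0;
    }
    std::vector<std::uint64_t> ways(static_cast<std::size_t>(n) + 1, 0);
    ways[0] = 1;  // one way to stand at the bottom: make no jump
    for (int stair = 1; stair <= n; ++stair) {
        for (int jump : jumps) {
            if (jump >= 1 && jump <= stair) {
                // The last jump has length `jump`, arriving from stair - jump.
                ways[stair] += ways[stair - jump];
            }
        }
    }
    return ways[n];
}
```

count_ways(5) gives 8 for jumps of 1 or 2, and count_ways(5, {1, 3}) gives 4 for jumps of 1 or 3.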

Number of swaps in a permutation [duplicate]

Is there an efficient algorithm (efficient in terms of big O notation) to find number of swaps to convert a permutation P into identity permutation I? The swaps do not need to be on adjacent elements, but on any elements.
So for example:
I = {0, 1, 2, 3, 4, 5}, number of swaps is 0
P = {0, 1, 5, 3, 4, 2}, number of swaps is 1 (2 and 5)
P = {4, 1, 3, 5, 0, 2}, number of swaps is 3 (2 with 5, 3 with 5, 4 with 0)
One idea is to write an algorithm like this:
int count = 0;
for(int i = 0; i < n; ++ i) {
for(; P[i] != i; ++ count) { // could be permuted multiple times
std::swap(P[P[i]], P[i]);
// look where the number at hand should be
}
}
But it is not very clear to me whether that is actually guaranteed to terminate or whether it finds the correct number of swaps. It works on the examples above. I tried generating all permutations of 5 and of 12 numbers and it always terminates on those.
This problem arises in numerical linear algebra. Some matrix decompositions use pivoting, which effectively swaps the row with the greatest value with the next row to be manipulated, in order to avoid division by small numbers and improve numerical stability. Some decompositions, such as the LU decomposition, can later be used to calculate the matrix determinant, but the sign of the determinant of the decomposition is opposite to that of the original matrix if the number of swaps is odd.
EDIT: I agree that this question is similar to Counting the adjacent swaps required to convert one permutation into another. But I would argue that this question is more fundamental. Converting permutation from one to another can be converted to this problem by inverting the target permutation in O(n), composing the permutations in O(n) and then finding the number of swaps from there to identity. Solving this question by explicitly representing identity as another permutation seems suboptimal. Also, the other question had, until yesterday, four answers where only a single one (by |\/|ad) was seemingly useful, but the description of the method seemed vague. Now user lizusek provided answer to my question there. I don't agree with closing this question as duplicate.
EDIT2: The proposed algorithm actually seems to be rather optimal, as pointed out in a comment by user rcgldr, see my answer to Counting the adjacent swaps required to convert one permutation into another.
I believe the key is to think of the permutation in terms of the cycle decomposition.
This expresses any permutation as a product of disjoint cycles.
Key facts are:
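Swapping elements in two disjoint cycles merges them into one longer cycle (one fewer cycle)
Swapping elements in the same cycle splits it in two (one more cycle)
The number of swaps needed is n-c, where c is the number of cycles in the decomposition (counting fixed points as cycles of length 1), because the identity consists of n cycles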
Your algorithm always swaps elements in the same cycle so will correctly count the number of swaps needed.
If desired, you can also do this in O(n) by computing the cycle decomposition and returning n minus the number of cycles found.
Computing the cycle decomposition can be done in O(n) by starting at the first node and following the permutation until you reach the start again. Mark all visited nodes, then start again at the next unvisited node.
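The cycle-counting approach can be sketched as follows. This is a minimal sketch that assumes the input is a valid permutation of 0..n-1 and leaves it unmodified. The name min_swaps_to_identity is chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Minimum number of swaps that turn permutation p into the identity:
// n minus the number of cycles (fixed points count as cycles of length 1).
// Assumption for this sketch: p contains each of 0..n-1 exactly once.
int min_swaps_to_identity(const std::vector<int>& p) {
    const int n = static_cast<int>(p.size());
    std::vector<bool> visited(p.size(), false);
    int cycles = 0;
    for (int start = 0; start < n; ++start) {
        if (visited[static_cast<std::size_t>(start)]) {
            continue;  // already part of a cycle found earlier
        }
        ++cycles;
        // Follow the permutation until the walk returns to a visited node.
        for (int i = start; !visited[static_cast<std::size_t>(i)];
             i = p[static_cast<std::size_t>(i)]) {
            visited[static_cast<std::size_t>(i)] = true;
        }
    }
    return n - cycles;
}
```

Every element is marked visited exactly once, so the running time is O(n). On the examples from the question it yields 0, 1 and 3 swaps respectively.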
I believe the following are true:
If S(x[0], ..., x[n-1]) is the minimum number of swaps needed to convert x to {0, 1, ..., n - 1}, then:
If x[n - 1] == n - 1, then S(x) == S(x[0],...,x[n-2]) (ie, cut off the last element)
If x[n - 1] != n - 1, then S(x) == S(x[0], ..., x[i-1], x[n-1], x[i+1], ..., x[n-2]) + 1, where x[i] == n - 1 (ie, swap n - 1 into the last position, then cut off the last element).
S({}) = 0.
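This suggests a straightforward algorithm for computing S(x). As written it runs in O(n^2) time in the worst case, because each std::find is a linear scan; keeping an inverse index of where each value sits would bring it down to O(n):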
#include <algorithm>
#include <utility>
int num_swaps(int* x, int n) {
    if (n == 0) {
        return 0;
    } else if (x[n - 1] == n - 1) {
        return num_swaps(x, n - 1);
    } else {
        // locate the value n - 1 and swap it into its final position
        int* i = std::find(x, x + n, n - 1);
        std::swap(*i, x[n - 1]);
        return num_swaps(x, n - 1) + 1;
    }
}

return the sum of the max sublist

I have to write a function that takes a list of integers and returns the maximum sum sublist of the list. An example would be:
l = [4,-2,-8,5,-2,7,7,2,-6,5]
returns 19
so far my code is:
count = 0
for i in range(0,len(l)-1):
    for j in range(i,len(l)-1):
        if l[i] >= l[j]:
            count += l[i:j]
return count
I am kind of stuck and confused, can anyone help?
Thank You!
I assume this is a homework, so I won't try to google algorithms here and/or post too much code.
Some ideas (just from the top of my head, 'cause I like these kind of tasks :-))
As user lc already pointed out the naive, and also exhaustive way is to test every single sublist. I believe your (user2101463) code goes in that direction. Just use sum() to build up the sums and compare against a known best. To prime the best known sum with a reasonable starting value, just use the first value of the list.
the_list = [4,-2,-8,5,-2,7,7,2,-6,5]
best_value = the_list[0]
best_idx = (0, 1)  # slice bounds of the sublist holding only the first element
for start_element in range(0, len(the_list)):
    for stop_element in range(start_element+1, len(the_list)+1):
        sum_sublist = sum(the_list[start_element:stop_element])
        if sum_sublist > best_value:
            best_value = sum_sublist
            best_idx = (start_element, stop_element)
print("sum(list([{}:{}])) yields the biggest sum of {}".format(best_idx[0], best_idx[1], best_value))
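This has cubic runtime, O(N^3): there are O(N^2) sublists, and sum() takes linear time on each of them. That means: if the problem size, as defined by the number of elements of the input list, grows with N, the runtime grows with N*N*N, with some arbitrary coefficients. Keeping a running sum in the inner loop instead of calling sum() brings this down to O(N^2).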
Some heuristics for improvement:
Obviously negative numbers are not good because they decrease the achievable sum
If you encounter a sequence of negative numbers, restart your best sublist after that sequence, if the sum of the best list so far plus the negative numbers is < 0. In your example list the first three numbers cannot be part of a best list because the positive effect of the 4 is always negated by the -2, -8.
Possibly this even leads to an O(N) implementation which just iterates from start to end, memorizing the best known start index while calculating running sums of a full total from that start index as well as positive and negative subtotals of the last contiguous sequence of positive and negative numbers, respectively.
Once such a best list is found, possibly this requires a final cleanup to remove a trailing negative sublist such as the -6, 5 at the end of your example.
Hope this leads in the right direction.
This is called the 'maximum subarray problem' and can be done in linear time. The wikipedia article has your answer.
The optimal solution runs in linear time, O(n). There are also an O(n lg n) solution (based on divide and conquer) and an O(n^2) solution. If you are interested in those algorithms, Introduction to Algorithms covers them and is highly recommended. Here is Java code that runs in linear time:
import java.util.Scanner;

public class MaxSubarraySum {
    public static void main(String[] args) {
        Scanner sc=new Scanner(System.in);
        int n=sc.nextInt();
        int []A=new int[n];
        int highestsum=Integer.MIN_VALUE;
        int sumvariable=0;
        for(int i=0;i<n;i++)
        {
            A[i]=sc.nextInt();
        }
        for(int i=0;i<n;i++)
        {
            // either extend the running subarray or start a new one at A[i]
            sumvariable=Math.max(A[i], sumvariable+A[i]);
            // remember the best sum seen so far (also correct for all-negative input)
            highestsum=Math.max(highestsum, sumvariable);
        }
        System.out.println(highestsum);
    }
}