Proving that a two-pointer approach works (pair sum) - c++

I was trying to solve the pair sum problem: given a sorted array, determine whether there exist two indices i and j such that i != j and a[i] + a[j] == k for a given k.
One approach is to run two nested for loops, resulting in a complexity of O(n*n).
Another is the two-pointer technique. I wasn't able to solve the problem with two pointers and therefore looked it up, but I couldn't understand why it works. How do I prove that it works?
#include <vector>
#define lli long long

// n is the size of the array; A and n are assumed globals here
// (declarations added so the snippet compiles)
std::vector<lli> A;
int n;

bool f(lli sum) {
    int l = 0, r = n - 1;
    while (l < r) {
        if (A[l] + A[r] == sum) return 1;  // pair found at the slice boundary
        else if (A[l] + A[r] > sum) r--;   // sum too big: discard the largest element
        else l++;                          // sum too small: discard the smallest element
    }
    return 0;
}
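For reference, a minimal driver (not part of the original question) exercising f with the example array used in the answer below:

#include <iostream>

int main() {
    A = {-1, 4, 8, 12};
    n = A.size();
    std::cout << f(3) << "\n";    // prints 1: -1 + 4 == 3
    std::cout << f(100) << "\n";  // prints 0: no pair sums to 100
}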

Well, think of it this way:
You have a sorted array (the array must be sorted for this approach to work, and for this problem that is generally a given):
{ -1,4,8,12 }
The algorithm starts by choosing the first element in the array and the last element, adding them together and comparing them to the sum you are after.
If our initial sum matches the sum we are looking for, great!! If not, well, we need to continue looking at possible sums either greater than or less than the sum we started with. By starting with the smallest and the largest value in the array for our initial sum, we can eliminate one of those elements as being part of a possible solution.
Let's say we are looking for the sum 3. We see that 3 < 11. Since our big number (12) is paired with the smallest possible number (-1), the fact that our sum is too large means that 12 cannot be part of any possible solution, since any other sum using 12 would have to be larger than 11 (12 + 4 > 12 - 1, 12 + 8 > 12 - 1).
So we know we cannot possibly make a sum of 3 using 12 plus one other number in the array; they would all be too big. So we can eliminate 12 from our search by moving down to the next largest number, 8. We do the same thing here. We see 8 + (-1) is still too big, so we move down to the next number, 4, and voila! We find a match.
The same logic applies if the sum we get is too small. We can eliminate our small number, because any sum we can get using our current smallest number has to be less than or equal to the sum we get when it is paired with our current largest number.
We keep doing this until we find a match, or until the indices cross each other, since, after they cross, we are simply adding up pairs of numbers we have already checked (i.e. 4 + 8 = 8 + 4).
This may not be a mathematical proof, but hopefully it illustrates how the algorithm works.

Stephen Docy did a great job tracing the program's execution and explaining the rationale behind its decisions. Making the answer closer to a mathematical proof of the algorithm's correctness may make it easier to generalize to problems like the one mentioned by zzzzzzz in the comments.
We are given a sorted array A of length n and an integer sum. We need to find if there are any two indices i and j such that i != j and A[i] + A[j] == sum.
The solutions (i, j) and (j, i) are equivalent, so we can assume that i < j without loss of generality. In the program, the current guess at i is called l and the current guess at j is called r.
We iteratively shrink the slice until we find a slice whose boundary elements are the two summands that sum to sum, or until we find there is no such slice. The slice starts at index l, ends at index r, and I will write it as (l, r).
Initially, the slice is the whole array. In each iteration, the length of the slice is decreased by 1: either the left boundary index l increases or the right boundary index r decreases. When the slice length decreases to 1 (l == r), there are no pairs of different indexes inside the slice, so false is returned. This means that the algorithm halts for any input. The O(n) complexity is also immediately clear. The correctness remains to be proven.
We can assume there is a solution; if there is none, the analysis in the above paragraph applies and the branch returning true can never be executed.
The loop has an invariant (a statement that holds before and after every iteration): when a solution exists, it is either (l, r) itself or a sub-slice of it. Mathematically, such an invariant is a lemma -- not very useful by itself, but a stepping stone in the overall proof. We get the overall correctness by initially making (l, r) the whole array and observing that, as each iteration makes the slice shorter, the invariant ensures that we will eventually find the solution. Now, we just need to prove the invariant.
We will prove the invariant by induction. The induction base is trivial -- the initial slice (l, r) either is the solution, or contains it as a sub-slice. The hard part is the induction step, i.e. proving that when (l, r) contains the solution, either it is the solution itself or the slice for the next iteration contains the solution as a sub-slice.
When A[l] + A[r] == sum, (l, r) is the solution itself; the first condition in the loop is triggered, true is returned, and everyone is happy.
When A[l] + A[r] > sum, the slice for the next iteration is (l, r - 1), which still contains the solution. Let's prove that by contradiction, assuming (l, r - 1) does not contain the solution. How could that happen, when (l, r) contained the solution (by induction hypothesis)? The only way would be that the solution (i, j) has j == r (r is the only index we removed from the slice). Because by definition A[i] + A[j] == sum, we get A[i] + A[r] == sum < A[l] + A[r] in this branch. When we subtract A[r] from both sides of the inequality, we get A[i] < A[l]. But A[l] is the smallest value in the (l, r) slice (the array is sorted), so this is a contradiction.
When A[l] + A[r] < sum, the slice for the next iteration is (l + 1, r). The argument is symmetric to the previous case.
∎
The algorithm may easily be rewritten in recursive form, which simplifies the analysis at the expense of actual performance. This is the functional programming approach.
#define lli long long
// A and n are the same assumed globals as above; n is the size of the array

bool g(lli sum, int l, int r); // forward declaration added so f compiles

bool f(lli sum) {
    return g(sum, 0, n - 1);
}

// does the slice (l, r) contain indices i < j with A[i] + A[j] == sum?
bool g(lli sum, int l, int r) {
    if (l >= r) return 0;                                 // empty slice: no pair
    else if (A[l] + A[r] == sum) return 1;                // boundary pair is the solution
    else if (A[l] + A[r] > sum) return g(sum, l, r - 1);  // discard the largest element
    else return g(sum, l + 1, r);                         // discard the smallest element
}
The f function still contains the initialization, but it calls the new g function, which implements the original loop. Instead of keeping the state in local variables, it uses its parameters. Each call of the g function corresponds to a single iteration of the original loop.
The g function is a solution to a more general problem than the original one: Given a sorted array A, are there any two indices i and j such that i != j and A[i] + A[j] == sum and both i and j are between l and r (inclusive)?
This makes reading the analysis even simpler. The loop invariant is actually the proof of correctness of g and the structure of g guides the proof.

Related

Intuition behind initializing both pointers at the beginning versus one at the beginning and the other at the end

I solved a problem a few days ago:
Given an unsorted array A containing N integers and an integer B, find if there exists a pair of elements in the array whose difference is B. Return true if any such pair exists, else return false. For A = [2, 3, 5, 10, 50, 80] and B = 40, it should return true.
as:
int Solution::solve(vector<int> &A, int B) {
    if (A.size() == 1) return false;
    int i = 0, j = 0; // note: both initialized at the beginning
    sort(begin(A), end(A));
    while (i < (int)A.size() && j < (int)A.size()) {
        if (A[j] - A[i] == B && i != j) return true;
        if (A[j] - A[i] < B) j++; // difference too small: grow it
        else i++;                 // difference too large: shrink it
    }
    return false;
}
While solving this problem, the mistake I had committed earlier was initializing i=0 and j=A.size()-1. Due to this, decrementing j and incrementing i both decreased the difference, and so valid differences were missed. On initializing both at the beginning as above, I was able to solve the problem.
Now I am solving a follow-up 3sum problem:
Given an integer array nums, return all the triplets [nums[i], nums[j], nums[k]] such that i != j, i != k, and j != k, and nums[i] + nums[j] + nums[k] == 0. Notice that the solution set must not contain duplicate triplets. If nums = [-1,0,1,2,-1,-4], output should be: [[-1,-1,2],[-1,0,1]] (any order works).
A solution to this problem is given as:
vector<vector<int>> threeSum(vector<int>& nums) {
    sort(nums.begin(), nums.end());
    vector<vector<int>> res;
    for (unsigned int i = 0; i < nums.size(); i++) {
        if ((i > 0) && (nums[i] == nums[i-1]))
            continue; // skip duplicate anchors
        int l = i + 1, r = nums.size() - 1; // note: unlike `l`, `r` points to the end
        while (l < r) {
            int s = nums[i] + nums[l] + nums[r];
            if (s > 0) r--;
            else if (s < 0) l++;
            else {
                res.push_back(vector<int> {nums[i], nums[l], nums[r]});
                while (l < r && nums[l] == nums[l+1]) l++; // skip duplicates (bounds check added)
                while (l < r && nums[r] == nums[r-1]) r--; // skip duplicates (bounds check added)
                l++; r--;
            }
        }
    }
    return res;
}
The logic is pretty straightforward: each nums[i] (from the outer loop) is the 'target' that we search for in the inner while loop, using a two-pointer approach like in the first code at the top.
What I don't follow is the logic behind initializing r = nums.size()-1 and working backwards: how are valid differences (in this case, actually 'sums') not being missed?
Edit 1: Both problems contain negative and positive numbers, as well as zeroes.
Edit 2: I understand how both snippets work. My question specifically is the reasoning behind r = nums.size()-1 in code #2: as we see in code #1 above it, starting r from the end misses some valid pairs (http://cpp.sh/36y27 - the valid pair (10,50) is missed); so why do we not miss valid pair(s) in the second code?
Reformulating the problem
The difference between the two algorithms boils down to addition and subtraction, not 3 vs 2 sums.
Your 3-sum variant asks for the sum of 3 numbers matching a target. When you fix one number in the outer loop, the inner loop reduces to a genuine 2-sum (i.e. addition). The "2-sum" variant in your top code is really a 2-difference (i.e. subtraction).
You're comparing 2-sum (A[i] + A[j] == B s.t. i != j) to a 2-difference (A[i] - A[j] == B s.t. i != j). I'll use those terms going forward, and forget about the outer loop in 3-sum as a red herring.
2-sum
Why L = 0, R = length - 1 works for 2-sum
For 2-sum, you probably already see the intuition of starting at the ends and working towards the middle, but it's worth making the logic explicit.
At any iteration in the loop, if A[L] + A[R] > B, then we have no choice but to decrement the right pointer to a lower index. Incrementing the left pointer is guaranteed to increase our sum or leave it the same, taking us further from the target, and it could close off the possibility of finding the solution pair, which may well still include A[L].
On the other hand, if A[L] + A[R] < B, then you must increase your sum by moving the left pointer forward to a larger number. There's a chance A[R] is still part of that sum -- we can't rule it out until A[L] + A[R] > B.
The key takeaway is that there is no decision to be made at each step: either the answer was found or one of the two numbers at either index can be definitively eliminated from further consideration.
Why L = 0, R = 0 doesn't work for 2-sum
This explains why starting both pointers at 0 won't help for 2-sum. What rule would you use to advance the pointers? There's no way to know which pointer needs to move forward and which should wait. Either move can only increase the sum or leave it the same; neither can decrease it (the start, A[0] + A[0], is the minimum possible sum). Moving the wrong one could prohibit finding the solution later on, and there's no way to definitively eliminate either number.
You're back to keeping left at 0 and moving the right pointer forward to the first element that causes A[R] + A[L] > B, then running the tried-and-true two-pointer logic. You might as well just start R at length - 1.
2-difference
Why L = 0, R = length - 1 doesn't work for 2-difference
Now that we understand how 2-sum works, let's look at 2-difference. Why is it that the same approach starting from both ends and working towards the middle won't work?
The reason is that when you're subtracting two numbers, you lose the all-important guarantee from 2-sum that moving the left pointer forward will always increase the result and that moving the right pointer backwards will always decrease it.
For the difference of two numbers in a sorted array, A[R] - A[L] s.t. R > L, moving L forward and moving R backwards both decrease the difference, even in an array of only positive numbers. This means that at a given index, there's no way to know which pointer needs to move to find the correct pair later on, breaking the algorithm for the same reason as 2-sum with both pointers starting at 0.
Why L = 0, R = 0 works for 2-difference
Finally, why does starting both pointers at 0 work on 2-difference? The reason is that you're back to the 2-sum guarantee that moving one pointer increases the difference while the other decreases the difference. Specifically, if A[R] - A[L] < B, then L++ is guaranteed to decrease the difference, while R++ is guaranteed to increase it.
We're back in business: there is no choice or magical oracle necessary to decide which index to move. We can systematically eliminate values that are either too large or too small and home in on the target. The logic works for the same reasons L = 0, R = length - 1 works on 2-sum.
As an aside, the first solution is suboptimal: O(n log(n)) instead of O(n) time with two passes and O(n) space. You can use an unordered map to keep track of the items seen so far, then perform a lookup for every item in the array: if A[i] - B or A[i] + B for some i is in the map, you found your pair.
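A minimal sketch of that idea, assuming B >= 0 and using an unordered_set (the helper name hasPairWithDifference is mine, not from the answer):

#include <unordered_set>
#include <vector>

bool hasPairWithDifference(const std::vector<int>& A, int B) {
    std::unordered_set<int> seen;
    for (int x : A) {
        // a previously seen y pairs with x if x - y == B or y - x == B
        if (seen.count(x - B) || seen.count(x + B)) return true;
        seen.insert(x);
    }
    return false;
}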
Consider this:
A = {2, 3, 5, 10, 50, 80}
B = 40
i = 0, j = 5;
When you have something like
while (i < j) {
    if (A[j] - A[i] == B && i != j) return true;
    if (A[j] - A[i] > B) j--;
    else i++;
}
consider the case when if (A[j] - A[i] == B && i != j) is not true. Your code makes an incorrect assumption that if the difference of the two endpoints is > B then one should decrement j. Given a sorted array, you don't know whether decrementing j and then taking the difference would give you the target difference, or whether incrementing i and then taking the difference would, since it can go both ways. In your example, when A[5] - A[0] != 40 you could have gone both ways, to A[4] - A[0] (which is what you do) or to A[5] - A[1]. Both would still give you a difference greater than the target difference. In short, the presumption in your algorithm is incorrect and hence it isn't the right way to go about it.
In the second approach, that's not the case. When the triplet nums[i]+nums[l]+nums[r] isn't the one we want, you know that, since the array is sorted, if the sum was more than 0 then nums[r] needs to be decremented, because incrementing l would only increase the sum further (nums[l + 1] >= nums[l]).
Your question boils down to the following:
For a sorted array in ascending order A, why is it that we perform a different two-pointer search for t for the problem A[i] + A[j] == t versus A[i] - A[j] == t, where j > i?
It's more intuitive why, for the first problem, we can fix i and j at opposite ends and decrease j or increase i, so I'll focus on the second problem.
With array problems it's sometimes easiest to draw out the solution space, then come up with the algorithm from there. First, let's draw out the solution space B, where B[i][j] = -(A[i] - A[j]) (defined only for j > i):
B, for A of length N
  j ---------------------------->
i B[0][0] B[0][1] ... B[0][N - 1]
| B[1][0] B[1][1] ... B[1][N - 1]
| . . .
| . . .
| . . .
v B[N - 1][0] B[N - 1][1] ... B[N - 1][N - 1]
---
In terms of A:
X -(A[0] - A[1]) -(A[0] - A[2]) ... -(A[0] - A[N - 2]) -(A[0] - A[N - 1])
X X -(A[1] - A[2]) ... -(A[1] - A[N - 2]) -(A[1] - A[N - 1])
. . . . .
. . . . .
. . . . .
X X X ... X -(A[N - 2] - A[N - 1])
X X X ... X X
Notice that B[i][j] = A[j] - A[i], so the rows of B are in ascending order and the columns of B are in descending order. Let's compute B for A = [2, 3, 5, 10, 50, 80].
B = [
  j ------------------------>
i X 1 3 8 48 78
| X X 2 7 47 77
| X X X 5 45 75
| X X X X 40 70
| X X X X X 30
v X X X X X X
]
Now the equivalent problem is searching for t = 40 in B. Note that if we start with i = 0 and j = N - 1 = 5, there's no good/guaranteed way to reach 40. However, if we start in a position where we can always increment/decrement our current element in B in small steps, we can guarantee that we'll get as close to t as possible.
In this case, the small steps involve traversing right/downwards in the matrix, starting from the top left (we could equivalently traverse left/upwards from the bottom right), which corresponds to incrementing both i and j on the original array A.
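This staircase walk over the implicit matrix is exactly the two-pointer scan with both indices starting at 0. A sketch, assuming a sorted array and t >= 0 (hasDifference is a hypothetical name):

#include <vector>

bool hasDifference(const std::vector<int>& A, int t) {
    std::size_t i = 0, j = 0;
    while (i < A.size() && j < A.size()) {
        long long d = (long long)A[j] - A[i]; // current element B[i][j]
        if (d == t && i != j) return true;
        if (d < t) ++j; // step right: values in a row ascend
        else ++i;       // step down: values in a column descend
    }
    return false;
}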

[Competitive Programming]: How do I optimise this brute force method? [duplicate]

If n numbers are given, how would I find the total number of possible triangles? Is there any method that does this in less than O(n^3) time?
I am considering the conditions a+b>c, b+c>a and a+c>b for being a triangle.
Assume there are no equal numbers among the given n and that it's allowed to use a number more than once. For example, given the numbers {1,2,3}, we can create 7 triangles:
1 1 1
1 2 2
1 3 3
2 2 2
2 2 3
2 3 3
3 3 3
If any of those assumptions isn't true, it's easy to modify the algorithm.
Here I present an algorithm which takes O(n^2) time in the worst case:
Sort the numbers (ascending order).
We will take triples ai <= aj <= ak, such that i <= j <= k.
For each pair (i, j), find the largest k that satisfies ak < ai + aj. Then every triple (ai, aj, al) with j <= l <= k is a triangle (because ak >= aj >= ai, the only condition that can be violated is ak < ai + aj).
Consider two pairs (i, j1) and (i, j2) with j1 <= j2. It's easy to see that the k found for (i, j2) is >= the k found for (i, j1). It means that as you iterate over j, you only need to check numbers starting from the previous k. This gives O(n) time for each particular i, which implies O(n^2) for the whole algorithm.
C++ source code:
#include <algorithm>

int Solve(int* a, int n)
{
    int answer = 0;
    std::sort(a, a + n);
    for (int i = 0; i < n; ++i)
    {
        int k = i;
        for (int j = i; j < n; ++j)
        {
            // advance k to the first index with a[k] >= a[i] + a[j];
            // k never moves backwards while j grows
            while (n > k && a[i] + a[j] > a[k])
                ++k;
            answer += k - j; // valid third sides are a[j..k-1]
        }
    }
    return answer;
}
Update for downvoters:
This definitely is O(n^2)! Please read carefully the chapter on amortized analysis in "Introduction to Algorithms" by Thomas H. Cormen et al. (17.2 in the second edition).
Finding complexity by counting nested loops is sometimes completely wrong.
Here I try to explain it as simply as I can. Let's fix the variable i. Then for that i, we iterate j from i to n (an O(n) operation), and the internal while loop iterates k from i to n over all those j together (also O(n) in total). Note: I don't restart the while loop from the beginning for each j. We also need to do this for each i from 0 to n. So it gives us n * (O(n) + O(n)) = O(n^2).
There is a simple algorithm in O(n^2*logn).
Assume you want all triangles as triples (a, b, c) where a <= b <= c.
There are 3 triangle inequalities, but only a + b > c suffices (the others then hold trivially).
And now:
Sort the sequence in O(n * logn), e.g. by merge-sort.
For each pair (a, b) with a <= b, the remaining value c needs to be at least b and less than a + b.
So you need to count the number of items in the interval [b, a+b).
This can simply be done by binary-searching for a+b (O(logn)) and counting the number of items in [b, a+b).
All together O(n * logn + n^2 * logn), which is O(n^2 * logn). Hope this helps.
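A compact sketch of this counting with std::lower_bound, counting index-distinct triples i < j < k (under the reuse assumption earlier in the thread the inner range would start at j instead of j + 1; countTriangles is a hypothetical name):

#include <algorithm>
#include <vector>

long long countTriangles(std::vector<int> a) {
    std::sort(a.begin(), a.end());
    long long count = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            // valid c lie in [a[j], a[i] + a[j]) and sit after index j
            auto end = std::lower_bound(a.begin() + j + 1, a.end(), a[i] + a[j]);
            count += end - (a.begin() + j + 1);
        }
    return count;
}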
If you use a binary sort, that's O(n log(n)), right? Keep your binary tree handy, and for each pair (a, b) with a <= b, count the elements c such that c >= b and c < (a + b).
Let a, b and c be the three sides. The below condition must hold for a triangle (the sum of any two sides must be greater than the third side):
i) a + b > c
ii) b + c > a
iii) a + c > b
Following are the steps to count triangles.
1. Sort the array in non-decreasing order.
2. Initialize two pointers 'i' and 'j' to the first and second elements respectively, and initialize the count of triangles as 0.
3. Fix 'i' and 'j' and find the rightmost index 'k' (or largest 'arr[k]') such that 'arr[i] + arr[j] > arr[k]'. The number of triangles that can be formed with 'arr[i]' and 'arr[j]' as two sides is 'k - j'. Add 'k - j' to the count of triangles.
Let us consider 'arr[i]' as 'a', 'arr[j]' as 'b' and all elements between 'arr[j+1]' and 'arr[k]' as 'c'. The above-mentioned conditions (ii) and (iii) are satisfied because 'arr[i] < arr[j] < arr[k]', and we check for condition (i) when we pick 'k'.
4. Increment 'j' to fix the second element again.
Note that in step 3, we can use the previous value of 'k'. The reason is simple: if we know that 'arr[i] + arr[j-1]' is greater than 'arr[k]', then we can say 'arr[i] + arr[j]' will also be greater than 'arr[k]', because the array is sorted in increasing order.
5. If 'j' has reached the end, then increment 'i'. Initialize 'j' as 'i + 1', 'k' as 'i + 2' and repeat steps 3 and 4.
Time Complexity: O(n^2).
The time complexity looks higher because of the 3 nested loops. If we take a closer look at the algorithm, we observe that 'k' is initialized only once in the outermost loop. The innermost loop executes at most O(n) times in total for every iteration of the outermost loop, because 'k' starts from 'i+2' and goes up to 'n' across all values of 'j'. Therefore, the time complexity is O(n^2).
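A minimal C++ sketch of these steps, counting index-distinct triples and assuming positive side lengths (countTrianglesTwoPointer is a hypothetical name, not from the original answer):

#include <algorithm>
#include <vector>

long long countTrianglesTwoPointer(std::vector<int> arr) {
    std::sort(arr.begin(), arr.end());        // step 1
    long long count = 0;
    int n = arr.size();
    for (int i = 0; i < n - 2; ++i) {
        int k = i + 2;                        // k is reused across j (note in step 3)
        for (int j = i + 1; j < n - 1; ++j) { // steps 2, 4 and 5
            while (k < n && arr[i] + arr[j] > arr[k])
                ++k;                          // step 3: push k right
            count += k - j - 1;               // third sides are arr[j+1..k-1]
        }
    }
    return count;
}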
I have worked out an algorithm that runs in O(n^2 lgn) time. I think it's correct...
The code is written in C++...
int Search_Closest(int A[], int p, int q, int n) /* Returns the index of the element closest to n in array A[p..q] */
{
    if (p < q)
    {
        int r = (p + q) / 2;
        if (n == A[r])
            return r;
        if (p == r)
            return r;
        if (n < A[r])
            return Search_Closest(A, p, r, n); // 'return' added: the original fell through
        else
            return Search_Closest(A, r, q, n); // 'return' added
    }
    else
        return p;
}
int no_of_triangles(int A[], int p, int q) /* Returns the no of triangles possible in A[p..q] */
{
    int sum = 0;
    Quicksort(A, p, q); // Sorts the array A[p..q] in O(n lg n) expected case time
    for (int i = p; i <= q; i++)
        for (int j = i + 1; j <= q; j++)
        {
            int c = A[i] + A[j];
            int k = Search_Closest(A, j, q, c);
            /* no of triangles formed with A[i] and A[j] as two sides is (k+1)-2
               if A[k] is smaller than or equal to c, else it is (k+1)-3; as the
               index starts from zero we need to add 1 to the value */
            if (A[k] > c)
                sum += k - 2;
            else
                sum += k - 1;
        }
    return sum;
}
Hope it helps.
A possible answer: we can use binary search to find the value of 'k', and hence improve the time complexity!
Sort the numbers N0, N1, N2, ... Nn-1 into X0, X1, X2, ... Xn-1 such that X0 >= X1 >= X2 >= ... >= Xn-1.
Choose X0 (and so on, up to Xn-3) as the largest side, then choose the remaining two items from the rest, e.g. the case (X0, X1, X2).
Check X0 < X1 + X2.
If the check passes, we have found a triangle and continue; if it fails, skip the remaining choices for this largest side.
It seems there is no algorithm better than O(n^3). In the worst case, the result set itself has O(n^3) elements.
For example, if n equal numbers are given, the algorithm has to return n*(n-1)*(n-2) results.

Given a sorted array and a parameter k, find the count of pairs of two numbers with sum greater than or equal to k in linear time

I am trying to find all pairs in an array with sum equal to k. My current solution takes O(n*log(n)) time (code snippet below). Can anybody help me find a better solution, O(n) or O(lg n) maybe (if it exists)?
#include <iostream>
#include <map>
using namespace std;

int main() { // wrapper and declarations added so the snippet compiles
    map<int,int> mymap;
    map<int,int>::iterator it;
    int n, k, a, cnt = 0;
    cin >> n >> k;
    for (int i = 0; i < n; i++) {
        cin >> a;
        if (mymap.find(a) != mymap.end())
            mymap[a]++;
        else
            mymap[a] = 1;
    }
    for (it = mymap.begin(); it != mymap.end(); it++) {
        int val = it->first;
        if (mymap.find(k - val) != mymap.end()) {
            cnt += min(it->second, mymap.find(k - val)->second);
            it->second = 0; // zero out so the pair isn't counted twice
        }
    }
    cout << cnt;
}
Another approach, which takes O(log n) in the best case and O(n log n) in the worst case for positive numbers, works as follows:
Find the element in the array that is equal to k/2, or, if it doesn't exist, the minimum element greater than k/2. All combinations of this element and the greater elements are interesting for us, because p + s >= k when p >= k/2 and s >= k/2. The array is sorted, so binary search with some modifications can be used. This step takes O(log n) time.
All elements less than k/2, paired with elements greater than or equal to their "mirror elements" (with respect to the median k/2), are also interesting for us, because p + s >= k when p = k/2 - t and s >= k/2 + t. Here we need to loop through the elements less than k/2 and find their mirror elements (binary search). The loop should be stopped when the mirror element is greater than the last array element.
For instance, for the array {1,3,5,8,11} and k = 10, on the first step we have k/2 = 5 and the pairs {5,8}, {5,11}, {8,11}. The count of these pairs is given by the formula l * (l - 1) / 2, where l is the count of elements >= k/2. In our case l = 3, so count = 3*2/2 = 3.
On the second step, for the number 3 the mirror element is 7 (5-2=3 and 5+2=7), so the pairs {3, 8} and {3, 11} are interesting. For the number 1 the mirror is 9 (5-4=1 and 5+4=9), so {1, 11} is what we look for.
So, if k/2 is less than the first array element, this algorithm is O(log n).
For negative numbers the algorithm is a little more complex but can be solved with the same complexity.
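A minimal sketch of the idea with std::lower_bound, assuming a sorted array of positive numbers and counting pairs of distinct indices (countPairsAtLeastK is a hypothetical name):

#include <algorithm>
#include <vector>

long long countPairsAtLeastK(const std::vector<int>& a, int k) {
    // elements >= ceil(k/2) all pair with each other: l*(l-1)/2 pairs
    auto half = std::lower_bound(a.begin(), a.end(), (k + 1) / 2);
    long long l = a.end() - half;
    long long count = l * (l - 1) / 2;
    // walk the smaller elements from largest to smallest; the required
    // "mirror" k - x only grows, so we can stop at the first miss
    for (auto it = half; it != a.begin(); ) {
        --it;
        auto mirror = std::lower_bound(half, a.end(), k - *it);
        if (mirror == a.end()) break;
        count += a.end() - mirror;
    }
    return count;
}

For {1,3,5,8,11} and k = 10 this returns 6: the 3 pairs from the first step plus {3,8}, {3,11} and {1,11}.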
There exists a rather simple O(n) approach using the so-called "two pointers" or "two iterators" approach. The key idea is to have two iterators (not necessarily C++ iterators, indices would do too) running over the same array, so that if the first iterator points to value x, then the second iterator points to the maximal element in the array that is less than k-x.
We will be increasing the first iterator, and while doing this we'll also change the second iterator to maintain this property. Note that as the first pointer increases, the corresponding position of the second pointer will only decrease, so on every iteration we can start from the position where we stopped at the previous iteration; we will never need to increase the second pointer. This is how we achieve O(n) time.
The code is like this (I did not test it, but the idea should be clear):
vector<int> a;     // the given (sorted) array
int k;             // the parameter
long long ans = 0; // declarations added so the snippet is self-contained
int r = a.size() - 1;
for (int l = 0; l < (int)a.size(); l++) {
    while ((r >= 0) && (a[r] >= k - a[l]))
        r--;
    // now r is the maximal position in a such that a[r] < k - a[l],
    // so all elements to the right of r form a needed pair with a[l]
    ans += a.size() - r - 1; // this is how many pairs we have starting at l
}
Another approach, which might be simpler to code but a bit slower, is O(n log n) using binary search. For each element a[l] of the array, you can find the maximal position r such that a[r] < k - a[l] using binary search (this is the same r as in the first algorithm).
@Drew Dormann - thanks for the remark.
Run through the array with two pointers, left and right.
Assuming left is the small side, start with left at location 0, and move right towards left until a[left] + a[right] >= k holds for the last time.
When this is achieved, total_count += (a.size - right + 1).
You then move left one step forward, and right needs to (maybe) move towards it. Repeat this until they meet.
When this is done, and say they met at location x, then total_count += choose(2, a.size - x).
Sort the array (n log n)
for (i = 1 to n)
    Start at the root
    if a[i] + curr_node >= k, go left and match = indexof(curr_node)
    else, go right
    If curr_node is a leaf node, add all nodes after a[match] to the list of valid pairs with a[i]
Step 2 also takes O(n log n). The for loop runs n times. Within the loop, we perform a binary search for each node, i.e. log n steps. Hence the overall complexity of the algorithm is O(n log n).
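The same plan with std::lower_bound standing in for the explicit tree might look like this (a hypothetical helper; note it counts ordered pairs, including an element paired with itself, so adjust if the problem wants distinct unordered pairs):

#include <algorithm>
#include <vector>

long long countPairsBinarySearch(std::vector<int> a, int k) {
    std::sort(a.begin(), a.end());
    long long count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // first element x with a[i] + x >= k, i.e. x >= k - a[i]
        auto it = std::lower_bound(a.begin(), a.end(), k - a[i]);
        count += a.end() - it; // every element from it onward pairs with a[i]
    }
    return count;
}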
This should do the work:
#include <iostream>
using namespace std;

void count(int A[], int n) // n being the number of terms in the array
{
    int i, j, k, cnt = 0; // 'cnt' renamed from 'count' to avoid shadowing the function name
    cin >> k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (A[i] + A[j] >= k)
                cnt++;
    cout << "There are " << cnt << " such numbers";
}

Filling an array in such a way that each element is equal to the minimum sum of two numbers such that

Given an array (containing only positive integers) that already has its first k elements: a1, a2, ..., ak.
I need to fill in the remaining (n - k) elements (the array has n elements in total).
The value of n is about 10^3 and 1 <= k <= n.
The value of each ai (for i > k) is the minimum sum of two elements whose positions sum to i.
Here is the pseudocode (my algorithm):
for i = k + 1 to n
    a[i] = max_value
    for j = 1 to (i / 2)
        a[i] = min(a[i], a[j] + a[i - j])
Time complexity: O(n ^ 2)
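In C++, the same baseline might look like this (a sketch only; fillMinPairSums is a hypothetical name, and index 0 is left unused to keep the pseudocode's 1-based positions):

#include <algorithm>
#include <climits>
#include <vector>

// fills a[k+1..n] in place, given a[1..k]; a must have size n + 1
void fillMinPairSums(std::vector<long long>& a, int k, int n) {
    for (int i = k + 1; i <= n; ++i) {
        a[i] = LLONG_MAX;
        for (int j = 1; j <= i / 2; ++j)
            a[i] = std::min(a[i], a[j] + a[i - j]);
    }
}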
The question: Is there any other way to do this faster?
I'm looking for any data structures or algorithms that can find the value of each ai in less than O(n).
P.S.: This is a procedure in my program, so I need it to be as fast as possible.
You could increase your program speed by using threads to run your check for the minimum value in parallel. For example you could run 4 threads each of which checks 1/4 of the range of j. This will improve the speed marginally, but your algorithm will still take O(n^2) running time.
I agree with the comment that you most likely can't get beyond O(n^2). So your best bet will probably be to try things like this to optimize your code to reduce the coefficient in front of that n^2.
Idea 1
AFAICT this will not give a guaranteed improvement on O(n^2), but it should bring the number of inner loop cycles down a lot in practice. The basic idea is that we can test pairs in the inner loop in a different order that enables us to finish early a lot of the time. Specifically we first make a sorted list of positions of the numbers and store this in s[], so that a[s[i]] is the ith smallest number in a[]. Then in the main inner loop, we form pair-sums in increasing order of the first term by using a[s[j]] instead of a[j] (and a[i - s[j]] instead of a[i - j]). This gives us 2 ways to stop the inner loop early:
If a[s[j]] >= a[i], then we can stop because every later sum must be larger, since the first term of each of them (a[s[j+1]] etc.) must be at least as large as the best solution so far (already in a[i]), and the other term can never be negative.
If a[i - s[j]] <= a[s[j]] (that is, if the "partner" of the next smallest number is less than or equal to it), we can stop, for a more complicated reason. Suppose to the contrary that there was some better pair-sum a[s[m]] + a[i - s[m]] later on (i.e. m > j). We know the first term, a[s[m]], must be at least as large as our current first term a[s[j]] because we're accessing first terms in increasing order, and m > j; therefore, for the pair-sum a[s[m]] + a[i - s[m]] to be better (i.e. less) than a[s[j]] + a[i - s[j]], it must be that its second term, a[i - s[m]], is less than our current second term, a[i - s[j]]. (This is not a sufficient condition, but that doesn't matter here.) But since we have just observed that a[i - s[j]] <= a[s[j]], we know that a[i - s[m]] < a[s[j]] too, which means that a[i - s[m]] must have appeared as a first term in a pair-sum that we already processed earlier on! That contradicts m > j, meaning that no such better pair-sum can exist, so we can safely stop.
I expect the second condition to remove a lot of inner loop cycles; the first will probably only help much on datasets where there are a few small numbers and a lot of very high numbers, and it is possible to cover most positions with pair-sums of the small numbers.
Bonus efficiency: If we implement the second condition above then we don't actually need a separate j < i / 2 loop termination test, since after examining any i / 2 + 1 pair-sums, we must have encountered at least one pair-sum twice (once with the first and second terms swapped), and this will cause condition 2 to fire and exit the loop.
Pseudocode:
s[1 .. k] = 1 .. k
sort s using comparator function comp(i, j) { a[i] < a[j] }
for i = k + 1 to n
a[i] = max_value
for (j = 1; a[s[j]] < a[i] && a[i - s[j]] > a[s[j]]; ++j)
a[i] = min(a[i], a[s[j]] + a[i - s[j]])

Given an array of N numbers, find the number of sequences of all lengths having a range of R

This is a follow-up question to: Given a sequence of N numbers, extract the number of sequences of length K having range less than R?
I basically need a vector V of size N as an answer, such that V[i] denotes the number of sequences of length i which have range <= R.
Traditionally, in recursive solutions, you would compute the solution for K = 0, K = 1, and then find some kind of recurrence relation between subsequent elements to avoid recomputing the solution from scratch each time.
However here I believe that maybe attacking the problem from the other side would be interesting, because of the property of the spread:
Given a sequence of spread R (or less), any subsequence has a spread of at most R as well
Therefore, I would first establish a list of the longest subsequences of spread R beginning at each index. Let's call this list M, and have M[i] = j where j is the highest index in S (the original sequence) for which S[j] - S[i] <= R. This is going to be O(N).
Now, for any i, the number of sequences of length K starting at i is either 0 or 1, depending on whether K is greater than M[i] - i or not. A simple linear pass over M (from 0 to N-K) gives us the answer. This is once again O(N).
So, if we call V the resulting vector, with V[k] denoting the number of subsequences of length K in S with spread at most R, then we can do it in a single iteration over M:
for i in [0, len(M)]:
    for k in [0, M[i] - i]:
        ++V[k]
The algorithm is simple; however, the number of updates can be rather daunting. In the worst case, supposing that M[i] - i equals N - i, it is O(N*N) complexity. You would need a better data structure (probably an adaptation of a Fenwick tree) to use this algorithm and lower the cost of computing those numbers.
If you are looking for contiguous sequences, try doing it recursively: the set of K-length subsequences having a range less than R is included in the set of (K-1)-length subsequences.
At K = 0, you have N solutions.
Each time you increase K, you append (resp. prepend) the next (resp. previous) element, check if the range is less than R, and either store it in a set (look out for duplicates!) or discard it depending on the result.
I think the complexity of this algorithm is O(n*n) in the worst-case scenario, though it may be better on average.
I think Matthieu has the right answer when looking for all sequences with spread R.
As you are only looking for sequences of length K, you can do a little better.
Instead of looking at the maximal sequence starting at i, just look at the sequence of length K starting at i, and see if it has range R or not. Do this for every i, and you have all sequences of length K with spread R.
You don't need to go through the whole list, as the latest start point for a sequence of length K is n-K+1. So the complexity is something like (n-K+1)*K = n*K - K*K + K. For K=1 this is n, and for K=n it is n. For K=n/2 it is n*n/2 - n*n/4 + n/2 = n*n/4 + n/2, which I think is the maximum. So while this is still O(n*n), for most values of K you get a little better.
Start with a simpler problem: count the maximal length of sequences starting at each index and having a range equal to R.
To do this, let the first pointer point to the first element of the array. Increase the second pointer (also starting from the first element of the array) while the sequence between the pointers has a range less than or equal to R. Push every array element passed by the second pointer to a min-max queue, made of a pair of min-max stacks, as described in this answer. When the difference between the max and min values reported by the min-max queue exceeds R, stop increasing the second pointer, increment V[ptr2 - ptr1], increment the first pointer (removing the element pointed to by it from the min-max queue), and continue increasing the second pointer (keeping the range under control).
When the second pointer leaves the bounds of the array, increment V[N - ptr1] for all remaining ptr1 (the corresponding ranges may be less than or equal to R). To add all other ranges that are less than R, compute the cumulative sum of the array V[], starting from its end.
Both time and space complexities are O(N).
Pseudo-code:
p1 = p2 = 0;
do {
    do {
        min_max_queue.push(a[p2]);
        ++p2;
    } while (p2 < N && min_max_queue.range() <= R);
    if (p2 < N) {
        ++v[p2 - p1 - 1];
        min_max_queue.pop();
        ++p1;
    }
} while (p2 < N);

for (i = 1; i <= N - p1; ++i) {
    ++v[i];
}

sum = 0;
for (j = N; j > 0; --j) {
    value = v[j];
    v[j] += sum;
    sum += value;
}
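For completeness, here is one possible min_max_queue offering the push/pop/range operations the pseudocode uses; this sketch is built from monotonic deques rather than the pair-of-stacks construction linked in the answer (a substitution, not the answer's original design):

#include <deque>

struct MinMaxQueue {
    std::deque<int> q, mins, maxs; // mins kept ascending, maxs kept descending
    void push(int x) {
        q.push_back(x);
        while (!mins.empty() && mins.back() > x) mins.pop_back();
        mins.push_back(x);
        while (!maxs.empty() && maxs.back() < x) maxs.pop_back();
        maxs.push_back(x);
    }
    void pop() { // removes the oldest element
        int x = q.front(); q.pop_front();
        if (x == mins.front()) mins.pop_front();
        if (x == maxs.front()) maxs.pop_front();
    }
    int range() const { return maxs.front() - mins.front(); }
};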