Minimum in bitonic array with plateaus - c++

I'm trying to find the minimum in an array which, in general, first decreases and then increases (a bitonic "valley"), possibly with flat steps along the way.
The array consists of non-negative integers in [0; 1e5-1]. It may contain any number of such steps, be fully sorted, or even be constant. I want to find the minimum in O(log n), which is why I'm using binary search. This code handles all cases except those where there is a plateau:
const size_t size = arr.size();
size_t left = 0, right = size - 1;
while (left < right) {
    const size_t mid = left + (right - left) / 2;
    // mid is a strict local minimum: done
    if ((mid == 0 || arr[mid] < arr[mid - 1]) && (mid + 1 == size || arr[mid] < arr[mid + 1])) {
        return mid;
    }
    if (arr[mid] > arr[mid + 1] || arr[mid] > arr[right]) {
        left = mid + 1;   // still descending: the minimum is to the right
    }
    else {
        right = mid;
    }
}
return left;
Example of bad input: [4, 3, 3, 2, 1, 2].
Unfortunately, I'm out of ideas on how to fix these cases. Maybe it's even impossible. Thank you in advance.

I am afraid it is not possible to do in O(log n) time in general.
Assume an array of n elements equal to 1 and a single element equal to 0.
Your problem now reduces to finding that 0 element.
By "visiting" (= indexing) any 1 you gain no knowledge about the position of the 0, making the search order irrelevant.
Therefore you have to visit every element to find where the 0 is.
If you really want, I think the following algorithm should be roughly O(log n + #elements-on-plateaus):
1. Set left, right as for binary search.
2. Compute middle.
3. Go left from middle until:
   - If you find a decrease, set right = pos, where pos is the decreased element, and go to 4.
   - If you find an increase, set left = pos, where pos is the increased element, and go to 4.
   - If you reach the left position, go right from middle instead and take the analogous actions.
   - [X] If you reach right too, you are on a plateau and the range [left, right] holds the minimal elements of the array.
4. Repeat until you hit [X].
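A C++ sketch of the above (assuming a non-empty, valley-shaped input as in the question; only lightly tested):

#include <cstddef>
#include <vector>

// Sketch of the plateau-skipping search above. Worst case O(n) when the
// minimum sits on a wide plateau, as argued at the top of this answer.
size_t find_min_with_plateaus(const std::vector<int>& arr) {
    size_t left = 0, right = arr.size() - 1;
    while (left < right) {
        const size_t mid = left + (right - left) / 2;
        // Step 3: walk left from mid across any plateau.
        size_t p = mid;
        while (p > left && arr[p - 1] == arr[mid]) --p;
        if (p > left && arr[p - 1] < arr[mid]) { right = p - 1; continue; } // decrease
        if (p > left && arr[p - 1] > arr[mid]) { left  = p;     continue; } // increase
        // Reached `left`: walk right from mid instead.
        size_t q = mid;
        while (q < right && arr[q + 1] == arr[mid]) ++q;
        if (q < right && arr[q + 1] < arr[mid]) { left = q + 1; continue; } // decrease
        // Increase on the right, or flat all the way: [left, q] is minimal.
        return left;
    }
    return left;
}

On the bad input [4, 3, 3, 2, 1, 2] this walks off the plateau at indices 1-2 and returns index 4.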


Intuition behind initializing both the pointers at the beginning versus one at the beginning and other at the ending

I solved a problem a few days ago:
Given an unsorted array A containing N integers and an integer B, find if there exists a pair of elements in the array whose difference is B. Return true if any such pair exists, else return false. For [2, 3, 5, 10, 50, 80] and B=40, it should return true.
as:
int Solution::solve(vector<int> &A, int B) {
    if (A.size() == 1) return false;
    size_t i = 0, j = 0; // note: both initialized at the beginning
    sort(begin(A), end(A));
    while (i < A.size() && j < A.size()) {
        if (A[j] - A[i] == B && i != j) return true;
        if (A[j] - A[i] < B) j++;
        else i++;
    }
    return false;
}
While solving this problem, the mistake I had committed earlier was initializing i=0 and j=A.size()-1. Due to this, decrementing j and incrementing i both decreased the difference, and so valid differences were missed. On initializing both at the beginning as above, I was able to solve the problem.
Now I am solving a follow-up 3sum problem:
Given an integer array nums, return all the triplets [nums[i], nums[j], nums[k]] such that i != j, i != k, and j != k, and nums[i] + nums[j] + nums[k] == 0. Notice that the solution set must not contain duplicate triplets. If nums = [-1,0,1,2,-1,-4], output should be: [[-1,-1,2],[-1,0,1]] (any order works).
A solution to this problem is given as:
vector<vector<int>> threeSum(vector<int>& nums) {
    sort(nums.begin(), nums.end());
    vector<vector<int>> res;
    for (unsigned int i = 0; i < nums.size(); i++) {
        if ((i > 0) && (nums[i] == nums[i - 1]))
            continue;
        int l = i + 1, r = nums.size() - 1; // note: unlike `l`, `r` points to the end
        while (l < r) {
            int s = nums[i] + nums[l] + nums[r];
            if (s > 0) r--;
            else if (s < 0) l++;
            else {
                res.push_back(vector<int> {nums[i], nums[l], nums[r]});
                while (l < r && nums[l] == nums[l + 1]) l++; // skip duplicates (bounds checks added)
                while (l < r && nums[r] == nums[r - 1]) r--;
                l++; r--;
            }
        }
    }
    return res;
}
The logic is pretty straightforward: each nums[i] (from the outer loop) is the 'target' that we search for in the inner while loop, using a two-pointer approach like in the first code at the top.
What I don't follow is the logic behind initializing r=nums.size()-1 and working backwards - how are valid differences (in this case, the 'sum's actually) not being missed?
Edit1: Both problems contain negative and positive numbers, as well as zeroes.
Edit2: I understand how both snippets work. My question specifically is the reasoning behind r=nums.size()-1 in code #2: as we see in code #1 above it, starting r from the end misses some valid pairs (http://cpp.sh/36y27 - the valid pair (10,50) is missed); so why do we not miss valid pair(s) in the second code?
Reformulating the problem
The difference between the two algorithms boils down to addition and subtraction, not 3 vs 2 sums.
Your 3-sum variant asks for the sum of 3 numbers matching a target. When you fix one number in the outer loop, the inner loop reduces to a genuine 2-sum (i.e. addition). The "2-sum" variant in your top code is really a 2-difference (i.e. subtraction).
You're comparing a 2-sum (A[i] + A[j] == B s.t. i != j) to a 2-difference (A[i] - A[j] == B s.t. i != j). I'll use those terms going forward, and treat the outer loop in 3-sum as the red herring it is.
2-sum
Why L = 0, R = length - 1 works for 2-sum
For 2-sum, you probably already see the intuition of starting at the ends and working towards the middle, but it's worth making the logic explicit.
At any iteration in the loop, if A[L] + A[R] > B, then we have no choice but to decrement the right pointer to a lower index. Incrementing the left pointer is guaranteed to increase our sum or leave it the same, taking us further from the target and potentially closing off the chance to find the solution pair, which may well still include A[L].
On the other hand, if A[L] + A[R] < B, then you must increase your sum by moving the left pointer forward to a larger number. There's a chance A[R] is still part of that sum -- we can't guarantee it's not a part of the sum until A[L] + A[R] > B.
The key takeaway is that there is no decision to be made at each step: either the answer was found or one of the two numbers at either index can be definitively eliminated from further consideration.
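For reference, a minimal sketch of that scan (assuming A is sorted ascending):

#include <vector>

// Two-pointer 2-sum: does some A[L] + A[R] == B with L < R exist?
bool twoSum(const std::vector<int>& A, int B) {
    int L = 0, R = static_cast<int>(A.size()) - 1;
    while (L < R) {
        const int s = A[L] + A[R];
        if (s == B) return true;
        if (s > B)  --R;  // A[R] is too big for every remaining partner
        else        ++L;  // A[L] is too small for every remaining partner
    }
    return false;
}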
Why L = 0, R = 0 doesn't work for 2-sum
This explains why starting both numbers at 0 won't help for 2-sum. What rule would you use to increment the pointers to find a solution? There's no way to know which pointer needs to move forward and which should wait. Both moves increase the sum at best and neither move decreases the sum (the start is the minimum sum, A[0] + A[0]). Moving the wrong one could prohibit finding the solution later on, and there's no way to definitively eliminate either number.
You're back to keeping left at 0 and moving the right pointer forward to the first element that causes A[R] + A[L] > B, then running the tried-and-true original two-pointer logic. You might as well just start R at length - 1.
2-difference
Why L = 0, R = length - 1 doesn't work for 2-difference
Now that we understand how 2-sum works, let's look at 2-difference. Why is it that the same approach starting from both ends and working towards the middle won't work?
The reason is that when you're subtracting two numbers, you lose the all-important guarantee from 2-sum that moving the left pointer forward always increases the result and moving the right pointer backwards always decreases it.
For a difference of two numbers in a sorted array, A[R] - A[L] s.t. R > L, moving L forward and moving R backwards both decrease the difference, even in an array of only positive numbers. This means that at a given index, there's no way to know which pointer needs to move to find the correct pair later on, breaking the algorithm for the same reason as 2-sum with both pointers starting at 0.
Why L = 0, R = 0 works for 2-difference
Finally, why does starting both pointers at 0 work on 2-difference? The reason is that you're back to the 2-sum guarantee that moving one pointer increases the difference while the other decreases the difference. Specifically, if A[R] - A[L] < B, then L++ is guaranteed to decrease the difference, while R++ is guaranteed to increase it.
We're back in business: there is no choice or magical oracle necessary to decide which index to move. We can systematically eliminate values that are either too large or too small and hone in on the target. The logic works for the same reasons L = 0, R = length - 1 works on 2-sum.
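A minimal sketch of that scan, mirroring the asker's first snippet (assuming A is sorted ascending):

#include <vector>

// Two-pointer 2-difference: does some A[R] - A[L] == B with R != L exist?
bool twoDifference(const std::vector<int>& A, int B) {
    size_t L = 0, R = 0;
    while (L < A.size() && R < A.size()) {
        const int d = A[R] - A[L];
        if (d == B && L != R) return true;
        if (d < B) ++R;  // difference too small: move R forward to grow it
        else       ++L;  // difference too large (or L == R): move L forward to shrink it
    }
    return false;
}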
As an aside, the first solution is suboptimal: O(n log(n)) instead of O(n) with a single pass and O(n) space. You can use a hash set to keep track of the items seen so far, then perform a lookup for every item: if A[i] - B or A[i] + B for the current A[i] is in the set, you found your pair.
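A one-pass sketch of that idea (hasPairWithDifference is a hypothetical name; works for any sign of B):

#include <unordered_set>
#include <vector>

// O(n) average: a pair with difference B exists iff, for some A[i], the value
// A[i] - B or A[i] + B was already seen earlier in the scan.
bool hasPairWithDifference(const std::vector<int>& A, int B) {
    std::unordered_set<int> seen;
    for (int x : A) {
        if (seen.count(x - B) || seen.count(x + B)) return true;
        seen.insert(x);
    }
    return false;
}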
Consider this:
A = {2, 3, 5, 10, 50, 80}
B = 40
i = 0, j = 5;
When you have something like
while (i < j) {
    if (A[j] - A[i] == B && i != j) return true;
    if (A[j] - A[i] > B) j--;
    else i++;
}
consider the case when if (A[j] - A[i] == B && i != j) is not true. Your code makes an incorrect assumption: that if the difference of the two endpoints is > B, then one should decrement j. Given a sorted array, you don't know whether decrementing j and then taking the difference would give you the target difference, or whether incrementing i and then taking the difference would, since it can go both ways. In your example, when A[5] - A[0] != 40 you could have gone both ways, to A[4] - A[0] (which is what you do) or to A[5] - A[1]. Both would still give you a difference greater than the target. In short, the presumption in your algorithm is incorrect, and hence it isn't the right way to go about it.
In the second approach, that's not the case. When the triplet nums[i]+nums[l]+nums[r] doesn't hit the target, you know that the array is sorted, and if the sum was more than 0, it has to mean that nums[r] needs to be decremented, since incrementing l would only increase the sum further, because nums[l + 1] >= nums[l].
Your question boils down to the following:
For a sorted array in ascending order A, why is it that we perform a different two-pointer search for t for the problem A[i] + A[j] == t versus A[i] - A[j] == t, where j > i?
It's more intuitive why for the first problem, we can fix i and j to be at opposite ends and decrease the j or increase i, so I'll focus on the second problem.
With array problems it's sometimes easiest to draw out the solution space, then come up with the algorithm from there. First, let's draw out the solution space B, where B[i][j] = -(A[i] - A[j]) (defined only for j > i):
B, for A of length N
      j ---------------------------->
i   B[0][0]       B[0][1]       ...   B[0][N - 1]
|   B[1][0]       B[1][1]       ...   B[1][N - 1]
|      .             .                     .
|      .             .                     .
|      .             .                     .
v   B[N - 1][0]   B[N - 1][1]   ...   B[N - 1][N - 1]
---
---
In terms of A:
X -(A[0] - A[1]) -(A[0] - A[2]) ... -(A[0] - A[N - 2]) -(A[0] - A[N - 1])
X X -(A[1] - A[2]) ... -(A[1] - A[N - 2]) -(A[1] - A[N - 1])
. . . . .
. . . . .
. . . . .
X X X ... X -(A[N - 2] - A[N - 1])
X X X ... X X
Notice that B[i][j] = A[j] - A[i], so the rows of B are in ascending order and the columns of B are in descending order. Let's compute B for A = [2, 3, 5, 10, 50, 80].
B = [
      j------------------------>
i   X   1   3   8  48  78
|   X   X   2   7  47  77
|   X   X   X   5  45  75
|   X   X   X   X  40  70
|   X   X   X   X   X  30
v   X   X   X   X   X   X
]
Now the equivalent problem is searching for t = 40 in B. Note that if we start with i = 0 and j = N - 1 = 5, there's no good/guaranteed way to reach 40. However, if we start in a position from which we can always nudge our current element of B up or down in small steps, we can guarantee that we'll get as close to t as possible.
In this case, the small steps we take involve traversing rightwards/downwards in the matrix, starting from the top left (we could equivalently traverse leftwards/upwards from the bottom right), which corresponds to incrementing both i and j in the original question on A.
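For completeness, a sketch of that top-left staircase walk over the implicit matrix (assuming A is sorted ascending; hasDifference is a hypothetical name). It is exactly the L = 0, R = 0 two-pointer scan in matrix clothing:

#include <vector>

// Search for t among the values A[j] - A[i], j > i. Moving right (j++)
// increases the current value, moving down (i++) decreases it, so each
// comparison safely eliminates a row or a column.
bool hasDifference(const std::vector<int>& A, int t) {
    size_t i = 0, j = 1;               // top-left defined cell of B
    while (j < A.size()) {
        if (i == j) { ++j; continue; } // stay strictly above the diagonal
        const int d = A[j] - A[i];
        if (d == t) return true;
        if (d < t) ++j;                // too small: move right
        else       ++i;                // too big: move down
    }
    return false;
}

On A = {2, 3, 5, 10, 50, 80} and t = 40, the walk goes 1, 3, 8, 48, 47, 45, 40 and succeeds.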

Why finding median of 2 sorted arrays of different sizes takes O(log(min(n,m)))

Please consider this problem:
We have 2 sorted arrays of different sizes, A[n] and B[m].
I have implemented a classical algorithm that takes at most O(log(min(n,m))) steps.
Here's the approach:
Partition the two arrays into two halves (not two parts of each array, but two groups with the same total number of elements). The first half contains some first elements from the first and second arrays, and the second half contains the rest (the last elements) of the first and second arrays. Because the arrays can be of different sizes, this does not mean taking half of each array. Reach a condition such that every element in the first half is less than or equal to every element in the second half.
Please see the code below:
#include <algorithm>  // std::max, std::min
#include <climits>    // INT_MIN, INT_MAX
#include <vector>

double median(std::vector<int> V1, std::vector<int> V2)
{
    if (V1.size() > V2.size())
    {
        V1.swap(V2);  // make V1 the smaller array
    }
    int s1 = V1.size();
    int s2 = V2.size();
    int low = 0;
    int high = s1;
    while (low <= high)
    {
        int px = (low + high) / 2;        // elements taken from V1
        int py = (s1 + s2 + 1) / 2 - px;  // elements taken from V2
        int maxLeftX  = (px == 0)  ? INT_MIN : V1[px - 1];
        int minRightX = (px == s1) ? INT_MAX : V1[px];
        int maxLeftY  = (py == 0)  ? INT_MIN : V2[py - 1];
        int minRightY = (py == s2) ? INT_MAX : V2[py];
        if (maxLeftX <= minRightY && maxLeftY <= minRightX)
        {
            if ((s1 + s2) % 2 == 0)
            {
                return (double(std::max(maxLeftX, maxLeftY)) + double(std::min(minRightX, minRightY))) / 2;
            }
            else
            {
                return std::max(maxLeftX, maxLeftY);
            }
        }
        else if (maxLeftX > minRightY)
        {
            high = px - 1;  // took too many elements from V1
        }
        else
        {
            low = px + 1;   // took too few elements from V1
        }
    }
    throw;  // unreachable for sorted inputs
}
Although the approach is pretty straightforward and it works, I still cannot convince myself of its correctness. Furthermore, I can't understand why it takes O(log(min(n,m))) steps.
If anyone can briefly explain the correctness and why it takes O(log(min(n,m))) steps, that would be awesome. Even a link with a meaningful explanation would help.
Time complexity is quite straightforward: you binary search through the array with fewer elements to find a partition that enables you to find the median. You make exactly O(log(#elements)) steps, and since your #elements is exactly min(n, m), the complexity is O(log(min(n, m))).
There are (n + m)/2 elements smaller than the median and the same number greater, up to rounding. Let's think about them as two halves (let the median belong to one of them, your choice).
You can surely divide the smaller array into two subarrays such that one of them lies entirely in the first half and the other entirely in the second half. However, you have no idea how many elements are in each of them.
Let's choose some x - your guess at the number of elements from the smaller array in the first half. It must be in the range from 0 to n (the size of the smaller array). Then you know, since there are exactly (n + m)/2 elements smaller than the median, that you have to take (n + m)/2 - x elements from the bigger array. Then you have to check whether that partition actually works.
To check whether the partition is good, you have to check that all the elements in the smaller half are smaller than all the elements in the greater half: check that maxLeftX <= minRightY and maxLeftY <= minRightX (then every element in the left half is smaller than every element in the right half).
If so, you've found the correct partition. You can now easily find your median: it's max(maxLeftX, maxLeftY) for an odd total count, or the average of max(maxLeftX, maxLeftY) and min(minRightX, minRightY) for an even one.
If not, you either took too many elements from the smaller array (the case when maxLeftX > minRightY), so next time you have to guess a smaller value for x, or too few of them, in which case you have to guess a greater value for x.
To get the best complexity always guess in the middle of a range of possible values that x may take.
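For a concrete check, a small driver for the median function above (the inputs are made up): the merged sequence 1 3 7 8 9 10 11 has median 8, and the partition search lands on it after three guesses for x.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> a{1, 3, 8};
    std::vector<int> b{7, 9, 10, 11};
    // merged: 1 3 7 8 9 10 11 -> 7 elements, median 8
    std::cout << median(a, b) << '\n';  // prints 8
}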

Binary search - why ceil?

I'm studying the binary search algorithm, and I've often seen it written as follows (this is C++ but the language is not that important here):
int start = 0;
int end = vec.size() - 1;
do {
    int mid = (start + end) / 2;
    if (target > vec[mid])
        start = mid + 1;
    else if (target < vec[mid])
        end = mid - 1;
    else
        break; // found
} while (start <= end);
However I've also seen implementations like this:
int start = 0;
int end = vec.size() - 1;
do {
    int mid = (int)ceil((start + end) / 2.0);
    if (target > vec[mid])
        start = mid + 1;
    else if (target < vec[mid])
        end = mid - 1;
    else
        break; // found
} while (start <= end);
Both seem to work. Is there any correctness or performance reason why I should take the ceiling and do the floating-point arithmetic of the second version instead of using the first one?
When int mid = (lo + hi) / 2:
You are deciding the mid element by taking the left of the two potential middle elements when the number of elements in [left, right] is even, i.e. for the subrange [4, 5] your mid will be 4. So without any explicit ceil or floor, integer division works like floor.
When (int)ceil((lo + hi) / 2.0):
You are deciding the mid element by taking the right of the two potential middle elements when the number of elements in [left, right] is even, i.e. for [4, 5] your mid will be 5.
So both selections will work, because you're discarding/taking a part based on valid conditions (target < vec[mid] and target > vec[mid]); the exact partition point doesn't matter much here.
Another thing: in an operation like int mid = (lo + hi) / 2 you may encounter overflow when adding lo and hi, if the sum exceeds the integer range. The safe way is to write mid = lo + (hi - lo) / 2, which yields the same result.
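For what it's worth, both roundings can be computed overflow-safely without the floating-point round trip (assuming lo <= hi):

int mid_floor = lo + (hi - lo) / 2;      // same as floor((lo + hi) / 2)
int mid_ceil  = lo + (hi - lo + 1) / 2;  // same as ceil((lo + hi) / 2)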
Hope it helps!
Edit
so both work only because I'm discarding the mid element from the new search range when restarting the search, right?
Yes. If you didn't discard the mid element, it would fall into an infinite loop. E.g. for [4, 5], 4 would always be selected as mid, and with an update like left = mid it would loop forever.

Given a sorted array and a parameter k, find the count of sum of two numbers greater than or equal to k in linear time

I am trying to find all pairs in an array with sum equal to k. My current solution takes O(n*log(n)) time (code snippet below). Can anybody help me find a better solution, O(n) or O(log n) maybe (if it exists)?
map<int, int> mymap;
map<int, int>::iterator it;
int n, k, a, cnt = 0;  // declarations added for completeness
cin >> n >> k;
for (int i = 0; i < n; i++) {
    cin >> a;
    if (mymap.find(a) != mymap.end())
        mymap[a]++;
    else
        mymap[a] = 1;
}
for (it = mymap.begin(); it != mymap.end(); it++) {
    int val = it->first;
    if (mymap.find(k - val) != mymap.end()) {
        cnt += min(it->second, mymap.find(k - val)->second);
        it->second = 0;
    }
}
cout << cnt;
Another approach, which takes O(log n) in the best case and O(n log n) in the worst case for positive numbers, goes like this:
Find the element in the array equal to k/2, or if it doesn't exist, the minimum element greater than k/2. All combinations of this element and the greater elements are of interest to us, because p + s >= k when p >= k/2 and s >= k/2. The array is sorted, so binary search with some modifications can be used. This step takes O(log n) time.
All elements less than k/2, paired with elements greater than or equal to their "mirror elements" (reflected around k/2), are also of interest to us, because p + s >= k when p = k/2 - t and s >= k/2 + t. Here we loop through the elements less than k/2 and find their mirror elements (binary search). The loop should stop once a mirror element is greater than the last array element.
For instance, for the array {1,3,5,8,11} and k = 10, on the first step we have k/2 = 5 and the pairs {5,8}, {5,11}, {8,11}. The count of these pairs is given by the formula l * (l - 1)/2, where l = the count of elements >= k/2. In our case l = 3, so count = 3*2/2 = 3.
On the second step, for the number 3 the mirror element is 7 (5-2=3 and 5+2=7), so the pairs {3, 8} and {3, 11} are of interest. For the number 1 the mirror is 9 (5-4=1 and 5+4=9), so {1, 11} is what we look for.
So, if k/2 < the first array element, this algorithm is O(log n).
For negative numbers the algorithm is a little more complex, but can be solved with the same complexity.
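A sketch of the two steps under the same positive-numbers assumption (array sorted ascending, k > 0; countPairsAtLeast is a hypothetical name):

#include <algorithm>
#include <vector>

// Counts pairs (i < j) with a[i] + a[j] >= k.
long long countPairsAtLeast(const std::vector<int>& a, int k) {
    // Step 1: elements >= ceil(k/2) pair with each other unconditionally.
    const auto half = std::lower_bound(a.begin(), a.end(), (k + 1) / 2);
    const long long l = a.end() - half;
    long long count = l * (l - 1) / 2;
    // Step 2: each smaller element x pairs with elements >= k - x, its mirror.
    // Walk the smaller elements from largest to smallest and stop as soon as
    // the mirror exceeds the last array element.
    for (auto it = half; it != a.begin();) {
        --it;
        if (k - *it > a.back()) break;
        count += a.end() - std::lower_bound(half, a.end(), k - *it);
    }
    return count;
}

On {1,3,5,8,11} with k = 10 this returns 6, matching the example above (3 pairs from step 1 and 3 from step 2).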
There exists a rather simple O(n) approach using the so-called "two pointers" or "two iterators" approach. The key idea is to have two iterators (not necessarily C++ iterators, indices would do too) running on the same array, so that if the first iterator points to value x, then the second iterator points to the maximal element in the array that is less than k-x.
We will be increasing the first iterator, and while doing this we'll also change the second iterator to maintain this property. Note that as the first pointer increases, the corresponding position of the second pointer will only decrease, so on every iteration we can start from the position where we stopped at the previous iteration; we will never need to increase the second pointer. This is how we achieve O(n) time.
Code is like this (did not test this, but the idea should be clear):
vector<int> a; // the given array, sorted ascending
int k;         // the threshold
long long ans = 0;
int r = a.size() - 1;
for (int l = 0; l < (int)a.size(); l++) {
    while ((r >= 0) && (a[r] >= k - a[l]))
        r--;
    // now r is the maximal position in a such that a[r] < k - a[l];
    // all positions after max(r, l) form a needed pair with a[l]
    // (taking the max avoids counting a[l] with itself or a pair twice)
    ans += a.size() - 1 - max(r, l);
}
Another approach, which might be simpler to code but is a bit slower, is O(n log n) using binary search. For each element a[l] of the array, you can find the maximal position r such that a[r] < k - a[l] using binary search (this is the same r as in the first algorithm).
@Drew Dormann - thanks for the remark.
Run through the array with two pointers, left and right.
Assuming left is the small side, start with left at location 0, and move right towards left until a[left] + a[right] >= k holds for the last time.
When this is achieved, total_count += (a.size - right + 1).
You then move left one step forward, and right needs to (maybe) move towards it. Repeat this until they meet.
When this is done, say they met at location x, then total_count += choose(a.size - x, 2).
1. Sort the array (n log n).
2. For i = 1 to n:
   - Start at the root.
   - If a[i] + curr_node >= k, go left and set match = indexof(curr_node).
   - Else, go right.
   - If curr_node is a leaf node, add all nodes after a[match] to the list of valid pairs with a[i].
Step 2 also takes O(n log n): the for loop runs n times, and within the loop we perform a binary search, i.e. log n steps. Hence the overall complexity of the algorithm is O(n log n).
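That tree walk is just a binary search over the sorted array; an equivalent array-based sketch (countPairs is a hypothetical name):

#include <algorithm>
#include <vector>

// For each i, binary search the first j > i with a[j] >= k - a[i];
// everything from there to the end pairs with a[i]. O(n log n) total.
long long countPairs(const std::vector<int>& a, int k) {
    long long count = 0;
    for (size_t i = 0; i + 1 < a.size(); ++i) {
        auto it = std::lower_bound(a.begin() + i + 1, a.end(), k - a[i]);
        count += a.end() - it;
    }
    return count;
}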
This should do the work:
void count(int A[], int n)  // n being the number of terms in the array
{
    int i, j, k, count = 0;
    cin >> k;
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)  // j starts at i + 1 so each pair is counted once
            if (A[i] + A[j] >= k)
                count++;
    cout << "There are " << count << " such pairs";
}

find middle elements from an array

In C++, how can I find the middle n elements of an array? For example if n=3 and the array is [0,1,5,7,7,8,10,14,20], the middle is [7,7,8].
P.S. In my context, n and the length of the array are both odd, so I can always find the middle.
Thanks!
This is quick, not tested, but the basic idea...
int myArray[] = {0, 1, 5, 7, 7, 8, 10, 14, 20};  // example input, declared so sizeof works
const int n = 5;

// Get middle index
int arrLength = sizeof(myArray) / sizeof(int);
int middleIndex = (arrLength - 1) / 2;

// Get the number of elements on each side of the middle
int side = (n - 1) / 2;

int count = 0;
int myNewArray[n];
for (int i = middleIndex - side; i <= middleIndex + side; i++) {
    myNewArray[count++] = myArray[i];
}
#include <algorithm>  // std::copy
#include <vector>
using std::vector;

int values[] = {0, 1, 2, 3, 4, 5, 6, 7, 8};
const size_t total(sizeof(values) / sizeof(int));
const size_t needed(3);

vector<int> middle(needed);
std::copy(values + ((total - needed) / 2),
          values + ((total + needed) / 2), middle.begin());
I have not checked this with all possible boundary conditions. With the sample data I get middle = (3,4,5), as desired.
Well, if you have to pick n numbers, you know there will be size - n unpicked items. As you want to pick numbers in the middle, you want equally many unpicked numbers on each side of the array, that is (size - n) / 2.
I won't do your homework, but I hope this will help.
Well, the naive algorithm follows:
Find the middle, which exists because you specified that the length is odd.
Repeatedly pick off one element to the left and one element to the right. You can always do this because you specified that n is odd.
You can also make the following observation:
Note that after you've picked the middle, there are n - 1 elements remaining to pick off. This is an even number, and (n - 1)/2 must come from the left of the middle element and (n - 1)/2 from the right. The middle element has index (length - 1)/2. Therefore, the lower index of the first element selected is (length - 1)/2 - (n - 1)/2 and the upper index of the last element selected is (length - 1)/2 + (n - 1)/2. Consequently, the indices needed run from (length - n)/2 to (length + n)/2 - 1.
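Putting that arithmetic into code, a minimal sketch (middleN is a hypothetical helper):

#include <cstddef>
#include <vector>

// Middle n elements of a (assumes a.size() and n are odd, n <= a.size()).
std::vector<int> middleN(const std::vector<int>& a, std::size_t n) {
    const std::size_t first = (a.size() - n) / 2;  // = (length - n)/2
    return std::vector<int>(a.begin() + first, a.begin() + first + n);
}

For the example above, middleN({0,1,5,7,7,8,10,14,20}, 3) copies indices 3 through 5 and yields [7,7,8].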