Binary search stuck infinite loop? - c++

I created this binary search but it seems to get stuck in a loop each time. All it's checking is a vector. I'm stuck as to what I need to change; I've tried so many different things.
With [1,2,4,6], if I search for 4 it is never found; it keeps hitting the lower = mid + 1 branch.
bool SortSearch::binarySearcher(int size, int val)
{
    int lower = 0, upper = size - 1, mid;
    while (lower < upper)
    {
        mid = (lower + (upper - lower)) / 2;
        if (students[mid].getID() > val)
            upper = mid - 1;
        else if (students[mid].getID() < val)
            lower = mid + 1;
        else if (students[mid].getID() == val)
            return true;
        else
            return false;
    }
}

I believe:
mid = (lower + (upper-lower))/2;
should be:
mid = lower + (upper-lower)/2;
I'd probably add:
assert(lower <= mid && mid <= upper);
Also, the:
return false;
should be after the loop. Once you've checked <, and >, the only possible result left is == (with ints), so that final else clause will never hit. (If you were using a floating point type for the index, then you can get some weird situations with NaNs, infinities, and maybe negative zeroes.)
Turn up the warning level on your compiler. It should have warned you about the unreachable code and the path without a return.
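Putting the two fixes together (and additionally widening the loop condition to lower <= upper so a one-element range still gets checked, which the original also needs), a corrected sketch might look like this; a plain sorted std::vector<int> of IDs stands in for the students member here:

```cpp
#include <cassert>
#include <vector>

// Corrected binary search over a sorted vector of IDs.
// Assumes `ids` is sorted ascending; returns true iff `val` is present.
bool binarySearcher(const std::vector<int>& ids, int val)
{
    int lower = 0, upper = static_cast<int>(ids.size()) - 1;
    while (lower <= upper)                   // <=, so a one-element range is checked
    {
        int mid = lower + (upper - lower) / 2;   // corrected mid; no overflow
        assert(lower <= mid && mid <= upper);
        if (ids[mid] > val)
            upper = mid - 1;
        else if (ids[mid] < val)
            lower = mid + 1;
        else
            return true;                     // ids[mid] == val
    }
    return false;                            // moved after the loop
}
```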

This:
mid = (lower + (upper-lower))/2;
Should probably be:
mid = lower + (upper-lower) / 2;
Or simply the average:
mid = (upper + lower) / 2;

It might be illuminating to print the values of lower, upper, and mid on each iteration. Consider what happens when lower and upper only differ by 1 - then mid will be calculated to be equal to lower, which means on the next iteration that lower and upper will be identical to the previous iteration, leading to an infinite loop condition.
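To see the stuck state concretely, here is a small instrumented sketch (a hypothetical helper, not from the question) that runs the buggy update rule and reports whether the (lower, upper) pair ever repeats, i.e. the loop is stuck:

```cpp
#include <utility>
#include <vector>

// Runs the buggy search from the question on a sorted vector and returns
// true if the (lower, upper) state repeats, meaning an infinite loop.
bool buggySearchLoops(const std::vector<int>& ids, int val)
{
    int lower = 0, upper = static_cast<int>(ids.size()) - 1;
    std::pair<int, int> prev(-1, -1);
    while (lower < upper)
    {
        int mid = (lower + (upper - lower)) / 2;   // buggy: this is just upper/2
        if (ids[mid] > val)
            upper = mid - 1;
        else if (ids[mid] < val)
            lower = mid + 1;
        else
            return false;                          // found, so no loop
        if (std::make_pair(lower, upper) == prev)
            return true;                           // same state as last time: stuck
        prev = std::make_pair(lower, upper);
    }
    return false;
}
```

For [1,2,4,6] with val = 4, the state settles at lower = 2, upper = 3 and never changes, exactly the behaviour the question describes.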

Related

Minimum in bitonic array with plateaus

I'm trying to find the minimum in an array which has this kind of structure in general:
The array consists of non-negative integers in [0; 1e5-1]. It may contain any number of such steps, be sorted, or just be constant. I want to find the minimum in O(log n), which is why I'm using binary search. This code handles all cases except those where there is a plateau:
size_t left = 0, right = arr.size() - 1;
while (left < right) {
    const size_t mid = left + (right - left) / 2;
    if ((mid == 0 || arr[mid] < arr[mid - 1]) && (mid + 1 == size || arr[mid] < arr[mid + 1])) {
        return mid;
    }
    if (arr[mid] > arr[mid + 1] || arr[mid] > arr[right]) {
        left = mid + 1;
    }
    else {
        right = mid;
    }
}
return left;
Example of bad input: [4, 3, 3, 2, 1, 2].
Unfortunately, I'm out of ideas for how to fix these cases. Maybe it's even impossible. Thank you in advance.
I am afraid it is not possible to do in log n time in general.
Assume an array of n elements equal to 1 and a single element of 0.
Your problem now reduces into finding that 0 element.
By "visiting" (=indexing) any member 1 you gain no knowledge about position of 0 - making the search order irrelevant.
Therefore you have to visit every element to find where the 0 is.
If you really want, I think the following algorithm should be roughly O(log n + #elements-on-plateaus):
1. Set left, right as for binary search.
2. Compute middle.
3. Go left from middle until:
   - If you find a decrease, set right=pos where pos is the decreased element and go to 4.
   - If you find an increase, set left=pos where pos is the increased element and go to 4.
   - If you reach the left position, go right from middle instead and do the analogous actions.
   - [X] If you also reach the right position, you are on a plateau and the range [left,right] holds the minimal elements of the array.
4. Repeat until you hit [X].

Clarification about how the two-pointer approach works

I have some doubts about using two pointer approach.
Case 1: Suppose we have a sorted array A and a target value B. We want to find out whether there exist two elements whose difference is equal to B.
int helper(vector<int> &A, int B)
{
    int left = 0, n = A.size();
    int right = left + 1;
    while (right < n)
    {
        int currDiff = A[right] - A[left];
        if (currDiff < B)
            right++;
        else if (currDiff > B)
        {
            left++;
            if (left == right)
                right++;
        }
        else
            return 1;
    }
    return 0;
}
Case 2: Suppose we have a sorted array A and a target value B. We want to find out whether there exist two elements whose sum is equal to B.
int helper(vector<int> &A, int B)
{
    int left = 0, n = A.size();
    int right = n - 1;
    while (left < right)
    {
        int currSum = A[right] + A[left];
        if (currSum < B)
            left++;
        else if (currSum > B)
        {
            right--;
        }
        else
            return 1;
    }
    return 0;
}
The doubt is that in Case 1 we set both pointers on the left side (left = 0, right = left + 1) and start scanning, while in Case 2 we set one pointer on the left side and the other on the right side (left = 0, right = A.size() - 1).
I am a bit confused about how this works.
There's no rule that says you must set the two pointers in any particular way. It's all about the algorithm you're following; it may be good, it may be bad. Let's say, for the difference case, we set left = 0 and right = A.size() - 1. As the given array A is sorted, the first difference between A[right] and A[left] will be the maximum.
int currDiff = A[right] - A[left]; // max possible value of currDiff for A
So, if currDiff is greater than the given number, what will you do: increase left or decrease right? Say you do the latter and decrease right, and the condition holds again, so you decrease right again. Now suppose currDiff becomes smaller than the given number. What will you do: increase left? Probably. But if in the next iteration currDiff is still smaller than the given number, what then? Increase left again? What if increasing right at this particular position would give you the result?
So, you see, a lot of cases arise that need to be handled if you start finding the difference of a pair with left and right at opposite ends.
Finally, what I want to say is: it's all about the algorithm you are following, nothing else.
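One quick way to build confidence in the same-side version from Case 1 is to run it on a few concrete inputs. Here is a standalone copy of the helper (renamed hasPairWithDiff; assumed identical in logic to the one above) so it can be exercised directly:

```cpp
#include <vector>

// Same-side two-pointer scan from Case 1: does the sorted array A
// contain a pair of elements whose difference is exactly B?
int hasPairWithDiff(const std::vector<int>& A, int B)
{
    int left = 0, n = static_cast<int>(A.size());
    int right = left + 1;
    while (right < n)
    {
        int currDiff = A[right] - A[left];
        if (currDiff < B)
            right++;               // difference too small: widen the window
        else if (currDiff > B)
        {
            left++;                // difference too big: shrink the window
            if (left == right)
                right++;           // keep the pointers distinct
        }
        else
            return 1;
    }
    return 0;
}
```

Because the array is sorted, moving right grows the difference and moving left shrinks it, so starting both pointers at the left gives a monotone search with no ambiguous branch.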

Trouble with finding floor(log2(int)) using binary search in O(log2(amount_bits))

In our algorithms class, we've got an extra question in the lab session by the professor. Find the floor(log2(x)) for an int of n bits in log2(n) steps (e.g. when T = uint64_t, then n = 64).
We've found that we should be able to solve this with binary search, but we get an off-by-one result or an endless loop in certain edge cases. We've been scratching our heads for some time, but cannot seem to get this right. How do we best deal with this? We've tried to reason with the invariant trick as discussed here, but it seems to be a little more complex than that. E.g. for the decimal number 100, choosing between bit 7 and bit 6 is difficult, as 128 is larger than 100 but 64 is smaller. Unfortunately, when mitigating this, we break some edge cases.
EDIT: As noted below, this is purely an academic question with little to no usability in real-life scenarios.
Here is our code so far:
//
// h l
// 76543210
// 0b01000001 = 65
//
using T = unsigned char;
int lgfloor(T value)
{
    assert(value > 0);
    int high = ((sizeof(value) * 8) - 1);
    int low = 0;
    int mid = 0;
    T guess = 0;
    while (high > low)
    {
        mid = (low + ((high - low) / 2));
        guess = static_cast<T>(1) << mid;
        printf("high: %d, mid: %d, low: %d\n", high, mid, low);
        if (value < guess)
        {
            high = mid - 1;
        }
        else
        {
            low = mid;
        }
    }
    return low;
}
We have created the following unit tests (using GoogleTest):
TEST(LgFloor, lgfloor)
{
    ASSERT_DEATH(lgfloor(-1), "Assertion `value > 0' failed.");
    ASSERT_DEATH(lgfloor(0), "Assertion `value > 0' failed.");
    ASSERT_EQ(lgfloor(1), 0);
    ASSERT_EQ(lgfloor(2), 1);
    ASSERT_EQ(lgfloor(64), 6);
    ASSERT_EQ(lgfloor(100), 6);
}
Thanks in advance,
with kind regards,
Marten
You need a proper exit condition. Let's say y = floor(lg2(x)). You should exit the loop when 2^low <= x and x < 2^(low+1). But if high == low+1 then this is fulfilled, yet you do not currently exit. Just do:
while (high > low+1)
{
It is good to look at invariants in your loop. For example, we could try to maintain x < 2^high (that would require starting at sizeof(T)*8, not sizeof(T)*8 - 1). Then all you need to do is bisecting until low == high-1 and you are done.
We can maintain this invariant by only changing high to mid if x < 2^mid, i.e. if value < guess. That's the first case:
if (value < guess)
    high = mid;
We further must maintain 2^low <= x = value. So, in the else branch (which requires 2^mid == guess <= value), we can safely set low = mid.
else
    low = mid;
All that is left is to prove that the loop always progresses. Since high > low+1, we have high - low >= 2 and thus mid != low and mid != high. Clearly, we are reducing the interval (by half) each iteration.
So there you go:
int lgfloor(T value)
{
    assert(value > 0);
    int high = (sizeof(value) * 8);
    int low = 0;
    while (high > low + 1)
    {
        int mid = (low + ((high - low) / 2));
        T guess = static_cast<T>(1) << mid;
        printf("high: %d, mid: %d, low: %d\n", high, mid, low);
        if (value < guess)
            high = mid;
        else
            low = mid;
    }
    return low;
}
I should of course note that there are dedicated intrinsics for this exact purpose in modern hardware. For example, search Intel's intrinsics guide for _BitScanReverse which will complete in a fraction of the cycles the above code would take.
One way or another, asymptotic runtimes that depend on bit-width are pretty meaningless when dealing with fixed-width types such as C++' integral ones (although the question has educational value still).
Endless loop is due to this line:
mid = (low + ((high - low) / 2));
If high and low differ by 1, the result will be mid == low, and then whenever the branch that sets low = mid is taken inside the while loop, you end up re-checking the same state forever. My suggestion would be: if you have low = mid in the loop, make sure mid != low in that case. So just check before the assignment and do low = mid + 1 instead if that happens.
The solution must be found in lg(n) steps, which means that an initialisation such as low= 0, high= 32 won't work, because it would take 5 steps in every case and wouldn't work for x larger than 2^32. A correct solution must combine a first geometric search where you double the exponent, then a standard dichotomic search.
# Geometric search
low = 0
high = 1
while (1 << high) <= x:
    low = high
    high += high
# Dichotomic search
while high - low > 1:
    mid = (high + low) >> 1
    if x < (1 << mid):   # compare against 2^mid, not mid itself
        high = mid
    else:
        low = mid
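A C++ rendering of that two-phase idea (geometric doubling, then bisection) might look like the following sketch for 64-bit values; the function name is mine, and shift amounts are kept below 64 to avoid undefined behaviour:

```cpp
#include <cassert>
#include <cstdint>

// floor(log2(x)) via geometric doubling followed by bisection.
// Invariant after phase 1: 2^low <= x < 2^high (with 2^64 treated as "infinity").
int lgfloor2(std::uint64_t x)
{
    assert(x > 0);
    int low = 0, high = 1;
    // Geometric search: double the exponent until 2^high exceeds x.
    // high takes the values 1, 2, 4, ..., 64, so at most 6 iterations.
    while (high < 64 && (std::uint64_t{1} << high) <= x)
    {
        low = high;
        high += high;
    }
    // Dichotomic search: shrink (low, high) until low is the answer.
    while (high - low > 1)
    {
        int mid = low + (high - low) / 2;   // mid <= 63, so the shift is safe
        if (x < (std::uint64_t{1} << mid))
            high = mid;
        else
            low = mid;
    }
    return low;
}
```

The number of steps grows with the bit position of the answer rather than the full width of the type, matching the answer's point about small inputs.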
Seems like you just have to shift it to the right until you have a '1', counting the shifts. (Note that this takes a number of steps linear in the bit position, not logarithmic, so it doesn't meet the lg(n) bound from the question.)
using T = unsigned char;
int lgfloor(T value)
{
    assert(value > 0);
    int log = 0;
    while (value != 1) {
        value >>= 1;
        log++;
    }
    return log;
}

Binary search - why ceil?

I'm studying binary search algorithm and I've seen many times the algorithm written as follows (this is C++ but the language is not that important here):
int lo = 0;
int hi = vec.size() - 1;
do {
    int mid = (lo + hi) / 2;
    if (target > vec[mid])
        lo = mid + 1;
    else if (target < vec[mid])
        hi = mid - 1;
    else
        break; // found
} while (lo <= hi);
However I've also seen implementations like this:
int lo = 0;
int hi = vec.size() - 1;
do {
    int mid = (int)ceil((lo + hi) / 2.0);
    if (target > vec[mid])
        lo = mid + 1;
    else if (target < vec[mid])
        hi = mid - 1;
    else
        break; // found
} while (lo <= hi);
Both seem to work. Is there any correctness or performance reason why I should get the ceil and do that second case floating point arithmetic instead of using the first version?
When int mid = (lo + hi) / 2:
You are deciding the mid element by taking the left of the two potential middle elements when the number of elements in [left, right] is even, i.e. for the array [4, 5] your mid will be 4. So without any ceil or floor, integer division works like floor.
When (int)ceil((lo + hi) / 2.0):
You are deciding the mid element by taking the right of the two potential middle elements when the number of elements in [left, right] is even, i.e. for [4, 5] your mid will be 5.
So both selections work, because you're discarding/taking a part based on some valid conditions (target < vec[mid] and target > vec[mid]); the exact partition point doesn't matter much here.
Another thing: during an operation like int mid = (lo + hi) / 2 you may encounter overflow when adding lo and hi if the sum exceeds the integer range. So it is safer to write mid = lo + (hi - lo) / 2, which yields the same result.
Hope it helps!
Edit
so both work only because I'm discarding the mid element from the new search range when restarting the search, right?
Yes. If you didn't discard the mid element, it would fall into an infinite loop: e.g. with [4, 5], 4 would always be selected as mid, and an assignment like left = mid would create an infinite loop.
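To convince yourself the two roundings really do agree, here is a small side-by-side sketch (hypothetical helper names) where the only difference is how mid rounds, and the ceil is done in integers rather than with doubles:

```cpp
#include <vector>

// Binary search taking the lower of the two middle candidates.
bool searchFloorMid(const std::vector<int>& v, int target)
{
    int lo = 0, hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;        // floor
        if (v[mid] < target)      lo = mid + 1;
        else if (v[mid] > target) hi = mid - 1;
        else                      return true;
    }
    return false;
}

// Same search, taking the upper of the two middle candidates.
bool searchCeilMid(const std::vector<int>& v, int target)
{
    int lo = 0, hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo + 1) / 2;    // ceil, without floating point
        if (v[mid] < target)      lo = mid + 1;
        else if (v[mid] > target) hi = mid - 1;
        else                      return true;
    }
    return false;
}
```

Both terminate because mid is always discarded (lo becomes mid + 1 or hi becomes mid - 1), whichever middle candidate is probed; (lo + hi + 1) / 2 is the integer way to get the ceil, so the floating-point ceil() buys nothing.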

How to find if 3 numbers in a set of size N exactly sum up to M

I want to know how I can implement a better solution than O(N^3). It's similar to the knapsack and subset problems. In my question N <= 8000, so I started computing sums of pairs of numbers and stored them in an array. Then I would binary search the sorted set for each (M - sum[i]) value, but the problem arises of how I will keep track of the indices which summed up to sum[i]. I know I could declare extra space, but my sums array already has a size of 64 million, and hence I couldn't complete my O(N^2) solution. Please advise whether I can do some optimization or if I need some totally different technique.
You could benefit from some generic tricks to improve the performance of your algorithm.
1) Don't store what you use only once
It is a common error to store more than you really need. Whenever your memory requirement seems to blow up, the first question to ask yourself is: do I really need to store that stuff? Here it turns out that you do not (as Steve explained in the comments): compute the sum of two numbers (in a triangular fashion to avoid repeating yourself) and then check for the presence of the third one.
We drop the O(N**2) memory complexity! Now expected memory is O(N).
2) Know your data structures, and in particular: the hash table
Perfect hash tables are rarely (if ever) implemented, but it is (in theory) possible to craft hash tables with O(1) insertion, check and deletion characteristics, and in practice you do approach those complexities (though it generally comes at the cost of a high constant factor that will make you prefer so-called suboptimal approaches).
Therefore, unless you need ordering (for some reason), membership is better tested through a hash table in general.
We drop the 'log N' term in the speed complexity.
With those two recommendations you easily get what you were asking for:
Build a simple hash table: the number is the key, the index the satellite data associated
Iterate in triangle fashion over your data set: for i in [0..N-1]; for j in [i+1..N-1]
At each iteration, check whether K = M - set[i] - set[j] is in the hash table; if it is, extract k = table[K], and if k != i and k != j, store the triple (i, j, k) in your result.
If a single result is sufficient, you can stop iterating as soon as you get the first result, otherwise you just store all the triples.
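Those steps translate fairly directly into C++ with std::unordered_map; a sketch (function and variable names are mine, not from the answer), returning just whether a triple exists:

```cpp
#include <unordered_map>
#include <vector>

// Returns true iff three distinct positions i, j, k satisfy
// set[i] + set[j] + set[k] == M. O(N^2) time, O(N) memory.
bool threeSumTo(const std::vector<int>& set, int M)
{
    // value -> last position holding it; because the stored index is the
    // last occurrence and pairs iterate with j below it, duplicate values
    // are handled correctly (the third position is always beyond j).
    std::unordered_map<int, int> index;
    for (int i = 0; i < static_cast<int>(set.size()); ++i)
        index[set[i]] = i;
    for (int i = 0; i < static_cast<int>(set.size()); ++i)
        for (int j = i + 1; j < static_cast<int>(set.size()); ++j)
        {
            auto it = index.find(M - set[i] - set[j]);
            if (it != index.end() && it->second != i && it->second != j)
                return true;
        }
    return false;
}
```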
There is a simple O(n^2) solution to this that uses only O(1)* memory if you only want to find the 3 numbers (O(n) memory if you want the indices of the numbers and the set is not already sorted).
First, sort the set.
Then for each element in the set, see if there are two (other) numbers that sum to M minus that element. This is a common interview question and can be done in O(n) on a sorted set.
The idea is that you start a pointer at the beginning and one at the end; if the current sum is greater than the target, decrement the end pointer, else increment the start pointer.
So for each of the n numbers we do an O(n) search and we get an O(n^2) algorithm.
*Note that this requires a sort that uses O(1) memory. Hell, since the sort need only be O(n^2) you could use bubble sort. Heapsort is O(n log n) and uses O(1) memory.
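The O(n) inner step described above is the classic pair-sum scan; a standalone sketch (the helper name is mine):

```cpp
#include <vector>

// On a sorted vector, report whether two elements at distinct positions
// sum to target. Classic two-pointer scan: O(n) time, O(1) memory.
bool pairSumsTo(const std::vector<int>& sorted, int target)
{
    int lo = 0, hi = static_cast<int>(sorted.size()) - 1;
    while (lo < hi)
    {
        int sum = sorted[lo] + sorted[hi];
        if (sum == target)      return true;
        else if (sum > target)  --hi;   // too big: bring the right end in
        else                    ++lo;   // too small: bring the left end in
    }
    return false;
}
```

For the 3SUM use, you would fix one element and run this on the remaining positions with target = M - fixed.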
Create a "bitset" of all the numbers which makes it constant time to check if a number is there. That is a start.
The solution will then be at most O(N^2) to make all combinations of 2 numbers.
The only tricky bit here is when the solution contains a repeat, but it doesn't really matter, you can discard repeats unless it is the same number 3 times because you will hit the "repeat" case when you pair up the 2 identical numbers and see if the unique one is present.
The 3 times one is simply a matter of checking if M is divisible by 3 and whether M/3 appears 3 times as you create the bitset.
This solution does require creating extra storage, up to MAX/8 where MAX is the highest number in your set. You could use a hash table though if this number exceeds a certain point: still O(1) lookup.
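A sketch of that idea; to make the repeat handling explicit, this uses a count table (one int per value, rather than the MAX/8-bit bitset the answer describes), so the names and exact structure are mine:

```cpp
#include <algorithm>
#include <vector>

// Membership table for non-negative values: counts[v] = occurrences of v.
// O(1) lookup per pair, O(N^2) pairs overall; extra storage ~ maxValue ints
// (a real bitset plus a separate duplicate check would shrink this to MAX/8 bytes).
bool threeSumCounted(const std::vector<int>& nums, int M)
{
    if (nums.size() < 3) return false;
    int maxValue = *std::max_element(nums.begin(), nums.end());
    std::vector<int> counts(maxValue + 1, 0);
    for (int v : nums) ++counts[v];
    for (int i = 0; i < static_cast<int>(nums.size()); ++i)
        for (int j = i + 1; j < static_cast<int>(nums.size()); ++j)
        {
            int rest = M - nums[i] - nums[j];
            if (rest < 0 || rest > maxValue) continue;
            // Discount the occurrences already used by positions i and j,
            // which is exactly the "repeat" case the answer discusses.
            int avail = counts[rest]
                      - (rest == nums[i] ? 1 : 0)
                      - (rest == nums[j] ? 1 : 0);
            if (avail > 0) return true;   // a third, distinct position exists
        }
    return false;
}
```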
This appears to work for me...
#include <iostream>
#include <set>
#include <algorithm>
using namespace std;

int main(void)
{
    set<long long> keys;
    // By default this set is sorted
    set<short> N;
    N.insert(4);
    N.insert(8);
    N.insert(19);
    N.insert(5);
    N.insert(12);
    N.insert(35);
    N.insert(6);
    N.insert(1);
    typedef set<short>::iterator iterator;
    const short M = 18;
    for (iterator i(N.begin()); i != N.end() && *i < M; ++i)
    {
        short d1 = M - *i; // subtract the value at this location
        // if there is more to "consume"
        if (d1 > 0)
        {
            // ignore below i as we will have already scanned it...
            for (iterator j(i); j != N.end() && *j < M; ++j)
            {
                short d2 = d1 - *j; // again "consume" as much as we can
                // now the remainder must exist in our set N
                if (N.find(d2) != N.end())
                {
                    // means that the three numbers we've found, *i (from first loop), *j (from second loop) and d2, exist in our set of N
                    // now to generate the unique combination, we need to generate some form of key for our keys set
                    // here we take advantage of the fact that all the numbers fit into a short; we can construct such a key with a long long (8 bytes)
                    // the 8 byte key is made up of 2 bytes for i, 2 bytes for j and 2 bytes for d2
                    // and is formed in sorted order
                    long long key = *i; // first index is easy
                    // second index slightly trickier, if it's less than j, then this short must be "after" i
                    if (*i < *j)
                        key = (key << 16) | *j;
                    else
                        key |= (static_cast<int>(*j) << 16); // else it's before i
                    // now the key is either: i | j, or j | i (where i & j are two bytes each, and the key is currently 4 bytes)
                    // third index is a bugger, we have to scan the key in two byte chunks to insert our third short
                    if ((key & 0xFFFF) < d2)
                        key = (key << 16) | d2; // simple, it's the largest of the three
                    else if (((key >> 16) & 0xFFFF) < d2)
                        key = (((key << 16) | (key & 0xFFFF)) & 0xFFFF0000FFFFLL) | (d2 << 16); // it's less than j but greater than i
                    else
                        key |= (static_cast<long long>(d2) << 32); // it's less than i
                    // Now if this unique key already exists in the set, this won't insert an entry for it
                    keys.insert(key);
                }
                // else don't care...
            }
        }
    }
    // tells us how many unique combinations there are
    cout << "size: " << keys.size() << endl;
    // prints out the 6 bytes for representing the three numbers
    for (set<long long>::iterator it(keys.begin()), end(keys.end()); it != end; ++it)
        cout << hex << *it << endl;
    return 0;
}
Okay, here is attempt two: this generates the output:
start: 19
size: 4
10005000c
400060008
500050008
600060006
As you can see from there, the first "key" is the three shorts (in hex), 0x0001, 0x0005, 0x000C (which is 1, 5, 12 = 18), etc.
Okay, cleaned up the code some more, realised that the reverse iteration is pointless..
My Big O notation is not the best (I never studied computer science); however, I think the above is something like O(N) for the outer loop and O(N log N) for the inner loop - the reason for the log N is that std::set::find() is logarithmic. However, if you replace this with a hashed set, the inner loop could be as good as O(N) - please someone correct me if this is wrong...
I combined the suggestions by @Matthieu M. and @Chris Hopman, and (after much trial and error) I came up with this algorithm that should be O(n log n + log (n-k)! + k) in time and O(log(n-k)) in space (the stack). That should be O(n log n) overall. It's in Python, but it doesn't use any Python-specific features.
import bisect

def binsearch(r, q, i, j):  # O(log (j-i))
    return bisect.bisect_left(q, r, i, j)

def binfind(q, m, i, j):
    while i + 1 < j:
        r = m - (q[i] + q[j])
        if r < q[i]:
            j -= 1
        elif r > q[j]:
            i += 1
        else:
            k = binsearch(r, q, i + 1, j - 1)  # O(log (j-i))
            if not (i < k < j):
                return None
            elif q[k] == r:
                return (i, k, j)
            else:
                return (binfind(q, m, i + 1, j)
                        or binfind(q, m, i, j - 1))

def find_sumof3(q, m):
    return binfind(sorted(q), m, 0, len(q) - 1)
Not trying to boast about my programming skills or add redundant stuff here.
Just wanted to provide beginners with an implementation in C++.
Implementation based on the pseudocode provided by Charles Ma at Given an array of numbers, find out if 3 of them add up to 0.
I hope the comments help.
#include <iostream>
using namespace std;
void merge(int originalArray[], int low, int high, int sizeOfOriginalArray)
{
    // Step 4: Merge sorted halves into an auxiliary array
    int aux[sizeOfOriginalArray];
    int auxArrayIndex, left, right, mid;
    auxArrayIndex = low;
    mid = (low + high) / 2;
    right = mid + 1;
    left = low;
    // choose the smaller of the two values "pointed to" by left, right
    // copy that value into auxArray[auxArrayIndex]
    // increment either left or right as appropriate
    // increment auxArrayIndex
    while ((left <= mid) && (right <= high)) {
        if (originalArray[left] <= originalArray[right]) {
            aux[auxArrayIndex] = originalArray[left];
            left++;
            auxArrayIndex++;
        } else {
            aux[auxArrayIndex] = originalArray[right];
            right++;
            auxArrayIndex++;
        }
    }
    // here when one of the two sorted halves has "run out" of values, but
    // there are still some in the other half; copy all the remaining values
    // to auxArray
    // Note: only 1 of the next 2 loops will actually execute
    while (left <= mid) {
        aux[auxArrayIndex] = originalArray[left];
        left++;
        auxArrayIndex++;
    }
    while (right <= high) {
        aux[auxArrayIndex] = originalArray[right];
        right++;
        auxArrayIndex++;
    }
    // all values are in auxArray; copy them back into originalArray
    int index = low;
    while (index <= high) {
        originalArray[index] = aux[index];
        index++;
    }
}

void mergeSortArray(int originalArray[], int low, int high)
{
    int sizeOfOriginalArray = high + 1;
    // base case
    if (low >= high) {
        return;
    }
    // Step 1: Find the middle of the array (conceptually, divide it in half)
    int mid = (low + high) / 2;
    // Steps 2 and 3: Recursively sort the 2 halves of originalArray and then merge those
    mergeSortArray(originalArray, low, mid);
    mergeSortArray(originalArray, mid + 1, high);
    merge(originalArray, low, high, sizeOfOriginalArray);
}

// O(n^2) solution without hash tables
// Basically, using a sorted array: for each number in the array, use two pointers, one starting
// from the number and one starting from the end of the array; check if the sum of the three
// elements pointed to by the pointers (and the current number) is >, < or == to the targetSum,
// and advance the pointers accordingly, or return true if the targetSum is found.
bool is3SumPossible(int originalArray[], int targetSum, int sizeOfOriginalArray)
{
    int high = sizeOfOriginalArray - 1;
    mergeSortArray(originalArray, 0, high);
    int temp;
    for (int k = 0; k < sizeOfOriginalArray; k++) {
        for (int i = k, j = sizeOfOriginalArray - 1; i <= j; ) {
            temp = originalArray[k] + originalArray[i] + originalArray[j];
            if (temp == targetSum) {
                return true;
            } else if (temp < targetSum) {
                i++;
            } else if (temp > targetSum) {
                j--;
            }
        }
    }
    return false;
}

int main()
{
    int arr[] = {2, -5, 10, 9, 8, 7, 3};
    int size = sizeof(arr) / sizeof(int);
    int targetSum = 5;
    // 3Sum possible?
    bool ans = is3SumPossible(arr, targetSum, size); // size is passed as a parameter because the array decays to a pointer, so it is cumbersome to calculate the size inside is3SumPossible()
    if (ans) {
        cout << "Possible";
    } else {
        cout << "Not possible";
    }
    return 0;
}