I'm studying the binary search algorithm, and I've often seen it written as follows (this is C++, but the language isn't that important here):
int lo = 0;
int hi = static_cast<int>(vec.size()) - 1;
while (lo <= hi) {
    int mid = (lo + hi) / 2;
    if (target < vec[mid])
        hi = mid - 1;
    else if (target > vec[mid])
        lo = mid + 1;
    else
        break; // found at index mid
}
However I've also seen implementations like this:
int lo = 0;
int hi = static_cast<int>(vec.size()) - 1;
while (lo <= hi) {
    int mid = (int)ceil((lo + hi) / 2.0);
    if (target < vec[mid])
        hi = mid - 1;
    else if (target > vec[mid])
        lo = mid + 1;
    else
        break; // found at index mid
}
Both seem to work. Is there any correctness or performance reason to prefer the second version, with its ceil and floating-point arithmetic, over the first?
When int mid = (lo + hi) / 2:
You pick the left one of the two potential middle elements when the range [left, right] contains an even number of elements, e.g. for the array [4, 5] your mid will be 4. So without any ceil or floor, the integer division behaves like floor (for non-negative indices).
When int mid = (int)ceil((lo + hi) / 2.0):
You pick the right one of the two potential middle elements when the range [left, right] contains an even number of elements, e.g. for [4, 5] your mid will be 5.
So either choice works, because you discard a part of the range based on valid conditions (target < vec[mid] and target > vec[mid]); the exact split point doesn't matter much here.
Another thing: in int mid = (lo + hi) / 2, the addition lo + hi can overflow if the sum exceeds the int range. The safe form is mid = lo + (hi - lo) / 2, which yields the same result.
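On the performance part of the question: the ceil version converts to double and calls ceil on every iteration, while the same right middle can be computed with integer arithmetic alone. A minimal sketch, assuming non-negative indices with lo <= hi (the names lowerMid and upperMid are illustrative, not from the question); both use the overflow-safe lo + (hi - lo) / 2 form:

```cpp
#include <cassert>

// Left ("floor") middle of [lo, hi]. Assumes 0 <= lo <= hi.
int lowerMid(int lo, int hi) {
    assert(0 <= lo && lo <= hi);
    return lo + (hi - lo) / 2;
}

// Right ("ceil") middle of [lo, hi]: the same value as
// (int)ceil((lo + hi) / 2.0) for non-negative indices, but with integer
// arithmetic only. Adding the remainder rounds the half-width up without
// computing hi - lo + 1, which could overflow when hi is INT_MAX.
int upperMid(int lo, int hi) {
    assert(0 <= lo && lo <= hi);
    return lo + (hi - lo) / 2 + (hi - lo) % 2;
}
```

For [4, 5], lowerMid gives 4 and upperMid gives 5, matching the two cases described above.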
Hope it helps!
Edit
So both work only because I'm discarding the mid element from the new search range when restarting the search, right?
Yes. If you didn't discard the mid element, you could fall into an infinite loop: for [4, 5], 4 would always be selected as mid, and with an update like left = mid the range would never shrink.
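When you do want an update like lo = mid (keeping mid in the range), the usual remedy is to round the middle up instead, so that mid > lo whenever lo < hi. A minimal sketch, assuming an ascending std::vector<int>; the function name lastNotGreater is illustrative, and it returns the last index whose value is <= target (or -1 if there is none):

```cpp
#include <cassert>
#include <vector>

// Last index i with v[i] <= target, or -1 if none. Assumes v is sorted ascending.
int lastNotGreater(const std::vector<int>& v, int target) {
    int lo = -1;                                // sentinel meaning "no such index"
    int hi = static_cast<int>(v.size()) - 1;
    while (lo < hi) {
        // Upper middle: with lo = mid below, the lower middle could equal lo
        // (e.g. lo = 4, hi = 5) and the range would never shrink.
        int mid = lo + (hi - lo + 1) / 2;
        assert(lo < mid && mid <= hi);          // both branches make progress
        if (v[mid] <= target)
            lo = mid;                           // mid stays in the range
        else
            hi = mid - 1;                       // mid is discarded
    }
    return lo;
}
```

With the lower middle instead, lo = 4 and hi = 5 would give mid = 4, and lo = mid would repeat forever, which is the situation described in the answer.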
I'm trying to find the minimum of an array which, in general, has this kind of structure:
The array consists of non-negative integers in [0, 1e5 - 1]. It may contain any number of such steps, be sorted, or be constant. I want to find the minimum in O(log n), which is why I'm using binary search. This code handles all cases except those with a plateau:
size_t left = 0, right = arr.size() - 1;
while (left < right) {
    const size_t mid = left + (right - left) / 2;
    if ((mid == 0 || arr[mid] < arr[mid - 1]) && (mid + 1 == arr.size() || arr[mid] < arr[mid + 1])) {
        return mid;
    }
    if (arr[mid] > arr[mid + 1] || arr[mid] > arr[right]) {
        left = mid + 1;
    }
    else {
        right = mid;
    }
}
return left;
Example of bad input: [4, 3, 3, 2, 1, 2].
Unfortunately, I'm out of ideas on how to fix these cases. Maybe it's even impossible. Thanks in advance.
I am afraid it is not possible to do in log n time in general.
Assume an array of n elements equal to 1 and a single element of 0.
Your problem now reduces to finding that 0 element.
By "visiting" (= indexing) any element equal to 1 you gain no knowledge about the position of the 0, which makes the search order irrelevant.
Therefore, in the worst case, you have to visit every element to find where the 0 is.
If you really want, I think the following algorithm should be roughly O(log n + #elements-on-plateaus):
1. Set left, right as for binary search.
2. Compute middle.
3. Go left from middle until:
   - If you find a decrease, set right = pos, where pos is the decreased element, and go to 4.
   - If you find an increase, set left = pos, where pos is the increased element, and go to 4.
   - If you reach the left position, go right from middle instead and do the analogous actions.
   - [X] If you reach right too, you are on a plateau and the range [left, right] contains the minimal elements of the array.
4. Repeat until you hit [X].
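The steps above can be sketched roughly as follows. This is a simplified variant, not the answer's exact procedure: it assumes the array is a single valley (non-increasing, then non-decreasing, with plateaus allowed, as in the question's [4, 3, 3, 2, 1, 2]), and it only walks right from mid across a plateau to find the next different value. The function name valleyMinIndex is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of a minimum element. Assumes arr is non-empty and shaped
// like a single valley: non-increasing, then non-decreasing, plateaus allowed.
std::size_t valleyMinIndex(const std::vector<int>& arr) {
    assert(!arr.empty());
    std::size_t left = 0, right = arr.size() - 1;
    while (left < right) {
        const std::size_t mid = left + (right - left) / 2;
        // Walk right across the plateau that contains mid (linear in its length).
        std::size_t j = mid + 1;
        while (j <= right && arr[j] == arr[mid])
            ++j;
        if (j <= right && arr[j] < arr[mid])
            left = j;      // values still drop after the plateau: a minimum is in [j, right]
        else
            right = mid;   // values rise, or stay flat up to right: a minimum is in [left, mid]
    }
    return left;
}
```

Each iteration at least halves [left, right], but the plateau walk is linear in the length scanned, so on long plateaus the cost degrades toward O(n), consistent with the lower-bound argument above.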
EDIT:
Added more details about the logic behind the code.
Thanks to @Stef.
I am trying to solve an algorithm problem on LeetCode (https://leetcode.com/problems/find-the-duplicate-number/).
Below is my solution, which uses binary search.
The basic logic of the code is that I am trying to find the duplicated number in the range [1, n] inclusive using binary search.
For instance, suppose I want to find the duplicated number in the list [1, 3, 4, 2, 2].
First, compute the midpoint of [1, 4]: the start point is 1 and the endpoint is 4, so the midpoint is 2. Then I use the cntRange function to count how many numbers in the list fall in the range [1, 2]. If that count (here 1, 2, 2, i.e. three numbers) is more than it should be (2), we shrink the range by setting the endpoint to the midpoint and continue the binary search. When the search finishes, we return the current value of the start point, which is the duplicated number.
class Solution {
public:
    int findRepeatNumber(vector<int> &nums) {
        // special case we return -1
        if (nums.size() < 2) {
            return -1;
        }
        // binary search to cnt the numbers in certain range
        int start = 1;
        int end = nums.size() - 1;
        while (end >= start) {
            int mid = ((end - start) >> 1) + start;
            int cnt = cntRange(nums, start, mid);
            if (end == start) {
                if (cnt > 1) {
                    return start;
                } else {
                    break;
                }
            }
            if (cnt > (mid - start + 1))
                end = mid;
            else
                start = mid + 1;
        }
        return -1;
    }

    int cntRange(vector<int> &nums, int start, int end) {
        int cnt = 0;
        for (int i = 0; i < nums.size(); ++i) {
            if (nums[i] >= start && nums[i] <= end)
                cnt++;
        }
        return cnt;
    }
};
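The counting idea in the code above can be written for an arbitrary value range [lo, hi]. A minimal sketch, not the asker's code: the function name findDuplicateInRange is hypothetical, and it assumes, as the LeetCode problem guarantees for [1, n], that every value lies in [lo, hi] and that there are more elements than possible values. Under that pigeonhole assumption, the half whose count exceeds its number of possible values always contains a duplicate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a value in [lo, hi] that occurs more than once.
// Assumes every element lies in [lo, hi] and nums has more than hi - lo + 1
// elements, so by the pigeonhole principle a duplicate must exist.
int findDuplicateInRange(const std::vector<int>& nums, int lo, int hi) {
    assert(lo <= hi && nums.size() > static_cast<std::size_t>(hi - lo + 1));
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int cnt = 0;                          // how many elements fall in [lo, mid]
        for (int x : nums)
            if (x >= lo && x <= mid)
                ++cnt;
        if (cnt > mid - lo + 1)
            hi = mid;                         // [lo, mid] is over-full: duplicate there
        else
            lo = mid + 1;                     // otherwise [mid + 1, hi] must be over-full
    }
    return lo;
}
```

The precondition is what the two [0, n-1] test sets below lack: [0, 1, 2, 0, 4, 5, 6] has 7 elements and 7 possible values, so the missing 3 cancels the extra 0 in the count for [0, 3], neither half's count exceeds its number of possible values, and the comparison cannot tell which half holds the duplicate.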
This method passes on LeetCode. However, the values here are in the range [1, n], and I am curious: what if the range is [0, n-1]?
I tried two test sets:
one is [0, 1, 2, 0, 4, 5, 6]
the other is [2, 3, 1, 0, 2, 5, 3]
Both failed, so I went back to my code to try to fix this.
I initialized start to 0 instead and changed the count comparison
from cnt > (mid - start + 1) to cnt > (mid - start).
But then only the first test passes; I still cannot pass the second one.
I still think the problem arises in the count comparison, but I don't know how to solve it.
Can anybody help me with this?
Your problem for the 2 cases you've mentioned is with:
start = mid + 1;
The value of mid can never go negative and therefore the minimum value for start can never be less than 1 after the first time this line is reached. This means you never see the value at index 0 when doing binary search.
When performing a binary search on the answer, I've seen different forms of the following:
loop conditions: (low+1<hi), (low<hi), (low<=hi)
updating indices: (hi=mid+1), (hi=mid), (low=mid), (low=mid-1)
What is the difference between these and do they actually matter?
Each of the loop conditions simply states when the loop will end. If you want to find exactly one element, lo < hi is usually the easiest method. If you want to narrow the range down to two elements, lo + 1 < hi could be used. lo <= hi is usually paired with an early return statement in the while loop.
Before updating the indices, a mid is chosen usually either (lo + hi) / 2 or (lo + hi + 1) / 2 (ignoring integer overflow). The difference between these is that the first has a bias towards lo if there are an even number of elements between lo and hi, whereas the second has a bias towards hi.
The index updates have + 1 (or - 1) attached to them to ensure that there is no infinite loop. In general, you want to make sure lo or hi changes by at least 1 on every iteration of the loop, so that the range keeps shrinking.
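The lo + 1 < hi condition is a case where updates without + 1 are safe: the loop only runs while at least one index lies strictly between lo and hi, so mid always differs from both. A minimal sketch, assuming an ascending std::vector<int>; the name firstNotLess is illustrative, and it returns the first index whose value is >= target (like std::lower_bound), using lo = -1 and hi = size as virtual bounds:

```cpp
#include <cassert>
#include <vector>

// First index i with v[i] >= target, or v.size() if none. Assumes v is sorted
// ascending. Invariant: indices <= lo hold values < target, indices >= hi
// hold values >= target (lo = -1 and hi = v.size() are never dereferenced).
int firstNotLess(const std::vector<int>& v, int target) {
    int lo = -1;
    int hi = static_cast<int>(v.size());
    while (lo + 1 < hi) {                  // stop when lo and hi are adjacent
        int mid = lo + (hi - lo) / 2;
        assert(lo < mid && mid < hi);       // mid strictly inside: no + 1 needed
        if (v[mid] < target)
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}
```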
For reference, here is my preferred way of doing binary search:
#include <vector>

int binary_search(const std::vector<int>& nums, int target) {
    if (nums.empty())
        return -1;
    int l = 0;
    int h = static_cast<int>(nums.size()) - 1;
    while (l < h) {
        // If the language doesn't have big ints, make sure there is no overflow.
        // This has a left bias if there are an even number of elements left.
        int m = l + (h - l) / 2;
        if (nums[m] < target) {
            // The `+ 1` here is important. Without this, if there are two elements
            // and nums[0] < target, we'll get an infinite loop.
            l = m + 1;
        } else {
            // Since `m < h`, we "make progress" in this case.
            h = m;
        }
    }
    return nums[l] == target ? l : -1;
}
I like this method, because it is clear that there is no infinite loop, and the exit condition does not rely on early return statements.
I created this binary search, but it seems to get stuck in a loop every time. All it does is search a vector. I'm stuck on what I need to change; I've tried so many different things.
With [1,2,4,6], if I search for 4, it is never found: it keeps hitting lower = mid + 1.
bool SortSearch::binarySearcher(int size, int val)
{
    int lower = 0, upper = size - 1, mid;
    while (lower < upper)
    {
        mid = (lower + (upper-lower))/2;
        if (students[mid].getID() > val)
            upper = mid - 1;
        else if (students[mid].getID() < val)
            lower = mid + 1;
        else if (students[mid].getID() == val)
            return true;
        else
            return false;
    }
}
I believe:
mid = (lower + (upper-lower))/2;
should be:
mid = lower + (upper-lower)/2;
I'd probably add:
assert(lower <= mid && mid <= upper);
Also, the:
return false;
should be after the loop. Once you've checked < and >, the only possible result left is == (with ints), so that final else clause will never be reached. (If you were using a floating-point type for the IDs, you could get some weird situations with NaNs, infinities, and maybe negative zeroes.)
Turn up the warning level on your compiler. It should have warned you about the unreachable code and the path without a return.
This:
mid = (lower + (upper-lower))/2;
Should probably be:
mid = lower + (upper-lower) / 2;
Or simply the average:
mid = (upper + lower) / 2;
It might be illuminating to print the values of lower, upper, and mid on each iteration. Note that (lower + (upper-lower))/2 simplifies to upper/2, so lower is ignored entirely. In your example, once lower = 2 and upper = 3, mid is computed as 1, which is below lower, so lower = mid + 1 sets lower back to 2 and every iteration repeats the previous one: an infinite loop.
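Putting the fixes from the two answers together gives something like the sketch below. It is a simplification, not the asker's class: students[i].getID() is replaced by a plain sorted std::vector<int> of IDs. One more change seems needed beyond the answers: because this loop returns early on a match, its condition should be lower <= upper. With lower < upper, the last remaining element is never compared, so searching [1,2,4,6] for 6 (or 1) would still return false even with the corrected mid.

```cpp
#include <cassert>
#include <vector>

// Returns true if val occurs in ids, assuming ids is sorted ascending.
// Illustrative stand-in for SortSearch::binarySearcher, with
// students[i].getID() replaced by ids[i].
bool binarySearcher(const std::vector<int>& ids, int val) {
    int lower = 0;
    int upper = static_cast<int>(ids.size()) - 1;
    while (lower <= upper) {                  // <=: a one-element range is still checked
        int mid = lower + (upper - lower) / 2;
        assert(lower <= mid && mid <= upper);
        if (ids[mid] > val)
            upper = mid - 1;
        else if (ids[mid] < val)
            lower = mid + 1;
        else
            return true;                      // ids[mid] == val
    }
    return false;                             // range is empty: val is absent
}
```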
In C++, how can I find the middle n elements of an array? For example, if n = 3 and the array is [0,1,5,7,7,8,10,14,20], the middle is [7,7,8].
P.S. In my context, n and the number of elements in the array are odd, so I can find the middle.
Thanks!
This is quick and untested, but it shows the basic idea:
const int n = 5;
// Get middle index (assumes myArray is a built-in int array, not a pointer)
int arrLength = sizeof(myArray) / sizeof(int);
int middleIndex = (arrLength - 1) / 2;
// Get sides
int side = (n - 1) / 2;
int count = 0;
int myNewArray[n];
for (int i = middleIndex - side; i <= middleIndex + side; i++) {
    myNewArray[count++] = myArray[i];
}
int values[] = {0,1,2,3,4,5,6,7,8};
const size_t total(sizeof(values) / sizeof(int));
const size_t needed(3);
std::vector<int> middle(needed);
std::copy(values + ((total - needed) / 2),
          values + ((total + needed) / 2), middle.begin());
Have not checked this with all possible boundary conditions. With the sample data I get middle = (3,4,5), as desired.
Well, if you have to pick n numbers, you know there will be size - n unpicked items. Since you want to pick the numbers in the middle, you want the same number of unpicked items on each side of the array, that is, (size - n) / 2.
I won't do your homework, but I hope this will help.
Well, the naive algorithm is as follows:
Find the middle, which exists because you specified that the length is odd.
Repeatedly pick off one element to the left and one element to the right. You can always do this because you specified that n is odd.
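The naive approach above can be sketched like this, assuming (as the question states) that both the array length and n are odd, with n <= length. The name middleByExpanding is illustrative, and std::vector stands in for the built-in array:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Middle n elements, found by starting at the single middle element and
// repeatedly widening the range by one element on each side.
// Assumes v.size() and n are both odd and n <= v.size().
std::vector<int> middleByExpanding(const std::vector<int>& v, std::size_t n) {
    assert(v.size() % 2 == 1 && n % 2 == 1 && n <= v.size());
    std::size_t left = (v.size() - 1) / 2;   // the middle element
    std::size_t right = left;
    while (right - left + 1 < n) {           // each step adds one element per side
        --left;
        ++right;
    }
    std::vector<int> out;
    for (std::size_t i = left; i <= right; ++i)
        out.push_back(v[i]);
    return out;
}
```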
You can also make the following observation:
Note that after you've picked the middle, there are n - 1 elements remaining to pick. This is an even number: (n - 1)/2 must come from the left of the middle element and (n - 1)/2 from the right. The middle element has index (length - 1)/2. Therefore, the index of the first element selected is (length - 1)/2 - (n - 1)/2 and the index of the last element selected is (length - 1)/2 + (n - 1)/2. Consequently, the indices needed are (length - n)/2 to (length + n)/2 - 1, inclusive.
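The closed-form indices translate directly into code. A minimal sketch, assuming the array length and n are both odd and n <= length, so both divisions are exact (middleN is an illustrative name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Middle n elements via the closed-form indices:
// first = (length - n) / 2, last = (length + n) / 2 - 1 (inclusive).
// Assumes v.size() and n are both odd and n <= v.size().
std::vector<int> middleN(const std::vector<int>& v, std::size_t n) {
    assert(v.size() % 2 == 1 && n % 2 == 1 && n <= v.size());
    std::size_t first = (v.size() - n) / 2;
    std::size_t last = (v.size() + n) / 2 - 1;   // inclusive
    return std::vector<int>(v.begin() + static_cast<std::ptrdiff_t>(first),
                            v.begin() + static_cast<std::ptrdiff_t>(last) + 1);
}
```

For the question's example (length 9, n = 3), this selects indices 3 to 5, i.e. [7, 7, 8].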