Buggy simple function for binary search (C++)

I wrote a simple binary search function, but it doesn't work as expected. I have a vector of 4,000,000 32-bit ints. Usually, when I search for a number that is present, it is found and its index is returned; when it is absent, -1 is returned (in my data the index always equals the value, but that's beside the point).
While experimenting with the program, I found that it can't find 93 even though it's there, so there must be other values it can't find as well.
I use CLion, which uses GDB as the debugger and g++ as the compiler.
template<typename T>
int BinarySearch(vector<T>& vec, T& request)
{
int low = 0;
int high = vec.size() - 1;
while (low < high)
{
int mid = (low / 2 ) + (high / 2); // Styled it this way to avoid overflows.
// This looks like where the bug happens, basically low and high both
// become 93 while mid becomes 92,
// it then exits the loop and returns -1 because low is not lower than
// high anymore.
if (vec[mid] == request)
{
return mid;
}
else if (vec[mid] < request)
{
low = mid + 1;
}
else if (vec[mid] > request)
{
high = mid - 1;
}
}
return - 1;
}
I'm pretty confused by this. What's wrong?

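The condition should be while (low <= high).
If you keep it as while (low < high), then as soon as low == high (only one candidate element is left), the loop exits and returns -1, so your program never checks that last element.
You should also compute mid = low + (high - low) / 2; to prevent overflow and to make sure every element can be reached. The problem with (low / 2) + (high / 2) is that integer division truncates each half separately: with low = high = 1 it gives mid = 0 + 0 = 0, and with low = high = 93 it gives 46 + 46 = 92, exactly what you observed. In both cases mid lands outside [low, high], which is wrong.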
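Putting both fixes together, a minimal corrected sketch might look like this. It takes the vector and the value by const reference, a small change from the question's signature; otherwise it keeps the original structure.

```cpp
#include <vector>

// Iterative binary search over a vector sorted in ascending order.
// Returns the index of `request`, or -1 if it is not present.
template <typename T>
int BinarySearch(const std::vector<T>& vec, const T& request)
{
    int low = 0;
    int high = static_cast<int>(vec.size()) - 1;
    while (low <= high)                      // <= so the last remaining candidate is checked
    {
        int mid = low + (high - low) / 2;    // no overflow, and mid always lies in [low, high]
        if (vec[mid] == request)
            return mid;
        else if (vec[mid] < request)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}
```

With this version every element of a sorted vector is reachable, including the 93 from the question.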

Related

C++ time limit exceeded when it doesn't even execute the function

While I was solving a problem in LeetCode, I found something very strange.
I have this line which I assume gives me a time limit exceeded error:
s.erase(i-k, k);
When I comment out this line (//), the time limit error goes away. The strange part is that the line is never executed, even when it is not commented out.
Below is the entire code, and here is the problem link.
class Solution {
public:
string removeDuplicates(string s, int k) {
char prev = s[0];
int cnt = 1;
cnt = 1;
for(int i = 1; i < s.size() + 1; i++){
if(s[i] == prev){
cnt++;
} else {
if(cnt == k){
// when input is "abcd" it never comes to this scope
// which is impossible to run erase function.
s.erase(i-k, k);
i = 0;
}
if(i >= s.size()) break;
cnt = 1;
prev = s[i];
}
}
return s;
}
};
When the input is "abcd", execution never even enters the if block that contains the erase call.
Although erase is never called, it still affects the running time, and I can't figure out why.
Can anyone explain this, or is this just a LeetCode problem?
Many online judges report Time Limit Exceeded when a program hits a critical error (a coding bug) and/or crashes,
for example reading out of the bounds of an array, or dereferencing a bad (junk) pointer.
Why Time Limit Exceeded? Because after a critical error the program can hang and/or crash, which also means it doesn't deliver its result in time.
So I think you should debug your program to find all the coding errors first, instead of spending your time optimizing the algorithm.
Regarding the line s.erase(i-k, k);: it fails when i < k, because the first argument is then negative. erase() takes its position as an unsigned size_t, so a value such as -1 wraps around to 18446744073709551615 (with a 64-bit size_t), which is far beyond the end of the string. erase() then throws std::out_of_range, and since nothing catches it, the program terminates. The count is not the problem, however: erase() removes at most s.size() - pos characters, so s.erase(a, b) with a + b > s.size() simply erases up to the end of the string.
See the documentation of erase, and don't pass negative values to it. Make sure your algorithm never calls s.erase(i-k, k); with i < k. To be certain the call can never throw, you can clamp both arguments:
int start = std::min(std::max(0, i-k), int(s.size()));
int num = std::min(k, std::max(0, int(s.size()) - start));
s.erase(start, num);
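For reference, std::string::erase(pos, count) validates only pos: it throws std::out_of_range when pos > size(), while an oversized count is clamped to size() - pos. The sketch below demonstrates both behaviours; clamp_erase and erase_throws are hypothetical helper names, with clamp_erase being a variant of the clamping code above.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: erase up to `count` characters starting at `start`,
// clamping `start` into [0, s.size()] so that erase() can never throw.
void clamp_erase(std::string& s, int start, int count)
{
    if (count <= 0) return;
    const int size = static_cast<int>(s.size());
    const int pos = std::min(std::max(0, start), size);
    // erase() itself limits the count to size() - pos, so no second clamp is needed.
    s.erase(static_cast<std::size_t>(pos), static_cast<std::size_t>(count));
}

// Hypothetical helper: returns true if s.erase(pos, count) throws std::out_of_range.
bool erase_throws(std::string s, std::size_t pos, std::size_t count)
{
    try {
        s.erase(pos, count);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

A negative int such as i - k, once converted to std::size_t, always exceeds size(), so it lands in the throwing branch.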

Understanding a recursive Binary Search Algorithm that mysteriously works [duplicate]

On an assignment I had to make a recursive binary search algorithm output the index instead of True/False without modifying the parameters. I had a really tough time but after resorting to semi-trial-and-error I stumbled upon this mess:
#include <iostream>
#include <math.h>
#include <climits>
using namespace std;
int BinarySearch(int arr[], int len, int target) {
int temp = 0;
int mid = len/2;
if (len <= 0) return INT_MIN; // not found
if (target == arr[mid]){
return mid; // found
}
if (target < arr[mid]){
temp = BinarySearch(arr, mid, target);
}
else {
temp = mid+1 + BinarySearch(arr+mid+1, len-mid-1, target);
}
}
I have literally no idea why it works, even after running it through a visualizer. It's very sensitive to the code being changed and I can't get it to output -1 when it fails to find the target so I made it at least always output a negative number instead.
I don't really need it fixed; I just want to know how it works at all, since none of the recursive calls' results seem to be used. Thanks.
It is undefined behaviour (see e.g. Why does flowing off the end of a non-void function without returning a value not produce a compiler error?).
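The function only appears to return temp by chance. A likely explanation is that on common x86 calling conventions the return value is read from a register (EAX/RAX), and the result of the last computation, the value just assigned to temp, often happens to still be in that register. Nothing guarantees this, and a different compiler or optimization level may break it. Returning temp explicitly fixes it.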
As far as I understand, you want to return -1 if the target is not found and the index of the target otherwise. In
if (len <= 0) return INT_MIN; // not found
you are returning INT_MIN if the target is not found. You need to change it to
if (len <= 0) return -1; // not found
Since your function returns an int, it has to return a value on every path. You can fix that by adding a return at the end of the function:
if (target < arr[mid]){
temp = BinarySearch(arr, mid, target);
}
else {
temp = mid+1 + BinarySearch(arr+mid+1, len-mid-1, target);
}
return temp;
}
BinarySearch returns the index of target within the current arr. Because the current arr usually doesn't start at index 0 of the original array, you add mid+1 to the result of the recursive call on the right half. You also do this when the target is not found and BinarySearch returns -1, which turns -1 into a bogus index. Fix the else part:
else {
int index(BinarySearch(arr+mid+1, len-mid-1, target));
temp = index == -1 ? -1 : mid + 1 + index;
}
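Putting the pieces of this answer together, the whole function might look like the sketch below. It consolidates the fixes above (return -1 when not found, return on every path, only offset a successful result) rather than adding new logic; the sample array in the test is illustrative.

```cpp
// Recursive binary search on arr[0..len-1], sorted in ascending order.
// Returns the index of target, or -1 if it is not present.
int BinarySearch(int arr[], int len, int target)
{
    if (len <= 0) return -1;                    // empty range: not found
    int mid = len / 2;
    if (target == arr[mid]) return mid;         // found
    if (target < arr[mid])
        return BinarySearch(arr, mid, target);  // left half shares the same base index
    // The right half starts at arr + mid + 1, so its indices are offset by mid + 1,
    // but only when the recursive call actually found something.
    int index = BinarySearch(arr + mid + 1, len - mid - 1, target);
    return index == -1 ? -1 : mid + 1 + index;
}
```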

Binary Search avoid unreadable entry (hole in list)

I have implemented a binary search function, but some list entries may become unreadable. It's implemented in C++, but I'll use pseudocode here to keep it simple. Please don't focus on the unreadable or string implementation; it's just pseudocode. What matters is that there are unreadable entries in the list that have to be navigated around.
int i = 0;
int imin = 0;
int imax = 99;
string search = "test";
while(imin <= imax)
{
i = imin + (imax - imin) / 2;
string text = vector.at(i);
if(text.isUnreadable())
{
continue;
}
if(compare(text, search) == 0)
{
break;
}
else if(compare(text, search) < 0)
{
imin = i + 1;
}
else if(compare(text, search) > 0)
{
imax = i - 1;
}
}
The search itself works well, but the problem is how to avoid an endless loop when the text is unreadable. Does anyone have a time-tested approach for this? The loop should not just exit on an unreadable entry, but rather navigate around the hole.
I had a similar task in one of my projects: a lookup on a sequence where some items are not comparable.
I am not sure this is the best possible implementation, but in my case it looks like this:
int low = first_comparable(0,env);
int high = last_comparable(env.total() - 1,env);
while (low < high)
{
int mid = low + ((high - low) / 2);
int tmid = last_comparable(mid,env);
if( tmid < low )
{
tmid = first_comparable(mid,env);
if( tmid == high )
return high;
if( tmid > high )
return -1;
}
mid = tmid;
...
}
If the item at mid is not comparable, the code looks in its neighborhood for the closest comparable item.
The first_comparable() and last_comparable() functions return the index of the first comparable element starting from a given index; they differ only in direction (first_comparable(), shown below, scans upward).
inline int first_comparable( int n, E& env)
{
int n_elements = env.total();
for( ; n < n_elements; ++n )
if( env.is_comparable(n) )
return n;
return n;
}
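Since the snippet above elides the rest of the loop and last_comparable(), here is a self-contained sketch of the same idea: when the midpoint is unreadable, step to the nearest readable entry inside [low, high] (first downward, then upward) and compare that instead. It assumes unreadable entries are modeled as std::nullopt in a std::vector<std::optional<std::string>> and that the readable entries are sorted; search_with_holes is a hypothetical name, not the answerer's code.

```cpp
#include <optional>
#include <string>
#include <vector>

// Assumption: std::nullopt stands in for the question's isUnreadable().
using Entry = std::optional<std::string>;

// Returns the index of `search`, or -1 if no readable entry matches.
int search_with_holes(const std::vector<Entry>& v, const std::string& search)
{
    int low = 0;
    int high = static_cast<int>(v.size()) - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;
        // Walk left from mid to the nearest readable entry in [low, mid] ...
        int i = mid;
        while (i >= low && !v[i]) --i;
        if (i < low)
        {
            // ... or, if there is none, walk right into (mid, high].
            i = mid + 1;
            while (i <= high && !v[i]) ++i;
            if (i > high) return -1;   // every entry in [low, high] is unreadable
        }
        int cmp = v[i]->compare(search);
        if (cmp == 0) return i;
        if (cmp < 0) low = i + 1;      // v[i] < search: discard i and everything left of it
        else high = i - 1;             // v[i] > search: discard i and everything right of it
    }
    return -1;
}
```

Each iteration either returns or shrinks [low, high] past the entry it compared, so the loop always terminates; the price is a linear scan when long runs of holes surround the midpoint.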
Create a list of pointers to your data items. Do not add "unreadable" ones. Search the resulting list of pointers.
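A minimal sketch of this approach, again modeling unreadable entries as std::nullopt (an assumption). The pointer list is built once in O(n); after that every lookup is an ordinary std::lower_bound. readable_entries and find_entry are hypothetical names.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Collect pointers to the readable entries only, preserving their sorted order.
std::vector<const std::string*> readable_entries(const std::vector<std::optional<std::string>>& v)
{
    std::vector<const std::string*> ptrs;
    for (const auto& e : v)
        if (e) ptrs.push_back(&*e);   // skip holes
    return ptrs;
}

// Binary search over the pointer list; returns the matching entry or nullptr.
const std::string* find_entry(const std::vector<const std::string*>& ptrs, const std::string& search)
{
    auto it = std::lower_bound(ptrs.begin(), ptrs.end(), search,
        [](const std::string* p, const std::string& key) { return *p < key; });
    return (it != ptrs.end() && **it == search) ? *it : nullptr;
}
```

The pointers stay valid only as long as the original vector is not resized or reallocated.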
the problem I have is how to avoid getting an endless loop if the text is unreadable.
It seems that continue should be break instead, so that you leave the loop. You'd probably want to set a flag or something similar to signal the error to whatever code follows the loop.
Another option is to throw an exception.
Really, you should do almost anything other than what you're doing. Currently, when you read one of these 'unreadable' states, you simply continue the loop. But imin and imax still have the same values, so you end up reading the same string from the same place in the vector, and find that it's unreadable again, and so on. You need to decide how you want to respond to one of these 'unreadable' states. I guessed above that you'd want to stop the search, in which case either setting a flag and breaking out of the loop or throwing an exception to accomplish the same thing would be reasonable choices.
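A sketch of the "stop and report" option, assuming again that std::nullopt marks an unreadable entry. Returning a status from inside the loop is equivalent to setting a flag and breaking out; SearchStatus, SearchResult and search_or_stop are hypothetical names.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class SearchStatus { Found, NotFound, Unreadable };

struct SearchResult {
    SearchStatus status;
    int index;   // index of the match or of the unreadable entry; -1 when NotFound
};

SearchResult search_or_stop(const std::vector<std::optional<std::string>>& v, const std::string& search)
{
    int imin = 0;
    int imax = static_cast<int>(v.size()) - 1;
    while (imin <= imax)
    {
        int i = imin + (imax - imin) / 2;
        if (!v[i])
            return {SearchStatus::Unreadable, i};   // stop instead of looping forever
        int cmp = v[i]->compare(search);
        if (cmp == 0) return {SearchStatus::Found, i};
        if (cmp < 0) imin = i + 1;
        else imax = i - 1;
    }
    return {SearchStatus::NotFound, -1};
}
```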

Using rand() to get a number but that number can't be the number that was last generated

I want to use std::rand() to generate a number between 0 and amountOfNumbers, but the generated number can't be the same number that was last generated.
I wrote this function:
void reroll() {
int newRand;
while (true) {
newRand = std::rand() % (amountOfNumbers);
if (newRand == lastNumber) {
continue;
}
lastNumber = newRand;
break;
}
// do something with newRand
// ...
}
amountOfNumbers is just an int (> 1) that defines the upper bound (e.g. 5 so the possible number range is 0 to 4). lastNumber which is initially -1 stores the last generated number.
I was wondering if there's a better way to write this.
The function seems to be working so far, but I'm not sure if my code is flawless... it looks kind of bad. That while (true) makes me feel a bit uneasy.
The code works but I'd structure it like this
int reroll(int n, int last) {
while(true) {
int v = std::rand() % n;
if(v!=last)
return v;
}
}
void reroll() {
...
int v = reroll(n, last);
...
}
Also you can avoid the need for the while loop altogether by generating values in a smaller range (1 less) and adjusting around last.
int reroll(int n, int last) {
if(last==-1)
return std::rand() % n;
int v = std::rand() % (n-1);
if (v<last) return v;
return v+1;
}
I have a few suggestions.
Since you never declare lastNumber and amountOfNumbers, I am going to assume they are global. It would be better to pass these as variables to the function instead. Also, you should return the new number from the function instead of setting it as void.
The following code below will calculate a new roll. Instead of rerolling until there is a new roll, we will just take the random of the set of numbers, but one less. If the number is greater than or equal, we will add the one back in, thus avoiding the lastNumber. The function then returns the newRand. Doing it this way will avoid the (albeit low) risk of an infinite loop, and it will always run in constant time.
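int reroll(int lastNumber, int amountOfNumbers)
{
if (lastNumber < 0) {
// No previous number yet (lastNumber starts at -1): any value is allowed.
return std::rand() % amountOfNumbers;
}
int newRand = std::rand() % (amountOfNumbers - 1);
if (newRand >= lastNumber) {
newRand++;
}
return newRand;
}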
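If C++11 is available, the same "draw from one fewer value and skip over the last one" idea can be written with <random> instead of std::rand(), which also avoids the modulo bias discussed further down. This swaps in a different generator than the answers here use; the choice of std::mt19937 and the three-parameter reroll signature are illustrative assumptions.

```cpp
#include <random>

// Returns a uniformly distributed value in [0, n) that differs from `last`.
// Pass last = -1 when there is no previous value. Requires n > 1.
int reroll(std::mt19937& gen, int n, int last)
{
    if (last < 0 || last >= n) {
        std::uniform_int_distribution<int> all(0, n - 1);
        return all(gen);
    }
    // Draw from the n - 1 values other than `last`, then skip over `last`.
    std::uniform_int_distribution<int> others(0, n - 2);
    int v = others(gen);
    return v >= last ? v + 1 : v;
}
```

The caller keeps the generator and the previous value, e.g. `last = reroll(gen, 5, last);`, which avoids both globals and the while loop.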
The while (true) loop is definitely not good practice. I'd suggest something like this instead, following the same structure as Michael's answer above:
int reroll(int lastNumber, int amountOfNumbers) {
int newRand = std::rand() % (amountOfNumbers);
while (newRand == lastNumber) {
newRand = std::rand() % (amountOfNumbers);
}
return newRand; // the caller stores this as its new lastNumber
}
Depending on your value for amountOfNumbers, the modulus operation you've used might not guarantee an even distribution (which is a small part of your problem).
For a start, if amountOfNumbers is greater than RAND_MAX there will be numbers you'll never see using this.
Next, suppose you use this to generate values from 0 up to, but not including, 6 ([0,1,2,3,4,5]) for a die, and RAND_MAX is 7. You'll see 0 and 1 twice as often as the rest! In reality, RAND_MAX must be at least 32767, which isn't very large, and the 32768 possible values (0 through 32767) aren't evenly divisible by 6 either, so some values will have a slight bias.
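The dice example can be checked by brute force: map every possible rand() result through % 6 and count how often each face comes up. The sketch below does this for a hypothetical generator with RAND_MAX = 7 and for the minimum RAND_MAX of 32767; modulo_histogram is an illustrative name.

```cpp
#include <array>
#include <cstddef>

// Counts how often each face in [0, Faces) is produced by r % Faces
// when r takes every value in [0, rand_max] exactly once.
template <std::size_t Faces>
std::array<int, Faces> modulo_histogram(int rand_max)
{
    std::array<int, Faces> counts{};
    for (int r = 0; r <= rand_max; ++r)
        ++counts[static_cast<std::size_t>(r) % Faces];
    return counts;
}
```

With RAND_MAX = 32767, 32768 = 6 × 5461 + 2, so faces 0 and 1 each get 5462 hits while the others get 5461: a small bias, but a real one.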
You can use modulus to reduce the range, but you'll probably want to discard the values that cause the bias described above. Unfortunately, because of how rand is commonly implemented, extending the range beyond RAND_MAX will introduce further bias.
unsigned int rand_range(unsigned int max) {
double v = max > RAND_MAX ? max : RAND_MAX;
v /= RAND_MAX;
int n;
do {
n = rand();
} while (RAND_MAX - n <= RAND_MAX % (unsigned int)(max / v));
return (unsigned int)(v * n) % max;
}
This still doesn't guarantee that there won't be any repeating values, but at least any bias causing repeating values will have been reduced significantly. We can use a similar approach to the (currently) accepted answer to remove any repeated values, with minimal bias, now:
unsigned int non_repetitive_rand_range(unsigned int max) {
if (max <= 1) {
return 0; // Can't avoid repetitive values here!
}
static unsigned int previous;
unsigned int current;
do {
current = rand_range(max);
} while (current == previous); // retry until it differs from the last value
previous = current;
return current;
}
On a technical note, this doesn't necessarily guarantee a solution to the problem either. This quote from the standard explains why:
There are no guarantees as to the quality of the random sequence produced and some implementations are known to produce sequences with distressingly non-random low-order bits. Applications with particular requirements should use a generator that is known to be sufficient for their needs.
As a result, it is possible that some silly, obscure implementation might implement rand like so:
int rand(void) {
return 0;
}
For such an implementation there's no way around it: this rand would send the code in this answer (and in all the other current answers) into an infinite loop. Your only hope would be to reimplement rand and srand. Here's the example implementation given in the standard, should you ever need to do that:
static unsigned long int next = 1;
int rand(void) // RAND_MAX assumed to be 32767
{
next = next * 1103515245 + 12345;
return (unsigned int)(next/65536) % 32768;
}
void srand(unsigned int seed)
{
next = seed;
}
You'll probably want to rename them to my_rand and my_srand respectively, and use #define rand my_rand and #define srand my_srand or use a find/replace operation.

Custom sorting, always force 0 to back of ascending order?

Premise
This problem has a known solution (shown below actually), I'm just wondering if anyone has a more elegant algorithm or any other ideas/suggestions on how to make this more readable, efficient, or robust.
Background
I have a list of sports competitions that I need to sort in an array. Because of how this array is populated, 95% of the time the list will already be sorted, so I use an improved bubble sort (since it approaches O(n) on nearly sorted lists).
The bubble sort has a helper function called CompareCompetitions that compares two competitions and returns >0 if comp1 is greater, <0 if comp2 is greater, 0 if the two are equal. The competitions are compared first by a priority field, then by game start time, and then by Home Team Name.
The priority field is the tricky part of this problem. It is an int that holds a positive value or 0. Priorities sort with 1 first, 2 second, and so on, except that 0 or invalid values always come last.
e.g. the list of priorities
0, 0, 0, 2, 3, 1, 3, 0
would be sorted as
1, 2, 3, 3, 0, 0, 0, 0
The other quirk, and this is important to the question, is that 95% of the time the priority will be its default of 0, because it is only changed when the user wants to manually change the sort order, which is rare. So the most frequent case in the compare function is that both priorities are equal and 0.
The Code
This is my existing compare algorithm.
int CompareCompetitions(const SWI_COMPETITION &comp1,const SWI_COMPETITION &comp2)
{
if(comp1.nPriority == comp2.nPriority)
{
//Priorities equal
//Compare start time
int ret = comp1.sStartTime24Hrs.CompareNoCase(comp2.sStartTime24Hrs);
if(ret != 0)
{
return ret; //return compare result
}else
{
//Equal so far
//Compare Home team Name
ret = comp1.sHLongName.CompareNoCase(comp2.sHLongName);
return ret;//Home team name is last field to sort by, return that value
}
}
else if(comp1.nPriority > comp2.nPriority)
{
if(comp2.nPriority <= 0)
return -1;
else
return 1;//comp1 has lower priority
}else /*(comp1.nPriority < comp2.nPriority)*/
{
if(comp1.nPriority <= 0)
return 1;
else
return -1;//comp1 one has higher priority
}
}
Question
How can this algorithm be improved?
And more importantly...
Is there a better way to force 0 to the back of the sort order?
I want to emphasize that this code seems to work just fine, but I am wondering if there is a more elegant or efficient algorithm that anyone can suggest. Remember that nPriority will almost always be 0, and the competitions will usually sort by start time or home team name, but priority must always override the other two.
Isn't it just this?
if (a==b) return other_data_compare(a, b);
if (a==0) return 1;
if (b==0) return -1;
return a - b;
You can also reduce some of the verbosity using the ternary operator like this:
int CompareCompetitions(const SWI_COMPETITION &comp1,const SWI_COMPETITION &comp2)
{
if(comp1.nPriority == comp2.nPriority)
{
//Priorities equal
//Compare start time
int ret = comp1.sStartTime24Hrs.CompareNoCase(comp2.sStartTime24Hrs);
return ret != 0 ? ret : comp1.sHLongName.CompareNoCase(comp2.sHLongName);
}
else if(comp1.nPriority > comp2.nPriority)
return comp2.nPriority <= 0 ? -1 : 1;
else /*(comp1.nPriority < comp2.nPriority)*/
return comp1.nPriority <= 0 ? 1 : -1;
}
See?
This is much shorter and, in my opinion, easier to read.
I know it's not what you asked for, but it's also important.
Is it intended that when nPriority1 < 0 and nPriority2 < 0 but nPriority1 != nPriority2, the other data aren't compared?
If it isn't, I'd use something like
int nPriority1 = comp1.nPriority <= 0 ? INT_MAX : comp1.nPriority;
int nPriority2 = comp2.nPriority <= 0 ? INT_MAX : comp2.nPriority;
if (nPriority1 == nPriority2) {
// current code
} else {
return nPriority1 - nPriority2;
}
which will consider values less or equal to 0 the same as the maximum possible value.
(Note that optimizing for performance is probably not worthwhile if you consider that there are insensitive comparisons in the most common path.)
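A sketch of this mapping as a std::sort comparator, checked against the example from the question (0, 0, 0, 2, 3, 1, 3, 0 should become 1, 2, 3, 3, 0, 0, 0, 0). The Competition struct is a hypothetical stand-in for SWI_COMPETITION, and case-sensitive std::string comparison replaces CompareNoCase; std::tie replaces the hand-written chain of field comparisons.

```cpp
#include <algorithm>
#include <climits>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for SWI_COMPETITION (the real type uses MFC CString).
struct Competition {
    int nPriority;
    std::string sStartTime24Hrs;
    std::string sHLongName;
};

// Priorities <= 0 are "unset" and must sort after every real priority.
int sortKey(int priority) { return priority <= 0 ? INT_MAX : priority; }

// Strict weak ordering: priority, then start time, then home team name.
bool competitionLess(const Competition& a, const Competition& b)
{
    const int pa = sortKey(a.nPriority);
    const int pb = sortKey(b.nPriority);
    return std::tie(pa, a.sStartTime24Hrs, a.sHLongName)
         < std::tie(pb, b.sStartTime24Hrs, b.sHLongName);
}
```

Note that std::sort expects a bool "less than" predicate like this one rather than a -1/0/1 result.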
If you can, modifying the priority scheme so that you can just sort normally seems the most elegant. For example, instead of storing the default priority as 0, store it as 999, and cap user-defined priorities at 998. Then you no longer have to deal with the special case, and your compare function can have a more straightforward structure with no nested ifs:
(pseudocode)
if (priority1 < priority2) return -1;
if (priority1 > priority2) return 1;
if (startTime1 < startTime2) return -1;
if (startTime1 > startTime2) return 1;
if (teamName1 < teamName2) return -1;
if (teamName1 > teamName2) return 1;
return 0; // exact match!
I think the inelegance you feel about your solution comes from the duplicated code for the zero-priority exception. The Pragmatic Programmer explains that each piece of information in your source should be defined in "one true" place. For someone new reading your function, the exception should stand out, separate from the other logic and in one place, so that it is readily understandable. How about this?
if(comp1.nPriority == comp2.nPriority)
{
// unchanged
}
else
{
int result, lowerPriority;
if(comp1.nPriority > comp2.nPriority)
{
result = 1;
lowerPriority = comp2.nPriority;
}
else
{
result = -1;
lowerPriority = comp1.nPriority;
}
// zero is an exception: always goes last
if(lowerPriority == 0)
result = -result;
return result;
}
I Java-ized it, but the approach will work fine in C++:
int CompareCompetitions(Competition comp1, Competition comp2) {
int n = comparePriorities(comp1.nPriority, comp2.nPriority);
if (n != 0)
return n;
n = comp1.sStartTime24Hrs.compareToIgnoreCase(comp2.sStartTime24Hrs);
if (n != 0)
return n;
n = comp1.sHLongName.compareToIgnoreCase(comp2.sHLongName);
return n;
}
private int comparePriorities(int a, int b) {
if (a == b)
return 0;
if (a <= 0)
return 1; // a is unset (0 or invalid), so it sorts after b
if (b <= 0)
return -1; // b is unset, so a sorts first
return a - b;
}
Basically, just extract the special-handling-for-zero behavior into its own function, and iterate along the fields in sort-priority order, returning as soon as you have a nonzero.
As long as the highest priority is not larger than INT_MAX/2, you could do
#include <climits>
const int bound = INT_MAX/2;
int pri1 = (comp1.nPriority + bound) % (bound + 1);
int pri2 = (comp2.nPriority + bound) % (bound + 1);
This will turn priority 0 into bound and shift all other priorities down by 1. The advantage is that you avoid comparisons and make the remainder of the code look more natural.
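A quick check of what the mapping does to a few priorities; mapPriority is a hypothetical wrapper around the two lines above.

```cpp
#include <climits>

// Maps priority 0 to `bound` and every priority p in [1, bound] to p - 1,
// so that an ordinary ascending comparison puts 0 last.
int mapPriority(int priority)
{
    const int bound = INT_MAX / 2;
    return (priority + bound) % (bound + 1);
}
```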
In response to your comment, here is a complete solution that avoids the translation in the 95% case where priorities are equal. Note, however, that this concern is probably misplaced: the tiny overhead is negligible compared with the rest of that case, because the equal-priorities path involves at least a call to the time comparison method and at worst an additional call to the name comparator, each surely at least an order of magnitude slower than whatever you do to compare the priorities. If you are really concerned about efficiency, go ahead and experiment. I predict that the difference between the worst- and best-performing suggestions in this thread won't be more than 2%.
#include <climits>
int CompareCompetitions(const SWI_COMPETITION &comp1,const SWI_COMPETITION &comp2)
{
if(comp1.nPriority == comp2.nPriority)
if(int ret = comp1.sStartTime24Hrs.CompareNoCase(comp2.sStartTime24Hrs))
return ret;
else
return comp1.sHLongName.CompareNoCase(comp2.sHLongName);
const int bound = INT_MAX/2;
int pri1 = (comp1.nPriority + bound) % (bound + 1);
int pri2 = (comp2.nPriority + bound) % (bound + 1);
return pri1 > pri2 ? 1 : -1;
}
Depending on your compiler/hardware, you might be able to squeeze out a few more cycles by replacing the last line with
return (pri1 > pri2) * 2 - 1;
or
return (pri1-pri2 > 0) * 2 - 1;
or (assuming 2's complement)
return ((pri1-pri2) >> (CHAR_BIT*sizeof(int) - 1)) | 1;
Final comment: Do you really want CompareCompetitions to return 1,-1,0 ? If all you need it for is bubble sort, you would be better off with a function returning a bool (true if comp1 is ">=" comp2 and false otherwise). This would simplify (albeit slightly) the code of CompareCompetitions as well as the code of the bubble sorter. On the other hand, it would make CompareCompetitions less general-purpose.