Where can I get a "useful" C++ binary search algorithm?

Where can I get a "useful" C++ binary search algorithm? - c++

I need a binary search algorithm that is compatible with the C++ STL containers, something like std::binary_search in the standard library's <algorithm> header, but I need it to return the iterator that points at the result, not a simple boolean telling me if the element exists.
(On a side note, what the hell was the standard committee thinking when they defined the API for binary_search?!)
My main concern here is that I need the speed of a binary search, so although I can find the data with other algorithms, as mentioned below, I want to take advantage of the fact that my data is sorted to get the benefits of a binary search, not a linear search.
so far lower_bound and upper_bound fail if the datum is missing:
//lousy pseudo code
vector(1,2,3,4,6,7,8,9,0) //notice no 5
iter = lower_bound_or_upper_bound(start,end,5)
iter != 5 && iter !=end //not returning end as usual, instead it'll return 4 or 6
Note: I'm also fine using an algorithm that doesn't belong to the std namespace as long as its compatible with containers. Like, say, boost::binary_search.

There is no such functions, but you can write a simple one using std::lower_bound, std::upper_bound or std::equal_range.
A simple implementation could be
template<class Iter, class T>
Iter binary_find(Iter begin, Iter end, T val)
{
// Finds the lower bound in at most log(last - first) + 1 comparisons
Iter i = std::lower_bound(begin, end, val);
if (i != end && !(val < *i))
return i; // found
else
return end; // not found
}
Another solution would be to use a std::set, which guarantees the ordering of the elements and provides a method iterator find(T key) that returns an iterator to the given item. However, your requirements might not be compatible with the use of a set (for example if you need to store the same element multiple times).

You should have a look at std::equal_range. It will return a pair of iterators to the range of all results.

There is a set of them:
http://www.sgi.com/tech/stl/table_of_contents.html
Search for:
lower_bound
upper_bound
equal_range
binary_search
On a separate note:
They were probably thinking that searching containers could term up more than one result. But on the odd occasion where you just need to test for existence an optimized version would also be nice.

If std::lower_bound is too low-level for your liking, you might want to check boost::container::flat_multiset.
It is a drop-in replacement for std::multiset implemented as a sorted vector using binary search.

The shortest implementation, wondering why it's not included in the standard library:
template<class ForwardIt, class T, class Compare=std::less<>>
ForwardIt binary_find(ForwardIt first, ForwardIt last, const T& value, Compare comp={})
{
// Note: BOTH type T and the type after ForwardIt is dereferenced
// must be implicitly convertible to BOTH Type1 and Type2, used in Compare.
// This is stricter than lower_bound requirement (see above)
first = std::lower_bound(first, last, value, comp);
return first != last && !comp(value, *first) ? first : last;
}
From https://en.cppreference.com/w/cpp/algorithm/lower_bound

int BinarySearch(vector<int> array,int var)
{
//array should be sorted in ascending order in this case
int start=0;
int end=array.size()-1;
while(start<=end){
int mid=(start+end)/2;
if(array[mid]==var){
return mid;
}
else if(var<array[mid]){
end=mid-1;
}
else{
start=mid+1;
}
}
return 0;
}
Example: Consider an array, A=[1,2,3,4,5,6,7,8,9]
Suppose you want to search the index of 3
Initially, start=0 and end=9-1=8
Now, since start<=end; mid=4; (array[mid] which is 5) !=3
Now, 3 lies to the left of mid as its smaller than 5. Therefore, we only search the left part of the array
Hence, now start=0 and end=3; mid=2.Since array[mid]==3, hence we got the number we were searching for. Hence, we return its index which is equal to mid.

Check this function, qBinaryFind:
RandomAccessIterator qBinaryFind ( RandomAccessIterator begin, RandomAccessIterator end, const T & value )
Performs a binary search of the range
[begin, end) and returns the position
of an occurrence of value. If there
are no occurrences of value, returns
end.
The items in the range [begin, end)
must be sorted in ascending order; see
qSort().
If there are many occurrences of the
same value, any one of them could be
returned. Use qLowerBound() or
qUpperBound() if you need finer
control.
Example:
QVector<int> vect;
vect << 3 << 3 << 6 << 6 << 6 << 8;
QVector<int>::iterator i =
qBinaryFind(vect.begin(), vect.end(), 6);
// i == vect.begin() + 2 (or 3 or 4)
The function is included in the <QtAlgorithms> header which is a part of the Qt library.

std::lower_bound() :)

A solution returning the position inside the range could be like this, using only operations on iterators (it should work even if iterator does not arithmetic):
template <class InputIterator, typename T>
size_t BinarySearchPos(InputIterator first, InputIterator last, const T& val)
{
const InputIterator beginIt = first;
InputIterator element = first;
size_t p = 0;
size_t shift = 0;
while((first <= last))
{
p = std::distance(beginIt, first);
size_t u = std::distance(beginIt, last);
size_t m = p + (u-p)/2; // overflow safe (p+u)/2
std::advance(element, m - shift);
shift = m;
if(*element == val)
return m; // value found at position m
if(val > *element)
first = element++;
else
last = element--;
}
// if you are here the value is not present in the list,
// however if there are the value should be at position u
// (here p==u)
return p;
}

Related

incorrect output when using C++ lower/upper_bound function with custom compare operator

I've been experimenting with the "lower_bound()/upper_bound()" functions in C++ w.r.t. arrays/vectors, and I get incorrect results when applying custom compare operators to the function.
My current understanding (based on https://www.cplusplus.com/reference/algorithm/upper_bound/) is that when you search for some value 'val' (of any datatype) in an array, it returns the first iterator position "it" in the array (from left to right) that satisfies !comp(val,*it), is this wrong? If so, how exactly does the searching work?
P.S. In addition, what is the difference of using lowerbound/upperbound when your searching criterion is a specific boolean compare function?
Here is an example that produced erroneous results:
auto comp2 = [&](int num, pair<int,int>& p2){return num>p2.second;};
vector<pair<int,int>> pairs = {{1,2},{2,3},{3,4}}; //this array should be binary-searchable with 'comp2' comparator, since pairs[i].second is monotonously increasing
int pos2 = upper_bound(pairs.begin(),pairs.end(),2,comp2)-pairs.begin();
cout<<pos2<<endl; //outputs 3, but should give 0 because !comp2(2,arr[0]) is true, and arr[0] is the ealiest element in the array
Thanks!

I think most (If not all) of the comparator functions are less, it can be std::less or something similar. So when we provide a custom comp function, we have to provide the less logic and think of it as less.
Now back to the upper_bound, it returns the first element greater than the value, which means our less should return true for it to stop (As Francois pointed out). While our comp function always returns false.
And your understanding about !comp(val,*it) is also not correct. It is the condition to continue the search, not to stop it.
Here is an example implementation of the upper_bound, let's take a look:
template<class ForwardIt, class T, class Compare>
ForwardIt upper_bound(ForwardIt first, ForwardIt last, const T& value, Compare comp)
{
ForwardIt it;
typename std::iterator_traits<ForwardIt>::difference_type count, step;
count = std::distance(first, last);
while (count > 0) {
it = first;
step = count / 2;
std::advance(it, step);
if (!comp(value, *it)) {
first = ++it;
count -= step + 1;
}
else
count = step;
}
return first;
}
You can see, if (!comp(value, *it)) is when the less return false, it means the value is greater than the current item, it will move forward and continue from the next item. (Because the items are increasing).
In the other case, it will try to reduce the search distance (By half the count) and hope to find earlier item that is greater than value.
Summary: You have to provide comp as less logic and let the upper_bound do the rest.

upper_bound returns the first element that satisfies comp(val, *it). In the link you provided, it shows
template <class ForwardIterator, class T>
ForwardIterator upper_bound (ForwardIterator first, ForwardIterator last, const T& val)
{
ForwardIterator it;
iterator_traits<ForwardIterator>::difference_type count, step;
count = std::distance(first,last);
while (count>0)
{
it = first; step=count/2; std::advance (it,step);
if (!(val<*it)) // or: if (!comp(val,*it)), for version (2)
{ first=++it; count-=step+1; }
else count=step;
}
return first;
}
Returns an iterator pointing to the first element in the range [first,last) which compares greater than val.
The searching works by starting at position 0(first). It then uses count to see the range of values it needs to check. It checks the middle of the range (first+count/2), and if that does not satisfy the condition, that position is now first (discarding all values before it), and repeats with the new first and range. If it does satisfy the condition, then the algorithm can discard all values after that, and repeat with the new range. When the range drops to 0, the algorithm can end. It assumes that if arr[5] is false, arr[0], arr[1] ... arr[4] are also false. Same with if arr[8] is true, arr[9], arr[10] ... arr[n] are also true.
The reason your code does not work is because the comparator used returns num>p2.second, meaning it looks for a value of p2.second that is less than num. Since you put in 2 for num, and there is no p2.second less than that in the vector, the output points to a position outside of the vector because it didn't find anything.
The difference between upper_bound and lower_bound is that upper_bound looks for the first value that satisfies the condition, while lower_bound looks for the first value that does not satisfy the condition. So
lower_bound(v.begin(), v.end(), val, [](int it, int val) {return !(val < it);});
is the same as
upper_bound(v.begin(), v.end(), val, [](int val, int it){return val < it;});
Note that for lower_bound, the comparator used takes (*it, val), not (val, *it).
I guess the only difference is how easy it is to frame the comparator in those terms - realizing that a<b is the same as not a>=b.
More explained here. I liked the explanation that said it finds [lower_bound, upper_bound) when using the same comparator.

c++ std::vector iterator constructor with first == last

I have a function which needs to divide a vector up into n sub-vectors.
For example, it might look something like this:
void foo(std::vector<uint8_t> vec, int numSubVectors){
size_t size = vec.size() / numSubVectors;
auto iter = vec.begin();
for (int i = 0; i < numSubVectors; ++i) {
auto sub_vec = std::vector<uint8_t>(iter, iter + size);
// do something with sub_vec
// ...
iter += size;
}
}
I need this to work when called with foo({}, 1), where sub_vec gets assigned an empty vector on the first (and only) iteration of the loop.
I'm concerned about the std::vector<uint8_t>(iter, iter + size). Does the c++ standard allow a vector to be constructed using its range constructor when first == last?
According to cplusplus.com, "The range used is [first,last), which includes all the elements between first and last, including the element pointed by first but not the element pointed by last", but that statement doesn't make any sense when first == last?
I tried running it on an online IDE and it seems to work (https://ideone.com/V9hylA), so it's clearly not prohibited, but is it undefined behaviour?

From iterator.requirements.general of the standard:
An iterator and a sentinel denoting a range are comparable. A range [i, s) is empty if i == s; otherwise [...]
So when first == last, the standard explicitly defines this as an empty range.

The iterator pair [first, last) where first == last is how we define an empty range. It's syntactically and logically valid. Constructing a std::vector from that iterator pair will do the correct thing and create an empty container.

Check if iterator to std::map points to second last element

I'm (forward) iterating over a std::map and would like to find if the iterator points to the second last element. I can't seem to find how to do that anywhere.
I've got:
bool
isSecondLastFile(const TDateFileInfoMap::const_iterator &tsFile)
{
TDateFileInfoMap::reverse_iterator secondLastIt = mFileInfoMap.rbegin() + 1;
return (tsFile == secondLastIt);
}
Where TDateFileInfoMap is std::map
I'm getting:
error: no match for ‘operator==’ in ‘tsFile == secondLastIt’
/usr/lib/gcc/i686-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_tree.h:287: note: candidates are: bool std::_Rb_tree_const_iterator<_Tp>::operator==(const std::_Rb_tree_const_iterator<_Tp>&) const [with _Tp = std::pair<const long int, TFileInfo>]
Does that mean I can't compare the forward and reverse iterator?
How do I figure out if the forward iterator is pointing at the second last element?

std::map's iterator type is BidirectionalIterator. Just decrement the end iterator twice--first to get the last element since m.end() returns an iterator at the after the end position, and then again to get the second-last element:
auto penultimate = std::prev(m.end(), 2);
Then you can simply check for equality with the resultant iterator:
auto it = m.begin();
it == penultimate;
see it live on Coliru
Naturally, you should check that the map has two elements first if it's not guaranteed by other logic in your program.

Does that mean I can't compare the forward and reverse iterator?
Yes you can't compare them directly.
You can use base() to get the underlying base iterator.
Returns the underlying base iterator. That is
std::reverse_iterator(it).base() == it.
The base iterator refers to the element that is next (from the
std::reverse_iterator::iterator_type perspective) to the element the
reverse_iterator is currently pointing to. That is &*(rit.base() - 1) == &*rit.
e.g.
return (tsFile == (++secondLastIt).base());
BTW: mFileInfoMap.rbegin() + 1 won't compile since the iterator of std::map is not RandomAccessIterator. You might write:
TDateFileInfoMap::reverse_iterator secondLastIt = mFileInfoMap.rbegin();
++secondLastIt;
Note that we're not checking whether the map is empty or has only one element.

A simple solution for forward iterators:
template <typename ForwardIterator>
inline bool isNthLast(std::size_t n, ForwardIterator pos, ForwardIterator last) {
for( ;; --n, ++pos) {
if(n == 0)
return (pos == last);
if(pos == last)
return false;
}
}
bool isSecondLastFile(TDateFileInfoMap::const_iterator sFile) {
return isNthLast(2, sFile, mFileInfoMap.end());
}

Let's say you have a set with name s.
s= {s1,s2,...,sN-1, sN}
Now to iterate from s1.. to sN-1 (which is second last element) we will use STL functions, s.begin() and s.end().
end = s.end(); //end points to end
end--// end points to sN
Now in the for loop when itr (starts from the beginning of set) becomes equal to sN the loop will break, and you will get s1,s2,..sN-1 inside the loop.
map<int,int> s;
// to iterate till fixed range in map
auto end =s.end();
end--; // end to second last;
for(auto itr = s.begin(); itr!=end;itr++){
// do your operation
}

Why does std::binary_search return bool?

According to draft N4431, the function std::binary_search in the algorithms library returns a bool, [binary.search]:
template<class ForwardIterator, class T>
bool binary_search(ForwardIterator first, ForwardIterator last,
const T& value);
template<class ForwardIterator, class T, class Compare>
bool binary_search(ForwardIterator first, ForwardIterator last,
const T& value, Compare comp);
Requires: The elements e of [first,last) are partitioned with respect to the expressions e < value and !(value < e) or comp(e, value) and !comp(value, e). Also, for all elements e of [first,last), e < value implies !(value < e) or comp(e, value) implies !comp(value, e).
Returns: true if there is an iterator i in the range [first,last) that satisfies the corresponding conditions:
!(*i < value) && !(value < *i) or comp(*i, value) == false && comp(value, *i) ==
false.
Complexity: At most log2(last - first) + O(1) comparisons.
Does anyone know why this is the case?
Most other generic algorithms either return an iterator to the element or an iterator that is equivalent to the iterator denoting the end of the sequence of elements (i.e., one after the last element to be considered in the sequence), which is what I would have expected.

The name of this function in 1994 version of STL was isMember. I think you'd agree that a function with that name should return bool
http://www.stepanovpapers.com/Stepanov-The_Standard_Template_Library-1994.pdf

It's split into multiple different functions in C++, as for the reasoning it's nearly impossible to tell why someone made something one way or another. binary_search will tell you if such an element exists. If you need to know the location of them use lower_bound and upper_bound which will give the begin/end iterator respectively. There's also equal_range that gives you both the begin and end at once.
Since others seem to think that it's obvious why it was created that way I'll argue my points why it's hard/impossible to answer if you aren't Alexander Stepanov or someone who worked with him.
Sadly the SGI STL FAQ doesn't mention binary_search at all. It explains reasoning for list<>::size being linear time or pop returning void. It doesn't seem like they deemed binary_search special enough to document it.
Let's look at the possible performance improvement mentioned by #user2899162:
You can find the original implementation of the SGI STL algorithm binary_search here. Looking at it one can pretty much simplify it (we all know how awful the internal names in the standard library are) to:
template <class ForwardIter, class V>
bool binary_search(ForwardIter first, ForwardIter last, const V& value) {
ForwardIter it = lower_bound(first, last, value);
return it != last && !(value < *it);
}
As you can see it was implemented in terms of lower_bound and got the same exact performance. If they really wanted it to take advantage of possible performance improvements they wouldn't have implemented it in terms of the slower one, so it doesn't seem like that was the reason they did it that way.
Now let's look at it simply being a convenience function
It being simply a convenience function seems more likely, but looking through the STL you'll find numerous other algorithms where this could have been possible. Looking at the above implementation you'll see that it's only trivially more to do than a std::find(begin, end, value) != end; yet we have to write that all the time and don't have a convenience function that returns a bool. Why exactly here and not all the other algorithms too? It's not really obvious and can't simply be explained.
In conclusion I find it far from obvious and don't really know if I could confidently and honestly answer it.

The binary search algorithm relies on strict weak ordering. Meaning that the elements are supposed to be partitioned according to the operator < or according to a custom comparator that has the same guarantees. This means that there isn't necessarily only one element that could be found for a given query. Thus you need the lower_bound, upper_bound and equal_range functions to retrieve iterators.

The standard library contains variants of binary search algorithm that return iterators. They are called std::lower_bound and std::upper_bound. I think the rationale behind std::binary_search returning bool is that it wouldn't be clear what iterator to return in case of equivalent elements, while in case of std::lower_bound and std::upper_bound it is clear.
There might have been performance considerations as well, because in theory std::binary_search could be implemented to perform better in case of multiple equivalent elements and certain types. However, at least one popular implementation of the standard library (libstdc++) implements std::binary_search using std::lower_bound and, moreover, they have the same theoretical complexity.

If you want to get an iterator on a value, you can use std::equal_range which will return 2 iterators, one on the lower bound and one on the higher bound of the range of values that are equal to the one you're looking for.
Since the only requirement is that values are sorted and not unique, there's is no simple "find" that would return an iterator on the one element you're looking for. If there's only one element equal to the value you're looking for, there will only be a difference of 1 between the two iterators.

Here's a C++20 binary-seach alternative that returns an iterator:
template<typename RandomIt, typename T, typename Pred>
inline
RandomIt xbinary_search( RandomIt begin, RandomIt end, T const &key, Pred pred )
requires std::random_access_iterator<RandomIt>
&&
requires( Pred pred, typename std::iterator_traits<RandomIt>::value_type &elem, T const &key )
{
{ pred( elem, key ) } -> std::convertible_to<std::strong_ordering>;
}
{
using namespace std;
size_t lower = 0, upper = end - begin, mid;
strong_ordering so;
while( lower != upper )
{
mid = (lower + upper) / 2;
so = pred( begin[mid], key );
if( so == 0 )
{
assert(mid == 0 || pred( begin[mid - 1], key ) < 0);
assert(begin + mid + 1 == end || pred( begin[mid + 1], key ) > 0);
return begin + mid;
}
if( so > 0 )
upper = mid;
else
lower = mid + 1;
}
return end;
}
This code only works correctly if there's only one value between begin and end that matches the key. But if you debug and NDEBUG is not defined, the code stops in your debugger.

Insert multiple values into vector

I have a std::vector<T> variable. I also have two variables of type T, the first of which represents the value in the vector after which I am to insert, while the second represents the value to insert.
So lets say I have this container: 1,2,1,1,2,2
And the two values are 2 and 3 with respect to their definitions above. Then I wish to write a function which will update the container to instead contain:
1,2,3,1,1,2,3,2,3
I am using c++98 and boost. What std or boost functions might I use to implement this function?
Iterating over the vector and using std::insert is one way, but it gets messy when one realizes that you need to remember to hop over the value you just inserted.

This is what I would probably do:
vector<T> copy;
for (vector<T>::iterator i=original.begin(); i!=original.end(); ++i)
{
copy.push_back(*i);
if (*i == first)
copy.push_back(second);
}
original.swap(copy);
Put a call to reserve in there if you want. You know you need room for at least original.size() elements. You could also do an initial iteraton over the vector (or use std::count) to determine the exact amount of elements to reserve, but without testing, I don't know whether that would improve performance.

I propose a solution that works in place and in O(n) in memory and O(2n) time. Instead of O(n^2) in time by the solution proposed by Laethnes and O(2n) in memory by the solution proposed by Benjamin.
// First pass, count elements equal to first.
std::size_t elems = std::count(data.begin(), data.end(), first);
// Resize so we'll add without reallocating the elements.
data.resize(data.size() + elems);
vector<T>::reverse_iterator end = data.rbegin() + elems;
// Iterate from the end. Move elements from the end to the new end (and so elements to insert will have some place).
for(vector<T>::reverse_iterator new_end = data.rbegin(); end != data.rend() && elems > 0; ++new_end,++end)
{
// If the current element is the one we search, insert second first. (We iterate from the end).
if(*end == first)
{
*new_end = second;
++new_end;
--elems;
}
// Copy the data to the end.
*new_end = *end;
}
This algorithm may be buggy but the idea is to copy only once each elements by:
Firstly count how much elements we'll need to insert.
Secondly by going though the data from the end and moving each elements to the new end.

This is what I probably would do:
typedef ::std::vector<int> MyList;
typedef MyList::iterator MyListIter;
MyList data;
// ... fill data ...
const int searchValue = 2;
const int addValue = 3;
// Find first occurence of searched value
MyListIter iter = ::std::find(data.begin(), data.end(), searchValue);
while(iter != data.end())
{
// We want to add our value after searched one
++iter;
// Insert value and return iterator pointing to the inserted position
// (original iterator is invalid now).
iter = data.insert(iter, addValue);
// This is needed only if we want to be sure that out value won't be used
// - for example if searchValue == addValue is true, code would create
// infinite loop.
++iter;
// Search for next value.
iter = ::std::find(iter, data.end(), searchValue);
}
but as you can see, I couldn't avoid the incrementation you mentioned. But I don't think that would be bad thing: I would put this code to separate functions (probably in some kind of "core/utils" module) and - of course - implement this function as template, so I would write it only once - only once worrying about incrementing value is IMHO acceptable. Very acceptable.
template <class ValueType>
void insertAfter(::std::vector<ValueType> &io_data,
const ValueType &i_searchValue,
const ValueType &i_insertAfterValue);
or even better (IMHO)
template <class ListType, class ValueType>
void insertAfter(ListType &io_data,
const ValueType &i_searchValue,
const ValueType &i_insertAfterValue);
EDIT:
well, I would solve problem little different way: first count number of the searched value occurrence (preferably store in some kind of cache which can be kept and used repeatably) so I could prepare array before (only one allocation) and used memcpy to move original values (for types like int only, of course) or memmove (if the vector allocated size is sufficient already).

In place, O(1) additional memory and O(n) time (Live at Coliru):
template <typename T, typename A>
void do_thing(std::vector<T, A>& vec, T target, T inserted) {
using std::swap;
typedef typename std::vector<T, A>::size_type size_t;
const size_t occurrences = std::count(vec.begin(), vec.end(), target);
if (occurrences == 0) return;
const size_t original_size = vec.size();
vec.resize(original_size + occurrences, inserted);
for(size_t i = original_size - 1, end = i + occurrences; i > 0; --i, --end) {
if (vec[i] == target) {
--end;
}
swap(vec[i], vec[end]);
}
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Where can I get a "useful" C++ binary search algorithm? - c++

You should have a look at std::equal_range. It will return a pair of iterators to the range of all results.

If std::lower_bound is too low-level for your liking, you might want to check boost::container::flat_multiset. It is a drop-in replacement for std::multiset implemented as a sorted vector using binary search.

std::lower_bound() :)

Related

incorrect output when using C++ lower/upper_bound function with custom compare operator

c++ std::vector iterator constructor with first == last

Check if iterator to std::map points to second last element

Why does std::binary_search return bool?

Insert multiple values into vector

Categories

Resources