Why does std::binary_search return bool? - c++

According to draft N4431, the function std::binary_search in the algorithms library returns a bool, [binary.search]:
template<class ForwardIterator, class T>
bool binary_search(ForwardIterator first, ForwardIterator last,
const T& value);
template<class ForwardIterator, class T, class Compare>
bool binary_search(ForwardIterator first, ForwardIterator last,
const T& value, Compare comp);
Requires: The elements e of [first,last) are partitioned with respect to the expressions e < value and !(value < e) or comp(e, value) and !comp(value, e). Also, for all elements e of [first,last), e < value implies !(value < e) or comp(e, value) implies !comp(value, e).
Returns: true if there is an iterator i in the range [first,last) that satisfies the corresponding conditions:
!(*i < value) && !(value < *i) or comp(*i, value) == false && comp(value, *i) ==
false.
Complexity: At most log2(last - first) + O(1) comparisons.
Does anyone know why this is the case?
Most other generic algorithms either return an iterator to the element or an iterator that is equivalent to the iterator denoting the end of the sequence of elements (i.e., one after the last element to be considered in the sequence), which is what I would have expected.

The name of this function in the 1994 version of the STL was isMember. I think you'd agree that a function with that name should return bool.
http://www.stepanovpapers.com/Stepanov-The_Standard_Template_Library-1994.pdf

It's split into multiple different functions in C++, as for the reasoning it's nearly impossible to tell why someone made something one way or another. binary_search will tell you if such an element exists. If you need to know the location of them use lower_bound and upper_bound which will give the begin/end iterator respectively. There's also equal_range that gives you both the begin and end at once.
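As a rough illustration of how these functions divide the work, the sketch below counts the occurrences of a value in a sorted vector, once with lower_bound/upper_bound and once with equal_range. The helper names count_sorted and count_sorted_range and the sample data are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements equal to value in a sorted vector.
// lower_bound points at the first such element, upper_bound one past the last.
std::size_t count_sorted(const std::vector<int>& v, int value)
{
    assert(std::is_sorted(v.begin(), v.end()));  // precondition of every binary search
    auto lo = std::lower_bound(v.begin(), v.end(), value);
    auto hi = std::upper_bound(v.begin(), v.end(), value);
    return static_cast<std::size_t>(hi - lo);
}

// Same count via equal_range, which returns both bounds in one call.
std::size_t count_sorted_range(const std::vector<int>& v, int value)
{
    assert(std::is_sorted(v.begin(), v.end()));
    auto range = std::equal_range(v.begin(), v.end(), value);
    return static_cast<std::size_t>(range.second - range.first);
}
```

With v = {1, 3, 3, 3, 7}, std::binary_search(v.begin(), v.end(), 3) only reports that 3 is present, while both helpers report three occurrences. For a missing value such as 5 the two bounds coincide and the count is zero.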
Since others seem to think that it's obvious why it was created that way I'll argue my points why it's hard/impossible to answer if you aren't Alexander Stepanov or someone who worked with him.
Sadly the SGI STL FAQ doesn't mention binary_search at all. It explains reasoning for list<>::size being linear time or pop returning void. It doesn't seem like they deemed binary_search special enough to document it.
Let's look at the possible performance improvement mentioned by @user2899162:
You can find the original implementation of the SGI STL algorithm binary_search here. Looking at it one can pretty much simplify it (we all know how awful the internal names in the standard library are) to:
template <class ForwardIter, class V>
bool binary_search(ForwardIter first, ForwardIter last, const V& value) {
ForwardIter it = lower_bound(first, last, value);
return it != last && !(value < *it);
}
As you can see it was implemented in terms of lower_bound and got the same exact performance. If they really wanted it to take advantage of possible performance improvements they wouldn't have implemented it in terms of the slower one, so it doesn't seem like that was the reason they did it that way.
Now let's look at it simply being a convenience function
It being simply a convenience function seems more likely, but looking through the STL you'll find numerous other algorithms where this could have been possible. Looking at the above implementation you'll see that it's only trivially more to do than a std::find(begin, end, value) != end; yet we have to write that all the time and don't have a convenience function that returns a bool. Why exactly here and not all the other algorithms too? It's not really obvious and can't simply be explained.
In conclusion I find it far from obvious and don't really know if I could confidently and honestly answer it.

The binary search algorithm relies on strict weak ordering. Meaning that the elements are supposed to be partitioned according to the operator < or according to a custom comparator that has the same guarantees. This means that there isn't necessarily only one element that could be found for a given query. Thus you need the lower_bound, upper_bound and equal_range functions to retrieve iterators.
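To make the point about equivalence concrete, here is a small sketch in which the comparator looks only at the first member of a pair, so several distinct elements are equivalent to the same query. The names Entry, ByFirst and equivalent_to are illustrative and not part of the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;

// Illustrative comparator: orders entries by key only, ignoring the string.
// Two entries with the same key are equivalent under it without being equal.
struct ByFirst {
    bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
};

// Returns every entry whose key is equivalent to the query key.
std::vector<Entry> equivalent_to(const std::vector<Entry>& sorted, int key)
{
    assert(std::is_sorted(sorted.begin(), sorted.end(), ByFirst{}));
    const Entry probe{key, ""};  // only the key takes part in the comparison
    auto range = std::equal_range(sorted.begin(), sorted.end(), probe, ByFirst{});
    return std::vector<Entry>(range.first, range.second);
}
```

For the data {1,"a"}, {2,"b"}, {2,"c"}, {3,"d"} a query for key 2 yields two entries, so there is no single obvious iterator that a binary search could return.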

The standard library contains variants of binary search algorithm that return iterators. They are called std::lower_bound and std::upper_bound. I think the rationale behind std::binary_search returning bool is that it wouldn't be clear what iterator to return in case of equivalent elements, while in case of std::lower_bound and std::upper_bound it is clear.
There might have been performance considerations as well, because in theory std::binary_search could be implemented to perform better in case of multiple equivalent elements and certain types. However, at least one popular implementation of the standard library (libstdc++) implements std::binary_search using std::lower_bound and, moreover, they have the same theoretical complexity.

If you want to get an iterator on a value, you can use std::equal_range which will return 2 iterators, one on the lower bound and one on the higher bound of the range of values that are equal to the one you're looking for.
Since the only requirement is that values are sorted, not that they are unique, there is no simple "find" that would return an iterator to the one element you're looking for. If there's only one element equal to the value you're looking for, there will only be a difference of 1 between the two iterators.

Here's a C++20 binary-search alternative that returns an iterator:
template<typename RandomIt, typename T, typename Pred>
inline
RandomIt xbinary_search( RandomIt begin, RandomIt end, T const &key, Pred pred )
requires std::random_access_iterator<RandomIt>
&&
requires( Pred pred, typename std::iterator_traits<RandomIt>::value_type &elem, T const &key )
{
{ pred( elem, key ) } -> std::convertible_to<std::strong_ordering>;
}
{
using namespace std;
size_t lower = 0, upper = end - begin, mid;
while( lower != upper )
{
mid = (lower + upper) / 2;
// strong_ordering has no default constructor, so initialize it at the point of use
strong_ordering so = pred( begin[mid], key );
if( so == 0 )
{
assert(mid == 0 || pred( begin[mid - 1], key ) < 0);
assert(begin + mid + 1 == end || pred( begin[mid + 1], key ) > 0);
return begin + mid;
}
if( so > 0 )
upper = mid;
else
lower = mid + 1;
}
return end;
}
This code only works correctly if at most one element between begin and end matches the key. In a build where NDEBUG is not defined, the asserts check that the neighbours of the match do not match as well, and the program aborts (or stops in your debugger) if they do.

Related

incorrect output when using C++ lower/upper_bound function with custom compare operator

I've been experimenting with the "lower_bound()/upper_bound()" functions in C++ w.r.t. arrays/vectors, and I get incorrect results when applying custom compare operators to the function.
My current understanding (based on https://www.cplusplus.com/reference/algorithm/upper_bound/) is that when you search for some value 'val' (of any datatype) in an array, it returns the first iterator position "it" in the array (from left to right) that satisfies !comp(val,*it), is this wrong? If so, how exactly does the searching work?
P.S. In addition, what is the difference of using lowerbound/upperbound when your searching criterion is a specific boolean compare function?
Here is an example that produced erroneous results:
auto comp2 = [&](int num, pair<int,int>& p2){return num>p2.second;};
vector<pair<int,int>> pairs = {{1,2},{2,3},{3,4}}; //this array should be binary-searchable with 'comp2' comparator, since pairs[i].second is monotonically increasing
int pos2 = upper_bound(pairs.begin(),pairs.end(),2,comp2)-pairs.begin();
cout<<pos2<<endl; //outputs 3, but should give 0 because !comp2(2,arr[0]) is true, and arr[0] is the earliest element in the array
Thanks!
I think most (if not all) of the comparators in the standard library express "less than", whether std::less or something similar. So when we provide a custom comp function, we have to give it "less" logic and think of it as less.
Now back to upper_bound: it returns the first element greater than the value, which means our "less" has to return true for that element for the search to stop there (as Francois pointed out). In your example, comp returns false for every element.
And your understanding about !comp(val,*it) is also not correct. It is the condition to continue the search, not to stop it.
Here is an example implementation of the upper_bound, let's take a look:
template<class ForwardIt, class T, class Compare>
ForwardIt upper_bound(ForwardIt first, ForwardIt last, const T& value, Compare comp)
{
ForwardIt it;
typename std::iterator_traits<ForwardIt>::difference_type count, step;
count = std::distance(first, last);
while (count > 0) {
it = first;
step = count / 2;
std::advance(it, step);
if (!comp(value, *it)) {
first = ++it;
count -= step + 1;
}
else
count = step;
}
return first;
}
As you can see, !comp(value, *it) is true when the "less" returns false, meaning the value is not less than the current item, so the search moves forward and continues from the next item (the items are in increasing order).
In the other case, it shrinks the search range to the first half and tries to find an earlier item that is greater than value.
Summary: You have to provide comp as less logic and let the upper_bound do the rest.
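Applied to the question's data, a comparator with "less" semantics might look like the sketch below. The function name first_second_greater_than is made up for illustration; the comparator takes the searched value first and the element second, which is the argument order upper_bound uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Index of the first pair whose .second is greater than num,
// or pairs.size() if there is none. Assumes .second is non-decreasing.
std::size_t first_second_greater_than(const std::vector<std::pair<int, int>>& pairs, int num)
{
    // Ordering the data is assumed to follow; used only to check the precondition.
    auto by_second = [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
        return a.second < b.second;
    };
    assert(std::is_sorted(pairs.begin(), pairs.end(), by_second));

    // "less" logic: is the searched value less than the element's key?
    auto value_less_than_elem = [](int value, const std::pair<int, int>& p) {
        return value < p.second;
    };
    auto it = std::upper_bound(pairs.begin(), pairs.end(), num, value_less_than_elem);
    return static_cast<std::size_t>(it - pairs.begin());
}
```

For the question's vector {{1,2},{2,3},{3,4}} and num = 2 this yields index 1, the position of {2,3}, which is the first pair whose second member is greater than 2.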
upper_bound returns the first element that satisfies comp(val, *it). In the link you provided, it shows
template <class ForwardIterator, class T>
ForwardIterator upper_bound (ForwardIterator first, ForwardIterator last, const T& val)
{
ForwardIterator it;
iterator_traits<ForwardIterator>::difference_type count, step;
count = std::distance(first,last);
while (count>0)
{
it = first; step=count/2; std::advance (it,step);
if (!(val<*it)) // or: if (!comp(val,*it)), for version (2)
{ first=++it; count-=step+1; }
else count=step;
}
return first;
}
Returns an iterator pointing to the first element in the range [first,last) which compares greater than val.
The searching works by starting at position 0(first). It then uses count to see the range of values it needs to check. It checks the middle of the range (first+count/2), and if that does not satisfy the condition, that position is now first (discarding all values before it), and repeats with the new first and range. If it does satisfy the condition, then the algorithm can discard all values after that, and repeat with the new range. When the range drops to 0, the algorithm can end. It assumes that if arr[5] is false, arr[0], arr[1] ... arr[4] are also false. Same with if arr[8] is true, arr[9], arr[10] ... arr[n] are also true.
The reason your code does not work is because the comparator used returns num>p2.second, meaning it looks for a value of p2.second that is less than num. Since you put in 2 for num, and there is no p2.second less than that in the vector, the output points to a position outside of the vector because it didn't find anything.
The difference between upper_bound and lower_bound is that upper_bound looks for the first value that satisfies the condition, while lower_bound looks for the first value that does not satisfy the condition. So
lower_bound(v.begin(), v.end(), val, [](int it, int val) {return !(val < it);});
is the same as
upper_bound(v.begin(), v.end(), val, [](int val, int it){return val < it;});
Note that for lower_bound, the comparator used takes (*it, val), not (val, *it).
I guess the only difference is how easy it is to frame the comparator in those terms - realizing that a<b is the same as not a>=b.
More explained here. I liked the explanation that said it finds [lower_bound, upper_bound) when using the same comparator.
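The half-open range idea can be sketched as follows, assuming a plain sorted vector of int and the default operator<. The helper names bounds_of and upper_via_lower are hypothetical; the second one repeats the equivalence shown above with the negated, argument-swapped comparator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {index of first element >= val, index of first element > val}.
std::pair<std::size_t, std::size_t> bounds_of(const std::vector<int>& v, int val)
{
    assert(std::is_sorted(v.begin(), v.end()));
    auto lo = std::lower_bound(v.begin(), v.end(), val);  // stops at first !(elem < val)
    auto hi = std::upper_bound(v.begin(), v.end(), val);  // stops at first (val < elem)
    return {static_cast<std::size_t>(lo - v.begin()),
            static_cast<std::size_t>(hi - v.begin())};
}

// lower_bound with the negated, argument-swapped comparator behaves like upper_bound.
std::size_t upper_via_lower(const std::vector<int>& v, int val)
{
    auto it = std::lower_bound(v.begin(), v.end(), val,
                               [](int elem, int value) { return !(value < elem); });
    return static_cast<std::size_t>(it - v.begin());
}
```

For v = {1, 2, 2, 3} and val = 2, bounds_of gives the index range [1, 3), and upper_via_lower agrees with the upper end of that range.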

Why is there no std::inplace_merge_unique?

I tried looking for an algorithm that would do what std::inplace_merge
followed by std::unique would do. Seems more efficient to do it in 1 pass than in 2.
Could not find it in the standard library or by googling.
So is there an implementation somewhere in boost, maybe under a different name?
Is such an algorithm possible (in the sense that it has the same complexity guarantees as normal inplace_merge)?
It doesn't operate in-place, but assuming that neither range contains duplicates beforehand, std::set_union will find the same result as merge followed by unique.
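A minimal sketch of that suggestion, assuming two sorted vectors of int that contain no duplicates of their own; merge_unique is a name chosen for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Merges two sorted, duplicate-free ranges into a new vector.
// An element present in both inputs appears only once in the output.
std::vector<int> merge_unique(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

Merging {1, 3, 5} with {2, 3, 6} this way gives {1, 2, 3, 5, 6}. The cost is the extra output buffer, which is exactly what an in-place algorithm would avoid.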
There are many interesting algorithms missing from the algorithms section. The original submission of STL was incomplete from Stepanov's view and some algorithms were even removed. The proposal by Alexander Stepanov and Meng Lee doesn't seem to include an algorithm inplace_merge_unique() or any variation thereof.
One of the potential reasons why there is no such algorithm is that it isn't clear which of the elements should be dropped: since the comparison is only a strict weak ordering, the choice of element matters. One approach to implement inplace_merge_unique() is to:
1. Use std::remove_if() to remove any element which is a duplicate from the second range.
2. Use inplace_merge() to do the actual merge.
The predicate to std::remove_if() would track the current position in the first part of the sequence to be merged. The code below isn't tested but something like that should work:
template <typename BiDirIt, typename Comp>
BiDirIt inplace_merge_unique(BiDirIt begin, BiDirIt middle, BiDirIt end, Comp comp) {
using reference = typename std::iterator_traits<BiDirIt>::reference;
BiDirIt result = std::remove_if(middle, end, [=](reference other) mutable -> bool {
begin = std::find_if(begin, middle, [=](reference arg)->bool {
return !comp(arg, other);
});
return begin != middle && !comp(other, *begin);
});
std::inplace_merge(begin, middle, result, comp);
return result;
}

How to find a value in a sorted C++ vector in the most efficient way?

I have looked at find and binary_search, but find doesn't take advantage of the fact that the vector is sorted, and binary_search only returns a true or false, not where it found the value. Is there any function that can give me the best of both worlds?
You can use find to locate a particular element in any container in O(N) time. With a sorted vector you have random access and can take advantage of the binary-search family of std algorithms: lower_bound (O(log N)), upper_bound, or equal_range. std::lower_bound will do that for you; it appears in the equivalent-behavior section at the top of the reference for binary_search. The utility of binary_search itself is limited to yes/no answers (maybe the naming needs to be improved in a future version of C++; binary_in()?).
There is a method, std::equal_range, which will give you a pair containing the lower and upper bound of the subset holding the desired value. If both of those items in the pair are identical, then the value you were looking for doesn't exist.
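One way to turn that pair into a position is sketched below, assuming a sorted vector of int. The function name index_in_sorted and the convention of returning -1 for a missing value are choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first occurrence of value in a sorted vector, or -1 if absent.
std::ptrdiff_t index_in_sorted(const std::vector<int>& v, int value)
{
    assert(std::is_sorted(v.begin(), v.end()));
    auto range = std::equal_range(v.begin(), v.end(), value);
    if (range.first == range.second)
        return -1;                   // empty range: value is not present
    return range.first - v.begin();  // first element equal to value
}
```

If only the first match is needed, std::lower_bound plus an equality check does the same job with a single search, which is what the contains function below relies on.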
template<class T, class U>
bool contains(const std::vector<T>& container, const U& v)
{
auto it = std::lower_bound(
container.begin(),
container.end(),
v,
[](const T& l, const U& r){ return l < r; });
return it != container.end() && *it == v;
}

C++ equivalent of Python difference_update?

s1 and s2 are sets (Python set or C++ std::set)
To add the elements of s2 to s1 (set union), you can do
Python: s1.update(s2)
C++: s1.insert(s2.begin(), s2.end());
To remove the elements of s2 from s1 (set difference), you can do
Python: s1.difference_update(s2)
What is the C++ equivalent of this? The code
s1.erase(s2.begin(), s2.end());
does not work, for s1.erase() requires iterators from s1. The code
std::set<T> s3;
std::set_difference(s1.begin(), s1.end(), s2.begin(), s2.end(), std::inserter(s3, s3.end()));
s1.swap(s3);
works, but seems overly complex, at least compared with Python.
Is there a simpler way?
Using std::set_difference is the idiomatic way to do this in C++. You have stumbled across one of the primary differences (pun intended) between C++/STL and many other languages. STL does not bundle operations directly with the data structures. This is why std::set does not implement a difference routine.
Basically, algorithms such as std::set_difference write the result of the operation to another object. It is interesting to note that the algorithm does not require that either or both of the operands are actually std::set. The definition of the algorithm is:
Effects: Copies the elements of the range [first1, last1) which are not present in the range [first2, last2) to the range beginning at result. The elements in the constructed range are sorted.
Requires: The resulting range shall not overlap with either of the original ranges. Input ranges are required to be ordered by the same operator<.
Returns: The end of the constructed range.
Complexity: At most 2 * ((last1 - first1) + (last2 - first2)) - 1 comparisons
The interesting difference is that the C++ version is applicable to any two sorted ranges. In most languages, you are forced to coerce or translate the calling object (left-hand operand) into a set before you have access to the set difference algorithm.
This is not really pertinent to your question, but this is the reason that the various set algorithms are modeled as free-standing algorithms instead of member methods.
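To illustrate that no std::set is involved, the following sketch applies std::set_difference to two sorted vectors. The wrapper name sorted_difference is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Elements of a that are not in b; both inputs must be sorted with operator<.
std::vector<int> sorted_difference(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

Called with {1, 2, 3, 4, 5} and {2, 4, 6} it returns {1, 3, 5}. The output goes to a separate container because the algorithm requires that the result range not overlap the inputs.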
You should iterate through the second set:
for( set< T >::iterator iter = s2.begin(); iter != s2.end(); ++iter )
{
s1.erase( *iter );
}
This could be cheaper than using std::set_difference: set_difference copies the remaining objects into a new container, but it takes linear time, while .erase does not copy anything, but each call is logarithmic in the size of s1, so the loop is O(m * log(n)) for m elements in s2 and n elements in s1.
In other words, depending on the containers, you can choose whichever way is faster for your case.
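Wrapped into a reusable function, the loop might look like the sketch below. The name difference_update is borrowed from the Python method in the question and is not a standard C++ function; the assert guards against passing the same set twice, which would erase elements while iterating over them.

```cpp
#include <cassert>
#include <set>

// In-place difference: removes from s1 every key that also occurs in s2.
// Each erase-by-key is logarithmic in s1.size(); keys not present are ignored.
template <class T>
void difference_update(std::set<T>& s1, const std::set<T>& s2)
{
    assert(&s1 != &s2);  // erasing from the set being iterated would invalidate the loop
    for (const T& key : s2)
        s1.erase(key);
}
```

With s1 = {1, 2, 3, 4, 5} and s2 = {2, 4, 6}, s1 ends up as {1, 3, 5}. The same loop also works for std::unordered_set if the parameter types are changed accordingly, since it does not rely on ordering.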
Thanks David Rodríguez - dribeas for the remark! (:
EDIT: Doh! I thought about BOOST_FOREACH at the very beginning, but I was wrong that it could not be used.. - you don't need the iterator, but just the value.. As user763305 said by himself/herself.
In c++ there is no difference method in the set. The set_difference looks much more awkward as it is more generic than applying a difference on two sets. Of course you can implement your own version of in place difference on sets:
template <typename T, typename Compare, typename Allocator>
void my_set_difference( std::set<T,Compare,Allocator>& lhs, std::set<T,Compare,Allocator> const & rhs )
{
typedef std::set<T,Compare,Allocator> set_t;
typedef typename set_t::iterator iterator;
typedef typename set_t::const_iterator const_iterator;
const_iterator rit = rhs.begin(), rend = rhs.end();
iterator it = lhs.begin(), end = lhs.end();
while ( it != end && rit != rend )
{
if ( lhs.key_comp()( *it, *rit ) ) { // key_comp() returns the comparison object, which is then called
++it;
} else if ( lhs.key_comp()( *rit, *it ) ) {
++rit;
} else {
++rit;
lhs.erase( it++ );
}
}
}
The performance of this algorithm will be linear in the size of the arguments, and require no extra copies as it modifies the first argument in place.
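You can also do it with a predicate functor of your own that tests for existence in a set. Note that std::remove_if itself cannot be applied to a std::set, because set elements are const and cannot be shifted around; since C++20 the container-level std::erase_if does the job, e.g.
std::erase_if(s1, ExistIn(s2));
I suppose that set_difference is more efficient though, as it probably scans both sets only once.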
Python set is unordered, and is more of an equivalent of C++ std::unordered_set than std::set, which is ordered.
David Rodríguez's algorithm relies on the fact that std::set is ordered, so the lhs and rhs sets can be traversed in the way shown in the algorithm.
For a more general solution that works for both ordered and unordered sets, Kiril Kirov's algorithm should be the safe one to adopt if you are enforcing/preserving the "unorderedness" nature of Python set.

Where can I get a "useful" C++ binary search algorithm?

I need a binary search algorithm that is compatible with the C++ STL containers, something like std::binary_search in the standard library's <algorithm> header, but I need it to return the iterator that points at the result, not a simple boolean telling me if the element exists.
(On a side note, what the hell was the standard committee thinking when they defined the API for binary_search?!)
My main concern here is that I need the speed of a binary search, so although I can find the data with other algorithms, as mentioned below, I want to take advantage of the fact that my data is sorted to get the benefits of a binary search, not a linear search.
so far lower_bound and upper_bound fail if the datum is missing:
//lousy pseudo code
vector(1,2,3,4,6,7,8,9,0) //notice no 5
iter = lower_bound_or_upper_bound(start,end,5)
iter != 5 && iter !=end //not returning end as usual, instead it'll return 4 or 6
Note: I'm also fine using an algorithm that doesn't belong to the std namespace as long as its compatible with containers. Like, say, boost::binary_search.
There is no such function, but you can write a simple one using std::lower_bound, std::upper_bound or std::equal_range.
A simple implementation could be
template<class Iter, class T>
Iter binary_find(Iter begin, Iter end, T val)
{
// Finds the lower bound in at most log(last - first) + 1 comparisons
Iter i = std::lower_bound(begin, end, val);
if (i != end && !(val < *i))
return i; // found
else
return end; // not found
}
Another solution would be to use a std::set, which guarantees the ordering of the elements and provides a method iterator find(T key) that returns an iterator to the given item. However, your requirements might not be compatible with the use of a set (for example if you need to store the same element multiple times).
You should have a look at std::equal_range. It will return a pair of iterators to the range of all results.
There is a set of them:
http://www.sgi.com/tech/stl/table_of_contents.html
Search for:
lower_bound
upper_bound
equal_range
binary_search
On a separate note:
They were probably thinking that searching containers could turn up more than one result. But on the odd occasion where you just need to test for existence an optimized version would also be nice.
If std::lower_bound is too low-level for your liking, you might want to check boost::container::flat_multiset.
It is a drop-in replacement for std::multiset implemented as a sorted vector using binary search.
The shortest implementation, wondering why it's not included in the standard library:
template<class ForwardIt, class T, class Compare=std::less<>>
ForwardIt binary_find(ForwardIt first, ForwardIt last, const T& value, Compare comp={})
{
// Note: BOTH type T and the type after ForwardIt is dereferenced
// must be implicitly convertible to BOTH Type1 and Type2, used in Compare.
// This is stricter than lower_bound requirement (see above)
first = std::lower_bound(first, last, value, comp);
return first != last && !comp(value, *first) ? first : last;
}
From https://en.cppreference.com/w/cpp/algorithm/lower_bound
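int BinarySearch(const vector<int>& array, int var)
{
    //array should be sorted in ascending order in this case
    int start=0;
    int end=static_cast<int>(array.size())-1;
    while(start<=end){
        int mid=start+(end-start)/2; //same as (start+end)/2, but cannot overflow
        if(array[mid]==var){
            return mid;
        }
        else if(var<array[mid]){
            end=mid-1;
        }
        else{
            start=mid+1;
        }
    }
    return -1; //not found; returning 0 would be indistinguishable from a match at index 0
}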
Example: Consider an array, A=[1,2,3,4,5,6,7,8,9]
Suppose you want to search the index of 3
Initially, start=0 and end=9-1=8
Now, since start<=end, mid=4; array[mid] is 5, which is not 3
Now, 3 lies to the left of mid as it's smaller than 5. Therefore, we only search the left part of the array
Hence, now start=0 and end=3; mid=1. Since array[mid]==2 is smaller than 3, we search the right part: start=2 and end=3; mid=2. Since array[mid]==3, we have found the number we were searching for, and we return its index, which is equal to mid.
Check this function, qBinaryFind:
RandomAccessIterator qBinaryFind ( RandomAccessIterator begin, RandomAccessIterator end, const T & value )
Performs a binary search of the range [begin, end) and returns the position of an occurrence of value. If there are no occurrences of value, returns end.
The items in the range [begin, end) must be sorted in ascending order; see qSort().
If there are many occurrences of the same value, any one of them could be returned. Use qLowerBound() or qUpperBound() if you need finer control.
Example:
QVector<int> vect;
vect << 3 << 3 << 6 << 6 << 6 << 8;
QVector<int>::iterator i =
qBinaryFind(vect.begin(), vect.end(), 6);
// i == vect.begin() + 2 (or 3 or 4)
The function is included in the <QtAlgorithms> header which is a part of the Qt library.
std::lower_bound() :)
A solution returning the position inside the range could be like this, using only operations on iterators (it should work even if the iterator does not support arithmetic):
template <class ForwardIterator, typename T>
size_t BinarySearchPos(ForwardIterator first, ForwardIterator last, const T& val)
{
    // std::distance and std::advance also work for iterators without
    // arithmetic, at linear cost; no iterator comparison with <= is needed
    size_t pos = 0;                             // index of first, counted from the original begin
    size_t count = std::distance(first, last);  // number of elements still to be searched
    while (count > 0)
    {
        const size_t step = count / 2;
        ForwardIterator element = first;
        std::advance(element, step);
        if (*element < val)
        {
            // val can only be to the right of element
            first = ++element;
            pos += step + 1;
            count -= step + 1;
        }
        else
            count = step;  // val is at element or to its left
    }
    // pos is the position of the first element that is not less than val:
    // the position of val if it is present, otherwise the position
    // where it should be inserted
    return pos;
}