How to understand the requirement of `std::lower_bound`? - c++

According to the cppreference documentation for std::lower_bound:
The range [first, last) must be partitioned with respect to the expression element < value or comp(element, value), i.e., all elements for which the expression is true must precede all elements for which the expression is false. A fully-sorted range meets this criterion.
I have some difficulty fully understanding it.
1. What's element < value? It seems that element (and value) are never mentioned before this paragraph in the aforementioned document. Does element mean the elements before the current element, and value the value of the current element?
UPDATED:
2. Since whether a specific sequence is valid (i.e. meets the requirement) depends on value, the requirement cannot always be guaranteed for a given sequence once value changes. I think it's meaningless to define such a requirement. It seems that a fully sorted sequence is more reliable and practical.

What's element < value
value is the parameter to lower_bound, see at the beginning of that page:
template<class ForwardIt, class T> ForwardIt lower_bound(ForwardIt first, ForwardIt last, const T& value);
The value in question is mentioned right here: it is the last parameter of the function template. And element refers to each element in the sequence, taken one at a time.
This is a rather terse way of saying the following: take every element in the sequence, one element at a time. When you do that, all elements for which the expression element < value returns true must appear in the sequence before all other elements, for which the same expression is false. The requirement is intentionally defined this way; here's the explanation:
For example, if value is 4 and we're talking about natural numbers, here's one such sequence:
1, 2, 3, 4, 5, 6
Here, all the elements for which this expression is true (1, 2, and 3), appear before all the elements for which this expression is false (4, 5, and 6).
The following sequence is also a valid sequence, in this case:
3, 2, 1, 6, 5, 4
Here it's the same thing: 3, 2, and 1, for which the expression element < 4 is true, appear before 6, 5, and 4, for which element < 4 is false. So, yes, this is also a valid sequence for a call to lower_bound with the value 4.
The following sequence will NOT be a valid sequence for this specific case of using std::lower_bound:
1, 2, 4, 3, 5, 6
And as for why this lower_bound requirement is specified in such a strange manner, well, that would be a different question. But this is what it means.
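The three example sequences can be checked mechanically. The sketch below assumes C++11 or later: std::is_partitioned tests exactly the requirement quoted from cppreference, and for a range that meets it, std::lower_bound returns the first element of the "false" group. The helper names `meets_requirement` and `lower_bound_index` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// True if every element e with e < value comes before every element with
// !(e < value): the precondition std::lower_bound places on [first, last).
bool meets_requirement(const std::vector<int>& v, int value) {
    return std::is_partitioned(v.begin(), v.end(),
                               [value](int e) { return e < value; });
}

// Index of the element std::lower_bound finds (v.size() if there is none).
// Only meaningful when meets_requirement(v, value) holds; otherwise the
// call to std::lower_bound has undefined behaviour.
std::size_t lower_bound_index(const std::vector<int>& v, int value) {
    assert(meets_requirement(v, value));
    auto it = std::lower_bound(v.begin(), v.end(), value);
    return static_cast<std::size_t>(std::distance(v.begin(), it));
}
```

With value 4, both {1, 2, 3, 4, 5, 6} and {3, 2, 1, 6, 5, 4} pass, and lower_bound_index returns 3 for each (pointing at 4 and at 6 respectively), while {1, 2, 4, 3, 5, 6} fails the check.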

From cppreference
Returns an iterator pointing to the first element in the range [first, last) that is not less than (i.e. greater or equal to) value, or last if no such element is found.
The range [first, last) must be partitioned with respect to the expression element < value or comp(element, value), i.e., all elements for which the expression is true must precede all elements for which the expression is false. A fully-sorted range meets this criterion.
From your question:
What's element < value? It seems that element (and value) are never mentioned before this paragraph in the aforementioned document. Does element mean the elements before the current element, and value the value of the current element?
value and comp are mentioned in the function signatures at the beginning of the page. value is the argument with which you call std::lower_bound, and comp is an optional comparison function; if no comparison function is provided, < is used instead.
element refers to each element in the range [first, last).
So, what std::lower_bound does is to compare value to the elements in the range until it finds the first one that is not "less than" (through < or comp) value.
The requirement for std::lower_bound to work is that the input range is partitioned in such a way that all the elements that are "less than" value are placed before the rest; that requirement is met, for example, if the range is fully sorted.
(As @Passerby mentions in the comments below, thanks to the partitioned range requirement std::lower_bound does not need to compare against all the elements that are "less than" value. How it avoids that is an implementation detail, although the standard does cap the work at log2(last - first) + O(1) comparisons.)
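To illustrate why not every "less than" element has to be visited, here is a binary-search sketch of lower_bound for random-access iterators, assuming C++11 or later. `my_lower_bound` is a hypothetical name and this is not any library's actual source; the point is that it only ever evaluates comp(element, value), which is why a partitioned (not necessarily sorted) range is enough.

```cpp
#include <cassert>
#include <vector>

// Binary-search sketch of lower_bound for random-access iterators.
// It only ever evaluates comp(element, value), so it needs the range to be
// partitioned by that predicate, not fully sorted.
template <class RandomIt, class T, class Compare>
RandomIt my_lower_bound(RandomIt first, RandomIt last, const T& value, Compare comp) {
    auto count = last - first;            // size of the still-unknown window
    while (count > 0) {
        auto half = count / 2;
        RandomIt mid = first + half;
        if (comp(*mid, value)) {          // mid is in the "true" group
            first = mid + 1;              // the answer lies strictly after mid
            count -= half + 1;
        } else {                          // mid is in the "false" group
            count = half;                 // the answer is mid or earlier
        }
    }
    return first;                         // first element of the "false" group
}
```

Each iteration halves the window, so only about log2(N) elements are ever compared; on {3, 2, 1, 6, 5, 4} with value 4 it inspects 6, 2, and 1 and returns the iterator to 6.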

Partitions
A list of values may be partitioned, or grouped according to some criterion. For example:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│  2 │  7 │  3 │ -5 │ 11 │ 94 │ 15 │ 12 │
└────┴────┴────┴────┴────┴────┴────┴────┘
       x < 10       │       x ≥ 10
In this list we have two partitions:
elements where x < 10
elements where x is not < 10
Further, the criterion implies an order:
all xs such that (x < 10) come —before— all xs such that !(x < 10)
(In C++ we tend to use a “comparator” or “comparison function” to specify the criterion.)
Notice that it does not matter what order the elements are in relative to each other within each partition! That said, it is also noteworthy that if the list were sorted, it would still have the exact same two partitions:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ -5 │  2 │  3 │  7 │ 11 │ 12 │ 15 │ 94 │
└────┴────┴────┴────┴────┴────┴────┴────┘
       x < 10       │       x ≥ 10
(My example here has two equal-sized partitions. That is not necessarily the case. A partition may have zero or more elements, and each partition may have a different size than the others.)
Lower bound → index of start of partition
What the lower_bound algorithm does is find the first element of an existing partition.
The only requirement the algorithm imposes is that the sequence must already be partitioned according to the criterion you search with. (The algorithm only finds partitions; it does not sort anything!)
For example, our original sequence does not support a criterion that separates elements into (x < 7) and (x ≥ 7), because the elements are not partitioned in that way — it would not make sense to try to find the “first” element of a partition that doesn’t exist.
This is the meaning of the language used by cppreference.com.
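The partition claims above can be checked with standard algorithms, assuming C++11 or later: std::is_partitioned answers "is this list grouped by the criterion?", and std::partition_point returns the start of the second group, which is the same position std::lower_bound finds for value 10. The helper name `start_of_second_group` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the first element for which x < limit is false.
// Precondition: v is partitioned by x < limit (checked in debug builds).
std::size_t start_of_second_group(const std::vector<int>& v, int limit) {
    auto pred = [limit](int x) { return x < limit; };
    assert(std::is_partitioned(v.begin(), v.end(), pred));
    auto it = std::partition_point(v.begin(), v.end(), pred);
    return static_cast<std::size_t>(std::distance(v.begin(), it));
}
```

For both the original list and the sorted one, the second group starts at index 4. Calling start_of_second_group(original, 7) would trip the assert in a debug build, matching the point that the original sequence has no (x < 7) partition to find.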

Related

Problem to print all values of Priority queue of pairs in C++

Here is the function that prints the second element (of each pair) of the priority queue:
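#include <iostream>
#include <queue>
#include <string>
#include <utility>
using namespace std;

// pq is a copy (taken by value), so popping it leaves the caller's queue intact
void show(priority_queue<pair<int, string>> pq)
{
    while (!pq.empty())
    {
        cout << pq.top().second << endl;
        pq.pop();
    }
}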
Input values are:
1 www.youtube.com
2 www.google.com
3 www.google.com.hk
10 www.alibaba.com
5 www.taobao.com
10 www.bad.com
7 www.good.com
8 www.fudan.edu.cn
9 www.university.edu.cn
10 acm.university.edu.cn
I expected it to order by the first element in descending order and, when two first elements are equal, to put first the element I entered first.
It should print "www.alibaba.com" first, then "www.bad.com", and then "acm.university.edu.cn", because the first value for all three is 10.
But it prints "www.bad.com" first, then "www.alibaba.com", then "acm.university.edu.cn", and so on. What is wrong here?
The std::pair comparison operators use lexicographical comparison.
For two pairs p1 and p2, this means that if p1.first == p2.first, it then compares p1.second < p2.second. So for equal first values the order will be from "largest" second to "smallest" (since a priority queue pops the largest element first).
If you want a custom comparison, you can provide a custom "less than" function for the queue, for example one that doesn't compare the second member of the pair (but then I think the order among equal first values will be unspecified).
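A minimal sketch of that suggestion, assuming C++11 or later. Because std::priority_queue is not stable, simply ignoring the string is not enough to get the order the question expects (largest key first, equal keys in insertion order); the sketch breaks ties with an insertion counter instead. `Entry`, `LowerPriority`, and `pop_order` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type: the original pair plus an insertion counter.
struct Entry {
    int key;
    std::string url;
    std::size_t seq;  // position in which the entry was pushed
};

// std::priority_queue keeps the "greatest" element on top, so returning
// true here means "a should come out after b".
struct LowerPriority {
    bool operator()(const Entry& a, const Entry& b) const {
        if (a.key != b.key) return a.key < b.key;  // larger key comes out first
        return a.seq > b.seq;                      // equal keys: earlier push first
    }
};

// Pushes the (key, url) pairs in input order, then returns the urls in pop order.
std::vector<std::string> pop_order(const std::vector<std::pair<int, std::string>>& input) {
    std::priority_queue<Entry, std::vector<Entry>, LowerPriority> pq;
    std::size_t seq = 0;
    for (const auto& p : input) pq.push(Entry{p.first, p.second, seq++});
    std::vector<std::string> out;
    while (!pq.empty()) {
        out.push_back(pq.top().url);
        pq.pop();
    }
    return out;
}
```

With the question's input, this yields "www.alibaba.com", "www.bad.com", "acm.university.edu.cn", then the remaining urls by descending key.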
The answer provided by @Someprogrammerdude is correct. To explain why in more detail, here is the full output if you print both members of the pair:
10, www.bad.com
10, www.alibaba.com
10, acm.university.edu.cn
9, www.university.edu.cn
8, www.fudan.edu.cn
7, www.good.com
5, www.taobao.com
3, www.google.com.hk
2, www.google.com
1, www.youtube.com
Perhaps it is more obvious if we use a simpler dataset (1-4 paired with a,b,c):
4, c
4, b
4, a
3, c
3, b
3, a
2, c
2, b
2, a
1, c
1, b
1, a
The missing piece can be supplied by cppreference.com - priority_queue (emphasis by me)
A priority queue is a container adaptor that provides constant time lookup of the largest (by default) element, at the expense of logarithmic insertion and extraction.
A user-provided Compare can be supplied to change the ordering, e.g. using std::greater would cause the smallest element to appear as the top().
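The simpler dataset and the std::greater remark can both be reproduced with a short sketch, assuming C++11 or later; `Item` and `drain_order` are hypothetical names. With the default comparator (std::less, i.e. lexicographic std::pair comparison) the pairs come out largest first, so among equal first members the larger second member wins; with std::greater the order is reversed.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Item = std::pair<int, char>;

// Pushes every (n, c) pair for n in 1..4 and c in 'a'..'c', then pops
// everything, recording the order in which the items come off the top.
template <class Compare>
std::vector<Item> drain_order() {
    std::priority_queue<Item, std::vector<Item>, Compare> pq;
    for (int n = 1; n <= 4; ++n)
        for (char c = 'a'; c <= 'c'; ++c)
            pq.push(Item(n, c));
    std::vector<Item> out;
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}
```

drain_order<std::less<Item>>() gives (4, c), (4, b), (4, a), (3, c), ..., (1, a), matching the listing above, while drain_order<std::greater<Item>>() starts at (1, a) and ends at (4, c).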

What's the logic behind the order the elements are passed to a comparison function in std::sort?

I'm practicing lambdas:
#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> v {1, 2, 3, 4};
    // descending order: return true when a should come before b
    std::sort(v.begin(), v.end(), [](const int& a, const int& b) -> bool
    {
        return a > b;
    });
}
This is just code from GeeksForGeeks to sort in descending order, nothing special. I added some print statements (but took them out for this post) to see what was going on inside the lambda. They print the entire vector, and the a and b values:
1 2 3 4
a=2 b=1
2 1 3 4
a=3 b=2
3 2 1 4
a=4 b=3
4 3 2 1 <- final
So my more detailed question is:
What's the logic behind the order the vector elements are being passed into the a and b parameters?
Is b permanently at index 0 while a is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element? Is it compiler-specific? Thanks!
By passing a predicate to std::sort(), you are specifying your sorting criterion. The predicate must return true if the first parameter (i.e., a) precedes the second one (i.e., b), for the sorting criterion you are specifying.
Therefore, for your predicate:
return a > b;
If a is greater than b, then a will precede b.
So my more detailed question is: What's the logic behind the order the vector elements are being passed into the a and b parameters?
a and b are just pairs of elements from the range you pass to std::sort(). The "logic" depends on the underlying algorithm that std::sort() implements. The pairs could also differ between calls with identical input if the implementation uses randomization.
Is 'b' permanently at index 0 while 'a' is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element?
No, because the first element is always the highest one so far.
It seems that, with this algorithm, each element is compared (and possibly swapped) with the highest one in the first round, and the highest one is placed in the first position; so b always points to the highest one.
For Visual Studio, std::sort uses insertion sort if the sub-array size is <= 32 elements. For a larger sub-array, it uses introsort, which is quicksort unless the recursion depth gets too deep, in which case it switches to heap sort. The output your program produces appears to correspond to some variation of insertion sort. Since the compare function plays the role of "less than", and insertion sort looks for pairs that are out of order (a right value that is "less than" the left value), it passes the right element as the first argument, so the parameters appear swapped.
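The trace in the question can be reproduced with a small insertion-sort sketch, assuming C++11 or later. This is a hypothetical model, not any library's actual code: it uses a common insertion-sort variant in which each new element is first compared against the front of the already sorted prefix via comp(new, *first), which is why b stays at index 0 while a advances. `insertion_sort` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Insertion-sort sketch (hypothetical, not any library's actual code).
// comp(x, y) means "x should come before y". Each new element is first
// compared with the front of the sorted prefix: comp(new, *first).
template <class BidirIt, class Compare>
void insertion_sort(BidirIt first, BidirIt last, Compare comp) {
    if (first == last) return;
    for (BidirIt i = std::next(first); i != last; ++i) {
        auto val = std::move(*i);
        if (comp(val, *first)) {
            // New element belongs in front of everything: shift the prefix right.
            std::move_backward(first, i, std::next(i));
            *first = std::move(val);
        } else {
            // Walk left until the slot is found; *first stops the walk.
            BidirIt hole = i;
            BidirIt prev = std::prev(hole);
            while (comp(val, *prev)) {
                *hole = std::move(*prev);
                hole = prev;
                --prev;
            }
            *hole = std::move(val);
        }
    }
}
```

Running it on {1, 2, 3, 4} with a > b records the calls (2, 1), (3, 2), (4, 3), the same pairs the question printed. A different std::sort implementation is free to make different calls.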
You just compare two elements with a given ordering. This means that if a must come before b, the lambda must return true.
Whether a or b is the first or the last element of the array, or stays fixed, depends on the sorting algorithm and, of course, on your data!

emplace_hint performance when hint is wrong

I am trying to determine if emplace_hint should be used to insert a key into a multimap (as opposed to regular emplace). I have already calculated the range of the key in an earlier operation (on the same key):
range = multimap.equal_range(key);
Should I use range.first, range.second, or nothing as a hint to insert the key, value pair? What if the range is empty?
Should I use range.first, range.second, or nothing as a hint to insert the key, value pair?
As std::multimap::emplace_hint() states:
Inserts a new element into the container as close as possible to the position just before hint.
(emphasis mine), you should use the second iterator of range; it should make insertion more efficient:
Complexity
Logarithmic in the size of the container in general, but amortized constant if the new element is inserted just before hint.
As for an empty range, it is still fine to use the second iterator, as it always points to the first element greater than key, or to end() if no such element exists.
First, performance-wise, it will not make any difference whether you use range.first or range.second. Let's have a look at the return value of equal_range:
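A minimal sketch of that advice, assuming C++11 or later; `MM` and `insert_with_range` are hypothetical names. Hinting with range.second places the new element just before the first element greater than key, i.e. after all existing elements with that key, which is also where a plain emplace would put it. For an empty range, range.first == range.second, and both point at the insertion position.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

using MM = std::multimap<std::string, int>;

// Inserts (key, value) using an equal_range result computed earlier for key.
// range.second is the first element greater than key (or end()), so the new
// element lands after any existing elements with the same key.
// Multimap insertion does not invalidate iterators, so an older range stays usable.
MM::iterator insert_with_range(MM& m, std::pair<MM::iterator, MM::iterator> range,
                               const std::string& key, int value) {
    return m.emplace_hint(range.second, key, value);
}
```

Inserting ("b", 3) into {a:1, b:1, b:2, c:1} with the precomputed range for "b" keeps the "b" values in the order 1, 2, 3.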
std::equal_range - return value
std::pair containing a pair of iterators defining the wanted range, the first pointing to the first element that is not less than value and the second pointing to the first element greater than value. If there are no elements not less than value, last is returned as the first element. Similarly if there are no elements greater than value, last is returned as the second element
This means that, when obtained for a value key, both range.first and range.second represent positions right before which key may be correctly inserted. So performance-wise it should not matter whether you use range.first or range.second. Complexity should be "amortized constant", since the new element is inserted just before hint.
Second, when the range is "empty", range.first and range.second are equal: both point to the first element greater than key (or to end() if there is none). Therefore performance as well as result are identical, and the result is actually the same as if you used emplace without any hint.
See the following program demonstrating this:
#include <iostream>
#include <map>
#include <string>

int main()
{
    std::multimap<std::string, std::string> m;
    // some clutter:
    m.emplace("k", "1");
    m.emplace("k", "2");
    m.emplace("z", "1");
    m.emplace("z", "2");
    // relevant portion of demo data: insertion order a-c-b is preserved
    m.emplace("x", "a");
    m.emplace("x", "c");
    m.emplace("x", "b");
    auto r = m.equal_range("x");
    // will insert "x.zzzz" before "x.a":
    m.emplace_hint(r.first, "x", "zzzz");
    // will insert "x.0" right after "x.b":
    m.emplace_hint(r.second, "x", "0");
    auto rEmpty = m.equal_range("e");
    // "empty" range (both iterators point at the first "k"), normal lookup:
    m.emplace_hint(rEmpty.first, "e", "b");
    m.emplace_hint(rEmpty.second, "e", "a");
    // "wrong" hint: it points at a "k" element, but the new key is "z":
    auto rWrong = m.equal_range("k");
    m.emplace_hint(rWrong.first, "z", "a");
    for (const auto &p : m) {
        std::cout << p.first << " => " << p.second << '\n';
    }
}
Output:
e => b
e => a
k => 1
k => 2
x => zzzz
x => a
x => c
x => b
x => 0
z => a
z => 1
z => 2
In short: if you have a valid range for key pre-calculated, then use it when inserting key. It will help anyway.
EDIT:
There have been discussions around whether an "invalid" hint might lead to an insertion at a position that does not then reflect the "order of insertion" for values with the same key. This might be concluded from a general multimap statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".
I did not find support for the one or the other point of view in any normative document. I just found the following statement in cplusplus multimap/emplace_hint documentation:
template <class... Args>
iterator emplace_hint (const_iterator position, Args&&... args);
position Hint for the position where the element can be inserted. The function optimizes its insertion time if position points to the element that will follow the inserted element (or to the end, if it would be the last). Notice that this does not force the new element to be in that position within the multimap container (the elements in a multimap always follow a specific order). const_iterator is a member type, defined as a bidirectional iterator type that points to elements.
I know that this is not a normative reference, but at least my Apple LLVM 8.0 compiler adheres to this definition (see demo above):
If one inserts an element with a "wrong" hint, i.e. one pointing even before the position where a pair shall be inserted, the algorithm recognizes this and chooses a valid position (see inserting "z"=>"a" where a hint points to an "x"-element).
If we use a range for key "x" and use range.first, the position right before the first x is interpreted as a valid position.
So I think that m.emplace_hint(r.first, ...) makes the algorithm choose a valid position as close as possible to the hint, and that this overrules the general statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".

Why is std::set::lower_bound(x) (effectively) defined as the smallest number >= x rather than the largest number <= x?

Perhaps I'm misunderstanding the technical definition of lower bound, but if I had a set a = { 0, 3, 4 } and computed a.lower_bound(2), I would expect the result to be 0. I.e. I would expect std::set::lower_bound to be close to the mathematical concept of infimum.
And yet the standard library defines it as the smallest number not less than (effectively >=) x.
What is the reasoning behind this?
The "[lower|upper]_bound" functions are meant to return a place in a set where you could insert a key without violating the ordering of the set. Because inserting at an iterator position puts the new element just before the element the iterator points to, if lower_bound(2) returned an iterator to 0, then inserting 2 there would violate the order of your set: it would become {2, 0, 3, 4}. Upper bound serves to show the last place you could insert without violating set order.
This is most useful if your container may have duplicate keys (e.g. a std::multiset). Consider {0, 3, 3, 4}. lower_bound(3) would return an iterator to here: {0, *, 3, 3, 4}, while upper_bound(3) would return it here: {0, 3, 3, *, 4}.
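If what you actually want is the infimum-like behaviour the question describes (largest element <= x), it can be derived from upper_bound: upper_bound(x) is the first element > x, so the element just before it, if any, is the largest element <= x. A minimal sketch, assuming C++17 for std::optional; `floor_of` is a hypothetical helper, not a std::set member.

```cpp
#include <cassert>
#include <iterator>
#include <optional>
#include <set>

// Largest element <= x, or std::nullopt if every element is greater than x.
std::optional<int> floor_of(const std::set<int>& s, int x) {
    auto it = s.upper_bound(x);                // first element > x
    if (it == s.begin()) return std::nullopt;  // nothing <= x
    return *std::prev(it);
}
```

For a = {0, 3, 4}: a.lower_bound(2) points at 3, while floor_of(a, 2) gives 0, and floor_of(a, -1) gives no value.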
It may help to consider the behavior of lower_bound and upper_bound together.
In the STL, ranges are always closed-open intervals. The range delimited by two iterators, first and last, includes all of the elements between first and last, including first and excluding last. Using interval notation, we'd represent this as [first, last).
lower_bound and upper_bound are defined such that they find the range of elements that compare equal to the specified value. If you iterate between lower_bound(x) and upper_bound(x), you will iterate over all of the elements that compare equal to x. If no element is equal to x, then it is guaranteed that lower_bound(x) == upper_bound(x).
This feature is less important for std::map, which has at most one element for every key, but is a very useful feature for non-unique associative containers and for the nonmember std::lower_bound that may be used with arbitrary sequences of sorted elements.
[Note that if you want to obtain both the lower and upper bound, you should call equal_range instead, which returns both.]
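Iterating the half-open range [lower_bound(x), upper_bound(x)) visits exactly the elements equivalent to x, and the range is empty when there are none. A sketch assuming C++17 (structured bindings); `equal_elements` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Returns a copy of every element equivalent to x, in container order.
// equal_range(x) returns {lower_bound(x), upper_bound(x)} in one call.
std::vector<int> equal_elements(const std::multiset<int>& s, int x) {
    auto [lo, hi] = s.equal_range(x);
    return std::vector<int>(lo, hi);
}
```

For {0, 3, 3, 4}, equal_elements(s, 3) yields {3, 3}, and for 2 it yields nothing because lower_bound(2) == upper_bound(2).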

Test lower_bound's return value against the end iterator

In Effective STL by Scott Meyers (page 195) there is the following line:
"The result of lower_bound must be tested to see if it's pointing to the value you're looking for. Unlike find, you can't just test lower_bound's return value against the end iterator."
Can anyone explain why you can't do this? It seems to work fine for me.
It works fine for you because your element is present.
lower_bound returns an iterator to the first element not less than the given value, and upper_bound returns an iterator to the first element greater than the given value.
Given the array 1, 2, 3, 3, 4, 6, 7, lower_bound(..., 5) will return an iterator pointing to 6.
Hence, two ways of checking whether the value is present:
Use equal_range to also get the upper_bound (computing separately lower_bound and upper_bound will probably be suboptimal). If the std::distance between the bounds is greater than 0 then the element is present.
1, 2, 3, 3, 4, 6, 7
std::distance(std::lower_bound(v.begin(),v.end(),5), std::upper_bound(v.begin(),v.end(),5)) == 0 // 5 is absent
std::distance(std::lower_bound(v.begin(),v.end(),3), std::upper_bound(v.begin(),v.end(),3)) == 2 // 3 is present
Compare the element pointed by the iterator with your value (provided operators != and < are coherent), but you have to make sure it does not return the end iterator.
auto it = std::lower_bound(v.begin(), v.end(), 5);
it == v.end() || *it != 5 // 5 is absent
Additionally, since lower_bound is a binary search algorithm, it would be inconsistent to return end if the element was not found. In fact, the iterator returned by this algorithm can be used, for example, as a hint for a subsequent insertion.
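Both checks can be folded into one membership test. A minimal sketch, assuming C++11 or later; `contains_sorted` is a hypothetical name. It tests for end() first and then for equivalence with !(value < *it), which is sufficient because lower_bound already guarantees !(*it < value). For a plain yes/no answer, std::binary_search performs the same check.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Membership test on a sorted vector built on std::lower_bound.
// lower_bound only guarantees "first element not less than value", so we must
// check that it is not end() and that the element found is not greater than value.
bool contains_sorted(const std::vector<int>& v, int value) {
    auto it = std::lower_bound(v.begin(), v.end(), value);
    return it != v.end() && !(value < *it);
}
```

On 1, 2, 3, 3, 4, 6, 7, this reports 3 as present, 5 as absent (lower_bound points at 6), and 8 as absent (lower_bound returns end()).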