From the documentation of std::nth_element we have:
template< class RandomIt >
void nth_element( RandomIt first, RandomIt nth, RandomIt last );
Partially sorts the range [first, last) in ascending order so that all
elements in the range [first, nth) are less than those in the range
[nth, last).
What bothers me is the word "less". Shouldn't it be "less than or equal to"? Consider, for example, this range:
#include <algorithm>
#include <vector>
#include <iostream>
int main()
{
    std::vector<int> numbers = {3, 2, 2, 2, 1};
    auto middlePosition = numbers.begin() + 2;
    std::nth_element(numbers.begin(), middlePosition, numbers.end());
    for (int x : numbers)
        std::cout << x << std::endl;
    return 0;
}
The algorithm cannot make both numbers before the middlePosition less than 2, because there is only one such number. The algorithm does its best and the output is as desired:
1
2
2
3
2
Can I rely on such nice behavior?
My implementation (gcc 4.7) uses the introselect algorithm. Unfortunately I couldn't find the requirements on input of the algorithm. Does introselect need all values to be different?
I think cppreference is incorrect here. Take a look at the Standard (N3337 25.4.2):
template<class RandomAccessIterator>
void nth_element
(
RandomAccessIterator first,
RandomAccessIterator nth,
RandomAccessIterator last
);
... For any iterator i in the range [first,nth) and any iterator j in
the range [nth,last) it holds that: !(*j < *i) or comp(*j,
*i) == false.
So, elements in the range [first,nth) will be less than or equal to the elements in the range [nth,last).
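The guarantee quoted from the standard can be checked directly. Here is a minimal sketch; the helper names nth_element_postcondition_holds and partitioned_around are made up for this illustration and are not part of the standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: checks !(v[j] < v[i]) for every i in [0, n)
// and every j in [n, size), i.e. the postcondition quoted above.
bool nth_element_postcondition_holds(const std::vector<int>& v, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = n; j < v.size(); ++j)
            if (v[j] < v[i])
                return false;
    return true;
}

// Hypothetical helper: runs std::nth_element on a copy and returns it.
// Assumes n <= v.size().
std::vector<int> partitioned_around(std::vector<int> v, std::size_t n)
{
    std::nth_element(v.begin(), v.begin() + n, v.end());
    return v;
}
```

For the vector {3, 2, 2, 2, 1} and n = 2, the element at position 2 has to be 2 and the check has to pass, whatever order the implementation chooses for the remaining elements.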
Here is a better definition of std::nth_element:
Rearranges the elements in the range [first,last), in such a way that
the element at the nth position is the element that would be in that
position in a sorted sequence.
The other elements are left without any specific order, except that
none of the elements preceding nth are greater than it, and none of
the elements following it are less.
No, it does not move every element that is less than or equal to the nth value to the left! nth_element places at the nth position one of the values that could occupy that position in the sorted array. Since that element has to stay at the nth position, it cannot put all equal elements on the left side, because then it would no longer be the nth element. Test it! Equal values can appear on both sides of the nth position.
I've been experimenting with the "lower_bound()/upper_bound()" functions in C++ w.r.t. arrays/vectors, and I get incorrect results when applying custom compare operators to the function.
My current understanding (based on https://www.cplusplus.com/reference/algorithm/upper_bound/) is that when you search for some value 'val' (of any datatype) in an array, it returns the first iterator position "it" in the array (from left to right) that satisfies !comp(val,*it), is this wrong? If so, how exactly does the searching work?
P.S. In addition, what is the difference between using lower_bound and upper_bound when your search criterion is a specific boolean compare function?
Here is an example that produced erroneous results:
auto comp2 = [&](int num, pair<int,int>& p2){return num>p2.second;};
vector<pair<int,int>> pairs = {{1,2},{2,3},{3,4}}; //this array should be binary-searchable with 'comp2' comparator, since pairs[i].second is monotonically increasing
int pos2 = upper_bound(pairs.begin(),pairs.end(),2,comp2)-pairs.begin();
cout<<pos2<<endl; //outputs 3, but should give 0 because !comp2(2,arr[0]) is true, and arr[0] is the earliest element in the array
Thanks!
I think most (if not all) of the standard comparators implement "less than", whether that is std::less or something similar. So when we provide a custom comp function, it has to implement the "less than" logic, and we should think of it that way.
Now back to upper_bound: it returns the first element greater than the value, which means comp(value, element) has to return true for the search to stop there (as Francois pointed out). Your comp function, however, returns false for every element.
And your understanding about !comp(val,*it) is also not correct. It is the condition to continue the search, not to stop it.
Here is an example implementation of the upper_bound, let's take a look:
template<class ForwardIt, class T, class Compare>
ForwardIt upper_bound(ForwardIt first, ForwardIt last, const T& value, Compare comp)
{
ForwardIt it;
typename std::iterator_traits<ForwardIt>::difference_type count, step;
count = std::distance(first, last);
while (count > 0) {
it = first;
step = count / 2;
std::advance(it, step);
if (!comp(value, *it)) {
first = ++it;
count -= step + 1;
}
else
count = step;
}
return first;
}
You can see that !comp(value, *it) is true when the "less than" test returns false. That means value is not less than the current item, so the search moves forward and continues from the next item (because the items are increasing).
In the other case, it halves the search range and looks for an earlier item that is greater than value.
Summary: You have to provide comp as less logic and let the upper_bound do the rest.
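To make the summary concrete, here is a hedged sketch of a comparator written with the "less" logic for the pairs from the question. The helper name first_second_greater is hypothetical, and the sketch assumes the vector is sorted by .second.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: index of the first pair whose .second is greater
// than num, or pairs.size() if there is none.
// Assumes pairs is sorted by .second in ascending order.
std::size_t first_second_greater(const std::vector<std::pair<int, int>>& pairs, int num)
{
    // upper_bound calls the comparator as comp(value, element); it must
    // answer the question "is value less than element?".
    auto value_less = [](int value, const std::pair<int, int>& element) {
        return value < element.second;
    };
    auto it = std::upper_bound(pairs.begin(), pairs.end(), num, value_less);
    return static_cast<std::size_t>(it - pairs.begin());
}
```

With pairs = {{1,2},{2,3},{3,4}} and num = 2 this returns 1, because {2,3} is the first pair whose second member is greater than 2.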
upper_bound returns the first element that satisfies comp(val, *it). In the link you provided, it shows
template <class ForwardIterator, class T>
ForwardIterator upper_bound (ForwardIterator first, ForwardIterator last, const T& val)
{
ForwardIterator it;
iterator_traits<ForwardIterator>::difference_type count, step;
count = std::distance(first,last);
while (count>0)
{
it = first; step=count/2; std::advance (it,step);
if (!(val<*it)) // or: if (!comp(val,*it)), for version (2)
{ first=++it; count-=step+1; }
else count=step;
}
return first;
}
Returns an iterator pointing to the first element in the range [first,last) which compares greater than val.
The searching works by starting at position 0(first). It then uses count to see the range of values it needs to check. It checks the middle of the range (first+count/2), and if that does not satisfy the condition, that position is now first (discarding all values before it), and repeats with the new first and range. If it does satisfy the condition, then the algorithm can discard all values after that, and repeat with the new range. When the range drops to 0, the algorithm can end. It assumes that if arr[5] is false, arr[0], arr[1] ... arr[4] are also false. Same with if arr[8] is true, arr[9], arr[10] ... arr[n] are also true.
The reason your code does not work is because the comparator used returns num>p2.second, meaning it looks for a value of p2.second that is less than num. Since you put in 2 for num, and there is no p2.second less than that in the vector, the output points to a position outside of the vector because it didn't find anything.
The difference between upper_bound and lower_bound is that upper_bound looks for the first value that satisfies the condition, while lower_bound looks for the first value that does not satisfy the condition. So
lower_bound(v.begin(), v.end(), val, [](int it, int val) {return !(val < it);});
is the same as
upper_bound(v.begin(), v.end(), val, [](int val, int it){return val < it;});
Note that for lower_bound, the comparator used takes (*it, val), not (val, *it).
I guess the only difference is how easy it is to frame the comparator in those terms - realizing that a<b is the same as not a>=b.
More explained here. I liked the explanation that said it finds [lower_bound, upper_bound) when using the same comparator.
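The [lower_bound, upper_bound) idea can be sketched in a few lines. The helper name count_in_sorted is an assumption made for this example, and the vector is assumed to be sorted in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements equal to val in a sorted vector.
std::size_t count_in_sorted(const std::vector<int>& v, int val)
{
    auto lo = std::lower_bound(v.begin(), v.end(), val); // first element >= val
    auto hi = std::upper_bound(v.begin(), v.end(), val); // first element >  val
    return static_cast<std::size_t>(hi - lo);            // size of [lo, hi)
}
```

For {1, 2, 2, 2, 3} and val = 2 the half-open range [lo, hi) covers exactly the three 2s; for a value that is absent, both iterators coincide and the count is 0. std::equal_range returns the same pair of iterators in one call.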
vector<int> vec = {2,4,3};
vector<int>::iterator it;
it=lower_bound(vec.begin(),vec.end(),3);
cout<<*it;
This returns an output of 4 not 3 but
vector<int> vec = {2,3,4};
vector<int>::iterator it;
it=lower_bound(vec.begin(),vec.end(),3);
cout<<*it;
This returns the correct output of 3. Please help me understand why it fails in the first case.
According to cppreference and its documentation of std::lower_bound:
Returns an iterator pointing to the first element in the range [first,
last) that is not less than (i.e. greater or equal to) value, or last
if no such element is found.
So, std::lower_bound returns the first element that is greater than or equal to the value (3 here).
For {2, 4, 3}, the first element greater than or equal to 3 is 4, but for {2, 3, 4} it is 3.
P.S. According to the cppreference again:
The range [first, last) must be partitioned with respect to the
expression element < value or comp(element, value), i.e., all elements
for which the expression is true must precede all elements for which
the expression is false. A fully-sorted range meets this criterion.
Both of your vectors are partitioned correctly with the condition (element < value)
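In short, the vector {2,4,3} is not sorted, so lower_bound cannot be relied on to locate the 3 in it; it only reports the first element that is not less than the value, which here is 4.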
https://en.cppreference.com/w/cpp/algorithm/lower_bound says:
The range [first, last) must be partitioned with respect to the
expression element < value or comp(element, value), i.e., all elements
for which the expression is true must precede all elements for which
the expression is false. A fully-sorted range meets this criterion.
If your vector is not sorted then use std::find.
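A small sketch of the two options, with hypothetical helper names index_by_find and index_by_lower_bound: std::find works on any order, while the std::lower_bound version assumes a sorted vector and has to check that the element it lands on is really equal to the value.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear search, works on unsorted data.
// Returns the index of the first match, or -1 if value is absent.
std::ptrdiff_t index_by_find(const std::vector<int>& v, int value)
{
    auto it = std::find(v.begin(), v.end(), value);
    return it == v.end() ? -1 : it - v.begin();
}

// Hypothetical helper: binary search, assumes `sorted` is in ascending order.
// lower_bound only finds the first element >= value, so equality is checked.
std::ptrdiff_t index_by_lower_bound(const std::vector<int>& sorted, int value)
{
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it == sorted.end() || value < *it)
        return -1;
    return it - sorted.begin();
}
```

For the unsorted {2, 4, 3}, index_by_find locates the 3 at index 2; for the sorted {2, 3, 4}, index_by_lower_bound locates it at index 1.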
I am new to the STL and used the find() and upper_bound() functions on a vector to find the position of 6. The code is given below:
#include <bits/stdc++.h>
using namespace std;
int main()
{
    vector<int> sam = {1,2,5,3,7,8,4,6};
    int f = upper_bound(sam.begin(), sam.end(), 6) - sam.begin();
    vector<int>::iterator it;
    it = find(sam.begin(), sam.end(), 6);
    int d = it - sam.begin();
    cout << d << " " << f << endl;
    return 0;
}
The output when you run the code is 7 4, while I expected it to be 7 7.
What am I doing wrong?
cppreference.com for std::upper_bound() explains it nicely (emphasis mine):
Returns an iterator pointing to the first element in the range [first, last) that is greater than value, or last if no such element is found.
The range [first, last) must be partitioned with respect to the
expression !(value < element) or !comp(value, element), i.e., all
elements for which the expression is true must precede all elements
for which the expression is false. A fully-sorted range meets this
criterion.
In your case, you have a 7 (greater than 6, at index 4) appearing before a 4 (which is less than or equal to 6), so the precondition is not met.
The idea of std::upper_bound() and its companions is to quickly do binary searches in sorted arrays. As opposed to linear search as in std::find(), it only needs O(log(n)) time complexity instead of O(n).
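The precondition quoted above can be tested with std::is_partitioned. The following is only a sketch; both helper names are invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if v is partitioned with respect to
// !(value < element), which is what std::upper_bound requires.
bool upper_bound_precondition_holds(const std::vector<int>& v, int value)
{
    return std::is_partitioned(v.begin(), v.end(),
                               [value](int element) { return !(value < element); });
}

// Hypothetical helper: sorts a copy first, then returns the index that
// upper_bound reports within that sorted copy.
std::size_t upper_bound_index_after_sort(std::vector<int> v, int value)
{
    std::sort(v.begin(), v.end());
    return static_cast<std::size_t>(
        std::upper_bound(v.begin(), v.end(), value) - v.begin());
}
```

For {1,2,5,3,7,8,4,6} and the value 6 the precondition check fails. After sorting, upper_bound gives index 6 (the position of 7 in the sorted copy), which is an index into the sorted data and not into the original vector.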
I have a vector of pairs with the following typedef:
typedef std::pair<double, int> myPairType;
typedef std::vector<myPairType> myVectorType;
myVectorType myVector;
I fill this vector with double values and the int part of the pair is an index.
The vector then looks like this
0.6594 1
0.5434 2
0.5245 3
0.8431 4
...
My program has a number of time steps with slight variations in the double values and every time step I sort this vector with std::sort to something like this.
0.5245 3
0.5434 2
0.6594 1
0.8431 4
The idea is now to somehow use the vector from the last time step (the "old" vector, already sorted) to presort the current vector (the "new" vector, not yet sorted), and then use an insertion sort or Timsort to sort the "rest" of the presorted vector.
Is this somehow possible? I couldn't find a function to order the "new" vector of pairs by one part (the int part).
And if it is possible, could this be faster than sorting the whole unsorted "new" vector?
Thanks for any pointers into the right direction.
tiom
UPDATE
First of all thanks for all the suggestions and code examples. I will have a look at each of them and do some benchmarking if they will speed up the process.
Since there were some questions regarding the vectors, I will try to explain in more detail what I want to accomplish.
As I said, I have a number of time steps 1 to n. For every time step I have a vector of double data values with approximately 260000 elements.
In every time step I add an index to this vector which will result in a vector of pairs <double, int>. See the following code snippet.
typedef typename myVectorType::iterator myVectorTypeIterator; // iterator for myVector
std::vector<double> vectorData; // holds the double data values
myVectorType myVector(vectorData.size()); // vector of pairs <double, int>
myVectorTypeIterator myVectorIter = myVector.begin();
// generating of the index
for (int i = 0; i < vectorData.size(); ++i) {
myVectorIter->first = vectorData[i];
myVectorIter->second = i;
++myVectorIter;
}
std::sort(myVector.begin(), myVector.end() );
(The index is 0 based. Sorry for my initial mistake in the example above)
I do this for every time step and then sort this vector of pairs with std::sort.
The idea was now to use the sorted vector of pairs of time step j-1 (lets call it vectorOld) in time step j as a "presorter" for the "new" myVector since I assume the ordering of the sorted "new" myVector of time step j will only differ in some cases from the already sorted vectorOld of time step j-1.
By "presorter" I mean rearranging the pairs of the "new" myVector into a vector presortedVector of type myVectorType, in the same index order as vectorOld, and then letting Timsort or some similar sorting algorithm that is good on presorted data do the rest of the sorting.
Some data examples:
This is what the beginning of myVector looks like in time step j-1 before the sorting.
0.0688015 0
0.0832928 1
0.0482259 2
0.142874 3
0.314859 4
0.332909 5
...
And after the sorting
0.000102207 23836
0.000107378 256594
0.00010781 51300
0.000109315 95454
0.000109792 102172
...
So in the next time step j this is my vectorOld, and I would like to take the element with index 23836 of the "new" myVector and put it in first place in presortedVector; the element with index 256594 should be the second element in presortedVector, and so on. But the elements have to keep their original index: 256594 will not become index 0, it is only element 0 of presortedVector and still carries index 256594.
I hope this is a better explanation of my plan.
First, scan through the sequence to find the first element that's smaller than the preceding one (either a loop, or C++11's std::is_sorted_until). This is the start of the unsorted portion. Use std::sort on the remainder, then merge the two halves with std::inplace_merge.
template<class RandomIt, class Compare>
void sort_new_elements(RandomIt first, RandomIt last, Compare comp)
{
RandomIt mid = std::is_sorted_until(first, last, comp);
std::sort(mid, last, comp);
std::inplace_merge(first, mid, last, comp);
}
This should be more efficient than sorting the whole sequence indiscriminately, as long as the presorted sequence at the front is significantly larger than the unsorted part.
Using the sorted vector would likely result in more comparisons (just to find a matching item).
What you seem to be looking for is a self-ordering container.
You could use a set (and remove/re-insert on modification).
Alternatively you could use Boost Multi Index which affords a bit more convenience (e.g. use a struct instead of the pair)
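The std::set suggestion could look roughly like the sketch below. The helper name update_entry is hypothetical, and the set is assumed to hold the same (value, index) pairs as the question's vector.

```cpp
#include <set>
#include <utility>

typedef std::pair<double, int> myPairType;

// Hypothetical helper: replaces the value stored for one index in a set that
// is kept ordered by (value, index). Elements of a std::set are immutable,
// so a modification is an erase followed by an insert.
// Returns false if old_entry was not present.
bool update_entry(std::set<myPairType>& ordered, const myPairType& old_entry, double new_value)
{
    auto it = ordered.find(old_entry);
    if (it == ordered.end())
        return false;
    int index = it->second;   // copy before erasing
    ordered.erase(it);
    ordered.insert(myPairType(new_value, index));
    return true;
}
```

Each update costs O(log n), so whether this beats re-sorting depends on how many values change per time step; if nearly all of them change, the set is unlikely to help.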
I have no idea if this could be faster than sorting the whole unsorted "new" vector. It will depend on the data.
But this will create a sorted copy of a new vector based on the order of an old vector:
myVectorType getSorted(const myVectorType& unsorted, const myVectorType& old) {
myVectorType sorted(unsorted.size());
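// Note: "- 1" assumes the 1-based indices of the first example;
// drop it if the indices are 0-based.
auto matching_value
= [&unsorted](const myPairType& value)
{ return unsorted[value.second - 1]; };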
std::transform(old.begin(), old.end(), sorted.begin(), matching_value);
return sorted;
}
You will then need to "finish" sorting this vector. I don't know how much quicker (if at all) this will be than sorting it from scratch.
Live demo.
Well, you can create a new vector with the order of the old one and then use an algorithm that has good complexity for (nearly) sorted inputs to restore the order.
Below I put an example of how it works, with Mark's function as restore_order:
#include <iostream>
#include <algorithm>
#include <vector>
#include <utility>
using namespace std;
typedef std::pair<double, int> myPairType;
typedef std::vector<myPairType> myVectorType;
void outputMV(const myVectorType& vect, std::ostream& out)
{
for(const auto& element : vect)
out << element.first << " " << element.second << '\n';
}
//https://stackoverflow.com/a/28813905/1133179
template<class RandomIt, class Compare>
void restore_order(RandomIt first, RandomIt last, Compare comp)
{
RandomIt mid = std::is_sorted_until(first, last, comp);
std::sort(mid, last, comp);
std::inplace_merge(first, mid, last, comp);
}
int main() {
myVectorType myVector = {{3.5,0},{1.4,1},{2.5,2},{1.0,3}};
myVectorType mv2 = {{3.6,0},{1.35,1},{2.6,2},{1.36,3}};
auto comparer = [] (const auto& lhs, const auto& rhs) { return lhs.first < rhs.first;};
// make sure we didn't mess with the initial indexing
int i = 0;
for(auto& element : myVector) element.second = i++;
i = 0;
for(auto& element : mv2) element.second = i++;
//sort the initial vector
std::sort(myVector.begin(), myVector.end(), comparer);
outputMV(myVector, cout);
// this will replace each element of myVector with a corresponding
// value from mv2 using the old sorted order
std::for_each(myVector.begin(), myVector.end(),
[&mv2] (auto& el) {el = mv2[el.second];} // capture by reference to avoid copying mv2
);
// restore order in case it was different for the new vector
restore_order(myVector.begin(), myVector.end(), comparer);
outputMV(myVector, cout);
return 0;
}
This works in O(n) up to the restore step. The trick is then to use a good function for that step: a nice candidate has good complexity for nearly sorted inputs. I used the function that Mark Ransom posted, which works, but still isn't perfect.
It could be outperformed by a bubble-sort-inspired method: iterate over the elements and, whenever the current and next element are in the wrong order, swap them and keep swapping backwards until the order is restored. How well this does is a bet on how much the order changes: if it doesn't vary much you stay close to O(n), if it does you go up to O(n^2).
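The swap-backwards idea is essentially insertion sort. Here is a minimal sketch of it; the names insertion_sort and sort_nearly_sorted are chosen for this example only.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Sketch of an insertion sort: each element is swapped backwards until its
// predecessor is no longer greater. The cost is roughly n plus the number
// of inversions, so it is cheap when the input is almost sorted.
template <class BidirIt, class Compare>
void insertion_sort(BidirIt first, BidirIt last, Compare comp)
{
    if (first == last)
        return;
    for (BidirIt cur = std::next(first); cur != last; ++cur) {
        for (BidirIt pos = cur; pos != first && comp(*pos, *std::prev(pos)); --pos)
            std::iter_swap(pos, std::prev(pos));
    }
}

// Hypothetical convenience wrapper for a vector of ints.
void sort_nearly_sorted(std::vector<int>& v)
{
    insertion_sort(v.begin(), v.end(), std::less<int>());
}
```

For the vector of pairs it would be called with the same comparer as in the example above, that is, one that compares the .first members.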
I think the best would be an implementation of natural merge sort. That has best case (sorted input) O(n), and worst O(n log n).
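A natural merge sort can be sketched with the standard algorithms already used above. This is an illustration under assumptions, not a tuned implementation: the names natural_merge_sort and sort_by_runs are made up, and std::inplace_merge only runs in linear time when it can allocate a temporary buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of a natural merge sort: find the runs that are already sorted,
// then merge neighbouring runs pairwise until one run is left.
template <class RandomIt, class Compare>
void natural_merge_sort(RandomIt first, RandomIt last, Compare comp)
{
    // runs[k] .. runs[k + 1] delimits one sorted run.
    std::vector<RandomIt> runs;
    runs.push_back(first);
    for (RandomIt it = first; it != last; ) {
        it = std::is_sorted_until(it, last, comp);
        runs.push_back(it);
    }
    while (runs.size() > 2) {                 // more than one run left
        std::vector<RandomIt> merged;
        merged.push_back(first);
        for (std::size_t k = 0; k + 2 < runs.size(); k += 2) {
            std::inplace_merge(runs[k], runs[k + 1], runs[k + 2], comp);
            merged.push_back(runs[k + 2]);
        }
        if (merged.back() != last)            // odd run out, carried over
            merged.push_back(last);
        runs.swap(merged);
    }
}

// Hypothetical convenience wrapper for a vector of ints.
void sort_by_runs(std::vector<int>& v)
{
    natural_merge_sort(v.begin(), v.end(), std::less<int>());
}
```

On an already sorted input only one run is found and no merge happens, which is where the O(n) best case comes from.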
The following snippet returns 0, but I expected it to return 1. What is going wrong here?
#include <iostream>
#include <iterator>
#include <ostream>
#include <algorithm>
#include <vector>
using namespace std;
int main(){
vector<int> v;
int arr[] = {10,20,30,40,50};
v.push_back(11);
v.push_back(22);
copy(arr,arr + sizeof(arr)/sizeof(arr[0]),back_inserter(v)); // back_inserter makes space starting from the end of vector v
for(auto i = v.begin(); i != v.end(); ++i){
cout << *i << endl;
}
cout << endl << "Binary Search - " << binary_search(v.begin(), v.end(), 10) <<endl; // returns bool
}
I am using gcc /usr/lib/gcc/i686-linux-gnu/4.6/lto-wrapper
I ran the program and saw this:
11
22
10
20
30
40
50
Binary Search - 0
Your array is not sorted, therefore binary search fails. (It narrows the search down to the first position, sees 11 there, and concludes that 10 is not present.)
You either want to ensure the array is sorted before binary searching or use the regular std::find.
binary_search says:
Checks if the sorted range [first, last) contains an element equal to
value. The first version uses operator< to compare the elements, the
second version uses the given comparison function comp.
Your list is not sorted, it contains the elements 11 and 22 prior to 10.
Your array is not sorted, so calling binary_search on it has undefined behavior. Try std::find instead:
bool found = std::find(v.begin(), v.end(), 10) != v.end();
§25.4.3.4 of the C++11 standard (3242 draft)
Requires: The elements e of [first,last) are partitioned with respect to the expressions e < value and !(value < e) or comp(e,
value) and !comp(value, e). Also, for all elements e of [first, last),
e < value implies !(value < e) or comp(e, value) implies !comp(value,
e).
"Unexpected behavior"? There's nothing unexpected here.
The whole idea of binary search algorithm is taking advantage of the fact that the input array is sorted. If the array is not sorted, there can't be any binary search on it.
When you use std::binary_search (as well as all other standard binary-search-based algorithms), the input sequence must be sorted in accordance with the same comparison predicate as the one used by std::binary_search. Since you did not pass any custom predicate to std::binary_search, it will use the ordering defined by the < operator. That means that your input sequence of integers must be sorted in ascending order.
In your case the input sequence does not satisfy that requirement. std::binary_search cannot be used on it.
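As a rough illustration of that requirement, the sketch below only uses std::binary_search when the data is actually sorted and otherwise falls back to std::find. The helper name contains_value is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: binary search when the vector is sorted in ascending
// order, linear search otherwise. The std::is_sorted check is itself O(n),
// so this is a demonstration of the precondition, not an optimisation.
bool contains_value(const std::vector<int>& v, int value)
{
    if (std::is_sorted(v.begin(), v.end()))
        return std::binary_search(v.begin(), v.end(), value);
    return std::find(v.begin(), v.end(), value) != v.end();
}
```

For the sequence 11, 22, 10, 20, 30, 40, 50 from the question this takes the std::find branch and reports that 10 is present. In real code one would normally sort once up front, or keep the data sorted, and then call std::binary_search directly.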