How to partially sort in a stable way - c++

Is std::partial_sort stable and if not, is there a stable partial sort provided by the standard library or e.g. boost?

std::partial_sort is not guaranteed to be stable. It is efficient and easy to provide because only the front of the range has to end up ordered: conceptually it is a quicksort where recursions that aren't necessary for the desired range are skipped, although common implementations use a heap instead. There is no equivalently efficient partial stable sort algorithm; stable_sort is usually implemented as a merge sort, and merge sort's recursion works the wrong way.
If you want a partial sort to be stable, you need to associate position information with each element. If you have a modifiable zip range you can do that by zipping together the elements and an iota vector, but modifiable zip ranges were impossible to build within the iterator concepts available when this was written (C++23 adds std::views::zip), so it's easier to do indirect sorting via iterators and rely on the iterators' ordering. In other words, you can do this:
using MyThingV = std::vector<MyThing>;
using MyThingIt = MyThingV::iterator;
MyThingV things;
// Set up a vector of iterators. We'll sort that.
std::vector<MyThingIt> sorted;
sorted.reserve(things.size());
for (auto it = things.begin(); it != things.end(); ++it) sorted.push_back(it);
// upto_index is the number of leading elements that must end up sorted.
std::partial_sort(sorted.begin(), sorted.begin() + upto_index, sorted.end(),
    [](MyThingIt lhs, MyThingIt rhs) {
        // First see if the underlying elements differ.
        if (*lhs < *rhs) return true;
        if (*rhs < *lhs) return false;
        // Underlying elements are the same, so compare iterators; these represent
        // position in original vector.
        return lhs < rhs;
    });
Now your base vector is still unsorted, but the vector of iterators is sorted the way you want.
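For anyone who wants to try this out, here is a minimal, self-contained sketch of the same idea. It is a variant, not the exact code above: it stores indices instead of iterators (so the result stays valid if the vector is later moved), and the helper name `stable_partial_order` is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a standard function): returns the indices of the
// `count` smallest elements of `items`, in sorted order. Ties are broken by
// original position, which is what makes the result stable.
template <typename T>
std::vector<std::size_t> stable_partial_order(const std::vector<T>& items,
                                              std::size_t count)
{
    count = std::min(count, items.size());
    std::vector<std::size_t> order(items.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(count),
                      order.end(),
                      [&items](std::size_t lhs, std::size_t rhs) {
                          if (items[lhs] < items[rhs]) return true;
                          if (items[rhs] < items[lhs]) return false;
                          return lhs < rhs;  // equivalent elements: earlier one first
                      });
    order.resize(count);  // only the first `count` positions are meaningful
    return order;
}
```

For the input {5, 1, 3, 1, 2} and count 3, the two equal 1s come back in their original relative order, followed by the 2.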

Related

Is there an even faster approach than swap-and-pop for erasing from std::vector?

I am asking this as the other relevant questions on SO seem to be either for older versions of the C++ standard, do not mention any form of parallelization, or are focused on keeping the ordering/indexing the same as elements are removed.
I have a vector of potentially hundreds of thousands or millions of elements (which are fairly light structures, around ~20 bytes assuming they're compacted down).
Due to other restrictions, it must be a std::vector and other containers would not work (like std::forward_list), or be even less optimal in other uses.
I recently switched from a simple it = myVec.erase(it) approach to swap-and-pop, using something like this:
for (std::size_t i = 0; i < myVec.size();) {
    // Do calculations to determine if element must be removed
    // ...
    // Remove if needed
    if (elementMustBeRemoved) {
        myVec[i] = myVec.back();
        myVec.pop_back();
    } else {
        i++;
    }
}
This works, and was a significant improvement. It cut the runtime of the method down to ~61% of what it was previously. But I would like to improve this further.
Does C++ have a method to remove many non-consecutive elements from a std::vector efficiently? Like passing a vector of indices to erase() and have C++ do some magic under the hood to minimize movement of data?
If so, I could have threads individually gather indices that must be removed in parallel, and then combine them and pass them to erase().
Take a look at std::remove_if algorithm. You could use it like this:
auto firstToErase = std::remove_if(myVec.begin(), myVec.end(),
    [](const T& x) {
        // Do calculations to determine if element must be removed
        // ...
        return elementMustBeRemoved;
    });
myVec.erase(firstToErase, myVec.end());
cppreference gives the following code as a possible implementation of remove_if:
template<class ForwardIt, class UnaryPredicate>
ForwardIt remove_if(ForwardIt first, ForwardIt last, UnaryPredicate p)
{
    first = std::find_if(first, last, p);
    if (first != last)
        for (ForwardIt i = first; ++i != last; )
            if (!p(*i))
                *first++ = std::move(*i);
    return first;
}
Instead of swapping with the last element, it walks through the container once and moves each kept element forward, so the leftover slots to be erased end up in a range at the very end of the vector. This looks like a more cache-friendly solution, and you might notice some performance improvement on a very big vector.
If you want to experiment with a parallel version, overload (4) on cppreference accepts an execution policy (since C++17).
Or, since C++20, you can type slightly less and use std::erase_if.
However, in that case you lose the option to choose an execution policy.
Is there an even faster approach than swap-and-pop for erasing from std::vector?
Ever since C++11, the optimal removal of a single element from a vector without preserving order has been move-and-pop rather than swap-and-pop.
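As a rough sketch of what move-and-pop could look like (the helper name `unordered_erase` is invented here, and the caller is assumed to pass a valid index):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: removes the element at `index` in O(1) without
// preserving the order of the remaining elements.
// Precondition (assumed, not checked): index < v.size().
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t index)
{
    // Skip the move when erasing the last element, to avoid a self-move.
    if (index + 1 != v.size())
        v[index] = std::move(v.back());  // one move instead of a three-move swap
    v.pop_back();
}
```

Calling `unordered_erase(v, 1)` on {1, 2, 3, 4} leaves {1, 4, 3}. For cheap-to-copy structs the gain over a copy is small, so it is worth measuring.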
Does C++ have a method to remove many non-consecutive elements from a std::vector efficiently?
The erase-remove idiom (or std::erase / std::erase_if in C++20) is the most efficient approach that the standard provides. std::remove_if preserves the order of the remaining elements; if you don't care about that, a more efficient algorithm may be possible. But the standard library does not come with an unstable remove out of the box. The algorithm goes as follows:
Find first element to be removed (a)
Find last element to not be removed (b)
Move b to a.
Repeat between a and b until iterators meet.
There is a proposal P0048 to add such an algorithm to the standard library, and there is a demo implementation at https://github.com/WG21-SG14/SG14/blob/6c5edd5c34e1adf42e69b25ddc57c17d99224bb4/SG14/algorithm_ext.h#L84
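The four steps above can be sketched roughly as follows. This is my own illustration of the idea, not a copy of the SG14 code; the name `unstable_remove_if` and its signature are assumptions, and it needs bidirectional iterators.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch of an unstable remove: returns the new logical end, like
// std::remove_if, but does not preserve the order of the kept elements.
template <typename BidirIt, typename Predicate>
BidirIt unstable_remove_if(BidirIt first, BidirIt last, Predicate pred)
{
    while (true) {
        // (a) find the first element that must be removed
        first = std::find_if(first, last, pred);
        if (first == last) return first;
        // (b) find the last element that must be kept
        do {
            --last;
            if (first == last) return first;  // iterators met: done
        } while (pred(*last));
        // move b into the hole at a, then continue between a and b
        *first = std::move(*last);
        ++first;
    }
}
```

Usage mirrors the erase-remove idiom: `v.erase(unstable_remove_if(v.begin(), v.end(), pred), v.end());`. Each kept element is moved at most once, and only elements near the end are moved at all.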

Error when comparing iterators c++ [duplicate]

It's very basic, but I could not find a similar question here. I am trying to iterate the same sorted STL list from different directions using list. I know I can compare an iterator to the list.begin() and list.end(), so why doesn't this work?
list<family>::iterator itLargeFamily =
families.begin(); //starts from the biggest families
list<family>::iterator itSmallFamily =
families.end(); //starts from the smallest families
for (; itSmallFamily > itLargeFamily; --itSmallFamily, ++itLargeFamily) {
// stuff...
}
The error is of course
no operator > matches these operands
100% chance I'm missing something basic.
Only random access iterators are ordered. std::list iterators are only bidirectional iterators, so they do not support operator< or operator>.
Instead, you could do your comparison with !=.
while (itSmallFamily != itLargeFamily)
You'll have to make sure that the iterators don't jump over each other for this to work though. That is, if itSmallFamily is only one increment away from itLargeFamily, you will simply swap them over and they'll never have been equal to each other.
You could instead use std::vector, whose iterators are random access iterators. std::array and std::deque also support random access.
From the comments and the answer of sftrabbit you can see that relational operators are only defined for random access iterators, and std::list has only bidirectional iterators. So there are several solutions for your problem:
Use std::vector or std::array. They provide random access iterators, perform better for smaller sizes (and, depending on how you fill and use them, often for larger sizes as well), and have a smaller memory footprint. This is the preferred solution; I'd call it the "default" solution. Use other containers only if there is a very good, measurable reason (e.g. a profiler tells you that the vector is a performance bottleneck).
Since you know the size of the list, you can use a counter for your iterations:
for (size_t i = 0, count = families.size()/2;
i < count;
++i, --itSmallFamily, ++itLargeFamily)
{ /* do stuff */ }
Since your list is sorted, you can compare the elements the iterators point to instead of the iterators themselves.
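To make the counter-based option concrete, here is a small sketch that walks a std::list from both ends using only the operations bidirectional iterators support. The function name `pair_from_both_ends` is hypothetical, and it uses a plain element type instead of the `family` type from the question. Note that end() points one past the last element, so the back iterator is decremented before it is dereferenced.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical helper: pairs the i-th element from the front with the i-th
// element from the back. The middle element of an odd-sized list is skipped.
template <typename T>
std::vector<std::pair<T, T>> pair_from_both_ends(const std::list<T>& items)
{
    std::vector<std::pair<T, T>> result;
    auto front = items.begin();
    auto back = items.end();  // one past the last element
    for (std::size_t i = 0, count = items.size() / 2; i < count; ++i, ++front) {
        --back;  // step onto a valid element before dereferencing
        result.emplace_back(*front, *back);
    }
    return result;
}
```

Because the loop is driven by a counter, the two iterators can never jump over each other, which is the risk with a plain != comparison.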

C++ get index of element of array by value

So far, I have been storing the array in a vector and then looping through the vector to find the matching element and then returning the index.
Is there a faster way to do this in C++? The STL structure I use to store the array doesn't really matter to me (it doesn't have to be a vector). My array is also unique (no repeating elements) and ordered (e.g. a list of dates going forward in time).
Since the elements are sorted, you can use a binary search to find the matching element. The C++ Standard Library has a std::lower_bound algorithm that can be used for this purpose. I would recommend wrapping it in your own binary search algorithm, for clarity and simplicity:
/// Performs a binary search for an element
///
/// The range `[first, last)` must be ordered via `comparer`. If `value` is
/// found in the range, an iterator to the first element comparing equal to
/// `value` will be returned; if `value` is not found in the range, `last` is
/// returned.
template <typename RandomAccessIterator, typename Value, typename Comparer>
auto binary_search(RandomAccessIterator const first,
RandomAccessIterator const last,
Value const& value,
Comparer comparer) -> RandomAccessIterator
{
RandomAccessIterator it(std::lower_bound(first, last, value, comparer));
if (it == last || comparer(*it, value) || comparer(value, *it))
return last;
return it;
}
(The C++ Standard Library has a std::binary_search, but it returns a bool: true if the range contains the element, false otherwise. It's not useful if you want an iterator to the element.)
Once you have an iterator to the element, you can use std::distance to compute the index of the element in the range.
Both of these algorithms work equally well with any random access sequence, including both std::vector and ordinary arrays.
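Putting the two pieces together, a minimal sketch might look like this. The function name `index_of` and the choice to return the vector's size for "not found" are my own assumptions, and it uses the default operator< instead of a custom comparer.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: returns the index of `value` in a vector sorted in
// ascending order, or sorted.size() if the value is not present.
template <typename T>
std::size_t index_of(const std::vector<T>& sorted, const T& value)
{
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    // lower_bound returns the first element that is not less than `value`,
    // so we still have to check that it is not greater than `value`.
    if (it == sorted.end() || value < *it)
        return sorted.size();
    return static_cast<std::size_t>(std::distance(sorted.begin(), it));
}
```

For a vector holding {10, 20, 30, 40}, looking up 30 gives index 2, and looking up 25 gives 4 (the size), meaning "not found".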
If you want to associate a value with an index and find the index quickly you can use std::map or std::unordered_map. You can also combine these with other data structures (e.g. a std::list or std::vector) depending on the other operations you want to perform on the data.
For example, when creating the vector we also create a lookup table:
vector<int> test(test_size);
unordered_map<int, size_t> lookup;
int value = 0;
for(size_t index = 0; index < test_size; ++index)
{
test[index] = value;
lookup[value] = index;
value += rand()%100+1;
}
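Now to look up the index you simply write the following. Note that operator[] inserts a default entry (index 0) when find_value is not in the map, so use lookup.find(find_value) instead if the value may be absent:
size_t index = lookup[find_value];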
Using a hash table based data structure (e.g. the unordered_map) is a fairly classical space/time tradeoff and can outperform doing a binary search for this sort of "reverse" lookup operation when you need to do a lot of lookups. The other advantage is that it also works when the vector is unsorted.
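Here is a small sketch of the lookup-table approach that also handles missing values. The names `build_lookup` and `find_index` are invented for this example, and it assumes the values are unique, as in the question.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each value to its index in `values`.
// Assumes the values are unique; with duplicates the last index wins.
std::unordered_map<int, std::size_t> build_lookup(const std::vector<int>& values)
{
    std::unordered_map<int, std::size_t> lookup;
    lookup.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
        lookup[values[i]] = i;
    return lookup;
}

// Hypothetical helper: writes the index of `value` to `index` and returns
// true, or returns false (leaving `index` untouched) if the value is absent.
// Uses find() so that a failed lookup does not insert anything into the map.
bool find_index(const std::unordered_map<int, std::size_t>& lookup,
                int value, std::size_t& index)
{
    auto it = lookup.find(value);
    if (it == lookup.end())
        return false;
    index = it->second;
    return true;
}
```

The map has to be kept in sync with the vector, so this pays off mainly when the data changes rarely and lookups are frequent.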
For fun :-) I've done a quick benchmark in VS2012RC comparing James' binary search code with a linear search and with using unordered_map for lookup, all on a vector:
Up to ~50000 elements, unordered_map significantly (3-4x) outperforms the binary search, which exhibits the expected O(log N) behavior. The somewhat surprising result is that unordered_map loses its O(1) behavior past 10000 elements, presumably due to hash collisions, or perhaps an implementation issue.
EDIT: max_load_factor() for the unordered map is 1 so there should be no collisions. The difference in performance between the binary search and the hash table for very large vectors appears to be caching related and varies depending on the lookup pattern in the benchmark.
Choosing between std::map and std::unordered_map talks about the difference between ordered and unordered maps.

one line assert to test if STL container is sorted

Is there a way to write a one line condition that would return true if STL container is sorted? The container in question is std::vector
I intend to use it in an assert
Use adjacent_find in combination with less or greater functor.
Restriction:
You should know whether the container is sorted in ascending or descending.
If the vector is supposed to be sorted in ascending order:
// Finds the first adjacent pair where elem > nextElem.
// Returns end() if the vector is sorted in ascending order.
// Complexity is O(n).
std::vector<int>::iterator pos = std::adjacent_find(aVec.begin(), aVec.end(), // range
                                                    std::greater<int>());
if (pos == aVec.end())
{
    std::cout << "Sorted" << std::endl;
}
else
{
    std::cout << "Not sorted" << std::endl;
}
You can use std::is_sorted(vec.begin(),vec.end()) to test if it is sorted. Note, though, that this is O(n).
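Since the question asks for something that fits in an assert, here is how that could look as a one-liner. The wrapper function `check_sorted` and its parameters exist only to make the sketch self-contained; std::is_sorted requires C++11.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical wrapper, only here to give the asserts somewhere to live.
void check_sorted(const std::vector<int>& ascending,
                  const std::vector<int>& descending)
{
    // Ascending order (uses operator<); equal neighbours are allowed.
    assert(std::is_sorted(ascending.begin(), ascending.end()));
    // Descending order: pass the comparison the range was sorted with.
    assert(std::is_sorted(descending.begin(), descending.end(),
                          std::greater<int>()));
}
```

Keep in mind that assert compiles to nothing when NDEBUG is defined, so the O(n) check costs nothing in a release build.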
It depends what STL data type you want to use.
A map is already sorted by the key provided the key has overloaded compare operators. You're good to go here.
A list requires that you explicitly call the sort function. You will need to keep track of whether or not you sorted it yet.
Hope this helps.