c++ std::vector std::sort infinite loop - c++

I ran across an issue when I was trying to sort a vector of objects: the sort resulted in an infinite loop. I am using a custom compare function that I pass in to the sort function.
I was able to fix the issue by returning false instead of true when two objects are equal, but I don't fully understand the solution. I think it's because my compare function was violating this rule as outlined on cplusplus.com:
Comparison function object that, taking two values of the same type than those contained in the range, returns true if the first argument goes before the second argument in the specific strict weak ordering it defines, and false otherwise.
Can anyone provide a more detailed explanation?

The correct answer, as others have pointed out, is to learn what a "strict weak ordering" is. In particular, if comp(x,y) is true, then comp(y,x) has to be false. (Note that this implies that comp(x,x) is false.)
That is all you need to know to correct your problem. The sort algorithm makes no promises at all if your comparison function breaks the rules.
If you are curious what actually went wrong, your library's sort routine probably uses quicksort internally. Quicksort works by repeatedly finding a pair of "out of order" elements in the sequence and swapping them. If your comparison tells the algorithm that a,b is "out of order", and it also tells the algorithm that b,a is "out of order", then the algorithm can wind up swapping them back and forth over and over forever.
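To make the rule concrete, here is a minimal sketch that contrasts a comparator returning true for equal elements with a strict one. The Item type and all function names are hypothetical, chosen only for illustration; the small checker tests the two properties on a sample and cannot prove a comparator correct in general.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element type, for illustration only.
struct Item {
    int key;
};

// Broken: returns true for equal keys, so comp(a, b) and comp(b, a) can
// both be true. Passing this to std::sort is undefined behavior.
inline bool less_or_equal(const Item& a, const Item& b) {
    return a.key <= b.key;
}

// Correct: a strict "less than". Equal keys compare false both ways.
inline bool strictly_less(const Item& a, const Item& b) {
    return a.key < b.key;
}

// Checks on a sample that comp(x, x) is false and that comp(x, y) and
// comp(y, x) are never both true. Passing is necessary, not sufficient.
template <typename T, typename Comp>
bool looks_asymmetric(const std::vector<T>& sample, Comp comp) {
    for (const T& x : sample) {
        if (comp(x, x)) return false;
        for (const T& y : sample) {
            if (comp(x, y) && comp(y, x)) return false;
        }
    }
    return true;
}

// Sorts with the strict comparator and sanity-checks the result.
inline void sort_items(std::vector<Item>& items) {
    std::sort(items.begin(), items.end(), strictly_less);
    assert(std::is_sorted(items.begin(), items.end(), strictly_less));
}
```

Running looks_asymmetric over a handful of representative values, including duplicates, is a cheap way to catch a comparator like less_or_equal before it ever reaches std::sort.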

If you're looking for a detailed explanation of what 'strict weak ordering' is, here's some good reading material: Order I Say!
If you're looking for help fixing your comparison functor, you'll need to actually post it.

If the items are the same, one does not go before the other. The documentation was quite clear in stating that you should return false in that case.

The actual rule is specified in the C++ standard, in 25.3[lib.alg.sorting]/2
Compare is used as a function object which returns true if the first argument is less than the second, and false otherwise.
The case when the arguments are equal falls under "otherwise".

A sorting algorithm could easily loop because you're saying that A < B AND B < A when they're equal. Thus the algorithm might infinitely try to swap elements A and B, trying to get them in the correct order.

Strict weak ordering means the comparator behaves like a < b. When you return true for equality, it behaves like a <= b instead, which is not a strict weak ordering. Different sort algorithms rely on this requirement to work efficiently.

Related

What will std::sort do if the comparison is inconsistent? (A<B, B<C, C<A)

I need to sort a file list by date. There's this answer on how to do it. It worries me, though: it operates on a live filesystem that can change during operation.
The comparison function uses:
#include <string>
#include <sys/stat.h>

struct FileNameModificationDateComparator {
    // Returns true if and only if lhs < rhs
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        struct stat attribLhs;
        struct stat attribRhs;  // File attribute structs
        stat(lhs.c_str(), &attribLhs);
        stat(rhs.c_str(), &attribRhs);  // Get file stats (return values are not checked)
        return attribLhs.st_mtime < attribRhs.st_mtime;  // Compare last modification dates
    }
};
From what I understand, this function can and will be called multiple times on the same file, comparing it against different files. The file can be modified by external processes while sort is running; one of the older files can become the newest between two comparisons, so it may turn up older than a rather old file and later newer than one of the newest files...
What will std::sort() do? I'm fine with some scarce ordering errors in the result. I'm not fine with a crash or a freeze (infinite loop) or other such unpleasantries. Am I safe?
Am I safe?
No.
std::sort requires a comparison with strict weak ordering and A<B, B<C, C<A violates that.
This violation incurs undefined behavior, and in practice, results in some of the worst kinds of undefined behavior.
It should also be noted that writing a sort algorithm that works on elements whose ordering changes arbitrarily during the sort would be near-impossible. At no time would the algorithm know that the entire collection is currently sorted.
As other answers have already said, handing std::sort a comparator that doesn't satisfy the strict weak ordering requirement, or whose results are not preserved when it is called multiple times with the same values, will cause undefined behavior.
That doesn't only mean that the range may end up incorrectly sorted; it may actually cause more serious problems, not only in theory but also in practice. A common one, as you already said, is an infinite loop in the algorithm, but it can also introduce crashes or vulnerabilities.
For example (I haven't checked whether other implementations behave similarly) I looked at libstdc++'s std::sort implementation, which as part of introsort uses insertion sort. The insertion sort calls a function __unguarded_linear_insert, see github mirror. This function performs a linear search on a range via the comparator without guarding for the end of the range, because the caller is supposed to have already verified that the searched item will fall into the range. If the result of the comparison changes between the guard comparison in the caller and the unguarded linear search, the iterator will be incremented out-of-bounds, which could produce a heap overrun or null dereference or anything else depending on the iterator type.
For a demonstration, see https://godbolt.org/z/8qajYEad7.
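The failure mode described above can be modeled without invoking undefined behavior. The following is a simplified sketch, not libstdc++'s actual code: checked_linear_insert and insertion_sort_model are hypothetical names, and the bounds check exists only so that the problem is reported instead of reading outside the vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Model of a linear insert that trusts its caller: it shifts elements to
// the right while comp(value, previous) holds. A truly unguarded version
// has no `pos == 0` test; it is added here to report the failure.
// Returns false if the scan would have left the range.
template <typename T, typename Comp>
bool checked_linear_insert(std::vector<T>& v, std::size_t last, Comp comp) {
    assert(last > 0 && last < v.size());
    T value = v[last];
    std::size_t pos = last;
    while (comp(value, v[pos - 1])) {
        v[pos] = v[pos - 1];
        --pos;
        if (pos == 0) {
            // An unguarded scan would now read in front of the vector.
            v[0] = value;
            return false;
        }
    }
    v[pos] = value;
    return true;
}

// Insertion sort in the style described above: the caller compares against
// the first element, then calls the insert that relies on that comparison.
// Returns false if the comparator contradicted the caller's guard.
template <typename T, typename Comp>
bool insertion_sort_model(std::vector<T>& v, Comp comp) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (comp(v[i], v[0])) {
            // New minimum: move it to the front, shifting the rest right.
            auto first = v.begin();
            auto current = first + static_cast<std::ptrdiff_t>(i);
            std::rotate(first, current, current + 1);
        } else if (!checked_linear_insert(v, i, comp)) {
            return false;  // the comparator changed its answer
        }
    }
    return true;
}
```

With a consistent comparator the guard guarantees the scan stops at index 1 at the latest, so the function returns true. A comparator that answers false for the guard and true afterwards makes it return false, which is the point where real unguarded code would walk out of bounds.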
std::sort() assumes that the collection is sortable.
Order theory defines a set as sortable if the relation over it is:
reflexive, that is, a <= a is true
antisymmetric, that is: (a <= b and b <= a) <=> a = b
transitive, that is: (a <= b and b <= c) => a <= c
See the definition of partial ordering at page 7 of https://web.stanford.edu/class/archive/cs/cs103/cs103.1126/handouts/060%20Relations.pdf
Note that these properties describe a non-strict order (<=). The comparison passed to std::sort has to be the strict version (<), which is irreflexive: a < a must be false, otherwise the sorting algorithm may keep swapping equal elements.
Your problem statement says that the relation over your collection is not transitive. Mind you, it is transitive at any given moment; the problem is that elements may change their values during the (short) run of your sorting algorithm.
This is not well-defined behavior; in C++ it is undefined behavior.
So, the way I would approach your problem would be to bank on the fact that the ordering is transitive at any given moment. Also, why would you read the modification times each time you compare two files? Reading file attributes is an I/O operation and slows down your process. It makes much more sense to read them only once, before you sort, and store the results in a collection whose items may change their order but whose values will not change (file1's modification time is read before the algorithm starts and stays unchanged in your collection until the end of the sort, even if it is no longer accurate).
The risk involved with this approach is that the result may be out of date by the few milliseconds that have passed since the measurements, a problem that you already specified as being acceptable.
Furthermore, if you need this sorting often, it might make sense to sort periodically (maybe once every 10 minutes, or whatever interval you choose), cache the results, and refer to the cache whenever you need the sorted list.
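The "read once, then sort the stored values" idea could look roughly like the sketch below. The names sort_by_snapshot and read_mtime are hypothetical, and treating an unreadable file as time 0 is an assumption made here, not something the question specifies. The reader is passed in as a callable so that the sort itself never touches the filesystem.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <string>
#include <sys/stat.h>
#include <type_traits>
#include <utility>
#include <vector>

// Reader based on POSIX stat(), as in the question. Assumption: a file that
// cannot be read is given time 0, so it sorts as the oldest.
inline std::time_t read_mtime(const std::string& path) {
    struct stat attrib;
    if (stat(path.c_str(), &attrib) != 0) return 0;
    return attrib.st_mtime;
}

// Calls read_key exactly once per name, then sorts by the stored copies.
// The keys cannot change during the sort, so the comparison is consistent.
template <typename ReadKey>
std::vector<std::string> sort_by_snapshot(const std::vector<std::string>& names,
                                          ReadKey read_key) {
    using Key = typename std::decay<decltype(read_key(names.front()))>::type;
    using Entry = std::pair<Key, std::string>;
    std::vector<Entry> snapshot;
    snapshot.reserve(names.size());
    for (const std::string& name : names) {
        snapshot.emplace_back(read_key(name), name);  // one read per file
    }
    std::sort(snapshot.begin(), snapshot.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
    std::vector<std::string> sorted;
    sorted.reserve(snapshot.size());
    for (Entry& entry : snapshot) sorted.push_back(std::move(entry.second));
    assert(sorted.size() == names.size());
    return sorted;
}
```

A call such as sort_by_snapshot(files, read_mtime) would then perform one stat() per file instead of two per comparison, and the result reflects the state of the filesystem at the time of the reads.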

Sort function from standard library gives an error using an iterator

I am relatively new to C++ and am trying to use the sort function from the algorithm library, calling std::sort directly.
The usual way to sort a vector with std::sort is to pass a pair of iterators and a comparison function.
Consider a vector v containing {4,3,5,9}; after sorting it should look like {9,5,4,3}.
For instance:
std::sort(v.begin(),v.end(),a>b)
So, I wanted to use this method to sort a list of nodes by heuristic value for my A* search algorithm. The heuristic is basically the sum of two attributes of the Node object, and the sort has to be done on a vector of these nodes. I want to use
open_list.begin() and open_list.end() as my iterators
and pass my comparison as the third argument to std::sort(). Here is the actual implementation:
std::sort(open_list.begin(),open_list.end(),open_list.begin()->g_value + open_list.begin()->h_value > open_list.end()->g_value + open_list.end()->h_value );
Here, I am adding the h and g values, which are attributes of the Node object;
open_list is a vector of pointers to the nodes. I felt my implementation was right, but it gives me a weird error that looks like this:
/home/piyushkumar/CppND-Route-Planning-Project/src/route_planner.cpp:65:93: error: request for member ‘h_value’ in ‘((RoutePlanner)this)->RoutePlanner::open_list.std::vector::begin().__gnu_cxx::__normal_iterator >::operator->()’, which is of pointer type ‘RouteModel::Node*’ (maybe you meant to use ‘->’ ?)
std::sort(open_list.begin(),open_list.end(),open_list.begin()->g_value + open_list.begin()->h_value > open_list.end()->g_value + open_list.end()->h_value );
Some clarification regarding the error:
RouteModel is a class and Node inherits from that class. Why does this simple comparison fail as the 3rd argument, and why does the compiler say I should use ->, which I have already used to retrieve g_value and h_value from the Node object?
Any help and leads will be appreciated.
Ok let's sort things out one by one.
First, about your error message. Your vector stores pointers, and a->b is equivalent to (*a).b, so open_list.begin()->h_value is equivalent to open_list.front().h_value, and a pointer clearly doesn't have member variables. To refer to the member variable, you need to write (*open_list.begin())->h_value. Furthermore, dereferencing .end() gives you undefined behaviour immediately. To access the last element of a std::vector, use .back() instead of *(your_vector.end()). (Remember to check that the vector is not empty beforehand! Otherwise you will step into undefined behaviour again :) )
Secondly, your idea of how to use std::sort is wrong. To sort a range of elements by a "standard" you choose, the first two parameters of sort describe the range, and the third parameter is your "standard": it must be a "callable" that takes two parameters and tells std::sort whether the first should be sorted before the second. As a result, to sort v in descending order, you need to call std::sort in the following way:
std::sort(v.begin(), v.end(), [](int a,int b)->bool{ return a > b;});
Here, the third parameter is a lambda expression; if you don't know what that is, a quick search will help. (FYI, a callable is not necessarily a lambda expression; it may also be a function or a functor (a.k.a. function object), but I personally think a lambda is the clearest one here.)
I will not give you the statement needed for sorting your open_list, you can use it to check whether you have figured out how the things work or not. Enjoy learning.
Ok, so much not right here.
The third parameter to std::sort is a "callable" (think of something like a function pointer), which std::sort calls to compare two elements in the sequence. It needs to take two elements of the sequence (usually by const &) and return a bool.
Your example std::sort(v.begin(),v.end(),a>b) will not work, because a>b is not callable.
Your "real" code suffers from the same problem.
std::sort(open_list.begin(),
open_list.end(),
open_list.begin()->g_value + open_list.begin()->h_value >
open_list.end()->g_value + open_list.end()->h_value );
That big expression is not callable, and that's why the compiler is complaining.
Also, FWIW, open_list.end() is an iterator to the "one past the end" position in the sequence, and dereferencing it (as you do in open_list.end()->g_value) is undefined behavior, since there's no element there.
See cppreference for more info.
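For reference, a call along the lines both answers describe might look like the sketch below. The Node struct here is a hypothetical stand-in for RouteModel::Node with only the two fields from the question, and sort_open_list is an illustrative name. It sorts in descending order of g + h, mirroring the a > b example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for RouteModel::Node; only the two fields used in
// the question are modeled.
struct Node {
    float g_value;
    float h_value;
};

// Sorts pointers to nodes by descending g + h. The lambda is the callable
// that std::sort invokes on pairs of elements; end() is never dereferenced.
inline void sort_open_list(std::vector<Node*>& open_list) {
    // Assumption: the vector holds no null pointers.
    assert(std::none_of(open_list.begin(), open_list.end(),
                        [](const Node* n) { return n == nullptr; }));
    std::sort(open_list.begin(), open_list.end(),
              [](const Node* a, const Node* b) {
                  return a->g_value + a->h_value > b->g_value + b->h_value;
              });
}
```

Because the comparison uses > rather than >=, two nodes with the same g + h compare false in both directions, which is what std::sort requires.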

In c++, what is the fastest way to sort in reverse order?

Is it one of the following or something else?
//1
sort(first,last,[](const T &a,const T &b){return comp(b,a);});
//2
sort(first,last,bind(comp,_2,_1));
//3
sort(make_reverse_iterator(last),make_reverse_iterator(first),comp);
//Use value instead of reference if object size is small.
This is not a duplicate of Sorting a vector in descending order; this one considers a user-defined comparison function.
No easier way than to simply reverse your comparator. If your comparator returns true when it compares A and B, make it return false instead, and vice versa.
Make sure to take care of the case where A and B are equal; in that case, you want the comparator to still return false.
It doesn't get any faster when you do that btw.
Benchmark shows using a lambda is much faster than the other two.
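One way to reverse a comparator without getting the equal case wrong is to swap its arguments, as in option 1 of the question. The sketch below assumes comp is already a valid strict weak ordering and can be called on a const copy; sort_reversed is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sorts [first, last) in the reverse of the order defined by comp by
// swapping the arguments. Unlike negating the result, swapping keeps equal
// elements comparing false both ways, so the ordering stays strict.
template <typename It, typename Comp>
void sort_reversed(It first, It last, Comp comp) {
    using T = typename std::iterator_traits<It>::value_type;
    auto reversed = [comp](const T& a, const T& b) { return comp(b, a); };
    std::sort(first, last, reversed);
    assert(std::is_sorted(first, last, reversed));
}

// Convenience wrapper for a whole vector.
template <typename T, typename Comp>
void sort_reversed(std::vector<T>& v, Comp comp) {
    sort_reversed(v.begin(), v.end(), comp);
}
```

Writing !comp(a, b) instead would return true for equal elements, which is exactly the mistake warned about above.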

C++ comp(a,a)==false

I was using a lambda function in sort(). In my lambda function I return true if the two arguments are equal. Then I got a segmentation fault.
I then looked at the C++ Compare requirements, which say:
For all a, comp(a,a) == false
I don't understand why it must be false. Why can't I let comp(a,a)==true?
(Thanks in advance)
Think of Comp as some sort of "is smaller than" relationship, that is it defines some kind of ordering on a set of data.
Now you probably want to do some stuff with this relationship, like sorting data in increasing order, binary search in sorted data, etc.
There are many algorithms that do stuff like this very fast, but they usually have the requirement that the ordering they deal with is "reasonable", which was formalized with the term Strict weak ordering. It is defined by the rules in the link you gave, and the first one basically means:
"No element shall be smaller than itself."
This is indeed reasonable to assume, and one of the things our algorithms require.
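One consequence can be shown in a few lines. The standard library treats two elements as equivalent when neither goes before the other, so a comparator with comp(a, a) == true claims that an element is not even equivalent to itself. The helper names below are hypothetical.

```cpp
#include <cassert>

// Two elements are equivalent under comp when neither goes before the other.
template <typename T, typename Comp>
bool equivalent(const T& a, const T& b, Comp comp) {
    return !comp(a, b) && !comp(b, a);
}

// Compares the two candidate comparators on a value and itself.
inline void check_self_equivalence() {
    auto less = [](int a, int b) { return a < b; };
    auto less_equal = [](int a, int b) { return a <= b; };
    assert(equivalent(5, 5, less));         // 5 is equivalent to 5
    assert(!equivalent(5, 5, less_equal));  // 5 is "not equivalent" to itself
}
```

An algorithm that assumes every element is equivalent to itself can then make decisions, such as where to stop scanning, that no longer hold.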

Best sorting algorithm for case where many objects have "do-not-care" relationships to each other

I have an unusual sorting case that my googling has turned up little on. Here are the parameters:
1) Random access container. (C++ vector)
2) Generally small vector size (less than 32 objects)
3) Many objects have "do-not-care" relationships relative to each other, but they are not equal. (i.e. They don't care about which of them appears first in the final sorted vector, but they may compare differently to other objects.) To put it a third way (if it's still unclear), the comparison function for 2 objects can return 3 results: "order is correct," "order needs to be flipped," or "do not care."
4) Equalities are possible, but will be very rare. (But this would probably just be treated like any other "do-not-care.")
5) Comparison operator is far more expensive than object movement.
6) There is no comparison speed difference for determining that objects care or don't care about each other. (i.e. I don't know of a way to make a quicker comparison that simply says whether the 2 objects care about each other or not.)
7) Random starting order.
Whatever you're going to do, given your conditions I'd make sure you draw up a big pile of test cases (e.g. get a few datasets and shuffle them a few thousand times), as I suspect it'd be easy to choose a sort that fails to meet your requirements.
The "do not care" is tricky as most sort algorithms depend on a strict ordering of the sort value - if A is 'less than or equal to' B, and B is 'less than or equal to' C, then it assumes that A is less than or equal to C -- in your case if A 'doesn't care' about B but does care about C, but B is less than C, then what do you return for the A-B comparison to ensure A will be compared to C?
For this reason, and it being small vectors, I'd recommend NOT using any of the built in methods as I think you'll get the wrong answers, instead I'd build a custom insertion sort.
Start with an empty target vector, insert first item, then for each subsequent item scan the array looking for the bounds of where it can be inserted (ie ignoring the 'do not cares', find the last item it must go after and the first it must go before) and insert it in the middle of that gap, moving everything else along the target vector (ie it grows by one entry each time).
[If the comparison operation is particularly expensive, you might do better to start in the middle and scan in one direction until you hit one bound, then choose whether the other bound is found moving from that bound, or the mid point... this would probably reduce the number of comparisons, but from reading what you say about your requirements you couldn't, say, use a binary search to find the right place to insert each entry]
Yes, this is basically O(n^2), but for a small array this shouldn't matter, and you can prove that the answers are right. You can then see if any other sorts do better, but unless you can return a proper ordering for any given pair then you'll get weird results...
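A minimal sketch of the insertion sort described above, assuming a three-valued comparison. The Order enum, the cmp signature, and the function name are hypothetical. Each new item is compared against every element already placed, so this does not try to minimize the number of comparisons.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Result of comparing a to b: a must come Before b, After b, or DontCare.
enum class Order { Before, After, DontCare };

// Builds the output one item at a time. For each item it finds the last
// placed element it must follow and the first it must precede, then
// inserts it in the middle of that gap.
template <typename T, typename Cmp>
std::vector<T> insertion_sort_partial(const std::vector<T>& input, Cmp cmp) {
    std::vector<T> out;
    out.reserve(input.size());
    for (const T& item : input) {
        std::size_t lo = 0;           // item may not go before index lo
        std::size_t hi = out.size();  // item may not go after index hi
        for (std::size_t i = 0; i < out.size(); ++i) {
            const Order o = cmp(item, out[i]);
            if (o == Order::After) {
                lo = i + 1;  // last element that item must follow
            } else if (o == Order::Before && i < hi) {
                hi = i;      // first element that item must precede
            }
        }
        if (lo > hi) throw std::logic_error("contradictory comparison results");
        const std::size_t pos = lo + (hi - lo) / 2;  // middle of the gap
        out.insert(out.begin() + static_cast<std::ptrdiff_t>(pos), item);
    }
    assert(out.size() == input.size());
    return out;
}
```

If the "care" relationships are transitive, the gap is never empty; the exception only fires when the comparison contradicts itself, which is worth knowing about rather than silently producing a wrong order.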
You can't sort with "don't care" results; it is likely to mess up the order of elements. Example:
list = {A, B, C};
where:
A dont care B
B > C
A < C
So even with the don't care between A and B, B has to be greater than A, or one of those will be false: B > C or A < C. If it will never happen, then you need to treat them as equals instead of the don't care.
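The A, B, C example can be checked mechanically. The sketch below tests, on a sample of values, the properties a comparator needs before it is safe to hand to std::sort; is_strict_weak_on and abc_dont_care are hypothetical names, and a passing result only covers the sample given.

```cpp
#include <cassert>
#include <vector>

// Checks irreflexivity, transitivity of comp, and transitivity of
// "neither goes first" for every combination of values in the sample.
template <typename T, typename Comp>
bool is_strict_weak_on(const std::vector<T>& sample, Comp comp) {
    auto equiv = [&comp](const T& a, const T& b) {
        return !comp(a, b) && !comp(b, a);
    };
    for (const T& x : sample) {
        if (comp(x, x)) return false;
        for (const T& y : sample) {
            for (const T& z : sample) {
                if (comp(x, y) && comp(y, z) && !comp(x, z)) return false;
                if (equiv(x, y) && equiv(y, z) && !equiv(x, z)) return false;
            }
        }
    }
    return true;
}

// The example above as a "less than": A < C and C < B, while A and B
// "don't care" and therefore compare false both ways.
inline bool abc_dont_care(char x, char y) {
    return (x == 'A' && y == 'C') || (x == 'C' && y == 'B');
}

// A < C and C < B hold but A < B does not, so the check fails.
inline void check_abc_example() {
    const std::vector<char> abc{'A', 'B', 'C'};
    assert(!is_strict_weak_on(abc, abc_dont_care));
}
```

Once A < B is added to the relation, the same check passes, which matches the point that B has to be greater than A.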
What you have there is a "partial order".
If you have an easy way to figure out, for a given object, the objects whose order relative to it is not "don't care", you can tackle this with basic topological sorting.
If you have a lot of "don't care"s (i.e. if you only have a sub-quadratic number of edges in your partial ordering graph), this will be a lot faster than ordinary sorting. However, if you don't, the algorithm will be quadratic!
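As a rough sketch, topological sorting over such a partial order could use Kahn's algorithm, as below. The predicate before(a, b) is a hypothetical callable that is true when a must precede b. This version discovers the edges by comparing every pair, so it is the quadratic case; with a cheap way to enumerate the pairs that "care", only the edge-building loop would change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Kahn's algorithm: repeatedly output an item with no unplaced
// predecessors, then release the items that were waiting on it.
template <typename T, typename Before>
std::vector<T> topological_order(const std::vector<T>& items, Before before) {
    const std::size_t n = items.size();
    std::vector<std::vector<std::size_t>> successors(n);
    std::vector<std::size_t> pending(n, 0);  // unplaced predecessors per item
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i != j && before(items[i], items[j])) {
                successors[i].push_back(j);
                ++pending[j];
            }
        }
    }
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (pending[i] == 0) ready.push_back(i);
    }
    std::vector<T> out;
    out.reserve(n);
    while (!ready.empty()) {
        const std::size_t i = ready.back();
        ready.pop_back();
        out.push_back(items[i]);
        for (std::size_t j : successors[i]) {
            if (--pending[j] == 0) ready.push_back(j);
        }
    }
    assert(out.size() == n);  // fails if the constraints contain a cycle
    return out;
}
```

Any order it returns respects every "must precede" constraint; items that don't care about each other end up in whatever order the ready list happens to release them.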
I believe a selection sort will work without modification, if you treat the "do-not-care" result as equal. Of course, the performance leaves something to be desired.