Why is this std::sort comparison failing? - c++

I have a vector of vectors of unsigned ints. Each element of the parent vector is a vector of three unsigned ints. I primarily want to sort the parent vector in descending order of the first element of the child vectors, but I also want to order any child vectors that have the same first element in ascending order of the third element. I initially did this with the following code:
sort(league_vector.begin(), league_vector.end());
reverse(league_vector.begin(), league_vector.end());
sort(league_vector.begin(), league_vector.end(),
[](const std::vector<unsigned int>& a, const std::vector<unsigned int>& b) {return a[0] == b[0] && a[2] < b[2];});
So just sorting, then reversing, the whole thing, which will order by the first element. Then a custom sort using a lambda function which should only return true if the third element is smaller and the first element is equal.
This seems to work fine when I have a relatively small number of elements in the parent vector (around 50 or less), but when I have more than this the final ordering is coming out pretty jumbled with no obvious pattern at all.
I've replaced this with a single custom sort:
sort(league_vector.begin(), league_vector.end(),
[](const std::vector<unsigned int>& a, const std::vector<unsigned int>& b)
{return ((a[0] > b[0]) || (a[0] == b[0] && a[2] < b[2]));});
So this returns true either when the first element is larger, or when the third element is smaller AND the first element is the same. This seems to work fine so I am just using that, but I can't work out what's wrong with the first approach. Particularly as the first approach seems to work some of the time, and the second comparison is just an extension of the first anyway.

First of all, std::sort is not a stable sort, which means it doesn't preserve the order of equivalent elements. If you want a stable sort, use std::stable_sort. Also, your custom comparison function makes no sense. Let's analyze how it behaves:
If a[0] is equal to b[0], your function returns the result of comparing a[2] and b[2]. However, if a[0] is not equal to b[0], your function always returns false. Since equivalence is defined as !(a < b) && !(b < a), according to your comparison function, any two vectors with different first elements are equivalent.
This function also isn't a valid comparison function because it does not satisfy a strict weak ordering, which requires equivalence to be transitive. Any two vectors with different first elements are equivalent, but two vectors with the same first element are not necessarily equivalent. This means that if a = {1, 2, 3}, b = {2, 3, 4}, and c = {1, 3, 4}, then a is equivalent to b and b is equivalent to c, but a is not equivalent to c.

Why did your first attempt fail? Let's take a concrete example, focus on the comparison function, and run a simple test to see why it is invalid.
Here is a league vector with three entries as an example:
std::vector<std::vector<unsigned int>> league_vector = {{1,2,3,4}, {2,3,4,5}, {4,5,6,7}};
Now give this to std::sort:
std::sort(league_vector.begin(), league_vector.end(),
[](const std::vector<unsigned int>& a,
const std::vector<unsigned int>& b) {return a[0] == b[0] && a[2] < b[2];});
Concentrate on this:
a[0] == b[0]
So let's say std::sort gives your comparison the first two vectors in league_vector in this order
a={1,2,3,4} b={2,3,4,5}
Your comparison function will return false since a[0] != b[0].
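Then suppose the sort algorithm swaps the arguments and calls your function again right afterwards:
a={2,3,4,5} b={1,2,3,4}
In other words, a simple switch of the values. You again return false since a[0] != b[0].
Returning false both ways tells std::sort that neither element has to come before the other, i.e. that the two are equivalent. That is not what you want (the vector starting with 2 should come first), so the sort is free to undo the descending order produced by the earlier sort and reverse. On top of that, a vector starting with 1 is "equivalent" to one starting with 2, yet it may be ordered before or after another vector starting with 1, so the answers your function gives do not fit together into one consistent ordering.
The sort algorithm is allowed to rely on a consistent ordering. Without one the result is unspecified, and the values can end up in a jumbled order.
Note that the Visual Studio debug runtime performs a related test: when the comparison function returns true for a and b, it is called again with b and a, and if that also returns true the debug runtime asserts with an "invalid comparator" (or similar) message. That check catches comparators such as a <= b. It would not fire for this function, which never returns true in both directions, so the problem goes unreported.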

Related

How can I check, straight away, if a set of pairs have a common number?

Suppose we have 4 pairs, e.g.:
pair<int, int> P1(1, 2);
pair<int, int> P2(3, 1);
pair<int, int> P3(2, 1);
pair<int, int> P4(1, 5);
How can I compare those 4 pairs straight away and conclude that they all have the number 1 in common? I can only think of comparing two by two, but that is a lot of work for a lot of pairs...
Is there some function that does that for any given set of pairs?
There is no helper "built in" to check that all pairs contain a certain number, but it's a fairly easy operation to make!
For this you would need a function that receives a list of pairs.
// Takes the list by const reference to avoid copying it on every call.
bool allPairsShareNumber(const list<pair<int, int>>& pairs, int number) {
return all_of(pairs.begin(), pairs.end(), [number](const pair<int, int>& p) {
return p.first == number || p.second == number;
});
}
And then you can just pass the list to the function!
pair<int, int> P1(1, 2);
pair<int, int> P2(3, 1);
pair<int, int> P3(2, 1);
pair<int, int> P4(1, 5);
list<pair<int, int>> pairList = { P1, P2, P3, P4 };
bool doTheyAllContain1 = allPairsShareNumber(pairList, 1);
In the sample code you've given, you need to check each pair (P1, P2, etc) separately (e.g. if (P1.first == 1 || P1.second == 1 || P2.first == 1 || <etc> )).
If you insist on having P1, ... P4 as distinct variables, there are no shortcuts on that, since you've defined P1, P2, ... P4 in a way that imposes no logical or structural relationship between them. (e.g. there is no guarantee where they are located in machine memory - they could be together, they could be in completely unrelated memory locations).
But having multiple variables with sequential names like P1, P2, ... is an indication that you need to use a raw array (e.g. pair<int, int> P[4]) or a standard container (e.g. vector<pair<int, int> > P). If you structure your code using a raw array or a standard container then there are options. For example, with a raw array:
std::pair<int, int> P[4];
// set the four elements of P
bool has_one = false;
for (int i = 0; has_one == false && i < 4; ++i)
if (P[i].first == 1 || P[i].second == 1) has_one = true;
which is readily extendable to an arbitrary number of pairs, as long as the number is fixed at compile time. Keep in mind that array indexing starts at zero, not one (i.e. P[0] exists in the above but P[4] does not).
The danger in such code is not updating the number correctly (e.g. changing the number of elements of P from 4 to 27, but forgetting to make the same change in the loop).
Instead of a raw array, a better option is to use a standard container - particularly if you want to do multiple things with the set of pairs. If the number of pairs is fixed at compile time, the definition of P above can be changed to use the array standard container.
std::array<std::pair<int, int>, 4> P; // array is from standard header <array>
// assign the four elements of P (code omitted)
which offers a number of advantages over using a raw array.
If the number of pairs is not known at compile time (e.g. the number is computed based on values read at run time) you can use another standard container, such as
// compute n as the number of elements
std::vector<std::pair<int, int> > P (n);
In all cases a raw array (with care to avoid problems such as checking more elements than an array has) and the standard containers can be used in loops.
However, it is considered better (since it is less error prone) to avoid - where possible - using loops, and instead use algorithms supplied by the C++ standard library. For example (C++11 and later) you could do
#include <algorithm>
#include <vector>
int main()
{
// compute n as the number of elements (code omitted)
std::vector<std::pair<int, int>> P(n);
// populate elements of P (code omitted)
auto check_if_one = [](std::pair<int, int> a)
{return a.first == 1 || a.second == 1;};
bool has_one = (std::find_if(std::begin(P), std::end(P), check_if_one) != std::end(P));
}
The advantage of this is that the code always accounts correctly for the number of elements in P. The approach for calculating the value for has_one in the above is identical, regardless of whether P is a raw array, a std::array, a std::vector, or any other container in the standard library.

How myCompare function is working in vector pair sort?

How does the myCompare function work in vector pair sorting? What are p1 and p2? I want to know what is happening inside the function (as if I were stepping through it in a debugger).
#include<iostream>
#include<vector>
#include<algorithm>
#include<utility>
using namespace std;
bool myCompare(pair<int, int> p1, pair<int, int> p2){
return p1.first<p2.first;
}
int main(){
int arr[]={10,16,7,14,5,3,12,9};
vector <pair <int, int>> v;
for(int i=0;i<(sizeof(arr)/sizeof(arr[0]));i++){
v.push_back(make_pair(arr[i],i));
}
for(auto a:v){
cout<<a.first<<" "<<a.second<<" ";
}cout<<endl;
sort(v.begin(),v.end(),myCompare);
for(auto a:v){
cout<<a.first<<" "<<a.second<<" ";
}cout<<endl;
}
The short answer is that:
myCompare tells the std::sort function how to sort integer pairs.
p1 and p2 are the integer pairs to be compared.
Think about it. If you have 2 pairs of integers, say {10, 4} and {20, 2}, how would you know how to sort them?
Should {10, 4} come first because 10 < 20?
Should {20, 2} come first because 2 < 4?
Maybe you want to use both values in your comparison, like (10/4) < (20/2)?
The myCompare function simply describes that the first comparison method should be used, only taking into account the first value of each pair.
So in this example where p1 is {10, 4} and p2 is {20, 2}, myCompare would order them p1, p2 because 10 < 20.
In your main() function, myCompare will be called many times while std::sort sorts through your vector and passes in the 2 integer pairs (as p1 and p2) it is comparing in that moment.
A sort function typically does a series of comparisons to build a sorted range of given elements. For comparison you can use less than or greater than operator for ascending or descending ordering. You can also define and use a completely unique comparison operator for your interpretation of your data type as long as it satisfies Compare requirements.
A comparison function defines an ordering on a type. It takes two elements as input, and returns a boolean. A comparison function comp must satisfy some rules to define a meaningful ordering (and no UB) such as:
For all a, comp(a,a)==false
If comp(a,b)==true then comp(b,a)==false
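If comp(a,b)==true and comp(b,c)==true then comp(a,c)==true
If a and b are equivalent (comp(a,b)==false and comp(b,a)==false) and b and c are equivalent, then a and c are equivalent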
In your example, v is sorted using myCompare function defined as a comparison operator on type pair<int, int>. myCompare only takes the first element of the pair into account, which is perfectly valid and satisfies all the rules for Compare.

the n should be inclusive or exclusive when using std::nth_element?

Hi I have a question on the usage of std::nth_element.
If I want to obtain the k-th largest element from a vector, should the position I pass be inclusive or exclusive?
int k = 3;
vector<int> nums{1,2,3,4,5,6,7};
std::nth_element(nums.begin(), nums.begin()+k-1, nums.end(), [](int& a, int& b){return a > b;});
int result = nums[k-1];
or
int k = 3;
vector<int> nums{1,2,3,4,5,6,7};
std::nth_element(nums.begin(), nums.begin()+k, nums.end(), [](int& a, int& b){return a > b;});
int result = nums[k-1];
This reminds me that when we build a subarray from a pair of iterators, the end iterator is exclusive.
For example,
vector<int> sub(nums.begin(), nums.begin()+k);
So, is the n for nth_element also exclusive?
This is the type of question that is best to figure out and understand by playing around with a bunch of examples on your own.
However, I'll try to explain why you're getting the same answer in both of your examples. As explained in cppreference, std::nth_element is a partial sorting algorithm. It only guarantees that, given an iterator to an element n as its second argument:
All of the elements before this new nth element are less than or equal to the elements after the new nth element.
("Less than or equal to" is the default behavior if you don't pass a special comparison function.)
That means if you use nums.begin()+k-1 in one case and nums.begin()+k in another case as the second argument to std::nth_element, then in the latter case the partial sorting algorithm will include one additional item in the sort. In that case, you are dividing the vector between larger and smaller items at a spot one index higher than in the first case. However, the (default) algorithm only guarantees that each of the items in the "small half" of the vector will be smaller than each of the items in the "large half," not that the two halves are sorted within themselves.
In other words, if you've done a partial sort through nums.begin()+k, there is no guarantee that nums[k-1] will be the next-smallest (or in your case, the next-largest) number in the entire vector.
With certain inputs, like your {1, 2, 3, 4, 5, 6, 7}, or {9, 4, 1, 8, 5}, you may happen to get the same answers.
However, with many others, like {1, 4, 9, 8, 5}, the results need not match:
int k = 3;
vector<int> nums{1,4,9,8,5};
auto numsCopy = nums;
// First with +k - 1
std::nth_element(nums.begin(), nums.begin()+k-1, nums.end(), [](int& a, int& b){return a > b;});
// Then with only +k
std::nth_element(numsCopy.begin(), numsCopy.begin()+k, numsCopy.end(), [](int& a, int& b){return a > b;});
cout << nums[k-1]; // 5
cout << numsCopy[k-1]; // 9 when this was run; only guaranteed to be one of 9, 8, 5
Can you figure out why that is?
Also, to clearly answer your question about inclusive vs exclusive: as @Daniel Junglas pointed out in the comments, the second argument to std::nth_element is meant to point directly at the position whose element you want. After the call, that position holds the element that would be there if the whole range were sorted. So if it helps you, you can think of that as "inclusive." This is different from the third argument to std::nth_element, the end iterator, which is always exclusive since .end() points beyond the last item in the vector.

How does comparator function of c++ STL sort work?

bool sortbysec(const pair<int,int> &a,const pair<int,int> &b)
{
return (a.second < b.second);
}
vector< pair <int, int> > vect;
int arr[] = {10, 17, 5, 70 };
int arr1[] = {30, 60, 20, 50};
int n = sizeof(arr)/sizeof(arr[0]);
for (int i=0; i<n; i++)
vect.push_back( make_pair(arr[i],arr1[i]));
sort(vect.begin(), vect.end(), sortbysec);
What does return (a.second < b.second) mean?
How does it make the sort order by the second element?
The concept of a sorted sequence s, in abstract, is that for any pair of elements s[i] and s[j], s[i] is not greater than s[j] when i is less than j.
Sorting a sequence is simply rearranging the elements of the sequence to satisfy this definition. Therefore, in order to sort elements, we need to be able to ask if a particular value is less than or greater than some other value -- otherwise we cannot be sure that our arrangement of the sequence satisfies the definition of a "sorted sequence."
std::sort takes a comparison function as a means of answering this question. By default, it is std::less<T>() which simply uses the operator < to compare two elements. By applying this function to two elements, it can determine if they need to be rearranged. Looking at our definition of a sorted sequence, if s[j] < s[i] when i < j then the definition is not met. Swapping those two elements corrects the problem for that specific pair of elements.
By applying this comparison function along with a sorting algorithm, std::sort is able to determine the order that the elements should be in for the sequence to be sorted. That's all the sort function does: it applies this comparison function to pairs of elements and rearranges them until the sequence is sorted.
You can supply any comparison function fn that has strict weak ordering and std::sort will rearrange the elements as necessary to ensure that !fn(s[i], s[j]) is true for all valid index pairs i and j where i > j. This allows you to manipulate the sort function to get specific orders. For example:
If you supply a function that compares using the > operator instead of the < operator, then the sorted sequence will be in descending order.
You can supply a function that compares a specific attribute of the values. If you have a sequence of struct Person { std::string name; int age; } values, you could have a function that compares the ages only, which will sort the sequence by the age attribute.
You could even sort on multiple attributes. If you compare age first and then compare name if the ages are equal, the sequence will be sorted by age -- but within each subsequence where the age is equal, that subsequence is sorted by name.

How does std::set comparator function work?

Currently working on an algorithm problem using set.
set<string> mySet;
mySet.insert("(())()");
mySet.insert("()()()");
//print mySet:
(())()
()()()
Ok great, as expected.
However if I put a comp function that sorts the set by its length, I only get 1 result back.
struct size_comp
{
bool operator()(const string& a, const string& b) const{
return a.size()>b.size();
}
};
set<string, size_comp> mySet;
mySet.insert("(())()");
mySet.insert("()()()");
//print myset
(())()
Can someone explain to me why?
I tried using a multiset, but it's appending duplicates.
multiset<string,size_comp> mSet;
mSet.insert("(())()");
mSet.insert("()()()");
mSet.insert("()()()");
//print mset
"(())()","()()()","()()()"
std::set stores unique values only. Two values a,b are considered equivalent if and only if
!comp(a,b) && !comp(b,a)
or in everyday language, if a is not smaller than b and b is not smaller than a. In particular, only this criterion is used to check for equality, the normal operator== is not considered at all.
So with your comparator, the set can only contain one string of length n for every n.
If you want to allow multiple values that are equivalent under your comparison, use std::multiset. This will of course also allow exact duplicates: under your comparator, "asdf" is just as equivalent to "aaaa" as it is to "asdf".
If that does not make sense for your problem, you need to come up with either a different comparator that induces a proper notion of equality or use another data structure.
A quick fix to get the behavior you probably want (correct me if I'm wrong) would be introducing a secondary comparison criterion like the normal operator>. That way, we sort by length first, but are still able to distinguish between different strings of the same length.
struct size_comp
{
bool operator()(const string& a, const string& b) const{
if (a.size() != b.size())
return a.size() > b.size();
return a > b;
}
};
The comparator template argument, which defaults to std::less<T>, must represent a strict weak ordering relation between values in its domain.
This kind of relation has some requirements:
it's irreflexive (x < x yields false)
it's asymmetric (x < y implies that y < x is false)
it's transitive (x < y && y < z implies x < z)
Taking this further, we can define equivalence between values in terms of this relation: if !(x < y) && !(y < x), then x and y are treated as equivalent, and the container uses this test in place of x == y.
In your situation, for all x, y such that x.size() == y.size(), both comp(x,y) == false and comp(y,x) == false, so since neither x nor y is less than the other, they are considered equivalent.
This equivalence is used to determine whether two items are the same element, so the second insertion in your example is ignored.
To fix this you must make sure that your comparator never returns false for both comp(x,y) and comp(y,x) if you don't want to consider x equal to y, for example by doing
auto cmp = [](const string& a, const string& b) {
if (a.size() != b.size())
return a.size() > b.size();
else
return a < b;
};
So that for inputs of the same length you fall back to normal lexicographic order.
This is because equality of elements is defined by the comparator. An element is considered equal to another if and only if !comp(a, b) && !comp(b, a).
Since the length of "(())()" is neither greater nor less than the length of "()()()", they are considered equal by your comparator. There can be only unique elements in a std::set, so inserting an equivalent object does nothing: the existing element is kept and the new one is discarded.
The default comparator uses operator<, which in the case of strings, performs lexicographical ordering.
Regarding "I tried using a multiset, but it's appending duplicates": a multiset does indeed allow duplicates. Therefore both strings will be contained despite having the same length.
size_comp considers only the length of the strings. The default comparison operator uses lexicographic comparison, which distinguishes based on the content of the string as well as the length.