Will std::sort always compare equal values?

Will std::sort always compare equal values? - c++

I am doing the following problem on leetcode: https://leetcode.com/problems/contains-duplicate/
Given an integer array nums, return true if any value appears at least
twice in the array, and return false if every element is distinct.
The solution I came up to the problem is the following:
class Solution {
public:
bool containsDuplicate(vector<int>& nums) {
try {
std::sort(nums.begin(), nums.end(), [](int a, int b) {
if (a == b) {
throw std::runtime_error("found duplicate");
}
return a < b;
});
} catch (const std::runtime_error& e) {
return true;
}
return false;
}
};
It was accepted on leetcode but I am still not sure if it will always work. The idea is to start sorting nums array and interrupt as soon as duplicate values are found inside comparator. Sorting algorithm can compare elements in many ways. I expect that equal elements will be always compared but I am not sure about this. Will std::sort always compare equal values or sometimes it can skip comparing them and therefore duplicate values will not be found?

Will std::sort always compare equal values or sometimes it can skip comparing them and therefore duplicate values will not be found?
Yes, some equal value elements will always be compared if duplicates do exist.
Let us assume the opposite: initial array of elements {e} for sorting contains a subset of elements having the same value and a valid sorting algorithm does not call comparison operator < for any pair of the elements from the subset.
Then we construct same sized array of tuples {e,k}, with the first tuple value from the initial array and arbitrary selected second tuple value k, and apply the same sorting algorithm using the lexicographic comparison operator for the tuples. The order of tuples after sorting can deviate from the order of sorted elements {e} only for same value elements, where in the case of array of tuples it will depend on second tuple value k.
Since we assumed that the sorting algorithm does not compare any pair of same value elements, then it will not compare the tuples with the same first tuple value, so the algorithm will be unable to sort them properly. This contradicts our assumptions and proves that some equal value elements (if they exist in the array) will always be compared during sorting.

Related

What's the logic behind the order the elements are passed to a comparison function in std::sort?

I'm practicing lambdas:
int main()
{
std::vector<int> v {1,2,3,4};
int count = 0;
sort(v.begin(), v.end(), [](const int& a, const int& b) -> bool
{
return a > b;
});
}
This is just code from GeeksForGeeks to sort in descending order, nothing special. I added some print statements (but took them out for this post) to see what was going on inside the lambda. They print the entire vector, and the a and b values:
1 2 3 4
a=2 b=1
2 1 3 4
a=3 b=2
3 2 1 4
a=4 b=3
4 3 2 1 <- final
So my more detailed question is:
What's the logic behind the order the vector elements are being passed into the a and b parameters?
Is b permanently at index 0 while a is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element? Is it compiler-specific? Thanks!

By passing a predicate to std::sort(), you are specifying your sorting criterion. The predicate must return true if the first parameter (i.e., a) precedes the second one (i.e., b), for the sorting criterion you are specifying.
Therefore, for your predicate:
return a > b;
If a is greater than b, then a will precede b.
So my more detailed question is: What's the logic behind the order the vector elements are being passed into the a and b parameters?
a and b are just pairs of elements of the elements you are passing to std::sort(). The "logic" will depend on the underlying algorithm that std::sort() implements. The pairs may also differ for calls with identical input due to randomization.

Is 'b' permanently at index 0 while 'a' is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element?
No, because the first element is the higher.
Seems that, with this algorithm, all elements are checked (and maybe switched) with the higher one (at first round) and the higher one is placed in first position; so b ever points to the higher one.

For Visual Studio, std::sort uses insertion sort if the sub-array size is <= 32 elements. For a larger sub-array, it uses intro sort, which is quick sort unless the "recursion" depth gets too deep, in which case it switches to heap sort. The output you program produces appears to correspond to some variation of insertion sort. Since the compare function is "less than", and since insertion sort is looking for out of order due to left values "greater than" right values, the input parameters are swapped.

You just compare two elements, with a given ordering. This means that if the order is a and then b, then the lambda must return true.
The fact that a or b are the first or the last element of the array, or fixed, depends on the sorting algorithm and of course of your data!

Sorting elements by decreasing frequency of occurence

I am trying to solve the question:
Print the elements of an array in the decreasing frequency if 2 numbers have same frequency then print the one which came first. (https://www.geeksforgeeks.org/sort-elements-by-frequency/)
I am trying to implement the solution on my own. I have thought of creating the following data structure:
map<int,pair<int,int>> mymap
I am storing the number itself in the first int, and I am storing the index and count of the number in the array in the pair in the above map.
I want to write a custom comparator for sorting the pairs, something like this:
bool cmp(pair<int,int>&a, pair<int,int>&b)
{
if (a.first == b.first)
return a < b;
else
return a > b;
}
I am still learning to write custom comparators. I am not able to wrap around my head, that how can I pass the comparator for sorting the map. Also, if pairs are sorted, then will the key in the map be sorted alongside?
Please let me know! Thanks!

You don't need to use maps for this, or better, not in this way. You can use a basic array arr that will contain elements and than use a map cnt<int,int> which will keep the number of occurrences of each element in the array and another one firstIndex<int,int> which will keep the index of the first appearance of the element. In this case the sorting function becomes simply:
bool cmp(int a, int b)
{
if(cnt[a] != cnt[b]){
return cnt[a] > cnt[b];
} else {
return firstIndex[a] < firstIndex[b];
}
}
use it like this:
sort(arr, arr+n, cmp);
where n is the number of elements in the array.

Sort vector of vectors with doubles according to a column in C++

I have a matrix consisting of a vector of which each element representing the rows is composed of a vector representing the columns of the matrix. I would like to sort the rows according to the 1st column.
Each element inside this matrix is a double, although the first column contains a number that serves as an identifier (but is not unique).
My goal is to have something like the aggregate functions available in SQL, such as count() and sum() when I group by the first column.
For instance, if I have:
ID VALUE
1 10
2 20
1 30
2 40
3 60
I would like to get:
ID COUNT MEAN
1 2 20
2 2 30
3 1 60
However, I am stuck in the very first step: how do I sort the rows according to the value of the first element of each row?
I found a clue on this topic, and changed adapted the comparator to:
bool compareFunction (double i,double j)
{
return (i<j);
}
But the compiler was not very happy about that (making a reference to the stl_algo.h file):
error: cannot convert 'std::vector<double>' to 'double' in argument passing
I was therefore wondering if there is a way to sort such a vector of vectors when it contains doubles.

Answer (imho): use a different datastructure. What you are trying to do is setup a multimap. Oh hey look:
http://www.cplusplus.com/reference/map/multimap/
stl::multimap - how do i get groups of data?
It'll be faster for large numbers of elements. And is actually a map rather than a vector of vector of double.
Either that, or skip the sorting all together, and count by key using std::map, std::unordered_map, or (if you know the number of keys and/or the keys are offset by 1 with no breaks) std::vector.
To expand, sorting your list to get means will be slow. Sorting (using std::sort) is O(nlogn), and will be O(nlogn) every time you compute this mean. And it is an unessisary step: your stuff is grouped by key reguardless of order. std::map and std::multimap will "sort as you go" which will be just a little faster than sorting every time, but you won't have to sort the whole thing to get the list. Then you can just iterate the multimap to get the mean, O(n) each mean calculation. (It is still O(nlg(n)) to add all the elements to the multimap)
But if you know the key output is going to be 1,2,3...n-1,n, than sorting is a complete waste of time. Just make a counter for each key (since you know what the keys can be) and add to the key while iterating the array.
BUT WAIT THERE IS MORE
If the keys are actually setup the way you are thinking, than the best way from the get go is to forget the table structure, and make build it like this:
Index VALUE
0 10,30
1 20,40
2 60
Count is now constant time for each row. Mean for each row is O(n). Getting a list is constant time for each row. EVERYBODY WINS.

You need to create a comparator function comparing vector<double>:
struct VecComp {
bool operator()(const vector<double>& _a, const vector<double>& _b) {
//compare first elements
}
}
Then you can use std::sort on your structure with the new comparator function:
std::sort(myMat.begin(), myMat.end(), VecComp());
If you are using c++11 features you can also utilize lambda functions here:
std::sort(myMat.begin(), myMat.end(), [](const vector<double>& a, const vector<double>& b) {
//compare the first elements
}
);

You need to write your own comparator functor to pass into your vector declaration:
struct comp {
bool operator() (const std::vector<double>& i,
const std::vector<double>& j) {
return i[0] < j[0];
}

Have you tried just this?:
std::sort(vecOfVecs.begin(), vecOfVecs.end());
That should work as std::vector has operator< which provides lexicographical sorting, which is (a little more specific than) what you want.

An fast algorithm for sorting and shuffling equal valued entries (preferably by STL's)

I'm currently developing stochastic optimization algorithms and have encountered the following issue (which I imagine appears also in other places): It could be called totally unstable partial sort:
Given a container of size n and a comparator, such that entries may be equally valued.
Return the best k entries, but if values are equal, it should be (nearly) equally probable to receive any of them.
(output order is irrelevant to me, i.e. equal values completely among the best k need not be shuffled. To even have all equal values shuffled is however a related, interesting question and would suffice!)
A very (!) inefficient way would be to use shuffle_randomly and then partial_sort, but one actually only needs to shuffle the block of equally valued entries "at the selection border" (resp. all blocks of equally valued entries, both is much faster). Maybe that Observation is where to start...
I would very much prefer, if someone could provide a solution with STL algorithms (or at least to a large portion), both because they're usually very fast, well encapsulated and OMP-parallelized.
Thanx in advance for any ideas!

You want to partial_sort first. Then, while elements are not equal, return them. If you meet a sequence of equal elements which is larger than the remaining k, shuffle and return first k. Else return all and continue.

Not fully understanding your issue, but if you it were me solving this issue (if I am reading it correctly) ...
Since it appears you will have to traverse the given object anyway, you might as well build a copy of it for your results, sort it upon insert, and randomize your "equal" items as you insert.
In other words, copy the items from the given container into an STL list but overload the comparison operator to create a B-Tree, and if two items are equal on insert randomly choose to insert it before or after the current item.
This way it's optimally traversed (since it's a tree) and you get the random order of the items that are equal each time the list is built.
It's double the memory, but I was reading this as you didn't want to alter the original list. If you don't care about losing the original, delete each item from the original as you insert into your new list. The worst traversal will be the first time you call your function since the passed in list might be unsorted. But since you are replacing the list with your sorted copy, future runs should be much faster and you can pick a better pivot point for your tree by assigning the root node as the element at length() / 2.
Hope this is helpful, sounds like a neat project. :)

If you really mean that output order is irrelevant, then you want std::nth_element, rather than std::partial_sort, since it is generally somewhat faster. Note that std::nth_element puts the nth element in the right position, so you can do the following, which is 100% standard algorithm invocations (warning: not tested very well; fencepost error possibilities abound):
template<typename RandomIterator, typename Compare>
void best_n(RandomIterator first,
RandomIterator nth,
RandomIterator limit,
Compare cmp) {
using ref = typename std::iterator_traits<RandomIterator>::reference;
std::nth_element(first, nth, limit, cmp);
auto p = std::partition(first, nth, [&](ref a){return cmp(a, *nth);});
auto q = std::partition(nth + 1, limit, [&](ref a){return !cmp(*nth, a);});
std::random_shuffle(p, q); // See note
}
The function takes three iterators, like nth_element, where nth is an iterator to the nth element, which means that it is begin() + (n - 1)).
Edit: Note that this is different from most STL algorithms, in that it is effectively an inclusive range. In particular, it is UB if nth == limit, since it is required that *nth be valid. Furthermore, there is no way to request the best 0 elements, just as there is no way to ask for the 0th element with std::nth_element. You might prefer it with a different interface; do feel free to do so.
Or you might call it like this, after requiring that 0 < k <= n:
best_n(container.begin(), container.begin()+(k-1), container.end(), cmp);
It first uses nth_element to put the "best" k elements in positions 0..k-1, guaranteeing that the kth element (or one of them, anyway) is at position k-1. It then repartitions the elements preceding position k-1 so that the equal elements are at the end, and the elements following position k-1 so that the equal elements are at the beginning. Finally, it shuffles the equal elements.
nth_element is O(n); the two partition operations sum up to O(n); and random_shuffle is O(r) where r is the number of equal elements shuffled. I think that all sums up to O(n) so it's optimally scalable, but it may or may not be the fastest solution.
Note: You should use std::shuffle instead of std::random_shuffle, passing a uniform random number generator through to best_n. But I was too lazy to write all the boilerplate to do that and test it. Sorry.

If you don't mind sorting the whole list, there is a simple answer. Randomize the result in your comparator for equivalent elements.
std::sort(validLocations.begin(), validLocations.end(),
[&](const Point& i_point1, const Point& i_point2)
{
if (i_point1.mX == i_point2.mX)
{
return Rand(1.0f) < 0.5;
}
else
{
return i_point1.mX < i_point2.mX;
}
});

STL "closest" method?

I'm looking for an STL sort that returns the element "closest" to the target value if the exact value is not present in the container. It needs to be fast, so essentially I'm looking for a slightly modified binary search... I could write it, but it seems like something that should already exist...

Do you mean the lower_bound/upper_bound functions? These perform a binary search and return the closest element above the value you're looking for.
Clarification: The global versions of lower/upper_bound only work if the range is sorted, as they use some kind of binary search internally. (Obviously, the lower/upper_bound methods in std::map always work). You said in your question that you were looking for some kind of binary search, so I'll assume the range is sorted.
Also, Neither lower_bound nor upper_bound returns the closest member. If the value X you're looking for isn't a member of the range, they will both return the first element greater then X. Otherwise, lower_bound will return the first value equal to X, upper_boundwill return the last value equals X.
So to find the closest value, you'd have to
call lower_bound
if it returns the end of the range, all values are less then X. The last (i.e. the highest) element is the closest one
it if returns the beginning of the range, all values are greater then X. The first (i.e. the lowest) element is the closest one
if it returns an element in the middle of the range, check that element and the element before - the one that's closer to X is the one you're looking for

So you're looking for an element which has a minimal distance from some value k?
Use std::transform to transform each x to x-k. The use std::min_element with a comparison function which returns abs(l) < abs(r). Then add k back onto the result.
EDIT: Alternatively, you could just use std::min_element with a comparison function abs(l-k) < abs(r-k), and eliminate the std::transform.
EDIT2: This is good for unsorted containers. For sorted containers, you probably want nikie's answer.

If the container is already sorted (as implied) you should be able to use std::upper_bound and the item directly before to figure out which is closest:
// Untested.
template <class Iter, class T>
Iter closest_value(Iter begin, Iter end, T value)
{
Iter result = std::upper_bound(begin, end, value);
if(result != begin)
{
Iter lower_result = result;
--lower_result;
if(result == end || ((value - *lower_result) < (*result - value)))
{
result = lower_result;
}
}
return result;
}
If the container is not sorted, use min_element with a predicate as already suggested.

If your data is not sorted, use std::min_element with a comparison functor that calculates your distance.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js