How does std::sort determine the sorting basis?


I have the following code:
#include <bits/stdc++.h>
using namespace std;
int main () {
pair<int, int> p[4];
p[0] = pair<int, int>(5, 2);
p[1] = pair<int, int>(40, -2);
p[2] = pair<int, int>(-3, 2);
p[3] = pair<int, int>(4, 45);
auto print_pairii = [](pair<int, int> pp[]) {
for (int i = 0; i < 4; i++) {
cout << pp[i].first << " ";
}
cout << endl;
};
print_pairii(p);
sort(p, p + 4);
print_pairii(p);
return 0;
}
The first print_pairii shows 5 40 -3 4.
After sorting the array of pairs, print_pairii shows -3 4 5 40, meaning that the sort was done on the basis of the first element of each pair.
Why does this happen, instead of sorting on the basis of the second element?
How does sort work in this sense?

Because when std::sort is used without a comparator, elements are compared using operator<.
1) Elements are compared using operator<.
And the overloaded operator< for std::pair compares the first elements first, and then the second elements only if the first elements are equal.
Compares lhs and rhs lexicographically, that is, compares the first elements and only if they are equivalent, compares the second elements.

Why does this happen instead of the basis of the second element? How does sort work in this sense?
Because by default std::sort will sort by std::pair::first, then by std::pair::second.
If you want to sort by the second element, you have to provide a custom comparison function. Something like:
sort(p, p + 4,
[](const std::pair<int, int> &x, const std::pair<int, int> &y) {
return x.second < y.second;
});

It's useful to break this down into its components.
std::pair compares lexicographically, as you've just seen: https://en.cppreference.com/w/cpp/utility/pair. std::sort compares (all types) using operator< by default: https://en.cppreference.com/w/cpp/algorithm/sort. Put these two together, and you sort pairs in increasing order of first then second elements.

Related

Find equals value into an array in c++

Is there a faster way to find equal values in an array than comparing every element, one by one, with all the other elements of the array?
for(int i = 0; i < arrayLenght; i ++)
{
for(int k = i + 1; k < arrayLenght; k ++)
{
if(array[i] == array[k])
{
sprintf(message,"There is a duplicate of %s",array[i]);
ShowMessage(message);
break;
}
}
}

Since sorting your container is a possible solution, std::unique is the simplest solution to your problem:
std::vector<int> v {0,1,0,1,2,0,1,2,3};
std::sort(begin(v), end(v));
v.erase(std::unique(begin(v), end(v)), end(v));
First, the vector is sorted. You can use anything; std::sort is just the simplest. After that, std::unique moves the unique elements to the front of the container and returns an iterator to the new logical end; the elements from that iterator onward are left with unspecified values. This iterator is then passed to erase, which removes that tail from the vector.

You could use std::multiset and then count duplicates afterwards like this:
#include <iostream>
#include <set>
int main()
{
const int arrayLenght = 14;
int array[arrayLenght] = { 0,2,1,3,1,4,5,5,5,2,2,3,5,5 };
std::multiset<int> ms(array, array + arrayLenght);
for (auto it = ms.begin(), end = ms.end(); it != end; it = ms.equal_range(*it).second)
{
int cnt = 0;
if ((cnt = ms.count(*it)) > 1)
std::cout << "There are " << cnt << " of " << *it << std::endl;
}
}
https://ideone.com/6ktW89
There are 2 of 1
There are 3 of 2
There are 2 of 3
There are 5 of 5

If the value_type of this array can be sorted by operator< (a strict weak order), it's a good choice to do as YSC answered.
If not, maybe you can define a hash function to hash the objects to different values. Then you can do this in O(n) time complexity, like:
struct ValueHash
{
size_t operator()(const Value& rhs) const{
//do_something
}
};
struct ValueCmp
{
bool operator()(const Value& lhs, const Value& rhs) const{
//do_something
}
};
unordered_set<Value,ValueHash,ValueCmp> myset;
for(int i = 0; i < arrayLenght; i ++)
{
if(myset.find(array[i])==myset.end())
myset.insert(array[i]);
else
dosomething();
}

If you have a large amount of data, you can first sort the array (quicksort gives you a first pass in O(n*log(n)) on average) and then make a second pass comparing each value with the next one, since equal values end up adjacent after sorting. That second pass is sequential and O(n). Sorting and then scanning the sorted array for duplicates therefore costs O(n*log(n) + n), which is O(n*log(n)).
EDIT
An alternative has been suggested in the comments: using a set to check for already processed data. The algorithm just goes element by element, checking whether the element has been seen before. This can lead to an O(n) algorithm, but only if you take care to use a hash set. If you use a sorted set (std::set), each lookup costs O(log(n)) and you end up with the same O(n*log(n)). With a hash set (std::unordered_set) you avoid the extra access time per search and get O(n) overall, on average. Of course, you have to account for possible automatic growth of the hash table, or for the memory wasted by an oversized one.
Thanks to @freakish, who pointed out the set solution in the comments to the question.

How to initialize unordered_map directly with fixed element?

I want to initialize an unordered_map with a fixed number of elements, 100. The keys are from 0 to 99 (as in the loop below), and the values of all those keys are 0.
using HashMap = unordered_map < int, int > ;
HashMap map;
for (int idx = 0; idx < 100; ++idx) {
map[idx] = 0;
}
Question 1:
Is there a direct way to do that, like the following Python code?
d = {x: x % 2 == 0 for x in range(1, 11)}
Question 2:
With the initialization code above, I expected all elements to be in ascending order, but the results are:
Why is the first element 8 and the second element 64, while all the remaining elements are in ascending order?

This is not quite so pretty as the Python expression, but it should do the trick.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <unordered_map>
int main() {
std::unordered_map<int, bool> m;
int i = -1;
std::generate_n(std::inserter(m, m.begin()),
10,
[&i](){
++i;
return std::make_pair(i, i % 2 == 0);
});
for (auto const &p: m)
std::cout << '<' << p.first << ", " << p.second << ">\n";
return 0;
}
Live on ideone.com
There is a reason unordered maps are called unordered maps. Since they are implemented as hash maps, the keys are not in any predictable order. Using an std::unordered_map for a dense collection of integer keys is probably not the most efficient solution to any problem, particularly if you expect to be able to extract the keys in order.

Consider boost::irange.
The internal data structure of an unordered map is a hash table, which does not preserve key order.

Position of median within a list

I have an unsorted array and I need the position of the median. I know there are several algorithms to calculate the median of a given array in O(n), but all of them include some kind of reordering of the array, like in median of medians and random selection.
I'm not interested in the median itself; only its position within the array interests me.
Is there any way I can do this in O(n)? Keeping track of all the swaps will create a massive overhead, so I'm looking for another solution.

Let's say you have an array of data, and you would like to find its median:
double data[MAX_DATA] = ...
Create an array of indexes, and initialize each index to its own position, like this:
int index[MAX_DATA];
for (int i = 0 ; i != MAX_DATA ; i++) {
index[i] = i;
}
Now implement the linear median algorithm with the following changes:
When the original algorithm compares data[i] to data[j], replace with a comparison of data[index[i]] to data[index[j]]
When the original algorithm swaps data[i] and data[j], swap index[i] and index[j] instead.
Since the elements of data remain in their place all the time, the modified algorithm will produce the position of the median in the unmodified array, rather than its position in the array with some elements moved to different spots.
In C++ you can implement this with pointers instead of indexes, and use std::nth_element on the container of pointers, like this:
vector<int> data = {1, 5, 2, 20, 10, 7, 9, 1000};
vector<const int*> ptr(data.size());
transform(data.begin(), data.end(), ptr.begin(), [](const int& d) {return &d;});
auto mid = next(ptr.begin(), data.size() / 2);
nth_element(ptr.begin(), mid, ptr.end(), [](const int* lhs, const int* rhs) {return *lhs < *rhs;});
ptrdiff_t pos = *mid - &data[0];
cout << pos << endl << data[pos] << endl;
Here is a link to a demo on ideone.

Here's a working example that generates a secondary array of indices and finds the median of the input array through std::nth_element and an indirect comparison:
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
#include <iterator>
int main()
{
// input data, big and expensive to sort or copy
std::string big_data[] = { "hello", "world", "I", "need", "to", "get", "the", "median", "index" };
auto const N = std::distance(std::begin(big_data), std::end(big_data));
auto const M = (N - 1) / 2; // 9 elements, median is at index 4 of the sorted array
// generate indices
std::vector<int> indices;
auto value = 0;
std::generate_n(std::back_inserter(indices), N, [&](){ return value++; });
// find median of input array through indirect comparison and sorting
std::nth_element(indices.begin(), indices.begin() + M, indices.end(), [&](int lhs, int rhs){
return big_data[lhs] < big_data[rhs];
});
std::cout << indices[M] << ":" << big_data[indices[M]] << "\n";
// check, sort input array and confirm it has the same median
std::sort(std::begin(big_data), std::end(big_data));
std::cout << M << ":" << big_data[M] << "\n";
}
Online output.
This algorithm has O(N) complexity on average: it is the sum of std::generate_n, which is O(N), and std::nth_element, which is O(N) on average in the size of its input.

There is an O(n log n) algorithm for keeping track of the median of an infinite stream of numbers. (As you don't want to alter the list, you can just as well treat it as a stream.) The algorithm involves two heaps: the top of one is always the maximum number in the lower half, and the top of the other is the minimum number in the higher half. The algorithm is explained here: http://www.ardendertat.com/2011/11/03/programming-interview-questions-13-median-of-integer-stream/. You can use the same code with minimal customization.

Is there a sorted container in the STL?

Is there a sorted container in the STL?
What I mean is following: I have an std::vector<Foo>, where Foo is a custom made class. I also have a comparator of some sort which will compare the fields of the class Foo.
Now, somewhere in my code I am doing:
std::sort( myvec.begin(), myvec.end(), comparator );
which will sort the vector according to the rules I defined in the comparator.
Now I want to insert an element of class Foo into that vector. If I could, I would like to just write:
mysortedvector.push_back( Foo() );
and what would happen is that the vector will put this new element according to the comparator to its place.
Instead, right now I have to write:
myvec.push_back( Foo() );
std::sort( myvec.begin(), myvec.end(), comparator );
which is just a waste of time, since the vector is already sorted and all I need is to place the new element appropriately.
Now, because of the nature of my program, I can't use std::map<> as I don't have a key/value pairs, just a simple vector.
If I use std::list, I again need to call sort after every insertion.

Yes, std::set, std::multiset, std::map, and std::multimap are all sorted using std::less as the default comparison operation. The underlying data structure used is typically a balanced binary search tree such as a red-black tree. So if you add an element to these data structures and then iterate over the contained elements, the output will be in sorted order. The complexity of adding N elements to the data structure will be O(N log N), or the same as sorting a vector of N elements using any common O(N log N) complexity sort.
In your specific scenario, since you don't have key/value pairs, std::set or std::multiset is probably your best bet.

I'd like to expand on Jason's answer. I agree with Jason that either std::set or std::multiset is the best choice for your specific scenario. I'd like to provide an example to help you narrow down the choice further.
Let's assume that you have the following class Foo:
class Foo {
public:
Foo(int v1, int v2) : val1(v1), val2(v2) {};
bool operator<(const Foo &foo) const { return val2 < foo.val2; }
int val1;
int val2;
};
Here, Foo overloads the < operator. This way, you don't need to specify an explicit comparator function. As a result, you can simply use a std::multiset instead of a std::vector in the following way. You just have to replace push_back() by insert():
int main()
{
std::multiset<Foo> ms;
ms.insert(Foo(1, 6));
ms.insert(Foo(1, 5));
ms.insert(Foo(3, 4));
ms.insert(Foo(2, 4));
for (auto const &foo : ms)
std::cout << foo.val1 << " " << foo.val2 << std::endl;
return 0;
}
Output:
3 4
2 4
1 5
1 6
As you can see, the container is sorted by the member val2 of the class Foo, based on the < operator. However, if you use std::set instead of a std::multiset, then you will get a different output:
int main()
{
std::set<Foo> s;
s.insert(Foo(1, 6));
s.insert(Foo(1, 5));
s.insert(Foo(3, 4));
s.insert(Foo(2, 4));
for (auto const &foo : s)
std::cout << foo.val1 << " " << foo.val2 << std::endl;
return 0;
}
Output:
3 4
1 5
1 6
Here, the second Foo object where val2 is 4 is missing, because a std::set only allows for unique entries. Whether entries are unique is decided based on the provided < operator. In this example, the < operator compares the val2 members to each other. Therefore, two Foo objects are equal, if their val2 members have the same value.
So, your choice depends on whether or not you want to store Foo objects that may be equal based on the < operator.
Code on Ideone

C++ does have sorted containers, e.g. std::set and std::map:
int main()
{
//ordered set
set<int> s;
s.insert(5);
s.insert(1);
s.insert(6);
s.insert(3);
s.insert(7);
s.insert(2);
cout << "Elements of set in sorted order: ";
for (auto it : s)
cout << it << " ";
return 0;
}
Output:
Elements of set in sorted order: 1 2 3 5 6 7
int main()
{
// Ordered map
std::map<int, int> order;
// Mapping values to keys
order[5] = 10;
order[3] = 5;
order[20] = 100;
order[1] = 1;
// Iterating the map and printing ordered values
for (auto i = order.begin(); i != order.end(); i++) {
std::cout << i->first << " : " << i->second << '\n';
}
return 0;
}
Output:
1 : 1
3 : 5
5 : 10
20 : 100

how would I sort a list and get the top K elements? (STL)

I have a vector of doubles. I want to sort it from highest to lowest, and get the indices of the top K elements. std::sort just sorts in place, and does not return the indices I believe. What would be a quick way to get the top K indices of largest elements?

You could use the nth_element STL algorithm: with a comparator such as std::greater it partitions the range so that the K greatest elements come first (this is the fastest way using the STL), and you can then call sort on just those K elements. Alternatively, you could use the partial_sort algorithm if you want the first K elements to be sorted.
Using just sort is a poor fit for this purpose: it is a great STL algorithm, but for sorting the whole container, not just the first K elements. nth_element and partial_sort exist for exactly this reason.

The first thing that comes to mind is somewhat hackish, but you could define a struct that stored both the double and its original index, then overload the < operator to sort based on the double:
struct s {
double d;
int index;
bool operator < (const s &other) const {
return d < other.d;
}
};
Then you could retrieve the original indices from the struct.
Fuller example:
vector<double> orig;
vector<s> v;
...
for (int i=0; i < orig.size(); ++i) {
s s_temp;
s_temp.d = orig[i];
s_temp.index = i;
v.push_back(s_temp);
}
sort(v.begin(), v.end());
//now just retrieve v[i].index
This will leave them sorted from smallest to largest, but you could overload the > operator instead and then pass in greater to the sort function if wanted.

OK, how about this?
bool isSmaller (std::pair<double, int> x, std::pair<double, int> y)
{
return x.first< y.first;
}
int main()
{
//...
//you have your vector<double> here, say name is d;
std::vector<std::pair<double, int> > newVec(d.size());
for(int i = 0; i < newVec.size(); ++i)
{
newVec[i].first = d[i];
newVec[i].second = i; //store the initial index
}
std::sort(newVec.begin(), newVec.end(), &isSmaller);
//now you can iterate through first k elements and the second components will be the initial indices
}

Not sure about pre-canned algorithms, but take a look at selection algorithms; if you need the top K elements of a set of N values and N is much larger than K, there are much more efficient methods.
If you can create an indexing class (like @user470379's answer: basically a class that encapsulates a read-only pointer/index to the "real" data), then use a priority queue of maximum size K: add each unsorted element to the priority queue, and pop off the bottom-most element whenever the queue reaches size K+1. In cases like N = 10^6, K = 100, this is much simpler and more efficient than a full sort.

So you actually need a structure that maps indices to corresponding doubles.
You could use the std::multimap class to perform this mapping. As Jason has noted, std::map does not allow duplicate keys.
std::vector<double> v; // assume it is populated already
std::multimap<double, int> m;
for (int i = 0; i < v.size(); ++i)
m.insert(std::make_pair(v[i], i));
...
After you've done this, you can iterate over the last K elements (for example with reverse iterators) to get the largest values, since a multimap keeps its elements sorted by key in ascending order.

Use multimap for vector's (value, index) to handle dups. Use reverse iterators to walk results in descending order.
#include <iostream>
#include <map>
#include <vector>
using namespace std;
int main()
{
multimap<double, size_t> indices;
vector<double> values;
values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);
size_t i = 0;
for(vector<double>::const_iterator iter = values.begin();
iter != values.end(); ++iter, ++i)
{
// store (value, original index); a multimap keeps duplicate values
indices.insert(make_pair(*iter, i));
}
i = 0;
size_t limit = 2;
// walk from the largest key downwards, stopping after limit entries
for (multimap<double, size_t>::const_reverse_iterator iter = indices.rbegin();
iter != indices.rend() && i < limit; ++iter, ++i)
{
cout << "Value " << iter->first << " index " << iter->second << endl;
}
return 0;
}
Output is
Value 4 index 3
Value 3 index 2
If you just want the vector indices after sort, use this:
#include <algorithm>
#include <vector>
using namespace std;
vector<double> values;
values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);
sort(values.rbegin(), values.rend());
The top K entries are indexed by 0 to K-1, and appear in descending order. This uses reverse iterators combined with the standard sort (using less<double>) to achieve descending order when iterated forward. Equivalently:
sort(values.rbegin(), values.rend(), less<double>());
Sample code for the excellent nth_element solution suggested by @Kiril here (K = 125000, N = 500000). I wanted to try this out, so here it is.
vector<double> values;
for (size_t i = 0; i < 500000; ++i)
{
values.push_back(rand());
}
nth_element(values.begin(), values.begin()+375000, values.end());
sort(values.begin()+375000, values.end());
vector<double> results(values.rbegin(), values.rbegin() + values.size() - 375000);
