Does qStableSort preserve order of equivalent elements? - c++

Suppose I have a QList of 100 MyItem objects inserted in a certain order. Every MyItem has an associated timestamp and some property p, which is not guaranteed to be unique.
struct MyItem {
enum MyProperty { ONE, TWO, THREE };
double timestamp; //unique
MyProperty p; //non-unique
bool operator<(const MyItem& other) const {
return p < other.p;
}
};
Supposing I added my 100 objects in chronological order, if I were to run qStableSort on that container (thereby sorting by p), do I have a guarantee that for a given value of p that they are still in chronological order?

https://en.wikipedia.org/wiki/Category:Stable_sorts
Stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a sorting algorithm is stable if whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list.
Therefore the keyword stable in qStableSort is referring exactly to what you're asking for.
Note, however, that qStableSort is obsolete as of Qt 5.5:
Use std::stable_sort instead.
Sorts the items in range [begin, end) in ascending order using a stable sorting algorithm.
If neither of the two items is "less than" the other, the items are taken to be equal. The item that appeared before the other in the original container will still appear first after the sort. This property is often useful when sorting user-visible data.
As per the Qt documentation, you should prefer to use std::stable_sort
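The guarantee can be checked with a minimal sketch. It uses std::stable_sort on a std::vector<MyItem> instead of a QList so it compiles without Qt (QList provides random-access iterators, so the same call should work on it). The MyItem type mirrors the question's struct, and sortByProperty is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Mirrors MyItem from the question; std::vector stands in for QList so the
// sketch compiles without Qt.
struct MyItem {
    enum MyProperty { ONE, TWO, THREE };
    double timestamp; // unique
    MyProperty p;     // non-unique
    bool operator<(const MyItem& other) const {
        return p < other.p; // compares only p, so items with equal p are equivalent
    }
};

// Hypothetical helper: sorts by p. Items with the same p keep their original
// relative order (chronological, if they were appended in timestamp order),
// because std::stable_sort never reorders equivalent elements.
void sortByProperty(std::vector<MyItem>& items) {
    std::stable_sort(items.begin(), items.end()); // uses MyItem::operator<
}
```

For example, items appended with timestamps 1, 2, 3, 4 and properties TWO, ONE, TWO, ONE come out as 2, 4 (both ONE) followed by 1, 3 (both TWO).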

Related

When is a multiset sorted? Insertion, iteration, both?

I have a multiset containing pointers to custom types. I have provided a custom comparator to the multiset that compares a particular attribute of the custom type.
If I change the value of the attribute on any given item (in a way that would influence the sorting order), do I have to remove the item from the set and re-insert it to guarantee ordering? Or will I still get the items in order any time I create an iterator (or use a foreach loop)?
I could run a quick test myself, but I want to know whether the behavior is standard and therefore consistent on any platform and compiler.
Edit: Here is an example I tried. I noticed two things.
In a multiset, if I change the value that is used for comparison before removing the key, I can no longer remove it. Otherwise, my original idea of removing and re-inserting seems to be the best way to make this work.
#include <stdio.h>
#include <set>
struct NodePointerCompare;
struct Node {
int priority;
};
struct NodePointerCompare {
bool operator()(const Node* lhs, const Node* rhs) const {
return lhs->priority < rhs->priority;
}
};
int main()
{
Node n1{1};
Node n2{2};
Node n3{3};
std::multiset<Node*, NodePointerCompare> nodes;
nodes.insert(&n1);
nodes.insert(&n2);
nodes.insert(&n3);
printf("First round\n");
for(Node* n : nodes) {
printf("%d\n", n->priority);
}
n1.priority = 10;
printf("Second round\n");
for(Node* n : nodes) {
printf("%d\n", n->priority);
}
n1.priority = 1;
printf("Third round\n");
nodes.erase(&n1);
n1.priority = 10;
nodes.insert(&n1);
for(Node* n : nodes) {
printf("%d\n", n->priority);
}
return 0;
}
This is the output I get:
First round
1
2
3
Second round
10
2
3
Third round
2
3
10
http://eel.is/c++draft/associative.reqmts#general-3
For any two keys k1 and k2 in the same container, calling comp(k1, k2) shall always return the same value.
It is simply illegal to change the object in a way that affects how it compares to other objects within the associative container.
If you want to do that, you have to get the object out of the container, apply the change to it, and put it back in. Have a look at https://en.cppreference.com/w/cpp/container/multiset/extract if that's what you want to do.
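A minimal sketch of the extract-modify-reinsert approach for the question's std::multiset<Node*, NodePointerCompare> (C++17 node handles). The helper name reprioritize is hypothetical. Because several nodes may share a priority, it looks the node up while its priority is still the old one and then searches the equal range for the exact pointer.

```cpp
#include <set>
#include <utility>

struct Node {
    int priority;
};

struct NodePointerCompare {
    bool operator()(const Node* lhs, const Node* rhs) const {
        return lhs->priority < rhs->priority;
    }
};

using NodeSet = std::multiset<Node*, NodePointerCompare>;

// Hypothetical helper: changes node->priority without the container ever
// holding an element whose ordering key changed. Returns false if node is absent.
bool reprioritize(NodeSet& nodes, Node* node, int newPriority) {
    // Search while the priority is still the old one; equal_range yields all
    // nodes with that priority, so scan it for this exact pointer.
    auto range = nodes.equal_range(node);
    for (auto it = range.first; it != range.second; ++it) {
        if (*it == node) {
            auto handle = nodes.extract(it);        // unlink, no deallocation
            handle.value()->priority = newPriority; // safe: not in the container now
            nodes.insert(std::move(handle));        // relink at the right position
            return true;
        }
    }
    return false;
}
```

With the question's n1, n2, n3, calling reprioritize(nodes, &n1, 10) makes iteration yield 2, 3, 10, which is the "Third round" output but without erasing and re-allocating the tree node.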
When is a multiset sorted? Insertion, iteration, both?
The standard doesn't specify explicitly, but practically speaking the ordering must be established on insertion.
If I change the value of the attribute on any given item (in a way that would influence the sorting order), do I have to remove the item from the set and re-insert it to guarantee ordering?
You may not change the ordering of an element while it is in the set.
However, instead of erase + insert of an element with a different value, you can extract + modify + re-insert, which should be slightly more efficient (or significantly, depending on the element type).
Here is an example I tried.
The behaviour of the example is undefined.
The container must remain sorted at all times because begin has constant complexity. Changing the comparison order of elements in the container is undefined behavior per [associative.reqmts.general]/3 (and [res.on.functions]/2.3):
For any two keys k1 and k2 in the same container, calling comp(k1, k2) shall always return the same value.
You can use node handles to efficiently modify elements by temporarily removing them from the container, although for elements that are just pointers the only efficiency is avoiding a memory (de)allocation.

Vector Sort Algorithm, sort only elements bigger than 0

I have to sort a vector of structs. Let's say the struct has two members:
struct game
{
string name;
int rating;
};
So I have created a std::vector<game> games and simply sort them by rating.
std::sort(games.begin(),games.end(), [](game& info1, game& info2)
{
return info1.rating > info2.rating;
});
Everything is alright so far.
The problem is that if all games have a rating of 0, their order gets mixed up. Simply put, I have to sort only the elements with a rating greater than zero. Let me give you an example:
All games are pushed into the vector in alphabetical order by name, with rating 0. When a sort is triggered, the alphabetical order is violated.
Example before sort:
"A_Game", "B_Game", "C_Game", "E_Game", "G_Game", etc. (continue with all next letters)
after sort (all games are with rating 0):
"G_Game", "S_Game", "P_Game", "M_Game", "L_Game", "I_Game", etc.
I need to sort only these games that have rating bigger than 0.
Thanks in advance.
You can use std::stable_sort to prevent moving around elements that are not affected by the sorting criteria.
std::stable_sort(games.begin(),games.end(), [](game& info1, game& info2)
{
return info1.rating > info2.rating;
});
std::sort() is not a stable sorting algorithm, i.e., elements with equivalent keys may not preserve the original order between them after being sorted.
You can use std::stable_sort() instead of std::sort():
std::stable_sort(games.begin(),games.end(), [](game& info1, game& info2)
{
return info1.rating > info2.rating;
});
As its name suggests, std::stable_sort() implements a stable sorting algorithm.
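A self-contained sketch of the std::stable_sort approach, with the comparator taking const references. sortByRating is a hypothetical helper name; the game struct follows the question.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct game {
    std::string name;
    int rating;
};

// Hypothetical helper: highest rating first. Games with equal ratings
// (e.g. all 0) keep the order they had before the call, such as the
// alphabetical order they were pushed in.
void sortByRating(std::vector<game>& games) {
    std::stable_sort(games.begin(), games.end(),
                     [](const game& a, const game& b) { return a.rating > b.rating; });
}
```

With A_Game, B_Game, D_Game at rating 0 and C_Game at rating 5, the result is C_Game followed by A_Game, B_Game, D_Game in their original order.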
You can use std::stable_sort().
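Note that you cannot keep using std::sort() and simply make the comparator return true for games with the same rating, for example by changing the condition to
return !(info1.rating < info2.rating)
Such a comparator is not a strict weak ordering (it returns true when a game is compared with itself), so std::sort's behavior is undefined, and it would not reliably preserve the relative order of equal games anyway.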
You can use stable_sort instead of sort. This would be the best option for the question.
You can also modify the sort so that when two games have equal rating, sort alphabetically comparing the two names (or any other condition that might come up in the future). It might look like this.
std::sort(games.begin(),games.end(), [](const game& info1, const game& info2)
{
if (info1.rating == info2.rating)
return info1.name < info2.name; // not name.compare(): it returns an int, which is not a valid "less than" result
return info1.rating > info2.rating;
});
std::sort indeed doesn't guarantee any ordering for when elements compare equal. std::stable_sort guarantees that the original ordering gets kept if it compares equal. (See the other answers)
When in doubt about the original order, I like to explicitly sort with all of the criteria:
std::sort(games.begin(),games.end(), [](game const & info1, game const & info2)
{
if (info1.rating != info2.rating)
return info1.rating > info2.rating;
return info1.name < info2.name;
});
In the above, I prefer to use the following pattern
if member1 different
return compare member1
if member2 different
return compare member2
return compare member<last> OR compare pointers
This pattern is easily recognizable and easy extendable when you add extra members.
Ideally, when you want to use this sorting at other places, you make this a function with an unambiguous name. (Don't use operator< as this causes confusion, since the game titles could as well be used as a logical way of sorting)
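A sketch of such a named comparison function, following the "if member differs, compare it" pattern above. The names byRatingDescThenName and sortGames are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct game {
    std::string name;
    int rating;
};

// Hypothetical named comparator: higher rating first, then name ascending.
bool byRatingDescThenName(const game& lhs, const game& rhs) {
    if (lhs.rating != rhs.rating)
        return lhs.rating > rhs.rating;
    return lhs.name < rhs.name; // last member decides the remaining ties
}

// Because every member takes part in the comparison, plain std::sort gives a
// deterministic result here; stability is not needed.
void sortGames(std::vector<game>& games) {
    std::sort(games.begin(), games.end(), byRatingDescThenName);
}
```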

Sorting both ID and 2 sets of values using STL containers

I need a suggestion on how best to use STL containers to sort three sets of data:
1. A ID (Integer)
2. First Value (String)
3. Second Value (String)
An example of the data structure is as below:
I want to use a map, as it is sorted at insertion time and there is no need to run a separate sorting algorithm. Since the ID can repeat, it must be a multimap. The values in each row belong together, so sorting must move whole rows and keep the same values attached to an ID.
Sorting by the ID and one value is fine, but how do I sort by two values when a multimap can hold only one value per key? My thinking is that it will be either a multimap of a multimap, or a struct for the row stored in an STL container. But I want to keep it as simple as possible, and I need a suggestion on how this can be achieved.
Thanks!
Having a map or a set makes sense if and only if you are going to do many insert/erase operations on it. If the data structure is static, storing a vector and sorting it once is much more efficient. That said, if I understand your question correctly, I think you don't need a map, but a multiset.
typedef pair<int, pair<string, string>> myDataType;
multiset<myDataType> mySet; // multiset, since rows may repeat
Here, pair's operator< takes care of the ordering: first by id, then by the first string, then by the second.
If you don't want to refer to the id as elem.first, and to the strings as elem.second.first, and elem.second.second, then you can use a struct and overload operator < for it.
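#include <string>
#include <tuple>
struct myType
{
int id;
std::string s1;
std::string s2;
};
// Lexicographic: id first, then s1, then s2.
// std::tie builds tuples of references, so no strings are copied.
bool operator < (const myType& t1, const myType& t2)
{
return std::tie(t1.id, t1.s1, t1.s2) < std::tie(t2.id, t2.s1, t2.s2);
}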
You could just use a std::set<std::tuple<int, std::string, std::string>>. Tuples are lexicographically compared thus you would get the effect you want for free.
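For data that is filled once and then only read, the same tuple ordering can be applied to a vector sorted a single time. This is a sketch; Row and sortRows are hypothetical names. For frequent inserts and erases, std::multiset<Row> keeps the same order on every insert instead.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// One row of the question's data: ID, first value, second value.
using Row = std::tuple<int, std::string, std::string>;

// std::tuple's operator< is lexicographic, so rows end up ordered by ID,
// then first value, then second value. Duplicate rows are kept.
void sortRows(std::vector<Row>& rows) {
    std::sort(rows.begin(), rows.end());
}
```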
Elements in a multimap are sorted by key; you cannot "sort" a multimap by anything else. What you can do is create a vector of std::pair<Key, std::multimap<Key, Value>::iterator> holding the elements that fulfill some logical condition, and sort that vector.
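A sketch of that idea, simplified to store only the iterators (the key is still reachable as it->first), with a std::multimap<int, std::string> standing in for the real table. Table and indexByValue are hypothetical names.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using Table = std::multimap<int, std::string>;

// Collects iterators to the elements that satisfy `keep` and orders them by
// mapped value. The multimap itself stays sorted by key and is not modified.
template <typename Pred>
std::vector<Table::const_iterator> indexByValue(const Table& table, Pred keep) {
    std::vector<Table::const_iterator> index;
    for (auto it = table.cbegin(); it != table.cend(); ++it)
        if (keep(*it)) index.push_back(it);
    std::stable_sort(index.begin(), index.end(),
                     [](Table::const_iterator a, Table::const_iterator b) {
                         return a->second < b->second;
                     });
    return index;
}
```

The iterators stay valid as long as the referenced elements are not erased, since multimap insertions do not invalidate iterators.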

A fast algorithm for sorting and shuffling equal-valued entries (preferably using the STL)

I'm currently developing stochastic optimization algorithms and have encountered the following issue (which I imagine appears also in other places): It could be called totally unstable partial sort:
Given a container of size n and a comparator, such that entries may be equally valued.
Return the best k entries, but if values are equal, it should be (nearly) equally probable to receive any of them.
(Output order is irrelevant to me, i.e. equal values that lie entirely within the best k need not be shuffled. Having all equal values shuffled is, however, a related and interesting question, and a solution to it would suffice!)
A very (!) inefficient way would be to shuffle randomly (e.g. with std::shuffle) and then partial_sort, but one actually only needs to shuffle the block of equally valued entries "at the selection border" (or, alternatively, all blocks of equally valued entries; either is much faster). Maybe that observation is where to start...
I would very much prefer it if someone could provide a solution using STL algorithms (or at least largely so), because they're usually very fast, well encapsulated and OMP-parallelized.
Thanks in advance for any ideas!
You want to partial_sort first. Then, while elements are not equal, return them. If you meet a sequence of equal elements which is larger than the remaining k, shuffle and return first k. Else return all and continue.
I don't fully understand your issue, but if it were me solving it (if I am reading it correctly)...
Since it appears you will have to traverse the given object anyway, you might as well build a copy of it for your results, sort it upon insert, and randomize your "equal" items as you insert.
In other words, copy the items from the given container into a sorted tree structure built with your comparison operator, and if two items are equal on insert, randomly choose whether to insert the new one before or after the existing item.
This way it's optimally traversed (since it's a tree) and you get the random order of the items that are equal each time the list is built.
It's double the memory, but I was reading this as you didn't want to alter the original list. If you don't care about losing the original, delete each item from the original as you insert into your new list. The worst traversal will be the first time you call your function since the passed in list might be unsorted. But since you are replacing the list with your sorted copy, future runs should be much faster and you can pick a better pivot point for your tree by assigning the root node as the element at length() / 2.
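A sketch of this idea with std::multimap, which is usually implemented as a balanced binary tree (not a B-tree). Rather than a coin flip for before/after, it picks a uniformly random slot inside the block of equal keys, which keeps every order of equal items equally likely. It relies on the hinted insert placing the element as close as possible to just before the hint. insertAtRandomAmongEquals is a hypothetical name, and counting the equal block is linear in its size.

```cpp
#include <iterator>
#include <map>
#include <random>

// Inserts (key, value) at a uniformly random position among the elements that
// already have an equivalent key; the multimap stays sorted by key.
template <typename K, typename V, typename Compare, typename URBG>
void insertAtRandomAmongEquals(std::multimap<K, V, Compare>& m, const K& key,
                               const V& value, URBG& rng) {
    auto range = m.equal_range(key);
    auto equalCount = std::distance(range.first, range.second); // O(block size)
    // equalCount + 1 possible slots: before each equal element, or after the last.
    std::uniform_int_distribution<decltype(equalCount)> pick(0, equalCount);
    auto hint = std::next(range.first, pick(rng));
    // A hint within [range.first, range.second] keeps the order valid, so the
    // element is placed immediately before the hint.
    m.insert(hint, {key, value});
}
```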
Hope this is helpful, sounds like a neat project. :)
If you really mean that output order is irrelevant, then you want std::nth_element, rather than std::partial_sort, since it is generally somewhat faster. Note that std::nth_element puts the nth element in the right position, so you can do the following, which is 100% standard algorithm invocations (warning: not tested very well; fencepost error possibilities abound):
#include <algorithm>
#include <iterator>
template<typename RandomIterator, typename Compare>
void best_n(RandomIterator first,
RandomIterator nth,
RandomIterator limit,
Compare cmp) {
using ref = typename std::iterator_traits<RandomIterator>::reference;
std::nth_element(first, nth, limit, cmp);
auto p = std::partition(first, nth, [&](ref a){return cmp(a, *nth);});
auto q = std::partition(nth + 1, limit, [&](ref a){return !cmp(*nth, a);});
std::random_shuffle(p, q); // See note
}
The function takes three iterators, like nth_element, where nth is an iterator to the nth element, which means that it is begin() + (n - 1).
Edit: Note that this is different from most STL algorithms, in that it is effectively an inclusive range. In particular, it is UB if nth == limit, since it is required that *nth be valid. Furthermore, there is no way to request the best 0 elements, just as there is no way to ask for the 0th element with std::nth_element. You might prefer it with a different interface; do feel free to do so.
Or you might call it like this, after requiring that 0 < k <= n:
best_n(container.begin(), container.begin()+(k-1), container.end(), cmp);
It first uses nth_element to put the "best" k elements in positions 0..k-1, guaranteeing that the kth element (or one of them, anyway) is at position k-1. It then repartitions the elements preceding position k-1 so that the equal elements are at the end, and the elements following position k-1 so that the equal elements are at the beginning. Finally, it shuffles the equal elements.
nth_element is O(n); the two partition operations sum up to O(n); and random_shuffle is O(r) where r is the number of equal elements shuffled. I think that all sums up to O(n) so it's optimally scalable, but it may or may not be the fastest solution.
Note: You should use std::shuffle instead of std::random_shuffle, passing a uniform random number generator through to best_n. But I was too lazy to write all the boilerplate to do that and test it. Sorry.
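A sketch of that std::shuffle variant (std::random_shuffle was removed in C++17). It keeps best_n's steps and adds a uniform random bit generator parameter; best_k_values is a hypothetical usage wrapper that treats smaller values as better and requires 0 < k <= values.size().

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

template <typename RandomIterator, typename Compare, typename URBG>
void best_n(RandomIterator first, RandomIterator nth, RandomIterator limit,
            Compare cmp, URBG&& rng) {
    using ref = typename std::iterator_traits<RandomIterator>::reference;
    // Nothing in [first, nth) is worse than *nth; nothing after nth is better.
    std::nth_element(first, nth, limit, cmp);
    // In [first, nth): strictly better elements first, then ties with *nth.
    auto p = std::partition(first, nth, [&](ref a) { return cmp(a, *nth); });
    // In (nth, limit): ties with *nth first, then strictly worse elements.
    auto q = std::partition(nth + 1, limit, [&](ref a) { return !cmp(*nth, a); });
    // [p, q) is the whole block equivalent to *nth; shuffle it.
    std::shuffle(p, q, rng);
}

// Hypothetical wrapper: the k smallest values, ties at the cut-off chosen at random.
std::vector<int> best_k_values(std::vector<int> values, std::size_t k, unsigned seed) {
    std::mt19937 rng(seed);
    best_n(values.begin(), values.begin() + (static_cast<std::ptrdiff_t>(k) - 1),
           values.end(), [](int a, int b) { return a < b; }, rng);
    values.resize(k);
    return values;
}
```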
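If you don't mind sorting the whole list, a tempting shortcut is to randomize the result in your comparator for equivalent elements, as below. Be aware, though, that such a comparator is not a strict weak ordering (for two equal points, comp(a, b) and comp(b, a) can both return true), so std::sort's behavior with it is undefined, and some implementations can even read past the end of the range.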
std::sort(validLocations.begin(), validLocations.end(),
[&](const Point& i_point1, const Point& i_point2)
{
if (i_point1.mX == i_point2.mX)
{
return Rand(1.0f) < 0.5;
}
else
{
return i_point1.mX < i_point2.mX;
}
});
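A well-defined alternative with the same effect, assuming a full sort is acceptable, is to shuffle first and then sort stably, so the shuffled order survives among ties. The Point struct below is a hypothetical stand-in (only mX matters), and sortByXWithRandomTies is a hypothetical name.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical stand-in for the answer's Point type.
struct Point {
    float mX;
    float mY;
};

// Sorts by mX; points with equal mX end up in a random relative order.
void sortByXWithRandomTies(std::vector<Point>& points, std::mt19937& rng) {
    std::shuffle(points.begin(), points.end(), rng);  // random order first
    std::stable_sort(points.begin(), points.end(),     // keeps that order among ties
                     [](const Point& a, const Point& b) { return a.mX < b.mX; });
}
```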

Is the unordered_map really unordered?

I am very confused by the name 'unordered_map'. The name suggests that the keys are not ordered at all. But I always thought they are ordered by their hash value. Or is that wrong (because the name implies that they are not ordered)?
Or to put it differently: Is this
typedef map<K, V, HashComp<K> > HashMap;
with
template<typename T>
struct HashComp {
bool operator()(const T& v1, const T& v2) const {
return hash<T>()(v1) < hash<T>()(v2);
}
};
the same as
typedef unordered_map<K, V> HashMap;
? (OK, not exactly, STL will complain here because there may be keys k1,k2 and neither k1 < k2 nor k2 < k1. You would need to use multimap and overwrite the equal-check.)
Or again differently: When I iterate through them, can I assume that the key-list is ordered by their hash value?
In answer to your edited question: no, those two snippets are not equivalent at all. std::map stores nodes in a tree structure; unordered_map stores them in a hashtable*.
Keys are not stored in order of their "hash value" because they're not stored in any order at all. They are instead stored in "buckets" where each bucket corresponds to a range of hash values. Basically, the implementation goes like this:
function add_value(object key, object value) {
int hash = key.getHash();
int bucket_index = hash % NUM_BUCKETS;
if (buckets[bucket_index] == null) {
buckets[bucket_index] = new linked_list();
}
buckets[bucket_index].add(new key_value(key, value));
}
function get_value(object key) {
int hash = key.getHash();
int bucket_index = hash % NUM_BUCKETS;
if (buckets[bucket_index] == null) {
return null;
}
foreach(key_value kv in buckets[bucket_index]) {
if (kv.key == key) {
return kv.value;
}
}
return null;
}
Obviously that's a serious simplification, and a real implementation would be much more advanced (for example, supporting resizing of the buckets array, maybe using a tree structure instead of a linked list for the buckets, and so on), but it should give an idea of why you can't get the values back in any particular order. See Wikipedia for more information.
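For readers who want to run the idea, here is a toy C++ translation of the add_value/get_value outline: fixed bucket count, separate chaining with std::list, and no resizing. ToyHashMap is a hypothetical class, not how any real std::unordered_map is written; like the outline, add_value does not check for an existing key.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <utility>
#include <vector>

template <typename K, typename V>
class ToyHashMap {
public:
    explicit ToyHashMap(std::size_t numBuckets = 16) : buckets_(numBuckets) {} // must be > 0

    void add_value(const K& key, const V& value) {
        buckets_[bucketIndex(key)].emplace_back(key, value);
    }

    std::optional<V> get_value(const K& key) const {
        for (const auto& kv : buckets_[bucketIndex(key)])
            if (kv.first == key) return kv.second;
        return std::nullopt; // the "return null" case
    }

    // Visits entries bucket by bucket: the order follows hash % bucket count
    // and chain position, not the hash value itself or insertion order.
    template <typename F>
    void for_each(F f) const {
        for (const auto& bucket : buckets_)
            for (const auto& kv : bucket) f(kv.first, kv.second);
    }

private:
    std::size_t bucketIndex(const K& key) const {
        return std::hash<K>{}(key) % buckets_.size();
    }
    std::vector<std::list<std::pair<K, V>>> buckets_;
};
```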
* Technically, the standard does not prescribe the internal data structures of std::map and unordered_map, but the Big-O complexity it requires for their operations effectively implies them.
"Unordered" doesn't mean that there isn't a linear sequence somewhere in the implementation. It means "you can't assume anything about the order of these elements".
For example, people often assume that entries will come out of a hash map in the same order they were put in. But they don't, because the entries are unordered.
As for "ordered by their hash value": hash values are generally taken from the full range of integers, but hash maps don't have 2**32 slots in them. The hash value's range will be reduced to the number of slots by taking it modulo the number of slots. Further, as you add entries to a hash map, it might change size to accommodate the new values. This can cause all the previous entries to be re-placed, changing their order.
In an unordered data structure, you can't assume anything about the order of the entries.
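The real std::unordered_map exposes this bucket structure through bucket_count(), bucket(key) and rehash(n). This sketch only records the iteration order, which the standard leaves unspecified; keysInIterationOrder is a hypothetical helper name.

```cpp
#include <unordered_map>
#include <vector>

// Keys in the order that iterating the map visits them. This order is
// unspecified: it depends on the hash function, the bucket count and the
// implementation, and may change when the table rehashes.
std::vector<int> keysInIterationOrder(const std::unordered_map<int, int>& m) {
    std::vector<int> keys;
    keys.reserve(m.size());
    for (const auto& kv : m) keys.push_back(kv.first);
    return keys;
}
```

Calling m.rehash(n) with a large n, or inserting enough elements to trigger a rehash, may change what keysInIterationOrder returns, while the set of keys stays the same.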
As the name unordered_map suggests, no ordering is specified by the C++0x standard. An unordered_map's apparent ordering will be dependent on whatever is convenient for the actual implementation.
If you want an analogy, look at the RDBMS of your choice.
If you don't specify an ORDER BY clause when performing a query, the results are returned "unordered" - that is, in whatever order the database feels like. The order is not specified, and the system is free to "order" them however it likes in order to get the best performance.
You are right, unordered_map is actually hash ordered. Note that most current implementations (pre TR1) call it hash_map.
The IBM C/C++ compiler documentation remarks that if you have an optimal hash function, the number of operations performed during lookup, insertion, and removal of an arbitrary element does not depend on the number of elements in the sequence, so this means that the order is not so unordered...
Now, what does it mean that it is hash ordered? As a hash should be unpredictable, you can't, by definition, make any assumption about the order of the elements in the map. This is the reason why it has been renamed in TR1: the old name suggested an order. Now we know that an order is actually used, but you can disregard it as it is unpredictable.