When is a multiset sorted? Insertion, iteration, both?

I have a multiset containing pointers to a custom type. I have given the multiset a custom comparator that compares on a particular attribute of that type.
If I change the value of that attribute on any given item (in a way that would affect the sort order), do I have to remove the item from the set and re-insert it to guarantee ordering? Or will I still get the items in order any time I create an iterator (or use a range-based for loop)?
I can write a quick test myself, but I wanted to know whether the behavior is consistent across platforms and compilers, i.e. whether it is specified by the standard.
Edit: Here is an example I tried. I noticed two things.
In a multiset, if I change the value that is used for comparison before removing the key, I can no longer remove it. Otherwise, my original idea of removing and re-inserting seems the best way to make this work.
#include <stdio.h>
#include <set>
struct NodePointerCompare;
struct Node {
    int priority;
};
struct NodePointerCompare {
    bool operator()(const Node* lhs, const Node* rhs) const {
        return lhs->priority < rhs->priority;
    }
};
int main()
{
    Node n1{1};
    Node n2{2};
    Node n3{3};
    std::multiset<Node*, NodePointerCompare> nodes;
    nodes.insert(&n1);
    nodes.insert(&n2);
    nodes.insert(&n3);
    printf("First round\n");
    for(Node* n : nodes) {
        printf("%d\n", n->priority);
    }
    n1.priority = 10;
    printf("Second round\n");
    for(Node* n : nodes) {
        printf("%d\n", n->priority);
    }
    n1.priority = 1;
    printf("Third round\n");
    nodes.erase(&n1);
    n1.priority = 10;
    nodes.insert(&n1);
    for(Node* n : nodes) {
        printf("%d\n", n->priority);
    }
    return 0;
}
This is the output I get
First round
1
2
3
Second round
10
2
3
Third round
2
3
10

http://eel.is/c++draft/associative.reqmts#general-3
For any two keys k1 and k2 in the same container, calling comp(k1, k2) shall always return the same value.
It is simply illegal to change the object in a way that affects how it compares to other objects within the associative container.
If you want to do that, you have to get the object out of the container, apply the change to it, and put it back in. Have a look at https://en.cppreference.com/w/cpp/container/multiset/extract if that's what you want to do.
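A minimal sketch of the extract, modify, re-insert sequence, reusing Node and NodePointerCompare from the question. The helper name reprioritize and the NodeSet alias are made up for illustration, and the code assumes C++17, where extract and node handles were introduced.

```cpp
#include <cassert>
#include <set>
#include <utility>

struct Node {
    int priority;
};

struct NodePointerCompare {
    bool operator()(const Node* lhs, const Node* rhs) const {
        return lhs->priority < rhs->priority;
    }
};

// Alias introduced for this sketch only
using NodeSet = std::multiset<Node*, NodePointerCompare>;

// Hypothetical helper: changes the priority of the element `it` refers to
// without ever leaving a mis-ordered element inside the container.
NodeSet::iterator reprioritize(NodeSet& nodes, NodeSet::iterator it, int new_priority) {
    assert(it != nodes.end());
    auto handle = nodes.extract(it);          // unlink the tree node, nothing is deallocated
    handle.value()->priority = new_priority;  // safe: the element is outside the set now
    return nodes.insert(std::move(handle));   // relink the node at its sorted position
}
```

With the three nodes from the question, reprioritize(nodes, nodes.find(&n1), 10) moves n1 behind n2 and n3, so a following loop prints 2, 3, 10. Note that find compares by priority, so in a multiset holding several nodes of equal priority you would have to search the equal_range for the exact pointer.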

When is a multiset sorted? Insertion, iteration, both?
The standard doesn't specify explicitly, but practically speaking the ordering must be established on insertion.
If I change the value of the attribute on any given item (in a way that would influence the sorting order). Do I have to remove the item from the set and re-insert it to guarantee ordering?
You may not change the ordering of an element while it is in the set.
However, instead of erasing and inserting an element with a different value, you can extract, modify, and re-insert the node, which should be slightly more efficient (or significantly more, depending on the element type).
Here is an example I tried.
The behaviour of the example is undefined.

The container must remain sorted at all times because begin has constant complexity. Changing the comparison order of elements in the container is undefined behavior per [associative.reqmts.general]/3 (and [res.on.functions]/2.3):
For any two keys k1 and k2 in the same container, calling comp(k1, k2) shall always return the same value.
You can use node handles to efficiently modify elements by temporarily removing them from the container, although for elements that are just pointers the only efficiency is avoiding a memory (de)allocation.

Related

Constraining remove_if on only part of a C++ list

I have a C++11 list of complex elements that are defined by a structure node_info. A node_info element, in particular, contains a field time and is inserted into the list in an ordered fashion according to its time field value. That is, the list contains various node_info elements that are time ordered. I want to remove from this list all the nodes that satisfy some specific condition specified by coincidence_detect, which I am currently implementing as a predicate for a remove_if operation.
Since my list can be very large (on the order of 100k to 10M elements), and since, given the way I build the list, this coincidence_detect condition is only satisfied by a few (thousands of) elements close to the "lower" end of the list, that is, the end containing the elements whose time value is less than some t_xv, I thought that to speed up my code I don't need to run remove_if over the whole list but can restrict it to those elements whose time < t_xv.
remove_if(), however, does not seem to let the user control how far it iterates through the list.
My current code.
The list elements:
struct node_info {
    const char *type = "x"; // string literals are const, so the pointer must be const char*
    int ID = -1;
    double time = 0.0;
    bool spk = true;
};
The predicate/condition for remove_if:
// Remove all events occurring at t_event
class coincident_events {
    double t_event; // Event time
    bool spk;       // Spike condition
public:
    coincident_events(double time, bool spk_) : t_event(time), spk(spk_) {}
    // Takes the node by const reference to avoid copying it on every call
    bool operator()(const node_info& node_event) const {
        return (node_event.time == t_event) && (node_event.spk == spk) && (strcmp(node_event.type, "x") != 0);
    }
};
The actual removing from the list:
void remove_from_list(double t_event, bool spk_){
    // Remove all events occurring at t_event
    coincident_events coincidence(t_event, spk_);
    event_heap.remove_if(coincidence);
}
Pseudo main:
int main(){
    // My list
    std::list<node_info> event_heap;
    ...
    // Populate list with elements with random time values, yet ordered in ascending order
    ...
    remove_from_list(0.5, true);
    return 0;
}
It seems that remove_if may not be ideal in this context. Should I instead consider instantiating an iterator and running an explicit for loop, as suggested for example in this post?

It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for loop?
Yes and yes. Don't fight to use code that is preventing you from reaching your goals. Keep it simple. Loops are nothing to be ashamed of in C++.

First, comparing doubles for exact equality is not a good idea, as you are subject to floating-point rounding errors.
You could always find the point up to which you want to search using std::lower_bound (I assume your list is properly sorted). Keep in mind that on a std::list this still steps through the nodes one by one, even though the number of comparisons is logarithmic.
Then you could use the free function algorithm std::remove_if on that sub-range, followed by the list's erase member function, to remove the items between the iterator returned by remove_if and the one returned by lower_bound.
However, that way you would make multiple passes over the data and move elements around, which would hurt performance.
See also: https://en.cppreference.com/w/cpp/algorithm/remove
So in the end, it is probably preferable to write your own loop over the container and check for each element whether it needs to be removed. If not, check whether you should break out of the loop.
for (auto it = event_heap.begin(); it != event_heap.end(); )
{
    if (coincidence(*it))
    {
        // Advance first, then erase the remembered node, so it stays valid
        auto itErase = it;
        ++it;
        event_heap.erase(itErase);
    }
    else if (it->time < t_xv)
    {
        ++it;
    }
    else
    {
        break;
    }
}
As you can see, the code can easily become quite long for something that should be simple. Thus, if you need this kind of algorithm often, consider writing your own generic algorithm.
Also, in practice you might not need a complete search for the end of the range using the first solution if you process your data in increasing time order.
Finally, you might consider using a std::set instead. It could lead to simpler and more optimized code.
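One way such a generic algorithm could look is sketched below. The name erase_if_while and its two-predicate signature are invented for this illustration and are not part of the standard library; the sketch only assumes a std::list whose elements are ordered so that the in_range predicate holds for a leading part of the list.

```cpp
#include <cstddef>
#include <list>

// Hypothetical generic helper: erases every element that satisfies
// `should_erase`, but stops scanning at the first element for which
// `in_range` is false. Returns the number of erased elements.
template <typename T, typename InRange, typename ShouldErase>
std::size_t erase_if_while(std::list<T>& items, InRange in_range, ShouldErase should_erase) {
    std::size_t erased = 0;
    for (auto it = items.begin(); it != items.end() && in_range(*it); ) {
        if (should_erase(*it)) {
            it = items.erase(it);  // erase returns the iterator after the removed node
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}
```

For the list in the question, the call could take a lambda such as [&](const node_info& n) { return n.time < t_xv; } as in_range and the coincident_events object as should_erase, so that elements at or beyond t_xv are never visited.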

Thanks. I used your comments and came up with this solution, which seemingly increases speed by a factor of 5-to-10.
void remove_from_list(double t_event, bool spk_){
    coincident_events coincidence(t_event, spk_);
    for(auto it = event_heap.begin(); it != event_heap.end();){
        if(t_event >= it->time){
            if(coincidence(*it)) {
                it = event_heap.erase(it);
            }
            else
                ++it;
        }
        else
            break;
    }
}
The idea of assigning the return value of erase to it (erase already returns the iterator to the next element, so no extra ++it is needed) was suggested by this other post. Note that in this implementation I only scan the list elements up to the t_event value (meaning that t_event plays the role of t_xv).

Does qStableSort preserve order of equivalent elements?

Suppose I have a QList of 100 MyItem objects inserted in a certain order. Every MyItem has an associated timestamp and some property p, which is not guaranteed to be unique.
struct MyItem {
    enum MyProperty { ONE, TWO, THREE };
    double timestamp; //unique
    MyProperty p; //non-unique
    bool operator<(const MyItem& other) const {
        return p < other.p;
    }
};
Supposing I added my 100 objects in chronological order, if I run qStableSort on that container (thereby sorting by p), am I guaranteed that the items with any given value of p are still in chronological order?

https://en.wikipedia.org/wiki/Category:Stable_sorts
Stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a sorting algorithm is stable if whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list.
Therefore the keyword stable in qStableSort is referring exactly to what you're asking for.
Note, however, that qStableSort is obsoleted in Qt 5.5:
Use std::stable_sort instead.
Sorts the items in range [begin, end) in ascending order using a stable sorting algorithm.
If neither of the two items is "less than" the other, the items are taken to be equal. The item that appeared before the other in the original container will still appear first after the sort. This property is often useful when sorting user-visible data.
As per the Qt documentation, you should prefer to use std::stable_sort.
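A small sketch of the std::stable_sort replacement, using MyItem from the question. To keep the example free of Qt headers it sorts a std::vector rather than a QList, which is an assumption of this sketch; the helper name sort_by_property is made up.

```cpp
#include <algorithm>
#include <vector>

struct MyItem {
    enum MyProperty { ONE, TWO, THREE };
    double timestamp;  // unique
    MyProperty p;      // non-unique
    bool operator<(const MyItem& other) const {
        return p < other.p;
    }
};

// Sorts by p only. Items with equal p keep their original relative order,
// so chronological order survives within each value of p.
void sort_by_property(std::vector<MyItem>& items) {
    std::stable_sort(items.begin(), items.end());
}
```

If the items were appended with timestamps 1, 2, 3, 4 and properties TWO, ONE, TWO, ONE, the sorted sequence has timestamps 2, 4 (both ONE) followed by 1, 3 (both TWO).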

Fast search and delete in a std::list of objects

I have a very large list of objects (nodes), and I want to be able to remove/delete elements of the list based on a set of values inside of them.
Preferably in constant time...
The objects (among other things) have values like:
long long int nodeID;
int depth;
int numberOfClusters;
double [] points;
double [][] clusters;
What I need to do is to look through the list and check whether there are any elements that have the same values in all fields except for nodeID.
Right now I'm doing something like this:
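for(i = nodes.begin(); i != nodes.end(); i++)
{
    for(j = nodes.begin(); j != nodes.end(); )
    {
        if(i != j && compareNodes((*i), (*j)))
        {
            // erase already returns the next element, so j must not be
            // incremented again (that would skip a node or step past end())
            j = nodes.erase(j);
        }
        else
        {
            j++;
        }
    }
}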
Where compareNodes() compares the values inside the two nodes. But this is wildly inefficient.
I'm using erase because that seems to be the only way to delete an element in the middle of a std::list.
Optimally, I would like to be able to find an element based on these values, and remove it from the list if it exists.
I am thinking of some sort of hash map to find the element (a pointer to the element) in constant time, but even if I can do that, I can't find a way to remove the element without iterating through the list.
It seems that I have to use erase, but that requires iterating through the list, which means linear complexity in the list size.
There is also remove_if, but again it has the same problem: linear complexity in the list size.
Is there no way to remove an element from a std::list without iterating through the whole list?

First off, you can speed up your existing solution by starting j at std::next(i) instead of nodes.begin() (assuming your compareNodes function is symmetric, i.e. compareNodes(a, b) == compareNodes(b, a)).
Second, the hash map approach sounds viable. But why keep a pointer to the element as the value in the map when you can keep an iterator? They're both "a thing which references the element," but you can use the iterator to erase the element in constant time. And std::list iterators are not invalidated when the list is modified, except for iterators to the erased element itself (they're most probably just pointers under the hood).
Thirdly, if you want to encapsulate/automate the lookup and sequential access, you can look into Boost.MultiIndex to build a container with both sequential and hashed access.
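The iterator-in-a-hash-map idea might look roughly like the sketch below. The Node type is cut down to three fields, and NodeKey, NodeKeyHash and NodeStore are names invented for this illustration; a real key would have to cover every field that compareNodes looks at, including the points and clusters arrays.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <list>
#include <unordered_map>

struct Node {
    long long nodeID;
    int depth;
    int numberOfClusters;
};

// Hypothetical key: every compared field, i.e. everything except nodeID
struct NodeKey {
    int depth;
    int numberOfClusters;
    bool operator==(const NodeKey& o) const {
        return depth == o.depth && numberOfClusters == o.numberOfClusters;
    }
};

struct NodeKeyHash {
    std::size_t operator()(const NodeKey& k) const {
        return std::hash<int>()(k.depth) * 31u + std::hash<int>()(k.numberOfClusters);
    }
};

class NodeStore {
public:
    // Appends the node unless an equivalent one is already stored.
    bool insert(const Node& n) {
        NodeKey key{n.depth, n.numberOfClusters};
        if (index_.count(key) != 0) return false;
        nodes_.push_back(n);
        index_.emplace(key, std::prev(nodes_.end()));  // remember where the node lives
        return true;
    }

    // Removes the node with matching values, if any, without walking the list.
    bool erase(int depth, int numberOfClusters) {
        auto found = index_.find(NodeKey{depth, numberOfClusters});
        if (found == index_.end()) return false;
        nodes_.erase(found->second);  // constant time: the iterator points at the node
        index_.erase(found);
        return true;
    }

    const std::list<Node>& nodes() const { return nodes_; }

private:
    std::list<Node> nodes_;
    std::unordered_map<NodeKey, std::list<Node>::iterator, NodeKeyHash> index_;
};
```

Both insert and erase take constant time on average. The stored iterators stay usable because erasing from a std::list only invalidates iterators to the erased node.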

Erase in multiset

I'm new to STL containers, and right now I'm having some problems working with multiset.
The problem is with the following two collections:
vector<DataReference*> referenceCol;
multiset<DataCount, DataCountSortingCriterion> orderedCol;
orderedCol maintains some data elements that have two public integer fields: id and count. I order that structure by the count field. I may need to increment and decrement the count field of those elements, so, in order to maintain the ordering, I use a second collection (referenceCol), which is indexed by the id field and holds a reference (an iterator) into orderedCol. That way, whenever I need to update a count, I can quickly erase the element from orderedCol (by looking it up through referenceCol), update it, and insert it again at its proper place according to the ordering.
referenceCol is created in the constructor of my class, and its elements have two fields: validRef (bool), which indicates whether the iterator is valid or not, and the multiset<....>::iterator variable.
The following methods handle the increment and decrement operations that affect these two collections:
void SomeClass::decrementCount(int index)
{
    multiset<DataCount, DataCountSortingCriterion>::iterator it = referenceCol[index]->it;
    DataCount dop = *it;
    orderedCol.erase(it);
    dop.count--;
    if (dop.count > 0) {
        it = orderedCol.insert(dop);
        referenceCol[index]->it = it;
    }
    else {
        referenceCol[index]->validRef = false;
    }
}
void SomeClass::incrementCount(int index)
{
    DataCount dop;
    multiset<DataCount, DataCountSortingCriterion>::iterator it;
    if (referenceCol[index]->validRef) {
        it = referenceCol[index]->it;
        dop = *it;
        orderedCol.erase(it); // <--------- BOOM!
        dop.count++;
    }
    else {
        dop.id = index;
        dop.count = 1;
        referenceCol[index]->validRef = true;
    }
    it = orderedCol.insert(dop);
    referenceCol[index]->it = it;
}
The problem is that I get an error when I try to erase the iterator in the increment operation (see the BOOM comment in the code).
The error is:
"map/set erase iterator outside range"
The only thing that occurs to me is that erasing elements may be invalidating other iterators, so that those references no longer hold, but I googled it and found that for multiset the erase operation only invalidates iterators to the erased elements and no others.
I also checked that in my running example I'm not erasing the element with the problematic index.
Please help! And sorry for my bad English!
Oh, and I'm open to suggestions about better strategies for "refreshing" elements in order :)
Thanks in advance!

With only the code you've given us to debug I cannot be certain, but I suspect that you are calling decrementCount(index) such that referenceCol[index]->validRef is false. When this happens your decrementCount method simply calls erase on the iterator without checking validity.
If this were to happen on a formerly invalidated iterator you might see the behavior you're seeing.
As an aside here it appears that you should be using a multimap not a multiset. But again without understanding all of your code I can't say that for sure.
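To illustrate the missing validity check, here is a self-contained sketch of the same bookkeeping in which decrement refuses to touch an entry whose iterator is not live. The Counter class, its Ref struct and the member names are invented for this example, and comparing on count alone is an assumption about what DataCountSortingCriterion does.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct DataCount {
    int id;
    int count;
};

// Assumed to order by count only, as described in the question
struct DataCountSortingCriterion {
    bool operator()(const DataCount& a, const DataCount& b) const {
        return a.count < b.count;
    }
};

class Counter {
    using Ordered = std::multiset<DataCount, DataCountSortingCriterion>;
    struct Ref {
        bool valid = false;
        Ordered::iterator it;
    };

public:
    explicit Counter(std::size_t size) : refs_(size) {}

    void increment(std::size_t index) {
        assert(index < refs_.size());
        Ref& ref = refs_[index];
        DataCount d{static_cast<int>(index), 0};
        if (ref.valid) {
            d = *ref.it;
            ordered_.erase(ref.it);
        }
        ++d.count;
        ref.it = ordered_.insert(d);
        ref.valid = true;
    }

    // Returns false and changes nothing if there is no live element for index.
    bool decrement(std::size_t index) {
        assert(index < refs_.size());
        Ref& ref = refs_[index];
        if (!ref.valid) return false;  // the check the original decrementCount lacks
        DataCount d = *ref.it;
        ordered_.erase(ref.it);
        ref.valid = false;             // the stored iterator is dead from here on
        if (--d.count > 0) {
            ref.it = ordered_.insert(d);
            ref.valid = true;
        }
        return true;
    }

    std::size_t size() const { return ordered_.size(); }

private:
    Ordered ordered_;
    std::vector<Ref> refs_;
};
```

The flag is cleared right after erase and only set again once a fresh iterator has been stored, so the two never get out of sync.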

Iterate and erase elements from std::set

I have a std::set and I need to erase similar adjacent elements:
DnaSet::const_iterator next = dna_list.begin();
DnaSet::const_iterator actual = next;
++next;
while(next != dna_list.end()) // cycle over pairs, dna_list is the set
{
    if (similar(*actual, *next))
    {
        Dna dna_temp(*actual); // copy constructor
        dna_list.erase(actual); // erase the old one
        do
        {
            dna_temp.mutate(); // change dna_temp
        } while(!dna_list.insert(dna_temp).second); // insert dna_temp
    }
    ++actual;
    ++next;
}
Sometimes the program can't exit from the main loop. I think the problem happens when I erase the last element in the dna_list. What's the correct way to do this task?

Use actual = next rather than ++actual.
Once you erase actual, it is an invalid iterator, so ++actual has undefined behavior. next remains valid, so assigning next to actual should work.

Your best option is to create a comparison functor that uses the similar() predicate. Then all you need to do is construct the set with that comparison functor and you're done. The set itself will see two similar elements as identical and will only let the first one in.
struct lt_different {
    // const, so that the set can call the comparator on a const object
    bool operator()(int a, int b) const {
        return a < b && !similar(a, b);
    }
private:
    bool similar(int a, int b) const
    {
        // TODO: when are two elements similar?
        const int EPSILON = 2;
        return abs(a - b) < EPSILON;
    }
};
// ...
set<int> o; // fill this set with your data
// copy your data to a new set that rejects similar elements
set<int,lt_different> s(o.begin(), o.end(), lt_different());
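You can then work with set s, inserting and removing elements, and the set itself will make sure no two similar elements exist in it. Note that this relies on the comparator still being a strict weak ordering, which requires similarity to be transitive; an epsilon-style test like the one above does not guarantee that.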
That said, you can also write an algorithm yourself, if only as an alternative. Take a look at std::adjacent_find() from <algorithm>. Called with your similar() predicate, it will find the first occurrence of two consecutive similar elements; hold on to that position. With that found, find the first element from that point on that is not similar to these elements. You end up with two iterators that denote a range of consecutive, similar elements. You can use the set's erase() method to remove them, as it has an overload that takes two iterators.
Lather, rinse, repeat for the entire set.
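A rough sketch of that adjacent_find approach on a std::set<int>. The similar() test below is a stand-in with an arbitrary threshold, and keeping the first element of each run (rather than erasing the whole run) is a choice made for this example; erase_similar_runs is a made-up name.

```cpp
#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <set>

// Hypothetical similarity test: values less than 2 apart count as similar
bool similar(int a, int b) {
    return std::abs(a - b) < 2;
}

// Keeps the first element of every run of consecutive similar elements
// and erases the rest of the run.
void erase_similar_runs(std::set<int>& s) {
    auto first = std::adjacent_find(s.begin(), s.end(), similar);
    while (first != s.end()) {
        // Extend the run while each element is similar to its predecessor
        auto prev = first;
        auto last = std::next(first);
        while (last != s.end() && similar(*prev, *last)) {
            prev = last;
            ++last;
        }
        // Range erase: removes [next(first), last); first itself stays valid
        s.erase(std::next(first), last);
        // Lather, rinse, repeat from the element that was kept
        first = std::adjacent_find(first, s.end(), similar);
    }
}
```

For the set {1, 2, 3, 7, 8, 20} this leaves {1, 7, 20}. Only erased elements have their iterators invalidated, which is why first can be reused after the range erase.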
