what happens when you modify an element of an std::set?

If I change an element of an std::set, for example through an iterator, I know it is not "reinserted" or "re-sorted", but is there any mention of whether it triggers undefined behavior? For example, I would imagine later insertions would go wrong. Is there any mention of what specifically happens?

You should not edit the values stored in the set directly. I copied this from the MSDN documentation, which is somewhat authoritative:
"The STL container class set is used for the storage and retrieval of data from a collection in which the values of the elements contained are unique and serve as the key values according to which the data is automatically ordered. The value of an element in a set may not be changed directly. Instead, you must delete old values and insert elements with new values."
Why this is so is pretty easy to understand. The set implementation has no way of knowing that you have modified the value behind its back. The usual implementation is a red-black tree. Once you have changed the value, that element's position in the tree will be wrong. You would expect to see all manner of wrong behaviour, such as lookups returning the wrong result because the search goes down the wrong branch of the tree.

The precise answer is platform dependent, but as a general rule a "key" (what you put in a set, or the first type of a map) is supposed to be immutable. To put it simply, it should not be modified, and there is no such thing as automatic re-insertion.
More precisely, the member variables used to compare keys must not be modified.
The Windows VC compiler is quite flexible (tested with VC8), and this code compiles:
// creation
std::set<int> toto;
toto.insert(4);
toto.insert(40);
toto.insert(25);
// bad modification
(*toto.begin())=100;
// output
for(std::set<int>::iterator it = toto.begin(); it != toto.end(); ++it)
{
    std::cout<<*it<<" ";
}
std::cout<<std::endl;
The output is 100 25 40, which is obviously not sorted... Bad...
Still, such behavior is useful when you want to modify data that does not participate in operator<. But you had better know what you're doing: that's the price you pay for so much flexibility.
Some might prefer the gcc behavior (tested with 3.4.4), which gives the error "assignment of read-only location". You can work around it with a const_cast:
const_cast<int&>(*toto.begin())=100;
That now compiles on gcc as well, with the same output: 100 25 40.
But at least, having to do so will probably make you wonder what's happening, then go to Stack Overflow and see this thread :-)
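For the case mentioned above, changing data that does not take part in operator<, one option that avoids the const_cast is to declare that member mutable. A minimal sketch, assuming a hypothetical Item type and relabel helper (neither name comes from this thread):

```cpp
#include <set>
#include <string>

// Hypothetical element type: only `id` takes part in the ordering.
struct Item {
    int id;
    mutable std::string label;  // mutable: may be changed through a const reference

    bool operator<(const Item& other) const { return id < other.id; }
};

// Changes the label of the element with the given id, if present.
// Returns true when an element was found and updated.
bool relabel(const std::set<Item>& items, int id, const std::string& label) {
    auto it = items.find(Item{id, ""});
    if (it == items.end()) return false;
    it->label = label;  // fine: label is mutable and is not used by operator<
    return true;
}
```

This only stays safe as long as the comparison never looks at the mutable member; the key itself (id here) still must not change while the element is in the set.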

You cannot do this; the elements are const. The set has no way to detect that you changed an element behind its back, so direct modification is not allowed. Instead, you have to remove the element and reinsert it. If your elements are expensive to copy, you may have to switch to pointers and custom comparators (or switch to a C++11 compiler that supports rvalue references, which would make things a whole lot nicer).
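The remove-and-reinsert approach could look roughly like this. The helper names replace_value and replace_value_node are made up for illustration; the second variant assumes a C++17 compiler, where std::set::extract lets you reuse the node instead of copying the element.

```cpp
#include <set>
#include <utility>

// Replaces old_value with new_value by erasing and re-inserting.
// Works with any C++ version. Returns false if old_value was not in the set.
bool replace_value(std::set<int>& s, int old_value, int new_value) {
    auto it = s.find(old_value);
    if (it == s.end()) return false;
    s.erase(it);
    s.insert(new_value);
    return true;
}

// C++17 alternative: detach the node, change its value, and put it back.
// The set re-links the node at the position that matches the new value.
bool replace_value_node(std::set<int>& s, int old_value, int new_value) {
    auto node = s.extract(old_value);  // empty node handle if not found
    if (node.empty()) return false;
    node.value() = new_value;
    s.insert(std::move(node));
    return true;
}
```

In both variants, if new_value is already present the old element is simply dropped and the set ends up one element smaller.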

Related

Reference counting in a collection

Let's say we have a collection of objects (say, a collection of strings). I want each element of the collection to have a reference count, so an AddUsage call should increment the count for the given element.
coll.AddUsage("SomeElement"); // Type doesn't matter - but should increase count
On RemoveUsage, it should decrement the reference count for the given element, and if the count reaches 0, it should remove the element from the collection.
It is not important whether AddUsage allocates the element (and sets its reference count to 1) or fails altogether (because the element didn't exist). The important thing is RemoveUsage, which should remove the given element (object) from the collection.
I thought of using a vector of pairs (or of a custom struct), or some kind of map/multimap. As far as I can tell, the C++ library has no existing class for this (apart from possibly useful pieces such as the thread-support library, the atomic classes, the shared-pointer classes, etc.).
Question:
So, my question is how to implement this idea using the existing C++ library. It should be thread safe. Yes, C++11/14 is perfectly okay for me. If there is a good idea, I would probably build it on top of templates.

Assuming you ask for a data structure to implement your reference-counting collection...
Use a map<K,V>, with K the type of the collection elements (string in your example) and V a type that keeps track of meta-information about the element (e.g. the reference count). The simplest case is when V is int.
Then AddUsage is simple: just do refMap[value]++. For RemoveUsage, do refMap[value]--, then check whether the counter hit zero and, if so, remove the value from the map.
You need to add error handling too, since AddUsage / RemoveUsage may be called with an object that is not in the map (not added to the collection).
EDIT: You tagged your question with "multithreading", so you probably want a mutex of some sort to guard concurrent access to refMap.
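Putting the map, the error handling and the mutex together, a minimal sketch might look like this. The class name RefCountedCollection and the UseCount method are assumptions for illustration; AddUsage here creates missing elements with a count of 1, which is one of the two behaviours the question allows.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical class, modelled on the AddUsage/RemoveUsage idea above.
template <typename T>
class RefCountedCollection {
public:
    // Adds the element if it is missing and returns its new reference count.
    std::size_t AddUsage(const T& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        return ++counts_[value];  // operator[] creates the entry with count 0
    }

    // Decrements the count and erases the element when it reaches zero.
    // Returns false if the element was not in the collection.
    bool RemoveUsage(const T& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = counts_.find(value);
        if (it == counts_.end()) return false;
        if (--it->second == 0) counts_.erase(it);
        return true;
    }

    // Current reference count, or 0 if the element is absent.
    std::size_t UseCount(const T& value) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = counts_.find(value);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    mutable std::mutex mutex_;          // guards counts_
    std::map<T, std::size_t> counts_;   // element -> reference count
};
```

RemoveUsage uses find rather than operator[] so that releasing an unknown element does not silently insert it. A single mutex is the simplest choice; whether it is fast enough depends on how contended the collection is.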

You could implement something similar to the shared_ptr class, but extended to hold a collection of objects.
For example, you could design a class with a map/multimap as its data member. The key would be your object and the value its reference count. As far as the interface is concerned, just expose two methods:
AddUsage(Object);
RemoveUsage(Object);
In your AddUsage method, you would first check whether the element already exists in the map. If it does, only increment the count. You would handle RemoveUsage likewise: the object would be deleted from the map when its reference count reaches zero.
This is just my opinion. Please let me know if there are any bottlenecks in this implementation.

You can use a static (integer) member variable in the structure or class. Increment or decrement it wherever you want. Remove the element when the value is zero.

Cache behaviour of std::vector

My question is simple enough, given the following data structure, std::vector<std::pair<int, std::unique_ptr<foo>>>, if I have the following:
auto it = std::find_if(begin(v), end(v), [&](std::pair<...> const& p){ return p.first == some_value; });
Can I expect that whatever is pointed to by the pointer is not fetched (I don't want it fetched, will later pre-fetch as needed) into the cache purely for the find operation? Or is this impossible to determine (if so I will close the question..)

When "find" searches in a vector, it will look at the value of the entry in the vector, and match that with what you are searching for. So, it will use whatever "equal" function that is provided by the find, or "operator==" if there is no function provided to find.
Since in this case, you are just comparing the int value in the pair with your expected value, the unique_ptr<foo> will not be dereferenced (and thus data pointed to by unique_ptr<foo> will not enter the cache).
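As a small illustration of that point, here is a sketch of the search from the question in which the predicate reads only the int key. Foo, Entry and find_key are hypothetical names; the sketch shows what the source code touches, not what a particular CPU or compiler ends up prefetching.

```cpp
#include <algorithm>
#include <memory>
#include <utility>
#include <vector>

struct Foo { int payload; };  // hypothetical stand-in for the question's foo

using Entry = std::pair<int, std::unique_ptr<Foo>>;

// Returns the index of the first entry whose key equals `key`, or -1 if none.
// Only `p.first` is read; the unique_ptr is never dereferenced here.
long find_key(const std::vector<Entry>& v, int key) {
    auto it = std::find_if(v.begin(), v.end(),
                           [key](const Entry& p) { return p.first == key; });
    return it == v.end() ? -1 : static_cast<long>(it - v.begin());
}
```

Note that the pointer values themselves sit next to the ints in the vector's storage, so they are loaded along with the keys; it is the Foo objects they point to that the loop never touches.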

Strictly speaking, you can't be sure... but nowadays I see no reason why this loop should prefetch anything through the stored pointer. From the machine's point of view, you are iterating over a contiguous memory block full of ints and pointers, where only the ints are accessed, so there is no reason to prefetch what the pointers point to.
But maybe later in your code (quite near the find_if) there is another loop that dereferences those pointers, so a very smart compiler could decide to insert prefetch instructions into the first loop (this wouldn't affect the result of find_if, so it is allowed to). We don't know: it depends on the compiler, the optimization options and the architecture. We don't even know that Intel's next BlahBlahBridge won't do it without any compiler instructions.

C++ Deleting objects from memory

Let's say I have allocated some memory and filled it with a set of objects of the same type; we'll call these components.
Say one of these components needs to be removed. What is a good way of doing this such that the "hole" left by the component can be tested for and skipped by a loop iterating over the set of objects?
The inverse should also be true: I would like to be able to test for a hole in order to store new components in the space.
I'm thinking of clearing the memory (e.g. with memset) and checking for 0...

boost::optional<component> seems to fit your needs exactly. Put those in your storage, whatever that happens to be. For example, with std::vector:
// needs <boost/optional.hpp>, <vector> and <algorithm>
// initialize the vector with 100 non-components
std::vector<boost::optional<component>> components(100);
// adding a component at position 15
components[15] = component(x, y, z);
// deleting a component at position 82
components[82].reset();
// looping through and checking for existence
for (auto& opt : components)
{
    if (opt) // component exists
    {
        operate_on_component(*opt);
    }
    else // component does not exist
    {
        // whatever
    }
}
// move components to the front, non-components to the back
std::partition(components.begin(), components.end(),
    [](boost::optional<component> const& opt) -> bool { return static_cast<bool>(opt); });

The short answer is that it depends on how you store it in memory.
For example, the C++ standard requires that the elements of a std::vector be stored contiguously.
If you know the size of the object, you can use sizeof and pointer arithmetic to compute each object's location in memory.
Good luck.

There are at least two solutions:
1) Mark the hole with a flag and skip it when processing. Benefit: "deletion" is very fast (it only sets a flag). Unless the object is very small, adding a "bool alive" member costs little.
2) Fill the hole with an "alive" object taken from the end of the pool, so that the hole effectively moves to the end.
This problem is related to storing and processing particle systems; you could find some suggestions there.
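The second option is often called "swap and pop". A minimal sketch, assuming the pool is a std::vector and that the order of components does not matter (Component and swap_and_pop are hypothetical names):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Component { int id; };  // hypothetical component type

// Removes the element at `index` in constant time by moving the last
// element into its slot and shrinking the vector by one.
// Does not preserve order. Returns false if `index` is out of range.
bool swap_and_pop(std::vector<Component>& pool, std::size_t index) {
    if (index >= pool.size()) return false;
    if (index != pool.size() - 1) {
        pool[index] = std::move(pool.back());
    }
    pool.pop_back();
    return true;
}
```

The trade-off is that the index of the moved component changes, so anything that refers to components by index has to be updated or use a stable handle instead.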

If it is not possible to move the "live" components up, or reorder them so that there is no hole in the middle of the sequence, then the best option is to give the component objects a "deleted" flag/state that can be tested through a member function.
Such a "deleted" state does not cause the object to be removed from memory (that is just not possible in the middle of a larger block), but it does make it possible to mark the spot as not being in use for a component.
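A flag-based pool along these lines could be sketched as follows. Slot and SlotPool are hypothetical names and the payload is reduced to an int; the point is only that removal flips a flag, traversal skips holes, and insertion reuses the first hole it finds.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical slot: a payload plus a flag saying whether it is in use.
struct Slot {
    int value = 0;
    bool alive = false;
};

class SlotPool {
public:
    explicit SlotPool(std::size_t capacity) : slots_(capacity) {}

    // Stores `value` in the first free slot and returns its index,
    // or returns capacity() if every slot is in use.
    std::size_t add(int value) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].alive) {
                slots_[i].value = value;
                slots_[i].alive = true;
                return i;
            }
        }
        return slots_.size();
    }

    // Marks a slot as a hole; the memory itself stays where it is.
    void remove(std::size_t index) {
        if (index < slots_.size()) slots_[index].alive = false;
    }

    // Example traversal that skips holes.
    int sum_alive() const {
        int total = 0;
        for (const Slot& s : slots_) {
            if (s.alive) total += s.value;
        }
        return total;
    }

    std::size_t capacity() const { return slots_.size(); }

private:
    std::vector<Slot> slots_;
};
```

The linear scan in add is the simplest way to find a hole; keeping a separate list of free indices would avoid it at the cost of some extra bookkeeping.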

When you say you have "allocated some memory" you are likely talking about an array. Arrays are great because they have virtually no overhead and extremely fast access by index. But the bad thing about arrays is that they aren't very friendly for resizing. When you remove an element in the middle, all following elements have to be shifted back by one position.
But fortunately there are other data structures you can use, like a linked list or a binary tree, which allow quick removal of elements. C++ even implements these in the container classes std::list and std::set.
A list is great when you don't know beforehand how many elements you need, because it can shrink and grow dynamically without wasting any memory when you remove or add any elements. Also, adding and removing elements is very fast, no matter if you insert them at the beginning, in the end, or even somewhere in the middle.
A set is great for quick lookup. When you have an object and you want to know if it's already in the set, checking it is very quick. A set also automatically discards duplicates which is really useful in many situations (when you need duplicates, there is the std::multiset). Just like a list it adapts dynamically, but adding new objects isn't as fast as in a list (not as expensive as in an array, though).

Two suggestions:
1) You can use a linked list to store your components, and then not worry about holes.
Or, if you need these holes:
2) You can wrap your component in an object that holds a pointer to the component, like so:
class ComponentWrap
{
public:
    Component* component; // nullptr when the slot is a hole
};
and use wrap.component == nullptr to check whether the component has been deleted.
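Exception way:
3) Be careful with relying on a try/catch block here: in C++, dereferencing a null pointer is undefined behaviour and does not throw, so this only works if your own accessor checks the pointer and throws an exception itself.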

getting a C++ std::set's members by index

Is there a way to use one of the STL algorithms to get a member of a set using its index position in the set?
I could use a utility method like the one below, but I've got to think this already exists in some generic form in the STL:
ElementPtr elementAt(int elementNumber)
{
    list<ElementPtr>::iterator elementIt = elements.begin();
    for (int counter = 0; counter < elementNumber && elementIt != elements.end(); counter++, elementIt++)
    {
    }
    return *elementIt;
}

#include <iterator>
list<ElementPtr>::iterator elementIt = elements.begin();
std::advance(elementIt, elementNumber);
x = *elementIt;
Which does essentially what your code does.
But the fact that you want to do this most likely indicates that your data structures are wrong. Sets are not designed to be processed like this.
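The same idea works on an actual std::set. The sketch below uses std::next (C++11) and adds the bounds check that the loop in the question lacks; element_at is a hypothetical helper name, and the lookup is still linear in n because set iterators can only step one element at a time.

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// Returns a pointer to the n-th element in sorted order,
// or nullptr if n is out of range. Takes O(n) steps.
const int* element_at(const std::set<int>& s, std::size_t n) {
    if (n >= s.size()) return nullptr;
    auto it = std::next(s.begin(), static_cast<std::ptrdiff_t>(n));
    return &*it;
}
```

The returned pointer stays valid until that element is erased, but the index it corresponds to can change whenever something smaller is inserted or removed.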

There isn't a usable index mechanism if it's implemented as a binary tree or a hash table, both of which are common for sets.

Are you actually using the right container type? Consider using a sorted vector instead.

You could do this using Boost.MultiIndex to build both ordering and random access indices on the same underlying data.

I don't believe so, as "index-of" doesn't really make sense in terms of a generalized std::set. Unless your set is constructed (and initialized) once and never changed, then you cannot guarantee that the results of calls to the index-of operator would always return a predictable result.

The best you are going to get is an iterator. Sets are containers where the value is the index (well, more of a reference in a hash table). Maybe we could better answer your question if we knew what you were trying to do.
I think you are equating a set to an array; they are structured quite differently, a numerical index does not apply.

You say set, but your code actually indicates list. The two are not the same. Sets are designed to have their elements retrieved by their value. Lists, you can just advance along them using std::advance.

There is no such thing as a numerical index into a set. You need to use a vector instead. And what's more, if you do happen to "get the nth item" in the set, it is not guaranteed it will be there (in the same place) after the set is modified.

STL sorted set where the conditions of order may change

I have a C++ STL set with a custom ordering defined.
The idea was that when items get added to the set, they're naturally ordered as I want them.
However, what I've just realised is that the ordering predicate can change as time goes by.
Presumably, the items in the set will then no longer be in order.
So two questions really:
Is it harmful that the items would then be out of order? Am I right in saying that the worst that can happen is that new entries may get put into the wrong place (which actually I can live with). Or, could this cause crashes, lost entries etc?
Is there a way to "refresh" the ordering of the set? You can't seem to use std::sort() on a set. The best I can come up with is dumping out the contents to a temp container and re-add them.
Any ideas?
Thanks,
John

set uses the ordering to look up items. If you insert N items according to ordering1 and then insert an item according to ordering2, the set cannot reliably tell whether the item is already in it.
That violates the class invariant that every item is in there only once.
So it does harm.

The only safe way to do this with the STL is to create a new set with the changed predicate. For example, you could do something like this when you need to sort the set with a new predicate (the predicate type is part of the set's type, so it has to appear as a template argument):
std::set<int, NewPred> newset( oldset.begin(), oldset.end(), NewPred() );
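A slightly more general sketch of that rebuild step, assuming the new ordering is a different comparator type (reorder is a hypothetical helper name):

```cpp
#include <functional>
#include <set>

// Copies all elements of old_set into a new set ordered by NewCompare.
// The old set is left untouched.
template <typename NewCompare, typename T, typename OldCompare>
std::set<T, NewCompare> reorder(const std::set<T, OldCompare>& old_set) {
    return std::set<T, NewCompare>(old_set.begin(), old_set.end());
}
```

For example, reorder<std::greater<int>>(ascending) yields a set holding the same values in descending order. Elements that compare equivalent under the new predicate collapse into one, so the new set can be smaller than the old one.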

This is actually implementation dependent.
An STL implementation can, and usually will, assume that the predicate used for sorting is consistent over time (otherwise "sorted" would not be well defined). It is at least possible to construct a valid STL implementation that formats your hard drive when you change the behavior of the predicate instance.
So, yes, you need to re-insert the items into a new set.
Alternatively, you could construct your own container, e.g. a vector + sort + lower_bound for binary search. Then you could re-sort when the predicate's behavior changes.
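The vector + sort + lower_bound idea could be sketched like this. ResortableVector is a hypothetical name, the element type is fixed to int to keep the example short, and the comparator is stored as a std::function so that it can be replaced at run time.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical wrapper: a vector kept sorted under a replaceable comparator.
class ResortableVector {
public:
    using Compare = std::function<bool(int, int)>;

    explicit ResortableVector(Compare comp) : comp_(std::move(comp)) {}

    // Inserts value at its sorted position. Like std::set, it rejects a
    // value that is equivalent to an existing one and returns false.
    bool insert(int value) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), value, comp_);
        if (pos != data_.end() && !comp_(value, *pos)) return false;
        data_.insert(pos, value);
        return true;
    }

    // Binary search under the current ordering.
    bool contains(int value) const {
        return std::binary_search(data_.begin(), data_.end(), value, comp_);
    }

    // Switches to a new ordering and re-sorts so the invariant holds again.
    void set_order(Compare comp) {
        comp_ = std::move(comp);
        std::sort(data_.begin(), data_.end(), comp_);
    }

    const std::vector<int>& items() const { return data_; }

private:
    Compare comp_;
    std::vector<int> data_;
};
```

Because the container owns the comparator and always re-sorts when it changes, lookups never run against stale ordering. The sketch does not remove elements that become equivalent to each other under a new ordering; whether that matters depends on the predicates involved.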

I agree with the other answers, that this is going to break in some strange and hard to debug ways. If you go the refresh route, you only need to do the copy once. Create a tmp set with the new sorting strategy, add each element from the original set to the tmp set, then do
orig.swap(tmp);
This will swap the internals of the sets.
If this were me, I would wrap this up in a new class that handles all of the details, so that you can change implementations as needed. Depending on your access patterns and the number of times the sort order changes, the previously mentioned vector, sort, lower_bound solution may be preferable.

If you can live with an unordered set, then why are you adding them into a set in the first place?
The only case I can think of is where you just want to make sure the elements are unique when you add them. If that's the case, you could use a temporary set to guard additions:
if (ts.insert(value).second) {
    // insertion took place
    realContainer.push_back(value);
}
An alternative, depending on how frequently you'll be modifying the entries in the set: test whether the modified entry would end up in a different position (using the set's comparison function) and, if so, remove the old entry and re-add the new one.
As everyone else has pointed out, having the set unordered really smells bad, and I would also guess that it is possibly undefined behaviour according to the standard.

While this doesn't give you exactly what you want, boost::multi_index gives you similar functionality. Due to the way templates work, you will never be able to "change" the ordering predicate type of a container; it is set in stone at compile time. The exception is a sorted vector or something similar, where you are the one maintaining the invariant and can sort it however you want at any given time.
Multi_index however gives you a way to order a set of elements based on multiple ordering predicates at the same time. You can then select views of the container that behave like an std::set ordered by the predicate that you care about at the time.

This can cause lost entries. When searching for an element in a set, the ordering operator is used; this means that if an element was placed to the left of the root and the ordering operator now says it belongs to the right, that element will no longer be found.

Here's a simple test for you:
#include <iostream>
#include <set>

// Comparator whose ordering depends on a static mode that can be flipped at run time.
struct comparer
{
    enum CompareType {CT_LESS, CT_GREATER};
    static CompareType CompareMode;
    bool operator()(int lhs, int rhs) const
    {
        if(CompareMode == CT_LESS)
        {
            return lhs < rhs;
        }
        else
        {
            return lhs > rhs;
        }
    }
};
comparer::CompareType comparer::CompareMode = comparer::CT_LESS;
typedef std::set<int, comparer> is_compare_t;
// Reports whether v can still be found in the set.
void check(const is_compare_t &is, int v)
{
    is_compare_t::const_iterator it = is.find(v);
    if(it != is.end())
    {
        std::cout << "HAS " << v << std::endl;
    }
    else
    {
        std::cout << "ERROR NO " << v << std::endl;
    }
}
int main()
{
    is_compare_t is;
    is.insert(20);
    is.insert(5);
    check(is, 5);
    // deliberately change the ordering while the set holds elements
    comparer::CompareMode = comparer::CT_GREATER;
    check(is, 5);
    is.insert(27);
    check(is, 27);
    // switch back to the original ordering
    comparer::CompareMode = comparer::CT_LESS;
    check(is, 5);
    check(is, 27);
    return 0;
}
So, basically if you intend to be able to find the elements you once inserted you should not change the predicate used for insertions and find.

Just a follow up:
While running this code the Visual Studio C debug libraries started throwing exceptions complaining that the "<" operator was invalid.
So, it does seem that changing the sort ordering is a bad thing. Thanks everyone!

1) Harmful - no. Result in crashes - no. The worst is indeed a non-sorted set.
2) "Refreshing" would be the same as re-adding anyway!
