Result of invalid iterator passed to std::unordered_set::erase()

std::unordered_set::erase() has 3 overloads. In the one taking a reference, passing an "invalid" value, i.e. one that doesn't exist in the set, simply makes erase() return 0. But what about the other two overloads?
Does the C++11 standard say what erase() should do in this case, or is it compiler-dependent? Is it supposed to return end(), or is it undefined behavior?
I couldn't find an answer in the specification, on cppreference.com, or on cplusplus.com. The IBM site says it returns end() if no element remains after the operation, but what happens if the operation itself fails due to an invalid iterator?
And in general, do erase() methods for STL containers simply have undefined behavior in these cases?
(If so, I need to check my iterators before I pass any to erase(), or use the unordered_set::erase() overload that takes a value_type reference, which simply returns 0 if it fails.)

There is a big semantic difference between trying to remove a value that doesn't occur in the set and trying to erase through an invalid iterator.
Trying to use an invalid iterator is undefined behaviour and will end badly.
Do you have a specific use-case you are thinking of when you might want to erase an invalid iterator?

These are two completely different cases. There is no "invalid value"; values that don't exist in the set are still valid. So you pass a valid value that is not contained in the set and thus get 0 returned: no elements have been erased.
The other overloads are completely different. The standard requires the iterators passed to the erase methods to be "valid and dereferenceable" and "a valid iterator range", respectively. Otherwise the behavior is undefined.
So yes, iterators have to be valid. But you cannot check if an iterator is valid programmatically - you have to make sure from your program logic, that they are.
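The two cases can be sketched as follows. This is a minimal illustration, not the only way to do it; the helper names erase_by_key and erase_if_present are made up for this example. The second helper shows one way program logic can guarantee validity: the iterator comes straight from find() and is compared against end() before it is erased.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: erase by value. A value that is absent is still a
// valid argument; erase() simply reports that 0 elements were removed.
std::size_t erase_by_key(std::unordered_set<int>& s, int value) {
    return s.erase(value);
}

// Hypothetical helper: erase by iterator, but only after the program logic
// has established that the iterator is valid and dereferenceable. Here that
// holds because find() just returned it and it is not end().
bool erase_if_present(std::unordered_set<int>& s, int value) {
    auto it = s.find(value);
    if (it == s.end()) {
        return false;  // passing end() to erase(iterator) would be undefined
    }
    s.erase(it);
    return true;
}
```

Note that the check against end() only works because the iterator was just obtained from the same container; it is not a general validity test for arbitrary iterators.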

Related

What does end() refer to in the following example? [duplicate]

I have a list of sets:
std::list<std::set<int>> nn = {{1,2},{4,5,6}};
and I want to print out the element which end() refers to:
for (auto el : nn){
    std::cout << *el.end() << std::endl;
}
What I get as a result is:
2 and 3.
I do not know where do these values come from. Can someone help me plz?

Question 1
What does end() refer to
Answer
end() is a public member function of std::set that returns an iterator to the past-the-end element in the set container.
Question 2
I do not know where do these values come from.
Answer
When you wrote:
std::cout << *el.end() << std::endl;//this is undefined behavior
In the above statement you are dereferencing the iterator that was returned by the end() member function.
But note that if we dereference the iterator that was returned by this member function then we get undefined behavior.
Undefined behavior means anything [1] can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior.
[1] A more technically accurate definition of undefined behavior is that there are no restrictions on the behavior of the program.

It is not allowed to de-reference the end() iterator. Doing so causes undefined behavior. It doesn't refer to any element of the container, but one past the last element.
The reason that end() "points" after the last element, is that it is necessary to distinguish empty containers. If end() was referring to the last element and begin() to the first, then if begin() == end() that would mean that there is one element in the container and we can't distinguish the case of an empty container.
For containers that support it, to access the last element of the container you can use .back(), which will return a reference, not an iterator. But this is only allowed if there is a last element, i.e. if the container is not empty. Otherwise you have again undefined behavior. So check .empty() first if necessary.
std::set does not have the back() member and is not really intended to be used this way, but if you really want to access the last element, which is not the last element in the constructor initializer list, but the last element in the < order of the elements, then you can use std::prev(el.end()) or el.rbegin() ("reverse begin") which will give you an iterator to the last element. Again, dereferencing this iterator is only allowed if the container is not empty. (For std::prev forming the iterator itself isn't even allowed if the container is empty.)
Undefined behavior means that you will have no guarantees on the program behavior. It could output something in one run and something else in another. It could also output nothing, etc.
There is no requirement that you will get the output you see. For example, current x86_64 Clang with libc++ as standard library implementation, compiles (with or without optimization) a program that prints 0 twice. https://godbolt.org/z/nWMss1fqe
Practically speaking, assuming the compiler didn't take advantage of the undefined behavior for an optimization that drastically changes the program from the "intended" program flow, you will likely, depending on the implementation of the standard library, read some internal memory of the std::set implementation in the standard library, get a segmentation fault if the indirection points to inaccessible memory or incidentally (with no guarantees) refer to other values in the container.
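As a minimal sketch of the safe alternative described above, the largest element of a std::set can be reached through std::prev(end()) once the set is known to be non-empty. The helper name last_element is hypothetical, and returning a pointer (nullptr for an empty set) is just one possible design.

```cpp
#include <iterator>
#include <set>

// Hypothetical helper: returns a pointer to the last element in the set's
// ordering, or nullptr when the set is empty. end() itself is never
// dereferenced, and std::prev is only formed for a non-empty set.
const int* last_element(const std::set<int>& s) {
    if (s.empty()) {
        return nullptr;
    }
    return &*std::prev(s.end());
}
```

For the sets in the question, {1,2} and {4,5,6}, this yields pointers to 2 and 6 respectively.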

Why does the iterator to set::end in C++ dereferences to the number of elements in the set?

In the C++ STL, set::end() returns an iterator pointing past the last element of the set container. Since it does not refer to a valid element, it cannot be dereferenced. The end() function returns a bidirectional iterator.
But when I execute the following code:
set<int> s;
s.insert(1);
s.insert(4);
s.insert(2);
// iterator pointing to the end
auto pos2 = s.end();
cout<<*pos2;
it prints 3 as output. The output increases as I insert more elements to the set and is always equal to the total number of elements in the set.
Why does this happen?

Dereferencing the end() iterator is undefined behavior, so anything is allowed to happen. Ideally you'd get a crash, but unfortunately that doesn't seem to be the case here and everything "seems" to work.

Since it does not refer to a valid element, it cannot be dereferenced
It can, as your test code demonstrated. However, it shouldn't be dereferenced.

Although it is undefined behaviour, in this particular case the observed behaviour could be due to implementation details of the standard library in use.
std::set::size() has O(1) complexity, but std::set is a node-based container (internally a binary search tree). So the size needs to be stored somewhere within the data structure. It could be that the end() iterator points at a location that doubles as storage for the size, and by pure chance, you're able to access it.

When I should use std::map::at to retrieve map element

I have read various articles on the web and questions on Stack Overflow, but it is still not clear to me whether there is any case in which std::map::at is the better way to retrieve a map element.
According to definition, std::map::at
Returns a reference to the mapped value of the element identified with
key k.
If k does not match the key of any element in the container, the
function throws an out_of_range exception.
As I see it, std::map::at is only worth using when you are 100% sure that an element with the given key exists; otherwise you have to consider exception handling.
Is there any case where std::map::at is considered the most efficient and elegant approach? In what cases would you recommend using std::map::at?
Am I right that it is better to use map::find() when an element with the given key might not exist? And is map::find() the faster and more elegant approach?
if ( map.find("key") != map.end() )
{
    // found
} else
{
    // not found
}
P.S.
map::operator[] can sometimes be dangerous, because if an element doesn't exist it inserts one.

Contrary to most existing answers here, note that there are actually 4 methods related to finding an element in a map (ignoring lower_bound, upper_bound and equal_range, which are less precise):
operator[] exists only in a non-const version; as noted, it will create the element if it does not exist
at(), introduced in C++11, returns a reference to the element if it exists and throws an exception otherwise
find() returns an iterator to the element if it exists or an iterator to map::end() if it does not
count() returns the number of such elements, in a map, this is 0 or 1
Now that the semantics are clear, let us review when to use which:
if you only wish to know whether an element is present in the map (or not), then use count().
if you wish to access the element, and it shall be in the map, then use at().
if you wish to access the element, and do not know whether it is in the map or not, then use find(); do not forget to check that the resulting iterator is not equal to the result of end().
finally, if you wish to access the element if it exists or create it (and access it) if it does not, use operator[]; if you do not wish to call the type default constructor to create it, then use either insert or emplace appropriately
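The four rules of thumb above can be sketched with one small helper each. The helper names (has_key, required, value_or, get_or_create) and the choice of std::map<std::string, int> are assumptions made for this illustration only.

```cpp
#include <map>
#include <stdexcept>
#include <string>

typedef std::map<std::string, int> Table;  // hypothetical example type

// Presence test only: count() is 0 or 1 for std::map.
bool has_key(const Table& m, const std::string& k) {
    return m.count(k) != 0;
}

// The element shall be in the map: at() throws std::out_of_range otherwise.
int required(const Table& m, const std::string& k) {
    return m.at(k);
}

// The element may or may not be there: find(), then compare against end().
int value_or(const Table& m, const std::string& k, int fallback) {
    Table::const_iterator it = m.find(k);
    return it != m.end() ? it->second : fallback;
}

// Access or create: operator[] inserts a value-initialized int (0) if needed.
int& get_or_create(Table& m, const std::string& k) {
    return m[k];
}
```

Only get_or_create needs a non-const map, which reflects the fact that operator[] has no const overload.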

std::map::at() throws an out_of_range exception if the element could not be found. This exception is a kind of logic_error exception which for me is a kind of synonym of assert() from the usage standpoint: it should be used to report errors in the internal logic of the program, like violation of logical preconditions or class invariants.
Also, you can use at() to access const maps.
So, for your questions:
I will recommend using at() instead of [] when accessing const maps and when element absence is a logic error.
Yes, it's better to use map::find() when you're not sure element is here: in this case it's not a logic error and so throwing and catching std::logic_error exception will not be very elegant way of programming, even if we don't think about performance.
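A short sketch of both points, under assumed names (port_for, lookup_is_logic_error, and the ports map are hypothetical): at() works on a const map where operator[] would not compile, and because std::out_of_range derives from std::logic_error, a handler for logic_error catches a failed lookup.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical lookup on a const map. operator[] is not available here,
// so at() is the concise choice when a missing key is a programming error.
int port_for(const std::map<std::string, int>& ports, const std::string& service) {
    return ports.at(service);
}

// std::out_of_range is a kind of std::logic_error, so this handler
// catches the exception thrown by a failed at().
bool lookup_is_logic_error(const std::map<std::string, int>& ports,
                           const std::string& service) {
    try {
        port_for(ports, service);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```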

As you noted, there are three different ways to access elements in a map: at(), operator[] and find() (there are also upper_bound, lower_bound and equal_range, but those are for more complicated circumstances where you might want to find a next/previous element etc.)
So, when should you use which one?
operator[] is basically "if it does not exist, create one with a default-constructed mapped element". That means it won't throw (except in the corner cases when the memory allocation throws or one of the key or value constructors throw), and you definitely get a reference to the element you looked for - either the existing one or the newly created.
at() throws if there is no element for that key. Since you should not use exceptions for normal program flow, using at() is saying "I am sure there is such an element." But with the added benefit that you get an exception (and not undefined behavior) if you are wrong. Don't use this if you are not positive that the element exists.
find() says "there may or may not be such an element, let's see..." and offers you the possibility to react to both cases differently. It therefore is the more general approach.

All 3 of find, operator[] and at are useful.
find is good if you don't want to accidentally insert elements, but merely act if they exist.
at is good if you expect that something should be in a map and you'd throw an exception if it wasn't anyway. It can also access const maps in a more concise manner than find (where you can't use op[])
op[] is good if you want to insert a default element, such as for the word counting program which puts an int 0 for every word encountered for the first time (with the idiom words[word]++;).
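The word-counting idiom mentioned above might look like this. It is a minimal sketch; the function name count_words and whitespace-based splitting are assumptions for the example.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical word counter: words[word] inserts the word with a count of 0
// the first time it is seen, and the increment then makes it 1.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> words;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++words[word];
    }
    return words;
}
```

Here the inserting behavior of operator[] is exactly what is wanted, which is why neither find nor at would be as convenient.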

This depends on what the requirements are for this function and how you are structuring the project. If you are supposed to return an object and you can't because it was not found, you have two options. You could throw an exception, or you could return some sort of sentinel that means nothing was found. If you want to throw an exception, use at(), as the exception will be thrown for you. If you do not want to throw an exception, use find() so you do not have to handle an exception just to return a sentinel object.

I think, it depends on your usecase. The return type of std::map::at() is an lvalue reference to the value of the found element, while std::map::find() returns an iterator. You might prefer
return myMap.at("asdf"s) + 42;
in expressions over the more elaborate
return myMap.find("asdf"s)->second + 42;
Whenever you use the result of std::map::at() in an expression, you expect the element to exist, and regard a missing element as an error. So an exception is a good choice to handle that.

I guess the difference is semantics.
std::map::at() looks like this on my machine:
mapped_type&
at(const key_type& __k)
{
    iterator __i = lower_bound(__k);
    if (__i == end() || key_comp()(__k, (*__i).first))
        __throw_out_of_range(__N("map::at"));
    return (*__i).second;
}
As you can see, it uses lower_bound, then checks for end(), compares keys, and throws the exception where needed.
find() looks like this:
iterator
find(const key_type& __x)
{ return _M_t.find(__x); }
where _M_t is a red-black tree that stores the actual data. Obviously, both function have the same (logarithmic) complexity. When you use find() + check for end(), you are doing almost the same thing that at does. I would say the semantic difference is:
use at() when you need an element at a specific location, and you assume that it is there. In this case, the situation of the element missing from the desired place is exceptional, thus at() throws an exception.
use find() when you need to find the element in the map. In this case the situation when the element is not present is normal. Also note that find() returns an iterator which you may use for purposes other than simply obtaining its value.

map::at() returns an lvalue reference, and when you return by reference, you can use all its available benefits such as method chaining.
example:
map<int,typ> table;
table[98]=a;
table[99]=b;
table.at(98)=table.at(99);
operator[] also returns the mapped value by reference, but it may insert a value if the searched-for key is not found, in which case the container size increases by one.
This calls for extra caution, because an accidental insertion silently changes the container's contents (for std::map, existing iterators and references do remain valid after an insertion).
Am I right that it is better to use map::find() when an element with the given key might not exist? And is map::find() the faster and more elegant approach?
Yes, semantically it makes sense to use find() when you are not sure the element exists. It makes the code easier to understand even for a newbie.
As for the time efficiency, map is generally implemented as a RB-tree/some balanced binary search tree and hence, complexity is O(logN) for find().
C++ Spec:
T& operator[](const key_type& x);
Effects: If there is no key equivalent to x in the map, inserts value_type(x, T()) into the map.
Requires: key_type shall be CopyInsertable and mapped_type shall be DefaultInsertable into *this.
Returns: A reference to the mapped_type corresponding to x in *this.
Complexity: Logarithmic.
T& at(const key_type& x);
const T& at(const key_type& x) const;
Returns: A reference to the mapped_type corresponding to x in *this.
Throws: An exception object of type out_of_range if no such element present.
Complexity: Logarithmic.

Access violation when assigning an uninitialized iterator

I have a problem with assigning an uninitialized iterator to an initialized one. The following code excerpt produces an access violation when built with Visual Studio 2010. In previous versions of Visual Studio the code worked.
#include <list>
int main() {
    std::list<int> list;
    std::list<int>::iterator it = list.begin();
    std::list<int>::iterator jt;
    it = jt; // crashes in VS 2010
}
Wouldn't this be considered valid C++?
I need this code to implement a "cursor" class that either points nowhere or to a specific element in a list. What else could I use as a value for an uninitialized iterator if I don't have a reference to my container yet?

it = jt; // crashes in VS 2010
This invokes undefined behaviour (UB). According to the C++ Standard, jt is a singular iterator which is not associated with any container, and the results of most expressions are undefined for singular iterators.
The section §24.1/5 from the C++ Standard (2003) reads (see the bold text specifically),
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding container. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. Iterators can also have singular values that are not associated with any container. [Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer.] Results of most expressions are undefined for singular values; the only exception is an assignment of a non-singular value to an iterator that holds a singular value. In this case the singular value is overwritten the same way as any other value. Dereferenceable values are always nonsingular.
If MSVS2010 crashes this, it is one of infinite possibilities of UB, for UB means anything could happen; the Standard doesn't prescribe any behavior.

C++11, 24.2.1/3:
Results of most expressions are undefined for singular values; the
only exceptions are destroying an iterator that holds a singular
value, the assignment of a non-singular value to an iterator that
holds a singular value, and, for iterators that satisfy the
DefaultConstructible requirements, using a value-initialized iterator
as the source of a copy or move operation.
The list is exhaustive, and your example isn't among the allowed exceptions. jt is singular and default-initialized, not value-initialized. Therefore it may not be used as the source of a copy operation.

You need a KNOWN value to use as a signal. You don't have that unless you have a container to get .end() from, which you think is your problem.
What you really need to do is get away from thinking that you can use 'special' iterator values for oddball cases that don't involve a container. Iterators, while they work a lot like pointers, are NOT pointers. They don't have the equivalent of 'NULL'.
Instead, use a boolean flag value to see if the container is set or not, and make sure the iterators (all of them, if you have more than one) get set to some valid value when the container becomes known, and the flag gets set back to false when you lose the container. Then you can check the flag before any iterator operations.
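The flag-based approach could be sketched like this. The class name Cursor and its members are hypothetical, and the sketch assumes the caller only ever passes a dereferenceable iterator (not end()) and resets the cursor before the element or container goes away.

```cpp
#include <list>

// Hypothetical cursor that either points nowhere or at one list element.
// The flag, not a special iterator value, records which state it is in.
class Cursor {
public:
    // Precondition (assumed): pos is dereferenceable, i.e. not end().
    void point_at(std::list<int>::iterator pos) {
        pos_ = pos;
        has_target_ = true;
    }

    void reset() { has_target_ = false; }

    bool has_target() const { return has_target_; }

    // Returns nullptr while the cursor points nowhere; the stored iterator
    // is only dereferenced when the flag says it holds a real position.
    int* get() { return has_target_ ? &*pos_ : nullptr; }

private:
    bool has_target_ = false;
    std::list<int>::iterator pos_{};  // value-initialized placeholder
};
```

The placeholder iterator is value-initialized so that copying a Cursor that points nowhere stays within what the C++11 wording quoted earlier allows.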

list.end() points past the end of the container, so we can consider it as pointing nowhere.
Also, accessing an uninitialized variable causes undefined behavior.

What is singular and non-singular values in the context of STL iterators?

The section §24.1/5 from the C++ Standard (2003) reads,
Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding container. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. Iterators can also have singular values that are not associated with any container. [Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer.] Results of most expressions are undefined for singular values; the only exception is an assignment of a non-singular value to an iterator that holds a singular value. In this case the singular value is overwritten the same way as any other value. Dereferenceable values are always nonsingular.
I couldn't really understand the text shown in bold?
What is singular value and nonsingular value? How are they defined? And where?
How and why dereferenceable values are always nonsingular?

If I understand this correctly, a singular value for an iterator is essentially the equivalent of an unassigned pointer. It's an iterator that hasn't been initialized to point anywhere and thus has no well-defined element it's iterating over. Declaring a new iterator that isn't set up to point to an element of a range, for example, creates that iterator as a singular iterator.
As the portion of the spec alludes to, singular iterators are unsafe and none of the standard iterator operations, such as increment, assignment, etc. can be used on them. All you can do is assign them a new value, hopefully pointing them at valid data.
I think the reason for having this definition is so that statements like
set<int>::iterator itr;
Can be permitted by the spec while having standardized meaning. The term "singular" here probably refers to the mathematical definition of a singularity, which is also called a "discontinuity" in less formal settings.

Iterators can also have singular values that are not associated with any container.
I suppose that's its definition.
How and why dereferenceable values are always nonsingular?
Because if they weren't, dereferencing them would be undefined behavior.

Have a look at What is an iterator's default value?.
As the quote indicates, singular values are iterator values that are not associated with any container. A singular value is almost useless: you can't advance it, dereference it, etc. One way (the only way?) of getting a singular iterator is by not initializing it, as shown in templatetypedef's answer.
One of the useful things you can do with a singular iterator, is assign it a non-singular value. When you do that you can do whatever else you want with it.
The non-singular values are, almost by definition, iterator values that are associated with a container. This answers why dereferenceable values are always non-singular: iterators that do not point to any container cannot be dereferenced (what element would this return?).
As Matthieu M. correctly noted, non-singular values may still be non-dereferenceable. An example is the past-the-end iterator (obtainable by calling container.end()): it is associated with a container, but still cannot be dereferenced.
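A small sketch of that distinction, with a hypothetical helper name (first_or): both begin() and end() are non-singular because they are tied to the container, so comparing them is fine, but only an iterator that differs from end() is dereferenced.

```cpp
#include <vector>

// Hypothetical helper: returns the first element, or fallback if the
// vector is empty.
int first_or(const std::vector<int>& v, int fallback) {
    std::vector<int>::const_iterator it = v.begin();  // non-singular
    if (it == v.end()) {
        // For an empty vector begin() equals end(): a valid value to
        // compare, but not a dereferenceable one.
        return fallback;
    }
    return *it;  // safe: it refers to an element of v
}
```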
I can't say where these terms are defined. However, Google has this to say about "define: singular" (among other definitions):
remarkable: unusual or striking
I suppose this can explain the terminology.

What is singular value and nonsingular value? How are they defined? And where?
Let us use the simplest incarnation of an Iterator: the pointer.
For a pointer:
the singular value alluded to is an uninitialized value.
a non-singular value is an explicitly initialized value; it may still not be dereferenceable (the past-the-end pointer shall not be dereferenced)
I would say that the NULL pointer is a singular value, though not the only one, since it represents the absence of value.
What is the equivalence for regular iterators ?
std::vector<int>::iterator it;, the default constructor of most iterators (those linked to a container) creates a singular value. Since it's not tied to a container, any form of navigation (increment, decrement, ...) is meaningless.
How and why dereferenceable values are always nonsingular ?
Singular values, by definition, represent the absence of a real value. They appear in many languages: Python's None, C#'s null, C's NULL, C++'s nullptr. The catch is that in C or C++, they may also be simple garbage... (whatever was there in memory before)
Is a default constructed iterator a singular value ?
Not necessarily, I guess. It is not required by the standard, and one could imagine the use of a sentinel object.
