When a std::set<>::iterator is uninitialized, it is not equal to any other iterator in the set, but it is equal to other uninitialized iterators.
Is this a GCC-specific implementation? (Is the uninitialized iterator actually initialized to an invalid value?)
#include <stdio.h>
#include <iostream>
#include <set>
#include <vector>
int main()
{
    std::set<int> s;
    std::set<int>::reverse_iterator inv = s.rend();
    std::cout << (inv == s.rend()) << "\n";
    std::cout << (inv == s.rbegin()) << "\n";
    s.insert(5);
    std::cout << (inv == s.rend()) << "\n";
    std::cout << (inv == s.rbegin()) << "\n";
    // invalidate
    inv = std::set<int>::reverse_iterator();
    std::cout << (inv == s.rend()) << "\n";
    std::cout << (inv == s.rbegin()) << "\n";
    auto inv2 = std::set<int>::reverse_iterator();
    std::cout << (inv == inv2) << "!!!\n";
    return 0;
}
prints:
1
1
0
1
0
0
1!!!
Live example: https://onlinegdb.com/r1--46u_B
How come 2 uninitialized std::set::iterator are equal?
They are not uninitialised. They are value initialised. A value initialised iterator is singular: it does not point to any container.
Behaviour of reading an uninitialised value would be undefined, but that's not what you do in the program.
it is not equal to any other iterator in the set
Comparison between input iterators is only defined for iterators into the same range. A singular iterator does not point into the same range as any non-singular iterator, so the comparison is undefined.
but it is equal to other uninitialized iterators.
Two value-initialised iterators of the same type always compare equal.
Is this a GCC-specific, non-portable implementation?
Comparing a singular iterator to a non-singular one is undefined. Undefined behaviour is generally "non-portable", even within the same compiler version (unless the compiler specifies the behaviour, in which case it is still non-portable to other compilers).
Comparison of value-initialised iterators has been standard since C++14, for all forward iterators.
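A minimal sketch of the cases this answer distinguishes, assuming a C++14 or later compiler. The helper names are hypothetical and not from the original post. The first two functions perform only the comparison the standard defines (value-initialised against value-initialised); the third shows a portable way to express "no position" by comparing against end() of the same container.

```cpp
#include <set>

// Hypothetical helper: compares two value-initialised iterators of the same
// type. Since C++14 this comparison is well defined and yields true.
bool value_initialised_iterators_equal() {
    std::set<int>::iterator a{};
    std::set<int>::iterator b{};
    return a == b;
}

// The same check for the reverse_iterator type used in the question.
bool value_initialised_reverse_iterators_equal() {
    std::set<int>::reverse_iterator a{};
    std::set<int>::reverse_iterator b{};
    return a == b;
}

// Portable "not found" test: the result of find() is compared only against
// end() of the very same set, so both iterators belong to one range.
bool contains(const std::set<int>& s, int value) {
    return s.find(value) != s.end();
}
```

Comparing either value-initialised iterator against s.begin() or s.end() is deliberately left out, because that is the undefined case.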
the question is: Is this GCC-specific? I am NOT asking if this is UB (it is!)
UB is UB.
By definition, the result you see is potentially unique to that particular run of code on that particular day.
Even if no other implementation ever manifested this behaviour, that doesn't matter, because you can't even rely on it on yours.
Worse, by relying on it, you are breaking a contract, and other parts of your program can break as a result.
So:
Is this a GCC-specific, non-portable implementation?
Yes.
I recently started learning about C++ iterators and pointers, and while messing around with some basic exercises I came upon a situation which I think is really unusual.
#include <iostream>
#include <vector>
#include <time.h>
#include <list>
#include <array>
using namespace std;
template<typename ITER>
void print_with_hyphens(ITER begin, ITER end){
    cout << "End: " << *(end) << endl;
    cout << "Begin: " << *begin << endl;
    for(ITER it = begin; it != end; it++){
        cout << *it << endl;
    }
    cout << endl << "Finished" << endl;
}
int main()
{
    vector<int> v { 1, 2, 3, 4, 5};
    list<int> l { 1, 2, 3, 4, 5};
    print_with_hyphens(v.begin(), v.end());
    print_with_hyphens(l.begin(), l.end());
    // print_with_hyphens(a.begin(), a.end());
    return 0;
}
And when I run it like this, I get this unusual result:
[Image: results of the code]
Now, the vector is returning a weird (random, if I'm not mistaken) value, because it's trying to access a value that doesn't exist, hence, "past the end" iterator.
And it should be the same for lists, yet, the list is returning the value 5. Shouldn't it also return a "past the end" iterator?
Things such as dereferencing an invalid iterator or accessing an out-of-bounds array index produce undefined behavior.
This means the C++ standard does not specify what should happen if you do it. Anything might happen, such as a segmentation fault, or getting a random value, depending on things like your standard library implementation and compiler.
Needless to say, programs should not rely on undefined behavior.
The phrase "past-the-end" in this context is abstract. It means the iterator is off the end of the logical sequence of elements in the container. It does not mean there is some literal bit of data located just after the container in memory that you can access and read.
Because it's "past-the-end" and doesn't refer to any actual element, dereferencing the end iterator is not permitted. By doing so, you get weird behaviours.
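If the goal is to read the last element rather than the end position, one option is to step back from end() after checking that the range is not empty. The following is a sketch only; last_element and demo are hypothetical names, and the helper assumes bidirectional iterators (which both std::vector and std::list provide).

```cpp
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: copies the last element of [begin, end) into `out`
// and returns true, or returns false when the range is empty.
// Requires bidirectional iterators, because std::prev steps backwards.
template <typename Iter>
bool last_element(Iter begin, Iter end,
                  typename std::iterator_traits<Iter>::value_type& out) {
    if (begin == end) {
        return false;        // empty range: there is nothing to read
    }
    out = *std::prev(end);   // the element just before the end iterator
    return true;
}

// Example use with the two containers from the question.
bool demo() {
    std::vector<int> v{1, 2, 3, 4, 5};
    std::list<int> l{1, 2, 3, 4, 5};
    int from_vector = 0;
    int from_list = 0;
    return last_element(v.begin(), v.end(), from_vector) &&
           last_element(l.begin(), l.end(), from_list) &&
           from_vector == 5 && from_list == 5;
}
```

The end iterator itself is only ever compared or decremented here, never dereferenced.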
cppreference has this note for std::vector::data:
Returns pointer to the underlying array serving as element storage. The pointer is such that range [data(); data() + size()) is always a valid range, even if the container is empty.
What does "valid range" mean here exactly? What will data() return if the vector is zero-length?
Specifically, for a zero-length vector:
Can data() ever be a null pointer?
Can it be safely dereferenced? (Even if it points to junk.)
Is it guaranteed to be different between two different (zero-length) vectors?
I am working with a C library that takes arrays and won't allow a null pointer even for a zero-length array. However, it does not actually dereference the array storage pointer if the array length is zero, it just checks whether it is NULL. I want to make sure that I can safely pass data() to this C library, so the only relevant question is (1) above. (2) and (3) are just out of curiosity in case a similar situation comes up.
Update
Based on comments that were not turned into answers, we can try the following program:
#include <iostream>
#include <vector>
using namespace std;
int main() {
    vector<int> v;
    cout << v.data() << endl;
    v.push_back(1);
    cout << v.data() << endl;
    v.pop_back();
    cout << v.data() << endl;
    v.shrink_to_fit();
    cout << v.data() << endl;
    return 0;
}
With my compiler it output:
0x0
0x7f896b403300
0x7f896b403300
0x0
This shows that:
data() can indeed be a null pointer, thus the answers are (1) yes (2) no (3) no
but it is not always a null pointer for a zero-size vector
Yes, obviously I should have tried this before asking.
"valid range" is defined by [iterator.requirements.general]/7 (C++14):
"Range [i,j) is valid if and only if j is reachable from i".
Luckily, C++ defines that adding 0 to a null pointer yields a null pointer. So, is a null pointer reachable from a null pointer? This is defined by point 6 of the same section:
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j.
A zero-length sequence is a finite sequence, therefore data() may return a null pointer.
Accordingly the answers to your questions are:
Can data() ever be a null pointer?
Yes
Can it be safely dereferenced? (Even if it points to junk.)
No
Is it guaranteed to be different between two different (zero-length) vectors?
No
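For the C library case in the question, one possible workaround is to substitute the address of a dummy object whenever the vector is empty. This is only a sketch: non_null_data is a hypothetical helper, and it assumes the C function takes a const int* and really never reads through the pointer when the length is zero, as the question states.

```cpp
#include <vector>

// Hypothetical wrapper for a C API that rejects NULL even for length 0.
// Returns v.data() for a non-empty vector and the address of a static dummy
// element otherwise. The dummy exists only to provide a non-null address.
const int* non_null_data(const std::vector<int>& v) {
    static const int dummy = 0;
    return v.empty() ? &dummy : v.data();
}
```

For a non-empty vector the result is the real storage, so the C function sees the actual elements; only the zero-length case is redirected.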
Too long for a comment so posting here.
I expected the iterators to be nullptr for an empty sequence so I tested it.
#include <iostream>
#include <vector>
void pr(std::vector<int>& v){
    std::cout << &*v.begin() << ", " << &*v.end() << "\n";
}
// technically UB, but for this experiment I don't feel too bad about it.
// Thanks #Revolver
int main(int argc, char** argv) {
    std::vector<int> v1;
    std::vector<int> v2;
    pr(v1);
    pr(v2);
    return 0;
}
And this does indeed print
0, 0
0, 0
Now for an empty container the only reasonable operation for a valid range is begin() == end(). And no, junk can't be dereferenced so *v.begin() is not a concern.
From the standard:
23.3.6.4 [vector.data]
T* data() noexcept;
const T* data() const noexcept;
Returns: A pointer such that [data(),data() + size()) is a valid range. For a
non-empty vector, data() == &front().
So it's allowed to be null for an empty vector, but it is not necessarily dereferenceable or unique.
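To illustrate what "valid range" buys you in practice, here is a small sketch (sum_via_data is a hypothetical name) that walks [data(), data() + size()) with raw pointers. For an empty vector the loop body never runs, so the code stays well defined whether data() is null or not.

```cpp
#include <vector>

// Sums the elements by iterating the pointer range [data(), data() + size()).
long sum_via_data(const std::vector<int>& v) {
    const int* first = v.data();
    const int* last = first + v.size();  // adding 0 to a null pointer is allowed
    long total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;  // reached only when the range is non-empty
    }
    return total;
}
```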
I have a vector storing {1,2,3,4,5}. I tried to print *(vec.end()) and got back the result 6. I don't know how to explain this. Similarly, calling vec.find(500) gave the result 6. Why am I getting this number?
#include<iostream>
#include<iterator>
#include<set>
#include<map>
int main()
{
    int a[] = {1,2,3,4,5};
    std::set<int> set1(a,a+sizeof(a)/sizeof(int));
    for (std::set<int>::iterator itr=set1.begin();itr!=set1.end();++itr){
        std::cout << *itr << std::endl;
    }
    //std::pair<std::set<int>::iterator, bool> ret;
    //ret = set1.insert(1);
    //std::cout << *(ret.first) << "first;second" << ret.second << std::endl;
    std::set<int>::iterator itr1 = set1.begin();
    set1.insert(itr1,100);
    std::advance(itr1,3);
    std::cout << *itr1 << std::endl;
    std::cout << *(set1.find(500)) << std::endl;
    std::cout << *(set1.end()) << std::endl;
}
This line invokes undefined behavior:
std::cout << *(set1.end()) << std::endl;
It is undefined behavior to dereference the end() iterator. Thus anything can be expected.
In C++ containers, the end iterator gives an iterator one past the end of the elements of the container. It's not safe to dereference the iterator because it's not actually looking at an element. You get undefined behavior if you try to do this - it might print something sensible, but it might just immediately crash the program.
Hope this helps!
Never dereference end() of any STL container, because it does not refer to valid data. It denotes the position just past the last element. Use end() only to check whether your iterator has reached the end.
vec.end() does not point to the last element, but somewhat "behind" the last one.
You are not accessing the last element in the vector. Instead you are dereferencing an "invalid" iterator, which is undefined behaviour and turns out to be an invalid index in the vector in this case.
vec.find returns the end iterator if the searched element can not be found.
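A short sketch of how the result of find() can be checked before it is dereferenced, and how the largest element of a set can be read without touching end(). The function names lookup and largest are hypothetical.

```cpp
#include <set>

// Copies the element equal to `key` into `out` and returns true,
// or returns false when find() reports "not found" by returning end().
bool lookup(const std::set<int>& s, int key, int& out) {
    std::set<int>::const_iterator it = s.find(key);
    if (it == s.end()) {
        return false;   // never dereference the end iterator
    }
    out = *it;
    return true;
}

// Copies the largest element into `out`, or returns false for an empty set.
// rbegin() refers to the last real element, unlike end().
bool largest(const std::set<int>& s, int& out) {
    if (s.empty()) {
        return false;
    }
    out = *s.rbegin();
    return true;
}
```

With the set from the question (1 to 5 plus the inserted 100), lookup for 500 returns false instead of reading through end().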
I want to construct nested loops over arrays of objects, having a rather complex data structure. Because I use arrays, I want to make use of their iterators. After I got unexpected results I boiled down the problem to the following code snippet, that shows my iterators to be equal when I expect them to be different:
vector<int> intVecA;
vector<int> intVecB;
intVecA.push_back(1);
intVecA.push_back(2);
intVecB.push_back(5);
intVecB.push_back(4);
Foo fooOne(intVecA);
Foo fooTwo(intVecB);
vector<int>::const_iterator itA = fooOne.getMyIntVec().begin();
vector<int>::const_iterator itB = fooTwo.getMyIntVec().begin();
cout << "The beginnings of the vectors are different: "
<< (fooOne.getMyIntVec().begin() == fooTwo.getMyIntVec().begin()) << endl;
cout << (*(fooOne.getMyIntVec().begin()) == *(fooTwo.getMyIntVec().begin())) << endl;
cout << (&(*(fooOne.getMyIntVec().begin())) == &(*(fooTwo.getMyIntVec().begin()))) << endl;
cout << "But the iterators are equal: "
<< (itA==itB) << endl;
This produces:
The beginnings of the vectors are different: 0
0
0
But the iterators are equal: 1
This behaviour does not make sense to me and I'd be happy about hearing an explanation.
Foo is a simple object containing a vector and getter function for it:
class Foo {
public:
    Foo(std::vector<int> myIntVec);
    std::vector<int> getMyIntVec() const {
        return _myIntVec;
    }
private:
    std::vector<int> _myIntVec;
};
Foo::Foo(std::vector<int> myIntVec) {
    _myIntVec = myIntVec;
}
When first copying the vectors the problem vanishes. Why?
vector<int> intVecReceiveA = fooOne.getMyIntVec();
vector<int> intVecReceiveB = fooTwo.getMyIntVec();
vector<int>::const_iterator newItA = intVecReceiveA.begin();
vector<int>::const_iterator newItB = intVecReceiveB.begin();
cout << "The beginnings of the vectors are different: "
<< (intVecReceiveA.begin() == intVecReceiveB.begin()) << endl;
cout << "And now also the iterators are different: "
<< (newItA==newItB) << endl;
produces:
The beginnings of the vectors are different: 0
And now also the iterators are different: 0
Further notes:
I need these nested loops in functions which need to be extremely efficient regarding computation time, thus I would not want to do unnecessary operations. Since I'm new to c++ I do not know whether copying the vectors would actually take additional time or whether they would be copied internally anyway. I'm also thankful for any other advice.
The problem is your accessor in Foo:
std::vector<int> getMyIntVec() const {
    return _myIntVec;
}
It doesn't return _myIntVec itself; it returns a copy of _myIntVec.
Instead it should look like:
const std::vector<int>& getMyIntVec() const {
    return _myIntVec;
}
Otherwise each call returns a temporary copy, and an iterator created from it dangles as soon as that temporary is destroyed at the end of the statement. The freed storage is then likely to be reused for the next temporary, which is why you get "equal" iterators, at least I think so.
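Here is a compact sketch of the class with the reference-returning getter, plus two hypothetical helper functions (getter_returns_same_object and sum_elements) that show why iterators obtained through it can be compared and used safely. The member-initialiser constructor is a small variation on the original, not something the question's code had.

```cpp
#include <utility>
#include <vector>

class Foo {
public:
    explicit Foo(std::vector<int> myIntVec) : _myIntVec(std::move(myIntVec)) {}

    // Returns a reference to the member, so no copy is made per call.
    const std::vector<int>& getMyIntVec() const { return _myIntVec; }

private:
    std::vector<int> _myIntVec;
};

// Both calls refer to the same vector object inside foo.
bool getter_returns_same_object(const Foo& foo) {
    return &foo.getMyIntVec() == &foo.getMyIntVec();
}

// begin() and end() now come from the same container, so the loop condition
// compares iterators of one range, which is well defined.
int sum_elements(const Foo& foo) {
    int total = 0;
    for (std::vector<int>::const_iterator it = foo.getMyIntVec().begin();
         it != foo.getMyIntVec().end(); ++it) {
        total += *it;
    }
    return total;
}
```

Returning a const reference also addresses the efficiency concern in the question, since the vector is no longer copied on every access.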
You realize that you compare things the wrong way round? If you compare a == b, even if you write
cout << "a is different from b: " << (a==b) << endl;
The output will tell if the two elements are the same not different. To check if two things are different use != instead of ==.
The reason for this is that it is undefined behaviour to compare two iterators which refer to elements in different containers. So, there is no guarantee what you will get. This comes from the fact that getMyIntVec returns a copy of _myIntVec and you assign these copies to new instances of vector<int>, so these are indeed iterators of two different copies of the _myIntVec member.
According to the standard:
§ 24.2.1
An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of
the expression ++i that makes i == j. If j is reachable from i, they refer to elements of the same sequence.
and a bit later in the standard:
§ 24.2.5
The domain of == for forward iterators is that of iterators over the same underlying sequence.
This has already been answered in this question
You have a serious logic problem here:
cout << "The beginnings of the vectors are different: "
<< (fooOne.getMyIntVec().begin() == fooTwo.getMyIntVec().begin()) << endl;
If they are equal, this prints 1, not the 0 that the label leads you to expect.
I found something surprising with std::vector that I thought I'd ask about here to hopefully get some interesting answers.
The code below simply copies a string into a char vector and prints the contents of the vector in two ways.
#include <vector>
#include <string>
#include <iostream>
int main()
{
    std::string s("some string");
    std::vector<char> v;
    v.reserve(s.size()+1);
    // copy using index operator
    for (std::size_t i=0; i<=s.size(); ++i)
        v[i] = s[i];
    std::cout << "&v[0]: " << &v[0] << "\n";
    std::cout << "begin/end: " << std::string(v.begin(), v.end()) << "\n";
    // copy using push_back
    for (std::size_t i=0; i<=s.size(); ++i)
        v.push_back(s[i]);
    std::cout << "&v[0]: " << &v[0] << "\n";
    std::cout << "begin/end: " << std::string(v.begin(), v.end()) << "\n";
    return 0;
}
Building and running this yields:
$ g++ main.cpp -o v && ./v
&v[0]: some string
begin/end:
&v[0]: some string
begin/end: some string
My expectation was that it would print the string correctly in both cases, but assigning character by character using the index operator doesn't print anything when later using begin() and end() iterators.
Why isn't end() updated when using []? If this is intentional, what's the reason it's working like this?
Is there a reasonable explanation for this behaviour? :)
I've only tried this with gcc 4.6.1 so far.
Typical example of Undefined Behavior.
You are only ever allowed to access elements by index (using operator[]) between 0 and v.size()-1 (inclusive).
Using reserve does not modify the size, only the capacity. Had you used resize instead, it would have worked as expected.
In the first case, you have undefined behaviour. reserve sets the capacity, but leaves the size as zero. Your loop then writes to invalid locations beyond the end of the vector. Printing using the (invalid) pointer appears to work (although there is no guarantee of that), since you've written the string to the memory that it points at; printing using the iterator range prints nothing, because the vector is still empty.
The second loop correctly increases the size each time, so that the vector actually contains the expected contents.
Why isn't end() updated when using []? If this is intentional, what's the reason it's working like this?
[] is intended to be as fast as possible, so it does no range checking. If you want a range check, use at(), which will throw an exception on an out-of-range access. If you want to resize the array, you have to do it yourself.
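The difference between reserve, resize and at() can be sketched as follows. The function names are hypothetical; the first shows the index-based copy made valid by resize, the second shows at() reporting the out-of-range access that operator[] silently allows.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// resize() changes size(), so every index written below is in range.
std::vector<char> copy_with_resize(const std::string& s) {
    std::vector<char> v;
    v.resize(s.size() + 1);            // size() is now s.size() + 1
    for (std::size_t i = 0; i < s.size(); ++i) {
        v[i] = s[i];
    }
    v[s.size()] = '\0';                // terminator, as in the question
    return v;
}

// reserve() changes only the capacity, so the vector is still empty and
// at(0) throws std::out_of_range instead of writing past the end.
bool at_throws_after_reserve() {
    std::vector<char> v;
    v.reserve(16);
    try {
        v.at(0) = 'x';
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

After copy_with_resize, begin() and end() span the copied characters, because the size was set before the writes.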