What's this unexpected std::vector behavior? - c++

I found something surprising with std::vector that I thought I'd ask about here to hopefully get some interesting answers.
The code below simply copies a string into a char vector and prints the contents of the vector in two ways.
#include <vector>
#include <string>
#include <iostream>
int main()
{
    std::string s("some string");
    std::vector<char> v;
    v.reserve(s.size()+1);
    // copy using index operator
    for (std::size_t i=0; i<=s.size(); ++i)
        v[i] = s[i];
    std::cout << "&v[0]: " << &v[0] << "\n";
    std::cout << "begin/end: " << std::string(v.begin(), v.end()) << "\n";
    // copy using push_back
    for (std::size_t i=0; i<=s.size(); ++i)
        v.push_back(s[i]);
    std::cout << "&v[0]: " << &v[0] << "\n";
    std::cout << "begin/end: " << std::string(v.begin(), v.end()) << "\n";
    return 0;
}
Building and running this yields:
$ g++ main.cpp -o v && ./v
&v[0]: some string
begin/end:
&v[0]: some string
begin/end: some string
My expectation was that it would print the string correctly in both cases, but assigning character by character using the index operator doesn't print anything when later using begin() and end() iterators.
Why isn't end() updated when using []? If this is intentional, what's the reason it's working like this?
Is there a reasonable explanation for this behaviour? :)
I've only tried this with gcc 4.6.1 so far.

Typical example of Undefined Behavior.
You are only ever allowed to access elements by index (using operator[]) between 0 and v.size()-1 (inclusive).
Using reserve does not modify the size, only the capacity. Had you used resize instead, it would have worked as expected.

In the first case, you have undefined behaviour. reserve sets the capacity, but leaves the size as zero. Your loop then writes to invalid locations beyond the end of the vector. Printing using the (invalid) pointer appears to work (although there is no guarantee of that), since you've written the string to the memory that it points at; printing using the iterator range prints nothing, because the vector is still empty.
The second loop correctly increases the size each time, so that the vector actually contains the expected contents.
Why isn't end() updated when using []? If this is intentional, what's the reason it's working like this?
[] is intended to be as fast as possible, so it does no range checking. If you want a range check, use at(), which will throw an exception on an out-of-range access. If you want to resize the array, you have to do it yourself.
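A minimal sketch of the fix described above, assuming the goal is to copy by index: resize creates the elements that reserve only makes room for, and at() reports the out-of-range access that [] silently permits. The helper names copy_by_index and at_throws_after_reserve are hypothetical, and unlike the question's loop this version does not copy the terminating null character.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: copy a string into a char vector by index.
// resize() (not reserve()) creates the elements, so v[i] is valid.
std::vector<char> copy_by_index(const std::string& s)
{
    std::vector<char> v;
    v.resize(s.size());                 // size() == s.size() from here on
    for (std::size_t i = 0; i < s.size(); ++i)
        v[i] = s[i];                    // in range: 0 <= i < v.size()
    return v;
}

// Hypothetical helper: reports whether at() rejects index 0
// when only reserve() has been called.
bool at_throws_after_reserve(std::size_t n)
{
    std::vector<char> v;
    v.reserve(n + 1);                   // capacity grows, size stays 0
    try {
        v.at(0) = 'x';                  // range-checked access
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With "some string" as input, copy_by_index should return a vector of size 11 whose begin()/end() range reproduces the string, and at_throws_after_reserve should return true because the vector is still empty.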

Related

Switching Vectors Supplied Iterators

I am designing my own generic tree container and am using the STL as a reference. However, when implementing my iterator class I noticed something about the STL's use of iterators.
As an example, the std::vector class relies on iterators as arguments for many of its methods (e.g. erase(const_iterator position)).
This got me wondering: given two vectors of the same template type, what happens if the first vector's iterator is supplied to the second vector in a method call? To help answer this question I have put together a simple program to illustrate my thoughts.
// Example program
#include <iostream>
#include <string>
#include <vector>
#include <iomanip>
void printVec(const std::string &label, const std::vector<int> &vec){
    for (unsigned int i=0; i<vec.size(); i++){
        std::cout << ::std::setw(3) << vec[i] << ", ";
    }
    std::cout << std::endl;
}
int main()
{
    std::vector<int> test={0,1,2,3,4,5,6,7,8,9};
    std::vector<int> test2{10,11,12,13,14,15,16,17,18,19};
    std::vector<int>::iterator iter=test.begin();
    std::vector<int>::iterator iter2=test2.begin();
    printVec("One",test);
    printVec("Two",test2);
    for (int i=0; i<5; i++, iter++, iter2++);
    std::cout << "One Pos: " << *iter << std::endl;
    std::cout << "Two Pos: " << *iter2 << std::endl;
    test.erase(iter2); //Switching the iterators and their respective vectors
    test2.erase(iter); //Switching the iterators and their respective vectors
    printVec("One",test);
    printVec("Two",test2);
}
Running this program results in a seg. fault, which seems to indicate that this is undefined behavior. I hesitate to call this a flaw in the STL vector interface, but it sure seems that way.
So my question is this: is there any way to avoid this when designing my own container?
The iterator passed to a member function of a container must refer to an element within that container (or, in some cases, the past-the-end element returned by end()). If the iterator does not refer to the container you have Undefined Behavior.
There is no simple way to avoid that. About the closest you can come is to validate the iterators, which means you'd have to keep track of the container each iterator belongs to. This gets a bit complicated with some operations like swap or insert that don't invalidate existing iterators but leave them referring to the new container.
Some compilers, like Visual C++ when compiling in debug mode, can detect these sorts of problems at runtime and issue an appropriate notification.
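As a rough sketch of the "validate the iterators" idea above, a container can have each iterator remember which container created it and reject foreign iterators in erase. CheckedInts and its members are hypothetical names for illustration only; this is not how std::vector or any particular debug library is implemented, and it ignores the swap and move complications mentioned above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal container whose iterators remember their owner.
class CheckedInts {
public:
    class iterator {
    public:
        iterator(const CheckedInts* owner, std::size_t pos)
            : owner_(owner), pos_(pos) {}
    private:
        friend class CheckedInts;       // lets erase() inspect owner_ and pos_
        const CheckedInts* owner_;      // container that created this iterator
        std::size_t pos_;               // index of the element referred to
    };

    void push_back(int x) { data_.push_back(x); }
    std::size_t size() const { return data_.size(); }
    int operator[](std::size_t i) const { return data_[i]; }
    iterator begin() const { return iterator(this, 0); }
    iterator end() const { return iterator(this, data_.size()); }

    // Throws instead of invoking undefined behaviour when the iterator
    // belongs to another container or does not refer to an element.
    void erase(iterator it)
    {
        if (it.owner_ != this)
            throw std::invalid_argument("iterator belongs to another container");
        if (it.pos_ >= data_.size())
            throw std::out_of_range("iterator does not refer to an element");
        data_.erase(data_.begin() + static_cast<std::ptrdiff_t>(it.pos_));
    }

private:
    std::vector<int> data_;
};
```

Calling a.erase(b.begin()) on two distinct CheckedInts objects then throws std::invalid_argument rather than corrupting memory. The cost is an extra pointer per iterator and a comparison per call, which is presumably why such checks tend to live in debug builds.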

How does vector work while pushing back a reference of its element to itself?

I first declare a vector of strings called test. Then I push_back the strings Hello and World and let a be a reference to test[0]. Then I push_back a into test. However, I printed a before and after the push_back and observed that a became empty after being pushed into test. Why does a become empty? How does the vector work when pushing back a reference (a) to one of its own elements? Does that mean a is no longer a reference to test[0]?
Thanks.
Remark: If I push_back test[0], a also becomes empty. But test[0] is still "Hello".
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
    vector<string> test;
    test.push_back("Hello");
    test.push_back("World");
    string& a = test[0];
    cout << "1"<< a << "\n";
    test.push_back(a); //or : test.push_back(test[0]);
    cout << "2"<< a << "\n";
}
output:
1Hello
2
Update:
I got it, thanks to the answers and comments below. I printed the size and capacity of test and observed that they are both 2. When test.push_back(a) is performed, the vector test allocates new memory and copies its old elements to the new memory. Thus a, a reference to one of its old elements, is left dangling.
Here is similar code using reserve. I think the reason why a becomes invalid is the same as in my original question. (Let me know if I'm wrong.)
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
    vector<string> test;
    test.push_back("Hello");
    test.push_back("World");
    string& a = test[0];
    cout << "1"<< a << "\n";
    cout << "size:" << test.size() << " capacity:" <<test.capacity() <<"\n";
    test.reserve(3);
    cout << "2"<< a << "\n";
}
output:
1Hello
size:2 capacity:2
2
test.push_back(a); //or : test.push_back(test[0]);
Adds a copy of a as the last element of test. The fact that a is a reference to an element in test is not relevant at all.
After that call, a could be a dangling reference. Using it as you have in the line
cout << "2"<< a << "\n";
is cause for undefined behavior.
test[0], on the other hand, returns a reference to the first element of test. It could be a reference to a different object than what a references.
The string& a became undefined due to vector::push_back invalidating existing references to the container, per the std::vector definition. See http://en.cppreference.com/w/cpp/container/vector/push_back.
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
Since you don't know what the capacity of the vector before pushing a was, you can't be sure that your reference wasn't invalidated; this is undefined behavior that may "work" in some cases, but can do anything in others.
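Two ways to stay clear of the dangling reference, sketched under the assumption that the caller controls when references are taken: remember an index instead of a reference, or check that size() is still below capacity() before relying on a reference across push_back. Both helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends a copy of the first element and then
// reads the first element again. It remembers an index, not a
// reference, so a reallocation inside push_back cannot leave it dangling.
std::string first_after_self_push(std::vector<std::string>& v)
{
    const std::size_t first = 0;
    v.push_back(v[first]);
    return v[first];                    // re-index after the push_back
}

// Hypothetical helper: true when one more push_back cannot reallocate,
// so references to existing elements stay valid across it.
bool next_push_back_keeps_references(const std::vector<std::string>& v)
{
    return v.size() < v.capacity();
}
```

If a reference is really needed, one option is to call test.reserve(test.size() + 1) first and only then bind string& a = test[0]. The order matters because reserve itself may reallocate and invalidate references taken earlier, which is what the updated question observed.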

How can an iterator in C++ be printed?

Suppose, I have declared a vector in C++ like this:
vector<int>numbers = {4,5,3,2,5,42};
I can iterate it through the following code:
for (vector<int>::iterator it = numbers.begin(); it!=numbers.end(); it++){
// code goes here
}
Now consider the code inside the body of the for loop.
I can access and change any value using this iterator. Say I want to increase every value by 10 and then print it. The code would be:
*it+=10;
cout << *it << endl;
I can print the addresses of both the iterator and the elements being iterated over.
Address of iterator can be printed by:
cout << &it << endl;
Address of iterated elements can be printed by:
cout << &(*it) << endl;
But why can't the iterator itself be printed by doing the following?
cout << it <<endl;
At first I thought the convention came from Java, for security reasons. But if so, why can I print its address?
Is there any other way to do this? If not, why?
Yes, there is a way to do it!
You can't print the iterator itself because it is not defined to have a printable value.
But you can perform arithmetic on iterators, and that lets you print the position the iterator refers to.
Do the following.
cout << it - v.begin();
Example:
#include <iostream>
#include <algorithm>
#include <vector>
#include <iterator>
using namespace std;
int main () {
    vector<int> v = {20,3,98,34,20,11,101,201};
    sort (v.begin(), v.end());
    vector<int>::iterator low,up;
    low = lower_bound (v.begin(), v.end(), 20);
    up = upper_bound (v.begin(), v.end(), 20);
    std::cout << "lower_bound at position " << (low - v.begin()) << std::endl;
    std::cout << "upper_bound at position " << (up - v.begin()) << std::endl;
    return 0;
}
Output of the above code:
lower_bound at position 2
upper_bound at position 4
Note: this is just a way to get things done; I am in no way claiming that the iterator itself can be printed.
...
There is no predefined output operator for the standard iterators because there is no conventional meaning of printing an iterator. What would you expect such an operation to print? While you seem to expect to see the address of the object the iterator refers to, I find that not clear at all.
There is no universal answer to that, so the committee decided not to add those operators. (The last half sentence is a guess; I am not part of the committee.)
If you want to print those iterators, I would define a function like print(Iterator); (or something like this, whatever fits your needs) that does what you want. I would not add an operator << for iterators for the reason I mentioned above.
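One possible shape for such a function, as a sketch only: it reports the iterator's position and the value it refers to, which is one interpretation of "printing an iterator" among several. The name describe is hypothetical, and the function assumes the iterator is dereferenceable and belongs to the container passed alongside it.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders an iterator as "index N -> value".
// Assumes `it` refers to an element of `c` (not end(), not another container).
template <typename Container>
std::string describe(const Container& c, typename Container::const_iterator it)
{
    std::ostringstream out;
    out << "index " << std::distance(c.begin(), it) << " -> " << *it;
    return out.str();
}
```

For the vector from the question, std::cout << describe(numbers, it) inside the loop would print the position and the element on each pass. Returning a string instead of overloading operator<< keeps the choice of representation explicit at the call site.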
why the iterator itself could not printed by doing the following?
Because it is not defined to have a printable value.
Is there any other way to do this?
Basically, the compiler does not provide this by default. You could try to edit the compiler's code, but that would be a daunting undertaking!
If not, why?
Because it has no well-defined way to express it.

Odd values printed when dereferencing the end iterator of a vector

I have a container storing {1,2,3,4,5} (a std::set in the code below). I tried to print *(set1.end()) and got back the result 6. I don't know how to explain this. Similarly, dereferencing the result of set1.find(500) gave 6. Why am I getting this number?
#include<iostream>
#include<iterator>
#include<set>
#include<map>
int main()
{
    int a[] = {1,2,3,4,5};
    std::set<int> set1(a,a+sizeof(a)/sizeof(int));
    for (std::set<int>::iterator itr=set1.begin();itr!=set1.end();++itr){
        std::cout << *itr << std::endl;
    }
    //std::pair<std::set<int>::iterator, bool> ret;
    //ret = set1.insert(1);
    //std::cout << *(ret.first) << "first;second" << ret.second << std::endl;
    std::set<int>::iterator itr1 = set1.begin();
    set1.insert(itr1,100);
    std::advance(itr1,3);
    std::cout << *itr1 << std::endl;
    std::cout << *(set1.find(500)) << std::endl;
    std::cout << *(set1.end()) << std::endl;
}
This line invokes undefined behavior:
std::cout << *(set1.end()) << std::endl;
It is undefined behavior to dereference the end() iterator. Thus anything can be expected.
In C++ containers, the end iterator gives an iterator one past the end of the elements of the container. It's not safe to dereference the iterator because it's not actually looking at an element. You get undefined behavior if you try to do this - it might print something sensible, but it might just immediately crash the program.
Hope this helps!
Never dereference end() of any STL container, because it does not refer to valid data. It always refers to the position one past the last element. Use end() only to check whether your iterator has reached the end.
vec.end() does not point to the last element, but somewhat "behind" the last one.
You are not accessing the last element in the vector. Instead you are dereferencing an "invalid" iterator, which is undefined behaviour and turns out to be an invalid index in the vector in this case.
find returns the end iterator if the searched element cannot be found.
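A small sketch of the usual pattern, assuming a std::set<int> as in the question's code: compare the result of find against end() before dereferencing it. The helper name find_or and its fallback parameter are hypothetical.

```cpp
#include <set>

// Hypothetical helper: returns the stored value equal to `key`,
// or `fallback` when `key` is not in the set.
int find_or(const std::set<int>& s, int key, int fallback)
{
    std::set<int>::const_iterator it = s.find(key);
    if (it == s.end())
        return fallback;                // not found: never dereference end()
    return *it;                         // found: safe to dereference
}
```

With the set from the question, find_or(set1, 500, -1) returns -1 instead of reading whatever happens to sit behind the end iterator.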

Unexpected behavior using iterators with nested vectors

This sample program gets an iterator to an element of a vector contained in another vector. I add another element to the containing vector and then print out the value of the previously obtained iterator:
#include <vector>
#include <iostream>
int main(int argc, char const *argv[])
{
    std::vector<std::vector<int> > foo(3, std::vector<int>(3, 1));
    std::vector<int>::iterator foo_it = foo[0].begin();
    std::cout << "*foo_it: " << *foo_it << std::endl;
    foo.push_back(std::vector<int>(3, 2));
    std::cout << "*foo_it: " << *foo_it << std::endl;
    return 0;
}
Since the vector corresponding to foo_it has not been modified I expect the iterator to still be valid. However when I run this code I get the following output (also on ideone):
*foo_it: 1
*foo_it: 0
For reference I get this result using g++ versions 4.2 and 4.6 as well as clang 3.1. However I get the expected output with g++ using -std=c++0x (ideone link) and also with clang when using both -std=c++0x and -stdlib=libc++.
Have I somehow invoked some undefined behavior here? If so, is this now defined behavior in C++11? Or is this simply a compiler/standard library bug?
Edit: I can see now that in C++03 the iterators are invalidated since the vector's elements are copied on reallocation. However I would still like to know if this would be valid in C++11 (i.e. are the vector's elements guaranteed to be moved instead of copied, and will moving a vector not invalidate its iterators).
push_back invalidates iterators, simple as that.
std::vector<int>::iterator foo_it = foo[0].begin();
foo.push_back(std::vector<int>(3, 2));
After this, foo_it is no longer valid. Any insert/push_back has the potential to internally re-allocate the vector.
Since the vector corresponding to foo_it has not been modified
Wrong. The push_back destroyed the vector corresponding to foo_it. foo_it became invalid when foo[0] was destroyed.
I guess the misperception is that vector< vector < int > > is a vector of pointers and that when the outer one is reallocated, the pointers to the inner ones are still valid, which is true for int**. But instead, reallocating the outer vector also copies all the inner vectors into new storage (at least in C++03), which makes the inner iterator invalid as well.
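One way to keep foo_it usable, sketched under the assumption that the number of rows is known in advance: reserve capacity in the outer vector before taking the iterator, so that push_back does not reallocate and the inner vectors stay where they are. The helper name push_row_without_reallocation is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: appends a row only when the outer vector has spare
// capacity, so iterators into existing rows remain valid.
// Returns false (and appends nothing) when push_back would reallocate.
bool push_row_without_reallocation(std::vector<std::vector<int> >& rows,
                                   const std::vector<int>& row)
{
    if (rows.size() == rows.capacity())
        return false;                   // no room: push_back would reallocate
    rows.push_back(row);                // size stays <= capacity: no reallocation
    return true;
}
```

In the question's program this would mean calling foo.reserve(foo.size() + 1) before foo[0].begin() is taken. When the final size is not known up front, re-acquiring the iterator after each push_back, or storing a (row, column) index pair instead, avoids the problem entirely.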