How do iterators update after vector reallocation - c++

Here is a code snippet that I was looking at:
vector<int> iv = {1,2,3,4,5,6,7,8,9,10};
auto iter = iv.begin(), mid = iv.begin() + iv.size()/2;
for(int count = 100; count; --count ) {
iter = iv.insert(iter, - 1);
cout << "capacity = " << iv.capacity() << "*mid = " << *mid << endl;
}
As per iterator invalidation rules:
vector: all iterators and references before the point of insertion are unaffected, unless the new container size is greater than the previous capacity (in which case all iterators and references are invalidated)[23.2.4.3/1] Iterator invalidation rules
I understand that since I am reassigning the value of "iter" at each insert operation, perhaps I am able to maintain it's validity (please correct me if I am wrong). However, the iterator "mid" remain valid in this case even when I am not tampering with it in the loop and also when the capacity of the vector is changing.
So, how is "mid" able to update itself after reallocation ?
To know whether mid is changing at all or not, I changed line 4 in the code to:
iv.insert(iter, -1); // Did not assign it back to iter.
Printing the results of dereferencing the value at mid suggests the change and perhaps also that iter is invalidated. (Again, please correct me if I am wrong).

Your understanding is correct. Once the capacity increases, any iterator becomes invalid. The mid iterator becomes invalid even when capacity didn't changes but it basically points to the previous element.
So the original code would work at least for iter, however mid would become unusable upon first insertion. With the modification the code is completely invalid.
Usually the implementation of vector iterator is just a simple pointer that points to some element to backing array. Therefore when capacity changes and array is reallocated, any such iterator is no longer valid as the pointer points to no longer valid memory. As a result, you may see either garbage, you may get segmentation fault or randomly see correct values. When capacity doesn't change, the elements in the array may be moved forward so you could see the previous element in the iterators after the insert point, but only in case there was no empty elements at the beginning of array (e.g. start was greater then zero). But all those are specific to implementation and therefore standard clearly specifies that most of the above is undefined behavior.

Related

C++ Keeping track of start iterator while adding items

I am trying to do a double loop across a std::vector to explore all combinations of items in the vector. If the result is good, I add it to the vector for another pass. This is being used for an association rule problem but I made a smaller demonstration for this question. It seems as though when I push_back it will sometimes change the vector such that the original iterator no longer works. For example:
std::vector<int> nums{1,2,3,4,5};
auto nextStart = nums.begin();
while (nextStart != nums.end()){
auto currentStart = nextStart;
auto currentEnd = nums.end();
nextStart = currentEnd;
for (auto a = currentStart; a!= currentEnd-1; a++){
for (auto b = currentStart+1; b != currentEnd; b++){
auto sum = (*a) + (*b);
if (sum < 10) nums.push_back(sum);
}
}
}
On some iterations, currentStart points to a location that is outside the array and provides garbage data. What is causing this and what is the best way to avoid this situation? I know that modifying something you iterate over is an invitation for trouble...
nums.push_back(sum);
push_back invalidates all existing iterators to the vector if push_back ends up reallocating the vector.
That's just how the vector works. Initially some additional space gets reserved for the vector's growth. Vector's internal buffer that holds its contents has some extra room to spare, but when it is full the next call to push_back allocates a bigger buffer to the vector's contents, moves the contents of the existing buffer, then deletes the existing buffer.
The shown code creates and uses iterators for the vector, but any call to push_back will invalidate the whole lot, and the next invalidated vector dereference results in undefined behavior.
You have two basic options:
Replace the vector with some other container that does not invalidate its existing iterators, when additional values get added to the iterator
Reimplement the entire logic using vector indexes instead of iterators.

does shrink_to_fit() function removes null pointers?

I wanted to ask you about the vector::shrink_to_fit() function.
Lets say i've got a vector of pointers to objects (or unique_ptr in my case)
and i want to resize it to the amount of objects that it stores.
At some point i remove some of the objects from the vector by choice using the release() function of unique_ptr
so there is a null pointer in that specific place in the vector as far as i know.
So i want to resize it and remove that null pointer in between the elements of the vector and i'm asking if i could do that with shrink_to_fit() function?
No, shrink_to_fit does not change the contents or size of the vector. All it might do is release some of its internal memory back to a lower level library or the OS, etc. behind the scenes. It may invalidate iterators, pointers, and references, but the only other change you might see would be a reduction of capacity(). It's also valid for shrink_to_fit to do absolutely nothing at all.
It sounds like you want the "Erase-remove" idiom:
vec.erase(std::remove(vec.begin(), vec.end(), nullptr), vec.end());
The std::remove shifts all the elements which don't compare equal to nullptr left, filling the "gaps". But it doesn't change the vector's size; instead it returns an iterator to the position in the vector just after the sequence of shifted elements; the rest of the elements still exist but have been moved from. Then the erase member function gets rid of those unnecessary end elements, reducing the vector's size.
Or as #chris notes, C++20 adds an erase overload to std::vector and a related erase_if, which makes things easier. They may already be supported in MSVC 2019. Using the new erase could just look like:
vec.erase(nullptr);
This quick test show that u can't do like this.
int x = 1;
vector<int*> a;
cout << a.capacity() << endl;
for (int i = 0; i < 10; ++i) {
a.push_back(&x);
}
cout << a.capacity() << endl;
a[9] = nullptr;
a.shrink_to_fit();
cout << a.capacity() << endl;
Result:
0
16
10
m_gates[index].release(); m_gates.shrink_to_fit();
Based on your comment, what you're looking for is simply to erase this single element from your vector right then and there. Replace both of these statements with:
m_gates.erase(m_gates.begin() + index);
Or a more generic version if swapping containers in the future is a possibility:
using std::begin;
m_gates.erase(std::next(begin(m_gates), index));
erase supports iterators rather than indices, so there's a conversion in there. This will remove the pointer from the vector while calling its destructor, which causes unique_ptr to properly clean up its memory.
Now erasing elements one by one could potentially be a performance concern. If it does end up being a concern, you can do what you were getting at in the question and null them out, then remove them all in one go later on:
m_gates[index].reset();
// At some point in the program's future:
std::erase(m_gates, nullptr);
What you have right now is highly likely to be a memory leak. release releases ownership of the managed memory, meaning you're now responsible for cleaning it up, which isn't what you were looking for. Both erase and reset (or equivalently, = {} or = nullptr) will actually call the destructor of unique_ptr while it still has ownership and properly clean up the memory. shrink_to_fit is for vector capacity, not size, and is unrelated.
at the end the solution that i found was simple:
void Controller::delete_allocated_memory(int index)
{
m_vec.erase(m_vec.begin() + index);
m_vec.shrink_to_fit();
}
it works fine even if the vector is made of unique_ptrs, as far as i know it doesn't even create the null pointer that i was talking about and it shifts left all existing objects in the vector.
what do you think?

exceeding vector does not cause seg fault

I am very puzzled at the result of this bit of code:
std::vector<int> v;
std::cout << (v.end() - v.begin()) << std::endl;
v.reserve(1);
std::cout << (v.end() - v.begin()) << std::endl;
v[9] = 0;
std::cout << (v.end() - v.begin()) << std::endl;
The output:
0
0
0
So... first of all... end() does not point to the end of the internal array but the last occupied cell... ok, that is why the result of the iterator subtraction is still 0 after reserve(1). But, why is it still 0 after one cell has been filled. I expected 1 as the result, because end() should now return the iterator to the second internal array cell.
Furthermore, why on earth am I not getting a seg fault for accessing the tenth cell with v[9] = 0, while the vector is only 1 cell long?
First of all, end() gives you an iterator to one beyond the last element in the vector. And if the vector is empty then begin() can't return anything else than the same as end().
Then when you call reserve() you don't actually create any elements, you only reserve some memory so the vector don't have to reallocate when you do add elements.
Finally, when you do
v[9] = 0;
you are indexing the vector out of bounds which leads to undefined behavior as you write to memory you don't own. UB often leads to crashes, but it doesn't have too, it may seem to work when in reality it doesn't.
As a note on the last part, the [] operator doesn't have bounds-checking, which is why it will accept indexing out of bounds. If you want bounds-checking you should use at().
v[9] = 0;, you're just accessing the vector out of bound, it's UB. It may crash in some cases, and may not. Nothing is guaranteed.
And v[9] = 0;, you don't add element at all. You need to use push_back or resize:
v.push_back(0); // now it has 1 element
v.resize(10); // now it has 10 elements
EDIT
why does v[index] not create an element?
Because std::vector::operator[] just doesn't do that.
Returns a reference to the element at specified location pos. No bounds checking is performed.
Unlike std::map::operator[], this operator never inserts a new element into the container.
So it's supposed that the vector has the sufficient elements for the subscript operator.
BTW: What do you suppose vector should do when you write v[9] = 0? Before set the 10th element to 0, it has to push 10 elements first. And How to set their values? All 0? So, it won't do it, the issue just depends on yourself.
This is a guess, but hopefully a helpful one.
You will only get a segfault when you attempt to access an address that has not been assigned to your process' memory space. When the OS gives memory to a program, it generally does so in 4KB increments. Because of this, you can access past the end of some arrays/vectors without triggering a segfault, but not others.

set erase behaves weird

I just wrote some basic code that pushes in a few values, deletes a value by using an iterator that points to it(erase). The set does not contain that value, however the iterator still points to the deleted value.
Isn't this counter-intuitive? Why does this happen?
// erasing from set
#include <iostream>
#include <set>
int main ()
{
std::set<int> myset;
std::set<int>::iterator it;
// insert some values:
for (int i=1; i<10; i++) myset.insert(i*10); // 10 20 30 40 50 60 70 80 90
it = myset.begin();
++it; // "it" points now to 20
myset.erase (it);
std::cout << *it << std::endl; // still prints 20
std::cout << "myset contains:";
for (it=myset.begin(); it!=myset.end(); ++it)
std::cout << ' ' << *it;
std::cout << '\n';
return 0;
}
You have an invalid set iterator. The standard prescribes no rules for what happens when you dereference an invalid set iterator, so the behavior is undefined. It is allowed to return 20. It is allowed to do anything else, for that matter.
The set does not contain that value, however the iterator still points to the deleted value.
No, it doesn't. Not really.
Isn't this counter-intuitive? Why does this happen?
Because the memory underneath that iterator still happens to contain the bits that make up the value 20. That doesn't mean it's valid memory, or that those bits will always have that value.
It's just a ghost.
You erased it from the set, but the iterator is still pointing to the memory that it was pointing to before you erased it. Unless the erase did something to invalidate that memory (e.g. re-organize the entire set and write over it), the memory and its contents still exist.
HOWEVER it is "dead". You should not reference it. This is a real problem...you can't iterate over a set, calling "erase" on iterators, and expect the contents of the iterate to still be valid. As far as I know, you can't cache iterators and expect to reference them if you are erasing the contents of the set using them.
This is also true when iterating over a list<.>. It is tempting to iterate over a list and use the iterator to erase(.) certain elements. But the erase(.) call breaks the linkage, so your iterator is no longer valid.
This is also true when you iterate over a vector<.>. But it is more obvious there. If I am at element N and call erase on it, the size of the underlying contiguous elements just got smaller by 1.
In general, it is probably a good idea to avoid operations that can effect the allocation of the underlying container (insert, erase, push_xxx, etc.) while using an iterator that will be subsequently referenced (e.g. loop, dereference after operation, etc.).

Iterating, inserting and removing?

Dream output:
/* DREAM OUTPUT:
INT: 1
TESTINT: 1
TESTINT: 2
TESTINT: 23
TESTINT: 24
TESTINT: 25
TESTINT: 3
TESTINT: 4
TESTINT: 5
TESTINT: 6
INT: 23
INT: 24
INT: 25
INT: 3
INT: 4
INT: 5
INT: 6
Problem
ERROR 1: Not erasing the '2' causes a bizzare effect.
ERROR 2: Erasing the '2' causes memory corruption.
Code
#include <cstdlib>
#include <iostream>
#include <vector>
int main(int argc, char* argv[])
{
std::vector<int> someInts;
someInts.push_back(1);
someInts.push_back(2);
someInts.push_back(3);
someInts.push_back(4);
someInts.push_back(5);
someInts.push_back(6);
for(std::vector<int>::iterator currentInt = someInts.begin();
currentInt != someInts.end(); ++currentInt)
if(*currentInt == 2)
{
std::vector<int> someNewInts;
someNewInts.push_back(23);
someNewInts.push_back(24);
someNewInts.push_back(25);
someInts.insert(currentInt + 1, someNewInts.begin(), someNewInts.end());
//someInts.erase(currentInt);
for(std::vector<int>::iterator testInt = someInts.begin();
testInt != someInts.end(); ++testInt)
std::cout << "TESTINT: " << *testInt << '\n';
}
else
std::cout << "INT: " << *currentInt << '\n';
return 0;
}
The code is pretty self-explanatory, but I'd like to know what's going on here. This is a replica using ints of what's happening in a much larger project. It baffles me.
Inserting elements into a vector causes the iterators asociated with it to be invalid, since the vector can grow and thus it reallocates its internal storage space.
As someInts.erase(currentInt); invalidates currentInt you can't use it until you set it right.
It so happens that erase() returns a valid iterator in the list to continue with.
An iterator that designates the first element remaining beyond any elements removed, or a pointer to the end of the vector if no such element exists.
Try
currentInt = someInts.erase(currentInt);
which would put the outer loop at '23' the start of your test data and step to '24' for the next loop.
You need to understand the differences between the stl collections.
A Vector is a continuous (usually) block of memory. Whem you insert into the middle, it tries to be helpful by re-allocating enough memory for the existing data plus the new, then copying it all to the right places and letting you continue as if nothing had happened. However, as you're finding - something has happened. Your iterators that used to refer to the old memory block, are still pointing there - but the data has been moved. You get memory errors if you try to use them.
One answer is to determine where the iterator used to point, and update it to point to the new location. Typically, people use the [] operator for this, but you can use begin() + x (where x is the index into the vector).
Alternatively, use a collection whose iterators are not invalidated by inserting. The best one for this is the list. Lists are constructed from little blocks of memory (1 per item) with a pointer to the next block along. This makes insertion very quick and easy as no memory needs to be modified, just the pointers to the blocks either side of the new item. Your iterator will still be valid too!
Erasing is just the same, except once you delete the item your iterator refers to, its invalid (obviously) so you cannot make any operation on it. Even ++ operator will not work as the memory might have changed in a vector, or the list pointers be different. So, you can first get an iterator to the next element, store it and then use that once you've deleted an item, or use the return value from the erase() method.
If you were to use list as the collection instead of vector, you would not get random-access and it might use more memory but you would have constant-time insertion in the middle of the collection and doing so would not invalidate your iterators.
The exception would be the one you were erasing, so you would not be able to ++ it at the end of the loop. You would have to handle this situation by storing a copy of its next element.