Iterating, inserting and removing? - c++

Dream output:
/* DREAM OUTPUT:
INT: 1
TESTINT: 1
TESTINT: 2
TESTINT: 23
TESTINT: 24
TESTINT: 25
TESTINT: 3
TESTINT: 4
TESTINT: 5
TESTINT: 6
INT: 23
INT: 24
INT: 25
INT: 3
INT: 4
INT: 5
INT: 6
Problem
ERROR 1: Not erasing the '2' causes a bizzare effect.
ERROR 2: Erasing the '2' causes memory corruption.
Code
#include <cstdlib>
#include <iostream>
#include <vector>
int main(int argc, char* argv[])
{
std::vector<int> someInts;
someInts.push_back(1);
someInts.push_back(2);
someInts.push_back(3);
someInts.push_back(4);
someInts.push_back(5);
someInts.push_back(6);
for(std::vector<int>::iterator currentInt = someInts.begin();
currentInt != someInts.end(); ++currentInt)
if(*currentInt == 2)
{
std::vector<int> someNewInts;
someNewInts.push_back(23);
someNewInts.push_back(24);
someNewInts.push_back(25);
someInts.insert(currentInt + 1, someNewInts.begin(), someNewInts.end());
//someInts.erase(currentInt);
for(std::vector<int>::iterator testInt = someInts.begin();
testInt != someInts.end(); ++testInt)
std::cout << "TESTINT: " << *testInt << '\n';
}
else
std::cout << "INT: " << *currentInt << '\n';
return 0;
}
The code is pretty self-explanatory, but I'd like to know what's going on here. This is a replica using ints of what's happening in a much larger project. It baffles me.

Inserting elements into a vector causes the iterators asociated with it to be invalid, since the vector can grow and thus it reallocates its internal storage space.

As someInts.erase(currentInt); invalidates currentInt you can't use it until you set it right.
It so happens that erase() returns a valid iterator in the list to continue with.
An iterator that designates the first element remaining beyond any elements removed, or a pointer to the end of the vector if no such element exists.
Try
currentInt = someInts.erase(currentInt);
which would put the outer loop at '23' the start of your test data and step to '24' for the next loop.

You need to understand the differences between the stl collections.
A Vector is a continuous (usually) block of memory. Whem you insert into the middle, it tries to be helpful by re-allocating enough memory for the existing data plus the new, then copying it all to the right places and letting you continue as if nothing had happened. However, as you're finding - something has happened. Your iterators that used to refer to the old memory block, are still pointing there - but the data has been moved. You get memory errors if you try to use them.
One answer is to determine where the iterator used to point, and update it to point to the new location. Typically, people use the [] operator for this, but you can use begin() + x (where x is the index into the vector).
Alternatively, use a collection whose iterators are not invalidated by inserting. The best one for this is the list. Lists are constructed from little blocks of memory (1 per item) with a pointer to the next block along. This makes insertion very quick and easy as no memory needs to be modified, just the pointers to the blocks either side of the new item. Your iterator will still be valid too!
Erasing is just the same, except once you delete the item your iterator refers to, its invalid (obviously) so you cannot make any operation on it. Even ++ operator will not work as the memory might have changed in a vector, or the list pointers be different. So, you can first get an iterator to the next element, store it and then use that once you've deleted an item, or use the return value from the erase() method.

If you were to use list as the collection instead of vector, you would not get random-access and it might use more memory but you would have constant-time insertion in the middle of the collection and doing so would not invalidate your iterators.
The exception would be the one you were erasing, so you would not be able to ++ it at the end of the loop. You would have to handle this situation by storing a copy of its next element.

Related

C++ double free or corruption(out)

I came across this error when I was using iterators with vector STL.
Code:-
#include <iostream>
#include <vector>
void print_vec(std::vector<int> vec)
{
auto itr = vec.begin();
while (itr != vec.end())
std::cout << *itr++ << " ";
std::cout << std::endl;
}
int main(int argc, char* argv[])
{
std::vector<int> vals = { 1, 2, 3, 4, 5, 6 };
print_vec(vals);
auto it = vals.begin();
vals.insert(it, 100);
print_vec(vals);
it += 4;
std::cout << *it << "\n";
vals.insert(it, 200);
print_vec(vals);
return 0;
}
Output:-
1 2 3 4 5 6
100 1 2 3 4 5 6
5
0 100 1 2 3 4 5 6
double free or corruption (out)
Aborted (core dumped)
I do not understand why this is happening: is it an issue with the iterator after the vector is modified? I think that my intuition regarding iterators is flawed.
When a std::vector is resized, the iterator becomes invalid because it refers to the vector before resizing which may include moving the vector to a different memory location in order to maintain a contiguity. In that case it will not be automatically updated to refer to the new location.
The result in this case (because it is undefined behaviour) is that the value 200 was not inserted in the vector, but it was written to some other heap memory that the vector previously occupied that has since been deallocated - i.e. you corrupted the heap. The vector is destroyed on exit from main, and at that time the system heap management fails and the error is reported (after the fact).
Reassigning the iterator following the insertion resolves the issue:
it = vals.insert(it,100);
...
The output of the single item reports 5 rather then 4 as you might expect because it is simply reading the old vector memory not yet reused and overwritten with something else.
You can observe the error by comparing:
vals.insert(it,100);
std::cout<<std::distance(it,vals.begin())<<'\n' ;
with
it = vals.insert(it,100);
std::cout<<std::distance(it,vals.begin())<<'\n' ;
Your results are likely to differ but at https://www.onlinegdb.com/ when I performed the test, the distance between the iterator position and the vector start in the first case was 16 - demonstrating that the iterator no longer refers to an element in the current vector state. In the second case the distance is zero - the iterator refers to the start of the new vector state.
If the capacity of vals were extended before taking the iterator by for example:
vals.reserve(10);
Then the insertion of a single element would not cause a mem-move, and the failure would not have occurred - that does not make it a good idea not to update the iterator though - it is merely to indicate what is happening under the hood.
...is it an issue with the iterator after the vector is modified?
Yes!
From the cppreference page on std::vector.insert() (bolding mine):
Causes reallocation if the new size() is greater than the old
capacity(). If the new size() is greater than capacity(), all
iterators and references are invalidated. Otherwise, only the
iterators and references before the insertion point remain valid. The
past-the-end iterator is also invalidated.

C++: std::vector with space reserving for both ends

Just some thoughts:
I wanted to erase x elements from the beginning of a std::vector with size n > x. Since a vector uses an array internally, this could be as easy as setting the pointer for vector.begin() to the index of the first element which is kept. But instead, erase shifts all elements in the array to that the first element actually starts at index 0, making the erase operation take much more time than it could.
Furthermore, if the valid 'zone' of the internal array was really controlled by just some start and end indices / pointers of the vector structure, then there would be also the option to reserve space in front of the first element. E.g., a vector is initialized and 20 spaces are reserved at the end and 10 at the beginning. Internally, then an array of space 30 (or 32) is created, where the start index/pointer points to the 11th value of the internal array, allowing to include new elements to the front of the 'vector' in constant speed.
Well, my point is, I think such a data structure would be somewhat useful, at least for my purposes. I'm pretty sure someone already thought of this and already implemented it. So I want to ask: How is the data structure called that I'm describing? If it exists, I'd love to use it. I think this is not a double-linked list, since there, every element is kind of a struct containing the element value and additional pointers to the neighbors (to my knowledge).
EDIT: And yes, such a structure would probably use more memory than necessary, especially when erasing some elements from the beginning, because then, the internal array still has the initial size. But well, memory isn't a big issue anymore for most problems, and there could be a (time-expensive) 'memory-optimize' operation to create a new, smaller array, copying over all old values and deleting the old internal array to use the smallest possible size.
Expanding on #Kerrek SB's comment, boost::circular_buffer<> does I think what you need, for example:
#include <iostream>
#include <boost/circular_buffer.hpp>
int main()
{
boost::circular_buffer<int> cb(3);
cb.push_back(1);
cb.push_back(2);
cb.push_back(3);
for( auto i : cb) {
std::cout << i << std::endl;
}
// Increase to hold two more items
cb.set_capacity(5);
cb.push_back(4);
cb.push_back(5);
for( auto i : cb) {
std::cout << i << std::endl;
}
// Increase to hold two more items
cb.rset_capacity(7);
cb.push_front(0);
cb.push_front(-1);
for( auto i : cb) {
std::cout << i << std::endl;
}
}
TBH - I have not looked at the implementation, so cannot comment on whether it moves data around (I'd be highly surprised.) but if you pull down the source, take a quick peek to satisfy if performance is a concern...
EDIT: Quick look at the code reveals that the push_xxx operations does not indeed move data around, however the xxx_capacity operations do result in a move/copy - to avoid that, ensure the ring has enough capacity at the start and it will work as you wish...

exceeding vector does not cause seg fault

I am very puzzled at the result of this bit of code:
std::vector<int> v;
std::cout << (v.end() - v.begin()) << std::endl;
v.reserve(1);
std::cout << (v.end() - v.begin()) << std::endl;
v[9] = 0;
std::cout << (v.end() - v.begin()) << std::endl;
The output:
0
0
0
So... first of all... end() does not point to the end of the internal array but the last occupied cell... ok, that is why the result of the iterator subtraction is still 0 after reserve(1). But, why is it still 0 after one cell has been filled. I expected 1 as the result, because end() should now return the iterator to the second internal array cell.
Furthermore, why on earth am I not getting a seg fault for accessing the tenth cell with v[9] = 0, while the vector is only 1 cell long?
First of all, end() gives you an iterator to one beyond the last element in the vector. And if the vector is empty then begin() can't return anything else than the same as end().
Then when you call reserve() you don't actually create any elements, you only reserve some memory so the vector don't have to reallocate when you do add elements.
Finally, when you do
v[9] = 0;
you are indexing the vector out of bounds which leads to undefined behavior as you write to memory you don't own. UB often leads to crashes, but it doesn't have too, it may seem to work when in reality it doesn't.
As a note on the last part, the [] operator doesn't have bounds-checking, which is why it will accept indexing out of bounds. If you want bounds-checking you should use at().
v[9] = 0;, you're just accessing the vector out of bound, it's UB. It may crash in some cases, and may not. Nothing is guaranteed.
And v[9] = 0;, you don't add element at all. You need to use push_back or resize:
v.push_back(0); // now it has 1 element
v.resize(10); // now it has 10 elements
EDIT
why does v[index] not create an element?
Because std::vector::operator[] just doesn't do that.
Returns a reference to the element at specified location pos. No bounds checking is performed.
Unlike std::map::operator[], this operator never inserts a new element into the container.
So it's supposed that the vector has the sufficient elements for the subscript operator.
BTW: What do you suppose vector should do when you write v[9] = 0? Before set the 10th element to 0, it has to push 10 elements first. And How to set their values? All 0? So, it won't do it, the issue just depends on yourself.
This is a guess, but hopefully a helpful one.
You will only get a segfault when you attempt to access an address that has not been assigned to your process' memory space. When the OS gives memory to a program, it generally does so in 4KB increments. Because of this, you can access past the end of some arrays/vectors without triggering a segfault, but not others.

set erase behaves weird

I just wrote some basic code that pushes in a few values, deletes a value by using an iterator that points to it(erase). The set does not contain that value, however the iterator still points to the deleted value.
Isn't this counter-intuitive? Why does this happen?
// erasing from set
#include <iostream>
#include <set>
int main ()
{
std::set<int> myset;
std::set<int>::iterator it;
// insert some values:
for (int i=1; i<10; i++) myset.insert(i*10); // 10 20 30 40 50 60 70 80 90
it = myset.begin();
++it; // "it" points now to 20
myset.erase (it);
std::cout << *it << std::endl; // still prints 20
std::cout << "myset contains:";
for (it=myset.begin(); it!=myset.end(); ++it)
std::cout << ' ' << *it;
std::cout << '\n';
return 0;
}
You have an invalid set iterator. The standard prescribes no rules for what happens when you dereference an invalid set iterator, so the behavior is undefined. It is allowed to return 20. It is allowed to do anything else, for that matter.
The set does not contain that value, however the iterator still points to the deleted value.
No, it doesn't. Not really.
Isn't this counter-intuitive? Why does this happen?
Because the memory underneath that iterator still happens to contain the bits that make up the value 20. That doesn't mean it's valid memory, or that those bits will always have that value.
It's just a ghost.
You erased it from the set, but the iterator is still pointing to the memory that it was pointing to before you erased it. Unless the erase did something to invalidate that memory (e.g. re-organize the entire set and write over it), the memory and its contents still exist.
HOWEVER it is "dead". You should not reference it. This is a real problem...you can't iterate over a set, calling "erase" on iterators, and expect the contents of the iterate to still be valid. As far as I know, you can't cache iterators and expect to reference them if you are erasing the contents of the set using them.
This is also true when iterating over a list<.>. It is tempting to iterate over a list and use the iterator to erase(.) certain elements. But the erase(.) call breaks the linkage, so your iterator is no longer valid.
This is also true when you iterate over a vector<.>. But it is more obvious there. If I am at element N and call erase on it, the size of the underlying contiguous elements just got smaller by 1.
In general, it is probably a good idea to avoid operations that can effect the allocation of the underlying container (insert, erase, push_xxx, etc.) while using an iterator that will be subsequently referenced (e.g. loop, dereference after operation, etc.).

c++ vectors and pointers

As i understand if i dont store pointers, everything in c++ gets copied, which can lead to bad performance (ignore the simplicity of my example). So i thought i store my objects as pointers instead of string object inside my vector, thats better for performance right? (assumining i have very long strings and lots of them).
The problem when i try to iterate over my vector of string pointers is i cant extract the actual value from them
string test = "my-name";
vector<string*> names(20);
names.push_back(&test);
vector<string*>::iterator iterator = names.begin();
while (iterator != names.end())
{
std::cout << (*iterator) << ":" << std::endl;
// std::cout << *(*iterator); // fails
iterator++;
}
See the commented line, i have no problem in receiving the string pointer. But when i try to get the string pointers value i get an error (i couldnt find what excatly the error is but the program just fails).
I also tried storing (iterator) in a new string variable and but it didnt help?
You've created the vector and initialized it to contain 20 items. Those items are being default initialized, which in the case of a pointer is a null pointer. The program is having trouble dereferencing those null pointers.
One piece of advice is to not worry about what's most efficient until you have a demonstrated problem. This code would certainly work much better with a vector<string> versus a vector<string*>.
No, no, a thousand times no.
Don't prematurely optimize. If the program is fast, there's no need to worry about performance. In this instance, the pointers clearly reduce performance by consuming memory and time, since each object is only the target of a single pointer!
Not to mention that manual pointer programming tends to introduce errors, especially for novices. Sacrificing correctness and stability for performance is a huge step backwards.
The advantage of C++ is that it simplifies the optimization process by providing encapsulated data structures and algorithms. So when you decide to optimize, you can usually do so by swapping in standard parts.
If you want to learn about optimizing data structures, read up on smart pointers.
This is probably the program you want:
vector<string> names(20, "my-name");
for ( vector<string>::iterator iterator = names.begin();
iterator != names.end();
++ iterator )
{
std::cout << *iterator << '\n';
}
Your code looks like you're storing a pointer to a stack-based variable into a vector. As soon as the function where your string is declared returns, that string becomes garbage and the pointer is invalid. If you're going to store pointers into a vector, you probably need to allocate your strings dynamically (using new).
Have a look at your initialization:
string test = "my-name";
vector<string*> names(20);
names.push_back(&test);
You first create a std::vector with 20 elements.
Then you use push_back to append a 21st element, which points to a valid string. That's fine, but this element is never reached in the loop: the first iteration crashes already since the first 20 pointers stored in the vector don't point to valid strings.
Dereferencing an invalid pointer causes a crash. If you make sure that you have a valid pointers in your vector, **iterator is just fine to access an element.
Try
if (*iterator)
{
std::cout << *(*iterator) << ":" << std::endl;
}
Mark Ransom explains why some of the pointers are now
string test = "my-name";
vector<string*> names(20);
The size of vector is 20, meaning it can hold 20 string pointers.
names.push_back(&test);
With the push_back operation, you are leaving out the first 20 elements and adding a new element to the vector which holds the address of test. First 20 elements of vector are uninitialized and might be pointing to garbage. And the while loop runs till the end of vector whose size is 21 and dereferencing uninitialized pointers is what causing the problem. Since the size of vector can be dynamically increased with a push_back operation, there is no need to explicitly mention the size.
vector<string*> names; // Not explicitly mentioning the size and the rest of
// the program should work as expected.