Any way to prevent the breakdown between a reference and its object when resizing a vector? - c++

I was debugging an issue and realized that when a vector resizes, references to its elements stop working. To illustrate the point, below is a minimal example. The output is 0 instead of 1. Is there any way to prevent this from happening, other than reserving a large space for x?
#include <iostream>
#include <vector>
using namespace std;

vector<int> x{};

int main(){
    x.reserve(1);
    x.push_back(0);
    int & y = x[0];
    x.resize(10);
    y = 1;
    cout << x[0] << endl;
    return 0;
}

This is called invalidation, and the only way you can prevent it is to make sure that the vector's capacity does not change:
x.reserve(10);
x.push_back(0);
int &y = x[0];
x.resize(10);
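For completeness, here is the question's program with that change applied (just a sketch of the same idea); because the capacity never changes after y is taken, it prints 1:

#include <iostream>
#include <vector>
using namespace std;

vector<int> x{};

int main(){
    x.reserve(10);        // enough capacity for everything that follows
    x.push_back(0);
    int & y = x[0];
    x.resize(10);         // stays within capacity: no reallocation, y remains valid
    y = 1;
    cout << x[0] << endl; // prints 1
    return 0;
}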

The only way I can think of is to use std::deque instead of std::vector.
The reason for suggesting std::deque is this (from cppreference):
The storage of a deque is automatically expanded and contracted as
needed. Expansion of a deque is cheaper than the expansion of a
std::vector because it does not involve copying of the existing
elements to a new memory location.
That line about not copying is really the answer to your question. It means that the objects remain where you placed them (in memory) as long as the deque is alive.
However, on the very next line it says:
On the other hand, deques typically have large minimal memory cost; a
deque holding just one element has to allocate its full internal array
(e.g. 8 times the object size on 64-bit libstdc++; 16 times the object
size or 4096 bytes, whichever is larger, on 64-bit libc++).
It's now up to you to decide which is better: the higher initial memory cost, or changing your program's logic so it does not need to hold references into the vector like that. You might also want to consider std::set or std::unordered_set for quickly finding an object within the container.
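For example, here is a minimal sketch of the deque alternative; push_back on a std::deque never invalidates references to existing elements, so the reference stays usable:

#include <deque>
#include <iostream>

int main(){
    std::deque<int> x;
    x.push_back(0);
    int & y = x[0];
    for (int i = 0; i < 1000; ++i)
        x.push_back(i);             // the deque grows, but x[0] never moves
    y = 1;
    std::cout << x[0] << std::endl; // prints 1
}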

There are several choices:
Don't use a vector.
Don't keep a reference.
Create a "smart reference" class that tracks the vector and the index and so it will obtain the appropriate object even if the vector moves.

You can also create a vector of std::shared_ptr<> and keep the shared_ptr values instead of iterators.
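A sketch of that approach (the element type is just an example): the vector stores shared_ptr values, and a reallocation moves the pointers, not the pointed-to objects, so a copy of the shared_ptr remains valid.

#include <iostream>
#include <memory>
#include <vector>

int main(){
    std::vector<std::shared_ptr<int>> x;
    x.push_back(std::make_shared<int>(0));
    std::shared_ptr<int> y = x[0];   // keep the value, not an iterator or reference
    x.resize(10);                    // may reallocate the vector's buffer
    *y = 1;
    std::cout << *x[0] << std::endl; // prints 1
}

The trade-off is one extra allocation per element and a loss of locality, in exchange for pointer stability.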

Related

Vector of pointers undefined behaviour

I'm trying to make a vector of pointers whose elements point to the elements of a vector of ints. (I'm solving a competitive-programming-style problem, which is why it sounds kinda nonsensical.)
Here's the code:
#include <bits/stdc++.h>
using namespace std;

int ct = 0;
vector<int> vec;
vector<int*> rf;

void addRef(int n){
    vec.push_back(n);
    rf.push_back(&vec[ct]);
    ct++;
}

int main(){
    addRef(1);
    addRef(2);
    addRef(5);
    for(int i = 0; i < ct; i++){
        cout << *rf[i] << ' ';
    }
    cout << endl;
    for(int i = 0; i < ct; i++){
        cout << vec[i] << ' ';
    }
}
When I execute the code, it shows weird behaviour that I don't understand. The first element of rf (the vector<int*>) does not seem to point to the corresponding element of vec (the vector<int>), while the rest of the elements do.
here's the output when I run it on Dev-C++:
1579600 2 5
1 2 5
When I tried to run the code here, the output is even weirder:
1197743856 0 5
1 2 5
The code is intended to produce the same output for the first line and the second.
Can you guys explain why it happens? Is there any mistake in my implementation?
thanks
Adding elements to a std::vector with push_back or similar may invalidate all iterators and references to its elements. See https://en.cppreference.com/w/cpp/container/vector/push_back.
The idea is that in order to grow the vector, it may not have enough free memory to expand into, and thus may have to move the whole array to some other location in memory, freeing the old block. That means in particular that your pointers now point to memory that has been freed, or reused for something else.
If you want to keep this approach, you will need to resize() or reserve() a sufficient number of elements in vec before starting. Which of course defeats the whole purpose of a std::vector, and you might as well use an array instead.
The vector is changing size, and the addresses you are saving may no longer be the ones you want. You can preallocate memory using reserve() so the vector does not reallocate.
vec.reserve(3);
addRef(1);
addRef(2);
addRef(5);
The problem occurs when you call vec.push_back(n) and vec’s internal array is already full. When that happens, the std::vector::push_back() method allocates a larger array, copies the contents of the full array over to the new array, then frees the old/full array and keeps the new one.
Usually that’s all you need, but your program is keeping pointers to elements of the old array inside (rf), and these pointers all become dangling/invalid when the reallocation occurs, hence the funny (undefined) behavior.
An easy fix would be to call vec.reserve(100) (or similar) at the top of your program, so that no further reallocations are necessary. Alternatively, you could postpone adding the pointers to rf until after you've finished adding all the values to vec.
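A sketch of that second suggestion (using the question's data): fill vec completely first, then collect the pointers, so no reallocation can happen after rf is built.

#include <cstddef>
#include <iostream>
#include <vector>

std::vector<int> vec;
std::vector<int*> rf;

int main(){
    vec.push_back(1);
    vec.push_back(2);
    vec.push_back(5);
    // Addresses are stable from here on, as long as vec is not modified again.
    for (std::size_t i = 0; i < vec.size(); ++i)
        rf.push_back(&vec[i]);
    for (int* p : rf)
        std::cout << *p << ' ';  // prints: 1 2 5
    std::cout << std::endl;
}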
Just do not take a pointer into a vector that may change soon. A vector will copy its elements to new storage when it enlarges its capacity.
Use an array to store the ints instead.
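For example, a sketch of the fixed-array variant (MAXN is an assumed upper bound): element addresses never move, so the stored pointers stay valid.

#include <iostream>

const int MAXN = 100;   // assumed upper bound on the number of elements
int vec[MAXN];
int* rf[MAXN];
int ct = 0;

void addRef(int n){
    vec[ct] = n;
    rf[ct] = &vec[ct];  // the array never reallocates, so this stays valid
    ct++;
}

int main(){
    addRef(1);
    addRef(2);
    addRef(5);
    for (int i = 0; i < ct; i++)
        std::cout << *rf[i] << ' '; // prints: 1 2 5
    std::cout << std::endl;
}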

How does the compiler decide the capacity of a vector? [duplicate]

What is the capacity() of a std::vector created using the default constructor? I know that the size() is zero. Can we state that a default-constructed vector does not perform any heap memory allocation?
That way it would be possible to create an array with an arbitrary reserve using a single allocation, like std::vector<int> iv; iv.reserve(2345);. Let's say that, for some reason, I do not want the size() to start at 2345.
For example, on Linux (g++ 4.4.5, kernel 2.6.32 amd64)
#include <iostream>
#include <vector>

int main()
{
    using namespace std;
    cout << vector<int>().capacity() << "," << vector<int>(10).capacity() << endl;
    return 0;
}
printed 0,10. Is it a rule, or is it STL vendor dependent?
The standard doesn't specify what the initial capacity of a container should be, so you're relying on the implementation. A common implementation will start the capacity at zero, but there's no guarantee. On the other hand there's no way to better your strategy of std::vector<int> iv; iv.reserve(2345); so stick with it.
Storage implementations of std::vector vary significantly, but all the ones I've come across start from 0.
The following code:
#include <iostream>
#include <vector>

int main()
{
    using namespace std;
    vector<int> normal;
    cout << normal.capacity() << endl;
    for (unsigned int loop = 0; loop != 10; ++loop)
    {
        normal.push_back(1);
        cout << normal.capacity() << endl;
    }
    cin.get();
    return 0;
}
Gives the following output:
0
1
2
4
4
8
8
8
8
16
16
under GCC 5.1, GCC 11.2, and Clang 12.0.1, and:
0
1
2
3
4
6
6
9
9
9
13
under MSVC 2013.
As far as I understood the standard (though I could not actually name a reference), container instantiation and memory allocation have intentionally been decoupled, for good reason. Therefore you have distinct, separate calls:
the constructor, to create the container itself
reserve(), to preallocate a suitably large memory block to accommodate at least(!) a given number of objects
And this makes a lot of sense. The only reason for reserve() to exist is to give you the opportunity to code around possibly expensive reallocations when growing the vector. To be useful, you have to know the number of objects to store, or at least be able to make an educated guess. If that is not given, you'd better stay away from reserve(), as you will just trade reallocations for wasted memory.
So putting it all together:
The standard intentionally does not specify a constructor that lets you preallocate a memory block for a specific number of objects (which would be at least more desirable than allocating an implementation-specific, fixed "something" under the hood).
Allocation shouldn't be implicit. So, to preallocate a block you need to make a separate call to reserve(), and this need not happen at the point of construction (it could, and often should, come later, once you know the required size to accommodate).
Thus, if a vector always preallocated a memory block of implementation-defined size, this would foil the intended job of reserve(), wouldn't it?
What would be the advantage of preallocating a block if the STL naturally cannot know the intended purpose and expected size of a vector? It would be rather nonsensical, if not counter-productive.
The proper solution instead is to allocate an implementation-specific block with the first push_back() - if one was not already explicitly allocated beforehand by reserve().
In the case of a necessary reallocation, the increase in block size is implementation-specific as well. The vector implementations I know of start with an exponential increase in size but cap the increment rate at a certain maximum to avoid wasting huge amounts of memory, or even blowing it.
All of this comes into full operation and advantage only if it is not disturbed by an allocating constructor. You have reasonable defaults for common scenarios that can be overridden on demand via reserve() (and shrink_to_fit()). So, even though the standard does not explicitly say so, I'm quite sure that assuming a newly constructed vector does not preallocate is a pretty safe bet for all current implementations.
As a slight addition to the other answers, I found that when running under debug conditions with Visual Studio a default constructed vector will still allocate on the heap even though the capacity starts at zero.
Specifically if _ITERATOR_DEBUG_LEVEL != 0 then vector will allocate some space to help with iterator checking.
https://learn.microsoft.com/en-gb/cpp/standard-library/iterator-debug-level
I just found this slightly annoying since I was using a custom allocator at the time and was not expecting the extra allocation.
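If you want to observe this yourself, one rough sketch is to plug in a minimal counting allocator (the CountingAlloc name and output here are purely illustrative) and watch whether a default-constructed vector allocates; on a release build or another standard library you would typically see no allocation at all.

#include <cstddef>
#include <cstdio>
#include <vector>

// Minimal allocator that reports every allocation it performs.
template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        std::printf("allocate(%zu)\n", n);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

int main(){
    // Whether anything is printed before the capacity line is implementation-
    // and build-dependent (e.g. MSVC debug iterator support may allocate a proxy).
    std::vector<int, CountingAlloc<int>> v;
    std::printf("capacity: %zu\n", v.capacity());
}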
This is an old question, and all the answers here have rightly explained the standard's point of view and the way you can get an initial capacity in a portable manner by using std::vector::reserve.
However, I'll explain why it doesn't make sense for any STL implementation to allocate memory upon construction of a std::vector<T> object.
std::vector<T> of incomplete types
Prior to C++17, it was undefined behavior to construct a std::vector<T> if the definition of T was still unknown at the point of instantiation. However, that constraint was relaxed in C++17.
In order to efficiently allocate memory for an object, you need to know its size. From C++17 onwards, your clients may have cases where your std::vector<T> class does not yet know the size of T. Does it make sense to have memory-allocation characteristics depend on type completeness?
Unwanted memory allocations
There are many, many, many times you'll need to model a graph in software (a tree is a graph). You are most likely going to model it like this:
class Node {
....
std::vector<Node> children; //or std::vector< *some pointer type* > children;
....
};
Now think for a moment and imagine if you had lots of terminal nodes. You would be very pissed if your STL implementation allocated extra memory simply in anticipation of there being objects in children.
This is just one example, feel free to think of more...
The standard doesn't specify an initial value for the capacity, but the STL container automatically grows to accommodate as much data as you put in, provided you don't exceed the maximum size (use the max_size member function to find it).
For vector and string, growth is handled by reallocation whenever more space is needed. Suppose you'd like to create a vector holding the values 1-1000. Without using reserve, the code will typically result in between 2 and 18 reallocations during the following loop:
vector<int> v;
for ( int i = 1; i <= 1000; i++) v.push_back(i);
Modifying the code to use reserve might result in 0 allocations during the loop:
vector<int> v;
v.reserve(1000);
for ( int i = 1; i <= 1000; i++) v.push_back(i);
Roughly speaking, vector and string capacities grow by a factor of between 1.5 and 2 on each reallocation.

Performance impact when resizing vector within capacity

I have the following synthesized example of my code:
#include <vector>
#include <array>
#include <cstdlib>

#define CAPACITY 10000

int main() {
    std::vector<std::vector<int>> a;
    std::vector<std::array<int, 2>> b;
    a.resize(CAPACITY, std::vector<int> {0, 0});
    b.resize(CAPACITY, std::array<int, 2> {0, 0});
    for (;;) {
        size_t new_rand_size = (std::rand() % CAPACITY);
        a.resize(new_rand_size);
        b.resize(new_rand_size);
        for (size_t i = 0; i < new_rand_size; ++i) {
            a[i][0] = std::rand();
            a[i][1] = std::rand();
            b[i][0] = std::rand();
            b[i][1] = std::rand();
        }
        process(a); // respectively process(b)
    }
}
so obviously, the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?). It just gets reinitialized when up-resizing again within capacity.
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.
so obviously,
The word "obviously" is typically used to mean "I really, really want the following to be true, so I'm going to skip the part where I determine if it is true." ;) (Admittedly, you did better than most since you did bring up some reasons for your conclusion.)
the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?).
The truth of this depends on the implementation, but there is some validity here. I would go with a less micro-managementy approach and say that the array version is preferable because the final size is fixed. Using a tool designed for your specialized situation (a fixed-size array) tends to incur less overhead than using a tool for a more general situation. Not always less, though.
Another factor to consider is the cost of default-initializing the elements. When a std::array is constructed, all of its elements are constructed as well. With a std::vector, you can defer constructing elements until you have the parameters for construction. For objects that are expensive to default-construct, you might be able to measure a performance gain using a vector instead of an array. (If you cannot measure a difference, don't worry about it.)
When you do a comparison, make sure the vector is given a fair chance by using it well. Since the size is known in advance, reserve the required space right away. Also, use emplace_back to avoid a needless copy.
Final note: "contiguous" is a bit more accurate/descriptive than "continuous".
It just gets reinitialized when up-resizing again within capacity.
This is a factor that affects both approaches. In fact, this causes your code to exhibit undefined behavior. For example, let's suppose that your first iteration resizes the outer vector to 1, while the second resizes it to 5. Compare what your code does to the following:
std::vector<std::vector<int>> a;
a.resize(CAPACITY, std::vector<int> {0, 0});
a.resize(1);
a.resize(5);
std::cout << "Size " << a[1].size() <<".\n";
The output indicates that the size is zero at this point, yet your code would assign a value to a[1][0]. If you want each element of a to default to a vector of 2 elements, you need to specify that default each time you resize a, not just initially.
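A minimal fix along those lines (assuming each row of a should again default to two zero-valued ints) is to pass the default value on every resize:

a.resize(new_rand_size, std::vector<int> {0, 0}); // re-specify the row default each time
b.resize(new_rand_size);                          // std::array already has a fixed size of 2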
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.
Yes, you can skip the initialization. In fact, it is advisable to do so. Use the tool designed for the task at hand. Your initialization serves to increase the capacity of your vectors. So use the method whose sole purpose is to increase the capacity of a vector: vector::reserve.
Another option, depending on the exact situation, might be to not resize at all. Start with an array of arrays, and track the last usable element in the outer array. This is sort of a step backwards in that you now have a separate variable for tracking the size, but if your real code has enough iterations, the savings from not calling destructors when the size decreases might make this approach worth it. (For cleaner code, write a class that wraps the array of arrays and tracks the usable size.)
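A rough sketch of such a wrapper (RowBuffer is a hypothetical name; the rows match the question's std::array<int, 2> elements): "shrinking" just lowers the tracked size, so nothing is destroyed or re-initialized.

#include <array>
#include <cstddef>

template <std::size_t Capacity>
class RowBuffer {
public:
    std::array<int, 2>& operator[](std::size_t i) { return rows_[i]; } // i must be < size()
    void set_size(std::size_t n) { size_ = n; }                        // n must be <= Capacity
    std::size_t size() const { return size_; }
private:
    std::array<std::array<int, 2>, Capacity> rows_{}; // storage lives for the whole lifetime
    std::size_t size_ = 0;                            // how many rows are currently in use
};

int main(){
    RowBuffer<4> buf;
    buf.set_size(2);
    buf[0] = {1, 2};
    buf.set_size(1);   // "shrink": nothing is destroyed
    buf.set_size(3);   // "grow" again: buf[0] still holds {1, 2}
}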
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization
Yes: Don't resize. Instead, reserve the capacity and push (or emplace) the new elements.
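A sketch of that approach, adapted from the question's loop (the process() placeholder just stands in for whatever the real code does with b):

#include <array>
#include <cstddef>
#include <cstdlib>
#include <vector>

void process(const std::vector<std::array<int, 2>>&) {} // placeholder

int main(){
    const std::size_t CAPACITY = 10000;
    std::vector<std::array<int, 2>> b;
    b.reserve(CAPACITY);                   // one allocation, no elements constructed
    for (int iter = 0; iter < 100; ++iter) {
        std::size_t new_rand_size = std::rand() % CAPACITY;
        b.clear();                         // size back to 0, capacity is retained
        for (std::size_t i = 0; i < new_rand_size; ++i)
            b.push_back({std::rand(), std::rand()}); // construct each element directly
        process(b);
    }
}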

Safe to iterate over std::vector<some_container> while modifying <some_container>'s size?

Suppose I have a vector of some other container type. While iterating over the vector I change the size of the containers. Given that vectors try to remain contiguous in system memory, could the pointer arithmetic fail in loops like this? For example,
#include <stdlib.h>
#include <vector>
using namespace std;

int main(){
    vector<vector<double> > vec_vec(4);
    for (auto i=vec_vec.begin(); i!=vec_vec.end(); ++i){
        for (double j=0; j<100; j+=1.0){
            i->push_back(j);
        };
    };
    return 0;
}
I've had no issues using code like this so far, but now I'm wondering if I just got lucky. Is this safe? Does it depend on the kind of container used inside the vector?
That's perfectly OK; you are not changing the outer vector. However, there is no guarantee that all the vectors will be contiguous in memory. Each individual inner one will be, but don't expect them to be arranged one after the other in memory.
You are modifying the contents of the elements of the std::vector you are iterating over, not the vector you are iterating over itself. They are different things.
The first is safe. The second wouldn't be, because of possible memory reallocations.
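A sketch of the distinction (the second loop is deliberately left commented out because it is the unsafe one):

#include <vector>

int main(){
    std::vector<std::vector<double>> vec_vec(4);

    // Safe: only the inner vectors change; the outer vector is untouched.
    for (auto i = vec_vec.begin(); i != vec_vec.end(); ++i)
        i->push_back(1.0);

    // Unsafe: growing the outer vector may reallocate and invalidate i.
    // for (auto i = vec_vec.begin(); i != vec_vec.end(); ++i)
    //     vec_vec.push_back({});   // undefined behavior
}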
A vector is a fixed-size management object (size, capacity, pointer) whose contiguous element storage is pointed to by that pointer.
Thus you are not changing the size of the vector objects themselves.

Implementation defined to use a reserved vector without resizing it?

Is it implementation defined to use a reserved vector without resizing it?
By that I mean:
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    std::vector<unsigned int> foo;
    foo.reserve(1024);
    foo[0] = 10;
    std::cout << foo[0];
    return 0;
}
In the above, I reserve a good amount of space and assign a value to one of the indices in that space. However, I did not call push_back, which "resizes" the vector and gives it a default value for each element (which I'm trying to avoid). So in this case foo.size() is 0 while foo.capacity() is 1024.
So is this valid code, or is it implementation-defined, seeing as I'm assigning to a vector with "0" size? It works, but I'm not sure if it's a good idea.
The reason I'm trying to avoid the default value is that, for large allocations, I don't need it zeroing out each index, as I will decide when I want to write to each element. I'd use a raw pointer, but the lodepng API accepts only a vector for decoding from a file.
std::vector::reserve just reserves memory, so the next push_back does not have to allocate memory. It does not change the size of the vector.
If you want a vector with an initial size of 1024 elements, you can use the constructor to do that:
std::vector<unsigned int> foo(1024);
Note that if you create a vector with an initial size of e.g. 1024 elements and then do push_back, you add an element, so the size of the vector increases to 1025 elements.
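A small sketch of the difference (the printed numbers assume a typical implementation): reserve() changes only the capacity, while the sizing constructor (or resize()) creates real, indexable elements.

#include <iostream>
#include <vector>

int main(){
    std::vector<unsigned int> a;
    a.reserve(1024);
    std::cout << a.size() << ' ' << a.capacity() << '\n'; // typically "0 1024": a[0] is NOT valid

    std::vector<unsigned int> b(1024);
    b[0] = 10;                                            // fine: b[0] exists
    std::cout << b.size() << ' ' << b.capacity() << '\n'; // typically "1024 1024"
}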
It is illegal, regardless of the type of item in the container or what seems to happen on a particular compiler. From 23.1.1/12 (Table 68) we learn that operator[] behaves like *(a.begin() + n). Since you haven't added any items to the container, this is the same as dereferencing an iterator past end(), which is undefined.