Strange output from dereferencing pointers into a vector - c++

I was writing a program in C++ where I need to have a 2d grid of pointers which point to objects which are stored in a vector. I tested out some part of the program and saw strange results in the output.
I changed the objects to integers and removed everything non-essential to cut it down to the code snippet below, but I still get a weird output.
vector<vector<int*>> lattice(10, vector<int*>(10)); // grid of pointers
vector<int> relevant; // vector carrying the actual values that the pointers will point to
for(int i = 0; i < 10; i++){
    int new_integer = i;
    relevant.push_back(new_integer); // insert integer into vector
    lattice[0][i] = &relevant[i];    // make the pointer point to this value
}
//OUTPUT
for(int j = 0; j < 10; j++){
    cout << *lattice[0][j] << " ";
    cout << relevant[j] << endl;
}
I get strange outputs like this:
19349144 0
19374040 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
Also, my output changes from run to run and depending on how big/small I make my grid.
I would expect all the values on the left and right to be equal, I guess there is something fundamental about pointers I haven't understood, so sorry if this is a very basic question.
Can someone please explain why I get strange outputs for some values of the grid?

I need to have a 2d grid of pointers which point to objects which are stored in a vector
That's impossible. Or rather, it's a recipe for dereferencing invalid addresses. Why?
At any given time, an std::vector has enough space allocated for some limited number of elements. If you keep adding elements to it, it will eventually max out its storage. At some insertion, it will decide to allocate a new stretch of memory to use for storing its data; move (or copy) the existing data to the new storage area; free the old storage area; and then be able to add more elements.
When this happens, all existing pointers to objects in the vector become invalid. The memory they point to may continue to hold the previous values, but may also be reused to store other data - there are no guarantees about that! In fact, dereferencing invalid pointers officially results in undefined behavior.
And what I've described is exactly what your code does: your older pointers become invalid.
Instead, consider keeping indices into the vector rather than pointers. Indices don't get invalidated by adding elements to the vector, and you can keep using them.
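For illustration, here is a minimal sketch of the index-based approach, assuming the same 10x10 grid and loop as in the question; the grid stores positions in relevant instead of addresses:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<vector<size_t>> lattice(10, vector<size_t>(10)); // grid of indices
    vector<int> relevant;                                   // vector carrying the actual values

    for (int i = 0; i < 10; i++) {
        relevant.push_back(i);   // insert integer into vector
        lattice[0][i] = i;       // remember the index, not the address
    }

    for (int j = 0; j < 10; j++)
        cout << relevant[lattice[0][j]] << " " << relevant[j] << endl;
}

Because an index survives reallocation, both columns print the same value even as relevant grows.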
PS - I see you're using a vector-of-vectors. That's technically valid, but it is often inadvisable. Consider using a matrix class (e.g. from the Eigen library), or allocating a single block of memory with std::make_unique() and using it to initialize a gsl::multi_span.

relevant.push_back invalidates all pointers/references/iterators to its elements (if the new size exceeds its current capacity).
Therefore you are dereferencing potentially invalid pointers when you later do *lattice[0][j].
You can use a container that doesn't invalidate on insertion at the end, such as std::list or std::deque, instead of std::vector, for relevant.
(Or you can reserve sufficient capacity with a call to .reserve first, so that the size never exceeds the capacity during the .push_back calls and the pointers are therefore never invalidated. That approach carries the risk that a later code change silently breaks this assumption, again causing UB.)
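For illustration, a minimal sketch of the std::deque variant mentioned above, assuming the same grid and loop as in the question; push_back on a deque does not invalidate pointers or references to existing elements:

#include <deque>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<vector<int*>> lattice(10, vector<int*>(10)); // grid of pointers
    deque<int> relevant;                                // push_back never invalidates pointers to elements

    for (int i = 0; i < 10; i++) {
        relevant.push_back(i);
        lattice[0][i] = &relevant[i]; // safe: this element will not be moved by later push_backs
    }

    for (int j = 0; j < 10; j++)
        cout << *lattice[0][j] << " " << relevant[j] << endl;
}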

When a std::vector is inserted into, if its new size() will exceed its current capacity(), the vector has to reallocate its internal array to make room, which will invalidate any existing iterators and pointers to the old memory.
In your example, you can avoid that by reserve()'ing the capacity() ahead of time, eg:
vector<vector<int*>> lattice(10, vector<int*>(10)); // grid of pointers
vector<int> relevant; // vector carrying the actual values that the pointers will point to

//ADD THIS!
relevant.reserve(10); // pre-allocate the capacity

for(int i = 0; i < 10; i++){
    int new_integer = i;
    relevant.push_back(new_integer); // insert integer into vector
    lattice[0][i] = &relevant[i];    // make the pointer point to this value
}
//OUTPUT
for(int j = 0; j < 10; j++){
    cout << *lattice[0][j] << " ";
    cout << relevant[j] << endl;
}
Alternatively, you can pre-allocate the size() and then use operator[] instead of push_back(), eg:
vector<vector<int*>> lattice(10, vector<int*>(10)); // grid of pointers
vector<int> relevant(10); // vector carrying the actual values that the pointers will point to

for(int i = 0; i < 10; i++){
    int new_integer = i;
    relevant[i] = new_integer;    // insert integer into vector
    lattice[0][i] = &relevant[i]; // make the pointer point to this value
}
//OUTPUT
for(int j = 0; j < 10; j++){
    cout << *lattice[0][j] << " ";
    cout << relevant[j] << endl;
}


Vector of pointers undefined behaviour

I'm trying to make a vector of pointers whose elements point to the elements of a vector of ints. (I'm solving a competitive-programming-style problem, which is why it sounds a bit nonsensical.)
Here's the code:
#include <bits/stdc++.h>
using namespace std;

int ct = 0;
vector<int> vec;
vector<int*> rf;

void addRef(int n){
    vec.push_back(n);
    rf.push_back(&vec[ct]);
    ct++;
}

int main(){
    addRef(1);
    addRef(2);
    addRef(5);
    for(int i = 0; i < ct; i++){
        cout << *rf[i] << ' ';
    }
    cout << endl;
    for(int i = 0; i < ct; i++){
        cout << vec[i] << ' ';
    }
}
When I execute the code, it shows weird behaviour that I don't understand. The first element of rf (vector<int*>) doesn't seem to point to vec's (vector<int>) element, while the rest of the elements do.
here's the output when I run it on Dev-C++:
1579600 2 5
1 2 5
When I tried to run the code here, the output is even weirder:
1197743856 0 5
1 2 5
The code is intended to produce the same output on the first line as on the second.
Can you guys explain why it happens? Is there any mistake in my implementation?
thanks
Adding elements to a std::vector with push_back or similar may invalidate all iterators and references to its elements. See https://en.cppreference.com/w/cpp/container/vector/push_back.
The idea is that in order to grow the vector, it may not have enough free memory to expand into, and thus may have to move the whole array to some other location in memory, freeing the old block. That means in particular that your pointers now point to memory that has been freed, or reused for something else.
If you want to keep this approach, you will need to resize() or reserve() a sufficient number of elements in vec before starting. Which of course defeats the whole purpose of a std::vector, and you might as well use an array instead.
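For illustration, a minimal sketch of that fixed-size alternative using std::array and the three values from the question; since the container never reallocates, the stored addresses stay valid for its whole lifetime:

#include <array>
#include <iostream>
using namespace std;

int main() {
    array<int, 3> vec{1, 2, 5};   // fixed size: elements never move
    array<int*, 3> rf{};

    for (int i = 0; i < 3; i++)
        rf[i] = &vec[i];          // valid for the lifetime of vec

    for (int* p : rf)
        cout << *p << ' ';
    cout << '\n';
}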
The vector is changing size, and the addresses you saved may no longer point to its elements. You can preallocate memory using reserve() and the vector will not reallocate.
vec.reserve(3);
addRef(1);
addRef(2);
addRef(5);
The problem occurs when you call vec.push_back(n) and vec’s internal array is already full. When that happens, the std::vector::push_back() method allocates a larger array, copies the contents of the full array over to the new array, then frees the old/full array and keeps the new one.
Usually that’s all you need, but your program is keeping pointers to elements of the old array inside (rf), and these pointers all become dangling/invalid when the reallocation occurs, hence the funny (undefined) behavior.
An easy fix would be to call vec.reserve(100) (or similar) at the top of your program (so that no further reallocations are necessary). Or alternatively you could postpone the adding of pointers to (rf) until after you’ve finished adding all the values to (vec).
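For illustration, a minimal sketch of the second suggestion (fill vec completely, then take the addresses), using the values 1, 2, 5 from the question:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> vec{1, 2, 5};     // all values added first; no further reallocation
    vector<int*> rf;
    rf.reserve(vec.size());
    for (int& x : vec)
        rf.push_back(&x);         // addresses taken only after vec has stopped growing

    for (int* p : rf)
        cout << *p << ' ';
    cout << endl;
}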
Just don't take pointers into a vector that may change soon; a vector copies its elements to a new location when it enlarges its capacity.
Use an array to store the ints instead.

How to access nth index of reserved memory of a vector, without explicitly pushing back? [duplicate]

There is a thread in the comments section in this post about using std::vector::reserve() vs. std::vector::resize().
Here is the original code:
void MyClass::my_method()
{
    my_member.reserve(n_dim);
    for(int k = 0 ; k < n_dim ; k++ )
        my_member[k] = k ;
}
I believe that to write elements in the vector, the correct thing to do is to call std::vector::resize(), not std::vector::reserve().
In fact, the following test code "crashes" in debug builds in VS2010 SP1:
#include <vector>
using namespace std;

int main()
{
    vector<int> v;
    v.reserve(10);
    v[5] = 2;
    return 0;
}
Am I right, or am I wrong? And is VS2010 SP1 right, or is it wrong?
There are two different methods for a reason:
std::vector::reserve will allocate the memory but will not resize your vector, which will have a logical size the same as it was before.
std::vector::resize will actually modify the size of your vector and will fill any space with objects in their default state. If they are ints, they will all be zero.
After reserve, in your case, you would still need six push_backs before element 5 exists and can be written to.
If you don't wish to do that, then in your case you should use resize.
One thing about reserve: if you then add elements with push_back, then until you reach the capacity you have reserved, any existing references, iterators or pointers to data in your vector remain valid. So if I reserve 1000 and my size is 5, &vec[4] will remain the same until the vector has 1000 elements. After that, I can still call push_back() and it will work, but the pointer I stored earlier from &vec[4] may no longer be valid.
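A small sketch of that guarantee (the numbers 1000 and 5 are just illustrative):

#include <cassert>
#include <vector>
using namespace std;

int main() {
    vector<int> vec;
    vec.reserve(1000);            // capacity at least 1000, size still 0
    for (int i = 0; i < 5; i++)
        vec.push_back(i);

    int* p = &vec[4];
    for (int i = 5; i < 1000; i++)
        vec.push_back(i);         // size never exceeds the reserved capacity

    assert(p == &vec[4]);         // still valid: no reallocation has happened
    vec.push_back(1000);          // may now reallocate; p may be dangling after this
}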
It depends on what you want to do. reserve does not add any elements to the vector; it only changes the capacity(), which guarantees that adding elements will not reallocate (and e.g. invalidate iterators). resize adds elements immediately. If you want to add elements later (insert(), push_back()), use reserve. If you want to access elements later (using [] or at()), use resize. So your MyClass::my_method can be either:
void MyClass::my_method()
{
    my_member.clear();
    my_member.reserve( n_dim );
    for ( int k = 0; k < n_dim; ++ k ) {
        my_member.push_back( k );
    }
}
or
void MyClass::my_method()
{
    my_member.resize( n_dim );
    for ( int k = 0; k < n_dim; ++ k ) {
        my_member[k] = k;
    }
}
Which one you choose is a question of taste, but the code you quote is clearly incorrect.
There probably should be a discussion about when both methods are called with a number that's LESS than the current size of the vector.
Calling reserve() with a number smaller than the capacity will not affect the size or the capacity.
Calling resize() with a number smaller than the current size reduces the container to that size, destroying the excess elements.
To sum up: resize() can destroy elements, whereas reserve() never does; neither is guaranteed to release the underlying memory (shrink_to_fit() exists for that).
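A short sketch of both calls with a smaller number (the values are arbitrary, and the exact capacity is implementation-defined):

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> v(10, 7);            // size 10, capacity at least 10
    v.reserve(3);                    // no effect: size and capacity unchanged
    cout << v.size() << ' ' << v.capacity() << '\n';

    v.resize(3);                     // size 3; elements [3..9] are destroyed
    cout << v.size() << ' ' << v.capacity() << '\n';  // capacity typically unchanged
}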
Yes you’re correct, Luchian just made a typo and is probably too coffee-deprived to realise his mistake.
resize actually changes the number of elements in the vector; new items are default-constructed if the resize causes the vector to grow.
vector<int> v;
v.resize(10);
auto size = v.size();
in this case size is 10.
reserve on the other hand only requests that the internal buffer be grown to the specified size; it does not change the "size" of the array, only its buffer size.
vector<int> v;
v.reserve(10);
auto size = v.size();
in this case size is still 0.
So to answer your question, yes, you are right: even if you reserve enough space, you are still accessing uninitialized memory with the index operator. With an int that's not so bad, but in the case of a vector of classes you would be accessing objects which have not been constructed.
The bounds checking that compilers add in debug builds catches exactly this, which is likely why you are seeing the crash.

How Physically are Arrays Stored (Specifically with dimensions greater than 2)?

What I Know
I know that arrays int ary[] can be expressed in the equivalent "pointer-to" format: int* ary. However, what I would like to know is, if these two are the same, how are arrays physically stored?
I used to think that the elements are stored next to each other in RAM, like so for the array ary:
int size = 5;
int* ary = new int[size];
for (int i = 0; i < size; i++) { ary[i] = i; }
This (I believe) is stored in RAM like: ...[0][1][2][3][4]...
This means we can subsequently replace ary[i] with *(ary + i), by just incrementing the pointer's location by the index.
The Issue
The issue comes in when I define a 2D array in the same way:
int width = 2, height = 2;
Vector** array2D = new Vector*[height];
for (int i = 0; i < width; i++) {
    array2D[i] = new Vector[height];
    for (int j = 0; j < height; j++) { array2D[i][j] = (i, j); }
}
The class Vector is just there to store both x and y in a single fundamental unit: (x, y).
So how exactly would the above be stored?
It cannot logically be stored like ...[(0, 0)][(1, 0)][(0, 1)][(1, 1)]... as this would mean that the (1, 0)th element is the same as the (0, 1)th.
It cannot also be stored in a 2d array like below, as the physical RAM is a single 1d array of 8 bit numbers:
...[(0, 0)][(1, 0)]...
...[(0, 1)][(1, 1)]...
Neither can it be stored like ...[&(0, 0)][&(1, 0)][&(0, 1)][&(1, 1)]..., given &(x, y) is a pointer to the location of (x, y). This would just mean each memory location would just point to another one, and the value could not be stored anywhere.
Thank you in advance.
What OP is struggling with is a dynamically allocated array of pointers to dynamically allocated arrays. Each of these allocations is its own block of memory sitting somewhere in storage. There is no connection between them other than the logical connection established by the pointers in the outer array.
To try to visualize this say we make
int ** twodee;
twodee = new int*[4];
for (int i = 0; i < 4; i++)
{
    twodee[i] = new int[4];
}
and then
int count = 1;
for (int i = 0; i < 4; i++)
{
    for (int j = 0; j < 4; j++)
    {
        twodee[i][j] = count++;
    }
}
so we should wind up with twodee looking something like
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
right?
Logically, yes. But laid out in memory, twodee might look like a scattered mess: the outer array and each of the four inner arrays live in separate allocations, placed wherever the allocator happened to put them.
You can't really predict where your memory will be; you're at the mercy of whatever memory manager handles the allocations and of what's already sitting in storage where your memory might otherwise have gone. This makes laying dynamically allocated multi-dimensional arrays out in your head almost a waste of time.
And there is a whole lot wrong with this when you get down into the guts of what a modern CPU can do for you. The CPU has to hop around a lot, and when it's hopping, its ability to predict and preload the cache with memory you're likely to need in the near future is compromised. This means your gigahertz computer has to sit around and wait on your megahertz RAM a lot more than it should have to.
Try to avoid this whenever possible by allocating single, contiguous blocks of memory. You may pick up a bit of extra code mapping one dimensional memory over to other dimensions, but you don't lose any CPU time. C++ will have generated all of that mapping math for you as soon as you compiled [i][j] anyway.
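For illustration, a minimal sketch of that idea, assuming the 4x4 layout from the example above and a plain row-major index mapping:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    const size_t rows = 4, cols = 4;
    vector<int> flat(rows * cols);          // one contiguous block

    int count = 1;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            flat[i * cols + j] = count++;   // row-major mapping: (i, j) -> i*cols + j

    cout << flat[2 * cols + 3] << '\n';     // element at row 2, column 3 -> prints 12
}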
The short answer to your question is: It is compiler dependent.
A more helpful answer (I hope) is that you can create 2D arrays that are laid out directly in memory, or "2D arrays" that are actually 1D arrays, some holding data and some holding pointers to arrays.
There is a convention that the compiler is happy to generate the right kind of code to dereference and/or calculate the address of an element within an array when you use brackets to access an element in the array.
Generally, arrays known to be 2D at compile time (e.g. int array2D[a][b]) are laid out in memory without extra pointers, and the compiler knows to multiply AND add to get an address for each access. If your compiler isn't good at optimizing out the multiply, repeated accesses are slower than they could be, so in the old days we often did the pointer math ourselves to avoid the multiply where possible.
There is the issue that a compiler might optimize by rounding the lower dimension size up to a power of two, so a shift can be used instead of a multiply, which would then require padding the locations (so even though everything is in one memory block, there are meaningless holes).
(Also, I'm pretty sure I've run into the problem that, within a procedure, the compiler needs to know which kind of 2D array it really is, so you may need to declare parameters in a way that lets the compiler know how to code the procedure, e.g. a[][] is different from *a[].) And obviously you can get the pointer from the array of pointers if that is what you want, which isn't the same thing as the array it points to, of course.
In your code, you have clearly declared a full set of the lower-dimension 1D arrays (inside the loop), and you have ALSO declared another 1D array of pointers that you use to reach each one without a multiply, via a dereference instead. So all those things will be in memory. Each 1D array will surely be laid out sequentially in a contiguous block of memory. It is just that it is entirely up to the memory manager where those 1D arrays end up relative to each other. (I doubt a compiler is smart enough to actually do the "new" ops at compile time, but it is theoretically possible, and would obviously affect/control the behaviour if it did.)
Using the extra array of pointers avoids the multiply, always. But it takes more space, and for sequential access it actually makes the accesses slower and bigger (the extra dereference) versus maintaining a single pointer and one dereference.
Even if the 1D arrays DO end up contiguous sometimes, you might break that with another thread using the same memory manager, running a "new" while your "new" inside the loop is repeating.

Assigning vector size vs reserving vector size

bigvalue_t result;
result.assign(left.size() + right.size(), 0);
int carry = 0;
for(size_t i = 0; i < left.size(); i++) {
    carry = 0;
    for(size_t j = 0; j < right.size(); j++) {
        int sum = result[i+j] + (left[i]*right[j]) + carry;
        result[i+j] = sum%10;
        carry = sum/10;
    }
    result[i+right.size()] = carry;
}
return result;
Here I used assign to set the size of result, and result is passed back normally.
When I use result.reserve(left.size()+right.size()); instead, the function runs normally through both for loops. But when I print out result.size(), it is always 0. Does reserve not allocate any space?
It is specified as:
void reserve(size_type n);
Effects: A directive that informs a vector of a planned change in size, so that it can manage the storage allocation accordingly. After reserve(), capacity() is greater or equal to the argument of reserve if reallocation happens; and equal to the previous value of capacity() otherwise. Reallocation happens at this point if and only if the current capacity is less than the argument of reserve(). If an exception is thrown other than by the move constructor of a non-CopyInsertable type, there are no effects.
Complexity: It does not change the size of the sequence and takes at most linear time in the size of the sequence.
So, yes, it allocates memory, but it doesn't create any objects within the container. To actually create as many elements in the vector as you want to have later, so that they can be accessed via op[], you need to call resize().
reserve() is for when you want to prevent the vector from reallocating every now and then while you do lots of push_back()s.
reserve allocates space, but doesn't really create anything. It is used in order to avoid reallocations.
For example, if you intend to store 10000 elements by push_back into a vector, you will probably cause the vector to reallocate several times. If you call reserve before actually storing your elements, the vector is prepared to accept about 10000 elements, so filling it will be faster than if you hadn't used reserve.
resize, actually creates space. Note also that resize will initialize your elements to their default values (so for an int, it will set every element to 0).
PS - In fact, when you call reserve(1000), the vector may actually allocate space for more than 1000 elements. If that happens and you store exactly 1000 elements, the unused space remains unused (it is not de-allocated).
It is the difference between semantically increasing the size of the vector (resize/assign/push_back/etc), and physically creating more underlying memory for it to expand into (reserve).
That your code appears to work even with reserve is just because you're not triggering any OS memory errors (the memory does belong to your vector), but the absence of error messages or crashes doesn't mean your code is safe or correct: as far as the vector is concerned, you are writing into memory that belongs to it and not to you.
If you'd used .at() instead of [] you'd have got an exception; as it is, you are simply invoking undefined behaviour.
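A minimal sketch of that difference, reusing the reserve-then-index pattern from the questions above:

#include <iostream>
#include <stdexcept>
#include <vector>
using namespace std;

int main() {
    vector<int> v;
    v.reserve(10);           // capacity 10, but size() is still 0
    try {
        v.at(5) = 2;         // bounds-checked: index 5 >= size(), throws
    } catch (const out_of_range& e) {
        cout << "caught: " << e.what() << '\n';
    }
    // v[5] = 2;             // no check: undefined behaviour
}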

Advantage of STL resize()

The resize() function makes the vector contain the required number of elements. If we require fewer elements than the vector already contains, the last ones are deleted. If we ask the vector to grow, it enlarges its size and fills the newly created elements with zeroes.
vector<int> v(20);
for(int i = 0; i < 20; i++) {
    v[i] = i+1;
}
v.resize(25);
for(int i = 20; i < 25; i++) {
    v[i] = i*2;
}
But if we use push_back() after resize(), it will add elements AFTER the newly allocated size, but not INTO it. In the example above the size of the resulting vector is 25, while if we use push_back() in a second loop, it would be 30.
vector<int> v(20);
for(int i = 0; i < 20; i++) {
    v[i] = i+1;
}
v.resize(25);
for(int i = 20; i < 25; i++) {
    v.push_back(i*2); // Writes to elements with indices [25..30), not [20..25) !
}
Then where is the advantage of the resize() function? Doesn't it create confusion for indexing and accessing elements of the vector?
It sounds as though you should be using vector::reserve.
vector::resize is used to initialize the newly created space with a given value (or just the default.) The second parameter to the function is the initialization value to use.
Remember the alternative - reserve. resize is used when you want to act on the vector using the [] operator - hence you need an "empty" table of elements. resize is not intended to be used with push_back. Use reserve if you want to prepare the array for push_back.
resize is mainly useful if the elements have a meaningful "empty" constructor, so that you can create an array of empty elements and only change the ones that matter.
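A short sketch of resize() with and without the fill value (the values here are chosen arbitrarily):

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> v(3, 1);     // {1, 1, 1}
    v.resize(6, 9);          // grow: new elements get the value 9 -> {1, 1, 1, 9, 9, 9}
    v.resize(2);             // shrink: excess elements are destroyed -> {1, 1}

    for (int x : v)
        cout << x << ' ';
    cout << '\n';
}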
The resize() method changes the vector's size, which is not the same as the vector's capacity.
It is very important to understand the distinction between these two values:
The size is the number of actual elements that the vector contains.
The capacity is the maximum number of elements that the vector could contain without reallocating a larger chunk of memory.
A vector's capacity is always larger or equal to its size. A vector's capacity never shrinks, even when you reduce its size, with one exception: when you use swap() to exchange the contents with another vector. And as others have mentioned, you can increase a vector's capacity by calling reserve().
I think that using the correct terminology for size and capacity makes it easier to understand the C++ vector class and to speak clearly about its behavior.
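A small sketch that prints both values; the exact capacities are implementation-defined, so the comments only show typical behaviour:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> v;
    v.reserve(100);
    cout << v.size() << ' ' << v.capacity() << '\n';   // 0, and at least 100

    v.resize(10);
    cout << v.size() << ' ' << v.capacity() << '\n';   // 10, capacity unchanged

    v.resize(5);
    cout << v.size() << ' ' << v.capacity() << '\n';   // 5, capacity does not shrink
}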
The resize() function changes the actual contents of the vector by inserting or erasing elements; it does not merely change its storage capacity. To change only the storage capacity, use vector::reserve instead. Have a look at the vector visualization in the link, and notice where v.back points.
I don't really understand the confusion. The advantage of resize is that it resizes your vector. Having to do a loop of push_backs is both tedious and may require more than one "actual" resize.
If you want to "resize" your vector without changing its accessible indexes then use std::vector<T>::reserve. That will change the size of the internal allocated array without actually "adding" anything.