basic question on std::vector in C++ - c++

C++ textbooks and threads like these say that vector elements are physically contiguous in memory.
But when we do operations like v.push_back(3.14) I would assume the STL is using the new operator to get more memory to store the new element 3.14 just introduced into the vector.
Now say the vector of size 4 is stored in computer memory cells labelled 0x7, 0x8, 0x9, 0xA. If cell 0xB contains some other unrelated data, how will 3.14 go into this cell? Does that mean cell 0xB will be copied somewhere else, erased to make room for 3.14?

The short answer is that the entire array holding the vector's data is moved around to a location where it has space to grow. The vector class reserves a larger array than is technically required to hold the number of elements in the vector. For example:
vector< int > vec;
for( int i = 0; i < 100; i++ )
vec.push_back( i );
cout << vec.size(); // prints "100"
cout << vec.capacity(); // prints some value greater than or equal to 100
The capacity() method returns the size of the array that the vector has reserved, while the size() method returns the number of elements in the array which are actually in use. capacity() will always return a number larger than or equal to size(). You can change the size of the backing array by using the reserve() method:
vec.reserve( 400 );
cout << vec.capacity(); // prints a value of at least 400 (typically exactly 400)
Note that size(), capacity(), reserve(), and all related methods refer to individual instances of the type that the vector is holding. For example, if vec's type parameter T is a struct that takes 10 bytes of memory, then vec.capacity() returning 400 means that the vector actually has 4000 bytes of memory reserved (400 x 10 = 4000).
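To make the element-versus-byte distinction concrete, here is a minimal sketch. The `Record` struct and the helper name `reserved_bytes` are made up for illustration, and the capacity after `reserve()` is only guaranteed to be at least the requested value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type, used only for illustration.
struct Record {
    int id;
    int x;
    int y;
};

// Bytes of element storage the vector has reserved. This does not count
// the vector object itself or any allocator bookkeeping.
template <typename T>
std::size_t reserved_bytes(const std::vector<T>& v) {
    return v.capacity() * sizeof(T);
}
```

For example, after `v.reserve(400)` on an empty `std::vector<Record>`, `v.size()` is still 0 while `reserved_bytes(v)` is at least `400 * sizeof(Record)`.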
So what happens if more elements are added to the vector than it has capacity for? In that case, the vector allocates a new backing array (larger by a constant factor, commonly 1.5 or 2 times the size of the old array), copies or moves the old elements to the new array, and then frees the old array. In pseudo-code:
if (capacity() < size() + items_added)
{
    // Start from at least 1 so that doubling an empty vector makes progress.
    size_t sz = capacity() > 0 ? capacity() : 1;
    while (sz < size() + items_added)
        sz *= 2;
    T* new_data = new T[sz];
    for (size_t i = 0; i < size(); i++)
        new_data[i] = old_data[i];
    delete[] old_data;
    old_data = new_data;
}
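So the entire data store is moved to a new location in memory that has enough space to store the current data plus a number of new elements. Some vector-like containers may also shrink their backing array when they have far more space allocated than is actually required, but std::vector does not do so on its own when elements are removed; since C++11 you can ask for it with shrink_to_fit(), which is a non-binding request.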
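One way to watch this happen is to count how often `capacity()` changes while elements are appended. This is only a sketch: the function name `count_reallocations` is made up, and the exact count depends on the growth factor of your standard library.

```cpp
#include <cstddef>
#include <vector>

// Push n ints one at a time and count how often capacity() changes,
// i.e. how often the vector moved its elements to a new backing array.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Because the capacity grows geometrically, appending a thousand elements should trigger only a handful of reallocations rather than a thousand.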

std::vector first allocates a bigger buffer, then copies existing elements from the "old" buffer to the "new" buffer, then it deletes the "old buffer", and finally adds the new element into the "new" buffer.
Generally, std::vector implementations grow their internal buffer by a constant factor (often doubling the capacity) each time it's necessary to allocate a bigger buffer.
As Chris mentioned, every time the buffer grows, all existing iterators are invalidated.
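The flip side is that no reallocation happens while the size stays within the capacity, so pointers and iterators stay valid if you reserve enough room first. A small sketch of that guarantee (the function name `first_element_stays_put` is made up for illustration):

```cpp
#include <cstddef>
#include <vector>

// Returns true if the address of the first element is unchanged after
// appending `extra` more elements to a vector that reserved room for them.
bool first_element_stays_put(std::size_t extra) {
    std::vector<int> v;
    v.reserve(extra + 1);          // room for everything we will add
    v.push_back(0);
    const int* before = &v[0];     // pointer into the backing array
    for (std::size_t i = 0; i < extra; ++i) {
        v.push_back(static_cast<int>(i));   // size never exceeds capacity
    }
    return before == &v[0];
}
```

Without the `reserve()` call, the same pointer could be left dangling by any `push_back` that has to grow the buffer.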

When std::vector allocates memory for the values, it allocates more than it needs; you can find out how much by calling capacity. When that capacity is used up, it allocates a bigger chunk, again larger than it needs, and copies everything from the old memory to the new; then it releases the old memory.

If there is not enough space to add the new element, more space will be allocated (as you correctly indicated), and the old data will be copied to the new location. So cell 0xB will still contain the old value (as it might have pointers to it in other places, it is impossible to move it without causing havoc), but the whole vector in question will be moved to the new location.

A vector is an array of memory. A typical implementation grabs more memory than is required. If that footprint needs to expand into memory used by something else, the whole lot is copied to a new block and the old one is freed. Note that with the default allocator the elements live on the heap (free store), even when the vector object itself is on the stack. It is also a good idea to tell the vector the maximum size you expect up front, using reserve().

In C++, which comes from C, memory is not 'managed' the way you describe - Cell 0x0B's contents will not be moved around. If you did that, any existing pointers would be made invalid! (The only way this could be possible is if the language had no pointers and used only references for similar functionality.)
std::vector allocates a new, larger buffer and stores the value of 3.14 to the "end" of the buffer.
Usually, though, to make this->push_back() cheap, a std::vector keeps a capacity larger than its this->size(), typically growing by a factor of about two when it runs out. This trades a reasonable amount of memory for performance. So it is not guaranteed that pushing 3.14 causes a reallocation: if this->size() < this->capacity(), the value is simply stored at this->buffer[this->size()] and the size is incremented.

Related

Dynamically allocating memory for changing array size starting with unknown size C++

How do I dynamically allocate an array where the size will be changing because the stuff stored in the array will be read from a file. There are lots of suggestions on using a vector, but I want to know how to do it the array way.
I know for memory allocation it is
int count;
int *n = new int[count];
Say the variable count is going to increment in a loop. How would I change the size of the array?
Also, what if we did it using malloc?
Don't try to make the array allocation exactly follow the continually changing size requirements of what you are going to store. Consider using the traditional doubling strategy: when the array is full, allocate a new array twice as large (2*N) and copy the items over. The number of reallocations then grows only logarithmically with the final size, and the copying cost is amortized to a constant per inserted item.
Keep in mind that this logic you are setting out to implement with low level arrays is exactly why vector exists. You are not likely to implement your own as efficiently, or as bug free.
But if you are set on it, keep count a power of 2, starting with something realistic (or the nearest power of 2 rounded up).
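To see why doubling beats growing by a fixed amount, the following sketch simulates both strategies and counts element copies. It assumes the array starts with room for one item; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simulate appending n items to an array that starts with room for one
// item and doubles its capacity whenever it is full.
// Returns how many element copies the reallocations cost in total.
std::size_t copies_with_doubling(std::size_t n) {
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;   // every existing item moves to the new array
            capacity *= 2;
        }
        ++size;
    }
    return copies;
}

// Same simulation, but growing by a fixed number of slots each time.
std::size_t copies_with_fixed_step(std::size_t n, std::size_t step) {
    assert(step > 0);
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;
            capacity += step;
        }
        ++size;
    }
    return copies;
}
```

Under these assumptions, appending 1024 items costs 1023 copies with doubling, versus 523776 copies when growing one slot at a time.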
You may keep two pointers, p and q (a placeholder). When count changes, allocate a fresh array through q, copy the contents of the old array p into it, deallocate the old array, and then make p point to the new one.
int count = 8;       // initial capacity; pick something realistic
int oldcount;
int *p = NULL;
int *q;
p = new int[count];
oldcount = count;
When you need to re-allocate:
count *= 2;          // new, larger capacity
q = new int[count];
memcpy(q, p, oldcount * sizeof(int)); // needs <cstring>; OR for (int i = 0; i < oldcount; i++) q[i] = p[i];
delete [] p;
p = q;
oldcount = count; // for use later
If you use malloc or calloc, you need to pass the size in bytes (for example count * sizeof(int)). That is not needed with the new and delete operators in C++, which work in numbers of elements.
How would I change the size of the array?
Using new: You can't. The size of an object (here, an array object) can't change at runtime.
You would have to create a new array with the appropriate size, copy all elements from the old into the new array and destroy the old one.
To avoid many reallocations you should always allocate more than you need. Keep track of the size (the amount of elements currently in use) and the capacity (the actual size of the allocated array). Once you want to increase the size, check whether there is still some memory left (size<capacity) and use that if possible; otherwise, apply the aforementioned method.
And that's exactly what vector does for you, but with RAII and all the convenience possible.
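For the "array way", the size/capacity bookkeeping described above could look roughly like this. It is a minimal sketch, not a replacement for std::vector: the class name `IntArray`, the starting capacity of 4 and the doubling factor are all arbitrary choices made for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Minimal growable int array that tracks size and capacity separately.
class IntArray {
public:
    IntArray() : data_(nullptr), size_(0), capacity_(0) {}
    ~IntArray() { delete[] data_; }   // RAII: the buffer is freed automatically

    // Copying is disabled to keep the sketch short (no deep copy written).
    IntArray(const IntArray&) = delete;
    IntArray& operator=(const IntArray&) = delete;

    void push_back(int value) {
        if (size_ == capacity_) {
            grow(capacity_ == 0 ? 4 : capacity_ * 2);
        }
        data_[size_++] = value;       // spare room exists at this point
    }

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    // Allocate a bigger array, copy the used part over, free the old one.
    void grow(std::size_t new_capacity) {
        int* bigger = new int[new_capacity];
        std::copy(data_, data_ + size_, bigger);
        delete[] data_;
        data_ = bigger;
        capacity_ = new_capacity;
    }

    int* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

With these choices, pushing 100 values leaves `size()` at 100 and `capacity()` at 128, after growing through 4, 8, 16, 32 and 64.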

Assigning vector size vs reserving vector size

bigvalue_t result;
result.assign(left.size() + right.size(), 0);
int carry = 0;
for(size_t i = 0; i < left.size(); i++) {
carry = 0;
for(size_t j = 0; j < right.size(); j++) {
int sum = result[i+j] + (left[i]*right[j]) + carry;
result[i+j] = sum%10;
carry = sum/10;
}
result[i+right.size()] = carry;
}
return result;
Here I used assign to allocate size of result, and result passed back normally.
When I use result.reserve(left.size()+right.size()); instead, the function runs normally inside both for loops. But when I print out result.size(), it is always 0. Does reserve not allocate any space?
It is specified as
void reserve(size_type n);
Effects: A directive that informs a
vector of a planned change in size, so that it can manage the storage
allocation accordingly. After reserve(), capacity() is greater or
equal to the argument of reserve if reallocation happens; and equal to
the previous value of capacity() otherwise. Reallocation happens at
this point if and only if the current capacity is less than the
argument of reserve(). If an exception
is thrown other than by the move constructor of a non-CopyInsertable type, there are no effects.
Complexity: It does not change the size of the sequence and takes at
most linear time in the size of the sequence.
So, yes, it allocates memory, but it doesn't create any objects within the container. To actually create as many elements in the vector as you want to have later, and to be able to access them via op[], you need to call resize().
reserve() is for when you want to prevent things like the vector reallocation every now and then when doing lots of push_back()s.
reserve allocates space, but doesn't really create anything. It is used in order to avoid reallocations.
For example, if you intend to store 10000 elements by calling push_back on a vector, you will probably make the vector reallocate several times. If you call reserve(10000) before storing your elements, the vector is prepared to accept at least 10000 elements, so filling it should be faster than if you had not used reserve.
resize actually creates elements. Note also that resize value-initializes the new elements (so for an int, it sets every new element to 0).
PS - In fact, when you say reserve(1000), the vector may allocate space for more than 1000 elements. If this happens and you store exactly 1000 elements, the unused space remains unused (it is not de-allocated).
It is the difference between semantically increasing the size of the vector (resize/assign/push_back/etc), and physically creating more underlying memory for it to expand into (reserve).
That you see your code appear to work even with reserve is just because you're not triggering any OS memory errors (because the memory belongs to your vector), but just because you don't see any error messages or crashes doesn't mean your code is safe or correct: as far as the vector is concerned, you are writing into memory that belongs to it and not you.
If you'd used .at() instead of [] you'd have got an exception; as it is, you are simply invoking undefined behaviour.
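The difference is easy to check. The helper names below are made up for illustration; the behaviour they show (reserve leaves size() at 0, resize creates elements, at() checks against size()) is what the standard specifies.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// reserve() changes capacity only; the vector still has no elements.
std::size_t size_after_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    return v.size();
}

// resize() creates n value-initialized elements (0 for int).
std::size_t size_after_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    return v.size();
}

// at() checks the index against size(), not capacity(), so it throws
// even though memory for the element has been reserved.
bool at_throws_after_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    try {
        v.at(0) = 1;
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

In the multiplication code from the question, that means keeping `assign(n, 0)` (or using `resize(n)`) before indexing with `result[i+j]`.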

Control over std::vector reallocation

By reading the std::vector reference I understood that
calling insert when the maximum capacity is reached will cause the reallocation of the std::vector (causing iterator invalidation) because new memory is allocated for it with a bigger capacity. The goal is to keep the guarantee about contiguous data.
As long as I stick below the maximum capacity insert will not cause that (and iterators will be intact).
My question is the following:
When reserve is called automatically by insert, is there any way to control how much new memory must be reserved?
Suppose that I have a vector with an initial capacity of 100 and, when the maximum capacity is hit, I want to allocate an extra 20 bytes.
Is it possible to do that?
You can always track it yourself and call reserve before it would allocate, e.g.
static const int N = 20; // Amount to grow by (in elements, not bytes)
if (vec.capacity() == vec.size()) {
    vec.reserve(vec.size() + N);
}
vec.insert(...);
You can wrap this in a function of your own and call that function instead of calling insert() directly.
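Such a wrapper might look like the sketch below. The name `insert_with_step` and its parameters are made up for illustration. It takes an index rather than an iterator because the `reserve()` call may invalidate iterators obtained beforehand, and it takes the value by copy so that it stays valid even if it came from the vector itself.

```cpp
#include <cstddef>
#include <vector>

// Insert `value` before index `pos`. Whenever the vector is full, first
// grow its capacity by at least `grow_by` elements (not bytes).
template <typename T>
void insert_with_step(std::vector<T>& vec, std::size_t pos,
                      T value, std::size_t grow_by) {
    if (vec.size() == vec.capacity()) {
        vec.reserve(vec.size() + grow_by);
    }
    typedef typename std::vector<T>::difference_type diff_t;
    vec.insert(vec.begin() + static_cast<diff_t>(pos), value);
}
```

Keep in mind that growing by a fixed step gives up the amortized constant-time insertion that the vector's own geometric growth provides, so this only pays off when memory is tighter than time.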

STL's vector resizing

I can't find this piece of information. I'm dealing with an odd situation here where I'm inside a loop and I can get a random piece of information at any given time. This information has to be stored in a vector. Now each frame I have to set the size of this vector to ensure that I won't exceed the space (I'm writing values into random points in the vector using indexing).
Now assuming there's no way to change this piece of code, I want to know: does the vector "ignore" the resize() function if I pass an argument that's exactly the size of the vector? Where can I find this information?
From the MSDN reference:
If the container's size is less than the requested size, _Newsize, elements are added to the vector until it reaches the requested size. If the container's size is larger than the requested size, the elements closest to the end of the container are deleted until the container reaches the size _Newsize. If the present size of the container is the same as the requested size, no action is taken
The ISO C++ standard (page 485) specifies this behaviour for vector::resize
void resize(size_type sz, T c = T());
if (sz > size())
    insert(end(), sz - size(), c);
else if (sz < size())
    erase(begin() + sz, end());
else
    ; // Does nothing
So yes, the vector ignores it and you don't need to perform a check on your own.
Kinda-sorta.
Simply resizing a vector with resize() can only result in more memory being used by the vector itself (will change how much is used by its elements). If there's not enough room in the reserved space, it will reallocate (and sometimes they like to pad themselves a bit so even if there is you might grow). If there is already plenty of room for the requested size and whatever padding it wants to do, it will not regrow.
When the specification says that the elements past the end of the size will be deleted, it means in place. Basically it will call _M_buff[i].~T() for each element it is deleting. Thus any memory your object allocates will be deleted, assuming a working destructor, but the space that the object itself occupies (its size) will not be. The vector will grow, and grow, and grow to the maximum size you ever tell it to and will not reshrink while it exists.
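A quick way to observe this is to compare the capacity before and after resizing down. The function names below are made up for illustration. The second function uses the copy-and-swap idiom to hand the spare memory back; typical implementations then end up with a tight fit, but the standard does not promise an exact capacity.

```cpp
#include <cstddef>
#include <vector>

// Capacity left behind after growing to `big` elements and then
// resizing down to `small` elements.
std::size_t capacity_after_shrinking(std::size_t big, std::size_t small) {
    std::vector<int> v(big);
    v.resize(small);            // destroys the trailing elements only
    return v.capacity();
}

// Same, but afterwards copy into a fresh temporary and swap buffers with
// it, so that the oversized buffer is released with the temporary.
std::size_t capacity_after_swap_trick(std::size_t big, std::size_t small) {
    std::vector<int> v(big);
    v.resize(small);
    std::vector<int>(v).swap(v);
    return v.capacity();
}
```

Since C++11, `v.shrink_to_fit()` expresses the same intent as the swap idiom, again as a non-binding request.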

How to expand an array dynamically in C++? {like in vector }

Let's say I have
int *p;
p = new int[5];
for(int i=0;i<5;i++)
*(p+i)=i;
Now I want to add a 6th element to the array. How do I do it?
You have to reallocate the array and copy the data:
int *p;
p = new int[5];
for(int i=0;i<5;i++)
*(p+i)=i;
// realloc
int* temp = new int[6];
std::copy(p, p + 5, temp); // needs <algorithm>; suggested by comments from Nick and Bojan
delete [] p;
p = temp;
You cannot. You must use a dynamic container, such as an STL vector, for this. Or else you can make another array that is larger, and then copy the data from your first array into it.
The reason is that an array represents a contiguous region in memory. For your example above, let us say that p points to address 0x1000 and that the five ints occupy twenty bytes, so the array ends at the boundary of 0x1014. The compiler is free to place other variables in the memory starting at 0x1014; for example, int i might occupy 0x1014..0x1017. If you then extended the array so that it occupied four more bytes, what would happen?
If you allocate the initial buffer using malloc you can use realloc to resize the buffer. You shouldn't use realloc to resize a new-ed buffer.
int * array = (int*)malloc(sizeof(int) * arrayLength);
int * resized = (int*)realloc(array, sizeof(int) * newLength); // returns NULL on failure and leaves array untouched
if (resized != NULL) array = resized;
However, this is a C-ish way to do things. You should consider using vector.
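If you do go the malloc/realloc route, it helps to wrap the resize so that a failed realloc does not leak the original buffer. A minimal sketch, where the helper name `resize_int_buffer` is made up for illustration; this approach is only appropriate for trivially copyable element types such as int.

```cpp
#include <cstddef>
#include <cstdlib>

// Resize a malloc-ed int buffer to hold new_length elements (new_length > 0).
// On success updates `buffer` and returns true. On failure returns false
// and leaves the original buffer valid, because realloc does not free it.
bool resize_int_buffer(int*& buffer, std::size_t new_length) {
    void* bigger = std::realloc(buffer, new_length * sizeof(int));
    if (bigger == NULL) {
        return false;
    }
    buffer = static_cast<int*>(bigger);
    return true;
}
```

The existing elements are preserved across the call, and the buffer must still be released with `free`, never with `delete[]`.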
Why don't you look in the sources how vector does that? You can see the implementation of this mechanism right in the folder your C++ include files reside!
Here's what it does on gcc 4.3.2:
1. Allocate a new contiguous chunk of memory using the vector's allocator (remember that vector is really vector<Type, Allocator = std::allocator<Type> >; in GCC the default allocator is built on new_allocator). The default allocator calls operator new() (not just new!) to allocate this chunk, which lets it avoid dealing with new[]/delete[];
2. Copy the contents of the existing array to the newly allocated one;
3. Dispose of the previously allocated chunk with the allocator; the default one uses operator delete().
(Note that if you're going to write your own vector, its capacity should grow "M times", not "by a fixed amount". This lets you achieve amortized constant time. For example, if your vector doubles each time the size limit is exceeded, each element will be copied about once on average.)
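The "copied about once on average" claim can be checked with a short calculation. This sketch assumes the capacity starts at 1 and doubles whenever the array is full; $n$ is the number of elements appended, $m$ is chosen so that $2^m < n \le 2^{m+1}$, and $c$ is a fixed growth step used for comparison.

```latex
% Doubling: reallocations happen at sizes 1, 2, 4, ..., 2^m,
% and each one copies all elements present at that moment.
\text{copies}_{\text{doubling}}(n) \;=\; \sum_{k=0}^{m} 2^{k} \;=\; 2^{m+1} - 1 \;<\; 2n
\quad\Longrightarrow\quad
\frac{\text{copies}_{\text{doubling}}(n)}{n} \;<\; 2 .

% Fixed step c: reallocations happen at sizes c, 2c, 3c, ...
\text{copies}_{\text{fixed}}(n) \;\approx\; \sum_{j=1}^{n/c} j\,c \;\approx\; \frac{n^{2}}{2c} .
```

So doubling costs fewer than two copies per element no matter how large the array gets, while a fixed step makes the total copying cost grow quadratically.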
Same as others are saying, but if you're resizing the array often, one strategy is to resize the array each time by doubling the size. There's an expense to constantly creating new and destroying old, so the doubling theory tries to mitigate this problem by ensuring that there's sufficient room for future elements as well.