How is dynamic memory managed in std::vector? - c++

How does std::vector implement the management of the changing number of elements: Does it use realloc() function, or does it use a linked list?
Thanks.

It uses the allocator that was given to it as the second template parameter. Like this then. Say it is in push_back, let t be the object to be pushed:
...
if(_size == _capacity) { // size is never greater than capacity
// reallocate
T * _begin1 = alloc.allocate(_capacity * 2, 0);
size_type _capacity1 = _capacity * 2;
// copy construct items (copy over from old location).
for(size_type i=0; i<_size; i++)
alloc.construct(_begin1 + i, *(_begin + i));
alloc.construct(_begin1 + _size, t);
// destruct old ones. dtors are not allowed to throw here.
// if they do, behavior is undefined (17.4.3.6/2)
for(size_type i=0;i<_size; i++)
alloc.destroy(_begin + i);
alloc.deallocate(_begin, _capacity);
// set new stuff, after everything worked out nicely
_begin = _begin1;
_capacity = _capacity1;
} else { // size less than capacity
// tell the allocator to allocate an object at the right
// memory place previously allocated
alloc.construct(_begin + _size, t);
}
_size++; // now, we have one more item in us
...
Something like that. The allocator will care about allocating memory. It keeps the steps of allocating memory and constructing object into that memory apart, so it can preallocate memory, but not yet call constructors. During reallocate, the vector has to take care about exceptions being thrown by copy constructors, which complicates the matter somewhat. The above is just some pseudo code snippet - not real code and probably contains many bugs. If the size gets above the capacity, it asks the allocator to allocate a new greater block of memory, if not then it just constructs at the previously allocated space.
The exact semantics of this depend on the allocator. If it is the standard allocator, construct will do
new ((void*)(_start + n)) T(t); // known as "placement new"
And the allocate allocate will just get memory from ::operator new. destroy would call the destructor
(_start + n)->~T();
All that is abstracted behind the allocator and the vector just uses it. A stack or pooling allocator could work completely different. Some key points about vector that are important
After a call to reserve(N), you can have up to N items inserted into your vector without risking a reallocation. Until then, that is as long as size() <= capacity(), references and iterators to elements of it remain valid.
Vector's storage is contiguous. You can treat &v[0] as a buffer containing as many elements you have currently in your vector.

One of the hard-and-fast rules of vectors is that the data will be stored in one contiguous block of memory.
That way you know you can theoretically do this:
const Widget* pWidgetArrayBegin = &(vecWidget[0]);
You can then pass pWidgetArrayBegin into functions that want an array as a parameter.
The only exception to this is the std::vector<bool> specialisation. It actually isn't bools at all, but that's another story.
So the std::vector will reallocate the memory, and will not use a linked list.
This means you can shoot yourself in the foot by doing this:
Widget* pInteresting = &(vecWidget.back());
vecWidget.push_back(anotherWidget);
For all you know, the push_back call could have caused the vector to shift its contents to an entirely new block of memory, invalidating pInteresting.

The memory managed by std::vector is guaranteed to be continuous, such that you can treat &vec[0] as a pointer to the beginning of a dynamic array.
Given this, how it actually manages it's reallocations is implementation specific.

std::vector stored data in contiguous memory blocks.
Suppose we declare a vector as
std::vector intvect;
So initially a memory of x elements will be created . Here x is implementation depended.
If user is inserting more than x elements than a new memory block will be created of 2x (twice the size)elements and initial vector is copied into this memory block.
Thats why it is always recommended to reserve memory for vector by calling reserve
function.
intvect.reserve(100);
so as to avoid deletion and copying of vector data.

Related

Use a ::std::vector for array creation

I want to grow a ::std::vector at runtime, like that:
::std::vector<int> i;
i.push_back(100);
i.push_back(10);
At some point, the vector is finished, and i do not need the extra functionality ::std::vector provides any more, so I want to convert it to a C-Array:
int* v = i.data();
Because I will do that more than once, I want to deallocate all the heap memory ::std::vector reserved, but I want to keep the data, like that (pseudo-code):
free(/*everything from ::std::vector but the list*/);
Can anybody give me a few pointers on that?
Thanks in advance,
Jack
In all the implementations I have seen, the allocated part of a vector is... the array itself. If you know that it will no longer grow, you can try to shrink it, to release possibly unused memory:
i.shrink_to_fit();
int* v = i.data();
Of course, nothing guarantees that the implementation will do anything, but the only foolproof alternative would be to allocated a new array, move data from the vector to the array and clear the vector.
int *v = new int[i.size];
memcpy(v, i.data, i.size * sizeof(int));
i.clear();
But I really doubt that you will have a major gain that way...
You can use the contents of a std::vector as a C array without the need to copy or release anything. Just make sure that std::vector outlives the need for your pointer to be valid (and that further modifications which could trigger a reallocation are done on the vector itself).
When you obtain a pointer to internal storage through data() you're not actually allocating anything, just referring to the already allocated memory.
The only additional precaution you could use to save memory is to use shrink_to_fit() to release any excess memory used as spare capacity (though it's not guaranteed).
You have two options:
Keep data in vector, but call shrink_to_fit. All overhead you'll have - is an extra pointer (to the vector end). It is available since C++ 11
Copy data to an external array and destroy the vector object:
Here is the example:
std::vector<int> vec;
// fill vector
std::unique_ptr<int[]> arr(new int[vec.size()]);
std::copy(vec.begin(), vec.end(), arr.get());

std::vector::assign - reallocates the data?

I am working with STL library and my goal is to minimize the data reallocation cases.
I was wndering, does
std::vector::assign(size_type n, const value_type& val)
reallocated the data if the size is not changed or does is actually just assign the new values (for example, using operator=) ?
The STL documentation at http://www.cplusplus.com/ sais the following (C++98):
In the fill version (2), the new contents are n elements, each initialized to a copy of val.
If a reallocation happens,the storage needed is allocated using the internal allocator.
Any elements held in the container before the call are destroyed and replaced by newly constructed elements (no assignments of elements take place).
This causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity.
The phrase "no assignments of elements take place" make it all a little confusing.
So for example, I want to have a vector of classes (for example, cv::Vec3i of OpenCV). Does this mean, that
the destructor or constructor of cv::Vec3i will be called?
a direct copy of Vec3i memory will be made and fills the vector?
what happens, if my class allocates memory at run time with new
operator? This memory cannot be accounted for by plain memory
copying. Does it mean, that assign() should not be used for such
objects?
EDIT: the whole purpose of using assign in this case is to set all values in the vector to 0 (in case I have std::vector< cv::Vec3i > v). It will be done many-many times. The size of std::vector itself will not be changed.
what i want to do (in a shorter way) is the following:
for(int i=0; i<v.size(); i++)
for(int j=0; j<3; j++)
v[i][j] = 0;
right now I am interested in C++98
std::vector.assign(...) does not reallocate the vector if it does not have to grow it. Still, it must copy the actual element.
If you want to know what the standard guarantees, look in the standard: C++11 standard plus minor editorial changes.
I assume that you have a vector filled with some data and you call an assign on it, this will:
destroy all the elements of the vector (their destructor is called) like a call to clear()
fill the now-empty vector with n copies of the given object, which must have a copy constructor.
So if your class allocates some memory you have to:
take care of this in your copy constructor
free it in the destructor
Reallocations happen when the size exceed the allocated memory (vector capacity). You can prevent this with a call to reserve(). However I think that assign() is smart enough to allocate all the memory it needs (if this is more than the already allocated) before starting to fill the vector and after having cleared it.
You may want to avoid reallocations because of their cost, but if you are trying to avoid them because your objects cannot handle properly, then I strongly discourage you to put them in a vector.
The semantics of assign are defined the standard in a quite straightforward way:
void assign(size_type n, const T& t);
Effects:
erase(begin(), end());
insert(begin(), n, t);
This means that first the destructors of the elements will be called. The copies of t are made in what is now a raw storage left after the lifetime of the elements ended.
The requirements are that value_type is MoveAssignable (for when erase doesn't erase to the end of the container and needs to move elements towards the beginning).
insert overload used here requires that value_type is CopyInsertable and CopyAssignable.
In any case, vector is oblivious to how your class is managing its own resources. That's on you to take care of. See The Rule of Three.
As in the case of vector::resize method,
std::vector::assign(size_type n, const value_type& val)
will initialize each element to a copy of "val". I prefer to use resize as it minimizes the number of object instanciations/destructions, but it does the same. Use resize if you want to minimize the data realloc, but keep in mind the following:
While doing this is safe for certain data structures, be aware that assigning /pushing elements of a class containing pointers to dynamically allocated data (e.g. with a new in the constructor) may cause havoc.
If your class dynamically allocates data then you should reimplement the YourClass::operator= to copy the data to the new object instead of copying the pointer.
Hope that it helps!

C++ delete vector, objects, free memory

I am totally confused with regards to deleting things in C++. If I declare an array of objects and if I use the clear() member function. Can I be sure that the memory was released?
For example :
tempObject obj1;
tempObject obj2;
vector<tempObject> tempVector;
tempVector.pushback(obj1);
tempVector.pushback(obj2);
Can I safely call clear to free up all the memory? Or do I need to iterate through to delete one by one?
tempVector.clear();
If this scenario is changed to a pointer of objects, will the answer be the same as above?
vector<tempObject> *tempVector;
//push objects....
tempVector->clear();
You can call clear, and that will destroy all the objects, but that will not free the memory. Looping through the individual elements will not help either (what action would you even propose to take on the objects?) What you can do is this:
vector<tempObject>().swap(tempVector);
That will create an empty vector with no memory allocated and swap it with tempVector, effectively deallocating the memory.
C++11 also has the function shrink_to_fit, which you could call after the call to clear(), and it would theoretically shrink the capacity to fit the size (which is now 0). This is however, a non-binding request, and your implementation is free to ignore it.
There are two separate things here:
object lifetime
storage duration
For example:
{
vector<MyObject> v;
// do some stuff, push some objects onto v
v.clear(); // 1
// maybe do some more stuff
} // 2
At 1, you clear v: this destroys all the objects it was storing. Each gets its destructor called, if your wrote one, and anything owned by that MyObject is now released.
However, vector v has the right to keep the raw storage around in case you want it later.
If you decide to push some more things into it between 1 and 2, this saves time as it can reuse the old memory.
At 2, the vector v goes out of scope: any objects you pushed into it since 1 will be destroyed (as if you'd explicitly called clear again), but now the underlying storage is also released (v won't be around to reuse it any more).
If I change the example so v becomes a pointer to a dynamically-allocated vector, you need to explicitly delete it, as the pointer going out of scope at 2 doesn't do that for you. It's better to use something like std::unique_ptr in that case, but if you don't and v is leaked, the storage it allocated will be leaked as well. As above, you need to make sure v is deleted, and calling clear isn't sufficient.
vector::clear() does not free memory allocated by the vector to store objects; it calls destructors for the objects it holds.
For example, if the vector uses an array as a backing store and currently contains 10 elements, then calling clear() will call the destructor of each object in the array, but the backing array will not be deallocated, so there is still sizeof(T) * 10 bytes allocated to the vector (at least). size() will be 0, but size() returns the number of elements in the vector, not necessarily the size of the backing store.
As for your second question, anything you allocate with new you must deallocate with delete. You typically do not maintain a pointer to a vector for this reason. There is rarely (if ever) a good reason to do this and you prevent the vector from being cleaned up when it leaves scope. However, calling clear() will still act the same way regardless of how it was allocated.
if I use the clear() member function. Can I be sure that the memory was released?
No, the clear() member function destroys every object contained in the vector, but it leaves the capacity of the vector unchanged. It affects the vector's size, but not the capacity.
If you want to change the capacity of a vector, you can use the clear-and-minimize idiom, i.e., create a (temporary) empty vector and then swap both vectors.
You can easily see how each approach affects capacity. Consider the following function template that calls the clear() member function on the passed vector:
template<typename T>
auto clear(std::vector<T>& vec) {
vec.clear();
return vec.capacity();
}
Now, consider the function template empty_swap() that swaps the passed vector with an empty one:
template<typename T>
auto empty_swap(std::vector<T>& vec) {
std::vector<T>().swap(vec);
return vec.capacity();
}
Both function templates return the capacity of the vector at the moment of returning, then:
std::vector<double> v(1000), u(1000);
std::cout << clear(v) << '\n';
std::cout << empty_swap(u) << '\n';
outputs:
1000
0
Move semantics allows for a straightforward way to release memory, by simply applying the assignment (=) operator from an empty rvalue:
std::vector<int> vec(100, 0);
std::cout << vec.capacity(); // 100
vec = vector<int>(); // Same as "vector<int>().swap(vec)";
std::cout << vec.capacity(); // 0
It is as much efficient as the "swap()"-based method described in other answers (indeed, both are conceptually doing the same thing). When it comes to readability, however, the assignment version makes a better job.
You can free memory used by vector by this way:
//Removes all elements in vector
v.clear()
//Frees the memory which is not used by the vector
v.shrink_to_fit();
If you need to use the vector over and over again and your current code declares it repeatedly within your loop or on every function call, it is likely that you will run out of memory. I suggest that you declare it outside, pass them as pointers in your functions and use:
my_arr.resize()
This way, you keep using the same memory sequence for your vectors instead of requesting for new sequences every time.
Hope this helped.
Note: resizing it to different sizes may add random values. Pass an integer such as 0 to initialise them, if required.

Array of contiguous new objects

I am currently filling an vector array of elements like so:
std::vector<T*> elemArray;
for (size_t i = 0; i < elemArray.size(); ++i)
{
elemArray = new T();
}
The code has obviously been simplified. Now after asking another question (unrelated to this problem but related to the program) I realized I need an array that has new'd objects (can't be on the stack, will overflow, too many elements) but are contiguous. That is, if I were to receive an element, without the array index, I should be able to find the array index by doing returnedElement - elemArray[0] to get the index of the element in the array.
I hope I have explained the problem, if not, please let me know which parts and I will attempt to clarify.
EDIT: I am not sure why the highest voted answer is not being looked into. I have tried this many times. If I try allocating a vector like that with more than 100,000 (approximately) elements, it always gives me a memory error. Secondly, I require pointers, as is clear from my example. Changing it suddenly to not be pointers will require a large amount of code re-write (although I am willing to do that, but it still does not address the issue that allocating vectors like that with a few million elements does not work.
A std::vector<> stores its elements in a heap allocated array, it won't store the elements on the stack. So you won't get any stack overflow even if you do it the simple way:
std::vector<T> elemArray;
for (size_t i = 0; i < elemCount; ++i) {
elemArray.push_back(T(i));
}
&elemArray[0] will be a pointer to a (continuous) array of T objects.
If you need the elements to be contiguous, not the pointers, you can just do:
std::vector<T> elemArray(numberOfElements);
The elements themselves won't be on the stack, vector manages the dynamic allocation of memory and as in your example the elements will be value-initialized. (Strictly, copy-initialized from a value-initialized temporary but this should work out the same for objects that it is valid to store in a vector.)
I believe that your index calculation should be: &returnedElement - &elemArray[0] and this will work with a vector. Provided that returnedElement is actually stored in elemArray.
Your original loop should look something like this: (though it doesn't create objects in contiguous memory).
for (size_t i = 0; i < someSize ; ++i)
{
elemArray.push_back(new T());
}
And then you should know two basic things here:
elemArray.size() returns the number of elements elemArray currently holds. That means, if you use it in the for loop's condition, then your for loop would become an infinite loop, because you're adding elements to it, and so the vector's size would keep on increasing.
elemArray is a vector of T*, so you can store only T*, and to populate it, you've to use push_back function.
Considering your old code caused a stack overflow I think it probably looked like this:
Item items[2000000]; // stack overflow
If that is the case, then you can use the following syntax:
std::vector<Item> items(2000000);
This will allocate (and construct) the items contiguously on the heap.
While I also have the same requirements but for a different reason, mainly for being cache hot...(for small number of objects)
int maxelements = 100000;
T *obj = new T [100000];
vector<T *> vectorofptr;
for (int i = 0; i < maxelements; i++)
{
vectorofptr.push_back(&obj[i]);
}
or
int sz = sizeof(T);
int maxelements = 100000;
void *base = calloc(maxelements, sz); //need to save base for free()
vector<T *> vectorofptr;
int offset = 0;
for (int i = 0; i < maxelements; i++)
{
vectorofptr.push_back((T *) base + offset);
offset += sz;
}
Allocate a large chunk of memory and use placement new to construct your vector elements in it:
elemArray[i] = new (GetNextContinuousAddress()) T();
(assuming that you really need pointer to indidually new'ed objects in your array and are aware of the reasons why this is not recommended ..)
I know this is an old post but it might be useful intel for any who might need it at some point. Storing pointers in a sequence container is perfectly fine only if you have in mind a few important caveats:
the objects your stored pointers point to won't be contiguously aligned in memory since they were allocated (dynamically or not) elsewhere in your program, you will end up with a contiguous array of pointers pointing to data likely scattered across memory.
since STL containers work with value copy and not reference copy, your container won't take ownership of allocated data and thus will only delete the pointer objects on destruction and not the objects they point to. This is fine when those objects were not dynamically allocated, otherwise you will need to provide a release mechanism elsewhere on the pointed objects like simply looping through your pointers and delete them individually or using shared pointers which will do the job for you when required ;-)
one last but very important thing to remember is that if your container is to be used in a dynamic environment with possible insertions and deletions at random places, you need to make sure to use a stable container so your iterators/pointers to stored elements remain valid after such operations otherwise you will end up with undesirable side effects... This is the case with std::vector which will release and reallocate extra space when inserting new elements and then perform a copy of the shifted elements (unless you insert at the end), thus invalidating element pointers/iterators (some implementations provide stability though, like boost::stable_vector, but the penalty here is that you lose the contiguous property of the container, magic does not exist when programming, life is unfair right? ;-))
Regards

Resizing Arrays - Difference between two execution blocks?

I have a function which grows an array when trying to add an element if it is full. Which of the execution blocks is better or faster?
I think my second block (commented out) may be wrong, because after doubling my array I then go back and point to the original.
When creating arrays does the compiler look for a contiguous block in memory which it entirely fits into? (On the stack/heap? I don't fully understand which, though it is important for me to learn it is irrelevant to the actual question.)
If so, would this mean using the second block could potentially overwrite other information by overwriting adjacent memory? (Since the original would use 20 adjacent blocks of memory, and the latter 40.)
Or would it just mean the location of elements in my array would be split, causing poor performance?
void Grow()
{
length *= 2; // double the size of our stack
// create temp pointer to this double sized array
int* tempStack = new int[length];
// loop the same number of times as original size
for(int i = 0; i < (length / 2); i++)
{
// copy the elements from the original array to the temp one
tempStack[i] = myStack[i];
}
delete[] myStack; //delete the original pointer and free the memory
myStack = tempStack; //make the original point to the new stack
//Could do the following - but may not get contiguous memory block, causing
// overwritten >data
#if 0
int* tempStack = myStack; //create temp pointer to our current stack
delete[] myStack; //delete the original pointer and free memory
myStack = new int[length *= 2]; //delete not required due to new?
myStack = tempStack;
#endif
}
The second block wouldn't accomplish what you want at all.
When you do
myStack = new int[length *= 2];
then the system will return a pointer to wherever it happens to allocate the new, larger array.
You then reassign myStack to the old location (which you've already de-allocated!), which means you're pointing at memory that's not allocated (bad!) and you've lost the pointer to the new memory you just allocated (also bad!).
Edit: To clarify, your array will be allocated on the heap. Additionally, the (new) pointer returned by your larger array allocation (new int[foo]) will be a contiguous block of memory, like the old one, just probably in a different location. Unless you go out of bounds, don't worry about "overwriting" memory.
Your second block is incorrect because of this sequence:
int* tempStack = myStack; //create temp pointer to our current stack
delete[] myStack; //delete the original pointer and free memory
tempStack and myStack and both simply pointers to the same block of memory. When you delete[] the pointer in the second line, you no longer have access to that memory via either pointer.
Using C++ memory management, if you want to grow an array, you need to create a new array before you delete the old one and copy over the values yourself.
That said, since you are working with POD, you could use C style memory management which supports directly growing an array via realloc. That can be a bit more efficient if the memory manager realizes it can grow the buffer without moving it (although if it can't grow the buffer in place, it will fall back on the way you grow your array in your first block).
C style memory management is only okay for arrays of POD. For non-POD, you must do the create new array/copy/delete old array technique.
This doesn't exactly answer your question, but you shouldn't be doing either one. Generally new[] or delete[] should be avoided in favor of using std::vector. new[] is hard to use because it requires explicit memory management (if an exception is thrown as the elements are being copied, you will need to catch the exception and delete the array to avoid a memory leak). std::vector takes care of this for you, automatically grows itself, and is likely to have an efficient implementation tuned by the vendor.
One argument for using explicit arrays is to have a contiguous block of memory that can be passed to C functions, but that also can be done with std::vector for any non-pathological implementation (and the next version of the C++ standard will require all conforming implementations to support that). (For reference, see http://www.gotw.ca/publications/mill10.htm by Herb Sutter, former convener of the ISO C++ standards committee.)
Another argument against std::vector is the weirdness with std::vector<bool>, but if you need that you can simply use std::vector<char> or std::vector<int> instead. (See: http://www.gotw.ca/publications/mill09.htm)