std::vector internals - c++

How is std::vector implemented, using what data structure? When I write
void f(int n) {
std::vector<int> v(n);
...
}
Is the vector v allocated on stack?

The vector object will be allocated on the stack and will internally contain a pointer to beginning of the elements on the heap.
Elements on the heap give the vector class an ability to grow and shrink on demand.
While having the vector on the stack gives the object the benefit of being destructed when going out of scope.
In regards to your [] question, the vector class overloads the [] operator. I would say internally it's basically doing something like this when you do array[1]:
return *(_Myfirst+ (n * elementSize))
Where the vector keeps track of the start of it's internal heap with _Myfirst.
When your vector starts to fill up, it will allocate more memory for you. A common practice is to double the amount of memory needed each time.
Note that vector inherits from _Vector_val, which contains the following members:
pointer _Myfirst; // pointer to beginning of array
pointer _Mylast; // pointer to current end of sequence
pointer _Myend; // pointer to end of array
_Alty _Alval; // allocator object for values

Your v is allocated in automatic memory. (commonly known as the stack, yes)
The implementation details are not specified, but most commonly it's implemented using a dynamic array which is resized if you attempt to add more elements than the previous allocation can hold.
The standard only specifies the interface (which methods it should have) and execution time boundaries.
Since vector is a template, the implementation is visible, so locate your <vector> file and start inspecting.

void f(int n) {
std::vector<int> v(n);
...
}
The vector above has automatic storage duration and is allocated on the stack. However, the data array that the vector manages is allocated on the heap.
The internals of the vector are implementation specific, but a typical implementation will contain 3 pointers; one each to keep track of start of array, size of vector and capacity of vector.

Related

Is std::vector preferred over heap arrays for dynamically sized arrays in classes?

If you want to have an array as a member variable of a class there are two main options:
A: Allocate the memory on the heap
class X
{
int * arr;
public:
UnionFind(int numNodes)
{
arr = new int[numNodes];
}
}
B: Use a vector
class X
{
vector <int> arr;
public:
UnionFind(int numNodes)
{
arr.resize(numNodes);
}
}
Which of these is the preferred method? I know one drawback of heap allocated arrays is that you need to take care of deleting the memory yourself.
As a small side question, if an object of X is created on the heap is vector <int> arr also in the heap within the object? If so, how come vector <int> arr does not manually need to be deleted?
When you have the choice between a dynamically allocated C-style array and a std::vector<>, choose the vector.
It is safe, does all the alloc/realloc/resizing for you
It makes you code more flexible, readable, and easier to maintain
It is extremely efficient in most use cases
It provides explicit iterators, and plenty of member functions, including size()
Many implementations will do index checking in debug mode to catch out-of-bounds errors
Note that std::array exists for most of the cases where a C-array would be preferable (e.g., when allocation on the stack is preferred)
You should prefer vector:
the vector and vector's elements' destructors are guaranteed to run at the appropriate times
things like .push_back are massively easier and more concise to use correctly than coding your own checks on "capacity" and resizing/copy-constructing/moving in a C++ object-aware fashion
it's easier to apply algorithms to Standard containers, use the iterators etc.
it will be easier to move from vector to another Standard container (list, stack, map, unordered_map, deque etc) if evolving application/library needs suggest it
vector has some housekeeping information that's useful: size(), capacity()
before C++11 there was a single performance issue compared to using new[] and delete[] - you couldn't do an up-front "sizing" of the vector to hold however-many elements without copy-constructing their values from a prototypical element (constructor "2" here, and resize here) - that meant the constructor or resize had to iterate over every element doing copy construction even if the default constructor was a no-op (e.g. deliberately leaving all members uninitialised)
this is very rarely relevant or problematic, and indeed the C++ behaviour was generally safer
because it's a proper class, you can (whether you should is another matter) overload operator<<, operator>> for conveniently streaming arbitrary vectors
if an object of X is created on the heap is vector <int> arr also in the heap within the object? If so, how come vector <int> arr does not manually need to be deleted?
Yes, the vector object itself will be embedded within X, so will be on the heap too (similarly, it could be embedded in an automatic/stack variable, a global/namespace/static variable, a thread-specific variable etc.). That said, the vector object contains a pointer which tracks any further memory needed for elements in the vector, and that memory is by default dynamically allocated (i.e. on the heap) regardless of where the vector object itself is held.
Member variables with destructors (such as any vector) have them called automatically by the containing class's destructor.

Vector of vectors, heap versus stack (C++)

I wanted to initialize a vector of vectors that contain pointers to Courses. I declared this:
std::vector<std::vector<Course*> > *CSPlan =
new std::vector<std::vector<Course*> >(smsNum);
What I wanted to do by this is to have a vector of vectors, each inside vector is a vector that contains pointers to Courses, and I wanted the MAIN vector to be of size int smsNum. Furthermore, I wanted it on the heap.
My questions are:
Are both the main vector AND the inside vectors allocated on the heap? or is it only the MAIN vector is on the heap and its' indexes are pointers to other smaller vectors on the stack?
I declared it to be of size int smsNum so the Main vector is of size 10, but what about the smaller vectors? are they also of that size or are they still dynamic?
My goal in the end is to have a vector of vectors, both the Main vector and the child vectors on the heap, and ONLY the Main vector is of size smsNum, while the rest are dynamic.
Any structure that can grow as large as the user wants it to is going to be allocated on the heap. The memory stack, on the other hand, is used to allocate statically allocated variables, that the program has control of the size statically, during the compilation process.
Since you can have a loop like this:
for (i = 0; i < your_value; i++) {
vector.insert(...);
}
And considering your_value as an integer read from the standard input, the compiler have no control on how large will be your vector, i.e., it does not know what is the maximum amount of inserts you may perform.
To solve this, the structure must be allocated on the heap, where it may grow as large as the OS allows it to -- considering primary memory, and swap. As a complement, if you use the pointer to the vector, you'll be simply dynamically allocating a variable to reference the vector. This changes NOT the fact that the contents of the vector is, necessarily, being allocated on the heap.
You'll have, in your stack:
a variable "x" that stores the address of a variable "y";
And in your heap:
the value of the variable "y", that is a reference to your vector of vectors;
the contents of your vector of vectors (accessed by "y", that is accessed by "x").

C++ delete vector, objects, free memory

I am totally confused with regards to deleting things in C++. If I declare an array of objects and if I use the clear() member function. Can I be sure that the memory was released?
For example :
tempObject obj1;
tempObject obj2;
vector<tempObject> tempVector;
tempVector.pushback(obj1);
tempVector.pushback(obj2);
Can I safely call clear to free up all the memory? Or do I need to iterate through to delete one by one?
tempVector.clear();
If this scenario is changed to a pointer of objects, will the answer be the same as above?
vector<tempObject> *tempVector;
//push objects....
tempVector->clear();
You can call clear, and that will destroy all the objects, but that will not free the memory. Looping through the individual elements will not help either (what action would you even propose to take on the objects?) What you can do is this:
vector<tempObject>().swap(tempVector);
That will create an empty vector with no memory allocated and swap it with tempVector, effectively deallocating the memory.
C++11 also has the function shrink_to_fit, which you could call after the call to clear(), and it would theoretically shrink the capacity to fit the size (which is now 0). This is however, a non-binding request, and your implementation is free to ignore it.
There are two separate things here:
object lifetime
storage duration
For example:
{
vector<MyObject> v;
// do some stuff, push some objects onto v
v.clear(); // 1
// maybe do some more stuff
} // 2
At 1, you clear v: this destroys all the objects it was storing. Each gets its destructor called, if your wrote one, and anything owned by that MyObject is now released.
However, vector v has the right to keep the raw storage around in case you want it later.
If you decide to push some more things into it between 1 and 2, this saves time as it can reuse the old memory.
At 2, the vector v goes out of scope: any objects you pushed into it since 1 will be destroyed (as if you'd explicitly called clear again), but now the underlying storage is also released (v won't be around to reuse it any more).
If I change the example so v becomes a pointer to a dynamically-allocated vector, you need to explicitly delete it, as the pointer going out of scope at 2 doesn't do that for you. It's better to use something like std::unique_ptr in that case, but if you don't and v is leaked, the storage it allocated will be leaked as well. As above, you need to make sure v is deleted, and calling clear isn't sufficient.
vector::clear() does not free memory allocated by the vector to store objects; it calls destructors for the objects it holds.
For example, if the vector uses an array as a backing store and currently contains 10 elements, then calling clear() will call the destructor of each object in the array, but the backing array will not be deallocated, so there is still sizeof(T) * 10 bytes allocated to the vector (at least). size() will be 0, but size() returns the number of elements in the vector, not necessarily the size of the backing store.
As for your second question, anything you allocate with new you must deallocate with delete. You typically do not maintain a pointer to a vector for this reason. There is rarely (if ever) a good reason to do this and you prevent the vector from being cleaned up when it leaves scope. However, calling clear() will still act the same way regardless of how it was allocated.
if I use the clear() member function. Can I be sure that the memory was released?
No, the clear() member function destroys every object contained in the vector, but it leaves the capacity of the vector unchanged. It affects the vector's size, but not the capacity.
If you want to change the capacity of a vector, you can use the clear-and-minimize idiom, i.e., create a (temporary) empty vector and then swap both vectors.
You can easily see how each approach affects capacity. Consider the following function template that calls the clear() member function on the passed vector:
template<typename T>
auto clear(std::vector<T>& vec) {
vec.clear();
return vec.capacity();
}
Now, consider the function template empty_swap() that swaps the passed vector with an empty one:
template<typename T>
auto empty_swap(std::vector<T>& vec) {
std::vector<T>().swap(vec);
return vec.capacity();
}
Both function templates return the capacity of the vector at the moment of returning, then:
std::vector<double> v(1000), u(1000);
std::cout << clear(v) << '\n';
std::cout << empty_swap(u) << '\n';
outputs:
1000
0
Move semantics allows for a straightforward way to release memory, by simply applying the assignment (=) operator from an empty rvalue:
std::vector<int> vec(100, 0);
std::cout << vec.capacity(); // 100
vec = vector<int>(); // Same as "vector<int>().swap(vec)";
std::cout << vec.capacity(); // 0
It is as much efficient as the "swap()"-based method described in other answers (indeed, both are conceptually doing the same thing). When it comes to readability, however, the assignment version makes a better job.
You can free memory used by vector by this way:
//Removes all elements in vector
v.clear()
//Frees the memory which is not used by the vector
v.shrink_to_fit();
If you need to use the vector over and over again and your current code declares it repeatedly within your loop or on every function call, it is likely that you will run out of memory. I suggest that you declare it outside, pass them as pointers in your functions and use:
my_arr.resize()
This way, you keep using the same memory sequence for your vectors instead of requesting for new sequences every time.
Hope this helped.
Note: resizing it to different sizes may add random values. Pass an integer such as 0 to initialise them, if required.

Does a vector have to store the size twice?

This is a rather academic question, I realise it matters little regarding optimization, but it is just out of interest.
From what I understand, when you call new[size], additional space is allocated to store the size of the allocated array. This is so when delete [] is called, it is known how much space can be freed.
What I've done is written how I think a vector would roughly be implemented:
#include <cstddef>
template <class T>
class Vector
{
public:
struct VectorStorage
{
std::size_t size;
T data[];
};
Vector(std::size_t size) : storage(new VectorStorage[size])
{
storage->size = size;
}
std::size_t size()
{
return storage->size;
}
~Vector()
{
delete[] storage;
}
private:
VectorStorage* storage;
};
As far as I can tell, size is stored twice. Once in the VectorStorage object directly (as it needs to be so the size() function can work) but again in a hidden way by the compiler, so delete[] can work.
It seems like size is stored twice. Is this the case and unavoidable, or is there a way to ensure that the size is only stored once?
std::vector does not allocate memory; std::allocator, or whatever allocator you give the vector, is what allocates the memory. The allocator interface is given the number of items to allocate/deallocate, so it is not required to actually store that.
Vectors don't usually store the size. The usual implementation keeps a pointer past the end of the last element and one past the end of the allocated memory, since one can reserve space without actually storing elements in it. However, there is no standard way to access the size stored by new (which may not be stored to begin with), so some duplication is usually needed.
Yes. But that's an implementation detail of the heap allocator that the compiler knows nothing about. It is almost certainly not the same as the capacity of the vector since the vector is only interested in the number of elements, not the number of bytes. And a heap block tends to have extra overhead for its own purposes.
You have size in the wrong place -- you're storing it with every element (by virtue of an array of VectorStorage), which also contains an array (per instance of VectorStorage) of T. Do you really mean to have an array inside an array like that? You're never cleaning up your array of T in your destructor either.

What's the difference between these two classes?

Below, I'm not declaring my_ints as a pointer. I don't know where the memory will be allocated. Please educate me here!
#include <iostream>
#include <vector>
class FieldStorage
{
private:
std::vector<int> my_ints;
public:
FieldStorage()
{
my_ints.push_back(1);
my_ints.push_back(2);
}
void displayAll()
{
for (int i = 0; i < my_ints.size(); i++)
{
std::cout << my_ints[i] << std::endl;
}
}
};
And in here, I'm declaring the field my_ints as a pointer:
#include <iostream>
#include <vector>
class FieldStorage
{
private:
std::vector<int> *my_ints;
public:
FieldStorage()
{
my_ints = new std::vector<int>();
my_ints->push_back(1);
my_ints->push_back(2);
}
void displayAll()
{
for (int i = 0; i < my_ints->size(); i++)
{
std::cout << (*my_ints)[i] << std::endl;
}
}
~FieldStorage()
{
delete my_ints;
}
};
main() function to test:
int main()
{
FieldStorage obj;
obj.displayAll();
return 0;
}
Both of them produces the same result. What's the difference?
In terms of memory management, these two classes are virtually identical. Several other responders have suggested that there is a difference between the two in that one is allocating storage on the stack and other on the heap, but that's not necessarily true, and even in the cases where it is true, it's terribly misleading. In reality, all that's different is where the metadata for the vector is allocated; the actual underlying storage in the vector is allocated from the heap regardless.
It's a little bit tricky to see this because you're using std::vector, so the specific implementation details are hidden. But basically, std::vector is implemented like this:
template <class T>
class vector {
public:
vector() : mCapacity(0), mSize(0), mData(0) { }
~vector() { if (mData) delete[] mData; }
...
protected:
int mCapacity;
int mSize;
T *mData;
};
As you can see, the vector class itself only has a few members -- capacity, size and a pointer to a dynamically allocated block of memory that will store the actual contents of the vector.
In your example, the only difference is where the storage for those few fields comes from. In the first example, the storage is allocated from whatever storage you use for your containing class -- if it is heap allocated, so too will be those few bits of the vector. If your container is stack allocated, so too will be those few bits of the vector.
In the second example, those bits of the vector are always heap allocated.
In both examples, the actual meat of the vector -- the contents of it -- are allocated from the heap, and you cannot change that.
Everybody else has pointed out already that you have a memory leak in your second example, and that is also true. Make sure to delete the vector in the destructor of your container class.
You have to release ( to prevent memory leak ) memory allocated for vector in the second case in the FieldStorage destructor.
FieldStorage::~FieldStorage()
{
delete my_ints;
}
As Mykola Golubyev pointed out, you need to delete the vector in the second case.
The first will possibly build faster code, as the optimiser knows the full size of FieldStorage including the vector, and could allocate enough memory in one allocation for both.
Your second implementation requires two separate allocations to construct the object.
I think you are really looking for the difference between the Stack and the Heap.
The first one is allocated on the stack while the second one is allocated on the heap.
In the first example, the object is allocated on the stack.
The the second example, the object is allocated in the heap, and a pointer to that memory is stored on the stack.
the difference is that the second allocates the vector dynamically. there are few differences:
you have to release memory occupied by the vector object (the vector object itself, not the object kept in the vector because it is handled correctly by the vector). you should use some smart pointer to keep the vector or make (for example in the destructor):
delete my_ints;
the first one is probably more efficient because the object is allocated on the stack.
access to the vector's method have different syntax :)
The first version of FieldStorage contains a vector. The size of the FieldStorage class includes enough bytes to hold a vector. When you construct a FieldStorage, the vector is constructed right before the body of FieldStorage's constructor is executed. When a FieldStorage is destructed, so is the vector.
This does not necessarily allocate the vector on the stack; if you heap-allocate a FieldStorage object, then the space for the vector comes from that heap allocation, not the stack. If you define a global FieldStorage object, then the space for the vector comes from neither the stack nor the heap, but rather from space designated for global objects (e.g. the .data or .bss section on some platforms).
Note that the vector performs heap allocations to hold the actual data, so it is likely to only contain a few pointers, or a pointer and couple of lengths, but it may contain whatever your compiler's STL implementation needs it to.
The second version of FieldStorage contains a pointer to a vector. The size of the FieldStorage class includes room for a pointer to a vector, not an actual vector. You are allocating storage for the vector using new in the body of FieldStorage's constructor, and you leaking that storage when FieldStorage is destructed, because you didn't define a destructor that deletes the vector.
First way is the prefereable one to use, you don't need pointer on vector and have forgot to delete it.
In the first case, the std::vector is being put directly into your class (and it is handling any memory allocation and deallocation it needs to grow and shrink the vector and free the memory when your object is destroyed; it is abstracting the memory management from you so that you don't have to deal with it). In the second case, you are explicitly allocating the storage for the std::vector in the heap somewhere, and forgetting to ever delete it, so your program will leak memory.
The size of the object is going to be different. In the second case the Vector<>* only takes up the size of a pointer (4bytes on 32bit machines). In the first case, your object will be larger.
One practical difference is that in your second example, you never release either the memory for the vector, or its contents (because you don't delete it, and therefore call its destructor).
But the first example will automatically destroy the vector (and free its contents) upon destruction of your FieldStorage object.