I'm trying to understand the number of bytes occupied by an instance of std::vector. The following code:
vector<int> uno;
uno.push_back(1);
uno.push_back(1);
cout << "1: " << sizeof(uno) << " bytes" << endl;
cout << endl;

vector<bool> unos;
unos.push_back(true);
cout << "2: " << sizeof(unos) << " bytes" << endl;
cout << endl;
gives me this output:
1: 12 bytes
2: 20 bytes
Can someone explain to me why the size of the vector<bool> is more bytes than the vector<int>? And what is the correct way to measure the size in bytes of a vector instance: by adding up the size of every element in the vector?
In C++, the sizeof operator is always evaluated at compile time, so every vector<T> will have the same size (unless vector<T> is a specialization, like vector<bool>).
In case you are wondering about the 12 bytes: that is probably the size of three pointers, or alternatively of a pointer plus an element count and a capacity. The actual data is never stored inside the vector object.
If you want to know how much memory is used total, a good approximation is:
sizeof(std::vector<T>) + my_vector.capacity() * sizeof(T)
This is only an approximation because it does not take into account book-keeping data on the heap.
std::vector<bool> is larger since it is more complicated than std::vector<int>. The current C++ standard specifies that the vector<bool> data should be bit-packed. This requires some additional bookkeeping, and gives them a slightly different behavior than the normal std::vector<T>.
This different behavior of std::vector<bool> was widely expected to be removed in the upcoming C++0x standard, but the specialization was ultimately kept.
I am trying to allocate a vector<bool> in C++ with 50,000,000,000 entries; however, the program errors out with
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
(or in the online compiler it just ends).
I initially thought this was due to too large a size; however, v1.max_size() is greater than 50,000,000,000 for me. What's confusing, though, is that when I reduce the number of entries it works fine.
Question: What could be the root cause of this, considering that the number of entries is less than the max_size() of the vector?
Other questions/answers have suggested that similar issues are due to being on a 32-bit CPU; however, I have a 64-bit one.
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    long size = 50000000000;
    std::vector<bool> v1;
    std::cout << "max_size: " << bool(v1.max_size() > 50000000000) << " vs " << size << "\n";
    v1 = std::vector<bool>(size, false);
    cout << "vector initialised\n" << endl;
    cout << v1.size() << endl;
}
Note: I am essentially trying to create a memory-efficient bitmap to track whether certain addresses in a different data structure have been initialized. I can't use a std::bitset as mentioned in this post, since the size isn't known at compile time.
From std::vector::max_size:
This value typically reflects the theoretical limit on the size of the
container, at most std::numeric_limits<difference_type>::max(). At
runtime, the size of the container may be limited to a value smaller
than max_size() by the amount of RAM available.
This means that std::vector::max_size is not a good indication of the actual maximum size you can allocate, due to hardware limitations.
In practice the actual maximum size will [almost] always be smaller, depending on the RAM available at runtime.
On current 64-bit systems this will always be the case (at least with currently available hardware), because the theoretical size of a 64-bit address space is a lot bigger than available RAM sizes.
On my MS VS 2015 compiler, sizeof(int) is 4 (bytes), but sizeof(vector<int>) is 16. As far as I know, a vector is like an empty box when it's not initialized yet, so why is it 16? And why 16 and not another number?
Furthermore, if we have vector<int> v(25); and then fill it with int values, the sizeof of v is still 16 although it holds 25 ints! The size of each int is 4, so sizeof(v) should seemingly be 25*4 bytes, but in effect it is still 16! Why?
The size of each int is 4, so sizeof(v) should seemingly be 25*4 bytes, but in effect it is still 16! Why?
You're confusing sizeof(std::vector) with std::vector::size(): the former returns the size of the vector object itself, not including the size of the elements it holds, while the latter returns the number of elements. You can get the total size of the elements with std::vector::size() * sizeof(int).
so why is it 16? And why 16 and not another number?
What sizeof(std::vector) is depends on the implementation; it is mostly implemented with three pointers. In some cases (such as debug builds) the size may increase for debugging convenience.
std::vector is typically a structure containing two members: a pointer to its elements (the array) and the size of the array (the number of elements).
As the size member occupies sizeof(size_t) bytes and the pointer sizeof(void *), and the two are equal on a 64-bit system, the size of the structure is 2 * sizeof(void *), which is 16.
The number of elements has nothing to do with sizeof, as the elements are allocated on the heap.
EDIT: As M.M mentioned, the implementation could be different, e.g. three pointers: start, end, and allocatedSize. In a 32-bit environment that would be 3 * sizeof(size_t) + sizeof(void *) = 16, which might be the case here. An implementation could even work with start hardcoded to 0 and allocatedSize computed by masking end, so it really depends on the implementation. But the point remains the same.
sizeof is evaluated at compile time, so it only counts the size of the variables declared in the class, which probably includes a couple of counters and a pointer. It's what the pointer points to that varies with the size, but the compiler doesn't know about that.
The size can be explained by three pointers: 1) the beginning of the vector, 2) the end of the vector, and 3) the end of the vector's capacity. So it is implementation-dependent and will differ between implementations.
You seem to be mixing up "array" and "vector". If you have a local array, sizeof will indeed provide the size of the array. However, a vector is not an array. It is a class, a container from the STL, guaranteeing that its contents are located within a single contiguous block of memory (which may get relocated if the vector grows).
Now, if you take a look at the std::vector implementation, you'll notice it contains fields (at least in MSVC 14.0):
size_type _Count = 0;
typename _Alloc_types::_Alty _Alval; // allocator object (from base)
_Mylast
_Myfirst
That could sum up to 16 bytes under your implementation (note: experience may vary).
I want to allocate a vector of size 1765880295, so I used resize(1765880295), but the program stops running. The adjacent symptom is that Code::Blocks stops responding.
What is wrong?
Although max_size() gives 4294967295, which is greater than 1765880295, the problem is still the same even without resizing the vector.
Depending on what is stored in the vector -- for example, a 32-bit pointer or uint32, the size of the vector (number of elements * size of each element) will exceed the maximum addressable space on a 32-bit system.
The max_size is dependent on the implementation (some may have 1073741823 as their max_size). But even if your implementation supports a bigger number, the program will fail if there is not enough memory.
For example: if you have a vector<int> and sizeof(int) == 4 bytes, we do the math, and...
1765880295 * 4 bytes = 7063521180 bytes ≈ 6.578 GiB
So you would require around 6.6 GiB of free memory to allocate that enormous vector.
std::string::reserve() doesn't allocate the exact amount of space I pass as argument. For example, if I try to reserve space for 100 characters, it reserves space for 111; if I pass 200, it reserves 207; 650 gives 655, and 1000 gives 1007.
What is the reason behind this?
Program code:
std::string mystr;
std::cout << "After creation :" << mystr.capacity() << std::endl;
mystr.reserve(1000);
std::cout << "After reserve() :" << mystr.capacity() << std::endl;
mystr = "asd";
std::cout << "After assignment :" << mystr.capacity() << std::endl;
mystr.clear();
std::cout << "After clear() :" << mystr.capacity() << std::endl;
Code output:
After creation :15
After reserve() :1007
After assignment :1007
After clear() :1007
(IDE: Visual Studio 2012)
The standard allows it
The C++ standard allows the implementation to reserve more memory than requested. In the standard (N3690, §21.4.4) it states
void reserve(size_type res_arg=0);
The member function reserve() is a directive that informs a basic_string object of a planned change in size, so that it can manage the storage allocation accordingly.
Effects: After reserve(), capacity() is greater or equal to the argument of reserve. [ Note: Calling reserve() with a res_arg argument less than capacity() is in effect a non-binding shrink request. A call with res_arg <= size() is in effect a non-binding shrink-to-fit request. — end note ]
The reason: Alignment on 16-byte boundaries
It seems that the reserved size is always one less than a multiple of 16, with the remaining character used for null termination. Memory allocated on the heap is automatically 16-byte aligned on a 64-bit x86 machine, so there is no cost in rounding the allocation up to the next multiple of 16.
The Microsoft documentation for malloc() states that:
The storage space pointed to by the return value is guaranteed to be suitably aligned for storage of any type of object.
Objects of SIMD type must be 16-byte aligned to work best. These are packed types of 4 floats or 2 doubles (or others) that fit into the 128-bit registers of an x86 machine. If the data is not properly aligned, loading and storing to these memory locations can lead to a great loss of performance or even crashes. That's why malloc() does this, hence the conclusion of 16-byte alignment. Most memory allocations (including operator new) ultimately call malloc(). Not allocating a multiple of 16 bytes would just waste memory that would otherwise be unused anyway.
The Standard doesn't require it to reserve exactly what you specify, only at least what you specify:
21.4.4 basic_string capacity [string.capacity]
12/Effects: After reserve(), capacity() is greater or equal to the
argument of reserve. [ Note: Calling reserve() with a res_arg argument
less than capacity() is in effect a non-binding shrink request. A call
with res_arg <= size() is in effect a non-binding shrink-to-fit
request. —end note ]
I would have to look at the source to be 100% certain, but it looks like the underlying code reserves the amount you requested and pads it to the next 16-byte boundary (leaving 1 byte for null termination).
This is just a theory based on the behavior.
Let's say we have:
std::array<int, 5> STDarr;
std::vector<int> VEC(5);
int RAWarr[5];
I tried to get their sizes as:
std::cout << sizeof(STDarr) + sizeof(int) * STDarr.max_size() << std::endl;
std::cout << sizeof(VEC) + sizeof(int) * VEC.capacity() << std::endl;
std::cout << sizeof(RAWarr) << std::endl;
The outputs are,
40
20
40
Are these calculations correct? Considering that I don't have enough memory for std::vector and no way of escaping dynamic allocation, what should I use? If I knew that std::array results in a lower memory requirement, I could change the program to make the array static.
These numbers are wrong. Moreover, I don't think they represent what you think they represent, either. Let me explain.
First the part about them being wrong. You, unfortunately, don't show the value of sizeof(int) so we must derive it. On the system you are using the size of an int can be computed as
size_t sizeof_int = sizeof(RAWarr) / 5; // => sizeof(int) == 8
because this is essentially the definition of sizeof(T): it is the number of bytes between the starts of two adjacent objects of type T in an array. This happens to be inconsistent with the number printed for STDarr: the class template std::array<T, n> is specified to have an array of n objects of type T embedded in it. Moreover, std::array<T, n>::max_size() is a constant expression yielding n. That is, we have:
40 // is identical to
sizeof(STDarr) + sizeof(int) * STDarr.max_size() // is bigger or equal to
sizeof(RAWarr) + sizeof_int * 5 // is identical to
40 + 40 // is identical to
80
That is, 40 >= 80: a contradiction.
Similarly, the second computation is also inconsistent with the third: the std::vector<int> holds at least 5 elements, the capacity() has to be at least as large as the size(), and the std::vector<int> object itself has a non-zero size. That is, the following always has to be true:
sizeof(RAWarr) < sizeof(VEC) + sizeof(int) * VEC.capacity()
Anyway, all of this is pretty much irrelevant to what your actual question seems to be: what is the overhead of representing n objects of type T using a built-in array of T, a std::array<T, n>, and a std::vector<T>? The answer to this question is:
A built-in array T[n] uses sizeof(T) * n.
An std::array<T, n> uses the same size as a T[n].
A std::vector<T>(n) needs some control data (the size, the capacity, and possibly an allocator) plus at least n * sizeof(T) bytes to represent its actual data. It may also choose a capacity() that is bigger than n.
In addition to these numbers, actually using any of these data structures may require additional memory:
All objects are aligned at an appropriate address, so there may be additional bytes in front of the object.
When the object is allocated on the heap, the memory management system may include a couple of bytes in addition to the memory made available. This may be just a word holding the size, but it may be whatever the allocation mechanism fancies. This bookkeeping memory may also live somewhere other than the allocated block, e.g. in a hash table.
OK, I hope this provided some insight. However, here comes the important message: if std::vector<T> isn't capable of holding the amount of data you have, there are two possible situations:
You have extremely low memory and most of this discussion is futile because you need entirely different approaches to cope with the few bytes you have. This would be the case if you are working on extremely resource constrained embedded systems.
You have too much data, and using T[n] or std::array<T, n> won't be of much help, because the overhead we are talking about is typically less than 32 bytes.
Maybe you can describe what you are actually trying to do and why std::vector<T> is not an option.