Size of std::array, std::vector and raw array - c++

Let's say we have:
std::array <int,5> STDarr;
std::vector <int> VEC(5);
int RAWarr[5];
I tried to get size of them as,
std::cout << sizeof(STDarr) + sizeof(int) * STDarr.max_size() << std::endl;
std::cout << sizeof(VEC) + sizeof(int) * VEC.capacity() << std::endl;
std::cout << sizeof(RAWarr) << std::endl;
The outputs are,
40
20
40
Are these calculations correct? Considering I don't have enough memory for std::vector and no way of escaping dynamic allocation, what should I use? If I knew that std::array results in a lower memory requirement, I could change the program to make the array static.

These numbers are wrong. Moreover, I don't think they represent what you think they represent, either. Let me explain.
First the part about them being wrong. You, unfortunately, don't show the value of sizeof(int) so we must derive it. On the system you are using the size of an int can be computed as
size_t sizeof_int = sizeof(RAWarr) / 5; // => sizeof(int) == 8
because this is essentially the definition of sizeof(T): it is the number of bytes between the start of two adjacent objects of type T in an array. This happens to be inconsistent with the number printed for STDarr: the class template std::array<T, n> is specified to have an array of n objects of type T embedded into it. Moreover, std::array<T, n>::max_size() is a constant expression yielding n. That is, we have:
40 // is identical to
sizeof(STDarr) + sizeof(int) * STDarr.max_size() // is bigger or equal to
sizeof(RAWarr) + sizeof_int * 5 // is identical to
40 + 40 // is identical to
80
That is, 40 >= 80, a contradiction.
Similarly, the second computation is also inconsistent with the third computation: the std::vector<int> holds at least 5 elements and its capacity() has to be at least as big as its size(). Moreover, sizeof(std::vector<int>) itself is non-zero. That is, the following always has to be true:
sizeof(RAWarr) < sizeof(VEC) + sizeof(int) * VEC.capacity()
Anyway, all this is pretty much irrelevant to what your actual question seems to be: What is the overhead of representing n objects of type T using a built-in array of T, an std::array<T, n>, and an std::vector<T>? The answer to this question is:
A built-in array T[n] uses sizeof(T) * n.
An std::array<T, n> uses the same size as a T[n].
A std::vector<T>(n) needs some control data (the size, the capacity, and possibly an allocator) plus at least n * sizeof(T) bytes to represent its actual data. It may also choose to have a capacity() which is bigger than n.
In addition to these numbers, actually using any of these data structures may require additional memory:
All objects are aligned at an appropriate address. For this there may be additional bytes in front of the object.
When the object is allocated on the heap, the memory management system may include a couple of bytes in addition to the memory made available. This may be just a word with the size, but it may be whatever the allocation mechanism fancies. Also, this memory may live somewhere else than the allocated memory, e.g. in a hash table somewhere.
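For reference, a minimal sketch that prints the numbers discussed above; the exact values are implementation dependent and heap bookkeeping is not included:
#include <array>
#include <iostream>
#include <vector>

int main() {
    std::array<int, 5> std_arr{};
    std::vector<int>   vec(5);
    int                raw_arr[5]{};

    // sizeof(raw_arr) and sizeof(std_arr) already count the five ints;
    // sizeof(vec) only counts the control block (typically three pointers),
    // so the element storage has to be added separately.
    std::cout << "raw array : " << sizeof(raw_arr) << " bytes\n";
    std::cout << "std::array: " << sizeof(std_arr) << " bytes\n";
    std::cout << "std::vector, control block only  : " << sizeof(vec) << " bytes\n";
    std::cout << "std::vector, control block + data: "
              << sizeof(vec) + sizeof(int) * vec.capacity() << " bytes\n";
}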
OK, I hope this provided some insight. However, here comes the important message: if std::vector<T> isn't capable of holding the amount of data you have there are two situations:
You have extremely low memory and most of this discussion is futile because you need entirely different approaches to cope with the few bytes you have. This would be the case if you are working on extremely resource constrained embedded systems.
You have too much data and using T[n] or std::array<T, n> won't be of much help because the overhead we are talking about is typically less than 32 bytes.
Maybe you can describe what you are actually trying to do and why std::vector<T> is not an option.

Related

Allocating memory for multiple buffers including alignment

I'd like to allocate memory for contiguous arrays of several different types. That is, like this, but with dynamic sizes:
template <typename T0, std::size_t N0, typename T1, std::size_t N1, ...>
struct {
T0 t0s[N0];
T1 t1s[N1];
...
};
Naively, the size is just sizeof(T0) * N0 + sizeof(T1) * N1 + ... but alignment makes it tricky, I think. You'd align the whole thing on alignof(T0), then after N0 T0s, you have to figure out the alignof(T1) spot to start the T1s. There's std::align for this, but it's geared toward finding where to put them after you already have memory allocated, so it would take a little effort to use it before allocating memory.
Is there some easy off-the-shelf way to calculate the size and offsets of a dynamic struct like this? I'm picturing a function like
// Return the offsets in bytes of the ends of each array
template <typename... Ts>
std::array<std::size_t, sizeof...(Ts)>
calculateOffsets(const std::array<std::size_t, sizeof...(Ts)> sizes);
for example:
auto offsets = calculateOffsets<char, float>({3, 2});
// Offsets is now {4, 12} because the layout is [cccXffffFFFF] where c is a char, X is padding, f and F are the two floats.
// Now we can allocate a buffer of size offsets.back() with alignment alignof(char) and placement-new 3 chars starting at buf + 0, and 2 floats at buf + offsets.front()
It's funny because clearly the compiler has this logic built in, since at compile time it knows the layout of the above struct when the sizes are static.
(So far I'm only interested in POD types; to work fully generically I'd additionally need to deal with exception-safe construction and destruction.)
There are no built-in facilities for this -- but this is a solvable problem in C++.
Since what you're effectively looking for is a "packed buffer" of aligned contiguous components, I would recommend doing this by allocating the sum of all buffers at the alignment of the most aligned object. This makes calculation of the length of the buffer in bytes easy -- but requires remapping of the order of the data.
The thing to remember with alignment is that a 16-byte boundary is also aligned to an 8-byte boundary (and so on). On the other hand, it is not true that 4 bytes past a 1-byte boundary will be on a 4-byte boundary1. It may be, but this depends on the initial pointer.
Example
As an example, we want the following in the buffer:
3 chars (c)
2 floats (f)
2 shorts (s)
We assume the following is true:
alignof(float) >= alignof(short) >= alignof(char)
sizeof(float) is 4
sizeof(short) is 2
sizeof(char) is 1
For our needs, we would want a buffer that is sizeof(float) * 2 + sizeof(short) * 2 + sizeof(char) * 3 = 15 bytes.
A buffer that has been sorted by the alignment would look like the following, when packed:
ffff ffff ssss ccc
As long as the initial allocation is aligned to alignof(float), the rest of the bytes in this buffer are guaranteed to be suitably aligned as well.
It would be ideal if you could just arrange your data to always be in descending order of alignment; but if this is not guaranteed to be the case, you could always use a pre-canned template-metaprogramming solution to sort a list of T... types based on the alignof(T) for each object.
1 The issue with your suggestion of aligning the buffers in order is that alignof(char), which is 1 byte, does not guarantee that 3 chars plus 1 padding byte past the initial alignment lands on a 4-byte boundary. A pointer of 0x00FFFFFE01 is "1-byte aligned", but 4 bytes later is 0x00FFFFFE05 -- which is still unaligned.
Although this may work with some underlying allocators, like new/std::malloc which have a default alignment of alignof(std::max_align_t) -- this is not true for more precise allocation mechanisms such as those found in std::polymorphic_allocator.
Is there some easy off-the-shelf way to calculate the size and offsets of a dynamic struct like this?
There is no such function in the standard library. You can probably implement it using std::align in a loop.
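For what it's worth, here is a sketch of such a function. It uses plain round-up arithmetic (the same computation std::align performs) rather than std::align itself, follows the question's convention that each returned offset is the end of one array padded to the start of the next (the last entry being the total buffer size), and assumes the buffer will be allocated at the strictest alignment among the types rather than alignof(char). The name calculate_offsets is just illustrative:
#include <array>
#include <cstddef>
#include <iostream>

template <typename... Ts>
std::array<std::size_t, sizeof...(Ts)>
calculate_offsets(const std::array<std::size_t, sizeof...(Ts)>& counts)
{
    std::array<std::size_t, sizeof...(Ts)> offsets{};
    const std::size_t sizes[]  = { sizeof(Ts)... };
    const std::size_t aligns[] = { alignof(Ts)... };

    std::size_t offset = 0;
    for (std::size_t i = 0; i < sizeof...(Ts); ++i) {
        offset += sizes[i] * counts[i];              // end of array i's data
        if (i + 1 < sizeof...(Ts))                   // pad up to where array i+1 starts
            offset = (offset + aligns[i + 1] - 1) / aligns[i + 1] * aligns[i + 1];
        offsets[i] = offset;
    }
    return offsets;
}

int main() {
    auto offsets = calculate_offsets<char, float>({3, 2});
    std::cout << offsets[0] << ", " << offsets[1] << '\n';   // prints 4, 12
}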

Aligning buffer to an N-byte boundary but not a 2N-byte one?

I would like to allocate some char buffers0, to be passed to an external non-C++ function, that have a specific alignment requirement.
The requirement is that the buffer be aligned to an N-byte1 boundary, but not to a 2N boundary. For example, if N is 64, then the pointer to this buffer p should satisfy ((uintptr_t)p) % 64 == 0 and ((uintptr_t)p) % 128 != 0 - at least on platforms where pointers have the usual interpretation as a plain address when cast to uintptr_t.
Is there a reasonable way to do this with the standard facilities of C++11?
If not, is there is a reasonable way to do this outside the standard facilities2 which works in practice for modern compilers and platforms?
The buffer will be passed to an outside routine (adhering to the C ABI but written in asm). The required alignment will usually be greater than 16, but less than 8192.
Over-allocation or any other minor wasted-resource issues are totally fine. I'm more interested in correctness and portability than wasting a few bytes or milliseconds.
Something that works on both the heap and stack is ideal, but anything that works on either is still pretty good (with a preference towards heap allocation).
0 This could be with operator new[] or malloc or perhaps some other method that is alignment-aware: whatever makes sense.
1 As usual, N is a power of two.
2 Yes, I understand an answer of this type causes language-lawyers to become apoplectic, so if that's you just ignore this part.
Logically, to satisfy "aligned to N, but not 2N", we align to 2N then add N to the pointer. Note that this will over-allocate N bytes.
So, assuming we want to allocate B bytes, if you just want stack space, alignas would work, perhaps.
alignas(N*2) char buffer[B+N];
char *p = buffer + N;
If you want heap space, std::aligned_storage might do:
typedef std::aligned_storage<B+N,N*2>::type ALIGNED_CHAR;
ALIGNED_CHAR buffer;
char *p = reinterpret_cast<char *>(&buffer) + N;
I've not tested either out, but the documentation suggests it should be OK.
You can use _aligned_malloc(nbytes,alignment) (in MSVC) or _mm_malloc(nbytes,alignment) (on other compilers) to allocate (on the heap) nbytes of memory aligned to alignment bytes, which must be an integer power of two.
Then you can use the trick from Ken's answer to avoid alignment to 2N:
void* ptr_alloc = _mm_malloc(nbytes + N, 2 * N);
void* ptr = static_cast<void*>(static_cast<char*>(ptr_alloc) + N);
/* do your number crunching */
_mm_free(ptr_alloc);
We must ensure to keep the pointer returned by _mm_malloc() for later de-allocation, which must be done via _mm_free().
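Not tested against every toolchain, but here is a small self-contained sketch of this approach that also verifies the "aligned to N but not 2N" property (x86-specific because of the _mm_malloc header):
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <xmmintrin.h>   // _mm_malloc / _mm_free on x86 compilers

int main() {
    const std::size_t N = 64;         // required: aligned to N, not to 2N
    const std::size_t nbytes = 4096;  // payload size

    // over-align to 2N, then step forward by N as described above
    void* ptr_alloc = _mm_malloc(nbytes + N, 2 * N);
    char* p = static_cast<char*>(ptr_alloc) + N;

    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::printf("addr %% N  = %zu (want 0)\n", static_cast<std::size_t>(addr % N));
    std::printf("addr %% 2N = %zu (want N, i.e. non-zero)\n",
                static_cast<std::size_t>(addr % (2 * N)));

    /* ... use p for the number crunching ... */

    _mm_free(ptr_alloc);  // free the original pointer, not the shifted one
}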

The operation of the sizeof operator in C++

On my MS VS 2015 compiler, sizeof(int) is 4 (bytes), but sizeof(vector<int>) is 16. As far as I know, a vector is like an empty box when it's not initialized yet, so why is it 16? And why 16 and not another number?
Furthermore, if we have vector<int> v(25); and then initialize it with int numbers, the size of v is still 16 even though it holds 25 ints! The size of each int is 4, so the sizeof of v should seemingly be 25*4 bytes, but in effect it is still 16! Why?
The size of each int is 4, so the sizeof of v should seemingly be 25*4 bytes, but in effect it is still 16! Why?
You're confusing sizeof(std::vector) and std::vector::size(): the former returns the size of the vector object itself, not including the size of the elements it holds, while the latter returns the number of elements. You can get the total size of the elements with std::vector::size() * sizeof(int).
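A minimal sketch of the difference (the first number is whatever your implementation's sizeof(std::vector<int>) happens to be):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(25);

    std::cout << sizeof(v) << '\n';               // size of the vector object itself (e.g. 16)
    std::cout << v.size() << '\n';                // number of elements: 25
    std::cout << v.size() * sizeof(int) << '\n';  // bytes used by the elements on the heap: 100
}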
so why is it 16? And why 16 and not another number?
What sizeof(std::vector) is depends on the implementation; it is mostly implemented with three pointers. In some cases (such as debug mode) the size might increase for convenience.
std::vector is typically a structure which contains two members: a pointer to its elements and the size of the array (the number of elements).
As the size member is sizeof(void *) wide and the pointer is also sizeof(void *), the size of the structure is 2 * sizeof(void *), which is 16.
The number of elements has nothing to do with the size as the elements are allocated on the heap.
EDIT: As M.M mentioned, the implementation could be different, e.g. a pointer plus start, end, and allocatedSize fields. So in a 32-bit environment that would be 3*sizeof(size_t) + sizeof(void *), which might be the case here. Even the original layout could work with start hardcoded to 0 and allocatedSize computed by masking end, so it is really dependent on the implementation. But the point remains the same.
sizeof is evaluated at compile time, so it only counts the size of the variables declared in the class, which probably includes a couple of counters and a pointer. It's what the pointer points to that varies with the size, but the compiler doesn't know about that.
The size can be explained by three pointers: 1) the beginning of the vector, 2) the end of the vector, and 3) the end of the vector's capacity. So it is implementation dependent and will change between implementations.
You seem to be mixing up "array" with "vector". If you have a local array, sizeof will indeed give the size of the array. However, a vector is not an array. It is a class, a container from the STL, guaranteeing that the contents are located within a single block of memory (which may get relocated if the vector grows).
Now, if you take a look at the std::vector implementation, you'll notice it contains fields (at least in MSVC 14.0):
size_type _Count = 0;
typename _Alloc_types::_Alty _Alval; // allocator object (from base)
_Mylast
_Myfirst
That could sum up to 16 bytes under your implementation (note: experience may vary).

2D array access time comparison

I have two ways of constructing a 2D array:
int arr[NUM_ROWS][NUM_COLS];
//...
tmp = arr[i][j]
and flattened array
int arr[NUM_ROWS*NUM_COLS];
//...
tmp = arr[i*NUM_COLS+j];
I am doing image processing so even a little improvement in access time is necessary. Which one is faster? I am thinking the first one since the second one needs calculation, but then the first one requires two addressing so I am not sure.
I don't think there is any performance difference. The system will allocate the same amount of contiguous memory in both cases. To calculate i*NUM_COLS+j, either you do it yourself for the 1D array, or the compiler does it for you in the 2D case. The only concern is ease of use.
You should have trust into the capabilities of your compiler in optimizing standard code.
Also you should have trust into modern CPUs having fast numeric multiplication instructions.
Don't bother choosing one over the other!
Decades ago I greatly optimized some code by using pointers instead of the 2D array calculation, but this will a) only be useful if it is an option to keep the pointer around, e.g. in a loop, and b) have low impact since modern CPUs should do 2D array access in very few cycles. Worth measuring! It may also depend on the array size.
In any case, pointers using ptr++ or ptr += NUM_COLS will for sure be a little bit faster, if applicable!
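For illustration, a sketch of the pointer-walking idea on a flattened array; whether it actually beats the index arithmetic is something to measure, since a modern optimizer will often turn both forms into the same code:
#include <iostream>

int main() {
    const int NUM_ROWS = 64, NUM_COLS = 64;
    int arr[NUM_ROWS * NUM_COLS] = {};   // flattened 2D array, zero-initialized

    long long sum = 0;
    const int* ptr = arr;                // running pointer, advanced row by row
    for (int i = 0; i < NUM_ROWS; ++i) {
        for (int j = 0; j < NUM_COLS; ++j) {
            sum += *ptr++;               // same element as arr[i * NUM_COLS + j]
        }
    }
    std::cout << sum << '\n';
}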
The first method will almost always be faster. IN GENERAL (because there are always corner cases) the processor and memory architecture, as well as compilers, may have optimizations built in to aid with 2D arrays or other similar data structures. For example, GPUs are optimized for matrix (2D array) math.
So, again in general, I would allow the compiler and hardware to optimize your memory and address arithmetic if possible.
...also I agree with @Paul R, there are much bigger considerations when it comes to performance than your array allocation and address arithmetic.
There are two cases to consider: compile-time definition and run-time definition of the array size. There is a big difference in performance.
Static allocation, global or file scope, fixed size array:
The compiler knows the size of the array and tells the linker to allocate space in the data / memory section. This is the fastest method.
Example:
#define ROWS 5
#define COLUMNS 6
int array[ROWS][COLUMNS];
int buffer[ROWS * COLUMNS];
Run time allocation, function local scope, fixed size array:
The compiler knows the size of the array, and tells the code to allocate space in the local memory (a.k.a. stack) for the array. In general, this means adding a value to a stack register. Usually one or two instructions.
Example:
void my_function(void)
{
unsigned short my_array[ROWS][COLUMNS];
unsigned short buffer[ROWS * COLUMNS];
}
Run Time allocation, dynamic memory, fixed size array:
Again, the compiler has already calculated the amount of memory required for the array since it was declared with fixed size. The compiler emits code to call the memory allocation function with the required amount (usually passed as a parameter). A little slower because of the function call and the overhead required to find some dynamic memory (and maybe garbage collection).
Example:
void another_function(void)
{
unsigned char * array = new unsigned char [ROWS * COLUMNS];
//...
delete[] array;
}
Run Time allocation, dynamic memory, variable size:
Since the dimensions of the array are not known until run time, the compiler must emit code to calculate the amount of memory to allocate. This quantity is then passed to the memory allocation function. A little slower than above because of the code required to calculate the size.
Example:
int * create_board(unsigned int rows, unsigned int columns)
{
int * board = new int [rows * columns];
return board;
}
Since your goal is image processing, I would assume your images are too large for static arrays. The correct question to ask is therefore about dynamically allocated arrays.
In C/C++ there are multiple ways to allocate a dynamic 2D array (see "How do I work with dynamic multi-dimensional arrays in C?"). To make this work in both C and C++ we can use malloc with a cast (in C++ only, you can use new instead).
Method 1:
int** arr1 = (int**)malloc(NUM_ROWS * sizeof(int*));
for(int i=0; i<NUM_ROWS; i++)
arr1[i] = (int*)malloc(NUM_COLS * sizeof(int));
Method 2:
int** arr2 = (int**)malloc(NUM_ROWS * sizeof(int*));
int* arrflat = (int*)malloc(NUM_ROWS * NUM_COLS * sizeof(int));
for (int i = 0; i < NUM_ROWS; i++)
arr2[i] = arrflat + (i*NUM_COLS);
Method 2 essentially creates a contiguous 2D array: i.e. arrflat[NUM_COLS*i+j] and arr2[i][j] should have identical performance. However, arrflat[NUM_COLS*i+j] and arr1[i][j] from method 1 should not be expected to have identical performance, since arr1 is not contiguous. Method 1, however, seems to be the method that is most commonly used for dynamic arrays.
In general, I use arrflat[NUM_COLS*i+j] so I don't have to think of how to allocated dynamic 2D arrays.
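One practical difference worth spelling out is deallocation: method 1 needs one free per row plus one for the pointer array, while method 2 needs exactly two frees. A small self-contained sketch of method 2 with its matching cleanup (same hypothetical names as above):
#include <cstdlib>

int main() {
    const int NUM_ROWS = 4, NUM_COLS = 8;

    // method 2: one block of row pointers plus one contiguous block of elements
    int** arr2    = (int**)std::malloc(NUM_ROWS * sizeof(int*));
    int*  arrflat = (int*)std::malloc(NUM_ROWS * NUM_COLS * sizeof(int));
    for (int i = 0; i < NUM_ROWS; i++)
        arr2[i] = arrflat + i * NUM_COLS;

    arr2[2][3] = 42;   // same element as arrflat[2 * NUM_COLS + 3]

    // only two allocations were made, so only two frees are needed
    std::free(arrflat);   // the contiguous element block
    std::free(arr2);      // the array of row pointers
    return 0;
}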

Understanding the space occupied by a vector instance in C++

I'm trying to understand the amount of bytes occupied by an instance of std::vector. The following code:
vector<int> uno;
uno.push_back(1);
uno.push_back(1);
cout << "1: " << sizeof(uno) << " bytes" << endl;
cout << endl;
vector<bool> unos;
unos.push_back(true);
cout << "2: " << sizeof(unos) << " bytes" << endl;
cout << endl;
gives me this output:
1: 12 bytes
2: 20 bytes
Can someone explain to me why the vector<bool> has a bigger size than the vector<int>? And what is the correct way to measure the size in bytes of a vector instance... by adding up the sizes of all its elements?
In C++, the sizeof operator is always evaluated at compile time, so every vector<T> will have the same size (unless vector<T> is a specialization like vector<bool>).
In case you are wondering about the 12 bytes, that is probably the size of 3 pointers, or alternatively, the size of a pointer and an element count and a capacity. The real data is never stored inside the vector.
If you want to know how much memory is used total, a good approximation is:
sizeof(std::vector<T>) + my_vector.capacity() * sizeof(T)
This is only an approximation because it does not take into account book-keeping data on the heap.
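As a sketch, the approximation can be wrapped in a small helper; the result is still implementation dependent, ignores heap bookkeeping, and does not apply to the bit-packed vector<bool> discussed below:
#include <cstddef>
#include <iostream>
#include <vector>

// Rough estimate: the vector object itself plus the heap block
// backing capacity() elements.
template <typename T>
std::size_t approx_vector_bytes(const std::vector<T>& v) {
    return sizeof(std::vector<T>) + v.capacity() * sizeof(T);
}

int main() {
    std::vector<int> uno;
    uno.push_back(1);
    uno.push_back(1);
    std::cout << approx_vector_bytes(uno) << " bytes (approx.)\n";
}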
std::vector<bool> is larger since it is more complicated than std::vector<int>. The current C++ standard specifies that the vector<bool> data should be bit-packed. This requires some additional bookkeeping, and gives it slightly different behavior than the normal std::vector<T>.
This different behavior of std::vector<bool> is very likely to be removed in the upcoming C++0x standard.