How is a 2D array laid out in memory? Especially if it's a jagged (staggered) array. Given, to my understanding, that memory is contiguous going from max down to 0, does the computer allocate each array within the outer array one after the other? If so, should one of the inner arrays need to be resized, does it shift all the other arrays down so as to make space for the newly sized array?
If specifics are needed:
C++17/14/11
Clang
linux x86
Revision: (thanks user4581301)
I'm referring to having a vector<vector<T>> where T is some defined type. I'm not talking about template programming here, unless that doesn't change anything.
The precise details of how std::vector is implemented will vary from compiler to compiler, but more than likely, a std::vector contains a size_t member that stores the length and a pointer to the storage. It allocates this storage using whatever allocator you specify in the template; the default is to use new, which allocates it from the heap. You probably know this, but typically the heap is the area of RAM below the stack in memory, which grows from the bottom up as the stack grows from the top down, and which the runtime manages by tracking which blocks of it are free.
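As a rough mental model only (no real standard library looks exactly like this; libstdc++ and libc++, for instance, store three pointers rather than a pointer plus sizes), the representation is something like:

#include <cstddef>

// Hedged sketch of a vector's representation, not a real implementation.
template <typename T>
struct SimplifiedVector {
    T* data;              // points to the heap-allocated element storage
    std::size_t size;     // number of elements currently in use
    std::size_t capacity; // number of elements the allocation can hold
};

The point is that the object you declare is just this small fixed-size header; the elements themselves always live in the separately allocated block that data points to.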
The storage managed by a std::vector is a contiguous array of objects, so a vector of twenty vectors of T would contain at least a size_t storing the value 20 and a pointer to an array of twenty structures, each containing a length and a pointer. Each of those pointers would point to an array of T, stored contiguously in memory.
If you instead create a rectangular two-dimensional array, such as T table[ROWS][COLUMNS] or a std::array< std::array<T, COLUMNS>, ROWS >, you will get a single contiguous block of T elements stored in row-major order, that is: all the elements of row 0, followed by all the elements of row 1, and so on.
If you know the dimensions of the matrix in advance, the rectangular array will be more efficient because you'll only need to allocate one block of memory. This is faster because you'll only call the allocator (and later the destructor) once, instead of once per row, and because the data is in one place rather than split over many different locations, the single block is more likely to be in the processor's cache.
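To illustrate the row-major layout, here is a small check of the offset arithmetic (names like ROWS and COLUMNS are just placeholders):

#include <cstddef>
#include <cstdio>

int main() {
    constexpr std::size_t ROWS = 4, COLUMNS = 8;
    int table[ROWS][COLUMNS] = {};

    // Element (r, c) sits r * COLUMNS + c elements past the start of the
    // single block, so its byte offset is fully predictable.
    std::ptrdiff_t offset =
        reinterpret_cast<char*>(&table[2][3]) - reinterpret_cast<char*>(table);
    std::printf("%td == %zu\n", offset, (2 * COLUMNS + 3) * sizeof(int));
}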
vectors are thin wrappers around a dynamically allocated array of their elements. For a vector<vector<T>>, this means that the outer vector's internal array contains the inner vector structures, but the inner vectors allocate and manage their own internal arrays separately (the structure contains a pointer to the managed array).
Essentially, the 2D aspect is purely in the program logic; the elements of any given "row" are contiguous, but there is no specified spatial relationship between the rows.
True 2D arrays (where the underlying memory is allocated as a single block) only really happen with C-style arrays declared with 2D syntax (int foo[10][20];) and nested std::array types, or POD types following the same basic design.
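One quick way to see the difference is to print a few addresses (a small demonstration, not anything normative; the gaps you observe depend entirely on the allocator):

#include <cstdio>
#include <vector>

int main() {
    std::vector<std::vector<int>> jagged{{1, 2, 3}, {4, 5}};
    int flat[2][3] = {{1, 2, 3}, {4, 5, 6}};

    // The inner buffers come from separate heap allocations, so there is
    // no guaranteed relationship between these two addresses.
    std::printf("jagged row 0 buffer: %p\njagged row 1 buffer: %p\n",
                static_cast<void*>(jagged[0].data()),
                static_cast<void*>(jagged[1].data()));

    // The C-style array is one block: row 1 begins right where row 0 ends.
    std::printf("flat row 0: %p\nflat row 1: %p\n",
                static_cast<void*>(flat[0]), static_cast<void*>(flat[1]));
}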
Related
I am developing a C++ library focused on multidimensional arrays and relevant operations involving these objects. The data for my "Tensor<T,n>" class (which corresponds to an n-dimensional array whose elements are of some numeric type T) is stored in a std::vector object and the elements are accessed via indices by calculating the appropriate index in the one-dimensional data vector using the concept of strides.
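For readers unfamiliar with strides: the idea is that an n-dimensional index is mapped to a single flat offset. A minimal sketch of the usual row-major computation (the asker's Tensor class is not shown, so these helper names are hypothetical):

#include <cstddef>
#include <vector>

// Row-major strides: stride[i] is the product of all dimensions after i.
std::vector<std::size_t> make_strides(const std::vector<std::size_t>& shape) {
    std::vector<std::size_t> strides(shape.size(), 1);
    for (std::size_t i = shape.size(); i-- > 1; )
        strides[i - 1] = strides[i] * shape[i];
    return strides;
}

// Flat offset of a multidimensional index: sum of index[i] * stride[i].
std::size_t flat_index(const std::vector<std::size_t>& index,
                       const std::vector<std::size_t>& strides) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index.size(); ++i)
        offset += index[i] * strides[i];
    return offset;
}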
I understand that stack allocation is faster than heap allocation, and is generally safer. However, I also understand that heap allocation may be necessary for extremely large data structures, and that users of this library may need to work with such structures. My conclusion from this information is twofold:
for relatively small multidimensional arrays (in terms of number of elements), I should allocate this information on the stack.
for relatively large multidimensional arrays (in terms of number of elements), I should allocate this information on the heap.
I argue that this conclusion implies the existence of a "breaking point" as the size of a hypothetical array increases, at which I should switch from stack allocation to heap allocation. However, I have not been successful in finding resources that might help me determine where exactly this "breaking point" lies, so that I can optimize efficiency for my users.
Assuming my conclusion is correct, how can I rigorously determine when to switch between the two types of allocation?
A std::vector object itself has a constant size and always allocates the actual data on the heap. So no matter what size the matrix is, the Matrix class will have a constant size and always store its data on the heap.
If you want a heap-free version then you would have to implement a Matrix with std::array. You could use if constexpr to choose between vector and array.
Note that heap allocations cost some time, but they also allow move semantics. The "break-even" point might be as small as 4x4 matrices, but you have to test that. new/delete isn't that expensive if the allocator can reuse memory and doesn't have to ask the kernel for more.
Oh, and there is also the option of using a custom allocator for the vector.
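As a sketch of the storage-selection idea: std::conditional_t can pick the member type at compile time, playing the same role as the if constexpr suggestion above. The threshold of 16 elements below is an arbitrary placeholder; the real break-even point has to come from benchmarking, as the answer says:

#include <array>
#include <cstddef>
#include <type_traits>
#include <vector>

template <typename T, std::size_t Rows, std::size_t Cols>
class Matrix {
    // Placeholder threshold; measure to find the actual break-even point.
    static constexpr bool small = Rows * Cols <= 16;

    // Small matrices live inline (no heap allocation); large ones
    // delegate their storage to a heap-backed vector.
    using Storage = std::conditional_t<small,
                                       std::array<T, Rows * Cols>,
                                       std::vector<T>>;
    Storage data_;

public:
    Matrix() {
        if constexpr (!small)
            data_.resize(Rows * Cols); // a vector needs an explicit size
    }
    T& operator()(std::size_t r, std::size_t c) { return data_[r * Cols + c]; }
};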
A std::vector<T> has the property of storing its elements contiguously in memory. But what about a std::vector<std::vector<T>>? The elements within an individual std::vector<T> are contiguous, but are the vectors themselves contiguous in memory (that is, would the whole data kept in the outer vector be one memory block)? Wouldn't this imply that if I resize one of the inner vectors, I would have to copy (or move) many objects to preserve contiguity?
And what about a std::array<std::array<T,N>,N>? Here I would assume memory contiguity, just as for T[N][N].
A std::vector<T> internally stores a pointer to dynamically allocated memory. So while the elements of a single std::vector<T> will be contiguous in memory, there is no spatial relationship between that memory and anything the elements themselves point to.
Therefore a std::vector<std::vector<T>> will not have the elements of the "inner vectors" stored contiguously in relation to each other. It would also be impossible for a resize of one of those "inner vectors" to affect the others, since at the point where the resize happens there is no information about any relationship to other std::vector<T> objects.
On the other hand, a std::array is a (thin) wrapper around a C-style array; it neither uses dynamic allocation nor is resizable, so a std::array<std::array<T,N>,N> will have the same memory layout as a T[N][N] array.
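You can even sanity-check some of this at compile time. Note these equalities hold on common implementations but are not strictly mandated by the standard (std::array is only required to be a contiguous aggregate):

#include <array>

static_assert(sizeof(std::array<int, 5>) == sizeof(int[5]),
              "holds on common implementations: the array is the only member");
static_assert(sizeof(std::array<std::array<int, 4>, 3>) == sizeof(int[3][4]),
              "nested std::array mirrors the built-in 2D array layout");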
The vector container holds its objects in contiguous memory. That is easy to understand for cases like vector<int>. But what if it is a vector of vectors, like vector<vector<int>>? Each vector in this vector of vectors can have a different length. How does it manage the memory? Does it allocate a fixed-length vector every time we push in a new vector? If so, what will happen if the first vector grows out of size during push_back? Would it trigger a reallocation and copy/move of the full vector of vectors?
A vector is a pointer to a dynamic array. If you push_back and find you're out of space in the array you have, you allocate a new, bigger array, copy over everything from the old array, and then stick the new value in.
If you have a vector of vectors, the same holds true for each of the inner vectors.
What you need to understand here is that a vector of vectors (unlike a 2D array) is not contiguous in memory. Each of the inner vectors' arrays can be stored anywhere in memory. Or in other words, "each vector in a vector of vectors is a completely different vector. Each with their own, completely separate and separately managed buffer."1
1. Thanks to user4581301 for this!
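You can watch the reallocation happen by tracking data() and capacity() across push_back calls (illustrative only; the growth factor is implementation-defined):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i = 0; i < 8; ++i) {
        v.push_back(i);
        // When size would exceed capacity, a new, larger buffer is
        // allocated and the old elements are moved over, so data() changes.
        std::printf("size=%zu capacity=%zu data=%p\n",
                    v.size(), v.capacity(), static_cast<void*>(v.data()));
    }
}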
A vector contains a pointer to a contiguous memory block. When it runs out of memory, it allocates a new memory block. A vector of vectors is just a vector of pointers to memory blocks. Although each memory block is a contiguous block, they are not necessarily contiguous to each other; that is, it is not necessarily the case that where one vector's block ends, the next one's starts. There is almost always a gap.
Why the "not necessarily" and "almost always" qualifiers? Because it depends on the memory allocator you're using and on the operating system internals. Ultimately, it's (one of) the job(s) of the OS to allocate and serve memory blocks to user-space programs.
Background: I want to implement a 3-d collision detection algorithm and would like to fragment the search space into cubes so that I only check for collisions when objects are close enough.
Actual Question: I was thinking of using an array of vectors to store pointers to the objects I am going to iterate over. For example, box[0][0][0] would be a vector holding pointers to the objects in one corner of the simulation space. Regardless of whether this is an optimal solution, I am wondering how C++ handles arrays of vectors. Would the array hold pointers to the vectors, so that their subsequent reallocation has no effect on the validity of the array, or would the vectors be created inside the array and then moved out, causing undefined behavior?
Similar questions did not have answers specific to this implementation detail. Sorry if this is actually answered elsewhere and I missed it.
An STL vector holds a pointer to a heap buffer that contains the actual data. This allows the vector to resize the buffer on demand without invalidating the vector object itself. (See the documentation of std::vector.)
So, to answer your question. An array of vectors will not become invalid if one of the vectors needs to be resized. An array of pointers to vectors would also not become invalid if one of the vectors needs to be resized.
In the STL, a vector is an implementation of a dynamic array (an array that can be resized on the fly). This essentially means that the array is dynamically allocated, and the user gets a pointer to the array on the heap. When more space is needed, a new array is allocated (usually double the previous size), the contents of the old one are copied over, and the old array is freed. That is how data consistency is handled.
Now, when you have an array of vectors, statically allocated like the question shows, you have in memory (stack, or .data section, depending on where you declare this array) an array of 3 vector objects, allocated one after the other in memory; each one will hold a pointer to an array allocated on the heap.
I hope this answers your question.
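To make that concrete: the vector objects sit side by side in the array, while each one's element buffer is a separate heap block (a small demonstration; the addresses will vary from run to run):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> box[3] = {{1}, {2, 2}, {3, 3, 3}};

    // &box[0], &box[1], &box[2] are exactly sizeof(std::vector<int>) apart;
    // the data() pointers have no such relationship to each other.
    for (int i = 0; i < 3; ++i)
        std::printf("vector object %d at %p, its heap buffer at %p\n",
                    i, static_cast<void*>(&box[i]),
                    static_cast<void*>(box[i].data()));
}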
I know vectors are guaranteed to use contiguous memory, and so are arrays. So what happens when I do something like this:
std::vector<uint8_t> my_array[10];
my_array[2].push_back(11);
my_array[2].push_back(7);
What would the memory look like? If both need to be contiguous, would every element of the array after my_array[2] be pushed forward a byte every time I do a push_back() on my_array[2]?
Would this be the same situation as when I have an array of structs, where the structs have a member that has a variable size, such as a string or another vector?
The memory footprint of a std::vector consists of two parts:
The memory for the std::vector object itself (very small, and independent of the size), and
The memory for the data of the vector (depends on the number of elements in the vector).
The first kind of data will be contiguous in an array; the second kind of data is allocated dynamically, so it would not be contiguous in an array.
This would not be the same as with a C struct that has a flexible data member, because the data portion of a std::vector is not always allocated in the same kind of memory as the vector object itself, let alone adjacent to it. The vector itself may be allocated in static, dynamic, or automatic memory, while its data is always in the dynamic area. Moreover, when the vector is resized, the memory for its data may move to a different region.
Each time you call push_back, std::vector checks if it has enough dynamic memory to accommodate the next data element. If there is not enough memory, then the vector allocates a bigger chunk of memory, and moves its current content there before pushing the new item.
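The two parts are easy to tell apart in code: the size of the vector object itself never changes, no matter how much data it manages (a quick demonstration):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> a;          // empty
    std::vector<int> b(1000000); // one million elements
    // Part 1 is the same size either way; part 2 lives on the heap.
    std::printf("sizeof(a)=%zu sizeof(b)=%zu\n", sizeof a, sizeof b);
    std::printf("b's data is at %p, far from &b at %p\n",
                static_cast<void*>(b.data()), static_cast<void*>(&b));
}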
The vector objects themselves are contiguous in memory; however, every std::vector contains a pointer to dynamically allocated memory for the actual storage (which is very likely not contiguous with the vector object).
Knowing this, std::vector::push_back will only check whether the (external) dynamically allocated array has enough capacity to hold the new item; if not, it will reallocate space. A push_back that overflows the first vector will not cause the second vector in the array to reallocate memory; that isn't how it works.
Also, there is no such thing as a struct having a variable size; the sizes of structures and classes have to be known at compile time.
std::string also has a fixed size, although you may think it is variable, because it (like vector) holds a pointer to the character buffer it contains.
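The same check works for std::string (with the caveat that small-string optimization may keep short strings inside the object itself; the object's size is fixed either way):

#include <cstdio>
#include <string>

int main() {
    std::string s = "short";
    std::string t(1000, 'x');
    // Both objects have the same fixed size; the long string's characters
    // live in a separately allocated heap buffer.
    std::printf("sizeof(s)=%zu sizeof(t)=%zu\n", sizeof s, sizeof t);
}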