Difference in size between std::vector and std::deque - c++

Right after I declare a vector<int> and a deque<int> if I print out sizeof on both of them, the std::vector has 12 bytes ( I guess begin, end and size ) while deque has 40 bytes. Where do those extra bytes come from?
I'm using Code::Blocks IDE 13.12 and I did select the C++11 standard to be used.

The size is implementation dependent; but it's not surprising that the control structure for a more complicated structure like deque might be larger than a simple one like vector (which, as you say, can be easily managed by three pointers, or a pointer and two sizes).
Looking at the GNU implementation, it stores precalculated "start" and "end" iterators, presumably because they're difficult to calculate on demand, and easier to update when the container changes. Each of those is quite complicated, containing four pointers: the current position, the start and end of the current block, and a pointer to a map structure needed to move between blocks. The deque also has a (possibly redundant?) pointer to that map, and the size (again, presumably difficult to calculate on demand), giving a total of ten pointers/sizes, as you observed.

The sizeof operator takes the size of the object and not the size of its contents.
So if an object have three data member variables, the sizeof operator will get the size of those member variables, but if one of them is a pointer to more data then you only get the size of the pointer and not the length of the data it points to.

sizeof does not give you the number of elements in a container; rather the compile time evaluated size of the structure.
So the actual value is down to the implementation of the standard library and, as such, can vary from platform to platform.

Related

How do I best force-flatten a (one dimensional) vector for N values?

I need something that behaves like an std::vector (interface/features/etc.) but I need it to be flat, i.e. it mustn't dynamically allocate a buffer. Clearly, this doesn't work in general, as the available size must be determined at compile time. But I want the type to be able to deal with N objects without additional allocations, and only if further items are pushed resort to dynamic allocation.
Some implementations of std::vector already do this, but only to the extent that it uses its existing members if the accumulated size of the content fits (I believe about three pointers-worth of payload). So, firstly, this is not a guarantee and secondly it is not configurable at compile time.
My thoughts are that I could either
A) self-cook a type (probably bad because I'd loose the ridiculous performance optimisations from vector)
B) use some sort of variant<vector<T>,array<T,N>> with an access wrapper (oh, the boilerplate)
C) come up with a MyAllocator<T,N> that has an array<T,N> member which then may be used to hold the first N items and after this defer to allocator<T> but I'm not sure if this can work because I cannot find out whether vector must permanently hold an instance of its allocator type as a member (I believe it does not)
I figure I'm not the first person to want this, so perhaps there are already approaches to this? Some empirical values or perhaps even a free library?
You might find folly/small_vector of use.
folly::small_vector is a sequence container that
implements small buffer optimization. It behaves similarly to
std::vector, except until a certain number of elements are reserved it
does not use the heap.
Like standard vector, it is guaranteed to use contiguous memory. (So,
after it spills to the heap all the elements live in the heap buffer.)
Simple usage example:
small_vector<int,2> vec;
vec.push_back(0); // Stored in-place on stack
vec.push_back(1); // Still on the stack
vec.push_back(2); // Switches to heap buffer.
// With space for 32 in situ unique pointers, and only using a
// 4-byte size_type.
small_vector<std::unique_ptr<int>, 32, uint32_t> v;
// A inline vector of up to 256 ints which will not use the heap.
small_vector<int, 256, NoHeap> v;
// Same as the above, but making the size_type smaller too.
small_vector<int, 256, NoHeap, uint16_t> v;

Why does std::vector<bool> have no .data()?

The specialisation of std::vector<bool>, as specified in C++11 23.3.7/1, doesn't declare a data() member (e.g. mentioned here and here).
The question is: Why does a std::vector<bool> have no .data()? This is the very same question as why is a vector of bools not stored contiguously in memory. What are the benefits in not doing so?
Why can a pointer to an array of bools not be returned?
Why does a std::vector have no .data()?
Because std::vector<bool> stores multiple values in 1 byte.
Think about it like a compressed storage system, where every boolean value needs 1 bit. So, instead of having one element per memory block (one element per array cell), the memory layout may look like this:
Assuming that you want to index a block to get a value, how would you use operator []? It can't return bool& (since it will return one byte, which stores more than one bools), thus you couldn't assign a bool* to it. In other words bool *bool_ptr =&v[0]; is not valid code, and would result in a compilation error.
Moreover, a correct implementation might not have that specialization and don't do the memory optimization (compression). So data() would have to copy to the expected return type depending of implementation (or standard should force optimization instead of just allowing it).
Why can a pointer to an array of bools not be returned?
Because std::vector<bool> is not stored as an array of bools, thus no pointer can be returned in a straightforward way. It could do that by copying the data to an array and return that array, but it's a design choice not to do that (if they did, I would think that this works as the data() for all containers, which would be misleading).
What are the benefits in not doing so?
Memory optimization.
Usually 8 times less memory usage, since it stores multiple bits in a single byte. To be exact, CHAR_BIT times less.

Are there any advantages of C++ Arrays over Vectors? [duplicate]

What are the differences between an array and a vector in C++? An example of the differences might be included libraries, symbolism, abilities, etc.
Array
Arrays contain a specific number of elements of a particular type. So that the compiler can reserve the required amount of space when the program is compiled, you must specify the type and number of elements that the array will contain when it is defined. The compiler must be able to determine this value when the program is compiled. Once an array has been defined, you use the identifier for the array along with an index to access specific elements of the array. [...] arrays are zero-indexed; that is, the first element is at index 0. This indexing scheme is indicative of the close relationship in C++ between pointers and arrays and the rules that the language defines for pointer arithmetic.
— C++ Pocket Reference
Vector
A vector is a dynamically-sized sequence of objects that provides array-style operator[] random access. The member function push_back copies its arguments via copy constructor, adds that copy as the last item in the vector and increments its size by one. pop_back does the exact opposite, by removing the last element. Inserting or deleting items from the end of a vector takes amortized constant time, and inserting or deleting from any other location takes linear time. These are the basics of vectors. There is a lot more to them. In most cases, a vector should be your first choice over a C-style array. First of all, they are dynamically sized, which means they can grow as needed. You don't have to do all sorts of research to figure out an optimal static size, as in the case of C arrays; a vector grows as needed, and it can be resized larger or smaller manually if you need to. Second, vectors offer bounds checking with the at member function (but not with operator[]), so that you can do something if you reference a nonexistent index instead of simply watching your program crash or worse, continuing execution with corrupt data.
— C++ Cookbook
arrays:
are a builtin language construct;
come almost unmodified from C89;
provide just a contiguous, indexable sequence of elements; no bells and whistles;
are of fixed size; you can't resize an array in C++ (unless it's an array of POD and it's allocated with malloc);
their size must be a compile-time constant unless they are allocated dynamically;
they take their storage space depending from the scope where you declare them;
if dynamically allocated, you must explicitly deallocate them;
if they are dynamically allocated, you just get a pointer, and you can't determine their size; otherwise, you can use sizeof (hence the common idiom sizeof(arr)/sizeof(*arr), that however fails silently when used inadvertently on a pointer);
automatically decay to a pointers in most situations; in particular, this happens when passing them to a function, which usually requires passing a separate parameter for their size;
can't be returned from a function; (Unless it is std::array)
can't be copied/assigned directly;
dynamical arrays of objects require a default constructor, since all their elements must be constructed first;
std::vector:
is a template class;
is a C++ only construct;
is implemented as a dynamic array;
grows and shrinks dynamically;
automatically manage their memory, which is freed on destruction;
can be passed to/returned from functions (by value);
can be copied/assigned (this performs a deep copy of all the stored elements);
doesn't decay to pointers, but you can explicitly get a pointer to their data (&vec[0] is guaranteed to work as expected);
always brings along with the internal dynamic array its size (how many elements are currently stored) and capacity (how many elements can be stored in the currently allocated block);
the internal dynamic array is not allocated inside the object itself (which just contains a few "bookkeeping" fields), but is allocated dynamically by the allocator specified in the relevant template parameter; the default one gets the memory from the freestore (the so-called heap), independently from how where the actual object is allocated;
for this reason, they may be less efficient than "regular" arrays for small, short-lived, local arrays;
when reallocating, the objects are copied (moved, in C++11);
does not require a default constructor for the objects being stored;
is better integrated with the rest of the so-called STL (it provides the begin()/end() methods, the usual STL typedefs, ...)
Also consider the "modern alternative" to arrays - std::array; I already described in another answer the difference between std::vector and std::array, you may want to have a look at it.
I'll add that arrays are very low-level constructs in C++ and you should try to stay away from them as much as possible when "learning the ropes" -- even Bjarne Stroustrup recommends this (he's the designer of C++).
Vectors come very close to the same performance as arrays, but with a great many conveniences and safety features. You'll probably start using arrays when interfacing with API's that deal with raw arrays, or when building your own collections.
Those reference pretty much answered your question. Simply put, vectors' lengths are dynamic while arrays have a fixed size.
when using an array, you specify its size upon declaration:
int myArray[100];
myArray[0]=1;
myArray[1]=2;
myArray[2]=3;
for vectors, you just declare it and add elements
vector<int> myVector;
myVector.push_back(1);
myVector.push_back(2);
myVector.push_back(3);
...
at times you wont know the number of elements needed so a vector would be ideal for such a situation.

Memory allocation of C++ vector<bool>

The vector<bool> class in the C++ STL is optimized for memory to allocate one bit per bool stored, rather than one byte. Every time I output sizeof(x) for vector<bool> x, the result is 40 bytes creating the vector structure. sizeof(x.at(0)) always returns 16 bytes, which must be the allocated memory for many bool values, not just the one at position zero. How many elements do the 16 bytes cover? 128 exactly? What if my vector has more or less elements?
I would like to measure the size of the vector and all of its contents. How would I do that accurately? Is there a C++ library available for viewing allocated memory per variable?
I don't think there's any standard way to do this. The only information a vector<bool> implementation gives you about how it works is the reference member type, but there's no reason to assume that this has any congruence with how the data are actually stored internally; it's just that you get a reference back when you dereference an iterator into the container.
So you've got the size of the container itself, and that's fine, but to get the amount of memory taken up by the data, you're going to have to inspect your implementation's standard library source code and derive a solution from that. Though, honestly, this seems like a strange thing to want in the first place.
Actually, using vector<bool> is kind of a strange thing to want in the first place. All of the above is essentially why its use is frowned upon nowadays: it's almost entirely incompatible with conventions set by other standard containers… or even those set by other vector specialisations.

std::vector versus std::array in C++

What are the difference between a std::vector and an std::array in C++? When should one be preferred over another? What are the pros and cons of each? All my textbook does is list how they are the same.
std::vector is a template class that encapsulate a dynamic array1, stored in the heap, that grows and shrinks automatically if elements are added or removed. It provides all the hooks (begin(), end(), iterators, etc) that make it work fine with the rest of the STL. It also has several useful methods that let you perform operations that on a normal array would be cumbersome, like e.g. inserting elements in the middle of a vector (it handles all the work of moving the following elements behind the scenes).
Since it stores the elements in memory allocated on the heap, it has some overhead in respect to static arrays.
std::array is a template class that encapsulate a statically-sized array, stored inside the object itself, which means that, if you instantiate the class on the stack, the array itself will be on the stack. Its size has to be known at compile time (it's passed as a template parameter), and it cannot grow or shrink.
It's more limited than std::vector, but it's often more efficient, especially for small sizes, because in practice it's mostly a lightweight wrapper around a C-style array. However, it's more secure, since the implicit conversion to pointer is disabled, and it provides much of the STL-related functionality of std::vector and of the other containers, so you can use it easily with STL algorithms & co. Anyhow, for the very limitation of fixed size it's much less flexible than std::vector.
For an introduction to std::array, have a look at this article; for a quick introduction to std::vector and to the the operations that are possible on it, you may want to look at its documentation.
Actually, I think that in the standard they are described in terms of maximum complexity of the different operations (e.g. random access in constant time, iteration over all the elements in linear time, add and removal of elements at the end in constant amortized time, etc), but AFAIK there's no other method of fulfilling such requirements other than using a dynamic array. As stated by #Lucretiel, the standard actually requires that the elements are stored contiguously, so it is a dynamic array, stored where the associated allocator puts it.
To emphasize a point made by #MatteoItalia, the efficiency difference is where the data is stored. Heap memory (required with vector) requires a call to the system to allocate memory and this can be expensive if you are counting cycles. Stack memory (possible for array) is virtually "zero-overhead" in terms of time, because the memory is allocated by just adjusting the stack pointer and it is done just once on entry to a function. The stack also avoids memory fragmentation. To be sure, std::array won't always be on the stack; it depends on where you allocate it, but it will still involve one less memory allocation from the heap compared to vector. If you have a
small "array" (under 100 elements say) - (a typical stack is about 8MB, so don't allocate more than a few KB on the stack or less if your code is recursive)
the size will be fixed
the lifetime is in the function scope (or is a member value with the same lifetime as the parent class)
you are counting cycles,
definitely use a std::array over a vector. If any of those requirements is not true, then use a std::vector.
If you are considering using multidimensional arrays, then there is one additional difference between std::array and std::vector. A multidimensional std::array will have the elements packed in memory in all dimensions, just as a c style array is. A multidimensional std::vector will not be packed in all dimensions.
Given the following declarations:
int cConc[3][5];
std::array<std::array<int, 5>, 3> aConc;
int **ptrConc; // initialized to [3][5] via new and destructed via delete
std::vector<std::vector<int>> vConc; // initialized to [3][5]
A pointer to the first element in the c-style array (cConc) or the std::array (aConc) can be iterated through the entire array by adding 1 to each preceding element. They are tightly packed.
A pointer to the first element in the vector array (vConc) or the pointer array (ptrConc) can only be iterated through the first 5 (in this case) elements, and then there are 12 bytes (on my system) of overhead for the next vector.
This means that a std::vector> array initialized as a [3][1000] array will be much smaller in memory than one initialized as a [1000][3] array, and both will be larger in memory than a std:array allocated either way.
This also means that you can't simply pass a multidimensional vector (or pointer) array to, say, openGL without accounting for the memory overhead, but you can naively pass a multidimensional std::array to openGL and have it work out.
Summarizing the above discussion in a table for quick reference:
C-Style Array
std::array
std::vector
Size
Fixed/Static
Fixed/Static
Dynamic
Memory efficiency
More efficient
More Efficient
Less efficient (May double its size on new allocation.)
Copying
Iterate over elements or use std::copy()
Direct copy: a2 = a1;
Direct copy: v2 = v1;
Passing to function
Passed by pointer. (Size not available in function)
Passed by value
Passed by value (Size available in that function)
Size
sizeof(a1) / sizeof(a1[0])
a1.size()
v1.size()
Use case
For quick access and when insertions/deletions not frequently needed.
Same as classic array but safer and easier to pass and copy.
When frequent additions or deletions might be needed
Using the std::vector<T> class:
...is just as fast as using built-in arrays, assuming you are doing only the things built-in arrays allow you to do (read and write to existing elements).
...automatically resizes when new elements are inserted.
...allows you to insert new elements at the beginning or in the middle of the vector, automatically "shifting" the rest of the elements "up"( does that make sense?). It allows you to remove elements anywhere in the std::vector, too, automatically shifting the rest of the elements down.
...allows you to perform a range-checked read with the at() method (you can always use the indexers [] if you don't want this check to be performed).
There are two three main caveats to using std::vector<T>:
You don't have reliable access to the underlying pointer, which may be an issue if you are dealing with third-party functions that demand the address of an array.
The std::vector<bool> class is silly. It's implemented as a condensed bitfield, not as an array. Avoid it if you want an array of bools!
During usage, std::vector<T>s are going to be a bit larger than a C++ array with the same number of elements. This is because they need to keep track of a small amount of other information, such as their current size, and because whenever std::vector<T>s resize, they reserve more space then they need. This is to prevent them from having to resize every time a new element is inserted. This behavior can be changed by providing a custom allocator, but I never felt the need to do that!
Edit: After reading Zud's reply to the question, I felt I should add this:
The std::array<T> class is not the same as a C++ array. std::array<T> is a very thin wrapper around C++ arrays, with the primary purpose of hiding the pointer from the user of the class (in C++, arrays are implicitly cast as pointers, often to dismaying effect). The std::array<T> class also stores its size (length), which can be very useful.
A vector is a container class while an array is an allocated memory.