Size of struct with vector - c++

I am trying to find the difference in size from a struct with vector of object and struct with a vector of object pointers.
The code I have written shows that both structs have the same size, even though, in theory at least, based on their contents they should be different.
What would be the correct way of finding the size of a struct based on its contents?
#include <iostream>
#include <string>
#include <vector>
using namespace std;
struct Song {
    string Name;
    string Artist;
};
struct Folder {
    vector<Song*> list;
};
struct Library {
    vector<Song> songs;
};
int main(int argc, const char * argv[]) {
    Library library;
    Folder favorites;
    Folder recentPurchaces;
    library.songs.push_back(Song{"Human After All", "Daft Punk"});
    library.songs.push_back(Song{"All of my love", "Led Zeppelin"});
    // Only indices 0 and 1 exist; &library.songs[2] would be out of range.
    favorites.list.push_back(&library.songs[0]);
    favorites.list.push_back(&library.songs[1]);
    cout << "Size of library: " << sizeof(library) << endl;
    cout << "Size of favorites: " << sizeof(favorites) << endl;
    return 0;
}

in theory at least based their content they should be different.
No, the sizes shouldn't be different: std::vector<T> stores the data in dynamically allocated storage, while the struct stores only the "anchor" part, consisting of a couple of pointers. The number of items inside the vectors, as well as the size of the items inside the vector itself, are not counted in determining the size of this footprint.
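In order to compute the size in memory, you need to write a function that adds up the size of each container object itself, the container's capacity times the size of a container item, and whatever the individual items inside each container allocate on their own (for example, the character buffers of the strings inside each Song).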

Most likely, std::vector just keeps a pointer to a dynamically allocated array of elements, and the size of that pointer is the same regardless of whether it's Song* or Song**. The allocated size of the pointed-to memory would, of course, likely be different.
Put another way, sizeof() is not a good way to measure how much memory a std::vector requires.

Why would you expect the size of the structures to be different? std::vector stores its data via dynamic allocation ("on the heap"), so there is no reason for its implementation to contain anything else than a few pointers.
Exact implementation details depend on the standard library implementation, of course, but a typical std::vector<T, Alloc> could just contain something like this:
template <class T, class Alloc = allocator<T>>
class vector
{
T *_Begin, *_End, *_EndOfCapacity;
Alloc _Allocator;
// No other data members
public:
/* ... */
};

Related

C++ Growth of containers containing containers?

Suppose I have a std::vector<std::set<int>>. The vector will reallocate if you insert past its capacity. In the case where another resizable type is stored inside the vector, is the vector only holding a pointer to that type?
In particular I want to know about how memory is allocated if a vector is holding an arbitrary type.
std::vector<int> a(10); // Size will be sizeof(int) * 10
std::vector<std::set<int>> b(10);
b[0] = {0, 0, 0, 0, 0, 0, 0, /* .... */ }; // Is b's size affected by the sets inside?
C++ objects can only have one size, but may include pointers to arbitrarily sized heap memory. So, yes, container objects themselves generally include a pointer to heap memory and probably don't include any actual items. (The only typical exception is string types, which sometimes have a "small string optimization" that allows string objects to contain small strings directly in the object without allocating heap memory.)
The memory that any vector allocates "by itself" is always sizeof(element_type) * vector.capacity(), which is at least sizeof(element_type) * vector.size().
The vector can only allocate memory for element data that is visible at compile time. It doesn't care about any allocations done by the element class.
Think of a vector as an array on steroids. Like an array, a vector consists of a contiguous block of memory where all elements have the same size. To fulfill this requirement, it must know at compile time how big each element will be.
Imagine a std::set to have these member variables:
struct SomeSet
{
size_t size;
SomeMagicInternalType* data;
};
So no matter how data will be allocated at runtime, the vector only allocates memory per element for what it knows at compile time:
sizeof(SomeSet::size) + sizeof(SomeSet::data)
which would typically be 4 + 4 bytes on a 32-bit machine.
Consider this example:
#include <iostream>
#include <vector>
int main() {
std::vector<int> v;
std::cout << sizeof(v) << "\n";
std::cout << v.size() << "\n";
v.push_back(3);
std::cout << sizeof(v) << "\n";
std::cout << v.size() << "\n";
}
The exact number may differ, but I get as output:
24
0
24
1
The size of a vector object (what sizeof reports) does not change when you add an element. The same is true for a set, so a vector<set> does not need to reallocate when one of its elements adds or removes an element.
A set does not store its elements as members, otherwise sets with different number of elements would be different types. They are stored on the heap and as such do not contribute to the size of the set directly.
A std::vector<T> holds objects of type T. When it gets resized it copies or moves those objects as needed. A std::vector<std::set<int>> is no different; it holds objects of type std::set<int>.

What is the best way to populate a boost::multi_array from an initializer list?

I'd like to initialize a boost::multi_array inline in some code. But I don't think the boost::multi_array supports initialization from an initializer list. Here's what I have so far:
// First create a primitive array, which can be directly initialized
uint8_t field_primitive[4][8] = {
{ 1,1,1,1,1,1,1,1 },
{ 1,2,1,2,1,2,1,2 },
{ 1,1,2,2,2,2,2,2 },
{ 1,2,2,2,2,2,2,2 }
};
// Create the boost::multi_array I actually want to use
boost::multi_array<uint8_t, 2> field(boost::extents[4][8]);
// Compact but yucky approach to copying the primitive array contents into the multi_array.
memcpy(field.data(), field_primitive, field.num_elements() * sizeof(uint8_t));
I like that I can compactly express the matrix contents using the curly-brace initializer list. But I don't like the "memcpy" and I don't like the use of a throwaway primitive array. Is there a nicer way to populate my boost::multi_array from a readable inline set of values in the code?
The following example from the official Boost documentation concerning multi_array uses memcpy, too, though in combination with origin(). So it seems to be OK to use it:
#include <boost/multi_array.hpp>
#include <algorithm>
#include <iostream>
#include <cstring>
int main()
{
boost::multi_array<char, 2> a{boost::extents[2][6]};
typedef boost::multi_array<char, 2>::array_view<1>::type array_view;
typedef boost::multi_array_types::index_range range;
array_view view = a[boost::indices[0][range{0, 5}]];
std::memcpy(view.origin(), "tsooB", 6);
std::reverse(view.begin(), view.end());
std::cout << view.origin() << '\n';
boost::multi_array<char, 2>::reference subarray = a[1];
std::memcpy(subarray.origin(), "C++", 4);
std::cout << subarray.origin() << '\n';
}
Concerning the difference between origin() and data(), the multiarray reference manual defines the following:
element* data(); This returns a pointer to the beginning of the
contiguous block that contains the array's data. If all dimensions of
the array are 0-indexed and stored in ascending order, this is
equivalent to origin().
element* origin(); This returns the origin element of the multi_array.
So there seem to be two things to consider when using data() and origin() together with memcpy if the array contains dimensions that are not 0-indexed or not stored in ascending order:
First, origin() might not point to the start of the contiguous memory block used by the array. Hence, copying a block the size of the whole multi_array to this location might overrun the reserved memory.
Second, copying a memory block to data()'s address might produce a memory layout where the indices used to access the multi_array do not correspond to the indices of the memory block copied into the array's internal data buffer.
So it seems to me that memcpy should be used to (pre-)fill a multi_array only with care, ideally when all dimensions are 0-indexed and stored in ascending order.

Is there a way to print the amount of heap memory an object has allocated?

In a running program, how can I track/print the amount of heap memory an object has allocated?
For example:
#include <iostream>
#include <vector>
int main(){
std::vector<int> v;
std::cout << heap_sizeof(v) << '\n';
for (int i = 0; i < 1000; ++i){
v.push_back(0);
}
std::cout << heap_sizeof(v) << '\n';
}
Is there an implementation that could substitute heap_sizeof()?
With everything as it's designed out of the box, no, that's not possible. You do have a couple of options for doing that on your own though.
If you need this exclusively for standard containers, you can implement an allocator that tracks the memory that's been allocated (and not freed) via that allocator.
If you want this capability for everything allocated via new (whether a container or not) you can provide your own implementation of operator new on a global and/or class-specific basis, and have it (for example) build an unordered map from pointers to block sizes to tell you the size of any block it's allocated (and with that, you'll have to provide a function to retrieve that size). Depending on the platform, this might also be implemented using platform-specific functions. For example, when you're building for Microsoft's compiler (well, library, really) your implementation of operator new wouldn't have to do anything special at all, and the function to retrieve a block's size would look something like this:
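#include <malloc.h> // _msize is specific to Microsoft's C runtime
#include <cstddef>
size_t block_size(void const *block) {
    return _msize(const_cast<void *>(block)); // _msize takes a non-const void*
}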
Yet another possibility would be to increase the allocation size of each requested block by the size of an integer large enough to hold the size. In this case, you'd allocate a bigger chunk of data than the user requested, and store the size of that block at the beginning of the block that was returned. When the user requests the size of a block, you take the correct (negative) offset from the pointer they pass, and return the value you stored there.
First, the vector object v itself is allocated on the stack, not on the heap; only its element buffer lives on the heap.
To get the total amount of space used by it, I suggest using this function (found in this article and modified a bit):
#include <cstddef>
#include <vector>
template <typename T>
std::size_t areaof(const std::vector<T>& x)
{
    return sizeof(std::vector<T>) + x.capacity() * sizeof(T);
}
If you don't want to count the size of the std::vector object itself, then drop the sizeof term:
template <typename T>
std::size_t heap_sizeof(const std::vector<T>& x)
{
    return x.capacity() * sizeof(T);
}
If you are not concerned with accounting for what each object allocates and are more interested in how much memory has been allocated/freed between two points in time, you can use the malloc statistics functions. Each malloc implementation has its own version; on Linux (glibc) you can use mallinfo().

why constant size of struct despite having a vector of int

I have defined a struct which contains a vector of integer. Then I insert 10 integers in the vector and check for the size of struct. But I see no difference.
Here is my code:
#include <iostream>
#include <vector>
struct data
{
    std::vector<int> points;
};
int main()
{
    data d;
    std::cout << sizeof(d) << std::endl;
    for (int i = 0; i < 10; ++i)
        d.points.push_back(i);
    std::cout << sizeof(d) << std::endl;
}
In both cases I get the same result: 16
Why is it so? Shouldn't the size of struct grow?
A vector will store its elements in dynamically allocated memory (on the heap). Internally, this might be represented as:
T* elems; // Pointer memory.
size_t count; // Current number of elements.
size_t capacity; // Total number of elements that can be held.
so sizeof(std::vector) is unaffected by the number of elements the vector contains, because sizeof only accounts for the vector's own members (in this simple example, roughly sizeof(T*) + 2 * sizeof(size_t)).
The sizeof operator is a compile time operation that gives you the size of the data structure used to maintain the container, not including the size of the stored elements.
While this might not seem too intuitive at first, consider that when you use a std::vector you are using a small amount of local storage (where the std::vector is created) which maintains pointers to a different region holding the actual data. When the vector grows the data block will grow, but the control structure is still the same.
The fact that sizeof does not change during an object's lifetime is important, as it is the only way the compiler can allocate space for points inside data without interfering with other possible members:
struct data2 {
int x;
std::vector<int> points;
int y;
};
If the size of the object (std::vector in this case) was allowed to grow it would expand over the space allocated for y breaking any code that might depend on its location:
data2 d;
int *p = &d.y;
d.points.push_back(5);
// does `p` still point to `&d.y`? or did the vector grow over `y`?

Best Replacement for a Character Array

We have a data structure
struct MyData
{
int length ;
char package[MAX_SIZE];
};
where MAX_SIZE is a fixed value. Now we want to change it to support "unlimited" package lengths greater than MAX_SIZE. One of the proposed solutions is to replace the static array with a pointer and then dynamically allocate the size as required, for example:
struct MyData
{
int length ;
char* package;
};
and then
package = (char*)malloc(SOME_RUNTIME_SIZE) ;
Now my question is: is this the most efficient way to meet the requirement, or is there another method, maybe using STL data structures like growable arrays?
We want a solution where most of the code that works with the static char array also works with the new structure.
Much, much better/safer:
struct my_struct
{
std::vector<char> package;
};
To resize it:
my_struct s;
s.package.resize(100);
To look at how big it is:
my_struct s;
int size = s.package.size();
You can even put the functions in the struct to make it nicer:
struct my_struct
{
std::vector<char> package;
void resize(int n) {
package.resize(n);
}
int size() const {
return package.size();
}
};
my_struct s;
s.resize(100);
int z = s.size();
And before you know it, you're writing good code...
using STL data structures like growable arrays
The STL provides you with a host of containers. Unfortunately, the choice depends on your requirements. How often do you add to the container? How many times do you delete? Where do you delete from/add to? Do you need random access? What performance guarantees do you need? Once you have a sufficiently clear idea about such things, look up vector, deque, list, set, etc.
If you can provide some more detail, we can surely help pick a proper one.
I would also wrap a vector:
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
// wraps a vector. provides convenience conversion constructors
// and assign functions.
struct bytebuf {
    explicit bytebuf(std::size_t size):c(size) { }
    template<std::size_t size>
    bytebuf(char const(&v)[size]) { assign(v); }
    template<std::size_t size>
    void assign(char const(&v)[size]) {
        c.assign(v, v+size);
    }
    // provide access to wrapped vector
    std::vector<char> & buf() {
        return c;
    }
private:
    std::vector<char> c;
};
// Placeholders for your own I/O functions (hypothetical, for illustration only).
void process(char *, std::size_t) { }
void read(char *, std::size_t) { }
int main() {
    bytebuf b("data");
    process(&b.buf()[0], b.buf().size()); // process 5 bytes ("data" plus '\0')
    std::string str(&b.buf()[0]);
    std::cout << str; // outputs "data"
    bytebuf c(100);
    read(&c.buf()[0], c.buf().size()); // read 100 bytes
    // ...
}
There is no need to add many more functions to it, I think. You can always get the vector using buf() and operate on it directly. Since a vector's storage is contiguous, you can use it like a C array, but it is still resizable:
c.buf().resize(42);
The template conversion constructor and assign function allow you to initialize or assign from a C array directly. If you like, you can add more constructors that initialize from a pair of iterators or from a pointer and a length. But I would try to keep the amount of added functionality low, so that it stays a tight, transparent struct wrapping a vector.
If this is C:
Don't cast the return value of malloc().
Use size_t to represent the size of the allocated "package", not int.
If you're using the character array as an array of characters, use a std::vector<char> as that's what vectors are for. If you're using the character array as a string, use a std::string which will store its data in pretty much the same way as a std::vector<char>, but will communicate its purpose more clearly.
Yep, I would use an STL vector for this:
struct
{
std::vector<char> package;
// not sure if you have anything else in here ?
};
Your struct's length member then simply becomes package.size().
You can index characters in the vector as you would in your original char array (package[index]).
Use a deque. Sure, a vector will work and be fine, but a deque will use fragmented (non-contiguous) memory and be almost as fast.
How are you using your structure?
Is it like an array or like a string?
I would just typedef one of the C++ containers:
typedef std::string MyData; // or std::vector<char> if that is more appropriate
What you have written can work and is probably the best thing to do if you do not need to resize on the fly. If you find that you need to expand your array, you can run
char *grown = (char*)realloc(package, SOME_RUNTIME_SIZE);
if (grown) package = grown; // on failure realloc returns NULL and leaves the old block allocated
You can use an STL vector
#include <vector>
std::vector<char> myVec; // optionally myVec(SOME_RUNTIME_SIZE); note that "myVec()" would declare a function
that you can then resize using myVec.resize(newSize) or by using functions such as push_back that add to the vector and automatically resize it. The good thing about the vector solution is that it takes away many memory management issues: if the vector is stack-allocated, its destructor will be called when it goes out of scope and the dynamically allocated array underlying it will be deleted. However, if you pass the vector around by value, the data gets copied, which can be slow, so you may want to pass references (or pointers) to vectors instead.