Calculating size of vector of vectors in bytes - c++

typedef vector<vector<short>> Mshort;
typedef vector<vector<int>> Mint;
Mshort mshort(1 << 20, vector<short>(20, -1)); // Xcode shows 73MB
Mint mint(1 << 20, vector<int>(20, -1)); // Xcode shows 105MB
A short uses 2 bytes and an int 4 bytes; note that 1 << 20 = 2^20.
I am trying to calculate the memory usage ahead of time (on paper), but I am unable to.
sizeof(vector<>) // = 24 //no matter what type
sizeof(int) // = 4
sizeof(short) // = 2
I do not understand: mint should be double the size of mshort, but it isn't. When running the program with only the mshort initialisation, Xcode shows 73MB of memory usage; with only mint, 105MB.
mshort.size() * mshort[0].size() * sizeof(short) * sizeof(vector<short>) // = 1006632960
mint.size() * mint[0].size() * sizeof(int) * sizeof(vector<int>) // = 2013265920
//no need to use .capacity() because I fill vectors with -1
1006632960 * 2 = 2013265920
How does one calculate how much RAM a 2D std::vector or a 2D std::array will use?
I know the sizes ahead of time, and each row has the same number of columns.

The memory usage of your vectors of vectors will be e.g.
// the size of the data...
mshort.size() * mshort[0].size() * sizeof(short) +
// the size of the inner vector objects...
mshort.size() * sizeof mshort[0] +
// the size of the outer vector object...
// (this is ostensibly on the stack, given your code)
sizeof mshort +
// dynamic allocation overheads
overheads
The dynamic allocation overheads are because the vectors internally new memory for the elements they're to store, and for speed reasons they may have pools of fixed-sized memory areas waiting for new requests, so if the vector effectively does a new short[20] - with the data needing 40 bytes - it might end up with e.g. 48 or 64. The implementation may actually need to use some extra memory to store the array size, though for short and int there's no need to loop over the elements invoking destructors during delete[], so a good implementation will avoid that allocation and no-op destruction behaviour.
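Plugging the question's numbers into the formula above, and ignoring the allocation overheads for the moment (assuming the reported sizeof(vector<short>) of 24), gives roughly:
(1 << 20) * 20 * sizeof(short)   // 41,943,040 bytes of element data
+ (1 << 20) * 24                 // 25,165,824 bytes of inner vector<short> objects
+ 24                             // the outer vector object
// = 67,108,888 bytes, about 64 MiB; the per-allocation overheads
// account for most of the gap up to the ~73MB Xcode reports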
The actual data elements for any given vector are contiguous in memory though, so if you want to reduce the overheads, you can change your code to use fewer, larger vectors. For example, using one vector with (1 << 20) * 20 elements will have negligible overhead - then rather than accessing [i][j] you can access [i * 20 + j] - you can write a simple class wrapping the vector to do this for you, most simply with a v(i, j) notation...
inline short& operator()(size_t i, size_t j) { return v_[i * 20 + j]; }
inline short operator()(size_t i, size_t j) const { return v_[i * 20 + j]; }
...though you could support v[i][j] by having v.operator[] return a proxy object that can be further indexed with []. I'm sure if you search SO for questions on multi-dimension arrays there'll be some examples - think I may have posted such code myself once.
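For example, a minimal sketch of such a wrapper (the class and member names here are illustrative, not from the question or any library) could be:
#include <cstddef>
#include <vector>

// Fixed-size 2D matrix of short stored in one contiguous vector,
// indexed with m(i, j) instead of m[i][j].
class ShortMatrix {
public:
    ShortMatrix(std::size_t rows, std::size_t cols, short init = 0)
        : cols_(cols), v_(rows * cols, init) {}

    short& operator()(std::size_t i, std::size_t j)       { return v_[i * cols_ + j]; }
    short  operator()(std::size_t i, std::size_t j) const { return v_[i * cols_ + j]; }

private:
    std::size_t cols_;
    std::vector<short> v_;
};

// Usage: ShortMatrix m(1 << 20, 20, -1);  m(5, 3) = 42;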
The main reason to want vector<vector<x>> is when the inner vectors vary in length.

Assuming glibc malloc:
Each memory chunk allocated carries an additional 8-16 bytes (2 size_t) of memory block header. On a 64-bit system that is 16 bytes.
see code:
https://github.com/sploitfun/lsploits/blob/master/glibc/malloc/malloc.c#L1110
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk, if allocated            | |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of chunk, in bytes                       |M|P|
  mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             User data starts here...                          .
        .                                                               .
        .             (malloc_usable_size() bytes)                      .
        .                                                               |
It gives me approximately 83886080 for short when adding 16 bytes per row.
24 + 16 + mshort.size() /* 1048576 */ * (mshort[0].size() /* 20 */ * sizeof(short) /* 2 */ + sizeof(vector<short>) /* 24 */ + header /* 16 */)
It gives me approximately 125829120 for int.
But then I recomputed your numbers and it looks like you are on 32-bit...
short: 75497472, which is ~73M
int: 117440512, which is ~112M
That looks very close to the reported ones.
Use capacity() rather than size() to get the number of allocated items, even if the two are the same in your case.
Allocating a single vector of size rows * columns will save you header * 1048576 bytes.
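If you want to see the allocator's rounding directly on a glibc system, malloc_usable_size() (a glibc-specific function declared in <malloc.h>) reports the usable size of a chunk; a small sketch:
#include <cstdio>
#include <cstdlib>
#include <malloc.h>   // malloc_usable_size (glibc-specific)

int main() {
    // Request 40 bytes (room for 20 shorts); glibc hands back a chunk
    // whose usable size may be rounded up to its internal bucket size,
    // on top of the hidden chunk header counted in the estimate above.
    void* p = std::malloc(20 * sizeof(short));
    std::printf("requested %zu, usable %zu\n", 20 * sizeof(short), malloc_usable_size(p));
    std::free(p);
}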

Your calculation mshort.size() * mshort[0].size() * sizeof(short) * sizeof(vector<short>) // = 1006632960 is simply wrong. By that calculation, mshort would take 1006632960 bytes, which is 960 MiB, and that is not true.
Let's ignore libc's overhead, and just focus on std::vector<>'s size:
mshort is a vector of 2^20 items, each of which is a vector<short> with 20 items.
So the size shall be:
mshort.size() * mshort[0].size() * sizeof(short) // Size of all short values
+ mshort.size() * sizeof(vector<short>) // Size of 2^20 vector<short> objects
+ sizeof(mshort) // Size of mshort itself, which can be ignored as overhead
The calculated size is 64 MiB.
The same for mint, where the calculated size is 104 MiB.
So mint is simply NOT double the size of mshort.

Related

Problem with initialising 2D vector in C++

I was implementing a solution for this problem to get a feel for the language. My reasoning is as follows:
Notice that the pattern on the diagonal is 2*n+1.
The elements to the left and upwards are alternating arithmetic progressions or additions/subtractions of the elements from the diagonal to the boundary.
Create a 2D vector and instantiate all the diagonal elements. Then create a dummy variable to fill in the remaining parts by add/subtract the diagonal elements.
My code is as follows:
#include <iostream>
#include <vector>
using namespace std;

const long value = 1e9;
vector<vector<long>> spiral(value, vector<long>(value));
long temp;

void build() {
    spiral[0][0] = 1;
    for (int i = 1; i < 5e8; i++) {
        spiral[i][i] = 2 * i + 1;
        temp = i;
        long counter = temp;
        while (counter) {
            if (temp % 2 == 0) {
                spiral[i][counter]++;
                spiral[counter][i]--;
                counter--;
                temp--;
            } else {
                spiral[i][counter]--;
                spiral[counter][i]++;
                counter--;
                temp--;
            }
        }
    }
}

int main() {
    spiral[0][0] = 1;
    build();
    int y, x;
    cin >> y >> x;
    cout << spiral[y][x] << endl;
}
The problem is that the program doesn't output anything. I can't figure out why my vector won't print any elements. I've tested it with spiral[1][1] and all I get is some obscure assembler message after waiting 5 or 10 minutes. What's wrong with my reasoning?
A long is probably 4 or 8 bytes for you (e.g. commonly 4 bytes on Windows, 4 bytes on x86 Linux, and 8 bytes on x64 Linux), so let's assume 4. 1e9 * 4 is 4 gigabytes of contiguous memory for each vector<long>(value).
Then the outer vector creates another 1e9 copies of that, which is 4 exabytes (or 4 million terabytes) with a 32-bit long, or double that for a 64-bit long, ignoring the overhead of each std::vector object. It is highly unlikely that you have that much memory and swapfile, and, being a global, this is attempted before main() is called.
So you are not going to be able to store all this data directly, you will need to think about what data actually needs to be stored to get the result you desire.
If you run under a debugger set to stop on exceptions, you might see a std::bad_alloc getting thrown, with the call stack indicating the cause (e.g. Visual Studio will display something like "dynamic initializer for 'spiral'" in the call stack). On Linux, however, the OS may just kill the process first: Linux can over-commit memory (so new etc. succeeds), and then when some program actually reads or writes that memory and nothing is free, the kernel SIGKILLs something to free memory. This doesn't seem entirely predictable; I copy-pasted your code onto Ubuntu 18 and on the command line got "terminate called after throwing an instance of 'std::bad_alloc'".
The problem actually asks you to find an analytical formula for the solution, not to simulate the pattern. All you need to do is to carefully analyze the pattern:
#include <algorithm>  // std::max, std::swap
#include <cassert>

unsigned int get_n(unsigned int row, unsigned int col) {
    assert(row >= 1 && col >= 1);
    const auto n = std::max(row, col);
    if (n % 2 == 0)
        std::swap(row, col);
    if (col == n)
        return n * n + 1 - row;
    else
        return (n - 1) * (n - 1) + col;
}
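A minimal driver for that function, matching the original program's input of y then x (just a sketch to show the usage):
#include <iostream>

int main() {
    unsigned int y, x;
    std::cin >> y >> x;
    // get_n as defined above; e.g. y = 3, x = 6 prints 28
    std::cout << get_n(y, x) << std::endl;
}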
Math is your friend, here, not std::vector. One of the constraints of this puzzle is a memory limit of 512MB, but a vector big enough for all the tests would require several GB of memory.
Consider how the square is filled. If you choose the maximum between the given x and y (call it w), you have "delimited" a square of size w². Now you have to consider the outer edge of this square to find the actual index.
E.g. take x = 6 and y = 3. The maximum is 6 (even, remember the zig zag pattern), so the number is (6 - 1)² + 3 = 28
* * * * * 26
* * * * * 27
* * * * * [28]
* * * * * 29
* * * * * 30
36 35 34 33 32 31
Here, a proof of concept.

C++ pointer pointer size

I have the following code:
int main() {
int N = 1000000;
int D = 1;
double **POINTS = new double * [N];
for (unsigned i=0;i<=N-1;i++) POINTS[i] = new double [D];
for (unsigned j=0;j<=D-1;j++) POINTS[0][j] = 0;
for (int i = 0; i < N; ++i)
{
for (int j = 0; j < D; ++j)
{
POINTS[i][j] = 3.14;
}
}
}
If the size of each pointer is 8 and N = 10^6 and D = 1, the expected size of POINTS is 8 * 10^6 * 1 / 1000 / 1000 = 8 MB, but in fact this program eats 42 MB of memory. If N = 2 * 10^6, the expectation is 16 MB, but it is actually 84 MB. Why?
There are lots of possible reasons:
Every memory allocation probably comes with some overhead so the memory manager can keep track of things. Lots of small allocations (like you have) mean you probably have more tied up in overhead than you do in data.
Memory normally comes in "pages". If you dynamically allocate 1 byte then your size likely grows by the size of 1 page. (The first time - not every 1 byte allocation will get you a whole new page)
Objects may have padding applied. If you allocate one byte it probably gets padded out to "word size" and so you use more than you think
As you allocate and free objects you can create "holes" (fragmentation). You want 8 bytes but there is only a 4 byte hole in this page? You'll get a whole new page.
In short, there is no simple way to explain why your program is using more memory than you think it should, and unless you are having problems you probably shouldn't care. If you are having problems "valgrind" and similar tools will help you find them.
Last point: dynamically allocated 2d arrays are one of the easiest ways to create the problems mentioned above.
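As an illustration of that last point, here is a sketch of the same fill using one contiguous allocation instead of a million tiny ones (keeping the question's N and D):
#include <cstddef>
#include <vector>

int main() {
    const int N = 1000000;
    const int D = 1;

    // One allocation holding all N*D doubles: no per-row allocation
    // headers, no per-row padding, no fragmentation between rows.
    std::vector<double> points(static_cast<std::size_t>(N) * D, 0.0);

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < D; ++j)
            points[static_cast<std::size_t>(i) * D + j] = 3.14;   // was POINTS[i][j]
}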

How do I allocate size for a dynamic array in c++?

So I wrote a method in C++ where I remove a range of elements in an array. The thing is, this is a dynamic array and the size of the array must always be a certain size. So if I remove a lot of elements from the array and leave at least 5 empty spaces, then I need to remove those 5 empty spaces. I already wrote a similar method where I remove one element. This is the line that checks to see if there's too much space:
if (size - 1 == allocated_size - BLOCK_SIZE){
Where size is the number of elements in the array, allocated_size is how much space is in the array, and BLOCK_SIZE is 5. So with my other remove method I need to do a similar check; however, what if I have an array of 15 elements and I remove 10 elements? Then I would have to remove 10 spaces from the array, but I'm not sure how to do that. Here's what I have right now:
if (size - range <= allocated_size - BLOCK_SIZE){
try {
new_array = new int[allocated_size - BLOCK_SIZE];
} catch (bad_alloc){
throw exception (MEMORY_EXCEPTION);
}
Where range is the number of elements I'm removing. My theory is that maybe I could make another variable, and when I declare the array I say allocated_size - BLOCK_SIZE * n, so if I need to remove 10 spaces then n would be 2. The problem is I'm having trouble implementing that.
You could use some int arithmetic.
The number of empty slots in your array will be:
int empty_slots = allocated_size - size;
The number of empty blocks will be:
int empty_blocks = empty_slots / 5;
Integer division truncates, so for 0-4 empty slots you will have 0 empty blocks, for 5-9 empty slots you will have 1 empty block, etc.
But don't you really want to know how big to make your new array? Wouldn't that always follow from size, so:
int blocks_needed = size / 5; // truncates
if (size % 5 > 0) {
    blocks_needed = blocks_needed + 1; // add a block if needed
}
new_array = new int[blocks_needed * 5];
or size + extra capacity if you want some extra capacity in your array.
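As a side note, the usual idiom for rounding up to a multiple of BLOCK_SIZE without the extra if is plain integer arithmetic; a sketch using the names from the question:
// Smallest multiple of BLOCK_SIZE that can hold `size` elements.
int blocks_needed = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;   // rounds up
allocated_size = blocks_needed * BLOCK_SIZE;
new_array = new int[allocated_size];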

Why are 3D vectors in C++ larger in RAM than a 1D vector

I've discovered by accident that an STL vector defined as follows:
vector < float > test;
test.resize(10000 * 10000 * 5);
Uses up significantly less space in RAM than the following definition:
std::vector<std::vector<std::vector< float > > > test;
test.resize(10000);
for(int i = 0;i < 10000;i++)
{
test[i].resize(10000);
for(int j = 0;j < 10000;j++)
{
test[i][j].resize(5);
}
}
The linear vector method (the top one) uses the correct amount of RAM (2 GB), as would be calculated by hand. So my question is: why does a 3D vector use up way more RAM than a linear one? I found it was significantly more in this example (about 4 GB).
In the former case you have:
sizeof(vector<float>) // outermost vector
+ 10000 * 10000 * 5 * sizeof(float) // xyz space
In the latter you have:
sizeof(vector<vector<vector<float>>>) // outermost vector
+ 10000 * sizeof(vector<vector<float>>) // x axis
+ 10000 * 10000 * sizeof(vector<float>) // xy plane
+ 10000 * 10000 * 5 * sizeof(float) // xyz space
The typical value for sizeof(vector<T>) for any T is 3 * sizeof(T*), which is also, I believe, the minimal value allowed by the standard—capacity must be distinct from size because reserve() must change the value of capacity() but not of size().
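Taking sizeof(vector<T>) as 24 bytes (three pointers on a typical 64-bit implementation), the latter layout works out to roughly:
24                                    // outermost vector object
+ 10000 * 24                          // ~240 KB of vector<vector<float>> objects
+ 10000 * 10000 * 24                  // ~2.4 GB of vector<float> objects
+ 10000 * 10000 * 5 * sizeof(float)   // ~2.0 GB of float data
// ≈ 4.4 GB before allocator overheads, versus ~2 GB for the flat vector,
// which is in the same ballpark as the ~4 GB observed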
The vector class uses up memory to hold onto additional pointers. When you allocate as a 1D vector, you only have the 1 pointer and big block of memory it points to. In the vector of vector of vector's case, you have 10,000 * 10,000 * 5 vectors, each with a 4 byte pointer, taking up 2 billion extra bytes, just to hold onto location information.
EDIT
As Andrey pointed out in the comments, you are not actually setting up 10,000 * 10,000 * 5 vectors, rather:
1D - the top level vector sets aside space for 10,000 vectors beneath it
2D - each of those 10,000 vectors sets up another 10,000 vectors
3D - the final level is just the actual data so...
you have 10,000 initial vectors, and the 10,000 * 10,000 vectors below, for a total of 100,010,000 vectors. Another user mentioned about 20 bytes of space taken by each vector (for the memory pointer as well as other members of the class, like size, capacity and whatnot), so you end up with about 2 billion bytes.
The vector class has some additional overhead. There is at a minimum a pointer and a field for the size. In MS visual studio sizeof(std::vector) is 20 or 24 (with iterator debugging enabled). The actual size is going to be implementation dependent.
Each vector object has an overhead - as they can be variable length, each needs a pointer to the start of its data, the size of the used part, and its capacity.
If you have a fixed-size 3D shape where the height, width and length are known, you can convert it into a 1D array. This would be the most efficient.
To do this, given x, y, z, the index x + width * ((z * height) + y) will do the conversion.
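A small sketch of that conversion wrapped in a class (the names are illustrative):
#include <cstddef>
#include <vector>

// Fixed-size 3D grid stored in one contiguous vector<float>.
class Grid3D {
public:
    Grid3D(std::size_t width, std::size_t height, std::size_t depth)
        : width_(width), height_(height), data_(width * height * depth, 0.0f) {}

    // index = x + width * ((z * height) + y), as described above
    float& at(std::size_t x, std::size_t y, std::size_t z) {
        return data_[x + width_ * (z * height_ + y)];
    }

private:
    std::size_t width_, height_;
    std::vector<float> data_;
};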

Reading a dataset file(Hex values) onto a block of memory-part 2

I have a block of code that is trying to read the data from a dataset onto a randomly allocated block of memory. I don't know exactly what is inside the dataset, but it accesses matrix values (hex values) and puts them onto a memory location. And it works perfectly fine!
const unsigned int img_size = numel_img * sizeof(float); // (1248*960)*4 bytes = 4.79MB
for (unsigned int i = 0; i < p_rctheader->glb_numImg; ++i) // 0 to 496 (total no of images)
{
    const unsigned int cur_proj = i; // absolute projection number
    // read proj mx
    double* const pProjMx = pProjMatrixBuffers + cur_proj * 12;
    ifsData.read((char*) (pProjMx), 12 * sizeof(double));
    ifsData.seekg(img_size, ios::cur);
}
where pProjMatrixBuffers is
double** pProjMatrixBuffers = new double* [rctheader.glb_numImg];
pProjMatrixBuffers[0] = new double[rctheader.glb_numImg * 12];
for (unsigned int i = 1; i < rctheader.glb_numImg; ++i) {
    pProjMatrixBuffers[i] = pProjMatrixBuffers[0] + i * 12;
}
There is a another read operation after this :
rctgdata.adv_pProjBuffers = new float* [num_proj_buffers]; // 124 buffers
rctgdata.adv_pProjBuffers[0] = new float[num_proj_buffers * numel_img]; // (1.198MB per image * 124) * 4 bytes
// set it to zero
memset(rctgdata.adv_pProjBuffers[0], 0, num_proj_buffers * numel_img * sizeof(float));
for (unsigned int i = 1; i < num_proj_buffers; ++i) {
    rctgdata.adv_pProjBuffers[i] = rctgdata.adv_pProjBuffers[0] + i * numel_img;
}
for (unsigned int i = 0; i < numProjsInIteration; ++i) // (0 to 124)
{
    const unsigned int cur_proj = numProcessedProjs + i; // absolute projection number // 0+124 -> 124+124 -> 248+124 -> 372+124
    // read proj mx
    ifsData.read((char*) (pProjMatrixBuffers[cur_proj]), 12 * sizeof(double));
    // read image data
    ifsData.read((char*) (rctgdata.adv_pProjBuffers[i]), numel_img * sizeof(float));
}
EDIT:
Basically, this code reads a projection matrix from the dataset, which is 12 doubles, followed by 1248*960 image pixels (floats). This goes on 124 times inside the for loop.
Q1. If you look at the above code, pProjMatrixBuffers[cur_proj] is read into twice, which could have been done once. (Correct me if I am wrong.)
Q2. How will rctgdata.adv_pProjBuffers[i] know where to start accessing the data in the dataset? I mean the location in the dataset. I am sorry if I have confused you; please ask me for more information if needed. Thank you so much for all the help in advance!
There is no way a 2-dimensional MxN-array can be allocated as such using new. The workaround in this code consists of an allocation of a 1-dimensional array of M pointers and another allocation of an array for the MxN elements. Then the M pointers are set to point to the M first elements of each row within the array for the elements.
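In generic form, that workaround (the same pattern the question's code uses for pProjMatrixBuffers and adv_pProjBuffers) looks roughly like this, with placeholder dimensions M and N:
#include <cstddef>

// Allocate an M x N array of doubles as one contiguous block plus M row pointers.
double** make_2d(std::size_t M, std::size_t N) {
    double** rows = new double*[M];   // M row pointers
    rows[0] = new double[M * N];      // one block holding all M*N elements
    for (std::size_t i = 1; i < M; ++i)
        rows[i] = rows[0] + i * N;    // each row pointer points into the block
    return rows;
}

// To free: delete[] rows[0]; delete[] rows;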
Here we have two 2-dimensional arrays which I call (for obvious reasons) D and F. It's not clear how big D is - what's the value of rctheader.glb_numImg?
The first loop reads 12 doubles into a row of D and skips the float data for a row of F, doing a seekg with the appropriate positive offset to be added to the current position (i.e., forward). This is done rctheader.glb_numImg times.
There is something I don't see in this code: a single seekg back to the beginning of the file, after the first loop and before the second loop.
The second loop reads (once more) 12 doubles for each of the 124 rows and then, in one fell swoop, 1248*960 floats for each row. There is no need to reposition after these reads since the data for the second image immediately follows the data for the first image, and so on. (It's slightly irritating that num_proj_buffers and numProjsInIteration should have the same value, i.e., 124.)
It looks as if the second read loop would re-read what the first loop read. But since I don't know for sure that p_rctheader->glb_numImg is also 124, I can't really confirm that.
Calculating the size of what is read by the 124 iterations of the second loop as
(1248*960*4 + 12*8)*124
this would account for ~0.5 GB - but the file size was reported as being ~2.5 GB.
Also note that one index within the second loop is computed as
unsigned int cur_proj = numProcessedProjs + i;
but the initial setting of numProcessedProjs is unclear.
To answer Q2, you allocate one big block of memory with new double[header.numImg * 12], and you also allocate a bunch of row pointers with new double* [header.numImg]. The first row pointer [0] points at the beginning of the memory (because it was used in the new call). The for loop then sets each row pointer [i] to point into the big block at 12-item increments (so each row should have 12 items in it). So for instance [1] points at the 12th item in the big block, [2] points at the 24th item, etc.
I haven't quite figured out what you mean by Q1 yet.