I have the following code:
int main() {
    int N = 1000000;
    int D = 1;
    double** POINTS = new double*[N];
    for (unsigned i = 0; i < N; i++) POINTS[i] = new double[D];
    for (unsigned j = 0; j < D; j++) POINTS[0][j] = 0;
    for (int i = 0; i < N; ++i)
    {
        for (int j = 0; j < D; ++j)
        {
            POINTS[i][j] = 3.14;
        }
    }
}
If each pointer is 8 bytes, N = 10^6, and D = 1, the expected size of POINTS is 8 * 10^6 * 1 / 1000 / 1000 = 8 MB, but in fact this program eats 42 MB of memory. With N = 2 * 10^6 the expectation is 16 MB, but the actual usage is 84 MB. Why?
There are lots of possible reasons:
Every memory allocation probably comes with some overhead so the memory manager can keep track of things. Lots of small allocations (like you have) mean you probably have more memory tied up in overhead than in actual data.
Memory normally comes in "pages". If you dynamically allocate 1 byte, your process size likely grows by the size of a whole page. (Only the first time - not every 1-byte allocation will get you a whole new page.)
Objects may have padding applied. If you allocate one byte, it probably gets padded out to "word size", so you use more than you think.
As you allocate and free objects you can create "holes" (fragmentation). You want 8 bytes but there is only a 4-byte hole in this page? You'll get a whole new page.
In short, there is no simple way to explain why your program is using more memory than you think it should, and unless you are having problems you probably shouldn't care. If you are having problems, "valgrind" and similar tools will help you find them.
Last point: dynamically allocated 2d arrays are one of the easiest ways to create the problems mentioned above.
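A common way to sidestep most of these per-allocation costs in the code from the question is to grab one contiguous block for all N*D doubles and point the rows into it. A minimal sketch of that idea (my rewrite, not the asker's code):

#include <cstddef>

int main() {
    const std::size_t N = 1000000;
    const std::size_t D = 1;

    // One allocation for all N*D doubles instead of N tiny ones:
    // the allocator's per-block bookkeeping is paid once, not N times.
    double* data = new double[N * D];
    double** POINTS = new double*[N];
    for (std::size_t i = 0; i < N; ++i)
        POINTS[i] = data + i * D; // row i starts at offset i*D

    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < D; ++j)
            POINTS[i][j] = 3.14;

    delete[] POINTS;
    delete[] data;
}

With this layout the pointer table costs 8 MB and the data another 8 MB, so the total should land much closer to the back-of-the-envelope estimate.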
Related
I was implementing a solution for this problem to get a feel for the language. My reasoning is as follows:
Notice that the values on the diagonal follow the pattern 2*n + 1.
The elements to the left of and above the diagonal are alternating arithmetic progressions, i.e., additions/subtractions from the diagonal elements out to the boundary.
Create a 2D vector and instantiate all the diagonal elements, then use a dummy variable to fill in the remaining parts by adding/subtracting from the diagonal elements.
My code is as follows:
#include <iostream>
#include <vector>
using namespace std;
const long value = 1e9;
vector<vector<long>> spiral(value, vector<long>(value));
long temp;
void build(){
    spiral[0][0] = 1;
    for(int i = 1; i < 5e8; i++){
        spiral[i][i] = 2*i + 1;
        temp = i;
        long counter = temp;
        while(counter){
            if(temp % 2 == 0){
                spiral[i][counter]++;
                spiral[counter][i]--;
                counter--;
                temp--;
            }else{
                spiral[i][counter]--;
                spiral[counter][i]++;
                counter--;
                temp--;
            }
        }
    }
}
int main(){
    spiral[0][0] = 1;
    build();
    int y, x;
    cin >> y >> x;
    cout << spiral[y][x] << endl;
}
The problem is that the program doesn't output anything. I can't figure out why my vector won't print any elements. I've tested it with spiral[1][1], and all I get is some obscure assembler message after waiting 5 or 10 minutes. What's wrong with my reasoning?
A long is probably 4 or 8 bytes for you (commonly 4 bytes on Windows and on x86 Linux, 8 bytes on x64 Linux), so let's assume 4. 1e9 * 4 bytes is 4 gigabytes of contiguous memory for each vector<long>(value).
Then the outer vector creates another 1e9 copies of that, which is 4 exabytes (4 million terabytes) with a 4-byte long, or double that with an 8-byte long, ignoring the per-vector overhead of each std::vector. It is highly unlikely that you have that much memory and swap file, and since spiral is a global, this allocation is attempted before main() is even called.
So you are not going to be able to store all this data directly, you will need to think about what data actually needs to be stored to get the result you desire.
If you run under a debugger set to stop on exceptions, you might see a std::bad_alloc being thrown, with the call stack indicating the cause (e.g. Visual Studio will display something like "dynamic initializer for 'spiral'" in the call stack). On Linux it is possible the OS will simply kill the process first: Linux can over-commit memory (so new etc. succeeds), and only when a program actually reads or writes the memory does it fail (over-committed, nothing free), at which point the kernel SIGKILLs something to free memory. This isn't entirely predictable; when I copy-pasted your code onto Ubuntu 18, on the command line I got "terminate called after throwing an instance of 'std::bad_alloc'".
The problem actually asks you to find an analytical formula for the solution, not to simulate the pattern. All you need to do is to carefully analyze the pattern:
#include <algorithm> // std::max, std::swap
#include <cassert>

unsigned int get_n(unsigned int row, unsigned int col) {
    assert(row >= 1 && col >= 1);
    // Cells with max(row, col) == n form the outer edge of an n x n square.
    const auto n = std::max(row, col);
    if (n % 2 == 0) // even edges are numbered in the opposite direction
        std::swap(row, col);
    if (col == n)
        return n * n + 1 - row;         // edge column: count back from n*n
    else
        return (n - 1) * (n - 1) + col; // edge row: count up from (n-1)*(n-1)
}
Math is your friend, here, not std::vector. One of the constraints of this puzzle is a memory limit of 512MB, but a vector big enough for all the tests would require several GB of memory.
Consider how the square is filled. If you choose the maximum between the given x and y (call it w), you have "delimited" a square of size w². Now you have to consider the outer edge of this square to find the actual index.
E.g. take x = 6 and y = 3. The maximum is 6 (even, remember the zig-zag pattern), so the number is (6 - 1)² + 3 = 28
* * * * * 26
* * * * * 27
* * * * * [28]
* * * * * 29
* * * * * 30
36 35 34 33 32 31
Here is a proof of concept: a minimal driver for get_n (my addition, assuming the 1-based row/column convention implied by the assert):
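#include <iostream>

int main() {
    unsigned int y, x;
    std::cin >> y >> x;               // 1-based row and column
    std::cout << get_n(y, x) << '\n'; // e.g. y = 3, x = 6 prints 28
    return 0;
}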
I managed to reduce the problem to the following code, which uses almost 500MB of memory when it runs on my laptop - which in turn causes a std::bad_alloc in the full program. What is the problem here? As far as I can see, the unordered map should only use something like (32+32)*4096*4096 bits = 134.2MB, which is not even close to what the program uses.
#include <iostream>
#include <unordered_map>
using namespace std;
int main()
{
    unordered_map<int,int> a;
    long long z = 0;
    for (int x = 0; x < 4096; x++)
    {
        for (int y = 0; y < 4096; y++)
        {
            z = 0;
            for (int j = 0; j < 4; j++)
            {
                z ^= ((x>>(3*j))%8)<<(3*j);
                z ^= ((y>>(3*j))%8)<<(3*j + 12);
            }
            a[z]++;
        }
    }
    return 0;
}
EDIT: I'm aware that some of the bit shifting here can cause undefined behaviour, but I'm 99% sure that's not what's the problem.
EDIT2: What I need is essentially to count the number of x in a given set that some function maps to each y in a second set (of size 4096*4096). Would it be better to store these counts in an array? I.e. I have a function f: A to B, and I need to know the size of the set {x in A : f(x) = y} for each y in B. In this case A and B are both the set of non-negative integers less than 2^12 = 4096. (Ideally I would like to extend this to 2^32.)
... which uses almost 500MB of memory ... What is the problem here?
There isn't really a problem, per se, with the memory usage you are observing. std::unordered_map is built to run fast for large numbers of elements. As such, memory isn't a top priority. For example, in order to optimize for resizing, it often pre-allocates some hash buckets upon creation. Also, your measure of the count of elements multiplied by the element's size does not take into account the actual data-structure memory footprint of each node in this map, which involves at least a pointer or two to adjacent elements in its bucket's list.
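To put a rough number on that per-node overhead, here is a minimal sketch (my own, not from the answer) that plugs a byte-counting allocator into std::unordered_map; the exact figure varies by standard library:

#include <cstddef>
#include <iostream>
#include <unordered_map>

// Tracks the net number of bytes the container currently holds.
static std::size_t g_live_bytes = 0;

template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_live_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

int main() {
    std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                       CountingAlloc<std::pair<const int, int>>> m;
    for (int i = 0; i < 1000000; ++i) m[i] = i;
    // The payload is 8 bytes per element; the printed figure is typically
    // several times larger, covering node pointers and the bucket array.
    std::cout << g_live_bytes / double(m.size()) << " bytes per element\n";
}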
Having said that, it isn't clear you even need std::unordered_map in this scenario. Instead, given that the mapping you're trying to store is defined as
{x in A : f(x) = y} for each y in B
you could have one fixed-size array (use std::array for that) that simply holds, for each index i representing an element of set B, the number of elements from set A that meet the criterion.
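A minimal sketch of that idea, reusing the loop from the question. One assumption on my part: I use a heap-allocated std::vector instead of the suggested std::array, because 4096*4096 32-bit counters come to 64 MB, far too large for the stack:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    // One counter per possible output value of f: 4096*4096 of them, 64 MB total.
    std::vector<std::uint32_t> counts(4096u * 4096u, 0);
    long long z = 0;
    for (int x = 0; x < 4096; x++)
    {
        for (int y = 0; y < 4096; y++)
        {
            z = 0;
            for (int j = 0; j < 4; j++)
            {
                z ^= ((x>>(3*j))%8)<<(3*j);
                z ^= ((y>>(3*j))%8)<<(3*j + 12);
            }
            ++counts[static_cast<std::size_t>(z)]; // replaces a[z]++, no per-node overhead
        }
    }
    std::cout << counts[0] << '\n'; // e.g. how many (x, y) pairs map to 0
    return 0;
}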
Could someone please explain what each line is really doing here? I know overall it allocates 4 spaces, but I don't understand the details.
int** arrayArray;
arrayArray = new int*[4];
for (int x = 0; x < 4; ++x)
{ arrayArray[x] = new int[4]; }
Your array is a 2D array, so its size has the form "x by y". The total number of ints the array contains is x*y. I will call x the row count and y the column count.
It seems like you need an array with 4 ints in total, so your row count and column count should have a product of 4, for example 2x2. Keep in mind that the following answer deals with a 4x4 array, which holds 16 ints in total.
Allocation:
int** arrayArray;
This line declares a variable arrayArray, which is a pointer to a pointer to int, like
(arrayArray) -> * -> int
So arrayArray[0] gives you a pointer to int, and arrayArray[0][0] therefore gives you an int (one that is in the array).
arrayArray = new int*[4];
This line allocates space that can hold 4 pointers to int, and sets arrayArray to point to that space (don't forget that arrayArray is a pointer to a pointer to int).
Here the 4 is the row count.
for (int x = 0; x < 4; ++x)
arrayArray[x] = new int[4];
For every pointer in arrayArray, this sets it to point to space for 4 ints.
Here the 4 in new int[4] is the column count.
So the structure of this array will be something like

arrayArray --> [ ptr0 | ptr1 | ptr2 | ptr3 ]
                  |      |      |      |
                  v      v      v      v
               4 ints  4 ints  4 ints  4 ints
Deallocation (free):
Notice that arrayArray by itself is just a pointer to 4 other pointers. If you want to free the entire array, you can't just free the space holding the 4 pointers; you need to free every block those pointers point to as well, or you will leak memory. Freeing is just the reverse of allocation: first free each row that the pointers in arrayArray point to:
for (int x = 0; x < 4; ++x)
delete[] arrayArray[x];
In C++, if you want to release space allocated with new something[], you need delete[] instead of delete.
Then free the array itself:
delete[] arrayArray;
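Putting both halves together, a minimal complete sketch assembled from the snippets above:

int main() {
    // Allocate: 4 row pointers, then 4 ints per row.
    int** arrayArray = new int*[4];
    for (int x = 0; x < 4; ++x)
        arrayArray[x] = new int[4];

    arrayArray[2][3] = 42; // use it like a normal 2D array

    // Free in reverse order: every row first, then the pointer array itself.
    for (int x = 0; x < 4; ++x)
        delete[] arrayArray[x];
    delete[] arrayArray;
    return 0;
}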
int** arrayArray;
arrayArray = new int*[4];
The above code allocates space for 4 pointers of type int* and makes arrayArray point to it.
The next couple of lines then allocate 4 int spaces for each of those int*, so a total of 4*4 = 16 integer spaces.
for (int x = 0; x < 4; ++x)
{ arrayArray[x] = new int[4];}
Now, as per your comment, you want only 4 integer spaces in total. That means your 2D array should be 2x2.
So your code should look something like:
int** arrayArray;
arrayArray = new int*[2];
for (int x = 0; x < 2; ++x)
{ arrayArray[x] = new int[2]; }
This way you allocate only 2 int* slots and two blocks of 2 ints for them to point to, i.e., 4 integer spaces in total.
I have a block of code that reads data from a dataset into a dynamically allocated block of memory. I don't know exactly what is inside the dataset, but the code accesses matrix values (hex values) and puts them at a memory location. And it works perfectly fine!
const unsigned int img_size = numel_img * sizeof(float); // (1248*960) * 4 bytes ≈ 4.79 MB
for (unsigned int i = 0; i < p_rctheader->glb_numImg; ++i) // 0 to 496 (total no. of images)
{
    const unsigned int cur_proj = i; // absolute projection number
    // read proj mx: row cur_proj of the single block allocated below
    double* const pProjMx = pProjMatrixBuffers[0] + cur_proj * 12;
    ifsData.read((char*) (pProjMx), 12 * sizeof(double));
    ifsData.seekg(img_size, ios::cur); // skip this projection's image data
}
where pProjMatrixBuffers is
double** pProjMatrixBuffers = new double*[rctheader.glb_numImg];
pProjMatrixBuffers[0] = new double[rctheader.glb_numImg * 12]; // one block, 12 doubles per image
for (unsigned int i = 1; i < rctheader.glb_numImg; ++i) {
    pProjMatrixBuffers[i] = pProjMatrixBuffers[0] + i * 12;
}
There is another read operation after this:
rctgdata.adv_pProjBuffers = new float*[num_proj_buffers]; // 124 buffers
rctgdata.adv_pProjBuffers[0] = new float[num_proj_buffers * numel_img]; // ~1.198M pixels per image * 124 * 4 bytes
// set it to zero
memset(rctgdata.adv_pProjBuffers[0], 0, num_proj_buffers * numel_img * sizeof(float));
for (unsigned int i = 1; i < num_proj_buffers; ++i) {
    rctgdata.adv_pProjBuffers[i] = rctgdata.adv_pProjBuffers[0] + i * numel_img;
}
for (unsigned int i = 0; i < numProjsInIteration; ++i) // 0 to 124
{
    const unsigned int cur_proj = numProcessedProjs + i; // absolute projection number: 0+124 -> 124+124 -> 248+124 -> 372+124
    // read proj mx
    ifsData.read((char*) (pProjMatrixBuffers[cur_proj]), 12 * sizeof(double));
    // read image data
    ifsData.read((char*) (rctgdata.adv_pProjBuffers[i]), numel_img * sizeof(float));
}
EDIT:
Basically this code reads a projection matrix (12 doubles) from the dataset, followed by 1248*960 image pixels (floats). This repeats 124 times inside the for loop.
Q1. If you look at the above code, pProjMatrixBuffers[cur_proj] is read twice, which could have been done once. (Correct me if I am wrong.)
Q2. How will rctgdata.adv_pProjBuffers[i] know where to start accessing the data from in the dataset? I mean the location in the dataset. I am sorry if I have confused you; please ask me for more information if needed. Thank you so much for all the help in advance!
There is no way a 2-dimensional MxN array can be allocated as such using new when both dimensions are only known at run time. The workaround in this code consists of one allocation for a 1-dimensional array of M pointers and another allocation for the MxN elements themselves. Then the M pointers are set to point to the first element of each of the M rows within the element array.
Here we have two 2-dimensional arrays which I call (for obvious reasons) D and F. It's not clear how big D is - what's the value of rctheader.glb_numImg?
The first loop reads 12 doubles into a row of D and skips the float data for a row of F, doing a seekg with the appropriate positive offset to be added to the current position (i.e., forward). This is done rctheader.glb_numImg times.
There is something I don't see in this code: a single seekg back to the beginning of the file, after the first loop and before the second loop.
The second loop reads (once more) 12 doubles for each of the 124 rows and then, in one fell swoop, 1248*960 floats for each row. There is no need to reposition after these reads since the data for the second image immediately follows the data for the first image, and so on. (It's slightly irritating that num_proj_buffers and numProjsInIteration should have the same value, i.e., 124.)
It looks as if the second read loop would re-read what the first loop read. But since I don't know for sure that p_rctheader->glb_numImg is also 124, I can't really confirm that.
Calculating the size of what is read by the 124 iterations of the second loop as
(1248*960*4 + 12*8)*124
this would account for ~0.5 GB - but the file size was reported as being ~2.5 GB.
Also note that one index within the second loop is computed as
unsigned int cur_proj = numProcessedProjs + i;
but the initial setting of numProcessedProjs is unclear.
To answer Q2: you allocate one big block of memory with new double[header.numImg * 12], and you also allocate a bunch of row pointers with new double*[header.numImg]. The first row pointer [0] points at the beginning of the block (because it was used in the new call). The for loop then sets each row pointer [i] to point into the big block at 12-item increments, so each row holds 12 items. So, for instance, [1] points at the 12th item in the big block, [2] points at the 24th item, etc.
I haven't quite figured out what you mean by Q1 yet.
I am developing an application based on Qt, and I need to use dynamically sized vectors (QVector). When checking the size of a vector, it was larger than it should be; I tested with an STL vector and got the same result. Below I demonstrate the problem in code with an STL vector. This situation prevents us from knowing the actual size of the vector and using it properly. How can this be fixed? Thank you for your help.
Compiler: GCC 4.5.2
OS: Linux Ubuntu 11.04
Observation: the capacity of the vector is always a power of 2
The code is:
#include <cmath>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    double PI = 3.1415926536, delta = PI/(100/2);
    vector<double> A(0);
    vector<double> B(0);
    cout << "Capacity A = " << A.capacity() << "; Capacity B = " << B.capacity() << endl;
    for (int i = 0; i < 100; i++) {
        A.push_back(i*delta);
        B.push_back(sin(A[i]));
        cout << "A(" << i << ") = " << A[i] << "; B(" << i << ") = " << B[i] << " " << "Size A = " << A.capacity() << "; Size B = " << B.capacity() << endl;
    }
    // NOTE: this loop runs up to capacity(), not size(), so A[i] and B[i]
    // read past the last element -- undefined behavior.
    for (unsigned i = 0; i < A.capacity(); i++) {
        cout << "A(" << i << ") = " << A[i] << "; B(" << i << ") = " << B[i] << " " << "Size A = " << A.capacity() << "; Size B = " << B.capacity() << endl;
    }
    cout << "Size A = " << A.capacity() << "; Size B = " << B.capacity() << endl;
}
The output is:
Capacity A = 0; Capacity B = 0
A(0) = 0; B(0) = 0 Size A = 1; Size B = 1
A(1) = 0.0628319; B(1) = 0.0627905 Size A = 2; Size B = 2
A(2) = 0.125664; B(2) = 0.125333 Size A = 4; Size B = 4
A(3) = 0.188496; B(3) = 0.187381 Size A = 4; Size B = 4
.
A(99) = 6.22035; B(99) = -0.0627905 Size A = 128; Size B = 128
.
A(126) = 0; B(126) = 1.31947 Size A = 128; Size B = 128
A(127) = 0; B(127) = 1.3823 Size A = 128; Size B = 128
Size A = 128; Size B = 128
What you're seeing is std::vector's growth strategy. To make push_back fast in the common case, it reserves more memory than is currently needed, so that it doesn't have to reallocate on every insertion.
As you can see, the larger it gets, the more is reserved. capacity is the function that tells you this amount. You can test this theory using reserve: it tells the vector how much memory to reserve, after which capacity will report that number, as long as no operations force another change in reserved memory. reserve is generally useful if you're about to push_back a large number of elements and you want the vector to allocate once, instead of however many times it would have done automatically.
The function you're looking for is size, which gives you the number of elements in your vector. The function associated with it is resize, just as reserve is associated with capacity. That is, if you had 5 elements and call resize(10), you gain 5 value-initialized new ones, and size returns 10.
Why are you interested in capacity? Are you focusing on memory usage? The capacity method is not needed otherwise, and you only need to concern yourself with size.
If we're talking about capacity details, how the capacity changes is up to vendor implementation. The fact that yours reallocates arrays based on powers of 2 may not apply to all cases: I've seen some implementations scale by a factor of 150%, for example, rather than 200%.
capacity will often be greater than size, and sometimes considerably greater (e.g., double the number of elements). This is because vectors are growable, contiguous sequences (they're array-based). The last thing you want, if you care at all about performance, is for every push_back/insert/erase to trigger a memory allocation/deallocation, so vector often creates an array bigger than is immediately necessary, leaving room for subsequent insertions. It's also worth noting that the clear method will not necessarily do anything to affect capacity; for that, look at the shrink-to-fit idiom (http://www.gotw.ca/gotw/054.htm).
If you want absolute control over the capacity so that you have a perfect fit, you can make use of the reserve method to allocate a specific capacity in advance. That only works well though if you can anticipate the number of elements you will be putting into your vector in advance.
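To see the difference concretely, here is a small sketch (my example, not from the question) contrasting size, capacity, and reserve, ending with the swap-based shrink-to-fit idiom from the GotW article:

#include <iostream>
#include <vector>

int main() {
    std::vector<double> v;
    for (int i = 0; i < 100; ++i)
        v.push_back(i * 0.1);
    // size() counts elements; capacity() is the reserved storage (e.g. 128 here).
    std::cout << "size = " << v.size() << ", capacity = " << v.capacity() << '\n';

    std::vector<double> w;
    w.reserve(100); // one allocation up front
    for (int i = 0; i < 100; ++i)
        w.push_back(i * 0.1);
    std::cout << "size = " << w.size() << ", capacity = " << w.capacity() << '\n'; // typically 100 and 100

    // Shrink-to-fit idiom (GotW #54): swap with a temporary trimmed copy.
    std::vector<double>(v).swap(v);
    std::cout << "after swap trick, capacity = " << v.capacity() << '\n'; // now close to size()
}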