I am developing a Qt-based application and need to use dynamic vectors (QVector). When I check the size of the vector, it is larger than it should be; I tested with an STL vector and got the same result. Below I show the problem using an STL vector. This prevents me from knowing the actual size of the vector and using it properly. How can I fix this? Thank you for your help.
Compiler: GCC 4.5.2
OS: Linux Ubuntu 11.04
Observation: the capacity of the vector is always a power of 2
The code is:
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;

int main() {
    double PI = 3.1415926536, delta = PI / (100 / 2);
    vector<double> A(0);
    vector<double> B(0);
    cout << "Capacity A = " << A.capacity() << "; Capacity B = " << B.capacity() << endl;
    for (int i = 0; i < 100; i++) {
        A.push_back(i * delta);
        B.push_back(sin(A[i]));
        cout << "A(" << i << ") = " << A[i] << "; B(" << i << ") = " << B[i] << " " << "Size A = " << A.capacity() << "; Size B = " << B.capacity() << endl;
    }
    for (int i = 0; i < A.capacity(); i++) {
        cout << "A(" << i << ") = " << A[i] << "; B(" << i << ") = " << B[i] << " " << "Size A = " << A.capacity() << "; Size B = " << B.capacity() << endl;
    }
    cout << "Size A = " << A.capacity() << "; Size B = " << B.capacity() << endl;
    return 0;
}
The output is:
Capacity A = 0; Capacity B = 0
A(0) = 0; B(0) = 0 Size A = 1; Size B = 1
A(1) = 0.0628319; B(1) = 0.0627905 Size A = 2; Size B = 2
A(2) = 0.125664; B(2) = 0.125333 Size A = 4; Size B = 4
A(3) = 0.188496; B(3) = 0.187381 Size A = 4; Size B = 4
...
A(99) = 6.22035; B(99) = -0.0627905 Size A = 128; Size B = 128
...
A(126) = 0; B(126) = 1.31947 Size A = 128; Size B = 128
A(127) = 0; B(127) = 1.3823 Size A = 128; Size B = 128
Size A = 128; Size B = 128
What you're seeing is how std::vector grows. To keep push_back fast in the common case, it reserves more memory than it currently needs, so it doesn't have to allocate on every insertion.
As you can see, the larger the vector gets, the more it reserves. capacity is the function that reports this amount. You can verify this with reserve, which tells the vector how much memory to set aside; capacity will then return that number until some operation forces another reallocation. reserve is useful when you're about to push_back a large number of elements and want the vector to allocate once, instead of however many times it would have automatically.
The function you're actually looking for is size, which gives the number of elements in the vector. Its counterpart is resize, just as reserve is the counterpart of capacity: if you had 5 elements and call resize(10), you gain 5 new value-initialized elements (zeros for double) and size then returns 10.
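A minimal sketch of the distinction; the capacity values in the comments are only illustrative, since the actual numbers are implementation-defined:
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> v;
    for (int i = 0; i < 5; ++i)
        v.push_back(i);

    // size() is the number of elements actually stored.
    std::cout << "size = " << v.size() << std::endl;          // 5

    // capacity() is how many elements fit before the next reallocation.
    std::cout << "capacity = " << v.capacity() << std::endl;  // e.g. 8, implementation-defined

    v.resize(10);  // 5 new elements, value-initialized to 0.0
    std::cout << "size after resize = " << v.size() << std::endl;  // 10
    return 0;
}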
Why are you interested in capacity? Are you focusing on memory usage? The capacity method is not needed otherwise, and you only need to concern yourself with size.
If we're talking about capacity details, how the capacity changes is up to the implementation. The fact that yours reallocates based on powers of 2 may not apply everywhere: I've seen implementations that grow by a factor of 150%, for example, rather than 200%.
capacity will often be greater than size, and sometimes considerably greater (e.g. double the number of elements). This is because vectors are growable, contiguous sequences (they're array-based). The last thing you want, if you care at all about performance, is for every push_back/insert/erase to trigger a memory allocation/deallocation, so vector often creates an array bigger than is immediately necessary to leave room for subsequent insertions. It's also worth noting that the clear method will not necessarily do anything to affect capacity, and you might want to look at the shrink-to-fit idiom (http://www.gotw.ca/gotw/054.htm).
If you want absolute control over the capacity so that you have a perfect fit, you can make use of the reserve method to allocate a specific capacity in advance. That only works well though if you can anticipate the number of elements you will be putting into your vector in advance.
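A short sketch of both ideas, reserving up front and then applying the swap form of the shrink-to-fit idiom from the GotW article linked above:
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> v;
    v.reserve(100);            // one allocation up front, room for 100 elements
    for (int i = 0; i < 100; ++i)
        v.push_back(i * 0.1);  // no reallocations happen during these push_backs

    v.erase(v.begin() + 50, v.end());  // size drops to 50; capacity typically stays at 100

    // Shrink-to-fit via the swap trick: build a right-sized copy and swap it in.
    std::vector<double>(v).swap(v);

    std::cout << "size = " << v.size() << ", capacity = " << v.capacity() << std::endl;
    return 0;
}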
Related
In my implementation of an equation I need to build an N x N diagonal matrix. I'm using a vector of vectors to represent matrices. However, N is around 300000, so
std::vector<std::vector<double> > new_matrix(300000,std::vector<double>(300000,0))
on running, gives me
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Is this not possible due to memory limit?
The question can be answered generally: how to deal with large amounts of data in C++.
In your case, the error is due to insufficient heap memory for a 300,000 * 300,000 vector of double-precision floating point numbers.
You might think that a different container that does not need contiguous memory, such as std::list, would help. But wait: how much memory would this need in the first place?
300000 * 300000 * sizeof(double) =
300000 * 300000 * 8 =
720000000000 bytes =
703125000 KB =
686646 MB =
671 GB!
Unless you are working with a supercomputer, forget about this idea.
Back in the olden days, programmers had to come up with clever solutions to work within the limits of tiny RAM. Why not do the same today? Let's start by digging for patterns in your problem. You are working with a
Diagonal matrix:
A matrix having non-zero elements only in the diagonal running from
the upper left to the lower right
Therefore, you do not have to store the off-diagonal elements in memory, because they are guaranteed to be 0. The problem boils down to storing just the main diagonal elements (matrix[0][0], matrix[1][1], ... matrix[n-1][n-1]), a total of only n elements. Now you only have to deal with 300000 elements, far fewer than 90000000000!
Calculating the memory for this:
300000 * sizeof(double) =
300000 * 8 =
2400000 bytes =
2344 KB =
2.3 MB!
As you can see, this approach has reduced the requirement from about 671 GB to just 2.3 MB. An array that size might not fit on the stack, but that doesn't matter here because std::vector stores its elements on the heap.
Now that we have reduced the space requirement, it is just a matter of implementing this logic. But be careful to maintain this convention consistently when accessing, traversing, and storing the array.
For example:
#include <iostream>
#include <vector>

typedef std::vector<double> DiagonalMatrix;

int main()
{
    // std::vector<std::vector<double>> new_matrix(300000, std::vector<double>(300000, 0));
    DiagonalMatrix m(300000, 0);

    // Store the main diagonal elements only.
    // Keep this convention in mind while using this implementation of DiagonalMatrix.
    return 0;
}
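One way to keep that convention in a single place is a small accessor. This is a hypothetical helper, not part of the original answer:
#include <cstddef>
#include <vector>

typedef std::vector<double> DiagonalMatrix;  // same typedef as in the snippet above

// Hypothetical accessor: returns element (i, j) of the conceptual n x n matrix
// even though only the diagonal is actually stored.
inline double at(const DiagonalMatrix& m, std::size_t i, std::size_t j)
{
    return (i == j) ? m[i] : 0.0;
}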
Edit:
To demonstrate this design, and following your request in the comments, below is an implementation of the necessary logic:
// Product = DiagonalMatrix * Matrix
Matrix DiagonalMatrix::preMultiply(Matrix multiplier)
{
    // Check if multiplication is possible
    if (multiplier.r != this->n)
        return Matrix();

    // The product is the same as the multiplier where every element in the
    // ith row of the multiplier is multiplied by the ith diagonal element
    // of the diagonal matrix.
    Matrix& product = multiplier;
    for (int i = 0; i < multiplier.r; ++i)
    {
        for (int j = 0; j < multiplier.c; ++j)
        {
            product.m[i][j] *= this->m[i];
        }
    }
    return product;
}

// Product = Matrix * DiagonalMatrix
Matrix DiagonalMatrix::postMultiply(Matrix multiplier)
{
    // Check if multiplication is possible
    if (multiplier.c != this->n)
        return Matrix();

    // The product is the same as the multiplier where every element in the
    // jth column of the multiplier is multiplied by the jth diagonal element
    // of the diagonal matrix.
    Matrix& product = multiplier;
    for (int j = 0; j < multiplier.c; ++j)
    {
        for (int i = 0; i < multiplier.r; ++i)
        {
            product.m[i][j] *= this->m[j];
        }
    }
    return product;
}
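For context, these methods assume Matrix and DiagonalMatrix classes shaped roughly like the following. This is only a hypothetical sketch: the member names r, c, m, and n are inferred from the usage above, and note that at this point DiagonalMatrix has effectively become a class rather than the plain typedef used earlier.
#include <vector>

// Hypothetical layout inferred from the member accesses above.
struct Matrix
{
    int r, c;                              // rows and columns
    std::vector<std::vector<double> > m;   // full element storage

    Matrix() : r(0), c(0) {}
    Matrix(int rows, int cols) : r(rows), c(cols), m(rows, std::vector<double>(cols, 0.0)) {}
};

struct DiagonalMatrix
{
    int n;                    // dimension of the conceptual n x n matrix
    std::vector<double> m;    // only the diagonal is stored

    explicit DiagonalMatrix(int size) : n(size), m(size, 0.0) {}

    Matrix preMultiply(Matrix multiplier);   // defined above
    Matrix postMultiply(Matrix multiplier);  // defined above
};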
I managed to reduce the problem to the following code, which uses almost 500MB of memory when it runs on my laptop - which in turn causes a std::bad_alloc in the full program. What is the problem here? As far as I can see, the unordered map only uses something like (32+32)*4096*4096 bits = 134.2MB, which is not even close to what the program uses.
#include <iostream>
#include <unordered_map>
using namespace std;

int main()
{
    unordered_map<int, int> a;
    long long z = 0;
    for (int x = 0; x < 4096; x++)
    {
        for (int y = 0; y < 4096; y++)
        {
            z = 0;
            for (int j = 0; j < 4; j++)
            {
                z ^= ((x >> (3 * j)) % 8) << (3 * j);
                z ^= ((y >> (3 * j)) % 8) << (3 * j + 12);
            }
            a[z]++;
        }
    }
    return 0;
}
EDIT: I'm aware that some of the bit shifting here can cause undefined behaviour, but I'm 99% sure that's not the problem.
EDIT2: What I need is essentially to count the number of x in a given set that some function maps to each y in a second set (of size 4096*4096). Would it perhaps be better to store these counts in an array? That is, I have a function f: A to B, and I need to know the size of the set {x in A : f(x) = y} for each y in B. In this case A and B are both the set of non-negative integers less than 2^12 = 4096. (Ideally I would like to extend this to 2^32.)
... which uses almost 500MB of memory ... What is the problem here?
There isn't really a problem, per se, with the memory usage you are observing. std::unordered_map is built to run fast for large numbers of elements; memory isn't its top priority. For example, to make resizing cheaper it may allocate space for pre-calculated hash buckets up front. Also, your estimate of the count of elements multiplied by the element size ignores the actual per-node memory footprint of the data structure: each node in the map carries at least a pointer or two for the linked list of its bucket, plus whatever bookkeeping the allocator adds per allocation.
Having said that, it isn't clear you even need std::unordered_map in this scenario. Instead, given that the mapping you're trying to store is defined as
{x in A : f(x) = y} for each y in B
you could have one fixed-size array (use std::array for that) that simply holds, at index i representing an element of set B, the number of elements from set A that satisfy the criterion.
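A minimal sketch of that counting-array idea, using std::vector rather than std::array only because 4096 * 4096 counters would be too large to place on the stack; the key z built in the question's loop always fits in 24 bits, so it can index the array directly:
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    // One 32-bit counter per possible key: 4096 * 4096 entries, roughly 64 MB, allocated once.
    std::vector<unsigned int> counts(4096u * 4096u, 0);

    for (int x = 0; x < 4096; x++)
    {
        for (int y = 0; y < 4096; y++)
        {
            long long z = 0;
            for (int j = 0; j < 4; j++)
            {
                z ^= ((x >> (3 * j)) % 8) << (3 * j);
                z ^= ((y >> (3 * j)) % 8) << (3 * j + 12);
            }
            ++counts[static_cast<std::size_t>(z)];  // z is always in [0, 4096 * 4096)
        }
    }

    std::cout << "counts[0] = " << counts[0] << std::endl;
    return 0;
}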
I have the following code:
int main() {
    int N = 1000000;
    int D = 1;
    double** POINTS = new double*[N];
    for (int i = 0; i < N; i++) POINTS[i] = new double[D];
    for (int j = 0; j < D; j++) POINTS[0][j] = 0;

    for (int i = 0; i < N; ++i)
    {
        for (int j = 0; j < D; ++j)
        {
            POINTS[i][j] = 3.14;
        }
    }
}
If the size of each pointer is 8 bytes, N = 10^6, and D = 1, the expected size of POINTS is 8 * 10^6 * 1 / 1000 / 1000 = 8 MB, but in fact this program eats 42 MB of memory. If N = 2 * 10^6, 16 MB is expected, but it actually uses 84 MB. Why?
There are lots of possible reasons:
Every memory allocation probably comes with some overhead so the memory manager can keep track of things. Lots of small allocations (like you have) mean you probably have more tied up in overhead than you do in data.
Memory normally comes in "pages". If you dynamically allocate 1 byte then your size likely grows by the size of 1 page. (The first time - not every 1 byte allocation will get you a whole new page)
Objects may have padding applied. If you allocate one byte it probably gets padded out to "word size" and so you use more than you think
As you allocate and free objects you can create "holes" (fragmentation). You want 8 bytes but there is only a 4 byte hole in this page? You'll get a whole new page.
In short, there is no simple way to explain why your program is using more memory than you think it should, and unless you are having problems you probably shouldn't care. If you are having problems "valgrind" and similar tools will help you find them.
Last point: dynamically allocated 2d arrays are one of the easiest ways to create the problems mentioned above.
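If the per-allocation overhead turns out to be the culprit, one common alternative (just a sketch, not the only fix) is to make a single contiguous allocation and index it by hand, so the allocator bookkeeping is paid only once:
#include <cstddef>
#include <vector>

int main()
{
    const int N = 1000000;
    const int D = 1;

    // One block of N * D doubles instead of N separate one-element allocations.
    std::vector<double> points(static_cast<std::size_t>(N) * D, 0.0);

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < D; ++j)
            points[static_cast<std::size_t>(i) * D + j] = 3.14;

    return 0;
}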
I have this vector:
list.push_back("one");
list.push_back("two");
list.push_back("three");
I use list.erase(list.begin() + 1) to delete the "two" and it works. But when I try to output the list again:
cout<<list[0]<<endl;
cout<<list[1]<<endl;
cout<<list[2]<<endl;
produces:
one
three
three
I tried targeting the last element for erasing with list.erase(list.begin() + 2), but the duplicate "three"s remain. I imagined index 2 would have been shifted out and list[2] would output nothing. list[3] outputs nothing, as it should.
I'm trying to erase the "two" and output the list as only:
one
three
When using cout<<list[2]<<endl; you assume that you still have three elements, but in fact you are reading leftover data in a part of memory that is no longer in use.
You should use list.size() to obtain the number of elements. So, something like:
for (size_t i = 0; i < list.size(); i++)
{
    cout << list[i] << endl;
}
But you erased the element, thus the size of your container was decreased by one, i.e. from 3 to 2.
So, after the erase, you shouldn't do this:
cout<<list[0]<<endl;
cout<<list[1]<<endl;
cout<<list[2]<<endl; // Undefined Behaviour!!
but this:
cout<<list[0]<<endl;
cout<<list[1]<<endl;
In your case, the "three" is just copied to the index 1, which is expected. you is vector.size() == 2 now.
it is because vector will do pre-allocation, which help to improve the performance.
To keep from having to resize with every change, vector grabs a block of memory bigger than it needs and keeps it until forced to get bigger or instructed to get smaller.
To brutally simplify, think of it as
string* array = new string[100];
int capacity = 100;
int size = 0;
In this case you can write all through that 100 element array without the program crashing because it is good and valid memory, but only values beneath size have been initialized and are meaningful. What happens when you read above size is undefined. Because reading out of bounds is a bad idea and preventing it has a performance cost that should not be paid by correct usage, the C++ standard didn't waste any time defining what the penalty for doing so is. Some debug or security critical versions will test and throw exceptions or mark unused portions with a canary value to assist in detecting faults, but most implementations are aiming for maximum speed and do nothing.
Now you push_back "one", "two", and "three". The array is still 100 elements, capacity is still 100, but size is now 3.
You erase array[1], and every element after index 1 up to size is copied one position toward the front (note the potentially huge performance cost here: vector is not the right data structure if you are adding and removing items at random locations), and size is reduced by one, resulting in "one", "three", and "three". The array is still 100 elements, capacity is still 100, but size is now 2.
Say you add another 99 strings. Each added string increments size, and when size would exceed capacity, a new array is made, the old array is copied into the new one, and the old one is freed. Something along the lines of:
capacity *= 1.5;
string* temp = new string[capacity];
for (int index = 0; index < size; index++)
{
    temp[index] = array[index];
}
delete[] array;
array = temp;
The array is now 150 elements, capacity is now 150, and size is now 101.
Result:
There is usually a bit of fluff around the end of a vector that will allow reading out of bounds without the program crashing, but do not confuse this with the program working.
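Putting it together, a short sketch of the intended usage: erase the element, then iterate only up to size(). The capacity value printed is implementation-defined.
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    vector<string> list;
    list.push_back("one");
    list.push_back("two");
    list.push_back("three");

    list.erase(list.begin() + 1);  // removes "two"; size drops from 3 to 2

    for (size_t i = 0; i < list.size(); i++)  // only the first size() elements are valid
        cout << list[i] << endl;              // prints "one" then "three"

    cout << "size = " << list.size() << ", capacity = " << list.capacity() << endl;
    return 0;
}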
How do you fill a dynamic matrix with 0 in C++? I mean, without:
for(int i=0;i<n;i++)for(int j=0;j<n;j++)a[i][j]=0;
I need it in O(n), not O(n*m) or O(n^2).
Thanks.
For the specific case where your array is going to be large and sparse and you want to zero it at allocation time, you can get some benefit from using calloc: on most platforms this results in lazy allocation of zeroed pages, e.g.
int** a = (int**)malloc(n * sizeof(a[0]));   // allocate row pointers
int* b = (int*)calloc(n * n, sizeof(b[0]));  // allocate n x n array (zeroed)
a[0] = b;                                    // initialise row pointers
for (int i = 1; i < n; ++i)
{
    a[i] = a[i - 1] + n;
}
Note that this is, of course, premature optimisation. It is also C-style coding rather than C++. You should only use this optimisation if you have established that performance is a bottleneck in your application and there is no better solution.
From your code:
for(int i=0;i<n;i++)for(int j=0;j<n;j++)a[i][j]=0;
I assume that your matrix is a two-dimensional array declared either as
int matrix[a][b];
or
int** matrix;
In the first case, change this for loop to a single call to memset():
memset(matrix, 0, sizeof(int) * a * b);
In the second case, you will have to do it this way:
for (int n = 0; n < a; ++n)
    memset(matrix[n], 0, sizeof(int) * b);
On most platforms, a call to memset() will be replaced with a proper compiler intrinsic.
Not every nested loop is O(n^2). The following code is linear in the number of cells it touches.
No 1:
for(int i=0;i<n;i++)for(int j=0;j<n;j++)a[i][j]=0;
Imagine that all of the cells of matrix a were copied into a one-dimensional flat array and you zeroed all of its elements with just one loop. What would the order be then? Of course you would say that is O(n):
No 2:
for(int i=0;i<n*m;i++) b[i]=0;
Now compare No 2 with No 1 and ask yourself the following questions:
Does either code visit a cell of matrix a more than once?
If you measured the running time, would there be a difference?
Both answers are NO.
Both snippets do the same amount of work: each of the n*m cells is written exactly once. If n stands for the total number of cells, that is O(n), and you cannot zero every cell with fewer writes than that.
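For completeness, a hedged sketch of the idiomatic C++ take on the same point: let the container zero the storage when it is constructed, in one flat allocation, and index it as row * cols + col. The zeroing work is still proportional to the number of cells; it is just done for you.
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    const std::size_t rows = 1000, cols = 1000;

    // Every element is value-initialized to 0 when the vector is constructed.
    std::vector<int> matrix(rows * cols, 0);

    // Element (i, j) lives at index i * cols + j.
    matrix[2 * cols + 3] = 42;
    std::cout << matrix[2 * cols + 3] << std::endl;
    return 0;
}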