Memory issues with a very large array in C++

Hi, I have the following:
struct myStructure
{
    vector<int> myVector;
};

myStructure myArray[10000000];
As you can see, I have a very large array of vectors. The problem is that I don't have a priori knowledge of the number of elements I need in the array, but I know that 10 million elements is the maximum I could need. I have tried two things:
a) Make myArray a global array. The problem is that I have a function that accesses myArray many, many times, which results in memory leaks and the program crashing for large calculations.
b) Declare myArray dynamically from within the function that needs to access it. The memory stays in check, but the program runs about 8 times slower.
Any ideas on how to address this issue? Thanks.

accesses myArray many, many times, which results in memory leaks and the program crashing for large calculations
You should fix those bugs in any case.
the memory stays in check, but the program runs about 8 times slower
Since you're already using dynamic allocation with an array of vectors, it's not immediately obvious why dynamically allocating one more thing would result in such a slowdown, so you should look into this as well.
Then I would go with a vector<vector<int>> that isn't global but has the appropriate lifetime for its uses:
#include <vector>
#include <functional>
#include <algorithm>
#include <iterator>
using std::vector;

// stand-in for the OP's function that fills the vectors
void foo(vector<vector<int>>& v);

int main() {
    vector<vector<int>> v;
    for(int i=0;i<100;++i) {
        // reuse the same vectors: clear them (keeping their capacity) before each refill
        std::for_each(begin(v),end(v),std::mem_fn(&vector<int>::clear));
        foo(v);
        for(int j=0;j<100;++j) {
            std::for_each(begin(v),end(v),std::mem_fn(&vector<int>::clear));
            foo(v);
            for(int k=0;k<100;++k) {
                std::for_each(begin(v),end(v),std::mem_fn(&vector<int>::clear));
                foo(v);
                for(int l=0;l<100;++l) {
                    std::for_each(begin(v),end(v),std::mem_fn(&vector<int>::clear));
                    foo(v);
                }
            }
        }
    }
}

The best solution I can find is to call the function malloc, which reserves space on the heap. For the array case you would write something like:
int* myArray = (int*) malloc( sizeof(int) * Len );
After that, don't forget to release the heap memory using free(myArray);
It's a powerful tool for making very large arrays.
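A minimal sketch of that pattern (note, as an aside, that malloc does not run constructors, so it only suits trivially constructible element types such as int, not the struct above that holds a std::vector):
#include <cstdlib>

int main() {
    const std::size_t len = 10000000;                        // illustrative size
    int* myArray = (int*) std::malloc(sizeof(int) * len);    // reserve heap memory
    if (myArray == NULL) return 1;                           // always check the result
    myArray[0] = 42;
    std::free(myArray);                                      // release it when done
    return 0;
}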

Declare this structure in an object whose lifetime is guaranteed to outlast the objects that access it, and use a reference to access it. Ideally, you should have a class in your hierarchy that calls all the functions dealing with this struct, so those functions may well become members of the class holding your large array of vectors.
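A minimal sketch of that idea, with a hypothetical DataStore class (the name and members are invented for illustration):
#include <cstddef>
#include <vector>

class DataStore {
public:
    explicit DataStore(std::size_t n) : data_(n) {}

    // the functions that work on the vectors become members, so they
    // access the data through `this` instead of a global array
    void compute(std::size_t i, int value) { data_[i].push_back(value); }

    std::vector<int>&       row(std::size_t i)       { return data_[i]; }
    const std::vector<int>& row(std::size_t i) const { return data_[i]; }

private:
    std::vector<std::vector<int>> data_;
};

int main() {
    DataStore store(1000);   // lifetime chosen to outlast everything that uses it
    store.compute(0, 42);
}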

Did you try turning your array of vectors into a vector of vectors? Not knowing how many of something you will need is what vectors are for, after all.
I believe it would be:
vector<vector<int>> myVecs;
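For instance, a small sketch (the sizes here are just for illustration):
#include <vector>
using std::vector;

int main() {
    vector<vector<int>> myVecs;   // starts empty instead of 10 million slots
    myVecs.resize(42);            // grow only to the size actually needed
    myVecs[0].push_back(7);       // each inner vector grows on demand too
}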

Use a different data structure. I'd suggest trying something like one of the sparse matrix classes from Boost. They are optimised for storing numeric data in which each row or column contains a significant number of zeroes. Mind you, if the problem you're trying to solve isn't suitable for a sparse data structure, it would be a good idea to describe the problem you're trying to solve in greater detail. Take another look at https://stackoverflow.com/questions/how-to-ask even though I guess you already read that.
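If Boost is an option, here is a minimal sketch with Boost.uBLAS's compressed_matrix (the dimensions are purely illustrative):
#include <boost/numeric/ublas/matrix_sparse.hpp>

int main() {
    // only the non-zero elements are actually stored
    boost::numeric::ublas::compressed_matrix<double> m(100000, 100);
    m(234, 3) = 1.5;
    m(46664, 7) = 2.5;
}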
But before you do that I think you probably have another problem too:
accesses myArray many, many times, which results in memory leaks and
the program crashing for large calculations
It looks to me from what you write there that your code may have some pre-existing bugs. Unless your crashes are simply caused by trying to allocate a 10000000-element array as an auto variable.

Related

stack overflow error in c++ [duplicate]

I am using Dev C++ to write a simulation program. For it, I need to declare a single dimensional array with the data type double. It contains 4200000 elements - like double n[4200000].
The compiler shows no error, but the program exits on execution. I have checked, and the program executes just fine for an array having 5000 elements.
Now, I know that declaring such a large array on the stack is not recommended. However, the thing is that the simulation requires me to call specific elements from the array multiple times - for example, I might need the value of n[234] or n[46664] for a given calculation. Therefore, I need an array in which it is easier to sift through elements.
Is there a way I can declare this array on the stack?
No, there is no (we'll say "reasonable") way to declare this array on the stack. You can, however, declare the pointer on the stack and set aside a bit of memory on the heap.
double *n = new double[4200000];
Accessing n[234] on this should be no slower than accessing n[234] on an array that you declared like this:
double n[500];
Or even better, you could use vectors:
std::vector<double> someElements(4200000);
someElements[234]; // as fast as n[234] from the other examples if you optimize (-O3); the difference on small programs is negligible (+5%) if you don't
Which, if you optimize with -O3, is just as fast as an array, and much safer. With the
double *n = new double[4200000];
solution you will leak memory unless you do this:
delete[] n;
And with exceptions and early returns in the mix, that manual delete[] is a very unsafe way of doing things.
You can increase your stack size. Try adding these options to your link flags:
-Wl,--stack,36000000
It might be too large though (I'm not sure if Windows places an upper limit on stack size.) In reality though, you shouldn't do that even if it works. Use dynamic memory allocation, as pointed out in the other answers.
(Weird, writing an answer and hoping it won't get accepted... :-P)
Yes, you can declare this array on the stack (with a little extra work), but it is not wise.
There is no justifiable reason why the array has to live on the stack.
The overhead of dynamically allocating a single array once is negligible (you could say "zero"), and a smart pointer will safely take care of not leaking memory, if that is your concern (see the sketch below).
Stack-allocated memory is not in any way different from heap-allocated memory (apart from some caching effects for small objects, but these do not apply here).
In short, just don't do it.
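A minimal sketch of that smart-pointer version (assuming C++14 for std::make_unique):
#include <memory>

int main() {
    // one heap allocation, value-initialized, freed automatically when n goes out of scope
    auto n = std::make_unique<double[]>(4200000);
    n[234] = 1.0;
    n[46664] = 2.0;
}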
If you insist that you must allocate the array on the stack, you will need to reserve 32 megabytes of stack space first (preferably a bit more). For that, using Dev-C++ (which presumes Windows+MinGW) you will either need to set the reserved stack size for your executable using compiler flags such as -Wl,--stack,34000000 (this reserves somewhat more than 32 MiB), or create a thread (which lets you specify a reserved stack size for that thread).
But really, again, just don't do that. There's nothing wrong with allocating a huge array dynamically.
Are there any reasons you want this on the stack specifically?
I'm asking because the following will give you a construct that can be used in a similar way (especially accessing values using array[index]), but it is far less limited in size (the total maximum depends on the 32-bit/64-bit memory model and on the available memory, RAM plus swap) because it is allocated from the heap.
int arraysize= 4200000;
int *heaparray= new int[arraysize];
...
k= heaparray[456];
...
delete [] heaparray;
return;

Maximum number of pointers in one variable

In my project, there are one million inputs and I am supposed to take different numbers of inputs in order to compare sort/search algorithms. Everything was all right until I tried to take five hundred thousand inputs. That is when I realized that I can't create an array of five hundred thousand pointers to my class, or even to an integer type. However, I can create five arrays of one hundred thousand pointers each.
If I didn't explain it very well, just look at these two snippets:
int *ptr[500000]; // it crashes
int *ptr1[100000]; // it runs well
int *ptr2[100000];
int *ptr3[100000];
int *ptr4[100000];
int *ptr5[100000];
What is the reason for the crash? Is there some limit, or is it about memory? And of course, how can I fix it?
You are trying to allocate a 500,000-entry array on the stack. The stack is not really designed for holding large amounts of data like this. In your case, the stack just happens to be big enough to hold 100,000 entries (or even several different lots of 100,000 entries) but not 500,000 in a single block. If you overflow the stack, behaviour is undefined but a crash is likely.
You will get much better results by allocating your array on the heap instead.
int **ptr = (int **) malloc(500000 * sizeof(int *));
Remember to check for a NULL return value from malloc, and free the memory when you're finished with it.
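A minimal sketch of that approach, with a std::vector alternative alongside it since, in C++, the vector releases its storage by itself:
#include <cstdlib>
#include <vector>

int main() {
    // C-style: allocate the pointer array on the heap, check it, use it, free it.
    int **ptr = (int **) std::malloc(500000 * sizeof(int *));
    if (ptr == NULL) return 1;
    ptr[0] = NULL;
    std::free(ptr);

    // C++-style: the vector owns the 500,000 pointers and cleans up automatically.
    std::vector<int *> ptrs(500000, nullptr);
    return 0;
}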

C++ dynamic allocation

I'm very confused with regard to the following instructions:
#include <iostream>
#define MAX_IT 100
using namespace std;

class Integer{
private:
    int a;
public:
    Integer(int valoare){a=valoare;}
    int getA(){return a;}
    void setA(int valoare){a=valoare;}
};

int main(){
    Integer* a=new Integer(0);
    //cout<<a[0].getA();
    for(int i=1;i<=MAX_IT;i++)
    {
        a[i]=*(new Integer(i));
    }
    for(int i=0;i<=MAX_IT;i++)
        cout<<a[i].getA()<<endl;
    return 13;
}
It works for small values of MAX_IT, but when I try to set MAX_IT to 1000 it doesn't work anymore.
Initially, I thought the "new" operator was supposed to do the job, but after reading some documentation I understood it is not supposed to work like this at all (out-of-bounds array).
So my question is: why is it working for small values of MAX_IT and not for bigger ones?
EDIT:
I am experimenting with this code for a larger program, where I am not allowed to use the STL. You have not understood my concern: if I have
Integer *var = new Integer[10];
for(int k = 1; k < 10; k++) *(var+k) = k; // this is perfectly fine
but if I try
var[10] = *(new Integer(10)); // this should not work and should cause a memory problem
My concern is that it does work if I only do it 100 times or so... The question is: why does it work every time for a small number of iterations?
Because by allocating space for one Integer then using it as an array of multiple Integers, your code invokes undefined behavior, meaning that it can do anything, including crashing, working seemingly fine, or pulling demons out of your nose.
And anyway, it's leaking memory. If you don't need dynamic memory allocation, then don't use it.
a[i]=*(new Integer(i));
And kaboom, you lost the pointer to the Integer, no chance to delete it later. Leaks.
If you don't need raw arrays, don't use them. Prefer std::vector. Or switch to C if C++ is too hard.
std::vector<Integer> vec;
vec.push_back(Integer(1337));
The reason that things tend to work nicely when you overflow your buffer by just a little bit is... memory fragmentation! Who would have guessed?
To avoid memory fragmentation, allocators won't return you a block of just sizeof (Integer). They'll give you a somewhat larger block, to ensure that if the block is later freed before the adjacent blocks, it's at least big enough to be useful.
Exactly how big this is can vary by architecture, OS, compiler version, or even how much memory is physically present in the machine. You should consider it to be completely unpredictable. Also, some libraries designed to help catch this sort of bug force any small object to be placed at the end of the block instead of the beginning, so the extra bytes could be negative array indices instead of positive.
Therefore, don't ever rely on having spare area given to you for free after (or before) an object.
Guru note: Occasionally someone comes up with a valid use for the extra memory, and asks for a way to discover how large it is. One good example is that the capacity (not size!) of a std::vector could be adjusted to match the actual allocated space instead of the requested space, and therefore reduce (on average) the number of reallocations needed. Such requests usually come paired with other guru allocator APIs, such as the ability to expand an allocation in-place if there happen to be free blocks adjacent.
Note that in your particular case you do still have undefined behavior, because you're calling operator= on a non-POD object which hasn't first been constructed. If you gave class Integer a trivial default constructor that would change.
You actually need:
Integer* a = new Integer[MAX_IT];   // note: this requires Integer to have a default constructor,
                                    // e.g. Integer(int valoare = 0){a=valoare;}
//cout<<a[0].getA();
for(int i = 1; i < MAX_IT; i++)     // note: < not <=
{
    a[i] = i;
}
delete[] a;                         // and release the array when you are done
Better still would be to use std::vector, though.

Efficiently collect data from multiple 1-D arrays into a single 1-D array

I've got a prewritten function in C that fills a 1-D array with data, e.g.
int myFunction(myData **arr, ...);
myData *array;
int arraySize;
arraySize = myFunction(&array, ...);
I would like to call the function n times in a row with slightly different parameters (n is dependent on user input), and I need all the data collected in a single C array afterwards. The size of the returned array is not always fixed. Oh, and myFunction does the memory allocation internally. I want to do this in a memory-efficient way, but using realloc in each iteration does not sound like a good idea.
I do have all the C++ functionality available (the project is in C++, just using a C library), but using std::vector is no good because the collected data is later sent in to a function with a definition similar to:
void otherFunction(myData *data, int numData, ...);
Any ideas? The only things I can think of are realloc or using a std::vector and copying the data into an array afterwards, and those don't sound too promising.
Using realloc() in each iteration sounds like a very fine idea to me, for two reasons:
"does not sound like a good idea" is what people usually say when they have not established a performance requirement for their software, and they have not tested their software against the performance requirement to see if there is any need to improve it.
Instead of reallocating a new block each time, the realloc method will simply keep expanding your memory block which will presumably be at the top of the memory heap, so it won't be wasting any time either traversing memory block lists, or copying data around. This holds true provided that whatever memory allocated by myFunction() gets freed before it returns. You can verify it by looking at the pointer returned by realloc() and seeing that it always (or almost always(*1)) is the exact same pointer as the one you gave it to reallocate.
EDIT (*1) some C++ runtimes implement two heaps, one for small allocations and one for large allocations, so if your block gets allocated in the heap for small blocks, and then it grows large, there is a possibility that it will be moved once to the heap for large blocks. So, don't expect the pointer to always be the same; just most of the time.
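A minimal, self-contained sketch of that pattern; myData, myFunction, and the malloc-based ownership below are placeholders assumed from the question, so adjust the chunk deallocation to whatever the real C library requires:
#include <cstdlib>
#include <cstring>

// placeholders standing in for the C library's type and function
struct myData { int value; };

int myFunction(myData **arr) {                 // stub: allocates and fills a small chunk
    int len = 3;
    *arr = (myData *) std::malloc(len * sizeof(myData));
    for (int i = 0; i < len; ++i) (*arr)[i].value = i;
    return len;
}

int main() {
    myData *all = NULL;
    int total = 0;
    for (int i = 0; i < 5; ++i) {              // n calls, growing the same block each time
        myData *chunk = NULL;
        int len = myFunction(&chunk);
        myData *grown = (myData *) std::realloc(all, (total + len) * sizeof(myData));
        if (grown == NULL) { std::free(all); std::free(chunk); return 1; }
        all = grown;
        std::memcpy(all + total, chunk, len * sizeof(myData));
        total += len;
        std::free(chunk);                      // assumption: the library allocated it with malloc
    }
    // otherFunction(all, total, ...);
    std::free(all);
    return 0;
}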
Just copy all of the data into an std::vector. You can call otherFunction on a vector v with
otherFunction(&v[0], v.size(), ...)
or
otherFunction(v.data(), v.size(), ...)
As for your efficiency requirement: it looks to me like you're optimizing prematurely. First try this option, then measure how fast it is, and only look for other solutions if it's really too slow.
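For instance, a rough sketch (with myData, myFunction and otherFunction declared as placeholders, since the real definitions come from the libraries in the question):
#include <cstdlib>
#include <vector>

struct myData { int value; };                  // placeholder for the C library's type
int myFunction(myData **arr);                  // fills *arr, returns its length (per the question)
void otherFunction(myData *data, int numData);

void collect(int n) {
    std::vector<myData> collected;
    for (int i = 0; i < n; ++i) {
        myData *chunk = nullptr;
        int len = myFunction(&chunk);          // the library allocates the chunk internally
        collected.insert(collected.end(), chunk, chunk + len);
        std::free(chunk);                      // assumption: free() matches the library's allocation
    }
    otherFunction(collected.data(), static_cast<int>(collected.size()));
}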
If you know that you are going to call the function N times, and the returned arrays are always M long, then why don't you just allocate one array of M*N elements initially? Or, if you don't know one of M or N, set a worst-case maximum. Or are M and N both dependent on user input?
Then, change how you call your data-filling function so that the array pointer you pass it is actually an offset into that large array, and it stores the data in the right location. On the next iteration, offset further and call again.
I think the best solution would be to write your own 1-D array class with the methods you need. Depending on how you write the class, you can get exactly the behaviour you want.