When I run the following code I get a stack overflow. How is that possible? I thought that if I defined the array outside of main it would have static storage duration and I wouldn't run into memory problems. What can I do about it?
#include <stdio.h>

#define dim1 14001
#define dim2 14001
#define dim4 8

double large_array[dim1][dim2][dim4];

int main()
{
    large_array[30][6][5] = 1337;
    printf("%lf\t", large_array[30][6][5]);
}
One problem is that the place where the compiler wants to store the array very likely doesn't have room for that much memory (here about 12.5 GB: 14001 × 14001 × 8 doubles of 8 bytes each) -- in some environments, very large allocations must be done on the heap, e.g. with malloc, new, or other similar means (or implicitly through those means, e.g. as done internally by std::vector).
The 3-D global array you declared consumes far too much memory (14001 × 14001 × 8 doubles × 8 bytes ≈ 12.5 GB) in static storage, which is why your program fails.
One intuitive way to handle this problem is to use a nested std::map instead of a huge global multidimensional array (reasonable when only a small fraction of the elements are actually used).
#include <stdio.h>
#include <map>

/* The original fixed-size array is no longer needed:
#define dim1 14001
#define dim2 14001
#define dim4 8
double large_array[dim1][dim2][dim4];
*/

using namespace std;

// Nested maps only store the entries actually assigned, so memory
// use grows with the number of elements set, not with dim1*dim2*dim4.
// Integer keys are used since the indices are integers.
map<int, map<int, map<int, double> > > large_array;

int main()
{
    large_array[30][6][5] = 1337;
    printf("%lf\t", large_array[30][6][5]);
}
The maximum size of a statically allocated array is determined by the amount of memory that a program can access. On a 32-bit system, the maximum amount of memory addressable through a pointer is 2^32 bytes, which is 4 gigabytes. The actual limit may be lower, depending on operating system and compiler implementation choices.
As interjay said in a comment, your allocation requires far more space than that.
In C++, you should use the provided containers, like std::vector, or the Boost multidimensional arrays, to handle such cases.
This question already has answers here:
How to solve the 32-byte-alignment issue for AVX load/store operations?
(3 answers)
Closed 4 years ago.
I do some operations on array using SIMD, so I need to have them aligned in memory. When I place arrays on the stack, I simply do this and it works:
#define BUFFER_SIZE 10000
alignas(16) float approxFreqMuls_Float[BUFFER_SIZE];
alignas(16) double approxFreqMuls_Double[BUFFER_SIZE];
But now I need to allocate more memory (such as 96k doubles, or more): so I think the heap is the way; but when I do this:
int numSteps = 96000;
alignas(16) float *approxFreqMuls_Float = new float[numSteps];
alignas(16) double *approxFreqMuls_Double = new double[numSteps];
It throws an error in ostream. I'm not really sure about the message (I'm on MSVC; nothing useful appears).
How would you allocate aligned arrays on heap?
Heap allocations are aligned to the maximum native alignment by default, so as long as you don't need to over-align, you don't need to do anything in particular to get aligned memory.
If you do need over-alignment for some reason, you can use the aligned new syntax new (std::align_val_t(16)) float[numSteps]; (or std::aligned_alloc, which is in the malloc family of functions, so the memory must be freed rather than deleted).
If you don't have C++17, then you need to allocate size + align - 1 bytes instead of size, and std::align the pointer - or use a non-standard aligned allocation function provided on your target platform.
I have two ways of constructing a 2D array:
int arr[NUM_ROWS][NUM_COLS];
//...
tmp = arr[i][j]
and flattened array
int arr[NUM_ROWS*NUM_COLS];
//...
tmp = arr[i*NUM_COLS+j];
I am doing image processing so even a little improvement in access time is necessary. Which one is faster? I am thinking the first one since the second one needs calculation, but then the first one requires two addressing so I am not sure.
I don't think there is any performance difference. The system allocates the same amount of contiguous memory in both cases. The computation i*NUM_COLS+j happens either way: either you write it out for the 1D declaration, or the compiler generates it for the 2D case. The only real concern is ease of use.
You should trust your compiler's ability to optimize standard code.
You should also trust that modern CPUs have fast multiplication instructions.
Don't agonize over one form or the other!
Decades ago I optimized some code greatly by using pointers instead of 2D-array index calculations -- but this will a) only be useful if storing the pointer is an option, e.g. across a loop, and b) have low impact, since modern CPUs can presumably do the 2D address arithmetic very cheaply. Worth measuring! It may depend on the array size.
In any case, pointers with ptr++ or ptr += NUM_COLS will surely be a little bit faster, where applicable.
The first method will almost always be faster. In general (because there are always corner cases), processor and memory architectures as well as compilers may have optimizations built in to aid with 2D arrays or other similar data structures. For example, GPUs are optimized for matrix (2D array) math.
So, again in general, I would allow the compiler and hardware to optimize your memory and address arithmetic if possible.
...also, I agree with @Paul R: there are much bigger performance considerations than your array allocation and address arithmetic.
There are two cases to consider: compile time definition and run-time definition of the array size. There is big difference in performance.
Static allocation, global or file scope, fixed size array:
The compiler knows the size of the array and tells the linker to allocate space in the data / memory section. This is the fastest method.
Example:
#define ROWS 5
#define COLUMNS 6
int array[ROWS][COLUMNS];
int buffer[ROWS * COLUMNS];
Run time allocation, function local scope, fixed size array:
The compiler knows the size of the array, and tells the code to allocate space in the local memory (a.k.a. stack) for the array. In general, this means adding a value to a stack register. Usually one or two instructions.
Example:
void my_function(void)
{
unsigned short my_array[ROWS][COLUMNS];
unsigned short buffer[ROWS * COLUMNS];
}
Run Time allocation, dynamic memory, fixed size array:
Again, the compiler has already calculated the amount of memory required, since the array was declared with a fixed size. The compiler emits code to call the memory allocation function with the required amount (usually passed as a parameter). This is a little slower because of the function call and the overhead of the allocator finding a suitable block of dynamic memory.
Example:
void another_function(void)
{
unsigned char * array = new unsigned char [ROWS * COLUMNS];
//...
delete[] array;
}
Run Time allocation, dynamic memory, variable size:
Regardless of the dimensions of the array, the compiler must emit code to calculate the amount of memory to allocate. This quantity is then passed to the memory allocation function. A little slower than above because of the code required to calculate the size.
Example:
int * create_board(unsigned int rows, unsigned int columns)
{
int * board = new int [rows * columns];
return board;
}
Since your goal is image processing, I would assume your images are too large for static arrays. The real question, then, is about dynamically allocated arrays.
In C/C++ there are multiple ways to allocate a dynamic 2D array (see "How do I work with dynamic multi-dimensional arrays in C?"). To make this work in both C and C++, we can use malloc with a cast (in C++ only, you can use new).
Method 1:
int** arr1 = (int**)malloc(NUM_ROWS * sizeof(int*));
for(int i=0; i<NUM_ROWS; i++)
arr1[i] = (int*)malloc(NUM_COLS * sizeof(int));
Method 2:
int** arr2 = (int**)malloc(NUM_ROWS * sizeof(int*));
int* arrflat = (int*)malloc(NUM_ROWS * NUM_COLS * sizeof(int));
for (int i = 0; i < NUM_ROWS; i++)
arr2[i] = arrflat + (i*NUM_COLS);
Method 2 essentially creates a contiguous 2D array: i.e. arrflat[NUM_COLS*i+j] and arr2[i][j] should have identical performance. However, arrflat[NUM_COLS*i+j] and arr1[i][j] from method 1 should not be expected to have identical performance, since arr1's rows are not contiguous. Method 1, however, seems to be the method most commonly used for dynamic arrays.
In general, I use arrflat[NUM_COLS*i+j] so I don't have to think of how to allocated dynamic 2D arrays.
I am receiving the error "User store segfault # 0x000000007feff598" for a large convolution operation.
I have defined the resultant array as
int t3_isize = 0;
int t3_irowcount = 0;
t3_irowcount=atoi(argv[2]);
t3_isize = atoi(argv[3]);
int iarray_size = t3_isize*t3_irowcount;
uint64_t t_result[iarray_size];
I noticed that if the array size is less than 2^16 - 1, the operation doesn't fail, but for the array size 2^16 or higher, I get the segfault error.
Any idea why this is happening? And how can I rectify it?
“I noticed that if the array size is greater than 2^16 - 1, the operation doesn't fail, but for the array size 2^16 or higher, I get the segfault error”
↑ Seems a bit self-contradictory.
But probably you're just allocating a too large array on the stack. Using dynamic memory allocation (e.g., just switch to using std::vector) you avoid that problem. For example:
std::vector<uint64_t> t_result(iarray_size);
In passing, I would ditch the Hungarian-notation-like prefixes. For example, t_ reads as if it denotes a type. The time for Hungarian notation was the late 1980's, and its purpose was to support Microsoft's Programmer's Workbench, a product discontinued long ago.
You're probably declaring too large of an array for the stack. 2^16 elements of 8 bytes each is quite a lot (512 KB).
If you just need static allocation, move the array to file scope.
Otherwise, consider using std::vector, which will allocate storage from the heap and manage it for you.
Using malloc() solved the issue.
uint64_t* t_result = (uint64_t*) malloc(sizeof(uint64_t)*iarray_size);
/* ... */
free(t_result); /* release when done */
I'm using Visual Studio 2010 Win 8. I have a class where I'm making a 2D array to hold game data for a game.
Create a blank console app and make main.cpp and add this code. Using 360 for MAP_SIZE causes stack overflow using 359 doesn't. Why would this be? I'm looking to have a much larger size array. I'd like something like 2000 - 10,000 ideally.
#define MAP_SIZE 360
typedef unsigned short ushort;
typedef unsigned long ulong;
struct Tile
{
    ushort baseLayerTileID;
    ulong ownerID;
};

class Server
{
private:
    Tile _map[MAP_SIZE][MAP_SIZE];
};

int main()
{
    Server s;
    return 0;
}
My estimates put sizeof(Tile) at 8 or more. That means sizeof(Server) is at least 360*360*8 = 1036800, which is 0.99 MB. The stack is usually small, and 1MB is a common default size. You should allocate the tiles on the heap instead, perhaps using std::vector.
class Server
{
public:
    Server() : _map(MAP_SIZE * MAP_SIZE) {}
private:
    std::vector<Tile> _map; // position [i][j] is at [i*MAP_SIZE+j]
};
You're allocating an array of 360 × 360 Tile objects on the stack. This is a bad idea from the get-go: you are allocating a very large block of memory on the stack, and the stack isn't intended for this type of usage.
This memory should either be static, if you only need one instance and know in advance the size, or you should allocate it from the heap (using new or even malloc()).
Consider having the constructor for Server allocate the memory using new instead of doing it how you are doing it.
The stack has limited size. If you need to hold a big array, use dynamic allocation.
You've created a type which requires ~1MB of stack space per instance, which apparently is larger than your stack can accommodate.
The portable option is to change from a fixed array to dynamically allocated or to a vector type.
The non-portable option is to increase the stack size in your application (which in turn increases the size of the stack for all threads)
The default stack size is 1 MB. Your struct size is ushort (2 bytes) + ulong (4 bytes) = 6 bytes, which the compiler pads to 8 bytes for struct alignment.
So 8 × 360 × 360 = 1,036,800 bytes: just under the 1 MB (1,048,576-byte) limit by itself, but enough to overflow once the rest of the stack usage is added.
There are 3 solutions:
1 - Disable the padding:
#pragma pack(push) /* push current alignment to stack */
#pragma pack(1) /* set alignment to 1 byte boundary */
struct Tile
{
ushort baseLayerTileID;
ulong ownerID;
};
#pragma pack(pop) /* restore original alignment from stack */
This allows for a maximum MAP_SIZE = sqrt(1024*1024/6) ≈ 418, so it permits a somewhat larger map, but not the size you want.
2 - You can change the Visual Studio settings to let the compiler and linker use more than 1 MB of stack:
You need to make it larger than the biggest map you need, which for 10,000 × 10,000 is about 8 × 10000 × 10000 ≈ 800 MB.
right click project, and choose properties from the menu .
go to configuration properties->C/C++-> Commandline, add this parameter:
/F801000000
go to Configuration properties->Linker->Commandline, add this parameter
/STACK:801000000
Done!
3 - The third solution, as everyone has said, is a dynamically allocated array on the heap instead of a static member array.
I'm having an issue with (specifically the MSFT VS 10.0 implementation of) std::unique_ptrs. When I create a std::list of them, I use twice as much memory as when I create a std::list of just the underlying object (note: this is a big object -- ~200 bytes, so it's not just an extra reference counter lying around).
In other words, if I run:
std::list<MyObj> X;
X.resize( 1000, MyObj());
my application will require half as much memory as when I run:
std::list<std::unique_ptr<MyObj>> X;
for ( int i=0; i<1000; i++ ) X.push_back(std::unique_ptr<MyObj>(new MyObj()));
I've checked out the MSFT implementation and I don't see anything obvious -- any one encountered this and have any ideas?
EDIT: Ok, to be a bit more clear/specific. This is clearly a Windows memory usage issue and I am obviously missing something. I have now tried the following:
Create a std::list of 100000 MyObj
Create a std::list of 100000 MyObj*
Create a std::list of 100000 int*
Create a std::list of 50000 int*
In each case, each additional member of the list, whether a pointer or otherwise, bloats my application by 4400(!) bytes. This is a release, 64-bit build, without any debugging information included (Linker > Debugging > Generate Debug Info set to No).
I obviously need to research this a bit more to narrow it down to a smaller test case.
For those interested, I am determining application size using Process Explorer.
Turns out it was entirely heap fragmentation. How ridiculous. 4400 bytes per 8 byte object! I switched to pre-allocating and the problem went away entirely -- I am used to some inefficiency in relying on per-object allocation, but this was just ridiculous.
MyObj implementation below:
class MyObj
{
public:
    MyObj() { memset(this, 0, sizeof(MyObj)); }
    double m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8,
           m_9, m_10, m_11, m_12, m_13, m_14, m_15, m_16,
           m_17, m_18, m_19, m_20, m_21, m_22, m_23;
    CUnit* m_UnitPtr;
    CUnitPos* m_UnitPosPtr;
};
The added memory is likely from heap inefficiencies: you pay extra for each block you allocate, due to allocator metadata and internal fragmentation. You're performing twice the number of allocations, which incurs that penalty twice.
For instance, this:
for(int i = 0; i < 100; ++i) {
    new int;
}
will use more memory than this:
new int[100];
Even though the amount allocated is the same.
Edit:
I'm getting around 13% more memory used with unique_ptr, using GCC on Linux.
std::list<MyObj> contains N copies of your object (+ the information needed for the pointers of the list).
std::unique_ptr<MyObj> contains a pointer to an instance of your object (it should only hold a MyObj*).
So a std::list<std::unique_ptr<MyObj>> is not directly equivalent to your first list. std::list<MyObj*> should give the same size as the std::unique_ptr list.
After verifying the implementation, the only thing that could be embedded next to the pointer to the object itself is the 'deleter', which in the default case is an empty object that calls operator delete.
Do you have a Debug or a Release build?
This isn't an answer, but it doesn't fit in a comment and it might be illustrative.
I cannot reproduce the claim (GCC 4.6.2). Take this code:
#include <memory>
#include <list>
struct Foo { char p[200]; };
int main()
{
    //std::list<Foo> l1(100);
    std::list<std::unique_ptr<Foo>> l2;
    for (unsigned int i = 0; i != 100; ++i) l2.emplace_back(new Foo);
}
Enabling only l1 produces (in Valgrind):
total heap usage: 100 allocs, 100 frees, 20,800 bytes allocated
Enabling only l2 and the loop gives:
total heap usage: 200 allocs, 200 frees, 21,200 bytes allocated
The smart pointers take up exactly 4 × 100 bytes.
In both cases, /usr/bin/time -v gives:
Maximum resident set size (kbytes): 3136
Furthermore, pmap shows in both cases: total 2996K. To confirm, I changed the object size to 20000 and the number of elements to 10000. Now the numbers are 198404K vs 198484K: exactly 80000B difference, 8B per unique pointer (presumably there is some 8B alignment going on in the list's allocator). Under the same changes, the "maximum resident set size" reported by time -v is now 162768 vs 164304.