I'm using Visual Studio 2010 on Windows 8. I have a class in which I'm making a 2D array to hold game data for a game.
Create a blank console app, add a main.cpp, and add this code. Using 360 for MAP_SIZE causes a stack overflow; using 359 doesn't. Why would that be? I'm looking to have a much larger array, ideally something like 2000 - 10,000 per side.
#define MAP_SIZE 360

typedef unsigned short ushort;
typedef unsigned long ulong;

struct Tile
{
    ushort baseLayerTileID;
    ulong ownerID;
};

class Server
{
private:
    Tile _map[MAP_SIZE][MAP_SIZE];
};

int main()
{
    Server s;
    return 0;
}
My estimates put sizeof(Tile) at 8 or more. That means sizeof(Server) is at least 360*360*8 = 1,036,800 bytes, which is about 0.99 MB. The stack is usually small, and 1 MB is a common default size. You should allocate the tiles on the heap instead, perhaps using std::vector.
#include <vector>

class Server
{
public:
    Server() : _map(MAP_SIZE * MAP_SIZE) {}
private:
    std::vector<Tile> _map; // element (i, j) is at _map[i * MAP_SIZE + j]
};
You're allocating an array of 360 x 360 Tile objects on the stack. This is a bad idea from the get-go: you are allocating a very large block of memory on the stack, and the stack isn't intended for this type of usage.
This memory should either be static, if you only need one instance and know the size in advance, or you should allocate it from the heap (using new or even malloc()).
Consider having the constructor for Server allocate the memory using new instead of doing it the way you are now, as sketched below.
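For example, a minimal sketch of that idea, assuming the Tile and MAP_SIZE definitions from the question (raw new[]/delete[] plus C++03-style copy suppression, since VS2010 predates full C++11):

class Server
{
public:
    Server() : _map(new Tile[MAP_SIZE * MAP_SIZE]) {}   // heap, not stack
    ~Server() { delete[] _map; }
private:
    Server(const Server&);              // non-copyable: a copy would double-delete
    Server& operator=(const Server&);
    Tile* _map;                         // tile (i, j) lives at _map[i * MAP_SIZE + j]
};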
The stack has limited size. If you need to hold a big array, use dynamic allocation.
You've created a type which requires ~1MB of stack space per instance, which apparently is larger than your stack can accommodate.
The portable option is to change from a fixed array to a dynamically allocated array or a vector type.
The non-portable option is to increase the stack size of your application (which in turn increases the stack size for all threads).
The default stack size is 1 MB. Your struct size is ushort (2 bytes) + ulong (4 bytes) = 6 bytes, which the compiler pads to 8 bytes for struct alignment.
So the map takes 8 * 360 * 360 = 1,036,800 bytes -- within a dozen kilobytes of the 1 MB (1,048,576-byte) limit, so together with everything else on the stack it tips over.
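A quick, hypothetical snippet to check those numbers yourself (using the question's types; the exact sizes assume a Windows compiler where ushort is 2 bytes and ulong is 4):

#include <cstdio>

typedef unsigned short ushort;
typedef unsigned long ulong;

struct Tile
{
    ushort baseLayerTileID;   // 2 bytes
    ulong ownerID;            // 4 bytes; 2 bytes of padding keep it 4-byte aligned
};

int main()
{
    std::printf("sizeof(Tile) = %u\n", (unsigned)sizeof(Tile));               // 8
    std::printf("map bytes    = %u\n", (unsigned)(sizeof(Tile) * 360 * 360)); // 1036800
    return 0;
}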
There are 3 solutions:
1- Force the compiler to stop padding the struct:
#pragma pack(push) /* push current alignment to stack */
#pragma pack(1)    /* set alignment to 1 byte boundary */

struct Tile
{
    ushort baseLayerTileID;
    ulong ownerID;
};

#pragma pack(pop)  /* restore original alignment from stack */
This allows a maximum MAP_SIZE = sqrt(1024*1024 / 6) = 418, so it permits a somewhat larger map, but not the size you want.
2- You can change the Visual Studio settings to allow the compiler and linker to use more than 1 MB of stack.
You need to make the stack larger than the biggest map you need, which is 8 * 10,000 * 10,000 bytes, i.e. about 800 MB:
Right-click the project and choose Properties from the menu.
Go to Configuration Properties -> C/C++ -> Command Line and add this parameter:
/F801000000
Go to Configuration Properties -> Linker -> Command Line and add this parameter:
/STACK:801000000
Done!
3- The third solution is a dynamically allocated array on the heap instead of a fixed member array, as everyone has said.
There are many great threads on how to align structs to the cache line (e.g., Aligning to cache line and knowing the cache line size).
Imagine you have a system with 256B cache line size, and a struct of size 17B (e.g., a tightly packed struct with two uint64_t and one uint8_t). If you align the struct to cache line size, you will have exactly one cache line load per struct instance.
For machines with a cache line size of 32B or maybe even 64B, this will be good for performance, because we avoid fetching two cache lines, as we definitely do not cross cache-line boundaries.
However, on the 256B machine, this wastes lots of memory and results in unnecessary loads when iterating through an array/vector of this struct. In fact, you could store 15 instances of the struct in a single cache line.
My question is two-fold:
In C++17 and above, using alignas, I can align to cache line size. However, it is unclear to me how I can force alignment in a way that is similar to "put as many instances in a cache line as possible without crossing the cache line boundary, then start at the next cache line". So something like this:

[ x | x | x | ... | pad ][ x | x | x | ... | pad ]

where each bracketed box is a cache line and each x is an instance of our small struct.
Do I actually want this? I cannot really wrap my head around this. Usually, we say if we align our struct to the cache line size, access will be faster, as we just have to load a single cache line. However, seeing my example, I wonder if this is actually true. Wouldn't it be faster to not be aligned, but instead store many more instances in a single cache line?
Thank you so much for your input here. It is much appreciated.
To address (2), it is unclear whether the extra overhead of using packed structs (e.g., unaligned 64-bit accesses) and the extra math to access array elements will be worth it. But if you want to try it, you can create a new struct to pack your struct elements appropriately, then create a small wrapper class to access the elements like you would an array:
#include <array>
#include <cstddef>    // ptrdiff_t, size_t
#include <cstdint>    // uint8_t, uint64_t
#include <iostream>

using namespace std;

template <typename T, size_t BlockAlignment>
struct __attribute__((packed)) Packer
{
    static constexpr size_t NUM_ELEMS = BlockAlignment / sizeof(T);
    static_assert( NUM_ELEMS > 0, "BlockAlignment too small for one object." );

    T &operator[]( size_t index ) { return packed[index]; }

    T packed[ NUM_ELEMS ];
    uint8_t padding[ BlockAlignment - sizeof(T)*NUM_ELEMS ];
};

template <typename T, size_t NumElements, size_t BlockAlignment>
struct alignas(BlockAlignment) PackedAlignedArray
{
    typedef Packer<T, BlockAlignment> PackerType;
    std::array< PackerType, NumElements / PackerType::NUM_ELEMS + 1 > packers;

    T &operator[]( size_t index ) {
        return packers[ index / PackerType::NUM_ELEMS ][ index % PackerType::NUM_ELEMS ];
    }
};

struct __attribute__((packed)) Foo
{
    uint64_t a;
    uint64_t b;
    uint8_t c;
};

int main()
{
    static_assert( sizeof(Foo) == 17, "Struct not packed for test" );
    constexpr size_t NUM_ELEMENTS = 10;
    constexpr size_t BLOCK_ALIGNMENT = 64;
    PackedAlignedArray<Foo, NUM_ELEMENTS, BLOCK_ALIGNMENT> theArray;
    for ( size_t i=0; i<NUM_ELEMENTS; ++i )
    {
        // Display the memory offset between the current
        // element and the start of the array
        cout << reinterpret_cast<std::ptrdiff_t>(&theArray[i]) -
                reinterpret_cast<std::ptrdiff_t>(&theArray[0]) << std::endl;
    }
    return 0;
}
The output of the program shows the byte offsets of the 17-byte elements in memory, automatically resetting to a multiple of 64 every three elements:
0
17
34
64
81
98
128
145
162
192
You could pack into a cache line by declaring the struct itself as under-aligned, with GNU C __attribute__((packed)) or something, so it has sizeof(struct) = 17 instead of the usual padding you'd get to make the struct size a multiple of 8 bytes (the alignment it would normally have because of having a uint64_t member, assuming alignof(uint64_t) == 8).
Then put it in an alignas(256) T array[], so only the start of the array is aligned.
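A sketch of that layout (GNU-style packing assumed, since ISO C++ has no standard way to under-align a type whose members normally require 8-byte alignment; the names are made up):

#include <cstdint>

struct __attribute__((packed)) Small
{
    uint64_t a, b;
    uint8_t  c;
};
static_assert(sizeof(Small) == 17, "expected tight packing");

// Only the start of the array is 256-byte aligned; element i then sits
// at byte offset 17 * i and may straddle cache-line boundaries.
alignas(256) Small arr[32];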
Alignment to boundaries wider than the struct object itself is only possible in terms of a larger object containing multiple structs; ISO C++'s alignment system can't specify that an object can only go in containers which start at a certain alignment boundary; that would lead to nonsensical situations like T *arr being the start of a valid array, but arr + 1 not being a valid array, even though it has the same type.
Unless you're worried about false sharing between two threads, you should at most naturally-align your struct, e.g. to 32 bytes. A naturally-aligned object (alignof=sizeof) can't span an alignment boundary larger than itself. But that wastes a lot of space, so probably better to just let the compiler align it by 8, inheriting alignof(struct) = max (alignof(members)).
See also How do I organize members in a struct to waste the least space on alignment? for more detail on how space inside a single struct works.
Depending on how your data is accessed, one option would be to store the pairs of uint64_t in one array, and the corresponding bytes in a separate array. That way you have no padding bytes and everything is a power-of-2 size and alignment. But random access to all 3 members that go with each other could cost 2 cache misses instead of 1.
But for iterating over your data sequentially, that's excellent.
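A rough structure-of-arrays sketch of that suggestion (names invented for illustration):

#include <cstdint>
#include <vector>

struct Pair { uint64_t a, b; };       // 16 bytes: power-of-2 size, no padding

std::vector<Pair>    pairs;           // the two 8-byte members, tightly packed
std::vector<uint8_t> extras;          // extras[i] is the odd byte for pairs[i]

// Sequential scans touch both arrays linearly with zero padding waste;
// random access to all three members of one element may cost two cache
// misses instead of one.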
I have read the documentation for posix_memalign(). I'm still not sure how to deal with this requirement: "The value of alignment shall be a power of two multiple of sizeof(void *)."
Also, I need some error messages to check that my alignment succeeded.
I need to allocate memory aligned to 64 bytes for the following arrays, along with error checks:
int array_dataset [5430][20];
int X_train [4344][20];
int Y_train[4344];
int data_point [20];
int Y_test[1068];
int X_test [1068][20];
posix_memalign allocates aligned heap memory (similar to malloc), so it cannot be used with static or automatic arrays like those you show. Instead, your variables need to be pointers that you use to access the memory:
int *Y_train = 0;
if (posix_memalign((void **)&Y_train, 64, 4344 * sizeof(*Y_train))) {
    /* ... there was an error ... */
}
Note that your odd-sized 2D arrays may be a problem. You can declare
int (*array_dataset)[20] = 0;
if (posix_memalign((void **)&array_dataset, 64, 5430 * sizeof(*array_dataset))) {
but doing so will only align the first subarray -- array_dataset[0] will be aligned on a 64-byte boundary. Because sizeof(int[20]) is not a multiple of 64 (it is probably 80, but might be 40 or 160 on some machines), array_dataset[1] will not be aligned. You might want to use int (*array_dataset)[32]; instead to avoid this. Or swap the indexes and use int (*array_dataset)[5430] -- it all depends on what you are trying to do and why you want aligned memory in the first place.
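Putting the pieces together, a minimal sketch with the error checks you asked for (note that posix_memalign returns an errno value directly rather than setting errno):

#define _POSIX_C_SOURCE 200112L   /* expose posix_memalign in <stdlib.h> */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int *Y_train = NULL;
    int (*array_dataset)[20] = NULL;
    int err;

    /* 64 is a power-of-two multiple of sizeof(void *), as required */
    err = posix_memalign((void **)&Y_train, 64, 4344 * sizeof(*Y_train));
    if (err != 0) {
        fprintf(stderr, "Y_train: %s\n", strerror(err));
        return 1;
    }

    err = posix_memalign((void **)&array_dataset, 64, 5430 * sizeof(*array_dataset));
    if (err != 0) {
        fprintf(stderr, "array_dataset: %s\n", strerror(err));
        free(Y_train);
        return 1;
    }

    /* ... use the arrays ... */

    free(array_dataset);
    free(Y_train);
    return 0;
}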
When I run the following code I get a stack overflow. How is that possible? I thought that if I define an array outside of main it would be static and I wouldn't have memory problems. What can I do about it?
#include <stdio.h>

#define dim1 14001
#define dim2 14001
#define dim4 8

double large_array[dim1][dim2][dim4];

int main()
{
    large_array[30][6][5] = 1337;
    printf("%lf\t", large_array[30][6][5]);
}
One problem is that the place where the compiler wants to store the array very likely doesn't have room for that much memory -- in some environments, very large allocations must be done on the heap, e.g. with malloc, new, or other similar means (or implicitly through those means, e.g. as done internally by std::vector).
The 3-D global array you declared requires about 12.5 GB of static storage (14001 * 14001 * 8 doubles at 8 bytes each), reserved when the program loads, which is why your program fails.
One intuitive way to handle this problem is to use nested std::map containers instead of a multidimensional global array; a map stores only the elements you actually assign, so the huge dimensions never have to be allocated up front.
#include <stdio.h>
#include <map>

/*
#define dim1 14001
#define dim2 14001
#define dim4 8
double large_array[dim1][dim2][dim4];
*/

using namespace std;

map<double, map<double, map<double, double> > > large_array;

int main()
{
    large_array[30][6][5] = 1337;
    printf("%lf\t", large_array[30][6][5]);
}
The maximum size of a statically allocated array is determined by the amount of memory that a program can address. On a 32-bit system, the maximum amount of memory that can be addressed through a pointer is 2^32 bytes, which is 4 gigabytes. The actual limit may be less, depending on operating system and compiler implementation details.
As interjay said in a comment, your allocation requires far more space than that.
In C++, you should use the provided containers, like std::vector or the Boost multidimensional arrays, to handle such cases.
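As a sketch, a flat std::vector with manual 3-D indexing avoids both the static-storage limit and per-row allocations (make_large_array is a made-up helper; the data is still ~12.5 GB, so a 64-bit build with plenty of RAM is assumed):

#include <cstddef>
#include <vector>

std::vector<double> make_large_array()
{
    const std::size_t d1 = 14001, d2 = 14001, d4 = 8;
    return std::vector<double>(d1 * d2 * d4);   // heap, sized at runtime
}

// element [i][j][k] lives at index (i * d2 + j) * d4 + k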
I am reading the source code of OpenCV, an open-source computer vision library. I am confused by this function:
#define CV_MALLOC_ALIGN 16

void* fastMalloc( size_t size )
{
    uchar* udata = (uchar*)malloc(size + sizeof(void*) + CV_MALLOC_ALIGN);
    if(!udata)
        return OutOfMemoryError(size);
    uchar** adata = alignPtr((uchar**)udata + 1, CV_MALLOC_ALIGN);
    adata[-1] = udata;
    return adata;
}

/*!
  Aligns a pointer to the certain number of bytes.

  This small inline function aligns the pointer by the certain number
  of bytes by shifting it forward by 0 or a positive offset.
*/
template<typename _Tp> static inline _Tp* alignPtr(_Tp* ptr, int n=(int)sizeof(_Tp))
{
    return (_Tp*)(((size_t)ptr + n-1) & -n);
}
fastMalloc is used to allocate memory for a pointer; it invokes malloc and then alignPtr. I cannot understand why alignPtr is called after the memory is allocated. My basic understanding is that this makes it faster for the machine to find the pointer. Can any references on this issue be found on the internet? For modern computers, is this operation still necessary? Any ideas will be appreciated.
Some platforms require certain types of data to appear on certain byte boundaries (e.g., some compilers require pointers to be stored on 4-byte boundaries).
This is called alignment, and it calls for extra padding within, and possibly at the end of, the object's data.
The compiler might break if it doesn't find proper alignment, or reading the data could become a performance bottleneck (as two blocks would need to be read to fetch the same data).
Edited in response to comment:
Memory requests by a program are generally handled by a memory allocator. One such allocator is a fixed-size allocator, which returns chunks of a specified size even if the requested memory is less than that size. With that background, here is what's going on:
uchar* udata = (uchar*)malloc(size + sizeof(void*) + CV_MALLOC_ALIGN);
This allocates the requested size plus extra bytes: room to store the original pointer (sizeof(void*)) and slack (CV_MALLOC_ALIGN) so the returned pointer can be shifted forward to an aligned boundary.
uchar** adata = alignPtr((uchar**)udata + 1, CV_MALLOC_ALIGN);
This aligns the pointer to the specified boundary, as explained above.
It allocates a block a bit bigger than it was asked for.
Then it sets adata to the first properly aligned address past that extra slot (skip one pointer-width so there is room to stash the original pointer, then round up to the next aligned address; ((size_t)ptr + n-1) & -n rounds a pointer up to the next multiple of n when n is a power of two).
Then it stores the original pointer just before the new address. This is presumably used later to free the originally allocated block.
And then we return the new address.
This only makes sense if CV_MALLOC_ALIGN is a stricter alignment than malloc guarantees - perhaps a cache line?
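For completeness, the matching deallocation presumably looks like this (an illustrative sketch; OpenCV's real fastFree is essentially the same):

#include <cstdlib>

typedef unsigned char uchar;   // OpenCV's typedef

void fastFree(void* ptr)
{
    if (ptr)
    {
        // Recover the original malloc() pointer that fastMalloc stored
        // one slot before the aligned address, and free that.
        uchar* udata = ((uchar**)ptr)[-1];
        free(udata);
    }
}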
Has anyone encountered a maximum size for QList?
I have a QList of pointers to my objects and have found that it throws an error when it reaches the 268,435,455th item, which is exactly 2^28 - 1. I would have expected it to have at least a 31-bit maximum size (minus one bit because size() returns a signed integer), or a 63-bit maximum size on my 64-bit computer, but this doesn't appear to be the case. I have confirmed this in a minimal example by executing QList<void*> mylist; mylist.append(0); in a counting loop.
To restate the question, what is the actual maximum size of QList? If it's not actually 2^32-1 then why? Is there a workaround?
I'm running a Windows 64bit build of Qt 4.8.5 for MSVC2010.
While the other answers make a useful attempt at explaining the problem, none of them actually answers the question. Thanks to everyone for helping me track down the issue.
As Ali Mofrad mentioned, the error thrown is a std::bad_alloc error when the QList fails to allocate additional space in my QList::append(MyObject*) call. Here's where that happens in the Qt source code:
qlist.cpp: line 62:
static int grow(int size) // size = 268435456
{
    // this is the problem line
    volatile int x = qAllocMore(size * sizeof(void *), QListData::DataHeaderSize) / sizeof(void *);
    return x; // x = -2147483648
}
qlist.cpp: line 231:
void **QListData::append(int n) // n = 1
{
    Q_ASSERT(d->ref == 1);
    int e = d->end;
    if (e + n > d->alloc) {
        int b = d->begin;
        if (b - n >= 2 * d->alloc / 3) {
            // ...
        } else {
            realloc(grow(d->alloc + n)); // <-- grow() is called here
        }
    }
    d->end = e + n;
    return d->array + e;
}
In grow(), the new size requested (268,435,456) is multiplied by sizeof(void*) (8) to compute the size of the new block of memory needed to accommodate the growing QList. The problem is that 268,435,456 * 8 equals 2,147,483,648 as an unsigned int32, but -2,147,483,648 as a signed int32, and the latter is what grow() returns on my OS. Therefore, when std::realloc() is called in QListData::realloc(int), we're trying to grow to a negative size.
The workaround here, as ddriver suggested, is to use QList::reserve() to pre-allocate the space, preventing my QList from ever having to grow.
In short, the maximum size for QList is 2^28-1 items unless you pre-allocate, in which case the maximum size truly is 2^31-1 as expected.
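For illustration, the workaround looks roughly like this (a sketch against the pre-Qt-5.5 behavior described above; buildBigList and MyObject stand in for the real code):

#include <QList>

class MyObject;   // the question's type

void buildBigList()
{
    QList<MyObject*> list;
    // One up-front allocation: grow(), with its overflowing
    // size * sizeof(void*) computation, is never reached.
    list.reserve(300000000);   // > 2^28 elements, still < 2^31
    // ... append up to the reserved count ...
}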
Update (Jan 2020): This appears to have changed in Qt 5.5, such that 2^28-1 is now the maximum size allowed for QList and QVector, regardless of whether or not you reserve in advance. A shame.
The theoretical maximum positive number stored in an int is 2^31 - 1. The size of a pointer is 4 bytes (on a 32-bit machine), so the maximum possible number of pointers in one block is 2^29 - 1. Appending data to the container increases heap fragmentation, so it is possible that you can allocate only half of the theoretically available memory. Try using reserve() or resize() instead.
Moreover, Win32 has limits on memory allocation: an application compiled without special options cannot allocate more than this limit (1 GB or 2 GB).
Are you sure you need such huge containers? Might it be better to optimize the application?
QList stores its elements in a void * array.
Hence, a list with 2^28 items, each of which is a void *, will be 2^30 bytes long on a 32-bit machine, and 2^31 bytes on a 64-bit machine. I doubt you can request such a big chunk of contiguous memory.
And why allocate such a huge list anyhow? Are you sure you really need it?
The idea behind backing the list with an array of void * elements is that several operations on the list can be moved into non-templated code, reducing the amount of generated code.
QList stores items straight in the void * array if the type is small enough (i.e. sizeof(T) <= sizeof(void*)), and if the type can be moved in memory via memmove. Otherwise, each item will be allocated on the heap via new, and the array will store the pointers to those items. A set of type traits is used to figure out how to handle each type, see Q_DECLARE_TYPEINFO.
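For example, the movability trait is declared like this (a small sketch; Point2D is made up):

#include <QtGlobal>

struct Point2D { double x, y; };   // 16 bytes: larger than void* even on 64-bit

// Declaring the type movable lets Qt containers relocate it with
// memmove, but in a QList each item is still heap-allocated, because
// sizeof(Point2D) > sizeof(void*).
Q_DECLARE_TYPEINFO(Point2D, Q_MOVABLE_TYPE);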
While in theory this approach may sound attractive, in practice:
For all primitive types smaller than void * (char; int and float on 64 bit; etc.) you waste from 50 to 75% of the allocated space in the array
For all movable types bigger than void * (double on 32bit, QVariant, ...), you pay a heap allocation per each item in the list (plus the array itself)
QList code is generally less optimized than the QVector code.
Compilers these days do a pretty good job at merging template instantiations, so the original reason for this design is largely gone.
Today it's a much better idea to stick with QVector. Unfortunately the Qt APIs expose QList everywhere and we can't change them (and we would need C++11 to define QList as a template alias for QVector...)
I tested this on Ubuntu 32-bit with 4 GB RAM using Qt 4.8.6. The maximum size for me was 268,435,450.
I tested this on Windows 7 32-bit with 4 GB RAM using Qt 4.8.4. The maximum size for me was 134,217,722.
This error happened: 'std::bad_alloc'
#include <QCoreApplication>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    QList<bool> li;
    for(int i=0; ;i++)
    {
        li.append(true);
        if(i>268435449)
            qDebug()<<i;
    }
    return a.exec();
}
Output is:
268435450
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc