C++ allocating large array on heap gives "out of memory exception" - c++

I am currently having a problem with declaring or filling a large array with data because I get a dialog box saying "Out of memory", originating from CMemoryException.
I am trying to create an array or vector (tried both) with around 50000 elements of an object, where sizeof(MyObjectClass) returns around 37000 bytes.
If I try to fill up a vector or a CArray element by element, I manage to insert somewhere near 16000 elements before getting the Out Of Memory exception. That should be close to 600 MB?
I have 8GB RAM on the machine and only 4GB are being used according to Windows Task Manager. So the amount of physical RAM should not impose a problem. I am running C++ MFC in Visual Studio 2010, 32-bit.
Also if I try to write
MyObjectClass* heaparray = new MyObjectClass[50000];
then I immediately get that very same Out of memory error, on that very row.
Any ideas?
Thank You in advance!
UPDATE:
I have also tried to simply create a TestStruct with the fields:
struct TestStruct
{
    long long field1;
    GUID field2;
    GUID field3;
    GUID field4;
    TCHAR field5[256];
    TCHAR field6[4];
    TCHAR field7[258];
    TCHAR field8[1026];
    TCHAR field9[258];
    TCHAR field10[16386];
    TCHAR field11[258];
};
TestStruct* heapArr = new TestStruct[50000];
Still the same... I get an "Out of Memory" exception when executing the last line of code.
Isn't one of the great things about the heap supposed to be that it is limited only by RAM (more or less) when handling big data? And yet... since it already crashes at 600 MB of allocated space, I can hardly call this big data either... or should I? :/

This is a fun one. Both Vectors and arrays are stored contiguously in memory as stated here.
You are not only looking for 1850000000 bytes (1.72295 gigabytes) in memory, but one unbroken chunk of memory that big. That will be hard to find. If you switch to a different data structure that does not do contiguous storage (say a linked list) then you may be able to store that much.
Note: that will also make each object just a bit bigger.
What would be best would be to see if there is any way to just buffer the objects; load only the ones you will update and load the others on the fly when you need them. I have my doubts that you are doing cpu operations on more than one at a time. If you do it right (with threading most likely) you won't even suffer any slows from reading/writing them.
More information about what you are working on would be helpful. There may even be a way to just have an array filled with a type identifier, if your object has fewer than 2,147,483,647 (the maximum value of a signed 32-bit int) variations. You could store an array of integers that the class could be generated from (a toHash and fromHash; the array itself would be 50000 * 4 bytes = 195.3 kilobytes), which may work for you too. Again, it depends on what you are working on.
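A rough, hypothetical sketch of that last idea: toHash/fromHash here are stand-in stubs you would have to implement for the real MyObjectClass; the point is that only 4-byte ids live in memory and one full object is materialised at a time.
#include <cstddef>
#include <vector>

struct MyObjectClass { int payload; /* the real class is ~37 KB */ };

int toHash(const MyObjectClass& obj) { return obj.payload; }                  // stub encoder
MyObjectClass fromHash(int id) { MyObjectClass o; o.payload = id; return o; } // stub decoder

int main()
{
    std::vector<int> ids(50000);                // 50000 * 4 bytes ~= 195 KB, not ~1.7 GB
    for (std::size_t i = 0; i < ids.size(); ++i)
    {
        MyObjectClass obj = fromHash(ids[i]);   // rebuild one object on demand
        // ... process obj, then let it go out of scope ...
        ids[i] = toHash(obj);
    }
    return 0;
}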

I will try to expand on #user1884803's answer:
Don't use a pointer to an array. Even Visual Studio 2010 has <vector>. But see next point.
Don't use a vector either... especially if you really want to read all your MyObjectClass objects into RAM. As the other answer said, even if you have 4 GB of physical RAM free, a 32-bit process only gets 2 GB of usable address space, and you almost certainly don't have 1.7 GB of it free in one contiguous block.
So, if you really, really, want to read all your objects in RAM (because the processing you want to do on them is non-linear, or needs many records at the same time in memory), use a std::list<MyObjectClass> or, if you need a "key" to access each record, use a std::map<KeyType, MyObjectClass>. BUT...
You really should try not reading 1.8Gbytes of objects to RAM. Even if you have that much RAM lying around unused, it's just not a good practice. If you can, read each object from the database, process it, and write it back to the database discarding the used object, not accumulating the whole thing in RAM. If you need and if it improves your speed, you can save part of it in a std::list, std::map, or even in a std::vector, and on demand refresh other parts of the objects from the database.
That way, your program would go from:
if( cmd.Open() ) {
    do {
        MyObjectClass obj = cmd.Read(); // whatever is needed to read the object from the db
        vectorOfObjects.push_back(obj); // or list, or map...
    } while( cmd.MoveNext() );
}

for( std::vector<MyObjectClass>::iterator p = vectorOfObjects.begin(), e = vectorOfObjects.end(); p != e; ++p ) {
    // process *p
}

for( std::vector<MyObjectClass>::iterator p = vectorOfObjects.begin(), e = vectorOfObjects.end(); p != e; ++p ) {
    cmd.Save(*p); // see reading above, but for saving...
}
to something like
if( cmd.Open() ) {
    do {
        MyObjectClass obj = cmd.Read();
        // JUST PROCESS obj here and go to next
        cmd.Save(obj); // or whatever
    } while( cmd.MoveNext() );
}

Related

Why does a pointer to a class take less memory (SRAM) than a "classic" variable

I have an Arduino Micro with 3 time-of-flight LIDAR micro sensors welded to it. In my code I was creating 3 global variables like this:
Adafruit_VL53L0X lox0 = Adafruit_VL53L0X();
Adafruit_VL53L0X lox1 = Adafruit_VL53L0X();
Adafruit_VL53L0X lox2 = Adafruit_VL53L0X();
And it took around ~80% of the memory.
Now I am creating my objects like this:
Adafruit_VL53L0X *lox_array[3] = {new Adafruit_VL53L0X(), new Adafruit_VL53L0X(), new Adafruit_VL53L0X()};
And it takes 30% of my entire program's memory.
I tried to look in the Arduino documentation but I didn't find anything that could help me.
I can understand that creating a "classic" object can fill the memory. But where is the memory zone located when the pointer is created?
You use the same amount of memory either way. (Actually, the second way uses a tiny bit more, because the pointers need to be stored as well.)
It's just that with the first way, the memory is already allocated statically from the start and part of the data size of your program, so your compiler can tell you about it, while with the second way, the memory is allocated at runtime dynamically (on the heap), so your compiler doesn't know about it up front.
I dare say that the second method is more dangerous, because consider the following scenario: Let's assume your other code and data already uses 90% of the memory at compile-time. If you use your first method, you will fail to upload the program because it would now use something like 150%, so you already know it won't work. But if you use your second method, your program will compile and upload just fine, but then crash when trying to allocate the extra memory at runtime.
(By the way, the compiler message is a bit incomplete. It should rather say "leaving 1750 bytes for local variables and dynamically allocated objects" or something along those lines).
You can check it yourself using this function which allows you to estimate the amount of free memory at runtime (by comparing the top of the heap with the bottom [physically, not logically] of the stack, the latter being achieved by looking at the address of a local variable which would have been allocated at the stack at that point):
int freeRam () {
    extern int __heap_start, *__brkval;
    int v;
    return (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
}
See also: https://playground.arduino.cc/Code/AvailableMemory/
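If it helps, one possible way to use it (assuming the Adafruit_VL53L0X library from the question is installed) is to print the estimate before and after a dynamic allocation, so you can watch the free heap shrink at runtime:
#include <Adafruit_VL53L0X.h>

int freeRam();   // the function shown above

void setup() {
    Serial.begin(9600);
    Serial.print(F("Free RAM at startup: "));
    Serial.println(freeRam());

    Adafruit_VL53L0X* lox = new Adafruit_VL53L0X();   // heap allocation happens here, at runtime
    (void)lox;                                        // deliberately kept alive for the demo

    Serial.print(F("Free RAM after new: "));
    Serial.println(freeRam());
}

void loop() {}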

Crash when infinitely expanding a deque instead of a vector

when running this code:
#include <iostream>
#include <vector>
#include <deque>

template< typename C >
void fillToMax( C & collection, typename C::value_type value )
{
    try
    {
        while( true )
            collection.push_back( value );
    }
    catch( std::bad_alloc const& )
    {
        std::cout << "bad alloc with size " << collection.size() << std::endl;
    }
    return;
}

void fillVector()
{
    std::vector<long> vecL;
    fillToMax( vecL, 123 );
}

void fillDeque()
{
    std::deque<long> deqL;
    fillToMax( deqL, 123 );
}

int main()
{
    fillVector();
    fillDeque();
}
I get an expected bad_alloc error, therefore that is easy to try/catch.
The problem is when I substitute vector with deque: in that case my machine just crashes... black screen, reboot, and when it is up again it claims: you had an unexpected problem!
I would like to use deque instead of vector to store a larger amount of items without the issue of contiguous space. This will enable me to store more data but I cannot afford for my application to crash and would like to know how I can get this to bad_alloc instead.
Is this possible?
My tests use MinGW-W64 - gcc version 4.8.2 (x86_64-posix-seh-rev4) on win8.1
You don't say what system you're using, so it's hard to say, but some systems "overcommit", which basically makes a conforming implementation of C++ (or even C) impossible; the system will say that there is memory available when there isn't, and crash when you try to use it. Linux is the most widely documented culprit here, but you can reconfigure it to work correctly.
The reason you get bad_alloc with vector is because vector allocates much larger chunks. And even with overcommit, the system will refuse to allocate memory if the chunk is too big. Also, many mallocs will use a different allocation strategy for very large chunks; IIRC, the malloc in Linux switches to using mmap beyond a certain size, and the system may refuse a mmap even when an sbrk would have succeeded.
The fast answer of why vector might crash and not deque is that because vector uses a contiguous buffer you'll bad_alloc "quicker". And also on a request that asks for a large chunk.
Why? Because it is less likely that you will be able to allocate a contiguous buffer than a smaller one.
vector will allocate a certain amount and then try a big "realloc" for a bigger buffer. It might be possible to extend the current memory space but it might not, and may need to find a whole new chunk of memory.
Let's say it looks to expand by a factor of 1.5. So you currently have 40% of the memory available in your vector in use and it needs to find 60% of the memory available but cannot do it at the current location. Well that takes you to the limit so it fails with bad_alloc but in reality you are only using 40% of the memory.
So in reality there is memory available and those operating systems that use "optimistic" memory allocation will not accidentally over-allocate for you. You've asked for a lot and it couldn't give it to you. (They are not always totally optimistic).
deque on the other hand asks for a chunk at a time. You will really use up your memory and as a result it's better to use for large collections, however it has the downside that when you run out of memory you really do run out. And your lovely optimistic memory allocator cannot handle it and your process dies. (It kills something to make more memory. Sadly it was yours).
Now for your solution of how to avoid it happening? Your answer might be a custom allocator, i.e. the 2nd parameter of deque, which could check the real system memory available and refuse to allocate if you have hit a certain threshold.
Of course it is system dependent but you could have different versions for different machines.
You could also set your own arbitrary "limit", of course.
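For what it's worth, here is a rough, untested sketch of that capped-allocator idea, written C++03-style so it also works with older toolchains. The 512 MB budget is arbitrary, and the byte counter is kept per instantiated type, so treat it as a starting point rather than a finished solution:
#include <cstddef>
#include <deque>
#include <memory>
#include <new>

template <typename T>
class CappedAllocator : public std::allocator<T>
{
public:
    typedef std::size_t size_type;
    typedef T*          pointer;

    template <typename U> struct rebind { typedef CappedAllocator<U> other; };

    CappedAllocator() {}
    template <typename U> CappedAllocator(const CappedAllocator<U>&) {}

    pointer allocate(size_type n, const void* /*hint*/ = 0)
    {
        if (bytesInUse() + n * sizeof(T) > budgetBytes())
            throw std::bad_alloc();              // refuse cleanly instead of letting overcommit bite later
        bytesInUse() += n * sizeof(T);
        return static_cast<pointer>(::operator new(n * sizeof(T)));
    }
    void deallocate(pointer p, size_type n)
    {
        bytesInUse() -= n * sizeof(T);
        ::operator delete(p);
    }

private:
    // note: each rebound instantiation gets its own counter; good enough for a demonstration
    static size_type& bytesInUse() { static size_type used = 0; return used; }
    static size_type  budgetBytes() { return 512u * 1024u * 1024u; }   // arbitrary 512 MB cap
};

template <typename T, typename U>
bool operator==(const CappedAllocator<T>&, const CappedAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CappedAllocator<T>&, const CappedAllocator<U>&) { return false; }

// usage: std::deque<long, CappedAllocator<long> > deqL;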
Assuming your system is Linux, you might be able to turn overcommit off with
'echo 2 > /proc/sys/vm/overcommit_memory'
You would need root (admin) permissions to do that. (Or get someone who has it to configure it that way).
Otherwise, other ways to examine the memory usage are available in the Linux manuals, usually referred to in /proc.
If your system isn't Linux but another that over-commits, you'll have to look up how you can by-pass it by writing your own memory manager. Otherwise take the simpler option of an arbitrary configurable maximum size.
Remember that with deque your allocator will only be invoked when you need to allocate a new "chunk" and not for every push_back.
Xarylem, just attempting to answer "how can I prevent this" here...
you know something that throws bad_alloc - std::vector.
you know something that crashes... std::deque.
So one way would be to create a new vector of size X, if that succeeds, clear the vector and push back X more into the deque. If it doesn't, you know you're walking into a quagmire. Something like:
std::vector<int> testVector;
testVector.reserve(1);
std::deque<int> actualDequeToFill;
for (size_t i = 0; ; ++i)
{
    // test first: if even a small extra reservation fails, assume memory is nearly exhausted
    bool haveSpace = true;
    try { testVector.reserve(2); } catch (...) { haveSpace = false; }
    testVector.reserve(1);
    if (!haveSpace) throw std::bad_alloc();     // bad_alloc takes no message string
    actualDequeToFill.push_back(something);     // 'something' is whatever you are storing
}
This isn't anywhere close to foolproof... so please use it as a possible idea for a workaround rather than as an implementation.
Now that that is aside, my best guess would be... your compiler is not compliant ... as I've mentioned in a comment, C++ requires deque::push_back to throw bad_alloc. If you can, move away from that compiler (this is basic stuff to get right)

c++ Alternative implementation to avoid shifting between RAM and SWAP memory

I have a program that uses dynamic programming to calculate some information. The problem is that, theoretically, the used memory grows exponentially. Some filters that I use limit this space, but for a big input they still can't prevent my program from running out of RAM.
The program is running on 4 threads. When I run it with a really big input I noticed, that at some point the program starts to use the swap memory, because my RAM is not big enough. The consequence of this is, that my CPU-usage decreases from about 380% to 15% or lower.
There is only one variable that uses the memory which is the following datastructure:
Edit (added type) with CLN library:
class My_Map {
    typedef std::pair<double, short> key;
    typedef cln::cl_I value;
public:
    tbb::concurrent_hash_map<key, value>* map;

    My_Map() { map = new tbb::concurrent_hash_map<key, value>(); } // element type matches the member declaration
    ~My_Map() { delete map; }

    // some functions for operations on the map
};
In my main program I am using this data structure as a global variable:
My_Map* container = new My_Map();
Question:
Is there a way to avoid this shifting of memory between swap and RAM? I thought pushing all the memory to the heap would help, but it seems not to. So I don't know if it is possible to maybe fully use the swap memory or something else. This shifting of memory alone costs a lot of time, and the CPU usage decreases dramatically.
If you have 1 GB of RAM and a program that uses up 2 GB, then you're going to have to find somewhere else to store the excess data... obviously. The default OS way is to swap, but the alternative is to manage your own 'swapping' by using a memory-mapped file.
You open a file and allocate a virtual memory block in it, then you bring pages of the file into RAM to work on. The OS manages this for you for the most part, but you should think about your memory usage so not to try to keep access to the same blocks while they're in memory if you can.
On Windows you use CreateFileMapping(), on Linux you use mmap(), on Mac you use mmap().
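For the mmap() side, a bare-bones sketch of the idea (error handling omitted, file name and size arbitrary): the region is backed by a file instead of anonymous heap memory, and the OS pages parts of it in and out on demand.
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    const std::size_t bytes = std::size_t(1) << 30;       // 1 GB backing store
    int fd = open("backing.bin", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, bytes);                                  // size the file

    // map the file into the address space; pages are brought into RAM on access
    void* base = mmap(0, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    char* data = static_cast<char*>(base);
    data[0] = 42;                                          // touching a page faults it in

    munmap(base, bytes);
    close(fd);
    return 0;
}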
The OS is working properly - it doesn't distinguish between stack and heap when swapping - it pages you whatever you seem not to be using and loads whatever you ask for.
There are a few things you could try:
consider whether myType can be made smaller - e.g. using int8_t or even width-appropriate bitfields instead of int, using pointers to pooled strings instead of worst-case-length character arrays, use offsets into arrays where they're smaller than pointers etc.. If you show us the type maybe we can suggest things.
think about your paging - if you have many objects on one memory page (likely 4k) they will need to stay in memory if any one of them is being used, so try to get objects that will be used around the same time onto the same memory page - this may involve hashing to small arrays of related myType objects, or even moving all your data into a packed array if possible (binary searching can be pretty quick anyway). Naively used hash tables tend to flay memory because similar objects are put in completely unrelated buckets.
serialisation/deserialisation with compression is a possibility: instead of letting the OS swap out full myType memory, you may be able to proactively serialise them into a more compact form then deserialise them only when needed
consider whether you need to process all the data simultaneously... if you can batch up the work in such a way that you get all "group A" out of the way using less memory then you can move on to "group B"
UPDATE now you've posted your actual data types...
Sadly, using short might not help much because sizeof key needs to be 16 anyway for alignment of the double; if you don't need the precision, you could consider float? Another option would be to create an array of separate maps...
tbb::concurrent_hash_map<double,value> map[65536];
You can then index to map[my_short][my_double]. It could be better or worse, but is easy to try so you might as well benchmark....
For cl_I a 2-minute dig suggests the data's stored in a union - presumably word is used for small values and one of the pointers when necessary... that looks like a pretty good design - hard to improve on.
If numbers tend to repeat a lot (a big if) you could experiment with e.g. keeping a registry of big cl_Is with a bi-directional mapping to packed integer ids which you'd store in My_Map::map - fussy though. To explain: say you get 987123498723489 - you push_back it onto a vector<cl_I>, then in a hash_map<cl_I, int> map 987123498723489 to that index (i.e. vector.size() - 1). Keep going as new numbers are encountered. You can always map from an int id back to a cl_I using direct indexing into the vector, and the other way is an O(1) amortised hash table lookup.
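A rough sketch of that registry, with long long standing in for cln::cl_I so the snippet stays self-contained, and std::map instead of a hash map for brevity (so lookups here are O(log n) rather than O(1) amortised):
#include <map>
#include <vector>

typedef long long BigValue;                     // stand-in for cln::cl_I

std::vector<BigValue> idToValue;                // id -> value, direct indexing
std::map<BigValue, int> valueToId;              // value -> id lookup

int intern(const BigValue& v)
{
    std::map<BigValue, int>::iterator it = valueToId.find(v);
    if (it != valueToId.end())
        return it->second;                      // seen before: reuse its id
    idToValue.push_back(v);
    int id = static_cast<int>(idToValue.size()) - 1;
    valueToId[v] = id;
    return id;
}

const BigValue& lookup(int id) { return idToValue[id]; }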

Big array C++ , vector no memory

I need a huge array in C++ to store some data. The thing that I am working on is related to DNA sequencing. I am using Visual Studio 2013.
Firstly, I've tried with a global static variable like
static oligo SPECTRUM[C1][C2]
where the oligo structure contains eight integers, C1 is 100000 and C2 is 500.
But Visual Studio said that the array is too large. Then I asked Google, which said it's a good idea to use vectors, so I switched to those by replacing the code above with the code below:
static std::vector<std::vector<oligo>> SPECTRUM;
It was said that it is a nice thing to resize a vector before using it, so I did:
SPECTRUM.resize(C1);
for (int i = 0; i < C1; i++)
{
    SPECTRUM[i].resize(C2);
}
but now I am getting a runtime exception thrown during execution of the above code (the resizing):
An unhandled exception of type 'System.Runtime.InteropServices.SEHException' occurred in ConsoleApplication1.exe
in file xmemory0. Visual shows the exception is throwed here
else if (((size_t)(-1) / sizeof (_Ty) < _Count)
|| (_Ptr = ::operator new(_Count * sizeof (_Ty))) == 0)
_Xbad_alloc(); // report no memory
I want you to know also that I have 4 GB of RAM available on my computer, and I estimate that my program shouldn't use more than 1 GB of RAM.
Each oligo will consume 32 bytes. That means that if C1 is "around 100k", and C2 is bigger than about 600, the array will consume an entire 2 GB.
First, are you sure you need all that memory available in your heap (RAM)?
- You can do your calculations in chunks: allocate a chunk, work on it and free it.
- You can use a file to store all your data, and load chunks of the file for your calculations.
If you need many GB of memory, it's not a good idea to allocate it all at once on the heap; you never know whether there will be enough left.
I doubt there is a simple solution to this problem, given the values that you are dealing with, you will need more memory or at the very least more address space (this is "the addressable region of memory"). The easiest solution would be to go with an OS that is 64-bit - you may also need to get more RAM, but the first step is to allow the processor to address all the locations in the matrix - and with 32 bits, your limit for C2 becomes around 600, if C1 is 100k. And that assumes there are absolutely no other usage of memory - which unfortunately isn't typically true. The first few megabytes are reserved to catch "null pointer", and then the code and stack has to live somewhere. Ultimately, 100k x 500 seems unlikely to fit, even if the total size allows this much.
The other option is to use a "sparse array". Often when working with large matrices, there is a common value that is in "most places", and only some positions in the large matrix has a "different value". In these cases, you can use a method where you check if the data is present, and if so, use the value, otherwise use the default. You can use for example std::map as the storage container, and use the find method to see if the data is present.
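A minimal sketch of that sparse-array approach, assuming (since the exact layout wasn't posted) that oligo is just eight ints and that most cells hold a default value:
#include <map>
#include <utility>

struct oligo { int v[8]; };

typedef std::pair<int, int> Cell;                 // (row, column)

std::map<Cell, oligo> spectrum;                   // only non-default cells are stored
const oligo DEFAULT_OLIGO = { { 0, 0, 0, 0, 0, 0, 0, 0 } };

const oligo& at(int r, int c)
{
    std::map<Cell, oligo>::const_iterator it = spectrum.find(Cell(r, c));
    return it != spectrum.end() ? it->second : DEFAULT_OLIGO;
}

void set(int r, int c, const oligo& value)
{
    spectrum[Cell(r, c)] = value;                 // inserts or overwrites
}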
I would suggest addressing the question another way.
Make a linked list (refer to the data structure concept) where each element of the array becomes a node; a pointer is sufficient for accessing the current node.
Yes, traversal functions have to be written for the linked list, but this will help you create such big arrays on the current target operating system instead of shifting to 64-bit.
You should try this:
static oligo* spectrum[C1];

for (int i = 0; i < C1; ++i)   // note: loop over the C1 rows, not C2
{
    spectrum[i] = new (std::nothrow) oligo[C2];   // nothrow new returns nullptr instead of throwing
    if (spectrum[i] == nullptr)
    {
        fprintf(stderr, "failed to allocate the array for i=%d.\n", i);
        fflush(stderr);
    }
}
This will tell you how much memory you are allowed to allocate and what your memory limit is.
There may be some linker option to control this limit...

Why is deleted memory unable to be reused

I am using C++ on Windows 7 with MSVC 9.0, and have also been able to test and reproduce on Windows XP SP3 with MSVC 9.0.
If I allocate 1 GB of 0.5 MB sized objects, when I delete them, everything is ok and behaves as expected. However if I allocate 1 GB of 0.25 MB sized objects when I delete them, the memory remains reserved (yellow in Address Space Monitor) and from then on will only be able to be used for allocations smaller than 0.25 MB.
This simple code will let you test both scenarios by changing which struct is typedef'd. After it has allocated and deleted the structs it will then allocate 1 GB of 1 MB char buffers to see if the char buffers will use the memory that the structs once occupied.
struct HalfMegStruct
{
    HalfMegStruct():m_Next(0){}

    /* return the number of objects needed to allocate one gig */
    static int getIterations(){ return 2048; }

    int m_Data[131071];
    HalfMegStruct* m_Next;
};

struct QuarterMegStruct
{
    QuarterMegStruct():m_Next(0){}

    /* return the number of objects needed to allocate one gig */
    static int getIterations(){ return 4096; }

    int m_Data[65535];
    QuarterMegStruct* m_Next;
};

// which struct to use
typedef QuarterMegStruct UseType;

int main()
{
    UseType* first = new UseType;
    UseType* current = first;

    for ( int i = 0; i < UseType::getIterations(); ++i )
        current = current->m_Next = new UseType;

    while ( first->m_Next )
    {
        UseType* temp = first->m_Next;
        delete first;
        first = temp;
    }

    delete first;

    for ( unsigned int i = 0; i < 1024; ++i )
        // one meg buffer, i'm aware this is a leak but its for illustrative purposes.
        new char[ 1048576 ];

    return 0;
}
Below you can see my results from within Address Space Monitor. Let me stress that the only difference between these two end results is the size of the structs being allocated up to the 1 GB marker.
This seems like quite a serious problem to me, and one that many people could be suffering from and not even know it.
So is this by design or should this be considered a bug?
Can I make small deleted objects actually be free for use by larger allocations?
And more out of curiosity, does a Mac or a Linux machine suffer from the same problem?
I cannot positively state this is the case, but this does look like memory fragmentation (in one of its many forms). The allocator (malloc) might be keeping buckets of different sizes to enable fast allocation, after you release the memory, instead of directly giving it back to the OS it is keeping the buckets so that later allocations of the same size can be processed from the same memory. If this is the case, the memory would be available for further allocations of the same size.
This type of optimization is usually disabled for big objects, as it requires reserving memory even when it is not in use. If the threshold is somewhere between your two sizes, that would explain the behavior.
Note that while you might see this as weird, in most programs (not test, but real life) the memory usage patterns are repeated: if you asked for 100k blocks once, it more often than not is the case that you will do it again. And keeping the memory reserved can improve performance and actually reduce fragmentation that would come from all requests being granted from the same bucket.
You can, if you want to invest some time, learn how your allocator works by analyzing its behavior. Write some tests that acquire size X, release it, then acquire size Y and show the memory usage. Fix the value of X and play with Y. If the requests for both sizes are granted from the same buckets, you will not have reserved/unused memory (the image on the left), while when the sizes are granted from different buckets you will see the effect in the image on the right.
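A rough harness in that spirit (block sizes and counts are arbitrary; adjust X and Y and watch the process in Address Space Monitor or a similar tool):
#include <cstddef>
#include <cstdio>
#include <new>
#include <vector>

static void allocateAndFree(std::size_t blockSize, std::size_t count)
{
    std::vector<char*> blocks;
    for (std::size_t i = 0; i < count; ++i)
        blocks.push_back(new char[blockSize]);
    for (std::size_t i = 0; i < count; ++i)
        delete[] blocks[i];
}

int main()
{
    const std::size_t X = 256 * 1024;        // quarter-meg blocks, ~1 GB total
    const std::size_t Y = 1024 * 1024;       // one-meg blocks, ~1 GB total

    allocateAndFree(X, 4096);                // allocate size X, then release it all

    std::size_t succeeded = 0;
    try
    {
        std::vector<char*> kept;             // these blocks are intentionally never freed
        for (std::size_t i = 0; i < 1024; ++i, ++succeeded)
            kept.push_back(new char[Y]);
        std::printf("all %u large blocks allocated\n", unsigned(succeeded));
        std::printf("inspect the address space now, then press Enter\n");
        std::getchar();
    }
    catch (std::bad_alloc&)
    {
        std::printf("bad_alloc after %u large blocks\n", unsigned(succeeded));
    }
    return 0;
}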
I don't usually code for windows, and I don't even have Windows 7, so I cannot positively state that this is the case, but it does look like it.
I can confirm the same behaviour with g++ 4.4.0 under Windows 7, so it's not in the compiler. In fact, the program fails when getIterations() returns 3590 or more -- do you get the same cutoff? This looks like a bug in Windows system memory allocation. It's all very well for knowledgeable souls to talk about memory fragmentation, but everything got deleted here, so the observed behaviour definitely shouldn't happen.
Using your code I performed your test and got the same result. I suspect that David Rodríguez is right in this case.
I ran the test and had the same result as you. It seems there might be this "bucket" behaviour going on.
I tried two different tests too. Instead of allocating 1 GB of data using 1 MB buffers, I allocated the same way as the memory was first allocated after deleting. In the second test I allocated the half-meg buffers, cleaned up, then allocated the quarter-meg buffers, adding up to 512 MB for each. Both tests had the same memory result in the end: only 512 MB is allocated and there is no large chunk of reserved memory.
As David mentions, most applications tend to make allocation of the same size. One can see quite clearly why this could be a problem though.
Perhaps the solution to this is that if you are allocating many smaller objects in this way you would be better to allocate a large block of memory and manage it yourself. Then when you're done free the large block.
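A minimal, illustration-only sketch of that "one big block, manage it yourself" idea: a pool that hands out fixed-size slots from a single allocation and returns everything to the OS at once when destroyed (not thread-safe, not copy-safe; slot size and count are arbitrary in the usage line):
#include <cstddef>
#include <new>
#include <vector>

class FixedPool
{
public:
    FixedPool(std::size_t slotSize, std::size_t slotCount)
        : m_slotSize(slotSize),
          m_block(new char[slotSize * slotCount])
    {
        for (std::size_t i = 0; i < slotCount; ++i)
            m_free.push_back(m_block + i * slotSize);
    }
    ~FixedPool() { delete[] m_block; }          // everything released in one go

    void* allocate()
    {
        if (m_free.empty()) throw std::bad_alloc();
        void* p = m_free.back();
        m_free.pop_back();
        return p;
    }
    void release(void* p) { m_free.push_back(static_cast<char*>(p)); }

private:
    std::size_t m_slotSize;
    char* m_block;
    std::vector<char*> m_free;                  // stack of free slots
};

// usage: FixedPool pool(262144, 4096); void* slot = pool.allocate(); ... pool.release(slot);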
I spoke with some authorities on the subject (Greg, if you're out there, say hi ;D) and can confirm that what David is saying is basically right.
As the heap grows in the first pass of allocating ~0.25MB objects, the heap is reserving and committing memory. As the heap shrinks in the delete pass, it decommits at some pace but does not necessarily release the virtual address ranges it reserved in the allocation pass. In the last allocation pass, the 1MB allocations are bypassing the heap due to their size and thus begin to compete with the heap for VA.
Note that the heap is reserving the VA, not keeping it committed. VirtualAlloc and VirtualFree can help explain the difference if you're curious. This fact doesn't solve the problem you ran into, which is that the process ran out of virtual address space.
This is a side-effect of the Low-Fragmentation Heap.
http://msdn.microsoft.com/en-us/library/aa366750(v=vs.85).aspx
You should try disabling it to see if that helps. Run against both GetProcessHeap and the CRT heap (and any other heaps you may have created).