Allocate several GBs of memory for std::vector - c++

I need to acquire several GB of data from a sensor. When I tried to allocate a big array with malloc (10 GB or more; my system has 32 GB), it returned NULL. So I thought the problem could be solved with a linked list of iterators to vectors.
However, I don't know how to set this up. I tried declaring "list< vector::iterator >", but I can't allocate the memory for each vector (each one should have 1000~2000 elements). Do you know any way to do this, or maybe a better solution for this big memory allocation?

If you are using a 64-bit operating system, then malloc should be able to allocate the large size with no problem.
For example, this code runs on my Windows machine (64-bit Windows) and allocates 10 GB of RAM flawlessly:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    /* use an unsigned 64-bit type: a plain long is only 32 bits on 64-bit Windows */
    unsigned long long size = 10ULL * 1024 * 1024 * 1024;
    printf("size = %llu\n", size);

    char *x = (char *)malloc(size);
    if (x == NULL) {
        printf("malloc failed\n");
        return 1;
    }
    printf("x = %p\n", (void *)x);

    /* touch one byte per megabyte so the pages are actually committed */
    unsigned long long i;
    for (i = 0; i < size; i += 1024 * 1024) {
        x[i] = 'h';
    }
    printf("Done1\n");
    free(x);
    return 0;
}
However, if you have a 32-bit operating system, you'll be in trouble and won't be able to allocate beyond some limit (maybe 3 GB, but probably system dependent).
In that case, you'll need to write your data to a file instead.
However, if you're using a FAT filesystem, you can't write a file that big either. In that case, you'd have to split the data among several files, each under 2 GB in size.
You'd want to actually check the malloc result for NULL to make sure the allocation succeeded and the memory could be grabbed.

You will need to allocate this space under a 64-bit Windows OS. You will ALSO have to set the "large address aware" flag, otherwise you can only get 2 GB of RAM due to how the virtual memory system works on Windows.
You may want to look into using a memory-mapped file, as suggested by sehe in his answer, if you do not absolutely have to have one large 10 GB chunk of contiguous memory. If you have to build your application for 32-bit Windows, then that will be the only answer, as 32-bit Windows normally only allows a process 2 GB of memory, unless the "large address aware" flag is set, at which point it will allow 3 GB of memory usage.
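For reference, the flag being described is the /LARGEADDRESSAWARE linker option of the Microsoft toolchain. A minimal example of passing it when building from the command line follows (the source file name is only illustrative; the same option can be set in the Visual Studio project's linker settings):
cl /EHsc sensor_app.cpp /link /LARGEADDRESSAWARE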

When you have to deal with large blocks of memory, you are better off skipping malloc altogether and going directly to the operating system calls for memory allocation.
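On Windows, the operating system call in question is VirtualAlloc. The following is only a minimal sketch, not production code: a 64-bit build is assumed and error handling is abbreviated.
#include <windows.h>
#include <cstdio>

int main() {
    SIZE_T size = 10ull * 1024 * 1024 * 1024;   // 10 GB
    // Reserve and commit the whole range in one call, bypassing the CRT heap.
    void *block = VirtualAlloc(nullptr, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (block == nullptr) {
        std::printf("VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }
    // ... fill the buffer with sensor data ...
    VirtualFree(block, 0, MEM_RELEASE);
    return 0;
}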

I usually move to memory mapped files or shared memory maps for this kind of data volumes.
This way, you're not bound to the amount of physical (process) memory available at all. You can let the OS page in and out as required. Fragmentation becomes much less of an issue (unless you actually fragment the logical address space, which is quite hard to achieve on 64 bit architectures).
More information
I have quite a number of answers on SO that show examples of storing vectors and all manner of more complicated data structures in shared memory/mapped files. You might want to look for mapped_file_device (from Boost Iostreams) or managed_shared_memory and managed_mapped_file (from Boost Interprocess)
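As a rough sketch of the Boost.Interprocess approach (the file name, segment size and stored element type are illustrative, not taken from the question), a vector can be placed directly inside a file-backed managed segment:
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/allocators/allocator.hpp>

namespace bip = boost::interprocess;

typedef bip::allocator<double, bip::managed_mapped_file::segment_manager> file_alloc;
typedef bip::vector<double, file_alloc> file_vector;

int main() {
    // Create (or reopen) a 10 GB file-backed segment; the OS pages it in and out on demand.
    bip::managed_mapped_file segment(bip::open_or_create, "sensor_data.bin", 10ull << 30);

    // Construct (or look up) a vector that lives inside the mapped file.
    file_vector *data = segment.find_or_construct<file_vector>("samples")(
        segment.get_segment_manager());

    data->push_back(3.14);   // appended elements are persisted in the backing file
    return 0;
}
The vector's contents survive the process, and the resident set stays bounded by whatever the OS decides to keep paged in.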

Related

Growing memory during assignment of large array

When assigning values to a large array, the used memory keeps increasing even though no new memory is allocated. I am checking the used memory simply with the Task Manager (Windows) or the System Monitor (Ubuntu).
The problem is the same on both operating systems. I am using gcc 4.7 and 4.6 respectively.
This is my code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int i, j;
    int n = 40000000; /* array size */
    int s = 100;
    double *array;

    array = malloc(n * sizeof(double)); /* allocate array */
    if (array == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {     /* loop over the array; memory use grows during this loop */
        for (j = 0; j < s; j++) { /* inner loop only slows the program down */
            array[i] = 3.0;
        }
    }
    return 0;
}
I do not see any logical problem, but to my knowledge I do not exceed any system limits either. So my questions are:
can the problem be reproduced by others?
what is the reason for the growing memory?
how do I solve this issue?
When modern systems 'allocate' memory, the pages are not actually backed by physical RAM right away; you get a virtual memory allocation. As you write to those pages, physical pages are taken. So the virtual memory used increases when you do the malloc(), but physical RAM is only taken as you write the values in, on a page-by-page basis.
You should see the virtual memory used increase immediately. After that, the RSS (real memory used) will increase as you write into the newly allocated memory. More information at How to measure actual memory usage of an application or process?
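A minimal, Linux-only sketch of how to watch this from inside the program (it reads VmSize and VmRSS from /proc/self/status; the array size is the one from the question):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void print_mem(const char *label) {
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    if (!f) return;
    while (fgets(line, sizeof line, f)) {
        if (!strncmp(line, "VmSize:", 7) || !strncmp(line, "VmRSS:", 6))
            printf("%s %s", label, line);
    }
    fclose(f);
}

int main(void) {
    size_t n = 40000000;
    double *array = (double *)malloc(n * sizeof(double));
    if (array == NULL) return -1;

    print_mem("after malloc:");      /* VmSize jumps, VmRSS barely moves */
    for (size_t i = 0; i < n; i++)
        array[i] = 3.0;              /* writing faults the pages in */
    print_mem("after writes:");      /* VmRSS now roughly matches the array */

    free(array);
    return 0;
}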
This is because memory allocated on Linux, and on many other operating systems, isn't actually given to your program until you use it.
So you could malloc 1 GB on a 256 MB machine and not run out of memory until you actually tried to use the whole 1 GB.
In Linux there is a group of overcommit settings which change this behavior. See Cent OS: How do I turn off or reduce memory overcommitment, and is it safe to do it?

Allocating more memory than there exists using malloc

This code snippet allocates 2 GB every time it reads the letter 'u' from stdin, and initializes all the allocated chars once it reads 'a'.
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <vector>

#define bytes 2147483648UL

using namespace std;

int main()
{
    char input = 0;
    vector<char *> activate;
    while (input != 'q')
    {
        cin >> input;   // read one command character (gets() is unsafe and has been removed from the standard)
        if (input == 'u')
        {
            char *m = (char *)malloc(bytes);
            if (m == NULL) cout << "cant allocate mem" << endl;
            else cout << "ok" << endl;
            activate.push_back(m);
        }
        else if (input == 'a')
        {
            for (size_t i = 0; i < activate.size(); i++)
            {
                char *m = activate[i];
                for (size_t x = 0; x < bytes; x++)
                {
                    m[x] = 'a';
                }
            }
        }
    }
    return 0;
}
I am running this code on a Linux virtual machine that has 3 GB of RAM. While monitoring the system resource usage with the htop tool, I have realized that the malloc operation is not reflected in the resources.
For example, when I input 'u' only once (i.e. allocate 2 GB of heap memory), I don't see the memory usage increasing by 2 GB in htop. It is only when I input 'a' (i.e. initialize) that I see the memory usage increasing.
As a consequence, I am able to "malloc" more heap memory than there exists. For example, I can malloc 6 GB (which is more than my RAM plus swap) and malloc allows it (i.e. NULL is not returned). But when I try to initialize the allocated memory, I can see the memory and swap filling up until the process is killed.
My questions:
1. Is this a kernel bug?
2. Can someone explain to me why this behavior is allowed?
It is called memory overcommit. You can disable it by running, as root:
echo 2 > /proc/sys/vm/overcommit_memory
It is not a kernel feature that I like (so I always disable it). See malloc(3), mmap(2) and proc(5).
NB: echo 0 instead of echo 2 often (but not always) works as well. Read the docs (in particular the proc man page that I just linked to).
from man malloc (online here):
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
So when you just ask to allocate too much, it "lies" to you; when you actually want to use the allocated memory, it will try to find enough memory for you, and it might crash if it can't find enough.
No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address range allocated with malloc(), the kernel does little more than "reserve" the address range. This really depends on the implementation of your memory allocator and operating system, of course, but most good ones do not incur the majority of kernel overhead until the memory is first used.
The hoard allocator is one big offender that comes to mind immediately; through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator if you zero-fill the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior because late paging introduces serious latency. Technically, all it does is put the latency off until a later indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.
This is a result of what Basile mentioned: memory overcommit. However, the explanation is kind of interesting.
Basically when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of ram / swap.
This is desirable behavior on most Linux environments unless you've got a real-time OS or something where you know exactly who will need what resources, when and why.
Otherwise somebody could come along, malloc up all the ram (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual map that the file you're mapping can fit inside, but only a small amount of real memory is dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and use them like normal file handles (which is nifty).
Initializing / working with the memory should work:
memset(m, 0, bytes);
Also, you could use calloc, which not only allocates memory but also fills it with zeros for you:
char* m = (char*) calloc(1, bytes);
1. Is this a kernel bug?
No.
2. Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigating the need to know the eventual memory requirement - it's often convenient for an application to be able to ask for an amount of memory that it considers an upper limit on what it might actually need. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or a realloc() of successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries could be very quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a larger allocation of virtual address space is very cheap.
Sparse data - if you have virtual address space to spare, being able to have a sparse array and use direct indexing, or to allocate a hash table with a generous capacity() to size() ratio, can lead to a very high performance system (a minimal sketch follows this list). Both work best (in the sense of having low overheads/waste and efficient use of memory caches) when the data element size is a multiple of the memory paging size, or, failing that, much larger than it or a small integral fraction of it.
Resource sharing - consider an ISP offering a "1 giga-bit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll get about 1 mega-bit, but rely on their real-world experience that, though people ask for 1 giga-bit and want a good fraction of it at specific times, there's inevitably some lower maximum and much lower average for concurrent usage. The same insight applied to memory allows operating systems to support more applications than they otherwise would, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging from swap memory on disk may kick in and reduce performance. But unlike an internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that that limit's exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. Summarily, with this memory overcommit behaviour enabled, simply checking malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.
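Here is that sketch: it reserves a large virtual array but touches only two entries, so only a couple of pages ever become resident (Linux with the default overcommit setting is assumed; the sizes are purely illustrative).
#include <cstdio>
#include <cstdlib>

int main() {
    const std::size_t n = 1ull << 33;                  // 8 GB of virtual address space
    char *table = static_cast<char *>(std::malloc(n));
    if (!table) { std::puts("malloc failed"); return 1; }

    // Direct indexing into a sparse table: only the touched pages get physical RAM.
    table[123] = 1;
    table[4000000000ull] = 1;

    std::puts("two entries set; resident memory stays tiny");
    std::free(table);
    return 0;
}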

How to allocate large memory blocks on Windows 7

I am trying to allocate a large block (5 GB) of memory under Windows 7 Professional. I have a 64-bit machine and 16 GB of RAM, and I'm using MS Visual Studio 10. For those of you who might ask why: it's because I need to hold a 2-D raster representation of a map of reference numbers to polygon data, and the maps can be up to 40,000 x 40,000 units. This has to go into RAM, and fragmenting it into smaller blocks would be too expensive at runtime.
So if I do this:
int _tmain(int argc, _TCHAR* argv[])
{
    int t = INT_MAX;
    int *test = new int[t / 6];   // roughly 1.4 GB
    delete[] test;                // array form of delete
    return 0;
}
The call to new fails, but
int * test = new int[t/7]; succeeds.
Investigating a bit more, I found that the memory allocation only uses the memory designated as 'free' by the Resource Monitor. So when this is smaller than the requested allocation, the allocation fails. The Resource Monitor tells me that (when I looked) I had ~5 GB in use, ~10 GB on standby, and a little over 1 GB free.
As far as I understand it, this should not happen. Surely if memory is requested it should be taken from the standby memory? If this is not the case, is there a way to reduce the amount of standby memory used by Windows from inside C++?
The latter can be done outside C++ using RAMMap, as I discovered from this post: Clear the windows 7 standby memory programmatically. But unfortunately there was no useful answer to the question of programmatic clearing. Perhaps I will be luckier in C++.
Of course, the more likely scenario is that I am just missing something obvious.
Thanks
Remember that when you allocate on the heap, that memory has to be a contiguous block. If the memory is fragmented, no block big enough to hold the allocation may be available, even though the total free memory is much larger than what you request.
Memory fragmentation might be one of the issues: you could have that amount of memory free, but not contiguously.
A better approach would be to allocate fairly big chunks of memory (several megabytes each) and link them together in a list; a sketch of this idea follows below. The probability that you will find space for such a chunk (and thus for every one of the chunks) is far higher than the probability of finding contiguous space for several gigabytes.
As for performance: as long as you are working within exactly one chunk, you will lose no speed, since the data remains in the CPU cache. If you switch often between two chunks (e.g. you swap items between them), you will get cache misses.
Anyway, workarounds like clearing the standby memory are like poker: you might get a chunk then, or you might not. It depends on too many factors you can't control.
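A minimal sketch of that chunked idea (this is not the poster's code; the class name and the ~32 MB chunk size are made up for illustration):
#include <cstddef>
#include <memory>
#include <vector>

class ChunkedRaster {
    static constexpr std::size_t kChunkElems = 8u * 1024 * 1024;  // ints per chunk (~32 MB)
    std::vector<std::unique_ptr<int[]>> chunks_;
    std::size_t size_;
public:
    explicit ChunkedRaster(std::size_t n) : size_(n) {
        std::size_t nchunks = (n + kChunkElems - 1) / kChunkElems;
        chunks_.reserve(nchunks);
        for (std::size_t i = 0; i < nchunks; ++i)
            chunks_.emplace_back(new int[kChunkElems]());         // each chunk is a separate, modest allocation
    }
    int &operator[](std::size_t i) {
        return chunks_[i / kChunkElems][i % kChunkElems];         // flat index split into chunk + offset
    }
    std::size_t size() const { return size_; }
};

int main() {
    ChunkedRaster raster(40000ull * 40000ull);   // ~6.4 GB of ints, held in ~32 MB pieces
    raster[1234567890ull] = 42;                  // used like one big array
    return 0;
}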

Why does dynamic memory allocation fail after 600 MB?

I implemented a Bloom filter (bit table) using a three-dimensional char array. It works well until it reaches a point where it can no longer allocate memory and throws a bad_alloc exception. It gives me this error on the next expand request after having allocated about 600 MB.
The Bloom filter (the array) is expected to grow as big as 8 to 10 GB.
Here is the code I use to allocate (expand) the bit table:
// ROWS, COLUMNS and chunk_delete_bit_table() are defined elsewhere in the original code.
unsigned char ***bit_table_ = 0;
unsigned int ROWS_old = 5;
unsigned int EXPND_SIZE = 5;

void expand_bit_table()
{
    FILE *temp;
    temp = fopen("chunk_temp", "w+b");
    // copy old content to a temporary file
    for (int i = 0; i < ROWS_old; ++i)
        for (int j = 0; j < ROWS; ++j)
            fwrite(bit_table_[i][j], COLUMNS, 1, temp);
    fclose(temp);
    // delete old table
    chunk_delete_bit_table();
    // create expanded bit table ==> add EXPND_SIZE more rows
    bit_table_ = new unsigned char **[ROWS_old + EXPND_SIZE];
    for (int i = 0; i < ROWS_old + EXPND_SIZE; ++i)
    {
        bit_table_[i] = new unsigned char *[ROWS];
        for (int k = 0; k < ROWS; ++k)
            bit_table_[i][k] = new unsigned char[COLUMNS];
    }
    // copy back old content, row by row (reading into bit_table_[i] directly would overwrite the row pointers)
    temp = fopen("chunk_temp", "r+b");
    for (int i = 0; i < ROWS_old; ++i)
        for (int j = 0; j < ROWS; ++j)
            fread(bit_table_[i][j], COLUMNS, 1, temp);
    fclose(temp);
    // set the remaining content of bit_table_ to 0
    for (int i = ROWS_old; i < ROWS_old + EXPND_SIZE; ++i)
        for (int j = 0; j < ROWS; ++j)
            for (int k = 0; k < COLUMNS; ++k)
                bit_table_[i][j][k] = 0;
    ROWS_old += EXPND_SIZE;
}
What is the maximum allowable size for an array, and if this is not the issue, what can I do about it?
EDIT:
It is developed on a 32-bit platform.
It is run on a 64-bit platform (server) with 8 GB of RAM.
A 32-bit program must allocate memory from the virtual memory address space, which also holds chunks of code and data; memory is allocated from the holes between them. Yes, the maximum you can hope for is around 650 megabytes, the largest available hole, and it goes rapidly downhill from there. You can solve it by making your data structure smarter, like a tree or list instead of one giant array.
You can get more insight into the virtual memory map of your process with SysInternals' VMMap utility. You might be able to change the base address of a DLL so it doesn't sit plumb in the middle of an otherwise empty region of the address space. The odds that you'll get much beyond 650 MB are however poor.
There's a lot more breathing room on a 64-bit operating system, a 32-bit process has a 4 gigabyte address space since the operating system components run in 64-bit mode. You have to use the /LARGEADDRESSAWARE linker option to allow the process to use it all. Still, that only works on a 64-bit OS, your program is still likely to bomb on a 32-bit OS. When you really need that much VM, the simplest approach is to just make a 64-bit OS a prerequisite and build your program targeting x64.
A 32-bit machine gives you a 4GB address space.
The OS reserves some of this (half of it by default on Windows, giving you 2GB to yourself. I'm not sure about Linux, but I believe it reserves 1GB)
This means you have 2-3 GB to your own process.
Into this space, several things need to fit:
your executable (as well as all dynamically linked libraries) are memory-mapped into it
each thread needs a stack
the heap
and quite a few other nitty gritty bits.
The point is that it doesn't really matter how much memory you end up actually using. But a lot of different pieces have to fit into this memory space. And since they're not packed tightly into one end of it, they fragment the memory space. Imagine, for simplicity, that your executable is mapped into the middle of this memory space. That splits your 3GB into two 1.5GB chunks. Now say you load two dynamic libraries, and they subdivide those two chunks into four 750MB ones. Then you have a couple of threads, each needing further chunks of memory, splitting up the remaining areas further. Of course, in reality each of these won't be placed at the exact center of each contiguous block (that'd be a pretty stupid allocation strategy), but nevertheless, all these chunks of memory subdivide the available memory space, cutting it up into many smaller pieces.
You might have 600MB memory free, but you very likely won't have 600MB of contiguous memory available. So where a single 600MB allocation would almost certainly fail, six 100MB allocations may succeed.
There's no fixed limit on how big a chunk of memory you can allocate. The answer is "it depends". It depends on the precise layout of your process' memory space. But on a 32-bit machine, you're unlikely to be able to allocate 500MB or more in a single allocation.
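To make the last point concrete, here is a small sketch (a 32-bit build is assumed; the exact numbers depend entirely on how the address space happens to be laid out):
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const std::size_t one_mb = 1024 * 1024;

    void *big = std::malloc(600 * one_mb);           // one contiguous 600 MB block
    std::printf("single 600 MB block: %s\n", big ? "ok" : "failed");
    std::free(big);

    std::vector<void *> pieces;                      // the same total in 100 MB pieces
    for (int i = 0; i < 6; ++i) {
        void *p = std::malloc(100 * one_mb);
        if (!p) break;
        pieces.push_back(p);
    }
    std::printf("100 MB pieces obtained: %zu of 6\n", pieces.size());

    for (std::size_t i = 0; i < pieces.size(); ++i)
        std::free(pieces[i]);
    return 0;
}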
The maximum in-memory data a 32-bit process can access is 4GB in theory (in practice it will be somewhat smaller). So you cannot have 10GB data in memory at once (even with the OS supporting more). Also, even though you are allocating the memory dynamically, the free store available is further limited by the stack size.
The actual memory available to the process depends on the compiler settings that generates the executable.
If you really do need that much, consider persisting (parts of) the data in the file system.

C++ array size for x86 and x64

Simple question: I'm writing a program that needs to open huge image files (8k x 8k), but I'm a little bit confused about how to initialize the huge arrays that hold the images in C++.
I've been trying something like this:
long long SIZE = 8092*8092; ///8096*8096
double *array;
array = (double *)malloc(sizeof(double) * SIZE);
if (array == NULL)
{
    fprintf(stderr, "Could not allocate that much memory");
}
But sometimes my NULL check does not catch that the array was not allocated; any idea why?
Also, I can't initialize more than 2 or 3 arrays, even when running on an x64 machine with 12 GB of RAM; any idea why?
I would really wish not to have to work with sections of the array instead. Any help is welcome.
Thanks.
You're not running into an array size problem. 8K*8K is merely 64M. Even 64M doubles (sizeof==8) are not an issue; that would require a mere 512 MB. Now, a 32 bit application (no matter where it's running) should be able to allocate a few of them. Not 8, because the OS typically needs to reserve some space for itself (often slightly over 2GB) and sometimes not even 3 when memory is fragmented.
The behavior of "malloc failed but didn't return NULL" is a Linux configuration bug, fixed by # echo 2 > /proc/sys/vm/overcommit_memory
malloc() does not initialize memory, it just reserves it. You will have to initialize it explicitly, e.g. via memset() from string.h:
array = (double*) malloc(SIZE * sizeof(double));
if (array) memset(array, 0, SIZE * sizeof(double));
However, in C++ you should use new instead of malloc:
double* array = new (std::nothrow) double[SIZE];  // plain new throws std::bad_alloc rather than returning NULL; std::nothrow (from <new>) makes the check below meaningful
if (!array) {
    cerr << "Could not allocate that much memory" << endl;
}
for (long long i = 0; i < SIZE; i++) array[i] = 0.0;
Regarding size: each such array is 512 MB. Are you positively sure you need double precision (which means the image has 64-bit pixel depth)? Maybe a float would suffice? That would halve the memory footprint.
You might be running into a 2 GB per-process address space limit if you are running a 32-bit operating system. With a few hundred MB of system libraries and other stuff, plus 2 or 3 arrays of 512 MB each, that reaches 2 GB easily. A 64-bit OS would help you there.
Are you compiling your application as a 32-bit application (the default in Visual Studio, if that's what you're using), or as a 64-bit application? You shouldn't have troubles if you build it as a 64-bit app.
malloc allocates (reserves memory and returns a pointer), calloc initializes (writes all zeros to that memory).
It seems that you have no contiguous memory block of that size (~500 MB) in the C runtime heap. Instead of copying the file into memory, try mapping the image into the process's address space. You could map only the necessary parts of the file.
Just as a side note: although you don't want to deal with the image not being entirely in memory at once, there are reasons to consider it. Maybe think about an abstraction that allows you to keep only the currently needed chunk in memory. The program code can then be written as though ignorant of the memory issues.
I would really wish not to have to work with sections of array instead. Any help is welcome.
Have you looked into memory-mapped files?
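A minimal POSIX sketch of that idea (the file name "image.raw" is made up, the file is assumed to contain raw doubles, and error handling is kept short):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

int main() {
    int fd = open("image.raw", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return 1; }
    std::size_t len = static_cast<std::size_t>(st.st_size);

    // Map the whole file read-only; pages are faulted in only when touched.
    void *base = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    const double *pixels = static_cast<const double *>(base);
    std::printf("first pixel: %f\n", pixels[0]);   // touching the data pages it in on demand

    munmap(base, len);
    close(fd);
    return 0;
}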
Yep, sounds a lot like heap fragmentation, as Kirill pointed out. See also: How to avoid heap fragmentation?
I suggest using compression: decompress the part you need to process in your code whenever necessary, and compress it again once you are done with that part.
Second proposal: write code that overloads the pointer arithmetic operators ("operator+" and "operator-") so you can use non-contiguous memory buffers. Using smaller memory buffers makes your code more stable than one large contiguous buffer. I have experienced this and have written some operator overloading; see http://code.google.com/p/effoaddon/source/browse/trunk/devel/effo/codebase/addons/mem/include/mcur_i.h for an example. When I tested 47 GB of malloc()ed system memory on an x86_64, I allocated just 1 GB per malloc() call, so I allocated 47 memory blocks in total. EDIT: if I tried to allocate as much as possible using just one malloc(), I would only get 30 GB on a 48 GB system, i.e. less than 70%, because the larger the buffer requested per malloc(), the more management memory is consumed by the system/libc itself; note that I called mlock() to prevent the allocated memory from being swapped out to disk.
Third proposal: try POSIX file mapping, mapping each image to memory.
Btw: calling malloc() is more robust than new even in C++, because when memory is under pressure, new is prone to throwing exceptions instead of returning NULL.