I am interested in finding out the memory limits for automatic and dynamically allocated variables, so I ran tests like this:
int main() {
    const int N = 1000000;
    int a[N];
    a[1] = 100;
}
I found that the maximum N that would not incur a segmentation fault is about 2,600,000, which corresponds to roughly 10 MB.
Then I tested dynamically allocated variables, like this:
int main() {
    const int N = 1000000;
    int* a = new int[N];
    delete[] a;
}
I found that the maximum N that would not throw an exception is about 730,000,000, which is roughly 3 GB.
Now the question is: how are the 10 MB limit (for automatic variables) and the 3 GB limit (for dynamically allocated variables) determined? I assume this is related to my machine. Also, is there any way to increase these limits, in case I really need to?
The language mandates nothing. It's all implementation-defined.
Automatic variables usually go on the stack, and you can usually increase the maximum size via compiler or linker options. The free store is usually the heap, and is limited only by usable address space. Don't count on more than 2-3 GB in a 32-bit environment; the limit will be much higher in a 64-bit environment. Of course, you won't be able to allocate the entire 64-bit address space — you'll hit the limit of available virtual memory (RAM + swap space).
The limit for automatic variables is the amount of memory allocated for the machine stack. 10 MB is actually rather high; 1 or 2 MB is a more common default.
Obviously, the 3 GB is the OS limit -- it's roughly the size of the process address space the OS allows a program. It'll vary widely by OS and hardware platform.
The 3 GB limit can probably be fixed by moving to a 64-bit OS (with plenty of RAM).
There's a reasonable chance (but no certainty) that the 10 MB limit can be adjusted with some linker flags.
It would be efficient for some purposes to allocate a huge amount of virtual space, and page in only pages that are accessed. Allocating a large amount of memory is instantaneous and does not actually grab pages:
char* p = new char[1024*1024*1024*256];
Ok, the above was wrong, as pointed out, because 1024*1024*1024*256 overflows a 32-bit int.
I expect that new is calling malloc, which calls sbrk, and that when I access a location 4 GB beyond the start, it tries to extend the task memory by that much?
Here is the full program:
#include <cstdint>
int main() {
    constexpr uint64_t GB = 1ULL << 30;
    char* p = new char[256*GB]; // allocate large block of virtual space
    p[0] = 1;
    p[1000000000] = 1;
    p[2000000000] = 1;
}
Now, I get bad_alloc when attempting to allocate the huge amount, so obviously malloc won't work.
I was under the impression that mmap would map to files, but since this is suggested I am looking into it.
Ok, so mmap seems to support allocation of big areas of virtual memory, but it requires a file descriptor. Creating huge in-memory data structures could be a win but not if they have to be backed by a file:
The following code uses mmap, even though I don't like the idea of attaching to a file. I did not know what address to request, and picked 0x800000000. mmap returns MAP_FAILED ((void*)-1), so obviously I'm doing something wrong:
#include <cstdint>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int main() {
    constexpr uint64_t GB = 1ULL << 30;
    void* addr = (void*)0x8000000000ULL;
    int fd = creat("garbagefile.dat", 0660);
    char* p = (char*)mmap(addr, 256*GB, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
    p[0] = 1;
    p[1000000000] = 1;
    p[2000000000] = 1;
    close(fd);
}
Is there any way to allocate a big chunk of virtual memory and access pages sparsely, or is this not doable?
Is it possible to allocate large amount of virtual memory in linux?
Possibly. But you may need to configure it to be allowed:
The Linux kernel supports the following overcommit handling modes
0 - Heuristic overcommit handling. Obvious overcommits of address
space are refused. Used for a typical system. It ensures a seriously
wild allocation fails while allowing overcommit to reduce swap
usage. root is allowed to allocate slightly more memory in this
mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications.
Classic example is code using sparse arrays and just relying on the
virtual memory consisting almost entirely of zero pages.
2 - Don't overcommit. The total address space commit for the system
is not permitted to exceed swap + a configurable amount (default is
50%) of physical RAM. Depending on the amount you use, in most
situations this means a process will not be killed while accessing
pages but will receive errors on memory allocation as appropriate.
Useful for applications that want to guarantee their memory
allocations will be available in the future without having to
initialize every page.
The overcommit policy is set via the sysctl `vm.overcommit_memory'.
So, if you want to allocate more virtual memory than you have physical memory, then you'd want:
# in shell
sysctl -w vm.overcommit_memory=1
RLIMIT_AS The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process if no alternate stack has been made available via sigaltstack(2)). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
So, you'd want:
struct rlimit rl;
rl.rlim_cur = RLIM_INFINITY;
rl.rlim_max = RLIM_INFINITY;
setrlimit(RLIMIT_AS, &rl);
Or, if you cannot give the process permission to do this, then you can configure this persistently in /etc/security/limits.conf which will affect all processes (of a user/group).
Ok, so mmap seems to support ... but it requires a file descriptor. ... could be a win but not if they have to be backed by a file ... I don't like the idea of attaching to a file
You don't need to use a file backed mmap. There's MAP_ANONYMOUS for that.
I did not know what number to put in to request
Then use null. Example:
mmap(nullptr, 256*GB, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
That said, if you've configured the system as described, then new should work just as well as mmap. It'll probably use malloc which will probably use mmap for large allocations like this.
Bonus hint: You may benefit from taking advantage of using HugeTLB Pages.
The value of 256*GB does not fit into a range of 32-bit integer type. Try uint64_t as a type of GB:
constexpr uint64_t GB = 1024*1024*1024;
or, alternatively, force 64-bit multiplication:
char* p = new char[256ULL * GB];
OT: I would prefer this definition of GB:
constexpr uint64_t GB = 1ULL << 30;
As for the virtual memory limit, see this answer.
I need to acquire several GB of data from a sensor. When I tried to allocate a big array with malloc (10 GB or more; my system has 32 GB), it returned NULL. So I thought the problem could be solved with a linked list of iterators to vectors.
However, I don't know how to set this up. I tried declaring list< vector::iterator >, but I can't allocate the memory for each vector (each should have 1000-2000 elements). Do you know any way to do this, or maybe a better solution for this big memory allocation?
If you are using a 64-bit operating system, then malloc should be able to allocate the large size with no problem.
For example, this code runs on my Windows machine (64-bit Windows) and allocates 10 GB of RAM flawlessly:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    long long size = 10LL * 1024 * 1024 * 1024; // long is 32-bit on Windows, so use long long
    printf("size = %lld\n", size);
    char *x = (char *)malloc(size);
    printf("x = %p\n", (void *)x);
    long long i;
    for (i = 0; i < size; i += 1024 * 1024) {
        x[i] = 'h';
    }
    printf("Done1\n");
}
However, if you have a 32-bit operating system, you'll be in trouble, and can't allocate beyond some limit (maybe 3 GB, but probably system-dependent).
In that case, you'll need to write your data to a file instead.
However, if you're using a FAT filesystem, then you can't write a file that big either. In that case, you'd have to split the data among many files, each under 2 GB in size.
You'd want to actually check the malloc result for NULL to make sure the malloc works and memory could be grabbed.
You will need to allocate this space under a 64-bit Windows OS. You will ALSO have to set the "large address aware" flag; otherwise you can only get 2 GB of address space due to how the virtual memory system works on Windows.
You may want to look into using a memory-mapped file, as suggested by sehe in his answer, if you do not absolutely need one large 10 GB chunk of contiguous memory. If you have to build your application for 32-bit Windows, then this will be the only answer, as 32-bit Windows normally only allows a process 2 GB of address space, unless the "large address aware" flag is set, at which point it allows 3 GB.
When you have to deal with large blocks of memory, you are better off skipping malloc altogether and going directly to the operating system calls for memory allocation.
I usually move to memory mapped files or shared memory maps for this kind of data volumes.
This way, you're not bound to the amount of physical (process) memory available at all. You can let the OS page in and out as required. Fragmentation becomes much less of an issue (unless you actually fragment the logical address space, which is quite hard to achieve on 64 bit architectures).
More information
I have quite a number of answers on SO that show examples of storing vectors and all manner of more complicated data structures in shared memory/mapped files. You might want to look for mapped_file_device (from Boost Iostreams) or managed_shared_memory and managed_mapped_file (from Boost Interprocess)
So I had a strange experience this evening.
I was working on a program in C++ that required some way of reading a long list of simple data objects from file and storing them in the main memory, approximately 400,000 entries. The object itself is something like:
class Entry
{
public:
    Entry(int x, int y, int type);
    Entry();
    ~Entry();
    // some other basic functions
private:
    int m_X, m_Y;
    int m_Type;
};
Simple, right? Well, since I needed to read them from file, I had some loop like
Entry** globalEntries;
globalEntries = new Entry[totalEntries]; // totalEntries read from file, about 400,000
for (int i = 0; i < totalEntries; i++)
{
    globalEntries[i] = new Entry(.......);
}
That addition to the program added about 25 to 35 megabytes to its memory usage when I tracked it in the task manager. A simple change to stack allocation:
Entry* globalEntries;
globalEntries = new Entry[totalEntries];
for (int i = 0; i < totalEntries; i++)
{
    globalEntries[i] = Entry(.......);
}
and suddenly it only required 3 megabytes. Why is that happening? I know pointer objects have a little bit of extra overhead to them (4 bytes for the pointer address), but it shouldn't be enough to make THAT much of a difference. Could it be because the program is allocating memory inefficiently, and ending up with chunks of unallocated memory in between allocated memory?
Your code is wrong, or I don't see how this worked. With new Entry[count] you create a new array of Entry (the type is Entry*), yet you assign it to Entry**, so I presume you used new Entry*[count].
What you did next was to create another new Entry object on the heap and store it in the globalEntries array. So you need memory for 400,000 pointers + 400,000 elements. 400,000 pointers take 3 MiB of memory on a 64-bit machine. Additionally, you have 400,000 single Entry allocations, which will all require sizeof(Entry) plus potentially some more memory (for the memory manager -- it might have to store the size of the allocation, the associated pool, alignment/padding, etc.). This additional bookkeeping memory can quickly add up.
If you change your second example to:
Entry* globalEntries;
globalEntries = new Entry[count];
for (...) {
    globalEntries[i] = Entry(...);
}
memory usage should be equal to the stack approach.
Of course, ideally you'll use a std::vector<Entry>.
First of all, without specifying which column exactly you were watching, the number in the task manager means nothing. On a modern operating system it's difficult even to define what you mean by "used memory": are we talking about private pages? The working set? Only the stuff that stays in RAM? Does reserved but not committed memory count? Who pays for memory shared between processes? Are memory-mapped files included?
If you are watching some meaningful metric, it's impossible to see 3 MB of memory used: your object is at least 12 bytes (assuming 32-bit integers and no padding), so 400,000 elements will need about 4.58 MB. Also, I'd be surprised if it worked with stack allocation: the default stack size in VC++ is 1 MB, and you should already have had a stack overflow.
Anyhow, it is reasonable to expect a different memory usage:
the stack is (mostly) allocated right from the beginning, so that's memory you nominally consume even without really using it for anything (virtual memory and automatic stack expansion actually make this a bit more complicated, but it's "true enough");
the CRT heap is opaque to the task manager: all it sees is the memory given by the operating system to the process, not what the C heap really has in use; the heap grows (requesting memory from the OS) more than strictly necessary, to be ready for further memory requests -- so what you see is how much memory it is ready to give away without further syscalls;
your "separate allocations" method has a significant overhead. The all-contiguous array you'd get with new Entry[size] costs size*sizeof(Entry) bytes, plus the heap bookkeeping data (typically a few integer-sized fields); the separate-allocations method costs at least size*sizeof(Entry) (the size of all the "bare" elements) plus size*sizeof(Entry*) (the size of the pointer array) plus size+1 times the cost of each allocation. If we assume a 32-bit architecture with a cost of 2 ints per allocation, you quickly see that this costs size*24+8 bytes of memory, instead of size*12+8 for the contiguous array on the heap;
the heap normally gives away blocks that aren't really the size you asked for, because it manages blocks of fixed sizes; so if you allocate single objects like that, you are probably also paying for some extra padding. Supposing it has 16-byte blocks, you are paying 4 bytes extra per element by allocating them separately; this moves our memory estimate to size*28+8, i.e. an overhead of 16 bytes per each 12-byte element.
When assigning values to a large array the used memory keeps increasing even though no new memory is allocated. I am checking the used memory simply by the task manager (windows) or system monitor (Ubuntu).
The Problem is the same on both OS. I am using gcc 4.7 or 4.6 respectively.
This is my code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    int i, j;
    int n = 40000000; // array size
    int s = 100;
    double *array;

    array = malloc(n * sizeof(double)); // allocate array
    if (array == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {     // loop over array; memory increases during this loop
        for (j = 0; j < s; j++) { // inner loop to slow down the program
            array[i] = 3.0;
        }
    }
    return 0;
}
I do not see any logical problem, and to my knowledge I do not exceed any system limits either. So my questions are:
can the problem be reproduced by others?
what is the reason for the growing memory?
how do I solve this issue?
When modern systems 'allocate' memory, the pages are not actually allocated within physical RAM. You will get a virtual memory allocation. As you write to those pages, a physical page will be taken. So the virtual RAM taken will be increased when you do the malloc(), but only when you write the value in will the physical RAM be taken (on a page by page basis).
You should see the virtual memory used increase immediately. After that the RSS, or real memory used will increment as you write into the newly allocated memory. More information at How to measure actual memory usage of an application or process?
This is because memory allocated on Linux, and on many other operating systems, isn't actually given to your program until you use it.
So you could malloc 1 GB on a 256 MB machine, and not run out of memory until you actually tried to use all 1 GB.
In Linux there is a group of overcommit settings which changes this behavior. See Cent OS: How do I turn off or reduce memory overcommitment, and is it safe to do it?
I am using C++ on Windows 7 with MSVC 9.0, and have also been able to test and reproduce on Windows XP SP3 with MSVC 9.0.
If I allocate 1 GB of 0.5 MB sized objects, then when I delete them, everything is ok and behaves as expected. However, if I allocate 1 GB of 0.25 MB sized objects, then when I delete them the memory remains reserved (yellow in Address Space Monitor) and from then on can only be used for allocations smaller than 0.25 MB.
This simple code will let you test both scenarios by changing which struct is typedef'd. After it has allocated and deleted the structs it will then allocate 1 GB of 1 MB char buffers to see if the char buffers will use the memory that the structs once occupied.
struct HalfMegStruct
{
    HalfMegStruct() : m_Next(0) {}

    /* return the number of objects needed to allocate one gig */
    static int getIterations() { return 2048; }

    int m_Data[131071];
    HalfMegStruct* m_Next;
};

struct QuarterMegStruct
{
    QuarterMegStruct() : m_Next(0) {}

    /* return the number of objects needed to allocate one gig */
    static int getIterations() { return 4096; }

    int m_Data[65535];
    QuarterMegStruct* m_Next;
};

// which struct to use
typedef QuarterMegStruct UseType;

int main()
{
    UseType* first = new UseType;
    UseType* current = first;

    for (int i = 0; i < UseType::getIterations(); ++i)
        current = current->m_Next = new UseType;

    while (first->m_Next)
    {
        UseType* temp = first->m_Next;
        delete first;
        first = temp;
    }
    delete first;

    for (unsigned int i = 0; i < 1024; ++i)
        // one meg buffer; I'm aware this is a leak, but it's for illustrative purposes
        new char[1048576];

    return 0;
}
Below you can see my results from within Address Space Monitor. Let me stress that the only difference between these two end results is the size of the structs being allocated up to the 1 GB marker.
This seems like quite a serious problem to me, and one that many people could be suffering from and not even know it.
So is this by design or should this be considered a bug?
Can I make small deleted objects actually be free for use by larger allocations?
And more out of curiosity, does a Mac or a Linux machine suffer from the same problem?
I cannot positively state this is the case, but this does look like memory fragmentation (in one of its many forms). The allocator (malloc) might keep buckets of different sizes to enable fast allocation; after you release the memory, instead of giving it directly back to the OS, it keeps the buckets so that later allocations of the same size can be served from the same memory. If this is the case, the memory would be available for further allocations of the same size.
This type of optimization is usually disabled for big objects, as it requires reserving memory even when it's not in use. If the threshold is somewhere between your two sizes, that would explain the behavior.
Note that while you might see this as weird, in most programs (not tests, but real life) memory usage patterns repeat: if you asked for 100k blocks once, it is more often than not the case that you will do so again. And keeping the memory reserved can improve performance and actually reduce the fragmentation that would come from all requests being granted from the same bucket.
You can, if you want to invest some time, learn how your allocator works by analyzing its behavior. Write some tests that acquire size X, release it, then acquire size Y, and then show the memory usage. Fix the value of X and play with Y. If the requests for both sizes are granted from the same buckets, you will not have reserved/unused memory (the image on the left), while when the sizes are granted from different buckets you will see the effect in the image on the right.
I don't usually code for windows, and I don't even have Windows 7, so I cannot positively state that this is the case, but it does look like it.
I can confirm the same behaviour with g++ 4.4.0 under Windows 7, so it's not in the compiler. In fact, the program fails when getIterations() returns 3590 or more -- do you get the same cutoff? This looks like a bug in Windows system memory allocation. It's all very well for knowledgeable souls to talk about memory fragmentation, but everything got deleted here, so the observed behaviour definitely shouldn't happen.
Using your code I performed your test and got the same result. I suspect that David Rodríguez is right in this case.
I ran the test and had the same result as you. It seems there might be this "bucket" behaviour going on.
I tried two different tests too. Instead of allocating 1 GB of data using 1 MB buffers, I allocated the memory the same way it was first allocated, after deleting. In the second test I allocated the half-meg buffers, cleaned up, then allocated the quarter-meg buffers, adding up to 512 MB for each. Both tests had the same memory result in the end: only 512 MB is allocated, and no large chunk of reserved memory.
As David mentions, most applications tend to make allocation of the same size. One can see quite clearly why this could be a problem though.
Perhaps the solution, if you are allocating many smaller objects in this way, is to allocate one large block of memory and manage it yourself. Then, when you're done, free the large block.
I spoke with some authorities on the subject (Greg, if you're out there, say hi ;D) and can confirm that what David is saying is basically right.
As the heap grows in the first pass of allocating ~0.25MB objects, the heap is reserving and committing memory. As the heap shrinks in the delete pass, it decommits at some pace but does not necessarily release the virtual address ranges it reserved in the allocation pass. In the last allocation pass, the 1MB allocations are bypassing the heap due to their size and thus begin to compete with the heap for VA.
Note that the heap is reserving the VA, not keeping it committed. VirtualAlloc and VirtualFree can help explain the difference if you're curious. This fact doesn't solve the problem you ran into, which is that the process ran out of virtual address space.
This is a side-effect of the Low-Fragmentation Heap.
http://msdn.microsoft.com/en-us/library/aa366750(v=vs.85).aspx
You should try disabling it to see if that helps. Run against both GetProcessHeap and the CRT heap (and any other heaps you may have created).