In C++, how much can the stack segment grow before the runtime gives up and says that it cannot allocate more memory for the stack?
Using gcc on a 32-bit Linux (Fedora) machine.
Under UNIX, if you are running bash, run
$ ulimit -a
It will list various limits, including the stack size. Mine is 8192 kB. You can use ulimit to change these limits.
You can also use the ulimit() or setrlimit() functions to set various limits from within your program.
$ man 3 ulimit
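For example, here is a minimal sketch of my own (not taken from the man page) using the getrlimit/setrlimit interface, which is what you generally want on modern Linux; the 16 MB figure is just an illustration:

#include <sys/resource.h>
#include <cstdio>

int main() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        std::perror("getrlimit");
        return 1;
    }
    std::printf("soft stack limit: %lu bytes, hard limit: %lu bytes\n",
                (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

    rl.rlim_cur = 16UL * 1024 * 1024;   // raise the soft limit to 16 MB; must not exceed rl.rlim_max
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        std::perror("setrlimit");
    return 0;
}

A non-root process can raise its soft limit only up to the hard limit.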
Under Windows see StackReserveSize and StackCommitSize
In practice, stack addresses begin at high addresses (on a 32-bit platform, close to the 3 GB limit) and grow downwards, while heap allocation begins at low addresses and grows upwards. This lets the stack and the heap grow towards each other until the address space between them is exhausted.
On my 32-bit Linux it's 8192 kB, so it should be the same on your machine.
$ uname -a
Linux TomsterInc 2.6.28-14-generic #46-Ubuntu SMP Wed Jul 8 07:21:34 UTC 2009 i686 GNU/Linux
$ ulimit -s
8192
Windows (and, I think, Linux) operate on the big-stack model assumption, that is, there is one stack (per thread) whose space is preallocated before the thread starts.
I suspect the OS simply assigns virtual memory space of the preallocated size to that stack area, and adds real memory pages underneath as the end of the stack advances beyond a page boundary, until the upper limit ("ulimit") is reached.
Since OSes often place stacks well away from other structures, when the ulimit is reached it is just possible that the OS might be able to expand the stack, provided nothing else has shown up next to it by the time the overflow occurs. In general, if you are building a program complex enough to overflow the stack, you are likely allocating memory dynamically, and there is no guarantee that the area next to the stack hasn't been allocated. If such memory has been allocated, of course the OS can't expand the stack where it is.
This means the application cannot count on the stack being expanded automatically by the OS. In effect, the stack can't grow.
In theory, an application exhausting its stack might be able to start a new thread with a larger stack, copy the existing stack across and continue, but as a practical matter I doubt this can be done, if for no other reason than that pointers to local variables on the old stack would need adjusting, and C/C++ compilers don't make it possible to find such pointers and adjust them.
Consequence: the stack limit has to be set before the program starts, and once it is exceeded, the program dies.
If one wants a stack that can expand arbitrarily, it is better to switch to a language that uses heap-allocated activation records. Then you simply don't run out until your address space is used up; 32- or 64-bit VM spaces ensure you can do a lot of recursion with this technique.
We have a parallel programming language called PARLANSE that does heap allocation, enabling thousands of parallel computational grains (in practice) to recurse arbitrarily this way.
Related
I've run into an odd problem: my process cannot allocate more than what seems to be slightly below 1 GiB. The Windows Task Manager "Mem Usage" column shows values close to 1 GiB when my software throws a bad_alloc exception. Yes, I've checked that the value passed to the memory allocation is sensible (no race condition / corruption exists that would make this fail). Yes, I need all this memory and there is no way around it (it's a buffer for images, which cannot be compressed any further).
I'm not trying to allocate the whole 1 GiB in one go; there are a few allocations of around 300 MiB each. Would this cause problems? (I'll try to see if making more, smaller allocations works any better.) Is there some compiler switch or something else that I must set in order to get past 1 GiB? I've seen others complaining about the 2 GiB limit, which would be fine for me; I just need a little bit more. I'm using VS 2005 with SP1, running on 32-bit XP, and it's in C++.
On a 32-bit OS, a process has a 4GB address space in total.
On Windows, half of this is off-limits, so your process has 2GB.
That is 2 GB of address space, but it gets fragmented. Your executable is loaded in at one address, each DLL is loaded at another address, then there's the stack, heap allocations and so on. So while your process probably has enough free address space in total, there are no contiguous blocks large enough to fulfill your requests for memory. Making smaller allocations will probably solve it.
If your application is compiled with the LARGEADDRESSAWARE flag, it will be allowed to use as much of the remaining 2 GB as Windows can spare (and how much that is depends on your platform and environment):
For 32-bit code running on a 64-bit OS, you'll get a full 4 GB address space.
For 32-bit code running on a 32-bit OS without the /3GB boot switch, the flag means nothing at all.
For 32-bit code running on a 32-bit OS with the /3GB boot switch, you'll get 3 GB of address space.
So really, setting the flag is always a good idea if your application can handle it (it's basically a capability flag: it tells Windows that the application can handle more memory, so if Windows can provide it, it should go ahead and give the process as large an address space as possible), but you probably can't rely on it having an effect. Unless you're on a 64-bit OS, it's unlikely to buy you much (the /3GB boot switch is necessary, and it has been known to cause problems with drivers, especially video drivers).
Allocating big chunks of contiguous memory is always a problem.
You are much more likely to get the same amount of memory in smaller chunks.
You should redesign your memory structures.
You are right to suspect the larger 300 MB allocations. Your process will be able to get close to 2 GB (3 GB if you use the /3GB boot.ini switch and the LARGEADDRESSAWARE link flag), but not as one large contiguous block.
Typical solutions for this are to break up the requests into tiles or strips of fixed size (say 256x256x4 bytes) and write an intermediate class to hide this representation detail.
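As an illustration only, here is a rough sketch of what such an intermediate class could look like (the name TiledBuffer and the flat-offset interface are made up for this example, not taken from any library):

#include <cstddef>
#include <vector>

// Hypothetical TiledBuffer: the image is stored as independently allocated
// fixed-size tiles, so no single huge contiguous block is ever requested.
class TiledBuffer {
public:
    static const std::size_t kTileBytes = 256 * 256 * 4;   // 256x256 pixels, 4 bytes per pixel

    explicit TiledBuffer(std::size_t totalBytes) {
        std::size_t tiles = (totalBytes + kTileBytes - 1) / kTileBytes;
        tiles_.reserve(tiles);
        for (std::size_t i = 0; i < tiles; ++i)
            tiles_.push_back(std::vector<unsigned char>(kTileBytes));  // one modest allocation per tile
    }

    // Map a flat byte offset onto the tile that owns it.
    unsigned char& at(std::size_t offset) {
        return tiles_[offset / kTileBytes][offset % kTileBytes];
    }

private:
    std::vector<std::vector<unsigned char> > tiles_;
};

A real image class would expose (x, y) pixel access and handle partial edge tiles, but the key point is that each tile is a separate, modest allocation.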
You can quickly verify this by writing a small allocation loop that allocates blocks of different sizes.
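Something along these lines (a rough probe of my own; it only shows roughly how large a single block you can still get, since a successful malloc does not guarantee the OS will back every page):

#include <cstdio>
#include <cstdlib>

int main() {
    // Start near the 2 GB address-space limit and halve until an allocation succeeds.
    std::size_t size = 2048u * 1024 * 1024;
    while (size >= 1024 * 1024) {
        void* p = std::malloc(size);
        if (p) {
            std::printf("got a contiguous block of %lu MiB\n",
                        (unsigned long)(size / (1024 * 1024)));
            std::free(p);
            return 0;
        }
        size /= 2;   // try a smaller block
    }
    std::printf("could not allocate even 1 MiB\n");
    return 0;
}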
You could also check this function from MSDN; its documented limits rang a bell:
This parameter must be greater than or equal to 13 pages (for example, 53,248 on systems with a 4K page size), and less than the system-wide maximum (number of available pages minus 512 pages). The default size is 345 pages (for example, this is 1,413,120 bytes on systems with a 4K page size).
Here they mention that the default number of pages allowed is 345, which with a 4K page size is the quoted 1,413,120 bytes, i.e. about 1.4 MB rather than 1 GB.
When I have a few big allocs like that to do, I use the Windows function VirtualAlloc, to avoid stressing the default allocator.
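For what it's worth, a minimal sketch of that approach (the 300 MiB figure is just taken from the question above):

#include <windows.h>
#include <cstdio>

int main() {
    const SIZE_T size = 300u * 1024 * 1024;   // ~300 MiB, as in the question
    // Reserve and commit in one call; returns NULL on failure instead of throwing bad_alloc.
    void* p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (p == NULL) {
        std::printf("VirtualAlloc failed: error %lu\n", GetLastError());
        return 1;
    }
    // ... use the buffer ...
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}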
Another way forward might be to use nedmalloc in your project.
Is there any memory limit for a single process on x64 Linux?
We are running a Linux server with 32 GB of RAM, and I'm wondering if I can allocate most of it to a single process I'm coding, which requires lots of RAM.
Certain kernels have different limits, but on any modern 64-bit Linux the single-process limit is still far above 32 GB (assuming that process is a 64-bit executable). Various distributions may also have set per-process limits using sysctl, so you'll want to check your local environment to make sure there aren't arbitrarily low limits set (also check ipcs -l on RPM-based systems).
The Debian port documentation for the AMD64 port specifically mentions that the per-process virtual address space limit is 128TiB (twice the physical memory limit), so that should be the reasonable upper bound you're working with.
The resource limits are set using the setrlimit syscall. You can change them with a shell builtin (e.g. ulimit in bash, limit in zsh).
The practical limit is also related to RAM size and swap size. The free command shows these. (Some systems overcommit memory, but that is risky.)
A process doesn't actually use RAM directly; it consumes virtual memory using system calls like mmap (which may be called by malloc). You can even map a portion of a file into memory with that call.
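For instance, a minimal sketch of reserving 1 GiB of anonymous memory directly with mmap (my own example; malloc does something similar internally for large requests):

#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t size = 1UL << 30;   // 1 GiB of anonymous virtual memory
    void* p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }
    static_cast<char*>(p)[0] = 1;   // touching a page is what actually consumes RAM
    munmap(p, size);
    return 0;
}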
To learn about the memory map of a process 1234, look into the /proc/1234/maps file. From your own application, read /proc/self/maps. You also have /proc/1234/smaps and /proc/self/smaps. Try the command cat /proc/self/maps to understand the memory map of the process running that cat.
On a machine with 32 GB of RAM, you can usually run a process with 31 GB of process space (assuming no other big process exists). If you also had 64 GB of swap, you could run a process of at least 64 GB, but it would be unbelievably slow (most of the time would be spent swapping to disk). You can add swap space (e.g. by swapping to a file, initialized with dd then mkswap, and activated with swapon).
If you are coding a server, be very careful about memory leaks. The valgrind tool is helpful for hunting such bugs, and you could consider using Boehm's garbage collector.
The current 64-bit Linux kernel has a limit of 64 TB of physical RAM and 128 TB of virtual memory (see the RHEL limits and the Debian port documentation). Current x86_64 CPUs (i.e. what we have in PCs) have a virtual address limit of 2^48 = 256 TB, because they do not yet use all 64 address bits (in the page tables the upper bits are reserved for flags such as ReadOnly, Writable, ExecuteDisable, PagedToDisc), although the specification allows a future switch to a true 64-bit address mode reaching the maximum of 2^64 = 16 EB (exabytes). However, the motherboard and the CPU package do not have enough pins to deliver all 48 bits of a memory address to the RAM chips over the address bus, so the limit for physical RAM is lower still (and depends on the manufacturer); the virtual address space can therefore, by nature, exceed the amount of RAM one could fit on the motherboard, up to the virtual memory limit mentioned above.
The per-process limits arise from how the virtual address space of the process is laid out: there are various sizes for the stack, the mmap() area (and dynamic libraries), and the program code itself, and the kernel is also mapped into the process space. Some of these settings can be changed by passing arguments to the linker, sometimes by a special directive in the source code, or by modifying the binary (which is in ELF format) directly. There are also limits that the administrator of the machine (root) or the user has set (see the output of the command "ulimit -a"). These limits can be soft or hard, and the user is unable to raise a hard limit.
The Linux kernel can also be configured to allow memory overcommit. In that case, a program is allowed to allocate a huge amount of memory and then actually use only a few of the pages (see sparse arrays, sparse matrices); see the Linux kernel documentation. So the program will fail only once it has filled the requested memory with data, not at the time of the allocation.
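A small sketch of my own that illustrates the effect on a 64-bit build with overcommit enabled (the 8 GB figure is arbitrary):

#include <cstdio>
#include <cstdlib>

int main() {
    const std::size_t size = 8UL * 1024 * 1024 * 1024;   // ask for 8 GB (64-bit build)
    char* p = static_cast<char*>(std::malloc(size));
    if (!p) {
        std::printf("allocation refused up front\n");
        return 1;
    }
    // Touch only one byte every 512 MB: with overcommit, only these pages get real RAM.
    for (std::size_t i = 0; i < size; i += 512UL * 1024 * 1024)
        p[i] = 1;
    std::printf("reserved %lu MB, touched only a handful of pages\n",
                (unsigned long)(size / (1024 * 1024)));
    std::free(p);
    return 0;
}

With strict accounting (vm.overcommit_memory=2) the same malloc may fail immediately instead; with the default heuristic it usually succeeds, and failure, if any, arrives later via the OOM killer.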
Is there a limit on the stack size of a process in Linux? Is it simply dependent on the RAM of the machine?
I want to know this in order to limit the depth of recursive calls to a function.
The stack is normally limited by a resource limit. You can see what the default settings are on your installation using ulimit -a:
stack size (kbytes, -s) 8192
(this shows that mine is 8MB, which is huge).
If you remove or increase that limit, you still won't be able to use all the RAM in the machine for the stack - the stack grows downward from a point near the top of your process's address space, and at some point it will run into your code, heap or loaded libraries.
The limit can be set by the admin.
See man ulimit.
There is probably a default which you cannot cross. If you have to worry about stack limits, I would say you need to rethink your design, perhaps write an iterative version?
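If it helps, here is a hypothetical before/after sketch of that rewrite: the pending work goes into a heap-allocated std::stack, so the depth is bounded by available memory rather than by ulimit -s.

#include <cstddef>
#include <cstdio>
#include <stack>

struct Node {
    Node* left;
    Node* right;
    Node() : left(0), right(0) {}
};

// Recursive version for comparison -- one stack frame per tree level, so a
// degenerate tree a million nodes deep overflows an 8 MB stack:
//   std::size_t count(Node* n) { return n ? 1 + count(n->left) + count(n->right) : 0; }

std::size_t count_iterative(Node* root) {
    std::size_t total = 0;
    std::stack<Node*> pending;            // lives on the heap, not on the call stack
    if (root) pending.push(root);
    while (!pending.empty()) {
        Node* n = pending.top();
        pending.pop();
        ++total;
        if (n->left)  pending.push(n->left);
        if (n->right) pending.push(n->right);
    }
    return total;
}

int main() {
    // Build a degenerate, list-like tree one million levels deep.
    Node* root = new Node();
    Node* cur = root;
    for (int i = 0; i < 1000000; ++i) {
        cur->left = new Node();
        cur = cur->left;
    }
    std::printf("%lu nodes\n", (unsigned long)count_iterative(root));
    return 0;   // nodes are deliberately leaked; this is only a sketch
}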
It largely depends what architecture you're on (32 or 64-bit) and whether you're multithreaded or not.
By default, in a single-threaded process, i.e. the main thread created by the OS at exec() time, your stack will usually grow until it hits something else in the address space. This means that it is generally possible, on a 32-bit machine, to have, say, 1 GB of stack.
However, this is definitely NOT the case in a multithreaded 32-bit process. In multithreaded processes, the stacks share the address space and hence need to be allocated, so they typically get given a small amount of address space (e.g. 1 MB) so that many threads can be created without exhausting the address space.
So in a multithreaded process, it's small and finite, in a single threaded one, it's basically until you hit something else in the address-space (which the default allocation mechanism tries to ensure doesn't happen too soon).
In a 64-bit machine, of course there is a lot more address space to play with.
In any case you can always run out of virtual memory, in which case you'll get a SIGBUS or SIGSEGV or something.
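If the default per-thread amount is too small, you can usually ask for more at thread-creation time; a minimal sketch with POSIX threads (compile with -pthread; the 32 MB figure is arbitrary):

#include <pthread.h>
#include <cstdio>

void* worker(void*) {
    // Deep recursion or large local arrays would go here.
    return 0;
}

int main() {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 32 * 1024 * 1024);   // 32 MB instead of the small default

    pthread_t tid;
    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        std::fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}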
Would have commented on the accepted answer but apparently I need more rep....
A true stack overflow can be subtle and does not always produce error messages or warnings. I just had a situation where the only symptom was that socket connections would fail with strange SSL errors. Everything else worked fine: threads could malloc(), grab locks, talk to the DB, etc. But new connections would fail at the SSL layer.
With stack traces from deep within GnuTLS, I was quite confused about the true cause, and nearly reported the traces to their team after spending a lot of time trying to figure it out.
Eventually I found that the stack size was set to 8 MB, and immediately upon raising it the problems vanished. Lowering the stack back to 8 MB brought the problem back (an A/B/A test).
So if you are troubleshooting what appear to be strange socket errors without any other warnings or uninitialized-memory errors, it could be stack overflow.
What is the initial heap size typically allotted to a C++ program running on a UNIX-based OS?
How is it decided by the g++ compiler, if it has a role to play in this regard at all?
For C++, no matter what the platform, the heap is almost always extended dynamically by asking the OS for more memory as needed. On some embedded platforms, or some very old platforms this may not be true, but then you probably have a really good idea how much heap you have because of the nature of the environment.
On Unix platforms this is doubly true. Even most Unix embedded platforms work this way.
On platforms that work like this, the library usually doesn't have any kind of internal limit, but instead relies on the OS to tell it that it can't have any more memory. For a variety of reasons, though, this may happen well after you have actually asked for more memory than is available.
On most Unix systems, there is a hard limit on how much total memory a process can have. This limit can be queried with the getrlimit system call. The relevant constant is RLIMIT_AS. This limit governs the maximum number of memory pages that can be assigned to a process and directly limits the amount of heap space available.
Unfortunately that limit doesn't directly tell you how much heap you can use. Memory pages are also assigned to a process as a result of mmap calls, to hold the program code itself, and for the process's stack.
Additionally, this limit is frequently set well in excess of the total memory available to the whole system if you add together physical memory and swap space. So in reality your program will frequently run out of memory before this limit is reached.
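For example, a minimal sketch that just queries this limit (my own example, not tied to any particular allocator):

#include <sys/resource.h>
#include <cstdio>

int main() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        std::perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        std::printf("no address-space limit set\n");
    else
        std::printf("address-space limit: %lu MB\n",
                    (unsigned long)(rl.rlim_cur / (1024 * 1024)));
    return 0;
}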
Lastly, some versions of Unix over-assign pages. They allow you to allocate a massive number of pages, but only actually find memory for those pages when you write to them. This means your program can be killed for running out of memory even if all the memory allocation calls succeed. The rationale for this is the ability to allocate huge arrays which will only ever be partially used.
So, in short, there isn't a typical size, and no good way to find out what the size really is.
The heap is extended dynamically by asking the OS for more memory as needed.
It's not determined by the compiler, exactly, but by the library.
It is more typical to fix the size of the heap in dynamic languages with GC. In C and C++, it is a simple matter to ask the OS for more memory, since it is obvious when you need it. As a consequence, the initial heap size matters very little and is just an implementation decision on the part of the allocation library.
In short, there is no definite way to configure the heap size. But we do have some ways to affect the heap memory size, since the heap is part of the total available memory.
You can get the total amount of available memory in the system with:
cat /proc/meminfo | grep CommitLimit
CommitLimit: 498080 kB
This CommitLimit is calculated with the following formula:
CommitLimit = (vm.overcommit_ratio / 100) * Physical RAM + Swap
Supposing swap is zero, you can configure the total available memory by setting overcommit_ratio; for example, with 8 GB of physical RAM, no swap, and a ratio of 60, the CommitLimit would be about 4.8 GB. You can set overcommit_ratio with:
sysctl -w vm.overcommit_ratio=60
It is important to note that this limit is only adhered to if strict overcommit accounting is enabled (mode 2 in vm.overcommit_memory). This can be set with:
sysctl -w vm.overcommit_memory=2
Here is the kernel documentation that explains this well.
You could also try writing a small program with a while(true) loop. After running it, cat /proc/{pid}/maps will show its initial heap size.
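Something like this (a throwaway example of mine):

#include <unistd.h>
#include <cstdio>

int main() {
    std::printf("pid: %d\n", (int)getpid());
    while (true)
        sleep(1);   // spin without burning CPU so the process stays around for inspection
    return 0;
}

Run it, then look for the [heap] line in /proc/<pid>/maps using the pid it prints; that region is the initial heap (the program break), which the allocator then grows on demand.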
I am curious about how to find out what the maximum stack size is for a particular compiler/OS combination. I am using Ubuntu with the GNU compiler. A few additional questions I have are:
Who controls the default maximum stack size; OS or compiler?
Is the default maximum scaled according to total memory? (i.e. would a machine with 2 GB of memory have a larger default size than a machine with only 512 MB?) For this example, both machines have the same OS/compiler setup, just different amounts of system RAM.
Thanks!
Who controls the default maximum stack size; OS or compiler?
Typically the compiler/linker, though the OS and hardware limit it to a certain extent. The default is 8 MB on Linux, IIRC. Think of ulimit -s on Linux (to change stack sizes).
Is the default maximum scaled according to total memory? (i.e. would a machine with 2 GB of memory have a larger default size than a machine with only 512 MB?) For this example, both machines have the same OS/compiler setup, just different amounts of system RAM.
No, not until and unless you do it yourself. You can alter stack sizes via linker switches (passed through the compiler driver):
ld --stack=<STACK_SIZE>
or
gcc -Wl,--stack=<STACK_SIZE>
The C++ Standard's take on the issue of stacks and heaps:
The standard is based on an abstract machine and does not really concern itself with hardware, stacks, or heaps. It talks about automatic storage and a free store; the free store is where you end up when you call new (mostly). FWIW, an implementation could have a single memory area masquerading as both stack and heap when it comes to object allocation.
Your question, therefore, boils down to an implementation-specific issue rather than a language issue.
Hope this helps.
On Linux (Ubuntu), the operating system controls the maximum size. See "man limit" or "man ulimit" for reference.
Nowadays, the correct question is: how much stack is allocated to my thread? Each thread gets an amount that you can typically control at thread-creation time.
To answer part 1, the compiler / thread-system gets to pick, although some OS's have (historically) had limits.
For part 2: no, it is not scaled.
There is no way of doing this portably - the C++ Standard does not actually require a stack.