Limit physical memory per process - C++

I am writing an algorithm to perform some external memory computations, i.e. where your input data does not fit into main memory and you have to consider the I/O complexity.
Since for my tests I do not always want to use real inputs, I want to limit the amount of memory available to my process. What I have found is that I can set the mem kernel parameter to limit the physical memory used by all processes together (is that correct?).
Is there a way to do the same, but with a per-process limit? I have seen ulimit, but it only limits the virtual memory per process. Any ideas (maybe I can even set it programmatically from within my C++ code)?

You can try with 'cgroups'.
To use them type the following commands, as root.
# mkdir /dev/cgroups
# mount -t cgroup -omemory memory /dev/cgroups
# mkdir /dev/cgroups/test
# echo 10000000 > /dev/cgroups/test/memory.limit_in_bytes
# echo 12000000 > /dev/cgroups/test/memory.memsw.limit_in_bytes
# echo <PID> > /dev/cgroups/test/tasks
Where <PID> is the PID of the process you want to add to the cgroup. Note that the limit applies to the sum of the memory used by all the processes assigned to this cgroup.
From this moment on, those processes are limited to 10MB of physical memory and 12MB of physical+swap.
There are other tunable parameters in that directory, but the exact list will depend on the kernel version you are using.
You can even make hierarchies of limits, just creating subdirectories.
The cgroup is inherited when you fork/exec, so if you add the shell from where your program is launched to a cgroup it will be assigned automatically.
Note that you can mount the cgroups in any directory you want, not just /dev/cgroups.
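Since the question mentions doing this programmatically from C++: here is a minimal sketch, assuming the cgroup was created and mounted exactly as shown above, that adds the calling process to the cgroup by writing its own PID to the tasks file:

#include <fstream>
#include <unistd.h>

// Add the calling process to the "test" cgroup created above. Assumes
// the memory controller is mounted at /dev/cgroups and that we have
// permission to write to the tasks file.
bool join_cgroup(const char* tasks_path = "/dev/cgroups/test/tasks")
{
    std::ofstream tasks(tasks_path);
    if (!tasks)
        return false;   // not mounted, or insufficient permissions
    tasks << getpid() << '\n';
    return static_cast<bool>(tasks.flush());
}

Call join_cgroup() early in main(), before allocating anything large; all memory the process touches afterwards is counted against the cgroup's limit.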

I can't provide a direct answer, but for this kind of work I usually write my own memory management system so that I have full control of the memory area and how much I allocate. This is usually applicable when you're writing for microcontrollers as well. Hope it helps.

I would use setrlimit with the RLIMIT_AS parameter to set the limit on virtual memory (this is what ulimit does), and then have the process call mlockall(MCL_CURRENT|MCL_FUTURE) to force the kernel to fault in and lock all the process's pages into physical RAM, so that the amount of virtual memory equals the amount of physical memory for this process.
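A rough sketch of that combination (the byte count is a parameter of your choosing; note that mlockall needs root privileges or a sufficiently large RLIMIT_MEMLOCK to succeed):

#include <sys/resource.h>
#include <sys/mman.h>

// Cap the virtual address space at `bytes`, then lock all current and
// future pages into RAM, so virtual usage is also physical usage.
bool limit_physical_memory(rlim_t bytes)
{
    struct rlimit rl;
    rl.rlim_cur = bytes;
    rl.rlim_max = bytes;
    if (setrlimit(RLIMIT_AS, &rl) != 0)
        return false;
    // Needs root or a big enough RLIMIT_MEMLOCK.
    return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}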

Have you considered trying your code in some kind of virtual environment? A virtual machine might be too much for your needs, but something like User-Mode Linux could be a good fit. It runs a Linux kernel as a single process inside your regular operating system. Then you can provide a separate mem= kernel setting, as well as a separate swap space, for controlled experiments.

The kernel mem= boot parameter limits how much memory the OS will use in total.
This is almost never what the user wants.
For physical memory there is the RSS rlimit, RLIMIT_RSS (note that RLIMIT_AS limits virtual address space, and that modern Linux kernels no longer enforce RLIMIT_RSS).

As other posters have already indicated, setrlimit is the most probable solution; it controls the limits of all configurable aspects of a process environment. Use this command to see these individual settings on your shell process:
ulimit -a
The ones most pertinent to your scenario in the resulting output are as follows:
data seg size (kbytes, -d) unlimited
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
virtual memory (kbytes, -v) unlimited
Check out the manual page for setrlimit ("man setrlimit"); it can be invoked programmatically from your C/C++ code. I have used it to good effect in the past for controlling stack size limits. (By the way, there is no dedicated man page for ulimit; it's a bash built-in command, so it's documented in the bash man page.)
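For illustration, here is a small sketch (the 64 MB figure is an arbitrary example) that reads and then tightens the virtual memory limit from C++, the programmatic equivalent of ulimit -v:

#include <sys/resource.h>
#include <cstdio>

int main()
{
    struct rlimit rl;

    // Read the current virtual memory limit (what ulimit -v reports).
    getrlimit(RLIMIT_AS, &rl);
    std::printf("virtual memory: soft=%llu hard=%llu\n",
                (unsigned long long)rl.rlim_cur,
                (unsigned long long)rl.rlim_max);

    // Lower the soft limit to 64 MB. A process may always lower its
    // own limits; raising the hard limit requires privileges.
    rl.rlim_cur = 64ULL * 1024 * 1024;
    if (setrlimit(RLIMIT_AS, &rl) != 0)
        std::perror("setrlimit");
    return 0;
}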

Related

mmap - controlling memory mapped file virtual memory usage with cgroups

I have an application that opens a file with mmap() and does stuff to it (long story short, makes calls to gdb to parse a coredump file and then 7z to compress the dump). What I am trying to achieve is setting a limit on how much resident memory (a.k.a. actual RAM) can be used by this application, while letting it use as much total virtual memory as it wants.
There are two main suggestions I've seen to achieve this: ulimit and cgroups.
mmap: an observation
Before moving forward, a note on mmap: my understanding is that the whole point of using it is to minimize the total amount of memory used to read a file. This works because the mmap'ed pages are backed by the file itself, not by swap or RAM. However, when I start my application (which uses mmap) and look at the output of top, I notice it still reports the application as using a large amount of virtual memory: just a bit under the size of the file being opened with mmap. So a 15GB file might report 0.5GB of RAM usage and 14.5GB of virtual memory usage. Does this mean mmap needs to load the entire file into (virtual) memory, or is this just a quirk of the way Linux reports memory usage for mmap (as in, it "counts" the space on the hard drive where the file is located as virtual memory)?
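(For what it's worth, the accounting is easy to observe with a few lines of code; in this sketch the whole mapping shows up in VmSize in /proc/self/status immediately, while VmRSS grows only as pages are touched. Error handling is elided.)

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

// Map a file; the whole mapping counts toward virtual memory (VmSize)
// right away, but pages count toward resident memory (VmRSS) only
// once they are touched and faulted in.
void* map_file(const char* path, std::size_t& len)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    len = st.st_size;
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   // the mapping stays valid after close
    return p;
}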
ulimit
ulimit only supports setting a limit for virtual memory as a whole. There is no way to specify a limit for resident memory only, which is what I'm interested in. Since mmap appears to use roughly the same amount of virtual memory as the size of the file it opens (as described above), this doesn't work for me: set ulimit -v to anything less, and my application crashes.
cgroups
cgroups lets us set a specific limit for resident memory with memory.limit_in_bytes. I tried creating a cgroup and running my application in it. Here I saw a phenomenon that's left me stumped: on a machine with only 4GB of RAM and 2 CPUs, the cgroup respects the 100MB limit_in_bytes I set. However, on a machine with 500GB of RAM and 60 CPUs, the exact same file and the exact same application (same executable, not rebuilt on the new machine or anything) with the same 100MB limit crashes. Only when I set the limit to around the size of the file being mmap'd can it run successfully.
So there are two questions here:
Does mmap need to load the whole file into virtual memory to work or not? My evidence points to yes after trying ulimit... and no after my experiment with cgroups on the 4GB machine.
Any suggestions on what other factors could explain why the 4GB machine works with the cgroup limit, but the 500GB machine does not?

Understanding memory used by a program in bash (in ubuntu linux)

In some programming contests, problems have a memory limit (like 64MB or 256MB). How can I find out the memory used by my program (written in C++) with bash commands? Is there any way to limit the memory used by the program? The program should terminate if it uses more memory than the limit.
The top command will give you a list of all running processes and their current memory and swap usage, or if you prefer a GUI you can use the System Monitor application.
As for locking down memory usage, you can always use ulimit -v to set the maximum virtual address range for a process. This will cause malloc and its buddies to fail if they try to get more memory than that limit.
Depending on how much work you want to put into it, you can look at getrusage(), getrlimit(), and setrlimit(). For testing purposes you can call them at the beginning of your program, or perhaps set them up in a parent process and fork your contest program off as a child (a sketch follows below). Then dispense with them when you submit your program for contest consideration.
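A sketch of that parent/child pattern (the 64 MB cap is just an example value): the parent forks, the child lowers its own RLIMIT_AS and then execs the contest program, so the limit applies to the child only.

#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char* argv[])
{
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <program> [args...]\n", argv[0]);
        return 1;
    }
    pid_t pid = fork();
    if (pid == 0) {
        // Child: cap the address space at 64 MB, then run the program.
        struct rlimit rl;
        rl.rlim_cur = rl.rlim_max = 64 * 1024 * 1024;
        setrlimit(RLIMIT_AS, &rl);
        execvp(argv[1], argv + 1);
        _exit(127);   // exec failed
    }
    int status;
    waitpid(pid, &status, 0);
    return WEXITSTATUS(status);
}

With this in place, an over-allocating program sees malloc and new fail (or dies) instead of dragging down the machine.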
Also, for process 1234, you can look into /proc/1234/maps or /proc/1234/smaps, or run pmap 1234; all these commands display the memory map of the process with pid 1234.
Try to run cat /proc/self/maps to get an example (the memory map of the process running that cat command).
The memory map of a process is initialized by execve(2) and changed by the mmap(2) syscall (etc...)

Single Process Maximum Possible Memory in x64 Linux

Is there any memory limit for a single process on x64 Linux?
We are running a Linux server with 32GB of RAM, and I'm wondering if I can allocate most of it for a single process I'm coding which requires lots of RAM!
Certain kernels have different limits, but on any modern 64-bit Linux the single-process limit is still far above 32GB (assuming that process is a 64-bit executable). Various distributions may also have set per-process limits using sysctl, so you'll want to check your local environment to make sure that there aren't arbitrarily low limits set (also check ipcs -l on RPM-based systems).
The Debian port documentation for the AMD64 port specifically mentions that the per-process virtual address space limit is 128TiB (twice the physical memory limit), so that should be the reasonable upper bound you're working with.
The resource limits are set using setrlimit syscall. You can change them with a shell builtin (e.g. ulimit on bash, limit with zsh).
The practical limit is also related to RAM size and swap size; the free command shows these. (Some systems overcommit memory, but that is risky.)
A process doesn't actually use RAM directly; it consumes virtual memory using system calls like mmap (which may be called by malloc). You can even map a portion of a file into memory with that call.
To learn about the memory map of a process 1234, look into the /proc/1234/maps file. From your own application, read /proc/self/maps. There are also /proc/1234/smaps and /proc/self/smaps. Try the command cat /proc/self/maps to see the memory map of the process running that cat.
On a 32GB RAM machine, you can usually run a process with 31GB of process space (assuming no other big processes exist). If you also had 64GB of swap, you could run a process of at least 64GB, but it would be unbelievably slow (most of the time would be spent swapping to disk). You can add swap space (e.g. by swapping to a file initialized with dd, then mkswap, and activated with swapon).
If coding a server, be very careful about memory leaks. The valgrind tool is helpful for hunting such bugs. You could also consider using Boehm's garbage collector.
The current 64-bit Linux kernel limits physical RAM to 64TB and virtual memory to 128TB (see the RHEL limits and the Debian port pages). Current x86_64 CPUs (i.e. what we have in PCs) have a virtual address limit of 2^48 = 256TB, because only the lower 48 bits of an address are translated (in the page-table entries, the upper bits hold flags like ReadOnly, Writable, ExecuteDisable, PagedToDisc), but the specification allows a future switch to true 64-bit addressing, reaching the maximum of 2^64 = 16EB (exabytes). However, the motherboard and the CPU die do not have enough pins to deliver all 48 bits of the memory address to the RAM chips through the address bus, so the limit for physical RAM is lower (and depends on the manufacturer), but the virtual address space can by nature exceed the amount of RAM one could have on the motherboard, up to the virtual memory limit mentioned above.
The limit per process is also shaped by how the virtual address space of the process is laid out, because there are various sizes for the stack, the mmap() area (and dynamic libraries), the program code itself, and the kernel, which is mapped into the process space too. Some of these settings can be changed by passing arguments to the linker, sometimes by a special directive in the source code, or by modifying the binary file directly (binaries use the ELF format). There are also limits set by the administrator of the machine (root) or by the user (see the output of the command "ulimit -a"). These limits can be soft or hard, and the user is unable to exceed a hard limit.
The Linux kernel can also be configured to allow memory overcommit. In that case, a program is allowed to allocate a huge amount of memory and then actually use only a few of the pages (see sparse arrays and sparse matrices, and the Linux kernel documentation). The program will then fail only when it fills the requested memory with data, not at allocation time.
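A tiny sketch to see overcommit in action (whether the allocation succeeds depends on /proc/sys/vm/overcommit_memory and your machine; the 64GB here is an arbitrary oversized request):

#include <cstdio>
#include <cstdlib>

int main()
{
    // Ask for 64 GB; with overcommit enabled this usually succeeds even
    // on a machine with far less RAM, because no pages are backed yet.
    std::size_t size = 64ULL * 1024 * 1024 * 1024;
    char* p = static_cast<char*>(std::malloc(size));
    if (!p) { std::puts("allocation refused"); return 1; }
    std::puts("allocation succeeded");

    // Only the pages we actually touch consume physical memory; touching
    // all of them would eventually trigger the OOM killer.
    for (std::size_t i = 0; i < size; i += 4096 * 1024)
        p[i] = 1;   // touch a sparse subset: ~16k pages, ~64 MB resident
    std::free(p);
    return 0;
}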

Finding amount of RAM using C++

How would I find out the amount of RAM and details about my system like CPU type, speed, amount of physical memory available, amount of stack and heap memory in RAM, and number of processes running?
Also, is there any way to determine how long it takes your computer to execute an instruction, fetch a word from memory (with and without a cache miss), read consecutive words from disk, and seek to a new location on disk?
Edit: I want to accomplish this on my Linux system using the g++ compiler. Are there any built-in functions for this? Also tell me if such things are possible on Windows systems.
I just got this question out of curiosity when I was learning some memory management stuff in C++. Please guide me through this step by step, or maybe online tutorials will do great. Thanks.
With Linux and GCC, you can use the sysconf function included using the <unistd.h> header.
There are various arguments you can pass to get hardware information. For example, to get the amount of physical RAM in your machine you would need to do:
sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE);
See the man page for all possible usages.
You can get the maximum stack size of a process using the getrlimit system call along with the RLIMIT_STACK argument, included using the <sys/resource.h> header.
To find out how many processes are running on the current machine you can check the /proc directory. Each running process is represented as a file in this directory named by its process ID number.
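A sketch tying these together on Linux (physical RAM via sysconf, the stack limit via getrlimit, and a process count from /proc):

#include <unistd.h>
#include <sys/resource.h>
#include <dirent.h>
#include <cctype>
#include <cstdio>

int main()
{
    // Physical RAM = number of pages * page size.
    long long ram = (long long)sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE);
    std::printf("physical RAM: %lld bytes\n", ram);

    // Maximum stack size for this process (may be RLIM_INFINITY).
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    std::printf("stack limit:  %llu bytes\n", (unsigned long long)rl.rlim_cur);

    // Running processes = numeric directory names under /proc.
    int count = 0;
    if (DIR* d = opendir("/proc")) {
        while (dirent* e = readdir(d))
            if (std::isdigit((unsigned char)e->d_name[0]))
                ++count;
        closedir(d);
    }
    std::printf("processes:    %d\n", count);
    return 0;
}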
For Windows: GetPhysicallyInstalledSystemMemory for installed RAM, GetSystemInfo for CPUs, and the Process Status API for process enumeration. Heap and stack usage can be obtained only by the local process for itself. Remember that stack usage is per-thread, and on Windows a process can have multiple heaps (use GetProcessHeaps to enumerate them). Externally visible memory usage can be retrieved for each process using GetProcessMemoryInfo.
I'm not aware of Win32 APIs for the second paragraph's list. You would probably have to do this at the device-driver level (kernel mode), if it's even possible. Instruction fetch and execution depend on the processor, the cache size, and the instruction itself (they are not all the same in complexity). Memory access speed will depend on the RAM, the CPU, and the motherboard FSB speed. Disk access is likewise totally dependent on the system characteristics.
On Windows Vista and Windows 7, the Windows System Assessment Tool can provide a lot of info. Supposedly it can be programmatically accessed via the WEI API.

Setting the default stack size on Linux globally for the program

So I've noticed that the default stack size for threads on Linux is 8MB (if I'm wrong, PLEASE correct me), and, incidentally, 1MB on Windows. This is quite bad for my application, as on a 4-core processor that means 64MB of space is used JUST for threads! The worst part is, I never use more than 100kb of stack per thread (I abuse the heap a LOT ;)).
My solution right now is to limit the stack size of threads. However, I have no idea how to do this portably. Just for context, I'm using Boost.Thread for my threading needs. I'm okay with a little bit of #ifdef hell, but I'd like to know how to do it easily first.
Basically, I want something like this (where windows_* is linked on windows builds, and posix_* is linked under linux builds)
// windows_stack_limiter.c
int limit_stack_size()
{
    // Windows impl.
    return 0;
}

// posix_stack_limiter.c
int limit_stack_size()
{
    // Linux impl.
    return 0;
}

// stack_limiter.cpp
int limit_stack_size();
static volatile int placeholder = limit_stack_size();
How do I flesh out those functions? Or, alternatively, am I just doing this entirely wrong? Remember I have no control over the actual thread creation (no new params to CreateThread on Windows), as I'm using Boost.Thread.
You do not need to do this. The machine's physical memory is committed only where it is needed, by a demand paging system. Even if the thread stacks are significantly larger than the amount you are using, the extra size is in virtual address space and does not tie up physical RAM.
Had physical RAM been tied up at that rate, a typical machine would run out of memory with only a few dozen processes running. You can see from a ps -Al that quite a few more than that execute concurrently.
I've run into similar problems on 32-bit systems (especially MIPS) running large application programs with hundreds of threads. Large default stacks don't tie up physical memory, but virtual memory can be a scarce resource as well. There are a couple of ways to resolve the problem:
Use setrlimit from within the program (see the sketch after this list). I haven't done this but I suspect it would work.
Before starting the program from the shell, use "ulimit -s" with a parameter smaller than the default. (e.g., "ulimit -s 1024" for default 1 MB stack)
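A sketch of the first option as the POSIX half of the question's scheme; one caveat is that glibc may read RLIMIT_STACK only once at program startup to derive the default thread stack size, in which case option 2 (ulimit -s before launching) is the reliable route:

// posix_stack_limiter.c
#include <sys/resource.h>

int limit_stack_size()
{
    // Must run before any threads are created: the soft RLIMIT_STACK
    // is used as the default stack size for new pthreads.
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    rl.rlim_cur = 256 * 1024;   // 256 kB per thread (arbitrary example)
    setrlimit(RLIMIT_STACK, &rl);
    return 0;
}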
First, you don't need to change this unless you are getting SEGVs from hitting this limit. (see man setrlimit for detailed info)
Second, on all the Linux distributions I'm aware of, you change this by editing /etc/security/limits.conf (to change the default) or by running ulimit -s <stack size in kilobytes> to change the value until you exit the shell.