I have a program that is projected to use a few GB of lmdb diskspace (it's a blockchain, and we're moving away from leveldb due to its lack of ACID, which I need for some future plans). Is it possible to run that program with that database on a Raspberry Pi without adding more swap (with >1 GB memory)? (considering that adding swap is for advanced users).
Currently when I run that program mdb_env_set_mapsize(1 << 30), hence 1 GB of mapsize, it returns error 12, which is out-of-memory. But it works if I reduce the size to 512 MB.
But what's the right way to handle such memory issues in lmdb when the database size keeps increasing?
The maximum size of memory that can be memory mapped depends on the size of the virtual address space, which is dictated by the CPU's virtual memory manager. A 32-bit CPU have a limit of 4GB virtual address space, this limit is for the whole system unless PAE is enabled, in which case the limit is per process.
In addition to this, the kernel and your application reserves some space of their own on your address space, and memory allocation usually requires contiguous address space, reducing memory available for the database to allocate.
So your user will need to either enable PAE on their system, or upgrade to 64-bit CPU. If neither of these is an option in your application, then you cannot use a memory mapped file larger than your available address space, so you'll have to do some segmentation to split your data into multiple files that you can map only small chunks at a time. I'm guessing that lmdb requires that it can map the entire database file into memory.
For a blockchain application, your data is mostly a linear sequence of log entries, so your application should only need to work with the most recent entries most of the time. You can separate the recent entries into its own working file, and the rest of the log in a database that doesn't require mapping the entire file into memory or in multiple fixed size files that you can map and unmap as needed.
Related
I have an application that opens a file with mmap() and does stuff to it (long story short, makes calls to gdb to parse a coredump file and then 7z to compress the dump). What I am trying to achieve is setting a limit on how much resident memory (a.k.a. actual RAM) can be used by this application, while letting it use as much total virtual memory as it wants.
There are two main suggestions I've seen to achieve this: ulimit and cgroups.
mmap: an observation
Before moving forward, a note on mmap: my understanding is the whole point of using it is to minimize the total amount of memory used to read file. This works by having the mmap'ed file backed up by themselves, not by swap or RAM. However, when I start my application (that uses mmap) and look at the output from top, I notice it still reports the application as using a large amount of virtual memory... using just a bit under the size of the file that is being opened with mmap. So a 15GB file might report 0.5GB of RAM usage and 14.5GB of virtual memory usage. So does this mean mmap needs to load the entire file into (virtual) memory or is this just a quirk of the way Linux reports memory usage for mmap (as in, it "counts" the space on the hard drive where the file is located as virtual memory)?
ulimit
ulimit only supports setting a limit for virtual memory as a whole. There is no way to way to specify a limit for only resident memory, which is what I'm interested in. Since mmap appears to use roughly the same amount of virtual memory as the size of the file it is opening (as described above), this doesn't work for me. Set ulimit -v to any thing less, and my application crashes.
cgroups
cgroups lets us set a specific limit for resident memory with memory.limit_in_bytes. I tried creating a cgroup and running my application with it. Here I saw a phenomenon that's left me stumped: on a machine with only 4GB of RAM and 2 CPUS, the cgroup seems to respect the RAM usage limit I set, with the limit_in_bytes only set to 100MB. However on a machine with 500GB, 60 CPUs and a limit of 100 bytes, the exact same file, exact same application (same executable, not rebuilt on the new machine or anything), setting the same 100MB limit leads to the application crashing. Only when I set the limit back to around the same size as the file being mmapd, can it run successfully.
So there are a two questions here:
Does mmap need to load the whole file into virtual memory to work or not? My evidence points to yes after trying ulimit... and no after my experiment with cgroups, on the 4GB machine.
Any suggestions on what other factors could explain why the 4GB is able to successfully work with the cgroup limit, but not the 500GB machine?
Whenever a new / malloc is used, OS create a new(or reuse) heap memory segment, aligned to the page size and return it to the calling process. All these allocations will constitute to the Process's virtual memory. In 32bit computing, any process can scale only upto 4 GB. Higher the heap allocation, the rate of increase of process memory is higher. Though there are lot of memory management / memory pools available, all these utilities end up again in creating a heap and reusing it effeciently.
mmap (Memory mapping) on the other hand, provides the ablity to visualize a file as memory stream, and enables the program to use pointer manipulations directly on file. But here again, mmap is actually allocating the range of addresses in the process space. So if we mmap a 3GB file with size 3GB and take a pmap of the process, you could see the total memory consumed by the process is >= 3GB.
My question is, is it possible to have a file based memory pool [just like mmaping a file], however, does not constitute the process memory space. I visualize something like a memory DB, which is backed by a file, which is so fast for read/write, which supports pointer manipulations [i.e get a pointer to the record and store anything as if we do using new / malloc], which can grow on the disk, without touching the process virtual 4GB limit.
Is it possible ? if so, what are some pointers for me to start working.
I am not asking for a ready made solution / links, but to conceptually understand how it can be achieved.
It is generally possible but very coplicated. You would have to re-map if you wanted to acces different 3Gb segments of your file, which would probably kill the performance in case of scattered access. Pointers would only get much more difficult to work with, as remmpaing changes data but leaves the adresses the same.
I have seen STXXL project that might be interesting to you; or it might not. I have never used it so I cannot give you any other advice about it.
What you are looking for, is in principle, a memory backed file-cache. There are many such things in for example database implementations (where the whole database is way larger than the memory of the machine, and the application developer probably wants to have a bit of memory left for application stuff). This will involve having some sort of indirection - an index, hash or some such to indicate what area of the file you want to access, and using that indirection to determine if the memory is in memory or on disk. You would essentially have to replicate what the virtual memory handling of the OS and the processor does, by having tables that indicate where in physical memory your "virtual heap" is, and if it's not present in physical memory, read it in (and if the cache is full, get rid of some - and if it's been written, write it back again).
However, it's most likely that in today's world, you have a machine capable of 64-bit addressing, and thus, it would be much easier to recompile the application as a 64-bit application, usemmap or similar to access the large memory. In this case, even if RAM isn't sufficient, you can access the memory of the file via the virtual memory system, and it takes care of all the mapping back and forth between disk and RAM (physical memory).
is there any memory limit for a single process in x64 Linux?
we are running a Linux Server with 32Gb of RAM and I'm wondering if I can allocate most of it for a single process I'm coding which requires lots of RAM!
Certain kernels have different limits, but on any modern 64-bit linux the single-process limit is still far over 32GB (assuming that process is a 64-bit executable). Various distributions may also have set per-process limits using sysctl, so you'll want to check your local environment to make sure that there aren't arbitrarily low limits set (also check ipcs -l on RPM-based systems).
The Debian port documentation for the AMD64 port specifically mentions that the per-process virtual address space limit is 128TiB (twice the physical memory limit), so that should be the reasonable upper bound you're working with.
The resource limits are set using setrlimit syscall. You can change them with a shell builtin (e.g. ulimit on bash, limit with zsh).
The practical limit is also related to RAM size and swap size. The free command show these. (Some systems are overcommitting memory, but that is risky).
A process actually don't use RAM, it consumes virtual memory using system calls like mmap (which may get called by malloc). You could even map a portion of a file into memory with that call.
To learn about the memory map of a process 1234, look into the /proc/1234/maps file. From your own application, read the /proc/self/maps. And you have also /proc/1234/smaps and /proc/self/smaps. Try the command cat /proc/self/mapsto understand the memory map of the process running that cat.
On a 32Gb RAM machine, you can usually run a process with 31 Gb of process space (assuming no other big process exist). If you had also 64Gb of swap, you could run a process of at least 64Gb but that would be unbelievably slow (most of the time would be spent on swapping to disk). You can add swap space (e.g. by swapping to a file, initialized with dd then mkswap, and activated with swapon).
If coding a server, be very careful about memory leaks. The valgrind tool is helpful to hunt such bugs. And you could consider using Boehm's garbage collector
Current 64bit Linux kernel has limit to 64TB of physical RAM and 128TB of virtual memory (see RHEL limits and Debian port). Current x86_64 CPUs (ie. what we have in the PC) has (virtual) address limit 2^48=256TB because of how the address register in the CPU use all the bits (upper bits are used for page flags like ReadOnly, Writable, ExecuteDisable, PagedToDisc etc in the pagetable), but the specification allows to switch to true 64bit address mode reaching the maximum at 2^64=16EB (Exa Bytes). However, the motherboard and CPU die does not have so many pins to deliver all 48 bits of the memory address to the RAM chip through the address bus, so the limit for physical RAM is lower (and depends on manufacturer), but the virtual address space could by nature reach more than the amount of RAM one could have on the motherboard up to virtual memory limit mentioned above.
The limit per process are raised by how the memory virtual address space for the process is set, because there could be various sizes for stack, mmap() area (and dynamic libraries), program code itself, also the kernel is mapped into the process space. Some of these settings could be changed by passing argument to the linker, sometimes by special directive in the source code, or by modifying the binary file with the program directly (binary has ELF format). Also there are limits the administrator of the machine (root) has set or the user has (see output of the command "ulimit -a"). These limits could be soft or hard and the user is unable to overcome hard limit.
Also the Linux kernel could be set to allow memory overcommit allocation. In this case, the program is allowed to allocate a huge amount of RAM and then use only a few of pages (see sparse arrays, sparse matrix), see Linux kernel documentation. So in this case, the program will fail only after filling up the requested memory by data, but not at the time of memory allocation.
My System:
Physical memory: 3gb
Windows XP Service Pack 3 (32bit)
Swap file size: 30gb
Goal: To find the largest possible memory map size I can allocate on my machine.
When I run the following code to allocate 2gb memory map file, the call fails.
handle=CreateFileMapping(INVALID_HANDLE_VALUE,NULL,PAGE_READWRITE|SEC_COMMIT,0,INT_MAX,NULL);
I've been very puzzled by this, because I can allocate a memory map file's up to the system swap file size of 30gb by constantly calling CreateFileMapping with 100mb at a time.
After restarting the machine, and re-running the application that requests 2gb of memory mapped file to CreateFileMapping it works and it returns a valid handle. So this leads me a bit confused what the hell is going on under the hood with windows?
So the situation is this, I can create many small memory mapped files using up all the system page file (30gb), but when asking for a single allocation of 2gb the call fails. When restarting the machine and running the same application the call succeeds!
Some notes:
1) The memory mapped file is not being loaded into the proccess virtual address space, there is no view yet to the file.
2) The OS can allocate small 100mb memory mapped files to 30gb of the systems page file!
Right now the only conclusion I can come to, is that the Windows XP SP3 (32bit) virtual memory manager cannot successfully reserve the requested 2gb in the system page file, and then fails due to the system memory fragmentation (it seems like it needs to reserve a continues allocation of memory, even though the page file is 4kb each). After a restart I assume the system memory fragmentation is less, thus allowing the same call to succeed and allocate a memory mapped file of 2gb in size.
I've run some experiments, after running the machine for a day I started a small application that would allocate a memory maped file of 300mb and then release it. It would then increase the size by 1mb and try again. Finally it stops at 700mb and reports (insufficient system resources). I would then go through and close down each application and this would in turn stop the error messages and it finally continues to allocate a memory mapped file of 3.5gb in size!
So my question is what is going on here? There must be some type of memory fragmentation happening internally with the virtual memory manager, because allocating 100mbs memory mapped files will consume up to the 30gb of the system page file (commit limit).
Update
Conclusion is if you're going to create a large memory mapped file backed by the system page file with INVALID_HANDLE_VALUE, then the system page file (swap file) needs to be resize to the required size and be in a non fragmented state for large allocations > 2gb! Though under heavy IO load it can still fail. To get around all these problems you can create your own file with the needed size (I did 1tb) and memory map to that file instead.
Final Update
I ran the same tests on a Windows 7 box, and to my surprise it works every single time (up to the system page file size) without touching anything. So I guess this is just a bug, that large memory allocations can fail more often on Windows XP than Windows 7.
The problem is file fragmentation. Physical memory (RAM) has nothing to do with anything here. In a virtual memory system, 'memory' is allocated from the file system. Physical memory is just an optimization to speed access to memory.
When you request a memory-mapped file with write access, the system must have a file with contiguous pages free. The system swap file is often fragmented. If your disk drive is nicely defragmented, you should be able to create a large memory-mapped file using a file of your choice (not the system page file).
So if you really have to have a 2GB memory-mapped file, you need to create one on the drive at installation. This shifts the problem of creating a contiguous 2GB file to installation, but once created, you should be ok.
So my question is what is going on here? There must be some type of memory fragementation happening internally with the virtual memory manager, because allocating 100mbs memory mapped files will consume up to the 30gb of the system page file (commit limit).
Sounds about right. If you don't need large contiguous chunks of memory, don't ask for them if you can get the same amount of memory in smaller chunks.
To find the largest possible memory map size I can allocate on my machine.
Try it with size X.
If that fails, try with size X/2 and repeat.
This gets you a chunk at runtime, maybe not the exact largest possible chunk, but within a factor of 2.
Let's takes up Windows developer position.
Assume some user perform following steps:
Create memory mapping.
Populate some memory with sensitive data
Unmap from file
Continue using memory
Windows need to unload these pages for critical tasks.
Resolution - mapped memory should feat for swapping. But it doesn't means that mapped will be swapped.
I'm using an image manipulation library that throws an exception when I attempt to load images > 4GB in size. It claims to be 64bit, but wouldn't a 64bit library allow loading images larger than that? I think they recompiled their C libraries using a 64 bit memory model/compiler but still used unsigned integers and failed upgrade to use 64 bit types.
Is that a reasonable conclusion?
Edit - As an after-thought can OS memory become so fragemented that allocation of large chunks is no longer possible? (It doesn't work right after a reboot either, but just wondering.) What about under .NET? Can the .NET managed memory become so fragmented that allocation of large chunks fails?
It's a reasonable suggestion, however the exact cause could be a number of things - for example what OS are you running, how much RAM / swap do you have? The application/OS may not over-commit virtual memory so you'll need 4GB (or more) of free RAM to open the image.
Out of interest does it seem to be a definite stop at the 4GB boundary - i.e. does a 3.99GB image succeed, but a 4GB one fail - you say it does which would suggest a definite use of a 32bit size in the libraries data structures.
Update
With regards your second question - not really. Pretty much all modern OS's use virtual memory, so each process gets it's own contiguous address space. A single contiguous region in a processes' address space doesn't need to be backed by contiguous physical RAM, it can be made up of a number of separate physical areas of RAM made to look like they are contiguous; so the OS doesn't need to have a single 4GB chunk of RAM free to give your application a 4GB chunk.
It's possible that an application could fragment it's virtual address space such that there isn't room for a contiguous 4GB region, but considering the size of a 64-bit address space it's probably highly unlikely in your scenario.
Yes, unless perhaps the binary file format itself limits the size of images.