My System:
Physical memory: 3 GB
Windows XP Service Pack 3 (32-bit)
Swap file size: 30 GB
Goal: To find the largest possible memory map size I can allocate on my machine.
When I run the following code to allocate a 2 GB memory-mapped file, the call fails.
handle = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE | SEC_COMMIT, 0, INT_MAX, NULL); // dwMaximumSizeHigh = 0, dwMaximumSizeLow = INT_MAX, i.e. ~2 GB backed by the page file
I've been very puzzled by this, because I can allocate memory-mapped files up to the 30 GB system swap file size by repeatedly calling CreateFileMapping with 100 MB at a time.
After restarting the machine and re-running the application that requests a 2 GB memory-mapped file from CreateFileMapping, it works and returns a valid handle. This leaves me rather confused about what is going on under the hood in Windows.
So the situation is this: I can create many small memory-mapped files using up all of the system page file (30 GB), but a single allocation of 2 GB fails. After restarting the machine and running the same application, the call succeeds!
Some notes:
1) The memory-mapped file is not being loaded into the process's virtual address space; there is no view of the file yet.
2) The OS can allocate small 100 MB memory-mapped files up to the 30 GB system page file!
Right now the only conclusion I can come to is that the Windows XP SP3 (32-bit) virtual memory manager cannot successfully reserve the requested 2 GB in the system page file, and fails due to system memory fragmentation (it seems to need a contiguous reservation, even though page-file pages are only 4 KB each). After a restart I assume there is less fragmentation, allowing the same call to succeed and allocate a 2 GB memory-mapped file.
I've run some experiments. After the machine had been running for a day, I started a small application that would allocate a 300 MB memory-mapped file and then release it, then increase the size by 1 MB and try again. It finally stopped at 700 MB and reported "insufficient system resources". I then closed each running application in turn, which stopped the error messages, and it eventually went on to allocate a memory-mapped file 3.5 GB in size!
So my question is: what is going on here? There must be some kind of memory fragmentation happening internally in the virtual memory manager, because allocating 100 MB memory-mapped files will consume up to the 30 GB system page file (the commit limit).
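For reference, a minimal sketch (Win32 C) of the small-chunk behaviour described above: keep creating 100 MB page-file-backed mappings until CreateFileMapping fails, which on this machine only happens near the 30 GB commit limit. The handle array size is just illustrative.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    const DWORD chunk = 100 * 1024 * 1024;      /* 100 MB per mapping */
    HANDLE handles[1024];                       /* illustrative upper bound */
    int count = 0;

    while (count < 1024) {
        HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                                      PAGE_READWRITE | SEC_COMMIT,
                                      0, chunk, NULL);
        if (h == NULL)
            break;                              /* commit limit reached */
        handles[count++] = h;
    }

    printf("created %d mappings (~%d MB committed)\n", count, count * 100);

    while (count > 0)
        CloseHandle(handles[--count]);
    return 0;
}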
Update
The conclusion is: if you're going to create a large memory-mapped file backed by the system page file with INVALID_HANDLE_VALUE, then the system page file (swap file) needs to be resized to the required size and be in a non-fragmented state for large allocations (> 2 GB)! Even then, under heavy I/O load it can still fail. To get around all these problems you can create your own file of the needed size (I used 1 TB) and memory-map that file instead.
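A minimal sketch (Win32 C) of that workaround, with an illustrative path and a 1 GB size standing in for the real one: create your own backing file of the required size up front and map that instead of the page file.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    const LONGLONG size = 1LL << 30;            /* 1 GB, illustrative */

    HANDLE file = CreateFileA("C:\\mapping.bin", GENERIC_READ | GENERIC_WRITE, 0,
                              NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    /* Grow the backing file to the full required size before mapping it. */
    LARGE_INTEGER li;
    li.QuadPart = size;
    if (!SetFilePointerEx(file, li, NULL, FILE_BEGIN) || !SetEndOfFile(file))
        return 1;

    /* The mapping is now backed by our own file, not the system page file. */
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE,
                                        (DWORD)(size >> 32), (DWORD)size, NULL);
    if (mapping == NULL) {
        printf("CreateFileMapping failed: %lu\n", GetLastError());
        return 1;
    }

    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}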
Final Update
I ran the same tests on a Windows 7 box, and to my surprise it works every single time (up to the system page file size) without touching anything. So I guess this is just a bug: large memory allocations can fail more often on Windows XP than on Windows 7.
The problem is file fragmentation. Physical memory (RAM) has nothing to do with anything here. In a virtual memory system, 'memory' is allocated from the file system. Physical memory is just an optimization to speed access to memory.
When you request a memory-mapped file with write access, the system must have a file with contiguous pages free. The system swap file is often fragmented. If your disk drive is nicely defragmented, you should be able to create a large memory-mapped file using a file of your choice (not the system page file).
So if you really have to have a 2GB memory-mapped file, you need to create one on the drive at installation. This shifts the problem of creating a contiguous 2GB file to installation, but once created, you should be ok.
So my question is: what is going on here? There must be some kind of memory fragmentation happening internally in the virtual memory manager, because allocating 100 MB memory-mapped files will consume up to the 30 GB system page file (the commit limit).
Sounds about right. If you don't need large contiguous chunks of memory, don't ask for them if you can get the same amount of memory in smaller chunks.
To find the largest possible memory map size I can allocate on my machine.
Try it with size X.
If that fails, try with size X/2 and repeat.
This gets you a chunk at runtime, maybe not the exact largest possible chunk, but within a factor of 2.
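A minimal sketch (Win32 C) of that halving search; it probes CreateFileMapping without ever mapping a view, and the 4 GB starting size and 1 MB floor are illustrative.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    unsigned long long size = 4ULL << 30;       /* start at 4 GB, illustrative */

    while (size >= (1ULL << 20)) {              /* give up below 1 MB */
        HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                                      PAGE_READWRITE | SEC_COMMIT,
                                      (DWORD)(size >> 32),
                                      (DWORD)(size & 0xFFFFFFFF),
                                      NULL);
        if (h != NULL) {
            printf("largest successful mapping: %llu bytes\n", size);
            CloseHandle(h);
            return 0;
        }
        size /= 2;                              /* halve and retry */
    }

    printf("no mapping size succeeded\n");
    return 1;
}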
Let's take the position of a Windows developer.
Assume a user performs the following steps:
Create a memory mapping.
Populate some of the memory with sensitive data.
Unmap the file.
Continue using the memory.
Windows may need to unload these pages for critical tasks.
The upshot is that mapped memory must fit in the page file so that it can be swapped out if needed. That doesn't mean the mapped memory actually will be swapped.
Related
I have an application that opens a file with mmap() and does stuff to it (long story short, makes calls to gdb to parse a coredump file and then 7z to compress the dump). What I am trying to achieve is setting a limit on how much resident memory (a.k.a. actual RAM) can be used by this application, while letting it use as much total virtual memory as it wants.
There are two main suggestions I've seen to achieve this: ulimit and cgroups.
mmap: an observation
Before moving forward, a note on mmap: my understanding is that the whole point of using it is to minimize the total amount of memory used to read a file. This works because the mapped pages are backed by the file itself, not by swap or RAM. However, when I start my application (which uses mmap) and look at the output from top, I notice it still reports the application as using a large amount of virtual memory, just a bit under the size of the file being opened with mmap. So a 15 GB file might show 0.5 GB of RAM usage and 14.5 GB of virtual memory usage. Does this mean mmap needs to load the entire file into (virtual) memory, or is this just a quirk of the way Linux reports memory usage for mmap (as in, it "counts" the file's space on the hard drive as virtual memory)?
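A minimal sketch (Linux C, with a hypothetical file path) of the behaviour being described: the whole file is mapped, so the mapping size shows up in virtual memory (VSZ) immediately, while resident memory (RSS) should only grow for the pages that are actually touched.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/big.file", O_RDONLY);   /* hypothetical large file */
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) return 1;

    /* The whole file is mapped into the address space: VSZ grows by st.st_size. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;

    /* Touch only one byte per 256 MB: only those pages become resident (RSS). */
    long sum = 0;
    for (off_t off = 0; off < st.st_size; off += 256L * 1024 * 1024)
        sum += p[off];

    printf("checksum %ld; compare VSZ vs RSS in top while this sleeps\n", sum);
    sleep(60);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}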
ulimit
ulimit only supports setting a limit on virtual memory as a whole. There is no way to specify a limit for resident memory only, which is what I'm interested in. Since mmap appears to use roughly the same amount of virtual memory as the size of the file it is opening (as described above), this doesn't work for me. Set ulimit -v to anything less, and my application crashes.
cgroups
cgroups lets us set a specific limit for resident memory with memory.limit_in_bytes. I tried creating a cgroup and running my application under it. Here I saw a phenomenon that left me stumped: on a machine with only 4 GB of RAM and 2 CPUs, the cgroup seems to respect the RAM usage limit I set, even with limit_in_bytes set to only 100 MB. However, on a machine with 500 GB of RAM and 60 CPUs, the exact same file and the exact same application (the same executable, not rebuilt on the new machine or anything) with the same 100 MB limit crashes. Only when I set the limit back to roughly the size of the file being mmap'd can it run successfully.
So there are two questions here:
Does mmap need to load the whole file into virtual memory to work or not? My evidence points to yes after trying ulimit, and no after my experiment with cgroups on the 4 GB machine.
Any suggestions on what other factors could explain why the 4 GB machine works with the cgroup limit, but the 500 GB machine does not?
I have a program that is projected to use a few GB of lmdb disk space (it's a blockchain, and we're moving away from leveldb due to its lack of ACID, which I need for some future plans). Is it possible to run that program, with that database, on a Raspberry Pi without adding more swap (with >1 GB memory), considering that adding swap is for advanced users?
Currently, when I run that program with mdb_env_set_mapsize(1 << 30), i.e. a 1 GB map size, it returns error 12, which is out of memory. But it works if I reduce the size to 512 MB.
But what's the right way to handle such memory issues in lmdb when the database size keeps increasing?
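A minimal sketch (C, using the lmdb API; the database path is illustrative and the directory must already exist) of the situation described above: try the 1 GB map size first and fall back to 512 MB if the environment cannot be opened.

#include <lmdb.h>
#include <stdio.h>

int main(void)
{
    MDB_env *env = NULL;
    size_t sizes[] = { 1UL << 30, 512UL << 20 };    /* 1 GB, then 512 MB */
    int rc = 0;

    for (int i = 0; i < 2; i++) {
        mdb_env_create(&env);
        mdb_env_set_mapsize(env, sizes[i]);
        rc = mdb_env_open(env, "./chaindata", 0, 0664);
        if (rc == 0) {
            printf("opened with mapsize %zu bytes\n", sizes[i]);
            break;
        }
        fprintf(stderr, "mapsize %zu failed: %s\n", sizes[i], mdb_strerror(rc));
        mdb_env_close(env);
        env = NULL;
    }

    if (env) mdb_env_close(env);
    return rc;
}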
The maximum size of memory that can be memory-mapped depends on the size of the virtual address space, which is dictated by the CPU's virtual memory manager. A 32-bit CPU has a 4 GB virtual address space limit; this limit is for the whole system unless PAE is enabled, in which case the limit is per process.
In addition to this, the kernel and your application reserve some space of their own in your address space, and memory mapping usually requires contiguous address space, which reduces the memory available for the database to allocate.
So your user will need to either enable PAE on their system or upgrade to a 64-bit CPU. If neither of these is an option for your application, then you cannot use a memory-mapped file larger than your available address space, so you'll have to do some segmentation to split your data into multiple files that you map only in small chunks at a time. I'm guessing that lmdb requires being able to map the entire database file into memory.
For a blockchain application, your data is mostly a linear sequence of log entries, so your application should only need to work with the most recent entries most of the time. You can keep the recent entries in their own working file, and the rest of the log either in a database that doesn't require mapping the entire file into memory or in multiple fixed-size files that you map and unmap as needed.
Whenever new / malloc is used, the OS creates a new (or reuses an existing) heap memory segment, aligned to the page size, and returns it to the calling process. All of these allocations contribute to the process's virtual memory. In 32-bit computing, any process can only scale up to 4 GB. The more heap is allocated, the faster the process's memory grows. Though there are lots of memory management / memory pool utilities available, they all end up creating a heap and reusing it efficiently.
mmap (memory mapping), on the other hand, provides the ability to view a file as a memory stream, and enables the program to use pointer manipulation directly on the file. But here again, mmap actually allocates a range of addresses in the process's address space. So if we mmap a 3 GB file and take a pmap of the process, we can see that the total memory consumed by the process is >= 3 GB.
My question is: is it possible to have a file-based memory pool, just like mmap'ing a file, that does not consume the process's address space? I am imagining something like an in-memory DB backed by a file, which is fast for reads/writes, supports pointer manipulation (i.e. get a pointer to a record and store anything in it, just as we do with new / malloc), and can grow on disk without touching the process's 4 GB virtual limit.
Is it possible? If so, what are some pointers for me to start working on it?
I am not asking for a ready-made solution or links, but to conceptually understand how it can be achieved.
It is generally possible but very complicated. You would have to re-map whenever you wanted to access a different 3 GB segment of your file, which would probably kill performance in the case of scattered access. Pointers would also become much more difficult to work with, as remapping changes the data behind them while leaving the addresses the same.
I have seen the STXXL project, which might be interesting to you; or it might not. I have never used it, so I cannot give you any other advice about it.
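A minimal sketch (Linux C) of that remapping approach, with a hypothetical pool file and a 256 MB window: only one window of the large file is mapped at a time, and the window is moved by unmapping and remapping at a different offset.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define WINDOW (256UL * 1024 * 1024)    /* 256 MB window, illustrative */

/* Map the WINDOW-sized slice of fd that starts at 'offset' (page aligned). */
static void *map_window(int fd, off_t offset)
{
    return mmap(NULL, WINDOW, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
}

int main(void)
{
    int fd = open("/data/pool.bin", O_RDWR);    /* hypothetical large pool file */
    if (fd < 0) return 1;

    /* Work on the first window... */
    char *w = map_window(fd, 0);
    if (w == MAP_FAILED) return 1;
    memset(w, 0xAB, 4096);                      /* touch some records */
    munmap(w, WINDOW);

    /* ...then move the window to a different region of the file. Any pointers
       into the old window are now invalid, even though the addresses
       themselves may be reused. */
    w = map_window(fd, (off_t)WINDOW * 4);
    if (w == MAP_FAILED) return 1;
    printf("first byte of new window: %d\n", w[0]);
    munmap(w, WINDOW);

    close(fd);
    return 0;
}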
What you are looking for is, in principle, a memory-backed file cache. There are many such things, for example in database implementations (where the whole database is far larger than the machine's memory, and the application developer probably wants to keep some memory free for application stuff). This involves some sort of indirection, an index, hash or some such, to indicate which area of the file you want to access, and using that indirection to determine whether the data is currently in memory or on disk. You would essentially have to replicate what the virtual memory handling of the OS and the processor does, with tables that indicate where in physical memory your "virtual heap" is; if it's not present in physical memory, read it in (and if the cache is full, evict something, writing it back first if it has been modified).
However, it's most likely that in today's world you have a machine capable of 64-bit addressing, and thus it would be much easier to recompile the application as a 64-bit application and use mmap or similar to access the large file. In this case, even if RAM isn't sufficient, you can access the contents of the file via the virtual memory system, and it takes care of all the mapping back and forth between disk and RAM (physical memory).
Platform - Linux, Arch - ARM
Programming lang - C/C++
Objective - map a regular (let's say text) file to a pre-known location (physical address) in RAM and pass that physical address to some other application. The size of the block I map at a time is 128 KB.
The way I am trying to go about it is:
The user-space process issues an ioctl call asking a device driver to get a chunk of memory (RAM), calculate its physical address and return it to user space.
The user-space process then needs to map the file to that physical address space.
I am not sure how to go about this. Any help is appreciated.
The issue with calling mmap on the file and then calculating the physical address is that pages are not in memory until someone accesses them, and the physical pages that get allocated might not be contiguous.
The other process that will actually access the file is a third-party vendor application. That application demands that, once we pass it the physical address, the file contents be present in contiguous memory.
How I am doing it right now:
The user process calls mmap on the device.
The device driver does a kmalloc, calculates the starting physical address and maps the VMA to that physical address.
The user process then does a read on the file and copies it into the address space obtained from the mmap.
Issue - a copy of the file exists in two locations in RAM: once when the read is done from disk, and again when I copy it into the buffer obtained via mmap, with the corresponding copying overhead.
In an ideal world I would like to load the file directly from disk to a known/predefined location.
"Mapping a file" implies using virtual addresses rather than physical, so that's not going to do what you want.
If you want to put the file contents into a contiguous block of physical memory, just use open() and read() once you have obtained the contiguous buffer.
Perhaps something like madvise() with MADV_SEQUENTIAL advice argument could help?
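A minimal sketch (Linux C, with hypothetical device and file names) of that suggestion: obtain the contiguous buffer by mmap'ing the driver's device node, then read() the file directly into it, so there is no second user-space copy.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BLOCK_SIZE (128 * 1024)       /* 128 KB, as in the question */

int main(void)
{
    /* The device node name is hypothetical; the driver behind it is assumed
       to hand out a physically contiguous buffer via its mmap handler. */
    int dev = open("/dev/contigbuf", O_RDWR);
    if (dev < 0) return 1;

    char *buf = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dev, 0);
    if (buf == MAP_FAILED) return 1;

    int fd = open("/data/input.txt", O_RDONLY);   /* file to load, hypothetical */
    if (fd < 0) return 1;

    /* read() straight into the contiguous buffer: the file data lands directly
       in the driver's memory, with no intermediate user-space copy. */
    ssize_t total = 0, n;
    while (total < BLOCK_SIZE && (n = read(fd, buf + total, BLOCK_SIZE - total)) > 0)
        total += n;

    printf("loaded %zd bytes into the contiguous buffer\n", total);

    close(fd);
    munmap(buf, BLOCK_SIZE);
    close(dev);
    return 0;
}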
Some things to consider:
How large is the file you're going to be mapping?
That might affect your ability to get a contiguous block of RAM, even if you were to take the kernel driver based approach.
For a kernel driver based approach, well-behaved drivers typically should not kmalloc() more than about 32 KB when grabbing a contiguous block of memory. Furthermore, you typically can't kmalloc() more than 2 MB at all (I've tried this :)). Is that going to be suitable for your needs?
If you need a really large chunk of memory something like the kernel's alloc_bootmem() function could help, but it only works for static "built-in" drivers, not dynamically loadable ones.
Is there any way you can rework your design so that a large contiguous block of mapped memory isn't necessary?
I'm using OS X 10.5.6. I have a C++ application with a GUI made with Qt. When I start my application it uses 30 MB of memory (reported by OS X Activity Monitor RSIZE).
I use this application to read in text files to memory, parse the data and finally visualize it. If I open (read to memory, parse, visualize) a 9 MB text file Activity Monitor reports that my application grows from the initial 30 MB of memory used to 103 MB.
Now if the file is closed and the parsed and visualized data is deleted, the size of the application stays at 103 MB. This sounds like a memory leak to me. But if I open the file again, reading it into memory, parsing it and visualizing it, the application stays at 103 MB. No matter how many times I open the file (or another file of the same size), my application's memory use stays more or less unchanged. Does this mean that it's not a memory leak? If it were a leak, the memory usage should keep growing each time the file is opened, should it not? The only time it grows is if I open a larger file than the previous one.
Is this normal? Is this platform or library dependent? Is this some sort of caching done by the OS or libraries?
This seems relatively normal, but all OSes are slightly different.
In the usual application life cycle, the application requests memory from the OS and is given it in huge chunks which it then manages itself (via the C/C++ standard libraries). As the application acquires and releases memory, this is all handled internally within the application, without recourse to the OS, until the application has none of that memory left, at which point a call is made to the OS for another huge chunk.
Memory is not usually returned to the OS until the application quits (though most OSes provide mechanisms to do this if required, and some C/C++ standard libraries use this facility). Instead of returning memory to the OS, the application keeps everything it has been given and does its own memory management.
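A minimal sketch (C, sizes are illustrative) that demonstrates this: allocate a large amount of memory in small pieces, free all of it, and observe in Activity Monitor that the reported size barely drops.

#include <stdio.h>
#include <stdlib.h>

#define CHUNK   (16 * 1024)       /* 16 KB pieces */
#define CHUNKS  (6 * 1024)        /* ~96 MB in total, illustrative */

int main(void)
{
    char *blocks[CHUNKS];

    /* Allocate ~96 MB in small pieces and touch each page so it is committed. */
    for (int i = 0; i < CHUNKS; i++) {
        blocks[i] = malloc(CHUNK);
        for (int j = 0; j < CHUNK; j += 4096)
            blocks[i][j] = 1;
    }
    printf("allocated; check RSIZE in Activity Monitor, then press Enter\n");
    getchar();

    /* Free everything: the allocator typically keeps these pages for reuse,
       so the process size reported by Activity Monitor barely changes. */
    for (int i = 0; i < CHUNKS; i++)
        free(blocks[i]);
    printf("freed; RSIZE usually stays roughly the same\n");
    getchar();
    return 0;
}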
Though note: just because an application has memory does not mean that this is currently taking up RAM on a chip. Memory that is sporadically used or has not been used in a while will be temporarily saved onto secondary/tertiary storage.
Activity Monitor: it is not a very useful tool for checking memory usage; as you have discovered, it only displays the total actually allocated to the application. It does not display any information about how the application has internally allocated this memory (most of which could be deallocated). Check the folder where Xcode lives; a broad set of tools for examining how an application works is provided with the development environment.
NB: I have avoided using terms like page etc. as these are nothing to do with C/C++/Objective-C and are all OS/hardware specific.
This sounds like a memory fragmentation problem to me. Memory is acquired from the OS in pages. Pages are usually several kB large, e.g. 4 kB. Now if you allocate, let's say, 100 MB of RAM for your objects, your memory allocator (new / malloc) asks the OS for many free memory pages and allocates your objects on them. When your application finishes its computations and deletes some, even most, but not all of the previously allocated objects, the objects that were not deleted keep their pages alive and prevent them from being returned to the OS. A page can be returned only if all of its memory is freed. So in an extreme case, a single 8-byte object can prevent a full 4 kB page from being returned.
The OS reports memory consumption by calculating the number of pages committed to your application, not by counting how much space your objects take on these pages. So if your memory is fragmented, the pages remain committed, and reported memory consumption stays the same.
The memory consumption does not grow on the second run because the allocator reuses the previously acquired, mostly free pages.
The solution for fragmentation problems is usually preallocating a larger block of memory and using a custom memory allocator to allocate objects with similar lifetime from this larger block. Then, when you're done with objects, delete the whole block.
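A minimal sketch (C, names are illustrative) of that approach: a simple bump-pointer arena that hands out objects with a similar lifetime from one preallocated block and releases the whole block at once.

#include <stddef.h>
#include <stdlib.h>

/* A trivial bump-pointer arena: all objects come from one big block and are
   released together, so individual objects can never pin pages the way
   scattered new/malloc allocations can. */
typedef struct {
    char  *base;
    size_t size;
    size_t used;
} Arena;

static int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

static void *arena_alloc(Arena *a, size_t n)
{
    n = (n + 15) & ~(size_t)15;             /* keep 16-byte alignment */
    if (a->used + n > a->size) return NULL; /* arena exhausted */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_release(Arena *a)         /* frees every object at once */
{
    free(a->base);
    a->base = NULL;
    a->used = a->size = 0;
}

int main(void)
{
    Arena parse_arena;
    if (!arena_init(&parse_arena, 100u * 1024 * 1024))  /* 100 MB up front */
        return 1;

    /* ... allocate all short-lived parse/visualization objects from the arena ... */
    double *samples = arena_alloc(&parse_arena, 1000 * sizeof(double));
    (void)samples;

    arena_release(&parse_arena);            /* the whole block goes back at once */
    return 0;
}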
Another solution is switching to a fully garbage collected environment like Java or .NET - they have compacting garbage collectors that prevent such problems.