What is up with memory mapped files and actual memory usage? - c++

Can't really find any specifics on this; here's all I know about MMFs in Windows:
Creating a memory-mapped file in Windows adds nothing to the apparent amount of memory a program uses.
Creating a view of that file consumes memory equivalent to the view size.
This looks rather backwards to me, since for one, I know that the MMF itself actually has memory... somewhere. If I write something into an MMF and destroy the view, the data is still there. Meanwhile, why does the view take any memory at all? It's just a pointer, no?
Then there's the weirdness with what's actually in RAM and what's on the disk. In large MMFs with a scattered-looking access pattern, sometimes the speed is there and sometimes it's not. I'm guessing some of it sometimes gets stored in the backing file, if one is tied to it, or in the paging file, but really, I have no clue.
Anyway, the problem that drove me to investigate this is that I have a ~2 GB file that I want multiple programs to share. I can't create a 2 GB view in each of them since I'm just "out of memory", so I have to create/destroy smaller ones. This creates a lot of overhead due to the additional offset calculations and the creation of the view itself. Can anybody explain to me why it is like this?

On a demand-paged virtual memory operating system like Windows, the view of an MMF occupies address space: just numbers to the processor, one for each 4096-byte page. You only start using RAM when you actually use the view, reading or writing data, at which point you trigger a page fault and force the OS to map the virtual memory page to physical memory. That's the "demand-paged" part.
You can't get a single 2 GB chunk of address space in a 32-bit process, since there would not be room for anything else. The limit is the largest hole in the address space between the other allocations for code and data, which usually hovers around 650 megabytes, give or take. You'll need to target x64, or build an x86 program that's linked with /LARGEADDRESSAWARE and runs on a 64-bit operating system, a backdoor which is getting to be pretty pointless these days.
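To make the demand-paging point concrete, here is a minimal sketch (Windows-only, error handling trimmed, and "data.bin" is a made-up file name, assumed to exist and be at least a couple of pages long) showing that MapViewOfFile only reserves address space and that physical pages are committed lazily on first touch:

#include <windows.h>

int main()
{
    HANDLE file = CreateFileW(L"data.bin", GENERIC_READ | GENERIC_WRITE, 0,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE, 0, 0, nullptr);
    if (!mapping) return 1;

    // Reserves address space for the whole file; almost no RAM is used yet.
    unsigned char* view = static_cast<unsigned char*>(
        MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0));
    if (!view) return 1;

    // The first touch of each 4096-byte page triggers a page fault; only now
    // does the OS map that page to physical memory (the working set grows here).
    view[0] = 42;
    view[4096] = 43;

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}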

The point of a memory-mapped file is that it lets you manipulate its data without explicit I/O calls. Because of this behavior, when you access the file, Windows loads the touched pages into physical memory, so they can be manipulated there rather than on disk. You can read more about this here: http://blogs.msdn.com/b/khen1234/archive/2006/01/30/519483.aspx

Anyway, the problem that drove me to investigate this is that I have a ~2 GB file that I want multiple programs to share. I can't create a 2 GB view in each of them since I'm just "out of memory", so I have to create/destroy smaller ones.
The most likely cause is that the programs are 32-bit. 32-bit programs (by default) only have 2 GB of address space, so you can't map a 2 GB file in a single view. If you rebuild them in 64-bit mode, the problem should go away.
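As a hedged sketch of what that looks like once the processes are 64-bit: each program can open the same file and map the whole thing in one view, optionally through a named mapping object so the OS shares the physical pages between them. The name "Local\SharedBigFile" below is purely illustrative, not something from the question.

#include <windows.h>

unsigned char* MapWholeFile(const wchar_t* path)
{
    HANDLE file = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                              FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;

    // A named mapping lets several processes open the same section object.
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE,
                                        0, 0, L"Local\\SharedBigFile");
    CloseHandle(file);   // the mapping object keeps the file referenced
    if (!mapping) return nullptr;

    // In a 64-bit process there is plenty of address space for a single 2 GB view.
    return static_cast<unsigned char*>(
        MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0));
}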

Related

File Based Memory Pool - Is it Possible?

Whenever new / malloc is used, the OS creates a new (or reuses an existing) heap memory segment, aligned to the page size, and returns it to the calling process. All these allocations constitute the process's virtual memory. In 32-bit computing, any process can only scale up to 4 GB. The more heap is allocated, the faster the process's memory usage grows. Though there are lots of memory managers / memory pools available, all these utilities end up creating a heap and reusing it efficiently.
mmap (memory mapping), on the other hand, provides the ability to view a file as a memory stream, and enables the program to use pointer manipulation directly on the file. But here again, mmap actually allocates a range of addresses in the process space. So if we mmap a 3 GB file in its entirety and take a pmap of the process, we can see that the total memory consumed by the process is >= 3 GB.
My question is: is it possible to have a file-based memory pool [just like mmapping a file] that does not, however, consume process address space? I visualize something like an in-memory DB, backed by a file, which is very fast for read/write, which supports pointer manipulation [i.e. get a pointer to the record and store anything, as if we were using new / malloc], and which can grow on disk without touching the process's 4 GB virtual limit.
Is it possible? If so, what are some pointers for me to start working on it?
I am not asking for a ready-made solution / links, but to conceptually understand how it can be achieved.
It is generally possible but very complicated. You would have to re-map if you wanted to access different 3 GB segments of your file, which would probably kill performance in the case of scattered access. Pointers would also get much more difficult to work with, as remapping changes the data but leaves the addresses the same.
I have seen the STXXL project, which might be interesting to you; or it might not. I have never used it, so I cannot give you any other advice about it.
What you are looking for is, in principle, a memory-backed file cache. There are many such things in, for example, database implementations (where the whole database is way larger than the memory of the machine, and the application developer probably wants to have a bit of memory left for application stuff). This involves some sort of indirection, an index, hash or some such, to indicate what area of the file you want to access, and using that indirection to determine whether the data is in memory or on disk. You would essentially have to replicate what the virtual memory handling of the OS and the processor does, by having tables that indicate where in physical memory your "virtual heap" is, and if it's not present in physical memory, read it in (and if the cache is full, get rid of something, writing it back first if it's been modified).
However, it's most likely that in today's world you have a machine capable of 64-bit addressing, and thus it would be much easier to recompile the application as a 64-bit application and use mmap or similar to access the large memory. In this case, even if RAM isn't sufficient, you can access the file's contents via the virtual memory system, and it takes care of all the mapping back and forth between disk and RAM (physical memory).
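A minimal POSIX sketch of that 64-bit approach, just to illustrate the idea (the function name is mine and error handling is reduced to early returns): mmap the whole file once and let the virtual memory system page parts of it in and out as needed.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

char* map_whole_file(const char* path, size_t* size_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return nullptr;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }

    // Reserves address space for the whole file; pages are faulted in on demand.
    void* p = mmap(nullptr, static_cast<size_t>(st.st_size),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;

    *size_out = static_cast<size_t>(st.st_size);
    return static_cast<char*>(p);   // pointer manipulation works directly on file data
}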

mmap versus memory allocated with new

I have a BitVector class that can either allocate memory dynamically using new or it can mmap a file. There isn't a noticeable difference in performance when using it with small files, but when using a 16GB file I have found that the mmap file is far slower than the memory allocated with new. (Something like 10x slower or more.) Note that my machine has 64GB of RAM.
The code in question is loading values from a large disk file and placing them into a Bloom filter which uses my BitVector class for storage.
At first I thought this might be because the backing for the mmap file was on the same disk as the file I was loading from, but this didn't seem to be the issue. I put the two files on two physically different disks, and there was no change in performance. (Although I believe they are on the same controller.)
Then, I used mlock to try to force everything into RAM, but the mmap implementation was still really slow.
So, for the time being I'm just allocating the memory directly. The only thing I'm changing in the code for this comparison is a flag to the BitVector constructor.
Note that to measure performance I'm both looking at top and watching how many states I can add into the Bloom filter per second. The CPU usage doesn't even register on top when using mmap - although jbd2/sda1-8 starts to move up (I'm running on an Ubuntu server), which looks to be a process that is dealing with journaling for the drive. The input and output files are stored on two HDDs.
Can anyone explain this huge difference in performance?
Thanks!
Just to start with, mmap is a system call or interface provided for accessing the system's virtual memory.
Now, in Linux (I hope you are working on *nix), a lot of performance improvement is achieved by lazy loading (demand paging), closely related to copy-on-write.
This kind of lazy loading is implemented for mmap as well.
What happens is, when you call mmap on a file, the kernel does not immediately allocate main memory pages for the file to be mapped. Instead, it waits for the program to read from or write to the illusory page, at which point a page fault occurs, and the corresponding fault handler then actually loads the particular part of the file that fits into that page frame (the page table is also updated, so that next time you read/write the same page, it points to a valid frame).
Now, you can control this behavior with mlock, madvise, the MAP_POPULATE flag to mmap, etc.
The MAP_POPULATE flag to mmap tells the kernel to map the file into memory pages before the call returns, rather than page-faulting every time you access a new page. So, until the file is loaded, the call will block.
From the man page:
MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults.
MAP_POPULATE is supported for private mappings only since Linux 2.6.23.
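For illustration, a small Linux-specific sketch of prefaulting a read-only mapping with MAP_POPULATE, so later accesses are not slowed down by per-page faults (the function name and error handling are my own):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

const char* map_prefaulted(const char* path, size_t* len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    *len = static_cast<size_t>(st.st_size);

    // MAP_POPULATE asks the kernel to read the file in (read-ahead) before
    // mmap returns, instead of faulting pages in one at a time on first use.
    void* p = mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ,
                   MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);
    return p == MAP_FAILED ? nullptr : static_cast<const char*>(p);
}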

Swapping objects out to file

My C++ application occasionally runs out of memory due to large amounts of data being retrieved from a database. It has to run on 32bit WinXP machines.
Is it possible to transparently (for most of the existing code) swap out the data objects to disk and read them into memory only on demand, so I'm not limited to the 2GB that 32bit Windows gives to the process?
I've looked at VirtualAlloc and Address Window Extensions but I'm not sure it's what I want.
I also found this SO question where the questioner creates a file mapping and wants to create objects in there. One answer suggests using placement new which sounds like it would be pretty transparent to the rest of the code.
Will this prevent my application from running out of physical memory? I'm not entirely sure, because after all there is still the 32-bit address space limit. Or is this a different kind of problem that will occur when trying to create a lot of objects?
So long as you are using a 32-bit operating system there is nothing you can do about this. There is no way to have more than 3GB (2GB in the case of Windows) of data in virtual memory, whether or not it's actually swapped out to disk.
Historically databases have always handled this problem by using read, write and seek. So rather than accessing data directly from memory, they use a fake (64-bit) pointer. Data is split into blocks (normally around 4kb), and a number of these blocks are allocated in memory. When they want to access data from a fake pointer address they check if the block is loaded into memory and if it is they access it from there. If it is not then they find an empty slot and copy it in, then return the address. If there are no slots free then a piece of data will be written back out to disk (if it's been modified) and that slot will be reused.
The real beauty of this is that if your system has enough RAM then the operating system will cache much more than 2GB of this data in RAM at any point in time, and when you feel like you are actually reading and writing from disk the operating system will probably just be copying data around in memory. This, of course, requires a 32-bit operating system that supports more than 3GB of physical memory, such as Linux or Windows Server with PAE.
SQLite has a nice self-contained implementation of this, which you could probably make use of with little effort.
If you do not wish to do this then your only alternatives are to either use a 64-bit operating system or to work with less data at any given point in time.
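To make the read/write/seek technique above a bit more concrete, here is a rough sketch of a block cache addressed through 64-bit "fake pointers". All names are illustrative; a real pager (SQLite's, for instance) adds an eviction policy, dirty-block write-back and locking.

#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

class BlockPager {
public:
    static constexpr size_t kBlockSize = 4096;

    explicit BlockPager(std::FILE* file) : file_(file) {}

    // A "fake pointer" is simply a 64-bit offset into the file.
    unsigned char* Access(std::uint64_t fake_ptr) {
        std::uint64_t block = fake_ptr / kBlockSize;
        size_t offset = static_cast<size_t>(fake_ptr % kBlockSize);
        auto it = cache_.find(block);
        if (it == cache_.end()) {
            // Not resident: load the block from disk. A real implementation
            // would use a 64-bit seek (fseeko/_fseeki64) and would evict and
            // write back a block here when the cache is full.
            std::vector<unsigned char> buf(kBlockSize);
            std::fseek(file_, static_cast<long>(block * kBlockSize), SEEK_SET);
            std::fread(buf.data(), 1, kBlockSize, file_);
            it = cache_.emplace(block, std::move(buf)).first;
        }
        return it->second.data() + offset;
    }

private:
    std::FILE* file_;
    std::unordered_map<std::uint64_t, std::vector<unsigned char>> cache_;
};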

Dealing with large amounts of data in c++

I have an application that sometimes will utilize a large amount of data. The user has the option to load in a number of files which are used in a graphical display. If the user selects more data than the OS can handle, the application crashes pretty hard. On my test system, that number is about the 2 gigs of physical RAM.
What is a good way to handle this situation? I get the "bad alloc" thrown from new and tried trapping that but I still run into a crash. I feel as if I'm treading in nasty waters loading this much data but it is a requirement of this application to handle this sort of large data load.
Edit: I'm testing under a 32 bit Windows system for now but the application will run on various flavors of Windows, Sun and Linux, mostly 64 bit but some 32.
The error handling is not strong: it simply wraps the main instantiation code in a try/catch block, with the catch looking for any exception, per another peer's complaint of not being able to trap bad_alloc every time.
I think you guys are right, I need a memory management system that doesn't load all of this data into the RAM, it just seems like it.
Edit2: Luther said it best. Thanks. For now, I just need a way to prevent a crash, which should be possible with proper exception handling. But down the road I'll be implementing the accepted solution.
There is the STXXL library, which offers STL-like containers for large datasets.
http://stxxl.sourceforge.net/
Change "large" into "huge". It is designed and optimized for multicore processing of data sets that fit on terabyte-disks only. This might suffice for your problem, or the implementation could be a good starting point to tailor your own solution.
It is hard to say anything about your application crashing, because there are numerous hiccups involved when it comes to tight memory conditions: you could hit a hard address space limit (for example, by default 32-bit Windows only has 2 GB of address space per user process; this can be changed, see http://www.fmepedia.com/index.php/Category:Windows_3GB_Switch_FAQ), or be eaten alive by the OOM killer (not a mythical beast, see http://lwn.net/Articles/104179/).
What I'd suggest in any case is to think about a way to keep the data on disk and treat main memory as a kind of level-4 cache for the data. For example, if you have, say, blobs of data, wrap these in a class which can transparently load the blobs from disk when they are needed, and which registers with some kind of memory manager that can ask some of the blob-holders to free up their memory before memory conditions become unbearable. A buffer cache, in other words.
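A rough sketch of that blob-holder idea, with all names made up: each blob loads itself from disk on first use, and a memory manager can ask it to drop its in-memory copy when RAM gets tight.

#include <cstdio>
#include <string>
#include <vector>

class DiskBackedBlob {
public:
    DiskBackedBlob(std::string path, size_t size)
        : path_(std::move(path)), size_(size) {}

    const std::vector<unsigned char>& Data() {
        if (data_.empty()) {            // "cache miss": load the blob from disk
            data_.resize(size_);
            if (std::FILE* f = std::fopen(path_.c_str(), "rb")) {
                std::fread(data_.data(), 1, size_, f);
                std::fclose(f);
            }
        }
        return data_;
    }

    // Called by the memory manager when it needs RAM back.
    void Release() { std::vector<unsigned char>().swap(data_); }

private:
    std::string path_;
    size_t size_;
    std::vector<unsigned char> data_;   // empty while the blob is swapped out
};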
The user has the option to load in a number of files which are used in a graphical display.
The usual trick is not to load the data into memory directly, but rather to use the memory-mapping mechanism to make the files look like memory.
You need to make sure the memory mapping is done in read-only mode, to allow the OS to evict it from RAM when the RAM is needed for something else.
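A short sketch of the read-only variant on Windows (PAGE_READONLY / FILE_MAP_READ), so the OS can simply discard clean pages under memory pressure instead of writing them out; the function name is illustrative:

#include <windows.h>

const unsigned char* MapReadOnly(const wchar_t* path)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    CloseHandle(file);
    if (!mapping) return nullptr;

    return static_cast<const unsigned char*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
}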
If the user selects more data than the OS can handle, the application crashes pretty hard.
Depending on the OS, it is either: the application is missing some memory-allocation error handling, or you really are hitting the limit of available virtual memory.
Some OSs also have an administrative limit on how large the heap of application can grow.
On my test system, that number is about the 2 gigs of physical RAM.
It sounds like:
your application is 32-bits and
your OS uses the 2GB/2GB virtual memory split.
To avoid hitting the limit, you need to:
upgrade your app and OS to 64-bit or
tell the OS (IIRC a patch for Windows; most Linuxes already have it) to use a 3GB/1GB virtual memory split. Some 32-bit OSs use a 2GB/2GB memory split: 2 GB of virtual memory for the kernel and 2 GB for the user application. A 3/1 split means 1 GB of VM for the kernel and 3 GB for the user application.
How about maintaining a header table instead of loading all the data? Load the actual page only when the user requests the data.
Also use some data compression algorithms (like 7zip, znet etc.) which reduce the file size. (In my project they reduced the size from 200MB to 2MB)
I mention this because it was only briefly mentioned above, but it seems a "file paging system" could be a solution. These systems read large data sets in "chunks" by breaking the files into pieces. Once written, they generally "just work" and you hopefully won't have to tinker with them anymore.
Reading Large Files
Variable Length Data in File--Paging
New link below with a very good answer:
Handling Files greater than 2 GB
Search term: "file paging lang:C++" add large or above 2GB for more. HTH
Not sure if you are hitting it or not, but if you are using Linux, malloc will typically not fail, and operator new will typically not throw bad_alloc. This is because Linux will overcommit, and instead kill your process when it decides the system doesn't have enough memory, possibly at a page fault.
See: Google search for "oom killer".
You can disable this behavior with:
echo 2 > /proc/sys/vm/overcommit_memory
Upgrade to a 64-bit CPU, 64-bit OS and 64-bit compiler, and make sure you have plenty of RAM.
A 32-bit app is restricted to 2GB of memory (regardless of how much physical RAM you have). This is because a 32-bit pointer can address 2^32 bytes == 4GB of virtual memory. 20 years ago this seemed like a huge amount of memory, so the original OS designers allocated 2GB to the running application and reserved 2GB for use by the OS. There are various tricks you can do to access more than 2GB, but they're complex. It's probably easier to upgrade to 64-bit.

Which is faster, reading from disk or allocating system memory?

My environment is 32-bit XP. I find that when the allocated memory is near the maximum size of 2 GB, meaning little virtual address space is left, allocating new memory is very slow.
So, I have a page file that my app needs to analyze.
I have two ways. One is to read it all into system memory, then do the analysis.
The other is to reserve a memory buffer first as a cache, read part of the page file into that buffer, analyze it and then discard it, then read the second part of the page file into the same buffer, overwriting the cache, and do the analysis again.
From profiling, it looks like the second one is faster, since it avoids the allocation cost.
What do you think? Thanks in advance.
(1) I'm not sure the question matches the title. If you're allocating close to 2 GB of RAM on 32-bit Windows, the system is probably paging a lot of memory to disk, and that's where I'd look first for the slowdown. When you're using a lot of memory, you should think of it as being stored on disk (in pagefile.sys) but cached in physical RAM. The second approach might be faster not because of the cost of doing the allocation, but because of the cost of using a lot of memory at once. In effect, when you copy the file into one big allocation you're copying much of it disk->disk via RAM, and then when you run over it again to analyse it, you're loading the copy back into RAM again. If your analysis is a single-pass algorithm, that's a lot of redundant work.
(2) What I think is: mmap the file (MapViewOfFile and friends on Windows).
Edit: (3) a caution. If the file is currently 1.8 GB, there's a chance that next year it might be 4 GB. If so, I'd plan now for it to have a size greater than 2^32 on a 32-bit machine, which means either taking your second option, or still using MapViewOfFile but doing it one sensibly sized chunk of the file at a time rather than all at once, as sketched below. Otherwise you'll be revisiting this code the first time someone tries it on a big file and reports the bug.
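A sketch of option (3), with made-up names: walk a file that may not fit into a 32-bit address space by mapping one sensibly sized chunk at a time. View offsets must be multiples of the system allocation granularity (64 KB on typical Windows systems), which the 64 MB chunk size below satisfies.

#include <windows.h>

void AnalyzeInChunks(HANDLE mapping, unsigned long long fileSize,
                     void (*analyze)(const unsigned char*, size_t))
{
    const unsigned long long kChunk = 64ull * 1024 * 1024;   // 64 MB per view

    for (unsigned long long offset = 0; offset < fileSize; offset += kChunk) {
        size_t len = static_cast<size_t>(
            offset + kChunk <= fileSize ? kChunk : fileSize - offset);

        const unsigned char* view = static_cast<const unsigned char*>(
            MapViewOfFile(mapping, FILE_MAP_READ,
                          static_cast<DWORD>(offset >> 32),
                          static_cast<DWORD>(offset & 0xFFFFFFFFu), len));
        if (!view) break;

        analyze(view, len);        // single pass over this chunk
        UnmapViewOfFile(view);     // release the address space before mapping the next chunk
    }
}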
You forgot a third way: map the file into memory, see the CreateFileMapping/MapViewOfFile functions.
This is the fastest way.
Your best bet is to use the Windows MapViewOfFile and related functions (the Windows equivalent of mmap). This will allow the operating system to manage paging in the various parts of the file.
Why is the amount of allocated memory so high? If memory allocations take a reasonable amount of time, then you will find that doing it in memory is far, far quicker; my approach would be to do it in memory, and try to find a way to reduce the memory usage to the point where it's quick again.
As I see the situation, you either manage the paging yourself or let the operating system manage the paging for you. In most cases I would suggest letting the operating system handle the paging (use virtual memory). Since I have a distrust of MS operating systems, I cannot recommend this technique, although your mileage may vary.