My C++ application occasionally runs out of memory due to large amounts of data being retrieved from a database. It has to run on 32bit WinXP machines.
Is it possible to transparently (for most of the existing code) swap out the data objects to disk and read them into memory only on demand, so I'm not limited to the 2GB that 32bit Windows gives to the process?
I've looked at VirtualAlloc and Address Window Extensions but I'm not sure it's what I want.
I also found this SO question where the questioner creates a file mapping and wants to create objects in there. One answer suggests using placement new which sounds like it would be pretty transparent to the rest of the code.
Will this prevent my application to run out of physical memory? I'm not entirely sure of it because after all there is still the 32bit address space limit. Or is this a different kind of problem that will occur when trying to create a lot of objects?
So long as you are using a 32-bit operating system there is nothing you can do about this. There is no way to have more than 3GB (2GB in the case of Windows) of data in virtual memory, whether or not it's actually swapped out to disk.
Historically databases have always handled this problem by using read, write and seek. So rather than accessing data directly from memory, they use a fake (64-bit) pointer. Data is split into blocks (normally around 4kb), and a number of these blocks are allocated in memory. When they want to access data from a fake pointer address they check if the block is loaded into memory and if it is they access it from there. If it is not then they find an empty slot and copy it in, then return the address. If there are no slots free then a piece of data will be written back out to disk (if it's been modified) and that slot will be reused.
The real beauty of this is that if your system has enough RAM then the operating system will cache much more than 2GB of this data in RAM at any point in time, and when you feel like you are actually reading and writing from disk the operating system will probably just be copying data around in memory. This, of course, requires a 32-bit operating system that support more than 3GB of physical memory, such as Linux or Windows Server with PAE.
SQLite has a nice self-contained implementation of this, which you could probably make use of with little effort.
If you do not wish to do this then your only alternatives are to either use a 64-bit operating system or to work with less data at any given point in time.
Related
Cant really find any specifics on this, heres all I know about mmf's in windows:
Creating a memory mapped file in windows adds nothing to the apparent amount of memory a program uses
Creating a view to that file consumes memory equivalent to the view size
This looks rather backwards to me, since for one, I know that the mmf itself actually has memory...somewhere. If I write something in a mmf and destroy the view, the data is still there. Meanwhile, why does the view take any memory at all? Its just a pointer, no?
Then theres the weirdness with whats actually in the ram and whats on the disk. In large mmf's with a distributed looking access pattern, sometimes the speed is there and sometimes its not. I'm guessing some of it gets sometimes stored in the file if one is tied to it or the paging file but really, I have no clue.
Anyways, the problem that drove me to investigate this is that I have a ~2gb file that I want multiple programs to share. I can't create a 2gb view in each of them since I'm just "out of memory" so I have to create/destroy smaller ones. This creates a lot of overhead due to additional offset calculations and the creation of the view itself. Can anybody explain to me why it is like this?
On a demand-paged virtual memory operating system like Windows, the view of an MMF occupies address space. Just numbers to the processor, one for each 4096 bytes. You only start using RAM until you actually use the view. Reading or writing data. At which point you trigger a page fault and force the OS to map the virtual memory page to physical memory. The "demand-paged" part.
You can't get a single chunk of 2 GB of address space in a 32-bit process since there would not be room for anything else. The limit is the largest hole in the address space between other allocations for code and data, usually hovers around ~650 megabytes, give or take. You'll need to target x64. Or build an x86 program that's linked with /LARGEADDRESSAWARE and runs on a 64-bit operating system. A backdoor which is getting to be pretty pointless these days.
The thing in memory mapped file is that it lets you manipulate its data without I/O calls. Because of this behavior, when you access the file, windows loads it to the physical memory, so it can be manipulated in it rather than on the disk. You can read more about this in here: http://blogs.msdn.com/b/khen1234/archive/2006/01/30/519483.aspx
Anyways, the problem that drove me to investigate this is that I have a ~2gb file that I want multiple programs to share. I can't create a 2gb view in each of them since I'm just "out of memory" so I have to create/destroy smaller ones.
The most likely cause is that the programs are 32-bit. 32-bit programs (by default) only have 2GB of address space so you can't map a 2GB file in a single view. If you rebuild them in 64-bit mode, the problem should go away.
Whenever a new / malloc is used, OS create a new(or reuse) heap memory segment, aligned to the page size and return it to the calling process. All these allocations will constitute to the Process's virtual memory. In 32bit computing, any process can scale only upto 4 GB. Higher the heap allocation, the rate of increase of process memory is higher. Though there are lot of memory management / memory pools available, all these utilities end up again in creating a heap and reusing it effeciently.
mmap (Memory mapping) on the other hand, provides the ablity to visualize a file as memory stream, and enables the program to use pointer manipulations directly on file. But here again, mmap is actually allocating the range of addresses in the process space. So if we mmap a 3GB file with size 3GB and take a pmap of the process, you could see the total memory consumed by the process is >= 3GB.
My question is, is it possible to have a file based memory pool [just like mmaping a file], however, does not constitute the process memory space. I visualize something like a memory DB, which is backed by a file, which is so fast for read/write, which supports pointer manipulations [i.e get a pointer to the record and store anything as if we do using new / malloc], which can grow on the disk, without touching the process virtual 4GB limit.
Is it possible ? if so, what are some pointers for me to start working.
I am not asking for a ready made solution / links, but to conceptually understand how it can be achieved.
It is generally possible but very coplicated. You would have to re-map if you wanted to acces different 3Gb segments of your file, which would probably kill the performance in case of scattered access. Pointers would only get much more difficult to work with, as remmpaing changes data but leaves the adresses the same.
I have seen STXXL project that might be interesting to you; or it might not. I have never used it so I cannot give you any other advice about it.
What you are looking for, is in principle, a memory backed file-cache. There are many such things in for example database implementations (where the whole database is way larger than the memory of the machine, and the application developer probably wants to have a bit of memory left for application stuff). This will involve having some sort of indirection - an index, hash or some such to indicate what area of the file you want to access, and using that indirection to determine if the memory is in memory or on disk. You would essentially have to replicate what the virtual memory handling of the OS and the processor does, by having tables that indicate where in physical memory your "virtual heap" is, and if it's not present in physical memory, read it in (and if the cache is full, get rid of some - and if it's been written, write it back again).
However, it's most likely that in today's world, you have a machine capable of 64-bit addressing, and thus, it would be much easier to recompile the application as a 64-bit application, usemmap or similar to access the large memory. In this case, even if RAM isn't sufficient, you can access the memory of the file via the virtual memory system, and it takes care of all the mapping back and forth between disk and RAM (physical memory).
Environment:
Windows 8 64 bit, Windows 2008 server 64 bit
Visual Studio (professional) 2012 64 bits
list L; //I have 1000s of large CMyObject in my program that I cache, which is shared by different threads in my windows service program.
For our SaaS middleware product, we cache in memory 1000s of large C++ objects (read only const objects, each about 4MB in size), which runs the system out of memory. Can we associate a disk file (or some other persistent mechanism that is OS managed) to our C++ objects? There is no need for sharing / inter-process communication.
The disk file will suffice if it works for the duration of the process (our windows service program). The read-only const C++ objects are shared by different threads in the same windows service.
I was even considering using object databases (like mongoDB) to store the objects, which will then be loaded / unloaded at each use. Though faster than reading our serialized file (hopefully), it will still spoil the performance.
The purpose is to retain caching of C++ objects for performance reason and avoid having to load / unload the serialized C++ object every time. It would be great if this disk file is OS managed and requires minimal tweaking in our code.
Thanks in advance for your responses.
The only thing which is OS managed in the manner you describe is swap file. You can create a separate application (let it be called "cache helper"), which loads all the objects into memory and waits for requests. Since it does not use it's memory pages, OS will eventually displace the pages to the swap file, recalling it only if/when needed.
Communication with the applciation can be done through named pipes or sockets.
Disadvantages of such approach are that the performance of such cache will be highly volatile, and it may degrade performance of the whole server.
I'd recommend to write your own caching algorithm/application, as you may later need to adjust its properties.
One solution is of course to simply load every object, and let the OS deal with swapping it in from/out to disk as required. (Or dynamically load, but never discard unless the object is absolutely being destroyed). This approach will work well if there are are number of objects that are more frequently used than others. And the loading from swapspace is almost certainly faster than anything you can write. The exception to this is if you do know beforehand what objects are more likely or less likely to be used next, and can "throw out" the right objects in case of low memory.
You can certainly also use a memory mapped file - this will allow you to read from and write to the file as if it was memory (and the OS will cache the content in RAM as memory is available). On WIndows, you will be using CreateFileMapping or OpenFileMapping to create/open the filemapping, and then MapViewOfFile to map the file into memory. When finished, use UnmapViewOfFile to "unmap" the memory, and then CloseHandle to close the FileMapping.
The only worry about a filemapping is that it may not appear at the same address in memory next time around, so you can't have pointers within the filemapping and load the same data as binary next time. It would of course work fine to create a new filemapping each time.
So your thousands of massive objects have constructor, destructor, virtual functions and pointers. This means you can't easily page them out. The OS can do it for you though, so your most practical approach is simply to add more physical memory, possibly an SSD swap volume, and use that 64-bit address space. (I don't know how much is actually addressable on your OS, but presumably enough to fit your ~4G of objects).
Your second option is to find a way to just save some memory. This might be using a specialized allocator to reduce slack, or removing layers of indirection. You haven't given enough information about your data for me to make concrete suggestions on this.
A third option, assuming you can fit your program in memory, is simply to speed up your deserialization. Can you change the format to something you can parse more efficiently? Can you somehow deserialize objects quickly on-demand?
The final option, and the most work, is to manually manage a swapfile. It would be sensible as a first step to split your massive polymorphic classes into two: a polymorphic flyweight (with one instance per concrete subtype), and a flattened aggregate context structure. This aggregate is the part you can swap in and out of your address space safely.
Now you just need a memory-mapped paging mechanism, some kind of cache tracking which pages are currently mapped, possibly a smart pointer replacing your raw pointer with a page+offset which can map data in on-demand, etc. Again, you haven't given enough information on your data structure and access patterns to make more detailed suggestions.
I know that is possible to use memory-mapped files i.e. real files on disk that are transparently mapped to memory. As far as I understand (I haven't used these yet) the mapping takes place immediately, the file is partly read on the first memory access while the OS starts "caching" the whole file in the background.
Now: Is it possible to somewhat abuse this concept and memory-map another block of memory? Assuming the OS provides such indirection one could create a kind of compressed_malloc() that returns a mapping from memory to memory. The memory returned to the caller is simple the memory-mapped range that is transparently compressed in memory and also eventually kept in memory. Thus, for large buffers it could be possible that only part of it get decompressed on-the-fly (on access) while the remaining blocks are kept compressed.
Is that concept technically possible at the moment or - if already realized (in software) - what are the things to look at?
Update 1: I am more or less looking for something that is technically achievable without modifying the OS kernel itself or which requires a virtualization platform.
Update 2: I am hoping for something which allows me to implement the compression and related logic in my own user-space code. I would just use the facilities of the operating system to create the memory-mapping.
Very much so. The VM (Virtual Memory) system is designed to handle different kinds of objects that can be mapped. There is in fact a filesystem call cramfs that does something similar in the sense that it keeps compressed data in storage, but enables transparent, uncompressed access.
You would not be modifying the kernel per se, but you will have to work in the kernel space, implementing VM handlers for this new kind of a memory mapped object.
This is possible, eg.
http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.resourcemanagement.doc_41/managing_memory_resources/c_memory_compression.html
It is not correctly implemented in kernel space in Linux, but something like this could be implemented in user space.
I have an application that sometimes will utilize a large amount of data. The user has the option to load in a number of files which are used in a graphical display. If the user selects more data than the OS can handle, the application crashes pretty hard. On my test system, that number is about the 2 gigs of physical RAM.
What is a good way to handle this situation? I get the "bad alloc" thrown from new and tried trapping that but I still run into a crash. I feel as if I'm treading in nasty waters loading this much data but it is a requirement of this application to handle this sort of large data load.
Edit: I'm testing under a 32 bit Windows system for now but the application will run on various flavors of Windows, Sun and Linux, mostly 64 bit but some 32.
The error handling is not strong: It simply wraps the main instantiation code with a try catch block, the catch looking for any exception per another peer's complaint of not being able to trap the bad_alloc everytime.
I think you guys are right, I need a memory management system that doesn't load all of this data into the RAM, it just seems like it.
Edit2: Luther said it best. Thanks guy. For now, I just need a way to prevent a crash which with proper exception handling should be possible. But down the road I'll be implementing that acception solution.
There is the STXXL library which offers STL like containers for large Datasets.
http://stxxl.sourceforge.net/
Change "large" into "huge". It is designed and optimized for multicore processing of data sets that fit on terabyte-disks only. This might suffice for your problem, or the implementation could be a good starting point to tailor your own solution.
It is hard to say anything about your application crashing, because there are numerous hiccups involved when it comes to tight memory conditions: You could hit a hard address space limit (for example by default 32-bit Windows only has 2GB address space per user process, this can be changed, http://www.fmepedia.com/index.php/Category:Windows_3GB_Switch_FAQ ), or be eaten alive by the OOM killer ( Not a mythical beast:, see http://lwn.net/Articles/104179/ ).
What I'd suggest in any case to think about a way to keep the data on disk and treat the main memory as a kind of Level-4 cache for the data. For example if you have, say, blobs of data, then wrap these in a class which can transparently load the blobs from disk when they are needed and registers to some kind of memory manager which can ask some of the blob-holders to free up their memory before the memory conditions become unbearable. A buffer cache thus.
The user has the option to load in a number of files which are used in a graphical display.
Usual trick is not to load the data into memory directly, but rather use the memory mapping mechanism to make the files look like memory.
You need to make sure that the memory mapping is done in read-only mode to allow the OS to evict it from RAM if it is needed for something else.
If the user selects more data than the OS can handle, the application crashes pretty hard.
Depending on OS it is either: application is missing some memory allocation error handling or you really getting to the limit of available virtual memory.
Some OSs also have an administrative limit on how large the heap of application can grow.
On my test system, that number is about the 2 gigs of physical RAM.
It sounds like:
your application is 32-bits and
your OS uses the 2GB/2GB virtual memory split.
To avoid hitting the limit, your need to:
upgrade your app and OS to 64-bit or
tell OS (IIRC patch for Windows; most Linuxes already have it) to use 3GB/1GB virtual memory split. Some 32-bit OSs are using 2GB/2GB memory split: 2GB of virtual memory for kernel and 2 for the user application. 3/1 split means 1GB of VM for kernel, 3 for the user application.
How about maintaining a header table instead of loading the entire data. Load the actual page when the user requests the data.
Also use some data compression algorithms (like 7zip, znet etc.) which reduce the file size. (In my project they reduced the size from 200MB to 2MB)
I mention this because it was only briefly mentioned above, but it seems a "file paging system" could be a solution. These systems read large data sets in "chunks" by breaking the files into pieces. Once written, they generally "just work" and you hopefully won't have to tinker with them anymore.
Reading Large Files
Variable Length Data in File--Paging
New Link below with very good answer.
Handling Files greater than 2 GB
Search term: "file paging lang:C++" add large or above 2GB for more. HTH
Not sure if you are hitting it or not, but if you are using Linux, malloc will typically not fail, and operator new will typically not throw bad_alloc. This is because Linux will overcommit, and instead kill your process when it decides the system doesn't have enough memory, possibly at a page fault.
See: Google search for "oom killer".
You can disable this behavior with:
echo 2 > /proc/sys/vm/overcommit_memory
Upgrade to a 64-bit CPU, 64-bit OS and 64-bit compiler, and make sure you have plenty of RAM.
A 32-bit app is restricted to 2GB of memory (regardless of how much physical RAM you have). This is because a 32-bit pointer can address 2^32 bytes == 4GB of virtual memory. 20 years ago this seemed like a huge amount of memory, so the original OS designers allocated 2GB to the running application and reserved 2GB for use by the OS. There are various tricks you can do to access more than 2GB, but they're complex. It's probably easier to upgrade to 64-bit.