What is the advantage of using a memory-mapped file backed by the system paging file (through CreateFileMapping(INVALID_HANDLE_VALUE, ...)) instead of just allocating memory from the heap the usual way (malloc(...), HeapAlloc(...), etc.)?
i.e. When should I use which?
It's lower level; it gives you more than malloc does (a minimal sketch follows the list):
You can share the mapping with other processes (of course you also need to synchronize)
You can set permissions on the memory (for example you can have read-only memory via PAGE_READONLY)
You can set some cache / page parameters
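For illustration, here is a minimal sketch of a pagefile-backed mapping shared by name; the name "Local\\DemoSharedBlock" and the 1 MiB size are my own placeholders, not anything from the question:

#include <windows.h>
#include <cstring>

int main() {
    const DWORD size = 1 << 20; // 1 MiB, illustrative

    // Backed by the system paging file rather than a file on disk.
    HANDLE hMap = CreateFileMappingA(
        INVALID_HANDLE_VALUE,      // use the paging file
        nullptr,                   // default security
        PAGE_READWRITE,            // PAGE_READONLY would give read-only views
        0, size,                   // high/low 32 bits of the size
        "Local\\DemoSharedBlock"); // name that other processes can open
    if (hMap == nullptr) return 1;

    void* view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, size);
    if (view == nullptr) { CloseHandle(hMap); return 1; }

    // Another process can reach the same physical pages via
    // OpenFileMappingA(FILE_MAP_ALL_ACCESS, FALSE, "Local\\DemoSharedBlock").
    std::memcpy(view, "hello", 6);

    UnmapViewOfFile(view);
    CloseHandle(hMap);
    return 0;
}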
I haven't seen this explicitly anywhere, so I just wanted to clarify. This is all in the context of a single-threaded program: say we have a 10GB text file that we open with mmap, using the MAP_PRIVATE option. Initially, of course, I should expect to see 0GB of resident memory used. Now say I modify every character in the file. Will this then require 10GB of resident memory? And if not, why not?
Now, what if we did the same thing but with MAP_SHARED? What should I expect the resident memory usage to look like?
MAP_SHARED creates a mapping that is backed by the original file. Any changes to the data are written back to that file (assuming a read/write mapping).
MAP_PRIVATE creates a mapping that is backed by the original file for reads only. If you change bytes in the mapping, then the OS creates a new page that occupies physical memory and is backed by swap (if any).
The impact on resident set size is not dependent on the mapping type: pages will be in your resident set if they're actively accessed (read or write). If the OS needs physical memory, then pages that are not actively accessed are dropped (if clean), or written to either the original file or swap (if dirty, and depending on mapping type).
Where the two types differ is in total commitment against physical memory and swap. A shared mapping doesn't increase this commitment; a private mapping does. If you don't have enough combined memory and swap to hold every page of the private mapping, and you write to every page, then you (or possibly some other process) will be killed by the out-of-memory (OOM) killer.
Update: what I wrote above applies to memory-mapped files. You can map an anonymous block (MAP_ANONYMOUS) with MAP_SHARED, in which case the memory is backed by swap, not a file.
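A minimal sketch of the difference, assuming an existing file data.bin (the filename and the one-page size are illustrative):

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("data.bin", O_RDWR);
    if (fd < 0) return 1;
    size_t len = 4096; // one page, illustrative

    // MAP_SHARED: dirty pages are eventually written back to data.bin.
    char* shared = static_cast<char*>(
        mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    // MAP_PRIVATE: the first write copy-on-writes the page; the copy is
    // charged against memory+swap and never reaches data.bin.
    char* priv = static_cast<char*>(
        mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0));
    if (shared == MAP_FAILED || priv == MAP_FAILED) return 1;

    shared[0] = 'S'; // dirty file-backed page
    priv[0]   = 'P'; // dirty anonymous copy, swap-backed if memory is tight

    munmap(shared, len);
    munmap(priv, len);
    close(fd);
    return 0;
}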
While studying some code that deals with an FPGA, I came across munmap and mmap.
I went through the manual provided here, but I still don't understand the purpose of these functions. What exactly do they do?
mmap() is a system call used for memory-mapped I/O. It allocates a memory region and maps it into the calling process's virtual address space so that the application can access the memory.
mmap() returns a pointer to the mapped area which can be used to access the memory.
Similarly, munmap() removes the mapping so no further access to the allocated memory remains legal.
These are lower-level calls, behaviourally similar to what memory-allocator functions like malloc() / free() offer at a higher level. However, these system calls give you fine-grained control over the mapped region's behaviour, such as (see the sketch after this list):
memory protection of the mapping (read, write, execute permission)
(approximate) location of the mapping (see MAP_FIXED flag)
the initial content of the mapped area (see MAP_UNINITIALIZED flag)
etc.
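As a quick sketch of that control (sizes illustrative): an anonymous, read/write mapping that is later made read-only with mprotect():

#include <sys/mman.h>
#include <cstddef>

int main() {
    std::size_t len = 4096; // one page
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;

    static_cast<char*>(p)[0] = 42; // writable for now
    mprotect(p, len, PROT_READ);   // from here on, writes would fault
    munmap(p, len);                // any access after this is illegal
    return 0;
}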
You can also refer to the Wikipedia article if you think alternate wording would help.
It maps a chunk of disk cache into process space so that the mapped file can be manipulated at a byte level instead of requiring the application to go through the VFS with read(), write(), et alia.
The manual is clear:
mmap() creates a new mapping in the virtual address space of the calling process
In short, it maps a chunk of file/device memory/whatever into the process' space, so that it can directly access the content by just accessing the memory.
For example:
int fd = open("xxx", O_RDONLY);
void* mem = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
This will map the file's content at mem; reading mem is just like reading the content of the file xxx.
If fd refers to some FPGA's device memory, then mem gives you direct access to that device's memory.
It is very convenient to use and efficient in some cases.
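To connect this to the FPGA case: such code commonly maps the device's physical register window through /dev/mem (or a driver-specific device node). A hedged sketch - the base address 0x43C00000 and the one-page span are pure assumptions, and this typically needs root:

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return 1;

    off_t base = 0x43C00000; // illustrative, device-specific register base
    size_t span = 4096;      // one page of registers

    void* m = mmap(nullptr, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    if (m == MAP_FAILED) { close(fd); return 1; }

    volatile std::uint32_t* regs = static_cast<volatile std::uint32_t*>(m);
    std::uint32_t status = regs[0]; // a plain load is now a device register read
    (void)status;

    munmap(m, span);
    close(fd);
    return 0;
}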
Is there any way to set the maximum allowed memory used by managed_mapped_file? For example, I have 64GB of memory and I create a 20GB file. This is all loaded into memory. Is there a way to specify that it should only use, say, 1GB of memory? Even approximately would suffice.
EDIT: I should add that I use boost::interprocess::vector... maybe there is a way to specialize the allocator?
typedef bi::allocator<Node, bi::managed_mapped_file::segment_manager> allocator_node_t;
typedef bi::vector<Node, allocator_node_t> vector_node_t;
bi::managed_mapped_file* nodeFile = new bi::managed_mapped_file(bi::open_or_create, "nodes_m.bin", bigSize);
allocator_node_t alloc_n(nodeFile->get_segment_manager());
vector_node_t* nodes = nodeFile->find_or_construct<vector_node_t>("nodes")(alloc_n);
There's no such way (portably).
Also, the premise is wrong:
For example, I have 64GB of memory and I create a 20GB file. This is all loaded into memory
Wrong: it will load only the pages that are used. Yes, this may mean that you end up having the full 20GB in memory. The OS is free to do that as long as no other process requires the physical memory for other tasks.
It would be silly for the OS to arbitrarily unmap that data for no reason. You want the OS to take advantage of available memory; otherwise the money spent on those silicon chips was wasted.
EDIT: I should add that I use boost::interprocess::vector... maybe there is a way to specialize the allocator?
Using boost::interprocess::vector without a custom allocator doesn't use the mapped file at all. You need e.g. boost::interprocess::allocator<T, boost::interprocess::managed_mapped_file::segment_manager> to place the data in the mapped file in the first place.
And no, nothing in the allocator can override the OS virtual memory tuning parameters.
Nothing needs to be specialized (in the C++ sense of the word):
bi::managed_mapped_file* nodeFile = new bi::managed_mapped_file(bi::open_or_create, "nodes_m.bin", bigSize);
allocator_node_t alloc_n(nodeFile->get_segment_manager());
vector_node_t* nodes = nodeFile->find_or_construct<vector_node_t>("nodes")(alloc_n);
Executing this code the first time around (i.e. creating "nodes_m.bin") will not load bigSize bytes into memory. In fact, it will not even allocate bigSize bytes on disk! On all systems that support it (I know of no mainstream OS that doesn't), the file is created sparse.
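If you want to verify the sparse-file claim, compare the file's logical size with what the filesystem actually allocated; a small POSIX sketch using the filename from the question:

#include <sys/stat.h>
#include <cstdio>

int main() {
    struct stat st {};
    if (stat("nodes_m.bin", &st) != 0) return 1;
    // st_size is the logical size (bigSize); st_blocks * 512 is what the
    // filesystem actually allocated - far less right after creation.
    std::printf("logical: %lld bytes, allocated: %lld bytes\n",
                static_cast<long long>(st.st_size),
                static_cast<long long>(st.st_blocks) * 512LL);
    return 0;
}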
I have a C++ program that uses several very large arrays of doubles, and I want to reduce the memory footprint of this particular part of the program. Currently, I'm allocating 100 of them, and they can be 100 MB each.
Now, I do have one advantage: parts of these arrays eventually become obsolete during later parts of the program's execution, and there is little need to ever have the whole of any one of them in memory at any one time.
My question is this:
Is there any way of telling the OS, after I have created the array with new or malloc, that a part of it is unnecessary any more?
I'm coming to the conclusion that the only way to achieve this is to declare an array of pointers, each of which may point to a chunk, say 1MB, of the desired array, so that old chunks that are not needed any more can be reused for new parts of the array. This seems like writing a custom memory manager, which feels like a bit of a sledgehammer and will introduce a performance hit as well.
I can't move the data in the array because that would cause too many thread-contention issues. The arrays may be accessed by any one of a large number of threads at any time, though only one thread ever writes to any given array.
It depends on the operating system. POSIX - including Linux - has the system call madvise to improve memory performance. From the man page:
The madvise() system call advises the kernel about how to handle paging input/output in the address range beginning at address addr and with size length bytes. It allows an application to tell the kernel how it expects to use some mapped or shared memory areas, so that the kernel can choose appropriate read-ahead and caching techniques. This call does not influence the semantics of the application (except in the case of MADV_DONTNEED), but may influence its performance. The kernel is free to ignore the advice.
See the man page of madvise for more information.
Edit: Apparently, the above description was not clear enough. So, here are some more details, and some of them are specific to Linux.
You can use mmap to allocate a block of memory (directly from the OS instead of via libc) that is not backed by any file. For large chunks of memory, malloc does exactly the same thing. You have to use munmap to release the memory - regardless of whether you use madvise:
void* data = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// ...
::munmap(data, size);
If you want to get rid of some parts of this chunk, you can use madvise to tell the kernel to do so:
madvise(static_cast<unsigned char*>(data) + 7 * page_size,
        3 * page_size, MADV_DONTNEED);
The address range is still valid, but it is no longer backed - neither by physical RAM nor by storage. If you access the pages later, the kernel will allocate new pages on the fly and initialize them to zero. Be aware that the MADV_DONTNEED pages still count toward the virtual memory size of the process. It might be necessary to make some configuration changes to the virtual memory management, e.g. activating over-commit.
It would be easier to answer if we had more details.
1°) The answer to the question "Is there any way of telling the OS, after I have created the array with new or malloc, that a part of it is unnecessary any more?" is "not really". That's the point of C and C++, and of any language that lets you handle memory manually.
2°) If you're using C++ and not C, you should not be using malloc.
3°) Nor arrays, unless for a very specific reason. Use a std::vector.
4°) Preferably, if you need to change the content of the array often and reduce the memory footprint, use a linked list (std::list), though accessing individual elements will be more expensive (iterating through it will be almost as fast).
A std::deque of pointers to std::array<double, LARGE_NUMBER> may do the job, but you had better wrap the deque in a dedicated container, so you can remap the indexes and, most importantly, define when entries are not used anymore.
The dedicated container can also contain a read/write lock, so it can be used in a thread-safe way.
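A rough sketch of such a container, assuming 1MB chunks as in the question (the name ChunkedArray and the chunk size are illustrative; the read/write lock is omitted for brevity):

#include <array>
#include <cstddef>
#include <deque>
#include <memory>

constexpr std::size_t CHUNK = 131072; // 131072 doubles = 1 MB per chunk

class ChunkedArray {
    std::deque<std::unique_ptr<std::array<double, CHUNK>>> chunks_;
public:
    explicit ChunkedArray(std::size_t n) : chunks_((n + CHUNK - 1) / CHUNK) {}

    double& at(std::size_t i) {
        auto& c = chunks_[i / CHUNK];
        if (!c) c = std::make_unique<std::array<double, CHUNK>>(); // lazy, zeroed
        return (*c)[i % CHUNK];
    }

    // Mark a whole chunk obsolete: its memory returns to the allocator now.
    void drop_chunk(std::size_t chunk_index) { chunks_[chunk_index].reset(); }
};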
You could try using lists instead of arrays. Of course, a list is 'heavier' than an array, but on the other hand it is easy to reconstruct a list so that you can throw away a part of it when it becomes obsolete. You could also use a wrapper which only contains indexes saying which part of the list is up to date and which part may be reused.
This will help you improve performance, but will require a little more (reusable) memory.
Allocating by chunk, then delete[]-ing and new[]-ing along the way, seems like a good solution. That way you do as little memory management as possible: don't reuse chunks yourself; simply deallocate old chunks and allocate new ones when needed.
memcpy/memmove duplicate (copy) the data from source to destination. Does anything exist to move pages from one virtual address to another without doing an actual byte-by-byte copy of the source data? It seems perfectly possible to me, but does any operating system actually allow it? It seems odd that dynamic arrays are such a widespread and popular concept, yet growing them by physically copying is such a wasteful operation. It just doesn't scale when you start talking about array sizes in the gigabytes (e.g. imagine growing a 100GB array into a 200GB array; that's entirely possible on servers in the < $10K range now).
SIZE_T twoGB = 2ULL << 30, fourGB = 4ULL << 30;
void* very_large_buffer = VirtualAlloc(NULL, twoGB, MEM_COMMIT, PAGE_READWRITE);
// Populate very_large_buffer, run out of space.

// Allocate a buffer twice as large, but don't actually allocate
// physical memory, just reserve the address space.
void* even_bigger_buffer = VirtualAlloc(NULL, fourGB, MEM_RESERVE, PAGE_READWRITE);

// Remap the physical memory from very_large_buffer to even_bigger_buffer
// without copying (i.e. don't copy 2GB of data, just copy the mapping of
// virtual pages to physical pages).
// Does any OS provide support for an operation like this?
MoveMemory(even_bigger_buffer, very_large_buffer, twoGB);

// Now very_large_buffer no longer has any physical memory pages associated with it.
VirtualFree(very_large_buffer, 0, MEM_RELEASE);
To some extent, you can do that with mremap on Linux.
That call manipulates the process's page tables to do a zero-copy reallocation if it can. It is not possible in all cases (address-space fragmentation, and simply the presence of other existing mappings, can be an issue).
The man page actually says this:
mremap() changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc(3).
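A minimal Linux-only sketch (sizes illustrative); mremap is not in POSIX, so it needs _GNU_SOURCE:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <cstddef>

int main() {
    std::size_t old_len = 1UL << 30; // 1 GiB, illustrative
    void* p = mmap(nullptr, old_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;

    // Double the region. MREMAP_MAYMOVE lets the kernel pick a new virtual
    // address while reusing the same physical pages - no data is copied.
    void* q = mremap(p, old_len, 2 * old_len, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) return 1;

    munmap(q, 2 * old_len);
    return 0;
}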
Yes, it's a common use of memory-mapped files to 'move' or copy memory between processes by mapping different views of the same file.
Every POSIX system is able to do this. If you use mmap with a file descriptor (obtained by open or shm_open) rather than anonymously, you can unmap it, then truncate the underlying object (shrink or grow it), and then map it again. You may, and often will, get a different virtual address for the same pages.
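A hedged sketch of that unmap/truncate/remap pattern using POSIX shared memory; the name "/demo_segment" and the sizes are illustrative (link with -lrt on older glibc):

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

int main() {
    int fd = shm_open("/demo_segment", O_CREAT | O_RDWR, 0600);
    if (fd < 0) return 1;

    std::size_t len = 1 << 20; // 1 MiB initial size
    if (ftruncate(fd, len) != 0) return 1;
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    // Grow: unmap, enlarge the object, map again. The pages keep their
    // contents; the new virtual address may differ from the old one.
    munmap(p, len);
    if (ftruncate(fd, 2 * len) != 0) return 1;
    void* q = mmap(nullptr, 2 * len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (q == MAP_FAILED) return 1;

    munmap(q, 2 * len);
    shm_unlink("/demo_segment");
    close(fd);
    return 0;
}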
I mean, you'd never be able to absolutely guarantee that there would be no active mappings in that next 100GB of address space, so you might not be able to make it contiguous.
On the other hand, you could use a ragged array (an array of arrays) where the arrays do not have to be next to each other (or even the same size). Many of the advantages of dynamic arrays may not scale to the 100GB realm.