What is the functionality of munmap, mmap - c++

While studying some code that deals with an FPGA, I came across munmap and mmap.
I went through the manual page, but I still don't understand the purpose of these functions. What exactly do they do?

mmap() is a system call used for memory-mapped I/O. It allocates a memory region and maps it into the calling process's virtual address space so that the application can access the memory.
mmap() returns a pointer to the mapped area, which can be used to access the memory.
Similarly, munmap() removes the mapping, so no further access to that memory remains legal.
These are lower-level calls, behaviourally similar to what memory allocator functions like malloc()/free() offer at a higher level. However, these system calls give you fine-grained control over the mapped region's behaviour, such as:
memory protection of the mapping (read, write, execute permission)
(approximate) location of the mapping (see MAP_FIXED flag)
the initial content of the mapped area (see MAP_UNINITIALIZED flag)
etc.
You can also refer to the wikipedia article if you think alternate wordings can help you.

It maps a chunk of the disk cache into process space, so that the mapped file can be manipulated at the byte level instead of requiring the application to go through the VFS with read(), write(), et al.

The manual is clear:
mmap() creates a new mapping in the virtual address space of the calling process
In short, it maps a chunk of file/device memory/whatever into the process' space, so that it can directly access the content by just accessing the memory.
For example:
fd = open("xxx", O_RDONLY);
mem = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
This will map the file's contents to mem; reading mem is just like reading the contents of the file xxx.
If fd refers to an FPGA's device memory, then mem gives you direct access to that device memory.
It is very convenient to use and efficient in some cases.

Related

mmap's worst case memory usage when using MAP_PRIVATE vs MAP_SHARED

I haven't seen this explicitly anywhere, so I just wanted to clarify. This is all in the context of a single-threaded program: say we have a 10GB text file that we open with mmap, using the MAP_PRIVATE option. Initially, of course, I should expect to see 0GB of resident memory used. Now say I modify every character in the file. Will this then require 10GB of resident memory? And if not, why not?
Now what if we did the same thing but with MAP_SHARED, what should I expect the resident memory usage to look like?
MAP_SHARED creates a mapping that is backed by the original file. Any changes to the data are written back to that file (assuming a read/write mapping).
MAP_PRIVATE creates a mapping that is backed by the original file for reads only. If you change bytes in the mapping, then the OS creates a new page that occupies physical memory and is backed by swap (if any).
The impact on resident set size is not dependent on the mapping type: pages will be in your resident set if they're actively accessed (read or write). If the OS needs physical memory, then pages that are not actively accessed are dropped (if clean), or written to either the original file or swap (if dirty, and depending on mapping type).
Where the two types differ is in total commitment against physical memory and swap. A shared mapping doesn't increase this commitment, a private mapping does. If you don't have enough combined memory and swap to hold every page of the private mapping, and you write to every page, then you (or possibly some other process) will be killed by the out-of-memory daemon.
Update: what I wrote above applies to memory-mapped files. You can map an anonymous block (MAP_ANONYMOUS) with MAP_SHARED, in which case the memory is backed by swap, not a file.

Can I reuse host buffer memory ad libidum or should I re-map it every frame?

My app has a VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT buffer and a permanent command buffer that uploads the memory to a VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT buffer.
I have two questions about this setup. This is question 1, question 2 is separate.
For better performance, resources (buffers, buffer memories, memory mappings, command buffers, etc.) are allocated outside the main loop. The only thing I do in the main loop (per frame) is trigger the command buffer with vkQueueSubmit(), which transfers the data from host memory to device-local memory. I took several significant "shortcuts" with respect to the literature (the classic Vulkan tutorial everybody starts with): by writing directly into stagingMemory I need no separate memory and no memcpy(), and doing most of it outside the loop is even more of a shortcut. This is the pseudocode:
void* stagingMemory;
vkMapMemory(logicalDevice, stagingBufferMemory, 0, size, 0, &stagingMemory);
while (running)
{
    // write directly into stagingMemory by fiddling with pointers and offsets
    if (its_time_to_update_ubo_on_device)
    {
        vkQueueSubmit(...); // transfer stagingBufferMemory to device-local buffer
    }
}
// only on exit
vkUnmapMemory(logicalDevice, stagingBufferMemory);
This works, and I understand it is performant because I minimize instantiations (such as VkSubmitInfo and command buffers) and several other operations. But I wonder if this is safe in the long run. What happens when memory pressure triggers virtual memory pages to be paged out to disk? Can this happen, or is stagingMemory safe?
What raises doubts in me, is that I've always read about a very different approach, like this:
while (running)
{
    // write to memory (not staging memory!)
    void* stagingMemory;
    vkMapMemory(logicalDevice, stagingBufferMemory, 0, size, 0, &stagingMemory);
    memcpy(stagingMemory, memory, size);
    vkUnmapMemory(logicalDevice, stagingBufferMemory);
    if (its_time_to_update_ubo_on_device)
    {
        VkSubmitInfo info {}; // re-initialize every time anew
        vkQueueSubmit(... info ...); // upload to device-local memory
    }
}
Is this less-optimized approach just for didactic reasons, or does this prevent problems I don't envision, yet, and will ruin everything later on?
Am I doing what is described in this nVidia blog post as Pinned Host Memory, or is this something still different?
What happens when memory pressure triggers virtual memory pages to be paged out to disk?
Um... that's not a thing that happens, actually.
Virtual pages are never "paged out"; only physical storage gets paged out. The storage underneath a virtual address range can get paged out, but the actual virtual addresses are fine.
Perhaps you're thinking that Vulkan would have to ensure that physical pages associated with a mapped range can't be paged out, lest a DMA operation fail to complete. Well, that's not how Vulkan transfer operations work. They don't require that the memory is mapped during the transfer (nor do they require that it is unmapped prior to the transfer. Vulkan doesn't care). So it doesn't matter to Vulkan whether there is some virtual address range bound to the storage; internally, it could be using the actual physical addresses for its DMA operations.
If the GPU needs that range of memory to not be paged out all the time, then it will need it regardless of whether it is mapped. If the GPU is fine with it being paged out, and will page it back in prior to any DMA operations from/to it, then that's a thing that has nothing to do with the memory being mapped.
In short, your question is a non sequitur: keeping it mapped or not will not affect memory pressure. The only thing it might affect is how much virtual address space your program uses, which in the days of 64-bit programs is really kind of academic, unless you think you're going to allocate 2^48 bytes of storage.
So the only reason to unmap memory (besides when you're about to delete it) is if you're writing a 32-bit application, you need to be careful with virtual address space, and you know that the implementation will not assign virtual addresses to CPU-accessible memory unless you map it (implementations are free to always give it virtual address space).

Does any OS allow moving memory from one address to another without physically copying it?

memcpy/memmove duplicate (copy) the data from source to destination. Does anything exist to move pages from one virtual address to another without doing an actual byte-by-byte copy of the source data? It seems perfectly possible to me, but does any operating system actually allow it? It seems odd that dynamic arrays are such a widespread and popular concept, yet growing them by physically copying is such a wasteful operation. It just doesn't scale when you start talking about array sizes in the gigabytes (e.g. imagine growing a 100GB array into a 200GB array; that's entirely possible on servers in the < $10K range now).
void* very_large_buffer = VirtualAlloc(NULL, 2GB, MEM_COMMIT);
// Populate very_large_buffer, run out of space.
// Allocate buffer twice as large, but don't actually allocate
// physical memory, just reserve the address space.
void* even_bigger_buffer = VirtualAlloc(NULL, 4GB, MEM_RESERVE);
// Remap the physical memory from very_large_buffer to even_bigger_buffer without copying
// (i.e. don't copy 2GB of data, just copy the mapping of virtual pages to physical pages)
// Does any OS provide support for an operation like this?
MoveMemory(even_bigger_buffer, very_large_buffer, 2GB)
// Now very_large_buffer no longer has any physical memory pages associated with it
VirtualFree(very_large_buffer)
To some extent, you can do that with mremap on Linux.
That call plays with the process's page table to do a zero-copy reallocation if it can. It is not possible in all cases (address space fragmentation, and simply the presence of other existing mappings are an issue).
The man page actually says this:
mremap() changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc(3).
Yes, it's a common use of memory-mapped files to 'move' or copy memory between processes by mapping different views of the same file.
Every POSIX system can do this. If you use mmap with a file descriptor (obtained from open or shm_open) rather than anonymously, you can unmap it, truncate the file (shrink or grow it), and then map it again. You may, and often will, get a different virtual address for the same pages.
I mean, you'd never be able to absolutely guarantee that there would be no active memory in that next 100GB, so you might not be able to make it contiguous.
On the other hand, you could use a ragged array (an array of arrays) where the arrays do not have to be next to each other (or even the same size). Many of the advantages of dynamic arrays may not scale to the 100GB realm.

Pagefile-backed memory-mapped files vs. Heap -- what's the difference?

What is the advantage of using a memory-mapped file backed by the system paging file (through CreateFileMapping(INVALID_HANDLE_VALUE, ...)), instead of just allocating memory from the heap the usual way (malloc(...), HeapAlloc(...), etc.)?
i.e. When should I use which?
It's lower level; it gives you more than malloc does:
You can share the mapping with other processes (of course you also need to synchronize)
You can set permissions on the memory (for example you can have read-only memory via PAGE_READONLY)
You can set some cache / page parameters

Memory allocators

I want to make a virtual allocator in C++ on Windows which allocates data in a file on the hard disk, to reduce physical memory usage when allocating large objects.
I don't want to use the system's virtual memory with VirtualAlloc. I want to create a file on disk, use it to allocate all the objects, and then move only the part of an object that I need into RAM.
I tried to use a memory-mapped file, but I faced some problems: I used the mapped file to allocate vector elements, but when I went back to delete any of them, the address of the element changed; also, I can't find a method to map an object only when needed (in my test I mapped the whole file).
Any resources or open-source projects that can help?
Google can help here. I implemented a custom STL allocator a number of years ago that used a shared memory store. The same techniques can be used to implement a disk-backed allocator. I would start by looking at this SourceForge project for inspiration.
You may find inspiration from Boost.Interprocess, which provides support for memory mapped files, as well as allocators and containers over that memory.
More information about the allocator design can also be found at http://www.boost.org/doc/libs/1_37_0/doc/html/interprocess/architecture.html
Sorry, but you fail to understand how (virtual) memory works. On the one hand you state that you want to make a "custom memory allocator" that doesn't take much space in memory, but on the other hand you're surprised that "the address of the element changed".
This is pretty much to be expected. To make sure that the address of a (logical) object doesn't change, you have to keep the memory represented by that address committed to the object. If you free the memory, it becomes available for reuse, and so does the address. And if the address is reused, you can't page back the object to that address.
Ultimately, the problem here it that addresses and memory are very, very deeply connected. Recycling memory means recycling addresses.
From http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=213
The POSIX <sys/mman.h> header declares the memory-mapping syscalls and data structures. Because this interface is more intuitive and simpler than that of Windows, I base my memory-mapping example on the POSIX API.
The mmap() system call:
void *mmap(void *addr,
           size_t length,
           int prot,
           int flags,
           int fd,
           off_t offset);
Let's examine what each parameter means: addr is a hint for where to place the mapping (usually NULL, letting the kernel choose); length is the number of bytes to map; prot sets the access protection (a combination of PROT_READ, PROT_WRITE, and PROT_EXEC, or PROT_NONE); flags selects the mapping type (e.g. MAP_SHARED or MAP_PRIVATE); fd is the descriptor of the file to map; and offset is the (page-aligned) offset within the file at which the mapping starts.
In the following example, the program maps the first 4 KB of a file passed on the command line into its memory and then reads an int value from it:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    int fd;
    void *pregion;
    if ((fd = open(argv[1], O_RDONLY)) < 0)
    {
        perror("failed on open");
        return -1;
    }
    /* map first 4 kilobytes of fd */
    pregion = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
    if (pregion == MAP_FAILED)
    {
        perror("mmap failed");
        return -1;
    }
    close(fd); // close the descriptor; the mapping stays valid
    // access mapped memory; read the first int in the mapped file
    int val = *((int *)pregion);
    printf("%d\n", val);
    return 0;
}
To unmap a mapped region, use the munmap() function:
int munmap(void *addr, size_t length);
addr is the address of the region being unmapped; length specifies how many bytes to unmap (you may unmap a portion of a previously mapped region, but the kernel works at page granularity, so the range is rounded up to whole pages). The following call asks to unmap the first kilobyte of the previously mapped file; with a typical 4 KB page size it actually unmaps the whole first page, and hence the entire 4 KB mapping in this example:
munmap(pregion, 1024);
Probably the best way to solve this is not to return regular pointers to large objects. Simply return small proxies. These proxy objects implement the full interface of the larger object. However, these proxy objects can deal with the raw data being either in RAM or on disk. The proxies implement a LRU mechanism amongst themselves to optimize RAM use. The caller never sees the address of these proxies change, nor does it get any pointers to raw data.