Copying data from a shared-memory-mapped object using sendfile()/fcopyfile() - c++

Is it possible – and if so prudent – to use sendfile() (or its Darwin/BSD cousin fcopyfile()) to shuttle data directly between a shared-memory object and a file?
Functions like sendfile() and fcopyfile() can perform all of the mechanistic necessities underpinning such transfers of data entirely without leaving kernel-space – you pass along two open descriptors, a source and a destination, when calling these functions, and they take it from there.
Other means of copying data will invariably require one to manually maneuver across the boundary between kernel-space and user-space; such context-switches are inherently quite costly, performance-wise.
I can’t find anything definitive on the subject of using a shared-memory descriptor as an argument thusly: no articles for or against the practice; nothing in the respective man-pages; no tweets publicly considering sendfile()-ing shared-memory descriptors harmful; &c… But so, I am thinking I should be able to do something like this:
char const* name = "/yo-dogg-i-heard-you-like-shm"; /// only one slash, at zero-index
int len = A_REASONABLE_POWER_OF_TWO; /// valid per shm_open()
int descriptor = shm_open(name, O_RDWR | O_CREAT, 0600);
int destination = open("/tmp/yodogg.block", O_RDWR | O_CREAT, 0644);
void* memory = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, descriptor, 0);
off_t bytescopied = 0;
sendfile(destination, descriptor, &bytescopied, len);
/// --> insert other stuff with memset(…), memcopy(…) &c. here, possibly
munmap(memory, len);
close(descriptor); close(destination);
shm_unlink(name);
… Is this misguided, or a valid technique?
And if the latter, can one adjust the size of the in-memory shared map before copying the data?
EDIT: I am developing the project to which this inquiry pertains on macOS 10.12.4; I am aiming for it to work on Linux, with eventual FreeBSD interoperability.

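Copying data between two "things" mapped in memory - like in the example above - will indeed require copying things from kernel to userspace and then back. And no, you can't portably use the sendfile(2) system call to send to a regular file descriptor, I'm afraid: on macOS and FreeBSD the destination must be a socket, and on Linux that restriction was only lifted in kernel 2.6.33.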
But you should be able to do it like this:
1. Create the shared memory object (or a file, really; due to the second step it will be shared in memory anyway)
2. Map it in memory, with MAP_SHARED; you'll get a pointer
3. Open the destination file
4. write(destination_fd, source_pointer, source_length)
In this case, the write syscall won't need to copy the data into your process. Not sure what the actual performance characteristic will be, though. Smart use of madvise(2) might help.
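The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: copy_mapped_to_file is a hypothetical helper name, the source descriptor is assumed to be already open and already sized, and no claim is made about how it performs compared with other approaches.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original answer): map `len` bytes of
 * src_fd with MAP_SHARED and write them to dest_path.
 * Returns 0 on success, -1 on failure. */
int copy_mapped_to_file(int src_fd, size_t len, const char *dest_path)
{
    /* Step 2: map the source object; we only need to read from it. */
    void *src = mmap(NULL, len, PROT_READ, MAP_SHARED, src_fd, 0);
    if (src == MAP_FAILED)
        return -1;

    /* Step 3: open the destination file. */
    int dest_fd = open(dest_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dest_fd < 0) {
        munmap(src, len);
        return -1;
    }

    /* Step 4: write() may accept fewer bytes than requested, so loop. */
    const char *p = src;
    size_t left = len;
    int rc = 0;
    while (left > 0) {
        ssize_t n = write(dest_fd, p, left);
        if (n <= 0) {   /* treat errors (and a stalled write) as failure */
            rc = -1;
            break;
        }
        p += n;
        left -= (size_t)n;
    }

    close(dest_fd);
    munmap(src, len);
    return rc;
}
```

The source descriptor could come from shm_open() followed by ftruncate() to give the object a size, or from open() on an ordinary file; the helper does not care which.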

Related

Accessing binary files with mmap vs fstream or fopen

I did not know binary files could be read with mmap(). I used to think mmap() could only be used for IPC (interprocess communication) in Linux, to exchange data between unrelated processes.
Can someone explain how files are read with mmap()? I heard it has huge advantage when binary files are randomly accessed.
Well, mmapping a file is done the same way as it is done for the IPC or mapping anonymous memory. In the case of mapping anonymous memory the parts that have not been written to will be filled with zeroed pages on demand.
In case of a mapped file, the pages that correspond to file contents are read upon access (and upon writes too) from the file / or the buffer cache. Reading or writing outside the size of the file will result in SIGBUS. Essentially the pointer returned by mmap can be considered in the similar manner as the pointer returned by malloc, except that up to the size of mapping / up to the end-of-file bytes within the mapping are automatically read from / and possibly written to the backing file transparently.
Example:
struct stat stat_result;
int fd = open("input.txt", O_RDWR);
fstat(fd, &stat_result);
char* contents = mmap(0, stat_result.st_size,
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
(error checking omitted)
After executing that you can consider contents as pointing to the first byte of a character array of stat_result.st_size characters, and you can use it just like an ordinary array, and the operating system will transparently write back the changes into the file.
With mmap the operating system will have a better view about which parts of the file should be kept in memory / buffer cache and which shouldn't.
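For random access, a minimal sketch with the error checking filled in might look like the following. The helper name byte_at is made up for illustration; it maps the whole file read-only and fetches a single byte by indexing the mapping like an array.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return the byte at `offset` in the file at `path`,
 * or -1 on error or when the offset lies outside the file. */
int byte_at(const char *path, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || offset < 0 || offset >= st.st_size) {
        close(fd);
        return -1;
    }

    const unsigned char *contents =
        mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (contents == MAP_FAILED)
        return -1;

    /* Only the page containing `offset` has to be brought in here. */
    int value = contents[offset];

    munmap((void *)contents, (size_t)st.st_size);
    return value;
}
```

Mapping and unmapping on every call is wasteful; real code would keep the mapping around for as long as it needs to read from the file.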

What is the functionality of munmap, mmap

While studying some code that deals with an FPGA, I came across munmap and mmap.
I went through the manual provided here, but I still don't understand the purpose of these functions. What exactly do they do?
mmap() is a system call, which helps in memory-mapped I/O operations. It allocates a memory region and maps that into the calling process virtual address space so as to enable the application to access the memory.
mmap() returns a pointer to the mapped area which can be used to access the memory.
Similarly, munmap() removes the mapping so no further access to the allocated memory remains legal.
These are lower level calls, behaviourally similar to what is offered by memory allocator functions like malloc() / free() on a higher level. However, these system calls allow one to have fine-grained control over the behaviour of the allocated region, such as:
memory protection of the mapping (read, write, execute permission)
(approximate) location of the mapping (see MAP_FIXED flag)
the initial content of the mapped area (see MAP_UNINITIALIZED flag)
etc.
You can also refer to the wikipedia article if you think alternate wordings can help you.
It maps a chunk of disk cache into process space so that the mapped file can be manipulated at a byte level instead of requiring the application to go through the VFS with read(), write(), et alia.
The manual is clear:
mmap() creates a new mapping in the virtual address space of the calling process
In short, it maps a chunk of file/device memory/whatever into the process' space, so that it can directly access the content by just accessing the memory.
For example:
fd = open("xxx", O_RDONLY);
mem = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
This will map the file's content to mem; reading mem is just like reading the content of the file xxx.
If the fd refers to some FPGA's device memory, then mem gives access to the content of that device memory.
It is very convenient to use and efficient in some cases.

Stream while serializing with Cap'n'Proto

Consider a Cap'n'Proto schema like this:
struct Document {
  header @0 :Header;
  records @1 :List(Record);  # usually large number of records.
  footer @2 :Footer;
}
struct Header {
  numberOfRecords @0 :UInt32;
  # some fields
}
struct Footer {
  # some fields
}
struct Record {
  type @0 :UInt32;
  desc @1 :Text;
  # some more fields, relatively large in total
}
Now I want to serialize (i.e. build) a document instance and stream it to a remote destination.
Since the document is usually very large, I don't want to build it completely in memory before sending it. Instead, I am looking for a builder that sends struct by struct directly over the wire, such that the additional memory buffer needed is constant (i.e. O(max(sizeof(Header), sizeof(Record), sizeof(Footer)))).
Looking at the tutorial material I don't find such a builder. The MallocMessageBuilder seems to create everything in memory first (then you call writeMessageToFd on it).
Does the Cap'n'Proto API support such a use-case?
Or is Cap'n'Proto more meant to be used for messages that fit into memory before sending?
In this example, the Document struct could be omitted and then one could just send a sequence of one Header message, n Record messages and one Footer. Since a Cap'n'Proto message is self-delimiting, this should work. But you lose your document root - perhaps sometimes this is not really an option.
The solution you outlined -- sending the parts of the document as separate messages -- is probably best for your use case. Fundamentally, Cap'n Proto is not designed for streaming chunks of a single message, since that would not fit well with its random-access properties (e.g. what happens when you try to follow a pointer that points to a chunk you haven't received yet?). Instead, when you want streaming, you should split a large message into a series of smaller messages.
That said, unlike other similar systems (e.g. Protobuf), Cap'n Proto does not strictly require messages to fit into memory. Specifically, you can do some tricks using mmap(2). If your document data is coming from a file on disk, you can mmap() the file into memory and then incorporate it into your message. With mmap(), the operating system does not actually read the data from disk until you attempt to access the memory, and the OS can also purge the pages from memory after they are accessed since it knows it still has a copy on disk. This often lets you write much simpler code, since you no longer need to think about memory management.
In order to incorporate an mmap()ed chunk into a Cap'n Proto message, you'll want to use capnp::Orphanage::referenceExternalData(). For example, given:
struct MyDocument {
  body @0 :Data;
  # (other fields)
}
You might write:
// Map file into memory.
void* ptr = (kj::byte*)mmap(
nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
if (ptr == MAP_FAILED) {
KJ_FAIL_SYSCALL("mmap", errno);
}
auto data = capnp::Data::Reader((kj::byte*)ptr, size);
// Incorporate it into a message.
capnp::MallocMessageBuilder message;
auto root = message.getRoot<MyDocument>();
root.adoptBody(
message.getOrphanage().referenceExternalData(data));
Because Cap'n Proto is zero-copy, it will end up writing the mmap()ed memory directly out to the socket without ever accessing it. It's then up to the OS to read the content from disk and out to the socket as appropriate.
Of course, you still have a problem on the receiving end. You'll find it a lot more difficult to design the receiving end to read into mmap()ed memory. One strategy might be to dump the entire stream directly to a file first (without involving the Cap'n Proto library), then mmap() that file and use capnp::FlatArrayMessageReader to read the mmap()ed data in-place.
I describe all this because it's a neat thing that is possible with Cap'n Proto but not most other serialization frameworks (e.g. you couldn't do this with Protobuf). Playing tricks with mmap() is sometimes really useful -- I've used this successfully in several places in Sandstorm, Cap'n Proto's parent project. However, I suspect that for your use case, splitting the document into a series of messages probably makes more sense.

Optimal method to mmap a file to RAM?

I am using mmap to read a file and I only recently found out that it is not actually getting it into RAM, but is only creating a virtual address space for it. This will cause any accessing of the data to still use disk which I want to avoid, so I want to read it all into RAM.
I am reading the file via:
char* cs_virt;
cs_virt = (char*)mmap(0, nchars, PROT_READ, MAP_PRIVATE, finp, offset);
and when I loop after this, I see that the virtual memory for this process has, indeed, been blown up. I want to copy this into RAM, though, so I do the following:
char* cs_virt;
cs_virt = (char*)mmap(0, nchars, PROT_READ, MAP_PRIVATE, finp, offset);
cs = (char*)malloc(nchars*sizeof(char));
for(int ichar = 0; ichar < nchars; ichar++) {
cs[ichar] = cs_virt[ichar];
}
Is this the best method? If not, what is a more efficient way to do this? This takes place in a function, and cs is declared outside the function. Once I exit the function, I will retain cs, but will cs_virt need to be deleted, or will it go away on its own since it is declared locally in the function?
If you are using Linux, you may be able to use MAP_POPULATE:
MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults. MAP_POPULATE is supported for private mappings only since Linux 2.6.23.
This may be useful if you have time to spare when you mmap() but your later accesses need to be responsive. Consider also MAP_LOCKED if you really need the file to be mapped in and never swapped back out.
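A minimal sketch of the MAP_POPULATE suggestion, assuming Linux: the helper name map_file_populated is hypothetical, and on platforms without the flag the sketch simply falls back to ordinary lazy faulting.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MAP_POPULATE
#define MAP_POPULATE 0  /* flag unavailable here: fall back to lazy faulting */
#endif

/* Hypothetical helper: map the whole file read-only and ask the kernel to
 * prefault it. Stores the mapping length in *len_out.
 * Returns NULL on failure (including an empty file). */
char *map_file_populated(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);  /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

Note that the mapping does not go away when the local pointer goes out of scope: it stays until munmap(ptr, len) is called or the process exits. If a separate heap copy is still wanted, a single memcpy() from the mapping does the same job as the byte-by-byte loop.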
MPI and I/O is a murky issue. HDF5 seems to be the most common library that can help you with that, but it often needs tuning for the particular cluster, which is often impossible for mere users of the cluster. A colleague of mine had better success with SIONlib, and was able to get his code working on nearly 1e6 cores on JUGENE with that, so I'd have a look at that.
In both cases you will probably need to adapt your file format. In the case of my colleague it even paid off to write the data in parallel fashion using SIONlib, and to later do a sequential postprocessing step to "defragment" the holes left by the parallel access pattern that SIONlib chose. It might be similar for input.

Memory allocators

I want to make a virtual allocator using C++ on Windows that allocates data in a file on the hard disk, to reduce physical memory usage when allocating large objects.
I don't want to use system virtual memory with VirtualAlloc. I want to create a file on the disk, use it to allocate whole objects, and then move the part or the object that I need into RAM.
I tried to use a memory-mapped file, but I faced some problems: I used the mapped file to allocate vector elements, but when I went back to delete any of them, the address of the element changed. I also can't find a way to map an object only when needed (in my test I mapped the whole file).
Are there any resources or open source projects that can help?
Google can help here. I implemented a custom STL allocator a number of years ago that used a shared memory store. The same techniques can be used to implement a disk-backed allocator. I would start by looking at this SourceForge project for inspiration.
You may find inspiration from Boost.Interprocess, which provides support for memory mapped files, as well as allocators and containers over that memory.
More information about the allocator design can also be found at http://www.boost.org/doc/libs/1_37_0/doc/html/interprocess/architecture.html
Sorry, but you fail to understand how (virtual) memory works. On the one hand you state that "I want to make "custom memory allocator" but without take a large space from the memory" but on the other hand you're surprised that "the address of the element changed".
This is pretty much to be expected. To make sure that the address of a (logical) object doesn't change, you have to keep the memory represented by that address committed to the object. If you free the memory, it becomes available for reuse, and so does the address. And if the address is reused, you can't page back the object to that address.
Ultimately, the problem here is that addresses and memory are very, very deeply connected. Recycling memory means recycling addresses.
From http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=213
The POSIX header includes memory mapping syscalls and data structures. Because this interface is more intuitive and simpler than that of Windows, I base my memory mapping example on the POSIX library.
The mmap() system call:
caddr_t mmap(caddr_t map_addr,
size_t length,
int protection,
int flags,
int fd,
off_t offset);
Let's examine what each parameter means.
In the following example, the program maps the first 4 KB of a file passed in command line into its memory and then reads int value from it:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
    int fd;
    void *pregion;
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return -1;
    }
    /* parenthesize the assignment: < binds tighter than = */
    if ((fd = open(argv[1], O_RDONLY)) < 0)
    {
        perror("failed on open");
        return -1;
    }
    /*map first 4 kilobytes of fd*/
    pregion = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
    if (pregion == MAP_FAILED)
    {
        perror("mmap failed");
        return -1;
    }
    close(fd); //close the physical file because we don't need it
    //access mapped memory; read the first int in the mapped file
    int val = *((int*) pregion);
    printf("%d\n", val);
    return 0;
}
To unmap a mapped region, use the munmap() function:
int munmap(caddr_t addr, int length);
addr is the address of the region being unmapped and must be page-aligned. length specifies how much of the memory should be unmapped (you may unmap a portion of a previously-mapped region). Note that unmapping works on whole pages: every page that contains any part of the given range is removed. The following call therefore unmaps the entire first page of the previously-mapped file, which on a system with 4 KB pages is the whole 4 KB region rather than just its first kilobyte:
munmap(pregion, 1024);
Probably the best way to solve this is not to return regular pointers to large objects. Simply return small proxies. These proxy objects implement the full interface of the larger object. However, these proxy objects can deal with the raw data being either in RAM or on disk. The proxies implement a LRU mechanism amongst themselves to optimize RAM use. The caller never sees the address of these proxies change, nor does it get any pointers to raw data.