What does boost interprocess file_lock actually do with the target file?

I've done some reading about boost::interprocess::file_lock and it seems to do pretty much what I'm after (it supports sharable and exclusive locking, and is unlocked if the process crashes or exits).
One thing I'm not sure about, though, is what it does to the file. Can I use, for example, a zero-byte file? Does boost::interprocess write anything into it, or is its presence all the system cares about?
I've been using boost::interprocess for some time now to reliably memory-map a file and write into it. Now I need to go multi-process and ensure that reads and writes to this file are protected. file_lock does seem the way to go; I just wonder if I now need to add another file to use as a mutex.
Thanks in advance

what does it do to the file?
Boost does not do anything with the file; it relies on the operating system to get that job done. Support for memory-mapped files is a generic capability of any demand-paged virtual memory operating system, such as Windows, Linux, or OS X. Memory is normally backed by the paging file; having it backed by a specific file you select is but a small step. Boost just provides a platform-independent adapter, nothing more.
You'll want to take a look at the relevant OS documentation pages to see what's possible and how it is expected to work when you do something unusual. For Linux and OS X, look at the mmap man pages. For Windows, look at CreateFileMapping.
file_lock does seem the way to go
Yes, you almost always need to arbitrate access to the memory-mapped file, so that, for example, one process only attempts to read the data after the other process has finished writing it. The most suitable synchronization primitive for that is not a file_lock (the OS already locks the file); it is a named mutex. Use, say, boost's named_mutex class.
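For illustration, a minimal sketch of that arrangement; the mutex name, file name, and sizes below are made up, and the file is assumed to already exist at its final size:

    #include <boost/interprocess/sync/named_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>
    #include <boost/interprocess/file_mapping.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <cstring>

    using namespace boost::interprocess;

    int main() {
        // Both processes agree on this mutex name out of band (hypothetical name).
        named_mutex mutex(open_or_create, "shared_file_mutex");

        // Map the shared file, assumed to already exist at its final size.
        file_mapping file("shared.dat", read_write);
        mapped_region region(file, read_write);

        {
            // Hold the mutex only while touching the mapped memory.
            scoped_lock<named_mutex> lock(mutex);
            std::memcpy(region.get_address(), "hello", 5);
            region.flush();  // ask the OS to write the dirty pages back
        }
        return 0;
    }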
Do keep in mind that this is a very low-level interop mechanism and comes without any conveniences whatsoever. By the time you add all of the required synchronization, you're half-way to what the OS already does with a named pipe or local-loopback socket. If you discover that you have to copy data into the mapped view, not uncommon since it is not easily resizable, then you've lost all benefits.

Related

Linux non-persistent backing store for mmap()

First, a little motivating background info: I've got a C++-based server process that runs on an embedded ARM/Linux-based computer. It works pretty well, but as part of its operation it creates a fairly large fixed-size array (e.g. dozens to hundreds of megabytes) of temporary/non-persistent state information, which it currently keeps on the heap, and it accesses and/or updates that data from time to time.
I'm investigating how far I can scale things up, and one problem I'm running into is that eventually (as I stress-test the server by making its configuration larger and larger), this data structure gets big enough to cause out-of-memory problems, and then the OOM killer shows up, and general unhappiness ensues. Note that this embedded configuration of Linux doesn't have swap enabled, and I can't (easily) enable a swap partition.
One idea I have on how to ameliorate the issue is to allocate this large array on the computer's local flash partition, instead of directly in RAM, and then use mmap() to make it appear to the server process like it's still in RAM. That would reduce RAM usage considerably, and my hope is that Linux's filesystem-cache would mask most of the resulting performance cost.
My only real concern is file management -- in particular, I'd like to avoid any chance of filling up the flash drive with "orphan" backing-store files (i.e. old files whose processes don't exist any longer, but the file is still present because its creating process crashed or by some other mistake forgot to delete it on exit). I'd also like to be able to run multiple instances of the server simultaneously on the same computer, without the instances interfering with each other.
My question is, does Linux have any built-in facility for handling this sort of use-case? I'm particularly imagining some way to flag a file (or an mmap() handle or similar) so that when the process that created the file exits or crashes, the OS automagically deletes the file (similar to the way Linux already automagically recovers all of the RAM that was allocated by a process when the process exits or crashes).
Or, if Linux doesn't have any built-in auto-temp-file-cleanup feature, is there a "best practice" that people use to ensure that large temporary files don't end up filling up a drive due to unintentionally becoming persistent?
Note that AFAICT simply placing the file in /tmp won't help me, since /tmp is using a RAM-disk and therefore doesn't give me any RAM-usage advantage over simply allocating in-process heap storage.
Yes, and I do this all the time...
open the file, unlink it, use ftruncate or (better) posix_fallocate to make it the right size, then use mmap with MAP_SHARED to map it into your address space. You can then close the descriptor immediately if you want; the memory mapping itself will keep the file around.
For speed, you might find you want to help Linux manage its page cache. You can use posix_madvise with POSIX_MADV_WILLNEED to advise the kernel to page data in and POSIX_MADV_DONTNEED to advise the kernel to release the pages.
You might find that the last of these does not work the way you want, especially for dirty pages. You can use sync_file_range to explicitly control flushing to disk. (Although in that case you will want to keep the file descriptor open.)
All of this is perfectly standard POSIX except for the Linux-specific sync_file_range.
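A minimal sketch of that recipe, assuming a hypothetical flash-backed directory; mkstemp gives each server instance its own backing file, so multiple instances don't interfere:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const size_t kSize = 64 << 20;  // 64 MiB for illustration

        // Create a uniquely named file on the flash partition (hypothetical
        // path), then unlink it at once so a crash can't leave an orphan.
        char path[] = "/mnt/flash/backing-XXXXXX";
        int fd = mkstemp(path);
        if (fd < 0) { perror("mkstemp"); return 1; }
        unlink(path);

        // Size the file; posix_fallocate also reserves the blocks up front.
        int rc = posix_fallocate(fd, 0, kSize);
        if (rc != 0) { std::fprintf(stderr, "posix_fallocate: %d\n", rc); return 1; }

        // MAP_SHARED so dirty pages are written back to the file, not to swap.
        void* p = mmap(nullptr, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        close(fd);  // the mapping keeps the file alive

        static_cast<char*>(p)[0] = 42;  // use it like ordinary memory

        munmap(p, kSize);  // last reference gone: the space is reclaimed
        return 0;
    }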
Yes. You create/open the file, then you remove() the file by its filename.
The file will still be open in your process and you can read/write it just like any opened file, and it will disappear when the process holding the file open exits.
I believe this behavior is mandated by POSIX, so it will work on any Unix-like system. Even after a hard reboot, the space will be reclaimed.
I believe this is filesystem-specific, but most Linux filesystems allow deletion of open files. The file will still exist until the last handle to it is closed. I would recommend that you open the file then delete it immediately and it will be automatically cleaned up when your process exits for any reason.
For further details, see this post: What happens to an open file handle on Linux if the pointed file gets moved or deleted

With what API do you perform a read-consistent file operation in OS X, analogous to Windows Volume Shadow Copy Service

We're writing a C++/Objective C app, runnable on OSX from versions 10.7 to present (10.11).
Under Windows, there is the concept of a shadow copy, which allows you to read a file as it exists at a certain point in time, without having to worry about other processes writing to that file in the interim.
However, I can't find any documentation or online articles discussing a similar feature in OS X. I know that OS X will not lock a file when it's being written to, so is it necessary to do something special to make sure I don't pick up a file that is in the middle of being modified?
Or does the journaled filesystem make any special handling unnecessary? I'm concerned that if one process is creating or modifying a file (within a single context of, say, an fopen call; obviously I can't be guaranteed "completeness" if the writing process opens and closes the file repeatedly during what should be an atomic operation), a reading process will end up getting a "half-baked" file.
And if the journaled filesystem does guarantee that readers only see "whole" files, does this extend to FAT32 volumes that may be mounted as external drives?
A few things:
On Unix, once you open a file, if it is replaced (as opposed to modified), your file descriptor continues to access the file you opened, not its replacement.
Many apps will replace rather than modify files, using things like -[NSData writeToFile:atomically:] with YES for atomically:.
Cocoa and the other high-level frameworks do, in fact, lock files when they write to them, but that locking is advisory, not mandatory, so other programs have to opt in to the advisory locking system to be affected by it.
The modern approach is File Coordination. Again, this is a voluntary system that apps have to opt in to.
There is no feature quite like what you described on Windows. If the standard approaches aren't sufficient for your needs, you'll have to build something custom. For example, you could make a copy of the file that you're interested in and, after your copy is complete, compare it to the original to see if it was being modified as you were copying it. If the original has changed, you'll have to start over with a fresh copy operation (or give up). You can use File Coordination to at least minimize the possibility of contention from cooperating programs.
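As a rough sketch of that copy-and-verify idea (the file names are hypothetical, and comparing mtime and size is only a heuristic; a content hash would be stronger):

    #include <sys/stat.h>
    #include <fstream>

    // Copy 'src' to 'dst', then check whether 'src' changed while we copied.
    // mtime and size are only a heuristic; a content hash would be stronger.
    static bool snapshot(const char* src, const char* dst) {
        struct stat before, after;
        if (stat(src, &before) != 0) return false;

        std::ifstream in(src, std::ios::binary);
        std::ofstream out(dst, std::ios::binary);
        out << in.rdbuf();  // plain byte-for-byte copy
        if (!in || !out) return false;

        if (stat(src, &after) != 0) return false;
        return before.st_mtime == after.st_mtime &&
               before.st_size == after.st_size;
    }

    int main() {
        for (int attempt = 0; attempt < 5; ++attempt)
            if (snapshot("data.bin", "data.snapshot"))  // hypothetical names
                return 0;
        return 1;  // gave up: the writer kept modifying the file
    }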

How to program securely when you can only expect advisory file locks where the program will be used?

Scope: This is not about how to use the file-lock mechanisms on different platforms, but about how to mitigate the absence of mandatory file locks on the user's system. For example, I can't expect a user of a Linux system to modify the system, let alone know how to do it, so I have to assume advisory locks are all that is available. I have found a lot of info about how to use locks of both kinds, what is available on some platforms, and even why they are not available on some systems. Portability would be great, but this is probably too low-level for that.
I am a bit confused about how to safeguard my program's data if other processes don't cooperate, intentionally or not. Assuming that my program uses its own directory for the data, is there a way to make sure that my data will stay consistent while the program runs?
Would, for example, temporary hidden files be a practical solution (create a file, delete it from the filesystem's directory so only my handle holds the inode of the file), copying all data at program start and overwriting the original at the end? It seems very platform-specific, though.
Are there specific mechanisms or techniques to use that could help with this, or can I only "trust"?
Note: This is not specifically about Linux, it's just an example.
------ EDIT -------
I'm looking for a way to do this that works in C/C++, hence those tags, but am aware that it might involve system specific features. If possible, the solution would work regardless of platform and file locking mechanism.
While file locks are the mechanism referenced in the question, the real problem is how to prevent another process from trampling over the data my program relies on while running, even if that process runs as the same user as my program but does not care to check whether the files are locked, or uses a locking mechanism that doesn't interoperate with the one my program is using. (AFAICT, locks acquired with one of flock()/lockf() on Linux may not be seen when the other is used in the other process.) (Another situation from Linux, but one that is outside the scope of the question.)
As I tried to explain about the scope, this is not about how to use file locks on any platform, but what to do when you cannot assume anything about what mechanism is available/turned on, to achieve similar protection to what mandatory file locks would give.
I am a bit confused about how to safeguard my program's data if other processes don't cooperate, intentionally or not. Assuming that my program uses its own directory for the data, is there a way to make sure that my data will stay consistent while the program runs?
You cannot do so and should not even try to do so. How do you safeguard your data if the admin pulls the power cord? All things with access to your data must cooperate -- that is the precondition for safeguarding being possible.
Simply specify that your program requires a directory that is only touched by programs that cooperate with yours. That is a trivial requirement that many programs have, and one that any competent administrator can provide.

Write Only Memory Mapping in boost?

Why doesn't boost interprocess support write only memory mapping?
Maybe I'm missing something, but wouldn't a write-only mapping be significantly faster than a read/write mapping, since the OS doesn't have to read the pages in from disk, just flush pages from memory out to disk? It would also have the benefit of being entirely non-blocking (except for flushing and destruction).
Would I benefit by switching from boost to native OS memory mapping?
In fact, if you allocate a new memory-mapped file of size, say, 20 GiB, you'll get a sparse file allocation of that size.
When "mapping in" pages of that file, there needn't be a read operation (as the OS can tell that the page is not physically present on disk yet), and only when (if) those pages are dirtied need they be written out.
Of course, this is implementation dependent and I don't think POSIX (can) guarantee this, but it's not unreasonable behaviour IYAM, and would be the equivalent of write-only mapping.
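To illustrate, a sketch of creating such a sparse file and mapping a window of it with Boost.Interprocess; the file name and sizes are made up, and a 64-bit off_t is assumed:

    #include <boost/interprocess/file_mapping.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <fcntl.h>
    #include <unistd.h>

    using namespace boost::interprocess;

    int main() {
        const off_t kSize = off_t(20) << 30;  // 20 GiB, sparse on most filesystems

        // Create the file and set its length without writing any data.
        int fd = open("big.dat", O_RDWR | O_CREAT, 0600);
        if (fd < 0) return 1;
        if (ftruncate(fd, kSize) != 0) { close(fd); return 1; }
        close(fd);

        // Map a window of it; first-touch faults need no disk read because
        // there is nothing on disk to read yet.
        file_mapping file("big.dat", read_write);
        mapped_region region(file, read_write, 0, 1 << 20);  // first 1 MiB

        // Dirty a single byte: only its page ever needs writing out.
        static_cast<char*>(region.get_address())[0] = 1;
        return 0;
    }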
Actually, a write-only memmap would not be faster, as the OS can only keep track of changes, and provide those mappings, at whole-page granularity.
At least, not if you want to avoid the prohibitive cost of simulating all access to such pages in kernel-land (which is not implemented) instead of just mapping a page.
Somehow, I doubt that going directly to the OS API instead of through the Boost API could provide any significant speed-up:
the Boost API is a thin wrapper over the OS-specific interface and will be completely inlined, and thus compiled out, by any decent compiler.

Temp file that exists only in RAM?

I'm trying to write an encryption program using the OTP (one-time pad) method. In keeping with the security theory, I need the plaintext documents to be stored only in memory and never written to a physical drive. The tmpnam function appears to be what I need, but from what I can see it creates the file on disk, not in RAM.
Using C++ is there any (platform independent) method that allows a file to exist only in RAM? I would like to avoid using a RAM disk method if possible.
Thanks
Edit:
Thanks, it's more just a learning thing for me. I'm new to encryption and just working through different methods; I don't actually plan on using many of them (especially OTP, due to it doubling the original file size because of the "pad").
If I'm totally honest, I'm a Linux user so ditching Windows wouldn't be too bad, I'm looking into using RAM disks for now as FUSE seems a bit overkill for a "learning" thing.
The simple answer is: no, there is no platform independent way. Even keeping the data only in memory, it will still risk being swapped out to disk by the virtual memory manager.
On Windows, you can use VirtualLock() to force the memory to stay in RAM. You can also use CryptProtectMemory() to prevent other processes from reading it.
On POSIX systems (e.g. BSD, Linux) you can use mlock() to lock memory in RAM.
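For example, a minimal POSIX sketch of pinning a sensitive buffer; note that mlock often fails without privileges or with a low RLIMIT_MEMLOCK:

    #include <sys/mman.h>
    #include <cstring>
    #include <vector>

    int main() {
        std::vector<char> plaintext(4096);

        // Pin the buffer so the kernel won't swap it out (POSIX mlock).
        if (mlock(plaintext.data(), plaintext.size()) != 0)
            return 1;  // often fails without privileges or enough RLIMIT_MEMLOCK

        // ... work with the sensitive data here ...

        // Wipe before unlocking; note a plain memset may be optimized away,
        // so real code would prefer explicit_bzero where available.
        std::memset(plaintext.data(), 0, plaintext.size());
        munlock(plaintext.data(), plaintext.size());
        return 0;
    }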
Not really, unless you count in-memory streams (like stringstream).
No, especially and specifically for security purposes: any piece of data can be swapped to disk on virtual-memory systems.
Generally, if you are concerned about security, you have to use platform-specific methods for controlling access: What good is keeping your data in RAM if everyone can read it?
You might want to look at TrueCrypt's source code. Getting code at the file system level might be your best bet.
OTP is an awful encryption method for arbitrary files, unless you have a massive amount of entropy that you can guarantee never repeats itself (that's why it's called "one-time"!)
If you want to create a file-like object that only exists in memory and you don't care about Windows, I'd look at writing a custom FUSE filesystem (http://fuse.sourceforge.net/); this way you guarantee what will and will not get written to disk, and your files are accessible by all programs.
Using std::stringstream or fmemopen will get you file-like access to blocks of memory. If (for security) you want to avoid it being swapped out, use mlock, which is probably easier to use with fmemopen's buffer than with std::stringstream. Combining mlock with std::stringstream would probably need to be done via a custom allocator (used as a template parameter).
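A rough sketch of the fmemopen variant (fmemopen is POSIX.1-2008, so this won't build on Windows):

    #include <cstdio>
    #include <sys/mman.h>

    int main() {
        static char buf[4096];  // fixed buffer backing the in-memory "file"

        if (mlock(buf, sizeof buf) != 0)  // keep the buffer out of swap
            return 1;

        // fmemopen gives stdio-style FILE* access to the locked buffer.
        FILE* f = fmemopen(buf, sizeof buf, "w+");
        if (!f) return 1;

        std::fputs("plaintext goes here\n", f);
        std::rewind(f);

        char line[64];
        if (std::fgets(line, sizeof line, f))
            std::printf("read back: %s", line);

        std::fclose(f);
        munlock(buf, sizeof buf);
        return 0;
    }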