Linux non-persistent backing store for mmap()


First, a little motivating background info: I've got a C++-based server process that runs on an embedded ARM/Linux-based computer. It works pretty well, but as part of its operation it creates a fairly large fixed-size array (e.g. dozens to hundreds of megabytes) of temporary/non-persistent state information, which it currently keeps on the heap, and it accesses and/or updates that data from time to time.
I'm investigating how far I can scale things up, and one problem I'm running into is that eventually (as I stress-test the server by making its configuration larger and larger), this data structure gets big enough to cause out-of-memory problems, and then the OOM killer shows up, and general unhappiness ensues. Note that this embedded configuration of Linux doesn't have swap enabled, and I can't (easily) enable a swap partition.
One idea I have on how to ameliorate the issue is to allocate this large array on the computer's local flash partition, instead of directly in RAM, and then use mmap() to make it appear to the server process like it's still in RAM. That would reduce RAM usage considerably, and my hope is that Linux's filesystem-cache would mask most of the resulting performance cost.
My only real concern is file management -- in particular, I'd like to avoid any chance of filling up the flash drive with "orphan" backing-store files (i.e. old files whose processes don't exist any longer, but the file is still present because its creating process crashed or by some other mistake forgot to delete it on exit). I'd also like to be able to run multiple instances of the server simultaneously on the same computer, without the instances interfering with each other.
My question is, does Linux have any built-in facility for handling this sort of use-case? I'm particularly imagining some way to flag a file (or an mmap() handle or similar) so that when the process that created the file exits or crashes, the OS automagically deletes the file (similar to the way Linux already automagically recovers all of the RAM that was allocated by a process when the process exits or crashes).
Or, if Linux doesn't have any built-in auto-temp-file-cleanup feature, is there a "best practice" that people use to ensure that large temporary files don't end up filling up a drive due to unintentionally becoming persistent?
Note that AFAICT simply placing the file in /tmp won't help me, since /tmp is using a RAM-disk and therefore doesn't give me any RAM-usage advantage over simply allocating in-process heap storage.

Yes, and I do this all the time...
open the file, unlink it, use ftruncate or (better) posix_fallocate to make it the right size, then use mmap with MAP_SHARED to map it into your address space. You can then close the descriptor immediately if you want; the memory mapping itself will keep the file around.
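A minimal sketch of those steps in C++ on Linux. The use of mkstemp() and the template path are assumptions added here: mkstemp() picks a unique file name, which also keeps several server instances from colliding, and the name is unlinked straight away so nothing is left behind if the process crashes. The function name map_unlinked_backing_store is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Create an unnamed, preallocated backing file of `size` bytes and map it.
// `path_template` must be a writable mkstemp() template ending in "XXXXXX",
// e.g. a hypothetical "/mnt/flash/backing-XXXXXX" on the flash partition.
// Returns nullptr on failure.
void* map_unlinked_backing_store(char* path_template, std::size_t size) {
    int fd = mkstemp(path_template);   // unique name per instance
    if (fd < 0) return nullptr;
    unlink(path_template);             // name is gone; data lives while fd or mapping exist
    if (posix_fallocate(fd, 0, static_cast<off_t>(size)) != 0) {  // reserve blocks up front
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                         // the mapping alone keeps the file alive
    return p == MAP_FAILED ? nullptr : p;
}
```

Release the region with munmap(p, size); once the last mapping is gone, the kernel frees the file's blocks. Putting the template on the flash partition rather than /tmp is what avoids the RAM-disk problem described in the question.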
For speed, you might find you want to help Linux manage its page cache. You can use posix_madvise with POSIX_MADV_WILLNEED to advise the kernel to page data in and POSIX_MADV_DONTNEED to advise the kernel to release the pages.
You might find that last does not work the way you want, especially for dirty pages. You can use sync_file_range to explicitly control flushing to disk. (Although in that case you will want to keep the file descriptor open.)
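The page-cache hints and explicit flushing described above might look roughly like this. It is a sketch that assumes a MAP_SHARED mapping of the whole file starting at offset 0, and the helper names are hypothetical. One likely reason POSIX_MADV_DONTNEED seems not to work: as far as I know, glibc's posix_madvise() on Linux accepts that advice but ignores it, because Linux's native MADV_DONTNEED would discard data rather than write it back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// posix_madvise() needs a page-aligned start address, so widen the range
// [off, off + *len) down to a page boundary. `region` must be page-aligned
// (mmap() results are) and map the file from offset 0.
static char* page_start(void* region, std::size_t off, std::size_t* len) {
    std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::size_t start = off - off % page;
    *len += off - start;
    return static_cast<char*>(region) + start;
}

// Hint that [off, off + len) will be needed soon.
bool prefetch_range(void* region, std::size_t off, std::size_t len) {
    char* base = page_start(region, off, &len);
    return posix_madvise(base, len, POSIX_MADV_WILLNEED) == 0;
}

// Write the range's dirty pages to storage and wait (Linux-only
// sync_file_range(), which needs the descriptor), then hint they can go.
bool flush_and_release_range(int fd, void* region, std::size_t off, std::size_t len) {
    unsigned flags = SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
                     SYNC_FILE_RANGE_WAIT_AFTER;
    if (sync_file_range(fd, static_cast<off_t>(off), static_cast<off_t>(len), flags) != 0)
        return false;
    char* base = page_start(region, off, &len);
    return posix_madvise(base, len, POSIX_MADV_DONTNEED) == 0;
}
```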
All of this is perfectly standard POSIX except for the Linux-specific sync_file_range.

Yes. You create/open the file, then you remove() the file by its filename.
The file will still be open in your process and you can read/write it just like any open file, and it will disappear when the process that has the file open exits.
I believe this behavior is mandated by POSIX, so it will work on any Unix-like system. Even after a hard reboot, the space will be reclaimed.
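A small sketch of the remove-while-open behavior, using the C standard I/O functions plus POSIX fstat(). The function name and the st_nlink check are illustrative additions: after remove(), the open stream keeps working and the file has no directory entries left.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <unistd.h>

// Returns true if, after remove(), data could still be written and read
// back through the open stream and fstat() reported zero remaining links.
bool survives_remove(const char* path) {
    std::FILE* f = std::fopen(path, "w+");
    if (!f) return false;
    bool ok = std::remove(path) == 0;       // the name is gone...
    std::fputs("still here", f);            // ...but the open file still works
    std::rewind(f);
    char buf[16] = {};
    ok = ok && std::fgets(buf, sizeof buf, f) && std::strcmp(buf, "still here") == 0;
    struct stat st;
    ok = ok && fstat(fileno(f), &st) == 0 && st.st_nlink == 0;
    std::fclose(f);                         // last reference dropped: space is reclaimed
    return ok;
}
```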

I believe this is filesystem-specific, but most Linux filesystems allow deletion of open files. The file will still exist until the last handle to it is closed. I would recommend that you open the file then delete it immediately and it will be automatically cleaned up when your process exits for any reason.
For further details, see this post: What happens to an open file handle on Linux if the pointed file gets moved, delete

Related

Memory mapped IO concept details

I'm attempting to figure out what the best way is to write files in Windows. For that, I've been running some tests with memory mapping, in an attempt to figure out what is happening and how I should organize things...
Scenario: The file is intended to be used in a single process, in multiple threads. You should see a thread as a worker that works on the file storage; some of them will read, some will write - and in some cases the file will grow. I want my state to survive both process and OS crashes. Files can be large, say: 1 TB.
After reading a lot on MSDN, I whipped up a small test case. What I basically do is the following:
1. Open a file (CreateFile) using FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH.
2. Build a mmap file handle (CreateFileMapping) on the file, using some file growth mechanism.
3. Map the memory regions (MapViewOfFile) using a multiple of the sector size (from STORAGE_PROPERTY_QUERY). The mode I intend to use is READ+WRITE.
So far I've been unable to figure out how to use these constructs exactly (tools like diskmon won't work, for good reasons), so I decided to ask here. What I basically want to know is: how can I best use these constructs for my scenario?
If I understand correctly, this is more or less the correct approach; however, I'm unsure as to the exact role of CreateFileMapping vs MapViewOfFile and if this will work in multiple threads (e.g. the way writes are ordered when they are flushed to disk).
I intend to open the file once per process as per (1).
Per thread, I intend to create a mmap file handle as per (2) for the entire file. If I need to grow the file, I will estimate how much space I need, close the handle and reopen it using CreateFileMapping.
While the worker is doing its thing, it needs pieces of the file. So, I intend to use MapViewOfFile (which seems limited to 2 GB) for each piece, process it, and unmap it again.
Questions:
Do I understand the concepts correctly?
When is data physically read and written to disk? So, when I have a loop that writes 1 MB of data in (3), will it write that data after the unmap call? Or will it write data the moment I hit memory in another page? (After all, disks are block devices so at some point we have to write a block...)
Will this work in multiple threads? This is about the calls themselves: I'm not sure whether they will fail if you have, say, 100 workers.
I do understand that (written) data is immediately available in other threads (unless it's a remote file), which means I should be careful with read/write concurrency. If I intend to write stuff, and afterwards update a (single-physical-block) header (indicating that readers should use another pointer from now on), then is it guaranteed that the data is written prior to the header?
Will it matter if I use 1 file or multiple files (assuming they're on the same physical device of course)?

Memory mapped files generally work best for READING; not writing. The problem you face is that you have to know the size of the file before you do the mapping.
You say:
in some cases the file will grow
Which really rules out a memory mapped file.
When you create a memory mapped file on Windows, you are creating your own page file and mapping a range of memory to that page file. This tends to be the fastest way to read binary data, especially if the file is contiguous.
For writing, memory mapped files are problematic.

How to flush memory-mapped files using Boost's `mapped_file_sink` class?

Using the Boost Libraries version 1.62.0 and the mapped_file_sink class from Boost.IOStreams.
I want to flush the written data to disk at will, but there is no mapped_file_sink::flush() member function.
My questions are:
How can I flush the written data when using mapped_file_sink?
If the above can't be done, why not, considering that msync() and FlushViewOfFile() are available for a portable implementation?

If you look at the mapped file support for proposed Boost.AFIO v2 at https://ned14.github.io/boost.afio/classboost_1_1afio_1_1v2__xxx_1_1map__handle.html, you'll notice a lack of ability to flush mapped file views as well.
The reason is that it's redundant on modern unified page cache kernels, where the mapped view is identical in every way to the page-cached buffers for that file. msync() with MS_ASYNC is therefore effectively a no-op on such kernels, because dirty pages are already queued for writing out to storage as and when the system decides it is appropriate. You can block your process until the system has finished writing out all the dirty pages for that file using good old fsync().
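A sketch of what this answer recommends on Linux: write through a MAP_SHARED view, then call fsync() on the descriptor to wait for writeback. The function name is hypothetical. On Linux, fsync() also covers pages dirtied through the mapping; strictly portable POSIX code may want to call msync(MS_SYNC) on the mapped range as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Write `len` (> 0) bytes to `path` through a shared mapping, then block in
// fsync() until the file's dirty pages reach storage. Returns false on error.
bool write_mapped_and_fsync(const char* path, const char* data, std::size_t len) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    bool ok = ftruncate(fd, static_cast<off_t>(len)) == 0;  // size must be known before mapping
    void* p = ok ? mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
                 : MAP_FAILED;
    if (p != MAP_FAILED) {
        std::memcpy(p, data, len);   // lands directly in the page cache
        ok = fsync(fd) == 0;         // wait for writeback of this file
        munmap(p, len);
    } else {
        ok = false;
    }
    close(fd);
    return ok;
}
```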
All the above does not apply where (a) your kernel is not a unified page cache design (QNX, NetBSD etc) or (b) your file resides on a networked file system. If you are in an (a) situation, it's best to simply avoid memory mapped i/o altogether and just use read() and write(); such OSs are so small a percentage nowadays that you can let them suffer with poor performance. For the (b) situation, you are strongly advised against ever using memory mapped i/o with networked file systems. There is an argument for read-only maps of immutable files; otherwise, just don't do it unless you know what you're doing. Fall back to read() and write(), it's safer and less likely to surprise.
Finally, you linked to a secure file deletion program. Those programs don't work reliably any more with recent file systems because of delayed extent allocation or copy on write allocation. In other words, when you rewrite a section of an existing file, it doesn't modify the original data on storage but actually allocates new storage and points the file's extent list at the newly allocated storage. This allows a consistent file system to be recovered easily after unexpected data loss. To securely delete data on recent file systems you usually need to use special OS APIs, though deleting all the files and then filling the free space with random data may securely delete most of the data in question most of the time. Note that copy on write filing systems may not release freed extents back to the free space pool for new allocation for many days or weeks, until the next time a garbage collection routine fires or a snapshot is deleted. In this situation, filling free space with randomness will not securely delete the files in question. If all this is a problem, use FAT32 as your filing system: it's very simple, and rewriting data on it really does rewrite the same data on storage (though note that some storage media, e.g. SSDs, are highly likely to also not rewrite data; they too write modifications to new storage and garbage collect freed extents later).

Do pictures ever get stored in RAM?

I am a beginner C++ programmer.
I wrote a simple program that creates a char array (the size is user's choice) and reads what previous information was in it. Often you can find something that makes sense but most of it is just strange characters. I made it output into a binary file.
Why do I often find multiple copies of the alphabet?
Is it possible to find a picture inside of the RAM chunk I retrieved?
I heard about file signatures (headers), which go before any of the data in a file, but do "trailers" go at the end, after all the data?

When you read uninitialized data from memory that you allocated, you'll never see any data from another process. You only ever see data that your own process has written. That is: your code plus all the libraries that you called.
This is a security feature of your kernel: It never leaks information from a process unless it's specifically asked to transfer that information.
If you didn't load a picture in memory, you'll never see one using this method.
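To see this for yourself, you can ask the kernel for fresh anonymous pages with mmap() and check that they read as zero. This is a sketch for Linux and similar systems, and the function name is hypothetical; memory from malloc() or new can still contain leftovers from your own process, as another answer explains.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Map `len` bytes of fresh anonymous memory directly from the kernel and
// report whether every byte is zero.
bool fresh_pages_are_zero(std::size_t len) {
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    bool all_zero = true;
    for (std::size_t i = 0; i < len; ++i)
        if (bytes[i] != 0) { all_zero = false; break; }
    munmap(p, len);
    return all_zero;
}
```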

Assuming your computer runs Linux, Windows, MacOS or something like that, there will NEVER be any pictures in the memory your process uses - unless you loaded them into your process. For security reasons, the memory used by other processes is cleared before it gets given to YOUR process. This is the case for all modern OS's, and has been the case for multi-user OS's (Unix, VAX-VMS, etc) more or less since they were first invented in the late 1950's or early 1960's - because someone figured out that it's kind of unfun when "your" data is found by someone else who is just out there fishing for it.
Even a process that has ended will have its memory cleared - how would you like it if your password were still stored in memory for someone to find after the program that read the password ended? [Programs that hold highly sensitive data, such as encryption keys or passwords, often clear the memory used to store it manually (that is, in their own code, rather than waiting for the OS to clear it when the process ends), because the debug functionality described below allows memory contents to be inspected at any time; the shorter the time such data stays in memory, the less likely a leak of sensitive information.]
Once memory has been allocated to your process, and freed again, it will contain whatever happens to be in that memory, as clearing it takes extra time, and most of the time, you'd want to fill it with something else anyway. So it contains whatever it happens to contain, and if you poke around it, you will potentially "find stuff". But it's all your own process's work.
Most OS's have a way to read what another process is doing as part of the debug functionality (if you run the "debugger" in your system, it will of course run as a separate process, but needs to be able to access your program when you debug it, so there needs to be ways to read the memory of that process), but that requires a little more effort than just calling new or malloc (and you will either need extra permissions (superuser, administrator, etc), or to be the owner of the other process too).
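A sketch of that manual clearing. Writing through a volatile pointer keeps the compiler from optimizing the stores away as dead, which can happen with a plain memset() right before the buffer is freed. The function name wipe is hypothetical; glibc (2.25 and later) and several BSDs also provide explicit_bzero() for this purpose.

```cpp
#include <cassert>
#include <cstddef>

// Overwrite `n` bytes at `p` with zeros in a way the optimizer must keep.
void wipe(void* p, std::size_t n) {
    volatile unsigned char* v = static_cast<volatile unsigned char*>(p);
    while (n--) *v++ = 0;
}
```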
Of course, if your computer is running DOS or CP/M, it has no such security features, and you get whatever happens to be in the memory (and you could also just make up a pointer to an arbitrary address and read it, as long as you stay within the memory range of the system).

FlushFileBuffers() is so slow

We are repeatedly writing (many 1000's of times) to a single large archive file, patching various parts of it. After each write, we were calling FlushFileBuffers(), but have found this is very, very slow. If we wait and only call it every now and then (say every 32ish files), things run better, but I don't think this is the correct way of doing this.
Is there any way to not flush the buffer at all until we complete our last patch? If we take away the call completely, close() does handle the flush, but then it becomes a huge bottleneck in itself. Failing that, having it not lock our other threads when it runs would make it less annoying, as we won't be doing any read I/O on that file outside of the write. It just feels like the disk system is really getting in the way here.
More Info:
Target file is currently 16 GB, but is always changing (usually upwards). We are randomly pinging all over the place in the file for the updates, and it's big enough that we can't cache the whole file. In terms of fragmentation, who knows. This is a large database of assets that gets updated frequently, so quite probably. Not sure how to make it not fragment. Again, open to any suggestions.

If you know the maximum size of the file at the start, then this looks like a classic memory mapped file application.
Edit: (On Windows at least) you can't change the size of a memory mapped file while it's mapped. But you can very quickly extend it between opening the file and creating the mapping: simply call SetFilePointer() with some large value and then SetEndOfFile(). You can similarly shrink it after you close the mapping and before you close the file.
You can map a <4 GB view (or multiple views) into a much larger file, and the filesystem cache tends to be efficient with memory mapped files because it's the same mechanism the OS uses for loading programs, so it is well tuned. You can let the OS manage when an update occurs, or you can force a flush of certain memory ranges.
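This answer is about Windows, where a range is flushed with FlushViewOfFile(). As an assumed POSIX analogue (not part of the original answer), the same pattern is: extend the file with ftruncate() before mapping, then msync() only the range you changed. msync() needs a page-aligned start address, so the hypothetical helper below rounds the offset down.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Synchronously flush [off, off + len) of a shared mapping starting at
// `base` (page-aligned, as returned by mmap()).
bool flush_mapped_range(void* base, std::size_t off, std::size_t len) {
    std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::size_t start = off - off % page;   // round down to a page boundary
    return msync(static_cast<char*>(base) + start, len + (off - start), MS_SYNC) == 0;
}
```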

How to guarantee files that are decrypted during run time are cleaned up?

Using C or C++, after I decrypt a file to disk, how can I guarantee it is deleted if the application crashes or the system powers off and can't clean it up properly? This needs to work on both Windows and Linux.

Unfortunately, there's no 100% foolproof way to ensure that the file will be deleted in case of a full system crash. Think about what happens if the user just pulls the plug while the file is on disk. No amount of exception handling will protect you from that (the worst) case.
The best thing you can do is not write the decrypted file to disk in the first place. If the file exists in both its encrypted and decrypted forms, that's a point of weakness in your security.
The next best thing you can do is use Brian's suggestion of structured exception handling to make sure the temporary file gets cleaned up. This won't protect you from all possibilities, but it will go a long way.
Finally, I suggest that you check for temporary decrypted files on start-up of your application. This will allow you to clean up after your application in case of a complete system crash. It's not ideal to have those files around for any amount of time, but at least this will let you get rid of them as quickly as possible.

Don't write the file decrypted to disk at all.
If the system is powered off, the file is still on disk, and the disk, and therefore the file, can be accessed.
The exception would be the use of an encrypted file system, but this is out of your program's control.

I don't know if this works on Windows, but on Linux, assuming that you only need one process to access the decrypted file, you can open the file, and then call unlink() to delete the file. The file will continue to exist as long as the process keeps it open, but when it is closed, or the process dies, the file will no longer be accessible.
Of course the contents of the file are still on the disk, so really you need more than just deleting it, but zeroing out the contents. Is there any reason that the decrypted file needs to be on disk (size?). Better would just to keep the decrypted version in memory, preferably marked as unswappable, so it never hits the disk.

Try to avoid it completely:
If the file is sensitive, the best bet is to not have it written to disk in a decrypted format in the first place.
Protecting against crashes: Structured exception handling:
However, you could add structured exception handling to catch any crashes.
__try and __except
What if they pull the plug?:
There is a way to protect against this...
If you are on windows, you can use MoveFileEx and the option MOVEFILE_DELAY_UNTIL_REBOOT with a destination of NULL to delete the file on the next startup. This will protect against accidental computer shutdown with an undeleted file. You can also ensure that you have an exclusively opened handle to this file (specify no sharing rights such as FILE_SHARE_READ and use CreateFile to open it). That way no one will be able to read from it.
Other ways to avoid the problem:
All of these are not excuses for having a decrypted file on disk, but:
You could also consider writing to a file whose path is longer than MAX_PATH, using the \\?\ path syntax. This will ensure that the file is not browsable by Windows Explorer.
You should set the file to have the temporary attribute
You should set the file to have the hidden attribute

In C (and so, I assume, in C++ too), as long as your program doesn't crash, you could register an atexit() handler to do the cleanup. Just avoid using _exit() or _Exit() since those bypass the atexit() handlers.
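A sketch of the atexit() approach; the global buffer and the function names are hypothetical. The handler runs on return from main() or a call to exit(), but not on _exit(), _Exit(), abort() or a fatal signal, so it only covers orderly shutdowns.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/wait.h>   // fork()/waitpid() are used only to demonstrate the handler
#include <unistd.h>

// Hypothetical global holding the path to delete at exit
// (atexit() handlers take no arguments).
static char g_temp_path[256];

static void remove_temp_file() { std::remove(g_temp_path); }

// Remember `path` and register the cleanup handler.
// Returns false if atexit() could not register it.
bool register_temp_cleanup(const char* path) {
    std::snprintf(g_temp_path, sizeof g_temp_path, "%s", path);
    return std::atexit(remove_temp_file) == 0;
}
```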
As others pointed out, though, it is better to avoid having the decrypted data written to disk. And simply using unlink() (or equivalent) is not sufficient; you need to rewrite some other data over the original data. And journalled file systems make that very difficult.

A process cannot protect or watch itself. Your only possibility is to start up a second process as a kind of watchdog, which regularly checks the health of the decrypting other process. If the other process crashes, the watchdog will notice and delete the file itself.
You can do that using heartbeats (the watchdog regularly polls the other process to see whether it's still alive), or by having the other process itself send regular signals to the watchdog, so that a missing signal triggers a timeout if it has crashed.
You could use sockets to make the connection between the watchdog and your app work, for example.
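A simplified sketch of the watchdog idea for POSIX systems. Instead of sockets or heartbeats, it assumes the watchdog is the worker's parent and simply blocks in waitpid(), which returns even when the worker is killed by SIGKILL. The names run_with_watchdog and work are hypothetical, and, as other answers note, nothing like this helps if the whole machine loses power.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

// Run `work(path)` in a forked child and delete `path` once the child has
// terminated for any reason, including a crash or SIGKILL.
// Returns the child's raw wait status, or -1 if fork() fails.
int run_with_watchdog(const char* path, void (*work)(const char*)) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {            // child: do the sensitive work
        work(path);
        _exit(0);
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR) {}  // retry if interrupted
    std::remove(path);         // clean up no matter how the child ended
    return status;
}
```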
It's becoming clear that you need some locking mechanism to prevent swapping to the pagefile / swap-partition. On Posix Systems, this can be done by the m(un)lock* family of functions.
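A sketch of using mlock() to keep a buffer holding secrets out of swap, then wiping it before munlock(). The helper names are hypothetical. mlock() can fail (for example when RLIMIT_MEMLOCK is small), so the result must be checked rather than assumed; Linux accepts an unaligned address, while POSIX allows implementations to require page alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Try to pin the pages holding [p, p + n) in RAM. Returns false on failure.
bool lock_secret(void* p, std::size_t n) { return mlock(p, n) == 0; }

// Zero the buffer while it is still resident, then unpin it if it was locked.
void wipe_and_unlock(void* p, std::size_t n, bool was_locked) {
    volatile unsigned char* v = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) v[i] = 0;  // volatile: stores are not elided
    if (was_locked) munlock(p, n);
}
```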

There's a problem with deleting the file. It's not really gone.
When you delete files off your hard drive (not counting the recycle bin) the file isn't really gone. Just the pointer to the file is removed.
Ever see those spy movies where they overwrite the hard drive 6, 8, or 24 times and that's how they know it's clean? Well, they do that for a reason.
I'd make every effort to not store the file's decrypted data. Or if you must, make it small amounts of data. Even disjointed data.
If you must, then a try/catch should protect you a bit. Nothing can protect you from a power outage, though.
Best of luck.

Check out tmpfile().
It is part of the ISO C standard library (declared in <stdio.h>), and POSIX specifies it too.
But it creates a temporary file and automatically unlinks it so that it will be deleted on close.
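A minimal tmpfile() sketch. Per the C standard, the file is opened in "wb+" mode and removed automatically when it is closed or when the program terminates normally. With glibc the file typically lives under /tmp, which (as the first question notes) may be a RAM disk. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Write `text` (shorter than 128 bytes, no newline) to an anonymous
// temporary file and check that it reads back intact.
bool tmpfile_roundtrip(const char* text) {
    std::FILE* f = std::tmpfile();
    if (!f) return false;
    std::fputs(text, f);
    std::rewind(f);
    char buf[128] = {};
    bool ok = std::fgets(buf, sizeof buf, f) != nullptr && std::strcmp(buf, text) == 0;
    std::fclose(f);   // the file disappears here
    return ok;
}
```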
Writing to the file system (even temporarily) is insecure.
Do that only if you really have to.
Optionally you could create an in-memory file system.
Never used one myself so no recommendations but a quick google found a few.

In C++ you should use an RAII tactic:
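#include <cstdio>
#include <fstream>
#include <string>
#include <utility>
class Clean_Up_File {
    std::string filename_;
public:
    explicit Clean_Up_File(std::string filename) : filename_(std::move(filename)) { std::ofstream(filename_).close(); } //create file
    Clean_Up_File(const Clean_Up_File&) = delete;             //no copies: only one owner deletes
    Clean_Up_File& operator=(const Clean_Up_File&) = delete;
    ~Clean_Up_File() { std::remove(filename_.c_str()); }      //delete file
};
int main()
{
    Clean_Up_File file_will_be_deleted_on_program_exit("my_file.txt");
}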
RAII helps automate a lot of cleanup. You simply create an object on the stack, and have that object do clean up at the end of its lifetime (in the destructor which will be called when the object falls out of scope). ScopeGuard even makes it a little easier.
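A minimal scope-guard sketch in the same spirit, written as a C++17 class template rather than using the ScopeGuard library itself; the class name ScopeExit is hypothetical. Like the RAII class above, it does nothing on std::exit() or SIGKILL.

```cpp
#include <cassert>
#include <cstdio>
#include <utility>

// Runs the stored callable when the guard goes out of scope, unless dismissed.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F f) : f_(std::move(f)) {}
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
    ~ScopeExit() { if (active_) f_(); }
    void dismiss() { active_ = false; }   // e.g. keep the file after all
private:
    F f_;
    bool active_ = true;
};
```

Usage: `ScopeExit guard([&] { std::remove(path); });` deletes the file at the end of the enclosing block.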
But, as others have mentioned, this only works in "normal" circumstances. If the user unplugs the computer you can't guarantee that the file will be deleted. And it may be possible to undelete the file (even on UNIX it's possible to "grep the harddrive").
Additionally, as pointed out in the comments, there are some cases where objects don't fall out of scope (for instance, the std::exit(int) function exits the program without leaving the current scope), so RAII doesn't work in those cases. Personally, I never call std::exit(int), and instead I either throw exceptions (which will unwind the stack and call destructors; which I consider an "abnormal exit") or return an error code from main() (which will call destructors and which I also consider an "abnormal exit"). IIRC, sending a SIGKILL also does not call destructors, and SIGKILL can't be caught, so there you're also out of luck.

This is a tricky topic. Generally, you don't want to write decrypted files to disk if you can avoid it. But keeping them in memory doesn't always guarantee that they won't be written to disk as part of a pagefile or otherwise.
I read articles about this a long time ago, and I remember there being some difference between Windows and Linux in that one could guarantee a memory page wouldn't be written to disk and one couldn't; but I don't remember clearly.
If you want to do your due diligence, you can look that topic up and read about it. It all depends on your threat model and what you're willing to protect against. After all, you can use compressed air to chill RAM and pull the encryption key out of that (which was actually on the new Christian Slater spy show, My Own Worst Enemy - which I thought was the best use of cutting edge, accurate, computer security techniques in media yet)

On Linux/Unix, use unlink() as soon as you have created the file. The file will be removed as soon as your program closes the file descriptor or exits.
Better yet, the file will be removed even if the whole system crashes - because it is basically removed as soon as you unlink it.
The data will not be physically deleted from the disk, of course, so it still may be available for hacking.

Remember that the computer could be powered down at any time. Then, somebody you don't like could boot up with a Linux live CD, and examine your disk in any level of detail desired without changing a thing. No system that writes plaintext to the disk can be secure against such attacks, and they aren't hard to do.
You could set up a function that will overwrite the file with ones and zeros repeatedly, preferably injecting some randomness, and set it up to run at end of program, or at exit. This will work, provided there are no hardware or software glitches, power failures, or other interruptions, and provided the file system writes to only the sectors it claims to be using (journalling file systems, for example, may leave parts of the file elsewhere).
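A best-effort sketch of such an overwrite-then-delete function for POSIX systems; the function name is hypothetical. The pass pattern (zeros, ones, zeros) is an assumption, and the randomness the answer suggests is omitted. As the answer warns, journaling or copy-on-write file systems and SSD wear levelling may keep old copies elsewhere, so this is no guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

// Overwrite every byte of `path` in place, fsync() after each pass, then
// unlink() it. Returns true only if every step succeeded.
bool overwrite_and_unlink(const char* path) {
    int fd = open(path, O_WRONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return false; }
    const unsigned char patterns[] = {0x00, 0xFF, 0x00};
    std::vector<unsigned char> buf(4096);
    bool ok = true;
    for (unsigned char pat : patterns) {
        std::fill(buf.begin(), buf.end(), pat);
        for (off_t pos = 0; ok && pos < st.st_size;) {
            std::size_t chunk = static_cast<std::size_t>(
                std::min<off_t>(st.st_size - pos, static_cast<off_t>(buf.size())));
            ssize_t n = pwrite(fd, buf.data(), chunk, pos);
            if (n <= 0) ok = false; else pos += n;
        }
        ok = ok && fsync(fd) == 0;   // push this pass to the device
    }
    close(fd);
    return unlink(path) == 0 && ok;
}
```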
Therefore, if you want security, you need to make sure no plaintext is written out, and that also means it cannot be written to swap space or the equivalent. Find out how to mark memory as unswappable on all platforms you're writing for. Make sure decryption keys and the like are treated the same way as plaintext: never written to the disk under any circumstances, and kept in unswappable memory.
Then, your system should be secure against attacks short of hostiles breaking in, interrupting you, and freezing your RAM chips before powering down, so they don't lose their contents before being transferred for examination. Or authorities demanding your key, legally (check your local laws here) or illegally.
Moral of the story: real security is hard.

The method that I am going to implement will be to stream the decryption, so that the only part that is in memory is the part that is decrypted during the read as the data is being used. Here is a diagram of the pipeline:
This will be a streamed implementation, so the only data that is in memory is the data that I am consuming in the application at any given point. This makes some things tricky, considering a lot of traditional file tricks are no longer available, but since the implementation will be stream based I will still be able to seek to different points of the file, which would be translated to the crypt stream to decrypt at different sections.
Basically, the file will be encrypted a block at a time, so if I try to seek to a certain point it will decrypt that block to read. When I read past a block it decrypts the next block and releases the previous one (within the crypt stream).
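A sketch of the block-based seek logic described above: a position maps to a block index and an offset inside that block, and only that block is decrypted and kept. decrypt_block is a hypothetical callback standing in for the real cipher (no cryptography happens here), and the class name BlockCursor is also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

class BlockCursor {
public:
    using Decryptor = std::function<std::vector<unsigned char>(std::uint64_t block)>;
    BlockCursor(std::size_t block_size, Decryptor decrypt_block)
        : block_size_(block_size), decrypt_(std::move(decrypt_block)) {}

    // Return the plaintext byte at absolute position `pos`, decrypting
    // (and replacing the cached block) only when `pos` is in a new block.
    unsigned char at(std::uint64_t pos) {
        std::uint64_t block = pos / block_size_;
        std::size_t offset = static_cast<std::size_t>(pos % block_size_);
        if (!has_block_ || block != current_) {
            plain_ = decrypt_(block);   // previous block is released here
            current_ = block;
            has_block_ = true;
            ++decrypt_calls_;
        }
        return plain_.at(offset);
    }
    std::size_t decrypt_calls() const { return decrypt_calls_; }

private:
    std::size_t block_size_;
    Decryptor decrypt_;
    std::vector<unsigned char> plain_;
    std::uint64_t current_ = 0;
    bool has_block_ = false;
    std::size_t decrypt_calls_ = 0;
};
```

A real implementation would also wipe the previous plaintext block before discarding it, and could wrap this logic in a std::streambuf so that fstream-style consumers can use it.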
This implementation does not require me to decrypt to a file or to memory and is compatible with other stream consumers and providers (fstream).
This is my 'plan'. I have not done this type of work with fstream before and I will likely be posting a question as soon as I am ready to work on this.
Thanks for all the other answers- it was very informative.
