Cross-platform atomic writes/renames without a transactional FS in C++

Cross-platform atomic writes/renames without a transactional FS in C++ - c++

I'm working on the app that needs to ensure consistency of its data saved to disk. I need to guarantee that the data never gets corrupt when dumped to disk. I.e. a reboot or app shutdown could happen when saving the data.
I know the steps that need to be done:
http://blogs.msdn.com/b/adioltean/archive/2005/12/28/507866.aspx
But I was wondering whether there's already an implementation allowing for this preferably in a cross-platform way? I presume boost::filesystem guarantees atomic rename (on Windows and POSIX), so wondering if I missed this functionality in boost somewhere? Thanks
UPD: I had hopes for boost::interprocess::message_queue but it just hangs on reading the queue if the process is killed in the middle of adding to the queue + memory mapped file takes up maximum size on disk, which is expected to be the worst case anyway.

you can get decrease of performance and/or lose all app data, if you will use renaming. May be store some key information (record ID and fingerprint, for example) after each record, and seek last correct key information when application is starting is better way?

Related

Linux non-persistent backing store for mmap()

First, a little motivating background info: I've got a C++-based server process that runs on an embedded ARM/Linux-based computer. It works pretty well, but as part of its operation it creates a fairly large fixed-size array (e.g. dozens to hundreds of megabytes) of temporary/non-persistent state information, which it currently keeps on the heap, and it accesses and/or updates that data from time to time.
I'm investigating how far I can scale things up, and one problem I'm running into is that eventually (as I stress-test the server by making its configuration larger and larger), this data structure gets big enough to cause out-of-memory problems, and then the OOM killer shows up, and general unhappiness ensues. Note that this embedded configuration of Linux doesn't have swap enabled, and I can't (easily) enable a swap partition.
One idea I have on how to ameliorate the issue is to allocate this large array on the computer's local flash partition, instead of directly in RAM, and then use mmap() to make it appear to the server process like it's still in RAM. That would reduce RAM usage considerably, and my hope is that Linux's filesystem-cache would mask most of the resulting performance cost.
My only real concern is file management -- in particular, I'd like to avoid any chance of filling up the flash drive with "orphan" backing-store files (i.e. old files whose processes don't exist any longer, but the file is still present because its creating process crashed or by some other mistake forgot to delete it on exit). I'd also like to be able to run multiple instances of the server simultaneously on the same computer, without the instances interfering with each other.
My question is, does Linux have any built-it facility for handling this sort of use-case? I'm particularly imagining some way to flag a file (or an mmap() handle or similar) so that when the file that created the process exits-or-crashes, the OS automagically deletes the file (similar to the way Linux already automagically recovers all of the RAM that was allocated by a process, when the process exits-or-crashes).
Or, if Linux doesn't have any built-in auto-temp-file-cleanup feature, is there a "best practice" that people use to ensure that large temporary files don't end up filling up a drive due to unintentionally becoming persistent?
Note that AFAICT simply placing the file in /tmp won't help me, since /tmp is using a RAM-disk and therefore doesn't give me any RAM-usage advantage over simply allocating in-process heap storage.

Yes, and I do this all the time...
open the file, unlink it, use ftruncate or (better) posix_fallocate to make it the right size, then use mmap with MAP_SHARED to map it into your address space. You can then close the descriptor immediately if you want; the memory mapping itself will keep the file around.
For speed, you might find you want to help Linux manage its page cache. You can use posix_madvise with POSIX_MADV_WILLNEED to advise the kernel to page data in and POSIX_MADV_DONTNEED to advise the kernel to release the pages.
You might find that last does not work the way you want, especially for dirty pages. You can use sync_file_range to explicitly control flushing to disk. (Although in that case you will want to keep the file descriptor open.)
All of this is perfectly standard POSIX except for the Linux-specific sync_file_range.

Yes, You create/open the file. Then you remove() the file by its filename.
The file will still be open by your process and you can read/write it just like any opened file, and it will disappear when the process having the file opened exits.
I believe this behavior is mandated by posix, so it will work on any unix like system. Even at a hard reboot, the space will be reclaimed.

I believe this is filesystem-specific, but most Linux filesystems allow deletion of open files. The file will still exist until the last handle to it is closed. I would recommend that you open the file then delete it immediately and it will be automatically cleaned up when your process exits for any reason.
For further details, see this post: What happens to an open file handle on Linux if the pointed file gets moved, delete

hibernation feature for arbitrary code

So this just came to my mind when I was thinking about data serialization and its resemblance to windows hibernation. When you hibernate the system, the OS does not care about individual programs and whether they can serialize/deserialize their data. It just dumps the whole thing to disk and later on you can resume whatever you have been doing.
Here's the question: How does Windows do this without caring about each individual program? Is it possible to somehow emulate this for your code so that you can "dump" it to disk and later on resume it without bothering to write serialization/deserialization methods?

Windows does this by suspending execution of every process and writing out the active (allocated) memory pages to disk. When this memory is later restored and the kernel kick-started, it is able to resume everything where it left off, because from its perspective, the memory hasn't actually changed. It's as though it just froze for a very long period of time.
The only way you could do this with one of your own processes would be to have some other supervisory code running in the kernel -- you'd need a way to get at your process' memory map and preserve it along with the actual memory pages so that all existing pointers in the application's memory remain valid when the pages are restored later. You would also need a way to persist other data (such as any open file descriptors) so that they could be restored as well.
This is not practical for most applications.

Absolute fastest way to store a 32-bit integer to disk?

I have a very latency sensitive routine that generates integers sequentially, but needs to store the last generated one to disk in case of a crash or re-start.
Currently I'm doing a seek to beginning of file then writing out the integer then flush each time a new int is generated. The flush is required so the write at least hits the battery-backed controller cache.
The seek is quite costly so I was thinking about just appending 4 bytes and if recovery is needed then to seek to the end and read the last 4 bytes. This previous statement obviously assumes that there isn't too much other disk activity happening, so the write head should ideally stay at end of the file.
The number won't typically go higher than 10,000,000 so 40MB isn't so bad.
Any advice as to how to achieve minimum latency without sacrificing integrity?
C or C++ on Linux 2.6+

I would think the fastest/easiest way to do this would be with mmap/msync -- mmap 1 page of the file into memory and store the value on that page. Any time the value changes, call msync(2) to force the page back to disk. This way you need only one system call per store

If I read correctly, how about using a memory mapped file? Just write your number to the assigned address and it appears in the file. This makes assumptions that the OS writing the cache to disk robustly when needed, but you might find it worth a try.
int len = sizeof(unsigned);
int fildes = open(...)
void* address = mmap(0, len, PROT_READ, MAP_PRIVATE, fildes, 0)
unsigned* mappedNumber = (unsigned*)(address);
*mappedNumber can now contain your integer.

Measure.
How much control do you have over the hardware? If anything less than full, you'll get no guarantees.
On Linux I'd probably try making a kernel driver that would do its writes with the highest priority, possibly even without using a file system.
But, theoretically... If it is enough for you to hit the controller cache, data will hit it every time you flush anything to disk. This means regardless of whether there will be physical seek inside the drive or not, the data will already be there. And because you'll never know what will other applications do, or how fast does the disk rotate, your seeks will be random even if you keep the logical file handle at the beginning or end of file.
And you can always ask your user to use a flash drive.

The fastest way to write a file is to map that file into memory and treat it as a char array.
You don't need to sync the file if you don't care about OS crashes (Linux never crashed on me in production). All your writes go to that file mapping bypassing the kernel, in other words, real zero-copy (you can't do that with sockets on the standard hardware yet). You may need to keep a header in that file that contains a number of records written in case your application crash during writing a record into the memory. I.e. write a record and only after that increment the record counter.
Resizing this file requires ftruncate()/remap() sequence which may take a bit too long, so you may want to minimize resizing by growing the file by a factor, like std::vector<> grows by 1.5 its size on push_back() when it overflows. Depending on your throughput and latency requirements certain optimization can be applied.
The kernel is going to write the file mapping to disk asynchronously (as if there were another thread in your application dedicated to writing to disk). There is a way to force the writes to disk if necessary by using msync(). This is only necessary, however, if you'd like to survive an OS crash. But surviving an OS crash requires sophisticated application design anyway, so in practice surviving the application crash is good enough.

Why does your application have to wait for the write complete at all?
Write your data asynchronously, or perhaps from another thread.
You don't really have much low-level control over the harddrive. As long as you write so little data at a time, you're going to incur a lot of expensive seeks. But since you're only using it as "checkpoints" to recover from in case of a crash, there seems to be no reason why the write couldn't occur asynchronously.

Storing an int only takes one block on disc, regardless of block size. So you have to sync one block to disc, and it takes as long as it takes, and there is nothing you can do to make it faster.
Whatever else you do, fdatasync() will be the killer, time-wise. It will sync one block into your (battery-backed RAID) controller.
Unless you have some kind of non-volatile ram, all (sensible) methods are going to be exactly equivalent because they all require one block to be sync'd.
Doing a seek system call is not going to make any difference, as that has no effect on hardware. In any case, you can avoid it by using pwrite().

Consider what "appending 4 bytes" means. Disks don't store files, or even bytes. They store clusters, and a fixed number of them. The notion of a file is created by the OS. It allocates some clusters to file system tables, to keep track of where a file is precisely located. Now, appending 4 bytes means at least writing the 4 bytes to a cluster. But that also means determining which cluster. What's the existing file size? Do we need a new cluster? If not, we need to read the last cluster, patch the 4 bytes in the correct position, and write back the cluster, then update the file size in the file system. If we do append a new cluster, we can write the 4 bytes followed by zeroes (don't need old value) but we need to do a whole lot of bookkeeping to add a cluster to a file.
So, the absolute fastest way cannot ever be to append 4 bytes. You must overwrite 4 existing bytes. Preferably in a sector that you already have in memory. Others have already pointed out that you can achieve this with mmap/msync.
Obviously, given current SSD and developer prices, and your 40 MB limit, you'll be using an SSD. It pays for itself if you save an hour. Therefore seek times are irrelevant; SSDs don't have physical heads.

There are a lot of people here talking about mmap() as if that will fix something, but your syscall overhead is basically zero compared to the disk write overhead. Remember that appending or writing to a file requires you to update the inode (mtime, filesize) anyway, so that means a disk seek.
I suggest you consider storing the integer somewhere other than a disk. For example:
write it to some nvram that you control (eg. on an embedded system). (If your RAID controller has nvram for writing, it might do this for you. But if you're asking this question, it probably doesn't.)
write it to free bytes in the system CMOS memory (eg. on PC hardware).
write it to another machine on the network (if it's a fast network) and get them to acknowledge.
redesign your application so you can get away with syncing after every n transactions, instead of after every transaction. That will be about n times faster than doing it every time.
redesign your application so that if the integer is lost, the changes from your most recent transaction are also lost. Then the fact that you've technically lost an integer update doesn't matter; when you reboot, it'll be as if you never incremented it, so you can just resume from there.
You didn't explain why you need this behaviour; to be honest, if your app needs this, it sounds like your application is probably not designed very well. For example, some people suggested using a database because they do this sort of thing all the time; true, but databases do it by being slow (ie. syncing the disk every time), unless you create a transaction first, in which case the disk only needs to get synced when you do 'commit transaction'. But if you absolutely must have a sync after every integer, you'd be constantly committing transactions, and a database couldn't save you from that; there's no magical way a database could guarantee not to lose data unless it does at least fdatasync().

How to guarantee files that are decrypted during run time are cleaned up?

Using C or C++, After I decrypt a file to disk- how can I guarantee it is deleted if the application crashes or the system powers off and can't clean it up properly? Using C or C++, on Windows and Linux?

Unfortunately, there's no 100% foolproof way to insure that the file will be deleted in case of a full system crash. Think about what happens if the user just pulls the plug while the file is on disk. No amount of exception handling will protect you from that (the worst) case.
The best thing you can do is not write the decrypted file to disk in the first place. If the file exists in both its encrypted and decrypted forms, that's a point of weakness in your security.
The next best thing you can do is use Brian's suggestion of structured exception handling to make sure the temporary file gets cleaned up. This won't protect you from all possibilities, but it will go a long way.
Finally, I suggest that you check for temporary decrypted files on start-up of your application. This will allow you to clean up after your application in case of a complete system crash. It's not ideal to have those files around for any amount of time, but at least this will let you get rid of them as quickly as possible.

Don't write the file decrypted to disk at all.
If the system is powerd off the file is still on disk, the disk and therefore the file can be accessed.
Exception would be the use of an encrypted file system, but this is out of control of your program.

I don't know if this works on Windows, but on Linux, assuming that you only need one process to access the decrypted file, you can open the file, and then call unlink() to delete the file. The file will continue to exist as long as the process keeps it open, but when it is closed, or the process dies, the file will no longer be accessible.
Of course the contents of the file are still on the disk, so really you need more than just deleting it, but zeroing out the contents. Is there any reason that the decrypted file needs to be on disk (size?). Better would just to keep the decrypted version in memory, preferably marked as unswappable, so it never hits the disk.

Try to avoid it completely:
If the file is sensitive, the best bet is to not have it written to disk in a decrypted format in the first place.
Protecting against crashes: Structured exception handling:
However, you could add structured exception handling to catch any crashes.
__try and __except
What if they pull the plug?:
There is a way to protect against this...
If you are on windows, you can use MoveFileEx and the option MOVEFILE_DELAY_UNTIL_REBOOT with a destination of NULL to delete the file on the next startup. This will protect against accidental computer shutdown with an undeleted file. You can also ensure that you have an exclusively opened handle to this file (specify no sharing rights such as FILE_SHARE_READ and use CreateFile to open it). That way no one will be able to read from it.
Other ways to avoid the problem:
All of these are not excuses for having a decrypted file on disk, but:
You could also consider writing to a file that is larger than MAX_PATH via file syntax of \\?\. This will ensure that the file is not browsable by windows explorer.
You should set the file to have the temporary attribute
You should set the file to have the hidden attribute

In C (and so, I assume, in C++ too), as long as your program doesn't crash, you could register an atexit() handler to do the cleanup. Just avoid using _exit() or _Exit() since those bypass the atexit() handlers.
As others pointed out, though, it is better to avoid having the decrypted data written to disk. And simply using unlink() (or equivalent) is not sufficient; you need to rewrite some other data over the original data. And journalled file systems make that very difficult.

A process cannot protect or watch itself. Your only possibility is to start up a second process as a kind of watchdog, which regularly checks the health of the decrypting other process. If the other process crashes, the watchdog will notice and delete the file itself.
You can do that using hearth-beats (regular polling of the other process to see whether it's still alive), or using interrupts sent from the other process itself, which will trigger a timeout if it has crashed.
You could use sockets to make the connection between the watchdog and your app work, for example.
It's becoming clear that you need some locking mechanism to prevent swapping to the pagefile / swap-partition. On Posix Systems, this can be done by the m(un)lock* family of functions.

There's a problem with deleting the file. It's not really gone.
When you delete files off your hard drive (not counting the recycle bin) the file isn't really gone. Just the pointer to the file is removed.
Ever see those spy movies where they overwrite the hard drive 6, 8,24 times and that's how they know that it's clean.. Well they do that for a reason.
I'd make every effort to not store the file's decrypted data. Or if you must, make it small amounts of data. Even, disjointed data.
If you must, then they try catch should protect you a bit.. Nothing can protect from the power outage though.
Best of luck.

Check out tmpfile().
It is part of BSD UNIX not sure if it is standard.
But it creates a temporary file and automatically unlinks it so that it will be deleted on close.
Writing to the file system (even temporarily) is insecure.
Do that only if you really have to.
Optionally you could create an in-memory file system.
Never used one myself so no recommendations but a quick google found a few.

In C++ you should use an RAII tactic:
class Clean_Up_File {
std::string filename_;
public Clean_Up_File(std::string filename) { ... } //open/create file
public ~Clean_Up_File() { ... } //delete file
}
int main()
{
Clean_Up_File file_will_be_deleted_on_program_exit("my_file.txt");
}
RAII helps automate a lot of cleanup. You simply create an object on the stack, and have that object do clean up at the end of its lifetime (in the destructor which will be called when the object falls out of scope). ScopeGuard even makes it a little easier.
But, as others have mentioned, this only works in "normal" circumstances. If the user unplugs the computer you can't guarantee that the file will be deleted. And it may be possible to undelete the file (even on UNIX it's possible to "grep the harddrive").
Additionally, as pointed out in the comments, there are some cases where objects don't fall out of scope (for instance, the std::exit(int) function exits the program without leaving the current scope), so RAII doesn't work in those cases. Personally, I never call std::exit(int), and instead I either throw exceptions (which will unwind the stack and call destructors; which I consider an "abnormal exit") or return an error code from main() (which will call destructors and which I also consider an "abnormal exit"). IIRC, sending a SIGKILL also does not call destructors, and SIGKILL can't be caught, so there you're also out of luck.

This is a tricky topic. Generally, you don't want to write decrypted files to disk if you can avoid it. But keeping them in memory doesn't always guarentee that they won't be written to disk as part of a pagefile or otherwise.
I read articles about this a long time ago, and I remember there being some difference between Windows and Linux in that one could guarentee a memory page wouldn't be written to disk and one couldn't; but I don't remember clearly.
If you want to do your due diligence, you can look that topic up and read about it. It all depends on your threat model and what you're willing to protect against. After all, you can use compressed air to chill RAM and pull the encryption key out of that (which was actually on the new Christian Slater spy show, My Own Worst Enemy - which I thought was the best use of cutting edge, accurate, computer security techniques in media yet)

on Linux/Unix, use unlink as soon as you created the file. The file will be removed as soon as you program closes the file descriptor or exits.
Better yet, the file will be removed even if the whole system crashes - because it is basically removed as soon as you unlink it.
The data will not be physically deleted from the disk, of course, so it still may be available for hacking.

Remember that the computer could be powered down at any time. Then, somebody you don't like could boot up with a Linux live CD, and examine your disk in any level of detail desired without changing a thing. No system that writes plaintext to the disk can be secure against such attacks, and they aren't hard to do.
You could set up a function that will overwrite the file with ones and zeros repeatedly, preferably injecting some randomness, and set it up to run at end of program, or at exit. This will work, provided there are no hardware or software glitches, power failures, or other interruptions, and provided the file system writes to only the sectors it claims to be using (journalling file systems, for example, may leave parts of the file elsewhere).
Therefore, if you want security, you need to make sure no plaintext is written out, and that also means it cannot be written to swap space or the equivalent. Find out how to mark memory as unswappable on all platforms you're writing for. Make sure decryption keys and the like are treated the same way as plaintext: never written to the disk under any circumstances, and kept in unswappable memory.
Then, your system should be secure against attacks short of hostiles breaking in, interrupting you, and freezing your RAM chips before powering down, so they don't lose their contents before being transferred for examination. Or authorities demanding your key, legally (check your local laws here) or illegally.
Moral of the story: real security is hard.

The method that I am going to implement will be to stream the decryption- so that the only part that is in memory is the part that is decrypted during the read as the data is being used. Here is a diagram of the pipeline:
This will be a streamed implementation, so the only data that is in memory is the data that I am consuming in the application at any given point. This makes some things tricky- considering a lot of traditional file tricks are no longer available, but since the implementation will be stream based i will still be able to seek to different points of the file which would be translated to the crypt stream to decrypt at different sections.
Basically, it will be encrypting blocks of the file at a time - so then if I try to seek to a certain point it will decrypt that block to read. When I read past a block it decrypts the next block and releases the previous (within the crypt stream).
This implementation does not require me to decrypt to a file or to memory and is compatible with other stream consumers and providers (fstream).
This is my 'plan'. I have not done this type of work with fstream before and I will likely be posting a question as soon as I am ready to work on this.
Thanks for all the other answers- it was very informative.

Fastest small datastore on Windows

My app keeps track of the state of about 1000 objects. Those objects are read from and written to a persistent store (serialized) in no particular order.
Right now the app uses the registry to store each object's state. This is nice because:
It is simple
It is very fast
Individual object's state can be read/written without needing to read some larger entity (like pulling out a snippet from a large XML file)
There is a decent editor (RegEdit) which allow easily manipulating individual items
Having said that, I'm wondering if there is a better way. SQLite seems like a possibility, but you don't have the same level of multiple-reader/multiple-writer that you get with the registry, and no simple way to edit existing entries.
Any better suggestions? A bunch of flat files?

If what you mean by 'multiple-reader/multiple-writer' is that you keep a lot of threads writing to the store concurrently, SQLite is threadsafe (you can have concurrent SELECTs and concurrent writes are handled transparently). See the [FAQ [1]] and grep for 'threadsafe'
[1]: http://www.sqlite.org/faq.html/ FAQ

If you do begin to experiment with SQLite, you should know that "out of the box" it might not seem as fast as you would like, but it can quickly be made to be much faster by applying some established optimization tips:
SQLite optimization
Depending on the size of the data and the amount of RAM available, one of the best performance gains will occur by setting sqlite to use an all-in-memory database rather than writing to disk.
For in-memory databases, pass NULL as the filename argument to sqlite3_open and make sure that TEMP_STORE is defined appropriately
On the other hand, if you tell sqlite to use the harddisk, then you will get a similar benefit to your current usage of RegEdit to manipulate the program's data "on the fly."
The way you could simulate your current RegEdit technique with sqlite would be to use the sqlite command-line tool to connect to the on-disk database. You can run UPDATE statements on the sql data from the command-line while your main program is running (and/or while it is paused in break mode).

I doubt any sane person would go this route these days, however some of what you describe could be done with Window's Structured/Compound Storage. I only mention this since you're asking about Windows - and this is/was an official Windows way to do this.
This is how DOC files were put together (but not the new DOCX format). From MSDN it'll appear really complicated, but I've used it, it isn't the worst API in Win32.
it is not simple
it is fast, I would guess it's faster then the registry.
Individual object's state can be read/written without needing to read some larger entity.
There is no decent editor, however there are some real basic stuff (VC++ 6.0 had the "DocFile Viewer" under Tools. (yeah, that's what that thing did) I found a few more online.
You get a file instead of registry keys.
You gain some old-school Windows developer geek-cred.
Other random thoughts:
I think XML is the way to go (despite the random access issue). Heck, INI files may work. The registry gives you very fine grain security if you need it - people seem to forget this when the claim using files are better. An embedded DB seems like overkill if I'm understanding what you're doing.

Do you need to persist the objects on each change event or just in memory and store on shutdown? If so, just load them up and serialize them at the end, assuming your app runs for a long time (and you don't share that state with another program) then in memory is going to be a winner.
If you've got fixed size structures then you could consider just using a memory mapped file and allocate memory from that?

If the only thing you do is serialize/deserialize individual objects (no fancy queries), then use a btree database, for example Berkeley DB. It is very fast at storing and retrieving chunks of data by key (I assume your objects have some id that can be used as a key) and access by multiple processes is supported.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js