I'm trying to figure out the best way to write files on Windows. To that end, I've been running some tests with memory mapping, trying to understand what actually happens and how I should organize things...
Scenario: The file is intended to be used in a single process, in multiple threads. You should see a thread as a worker that works on the file storage; some of them will read, some will write - and in some cases the file will grow. I want my state to survive both process and OS crashes. Files can be large, say: 1 TB.
After reading a lot on MSDN, I whipped up a small test case. What I basically do is the following:
1. Open a file (CreateFile) using FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH.
2. Build a mmap file handle (CreateFileMapping) on the file, using some file growth mechanism.
3. Map the memory regions (MapViewOfFile) using a multiple of the sector size (from STORAGE_PROPERTY_QUERY). The mode I intend to use is READ+WRITE.
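To make the plan concrete, here is a rough sketch of steps (1)-(3) (Win32, error handling omitted; the file name, the mapping size, the offset and the view size are just placeholders):

#include <windows.h>

// (1) Open the file unbuffered and write-through.
HANDLE file = CreateFileW(L"data.bin", GENERIC_READ | GENERIC_WRITE, 0, NULL,
                          OPEN_ALWAYS,
                          FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH, NULL);

// (2) One mapping object; asking for a maximum size larger than the file grows the file.
LARGE_INTEGER maxSize;
maxSize.QuadPart = 1024ll * 1024 * 1024;                       // placeholder grow-to size
HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READWRITE,
                                    (DWORD)maxSize.HighPart, maxSize.LowPart, NULL);

// (3) Map only the piece a worker needs; the offset must be a multiple of the
// system allocation granularity (64 KB), the view size a multiple of the sector size.
ULONGLONG offset = 0;
SIZE_T viewSize = 64 * 1024;
char *view = (char *)MapViewOfFile(mapping, FILE_MAP_READ | FILE_MAP_WRITE,
                                   (DWORD)(offset >> 32), (DWORD)(offset & 0xFFFFFFFF),
                                   viewSize);

// ... the worker reads/writes through view[0 .. viewSize-1] ...

UnmapViewOfFile(view);
CloseHandle(mapping);
CloseHandle(file);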
So far I've been unable to figure out exactly how to use these constructs (tools like diskmon won't work, for good reasons), so I decided to ask here. What I basically want to know is: how can I best use these constructs for my scenario?
If I understand correctly, this is more or less the correct approach; however, I'm unsure as to the exact role of CreateFileMapping vs MapViewOfFile and if this will work in multiple threads (e.g. the way writes are ordered when they are flushed to disk).
I intend to open the file once per process as per (1).
Per thread, I intend to create a mmap file handle as per (2) for the entire file. If I need to grow the file, I will estimate how much space I need, close the handle and reopen it using CreateFileMapping.
While the worker is doing its thing, it needs pieces of the file. So, I intend to use MapViewOfFile (which seems limited to 2 GB) for each piece, process it and unmap it again.
Questions:
Do I understand the concepts correctly?
When is data physically read and written to disk? So, when I have a loop that writes 1 MB of data in (3), will it write that data after the unmap call? Or will it write data the moment I hit memory in another page? (After all, disks are block devices so at some point we have to write a block...)
Will this work with multiple threads? This is about the calls themselves - I'm not sure whether they will fail if you have, say, 100 workers.
I do understand that (written) data is immediately available in other threads (unless it's a remote file), which means I should be careful with read/write concurrency. If I intend to write stuff, and afterwards update a (single-physical-block) header (indicating that readers should use another pointer from now on) - is it then guaranteed that the data is written prior to the header?
Will it matter if I use 1 file or multiple files (assuming they're on the same physical device of course)?
Memory mapped files generally work best for reading, not writing. The problem you face is that you have to know the size of the file before you do the mapping.
You say:
in some cases the file will grow
Which really rules out a memory mapped file.
When you create a memory mapped file on Windows, the file itself becomes the backing store for the mapped range of memory, in effect your own private page file. This tends to be the fastest way to read binary data, especially if the file is contiguous.
For writing, memory mapped files are problematic.
I need to learn how to update a file concurrently without blocking other threads. Let me explain how it should work, what the requirements are, and how I think it should be implemented; then I'll ask my questions:
Here is how the worker works:
Worker is multithreaded.
There is one very large file (6 Terabyte).
Each thread is updating part of this file.
Each write is equal to one or more disk blocks (4096 bytes).
No two workers write to the same block (or the same group of blocks) at the same time.
Needs:
Threads should not block each other (no lock on the file, or the minimum possible number of locks should be used).
In case of (any kind of) failure, it is acceptable if the block being updated gets corrupted.
In case of (any kind of) failure, blocks that are not being updated must not get corrupted.
If a file write was successful, we must be sure that it is not just buffered but has actually been written to disk (fsync).
I can split this large file into as many smaller files as needed (down to 4 KB files), but I prefer not to do that. Handling that many files is difficult and requires a lot of file-handle open/close operations, which has a negative impact on performance.
How I think it should be implemented:
I'm not very familiar with file manipulation and how it works at the operating-system level, but I think writing to a single block should not corrupt other blocks when errors happen. So I think this code should work as needed, without any change:
/* Write one 4096-byte block at a given block index. */
#include <stdio.h>
#include <unistd.h>   /* fsync, fileno */

char write_value[4096] = "...4096 bytes of data...";
int write_block = 12345;
int block_size = 4096;

FILE *fp = fopen("file.txt", "r+b");             /* "w+" would truncate the existing file */
fseeko(fp, (off_t)write_block * block_size, SEEK_SET);
fwrite(write_value, 1, block_size, fp);
fsync(fileno(fp));                               /* fsync takes a file descriptor, not a FILE* */
fclose(fp);
Questions:
Obviously, I'm trying to understand how it should be implemented, so any suggestions are welcome. Especially:
If writing to one block of a large file fails, what is the chance of corrupting other blocks of data?
In short, what should be considered to perfect the code above (with respect to the previous question)?
Is it possible to replace one block of data with another file/block atomically? (like how rename() system call replaces one file with another atomically, but in block-level. Something like replacing next-block-address of previous block in file system or whatever else).
Any device/file-system/operating-system specific notes? (This code will run on CentOS or FreeBSD (not decided yet), but I can change the OS if there is a better alternative for this problem. The file is on one 8 TB SSD.)
Threads should not block each other (no lock on the file, or the minimum possible number of locks should be used).
Your code sample uses fseek followed by fwrite. Without locking in-between those two, you have a race condition because another thread could jump in-between. There are three reasonable solutions:
Use flockfile, followed by regular fseek and fwrite (or the GNU fwrite_unlocked), then funlockfile. flockfile and funlockfile are standard since POSIX.1-2001; fwrite_unlocked itself is a GNU extension.
Use separate file handles per thread
Use pread and pwrite to do IO without having to worry about the seek position
Option 3 is the best for you.
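A minimal sketch of option 3, assuming plain POSIX open/pwrite (the file name and the block index are placeholders; error handling omitted):

/* Positional IO: no shared seek pointer, so nothing for threads to race on. */
#include <fcntl.h>
#include <unistd.h>

int fd = open("file.bin", O_RDWR);               /* one descriptor shared by all threads */
char block[4096];
/* ... fill block ... */
off_t offset = (off_t)12345 * 4096;              /* block index * block size */
ssize_t written = pwrite(fd, block, sizeof block, offset);
if (written == (ssize_t)sizeof block)
    fsync(fd);                                   /* force it out of the kernel cache */
close(fd);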
You could also use the asynchronous IO from <aio.h> to handle the multithreading. It basically works with a thread-pool calling pwrite on most Unix implementations.
In case of (any kind of) failure, it is acceptable if the block being updated gets corrupted.
I understand this to mean that there should be no file corruption in any failure state. To the best of my knowledge, that is not possible when you overwrite data. When the system fails in the middle of a write command, there is no way to guarantee how many bytes were written, at least not in a file-system-agnostic way.
What you can do instead is similar to a database transaction: You write the new content to a new location in the file. Then you do an fsync to ensure it is on disk. Then you overwrite a header to point to the new location. If you crash before the header is written, your crash recovery will see the old content. If the header gets written, you see the new content. However, I'm not an expert in this field. That final header update is a bit of a hand-wave.
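A rough sketch of that transaction idea using pwrite and fsync; the offsets and the one-word header layout are made up purely for illustration:

/* Write new content to an unused location, make it durable, then flip the header. */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int fd = open("file.bin", O_RDWR);
char new_block[4096];
/* ... fill new_block ... */
off_t new_offset = (off_t)99999 * 4096;          /* a currently unused block (placeholder) */

pwrite(fd, new_block, sizeof new_block, new_offset);
fsync(fd);                                       /* new content is on disk before... */

uint64_t header = (uint64_t)new_offset;          /* ...the header starts pointing at it */
pwrite(fd, &header, sizeof header, 0);           /* small, single-sector header update */
fsync(fd);
close(fd);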
In case of (any kind of) failure, blocks that are not being updated must not get corrupted.
Should be fine
If a file write was successful, we must be sure that it is not just buffered but has actually been written to disk (fsync).
Your sample code calls fsync but forgets fflush before it. Alternatively, you can set the stream to unbuffered using setvbuf.
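For example:

fflush(fp);           /* push stdio's user-space buffer into the kernel */
fsync(fileno(fp));    /* then ask the kernel to push it to the device */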
I can split this large file into as many smaller files as needed (down to 4 KB files), but I prefer not to do that. Handling that many files is difficult and requires a lot of file-handle open/close operations, which has a negative impact on performance.
Many calls to fsync will kill your performance anyway. Short of reimplementing database transactions, splitting the data into smaller files and replacing each one atomically seems to be your best bet for maximum crash recovery. The pattern is well documented and understood:
Create a new temporary file on the same file system as the data you want to overwrite
Read-Copy-Update the old content to the new temporary file
Call fsync
Rename the new file to the old file
Renaming within a single file system is atomic. Therefore this procedure ensures that after a crash you either see the old data or the new data.
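A sketch of that pattern in POSIX terms (file names are placeholders; error handling omitted):

/* Write a full replacement copy, make it durable, then atomically swap it in. */
#include <stdio.h>
#include <unistd.h>

FILE *tmp = fopen("data.bin.tmp", "wb");         /* same file system as data.bin */
/* ... write the updated content into tmp ... */
fflush(tmp);
fsync(fileno(tmp));                              /* content is on disk */
fclose(tmp);
rename("data.bin.tmp", "data.bin");              /* atomic on a single file system */
/* optionally fsync the containing directory so the rename itself is durable */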
If writing to one block of a large file fails, what is the chance of corrupting other blocks of data?
None.
Is it possible to replace one block of data with another file/block atomically? (like how rename() system call replaces one file with another atomically, but in block-level. Something like replacing next-block-address of previous block in file system or whatever else).
No.
I'm working on a new project (a game engine for self education) and trying to create a logging system. I want the logger to help with debugging as much as possible, so I plan on using it a lot to write to a log file. The only issue is that I'm worried doing file I/O will slow down the game loop which needs to operate within a time bound. What is the best way I can write to a file with minimal risk of slowing down the important section?
I have thought about using threads, but I'm worried that the overhead of context switches due to the process scheduler may be even more of an impediment to performance.
I have considered writing to a buffer and occasionally doing a large dump to the file, but I have read that this can potentially be even slower than regular file writing if the buffer becomes too big. Is it feasible to keep the whole buffer in memory and only write all the contents to the file at once at the end of the program?
I have read a little about using a memory mapped file, but I've also read that it requires the Boost library to be done effectively. I'd like to minimize the dependencies, so ideally I wouldn't use Boost. I'm also not entirely sure that my concept of memory mapped files is correct. From what I understand, it behaves as if you are simply writing to memory, but eventually the memory contents are written to the file. Is this understanding correct?
Thanks for reading all of this :)
TL;DR - How can I implement a logging system that minimizes the performance decrease of my program?
If you decide to keep everything in memory and only write the whole log to the file at the end, then any application crash will wipe out all of the debug data.
About the memory mapped file, you are right. But you have to consider when the in-memory pages will actually be written to disk.
You can also use IPC and separate the logger process from the main process, with the two processes communicating via a queue: the main process puts messages into the queue, and the logger process takes them out and writes them to the file.
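The same producer/consumer idea also works in-process: a background thread owns the file and drains a queue, so the game loop only pays for an enqueue. This is just a minimal sketch of that in-process variant, not a drop-in logger:

// Game threads enqueue strings; one background thread writes them to the file.
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncLogger {
public:
    explicit AsyncLogger(const std::string &path)
        : out_(path), worker_(&AsyncLogger::run, this) {}
    ~AsyncLogger() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();                          // drains remaining messages on shutdown
    }
    void log(std::string msg) {                  // cheap: enqueue + notify
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(msg)); }
        cv_.notify_one();
    }
private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        while (!done_ || !q_.empty()) {
            cv_.wait(lk, [this] { return done_ || !q_.empty(); });
            while (!q_.empty()) {
                std::string msg = std::move(q_.front());
                q_.pop();
                lk.unlock();                     // do the slow file IO without the lock
                out_ << msg << '\n';
                lk.lock();
            }
        }
    }
    std::ofstream out_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool done_ = false;
    std::thread worker_;
};

Whether this beats buffering-and-dumping depends on your message rate, so measure it against the alternatives.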
I am trying to use Memory Mapped File (MMF) to read my .csv data file (very large and time consuming).
I've heard that MMF is very fast since it caches the content of the file, so users can access the content on disk as if it were in memory.
May I know if MMF is any faster than using other reading methods?
If this is true, can anyone show me a simple example how to read a file from disk?
Many thanks in advance.
May I know if MMF is any faster than using other reading methods?
If you're reading the entire file sequentially in one pass, then a memory-mapped file is probably approximately the same as using conventional file I/O.
can anyone show me a simple example how to read a file from disk?
Memory mapped files are typically an operating system feature, so you'd have to tell us which platform you're on to get an example of using it.
If you want to read a file sequentially, you can use the C++ ifstream class or the C run-time functions like fopen, fread, and fclose.
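For example, a plain sequential read with ifstream (the file name is a placeholder):

// Read the file line by line with the C++ standard library.
#include <fstream>
#include <string>

std::ifstream in("data.csv");
std::string line;
while (std::getline(in, line)) {
    // ... parse one CSV line ...
}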
Whether it's faster or not depends on many different factors (such as what data you are accessing, how you are accessing it, etc.). To determine what is right for YOUR case, you need to benchmark different solutions and see what is best in your case.
The main benefit of memory mapped files is that the data can be copied directly from the filesystem into the user-accessible memory.
In traditional kinds of file reading (fstream::read(), fread(), etc.), the content of the file is read into a temporary buffer in the OS, then (part of) that buffer is copied to the user-supplied buffer. This is because the OS can't rely on the user's memory being there, and it gets pretty messy pretty quickly. For memory mapped files, the OS knows directly where the memory for the different sections of the file is (because it's the OS's task to assign that memory and keep track of where it is!), so it can copy the data straight in.
However, I strongly suspect that the method of reading the file is a minor part, and the actual interpretation/parsing/copying out of the file may well be a large part. [Speculation, we haven't seen your code, of course]. And of course, the I/O speed available from the DISK itself may play a large factor if the file is very large.
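For illustration only, assuming a POSIX system (on Windows the equivalents are CreateFile/CreateFileMapping/MapViewOfFile), a whole-file read-only mapping looks roughly like this, error handling omitted:

// Map the whole file read-only and parse it straight out of the mapping.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int fd = open("data.csv", O_RDONLY);
struct stat st;
fstat(fd, &st);
const char *data = (const char *)mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
// ... scan data[0 .. st.st_size - 1] as if it were an in-memory buffer ...
munmap((void *)data, (size_t)st.st_size);
close(fd);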
I have a multithreaded program that reads and writes files. One thread receives data and writes it to a file. Every 250 MB of data, a new file is created. Multiple other threads can read from these files to retrieve data. I'm using C++ std file streams.
To prevent problems, my current implementation uses two file descriptors for the same file: one for readers and one for the writer. A mutex protects from multiple access at the same time, and the file descriptor position is moved each time the mutex owner needs it.
I really need to be able to read in the file as fast as possible, and the mutex doesn't really help me.
Firstly, I would like to know if it's safe to read and write the file at the same time, or to have multiple reads at the same time (on every platform).
Secondly, if yes, I would like to know whether it is safe for the hardware, e.g. the read/write head of an HDD. The software works the disk all the time to save data, and I don't want my algorithm to shorten the hard disk's (already short) lifetime too much.
Thank you for your help
There is no problem regarding multiple threads reading the same file.
Now, if I understood your description correctly, you do not modify already-written data; you just continuously append data to your file until it reaches 250 MB, then you continue writing to a new file.
If this is the case, you may not need a mutex at all. For instance, you might be able to keep the whole "file" in memory until it reaches 250 MB and only then write it all to disk; then you know that any files already on disk aren't going to be written anymore and can be read freely with no worries. As for the file that is still being written, you can have a global integer that holds how many bytes (or strings, or whatever you use) have already been written, and reader threads are limited by this integer. It does not need a lock, as long as you only update the integer after you have already written the data (since you said there is only one thread writing data).
Simply reading the integer cannot corrupt it, even when it is read by multiple threads and written by a single one (in C++ you would make it a std::atomic so the cross-thread visibility is well-defined). This ensures your reader threads will not read beyond the limit, and the limit is always safe and consistent, while the writer thread can peacefully write data in an area that is guaranteed not to be touched by reader threads until it is finished.
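A sketch of that scheme; the names are illustrative, and std::atomic is used so the publication of the limit is well-defined:

// One writer appends into the in-memory "current file" and then publishes the
// new readable size; readers never look past the published value.
#include <atomic>
#include <cstddef>
#include <cstring>
#include <vector>

std::vector<char> buffer(250u * 1024 * 1024);      // the file currently being built
std::atomic<std::size_t> readable_bytes{0};        // published high-water mark

// Single writer thread: copy the data in first, then publish the new limit.
void append(const char *chunk, std::size_t n) {
    std::memcpy(buffer.data() + readable_bytes.load(std::memory_order_relaxed), chunk, n);
    readable_bytes.fetch_add(n, std::memory_order_release);
}

// Any number of reader threads: it is safe to read buffer[0 .. limit - 1].
std::size_t readable_limit() {
    return readable_bytes.load(std::memory_order_acquire);
}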
As for your second question: if you are indeed able to keep the currently-being-written file fully in memory, that will already save some HDD usage, as well as time. Additionally, keep in mind that most modern HDDs have 32 MB+ of cache, so it is not as if every read and write hits the HDD itself, unless you have a ton of threads reading random files and random parts of them all the time. If that is the case, there is probably not much you can do to help the HDD. And if that's not the case, there is not much to worry about, as the OS and the caches will do what they were meant to do :)
We are repeatedly writing (many thousands of times) to a single large archive file, patching various parts of it. After each write, we were calling FlushFileBuffers(), but have found this is very, very slow. If we wait and only call it every now and then (say, every 32-ish writes), things run better, but I don't think this is the correct way of doing this.
Is there any way to not flush the buffer at all until we complete our last patch? If we take away the call completely, close() does handle the flush, but then it becomes a huge bottleneck in itself. Failing that, having it not lock our other threads when it runs would make it less annoying, as we won't be doing any read IO on that file outside of the writes. It just feels like the disk system is really getting in the way here.
More Info:
Target file is currently 16Gigs, but is always changing (usually upwards). We are randomly pinging all over the place in the file for the updates, and it's big enough that we can't cache the whole file. In terms of fragmentation, who knows. This is a large database of assets that gets updated frequently, so quite probably. Not sure of how to make it not fragment. Again, open to any suggestions.
If you know the maximum size of the file at the start, then this looks like a classic memory-mapped-file application.
Edit: (On Windows at least) you can't change the size of a memory mapped file while it's mapped. But you can very quickly expand it between opening the file and opening the mapping: simply SetFilePointer() to some large value and then SetEndOfFile(). You can similarly shrink it after you close the mapping and before you close the file.
You can map a <4 GB view (or multiple views) into a much larger file, and the file-system cache tends to be efficient with memory mapped files because it's the same mechanism the OS uses for loading programs, so it is well tuned. You can let the OS manage when an update occurs, or you can force a flush of certain memory ranges.
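A rough sketch of that expand-then-map dance plus a ranged flush (file name, sizes and offsets are placeholders; SetFilePointerEx is used here for the 64-bit offset):

// Grow the file while no mapping exists, then map a view into the larger file.
#include <windows.h>

HANDLE file = CreateFileW(L"archive.bin", GENERIC_READ | GENERIC_WRITE, 0,
                          NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

LARGE_INTEGER newSize;
newSize.QuadPart = 32ll * 1024 * 1024 * 1024;        // e.g. grow to 32 GB (placeholder)
SetFilePointerEx(file, newSize, NULL, FILE_BEGIN);
SetEndOfFile(file);                                  // must happen before the mapping exists

HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READWRITE, 0, 0, NULL); // size = file size
char *view = (char *)MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 64 * 1024); // one small view

// ... patch bytes through view[0 .. 64*1024 - 1] ...
FlushViewOfFile(view, 64 * 1024);                    // force just this range to disk

UnmapViewOfFile(view);
CloseHandle(mapping);
CloseHandle(file);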