How to asynchronously flush a memory mapped file? - c++

I am using memory mapped files to have read-/write-access to a large number of image files (~10000 x 16 MB) under Windows 7 64bit. My goals are:
Having as much data cached as possible.
Being able to allocate new images and write to those as fast as possible.
Therefore I am using memory mapped files to access the files. Caching works well, but the OS is not flushing dirty pages until I am nearly out of physical memory. Because of that allocating and writing to new files is quite slow once the physical memory is filled.
One solution would be to regularly use FlushViewOfFile(), but this function does not return until the data has been writen to disk.
Is there a way to asynchroniously flush a file mapping? The only solution I found is to Unmap() and MapViewOfFile() again, but using this approach I can not be sure to get the same data pointer again. Can someone suggest a better approach?
Edit:
Reading the WINAPI documentation a little longer, it seems that I found a suitable solution to my problem:
Calling VirtualUnlock() on a memory range that is not locked results in a flushing of dirty pages.

I heard that FlushViewOfFile() function does NOT wait until it physically write to file.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366563(v=vs.85).aspx
The FlushViewOfFile function does not flush the file metadata, and it does not wait to return until the changes are flushed from the underlying hardware disk cache and physically written to disk.
After call "FlushFileBuffers( ... )" then your data will be physically written to disk.

Related

Memory mapped IO concept details

I'm attempting to figure out what the best way is to write files in Windows. For that, I've been running some tests with memory mapping, in an attempt to figure out what is happening and how I should organize things...
Scenario: The file is intended to be used in a single process, in multiple threads. You should see a thread as a worker that works on the file storage; some of them will read, some will write - and in some cases the file will grow. I want my state to survive both process and OS crashes. Files can be large, say: 1 TB.
After reading a lot on MSDN, I whipped up a small test case. What I basically do is the following:
Open a file (CreateFile) using FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH.
Build a mmap file handle (CreateFileMapping) on the file, using some file growth mechanism.
Map the memory regions (MapViewOfFile) using a multiple of the sector size (from STORAGE_PROPERTY_QUERY). The mode I intend to use is READ+WRITE.
So far I've been unable to figure out how to use these construct exactly (tools like diskmon won't work for good reasons) so I decided to ask here. What I basically want to know is: how I can best use these constructs for my scenario?
If I understand correctly, this is more or less the correct approach; however, I'm unsure as to the exact role of CreateFileMapping vs MapViewOfFile and if this will work in multiple threads (e.g. the way writes are ordered when they are flushed to disk).
I intend to open the file once per process as per (1).
Per thread, I intend to create a mmap file handle as per (2) for the entire file. If I need to grow the file, I will estimate how much space I need, close the handle and reopen it using CreateFileMapping.
While the worker is doing its thing, it needs pieces of the file. So, I intend to use MapViewOfFile (which seems limited to 2 GB) for each piece, process it annd unmap it again.
Questions:
Do I understand the concepts correctly?
When is data physically read and written to disk? So, when I have a loop that writes 1 MB of data in (3), will it write that data after the unmap call? Or will it write data the moment I hit memory in another page? (After all, disks are block devices so at some point we have to write a block...)
Will this work in multiple threads? This is about the calls themselves - I'm not sure if they will error if you have -say- 100 workers.
I do understand that (written) data is immediately available in other threads (unless it's a remote file), which means I should be careful with read/write concurrency. If I intend to write stuff, and afterwards update a single-physical-block) header (indicating that readers should use another pointer from now on) - then is it guaranteed that the data is written prior to the header?
Will it matter if I use 1 file or multiple files (assuming they're on the same physical device of course)?
Memory mapped files generally work best for READING; not writing. The problem you face is that you have to know the size of the file before you do the mapping.
You say:
in some cases the file will grow
Which really rules out a memory mapped file.
When you create a memory mapped file on Windoze, you are creating your own page file and mapping a range of memory to that page file. This tends to be the fastest way to read binary data, especially if the file is contiguous.
For writing, memory mapped files are problematic.

RAM consumption on opening a file

I have a Binary file of ~400MB which I want to convert to CSV format. The output CSV file will be ~1GB (according to my calculations).
I read the binary file and store it in an array of structures (required for other processing too), and when the user wants to export it to CSV, I am creating a file (or opening an existing file - depending on the user's choice), opening it using fopen and then writing to it using fwrite, line by line.
Coming to my question, this link from CPlusPlus.com says:
The returned stream is fully buffered by default if it is known to not
refer to an interactive device
My query is when I open this file, will it be loaded in RAM? Like when at the end, my file is of ~1GB, will it consume that much RAM or will it be just on the hard disk?
This code will run on Windows as well as Android.
FILE* streams buffering is a C feature and it is used to reduce system call overhead (i.e. do not call read() for each fgetc() which is expensive). Usually buffer is small - i.e. 512 bytes.
Page Cache or similiar mechanisms are different beasts -- they are used to reduce number of disks operations. Usually operating system uses free memory to cache previously read or written data to/from disk so subsequent operations will use RAM.
If there are shortage of free memory -- data is evicted from page cache.
It is operating system and file system and computer specific. And it might not matter that much. Read about page cache.
BTW, you might be interested by sqlite
From an application writer point of view, you should care more about virtual memory and address space of your process than about RAM. Physical RAM is managed by the operating system.
On Linux and Android, if you want to optimize that you might consider (later) using posix_fadvise(2) and perhaps madvise(2). I'm not sure it is worth the pain in your case (since a gigabyte file is not that much today).
I read the binary file and store it in an array of structures (required for other processing too), and when the user wants to export it to CSV
Reading per se doesn't use a lot of memory, like myaut says the buffer is small. The elephant in the room here is: do you you read up all the file and put all the data into structures? or do you start processing after one or few reads to get the minimum amount of data needed to do some processing? Doing the former will indeed use ~400MB or more memory, doing the later will use quite a lot less, that being said, it all depends on the amount of data needed to start processing, and maybe you need all the data loaded at once.

FileFlushBuffer() is so slow

We are repeatedly writing (many 1000's of times) to a single large archive file, patching various parts of it. After each write, we were calling FileFlushBuffer(), but have found this is very, very slow. If we wait and only call it every now and then (say every 32ish files), things run better, but I don't think this is the correct way of doing this.
Is there any way to not flush the buffer at all until we complete our last patch? If we take away the call completetly, close() does handle the flush, but then it becomes a huge bottleneck in itself. Failing that, having it not lock our other threads when it runs would make it less annoying, as we won't be doing any IO read IO on that file outside of the write. It just feels like the disk system is really getting in the way here.
More Info:
Target file is currently 16Gigs, but is always changing (usually upwards). We are randomly pinging all over the place in the file for the updates, and it's big enough that we can't cache the whole file. In terms of fragmentation, who knows. This is a large database of assets that gets updated frequently, so quite probably. Not sure of how to make it not fragment. Again, open to any suggestions.
If you know the maximum size of the file at the start then this looks like a classic memory mapped file application
edit. (On windows at least) You can't change the size of a memory mapped file while it's mapped. But you can very quickly expand it between opening the file and opening the mapping, simply SetFilePointer() to some large value and setEndOfFile(). You can similarly shrink it after you close the mapping and before you close the file.
You can map a <4Gb view (or multiple views) into a much larger file and the filesystem cache tends to be efficent with memory mapped files because it's the same mechanism the OS uses for loading programs, so is well tuned. You can let the OS manage when an update occurs or you can force a flush of certain memory ranges.

Speeding up file I/O: mmap() vs. read()

I have a Linux application that reads 150-200 files (4-10GB) in parallel. Each file is read in turn in small, variably sized blocks, typically less than 2K each.
I currently need to maintain over 200 MB/s read rate combined from the set of files. The disks handle this just fine. There is a projected requirement of over 1 GB/s (which is out of the disk's reach at the moment).
We have implemented two different read systems both make heavy use of posix_advise: first is a mmaped read in which we map the entirety of the data set and read on demand.
The second is a read()/seek() based system.
Both work well but only for the moderate cases, the read() method manages our overall file cache much better and can deal well with 100s of GB of files, but is badly rate limited, mmap is able to pre-cache data making the sustained data rate of over 200MB/s easy to maintain, but cannot deal with large total data set sizes.
So my question comes to these:
A: Can read() type file i/o be further optimized beyond the posix_advise calls on Linux, or having tuned the disk scheduler, VMM and posix_advise calls is that as good as we can expect?
B: Are there systematic ways for mmap to better deal with very large mapped data?
Mmap-vs-reading-blocks
is a similar problem to what I am working and provided a good starting point on this problem, along with the discussions in mmap-vs-read.
Reads back to what? What is the final destination of this data?
Since it sounds like you are completely IO bound, mmap and read should make no difference. The interesting part is in how you get the data to your receiver.
Assuming you're putting this data to a pipe, I recommend you just dump the contents of each file in its entirety into the pipe. To do this using zero-copy, try the splice system call. You might also try copying the file manually, or forking an instance of cat or some other tool that can buffer heavily with the current file as stdin, and the pipe as stdout.
if (pid = fork()) {
waitpid(pid, ...);
} else {
dup2(dest, 1);
dup2(source, 0);
execlp("cat", "cat");
}
Update0
If your processing is file-agnostic, and doesn't require random access, you want to create a pipeline using the options outlined above. Your processing step should accept data from stdin, or a pipe.
To answer your more specific questions:
A: Can read() type file i/o be further optimized beyond the posix_advise calls on Linux, or having tuned the disk scheduler, VMM and posix_advise calls is that as good as we can expect?
That's as good as it gets with regard to telling the kernel what to do from userspace. The rest is up to you: buffering, threading etc. but it's dangerous and probably unproductive guess work. I'd just go with splicing the files into a pipe.
B: Are there systematic ways for mmap to better deal with very large mapped data?
Yes. The following options may give you awesome performance benefits (and may make mmap worth using over read, with testing):
MAP_HUGETLB
Allocate the mapping using "huge pages."
This will reduce the paging overhead in the kernel, which is great if you will be mapping gigabyte sized files.
MAP_NORESERVE
Do not reserve swap space for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify the mapping. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available.
This will prevent you running out of memory while keeping your implementation simple if you don't actually have enough physical memory + swap for the entire mapping.**
MAP_POPULATE
Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults.
This may give you speed-ups with sufficient hardware resources, and if the prefetching is ordered, and lazy. I suspect this flag is redundant, the VFS likely does this better by default.
Perhaps using the readahead system call might help, if your program can predict in advance the file fragments it wants to read (but this is only a guess, I could be wrong).
And I think you should tune your application, and perhaps even your algorithms, to read data in chunk much bigger than a few kilobytes. Can't than be half a megabyte instead?
The problem here doesn't seem to be which api is used. It doesn't matter if you use mmap() or read(), the disc still has to seek to the specified point and read the data (although the os does help to optimize the access).
mmap() has advantages over read() if you read very small chunks (a couple of bytes) because you don't have call the os for every chunk, which becomes very slow.
I would also advise like Basile did to read more than 2kb consecutively so the disc doesn't have to seek that often.

Memory Based Data Server for local IPC

I am going to be running an app(s) that require about 200MB of market data each time it runs.
This is trivial amount of data to store in memory these days, so for speed thats what i want to do.
Over the course of a days session I will probably run, re-run, re-write and re-run etc etc one or more applications over and over.
SO, the question is how to hold the data in memory all day such that even if the app crashes I do not have to reload the data by opening the data file on disk and re-loading the data?
My initial idea is to write a data server app that does nothing more than read the data into shared memory so that it is available for use. If I do that I guess I could use memory mapping for the IPC by calling
CreateFile()
CreateFileMapping()
MapViewOfFile()
Is there a better IPC/approach?
If you have enough memory and nothing else asks for memory, that might reduce your startup time. To guarantee access to the memory, you probably want to have a memory mapped file in named shared memory, as described here. You can have a simple program create the share and manage it so you can guarantee it remains in memory.
Just memory map the data file. Unless your computer is low on memory, the file will stay in file cache even when the program exits. The next time it starts up, access will be fast.
If your in-memory data is different from the on-disk data, just use two files. On restart, check a timestamp and a file revision written into the memory file to compare to the disk file and that way your program will know which one has the most recent data.