I'm working on an ACID database product and I have some questions about file durability on Windows.
CreateFile has two flags, FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING - do I need both of these to achieve file durability (i.e. override all kinds of disk or OS file caching)? I'm asking since they seem to do the same thing, and setting FILE_FLAG_NO_BUFFERING causes WriteFile to fail with ERROR_INVALID_PARAMETER.
FILE_FLAG_NO_BUFFERING specifies no caching at all: no read or write cache, all data goes directly between your application and the disk. This is mostly useful if you read chunks so large that caching is useless, or if you do your own caching. Note WhozCraig's comment on properly aligning your data when using this flag.
FILE_FLAG_WRITE_THROUGH only means that writes are written directly to disk before the function returns. This is enough to achieve ACID, while still giving the OS the option to cache data from the file.
Using FlushFileBuffers() can provide a more efficient approach to achieving ACID, as you can do several writes to a file and then flush them in one go. Combining writes into one flush is very important, because non-cached writes will limit you to the spindle speed of your hard drive: at most 120 non-cached writes or flushes per second for a 7200 rpm disk.
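To illustrate the batching idea, here is a minimal POSIX sketch: many buffered writes, then a single flush to stable storage. On Windows the analogous calls would be WriteFile followed by one FlushFileBuffers; the function name and path here are made up for the example.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <string>
#include <vector>

// Append each record, then force everything to stable storage with a
// single flush (fsync here; FlushFileBuffers plays the same role on Windows).
bool write_batch_durably(const char* path, const std::vector<std::string>& records) {
    int fd = ::open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;
    for (const std::string& r : records) {
        if (::write(fd, r.data(), r.size()) != (ssize_t)r.size()) {
            ::close(fd);
            return false;
        }
    }
    // One flush covers all the writes above: one platter-speed round trip
    // instead of one per record.
    bool ok = ::fsync(fd) == 0;
    ::close(fd);
    return ok;
}
```

Grouping N records behind one flush turns N spindle-limited operations into one, which is exactly why a commit that batches its log records can beat per-write write-through.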
Related
Windows has the FlushFileBuffers() API to flush the buffers of a single file to the hard drive. Linux has the sync() API to flush the file buffers of all files.
However, is there WinAPI for flushing all files too, i.e. a sync() analog?
https://learn.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-flushfilebuffers
It is possible to flush the entire hard drive.
To flush all open files on a volume, call FlushFileBuffers with a handle to the volume. The caller must have administrative privileges. For more information, see Running with Special Privileges.
The same article also states the correct procedure to follow if, for some reason, data must be flushed on every write: open the file with CreateFile using the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags.
Due to disk caching interactions within the system, the FlushFileBuffers function can be inefficient when used after every write to a disk drive device when many writes are being performed separately. If an application is performing multiple writes to disk and also needs to ensure critical data is written to persistent media, the application should use unbuffered I/O instead of frequently calling FlushFileBuffers. To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.
But also check the file buffering restrictions regarding memory and data alignment.
According to File Management Functions, there is no analog of Linux's sync() in the WinAPI.
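Those restrictions mean that with FILE_FLAG_NO_BUFFERING the buffer address, the transfer size and the file offset must all be multiples of the volume sector size. A portable sketch of the bookkeeping (the 4096-byte sector size is an assumption; on Windows you would query it and typically allocate with VirtualAlloc or _aligned_malloc):

```cpp
#include <cstdint>
#include <cstdlib>

// Assumed sector size for the sketch; real code should query the volume
// (e.g. GetDiskFreeSpace or IOCTL_STORAGE_QUERY_PROPERTY on Windows).
constexpr std::size_t kSectorSize = 4096;

// Round a requested length up to the next sector multiple.
std::size_t round_up_to_sector(std::size_t n) {
    return (n + kSectorSize - 1) / kSectorSize * kSectorSize;
}

// Allocate a sector-aligned buffer suitable for unbuffered I/O.
// std::aligned_alloc requires the size to be a multiple of the alignment,
// hence the round-up.
void* alloc_sector_aligned(std::size_t n) {
    return std::aligned_alloc(kSectorSize, round_up_to_sector(n));
}
```

Passing a buffer or length that violates these rules is what makes WriteFile fail with ERROR_INVALID_PARAMETER, as in the original question.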
I am receiving a large quantity of data at a fixed rate. I need to do some processing on this data on a different thread, but this may run slower than the data is coming in, so I need to buffer the data. Due to the quantity of data coming in, the available RAM would be quickly exhausted, so it needs to overflow onto the hard disk. What I need is something like a filesystem-backed pipe, so the writer could be blocked by the filesystem, but not by the reader running too slowly.
Here's a rough set of requirements:
Writing should not be blocked by the reader running too slowly.
If data is read slow enough that the available RAM is exhausted it should overflow to the filesystem. It's ok for writes to the disk to block.
Reading should block if no data is available unless the stream has been closed by the writer.
If the reader is able to keep up with the data then it should never hit the hard disk as the RAM buffer would be sufficient (nice but not essential).
Disk space should be recovered as the data is consumed (or soon after).
Does such a mechanism exist in Windows?
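I'm not aware of a built-in Windows object with exactly these semantics, but the overflow policy itself is simple to sketch. The class below is a made-up, single-threaded illustration of requirements 2, 4 and (partly) 5 only: a RAM buffer with a cap that spills to a temp file; a real version would add locking, blocking reads and incremental space reclamation.

```cpp
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>

// Sketch of the spill-to-disk policy (no threads, no blocking).
class SpillQueue {
public:
    explicit SpillQueue(std::size_t ram_cap) : ram_cap_(ram_cap) {
        spill_ = std::tmpfile();   // removed automatically when closed
    }
    ~SpillQueue() { if (spill_) std::fclose(spill_); }

    // Never blocks on the reader: RAM first, then overflow to disk.
    void push(const std::string& rec) {
        if (ram_.size() < ram_cap_) {
            ram_.push_back(rec);
        } else {
            std::fprintf(spill_, "%s\n", rec.c_str());
            ++spilled_;
        }
    }

    // Drain RAM first, then the spill file; false when empty.
    bool pop(std::string& out) {
        if (!ram_.empty()) { out = ram_.front(); ram_.pop_front(); return true; }
        if (spilled_ == 0) return false;
        if (!reading_) { std::fflush(spill_); std::rewind(spill_); reading_ = true; }
        char buf[256];
        if (!std::fgets(buf, sizeof buf, spill_)) return false;
        out.assign(buf);
        if (!out.empty() && out.back() == '\n') out.pop_back();
        --spilled_;
        return true;
    }

private:
    std::size_t ram_cap_;
    std::size_t spilled_ = 0;
    bool reading_ = false;
    std::deque<std::string> ram_;
    std::FILE* spill_ = nullptr;
};
```

If the reader keeps up, nothing ever reaches the file, which matches the "RAM buffer is sufficient" requirement; std::tmpfile also gives you the disk-space recovery for free when the queue is destroyed.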
This looks like a classic message queue. Did you consider MSMQ or similar? MSMQ has all the properties you are asking for. You may want to use direct addressing to avoid Active Directory http://msdn.microsoft.com/en-us/library/ms700996(v=vs.85).aspx and use local or TCP/IP queue address.
Use an actual file. Write to the file as the data is received, and in another process read the data from the file and process it.
You even get the added benefit of no multithreading.
I have written a program in C/C++ which needs to fetch data from the disk. After some time the operating system ends up storing some of the data in its caches. Is there some way to figure out, from within a C/C++ program, whether the data was retrieved from the cache or from the disk?
A simple solution would be to time the read operation, since disk reads are significantly slower. You can read a group of file blocks (4K) twice to get an estimate.
The problem is that if you run the program again or copy the file in a shell, the OS will cache it.
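A minimal sketch of the timing approach (the function name is mine): time one full sequential read, then repeat it. If the second run is much faster, the first likely hit the disk and the second the OS page cache. Treat the numbers as a heuristic only; they depend on hardware and whatever the OS has already cached.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Time one sequential read of the whole file, in milliseconds.
double time_full_read_ms(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1.0;
    std::vector<char> buf(4096);
    auto t0 = std::chrono::steady_clock::now();
    while (std::fread(buf.data(), 1, buf.size(), f) == buf.size()) {
        // keep reading until a short (final) block
    }
    auto t1 = std::chrono::steady_clock::now();
    std::fclose(f);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Compare `time_full_read_ms(path)` called twice in a row; an order-of-magnitude drop on the second call is the usual signature of a cache hit.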
Background:
I'm developing the new SparkleDB NoSQL database. The database is ACID and has its own disk space manager (DSM) for all access to its database file storage. The DSM allows multiple threads to perform concurrent I/O operations on the same physical file, i.e. asynchronous (overlapped) I/O. We disable disk caching and thus write pages directly to the disk, as this is required for ACID databases.
My question is:
Is there a performance gain in arranging contiguous disk page writes from many threads before sending the I/O request to the underlying OS I/O subsystem (thus merging the data to be written when the pages are contiguous), or does the I/O subsystem do this for you? My question applies to UNIX, Linux, and Windows.
Example (all happens within a window of 100 ms):
Thread #1: Write 4k to physical file address 4096
Thread #2: Write 4k to physical file address 0
Thread #3: Write 4k to physical file address 8192
Thread #4: Write 4k to physical file address 409600
Thread #5: Write 4k to physical file address 413696
Using this information, the DSM issues a single 12 KB write operation at physical file address 0, and a single 8 KB write operation at physical file address 409600.
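The merging step described above can be sketched as follows (a generic illustration, not SparkleDB's actual code): sort the pending requests by offset, then fold each one into its predecessor whenever it starts exactly where the predecessor ends.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// A pending page write: (file offset, length in bytes).
using Req = std::pair<std::uint64_t, std::uint64_t>;

// Merge pending writes into the fewest contiguous I/O requests.
std::vector<Req> coalesce(std::vector<Req> reqs) {
    std::sort(reqs.begin(), reqs.end());   // order by file offset
    std::vector<Req> out;
    for (const Req& r : reqs) {
        if (!out.empty() && out.back().first + out.back().second == r.first)
            out.back().second += r.second;  // contiguous: extend previous request
        else
            out.push_back(r);               // gap: start a new request
    }
    return out;
}
```

Running this on the five 4 KB writes from the example yields exactly the two requests described: 12 KB at offset 0 and 8 KB at offset 409600.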
Update:
The DSM does all the physical file access address positioning by providing an OVERLAPPED structure on Windows, io_prep_pwrite with Linux AIO, and aiocb's aio_offset with POSIX AIO.
The most efficient way to use a hard drive is to keep writing as much data as you can while the platters are still spinning. This means reducing the number of writes and increasing the amount of data per write. If this can happen, then having a disk area of contiguous sectors will help.
For each write, the OS needs to translate the write to your file into logical or physical coordinates on the drive. This may involve reading the directory, searching for your file and locating the mapping of your file within the directory.
After the OS determines the location, it sends data across the interface to the hard drive. Your data may be cached along the way many times until it is placed onto the platters. An efficient write will use the block sizes of the caches and data interfaces.
Now the questions are: 1) How much time does this save? and 2) Is the time saved significant? For example, if all this work saves you 1 second, that second may be lost waiting for a response from the user.
Many programs, OSes and drivers will postpone writes to a hard drive until non-critical or off-peak periods. For example, while you are waiting for user input, you could be writing to the hard drive. This postponing of writes may take less effort than optimizing the disk writes and have a more significant impact on your application.
BTW, this has nothing to do with C++.
I am using an SQLite database on my ARM9 embedded Linux platform. I want to reduce writes to the database on disk, because the disk is flash memory with a limited number of write cycles. So I tried increasing SQLITE_DEFAULT_CACHE_SIZE to 5000; my objective was to write data to the cache and, when the cache fills up, automatically flush it to disk. But after increasing SQLITE_DEFAULT_CACHE_SIZE, I can't confirm whether this is working or not - I am not seeing any changes in the operations! Is my approach correct? Can anybody give me some suggestions?
Thanks
Aneesh
To be an ACID db, SQLite flushes on every commit, and on every insert/delete/update not wrapped in a transaction. Use transactions to group operations, or turn off ACIDity by setting PRAGMA synchronous=OFF.
With "PRAGMA synchronous = OFF", SQLite won't flush data at all (effectively leaving that to the OS cache).
SQLITE_DEFAULT_CACHE_SIZE only sets the size of the cache, and the cache is used only for reading data.
There is another option - you can implement your own VFS layer and defer page writes until your own buffer is full. http://www.sqlite.org/c3ref/vfs.html
But I'm sure that synchronous=OFF (or, much better, using transactions) will do the job well enough - though with synchronous=OFF there is a good chance of corrupting your db in case of a power failure or hard reset.
Another hint is to place the JOURNAL in memory or turn it off completely. Again, this turns off ACIDity, but it also removes some disk touches.
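Put together, the settings above look like this (a sketch; the table name `log` is made up, and synchronous=OFF plus an in-memory journal trade crash safety for fewer flash writes):

```sql
PRAGMA synchronous = OFF;      -- no fsync: the OS decides when data hits flash
PRAGMA journal_mode = MEMORY;  -- keep the rollback journal off the disk

BEGIN;                         -- group many writes into one flush
INSERT INTO log VALUES (1, 'a');
INSERT INTO log VALUES (2, 'b');
COMMIT;                        -- a single write burst instead of one per row
```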
The latest SQLite has a feature for backing up hot databases. It's still experimental, but my recommendation would be to use an in-memory database and merge it with the disk database when you think it's appropriate.
OK Neil. If "SQLite is using a form of write-through caching", then on cache overflow it will try to flush the data to some temporary file or disk file. That is the very point I am trying to test by enlarging the cache size, so as to gain control over the flushing rate - but it is not happening. Please reply.
You have the source code to SQLite - why not simply instrument it to record the information you are interested in?