Does fwrite block until data has been written to disk? - c++

Does the fwrite() function return after the data to be written to disk has been handed over to the operating system or does it return only after the data is actually physically written to the disk?
For my case, I'm hoping that it's the first case since I don't want to wait until all the data is physically written to the disk. I'm hoping that another OS thread transfers it in the background.
I'm curious about behavior on Windows 10 in this particular case.
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fwrite

There are several layers of buffering between fwrite() and the disk, each there to improve efficiency: buffering within the C runtime library, buffering in the operating system's file system interface, and buffering within the actual disk hardware.
By default, each of these layers delays the physical write until there is an explicit request to flush its buffers, unless options are set that force physical writes as the write requests are made.
If you want to change the buffering behavior of fwrite(), take a look at the setbuf() function; see the Linux man page for setbuf() and the Microsoft documentation on setbuf().
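As a rough illustration of adjusting the stdio layer, here is a minimal sketch using setvbuf(), the more general sibling of setbuf() in the C standard library. The helper name open_with_buffering is made up for this example and is not part of any library.

```cpp
#include <cstdio>

// Hypothetical helper (not a library function): opens `path` for writing and
// selects the stdio buffering mode for the stream.
// `mode` is one of _IOFBF (fully buffered), _IOLBF (line buffered) or
// _IONBF (unbuffered). `size` is a hint for the buffer size.
std::FILE* open_with_buffering(const char* path, int mode, std::size_t size) {
    std::FILE* fp = std::fopen(path, "wb");
    if (!fp) return nullptr;
    // setvbuf() must be called before any other operation on the stream.
    // Passing nullptr lets the library allocate the buffer itself.
    if (std::setvbuf(fp, nullptr, mode, size) != 0) {
        std::fclose(fp);
        return nullptr;
    }
    return fp;
}
```

With _IONBF, each fwrite() is passed to the operating system right away. That only removes the C runtime's buffer; the OS cache and the disk's own cache are still in play.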
And if you look at the documentation for the underlying Windows CreateFile() function you will see there are a number of flags which include flags as to whether buffering of data should be done or not.
FILE_FLAG_NO_BUFFERING 0x20000000
The file or device is being opened with no system caching for data
reads and writes. This flag does not affect hard disk caching or
memory mapped files.
There are strict requirements for successfully working with files
opened with CreateFile using the FILE_FLAG_NO_BUFFERING flag, for
details see File Buffering.
And see the Microsoft documentation topic File Buffering.
In a simple example, the application would open a file for write
access with the FILE_FLAG_NO_BUFFERING flag and then perform a call to
the WriteFile function using a data buffer defined within the
application. This local buffer is, in these circumstances, effectively
the only file buffer that exists for this operation. Because of
physical disk layout, file system storage layout, and system-level
file pointer position tracking, this write operation will fail unless
the locally-defined data buffers meet certain alignment criteria,
discussed in the following section.
Take a look at this discussion about settings at the OS level for what looks to be Linux https://superuser.com/questions/479379/how-long-can-file-system-writes-be-cached-with-ext4

Does the fwrite() function return after the data to be written to disk has been handed over to the operating system or does it return only after the data is actually physically written to the disk?
Neither, necessarily. It does not wait for the physical write, and it does not even (necessarily) wait until data has been handed over to the OS -- fwrite may just put the data in its internal buffer and return immediately without actually writing anything.
To force data to the OS, you need to use fflush(fp) on the FILE pointer, but that still does not necessarily write data to the disk, though it will generally queue it for writing. But it does not wait for those queued writes to finish.
So to guarantee the data is written to disk, you need to make an OS-level call that waits until the queued writes complete. On POSIX systems (such as Linux), that is fsync(fileno(fp)). You'll need to study the Windows documentation to figure out how to do the equivalent on Windows; FlushFileBuffers() is the place to start.
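The three steps described in this answer (fwrite, then fflush, then fsync) can be sketched as follows for POSIX systems. The helper name write_durably is hypothetical, and this is only an outline of the sequence, not a complete error-handling strategy.

```cpp
#include <stdio.h>
#include <unistd.h>   // fsync (POSIX)

// Hypothetical helper: returns true only if every step succeeded.
bool write_durably(FILE* fp, const void* data, size_t len) {
    // Step 1: may do nothing more than copy into the stdio buffer.
    if (fwrite(data, 1, len, fp) != len) return false;
    // Step 2: push the stdio buffer to the operating system.
    if (fflush(fp) != 0) return false;
    // Step 3: block until the OS reports the data as written to the device.
    return fsync(fileno(fp)) == 0;
}
```

Calling this after every small write would be slow, since fsync() blocks until the device reports completion. It is usually reserved for points where durability actually matters.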

Related

How to ensure data is written to file

I'm writing a logging program for a microcontroller running Linux. There is also a calculation function whose results shall be stored on the HDD and loaded when the logger is restarted.
My problem is that when I unplug the µC from power while it is overwriting some data, the overwritten data could be lost.
So how can I overwrite some data but ensure that either the old data or the newly written data is consistent if the µC is unplugged during the overwrite?
The programming language is C++, so I would love it if there were a Boost library or, even better, an STL type.
Use stream << flush; to flush the C++ output buffer to the OS, and use Linux fsync() to flush from the OS buffer to disk.
The latter requires a Unix file descriptor, so you'll need to use an implementation-dependent method to get the FD from the C++ stream. See Retrieving file descriptor from a std::fstream
For additional protection you need to use a fault-resistant filesystem with journaling. See https://www.ibm.com/developerworks/library/l-journaling-filesystems/index.html for an example.
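One way to combine the flush and fsync() steps above is the common "write a temporary file, then rename" pattern. Note that this pattern is not taken from the answer itself; it is offered as a sketch. It assumes a POSIX system, the ".tmp" suffix and the helper name replace_file are made up, and it obtains a descriptor by reopening the file by name rather than extracting it from the stream.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: replaces `path` with `contents` so that a reader sees
// either the complete old file or the complete new one.
bool replace_file(const std::string& path, const std::string& contents) {
    const std::string tmp = path + ".tmp";   // assumed temporary name
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << contents << std::flush;       // C++ stream buffer -> OS
        if (!out) return false;
    }
    // Reopen by name to get a file descriptor that fsync() can use.
    int fd = ::open(tmp.c_str(), O_WRONLY);
    if (fd < 0) return false;
    const bool synced = (::fsync(fd) == 0);  // OS cache -> storage device
    ::close(fd);
    if (!synced) return false;
    // rename() replaces the destination in a single step on POSIX systems.
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

For stronger guarantees after a power cut, the containing directory is often fsync()ed as well so that the rename itself is persisted; that step is left out here to keep the sketch short.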

Concurrently writing to file while reading it out using mmap

The situation is this.
A large buffer of data (which shall exceed reasonable RAM
consumption) is being generated by the program.
The program concurrently serves a websocket which will allow a web
client to specify a small subset of this buffer of data to view.
To support the first goal, the file is written to using standard methods (I use portable C-stdio fopen and fwrite because it's been shown to be faster than various "pure C++" methods. Doesn't matter. Data gets appended to file; stdio will buffer the writes and periodically flush them.)
To support the second goal (on BSD, in particular iOS), the file is opened (open from sys/fcntl.h -- not as portable as stdio.h) and memory-mapped (mmap from sys/mman.h -- ditto). By deciding to use memory mapping I have to give up some portability with this code. It seems like Boost is something I could look at to avoid wheel reinvention.
Anyway, my question is about how exactly I'm supposed to do this, because there will be at least two threads: The main program thread appending to the file periodically, and the network (or a worker) thread which responds to web requests and delivers data read out of the memory regions that are mapped to the file on disk.
Supposing the file starts out 1024 bytes in size, mmap is called initially mapping 1024 bytes. As the main thread writes a further 512 bytes into the file, how can the network thread be notified or know anything about the current actual size of the file (so that it can munmap and mmap again with a larger buffer corresponding to the new size)? Furthermore, if I do this naively, I am wary of a situation where the main thread reports that 512 bytes are written, so the other thread now maps 1536 bytes of the file, but not all of the new 512 bytes actually got written to disk yet (OS is still working on writing it, maybe). What happens now? Could there be some garbage that shows up? Will my program crash?
How can I determine when data has been properly flushed? How can I be notified in a timely fashion after the data has been flushed so that I can memory map it?
In particular, is calling fflush the only way to guarantee that the file is now updated w.r.t. the stream, and then can I guarantee (once fflush returns) that the memory map can access the new size without an access violation? What about fsync?
When you are using POSIX API directly in the form of mmap, you should also be using it directly for the writing. POSIX and LibC interfaces just don't play well together.
write is a system call which transfers the data directly to the kernel. It would be slow for writing byte-by-byte, but for writing large buffers it is a tiny fraction faster because it has less overhead (fwrite ends up calling write under the hood anyway). And it is definitely more efficient than fwrite+fflush, because those may end up being two or more calls to write, whereas a direct write is just one.
The documentation of mmap is not very clear about it, but it seems you must not request more bytes than the file actually has.
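A minimal sketch of the point about not mapping more than the file holds: the reader asks fstat() for the current size and maps exactly that many bytes. The FileView type and both function names are hypothetical, and the sketch assumes the writer uses write() on the same file, as recommended above.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical view type: a read-only mapping of the file's current contents.
struct FileView {
    const char* data;
    std::size_t size;
};

// Maps exactly as many bytes as fstat() reports, never more.
// Returns an empty view ({nullptr, 0}) on failure or for an empty file.
FileView map_current_contents(int fd) {
    struct stat st {};
    if (fstat(fd, &st) != 0 || st.st_size == 0) return FileView{nullptr, 0};
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return FileView{nullptr, 0};
    return FileView{static_cast<const char*>(p), size};
}

// Releases a view obtained from map_current_contents().
void unmap(const FileView& v) {
    if (v.data) munmap(const_cast<char*>(v.data), v.size);
}
```

After the writer appends more data, the reader can call unmap() and map_current_contents() again to pick up the larger size. How the reader learns that new data exists (a condition variable, a shared counter, and so on) is a separate design choice not covered here.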

Flushing only file metadata

We're developing on a new ACID database system that focuses more on data integrity than throughput. Its storage engine accesses secondary storage devices directly with flags like O_DIRECT or FILE_FLAG_WRITE_THROUGH & FILE_FLAG_NO_BUFFERING.
In some cases we only change file metadata using kernel functions like fallocate() or SetFileValidData() - in these cases I would like to flush only the metadata and not all pending file I/O to leverage execution performance as the call blocks until the device reports that the transfer has completed - even if no file buffering is in use it still only applies to application data and the file system may still cache file metadata.
I've so far found that fsync() or FlushFileBuffers() flushes metadata, but unfortunately it also flushes all pending I/O. Anyone know of a way of only flushing the file metadata? This problem applies to Linux, UNIX, and Windows.
I am a newbie to file systems. But when you go through the implementation of any physical file system (ext4, ext3, etc.), you see they haven't exposed such functionality to the upper layer. Internally, though, the fsync() implementation only updates the metadata of the file itself, and the remaining task is delegated to generic_block_fdatasync().
You might want to write a hack for your requirement of flushing only metadata.
Anyone know of a way of only flushing the file metadata?
No. Based on my understanding, no operating system provides such an interface/API. The file system provides two interfaces through which a user-mode application can control when data gets written to disk:
fsync: A call to fsync() ensures that all dirty data associated with the file referred to by the file descriptor fd is written back to disk. This call writes back both data and metadata.
fdatasync: This system call does the same thing as fsync(), except that it flushes the data plus only the metadata needed to read that data back (such as the file size), not metadata like timestamps.
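To make the difference between the two calls concrete, here is a small sketch for Linux. The helper name append_and_sync and its metadata_too parameter are invented for illustration.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: appends a record and waits for it to reach storage.
// metadata_too == true  -> fsync():     data and all file metadata.
// metadata_too == false -> fdatasync(): data, plus only the metadata needed
//                          to read it back; timestamps may be skipped.
bool append_and_sync(int fd, const void* buf, size_t len, bool metadata_too) {
    if (write(fd, buf, len) != static_cast<ssize_t>(len)) return false;
    return (metadata_too ? fsync(fd) : fdatasync(fd)) == 0;
}
```

Neither call offers the metadata-only flush the question asks for; the sketch only shows the two options that do exist.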
This means there is a way to do roughly the opposite of what this question asks. However, reading your question, it appears to me that you want this in order to get optimal performance and data consistency. In my understanding we should not worry too much about execution performance here, as modern file systems implement "delayed write" and various other mechanisms to avoid unnecessary disk writes.
The main goal is to limit switches between user mode and kernel mode, as such switches are more expensive than almost anything else. This might be the reason kernel developers have not provided an interface that updates only the metadata of a particular file. It could also be due to a limitation of the file system, and I guess there is little we can do here to achieve more efficiency.
For complete information on the internal algorithms you may want to refer to the classic book "The Design of the UNIX Operating System" by Maurice J. Bach, which describes these concepts and their implementation in detail.

synchronized write operation in C

I am working on a smart camera that runs Linux. I capture images from the camera streaming software and write them to an SD card (attached to the camera). For writing the individual JPEG images, I used the fopen and fwrite C functions. To synchronize the disk write operation, I use fflush(pointer) to flush the buffers and write the data to the SD card. But it seems to have no effect: the write operation uses system memory, and the free memory decreases after every write operation. I also used the low-level open and write functions in conjunction with fsync(filedesc), but that also has no effect.
The buffers are flushed only when I unmount the SD card, and then the memory is freed. How can I disable this caching of writes so they go straight to the SD card? Or how can I force the data to be written to the SD card immediately instead of using the system memory?
sync(2) is probably your best bet:
SYNC(2) Linux Programmer's Manual SYNC(2)
NAME
sync - commit buffer cache to disk
SYNOPSIS
#include <unistd.h>
void sync(void);
DESCRIPTION
sync() first commits inodes to buffers, and then buffers to disk.
BUGS
According to the standard specification (e.g., POSIX.1-2001), sync()
schedules the writes, but may return before the actual writing is done.
However, since version 1.3.20 Linux does actually wait. (This still
does not guarantee data integrity: modern disks have large caches.)
You can set the O_SYNC if you open the file using open(), or use sync() as suggested above.
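As a sketch of the O_SYNC option on a POSIX system: the helper name open_synchronous and the 0644 permission bits are assumptions for this example.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: opens `path` so that each write() on the returned
// descriptor completes only after the data has been passed to the storage
// device, as if every write were followed by fsync().
// Returns -1 on failure, like open() itself.
int open_synchronous(const char* path) {
    return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
}
```

Every write() on such a descriptor pays the full device latency, so this tends to suit infrequent, important writes better than a stream of small ones.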
With fopen(), you can call fsync() on the descriptor returned by fileno() (after calling fflush() on the stream), or use a combination of fileno() and ioctl() to set options on the descriptor.
For more details see this very similar post: How can you flush a write using a file descriptor?
Check out fsync(2) when working with specific files.
There may be nothing that you can really do. Many file systems are heavily cached in memory so a write to a file may not immediately be written to disk. The only way to guarantee a write in this scenario is to actually unmount the drive.
When mounting the disk, you might want to specify the sync option (either using the -o flag in mount or on your fstab line). This will ensure that at least your writes are written synchronously. This is what you should always use for removable media.
Just because it's still taking up memory doesn't mean it hasn't also been written out to storage - a clean (identical to the copy on physical storage) copy of the data will stay in the page cache until that memory is needed for something else, in case an application later reads that data back.
Note that fflush() doesn't ensure the data has been written to storage - if you are using stdio, you must first use fflush(f), then fsync(fileno(f)).
If you know that you will not need to read that data again in the foreseeable future (as seems likely for this case), you can use posix_fadvise() with the POSIX_FADV_DONTNEED flag before closing the file.
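A minimal sketch of that last suggestion on Linux, using a raw file descriptor. The helper name flush_and_drop_cache is hypothetical. The fsync() comes first because only clean pages (already written out) can be dropped from the page cache.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes the file's dirty pages to storage, then tells
// the kernel that its cached pages will not be needed again.
bool flush_and_drop_cache(int fd) {
    if (fsync(fd) != 0) return false;
    // offset 0 with length 0 means "the whole file".
    // posix_fadvise() returns 0 on success and an error number on failure.
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0;
}
```

When the file was written through stdio, fflush(f) would need to run before this, with fileno(f) supplying the descriptor. The advice is only a hint, so the kernel is free to keep some pages cached.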

How best to manage Linux's buffering behavior when writing a high-bandwidth data stream?

My problem is this: I have a C/C++ app that runs under Linux, and this app receives a constant-rate high-bandwidth (~27MB/sec) stream of data that it needs to stream to a file (or files). The computer it runs on is a quad-core 2GHz Xeon running Linux. The filesystem is ext4, and the disk is a solid state E-SATA drive which should be plenty fast for this purpose.
The problem is Linux's too-clever buffering behavior. Specifically, instead of writing the data to disk immediately, or soon after I call write(), Linux will store the "written" data in RAM, and then at some later time (I suspect when the 2GB of RAM starts to get full) it will suddenly try to write out several hundred megabytes of cached data to the disk, all at once. The problem is that this cache-flush is large, and holds off the data-acquisition code for a significant period of time, causing some of the current incoming data to be lost.
My question is: is there any reasonable way to "tune" Linux's caching behavior, so that either it doesn't cache the outgoing data at all, or if it must cache, it caches only a smaller amount at a time, thus smoothing out the bandwidth usage of the drive and improving the performance of the code?
I'm aware of O_DIRECT, and will use that if I have to, but it does place some behavioral restrictions on the program (e.g. buffers must be aligned and a multiple of the disk sector size, etc.) that I'd rather avoid if I can.
You can use the posix_fadvise() with the POSIX_FADV_DONTNEED advice (possibly combined with calls to fdatasync()) to make the system flush the data and evict it from the cache.
See this article for a practical example.
If you have latency requirements that the OS cache can't meet on its own (the default IO scheduler is usually optimized for bandwidth, not latency), you are probably going to have to manage your own memory buffering. Are you writing out the incoming data immediately? If you are, I'd suggest dropping that architecture and going with something like a ring buffer, where one thread (or multiplexed I/O handler) is writing from one side of the buffer while the reads are being copied into the other side.
At some size, this will be large enough to handle the latency required by a pessimal OS cache flush. Or not, in which case you're actually bandwidth limited and no amount of software tuning will help you until you get faster storage.
You can adjust the page cache settings in /proc/sys/vm (see /proc/sys/vm/dirty_ratio and /proc/sys/vm/swappiness specifically) to tune the page cache to your liking.
If we are talking about std::fstream (or any C++ stream object)
You can specify your own buffer using:
streambuf* ios::rdbuf ( streambuf* streambuffer);
By defining your own buffer you can customize the behavior of the stream.
Alternatively you can always flush the buffer manually at pre-set intervals.
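One way to give a file stream a larger buffer is sketched below. Note that it uses pubsetbuf() on the stream's existing buffer object rather than replacing the buffer through rdbuf() as in the signature above. The BufferedOut wrapper is hypothetical, and whether the supplied buffer is honored is implementation-defined; on common implementations it needs to be set before the file is opened.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical wrapper that keeps the buffer alive as long as the stream.
struct BufferedOut {
    std::vector<char> buffer;   // declared first, so it outlives `stream`
    std::ofstream stream;

    BufferedOut(const char* path, std::size_t buffer_size) : buffer(buffer_size) {
        // Hand our buffer to the stream's filebuf before opening the file.
        stream.rdbuf()->pubsetbuf(buffer.data(),
                                  static_cast<std::streamsize>(buffer.size()));
        stream.open(path, std::ios::binary | std::ios::trunc);
    }
};
```

A usage might look like `BufferedOut out("capture.bin", 1 << 20); out.stream << data;`, where the file name and the 1 MiB size are placeholders to be tuned for the workload.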
Note: there is a reason for having a buffer. It is quicker than writing to a disk directly (every 10 bytes). There is very little reason to write to a disk in chunks smaller than the disk block size. If you write too frequently the disk controller will become your bottleneck.
But I have an issue with you using the same thread for the write process, so that it blocks the read process.
While the data is being written there is no reason why another thread cannot continue to read data from your stream (you may need to do some fancy footwork to make sure they are reading/writing to different areas of the buffer). But I don't see any real potential issue with this as the IO system will go off and do its work asynchronously (potentially stalling your write thread (depending on your use of the IO system) but not necessarily your application).
I know this question is old, but we know a few things now we didn't know when this question was first asked.
Part of the problem is that the default values for /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio are not appropriate for newer machines with lots of memory. Linux begins the flush when dirty_background_ratio is reached, and blocks all I/O when dirty_ratio is reached. Lower dirty_background_ratio to start flushing sooner, and raise dirty_ratio to start blocking I/O later. On very large memory systems, (32GB or more) you may even want to use dirty_bytes and dirty_background_bytes, since the minimum increment of 1% for the _ratio settings is too coarse. Read https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/ for a more detailed explanation.
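Before changing these settings it can help to look at the current values. A small sketch that reads one of the files named above follows; the helper name read_vm_setting is hypothetical, and it assumes the file holds a single integer, which is the case for the _ratio and _bytes entries.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads one integer setting from a file such as
// /proc/sys/vm/dirty_background_ratio.
// Returns false if the file cannot be opened or does not start with a number.
bool read_vm_setting(const std::string& path, long& value) {
    std::ifstream in(path);
    return static_cast<bool>(in >> value);
}
```

For example, `long v; read_vm_setting("/proc/sys/vm/dirty_ratio", v);` would fill v with the current percentage on a Linux system. Writing new values to these files requires root privileges.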
Also, if you know you won't need to read the data again, call posix_fadvise with POSIX_FADV_DONTNEED to ensure cache pages can be reused sooner. This has to be done after Linux has flushed the page to disk, otherwise the flush will move the page back to the active list (effectively negating the effect of fadvise).
To ensure you can still read incoming data in the cases where Linux does block on the call to write(), do file writing in a different thread than the one where you are reading.
Well, try this ten pound hammer solution that might prove useful to see if i/o system caching contributes to the problem: every 100 MB or so, call sync().
You could use a multithreaded approach: have one thread simply read data packets and add them to a FIFO, and have the other thread remove packets from the FIFO and write them to disk. This way, even if the write to disk stalls, the program can continue to read incoming data and buffer it in RAM.
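The two-thread FIFO idea can be sketched with standard C++ threading primitives. The PacketWriter class and its member names are invented for this example. It uses an unbounded queue, so memory grows if the disk stays stalled for long; a real application would probably cap the queue.

```cpp
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical class: the acquisition thread calls push(); a background
// thread drains the FIFO and writes each packet to the file.
class PacketWriter {
public:
    explicit PacketWriter(const std::string& path)
        : file_(std::fopen(path.c_str(), "wb")), worker_([this] { run(); }) {}

    ~PacketWriter() {
        { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
        ready_.notify_one();
        worker_.join();                       // drains remaining packets first
        if (file_) std::fclose(file_);
    }

    // Called by the acquisition thread; only blocks briefly on the mutex.
    void push(std::vector<char> packet) {
        { std::lock_guard<std::mutex> lock(mutex_); fifo_.push_back(std::move(packet)); }
        ready_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            ready_.wait(lock, [this] { return done_ || !fifo_.empty(); });
            if (fifo_.empty()) return;        // done_ is set and the queue is drained
            std::vector<char> packet = std::move(fifo_.front());
            fifo_.pop_front();
            lock.unlock();                    // do the slow disk write unlocked
            if (file_) std::fwrite(packet.data(), 1, packet.size(), file_);
            lock.lock();
        }
    }

    std::FILE* file_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::vector<char>> fifo_;
    bool done_ = false;
    std::thread worker_;   // declared last so the other members exist before it starts
};
```

The acquisition loop would create one PacketWriter and call push() for each received packet. Because the lock is released during fwrite(), a stalled disk write delays only the worker thread.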