I'm on MacOSX.
In the logger part of my application, I'm dumping data to a file.
Suppose I have a globally declared std::ofstream outFile("log");
and in my logging code I have:
outFile << "......." ;
outFile.flush();
Now, suppose my code crashes after the flush() happens. Is the stuff written to outFile before the flush() guaranteed to be written to disk? (Note that I don't call close().)
Thanks!
From the C++ runtime's point of view, it should have been written to disk. From an OS perspective it might still linger in a buffer, but that's only going to be an issue if your whole machine crashes.
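As an alternative approach, you can disable buffering altogether. Make the call before any output is written (and ideally before the file is opened, since the effect of calling it later is implementation-defined):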
outFile.rdbuf()->pubsetbuf(0, 0);
Writing to an unbuffered fstream may hurt performance, but worrying about that before measuring would be premature optimization.
flush() flushes the iostream library's buffers. However, the data is almost certainly not flushed from the operating system's buffers at exactly the same time, so there is a small period during which an operating system crash could lose you data. You can of course lose data at any time if you suffer a hard disk failure, whether the data was written or not, so I wouldn't worry too much about this.
As long as flush() has returned, your program has successfully put the output in the OS's hands. Unless the OS (or disk) crashes, your data should be on disk next time the disk writes (note that the disk likely has a solid state cache of its own).
Until flush() returns, it's anybody's guess how much will make it to the disk.
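To make the point about flush() concrete, here is a minimal sketch. The function name write_flush_and_peek and the file path used with it are made up for illustration. It writes through a std::ofstream, flushes, and then reads the file back through an independent std::ifstream while the writer is still open. If the second stream sees the text, the bytes have left the C++ library's buffer and are in the OS's hands; it says nothing about whether they have reached the physical disk.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Writes `text` to `path`, flushes, and returns what an independent reader
// sees while the writer is still open (that is, before any close()).
std::string write_flush_and_peek(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::trunc);
    out << text;
    out.flush();  // hands the stream's buffered bytes to the OS

    std::ifstream in(path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling write_flush_and_peek("/tmp/demo.log", "hello log\n") should return the same string that was written, because the flushed data is already visible through the OS even though the writer has not been closed.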
To start with, cplusplus.com says that every stream object has an associated std::streambuf.
And C++ Primer (5th edition) says:
Each output stream manages a buffer, which it uses to hold the data that the program reads and writes. For example, when the following code is executed
os << "please enter a value: ";
the literal string might be printed immediately, or the operating system might store the data in a buffer to be printed later
There are several conditions that cause the buffer to be flushed (that is, to be written) to the actual output device or file: the program completes normally, a manipulator such as endl is used, etc.
In my understanding, the sentence "the operating system might store the data in a buffer" (a buffer, not the buffer) in the context above means that both the stream object and the OS use their own buffers: one in the process address space, and another in kernel space managed by the OS.
And here is my question:
Why does every process/object (like cout) manage its own buffer? Why not just issue a system call and give the data directly to the OS buffer?
Furthermore, does the term 'flushed' act on the object's buffer or the OS buffer? I guess the flush actually issues a system call that tells the OS to immediately put the data in the OS buffer onto the screen.
Why does every process/object (like cout) manage its own buffer? Why not just issue a system call and give the data directly to the OS buffer?
As a bit of a pre-answer, you could always rewrite the stream buffer so that every write goes straight to an OS system call for output (or input). In fact, your system may already do this; it just depends on the implementation. The iostreams library allows buffering at its own level but, as far as I remember, doesn't require it.
As for buffering: sending or reading data byte by byte is not always the most efficient approach. For cout and cin on many systems this may be better handled by the OS, but you could adapt iostreams to handle input and output streams that read and write sockets (I/O over internet connections). With sockets, you could send each individual character in its own packet to your target over the internet, but this could become really slow depending on the type of link and how busy it is. When you read from a socket, a message can be split across packets, so you need to buffer the input until you hit 'critical mass'. There are potentially ways to do this buffering at the OS level, but I found I could get much better performance if I handled most of it myself (since the size of messages usually had a large standard deviation across the runtime). So the buffering within iostreams was a useful way to manage input and output to optimize performance, and this especially helped when juggling I/O from multiple connections at the same time.
But you can't always assume the OS will do the right thing. I remember once we were using this FUSE module that allowed us to have a distributed file system across multiple computer nodes. It had a really weird problem when writing and reading single characters. Whereas reading or writing a long sequence of single characters would take at most seconds on a normal hard disk using an ext4 system, the same operation would take days on the FUSE system (ignoring for the moment why we did it this way in the first place). Through debugging, we found the hang was at the level of I/O, and reading and writing individual characters exacerbated this run-time problem. We had to re-write the code to buffer our reads and writes. The best we could figure out is that the OS on ext4 did its own buffering but this FUSE file system didn't do a similar buffering when reading and writing to the hard disk.
In any case, the OS may do its own buffering, but there are a number of cases where this buffering is non-existent or minimal. Buffering on the iostream end could help your performance.
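As a rough illustration of why a library-level buffer helps, here is a sketch of a custom stream buffer that batches single-character writes. Everything in it is hypothetical: CountingSink stands in for an expensive system call such as write(2), and BatchingBuf, sink_calls_for and the 64-byte buffer size are arbitrary choices, not how any particular standard library implements its buffers.

```cpp
#include <array>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical sink: stands in for a system call such as write(2).
struct CountingSink {
    std::string data;
    std::size_t calls = 0;
    void write(const char* p, std::size_t n) { data.append(p, n); ++calls; }
};

// Collects characters in a small array and hands them to the sink only
// when the array fills up or on an explicit flush.
class BatchingBuf : public std::streambuf {
public:
    explicit BatchingBuf(CountingSink& sink) : sink_(sink) {
        setp(buf_.data(), buf_.data() + buf_.size());
    }
    ~BatchingBuf() override { drain(); }

protected:
    // Called when the put area is full: empty it, then store `ch`.
    int_type overflow(int_type ch) override {
        drain();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }
    // Called by std::ostream::flush().
    int sync() override { drain(); return 0; }

private:
    void drain() {
        const std::size_t n = static_cast<std::size_t>(pptr() - pbase());
        if (n > 0) sink_.write(pbase(), n);
        setp(buf_.data(), buf_.data() + buf_.size());
    }
    CountingSink& sink_;
    std::array<char, 64> buf_{};
};

// Writes `count` single characters through the buffer and returns how many
// times the sink was actually called.
std::size_t sink_calls_for(std::size_t count) {
    CountingSink sink;
    {
        BatchingBuf buf(sink);
        std::ostream os(&buf);
        for (std::size_t i = 0; i < count; ++i) os.put('x');
        os.flush();
    }
    return sink.calls;
}
```

With the 64-byte buffer assumed here, 1000 single-character writes reach the sink in 16 calls instead of 1000, which is the saving the answer describes.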
Furthermore, does the term 'flushed' act on the object's buffer or the OS buffer? I guess the flush actually issues a system call that tells the OS to immediately put the data in the OS buffer onto the screen.
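I believe most texts talk about 'flushed' in terms of the standard I/O streams in C++, that is, emptying the stream object's own buffer by handing its contents to the OS. That does not by itself force the OS to empty its buffers, and your program probably doesn't have direct control over how the OS handles its I/O. But in general I think the I/O of the OS and your program will be in sync for most systems.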
Idea/Fact #1
I was reading a few posts about how streams are buffered: fwrite() usually writes to a buffered stream, whereas write() is not buffered.
Why the fwrite libc function is faster than the syscall write function?
Idea/Fact #2
I was also looking into an article about disk caching and how Linux uses it heavily to improve disk performance substantially.
http://www.linuxatemyram.com/play.html
So, given the disk caching that Linux does by default, shouldn't fwrite() and write() give the same performance? What fwrite() does is "buffering over an already buffered disk", which should not give a huge boost. What am I missing here?
fwrite buffering and disk caching work on two very different levels.
fwrite works on the program level: it buffers numerous small writes and pools them together to make one system call, rather than an individual system call for each small write. This saves you the repeated overhead of switching from user mode to kernel mode and back.
Disk caching works on the kernel level, by pooling disk writes and allowing them to be delayed. Hard disks can be slow, so if you had to wait for all the data to be consumed by the disk driver, your program would be delayed. By using a cache, which is generally much faster than the drive, the write can complete much sooner and control returns to the program. While the program continues running, the cache is gradually emptied onto the disk, without the program having to wait for it.
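The two levels can be put side by side in a small sketch. The function names and paths below are invented for illustration. Both functions emit the same bytes one at a time, but the first makes one write(2) system call per byte, while the second goes through stdio, which collects the bytes in a user-space buffer before calling into the kernel. Timing them (for example with std::chrono, or by counting system calls with a tracing tool) is left out here because the numbers depend heavily on the machine.

```cpp
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// One write(2) system call per byte: every call crosses into the kernel.
bool write_bytewise_syscall(const char* path, std::size_t n) {
    const int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    const char byte = 'x';
    bool ok = true;
    for (std::size_t i = 0; i < n && ok; ++i)
        ok = (::write(fd, &byte, 1) == 1);
    return (::close(fd) == 0) && ok;
}

// One fwrite() per byte: stdio gathers the bytes in a user-space buffer and
// issues far fewer write(2) calls.
bool write_bytewise_stdio(const char* path, std::size_t n) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const char byte = 'x';
    bool ok = true;
    for (std::size_t i = 0; i < n && ok; ++i)
        ok = (std::fwrite(&byte, 1, 1, f) == 1);
    return (std::fclose(f) == 0) && ok;
}

// Helper for checking the result: size of a file in bytes, or -1 on error.
long file_size(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;
    std::fseek(f, 0, SEEK_END);
    const long size = std::ftell(f);
    std::fclose(f);
    return size;
}
```

Both functions produce identical files; the difference is only in how many times the program switches into kernel mode, which is the overhead that fwrite's buffering saves regardless of any caching the kernel does afterwards.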
Let's say I am using C++ file streams asynchronously, by which I mean never using std::flush or std::endl. My application writes a lot of data to a file and then abruptly crashes.
Is the data remaining in the cache system flushed to the disk, or discarded (and lost)?
Complicating this problem is that there are multiple 'caches' in play.
C++ streams have their own internal buffering mechanism. Streams don't ask the OS to write to disk until either (a) you've sent enough data into the buffer that the streams library thinks the write wouldn't be wasted, (b) you ask for a flush specifically, or (c) you send std::endl, which writes a newline and then flushes. Any data in these buffers is lost when the program crashes.
The OS will buffer writes to make best use of the limited amount of disk I/O available. Writes will typically be flushed within five to thirty seconds; sooner if the programmer (or a library) calls fdatasync(2), fsync(2), or sync(2) (which asks for all dirty data to be flushed). Any data in the OS buffers is written to disk (eventually) when the program crashes, but lost if the kernel crashes.
The hard drive will buffer writes to try to make the best use of its slow head, rotational latency, etc. Data arrives in this buffer when the OS flushes its caches. Data in these buffers is written to disk when the program crashes, will probably be written to disk if the kernel crashes, and might be written to disk if the power is suddenly removed from the drive. (Some drives have enough power to continue writing their buffers; typically this would take less than a second anyway.)
The stuff in the library buffer (which you flush with std::flush or such) is lost, the data in the OS kernel buffers (which you can flush e.g. with fsync()) is not lost unless the OS itself crashes.
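For data that must survive a crash, the two flush steps can be chained. This is a sketch under a few assumptions: it uses C stdio rather than std::ofstream (a std::ofstream does not expose its file descriptor portably), it assumes a POSIX system, and append_durably is a made-up name. Note also that on some systems fsync() does not force the drive's own cache to be written out (macOS documents a separate F_FULLFSYNC fcntl for that).

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <unistd.h>

// Appends one line and pushes it through both buffer layers:
// fflush() moves it from the stdio buffer into the kernel,
// fsync() asks the kernel to write it to the storage device.
bool append_durably(std::FILE* f, const char* line) {
    const std::size_t len = std::strlen(line);
    if (std::fwrite(line, 1, len, f) != len) return false;
    if (std::fflush(f) != 0) return false;
    return ::fsync(::fileno(f)) == 0;
}
```

Doing this after every log line is slow, since each call waits for the device, so it is usually reserved for records that really must not be lost.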
I am writing some binary data to a binary file with fwrite, and once I am done writing I read the same data back with fread. While doing this I found that fwrite takes less time to write all the data than fread takes to read it.
So I just want to know: does fwrite always take less time than fread, or is there some issue with my reading code?
Although, as others have said, there are no guarantees, you'll typically find that a single write will be faster than a single read. The write will be likely to copy the data into a buffer and return straight away, while the read will be likely to wait for the data to be fetched from the storage device. Sometimes the write will be slow if the buffers fill up; sometimes the read will be fast if the data has already been fetched. And sometimes one of the many layers of abstraction between fread/fwrite and the storage hardware will decide to go off into its own little world for no apparent reason.
The C++ language makes no guarantees on the comparative performance of these (or any other) functions. It is all down to the combination of hardware and operating system, the load on the machine and the phase of the moon.
These functions interact with the operating system's file system cache. In many cases it is a simple memory-to-memory copy. Write could indeed be marginally faster if you run your program repeatedly. It just needs to find a hole in the cache to dump its data. Flushing that data to the disk happens at a time you can't see or measure.
More work is usually needed to read. At a minimum it needs to traverse the cache structure to discover whether the disk data is already cached. If not, it is going to have to block on a disk driver request to retrieve the data from the disk, which takes many milliseconds.
The standard trap with profiling this behavior is taking measurements from repeated runs of your program. They are not at all representative of the way your program is going to behave in the wild. The odds that the disk data is already cached are very good on the second run of your program. They are very poor in real life, where reads are likely to be very slow, especially the first one. An extra special trap exists for writes: at some point (depending on the behavior of other programs too), the cache is not going to be able to buffer the write request. Write performance is then going to fall off a cliff as your program gets blocked until enough data is flushed to the disk.
Long story short: don't ever assume disk read/write performance measurements are representative for how your program will behave in production. And perhaps more to the point: there isn't anything you can do to solve disk I/O perf problems in your code.
You are seeing some effect of the buffer/cache systems, as others have said. However, since you said you're using fread/fwrite, you could look at an asynchronous API such as aio_read/aio_write and experiment with other I/O methods that are likely better optimized for what you're doing.
One suggestion: if you are reading, updating, writing and re-reading a file a lot, you could, by way of an ioctl or DeviceIoControl, ask the OS for the geometry of the disk your code is running on, then determine the size of a disk cylinder so you can work out whether your read/write operations can be buffered inside a single cylinder. That way the drive head will not move for your reads and writes, saving you a fair amount of run time.
My problem is this: I have a C/C++ app that runs under Linux, and this app receives a constant-rate high-bandwidth (~27MB/sec) stream of data that it needs to stream to a file (or files). The computer it runs on is a quad-core 2GHz Xeon running Linux. The filesystem is ext4, and the disk is a solid state E-SATA drive which should be plenty fast for this purpose.
The problem is Linux's too-clever buffering behavior. Specifically, instead of writing the data to disk immediately, or soon after I call write(), Linux will store the "written" data in RAM, and then at some later time (I suspect when the 2GB of RAM starts to get full) it will suddenly try to write out several hundred megabytes of cached data to the disk, all at once. The problem is that this cache-flush is large, and holds off the data-acquisition code for a significant period of time, causing some of the current incoming data to be lost.
My question is: is there any reasonable way to "tune" Linux's caching behavior, so that either it doesn't cache the outgoing data at all, or if it must cache, it caches only a smaller amount at a time, thus smoothing out the bandwidth usage of the drive and improving the performance of the code?
I'm aware of O_DIRECT, and will use that if I have to, but it does place some behavioral restrictions on the program (e.g. buffers must be aligned and a multiple of the disk sector size, etc.) that I'd rather avoid if I can.
You can use the posix_fadvise() with the POSIX_FADV_DONTNEED advice (possibly combined with calls to fdatasync()) to make the system flush the data and evict it from the cache.
See this article for a practical example.
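A minimal sketch of that combination, assuming Linux and a file descriptor opened for writing; write_and_drop_cache is a made-up name, and this is not taken from the linked article. It writes one chunk, forces it out with fdatasync(), and only then advises the kernel that the cached pages are no longer needed.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Writes one chunk, forces it to the device, then tells the kernel that the
// cached pages for this file are no longer needed.
bool write_and_drop_cache(int fd, const char* data, std::size_t len) {
    std::size_t done = 0;
    while (done < len) {
        const ssize_t n = ::write(fd, data + done, len - done);
        if (n <= 0) return false;  // error (or no progress): give up
        done += static_cast<std::size_t>(n);
    }
    if (::fdatasync(fd) != 0) return false;
    // Offset 0 with length 0 means "the whole file".
    return ::posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0;
}
```

Calling this once per chunk of a few megabytes keeps the amount of dirty data small, at the cost of blocking the calling thread during each fdatasync(), so it fits best on a dedicated writer thread.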
If you have latency requirements that the OS cache can't meet on its own (the default IO scheduler is usually optimized for bandwidth, not latency), you are probably going to have to manage your own memory buffering. Are you writing out the incoming data immediately? If you are, I'd suggest dropping that architecture and going with something like a ring buffer, where one thread (or multiplexed I/O handler) is writing from one side of the buffer while the reads are being copied into the other side.
At some size, this will be large enough to handle the latency required by a pessimal OS cache flush. Or not, in which case you're actually bandwidth limited and no amount of software tuning will help you until you get faster storage.
You can adjust the page cache settings in /proc/sys/vm, (see /proc/sys/vm/dirty_ratio, /proc/sys/vm/swappiness specifically) to tune the page cache to your liking.
If we are talking about std::fstream (or any C++ stream object)
You can specify your own buffer using:
streambuf* ios::rdbuf ( streambuf* streambuffer);
By defining your own buffer you can customize the behavior of the stream.
Alternatively you can always flush the buffer manually at pre-set intervals.
Note: there is a reason for having a buffer. It is quicker than writing to the disk directly (say, every 10 bytes). There is very little reason to write to a disk in chunks smaller than the disk block size. If you write too frequently, the disk controller will become your bottleneck.
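One simple way to customize the buffering without writing a new streambuf class is to hand the file stream a larger buffer through pubsetbuf(). The sketch below assumes the call is made before open(), since some implementations ignore it afterwards; the function name, the 10-byte record and the sizes are arbitrary.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes `count` short records through a stream that uses a caller-supplied
// buffer of `buffer_size` bytes. Returns true if every write succeeded.
bool write_records_buffered(const std::string& path, std::size_t count,
                            std::size_t buffer_size) {
    std::vector<char> buffer(buffer_size);  // declared first so it outlives the stream
    std::ofstream out;
    // Install the buffer before open(); a pubsetbuf() call made after the
    // file is opened may have no effect on some implementations.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path, std::ios::binary | std::ios::trunc);
    for (std::size_t i = 0; i < count && out; ++i)
        out << "0123456789";  // 10-byte record
    out.flush();
    return static_cast<bool>(out);
}
```

With a 64 KiB buffer, a thousand 10-byte records are handed to the OS in one or two large writes rather than many small ones.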
But I have an issue with your design, in which the write process, running on the same thread, has to block the read process.
While the data is being written there is no reason why another thread cannot continue to read data from your stream (you may need some fancy footwork to make sure they are reading from and writing to different areas of the buffer). I don't see any real issue with this, as the I/O system will go off and do its work asynchronously (potentially stalling your write thread, depending on your use of the I/O system, but not necessarily your application).
I know this question is old, but we know a few things now we didn't know when this question was first asked.
Part of the problem is that the default values for /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio are not appropriate for newer machines with lots of memory. Linux begins flushing in the background when dirty_background_ratio is reached, and blocks processes that are writing when dirty_ratio is reached. Lower dirty_background_ratio to start flushing sooner, and raise dirty_ratio to start blocking I/O later. On very large memory systems (32GB or more) you may even want to use dirty_bytes and dirty_background_bytes, since the minimum increment of 1% for the _ratio settings is too coarse. Read https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/ for a more detailed explanation.
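Before changing these settings it helps to see what the machine is currently using. The sketch below only reads a value; read_int_setting is a made-up helper that parses the single integer these files contain. Changing the values is normally done as root with sysctl or by writing to the same files, and is not shown here.

```cpp
#include <fstream>
#include <string>

// Reads a single integer from a one-line text file such as
// /proc/sys/vm/dirty_ratio. Returns false if the file is missing or does
// not start with a number; `value` is left untouched in that case.
bool read_int_setting(const std::string& path, long& value) {
    std::ifstream in(path);
    long parsed = 0;
    if (!(in >> parsed)) return false;
    value = parsed;
    return true;
}
```

On a Linux machine, calling it with "/proc/sys/vm/dirty_ratio" and "/proc/sys/vm/dirty_background_ratio" gives the two thresholds discussed above as percentages.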
Also, if you know you won't need to read the data again, call posix_fadvise with POSIX_FADV_DONTNEED to ensure cache pages can be reused sooner. This has to be done after Linux has flushed the page to disk, otherwise the flush will move the page back to the active list (effectively negating the effect of the fadvise call).
To ensure you can still read incoming data in the cases where Linux does block on the call to write(), do file writing in a different thread than the one where you are reading.
Well, try this ten pound hammer solution that might prove useful to see if i/o system caching contributes to the problem: every 100 MB or so, call sync().
You could use a multithreaded approach: have one thread simply read data packets and add them to a FIFO, and have the other thread remove packets from the FIFO and write them to disk. This way, even if the write to disk stalls, the program can continue to read incoming data and buffer it in RAM.
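A minimal sketch of that two-thread arrangement, using only the C++ standard library. PacketQueue and stream_to_file are made-up names, the producer loop just fabricates packets in place of the real acquisition code, and the queue is unbounded for brevity; a real application would cap its size and decide what to do when the cap is reached.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Unbounded FIFO of packets shared between a producer and a writer thread.
class PacketQueue {
public:
    void push(std::vector<char> packet) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(packet));
        }
        cv_.notify_one();
    }
    // Signals that no more packets will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a packet is available; returns false once closed and empty.
    bool pop(std::vector<char>& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::vector<char>> q_;
    bool closed_ = false;
};

// The calling thread produces packets while a second thread drains the
// queue to disk. Returns the number of bytes written.
std::size_t stream_to_file(const std::string& path, std::size_t packets,
                           std::size_t packet_size) {
    PacketQueue queue;
    std::size_t written = 0;  // touched only by the writer until join()
    std::thread writer([&] {
        std::FILE* f = std::fopen(path.c_str(), "wb");
        std::vector<char> packet;
        while (queue.pop(packet)) {
            if (f) written += std::fwrite(packet.data(), 1, packet.size(), f);
        }
        if (f) std::fclose(f);
    });
    for (std::size_t i = 0; i < packets; ++i)
        queue.push(std::vector<char>(packet_size, 'x'));  // stand-in for received data
    queue.close();
    writer.join();
    return written;
}
```

If the disk stalls, only the writer thread blocks inside fwrite(); the producer keeps pushing packets, and the queue grows in RAM until the writer catches up.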