When does output buffer flush? - c++

Apart from manually calling flush, what is the condition that cout or STDOUT(printf) would flush?
Exiting the current scope or current function? Is it timed? Flush when the buffer is full (and how big is the buffer)?

For <stdio.h> streams you can set the buffering mode using setvbuf(). It takes three buffering modes:
_IOFBF: the buffer is flushed when it is full or when a flush is explicitly requested.
_IOLBF: the buffer is flushed when a newline is found, the buffer is full, or a flush is requested.
_IONBF: the stream is unbuffered, i.e., output is sent as soon as available.
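I had the impression that the default setup for stdout is _IOLBF, for stderr it is _IONBF, and for other streams it is _IOFBF. The C standard does not pin the defaults down that precisely: it only says that, as initially opened, stderr is not fully buffered, and that stdin and stdout are fully buffered if and only if the stream can be determined not to refer to an interactive device.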
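To see the _IOFBF behaviour described above, here is a minimal sketch. The helper names (visible_size, full_buffering_demo, FlushDemo) and the file name you pass in are made up for illustration, and it assumes the file can be opened a second time for reading while it is still open for writing. The "before" value is implementation-dependent (typically 0), the "after" value is not.

```cpp
#include <cstdio>

// Size of the file at `path` as seen through a second, independent handle,
// or -1 if it cannot be opened.
long visible_size(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;
    std::fseek(f, 0, SEEK_END);
    long n = std::ftell(f);
    std::fclose(f);
    return n;
}

struct FlushDemo {
    long before_flush;  // bytes visible before fflush (typically 0)
    long after_flush;   // bytes visible after fflush
};

FlushDemo full_buffering_demo(const char* path) {
    static char buf[BUFSIZ];  // must stay alive as long as the stream uses it
    FlushDemo r{-1, -1};
    std::FILE* out = std::fopen(path, "wb");
    if (!out) return r;
    // setvbuf has to be called before any other operation on the stream.
    std::setvbuf(out, buf, _IOFBF, sizeof buf);
    std::fputs("hello\n", out);            // 6 bytes, far less than BUFSIZ
    r.before_flush = visible_size(path);   // data is usually still in buf
    std::fflush(out);                      // explicit flush request
    r.after_flush = visible_size(path);    // now all 6 bytes are in the file
    std::fclose(out);
    std::remove(path);
    return r;
}
```

Calling full_buffering_demo("setvbuf_demo.tmp") and printing both fields should show the second one equal to 6; the first one depends on whether the library honoured the buffering request.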
For the standard C++ stream objects there is no equivalent to _IOLBF: if you want line buffering you'd use std::endl or, preferably, '\n' and std::flush. There are a few setups for std::ostream, though:
You can generally use buf.pubsetbuf(0, 0) to make a stream unbuffered. Since stream buffers can be implemented by users, it isn't guaranteed that the corresponding call to set the buffer is honored, though.
You can set std::ios_base::unitbuf, which causes the stream to be flushed after each [properly implemented] output operation. By default std::ios_base::unitbuf is only set for std::cerr.
The normal setup for an std::ostream is to flush the buffer when the buffer is full or when a flush is explicitly requested. Unfortunately, std::endl makes an explicit request to flush the buffer, which causes performance problems in many cases because it tends to be used as a surrogate for '\n', which it is not.
An interesting one is the ability to tie() an output stream to an input stream: if in.tie() returns a pointer to an std::ostream, this output stream will be flushed prior to an attempt to read from in (assuming correctly implemented input operators, of course). By default, std::cout is tie()d to std::cin.
Nearly forgot an important one: if std::ios_base::sync_with_stdio() wasn't called with false, the standard C++ streams (std::cin, std::cout, std::cerr and std::clog and their wchar_t counterparts) are probably entirely unbuffered! With the default setting of std::ios_base::sync_with_stdio(true) the standard C and C++ streams can be used in a mixed way. However, since the C library is generally oblivious of the C++ library, this means that the C++ standard stream objects can't do any buffering of their own. Leaving std::ios_base::sync_with_stdio(true) in effect is the major performance problem with the standard C++ stream objects!
Neither in C nor in C++ can you really control the size of the buffers: requests to set a non-zero buffer are allowed to be ignored and normally will be ignored. That is, the stream will pretty much be flushed at somewhat random places.
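The defaults mentioned above (unitbuf on std::cerr only, std::cout tied to std::cin) can be checked directly. This is only a small sketch; the function names are invented for illustration, and the checks assume nothing in the program has changed the flags or the tie beforehand.

```cpp
#include <iostream>

// True if std::cerr is flushed after every output operation (unitbuf set).
bool cerr_flushes_every_write() {
    return (std::cerr.flags() & std::ios_base::unitbuf) != 0;
}

// True if std::cout has unitbuf set; by default it does not.
bool cout_flushes_every_write() {
    return (std::cout.flags() & std::ios_base::unitbuf) != 0;
}

// True if reading from std::cin flushes std::cout first.
bool cin_is_tied_to_cout() {
    return std::cin.tie() == &std::cout;
}

// Opt a stream into flush-after-every-write behaviour.
void flush_after_every_write(std::ostream& os) {
    os << std::unitbuf;
}
```

With an untouched program state, the first and third functions return true and the second returns false.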

Related

how endl mainly affects fully buffered streams?

https://www.cplusplus.com/doc/tutorial/basic_io/
On the page linked above, just before the cin heading, it is stated that
The endl manipulator produces a newline character, exactly as the insertion of '\n' does; but it also has an additional behavior: the stream's buffer (if any) is flushed, which means that the output is requested to be physically written to the device, if it wasn't already. This affects mainly fully buffered streams, and cout is (generally) not a fully buffered stream.
My questions are, why endl mainly affects the fully buffered streams, and how cout is not a fully buffered stream?
There are three main buffering strategies used for output streams:
No buffering - Every write to the stream is immediately written to the underlying output device.
Line buffering - Writes to the stream are stored in memory until a newline character is written or the buffer is full, at which point the buffer is flushed to the underlying output device.
Full buffering - Writes to the stream are stored in memory until the stream's internal buffer is full, at which point the buffer is flushed to the underlying output device.
why endl mainly affects the fully buffered streams
This should be fairly apparent from the descriptions above. If the stream is unbuffered then std::endl doesn't do any extra work; there is no buffer to flush. If the stream is line buffered, then writing a newline will flush the buffer anyway, so std::endl doesn't do anything extra. Only for a fully buffered stream does std::endl do any extra work.
how cout is not a fully buffered stream?
The C++ language doesn't specify the buffering strategy used for std::cout, but most implementations use either no buffering or line buffering when the program's standard output stream is hooked up to a terminal. If stdout is redirected to something else, like a file, many implementations will switch to using a fully buffered stream for std::cout.
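One way to see the extra work std::endl does is to count flush requests on a stream whose buffer you control. The sketch below uses a hypothetical CountingBuf (a std::stringbuf that counts calls to sync()) rather than std::cout, because the buffering of std::cout is implementation-specific.

```cpp
#include <ostream>
#include <sstream>

// A string buffer that counts how often a flush is requested.
// std::ostream::flush() ends up calling sync() on the stream buffer.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` lines and returns the number of flush requests seen.
int flushes_for(bool use_endl, int lines) {
    CountingBuf buf;
    std::ostream os(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl)
            os << i << std::endl;  // newline plus explicit flush
        else
            os << i << '\n';       // newline only
    }
    return buf.flushes;
}
```

flushes_for(true, 1000) reports one flush per line, while flushes_for(false, 1000) reports none; on a real file or pipe each of those flushes would usually cost a system call.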

why we need to close a file to complete the writing process of a file? [duplicate]

I understand cout << '\n' is preferred over cout << endl; but cout << '\n' doesn't flush the output stream. When should the output stream be flushed and when is it an issue?
What exactly is flushing?
Flushing forces an output stream to write any buffered characters. Read streamed input/output.
It depends on your application. In real-time or interactive applications you need to flush immediately, but in many cases you can wait until the file is closed and let the program flush it automatically.
When must the output stream in C++ be flushed?
When you want to be sure that data written to it is visible to other programs or (in the case of file streams) to other streams reading the same file which aren't tied to this one; and when you want to be certain that the output is written even if the program terminates abnormally.
So you would want to do this when printing a message before a lengthy computation, or for printing a message to indicate that something's wrong (although you'd usually use cerr for that, which is automatically flushed after each output).
There's usually no need to flush cerr (which, by default, has its unitbuf flag set to flush after each output), or to flush cout before reading from cin (these streams are tied so that cout is flushed automatically before reading cin).
If the purpose of your program is to produce large amounts of output, either to cout or to a file, then don't flush after each line - that could slow it down significantly.
What exactly is flushing?
Output streams contain memory buffers, which are typically much faster to write to than the underlying output. Output operations put data into the buffer; flushing sends it to the final output.
First, you read wrong. Whether you use std::endl or '\n'
depends largely on context, but when in doubt, std::endl is
the normal default. Using '\n' is reserved to cases where
you know in advance that the flush isn't necessary, and that it
will be too costly.
Flushing is involved with buffering. When you write to
a stream, (typically) the data isn't written immediately to the
system; it is simply copied into a buffer, which will be written
when it is full, or when the file is closed. Or when it is
explicitly flushed. This is for performance reasons: a system
call is often a fairly expensive operation, and it's generally
not a good idea to do it for every character. Historically,
C had something called line buffered mode, which flushed with
every '\n', and it turns out that this is a good compromise
for most things. For various technical reasons, C++ doesn't
have it; using std::endl is C++'s way of achieving the same
results.
My recommendation would be to just use std::endl until you
start having performance problems. If nothing else, it makes
debugging simpler. If you want to go further, it makes sense to
use '\n' when you're outputting a series of lines in just
a few statements. And there are special cases, like logging,
where you may want to explicitly control the flushing.
Flushing can be disastrous for performance if you are writing a large file and flush very frequently.
For example
for (int i = 0; i < LARGENUMBER; i++)
{ // Slow? std::endl flushes on every line
    auto point = xyz[i];
    cout << point.x << "," << point.y << endl;
}
vs
for (int i = 0; i < LARGENUMBER; i++)
{ // faster: "\n" does not force a flush
    auto point = xyz[i];
    cout << point.x << "," << point.y << "\n";
}
vs
for (int i = 0; i < LARGENUMBER; i++)
{ // fastest? (assumes point.x and point.y are int)
    auto point = xyz[i];
    printf("%i,%i\n", point.x, point.y);
}
std::endl was also often known for doing other things, for example synchronizing threads in a so-called debug mode on MSVC, resulting in multithreaded programs that, contrary to expectation, printed uninterrupted phrases from different threads.
I/O libraries buffer data sent to a stream for performance reasons. Whenever you need to be sure the data has actually been sent to the stream's destination, you need to flush it (otherwise it may still be in the buffer and not visible on screen or in the file).
Some operations automatically flush streams, but you can also explicitly call something like ostream::flush.
You need to be sure the data is flushed whenever, for example, another program is waiting for input from the first program.
It depends on what you are doing. For example, if you are using the console to warn the user about a long process... printing a series of dots in the same line... flushing can be interesting. For normal output, line per line, you should not care about flushing.
So, for char based output or non line based console output, flushing can be necessary. For line based output, it works as expected.
This other answer can clarify your question, based on why avoiding endl and flushing manually may be good for performance reasons:
mixing cout and printf for faster output
Regarding what flushing is: when you write to a buffered stream, like an ostream, you have no guarantee that your data has arrived at the destination device (console, file, etc.). This happens because the stream can use intermediary buffers to hold your data so that your program is not stalled. Usually, if your buffers are big enough, they will hold all the data and your program won't stall on a slow I/O device. You may have already noticed that the console is very slow. The flush operation tells the stream that you want all intermediary data pushed toward the destination device, or at least that the stream's own buffers be emptied. It is very important for log files, for example, where you want to be reasonably (not 100%) sure that a line is on disk and not in a buffer somewhere. This becomes more important if your program can't lose data, i.e., if it crashes, you want to be sure you did your best to write your data to disk. For other applications, performance is more important and you can let the OS decide when to flush buffers for you, or wait until you close the stream, for example.

Speeding printf and cout in windows

On Windows, cout and printf are really slow, so when a lot of data is printed they slow the application down (this happens with code that runs for days and prints to check that everything is working well).
One method to make output faster is to use a buffer, by writing the following code at the beginning of the main() function:
#ifndef __linux__ // Put this at the beginning of main() to greatly speed up cout on Windows.
    // static: the buffer must outlive the stream, which is still flushed after main() returns.
    static char buffer_setvbuf[1024];
    setvbuf(stdout, buffer_setvbuf, _IOFBF, sizeof buffer_setvbuf);
    // Side effect: output printed inside a function may not appear until
    // something is printed from main() or the buffer fills up.
#endif
But unfortunately a side effect is that sometimes it does not print the data because the buffer is not full.
Then the questions:
1. How to force print: by making \n?
2. How to disable the buffer?
printf
I see you are trying to use a larger in-memory buffer to reduce the number of writes to stdout. Indeed, your code would not print anything until the buffer becomes full, because the buffering mode is set to _IOFBF (i.e. full buffering). Since you want to control when the flush happens, there are two ways to go about it.
Use _IOLBF (i.e. line buffering), and write a newline character whenever you want to flush. Note that Microsoft's C runtime documents _IOLBF as behaving the same as _IOFBF, so on Windows this option may not help.
Call fflush(stdout) to manually flush the buffer.
std::cout
I think std::cout should be preferred when writing C++ code, because of its ease of use. One thing that might slow down the I/O process is synchronization between iostream and stdio. As far as I know, the default on many systems is to keep the two in sync, and it has some overhead. You can disable it by calling std::ios_base::sync_with_stdio(false).
When you need to flush output, you can use what is called "manipulators" for output stream - namely std::flush and std::endl. When those manipulators are put into an output stream like the following: std::cout << "your string" << std::endl, it is guaranteed that the output stream is flushed.
Bottom Line
Use fflush to flush stdout when using printf for output.
I recommend trying std::cout with sync off, and test if it fits your performance need.
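As a rough sketch of the two flushing options in this answer, the helpers below (report_progress and report_progress_cxx are hypothetical names) print one line and then force it out regardless of the buffering mode chosen with setvbuf. They deliberately do not touch sync_with_stdio, so they are safe to mix in one program.

```cpp
#include <cstdio>
#include <iostream>

// printf route: write the line, then flush stdout explicitly.
// Returns 0 on success, like std::fflush.
int report_progress(int step) {
    std::printf("finished step %d\n", step);
    return std::fflush(stdout);
}

// iostream route: '\n' for the newline, std::flush for the explicit flush.
// Returns true if the stream is still in a good state afterwards.
bool report_progress_cxx(int step) {
    std::cout << "finished step " << step << '\n' << std::flush;
    return static_cast<bool>(std::cout);
}
```

Calling one of these only at points where the output really has to be visible (for example once per processing stage, not once per line) keeps most of the speed gained from the large buffer.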

What's the difference between read() and getc()

I have two code segments:
while ((n = read(0, buf, BUFFSIZE)) > 0)
    if (write(1, buf, n) != n)
        err_sys("write error");
while ((c = getc(stdin)) != EOF)
    if (putc(c, stdout) == EOF)
        err_sys("write error");
Some statements on the internet confuse me. I know that standard I/O does buffering automatically, but I have passed a buf to read(), so read() is also doing buffering, right? And it seems that getc() reads data char by char, so how much data will the buffer hold before all the data is sent out?
Thanks
While both functions can be used to read from a file, they are very different. First of all, on many systems read is a lower-level function, and may even be a system call directly into the OS. The read function also isn't standard C or C++; it's part of e.g. POSIX. It can also read arbitrarily sized blocks, not only one byte at a time. There's no buffering (except maybe at the OS/kernel level), and it doesn't distinguish between "binary" and "text" data. And on POSIX systems, where read is a system call, it can be used to read from all kinds of devices and not only files.
The getc function is a higher-level function. It usually uses buffered input (so input is read in blocks into a buffer, sometimes by using read, and the getc function gets its characters from that buffer). It also only returns a single character at a time. It's also part of the C and C++ specifications as part of the standard library. Also, there may be conversions between the data read and the data returned by the function, depending on whether the file was opened in text or binary mode.
Another difference is that read is also always a function, while getc might be a preprocessor macro.
Comparing read and getc doesn't really make much sense, more sense would be comparing read with fread.
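To make that comparison concrete, here is a small sketch of the same copy loop written twice with standard I/O only: once block-wise with fread/fwrite and once character-wise with getc/putc. The function names are made up for illustration. Both go through the stdio buffer, so the per-character version does not mean one system call per character.

```cpp
#include <cstddef>
#include <cstdio>

// Block-wise copy: fread fills a caller-supplied array, much like read()
// does, but still goes through the stream's own buffer.
std::size_t copy_blocks(std::FILE* in, std::FILE* out) {
    char buf[4096];
    std::size_t total = 0;
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) {
        if (std::fwrite(buf, 1, n, out) != n) break;  // write error
        total += n;
    }
    return total;
}

// Character-wise copy: each getc normally just takes the next byte from
// the stream's buffer, which is refilled in blocks when it runs empty.
std::size_t copy_chars(std::FILE* in, std::FILE* out) {
    std::size_t total = 0;
    int c;  // int rather than char, so EOF can be told apart from data
    while ((c = std::getc(in)) != EOF) {
        if (std::putc(c, out) == EOF) break;  // write error
        ++total;
    }
    return total;
}
```

Both functions return the number of bytes copied; for the same input they should agree.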

Will fseek function flush data in the buffer in C++?

We know that calls to functions like fprintf or fwrite will not write data to the disk immediately; instead, the data will be buffered until a threshold is reached. My question is, if I call the fseek function, will the buffered data be written to disk before seeking to the new position? Or does the data stay in the buffer and get written at the new position?
I'm not aware if the buffer is guaranteed to be flushed, it may not if you seek to a position close enough. However there is no way that the buffered data will be written to the new position. The buffering is just an optimization, and as such it has to be transparent.
Yes; fseek() ensures that the file will look like it should according to the fwrite() operations you've performed.
The C standard, ISO/IEC 9899:1999 §7.19.9.2 fseek(), says:
The fseek function sets the file position indicator for the stream pointed to by stream.
If a read or write error occurs, the error indicator for the stream is set and fseek fails.
I don't believe that it's specified that the data must be flushed on a fseek but when the data is actually written to disk it must be written at that position that the stream was at when the write function was called. Even if the data is still buffered, that buffer can't be written to a different part of the file when it is flushed even if there has been a subsequent seek.
It seems that your real concern is whether previously-written (but not yet flushed) data would end up in the wrong place in the file if you do an fseek.
No, that won't happen. It'll behave as you'd expect.
I have vague memories of a requirement that you call fflush before
fseek, but I don't have my copy of the C standard available to verify.
(If you don't it would be undefined behavior or implementation defined,
or something like that.) The common Unix standard specifies that:
If the most recent operation, other than ftell(), on a given stream is
fflush(), the file offset in the underlying open file description
shall be adjusted to reflect the location specified by fseek().
[...]
If the stream is writable and buffered data had not been written to
the underlying file, fseek() shall cause the unwritten data to be
written to the file and shall mark the st_ctime and st_mtime fields of
the file for update.
This is marked as an extension to the ISO C standard, however, so you can't count on it except on Unix platforms (or other platforms which make similar guarantees).
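The behaviour the answers agree on (buffered data ends up where it was written, not at the new position) can be checked with a short experiment. This is only a sketch; seek_demo is a hypothetical name and it relies on std::tmpfile() being able to create a temporary file.

```cpp
#include <cstdio>
#include <string>

// Writes "AAAA", seeks back to the start, overwrites the first byte with
// 'B', then reads the whole file back and returns its contents.
std::string seek_demo() {
    std::FILE* f = std::tmpfile();  // opened for update ("wb+")
    if (!f) return "";
    std::fputs("AAAA", f);       // most likely still sitting in the buffer
    std::fseek(f, 0, SEEK_SET);  // reposition before anything was flushed
    std::fputc('B', f);          // written at offset 0
    std::rewind(f);              // reposition again before reading
    char buf[16] = {};
    std::size_t n = std::fread(buf, 1, sizeof buf - 1, f);
    std::fclose(f);
    return std::string(buf, n);
}
```

The result is "BAAA": the four buffered bytes landed at offsets 0 to 3 and only the first one was overwritten after the seek, whether or not the library flushed inside fseek.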