I don't understand buffered output and cout - c++

I've got a really simple program that prints lines using cout and sleeps after each line. All is well and good for about 7 iterations, as the buffer is clearly not flushed at any point. After then, what I assume is only part of the buffer is flushed on every iteration.
I have a few questions about this behaviour:
If the buffer is supposedly big enough to fit ~7 lines of output, why is the buffer flushed one line at a time?
If this buffer is indeed flushed in this way, what is the advantage to this? Why isn't the whole buffer flushed at once?
Is it just a coincidence that the exact same number of characters are flushed to the output as my line length, or is the cout buffer internally flushed based on end-of-line delimiters such as '\n'?
int main(){
for(int i = 0; i < 100; ++i){
std::cout << "This is line " << i << '\n';
Sleep(1000);
}
return 0;
}

You seem to assume that the buffer will not be written until it is full. Probably what happens is that an asynchronous write is started with as little as one output byte. The empty buffer space is used to receive characters while the asynchronous write is in progress. When the current write completes, if/when there are additional characters in the buffer, a new asynchronous write is started. The process would only need to block on writing if the buffer got full.

Related

Why reading from a stream does not require a buffer flush

Just started C++ learning by C++ Primer 5th ed.
The very first example of the book on page 6 is as followed
#include <iostream>
int main()
{
std::cout << "Enter two numbers:" << std::endl;
int v1 = 0, v2 = 0;
std::cin >> v1 >> v2;
std::cout << "The sum of " << v1 << " and " << v2
<< " is " << v1 + v2 << std::endl;
return 0;
}
The manipulator endl insert newline and flush the buffer.
soon followed by the code snipe in the page 7, the author emphasized that
Programmers often add print statements during debugging. Such
statements should always flush the stream. Otherwise, if the program
crashes, output may be left in the buffer, leading to incorrect
inference about where the program crashed
From the code example and the emphasized warning I feel it is important to do the flush when writing to a Stream
Here is the part I don't understand, how a bout reading from a stream case, such as std::cin, is there any necessary to do the flush then?
Appended Question:
#include <iostream>
int main()
{
int sum = 0, val = 1;
while (val <= 5) {
sum += val;
++val;
std::cout << sum << std::endl; //test
}
}
When I changed the line marked with test to std::cout << sum << '\n';, there is no visual difference in the console. Why is that, isn't it supposed to print as follows if there is no flush for each loop?
1
1 3
1 3 6
1 3 6 10
1 3 6 10 15
Thanks
Even if #Klitos Kyriacou is right when he says you should create a new post for a new question, it seems to me that both your questions arise from the same misunderstanding.
Programmers often add print statements during debugging. Such statements should always flush the stream. Otherwise, if the program crashes, output may be left in the buffer, leading to incorrect inference about where the program crashed
This quote does not mean that you need to flush every buffer in your program to create any output on the console.
By flushing the buffers you can make sure that the output is printed before the next line of code is executed.
If you don't flush the buffers and your program finishes, the buffers will be flushed anyway.
So, the reason you see the same output on the console with std::endl and \n is, that exactly the same text is printed to the console. In the former case, the output might be there slightly earlier, as the buffers are flushed early. In the latter case, the buffers are flushed later, but they will be flushed at some time.
What the quote talks about is the case when your program does not exit gracefully, e.g. when your program crashes or is interrupted by the OS. In these cases, your output might not be written to the console when you did not explicitly flush the buffers.
What the quote wants you to know is: Whenever you want to debug a program that crashes, you should explicitly flush the buffers to make sure your debug output is printed to the console before your program is interrupted.
Note that this might not be true for all implementations.
From http://en.cppreference.com/w/cpp/io/manip/endl
In many implementations, standard output is line-buffered, and writing '\n' causes a flush anyway, unless std::ios::sync_with_stdio(false) was executed.
Quote from the same book page 26:
Buffer A region of storage used to hold data. IO facilities often store input (or out-put) in a buffer and read or write the buffer
independently from actions in the program. Output buffers can be
explicitly flushed to force the buffer to be written. Be default,
reading cin flushes cout; cout is also flushed when the program ends
normally.

What does this part from the book C++ Primer 5ed mean (portion in description)? [duplicate]

This question already has answers here:
endl and flushing the buffer
(5 answers)
Closed 6 years ago.
std::cout << "Enter two numbers:";
std::cout << std:endl;
This code snippet is followed by two paragraphs and a warning note, among which I understood the first para, but neither the second one nor the note. The text is as follows -
"The first output operator prints a message to the user. That message
is a string literal, which is a sequence of characters enclosed in
double quotation marks. The text between the quotation marks is
printed to the standard output.
The second operator prints endl,
which is a special value called a manipulator. Writing endl has
the effect of ending the current line and flushing the buffer
associated with that device. Flushing the buffer ensures that all the
output the program has generated so far is actually written to the
output stream, rather than sitting in memory waiting to be written.
Warning Programmers often add print statements during debugging. Such statement should always flush the stream. Otherwise, if the
program crashes, output may be left in the buffer, leading to
incorrect inferences about where the program crashed."
So I didn't understand of the part of endl, nor the following warning. Can anyone please explain this to me as explicitly as possible and please try to keep it simple.
Imagine you have some code that crashes somewhere, and you don't know where. So you insert some print statements to narrow the problem down:
std::cout << "Before everything\n";
f1();
std::cout << "f1 done, now running f2\n";
f2();
std::cout << "all done\n";
Assuming that the program crashes during the evaluation of either f1() or f2(), you may not see any output, or you may see partial output that is misleading -- e.g. you could see only "Before everything", even though the crash happened in f2(). That's because the output data may be waiting in a buffer and hasn't actually been written to the output device.
The Primer's recommendation is therefore to flush each output, which you can conveniently achieve with endl:
std::cout << "Before everything" << std::endl;
f1();
std::cout << "f1 done, now running f2" << std::endl;
f2();
std::cout << "all done" << std::endl;
An alternative is to write debug output to std::cerr instead, which is not buffered by default (though you can always change the buffering of any ostream object later).
A more realistic use case is when you want to print a progress bar in a loop. Usually, a newline (\n) causes line-based output to be printed anyway, but if you want to print a single character for progress, you may not see it printed at all until after all the work is done unless you flush:
for (int i = 0; i != N; ++i)
{
if (i % 1000 == 0)
{
std::cout << '#'; // progress marger
std::cout.flush();
}
do_work();
}
std::cout << '\n';
Well, simply:
std::cout << "Hello world!";
will print "Hello world!" and will remain in the same line. Now if you want to go to a new line, you should use:
std::cout << "\n";
or
std::cout << std::endl;
Now before I explain the difference, you have to know 1 more simple thing: When you issue a print command with the std::cout stream, things are not printed immediately. They are stored in a buffer, and at some point this buffer is flushed, either when the buffer is full, or when you force it to flush.
The first kind, \n, will not flush, but the second kind std::endl, will go to a new line + flush.
Operating systems do buffered IO. That is, when your program outputs something, they dont necessarily put it immediately where it should go (i.e. disk, or the terminal), they might decide to keep the data in an internal memory buffer for some while before performing the actual IO operation on the device.
They do this to optmize performance, because doing the IO in chunks is better than doing it immediately as soon as there are a few bytes to write.
Flushing a buffer means asking the OS to perform immediately the IO operation without any more waiting. A programmer would do this this when (s)he knows that waiting for more data doesn't make sense.
The second note says that endl not only prints a newline, but also hints the cout to flush its buffer.
The 3rd note warns that debugging errors, if buffered and not flushed immediately, might not be seen if the program crashes while the error messages are still in the buffer (not flushed yet).

What does flushing the buffer mean?

I am learning C++ and I found something that I can't understand:
Output buffers can be explicitly flushed to force the buffer to be
written. By default, reading cin flushes cout; cout is also flushed
when the program ends normally.
So flushing the buffer (for example an output buffer): does this clear the buffer by deleting everything in it or does it clear the buffer by outputting everything in it? Or does flushing the buffer mean something completely different?
Consider writing to a file. This is an expensive operation. If in your code you write one byte at a time, then each write of a byte is going to be very costly. So a common way to improve performance is to store the data that you are writing in a temporary buffer. Only when there is a lot of data is the buffer written to the file. By postponing the writes, and writing a large block in one go, performance is improved.
With this in mind, flushing the buffer is the act of transferring the data from the buffer to the file.
Does this clear the buffer by deleting everything in it or does it clear the buffer by outputting everything in it?
The latter.
You've quoted the answer:
Output buffers can be explicitly flushed to force the buffer to be written.
That is, you may need to "flush" the output to cause it to be written to the underlying stream (which may be a file, or in the examples listed, a terminal).
Generally, stdout/cout is line-buffered: the output doesn't get sent to the OS until you write a newline or explicitly flush the buffer. The advantage is that something like std::cout << "Mouse moved (" << p.x << ", " << p.y << ")" << endl causes only one write to the underlying "file" instead of six, which is much better for performance. The disadvantage is that a code like:
for (int i = 0; i < 5; i++) {
std::cout << ".";
sleep(1); // or something similar
}
std::cout << "\n";
will output ..... at once (for exact sleep implementation, see this question). In such cases, you will want an additional << std::flush to ensure that the output gets displayed.
Reading cin flushes cout so you don't need an explicit flush to do this:
std::string colour;
std::cout << "Enter your favourite colour: ";
std::cin >> colour;
Clear the buffer by outputting everything.

getline while reading a file vs reading whole file and then splitting based on newline character

I want to process each line of a file on a hard-disk now. Is it better to load a file as a whole and then split on basis of newline character (using boost), or is it better to use getline()? My question is does getline() reads single line when called (resulting in multiple hard disk access) or reads whole file and gives line by line?
getline will call read() as a system call somewhere deep in the gutst of the C library. Exactly how many times it is called, and how it is called depends on the C library design. But most likely there is no distinct difference in reading a line at a time vs. the whole file, becuse the OS at the bottom layer will read (at least) one disk-block at a time, and most likely at least a "page" (4KB), if not more.
Further, unles you do nearly nothing with your string after you have read it (e.g you are writing something like "grep", so mostly just reading the to find a string), it is unlikely that the overhead of reading a line at a time is a large part of the time you spend.
But the "load the whole file in one go" has several, distinct, problems:
You don't start processing until you have read the whole file.
You need enough memory to read the entire file into memory - what if the file is a few hundred GB in size? Does your program fail then?
Don't try to optimise something unless you have used profiling to prove that it's part of why your code is running slow. You are just causing more problems for yourself.
Edit: So, I wrote a program to measure this, since I think it's quite interesting.
And the results are definitely interesting - to make the comparison fair, I created three large files of 1297984192 bytes each (by copying all source files in a directory with about a dozen different source files, then copying this file several times over to "multiply" it, until it took over 1.5 seconds to run the test, which is how long I think you need to run things to make sure the timing isn't too susceptible to random "network packet came in" or some other outside influences taking time out of the process).
I also decided to measure the system and user-time by the process.
$ ./bigfile
Lines=24812608
Wallclock time for mmap is 1.98 (user:1.83 system: 0.14)
Lines=24812608
Wallclock time for getline is 2.07 (user:1.68 system: 0.389)
Lines=24812608
Wallclock time for readwhole is 2.52 (user:1.79 system: 0.723)
$ ./bigfile
Lines=24812608
Wallclock time for mmap is 1.96 (user:1.83 system: 0.12)
Lines=24812608
Wallclock time for getline is 2.07 (user:1.67 system: 0.392)
Lines=24812608
Wallclock time for readwhole is 2.48 (user:1.76 system: 0.707)
Here's the three different functions to read the file (there's some code to measure time and stuff too, of course, but for reducing the size of this post, I choose to not post all of that - and I played around with ordering to see if that made any difference, so results above are not in the same order as the functions here)
void func_readwhole(const char *name)
{
string fullname = string("bigfile_") + name;
ifstream f(fullname.c_str());
if (!f)
{
cerr << "could not open file for " << fullname << endl;
exit(1);
}
f.seekg(0, ios::end);
streampos size = f.tellg();
f.seekg(0, ios::beg);
char* buffer = new char[size];
f.read(buffer, size);
if (f.gcount() != size)
{
cerr << "Read failed ...\n";
exit(1);
}
stringstream ss;
ss.rdbuf()->pubsetbuf(buffer, size);
int lines = 0;
string str;
while(getline(ss, str))
{
lines++;
}
f.close();
cout << "Lines=" << lines << endl;
delete [] buffer;
}
void func_getline(const char *name)
{
string fullname = string("bigfile_") + name;
ifstream f(fullname.c_str());
if (!f)
{
cerr << "could not open file for " << fullname << endl;
exit(1);
}
string str;
int lines = 0;
while(getline(f, str))
{
lines++;
}
cout << "Lines=" << lines << endl;
f.close();
}
void func_mmap(const char *name)
{
char *buffer;
string fullname = string("bigfile_") + name;
int f = open(fullname.c_str(), O_RDONLY);
off_t size = lseek(f, 0, SEEK_END);
lseek(f, 0, SEEK_SET);
buffer = (char *)mmap(NULL, size, PROT_READ, MAP_PRIVATE, f, 0);
stringstream ss;
ss.rdbuf()->pubsetbuf(buffer, size);
int lines = 0;
string str;
while(getline(ss, str))
{
lines++;
}
munmap(buffer, size);
cout << "Lines=" << lines << endl;
}
The OS will read a whole block of data (depending on how the disk is formatted, typically 4-8k at a time) and do some of the buffering for you. Let the OS take care of it for you, and read the data in the way that makes sense for your program.
The fstreams are buffered reasonably. The underlying acesses to the harddisk by the OS are buffered reasonably. The hard disk itself has a reasonable buffer. You most surely will not trigger more hard disk accesses if you read the file line by line. Or character by character, for that matter.
So there is no reason to load the whole file into a big buffer and work on that buffer, because it already is in a buffer. And there often is no reason to buffer one line at a time, either. Why allocate memory to buffer something in a string that is already buffered in the ifstream? If you can, work on the stream directly, don't bother tossing everything around twice or more from one buffer to the next. Unless it supports readability and/or your profiler told you that disc access is slowing your program down significantly.
I believe the C++ idiom would be to read the file line-by-line, and create a line-based container as you read the file. Most likely the iostreams (getline) will be buffered enough that you won't notice a significant difference.
However for very large files you may get better performance by reading larger chunks of the file (not the whole file at once) and splitting internall as newlines are found.
If you want to know specifically which method is faster and by how much, you'll have to profile your code.
Its better to fetch the all data if it can be accommodated in memory because whenever you request the I/O your programmme looses the processing and put in a wait Q.
However if the file size is big then it's better to read as much data at a time which is required in processing. Because bigger read operation will take much time to complete then the small ones. cpu process switching time is much smaller then this entire file read time.
If it's a small file on disk, it's probably more efficient to read the entire file and parse it line by line vs. reading one line at a time--that would take lot's of disk access.

Strange stdout behavior in C++

I want my program to display the unix windmill while processing. There's a for loop and in every iteration theres a printf function:
printf("Fetching articles (%c)\r",q);
q is one of the characters in the windmill (-\|/) depending on the iteration number.
The problem is - it seems like in 100 iterations there are only two changes in displayed line, and every iteration takes about one second to complete.
What could be the aouse for this?
Here's the whole loop with only two possible chars for the windmill:
for (int i=0;i<numb_articles;i++) {
memset(file_path,0x0,BUFF_SIZE);
url=article_urls[i];
if (rules->print!=NO_PRINT) {
url=modify_url(url,rules->printout,rules->print);
if (url=="NULL")
continue;
}
get_page(url,file_content);
if (strcmp(rules->save.data(),"NULL")!=0)
if (!check_save(rules->save,file_content,url))
continue;
at_least_one_saved=true;
numb_articles_accepted++;
encoding_list[i]=get_encoding(file_content);
title=get_title(file_content,err_msg);
if (title=="")
continue;
title_list[i]=strdup(title.data());
filename=get_filename(title);
int count=numb_fn_found(filename_list,i,filename.data());
char *tmp = new char[10];
if (count>0) {
sprintf(tmp,"(%d)",count);
filename.insert((size_t)filename.length(),tmp);
}
filename_list[i]=strdup(filename.data());
char q;
if (i%2==0)
q='|';
else q='-';
printf("Fetching articles (%c)\r",q);
ofstream output_file;
sprintf(file_path,TMP_FILE,filename.data());
strncat(file_path,".html",5);
output_file.open(file_path);
output_file << file_content;
output_file.close();
}
Flush the output after writing each line:
printf("Fetching articles (%c)\r",q);
fflush(stdout);
Without doing this, normally stdout is buffered and only dumps its output when a newline is seen, or its internal buffer fills up.
The output (console) is buffered. That is, it only writes the output to the screen when the buffer is full or a newline is written. If you want to output characters one at a time, you will need to call fflush(stdout) explicitly to flush the output buffer.