In the book C++ Primer, chapter 1 says the following:
endl is a special value, called a manipulator, that when written to an output stream has the effect of writing a newline to the output and flushing the buffer associated with that device. By flushing the buffer, we ensure that the user will see the output written to the stream immediately.
What is meant by "flushing the buffer" here?
Output is generally buffered before it's written to the intended device. That way, when writing to slow-to-access devices (like files), the program doesn't have to access the device after every single character.
Flushing means emptying the buffer and actually writing it to the device.
C++'s iostreams are buffered. That means that when you output to an ostream, the content does not immediately go to whatever is behind the stream, e.g. stdout in the case of cout. The implementation of the stream determines when to actually send the buffered part of the stream out. This is done for efficiency: it would be very inefficient to write to a network or disk stream byte by byte, and buffering solves this problem.
This does, however, mean that when you write, say, debug messages to a log file and your program crashes, you may lose part of the data you wrote to the log file through the stream, as part of the log may still be in the stream's buffer and not yet written to the actual file. To prevent this, you need to make the stream flush its buffer, either with an explicit call to its flush() method or by using the convenience of endl.
If, however, you're just writing to a file regularly, you should use \n instead of endl, so that the stream isn't flushed unnecessarily on every line, which reduces performance.
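The difference between '\n' and std::endl can be made visible with a toy stream buffer. CountingBuf below is a hypothetical class written only for this illustration (not a library type): it collects characters in a small in-memory put area, "delivers" them to a string only when that area fills up or the stream is flushed, and counts the flushes. A minimal sketch, assuming C++11 or later:

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Toy stream buffer (hypothetical, for illustration only). Characters are
// collected in a 64-byte put area and only appended to 'delivered' (which
// stands in for the file or terminal) when that area fills up or when the
// stream is flushed. 'syncs' counts the flushes.
class CountingBuf : public std::streambuf {
public:
    CountingBuf() { setp(area_, area_ + sizeof area_); }

    std::string delivered;  // what has actually reached the "device"
    int syncs = 0;          // number of flushes seen

protected:
    // Called when the put area is full: deliver it, then store 'ch'.
    int_type overflow(int_type ch) override {
        drain();
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            sputc(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Called by flush(), and therefore by std::endl.
    int sync() override {
        ++syncs;
        drain();
        return 0;
    }

private:
    void drain() {
        delivered.append(pbase(), pptr());
        setp(area_, area_ + sizeof area_);
    }

    char area_[64];
};
```

With this buffer behind a std::ostream, writing "hi" followed by '\n' leaves both characters sitting in the put area (nothing delivered, no flush), while a following "there" << std::endl triggers exactly one sync() and delivers everything written so far.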
Note: cin and cout have a special relationship: reading from cin automatically flushes cout first. This makes sure that, e.g., a prompt you wrote to cout is actually seen by the user before the read from cin waits for input. Hence, even with cout you don't normally need endl and can use \n instead. You can create such relationships between other streams as well by tying them together with the tie() member function.
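A short sketch of that relationship, using the standard tie() member function. FlushCounter and prompt_and_read are hypothetical names made up for this example. The standard does guarantee that std::cin.tie() returns &std::cout, and an input stream whose tie() is non-null flushes that output stream before it reads.

```cpp
#include <iostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts how often it is flushed.
struct FlushCounter : std::stringbuf {
    int syncs = 0;
protected:
    int sync() override { ++syncs; return 0; }
};

// Writes a prompt without std::endl, then reads an int. If 'in' is tied to
// 'out', the read itself flushes 'out' first, so the prompt is visible
// before the program waits for input.
int prompt_and_read(std::ostream& out, std::istream& in) {
    out << "Enter a number: ";
    int value = 0;
    in >> value;
    return value;
}
```

So prompt_and_read(std::cout, std::cin) shows its prompt without any endl. For your own pair of streams, in.tie(&out) sets up the same relationship, and in.tie(nullptr) removes it.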
What is meant by "flushing the buffer" here?
std::endl causes the data in the stream's internal staging memory (its "buffer") to be "flushed" (transferred) to the operating system. The subsequent behavior depends on what type of device the stream is mapped to, but in general, flushing will give the appearance that the data has been physically transferred to the associated device. A sudden loss of power, however, might defeat the illusion.
This flushing involves some overhead (wasted time), and should therefore be minimized when execution speed is an important concern. Minimizing the overall impact of this overhead is the fundamental purpose of data buffering, but this goal can be defeated by excessive flushing.
Background information
The I/O of a computing system is typically very sophisticated and composed of multiple abstraction layers. Each such layer may introduce a certain amount of overhead. Data buffering is a way of reducing this overhead by minimizing the number of individual transactions performed between two layers of the system.
CPU/memory system-level buffering (caching): For very high activity, even the random-access-memory system of a computer can become a bottleneck. To address this, the CPU virtualizes memory accesses by providing multiple layers of hidden caches (the individual buffers of which are called cache lines). These processor caches buffer your algorithm's memory writes (pursuant to a writing policy) in order to minimize redundant accesses on the memory bus.
Application-level buffering: Although it isn't always necessary, it is not uncommon for an application to allocate chunks of memory to accumulate output data before passing it to the I/O library. This provides the fundamental benefit of allowing for random accesses (if necessary), but a significant reason for doing this is that it minimizes the overhead associated with making library calls -- which may be substantially more time-consuming than simply writing to a memory array.
I/O library buffering: The C++ IO stream library optionally manages a buffer for every open stream. This buffer is used, in particular, to limit the number of system calls to the operating system kernel because such calls tend to have some non-trivial overhead. This is the buffer which is flushed when using std::endl.
Operating system kernel and device drivers: The operating system routes the data to a specific device driver (or subsystem) based on what output device the stream is attached to. At this point, the actual behavior may vary widely depending on the nature and characteristics of that type of device. For example, when the device is a hard disk, the device driver might not initiate an immediate transfer to the device, but rather maintain its own buffer in order to further minimize redundant operations (since disks, too, are most efficiently written to in chunks). In order to explicitly flush kernel-level buffers, it may be necessary to call a system-level function such as fsync() on Linux; even closing the associated stream doesn't necessarily force such a flush.
Example output devices might include...
a terminal on the local machine
a terminal on a remote machine (via SSH or similar)
data being sent to another application via pipes or sockets
many variations of mass-storage devices and associated file-systems, which may be (again) locally attached or distributed via a network
Hardware buffers: Specific hardware may contain its own memory buffers. Hard drives, for example, typically contain a disk buffer in order to (among other things) allow the physical writes to occur without requiring the system's CPU to be engaged in the entire process.
Under many circumstances, these various buffering layers tend to be (to a certain extent) redundant -- and therefore essentially overkill. However, the buffering at each layer can provide a tremendous gain in throughput if the other layers, for whatever reason, fail to deliver optimum buffering with respect to the overhead associated with each layer.
Long story short, std::endl only addresses the buffer which is managed by the C++ IO stream library for that particular stream. After calling std::endl, the data will have been moved to kernel-level management, and what happens next with the data depends on a great many factors.
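If you do need the data to reach the device, and not just the kernel, the POSIX route mentioned above is fsync(). Below is a minimal, POSIX-only sketch that bypasses iostreams entirely; write_durably is a hypothetical helper name, and on Windows the corresponding call would be FlushFileBuffers().

```cpp
#include <fcntl.h>     // open, O_* flags (POSIX)
#include <unistd.h>    // write, fsync, close (POSIX)
#include <cerrno>
#include <cstddef>
#include <string>

// Sketch: write 'data' to 'path' and ask the kernel to push its buffers for
// this file to the storage device before returning. Returns true only if
// write(), fsync() and close() all succeed.
bool write_durably(const char* path, const std::string& data) {
    int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    const char* p = data.data();
    std::size_t left = data.size();
    while (left > 0) {                     // write() may be partial
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal; retry
            ::close(fd);
            return false;
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }

    bool ok = (::fsync(fd) == 0);          // flush kernel-level buffers
    if (::close(fd) != 0) ok = false;
    return ok;
}
```

The extra guarantee costs time on every call, so, like std::endl, it is worth reserving for the points where losing data would actually matter.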
How to avoid the overhead of std::endl
Method 1: Don't use std::endl -- use '\n' instead.
Method 2: Don't use std::endl -- use something like the following version instead...
#include <ostream>

bool debug_mode = false; // supply 'debug_mode' however you want

inline std::ostream & endl( std::ostream & os )
{
    os.put( os.widen('\n') ); // http://en.cppreference.com/w/cpp/io/manip/endl
    if ( debug_mode ) os.flush(); // flush only when debugging
    return os;
}
In this example, you provide a custom endl which can be called with-or-without invoking the internal call to flush() (which is what forces the transfer to the operating system). Enabling the flush (with the debug_mode variable) is useful for debugging scenarios where you want to be able to examine the output (for example a disk-file) when the program has terminated before cleanly closing the associated streams (which would have forced a final flush of the buffer).
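To see the effect of the flag, here is a self-contained usage sketch. The manipulator is repeated inside a hypothetical namespace app so the block compiles on its own, and FlushCounter is a hypothetical helper (a std::stringbuf that counts flushes), not part of the original answer.

```cpp
#include <ostream>
#include <sstream>

namespace app {
bool debug_mode = false;  // hypothetical flag; supply it however you want

inline std::ostream& endl(std::ostream& os) {
    os.put(os.widen('\n'));
    if (debug_mode) os.flush();  // only pay for the flush when debugging
    return os;
}
}  // namespace app

// Hypothetical helper: a string buffer that counts how often it is flushed.
struct FlushCounter : std::stringbuf {
    int syncs = 0;
protected:
    int sync() override { ++syncs; return 0; }
};
```

Writing out << "first" << app::endl with the flag off produces the newline but no flush; after setting app::debug_mode = true, the same expression also flushes.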
When using std::cout, the operands of the output operator (<<) are stored in a buffer and are not displayed on stdout (usually the terminal or command prompt) until the buffer is flushed: for example by std::endl, by a read from std::cin (which is tied to std::cout), when the buffer fills up, or when the program ends normally.
Consider this program:
#include <iostream>
#include <unistd.h>   // sleep() (POSIX)

int main(void)
{
    std::cout << "Hello, world";
    sleep(2);
    std::cout << std::endl;
    return 0;
}
On a typical implementation, nothing is displayed for 2 seconds; only then does the output appear:
Hello, world
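If you want the text to appear before the pause, flush explicitly with std::flush. The sketch below is a portable variant of the program (std::this_thread::sleep_for instead of POSIX sleep()); greet_then_pause is a hypothetical name, and the stream is a parameter only so it can be tried with a std::ostringstream as well as std::cout.

```cpp
#include <chrono>
#include <ostream>
#include <sstream>   // std::ostringstream, for trying it without a terminal
#include <thread>

// Portable variant of the program above: std::flush pushes "Hello, world"
// out before the pause instead of after it.
void greet_then_pause(std::ostream& out, std::chrono::milliseconds pause) {
    out << "Hello, world" << std::flush;  // visible right away
    std::this_thread::sleep_for(pause);
    out << '\n';                          // plain newline, no extra flush
}
```

Calling greet_then_pause(std::cout, std::chrono::seconds(2)) shows the greeting first and then waits.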
Here is one simple program that shows the effects of buffered I/O in C++.
Whatever input you provide is buffered first, and the program's variables are then filled from that buffer.
Have a look at the code below:
// program to test how buffered I/O can have unintended effects on our program
#include <iostream>

int main()
{
    int a;
    char c;
    std::cin >> a;   // with input "12d34", reads 12 and stops at 'd'
    std::cin >> c;   // takes 'd' from what is still buffered
    std::cout << "the number is : " << a;
    std::cout << "\nthe character is : " << c << '\n';
}
Here we have declared two variables, one int and one char.
If we enter "12d34", the int variable accepts only 12; the rest ("d34") is not discarded but stays in the input buffer.
The next input operation then assigns 'd' to the char variable immediately, without waiting for you to type anything.
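The same behaviour can be reproduced without typing anything by reading from a std::istringstream, which acts like std::cin with "12d34" already in its buffer. read_int_then_char and discard_rest_of_line are hypothetical helper names; the ignore() call is one common way (not the only one) to throw away whatever is left on the line.

```cpp
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>

struct Parsed {
    int a;
    char c;
    std::string rest;
};

// Mirrors the program above: read an int, then a char, from the same stream.
Parsed read_int_then_char(std::istream& in) {
    Parsed p{0, '\0', {}};
    in >> p.a;                 // consumes "12", stops at 'd'
    in >> p.c;                 // takes 'd' straight from the buffer
    std::getline(in, p.rest);  // whatever is still left on that line
    return p;
}

// Discards everything up to and including the next '\n'.
void discard_rest_of_line(std::istream& in) {
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
```

Calling discard_rest_of_line() right after reading the number makes the next extraction wait for (or read) the following line instead of picking up the leftover "d34".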
Related
My Operating Systems professor was talking today about how a read system call is unbuffered while an istream::read function has a buffer. This left me a bit confused, as you still provide a buffer when calling istream::read.
The only thing I can think of is that there is more than one buffer involved in an istream::read call. Why?
What does the istream::read() function do differently from the read() function system call?
The professor was talking about buffers internal to the istream rather than the buffer provided by the calling code where the data ends up after the read.
As an example, say you are reading individual int objects out of an istream: the istream is likely to have an internal buffer where some number of bytes is stored, and the next read can be satisfied out of that rather than going to the OS. Note, however, that whatever the istream is hooked to very likely has internal buffers as well. Most OSes have means to perform zero-copy reads (that is, read directly from the I/O source to your buffer), but that facility comes with severe restrictions (the read size must be a multiple of some particular number of bytes, and if reading from a disk file the file pointer must also be on a multiple of that byte count). Most of the time such zero-copy reads are not worth the hassle.
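To illustrate the professor's point, here is a toy input stream buffer in which every refill of the internal buffer stands in for one read() system call. CountingSourceBuf is a hypothetical class, and a real std::filebuf uses a larger buffer and real system calls, but the mechanism is the same: many small extractions are served from memory, and only a refill goes to the "OS".

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Toy input buffer: 'src_' stands in for the device/OS, and each refill of
// 'buf_' stands in for one read() system call.
class CountingSourceBuf : public std::streambuf {
public:
    CountingSourceBuf(std::string source, std::size_t bufsize)
        : src_(std::move(source)), buf_(bufsize) {}

    int refills = 0;  // simulated system calls that returned data

protected:
    int_type underflow() override {
        if (gptr() < egptr())                    // data still buffered
            return traits_type::to_int_type(*gptr());
        std::size_t n = std::min(buf_.size(), src_.size() - pos_);
        if (n == 0) return traits_type::eof();   // source exhausted
        std::copy_n(src_.data() + pos_, n, buf_.data());
        pos_ += n;
        ++refills;
        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::string src_;
    std::vector<char> buf_;
    std::size_t pos_ = 0;
};
```

Reading 100 characters one at a time through a 16-byte buffer, for example, needs only 7 refills (six full ones and a final one of 4 bytes).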
I've been running into some issues with writing to a file - namely, not being able to write fast enough.
To explain, my goal is to capture a stream of data coming in over gigabit Ethernet and simply save it to a file.
The raw data is coming in at a rate of 10 MS/s, and it's then saved to a buffer and subsequently written to a file.
Below is the relevant section of code:
std::string path = "Stream/raw.dat";
ofstream outFile(path, ios::out | ios::app| ios::binary);
if(outFile.is_open())
cout << "Yes" << endl;
while(1)
{
rxSamples = rxStream->recv(&rxBuffer[0], rxBuffer.size(), metaData);
switch(metaData.error_code)
{
//Irrelevant error checking...
//Write data to a file
std::copy(begin(rxBuffer), end(rxBuffer), std::ostream_iterator<complex<float>>(outFile));
}
}
The issue I'm encountering is that it's taking too long to write the samples to a file. After a second or so, the device sending the samples reports its buffer has overflowed. After some quick profiling of the code, nearly all of the execution time is spent on std::copy(...) (99.96% of the time to be exact). If I remove this line, I can run the program for hours without encountering any overflow.
That said, I'm rather stumped as to how I can improve the write speed. I've looked through several posts on this site, and it seems like the most common suggestion (in regard to speed) is to implement file writes as I've already done - through the use of std::copy.
If it's helpful, I'm running this program on Ubuntu x86_64. Any suggestions would be appreciated.
So the main problem here is that you try to write in the same thread as you receive, which means that your recv() can only be called again after copy is complete. A few observations:
Move the writing to a different thread. This is about a USRP, so GNU Radio might really be the tool of your choice -- it's inherently multithreaded.
Your output iterator is probably not the most performant solution. Simply calling write() on a file descriptor might be better, but that's a performance measurement you'll have to make yourself.
If your hard drive/file system/OS/CPU aren't up to the rates coming in from the USRP, even after decoupling receiving from writing thread-wise, then there's nothing you can do: get a faster system.
Try writing to a RAM disk instead.
In fact, I don't know how you came up with the std::copy approach. The rx_samples_to_file example that comes with UHD does this with a simple write, and you should definitely favor that over copying; file I/O can, on good OSes, often be done with one copy less, and iterating over all elements is probably very slow.
Let's do a bit of math.
Your samples are (apparently) of type std::complex<float>. Given a (typical) 32-bit float, that means each sample is 64 bits. At 10 MS/s, that means the raw data is around 80 megabytes per second--that's within what you can expect to write to a desktop (7200 RPM) hard drive, but getting fairly close to the limit (which is typically around 100 megabytes per second or so).
Unfortunately, despite the std::ios::binary, you're actually writing the data in text format (because std::ostream_iterator basically does stream << data;).
This not only loses some precision, but increases the size of the data, at least as a rule. The exact amount of increase depends on the data--a small integer value can actually decrease the quantity of data, but for arbitrary input, a size increase close to 2:1 is fairly common. With a 2:1 increase, your outgoing data is now around 160 megabytes/second--which is faster than most hard drives can handle.
The obvious starting point for an improvement would be to write the data in binary format instead:
uint32_t nItems = std::end(rxBuffer)-std::begin(rxBuffer);
outFile.write((char *)&nItems, sizeof(nItems));
outFile.write((char *)&rxBuffer[0], sizeof(rxBuffer));
For the moment I've used sizeof(rxBuffer) on the assumption that it's a real array. If it's actually a pointer or vector, you'll have to compute the correct size (what you want is the total number of bytes to be written).
I'd also note that as it stands right now, your code has an even more serious problem: since it hasn't specified a separator for the std::ostream_iterator, nothing is written between one item and the next. For plain numbers, that means if you wrote two values of (for example) 1 and 0.2, what you'd read back in would not be 1 and 0.2, but a single value of 10.2. (For std::complex specifically, operator<< writes each value as "(re,im)", so the parentheses do delimit the items, at the cost of even more characters.) Adding separators to your text output will add yet more overhead (figure around 15% more data) to a process that's already failing because it generates too much data.
Writing in binary format means each float will consume precisely 4 bytes, so delimiters are not necessary to read the data back in correctly.
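A sketch of the binary approach for the case where rxBuffer is a std::vector<std::complex<float>>, which is an assumption, since the question doesn't show its declaration. write_samples and read_samples are hypothetical names, and the layout (a uint32_t count followed by the raw samples in host byte order) is only one possible choice; it is not portable between machines with different endianness.

```cpp
#include <complex>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // std::stringstream, used in the usage example
#include <vector>

using Sample = std::complex<float>;

// Hypothetical layout: uint32_t item count, then the raw samples.
bool write_samples(std::ostream& out, const std::vector<Sample>& samples) {
    const std::uint32_t n = static_cast<std::uint32_t>(samples.size());
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    // For a vector the byte count is size() * sizeof(element), not sizeof(vector).
    out.write(reinterpret_cast<const char*>(samples.data()),
              static_cast<std::streamsize>(samples.size() * sizeof(Sample)));
    return static_cast<bool>(out);
}

std::vector<Sample> read_samples(std::istream& in) {
    std::uint32_t n = 0;
    if (!in.read(reinterpret_cast<char*>(&n), sizeof n)) return {};
    std::vector<Sample> samples(n);
    in.read(reinterpret_cast<char*>(samples.data()),
            static_cast<std::streamsize>(n * sizeof(Sample)));
    if (!in) return {};  // truncated or unreadable data
    return samples;
}
```

Each sample then occupies exactly sizeof(std::complex<float>) bytes in the file, with no formatting work and no precision loss.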
The next step after that would be to descend to a lower-level file I/O routine. Depending on the situation, this might or might not make much difference. On Windows, you can specify FILE_FLAG_NO_BUFFERING when you open a file with CreateFile. This means that reads and writes to that file will basically bypass the cache and go directly to the disk.
In your case, that's probably a win--at 10 MS/s, you're probably going to use up the cache space quite a while before you reread the same data. In such a case, letting the data go into the cache gains you virtually nothing, but costs you some time to copy data to the cache, then somewhat later copy it out to the disk. Worse, it's likely to pollute the cache with all this data, so it's no longer storing other data that's a lot more likely to benefit from caching.
Suppose I have a file which has x records. One 'block' holds m records. The total number of blocks in the file is n = x/m. If I know the size of one record, say b bytes (so the size of one block is b*m), I can read a complete block at once using the read() system call (is there any other method?). Now, how do I read each record from this block and put each record as a separate element into a vector?
The reason I want to do this in the first place is to reduce the number of disk I/O operations, since disk I/O operations are much more expensive, according to what I have learned.
Or will it take the same amount of time as reading record by record from the file and putting each record directly into the vector? Reading block by block, I will have only n disk I/Os, versus x I/Os if I read record by record.
You should consider using mmap() instead of reading your files using read().
What's nice about mmap is that you can treat file contents as simply mapped into your process space as if you already had a pointer into the file contents. By simply inspecting memory contents and treating it as an array, or by copying data using memcpy() you will implicitly perform read operations, but only as necessary - operating system virtual memory subsystem is smart enough to do it very efficiently.
The only possible reason to avoid mmap may be if you are running on a 32-bit OS and the file size exceeds 2 gigabytes (or slightly less than that). In this case the OS may have trouble allocating address space for your mmap-ed memory. But on a 64-bit OS, using mmap should never be a problem.
Also, mmap can be cumbersome if you are writing a lot of data and the size of the data is not known up front. Other than that, it is generally better and faster to use it than read().
Actually, most modern operating systems rely on mmap extensively. For example, in Linux, to execute some binary, your executable is simply mmap-ed and executed from memory as if it was copied there by read, without actually reading it.
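A minimal POSIX sketch of reading fixed-size records through mmap(). The Record layout and load_records_mmap are hypothetical, and error handling is reduced to returning an empty vector. Copying into a std::vector is optional; you could also work directly on the mapped memory for as long as it stays mapped.

```cpp
#include <fcntl.h>      // open (POSIX)
#include <sys/mman.h>   // mmap, munmap (POSIX)
#include <sys/stat.h>   // fstat (POSIX)
#include <unistd.h>     // close (POSIX)
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-size record layout, for illustration only.
struct Record {
    std::int32_t id;
    float value;
};

// Maps the whole file read-only and copies every complete record into a
// vector. The OS faults pages in on demand as memcpy touches them.
std::vector<Record> load_records_mmap(const char* path) {
    std::vector<Record> records;
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return records;

    struct stat st {};
    if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
        ::close(fd);
        return records;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* base = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // closing the descriptor does not remove the mapping
    if (base == MAP_FAILED) return records;

    const std::size_t count = size / sizeof(Record);  // ignore a partial tail
    records.resize(count);
    std::memcpy(records.data(), base, count * sizeof(Record));
    ::munmap(base, size);
    return records;
}
```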
Reading a block at a time won't necessarily reduce the number of I/O operations at all. The standard library already does buffering as it reads data from a file, so you do not (normally) expect to see an actual disk input operation every time you attempt to read from a stream (or anything close).
It's still possible reading a block at a time would reduce the number of I/O operations. If your block is larger than the buffer the stream uses by default, then you'd expect to see fewer I/O operations used to read the data. On the other hand, you can accomplish the same by simply adjusting the size of buffer used by the stream (which is probably a lot easier).
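Both ideas can be sketched with plain iostreams: reading m records per read() call and splitting the block into individual records, or simply giving the stream a larger buffer. Record, read_by_blocks and open_with_big_buffer are hypothetical names; the record is assumed to be a trivially copyable struct stored in the file in its in-memory layout, and whether pubsetbuf() changes anything for a std::filebuf is implementation-defined (it typically has to be called before open()).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <istream>
#include <vector>

// Hypothetical fixed-size record, stored in the file as laid out in memory.
struct Record {
    std::int32_t id;
    float value;
};

// Reads 'records_per_block' records per read() call and splits each block
// into individual records. A trailing partial record is ignored.
std::vector<Record> read_by_blocks(std::istream& in, std::size_t records_per_block) {
    std::vector<Record> records;
    if (records_per_block == 0) return records;
    std::vector<char> block(records_per_block * sizeof(Record));
    while (in.read(block.data(), static_cast<std::streamsize>(block.size())) ||
           in.gcount() > 0) {  // the last block may be short
        const std::size_t got = static_cast<std::size_t>(in.gcount()) / sizeof(Record);
        for (std::size_t i = 0; i < got; ++i) {
            Record r;
            std::memcpy(&r, block.data() + i * sizeof(Record), sizeof(Record));
            records.push_back(r);
        }
    }
    return records;
}

// Alternative: enlarge the stream's own buffer and keep reading normally.
// 'buf' must outlive the stream.
bool open_with_big_buffer(std::ifstream& in, const char* path, std::vector<char>& buf) {
    in.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
    in.open(path, std::ios::binary);
    return in.is_open();
}
```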
I'm reading accelerated c++ and the author writes:
Flushing the output buffers at opportune moments is an important habit when you are writing programs that might take a long time to run. Otherwise, some of the program's output might languish in the system's buffers for a long time between when your program writes it and when you see it.
Please correct me if I misunderstand any of these concepts:
Buffer: a block of random access memory that is used to hold input or output temporarily.
Flushing: freeing up random access memory that had been... eh.. assigned to certain ..umm
There is this explanation I found:
Flushing an output device means that all preceding output operations are required to be completed immediately. This is related to the issue of buffering, which is an optimization technique used by the operating system. Roughly speaking, the operating system reserves (and usually exerts) the right to put the data “on stand by” until it decides that it has an amount of data large enough to justify the cost associated to sending the data to the screen. In some cases, however, we need the guarantee that the output operations performed in our program are completed at a given point in the execution of our program, so we flush the output device.
Continuing from that explanation, I read that there are three events that cause the system to flush the buffer:
Buffer becomes full and will automatically flush
The library might be asked to read from the standard input stream. (Is "standard input stream" something like std::cin >> name;?)
The third occasion is when we explicitly tell it to. How do we explicitly tell it to?
Despite this, I don't feel like I fully grasp the following:
What an output buffer is vs. just a buffer, and presumably other types of buffers...
What it means to flush a buffer. Does it simply mean to clear the RAM?
What is the "output device" referred to in the above explanation
And finally, after all this, when are opportune moments to flush your buffer... ugh, that doesn't sound pleasant.
To flush an std::ostream, you use the std::flush manipulator. i.e.
std::cout << std::flush;
Note that std::endl already flushes the stream. So if you are in the habit of ending your insertions with it, you don't need to do anything additional. Note that this means if you are seeing poor performance because you flush too much, you need to switch from inserting std::endl to inserting a newline: '\n'.
A stream is a sequence of characters (i.e. things of type char). An output stream is one you write characters to. Typical applications are writing data to files, printing text on screen, or storing them in a std::string.
Streams often have the feature that writing 1024 characters at once is an order of magnitude (or more!) faster than writing 1 character at a time 1024 times. One of the main purposes of the notion of 'buffering' is to deal with this in a convenient fashion. Rather than writing directly to wherever you actually want the characters to go, you instead write to the buffer. Then, when you're ready, you "flush" the buffer: you move the characters from the buffer to the place where you want them. Or, if you don't care about the precise details, you use a buffer that flushes itself automatically. E.g., the buffer used in an std::ofstream is typically fixed size, and will flush whenever it's full.
When is it an opportune time to flush, you ask? I say you're optimizing prematurely. :) Rather than looking for the perfect moments to flush, just do it often. Flush frequently enough that you'll never find yourself in a situation where, e.g., you want to look at the data in a file but it's sitting unwritten in a buffer. Then, if it really does turn out that too many flushes are hurting performance, that's when you spend time looking into it.
You explicitly flush a stream with your_stream.flush();.
What an output buffer is vs. just a buffer, and presumably other types of buffers...
A buffer is usually a block of memory used to hold data waiting for processing. One typical use is data that's just been read from a stream, or data waiting to be written to disk. Either way, it's generally more efficient to read/write large blocks of data at a time, so the buffer is read or written as a whole, while the client code can read/write in whatever amount is convenient (e.g., one character or one line at a time).
What it means to flush a buffer. Does it simply mean to clear the ram?
That depends. For an input buffer, yes, it typically means just clearing the contents of the buffer, discarding any data that's been read into the buffer (though it doesn't usually clear the RAM -- it just sets its internal book-keeping to say the buffer is empty).
For an output buffer, flushing the buffer normally means forcing whatever data is in the buffer to be written to the associated stream immediately.
What is the "output device" referred to in the above explanation
When you're writing data, it's whatever device you're ultimately writing to. That could be a file on the disk, the screen, etc.
And finally, after all this, when are opportune moments to flush your buffer... ugh, that doesn't sound pleasant.
One obvious opportune moment is right when you finish writing data for a while, and you're going to go back to processing (or whatever) that doesn't produce any output (at least to the same destination) for a while. You don't want to flush the buffer if you're likely to produce more data going to the same place right afterward, but you also don't want to leave the data sitting in the buffer when there's going to be a noticeable delay before the buffer fills up (or something else flushes it) and the data finally gets written to its destination.
This depends very much on the type of application, but one rule of thumb is to flush after you've written one record. For text that is usually after every line; for binary data, after every object. If the performance turns out to be too slow, then flush every X records, and experiment with X until you find a number where you are happy with the performance while X is still small enough that you don't lose too much data in case of a crash.
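A sketch of the "flush every X records" rule of thumb; RecordWriter is a hypothetical class, and X is its flush_every constructor argument. Records are terminated with '\n' rather than std::endl, so only the explicit flush() calls push data to the underlying device, and the destructor flushes whatever is left.

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream, used in the usage example
#include <string>

class RecordWriter {
public:
    RecordWriter(std::ostream& out, int flush_every)
        : out_(out), flush_every_(flush_every > 0 ? flush_every : 1) {}

    ~RecordWriter() {
        if (pending_ > 0) out_.flush();  // don't leave a partial batch behind
    }

    void write(const std::string& record) {
        out_ << record << '\n';          // newline only, no flush
        if (++pending_ >= flush_every_) {
            out_.flush();
            pending_ = 0;
            ++flushes_;
        }
    }

    int flushes() const { return flushes_; }

private:
    std::ostream& out_;
    int flush_every_;
    int pending_ = 0;   // records written since the last flush
    int flushes_ = 0;   // explicit flushes issued by write()
};
```

With flush_every = 3, writing 7 records issues 2 flushes from write() and one more from the destructor for the last record.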
I think the author means stream buffers. An opportune moment to flush a buffer really depends on what your code does, how it's constructed, how the buffer is allocated, and probably the scope it's initialized in.
Yes, std::cin is the standard input stream, and something like std::cin >> name reads from it.
You can explicitly tell a stream buffer to flush by calling, for example, ofstream::flush. Of course, other types of buffers have their own explicit flushing methods, and some might require a manual implementation.
Taking your questions one by one:
A buffer, in general, is just a block of memory used to temporarily
hold data. When writing to an `std::ofstream`, characters are sent to a
`std::filebuf`, which typically, by default, will simply put them into a
buffer rather than outputting immediately to the system. When using an
`std::ofstream`, there are actually two buffers in play, one in the
`ofstream` (within your process), and one in the OS.
The standard speaks of the underlying data as a sequence of characters
on an external support, with the buffer representing a window into that
sequence; outputting data may only update the image in the buffer, and
flushing "synchronizes" the image in the buffer with the image of the
data the OS has. Which is a reasonably good description if you're
outputting to a real file, but doesn't really fit if you're outputting
directly to a serial port, or something like that, where the OS doesn't
maintain any "image" of the data. Basically, if you've written data
to the stream which hasn't been transferred to the OS, flushing the
buffer will transfer it to the OS (which means that the `ofstream` can
reuse the buffer memory for further buffering). Flushing the buffer
typically (i.e. on all of the implementations I know) only synchronizes
with the OS (which is all that the standard requires); it doesn't ensure
that the data has actually been written to disk. Depending on the
application, this may or may not be an issue.
The "output device" is anything the system wants it to be. A file, a
window on the screen, or in older times or on simpler systems, a printer
or a serial port. And the explanation you cite is very misleading (or
rather isn't talking about `ofstream`), because flushing an `ofstream`
doesn't ensure that all preceding output operations are fully finished.
All it ensures is that the data in the stream buffer has been transferred
to (synchronized with) the OS. In most cases (at least under Windows
and Unix), all this means is that the data has been moved from one
buffer (in your process) to another (in the OS).
The opportune moments will depend a lot on what the application is
doing. As a general rule, I'd suggest flushing often, so that if your
program crashes, you can see more or less how far it has gotten.
(Remember, outputting `std::endl` flushes. For most simple use, just
using `std::endl` instead of `'\n'` is sufficient.) There are at least
two cases where you will want to think more about flushing, however; if
you're outputting a very large amount of data in a block (i.e. without
doing much more than formatting between the outputs), excessive flushing
can slow the output down considerably. In such cases, you may want to
consider using `'\n'` instead of `std::endl`. And the other is for
things like logging, where you want the data to appear immediately, even
if the following data will not be output for a while; in this case,
you want to be sure that the data has been flushed before continuing.
Data will be explicitly flushed if you call std::ostream::flush() or
std::ofstream::close(). (In the latter case, of course, you cannot
write more data later.)
Note too that because the data is not actually "written" until it is
flushed, most possible errors cannot be detected until then. In
particular, something like:
if ( output << data ) {
// succeeded...
}
doesn't actually work; the "success" reported by the ofstream is only
that it has successfully copied the characters into its buffer (which
can hardly fail).
The usual idiom when writing a large block of data, without
interruption, is to just write it, without flushing, then close the file
and check for errors then. This is not appropriate when writing with
interruptions if you want the data to appear immediately, and it has the
disadvantage that if your program crashes, some of the data you've
"written" will have disappeared, which can make debugging harder.
I need to write data into drive. I have two options:
write raw sectors (_write(handle, pBuffer, size);)
write into a file (fwrite(pBuffer, size, count, pFile);)
Which way is faster?
I expected the raw sector writing function, _write, to be more efficient. However, my test results say otherwise: fwrite is faster, and _write takes longer.
I've pasted my snippet; maybe my code is wrong. Can you help me out? Either way is okay by me, but I think raw write is better, because it seems the data in the drive is encrypted at least....
#define SSD_SECTOR_SIZE 512
int g_pSddDevHandle = _open("\\\\.\\G:",_O_RDWR | _O_BINARY, _S_IREAD | _S_IWRITE);
TIMER_START();
while (ulMovePointer < 1024 * 1024 * 1024)
{
_write(g_pSddDevHandle,szMemZero,SSD_SECTOR_SIZE);
ulMovePointer += SSD_SECTOR_SIZE;
}
TIMER_END();
TIMER_PRINT();
FILE * file = fopen("f:\\test.tmp","a+");
TIMER_START();
while (ulMovePointer < 1024 * 1024 * 1024)
{
fwrite(szMemZero,SSD_SECTOR_SIZE,1,file);
ulMovePointer += SSD_SECTOR_SIZE;
}
TIMER_END();
TIMER_PRINT();
Probably because a direct write isn't buffered. When you call fwrite, you are doing buffered writes, which tend to be faster in most situations. Essentially, each FILE* handle has an internal buffer which is flushed to disk periodically when it becomes full, which means you end up making fewer system calls, as you only write to disk in larger chunks.
To put it another way, in your first loop, you are actually writing SSD_SECTOR_SIZE bytes to disk during each iteration. In your second loop you are not. You are only writing SSD_SECTOR_SIZE bytes to a memory buffer, which, depending on the size of the buffer, will only be flushed every Nth iteration.
In the _write() case, the value of SSD_SECTOR_SIZE matters. In the fwrite case, the size of each write will actually be BUFSIZ. To get a better comparison, make sure the underlying buffer sizes are the same.
However, this is probably only part of the difference.
In the fwrite case, you are measuring how fast you can get data into memory. You haven't flushed the stdio buffer to the operating system, and you haven't asked the operating system to flush its buffers to physical storage. To compare more accurately, you should call fflush() before stopping the timers.
If you actually care about getting data onto the disk rather than just getting the data into the operating system's buffers, you should ensure that you call fsync()/FlushFileBuffers() before stopping the timer.
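A sketch of a fairer measurement for the fwrite side of the comparison. It is POSIX-only because of fsync() and fileno(); time_fwrite_to_disk is a hypothetical helper, and it assumes total is a multiple of chunk. On Windows, FlushFileBuffers() would play the role of fsync().

```cpp
#include <unistd.h>   // fsync (POSIX)
#include <chrono>
#include <cstdio>     // fopen, fwrite, fflush; fileno() is POSIX
#include <vector>

// Times writing 'total' bytes in 'chunk'-sized fwrite() calls. The clock is
// stopped only after the stdio buffer has been handed to the OS (fflush) and
// the OS has been asked to write its buffers to the device (fsync).
// Returns the elapsed time in seconds, or -1.0 on bad input/open failure.
double time_fwrite_to_disk(const char* path, std::size_t total, std::size_t chunk) {
    if (chunk == 0) return -1.0;
    std::vector<char> zeros(chunk, 0);
    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) return -1.0;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t done = 0; done < total; done += chunk)
        std::fwrite(zeros.data(), 1, chunk, f);
    std::fflush(f);        // stdio buffer -> OS
    ::fsync(::fileno(f));  // OS buffers -> device
    const auto stop = std::chrono::steady_clock::now();

    std::fclose(f);
    return std::chrono::duration<double>(stop - start).count();
}
```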
Other obvious differences:
The drives are different. I don't know how different.
The semantics of a write to a device are different to the semantics of writes to a filesystem; the file system is allowed to delay writes to improve performance until explicitly told not to (e.g., with a standard handle, a call to FlushFileBuffers()); writes directly to a device aren't necessarily optimised in that way. On the other hand, the file system must do extra I/O to manage metadata (block allocation, directory entries, etc.)
I suspect that you're seeing a difference in policy about how fast things actually get on to the disk. Raw disk performance can be very fast, but you need big writes and preferably multiple concurrent outstanding operations. You can also avoid buffer copying by using the right options when you open the handle.