C++-What is the need of both buffer and stream? - c++

Although i have read about buffer and stream and it's working with files in c++ but i don't know what is the need of a buffer if a stream is there, stream is always there to transfer the data of one file to the program. So why do we use buffers to store data(performing same task that stream does) and what are buffered and unbuffered stream.

Consider a stream that writes to a file. If there were no buffer, if your program wrote a single byte to the stream, you'd have to write a single byte to the file. That's very inefficient. So streams have buffers to decouple operations one on side of the stream from operations on the other side of the stream.

Ok lets lets start from the scratch suppose you want to work with files. For this purpose you would have to manage how the data is entered into your file and if the sending of data into the file was successful or not, and all other basic working problems. Now either you can manage all that on your own which would take a lots a time and hard work or What you can do is you can use a stream.
Yes, you can allocate a stream for such purposes. Streams work with abstraction mechanism i.e. we c++ programmers don't know how they are working but we only know that we are at the one side of a stream (on our program's side) we offer our data to the stream and it has the responsibility to transfer data from one end to the other (file's side)
Eg-
ofstream file("abc.txt"); //Here an object of output file stream is created
file<<"Hello"; //We are just giving our data to stream and it transfers that
file.close(); //The closing of file
Now if you work with files you should know that working with files is a really expensive operation i.e. it takes more time to access file than to access memory and we also don't have to perform file operations every time. So programmers created a new feature called buffer which is a part of computer's memory and stores data temporarily for handling files.
Suppose at the place of reading file every time to read data you just read some memory location where all the data of file is copied temporarily.Now it would be a less expensive task as you are reading memory not file.
Those streams that have a buffer for their working i.e. they open the file and by default copy all the data of file to the buffer are called as buffered streams whereas those streams which don't use any buffer are called as unbuffered streams.
Now if you enter data to a buffered stream then that data will be queued up until the stream is not flushed (flushing means replacing the data of buffer with that of file). Unbuffered streams are faster in working (from the point of user at one end of the stream) as data is not temporarily stored into a buffer and is sent to the file as it comes to the stream.

A buffer and a stream are different concepts.
A buffer is a part of the memory to temporarily store data. It can be implemented and structured in various ways. For instance, if one wants to read a very large file, chunks of the file can be read and stored in the buffer. Once a certain chunk is processed the data can be discarded and the next chunk can be read. A chunk in this case could be a line of the file.
Streams are the way C++ handles input and output. Their implementation uses buffers.

I do agree that stream is probably the poorest written and the most badly udnerstood part of standard library. People use it every day and many of them have not a slightest clue how the constructs they use work. For a little fun, try asking what is std::endl around - you might find that some answers are funny.
On any rate, streams and streambufs have different responsibilities. Streams are supposed to provide formatted input and output - that is, translate an integer to a sequence of bytes (or the other way around), and buffers are responsible of conveying the sequence of bytes to the media.
Unfortunately, this design is not clear from the implementation. For instance, we have all those numerous streams - file stream and string stream for example - while the only difference between those are the buffer. The stream code remains exactly the same. I believe, many people would redesign streams if they had their way, but I am afraid, this is not going to happen.

Related

How is an ofstream object and ofstream stream created? [duplicate]

When my book says : a stream is a sequence of characters read or written from a device then my book says : the istream and the ostream types represent input and output stream ( what does it mean ?) how exactly do cout and cin work?
I'm not a language native and I can't understand when my book says : the output operator writes the given value on the given ostream.
The fundamental idea behind the metaphor of a "stream" is that it provides or consumes data in a single-pass fashion: for example, for an input stream, data is produced precisely once. You can ask the stream for some more data, and once the stream has given you the data, it will never give you that same data again.
This is why in order to do anything meaningful with streams, you will very usually want to attach a kind of buffer (a "stream buffer", if you will) to the stream which stores some (usually small) amount of data that's been extracted from the stream in a random-access, inspectable and processable piece of memory. (There are similar, reversed ideas for output streams.)
Occasionally it makes sense to process streams without any buffering. For example, if you have an input and an output stream and you read integers from the input and write the doubled value of each integer to the output, you can do that without buffering.
So when thinking about ranges of data, streams are things that you can traverse only once and never again. If you're thinking in terms of forward progress, then streams have another property, which is that they may block: an input stream may block when it has no data, and an output stream may block when it cannot accept any more data. That way, from within the program logic, you can imagine that a input stream always contains data until its end is reached, but a program relying on that may run for an arbitrary, unbounded amount of wall-clock time.
You can define it in simple words as the flow of data which can be the input flow and output flow. So you can think of it as the flow of data from a program to a file or vice versa. The below image may help you understand it better:
From MSDN
The stream is the central concept of the iostream classes. You can
think of a stream object as a smart file that acts as a source and
destination for bytes. A stream's characteristics are determined by
its class and by customized insertion and extraction operators.
From a language point of view, streams are just objects with a certain streamlike interface: they allow you to extract data from it (an input stream) or to push data into it (an output stream). Input streams do not allow random access (whatever that may mean) to the data they provide (whatever that might be).
Note that this is purely an interface description for a class, nothing more. Where the stream gets its data from / what it does with the data pushed into it, is entirely up to the stream. A stream is an abstraction for recieving/sending data.
A concrete implementation of a stream may read data from a terminal application and present it to the program (cin), another one may return characters to the terminal application as the program requests (cout), a third one may read/write data from/to a file (the fstreams), a fourth one may read/write data from/to a memory buffer (stringstream), a fifth one may "read" data from a random number generator, and so on. The possibility are numerous, as are the different implementations of the stream interface that have been created.
That is the beauty of the abstraction of streams: it is a very flexible way for a piece of code to communicate. The process does not need to know anything about the source/destination of its data, other than that it can read/write data from/to it.
A stream is a logical abstraction of physical file(regular file or device file) for IO operations. In Unix, a stream is a pointer to _IO_FILE structure defined in glibc. The _IO_FILE structure given by the OS stores attributes of the opening file. Application program operates(read, write, seek, and etc) on these file attributes to access data in the file.
You can build all type of streams (char stream, byte stream, input stream, output stream, or even bidirectional stream) on top of the above stream concept. They are all implementations or wrappers/decorators of the above stream.

How is a read system call different from the istream::read function?

My Operating Systems professor was talking today about how a read system call is unbuffered while a istream::read function has a buffer. This left me a bit confused as you still make a buffer for the istream::read function when using it.
The only thing I can think of is that there are more than one buffers in the istream::read function call. Why?
What does the istream::read() function do differently from the read() function system call?
The professor was talking about buffers internal to the istream rather than the buffer provided by the calling code where the data ends up after the read.
As an example, say you are reading individual int objects out of an istream, the istream is likely to have an internal buffer where some number of bytes is stored and the next read can be satisfied out of that rather than going to the OS. Note, however, that whatever the istream is hooked to very likely has internal buffers as well. Most OSes have means to perform zero-copy reads (that is, read directly from the I/O source to your buffer), but that facility comes with severe restrictions (read size must be multiple of some particular number of bytes, and if reading from a disk file the file pointer must also be on a multiple of that byte count). Most of the time such zero-copy reads are not worth the hassle.

How to force a file descriptor to buffer my output

Lots of people want to switch off buffering on their file descriptors. I want the reverse: I deliberately want to configure a file descriptor to buffer, say, 1K of data before writing to disk.
The reason is that I'm writing a unit test for a "flush" function of a C++ class. To test that it's working I want to write some data, check the size of the file on disk, then flush, then check that the size has grown. But in practice, by the time I do the first file size check the data has already been written.
Note that I'm working with a raw file descriptor here, not a stream or anything.
This is on linux if that matters.
How to force a file descriptor to buffer my output
If you're using the POSIX write() (or a variant of it), you can't.
The write() call must behave thus:
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the
write() for that position until such byte positions are again
modified.
Any subsequent successful write() to the same byte position in the file shall overwrite that file data.
Those requirements mean the data written is visible to any other process on the system, and to be consistent, if the data written causes the file size to grow, the file size reported by the kernel must reflect the data written.
I want to write some data, check the size of the file on disk, then flush, then check that the size has grown.
That fundamentally doesn't work with write(). The file size will grow as the data written - write() does not buffer data.
If you want it to do that, you'll have to implement your own filesystem - one that isn't POSIX compliant.

Could I read lines from a C++ socket using Ubuntu?

I wonder if I could read several lines from a C++ socket using Ubuntu?
Please note that every line is to be used for a different purpose (e.g. maybe the first is used as a string and the second as a char array).
I.e. Could I put those two lines directly after each other without encountering any problem?
read(socketFileDescriptor, buffer1, BUFFER_SIZE);
read(socketFileDescriptor, buffer2, BUFFER_SIZE);
Thanks in advance,
Regards,
You call read twice in sequence without any problem in itself.
What you get from each call may not correspond to a single line of input though. read basically just does "raw" reading, just about like it does when reading from a file on disk--if data is available, it will read as much data as necessary to fill the buffer you gave it (up to the size you specified).
TCP treats data as a stream, so data you pass to two (or more) separate calls to write could end up being put into a single packet and transmitted together. On the receiving end, all that data could be read by a single call to read--or, depending on the buffer size you pass, it might read only part of one, or might read all of the first and part of the second, etc.
If you want to read the input as "lines", you could (for one example) create a stream buffer that reads data from a socket, and create an iostream object that parses data from that buffer to read lines. This initially seems attractive to many people (it did to me, anyway), but has never worked out very well, at least for me. Iostreams basically assume a synchronous protocol, but sockets are mostly asynchronous. Trying to treat sockets as synchronous tends to lead to more problems rather than to solutions.

when is a opportune moment to flush the output buffer and some basic c++

I'm reading accelerated c++ and the author writes:
Flushing the output buffers at opportune moments is an important habit when you are writing programs that might take a long time to run. Otherwise, some of the program's output might languish in the systems buffers for a long time between when your program writes it and when you see it
Please correct me if i misunderstand any of these concepts:
Buffer: a block of random access memory that is used to hold input or output temporarily.
Flushing: freeing up random access memory that had been... eh.. assigned to certain ..umm
There is this explanation I found:
Flushing an output device means that all preceding output operations are required to be completed immediately. This is related to the issue of buffering, which is an optimization technique used by the operating system. Roughly speaking, the operating system reserves (and usually exerts) the right to put the data “on stand by” until it decides that it has an amount of data large enough to justify the cost associated to sending the data to the screen. In some cases, however, we need the guarantee that the output operations performed in our program are completed at a given point in the execution of our program, so we flush the output device.
Continuing from that explanation i read that the three events that cause the system to flush the buffer:
Buffer becomes full and will automatically flush
The library might be asked to read from standard input stream *is standard input stream like std::cin >> name ;
The third occasion is when we explicitly tell it to. How do we explicitly tell it to?
Despite I don't feel like a fully grasp the following:
What a output buffer is vs just a buffer and presumable other types of buffers...
What it means to flush a buffer. Does it simply mean to clear the ram?
What is the "output device" refereed to in the above explanation
And finally after all this when are opportune moments to to flush your buffer...ugh that doesn't sound pleasant.
To flush an std::ostream, you use the std::flush manipulator. i.e.
std::cout << std::flush;
Note that std::endl already flushes the stream. So if you are in the habit of ending your insertions with it, you don't need to do anything additional. Note that this means if you are seeing poor performance because you flush too much, you need to switch from inserting std::endl to inserting a newline: '\n'.
A stream is a sequence of characters (i.e. things of type char). An output stream is one you write characters to. Typical applications are writing data to files, printing text on screen, or storing them in a std::string.
Streams often have the feature that writing 1024 characters at once is an order of magnitude (or more!) faster than writing 1 character at a time 1024 times. One of the main purposes of the notion of 'buffering' is to deal with this in a convenient fashion. Rather than writing directly to whatever you actually want the characters to go, you instead write to the buffer. Then, when you're ready, you "flush" the buffer: you move the characters from the buffer to the place where you want them. Or, if you don't care about the precise details, you use a buffer that flush itself automatically. e.g. the buffer used in an std::ofstream is typically fixed size, and will flush whenever its full.
When is it an opportune time to flush, you ask? I say you're optimizing prematurely. :) Rather than looking for the perfect moments to flush, just do it often. Put in enough flushes so that flush frequently enough that you'll never find yourself in a situation where, e.g., you want to look at the data in a file but it's sitting unwritten in a buffer. Then if it really does turn out there are too many flushes hurting performance, that's when you spend time looking into it.
You explicitly flush a stream with your_stream.flush();.
What a output buffer is vs just a buffer and presumable other types of buffers...
A buffer is usually a block of memory used to hold data waiting for processing. One typical use is data that's just been read from a stream, or data waiting to be written to disk. Either way, it's generally more efficient to read/write large blocks of data at a time, so read/write an entire buffer at a time, but the client code can read/write in whatever amount is convenient (e.g., one character or one line at a time).
What it means to flush a buffer. Does it simply mean to clear the ram?
That depends. For an input buffer, yes, it typically means just clearing the contents of the buffer, discarding any data that's been read into the buffer (though it doesn't usually clear the RAM -- it just sets its internal book-keeping to say the buffer is empty).
For an output buffer, flushing the buffer normally means forcing whatever data is in the buffer to be written to the associated stream immediately.
What is the "output device" refereed to in the above explanation
When you're writing data, it's whatever device you're ultimately writing to. That could be a file on the disk, the screen, etc.
And finally after all this when are opportune moments to to flush your buffer...ugh that doesn't sound pleasant.
One obvious opportune moment is right when you finish writing data for a while, and you're going to go back to processing (or whatever) that doesn't produce any output (at least to the same destination) for a while. You don't want to flush the buffer if you're likely to produce more data going the same place right afterward -- but you also don't want to leave the data in the buffer when there's going to be a noticeable delay before you fill the buffer (or whatever) so the data will get written to its destination.
This depends very much on the type of application, but one rule of thumb is to flush after you written one record. For text that is usually after every line, for binary data after every object. If the performance seems to be to slow, then flush every X record you write, and experiment with the X until you find a number when you are happy with the performance and while X is not big enough so you loose too much data in case of a crash.
I think the author means stream buffers. An opportune moment to flush a buffer is really dependent on what your code does, how its constructed and how the buffer is allocated and probably the scope its initialized in.
For stream and output buffers take a look at this.
Yes a standard input stream means using the >> operator. (Mostly)
you can explicitly tell a stream buffer to flush by calling for example ofstream::flush of course other types of buffers have their own explicit flushing methods and some might require a manual implementation.
Taking your questions one by one:
A buffer, in general, is just a block of memory used to temporarily
hold data. When writing to an `std::ofstream`, characters are sent to a
`std::filebuf`, which typically, by default, will simply put them into a
buffer rather than outputting immediately to the system. When using an
`std::ofstream`, there are actually two buffers in play, one in the
`ofstream` (within your process), and one in the OS.
The standard speaks of the underlying data as a sequence of characters
on an external support, with the buffer representing a window into that
sequence; outputting data may only update the image in the buffer, and
flushing "synchronizes" the image in the buffer with the image of the
data the OS has. Which is a reasonably good description if you're
outputting to a real file, but doesn't really fit if you're outputting
directly to a serial port, or something like that, where the OS doesn't
maintain any "image" of the data. Basically, if you've written data
to the stream which hasn't been transfered to the OS, flushing the
buffer will transfer it to the OS (which means that the `ofstream` can
reuse the buffer memory for further buffering). Flushing the buffer
typically (i.e. on all of the implementations I know) only synchronizes
with the OS (which is all that the standard requires); it doesn't ensure
that the data has actually been written to disk. Depending on the
application, this may or may not be an issue.
The "output device" is anything the system wants it to be. A file, a
window on the screen, or in older times or on simpler systems, a printer
or a serial port. And the explination you cite is very misleading (or
rather isn't talking about `ofstream`), because flushing an `ofstream`
doesn't ensure that all preceding output operations are fully finished.
All it ensures is that the data in the stream buffer has been transfered
to (synchronized with) the OS. In most cases (at least under Windows
and Unix), all this means is that the data has been moved from one
buffer (in your process) to another (in the OS).
The opportune moments will depend a lot on what the application is
doing. As a general rule, I'd suggest flushing often, so that if your
program crashes, you can see more or less how far it has gotten.
(Remember, outputting `std::endl` flushes. For most simple use, just
using `std::endl` instead of `'\n'` is sufficient.) There are at least
two cases where you will want to think more about flushing, however; if
you're outputting a very large amount of data in a block (i.e. without
doing much more than formatting between the outputs), excessive flushing
can slow the output down considerably. In such cases, you may want to
consider using `'\n'` instead of `std::endl`. And the other is for
things like logging, where you want the data to appear immediatly, even
if the following data will not be output for a while—in this case,
you want to be sure that the data has been flushed before continuing.
Data will be explicitly flushed if you call std::ostream::flush() or
std::ofstream::close(). (In the latter case, of course, you cannot
write more data later.)
Note too that because the data is not actually "written" until it is
flushed, most possible errors cannot be detected until then. In
particular, something like:
if ( output << data ) {
// succeeded...
}
doesn't actually work; the "success" reported by the ofstream is only
that it has successfully copied the characters into its buffer (which
can hardly fail).
The usual idiom when writing a large block of data, without
interruption, is to just write it, without flushing, then close the file
and check for errors then. This is not appropriate when writing with
interruptions if you want the data to appear immediately, and it has the
disadvantage that if your program crashes, some of the data you've
"written" will have disappeared, which can make debugging harder.