I'm trying to learn a bit more about how I/O streams work in C++, and I'm really confused about when to use what.
What exactly is a streambuf?
When do I use a streambuf, as compared to a string, an istream, or a vector? (I already know the last three, but not how streambuf compares to them, if it does at all.)
With the help of streambuf, we can work at an even lower level: it provides access to the underlying buffers.
Here are some good examples: "Copy, load, redirect and tee using C++ streambufs". For a comparison with the other classes, this might also be helpful.
See the "IOstream Library" reference for more details.
Stream buffers represent input or output devices and provide a low level interface for unformatted I/O to that device. Streams, on the other hand, provide a higher level wrapper around the buffer by way of basic unformatted I/O functions and especially via formatted I/O functions (i.e., operator<< and operator>> overloads). Stream objects may also manage a stream buffer's lifetime.
For example a file stream has an internal file stream buffer. The stream manages the lifetime of the buffer and the buffer is what provides actual read and write capabilities to a file. The stream's formatting operators ultimately access the stream buffer's unformatted I/O functions, so you only ever have to use the stream's I/O functions, and don't need to touch the buffer's I/O functions directly.
Another way to understand the differences is to look at the different uses they make of locale objects. Streams use the facets that have to do with formatting such as numpunct and num_get. You can also expect that the overloads of stream operator<< and operator>> for custom time or money data types will use the time and money formatting facets. Stream buffers, however, use the codecvt facets in order to convert between the units their interface uses and bytes. So, for example, the interface for basic_streambuf<char16_t> uses char16_t and so basic_streambuf<char16_t> internally uses codecvt<char16_t, char, mbstate_t> by default to convert the formatted char16_t units written to the buffer to char units written to the underlying device. So you can see that streams are mostly for formatting and stream buffers provide a low level interface for unformatted input or output to devices which may use a different, external encoding.
You can use a stream buffer when you want only unformatted access to an I/O device. You can also use stream buffers if you want to set up multiple streams that share a stream buffer (though you'll have to carefully manage the lifetime of the buffer). There are also special purpose stream buffers you might want to use, such as wbuffer_convert in C++11 which acts as a façade for a basic_streambuf<char> to make it look like a wide character stream buffer. It uses the codecvt facet it's constructed with instead of using the codecvt facet attached to any locale. You can usually achieve the same effect by simply using a wide stream buffer imbued with a locale that has the appropriate facet.
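Sharing one buffer between two streams is a one-liner per stream. A small sketch; the buffer here is a plain std::stringbuf whose lifetime is managed manually by the enclosing scope:

```cpp
#include <sstream>
#include <string>

// Two independent ostream objects writing through a single stringbuf.
// Neither stream owns the buffer; the caller must keep it alive.
std::string shared_write() {
    std::stringbuf buf;            // one buffer, managed by us
    std::ostream first(&buf);
    std::ostream second(&buf);
    first << "hello ";
    second << "world";             // both streams append to the same sequence
    return buf.str();
}
```

Both streams advance the same put position, so their output interleaves in call order.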
My book says: "a stream is a sequence of characters read from or written to a device", and then: "the istream and ostream types represent input and output streams". What does that mean? How exactly do cout and cin work?
I'm not a native English speaker, and I can't understand what my book means when it says: "the output operator writes the given value on the given ostream."
The fundamental idea behind the metaphor of a "stream" is that it provides or consumes data in a single-pass fashion: for example, for an input stream, data is produced precisely once. You can ask the stream for some more data, and once the stream has given you the data, it will never give you that same data again.
This is why in order to do anything meaningful with streams, you will very usually want to attach a kind of buffer (a "stream buffer", if you will) to the stream which stores some (usually small) amount of data that's been extracted from the stream in a random-access, inspectable and processable piece of memory. (There are similar, reversed ideas for output streams.)
Occasionally it makes sense to process streams without any buffering. For example, if you have an input and an output stream and you read integers from the input and write the doubled value of each integer to the output, you can do that without buffering.
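That unbuffered doubling example might look like this sketch (the function name is made up):

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Reads integers from `in` and writes the doubled value of each to `out`,
// one at a time: no buffering of the whole sequence is needed.
void double_ints(std::istream& in, std::ostream& out) {
    int n;
    while (in >> n)
        out << n * 2 << ' ';
}

// Example of use with in-memory streams; `in` could equally be std::cin.
std::string demo() {
    std::istringstream in("1 2 3");
    std::ostringstream out;
    double_ints(in, out);
    return out.str();          // "2 4 6 "
}
```

Each integer is consumed exactly once, which is all the single-pass stream contract promises.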
So when thinking about ranges of data, streams are things that you can traverse only once and never again. If you're thinking in terms of forward progress, then streams have another property, which is that they may block: an input stream may block when it has no data, and an output stream may block when it cannot accept any more data. That way, from within the program logic, you can imagine that an input stream always contains data until its end is reached, but a program relying on that may run for an arbitrary, unbounded amount of wall-clock time.
In simple words, a stream is a flow of data, which can be an input flow or an output flow. You can think of it as data flowing from a program to a file, or vice versa.
From MSDN:
The stream is the central concept of the iostream classes. You can think of a stream object as a smart file that acts as a source and destination for bytes. A stream's characteristics are determined by its class and by customized insertion and extraction operators.
From a language point of view, streams are just objects with a certain streamlike interface: they allow you to extract data from it (an input stream) or to push data into it (an output stream). Input streams do not allow random access (whatever that may mean) to the data they provide (whatever that might be).
Note that this is purely an interface description for a class, nothing more. Where the stream gets its data from, or what it does with the data pushed into it, is entirely up to the stream. A stream is an abstraction for receiving/sending data.
A concrete implementation of a stream may read data from a terminal application and present it to the program (cin), another one may write characters to the terminal application as the program supplies them (cout), a third one may read/write data from/to a file (the fstreams), a fourth one may read/write data from/to a memory buffer (stringstream), a fifth one may "read" data from a random number generator, and so on. The possibilities are numerous, as are the different implementations of the stream interface that have been created.
That is the beauty of the abstraction of streams: it is a very flexible way for a piece of code to communicate. The process does not need to know anything about the source/destination of its data, other than that it can read/write data from/to it.
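A small illustration of that flexibility: the function below only knows it has some ostream, and the caller picks an in-memory one (it could equally be cout or an ofstream). The function name is made up.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// This function knows nothing about where its output goes; any ostream will do.
void write_greeting(std::ostream& out, const std::string& name) {
    out << "Hello, " << name << '\n';
}

// Capture the output in memory; swapping in std::cout would print it instead.
std::string demo_greeting() {
    std::ostringstream memory;
    write_greeting(memory, "world");
    return memory.str();
}
```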
A stream is a logical abstraction of a physical file (a regular file or a device file) for I/O operations. In Unix, a stream is a pointer to the _IO_FILE structure defined in glibc. The _IO_FILE structure given by the OS stores attributes of the opened file. The application program operates (read, write, seek, etc.) on these file attributes to access the data in the file.
You can build all types of streams (char streams, byte streams, input streams, output streams, or even bidirectional streams) on top of the above stream concept. They are all implementations or wrappers/decorators of it.
My Operating Systems professor was talking today about how the read system call is unbuffered while the istream::read function has a buffer. This left me a bit confused, as you still provide a buffer for istream::read when you call it.
The only thing I can think of is that more than one buffer is involved in an istream::read call. Why?
What does the istream::read() function do differently from the read() function system call?
The professor was talking about buffers internal to the istream rather than the buffer provided by the calling code where the data ends up after the read.
As an example, say you are reading individual int objects out of an istream, the istream is likely to have an internal buffer where some number of bytes is stored and the next read can be satisfied out of that rather than going to the OS. Note, however, that whatever the istream is hooked to very likely has internal buffers as well. Most OSes have means to perform zero-copy reads (that is, read directly from the I/O source to your buffer), but that facility comes with severe restrictions (read size must be multiple of some particular number of bytes, and if reading from a disk file the file pointer must also be on a multiple of that byte count). Most of the time such zero-copy reads are not worth the hassle.
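To make the two buffers concrete, here is a sketch that swaps in its own block as the stream's internal buffer via pubsetbuf. The file name is a made-up scratch file, and pubsetbuf must be called before open() for the request to be honored portably.

```cpp
#include <cstdio>
#include <fstream>

// Sums the integers in a file. The stream's internal buffer is replaced by
// our own 4 KiB block, so one OS read can satisfy many small extractions.
int sum_ints(const char* path) {
    std::ifstream in;
    static char block[4096];                     // the stream's internal buffer
    in.rdbuf()->pubsetbuf(block, sizeof block);  // must precede open()
    in.open(path);
    int n, sum = 0;
    while (in >> n)        // each extraction is served out of `block`,
        sum += n;          // not by a separate trip to the OS
    return sum;
}
```

The buffer the professor meant is `block` (or the filebuf's default equivalent), which is distinct from any destination buffer you pass to istream::read itself.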
I am working on a TCP server using boost asio and I got lost with choosing the best data type to work with when dealing with byte buffers.
Currently I am using std::vector<char> for everything. One of the reasons is that most of the examples of asio use vectors or arrays. I receive data from the network and put it in a buffer vector. Once a packet is available, it is extracted from the buffer and decrypted/decompressed if needed (both operations may produce a larger amount of data). Then multiple messages are extracted from the payload.
I am not happy with this solution because it involves inserting and removing data from vectors constantly, but it does the job.
Now I need to work on data serialization. There is no easy way to read or write arbitrary data types from a char vector, so I ended up implementing a "buffer" that hides a vector inside and allows writing (a wrapper around insert) and reading (a wrapper around casting) from it. Then I can write uint16 code; buffer >> code; and also add serialization/deserialization methods to other objects while keeping things simple.
The thing is that every time I think about this I feel like I am using the wrong data type as container for the binary data. Reasons are:
Streams already do a good job as a potentially endless source of data input or sink for data output. Even if, behind the scenes, this still involves inserting and removing data, a stream probably does a better job of it than a raw char vector.
Streams already allow reading and writing basic data types, so I don't have to reinvent the wheel.
There is no need to access to a specific position of data. Usually I need to read or write sequentially.
In this case, are streams the best choice, or is there something I am not seeing? And if so, is stringstream the one I should use?
Any reasons to avoid streams and work only with containers?
PS: I cannot use boost serialization or any other existing solution because I don't have control over the network protocol.
Your approach seems fine. You might consider a deque instead of a vector if you're doing a lot of stealing from the end and erasing from the front, but if you use circular-buffer logic while iterating then this doesn't matter either.
You could switch to a stream, but then you're completely at the mercy of the standard library, its annoyances/oddities, and the semantics of its formatted extraction routines — if these are insufficient then you have to extract N bytes and do your own reinterpretation anyway, so you're back to square one but with added copying and indirection.
You say you don't need random-access, so that's another reason not to care either way. Personally I like to have random-access in case I need to resync, or seek-ahead, or seek-behind, or even just during debugging want to have better capabilities without having to suddenly refactor all my buffer code.
I don't think there's any more specific answer to this in the general case.
I would like to copy data efficiently between std::streambuf instances. That is, I would like to shovel blocks of data between them, as opposed to perform character-by-character copying. For example, this is not what I am looking for:
stringbuf in{ios_base::in};
stringbuf out{ios_base::out};
copy(istreambuf_iterator<char>{in},
     istreambuf_iterator<char>{},
     ostreambuf_iterator<char>{out});
There exists syntactic sugar for this, with a bit more error checking:
ostream os{&out};
os << &in;
Here's a snippet of the implementation of operator<<(basic_streambuf<..>*) in my standard library (Mac OS X, XCode 7):
typedef istreambuf_iterator<_CharT, _Traits> _Ip;
typedef ostreambuf_iterator<_CharT, _Traits> _Op;
_Ip __i(__sb);
_Ip __eof;
_Op __o(*this);
size_t __c = 0;
for (; __i != __eof; ++__i, ++__o, ++__c)
{
    *__o = *__i;
    if (__o.failed())
        break;
}
The bottom line is: this is still per-character copying. I was hoping the standard library uses an algorithm that relies on the block-level member functions of streambuffers, sputn and sgetn, as opposed to per-character transport. Does the standard library provide such an algorithm or do I have to roll my own?
I'm afraid that the answer is: it is not possible with the current design of the standard library. The reason is that streambuffers completely hide the character sequence they manage. This makes it impossible to directly copy bytes from the get area of one streambuffer to the put area of another.
If the "input" streambuffer exposed its internal buffer, then the "output" streambuffer could simply use sputn(in.data(), in.size()). More obviously still: if the output buffer also exposed its internal buffer, one could use plain memcpy to shovel bytes between the two. Other I/O libraries operate in this fashion: the stream implementation of Google's Protocol Buffers, for example. Boost IOStreams has an optimized implementation for copying between streams. In both cases, efficient block-level copying is possible because the streambuffer equivalent provides access to its intermediate buffer.
In fact, streambuffers ironically do not even need to have a buffer: when operating unbuffered, each read/write goes directly to the underlying device. Presumably this is one reason why the standard library does not support introspection. The unfortunate consequence is that no efficient copying between input and output streambuffers is possible. Block-level copying requires an intermediary buffer, and a copy algorithm would operate as follows:
Read from the input streambuffer via sgetn into the intermediary buffer.
Write from the intermediary buffer into the output streambuffer via sputn.
Go to 1 until the input is exhausted or writes to the output streambuffer fail.
I have read about buffers and streams and how they work with files in C++, but I don't understand why a buffer is needed if there is a stream; the stream is always there to transfer the data of a file to the program. So why do we use buffers to store data (seemingly the same task the stream performs), and what are buffered and unbuffered streams?
Consider a stream that writes to a file. If there were no buffer, then each time your program wrote a single byte to the stream, a single byte would have to be written to the file. That's very inefficient. So streams have buffers to decouple operations on one side of the stream from operations on the other side.
OK, let's start from scratch. Suppose you want to work with files. You would have to manage how data gets into your file, whether each write succeeded, and all the other low-level details. You could manage all of that yourself, which would take a lot of time and hard work, or you can use a stream.
Yes, you can use a stream for exactly this purpose. Streams work through abstraction: as C++ programmers we don't need to know how they work internally; we only know that we stand at one end of the stream (our program's side), hand our data to it, and the stream has the responsibility of transferring that data to the other end (the file's side).
E.g.:
ofstream file("abc.txt"); // an object of output file stream is created
file << "Hello";          // we just give our data to the stream and it transfers it
file.close();             // the closing of the file
Now, working with files is a relatively expensive operation: accessing a file takes much longer than accessing memory, and we don't want to perform a file operation for every single read or write. So programmers use a feature called a buffer, a region of the computer's memory that temporarily stores data on its way to or from the file.
Instead of reading the file every time you need data, you read from a memory location where a chunk of the file has been copied temporarily. That is a much cheaper operation, because you are reading memory, not the file.
Streams that use a buffer in their operation, i.e. that by default copy data between the file and an in-memory buffer, are called buffered streams, whereas streams that don't use any buffer are called unbuffered streams.
If you write data to a buffered stream, that data is queued up in the buffer until the stream is flushed (flushing means writing the buffer's contents out to the file). An unbuffered stream hands data to the file as soon as it arrives at the stream, without storing it in a buffer first.
A buffer and a stream are different concepts.
A buffer is a part of the memory to temporarily store data. It can be implemented and structured in various ways. For instance, if one wants to read a very large file, chunks of the file can be read and stored in the buffer. Once a certain chunk is processed the data can be discarded and the next chunk can be read. A chunk in this case could be a line of the file.
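A sketch of that chunked pattern, using an in-memory stream so it is self-contained (the function name is made up; here "processing" a chunk just counts its bytes):

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Processes a stream chunk by chunk: each iteration reads up to `chunk_size`
// bytes into the buffer, handles them, then reuses the buffer for the next chunk.
std::size_t count_bytes(std::istream& in, std::size_t chunk_size) {
    std::string buffer(chunk_size, '\0');       // the reusable chunk buffer
    std::size_t total = 0;
    while (in) {
        in.read(&buffer[0], static_cast<std::streamsize>(chunk_size));
        total += static_cast<std::size_t>(in.gcount());  // bytes actually read
    }
    return total;
}
```

Only one chunk is ever resident, so the same loop handles a file far larger than memory.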
Streams are the way C++ handles input and output. Their implementation uses buffers.
I do agree that streams are probably the most poorly written and most badly understood part of the standard library. People use them every day, and many of them don't have the slightest clue how the constructs they use actually work. For a little fun, try asking around what std::endl is; you might find that some of the answers are funny.
At any rate, streams and streambufs have different responsibilities. Streams are supposed to provide formatted input and output, that is, translate an integer to a sequence of bytes (or the other way around), while buffers are responsible for conveying the sequence of bytes to the media.
Unfortunately, this design is not clear from the implementation. For instance, we have all those numerous streams (file streams and string streams, for example) while the only difference between them is the buffer; the stream code remains exactly the same. I believe many people would redesign streams if they had their way, but I'm afraid this is not going to happen.