I need some help to understand this paragraph from Nicolai Josuttis book - c++

I'd like to have some explanation regarding this paragraph on page 518 of Nicolai Josuttis book "The C++ Standard Library" (first edition):
These flags are maintained by the class basic_ios and are thus present in all objects of type basic_istream or basic_ostream. However, the stream buffers don't have state flags. One stream buffer can be shared by multiple stream objects, so the flags only represent the state of the stream as found in the last operation. Even this is only the case if goodbit was set prior to this operation. Otherwise the flags may have been set by some earlier operation.
I don't understand what does he mean by "the stream buffer don't have state flags" and right below this paragraph there's a table with the title "Member functions for stream states".

Streams consist of two objects:
The actual stream object (std::istream or std::ostream, derived from std::ios).
The stream buffer, i.e., a class derived from std::streambuf.
The state flags are present in std::ios but not in std::streambuf.

There are "stream buffer objects", and a "stream objects". One stream buffer can be shared between multiple stream objects. Each stream object has its own set of flags - so one stream may be "reached end of file", where another is not - or the flags for output in decimal or hex may be completely different for two output streams using the same buffer.
[Of course, if you are using the same buffer for multiple streams, you will have to take care that you don't mess things up - and it's not a common thing to share the buffer over multiple streams, but it can be done!]

The iostate flags store things about output formatting: whether you want numbers printed in decimal or hex, capitals or lowercase, etc. Stream objects control formatting, so the flags are inside the stream object.
In iostreams, buffering is separate from formatting. Linked to the iostream object is a stream buffer object, which controls sending and/or receiving characters from an underlying source. The buffer object has no such flags; its only state variables deal with preparing (encoding) the characters and optionally storing (buffering) them to reduce the number of times the operating system is asked to perform I/O. (Or in the case of stringstream, the buffer provides the ultimate storage behind the stream.)

So a stream has state flags, but the stream buffer it uses doesn't.
A stream buffer goes inside a stream.
The buffer holds some amount of bytes that the stream is reading/writing before sending/recieving it to whatever the stream is talking to (file/stdin/tcpsocket/etc.).
Stream reference: http://www.cplusplus.com/reference/istream/iostream/
Stream Buffer Reference: http://www.cplusplus.com/reference/streambuf/streambuf/
By default a stream will usually create it's own stream buffer, but you can tell it to use one of your choosing in the constructor: http://www.cplusplus.com/reference/istream/iostream/iostream/
Or you can get/set the buffer with the rdbuf method.

The "stream buffer" is an object of class basic_streambuf. That class doesn't have state flags. Every stream (basic_istream or basic_ostream) has a pointer to a basic_streambuf, but the flags are a property of the stream, not of the stream buffer.

Related

What is a marker of a stream and why there's only 1 marker in a stream?

From C++ Primer 5th (emphasized mine):
There Is Only One Marker
The fact that the library distinguishes between the “putting” and “getting” versions of the seek and tell functions can be misleading. Even though the library makes this distinction, it maintains only a single marker in a stream—there is not a distinct read marker and write marker.
When we’re dealing with an input-only or output-only stream, the distinction isn’t even apparent. We can use only the g or only the p versions on such streams. If we attempt to call tellp on an ifstream, the compiler will complain. Similarly, it will not let us call seekg on an ostringstream.
The fstream and stringstream types can read and write the same stream. In
these types there is a single buffer that holds data to be read and written and a single marker denoting the current position in the buffer. The library maps both the g and p positions to this single marker.
Because there is only a single marker, we must do a seek to reposition the
marker whenever we switch between reading and writing.
All I know about stream buffer is from this page https://en.cppreference.com/w/cpp/io/basic_streambuf. From the text and diagram, I know that a stream is a source containing data, which can at most have 2 buffers, maintained by 6 pointers.
So what is the marker mentioned in quote practically? And why is it saying that there's only 1 marker for a stream that allows both input and output, which violates my basic understanding of stream?
I'm not an expert, so just an educated guess based on my own knowledge:
the so called marker is the current offset from beginning of the stream, so the first byte is offset = 0.
the distinction between "putting" and "getting" allows the user to implement his own stream classes that uses different g and p, thus not relaying on a single marker for reading/writing, this approach provides more flexibility, since you're not tied to a single marker.

Clarify the difference between input/output stream and input/output buffer

EDITED
Input stream and Input buffer
From what I understand, when a key is pressed on the keyboard, the character go into the input stream (stdin) and get stored in the buffer. Then the scanf (in case of C) or cin(in case of C++) extracts the character from the buffer and places it in the main memory.
Output stream and Output buffer
Similarly, before characters are displayed on the screen, they are first stored in a buffer. Then the printf (in case of C) or cout(in case of C++) extracts the characters from the buffer (when it is full) and sends it to the output (stdout) stream.
Am I right? I've been stuck on this for quite a while now and my logic may be flawed.
Side note: scanf() is not the function to read input, see more here.
Now for your question: When asking about C (and, C++), e.g., the language, you should stay within the abstract concepts the language provides. So, don't start at the keyboard, that's far outside your program.
Start here: The operating system wants to deliver some input to you. Now, your C runtime provides a stream of input to your code. The stream is an abstract concept, it just means something you can continuously read from. This stream can be buffered or unbuffered, and if it's buffered, there are different modes (fully buffered or line buffered) available. You can configure all of that.
If your stream is unbuffered, this means the operating system has to wait until your code wants to read from the input stream. By default, your standard input stream is line buffered, which means your C runtime accepts the input immediately and puts it into a buffer until there is a newline -- your code calling input functions will get a result from that buffer.
Conceptually the same happens with output, just the other way around. If your output stream is for example line buffered, your C runtime will fill a buffer until there is a newline and deliver that whole line to the operating system for output. If the output is unbuffered, every single character is immediately passed to the operating system.
Disclaimer: this is still a lot simplified, but should be enough to start with.
As you ask about the term "buffer overflow" in the comments, mentioning gets() -- this is about a buffer inside your own code. With any input function that reads more than a single value/char, you have to provide your own buffer for it to store the result to. With gets(), there's no way to tell the function how large this buffer is, so it will just overflow it whenever the input available is too large. This is why gets() is ill-defined and meanwhile removed from the language C. It has nothing to do with buffers of your C runtime that are possibly used in the implementation of the I/O streams.

Handling unget and putback with file streams

I have implemented std::basic_streambuf derived wrapper around std::basic_filebuf which converts between encodings. Within this wrapper I use a single buffer for both input and output. The buffering technique comes from this article.
Now, the problem I can't figure out is this. My internal buffer is filled on calls to underflow. According to the article, when switching from input to output, the buffer should be put in a state of limbo. To do this, I need to unget the unread data in the buffer. Reading the docs and the source codes, unget and putback are not guaranteed to succeed. This will leave me with an invalid tellg pointer with the next input operation.
I'm not asking for somebody to write this for me, but I am asking advice as to how to manage ungetting data from std::basic_filebuf in a way that will not fail.
I think, the only sure way is to calculate the bytes that would be written to file and adjust the offset accordingly. It's not as simple as it sounds though. The filebuf may have an associated locale, unknown at compile time. I tried getting the facet and passing the data through it's out member, but it doesn't work. The previously read data may have a non default mbstate_t value, and some codecvt objects also write a BOM.
Basically, it's almost impossible to calculate the 'on file' length of a section of file data after it has passed through a codecvt.
I have tagged this question with 'c' since 'c'-file streams also work with buffers and also use get and put pointers. std::basic_filebuf is just a wrapper around a 'c'-file stream. Answers in 'c' are also applicable to this problem.
Does anybody have any suggestions as to how to implement unlimited unget on file streams?

What is a stream in C++?

I have been hearing about streams, more specifically file streams.
So what are they?
Is it something that has a location in the memory?
Is it something that contains data?
Is it just a connection between a file and an object?
The term stream is an abstraction of a construct that allows you to send or receive an unknown number of bytes. The metaphor is a stream of water. You take the data as it comes, or send it as needed. Contrast this to an array, for example, which has a fixed, known length.
Examples where streams are used include reading and writing to files, receiving or sending data across an external connection. However the term stream is generic and says nothing about the specific implementation.
IOStreams are a front-end interface (std::istream, std::ostream) used to define input and output functions. The streams also store formatting options, e.g., the base to use for integer output and hold a std::locale object for all kind of customization. Their most important component is a pointer to a std::streambuf which defines how to access a sequence of characters, e.g., a file, a string, an area on the screen, etc. Specifically for files and strings special stream buffers are provided and classes derived from the stream base classes are provided for easier creation. Describing the entire facilities of the IOStreams library can pretty much fill an entire book: In C++ 2003 about half the library section was devoted to stream related functionality.
Stream is linear queue that connects a file to the program and maintain the flow of data in both direction. Here the source is any file, I/O device, Hard disk, CD/DVD etc.
Basically stream is if two type 1.Text Stream 2.Binary stream
Text Stream : It is a sequence of character arranges in line and each line terminated by new line (unix).
Binary Stream: It is data as it is coded internally in computer's main memory, without any modification.
File system is designed to work with a wide variety of devices, including terminals, disk drives, tape drives etc. Even though each device is different, file system transforms each into a logical device called stream. Streams are device independent so same function can be used to write a disk file and a tape file. In more technical term stream provides a abstraction between programmer and actual device being used.

Formatted and unformatted input and output and streams

I had been reading a few articles on some sites about Formatted and Unformatted I/O, however i have my mind more messed up now.
I know this is a very basic question, but i would request anyone can give a link [ to some site or previously answered question on Stackoverflow ] which explains, the idea of streams in C and C++.
Also, i would like to know about Formatted and Unformatted I/O.
The standard doesn't define what these terms mean, it just says which of the functions defined in the standard are formatted IO and which are not. It places some requirements on the implementation of these functions.
Formatted IO is simply the IO done using the << and >> operators. They are meant to be used with text representation of the data, they involve some parsing, analyzing and conversion of the data being read or written. Formatted input skips whitespace:
Each formatted input function begins execution by constructing an object of class sentry with the noskipws (second) argument false.
Unformatted IO reads and writes the data just as a sequence of 'characters' (with possibly applying the codecvt of the imbued locale). It's meant to read and write binary data, or function as a lower-level used by the formatted IO implementation. Unformatted input doesn't skip whitespace:
Each unformatted input function begins execution by constructing an object of class sentry with the default argument noskipws (second) argument true.
And allows you to retrieve the number of characters read by the last input operation using gcount():
Returns: The number of characters extracted by the last unformatted input member function called for the object.
Formatted IO means that your output is determined by a "format string", that means you provide a string with certain placeholders, and you additionally give arguments that should be used to fill these placeholders:
const char *daughter_name = "Lisa";
int daughter_age = 5;
printf("My daughter %s is %d years old\n", daughter_name, daughter_age);
The placeholders in the example are %s, indicating that this shall be substituted using a string, and %d, indicating that this is to be replaced by a signed integer number. There are a lot more options that give you control over how the final string will present itself. It's a convenience for you as the programmer, because it relieves you from the burden of converting the different data types into a string and it additionally relieves you from string appending operations via strcat or anything similar.
Unformatted IO on the other hand means you simply write character or byte sequences to a stream, not using any format string while you are doing so.
Which brings us to your question about streams. The general concept behind "streaming" is that you don't have to load a file or whatever input as a whole all the time. For small data this does work though, but imagine you need to process terabytes of data - no way this will fit into a single byte array without your machine running out of memory. That's why streaming allows you to process data in smaller-sized chunks, one at a time, one after the other, so that at any given time you just have to deal with a fix-sized amount of data. You read the data into a helper variable over and over again and process it, until your underlying stream tells you that you are done and there is no more data left.
The same works on the output side, you write your output step for step, chunk for chunk, rather than writing the whole thing at once.
This concept brings other nice features, too. Because you can nest streams within streams within streams, you can build a whole chain of transformations, where each stream may modify the data until you finally receive the end result, not knowing about the single transformations, because you treat your stream as if there were just one.
This can be very useful, for example C or C++ streams buffer the data that they read natively from e.g. a file to avoid unnecessary calls and to read the data in optimized chunks, so that the overall performance will be much better than if you would read directly from the file system.
Unformatted Input/Output is the most basic form of input/output. Unformatted input/output transfers the internal binary representation of the data directly between memory and the file. Formatted output converts the internal binary representation of the data to ASCII characters which are written to the output file. Formatted input reads characters from the input file and converts them to internal form. Formatted