What does the g stand for in gcount, tellg and seekg? - c++

What does the g stand for in std::iostream's gcount, tellg and seekg members? And the p in pcount, tellp and seekp?
Why aren't they called just count, tell and seek?

In streams supporting both read and write, you actually have two positions, one for read (i.e. "get" denoted by "g") and one for write (i.e. "put" denoted by a "p").
And that's why you have a seekp (inherited from basic_ostream), and a seekg (inherited from basic_istream).
Side note: C, in contrast to C++, has only one such function, fseek, which serves both reading and writing. There, the stream must be repositioned when switching from read to write and vice versa (cf., for example, this answer). To avoid this, C++ offers separate functions for read and write.
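The repositioning requirement on the C side can be sketched roughly as follows. This is only an illustration, written as C++ against <cstdio>; the helper name overwrite_second_char is made up for the example, and std::tmpfile() is assumed to be usable as scratch storage.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: writes "hello" to a scratch file, reads one character,
// then overwrites the next one. A C stream has a single file position, so a
// positioning call is needed between a read and a following write.
std::string overwrite_second_char() {
    std::FILE* f = std::tmpfile();      // scratch file opened for update
    assert(f != nullptr);

    std::fputs("hello", f);
    std::rewind(f);                     // switch from writing to reading
    int first = std::fgetc(f);          // reads 'h'; the position is now 1
    assert(first == 'h');

    std::fseek(f, 0, SEEK_CUR);         // switch from reading to writing
    std::fputc('E', f);                 // overwrites 'e' at position 1

    std::rewind(f);                     // switch back to reading
    char buf[16] = {};
    std::fgets(buf, sizeof buf, f);
    std::fclose(f);
    return buf;
}
```

The fseek(f, 0, SEEK_CUR) call does not move anything; it is only there because C requires a positioning call (or a flush after output) when the direction changes.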

C++ offers two pointers while navigating the file: the get pointer and the put pointer. The first one is used for read operations, the second one for write operations.
seekg() is used to move the get pointer to a desired location with respect to a reference point.
tellg() is used to know where the get pointer is in a file.
seekp() is used to move the put pointer to a desired location with respect to a reference point.
tellp() is used to know where the put pointer is in a file.
Main source: Quora, answer by Gunjan B. Yadav on Dec 1, 2017.
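A small sketch of the four functions, using std::stringstream as a stand-in for a file so that nothing touches the disk. The struct Positions and the function demo_get_put are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical result type for the demo below.
struct Positions {
    std::streamoff get;    // value reported by tellg()
    std::streamoff put;    // value reported by tellp()
    std::string word;      // what was read from the get position
    std::string contents;  // full buffer contents afterwards
};

Positions demo_get_put() {
    std::stringstream ss("hello world");

    ss.seekg(6);           // move the get pointer to the 'w'
    ss.seekp(0);           // move the put pointer to the start
    ss << "HELLO";         // overwrites the first five characters
    assert(ss.good());

    Positions r;
    r.get = ss.tellg();    // still 6: writing did not move the get pointer
    r.put = ss.tellp();    // 5: one past the last character written
    ss >> r.word;          // reads from the get pointer
    r.contents = ss.str();
    return r;
}
```

Here the two pointers move independently, which is what the g and p suffixes are meant to express. Note that this holds for string streams; file streams may behave differently.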

Related

When is calling basic_filebuf::pbackfail allowed/defined to succeed

On implementing a basic_filebuf I stumbled over basic_filebuf::pbackfail and don't fully understand its definition.
From cplusplus.com
Moves the current input position one position back to point to the previous character and, if supported, makes c available as that next character to be read.
If the implementation does not support writing to putback positions, c shall either match the character at the putback position or be the end-of-file value (traits_type::eof()). Otherwise, the function fails. [...]
If the get pointer (gptr) is at the beginning of the character sequence before the call, the function may either fail or make additional putback positions available and succeed, depending on the library implementation.
Or from cppreference:
1) The caller is requesting that the get area is backed up by one character (pbackfail() is called with no arguments), in which case, this function re-reads the file starting one byte earlier and decrements basic_streambuf::gptr(), e.g. by calling gbump(-1).
2) The caller attempts to putback a different character from the one retrieved earlier (pbackfail() is called with the character that needs to be put back), in which case
a) First, checks if there is a putback position, and if there isn't, backs up the get area by re-reading the file starting one byte earlier.
b) Then checks what character is in the putback position. If the character held there is already equal to c, as determined by Traits::eq(to_char_type(c), gptr()[-1]), then simply decrements basic_streambuf::gptr().
c) Otherwise, if the buffer is allowed to modify its own get area, decrements basic_streambuf::gptr() and writes c to the location pointed to by gptr() after adjustment.
So in essence both say that the input position is decremented (unless it is at the start of the file) and possibly a char is put back. So the following should succeed (assume prepared file according to comments, using std classes for comparison of behavior):
std::fstream f(..., std::ios::in | std::ios::out | std::ios::binary);
f.get();        // == 'a'
f.get();        // == 'b'
f.sync();       // or f.seekg(0)
f.putback('b');
f.putback('a');
f.unget();      // may fail: already at the start of the file
However, on libc++ the first putback already fails. Checking the source code, I found pbackfail to be guarded by if (__file_ && this->eback() < this->gptr()), i.e. "if there is an open file and there is space at the front of the current read buffer".
A flush/sync/seek clears the read buffer which explains the failing putback. When using unbuffered IO there will only be a single char space in the read buffer so (at least) the 2nd putback will fail even without the flush. Or the second get might cross a buffer "border" which means "b" will be the first char in the current buffer which also makes the second putback fail.
Question: How is putback exactly specified? It seems to be only valid immediately after a get although both cppreference and cplusplus seem to imply that the read position is decremented in any case. If they are right, is libc++ non-conforming or am I missing anything?
Question: How is putback exactly specified?
It is specified in the section istream.unformatted.
However, that refers to sputbackc, and you'll probably want to read the general stream buffer requirements as well.
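To make the "only valid right after a get" observation concrete, here is a minimal sketch that uses std::istringstream as a stand-in for a file stream. That substitution is an assumption made for the example: a basic_filebuf is allowed to make further putback positions available, so a real file may behave differently depending on the library. PutbackResult and try_putback are hypothetical names.

```cpp
#include <cassert>
#include <sstream>

// Hypothetical result type: did putback() leave the stream in a good state?
struct PutbackResult {
    bool after_get;  // putback directly after a successful get
    bool at_start;   // putback with the get pointer at the start
};

PutbackResult try_putback() {
    PutbackResult r{};
    std::istringstream in("ab");

    char c = 0;
    in.get(c);                // reads 'a'
    assert(c == 'a');
    in.putback(c);            // a putback position exists, so this works
    r.after_get = in.good();

    // The get pointer is back at the start of the sequence: there is no
    // putback position, sputbackc falls through to pbackfail, and the
    // string buffer's pbackfail reports failure (badbit is set).
    in.putback('a');
    r.at_start = in.good();
    return r;
}
```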

What is a marker of a stream and why there's only 1 marker in a stream?

From C++ Primer 5th (emphasis mine):
There Is Only One Marker
The fact that the library distinguishes between the “putting” and “getting” versions of the seek and tell functions can be misleading. Even though the library makes this distinction, it maintains only a single marker in a stream—there is not a distinct read marker and write marker.
When we’re dealing with an input-only or output-only stream, the distinction isn’t even apparent. We can use only the g or only the p versions on such streams. If we attempt to call tellp on an ifstream, the compiler will complain. Similarly, it will not let us call seekg on an ostringstream.
The fstream and stringstream types can read and write the same stream. In these types there is a single buffer that holds data to be read and written and a single marker denoting the current position in the buffer. The library maps both the g and p positions to this single marker.
Because there is only a single marker, we must do a seek to reposition the marker whenever we switch between reading and writing.
All I know about stream buffer is from this page https://en.cppreference.com/w/cpp/io/basic_streambuf. From the text and diagram, I know that a stream is a source containing data, which can at most have 2 buffers, maintained by 6 pointers.
So what is the marker mentioned in the quote, in practical terms? And why does it say that there is only one marker for a stream that allows both input and output, which contradicts my basic understanding of streams?
I'm not an expert, so this is just an educated guess based on my own knowledge:
The so-called marker is the current offset from the beginning of the stream, so the first byte is at offset 0.
The distinction between "putting" and "getting" allows users to implement their own stream classes that use different g and p positions and so do not rely on a single marker for reading and writing. This approach is more flexible, since you're not tied to a single marker.
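For file streams, the single marker can be observed with a small experiment. This is a sketch under the assumption that basic_filebuf keeps one joint file position for input and output, which is how I understand the standard's description of it. The function name fstream_shares_position is hypothetical, and the path is supplied by the caller and deleted afterwards.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <ios>

// Hypothetical check: after moving only the "get" position of an fstream,
// does the "put" position report the same offset?
bool fstream_shares_position(const char* path) {
    std::fstream f(path, std::ios::in | std::ios::out |
                         std::ios::trunc | std::ios::binary);
    assert(f.is_open());

    f << "abcdef";
    f.seekg(2);                       // nominally moves the get position only
    std::streamoff get = f.tellg();
    std::streamoff put = f.tellp();   // follows along for a file stream

    f.close();
    std::remove(path);                // clean up the scratch file
    return get == 2 && put == 2;
}
```

Called as fstream_shares_position("marker_demo.tmp") (any writable scratch path will do), this is expected to return true, which is the behavior the book calls a single marker.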

How to Duplicate a file pointer [duplicate]

I want to use fgets twice on the same stream. I have defined two file pointer pointing to the same file but when I use fgets on one of the pointer, the other also gets modified.
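#include <stdio.h>

void fun(FILE *input) {
    FILE *input_dup = input;      /* copies the pointer, not the stream */
    char str[2];
    fgets(str, 2, input);         /* reads one character */
    fgets(str, 2, input_dup);     /* same stream, so this reads the next one */
}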
On the second call to fgets, why is it reading the next character? It should give the same output, as both pointers point to the same location.
Well, you are laboring under a fundamental mis-understanding:
If you copy a pointer, that does not copy the object it points to.
As it happens, there's no standard way for duplicating a FILE (though there are nonstandard ones, see: Duplicating file pointers?).
Which doesn't mean you are SOL: you can use ftell to get the current position, and fseek to get back there (provided the stream is seekable, like a file).
As with all C and C++ code, it is doing exactly what you told it to do. If you want another copy, you could write a function that reads the stream's contents into an array and returns a pointer to that array. That would accomplish your goal.
A second solution would be to unget the last character read from the stream (e.g. with ungetc), then read again.
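The ftell/fseek suggestion can be sketched like this. It is written as C++ against <cstdio> (the calls are the same as in C), the helper name read_twice is hypothetical, and it assumes the stream is seekable.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper: reads one character twice from the same stream by
// remembering the position before the first read and seeking back to it.
std::pair<std::string, std::string> read_twice(std::FILE* input) {
    char first[2] = {};
    char second[2] = {};

    long pos = std::ftell(input);               // remember the position
    assert(pos != -1L);                         // fails on non-seekable streams

    std::fgets(first, sizeof first, input);     // reads one character
    std::fseek(input, pos, SEEK_SET);           // jump back
    std::fgets(second, sizeof second, input);   // reads the same character

    return std::make_pair(std::string(first), std::string(second));
}
```

There is still only one FILE object and one position; the function just restores the position between the two reads.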

Handling unget and putback with file streams

I have implemented std::basic_streambuf derived wrapper around std::basic_filebuf which converts between encodings. Within this wrapper I use a single buffer for both input and output. The buffering technique comes from this article.
Now, the problem I can't figure out is this. My internal buffer is filled on calls to underflow. According to the article, when switching from input to output, the buffer should be put in a state of limbo. To do this, I need to unget the unread data in the buffer. According to the docs and the source code, unget and putback are not guaranteed to succeed. A failure would leave me with an invalid tellg position for the next input operation.
I'm not asking for somebody to write this for me, but I am asking advice as to how to manage ungetting data from std::basic_filebuf in a way that will not fail.
I think the only sure way is to calculate the bytes that would be written to the file and adjust the offset accordingly. It's not as simple as it sounds, though. The filebuf may have an associated locale, unknown at compile time. I tried getting the facet and passing the data through its out member, but it doesn't work. The previously read data may have a non-default mbstate_t value, and some codecvt objects also write a BOM.
Basically, it's almost impossible to calculate the 'on file' length of a section of file data after it has passed through a codecvt.
I have tagged this question with 'c' since 'c'-file streams also work with buffers and also use get and put pointers. std::basic_filebuf is just a wrapper around a 'c'-file stream. Answers in 'c' are also applicable to this problem.
Does anybody have any suggestions as to how to implement unlimited unget on file streams?

Difference between putback() and unget()

I'm using a Standard iostream to get some input from a file, and I'm confused about unget() versus putback(character). It seems to me from the documentation that these functions are effectively identical, where unget() just remembers the character put in, so I'm nervous. I've always used putback(character), but character is always the last read character and I've been thinking about changing to unget(). Is putback(character) always identical to unget(), if character is always the last read character?
You can't lie with unget(). It "ungets" the last-read character. You can lie with putback(c). You can "putback" some character other than the last-read character. Sometimes putting back a character other than the last-read character can be useful.
Also, if the underlying read buffer really does have buffering capability, you can "putback" more than one character. I think unget() is limited to one character.
Edit
Nope. It looks like unget() can go as far back as putback().
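A minimal sketch of the difference, assuming a std::stringstream (which is writable, so its buffer is allowed to store a different character on putback). The two function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// unget() can only restore what was actually read, here two characters deep.
std::string unget_then_read() {
    std::stringstream ss("abc");
    ss.get();                  // 'a'
    ss.get();                  // 'b'
    ss.unget();                // back to 'b'
    ss.unget();                // back to 'a'
    assert(ss.good());
    std::string s;
    ss >> s;
    return s;
}

// putback(c) may "lie": it puts back a character that was never read.
std::string putback_other_then_read() {
    std::stringstream ss("abc");
    ss.get();                  // 'a'
    ss.putback('X');           // not the character that was read
    assert(ss.good());
    std::string s;
    ss >> s;
    return s;
}
```

With a read-only std::istringstream the mismatching putback would be expected to fail instead, since that buffer may not modify its get area.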
It's probably not the answer you expect, but I want to present my reasoning. The documentation states that the methods putback and unget call streambuf::sputbackc and streambuf::sungetc respectively. The definitions are as follows:
streambuf::sungetc
Moves the get pointer one character backwards, making the last character gotten by an input operation available once again for the next input operation.
During its operation, the function will call the protected virtual member function pbackfail if the get pointer gptr points to the same position as the beginning pointer eback.
The other one:
streambuf::sputbackc
The get pointer is moved back to point to the character right before its current position so the last character gotten, c, becomes available again as the character to be read at that position by the next input operation.
During its operation, the function calls the protected virtual member function pbackfail either if the character c doesn't match gptr()[-1] or if the get pointer gptr points to the same position as the beginning pointer eback.
When c does not match the character at that position, the default definition of pbackfail in streambuf will prepend c to be the character extracted at that position if possible, but derived classes may override this behavior.
The member function sungetc behaves in a similar way but without taking any parameter
As sputbackc calls pbackfail if the character doesn't match, the method has to check whether the values are equal. It looks like this additional check is the only overhead, but I have no idea how it is solved in practice. I can imagine that if the last character is not stored in the object, it has to be re-read, so you might see that cost even when the characters are guaranteed to be the same.
I was a little concerned about the situation where we call unget but the last character is not available. Would putback restore the value correctly in that case? I doubt it, but this shouldn't happen while operating on files.
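The conditions under which pbackfail gets called can be checked with a toy stream buffer. This is only a sketch: CountingBuf and count_pbackfail_calls are hypothetical names, the buffer is a fixed three-character array, and its pbackfail does nothing except count calls and report failure.

```cpp
#include <cassert>
#include <streambuf>

// Hypothetical read-only buffer over "abc" that counts pbackfail() calls.
class CountingBuf : public std::streambuf {
public:
    CountingBuf() { setg(data_, data_, data_ + 3); }
    int pbackfail_calls = 0;

protected:
    int_type pbackfail(int_type = traits_type::eof()) override {
        ++pbackfail_calls;
        return traits_type::eof();   // always report failure
    }

private:
    char data_[3] = {'a', 'b', 'c'};
};

int count_pbackfail_calls() {
    CountingBuf buf;
    buf.sbumpc();                          // reads 'a'
    int r = buf.sputbackc('a');            // matches gptr()[-1]: no call
    assert(r == 'a');
    buf.sbumpc();                          // reads 'a' again
    buf.sputbackc('x');                    // mismatch: pbackfail is called
    buf.sungetc();                         // putback position exists: no call
    buf.sungetc();                         // gptr() == eback(): pbackfail is called
    return buf.pbackfail_calls;
}
```

So in the common case (matching character, putback position available) both functions just decrement the get pointer; the comparison inside sputbackc is the only extra work.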