Clarification on fsetpos, C++

I am a little confused by the function fsetpos in the stdio.h library. I want to be able to write to different indexes in a file (i.e. I do not want to write to the file contiguously). I was considering using fsetpos; however, the documentation states:
The internal file position indicator associated with stream is set to the position represented by pos, which is a pointer to an fpos_t object whose value shall have been previously obtained by a call to fgetpos.
It does not make sense to me that I have to set the position based on a value obtained from fgetpos. What's the point, since it will just set the position to where it already is? Or am I not understanding it correctly?

From the C11 standard, fseek has a similar limitation:
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET
The reason is that text streams don't have a one-to-one mapping between the actual bytes of the source and the characters you get from fgetc; e.g. on Windows systems, the newline character in C is typically translated into a sequence of two bytes: carriage return, then line feed.
Consequently, the notion of arbitrarily positioning a text stream based on a numerical index is fraught with complications and surprises.
In fact, the standard's description of ftell warns:
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
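This is exactly what fgetpos/fsetpos are for: you record a position you have actually been at, keep going, and later return to it. A minimal sketch, where the function name patch_header_demo and the "count=" layout are made up for illustration and std::tmpfile() is assumed to succeed in your environment:

```cpp
#include <cassert>
#include <cstdio>
#include <initializer_list>
#include <string>

// Hypothetical demo: reserve a field, remember its position with fgetpos(),
// write more data, then return with fsetpos() and fill the field in.
std::string patch_header_demo() {
    std::FILE* f = std::tmpfile();      // assumed to succeed; opened in "wb+" mode
    if (!f) return "";

    std::fputs("count=", f);
    std::fpos_t header_pos;
    if (std::fgetpos(f, &header_pos) != 0) {   // remember where the count goes
        std::fclose(f);
        return "";
    }
    std::fputs("??\n", f);              // two-character placeholder

    int lines = 0;
    for (const char* line : {"alpha\n", "beta\n", "gamma\n"}) {
        std::fputs(line, f);
        ++lines;
    }

    std::fsetpos(f, &header_pos);       // jump back to the remembered position
    std::fprintf(f, "%02d", lines);     // overwrite the placeholder in place

    std::rewind(f);                     // reposition before switching to reading
    std::string out;
    char buf[64];
    while (std::fgets(buf, sizeof buf, f)) out += buf;
    std::fclose(f);
    return out;
}
```

Calling patch_header_demo() should return "count=03\nalpha\nbeta\ngamma\n": the value passed to fsetpos() came from fgetpos(), so the same pattern is also valid on a text stream.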
Binary streams don't have this limitation, although
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END
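For the original goal of writing at arbitrary byte indexes, a binary stream with fseek and SEEK_SET is what the standard supports. A sketch, where write_at and scattered_write_demo are hypothetical helper names; it only overwrites bytes that already exist in the file, so it does not rely on any behaviour for seeking past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: write one byte at an absolute offset of a binary stream.
// For binary streams, fseek(f, offset, SEEK_SET) with a plain byte count is allowed.
bool write_at(std::FILE* f, long offset, char c) {
    if (std::fseek(f, offset, SEEK_SET) != 0) return false;
    return std::fputc(c, f) != EOF;
}

std::string scattered_write_demo() {
    std::FILE* f = std::tmpfile();      // binary update ("wb+") stream
    if (!f) return "";
    std::fputs("..........", f);        // 10 filler bytes to overwrite
    write_at(f, 7, 'C');                // deliberately out of order
    write_at(f, 0, 'A');
    write_at(f, 3, 'B');

    std::rewind(f);                     // reposition before reading back
    char buf[11] = {};
    std::size_t n = std::fread(buf, 1, 10, f);
    std::fclose(f);
    return std::string(buf, n);
}
```

scattered_write_demo() is expected to return "A..B...C..".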
The above assumes you are working with byte-oriented streams. Wide-oriented streams have additional restrictions; e.g., the standard's section on Streams says:
Binary wide-oriented streams have the file-positioning restrictions ascribed to both text and binary streams
and
For wide-oriented streams, after a successful call to a file-positioning function that leaves the file position indicator prior to the end-of-file, a wide character output function can overwrite a partial multibyte character; any file contents beyond the byte(s) written are henceforth indeterminate
fsetpos does more than just set the file position; again from the C11 standard:
The fsetpos function sets the mbstate_t object (if any) and file position indicator
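which makes it more suitable for setting the position in a wide-oriented stream. In other words, fgetpos/fsetpos are meant for returning to a position you recorded earlier, not for jumping to an arbitrary numeric index.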

Related

What is the marker of a stream, and why is there only one marker in a stream?

From C++ Primer, 5th edition (emphasis mine):
There Is Only One Marker
The fact that the library distinguishes between the “putting” and “getting” versions of the seek and tell functions can be misleading. Even though the library makes this distinction, it maintains only a single marker in a stream—there is not a distinct read marker and write marker.
When we’re dealing with an input-only or output-only stream, the distinction isn’t even apparent. We can use only the g or only the p versions on such streams. If we attempt to call tellp on an ifstream, the compiler will complain. Similarly, it will not let us call seekg on an ostringstream.
The fstream and stringstream types can read and write the same stream. In these types there is a single buffer that holds data to be read and written and a single marker denoting the current position in the buffer. The library maps both the g and p positions to this single marker.
Because there is only a single marker, we must do a seek to reposition the marker whenever we switch between reading and writing.
All I know about stream buffers comes from this page: https://en.cppreference.com/w/cpp/io/basic_streambuf. From the text and diagram, I understand that a stream is a source of data that can have at most two buffers (a get area and a put area), maintained by six pointers.
So what is the marker mentioned in the quote, in practice? And why does it say that there's only one marker for a stream that allows both input and output, which contradicts my basic understanding of streams?
I'm not an expert, so this is just an educated guess based on my own knowledge:
the so-called marker is the current offset from the beginning of the stream, so the first byte is at offset 0.
the distinction between "putting" and "getting" allows users to implement their own stream classes that use different g and p positions, thus not relying on a single marker for reading and writing; this approach provides more flexibility, since you're not tied to a single marker.
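A small experiment makes the difference concrete. Per the standard, std::basic_stringbuf positions its get and put sequences separately (seekp() only passes ios_base::out), while std::basic_filebuf works on a single file position, which is the case the quoted "single marker" describes. The sketch assumes the current directory is writable for the hypothetical file name you pass in.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// std::stringstream: moving the put position leaves the get position alone.
std::streamoff stringstream_tellg_after_seekp() {
    std::stringstream ss("hello world");
    ss.seekp(6);                        // move only the put position
    ss << "WORLD";                      // overwrite "world"; get side untouched
    assert(ss.str() == "hello WORLD");
    return ss.tellg();                  // expected: 0
}

// std::fstream: one file position, so seekp() also moves what tellg() reports.
std::streamoff fstream_tellg_after_seekp(const char* path) {
    std::fstream fs(path, std::ios::in | std::ios::out | std::ios::trunc);
    if (!fs) return -1;
    fs << "hello world";
    fs.seekp(6);                        // the seek is also what makes reading next legal
    std::streamoff g = fs.tellg();      // expected: 6
    std::string word;
    fs >> word;                         // reads from the shared position
    assert(word == "world");
    fs.close();
    std::remove(path);
    return g;
}
```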

Difference between opening a file in binary vs text

I've done something like this:
FILE* a = fopen("a.txt", "w");
const char* data = "abc123";
fwrite(data, 6, 1, a);
fclose(a);
and the generated text file contains "abc123", just as expected. But then I do:
//this time it is "wb" not just "w"
FILE* a = fopen("a.txt", "wb");
const char* data = "abc123";
fwrite(data, 6, 1, a);
fclose(a);
and get exactly the same result. If I read the file in binary or normal mode, it also gives me the same result. So my question is: what is the difference between calling fopen with or without binary mode?
Where I read about fopen modes: http://www.cplusplus.com/reference/cstdio/fopen/
The link you gave does actually describe the differences, but it's buried at the bottom of the page:
http://www.cplusplus.com/reference/cstdio/fopen/
Text files are files containing sequences of lines of text. Depending on the environment where the application runs, some special character conversion may occur in input/output operations in text mode to adapt them to a system-specific text file format. Although on some environments no conversions occur and both text files and binary files are treated the same way, using the appropriate mode improves portability.
The conversion could be to normalize \r\n to \n (or vice versa), or maybe ignoring characters beyond 0x7F (à la 'text mode' in FTP). Personally, I'd open everything in binary mode and use a good Unicode or other text-encoding library for dealing with text.
The most important difference to be aware of is that with a stream opened in text mode you get newline translation on non-*nix systems (it's also used for network communications, but this isn't supported by the standard library). In *nix newline is just ASCII linefeed, \n, both for internal and external representation of text. In Windows the external representation often uses a carriage return + linefeed pair, "CRLF" (ASCII codes 13 and 10), which is converted to a single \n on input, and conversely on output.
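A sketch to see the translation yourself: write the same two lines in "w" and "wb" mode, then count the bytes that end up in the file by reading it back in binary mode (no translation, and no reliance on SEEK_END). The function and file names are hypothetical. On POSIX systems both counts are expected to be 4; on Windows the text-mode file typically holds 6 bytes because each \n is stored as CR LF.

```cpp
#include <cassert>
#include <cstdio>

// Writes "a\nb\n" using write_mode ("w" or "wb") and returns the number of
// bytes actually stored in the file, or -1 on error.
long stored_size(const char* path, const char* write_mode) {
    std::FILE* f = std::fopen(path, write_mode);
    if (!f) return -1;
    std::fputs("a\nb\n", f);
    std::fclose(f);

    f = std::fopen(path, "rb");         // binary read: bytes exactly as stored
    if (!f) return -1;
    long n = 0;
    while (std::fgetc(f) != EOF) ++n;
    std::fclose(f);
    std::remove(path);
    return n;
}
```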
From the C99 standard (the N869 draft document), §7.19.2/2:
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.
And in §7.19.3/2:
Binary files are not truncated, except as defined in 7.19.5.3. Whether a write on a text stream causes the associated file to be truncated beyond that point is implementation-defined.
About use of fseek, in §7.19.9.2/4:
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
About use of ftell, in §7.19.9.4:
The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
I think those are the most important points, but there are some more details.

How does the buffer know how many characters to transfer from the external file during a flush operation?

Say I have an input operation:
file >> x;
If the internal buffer of file is empty, underflow() will be called to import characters from the external device into the internal buffer of file. It is implementation-defined whether the buffer will be partially or completely filled after this flush operation. Taking that into account, if x is a string and I am expecting an input value of a certain length, is the buffer within its rights to transfer fewer characters than that? Can this happen?
There is no real constraint on how many characters underflow() makes available. The only real constraint is that a stream which hasn't reached EOF needs to make at least one character available. With respect specifically to std::filebuf (or std::basic_filebuf<...>), the stream may be unbuffered (if setbuf(0, 0) was called), in which case it would indeed make individual characters available. Otherwise, the stream will try to fill its internal buffer and rely on the operating system to have the underlying operation return a suitable number of bytes if only a few are available yet.
I'm not sure I quite understand your question: the operation file >> x will return once x has been completely read, which happens when the stream indicated by file reaches its end or when a whitespace character is found (and if by "string" you mean char*, a non-zero value stored in file.width() is also taken into account). With respect to the underlying stream buffer, x may clearly require multiple reads from the underlying representation, i.e., it is unpredictable how many calls to underflow() are made. Given that the file's internal buffer probably matches the disk's block size, I would expect at most one call to underflow() for "normal" strings. However, if the file being read is huge and doesn't contain any spaces, many calls to underflow() may be made. Since the stream needs to find whitespace, it has no way to predict how many characters are needed in the first place.
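To see that extraction simply keeps calling underflow() until the word is complete, here is a sketch with a hypothetical stream buffer (OneCharBuf) that makes exactly one character available per underflow() call, the minimum a stream that hasn't reached EOF must provide. The exact number of calls is implementation-dependent, so the example only claims it is at least the length of the word.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stream buffer: each underflow() exposes a one-character get area.
class OneCharBuf : public std::streambuf {
public:
    explicit OneCharBuf(std::string src) : src_(std::move(src)) {}
    int underflow_calls() const { return calls_; }

protected:
    int_type underflow() override {
        ++calls_;
        if (pos_ >= src_.size()) return traits_type::eof();
        ch_ = src_[pos_++];
        setg(&ch_, &ch_, &ch_ + 1);     // exactly one character available
        return traits_type::to_int_type(ch_);
    }

private:
    std::string src_;
    std::size_t pos_ = 0;
    char ch_ = 0;
    int calls_ = 0;
};

// Extracts one whitespace-delimited word and reports how many refills it took.
std::pair<std::string, int> read_word(const std::string& text) {
    OneCharBuf buf(text);
    std::istream in(&buf);
    std::string word;
    in >> word;
    return {word, buf.underflow_calls()};
}
```

read_word("hello world") should yield "hello" even though no single refill ever provided more than one character.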

What is the rationale behind not having a writesome function for basic_ostream in the STL?

While working on a network library, I recently noticed that having a counterpart to basic_streambuf::in_avail, in conjunction with a writesome function, would be pretty handy for asynchronous I/O.
I have searched the web and checked several C++ references for any function that meets these demands, but it seems I had no luck. The only source which mentions similar functionality is Boost's Asio library; however, the description clearly states that the function call will block until at least one byte has been sent, which does not reflect my desired behaviour.
To elaborate on my question, I created a rough draft based on the C++ working draft N3337.
27.6.3.2.5 Put area [streambuf.pub.put]
streamsize in_depart();
Returns: If a write position is available, returns epptr() - pptr(). Otherwise returns showmanycp().
27.6.3.4.5 Put area [streambuf.virt.put]
streamsize showmanycp();
Returns: An estimate of the number of characters which can be written to the sequence, or -1. If it returns a positive value, then successive calls to overflow() will not return traits::eof() until at least that number of characters have been written to the stream. If showmanycp() returns -1, then calls to overflow() will fail.
Default behavior: Returns zero.
Remarks: Uses traits::eof().
27.7.3.7 Unformatted output functions [ostream.unformatted]
streamsize writesome(char_type* s, streamsize n);
Effects: Behaves as an unformatted output function (as described in 27.7.3.7, paragraph 1). After constructing a sentry object, if !good() calls setstate(failbit), which may throw an exception, and returns. Otherwise writes characters from the array whose first element is designated by s. If rdbuf()->in_depart() == -1, calls setstate(badbit) (which may throw ios_base::failure (27.5.5.4)), and writes no characters;
— If rdbuf()->in_depart() == 0, writes no characters.
— If rdbuf()->in_depart() > 0, writes min(rdbuf()->in_depart(), n) characters.
Returns: The number of characters written.
I'd guess you misinterpret the meaning of in_avail() and readsome(): all they say is that the stream has read a block of data and there are still characters in the buffer. Yes, it could theoretically do something different, but in particular when reading from a network you don't know how much data is available until you try to read it.
Similarly, there is no way to guarantee being able to get rid of any specific number of characters: what would out.writesome(buf, n) mean? If you want it to mean that you dumped n characters into out's buffer, you can just create a suitable stream buffer and use write(). Guaranteeing that n bytes are sent without blocking, however, can't be done (at least, for 1 < n). I guess you want the latter, though.
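A sketch of the "suitable stream buffer" route, assuming writesome only needs to mean "put as many characters as fit into the buffer without triggering overflow()". BoundedBuf, room() and writesome are hypothetical names, not standard library API; room() plays the role of the proposed in_depart(). Nothing here says anything about bytes reaching a network without blocking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical fixed-capacity output buffer: once full, overflow() fails.
class BoundedBuf : public std::streambuf {
public:
    explicit BoundedBuf(std::size_t capacity) : storage_(capacity) {
        setp(storage_.data(), storage_.data() + storage_.size());
    }
    std::streamsize room() const { return epptr() - pptr(); }   // free space in the put area
    std::string contents() const { return std::string(pbase(), pptr()); }

protected:
    int_type overflow(int_type) override { return traits_type::eof(); }

private:
    std::vector<char> storage_;
};

// Hypothetical writesome(): writes min(room, n) characters and returns that count.
std::streamsize writesome(std::ostream& os, const char* s, std::streamsize n) {
    auto* buf = dynamic_cast<BoundedBuf*>(os.rdbuf());
    if (!buf || !os.good()) return 0;   // only defined for BoundedBuf-backed streams here
    std::streamsize k = std::min(buf->room(), n);
    os.write(s, k);                     // never reaches overflow(), so this cannot fail
    return os ? k : 0;
}
```

With an 8-character BoundedBuf, writesome(os, "hello world", 11) is expected to return 8 and leave "hello wo" in the buffer; a further call returns 0 without putting the stream into a failed state.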

Formatted and unformatted input and output and streams

I have been reading a few articles on various sites about formatted and unformatted I/O, but now I am more confused than before.
I know this is a very basic question, but could anyone give me a link (to some site or a previously answered question on Stack Overflow) which explains the idea of streams in C and C++?
Also, I would like to know about formatted and unformatted I/O.
The standard doesn't define what these terms mean, it just says which of the functions defined in the standard are formatted IO and which are not. It places some requirements on the implementation of these functions.
Formatted IO is simply the IO done using the << and >> operators. They are meant to be used with a text representation of the data; they involve some parsing, analyzing and conversion of the data being read or written. Formatted input skips whitespace:
Each formatted input function begins execution by constructing an object of class sentry with the noskipws (second) argument false.
Unformatted IO reads and writes the data just as a sequence of 'characters' (possibly applying the codecvt of the imbued locale). It's meant for reading and writing binary data, or for serving as the lower level used by the formatted IO implementation. Unformatted input doesn't skip whitespace:
Each unformatted input function begins execution by constructing an object of class sentry with the default argument noskipws (second) argument true.
Unformatted input also allows you to retrieve the number of characters read by the last input operation using gcount():
Returns: The number of characters extracted by the last unformatted input member function called for the object.
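A short sketch of the difference on a std::istringstream: operator>> skips leading whitespace and stops at the next space, while get() and read() take the characters as they are, and gcount() reports how many characters read() actually extracted. The struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

struct InputDemo {
    std::string word;             // from formatted input
    char next;                    // from unformatted get()
    std::streamsize rest_count;   // gcount() after read()
    std::string rest;             // characters read() extracted
};

InputDemo formatted_vs_unformatted(const std::string& text) {
    std::istringstream in(text);
    InputDemo d{};
    in >> d.word;                             // formatted: skips leading blanks, stops at a space
    d.next = static_cast<char>(in.get());     // unformatted: the space is returned as-is
    char buf[16] = {};
    in.read(buf, static_cast<std::streamsize>(sizeof buf));  // may hit EOF early
    d.rest_count = in.gcount();               // how many characters read() extracted
    d.rest.assign(buf, static_cast<std::size_t>(d.rest_count));
    return d;
}
```

For the input "   hello world", this should give word == "hello", next == ' ', and rest == "world" with gcount() == 5 (read() hits EOF before filling the 16-character buffer).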
Formatted IO means that your output is determined by a "format string"; that is, you provide a string with certain placeholders, and you additionally pass arguments that should be used to fill these placeholders:
const char *daughter_name = "Lisa";
int daughter_age = 5;
printf("My daughter %s is %d years old\n", daughter_name, daughter_age);
The placeholders in the example are %s, indicating that this shall be substituted using a string, and %d, indicating that this is to be replaced by a signed integer number. There are a lot more options that give you control over how the final string will present itself. It's a convenience for you as the programmer, because it relieves you from the burden of converting the different data types into a string and it additionally relieves you from string appending operations via strcat or anything similar.
Unformatted IO on the other hand means you simply write character or byte sequences to a stream, not using any format string while you are doing so.
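To contrast the two kinds of output: the sketch below formats an int as text with std::snprintf, and writes the same int unformatted with ostream::write, which copies sizeof(int) raw bytes in the host's byte order. The function names are hypothetical, and from_unformatted assumes the bytes were produced on a machine with the same int representation.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <sstream>
#include <string>

// Formatted: the value is converted to its decimal text representation.
std::string formatted(int value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "value=%d", value);
    return buf;
}

// Unformatted: the object's bytes are copied as-is (sizeof(int) bytes, host byte order).
std::string unformatted(int value) {
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    return out.str();
}

// Reading the raw bytes back only works with the same size and byte order.
int from_unformatted(const std::string& bytes) {
    assert(bytes.size() >= sizeof(int));
    int value = 0;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}
```

formatted(1000) produces the 10 readable characters "value=1000", whereas unformatted(1000) produces sizeof(int) bytes that are not meant to be read as text.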
Which brings us to your question about streams. The general concept behind "streaming" is that you don't have to load a file or other input as a whole. For small data, loading everything at once works fine, but imagine you need to process terabytes of data: there is no way this will fit into a single byte array without your machine running out of memory. That's why streaming lets you process data in smaller chunks, one after the other, so that at any given time you only have to deal with a fixed-size amount of data. You read the data into a helper variable over and over again and process it, until the underlying stream tells you that you are done and there is no more data left.
The same works on the output side: you write your output step by step, chunk by chunk, rather than writing the whole thing at once.
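The chunk-by-chunk idea can be sketched with istream::read and gcount(): only one fixed-size buffer lives in memory no matter how large the input is. Counting newlines stands in for real processing; the function name and default chunk size are arbitrary choices.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts '\n' characters while holding at most chunk_size characters in memory.
std::size_t count_lines_chunked(std::istream& in, std::size_t chunk_size = 4096) {
    if (chunk_size == 0) chunk_size = 1;        // a zero-size read would never make progress
    std::string chunk(chunk_size, '\0');
    std::size_t lines = 0;
    // read() fails on the last, short chunk, but gcount() still reports what it got.
    while (in.read(&chunk[0], static_cast<std::streamsize>(chunk.size())) || in.gcount() > 0) {
        std::streamsize got = in.gcount();
        for (std::streamsize i = 0; i < got; ++i)
            if (chunk[static_cast<std::size_t>(i)] == '\n') ++lines;
    }
    return lines;
}
```

The same function works unchanged on a std::ifstream opened on a file far larger than available memory.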
This concept brings other nice features, too. Because you can nest streams within streams within streams, you can build a whole chain of transformations, where each stream may modify the data until you finally receive the end result, without knowing about the individual transformations, because you treat your stream as if there were just one.
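Such a chain can be sketched in C++ with a stream buffer that transforms characters and forwards them to another stream buffer. UpperFilterBuf and shout are hypothetical names; the filter has no buffer of its own, so every character goes through overflow(), which keeps the example short at the cost of speed.

```cpp
#include <cassert>
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filtering buffer: upper-cases each character and forwards it to dest.
class UpperFilterBuf : public std::streambuf {
public:
    explicit UpperFilterBuf(std::streambuf* dest) : dest_(dest) {}

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof())) return traits_type::not_eof(ch);
        unsigned char c = static_cast<unsigned char>(traits_type::to_char_type(ch));
        return dest_->sputc(static_cast<char>(std::toupper(c)));   // pass it down the chain
    }
    int sync() override { return dest_->pubsync(); }

private:
    std::streambuf* dest_;
};

std::string shout(const std::string& text) {
    std::ostringstream sink;                // final destination
    UpperFilterBuf filter(sink.rdbuf());    // transformation layer
    std::ostream out(&filter);              // what the caller writes to
    out << text << " (" << 42 << ")";      // formatted output flows through the filter
    out.flush();
    return sink.str();
}
```

shout("hello") is expected to return "HELLO (42)"; the code writing to out never sees the intermediate layer.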
This can be very useful; for example, C and C++ streams buffer the data they read natively from e.g. a file, to avoid unnecessary calls and to read the data in optimized chunks, so that the overall performance is much better than if you read directly from the file system.
Unformatted Input/Output is the most basic form of input/output. Unformatted input/output transfers the internal binary representation of the data directly between memory and the file. Formatted output converts the internal binary representation of the data to ASCII characters which are written to the output file. Formatted input reads characters from the input file and converts them to internal form. Formatted