Get the number of characters read by a formatted input operation (C++)

I am reading numbers from an istream by using the >> operator overload. This works fine, but now I need to know how many characters have been consumed by that operation. I'm currently using something like this:
std::streampos startPos = in.tellg();  // tellg() returns std::streampos, not int
double number;
in >> number;
std::streamoff readChars = in.tellg() - startPos;  // difference of two positions
This does work in some cases, but it is quite fragile. When in is std::cin, though, it doesn't work at all (I assume that this is because std::cin doesn't have a position in the stream, as it's potentially an endless one).
My question is (I think) rather simple: how can I get the number of characters that have been read when using the >> operator?
During my search I came across gcount(), but it only works for unformatted input.
The documentation of the >> operator doesn't seem to give a hint on this either: http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/
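One way to get this count without relying on tellg() (which typically also returns -1 once the extraction has set eofbit) is to route the reads through a small pass-through stream buffer that counts every character the istream actually pulls out of it. This is a sketch, not a standard facility: the class name CountingBuf is hypothetical. Because it sits below the istream, it works the same for std::cin as for files or string streams, and the count includes any leading whitespace that >> skipped.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>

// Hypothetical helper (not a standard class): forwards reads to another
// streambuf and counts how many characters are actually consumed.
class CountingBuf : public std::streambuf {
public:
    explicit CountingBuf(std::streambuf* source) : source_(source) {}

    std::size_t count() const { return count_; }
    void reset() { count_ = 0; }

protected:
    // Peek at the next character without consuming it.
    int_type underflow() override { return source_->sgetc(); }

    // Consume one character and count it. No get area is ever set up,
    // so every character the istream extracts goes through here.
    int_type uflow() override {
        int_type c = source_->sbumpc();
        if (!traits_type::eq_int_type(c, traits_type::eof()))
            ++count_;
        return c;
    }

private:
    std::streambuf* source_;
    std::size_t count_ = 0;
};
```

With std::cin the usage would be CountingBuf counter(std::cin.rdbuf()); std::istream in(&counter); followed by in >> number; and a look at counter.count(). One limitation of this sketch: it does not override pbackfail(), so unget()/putback() on the wrapping stream will fail.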

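If the stream is formatted, can't you just check its length?
Anyway, here is what the reference says about std::istream::operator>> (note that this passage describes the overload that extracts into a stream buffer, is >> sb, not the arithmetic overloads). For C++98: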
The function is considered to perform formatted input: Internally, the function accesses the input sequence by first constructing a sentry object (with noskipws set to false). Then (if good), it extracts characters from its associated stream buffer object as if calling its member functions sbumpc or sgetc, and finally destroys the sentry object before returning.
For C++ 11:
The function is considered to perform unformatted input: Internally, the function accesses the input sequence by first constructing a sentry object (with noskipws set to true). Then (if good), it extracts characters from its associated stream buffer object as if calling its member functions sbumpc or sgetc, and finally destroys the sentry object before returning.
The number of characters successfully read and stored by this function can be accessed by calling member gcount.
So it seems that you can only count characters from unformatted input.
But:
The unformatted input operations that modify the value returned by this function (gcount()) are: get, getline, ignore, peek, read, readsome, putback and unget.
Notice though, that peek, putback and unget do not actually extract any characters, and thus gcount will always return zero after calling any of them.
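So maybe you can use, for instance, the member function istream::getline(char* s, streamsize n) or std::istream::get: gcount() then tells you how many characters were extracted, and you can parse the extracted text yourself. (The free function std::getline(istream&, string&) does not update gcount().)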
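A minimal sketch of that idea, assuming each number sits on its own line and each line fits in a fixed-size buffer (both are assumptions of this example, and the function name is illustrative): read with the member istream::getline, which does update gcount(), then parse the stored text with std::strtod. The count covers the whole line including the newline delimiter, not just the digits of the number.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>

// Reads one line with the member istream::getline (which updates gcount()),
// then parses a double from it with std::strtod.
// Returns how many characters getline extracted, newline included.
// Assumption: lines are shorter than the 256-char buffer; a longer line
// makes getline set failbit.
std::streamsize read_line_then_parse(std::istream& in, double& value) {
    char buf[256];
    in.getline(buf, sizeof buf);          // the '\n' is extracted but not stored
    std::streamsize extracted = in.gcount();
    value = std::strtod(buf, nullptr);    // parse the stored text ourselves
    return extracted;
}
```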

Related

What is the general purpose of cin.clear()? [duplicate]

This question already has answers here:
Why would we call cin.clear() and cin.ignore() after reading input?
I am a beginner in C++, and I just can't wrap my head around what cin.ignore and cin.clear are; they make absolutely no sense to me. When you explain this to me, please be very descriptive.
In C++ input processing, cin.fail() returns true if the last input operation on cin failed.
Typically, cin.fail() returns true in the following cases:
any time you have reached EOF and try to read anything more;
you try to read an integer and receive something that cannot be converted to an integer.
When an error occurs and cin.fail() returns true, the stream (not just its buffer) is placed in an "error state". While it is in that state, every further input operation on cin fails immediately, so no more input is processed.
Therefore, you have to call cin.clear(). It overwrites the stream's internal error-state flags with the value passed as its argument (goodbit by default), so calling it with no argument clears all error flags.
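A small sketch of this behavior, using a std::istringstream in place of std::cin so it runs without typing anything (the function name is just for illustration): the failed read leaves failbit set, a second read is blocked, clear() resets the flags, and the characters that could not be parsed are still waiting in the stream.

```cpp
#include <sstream>
#include <string>

// Illustrative helper: shows that a failed >> blocks further reads until
// clear() is called, and that the unparsed characters are still there.
std::string recover_after_failed_int(const std::string& text) {
    std::istringstream in(text);
    int n = 0;
    in >> n;                        // "abc" is not an int: failbit is set
    std::string word;
    in >> word;                     // also fails: the stream is still in the fail state
    bool blocked = word.empty() && in.fail();
    in.clear();                     // reset all error flags to goodbit
    in >> word;                     // now succeeds and reads the leftover "abc"
    return blocked ? word : std::string();
}
```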
cin.ignore() is an unformatted input function: it first accesses the input sequence by constructing a sentry object, then extracts characters from its associated stream buffer object as if calling its member functions sbumpc or sgetc, and finally destroys the sentry object before returning.
It is therefore commonly used to extract and discard characters. A classic case for cin.ignore is using getline() after cin >> ...: the >> leaves the newline that ended the line in the buffer, so the next getline() reads that newline straight away and returns an empty string. That is why you MUST remove the newline from the buffer first.
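The getline-after->> problem can be reproduced with a string stream standing in for cin (the helper name and the input are illustrative): without ignore() the line comes back empty, with it the expected text is read.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Illustrative helper: reads an int with >>, then a whole line with getline.
// >> stops *before* the '\n', so unless that newline is discarded first,
// getline sees it immediately and returns an empty string.
std::string line_after_number(const std::string& text, bool discard_newline) {
    std::istringstream in(text);
    int age = 0;
    in >> age;                                   // leaves "\nAlice\n" unread
    if (discard_newline)
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::string name;
    std::getline(in, name);
    return name;
}
```

Passing std::numeric_limits<std::streamsize>::max() as the count is the usual way to say "however many characters it takes to reach the delimiter".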
std::cin.ignore() can be called in three different ways:
No arguments: A single character is taken from the input buffer and discarded:
std::cin.ignore(); //discard 1 character
One argument: The number of characters specified are taken from the input buffer and discarded:
std::cin.ignore(33); //discard 33 characters
Two arguments: discard the number of characters specified, or discard characters up to and including the specified delimiter (whichever comes first):
std::cin.ignore(26, '\n'); //ignore 26 characters or to a newline, whichever comes first
source: http://www.augustcouncil.com/~tgibson/tutorial/iotips.html
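Putting clear() and ignore() together gives the usual recovery loop for bad input. This is a sketch only: it reads from any std::istream (a std::istringstream in the test, std::cin in practice), the helper name is illustrative, and returning -1 when the input runs out is an arbitrary choice made for this example.

```cpp
#include <istream>
#include <limits>
#include <sstream>

// Keep asking for an int; on bad input, leave the fail state and throw away
// the rest of the offending line. Returns -1 if input runs out (assumption).
int read_int_with_retry(std::istream& in) {
    int value = 0;
    while (!(in >> value)) {
        if (in.eof())
            return -1;             // nothing left to retry with
        in.clear();                // 1. reset the error flags
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // 2. drop the bad line
    }
    return value;
}
```

The order matters: ignore() is itself an input operation, so it would do nothing while the stream is still in the fail state; clear() has to come first.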

Reading double from binary files

Reading with fin >> d and with fin.read do different things. Since fin >> d works, it seems your file contains the string representation of a double. Using fin.read suggests that your file is written in binary, which it seems it is not. Also, you should use sizeof(double) instead of the hard-coded constant 8.
The problem is that you misunderstood the semantics of the std::ifstream::read function. According to the C++ reference:
Note: this documentation is for std::istream, but it applies to ifstream as well, since ifstream derives from istream.
std::istream::operator>>()
This operator (>>) applied to an input stream is known as extraction operator. It is overloaded as a member function for:
arithmetic types
Extracts and parses characters sequentially from the stream to interpret them as the representation of a value of the proper type, which is stored as the value of val.
Internally, the function accesses the input sequence by first constructing a sentry object (with noskipws set to false). Then (if good), it calls num_get::get (using the stream's selected locale) to perform both the extraction and the parsing operations, adjusting the stream's internal state flags accordingly. Finally, it destroys the sentry object before returning.
(The operator is also overloaded for stream buffers and manipulators.)
Whereas for std::istream::read:
This function simply copies a block of data, without checking its contents nor appending a null character at the end.
So, when you do:
double d;
...
fin >> d;
you parse the characters in the file as the text representation of a number and store the resulting value in a double variable. But ...
if you do:
double d;
...
fin.read((char*)&d, ...);
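you are telling C++: here (&d) is an address, and I want you to treat it as if it were a char* (that's the cast). And the function does exactly that: it copies a block of raw bytes from the file into d. But, as you can see in the documentation, it does no parsing, so if the file contains text, d ends up holding the bytes of those characters, which have nothing to do with the value you're expecting.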
That's why operator>> works while read doesn't.
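A short sketch contrasting the two, with std::stringstream standing in for the file (the helper names are illustrative): >> parses text, while read() copies raw bytes, so read() only gives back a meaningful double if those bytes were produced by write() from a double in the first place. It also uses sizeof d rather than a hard-coded 8.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Text file contents: parse them with >>.
double parse_text_double(const std::string& text) {
    std::istringstream fin(text);
    double d = 0;
    fin >> d;                                          // parses the characters
    return d;
}

// Binary contents: raw bytes written with write() can be read back with read().
double binary_round_trip(double value) {
    std::stringstream file(std::ios::in | std::ios::out | std::ios::binary);
    file.write(reinterpret_cast<const char*>(&value), sizeof value);
    double d = 0;
    file.read(reinterpret_cast<char*>(&d), sizeof d);
    return d;
}

// The bug from the question: read() on text just copies character codes.
double text_bytes_as_double(const std::string& text) {
    std::istringstream fin(text);
    double d = 0;
    fin.read(reinterpret_cast<char*>(&d), sizeof d);   // no parsing happens
    return d;
}
```

For example, text_bytes_as_double("3.141592") reinterprets the character codes '3', '.', '1', ... as the bits of a double, which yields a value unrelated to 3.141592.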

C++ file stream

Could someone please explain how C++ reads in files? I'm not asking for the code to read in a file, but after ifstream >> variable, what are the rules for how C++ grabs the data?
Is the file read the same way cin reads user input, meaning it stops at each whitespace? What happens after it reaches the end of a line? Does it automatically proceed to the next line, or do I have to write code for that? I know that it stops at EOF, but I'm unsure about the process of extracting data, and I can't write code if I don't understand the process. Thanks
Yes, the input operator >> "tokenizes" on (stops at) whitespace when reading strings and numbers, and by default it also skips any whitespace before the value. Reading from any input stream works the same way.
For very good information, I suggest this reference. The reference for the input operator in particular is very detailed.
Basically, when you declare ifstream f;, you create a stream object that is not yet attached to a file. Calling f.open(fileName, ios::in); opens fileName for input, and you can then read from it with the >> operator, which works just like it does with cin: it stops at whitespace, as you'd expect. A newline is just another whitespace character, so when >> reaches the end of a line, the next extraction simply continues on the following line. You don't have to do anything extra to tell it to move on to the next line.
More info can be found here.
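A minimal sketch of that behavior, using a std::istringstream in place of an ifstream so it needs no file (both are istreams, so >> behaves the same; the helper name is illustrative): spaces, blank lines and line ends are all skipped, and the loop ends when an extraction fails at EOF.

```cpp
#include <sstream>
#include <string>
#include <vector>

// operator>> treats spaces, tabs and newlines alike, so tokens on later
// lines are read with no extra code.
std::vector<std::string> read_all_tokens(const std::string& contents) {
    std::istringstream f(contents);
    std::vector<std::string> tokens;
    std::string word;
    while (f >> word)              // false once an extraction fails (e.g. at EOF)
        tokens.push_back(word);
    return tokens;
}
```

With a real file, the only change is std::ifstream f(fileName); in place of the string stream.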
The iostream formatted input and output operators are essentially defined in terms of the C library functions strtol/strtoul/strtod (cf. 22.4.2.1.2) and sprintf (cf. 22.4.2.2.2), respectively.
In C and C++ there is generally no difference between input from one kind of stream and another (apart from details, e.g. seeking).
C++ distinguishes two ways of reading input:
Formatted Input
Unformatted Input
All formatted input operations are performed by an operator of the form stream& operator>>(stream&, T). However, not every stream& operator>>(stream&, T) performs formatted input (e.g. the overloads taking a manipulator or a stream buffer do not).
Each formatted input operation starts by skipping whitespace (unless skipws has been turned off) and stops at the first character that cannot be part of the value being read (note: this may be any character; it is not limited to whitespace).
Unformatted input reads every character (it does not skip any whitespace) and stops once the requested number of characters has been retrieved or the stream reaches the end (EOF). Specialized functions (like std::getline) may stop earlier, at a delimiter character, which they extract and discard.
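The difference can be sketched on one input string (the struct and function names are illustrative): get() returns the very first character even if it is a space, >> into an int skips the spaces and stops at the first non-digit, and getline reads the rest of the line and discards the newline.

```cpp
#include <istream>
#include <sstream>
#include <string>

struct Demo {
    char first_raw;            // first character via unformatted get()
    int number;                // value via formatted >>
    char next_char;            // character left right after the number
    std::string rest_of_line;  // remainder via std::getline
};

Demo formatted_vs_unformatted(const std::string& text) {
    Demo d{};
    std::istringstream raw(text);
    d.first_raw = static_cast<char>(raw.get());   // unformatted: ' ' is not skipped

    std::istringstream in(text);
    in >> d.number;                               // skips "  ", parses "42", stops at 'a'
    d.next_char = static_cast<char>(in.get());    // the 'a' was left in the stream
    std::getline(in, d.rest_of_line);             // "bc def"; the '\n' is discarded
    return d;
}
```

Called with "  42abc def\nxyz", this gives first_raw == ' ', number == 42, next_char == 'a' and rest_of_line == "bc def".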

Why does string extraction from a stream set the eof bit?

Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char* argv[])
{
    std::stringstream ss("hello");
    std::string result;
    ss >> result;
    std::cout << ss.eof() << std::endl; // Outputs 1
    return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object, which proceeds to skip leading whitespace characters. In this example there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor gets on with the actual extraction of the string.
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
A consequence of the eof bit being set here is that the only reason the evil idiom while (!stream.eof()) doesn't work when reading files is the extra \n at the end, not that the eof bit isn't set yet. My compiler happily sets the eof bit when the extraction stops at the end of the file.
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
21.4.8.9 Inserters and extractors [string.io]
27.7.2.2 Formatted input functions [istream.formatted]
27.7.2.1.3 Class basic_istream::sentry [istream::sentry]
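The point about while (!stream.eof()) can be checked with a small sketch (the helper names are illustrative). With a trailing newline, the eof() loop runs one extra time, because the last extraction fails only after the loop condition has already passed; without the newline, the eof bit set by the final successful extraction ends the loop at the right time. Testing the extraction itself gives the right count in both cases.

```cpp
#include <sstream>
#include <string>

// The "evil" idiom: checks eof() before the read that would fail.
int count_with_eof_loop(const std::string& text) {
    std::istringstream ss(text);
    std::string word;
    int iterations = 0;
    while (!ss.eof()) {
        ss >> word;
        ++iterations;          // also counted when the extraction above failed
    }
    return iterations;
}

// The usual fix: use the result of the extraction as the loop condition.
int count_with_extraction_loop(const std::string& text) {
    std::istringstream ss(text);
    std::string word;
    int n = 0;
    while (ss >> word)
        ++n;
    return n;
}
```

For "a b\n" the first function returns 3 and the second 2; for "a b" both return 2.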
std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or extractors) and the unformatted input functions. Both groups of input functions are described as if they obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use other public members of istream.
So eof must be set.
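This is easy to observe directly (the struct and helper names are illustrative): eofbit is set whenever the string extraction runs into the end of the data, while failbit is set only when nothing at all could be extracted.

```cpp
#include <sstream>
#include <string>

struct Bits { bool eof; bool fail; };

// Returns the eof/fail state after a single string extraction from text.
Bits after_one_extraction(const std::string& text) {
    std::stringstream ss(text);
    std::string result;
    ss >> result;
    return {ss.eof(), ss.fail()};
}
```

"hello" gives eof set without fail (the read succeeded but hit the end), "hello " gives neither (extraction stopped at the space), and "   " gives both (the sentry ran out of characters while skipping whitespace).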
Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.

How do I set EOF on an istream without reading formatted input?

I'm reading a file character by character using istream::get(). How do I end this function with a check for whether there's nothing left to read as formatted input in the file (e.g. only whitespace remains), setting the corresponding flags (EOF, bad, etc.)?
Construct an istream::sentry on the stream. This will have a few side effects, the one we care about being:
If its skipws format flag is set, and the constructor is not passed true as second argument (noskipws), all leading whitespace characters (locale-specific) are extracted and discarded. If this operation exhausts the source of characters, the function sets both the failbit and eofbit internal state flags
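A minimal sketch of the sentry approach (the function name is illustrative): the sentry is only constructed for its whitespace-skipping side effect, and a non-whitespace character that follows is left unread.

```cpp
#include <istream>
#include <sstream>

// Constructing a sentry (noskipws defaults to false) skips leading whitespace
// when skipws is set, which is the default. If that exhausts the stream, the
// sentry sets failbit and eofbit. Returns true when only whitespace was left.
bool only_whitespace_left(std::istream& in) {
    std::istream::sentry guard(in);
    return !guard && in.eof();
}
```

Note that failbit is set along with eofbit, so call clear() first if the stream is still needed afterwards.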
You can strip any amount of leading (or trailing, as it were) whitespace from a stream at any time by extracting into the std::ws manipulator. For instance, if we were reading a file from STDIN, we would do:
std::cin >> std::ws;
Credit to this comment on another version of this question, asked four years later.
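A sketch of the std::ws version as a check (the helper name is illustrative). Unlike the sentry approach, std::ws sets only eofbit, not failbit, when it runs out of characters, so the stream can still be tested with eof() without first clearing a failure.

```cpp
#include <istream>
#include <sstream>

// Discards whitespace; returns true if that reached the end of the input.
bool at_end_after_ws(std::istream& in) {
    in >> std::ws;
    return in.eof();
}
```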
How do I end this function with something to check if there's nothing left to read as formatted input in the file (e.g. only whitespace)?
Whitespace characters are characters in the stream. You cannot assume that the stream will do intelligent processing for you, unless you write your own filtering stream.
By default, all of the formatted extraction operations (overloads of operator>>()) skip over whitespace before extracting an item -- are you sure you want to part ways with this approach?
If yes, then you could probably achieve what you want by deriving a new class, my_istream, from istream, and overriding each operator>>() to call the following method at the end:
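void skip_whitespace() {
    char ch;
    // setf() turns skipws on and returns the previous flags (std::ios_base::fmtflags)
    std::ios_base::fmtflags old_flags = setf(std::ios_base::skipws);
    // Use the base-class extractor so this doesn't recurse into our own
    // operator>> overloads: skips over whitespace, then reads one character
    static_cast<std::istream&>(*this) >> ch;
    flags(old_flags);  // restore the caller's formatting flags
    if (*this) {  // I.e. not at end of file and no errors occurred
        unget();  // put the non-whitespace character back
    }
}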
It's quite a bit of work. I'm leaving out a few details here (such as the fact that a more general solution would be to derive from the class template basic_istream<CharT, Traits> instead).
istream is not going to help a lot - it functions as designed. However, it delegates the actual reading to streambufs. If your streambuf wrapper trims trailing whitespace, an istream reading from that streambuf won't notice it.
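A rough sketch of such a wrapper, assuming "whitespace" means the characters classified by std::isspace in the "C" locale (the class name is hypothetical, not a library type): it holds back each run of whitespace until it knows whether more data follows, and drops the run if the source ends instead. An istream reading through it therefore sets eofbit on the same extraction that reads the last real token.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filtering streambuf: passes characters through from another
// streambuf, but drops a trailing run of whitespace that is followed only
// by end-of-file.
class TrimTrailingWsBuf : public std::streambuf {
public:
    explicit TrimTrailingWsBuf(std::streambuf* source) : source_(source) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        pending_.clear();
        int_type c = source_->sbumpc();
        // Collect whitespace; it is only passed on if more data follows.
        while (!traits_type::eq_int_type(c, traits_type::eof()) &&
               std::isspace(static_cast<unsigned char>(traits_type::to_char_type(c)))) {
            pending_.push_back(traits_type::to_char_type(c));
            c = source_->sbumpc();
        }
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            setg(nullptr, nullptr, nullptr);   // trailing whitespace is dropped
            return traits_type::eof();
        }
        pending_.push_back(traits_type::to_char_type(c));
        char* begin = &pending_[0];
        setg(begin, begin, begin + pending_.size());
        return traits_type::to_int_type(*gptr());
    }

private:
    std::streambuf* source_;
    std::string pending_;  // backing storage for the current get area
};
```

Reading "a b  \n\t " through it with >> into a std::string gives "a" with the stream still good, then "b" with eofbit set and failbit clear.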