Inconsistent behavior when parsing numbers from stringstream on different platforms

In a project I'm using a stringstream to read numeric values using operator>>. I'm now getting reports indicating that the parsing behaviour is inconsistent across different platforms if additional characters are appended to the number (for instance "2i"). Compiling the sample below with GCC/VCC/LLVM on Linux results in:
val=2; fail=0
Compiling and running it on iOS with either GCC or LLVM reportedly yields:
val=0; fail=1
What does the standard say about the behavior of operator>> in such a case?
--- Sample Code ---------------------------------------------
#include <sstream>
#include <iostream>
int main(int argc, const char **args)
{
double val;
std::stringstream ss("2i");
ss >> val;
std::cout << "val=" << val << "; fail=" << ss.fail() << std::endl;
return 0;
}

According to this reference:
Thus, in either case:
if your compiler is pre-C++11 and reading fails, it leaves the value of val intact and sets failbit.
if your compiler is C++11 or later and reading fails, it sets the value of val to 0 and sets failbit.
However, operator>> extracts and parses characters sequentially with the function num_get::get [27.7.2.2.2 Arithmetic extractors], taking characters from the stream as long as it can interpret them as the representation of a value of the proper type.
Thus, in your case operator>> will call num_get::get, which accepts the first character (i.e., 2) and then moves on to the next character (i.e., i). The i doesn't fit a numerical value, so num_get::get stops reading there. However, valid characters have already been read. These valid characters are processed and assigned to val, and the rest of the characters remain in the stringstream. To illustrate this, here is an example:
#include <sstream>
#include <iostream>
#include <string>
int main(int argc, const char **args)
{
double val(0.0);
std::stringstream ss("2i");
ss >> val;
std::cout << "val=" << val << "; fail=" << ss.fail() << std::endl;
std::string str;
ss >> str;
std::cout << str << std::endl;
return 0;
}
Output:
val=2; fail=0
i
You can see that if I use the extraction operator again, this time into a std::string, the character i is extracted.
The above, however, doesn't explain why you don't get the same behaviour on iOS.

This is a known bug in libc++ that was submitted to Bugzilla. The problem, as I see it, is that the double overload of std::num_get::do_get() somehow continues to parse the characters a, b, c, d, e, f, i, x, p, n and their capital equivalents, even though they are not valid in a decimal floating-point value (other than e, which denotes scientific notation but must be followed by a numeric value, otherwise the extraction fails). Normally do_get() would stop when it finds an invalid character and would not set failbit as long as characters were successfully extracted (as explained above).
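If the result has to be the same on every platform, one option is to keep operator>> out of the numeric conversion: read the text as a string and convert it with std::strtod, which reports where parsing stopped. The following is only a minimal sketch; parse_leading_double is a hypothetical helper name, not something from the question or from any library.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: converts the longest numeric prefix of `text`.
// Returns true if at least one character was converted; `rest` receives
// whatever follows the number (for example "i" for the input "2i").
bool parse_leading_double(const std::string& text, double& value, std::string& rest)
{
    const char* begin = text.c_str();
    char* end = nullptr;
    const double parsed = std::strtod(begin, &end);
    if (end == begin) {
        return false;      // nothing numeric at the front, outputs untouched
    }
    value = parsed;
    rest.assign(end);      // unparsed tail, up to the terminating null
    return true;
}
```

For the input "2i" this gives the value 2 and the tail "i". Keep in mind that std::strtod skips leading whitespace, honours the current C locale, and also accepts forms such as "inf", "nan" and hexadecimal floats, so it is not a drop-in replacement for every use of operator>>.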

Related

C++ - How to recover istream if self defined extractor fails

I need a self defined extractor (operator>>) to read a specific string
into my own datatype.
The problem is that the requirements for the string are large.
Hence the easiest way is probably to read the whole string from the istream
and then check if all requirements are fulfilled.
My problem is what to do if the string is not valid.
To my knowledge, it is common in C++ that the stream is left unchanged in that case.
What is best practice to recover the istream in this case?
Is the exception handling in the following example enough?
std::istream& operator>>(std::istream& is, Foo& f)
{
std::string str;
if (is >> str)
{
// check if string is valid
if ( is_valid( str ) )
{
// set new values in f
}
else
{
// recover stream
std::for_each(str.rbegin(), str.rend(),
[&] (char c)
{
is.putback(c);
});
// set failbit
is.clear(std::ios_base::failbit);
}
}
return is;
}
And what about std::getline() instead of is >> str ? Are there other pitfalls?
Thanks
Marco

You can't get streams back to the initial position where you started reading, at least not in general. In theory, you can put back characters or seek to a location where you had been before but many stream buffers don't support putting back characters or seeking. The standard library gives some limited guidance but it deals with rather simple types, e.g., integers: the characters are read as long as the format matches and it stops just there. Even if the format matches, there may be some errors which could have been detected earlier.
Here is a test program demonstrating the standard library behavior:
#include <iostream>
#include <sstream>
#include <string>
void test(std::string const& input)
{
std::istringstream in(input);
int i;
std::string tail;
bool failed(!(in >> i)); // true if the extraction failed
in.clear(); // reset the error flags so the tail can still be read
std::getline(in, tail);
std::cout << "input='" << input << "' "
<< "fail=" << std::boolalpha << failed << " "
<< "tail='" << tail << "'\n";
}
int main()
{
test("10 y");
test("-x y");
test("0123456789 x");
test("123456789012345678901234567890 x");
}
Just to explain the four test cases:
Just to make sure the test does what it is meant to do, the first input is actually OK and there is no problem.
The second input starts with a character matching the format followed by something not matching and reading stops right after the '-' character.
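The third test is meant to read an int using octal numbers. That only happens if the stream's basefield is cleared (for example with in.unsetf(std::ios_base::basefield)); with the default dec flag the leading 0 is just a digit. In the octal case, the failure could have been detected upon the character '8' but both the '8' and the '9' are consumed and the input fails.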
The last example results in an overflow which could be detected before all digits are read but still all digits are read.
Based on that, I'd think there wouldn't be an expectation to reset the stream to the original position when semantics checks on a well-formed input fail.
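Following that reasoning, a custom extractor does not need to rewind at all: it can read into a temporary, commit to the target only when validation succeeds, and otherwise set failbit. Below is a minimal sketch of that idea. Foo, its value member, is_valid and parse_foo are hypothetical stand-ins (the question does not show them), and the validity rule is invented purely for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the question's type.
struct Foo {
    std::string value;
};

// Hypothetical check: accept only non-empty tokens made of 'A'..'Z'.
inline bool is_valid(const std::string& s)
{
    if (s.empty()) return false;
    for (char c : s) {
        if (c < 'A' || c > 'Z') return false;
    }
    return true;
}

std::istream& operator>>(std::istream& is, Foo& f)
{
    std::string token;
    if (is >> token) {
        if (is_valid(token)) {
            f.value = token;                      // commit only on success
        } else {
            is.setstate(std::ios_base::failbit);  // f stays untouched
        }
    }
    return is;
}

// Convenience wrapper for trying the extractor on a string.
bool parse_foo(const std::string& text, Foo& out)
{
    std::istringstream in(text);
    return static_cast<bool>(in >> out);
}
```

The consumed characters are not put back, which matches how the built-in extractors behave. Using setstate() rather than clear(failbit) adds failbit without discarding any other state flags that are already set.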

C++ c_str() doesn't return complete string

I'm doing a C++ assignment that requires taking user input of an expression (eg: 2 * (6-1) + 2 ) and outputting the result. Everything works correctly unless a space is encountered in the user input.
It is a requirement to pass the user input to the following method;
double Calculate(char* expr);
I'm aware the issue is caused by c_str() where the space characters act as a terminating null byte, though I'm not sure how to overcome this problem.
Ideally I'd like to preserve the space characters but I'd settle for simply removing them, as a space serves no purpose in the expression. I get the same result when using string::data instead of c_str.
int main(int argc, char **argv)
{
string inputExpr;
Calc myCalc;
while(true) {
cin >> inputExpr;
if(inputExpr == "q") break;
cout << "You wrote:" << (char*)inputExpr.c_str() << endl; // debug
printf("Result: %.3f \n\n", myCalc.Calculate( (char*)inputExpr.c_str() ) );
}
return 0;
}

c_str works just fine. Your problem is cin >> inputExpr. The >> operator only reads until the next space, so you do not read your equation fully.
What you want to use is std::getline:
std::getline(std::cin, inputExpr);
which will read until it reaches a newline character. See the function description if you need a specific delimiter.

The problem is not with inputExpr.c_str() or c_str() as such: c_str() returns a pointer to a character array that contains a null-terminated sequence. When you read through cin with >>, spaces, tabs and similar characters act as separators, so the input arrives as multiple strings. Check the content of the string after reading it to confirm this.

First, I think your Calculate() method should take as input a const char* string, since expr should be an input (read-only) parameter:
double Calculate(const char* expr);
Note that if you use const char*, you can simply call std::string::c_str() without any ugly cast to remove const-ness.
And, since this is C++ and not C, using std::string would be nice:
double Calculate(const std::string& expr);
On the particular issue of reading also whitespaces, this is not a problem of terminating NUL byte: a space is not a NUL.
You should just change the way you read the string, using std::getline() instead of simple std::cin >> overload:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string line;
getline(cin, line);
cout << "'" << line << "'" << endl;
}
If you compile and run this code, and enter something like Hello World, you get the whole string as output (including the space separating the two words).
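The question also mentions that simply removing the spaces would be acceptable. Once the whole line has been read with std::getline, that can be done with the erase/remove_if idiom. This is a small sketch only; strip_whitespace is a hypothetical helper name and not part of the original code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `expr` with all whitespace removed.
std::string strip_whitespace(std::string expr)
{
    expr.erase(std::remove_if(expr.begin(), expr.end(),
                              // cast through unsigned char: std::isspace
                              // is undefined for negative char values
                              [](unsigned char c) { return std::isspace(c) != 0; }),
               expr.end());
    return expr;
}
```

Assuming Calculate takes a const char*, the call would then look like myCalc.Calculate(strip_whitespace(line).c_str()), where line is the string filled by std::getline.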

Why does std::operator>>(istream&, char&) extract whitespace?

I was compiling the following program and I learned that the extractor for a char& proceeds to extract a character even if it is a whitespace character. I disabled the skipping of leading whitespace characters, expecting the subsequent read attempts to fail (because formatted extraction stops at whitespace), but was surprised when they succeeded.
#include <iostream>
#include <sstream>
int main()
{
std::istringstream iss("a b c");
char a, b, c;
iss >> std::noskipws;
if (iss >> a >> b >> c)
{
std::cout << "a = \"" << a
<< "\"\nb = \"" << b
<< "\"\nc = \"" << c << "\"\n";
}
}
Output:
a = "a"
b = " "
c = "b"
As you can see from the output, b was given the value of the space between "a" and "b"; and c was given the following character "b". I was expecting both b and c to not have a value at all since the extraction should fail because of the leading whitespace. What is the reason for this behavior?

In IOStreams, characters have virtually no formatting requirements. Any and all characters in the character sequence are valid candidates for an extraction. For the extractors that use the numeric facets, extraction is defined to stop at whitespace. However, the extractor for charT& works directly on the buffer, indiscriminately returning the next available character presumably by a call to rdbuf()->sbumpc().
Do not assume that this behavior extends to the extractor for pointers to characters as for them extraction is explicitly defined to stop at whitespace.
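The difference can be seen side by side in a small sketch. The two function names below are hypothetical and exist only for this illustration; std::string is used for the word case instead of a raw character pointer, since it also stops at whitespace.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Extracts up to `count` single characters with operator>>, either
// skipping whitespace (the default) or not (std::noskipws).
std::string extract_chars(const std::string& text, std::size_t count, bool skip_whitespace)
{
    std::istringstream in(text);
    if (!skip_whitespace) {
        in >> std::noskipws;
    }
    std::string result;
    char c;
    while (result.size() < count && in >> c) {
        result += c;
    }
    return result;
}

// Counts how many std::string extractions succeed with std::noskipws set.
// The second attempt starts at a space, extracts nothing, and sets failbit.
std::size_t count_words_noskipws(const std::string& text)
{
    std::istringstream in(text);
    in >> std::noskipws;
    std::size_t n = 0;
    std::string word;
    while (in >> word) {
        ++n;
    }
    return n;
}
```

With the input "a b c" and std::noskipws, the char extractor yields 'a', ' ', 'b', while the string extractor succeeds only once.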

Stringstream don't copy new lines

Special characters disappear when I pass a string into a stringstream.
I tried this code which can directly be tested:
#include <iostream>
#include <sstream>
using namespace std;
int main(int argc, char* argv[]) {
string txt("hehehaha\n\t hehe\n\n<New>\n\ttest:\t130\n\ttest_end:\n<New_end>\n");
cout << txt << endl; // No problem with new lines and tabs
stringstream stream;
stream << txt;
string s;
while(stream >> s) {
cout << s; // Here special characters like '\n' and '\t' don't exist anymore.
}
cout << "\n\n";
return 0;
}
What can I do to overcome this?
Edit: I tried this:
stream << txt.c_str();
and it worked. But I don't know why...

Basically, you are just printing it wrong; it should be:
cout << stream.str() << endl;
Some details: you are calling operator<<(string), which overloads operator<< to behave as described in ostream::operator<< for C-strings.
The behaviour referred to is explained here:
(2) character sequence: Inserts the C-string s into os. The terminating null character is not inserted into os. The length of the C-string is determined beforehand (as if calling strlen).
The strlen documentation says that the result is affected by nothing but the terminating null character.
Indeed, strlen(tmp) in your examples outputs 55.
The stream, hence, gets "assigned" everything which comes up to the 55th character in your input string.
cout << stream.str() << endl;
will show you that this is indeed what happens.
As an aside: you can modify how whitespace is handled when extracting with stream >> s by setting or unsetting flags, as in
stream.unsetf ( std::ios::skipws );
which you should try out.

The statement
while(stream >> s)
is the problem: it gives you one token on each call, splitting on whitespace and therefore discarding it.
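To get the text back out with its newlines and tabs intact, the stream can be read without token splitting, for instance through std::istreambuf_iterator. This is a minimal sketch; read_all and round_trip are hypothetical helper names used only here.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Reads everything left in `in` character by character, including
// '\n' and '\t'; no whitespace skipping is involved.
std::string read_all(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Mirrors the question: insert text into a stringstream, read it back whole.
std::string round_trip(const std::string& txt)
{
    std::stringstream stream;
    stream << txt;
    return read_all(stream);
}
```

If line-by-line processing is wanted instead, std::getline(stream, s) splits only on '\n' and keeps tabs and spaces inside each line.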

C++ iostream >> operator behaves differently than get() unsigned char

I was working on a piece of code to do some compression, and I wrote a bitstream class.
My bitstream class kept track of the current bit we are reading and the current byte (unsigned char).
I noticed that reading the next unsigned character from the file was done differently if I used the >> operator vs get() method in the istream class.
I was just curious why I was getting different results?
ex:
this->m_inputFileStream.open(inputFile, std::ifstream::binary);
unsigned char currentByte;
this->m_inputFileStream >> currentByte;
vs.
this->m_inputFileStream.open(inputFile, std::ifstream::binary);
unsigned char currentByte;
this->m_inputFileStream.get((char&)currentByte);
Additional Info:
To be specific the byte I was reading was 0x0A however when using >> it would read it as 0x6F
I'm not sure how they're even related ? (they're not the 2s complement of each other?)
The >> operator is also defined to work for unsigned char, however (see the C++ istream class reference).

operator>> is for formatted input. It'll read "23" as an integer if you stream it into an int, and it'll eat whitespace between tokens. get() on the other hand is for unformatted, byte-wise input.

If you aren't parsing text, don't use operator>> or operator<<. You'll get weird bugs that are hard to track down. They also tend to slip past unit tests unless you know what to look for. Reading a uint8, for instance, will fail on the value 9 (a tab character, which is skipped as whitespace).
edit:
#include <iostream>
#include <sstream>
#include <cstdint>
void test(char r) {
std::cout << "testing " << r << std::endl;
char t = '!';
std::ostringstream os(std::ios::binary);
os << r;
if (!os.good()) std::cout << "os not good" << std::endl;
std::istringstream is(os.str(), std::ios::binary);
is >> t;
if (!is.good()) std::cout << "is not good" << std::endl;
std::cout << std::hex << (uint16_t)r
<< " vs " << std::hex << (uint16_t)t << std::endl;
}
int main(int argc, char ** argv) {
test('z');
test('\n');
return 0;
}
produces:
testing z
7a vs 7a
testing
is not good
a vs 21
I suppose that would never have been evident a priori.

C++'s formatted input (operator >>) treats char and unsigned char as a character, rather than an integer. This is a little annoying, but understandable.
You have to use get, which returns the next byte, instead.
However, if you open a file with the binary flag, you should not be using formatted I/O. You should be using read, write and related functions. Formatted I/O won't behave correctly, as it's intended to operate on text formats, not binary formats.
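As a rough illustration of the difference, the sketch below reads one byte with read() and one with operator>>. Both function names are hypothetical and not from the question's bitstream class.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying these out
#include <string>

// Reads one raw byte with unformatted input; nothing is skipped.
bool read_byte(std::istream& in, unsigned char& byte)
{
    char raw;
    if (!in.read(&raw, 1)) {
        return false;
    }
    byte = static_cast<unsigned char>(raw);
    return true;
}

// Same read with formatted input, for comparison: leading whitespace
// (including 0x0A) is skipped before a character is stored.
bool read_byte_formatted(std::istream& in, unsigned char& byte)
{
    return static_cast<bool>(in >> byte);
}
```

For a stream holding the bytes 0x0A 0x6F, read_byte returns 0x0A, while read_byte_formatted skips the newline and returns 0x6F. That is consistent with the question's observation if 0x6F was the next non-whitespace byte in the file.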
