Cannot read char8_t from basic_stringstream<char8_t>

I'm simply trying to use a stringstream with UTF-8 (char8_t):
#include <iostream>
#include <string>
#include <sstream>
int main()
{
    std::basic_stringstream<char8_t> ss(u8"hello");
    char8_t c;
    std::cout << (ss.rdstate() & std::ios_base::goodbit) << " " << (ss.rdstate() & std::ios_base::badbit) << " "
              << (ss.rdstate() & std::ios_base::failbit) << " " << (ss.rdstate() & std::ios_base::eofbit) << "\n";
    ss >> c;
    std::cout << (ss.rdstate() & std::ios_base::goodbit) << " " << (ss.rdstate() & std::ios_base::badbit) << " "
              << (ss.rdstate() & std::ios_base::failbit) << " " << (ss.rdstate() & std::ios_base::eofbit) << "\n";
    std::cout << c;
    return 0;
}
Compile using:
g++-9 -std=c++2a -g -o bin/test test/test.cpp
The result on screen is:
0 0 0 0
0 1 4 0
0
It seems that something goes wrong when reading c, but I don't know how to correct it. Please help me!

This is actually an old issue not specific to support for char8_t. The same issue occurs with char16_t or char32_t in C++11 and newer. The following gcc bug report has a similar test case.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88508
The issue is also discussed at the following:
GCC 4.8 and char16_t streams - bug?
Why does `std::basic_ifstream<char16_t>` not work in c++11?
http://gcc.1065356.n8.nabble.com/UTF-16-streams-td1117792.html
The issue is that gcc does not implicitly imbue the global locale with facets for ctype<char8_t>, ctype<char16_t>, or ctype<char32_t>. When an operation requires one of these facets, std::__check_facet throws a std::bad_cast exception. The IOS sentry object created for the character extraction operator silently swallows that exception and sets badbit and failbit.
The C++ standard only requires that ctype<char> and ctype<wchar_t> be provided. See [locale.category]p2.

Related

Why does this stringstream fail when parsing into double?

I have the following code:
#include <string>
#include <iostream>
#include <sstream>
int main()
{
    size_t x, y;
    double a = std::stod("1_", &x);
    double b = std::stod("1i", &y);
    std::cout << "a: " << a << ", x: " << x << std::endl;
    std::cout << "b: " << b << ", y: " << y << std::endl;
    std::stringstream s1("1_");
    std::stringstream s2("1i");
    s1 >> a;
    s2 >> b;
    std::cout << "a: " << a << ", fail: " << s1.fail() << std::endl;
    std::cout << "b: " << b << ", fail: " << s2.fail() << std::endl;
}
I want to parse a double and stop when an invalid character is hit. Here I try to parse "1_" and "1i", both of which should give me a double with the value 1.
Here is my output:
a: 1, x: 1
b: 1, y: 1
a: 1, fail: 0
b: 0, fail: 1
So the stod function worked as expected, but the stringstream method did not. It makes no sense to me that two standard ways of parsing a double, both in the standard library, would give different results.
Why does the stringstream method fail when parsing: "1i"?
Edit:
This appears to give different results for some people. My compiler info is the following:
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Edit2:
Is this a bug with libc++, or is the specification just vague about what counts as valid parsing for a double?

This is a libc++ bug. Per [facet.num.get.virtuals] bullet 3.2, a character is only supposed to be accumulated if it is allowed as the next character of the input field of the conversion specifier determined in stage 1 (%g for double). Having accumulated 1, i is not allowed, so stage 2 should terminate.
libc++ is accumulating characters indiscriminately until it reaches a non-atom character (it also extended the atoms to include i, which is required to parse inf).

stringstream::seekp not functioning on Visual Studio 2015

I want to read a chunk of data from a file into a stringstream, which will later be used to parse the data (using getline, >>, etc.). After reading the bytes, I set the buffer of the stringstream, but I can't get it to set the put pointer.
I tested the code on some online services, such as onlinegdb.com and cppreference.com, and it works. However, with Microsoft's compiler I get an error: the pointers get out of order.
Here's the code. I replaced the file read with a char array.
#include <sstream>
#include <iostream>
int main()
{
    char* a = new char [30];
    for (int i=0;i<30;i++)
        a[i]='-';
    std::stringstream os;
    std::cout << "g " << os.tellg() << " p " << os.tellp() << std::endl;
    os.rdbuf()->pubsetbuf(a,30);
    os.seekp(7);
    std::cout << "g " << os.tellg() << " p " << os.tellp() << std::endl;
}
The output I get when it works:
g 0 p 0
g 0 p 7
The output I get on Visual Studio 2015:
g 0 p 0
g -1 p -1
Any ideas?
Thanks

std::basic_stringbuf::setbuf (which pubsetbuf calls) may do nothing:
If s is a null pointer and n is zero, this function has no effect.
Otherwise, the effect is implementation-defined: some implementations do nothing, while some implementations clear the std::string member currently used as the buffer and begin using the user-supplied character array of size n, whose first element is pointed to by s, as the buffer and the input/output character sequence.
You are better off using the std::stringstream constructor to set the data, or calling str():
#include <sstream>
#include <iostream>
int main()
{
    std::string str( 30, '-' );
    std::stringstream os;
    std::cout << "g " << os.tellg() << " p " << os.tellp() << std::endl;
    os.str( str );
    os.seekp(7);
    std::cout << "g " << os.tellg() << " p " << os.tellp() << std::endl;
}

C++ single quotes syntax

I am learning C++ and just started reading "Programming Principles and Practice" by Bjarne Stroustrup and he uses this code to illustrate a point:
#include "std_lib_facilities.h"
using namespace std;
int main() // C++ programs start by executing the function main
{
    char c = 'x';
    int i1 = c;
    int i2 = 'x';
    char c2 = i1;
    cout << c << ' << i1 << ' << c2 << '\n';
    return 0;
}
I am familiar in general with the difference between double and single quotes in the C++ world, but would someone kindly explain the construction and purpose of the section ' << i1 << '?
Thanks

cout << c << ' << i1 << ' << c2 << '\n';
appears to be a typo in the book. I see it in Programming Principles and Practice Using C++ (Second Edition) Second printing. I do not see it listed in the errata.
According to the book, the intended output is
x 120 x
But what happens here is that ' << i1 << ' is parsed as a multicharacter literal. Such a literal has type int and an implementation-defined value, so cout prints it as an integer (most likely 540818464 -> 0x203C3C20 -> the ASCII values of ' ', '<', '<', ' '). The end result is output something like
x540818464x
and a warning or two from the compiler because while it's valid C++ code, it's almost certainly not what you want to be doing.
The line should most likely read
cout << c << ' ' << i1 << ' ' << c2 << '\n';
which will output the expected x 120 x
In other words, Linker3000, you are not crazy and not misunderstanding the example code.
Anyone know who I should contact to log errata or get a clarification on the off chance there is some top secret sneakiness going way over my head?

Before answering your question, here is a little background on what that is actually doing. Also note that there is a typo in the example, the string constant should have been double quoted:
cout << c << " << i1 << " << c2 << "\n";
In C++, operators can be overloaded so that they mean different things for different types. In the case of cout, the << operator is overloaded as the "Insertion Operator". Think of it as taking the operand on the right and inserting it (or sending it) into the operand on the left.
For example,
cout << "Hello World";
This takes the string "Hello World", and sends it to cout for processing.
So what beginners do not get is what something like this means:
cout << "Hello" << " World";
This does the same thing. The << operator is left-associative, so the insertions are performed from left to right. To make this work, each insertion returns a reference to the cout object. Why is this important? Because the above statement is actually two separate operator evaluations:
(cout << "Hello") << " World";
This first inserts "Hello" into cout, which outputs it, and then continues with the next insertion operator. Because the insertion returns cout itself, after (cout << "Hello") is executed you have the following still to be evaluated:
cout << " World";
This expression inserts " World" into the cout object, which then outputs " World", with the net effect being that you see "Hello World" just like the first time.
So in your example, what is it doing?
cout << c << " << i1 << " << c2 << "\n";
This is evaluated left to right as follows:
((((cout << c) << " << i1 << ") << c2) << "\n"); => Outputs value of c
((((cout ) << " << i1 << ") << c2) << "\n"); => Outputs string " << i1 << "
((( cout ) << c2) << "\n"); => Outputs value of c2
(( cout ) << "\n"); => Outputs newline character
( cout ); => No more output
Expression completes and returns the cout object as the expression value.
Assuming c='x' and c2='x', the final output from this expression is the following character string output on a single line:
x << i1 << x
For beginners, all those insertion operators << look a little strange. It is because you are dealing with objects. You could build the string up as a complete formatted object before inserting it into cout, and while that makes the cout expression look simpler, we do not usually do that in C++ because it makes your code more complex and error prone. Note also that there is nothing special about the cout object. If you wanted to output to the standard error stream, you would use cerr instead. If you wanted to output to a file, you would instantiate a stream object that outputs to the desired file. The rest of the code in your example would be the same.
In C, the same thing would be done procedurally using a format string:
printf("%c << i1 << %c\n", c, c2);
This is allowed in C++ too, because C++ includes the C standard library. Many C++ programmers still use this output method, often because they learned C first and may not have fully embraced the object-oriented nature of C++.
Note that you may also have seen the << operator in the context of mathematical expressions like:
A = A << 8;
In this case, the << operator is the bitwise left-shift operation. It has nothing to do with output to cout. It shifts the bits in A to the left by eight positions; bits shifted past the top are discarded and zeros are shifted in at the bottom.

std::cout gives different output from qDebug

I am using Qt, and I have an unsigned char *bytePointer and want to print out a number-value of the current byte. Below is my code, which is meant to give the int-value and the hex-value of the continuous bytes that I receive from a machine attached to the computer:
int byteHex=0;
byteHex = (int)*bytePointer;
qDebug() << "\n int: " //this is the main issue here.
         << *bytePointer;
std::cout << " (hex: "
          << std::hex
          << byteHex
          << ")\n";
}
This gives perfect results, and I get actual numbers, however this code is going into an API and I don't want to use Qt-only functions, such as qDebug. So when I try this:
int byteHex=0;
byteHex = (int)*bytePointer;
std::cout << "\n int: " //I changed qDebug to std::cout
          << *bytePointer;
std::cout << " (hex: "
          << std::hex
          << byteHex
          << ")\n";
}
The output does give the hex-values perfectly, however the int-values return symbols (like ☺, └, §, to list a few).
My question is: How do I get std::cout to give the same output as qDebug?
EDIT: for some reason the symbols only occur with a certain Qt setting. I have no idea why it happened but it's fixed now.

As others pointed out in the comments, you switch the stream to hex output but never switch it back:
std::cout << " (hex: "
<< std::hex
<< byteHex
<< ")\n";
You will need to apply this afterwards:
std::cout << std::dec;

Standard output streams will output any character type as a character, not a numeric value. To output the numeric value, convert to a non-character integer type:
std::cout << int(*bytePointer);

Another istream discrepancy between libstdc++ and libc++

This simple code:
#include <iostream>
#include <sstream>
int main()
{
    float x = 0.0;
    std::stringstream ss("NA");
    ss >> x;
    std::cout << ( ss.eof() ? "is" : "is not") << " at eof; x is " << x << std::endl;
}
returns different results, depending on which library I choose (I'm running clang from xcode 5 on OSX 10.9):
clang++ -std=c++11 -stdlib=libstdc++ -> not at eof
clang++ -stdlib=libstdc++ -> not at eof
/usr/bin/g++-4.2 -> not at eof
clang++ -std=c++11 -stdlib=libc++ -> at eof
clang++ -stdlib=libc++ -> at eof
It seems to me that if I try to read an alphabetic character into a float, the operation should fail, but it shouldn't eat the invalid characters - so I should get fail() but not eof(), so this looks like a bug in libc++.
Is there a c++ standard somewhere that describes what the behavior should be?
p.s. I've expanded the original test program like so:
#include <iostream>
#include <sstream>
int main()
{
    float x = 0.0;
    std::stringstream ss("NA");
    ss >> x;
    std::cout << "stream " << ( ss.eof() ? "is" : "is not") << " at eof and " << (ss.fail() ? "is" : "is not") << " in fail; x is " << x << std::endl;
    if (ss.fail())
    {
        std::cout << "Clearing fail flag" << std::endl;
        ss.clear();
    }
    char c = 'X';
    ss >> c;
    std::cout << "Read character: \'" << c << "\'" << std::endl;
}
and here's what I see:
With libc++:
stream is at eof and is in fail; x is 0
Clearing fail flag
Read character: 'X'
With libstdc++:
stream is not at eof and is in fail; x is 0
Clearing fail flag
Read character: 'N'
p.p.s. As described in the issue that n.m. links to, the problem doesn't occur if the stringstream is set to "MA" instead of "NA". Apparently libc++ starts parsing the string, thinking it's going to get "NAN" and then when it doesn't, it gets all upset.

In this case, you should get fail, but not eof. Basically, if,
after failure, you can clear the error and successfully do
a get() of a character, you should not get eof. (If you
cannot successfully read a character after clear(), whether
you get eof or not may depend on the type you were trying to
read, and perhaps in some cases, even on the implementation.)
