Consider code like this:
#include <cstdint>
#include <iostream>
#include <sstream>
int main()
{
std::stringstream ss;
ss << "a0 b1";
uint8_t byte;
ss >> std::hex >> byte;
std::cout << std::hex << byte << std::endl;
return 0;
}
Why does this output a instead of a0 even though a0 fits in uint8_t as hex?
Because uint8_t is also (probably) unsigned char, for which special rules exist when you perform formatted extraction from a C++ stream.
Unfortunately this is just an alias, not a distinct type.
Basically it's skipping the "lexically convert to a number" step because it thinks you want to pull out a character. The character 'a'.
I think you'll want to read into an unsigned int then downsize if needed.
If you do downsize to a uint8_t, you'll then have to promote it back to a larger integer type (lol) when printing, for much the same reason: otherwise it is written as a character instead of being serialised as a number.
To be honest I'd just avoid the small fixed-width types when dealing with streams (unless you're doing unformatted work with read() and write()). It's too easy to forget about this problem.
Related
I'm coming back to std::cin after not using it for a while.
Using uint8_t or unsigned char:
unsigned char data;
std::cin >> std::dec >> data;
Whether std::dec is used or not, I get the first ASCII character I type.
If I type 12, data is 0x31, not 12. Why can't it parse a number up to 255 and store it in a char?
int data;
std::cin >> std::dec >> data;
correctly gives data = 12 (0xC), not 0x31.
Why?
Using char[N] with std::hex
char data[128];
std::cin >> std::hex >> data;
Also gets the ASCII characters instead of the hexadecimal.
Writing 0x010203040506... gives data = 0xFFFFFFFFF...
Isn't std::cin>>std::hex able to parse the string I type into hexadecimal automatically?
In short:
cin >> charVar scans a single character from stdin
cin >> intVar scans characters from stdin until a non-numeric character is entered
Explaining your observation:
A char variable can store a single ASCII character.
When you type 12, only the character 1 is scanned.
The ASCII code of the character 1 is 0x31.
std::dec and std::hex affect the format of integers.
But as far as the streaming operators are concerned, char and its variants (including uint8_t) aren't integers, they're single characters. They will always read a single character, and never parse an integer.
That's just how these functions are defined. There is no way around it. If you want an integer with a limited range, first read into an int (or other integer type that is not a char variant), and then range-check afterwards. You can, if you want, cast it to a small type afterwards, but you probably shouldn't. char types are awkward to work with numerically.
Similarly, reading into an array of char reads a string. (Also, never do that without using setw() to limit the length to fit in the buffer you have. Better yet, use std::string instead.) That's just how it's defined.
My understanding is that reading a uint8_t from a stringstream is a problem because the stringstream will interpret the uint8_t as a char. I would like to know how I can read a uint8_t from a stringstream as a numeric type. For instance, the following code:
#include <cstdint>
#include <iostream>
#include <sstream>
using namespace std;
int main()
{
uint8_t ui;
std::stringstream ss("46");
ss >> ui;
cout << unsigned(ui);
return 0;
}
prints out 52. I would like it to print out 46.
EDIT: An alternative would be to just read a string from the stringstream and then convert the result to uint8_t, but this breaks the nice chaining properties. For example, in the actual code I have to write, I often need something like this:
void foobar(std::istream & istream){
uint8_t a,b,c;
istream >> a >> b >> c;
// TODO...
}
You can overload the input operator>> for uint8_t, such as:
std::stringstream& operator>>(std::stringstream& str, uint8_t& num) {
uint16_t temp;
str >> temp;
/* constexpr */ auto max = std::numeric_limits<uint8_t>::max();
num = std::min(temp, (uint16_t)max);
if (temp > max) str.setstate(std::ios::failbit);
return str;
}
Live demo: https://wandbox.org/permlink/cVjLXJk11Gigf5QE
To tell the truth, I am not sure whether such a solution is problem-free. Someone more experienced might clarify.
UPDATE
Note that this solution is not generally applicable to std::basic_istream (or to its specialization std::istream), since there is already an overloaded operator>> for unsigned char: [istream.extractors]. The behavior will then depend on how uint8_t is implemented.
Please do not use char or unsigned char (uint8_t) if you want to read in a formatted way. The result of your example code is expected behavior.
As we can see from https://en.cppreference.com/w/cpp/io/basic_istream/operator_gtgt2
template< class Traits >
basic_istream<char,Traits>& operator>>( basic_istream<char,Traits>& st, unsigned char& ch );
This overload "performs character input operations".
52 is the ASCII code for '4', which means that the stringstream has read only one byte and is still ready to read '6'.
So if you want it to work in the desired way, you should extract into an integer type of 2 bytes or more with operator>> and then cast it to uint8_t, exactly as in your self-answer.
Here's a reference for those overloads.
https://en.cppreference.com/w/cpp/io/basic_istream/operator_gtgt
After much back and forth, the answer seems to be that there is no standard way of doing this. The options are to read the value as either a uint16_t or a std::string, and then convert it to uint8_t:
#include <cstdint>
#include <iostream>
#include <sstream>
using namespace std;
int main()
{
uint8_t ui;
uint16_t tmp;
std::stringstream ss("46");
ss >> tmp;
ui = static_cast<uint8_t>(tmp);
cout << unsigned(ui);
return 0;
}
However, such a solution disregards range checking. So you will need to implement that yourself if you need it.
Today I've discovered that the following compiles and prints 42:
#include <iostream>
#include <sstream>
int main()
{
std::stringstream s;
s << 42;
char c[8];
s >> c;
std::cout << c;
}
But this is a potential buffer overflow attack, right? If we are reading from the user-supplied stream, we can't easily know the size of the data and therefore can't allocate enough storage. std::gets was removed, maybe this should be too?
Well, you can prevent buffer overflow in this case by writing:
s >> std::setw(sizeof c) >> c; // std::setw comes from <iomanip>
So I think it is more akin to the case of fgets, which can be used to shoot yourself in the foot, but can also be used correctly and it is a perfectly viable option when used correctly.
I expect there is still enough live code that uses this overload of operator>> that it's not really viable to deprecate it, e.g.:
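// Note: this compiles only before C++20, which replaced the char* overload
// of operator>> with one that takes a reference to a char array.
void func(char *buf, size_t buf_len)
{
std::cin >> std::setw(static_cast<int>(buf_len)) >> buf;
}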
But for writing new code my advice would be to avoid using arrays entirely (C-style arrays, that is). Instead use std::string, or std::array, or other such containers which are harder to cause buffer overflows on.
How can I effectively insert data from one stream into another stream of different type?
I have tried the following:
#include <iostream>
#include <sstream>
using namespace std;
int main(void)
{
basic_stringstream<unsigned short> uss;
stringstream cs;
unsigned short val = 0xffff;
uss.write(&val, 1); // write value to 'uss'
uss.read(&val, 1); // read data from 'uss' into 'val'
cout << hex << val << endl; // gives 0xffff
cs << uss.rdbuf(); // copy 'uss' contents into 'cs'
cs.read((char*) &val, 2); // read data from 'cs' into 'val'
cout << hex << val << endl; // gives 0x3030 ?
return 0;
}
First, as noted in this question, you can't instantiate basic_strings and streams with types like unsigned short without writing a hell of a lot of custom template specializations.
Second, this line
cs << uss.rdbuf();
doesn't do what you think it does. basic_ostream's operator<< that takes a basic_streambuf is
basic_ostream<charT,traits>& operator<< (basic_streambuf<char_type,traits>* sb);
where char_type is a typedef for charT. In other words, the character types must match.
In your case, they don't match, so you end up calling operator<<(const void *) instead, which just prints out the address as text. When I tested this on coliru, it printed out 7830, which corresponds to the characters 'x' (0x78) and '0' (0x30) of the "0x" prefix at the start of that printed address.
I have a string containing hexadecimal values (two characters representing a byte). I would like to use std::stringstream to make the conversion as painless as possible, so I came up with the following code:
std::string a_hex_number = "e3";
{
unsigned char x;
std::stringstream ss;
ss << std::hex << a_hex_number;
ss >> x;
std::cout << x << std::endl;
}
To my great surprise, this prints out "e" ... Of course I don't give up so easily, so I modified the code to be:
{
unsigned short y;
std::stringstream ss;
ss << std::hex << a_hex_number;
ss >> y;
std::cout << y << std::endl;
}
This, as expected, prints out 227 ...
I looked at http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/ and http://www.cplusplus.com/reference/ios/hex/ but I could not find a reference that explains where this behaviour comes from. (Yes, I feel that it is right, because extracting a character should take one character, but I am a little confused that std::hex is ignored for characters.) Is this situation mentioned anywhere?
(http://ideone.com/YHt7Fz)
Edit: I am specifically interested in whether this behaviour is mentioned in any of the C++ standards.
If I understand correctly, you're trying to convert a string in
hex to an unsigned char. So for starters, since this is
"input", you should be using std::istringstream:
std::istringstream ss( a_hex_number );
ss >> std::hex >> variable;
Beyond that, you want the input to parse the string as an
integral value. Streams do not consider character types as
numeric values; they read a single character into them (after
skipping leading white space). To get a numeric value, you
should input to an int, and then convert that to unsigned
char. Characters don't have a base, so std::hex is
irrelevant for them. (The same thing holds for strings, for
example, and even for floating point.)
With regard to the page you cite: the page doesn't mention
inputting into a character type (strangely enough, because it
does talk about all other types, including some very special
cases). The documentation for the std::hex manipulator is
also weak: in the running text, it only says that "extracted
values are also expected to be in hexadecimal base", which isn't
really correct; in the table, however, it clearly talks about
"integral values". In the standard, this is documented in
§27.7.2.2.3. (The >> operators for character types are not
member functions, but free functions, so are defined in
a different section.) What we are missing, however, is a good
document which synthesizes this sort of thing: whether the
>> operator is a member or a free function doesn't really
affect the user much; you want to see all of the >> available,
with their semantics, in one place.
Let's put it simply: the variable type is 'stronger' than 'hex'. That's why 'hex' is ignored for a 'char' variable.
Longer story:
'Hex' modifies the internal state of the stringstream object, telling it how to treat subsequent operations on integers. However, this does not apply to chars.
When you print out a character (i.e. unsigned char), it's printed as a character, not as a number.