A story of stringstream, hexadecimal and characters - c++

I have a string containing hexadecimal values (two characters representing a byte). I would like to use std::stringstream to make the conversion as painless as possible, so I came up with the following code:
std::string a_hex_number = "e3";
{
unsigned char x;
std::stringstream ss;
ss << std::hex << a_hex_number;
ss >> x;
std::cout << x << std::endl;
}
To my great surprise, this prints out "e"... Of course I don't give up so easily, so I modified the code to:
{
unsigned short y;
std::stringstream ss;
ss << std::hex << a_hex_number;
ss >> y;
std::cout << y << std::endl;
}
This, as expected, prints out 227 ...
I looked at http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/ and http://www.cplusplus.com/reference/ios/hex/ but I could not find a reference that explains why this happens. (Yes, I feel that it is right, because extracting a character should take one character, but I am a little confused that std::hex is ignored for characters.) Is this situation mentioned anywhere?
(http://ideone.com/YHt7Fz)
Edit: I am specifically interested in whether this behaviour is mentioned in any of the C++ standards.

If I understand correctly, you're trying to convert a string in
hex to an unsigned char. So for starters, since this is
"input", you should be using std::istringstream:
std::istringstream ss( a_hex_number );
ss >> std::hex >> variable;
Beyond that, you want the input to parse the string as an
integral value. Streams do not consider character types as
numeric values; they read a single character into them (after
skipping leading white space). To get a numeric value, you
should input to an int, and then convert that to unsigned
char. Characters don't have a base, so std::hex is
irrelevant for them. (The same thing holds for strings, for
example, and even for floating point.)
With regard to the page you cite: the page doesn't mention
inputting into a character type (strangely enough, because it
does talk about all other types, including some very special
cases). The documentation for the std::hex manipulator is
also weak: in the running text, it only says that "extracted
values are also expected to be in hexadecimal base", which isn't
really correct; in the table, however, it clearly talks about
"integral values". In the standard, this is documented in
§27.7.2.2.3. (The >> operators for character types are not
member functions, but free functions, so are defined in
a different section.) What we are missing, however, is a good
document which synthesizes these sorts of things: whether the
>> operator is a member or a free function doesn't really
affect the user much; you want to see all of the >> available,
with their semantics, in one place.

To put it simply: the variable type is 'stronger' than std::hex. That's why std::hex is ignored for a char variable.
Longer story:
std::hex modifies the internal state of the stringstream object, telling it how to treat subsequent operations on integers. However, this does not apply to chars.

When you print out a character (i.e. unsigned char), it's printed as a character, not as a number.
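To print such a value as a number instead, convert it to a non-character integer type before inserting it; std::hex then applies as well. A small sketch with hypothetical helper names byte_to_hex and byte_to_dec:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format a byte as two lowercase hex digits. The byte is converted to
// unsigned int first; inserted as unsigned char it would be written as a
// character, and std::hex would have no effect.
std::string byte_to_hex(unsigned char byte)
{
    std::ostringstream os;
    os << std::hex << std::setw(2) << std::setfill('0')
       << static_cast<unsigned int>(byte);
    return os.str();
}

// Format a byte as a decimal number; unary + promotes it to int.
std::string byte_to_dec(unsigned char byte)
{
    std::ostringstream os;
    os << +byte;
    return os.str();
}
```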

Related

Input number to variable with type char in C++

In C++, how can I input a number to unsigned char variable? In C, I can accept the input using %hhu format specifier:
unsigned char var_name;
scanf("%hhu", &var_name);
//lets say I inputted 27
printf("%hhu", var_name);
//the output is 27
How can I do that in C++? The code below is my attempt to do this in C++, but it does a wrong thing. How can I write equivalent code in C++?
unsigned char var_name;
std::cin >> var_name;
//Input 27 again
std::cout << var_name;
//The output is just 2, how can I make the '7' appear?
This happens because when reading an unsigned char from std::istream, a character is read. That's just what happens, that's how std::istream works. It also makes a lot of sense, because it's quite common to want to read a single character.
The trivial solution is to use a temp variable:
unsigned char var_name;
unsigned int tmp;
std::cin >> tmp; // input 27
// optionally add checking that tmp is small enough
var_name = tmp; // truncation of unsigned ints is well defined
std::cout << static_cast<unsigned int>(var_name); // prints 27; inserting var_name itself would print the character with code 27
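The same temp-variable idea can be wrapped in a reusable function that also performs the optional range check. This is a sketch under the assumption that out-of-range input should be reported as a failed read; read_uchar is a hypothetical name:

```cpp
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for trying this without std::cin

// Hypothetical helper: read a decimal number into an unsigned char.
// Extracting into unsigned char directly would read one character, so the
// number goes into an unsigned int first and is range-checked afterwards.
bool read_uchar(std::istream& in, unsigned char& out)
{
    unsigned int tmp = 0;
    if (!(in >> tmp))
        return false;
    if (tmp > std::numeric_limits<unsigned char>::max())
    {
        in.setstate(std::ios_base::failbit);  // treat out-of-range like a failed read
        return false;
    }
    out = static_cast<unsigned char>(tmp);
    return true;
}
```

It works with std::cin as well as with any other std::istream.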
You only get a 2 when printing the variable because of the way cin reads into it.
An unsigned char in C++ is usually 8 bits, which is enough for any actual character and any number up to 255. However, this depends on the compiler and the system. The maximum value that can be stored is given in the <climits> header as UCHAR_MAX.
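For reference, a compile-time check of that limit. Both spellings below are standard; the asserts only rely on the guarantee of at least 8 bits per byte, so 255 is the minimum, and the exact value on a given system may be larger:

```cpp
#include <climits>
#include <limits>

// UCHAR_MAX comes from <climits>; std::numeric_limits from <limits>
// is the C++ way of asking the same question.
constexpr unsigned int uchar_max_c = UCHAR_MAX;
constexpr unsigned int uchar_max_cpp = std::numeric_limits<unsigned char>::max();

static_assert(uchar_max_c == uchar_max_cpp, "both report the same maximum");
// The standard guarantees at least 8 bits per byte, hence at least 255.
static_assert(CHAR_BIT >= 8 && UCHAR_MAX >= 255, "minimum guaranteed range");
```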
Your issue here is that you are using cin, which only ever reads the first 'character' of an input if it is storing that input as a char. There are several ways around this, including taking the input as an integer and then converting to a char, or making your program work with an integer.
Hope this helps :)
When var_name is of type unsigned char, then the line
std::cin >> var_name;
is similar to
std::scanf("%c", &var_name);
i.e. C++ will assume that you want to read a single character and write the character code into var_name.
If you instead want to read a number and write that number into var_name, then you cannot use the data type char or unsigned char when using operator >>, even if the data type is technically able to represent the desired range of values. Instead, you will first have to use a variable with a larger data type, such as unsigned short, for reading the number. Afterwards, you can assign it to another variable of type unsigned char:
unsigned char var_name;
unsigned short temp;
std::cin >> temp;
if ( std::cin )
{
var_name = static_cast<unsigned char>( temp );
std::cout << static_cast<unsigned int>( var_name ) << '\n'; // print the number, not the character
}
else
{
//TODO: handle error
}
The static_cast is not necessary, but some compilers may emit a warning due to the truncation, which will probably be suppressed by the cast. Also, using the cast makes the code more readable, because it becomes obvious that the value is being truncated.
However, I generally do not recommend that you use operator >> for user input, because it will do strange things, such as
not always read one line of input at a time, and
accept garbage such as "6sdfj23jlj" as valid input for the number 6, although the input should probably be rejected in this case.
If you want to read a number from the user with proper input validation, I recommend that you take a look at my function get_int_from_user in this answer of mine to another question.
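The linked get_int_from_user function is not reproduced here. As a separate, hedged sketch of the same idea, assuming C++17 for std::from_chars: read a whole line with std::getline, then accept it only if the entire line is a number in the unsigned char range. The names parse_uchar_line and read_uchar_line are hypothetical:

```cpp
#include <charconv>
#include <istream>
#include <limits>
#include <optional>
#include <sstream>  // std::istringstream, handy for trying this without std::cin
#include <string>
#include <system_error>

// Hypothetical sketch: accept a line only if the whole line is a number
// that fits in unsigned char; "6sdfj23jlj", "300" or "" are rejected.
std::optional<unsigned char> parse_uchar_line(const std::string& line)
{
    const char* first = line.data();
    const char* last = first + line.size();
    unsigned int value = 0;
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last ||
        value > std::numeric_limits<unsigned char>::max())
        return std::nullopt;
    return static_cast<unsigned char>(value);
}

// Read one whole line from the stream, then validate it.
std::optional<unsigned char> read_uchar_line(std::istream& in)
{
    std::string line;
    if (!std::getline(in, line))
        return std::nullopt;
    return parse_uchar_line(line);
}
```

Because it works line by line, a rejected line is consumed in full, so the next call starts on fresh input.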

Sign & Unsigned Char is not working in C++

In C++ Primer 5th Edition I saw this
when I tried to use it---
This time it didn't work: the unsigned version printed a weird symbol, and the signed one printed nothing at all. The compiler also gave some warnings. But C++ Primer and many websites say it should work, and I don't think they give wrong information. Did I do something wrong?
I am a newbie, by the way :)
But C++ primer ... said it should work
No, it doesn't. The quote from C++ Primer doesn't use std::cout at all. The output that you see doesn't contradict what the book says.
So I don't think they give the wrong information
No[1].
did I do something wrong?
It seems that you've possibly misunderstood what the value of a character means, or possibly misunderstood how character streams work.
Character types are integer types (but not all integer types are character types). The values of unsigned char are 0..255 (on systems where the size of a byte is 8 bits). Each[2] of those values represents some textual symbol. The mapping from a set of values to a set of symbols is called a "character set" or "character encoding".
std::cout is a character stream. << is the stream insertion operator. When you insert a character into a stream, the behaviour is not to show the numerical value. Instead, the behaviour is to show the symbol that the value is mapped to[3] in the character set that your system uses. In this case, it appears that the value 255 is mapped to whatever strange symbol you saw on the screen.
If you wish to print the numerical value of a character, what you can do is convert to a non-character integer type and insert that to the character stream:
int i = c;
std::cout << i;
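A small sketch of the difference, with a hypothetical helper show_both that writes to a string stream instead of std::cout so the result can be inspected. The 'A' to 65 correspondence assumes an ASCII-based character set:

```cpp
#include <sstream>
#include <string>

// Insert one value twice: first as a character, then converted to int.
// The character insertion writes a single byte (the symbol, or '\0' for 0);
// the int insertion writes the decimal digits of the value.
std::string show_both(unsigned char c)
{
    std::ostringstream os;
    int i = c;   // conversion to a non-character integer type
    os << c << ' ' << i;
    return os.str();
}
```

show_both('A') gives "A 65", and show_both(0) gives a string whose first byte is the invisible null character, followed by " 0".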
[1] At least, there's no wrong information regarding your confusion. The quote is a bit inaccurate and outdated in the case of c2. Before C++20, the value was "implementation defined" rather than "undefined". Since C++20, the value is actually defined: it is 0, which is the null terminator character that signifies the end of a string. If you try to print this character, you'll see no output.
[2] This was a bit of a lie for simplicity's sake. Some characters are not visible symbols. For example, there is the null terminator character as well as other control characters. The situation becomes even more complex with variable-width encodings such as the ubiquitous UTF-8 encoding of Unicode, where a symbol may consist of a sequence of several chars. In such an encoding, an individual char cannot necessarily be interpreted correctly without the other chars that are part of the sequence.
[3] This behaviour should feel natural once you grok the purpose of character types. Consider the following program:
unsigned char c = 'a';
std::cout << c;
It would be highly confusing if the output would be a number that is the value of the character (such as 97 which may be the value of the symbol 'a' on the system) rather than the symbol 'a'.
For extra meditation, think about what this program might print (and feel free to try it out):
char c = 57;
std::cout << c << '\n';
int i = c;
std::cout << i << '\n';
c = '9';
std::cout << c << '\n';
i = c;
std::cout << i << '\n';
This is due to the behavior of the << operator for the char type on the character stream cout. Note that << performs formatted output, which means it does some implicit formatting.
We can say that the value of a variable is not the same as its representation in certain contexts. For example:
#include <iostream>
int main() {
bool t = true;
std::cout << t << std::endl; // Prints 1, not "true"
}
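The std::boolalpha manipulator shows that the representation is a formatting choice of the stream, not a property of the value. A small sketch (format_bool is a hypothetical name):

```cpp
#include <sstream>
#include <string>

// Same bool, two formats: numeric by default, textual after std::boolalpha.
std::string format_bool(bool b)
{
    std::ostringstream os;
    os << b << ' ' << std::boolalpha << b;
    return os.str();
}
```

format_bool(true) returns "1 true".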
Think of it this way: why would we need char if it still behaved like a number when printed? Why not use int or unsigned instead? In essence, we have different types so that we get different behaviors, which can be deduced from those types.
So the underlying numeric value of a char is probably not what we are looking for when we print one.
Check this for example:
#include <iostream>
int main() {
unsigned char c = -1;
int i = c;
std::cout << i << std::endl; // Prints 255
}
If I recall correctly, you're getting close to the topic of built-in type conversions in the Primer; it will bring clarity once you get to know those rules better. Anyway, I'm sure you will benefit greatly from looking into this article, especially the "Printing chars as integers via type casting" part.

Put a non-numeric input into an integer variable

I’m having a bit of a problem in C++. When I wrote this:
int a = ':';
cout << a;
This printed out 58. It checks out with the ASCII table.
But if I write this:
int a;
cin >> a;
// I type in ":"
cout << a;
This will print out 0. It seems like if I put in any non-numeric input, a will be 0. I expected it to print out the equivalent ASCII number.
Can someone explain this for me? Thank you!
There are two things at work here.
First, ':' is a char, and although a char looks like a piece of text in your source code, it's really just a number (typically, an index into ASCII). This number can be assigned to other numeric types, such as int.
However, to deal with this oddity in a useful way, the IOStreams library treats char specially compared with the other numeric types. When you insert an int into a stream using formatted insertion (e.g. cout << 42), it automatically generates a string that looks like that number; but when you insert a char using formatted insertion (e.g. cout << ';'), it does not do that and writes the character itself.
Similarly, when you do formatted extraction, extracting into an int will interpret the user's input string as a number. Setting the char oddity aside, : is, in a more general sense, not a number, so your cin >> a does not succeed, as there is no string that looks like a number to interpret. (If a were a char, this "decoding" would again be disabled, and the task would succeed by simply copying the character from the user input.)
It can be confusing, but you're working in two separate data domains: user input as interpreted by IOStreams, and C++ data types. What is true for one, is not necessarily true for the other.
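The two domains can be seen side by side by extracting the same text into a char and into an int. A sketch assuming C++11 or later (for the zero-on-failure rule) and an ASCII-based character set; extract_both and Extracted are hypothetical names:

```cpp
#include <sstream>
#include <string>

// Result of extracting the same text into a char and into an int.
struct Extracted
{
    bool char_ok; char c;
    bool int_ok;  int n;
};

Extracted extract_both(const std::string& input)
{
    std::istringstream as_char(input);
    std::istringstream as_int(input);
    Extracted r{false, '\0', false, -1};
    r.char_ok = static_cast<bool>(as_char >> r.c); // copies one character, no parsing
    r.int_ok = static_cast<bool>(as_int >> r.n);   // parses digits; ':' makes this fail
    return r;
}
```

For ":", the char extraction succeeds with ':' (58 in ASCII) while the int extraction fails and leaves 0; for "7" both succeed, giving '7' and 7 respectively.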
You're declaring a as an int, so operator>> expects digits, but you give it a punctuation character, which makes the extraction fail. As a result, since C++11, a is set to 0; before C++11, a would not be modified.
If extraction fails (e.g. if a letter was entered where a digit is expected), value is left unmodified and failbit is set. (until C++11)
If extraction fails, zero is written to value and failbit is set. (since C++11)
And
I expected it to print out the equivalent ASCII number.
No. Even for valid digits, e.g. if you input 1, a will be set to the value 1, not to its ASCII code (49).
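A failed extraction also sets failbit, and the stream then refuses further input until the state is cleared. A hedged sketch of one common recovery pattern (clear the state, discard the rest of the line, retry); read_int_retry is a hypothetical name:

```cpp
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for trying this without std::cin

// Hypothetical helper: keep trying to read an int, discarding any line
// that fails to parse. Returns false only when the input runs out.
bool read_int_retry(std::istream& in, int& out)
{
    while (!(in >> out))    // on failure, out is set to 0 (since C++11)
    {
        if (in.eof())
            return false;
        in.clear();         // reset failbit so further reads are possible
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // drop the bad line
    }
    return true;
}
```

With the input ":" on one line and 27 on the next, the first attempt fails, the line is skipped, and out ends up as 27.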
This will print out 0. It seems like if I put in any non-numeric input, a will be 0. I expected it to print out the equivalent ASCII number.
Since C++11 when extraction fails 0 will be automatically assigned.
However, there is a way where you can take a char input from std::cin and then print its ASCII value. It is called type-casting.
Here is an example:
#include <iostream>
int main()
{
char c;
std::cin >> c;
std::cout << int(c);
return 0;
}
Output:
:
58

Why does std cin work only when used with int variable?

I'm trying to use std::cin after a while.
Using uint8_t or unsigned char:
unsigned char data;
std::cin >> std::dec >> data;
Whether std::dec is used or not, I get the first ASCII character I type.
If I type 12, data is 0x31, not 12. Why can't it parse a number up to 255 and store it in a char?
int data;
std::cin >> std::dec >> data;
correctly gives data = 12 (0xC), not 0x31.
Why?
Using char[N] with std::hex
char data[128];
std::cin >> std::hex >> data;
Also gets the ASCII characters instead of the hexadecimal.
Writing 0x010203040506..., data is 0xFFFFFFFFF...
Isn't std::cin>>std::hex able to parse the string I type into hexadecimal automatically?
In short:
cin >> charVar scans a single character from stdin
cin >> intVar scans characters from stdin until a non-numeric character is entered
Explaining your observation:
A char variable can store a single ASCII character.
When you type 12, only the character 1 is scanned.
The ASCII code of the character 1 is 0x31.
std::dec and std::hex affect the format of integers.
But as far as the streaming operators are concerned, char and its variants (including uint8_t) aren't integers; they're single characters. They will always read a single character, and never parse an integer.
That's just how these functions are defined. There is no way around it. If you want an integer with a limited range, first read into an int (or other integer type that is not a char variant), and then range-check afterwards. You can, if you want, cast it to a small type afterwards, but you probably shouldn't. char types are awkward to work with numerically.
Similarly, reading into an array of char reads a string. (Also, never do that without using setw() to limit the length to fit in the buffer you have. Better yet, use std::string instead.) That's just how it's defined.
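A sketch of the setw advice, using hypothetical helper names. Note that both functions still read characters, so typing "0x0102..." ends up as text, not as byte values:

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without std::cin
#include <string>

// Read one whitespace-delimited word into a fixed buffer. std::setw caps the
// extraction at N - 1 characters plus the terminating '\0', so it cannot overflow.
template <std::size_t N>
void read_word_bounded(std::istream& in, char (&buf)[N])
{
    in >> std::setw(static_cast<int>(N)) >> buf;
}

// With std::string no limit is needed; the string grows as required.
std::string read_word(std::istream& in)
{
    std::string word;
    in >> word;
    return word;
}
```

With an 8-byte buffer, the input "0x0102030405060708" yields only "0x01020"; the remaining characters stay in the stream for the next extraction.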

C++ convert UTF8 string to hexadecimal and vice versa

I have spent some time looking for a way to convert a UTF-8 string to a hexadecimal string, and back.
I found some examples and possible solutions, but they only work well without special characters.
I have the following:
string in = "áéíóúñü";
The result should be:
"c3a1c3a9c3adc3b3c3bac3b1c3bc"
I try following post, and others:
C++ convert string to hexadecimal and vice versa
How to convert a string in hexadecimal string?
http://www.cplusplus.com/forum/beginner/161703/
I will try to explain better, but I cannot speak English properly. Sorry.
I have to send some data using a socket. For that, I have to convert names to hexadecimal using UTF-8, but some of them contain special characters, for example á, é, í...
When converting normal letters, I get a string length of 2 per letter.
a-> "61"
e-> "65"
But special characters are encoded in UTF-8 as two bytes, i.e. a hex string of length 4:
á-> "c3a1" this is the correct conversion
é-> "c3a9" this is the correct conversion
I have attempted the conversion in every way I've found, including the one suggested below. But every time I convert a special character, I get a 2-digit answer, which is not correct.
á-> "e1" this isn't correct
é-> "e9" this isn't correct
Loop over each "character" in the std::string object and output its two-digit hexadecimal equivalent as an int.
For looping, I recommend you look into range-based for loops.
To set the number of digits to print, read about setting the field width and fill character (std::setw and std::setfill).
To print a number as hexadecimal, read about the base I/O manipulators.
To convert to an int read about static_cast.
Oh, and I recommend using an unsigned char for the single "characters".
Simple solution based on the above:
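#include <iomanip>
#include <sstream>
#include <string>
std::string stoh(std::string const& in)
{
    std::ostringstream os;
    // unsigned char keeps bytes >= 0x80 (e.g. the UTF-8 lead byte 0xC3) non-negative
    for(unsigned char c : in)
    {
        // setw applies to one insertion only, so it is repeated for every byte
        os << std::hex << std::setw(2) << std::setfill('0')
           << static_cast<int>(c);
    }
    return os.str();
}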
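The question also asks for the reverse direction. A hedged sketch of an inverse, htos (a hypothetical name mirroring stoh), which turns each pair of hex digits back into one byte and throws std::invalid_argument on malformed input. It parses digits by hand rather than with std::stoul, because std::stoul would also accept leading whitespace, a sign, and a "0x" prefix:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Value of one hex digit, or -1 if ch is not a hex digit.
// Assumes an ASCII-compatible character set, where 'a'..'f' are contiguous.
int hex_digit(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Hypothetical inverse of stoh: "c3a1" -> the two bytes 0xC3 0xA1.
std::string htos(const std::string& hex)
{
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("htos: odd number of hex digits");

    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2)
    {
        const int hi = hex_digit(hex[i]);
        const int lo = hex_digit(hex[i + 1]);
        if (hi < 0 || lo < 0)
            throw std::invalid_argument("htos: invalid hex digit");
        out.push_back(static_cast<char>(hi * 16 + lo));
    }
    return out;
}
```

Applied to the output of stoh, this should give back the original bytes, so a UTF-8 "á" survives the round trip as 0xC3 0xA1.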