When an array is declared as unsigned char, initialized with values in the range 0x00-0xff, and printed using cout, I get garbage values as follows
+ ( �
~ � � �
� O
� � <
May I know how to use a single byte for the numbers and yet be able to use cout?
Because it's an unsigned char, std::cout passes each value to the terminal to be displayed as a character (well, it attempts to, anyway - the values are outside the range of printable characters in the character set you're using).
Cast to unsigned int when outputting with cout.
Char types are displayed as characters by default. If you want them displayed as integers, you will have to convert them first:
unsigned char value = 42;
std::cout << static_cast<unsigned int>(value);
Those aren't garbage values. Those are the characters that those values represent. To print them as integers, simply cast to unsigned int at output time:
cout << (unsigned int) some_char;
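For the array case from the question, a minimal self-contained sketch (the array contents here are just example values):

#include <iostream>

int main() {
    unsigned char data[] = { 0x00, 0x2b, 0x7f, 0xe9, 0xff };
    for (unsigned char byte : data) {
        // cast so operator<< picks the integer overload, not the character one
        std::cout << static_cast<unsigned int>(byte) << ' ';
    }
    std::cout << '\n'; // prints: 0 43 127 233 255
}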
Related
I (think I) understand how the maths with different variable types works. For example, if I go over the max limit of an unsigned int variable, it will loop back to 0.
I don't understand the behavior of this code with unsigned char:
#include <iostream>

int main() {
    unsigned char var{ 0 };
    for (int i = 0; i < 501; ++i) {
        var += 1;
        std::cout << var << '\n';
    }
}
This just outputs 1...9, then some symbols and capital letters, and then it just doesn't print anything. It doesn't loop back to the values 1...9 etc.
On the other hand, if I cast to int before printing:
#include <iostream>

int main() {
    unsigned char var{ 0 };
    for (int i = 0; i < 501; ++i) {
        var += 1;
        std::cout << (int)var << '\n';
    }
}
It does print from 1...255 and then loops back from 0...255.
Why is that? It seems that the unsigned char variable does loop (as we can see from the int cast).
Is it safe to do maths with unsigned char variables? What is the behavior that I see here?
Why doesn't it print the expected integer value?
The issue is not with the looping of the char. The issue is with the insertion operation for std::ostream objects and 8-bit integer types: the non-member operator<< overloads for these types treat all 8-bit integers (char, signed char, and unsigned char) as characters.
operator<<(std::basic_ostream)
The canonical way to handle outputting 8-bit integer types is the cast you're already doing. I personally prefer this instead:
char foo;
std::cout << +foo;
The unary + operator promotes the char type to an integer type, which then causes the integer printing function to be called.
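A quick self-contained illustration of the difference:

#include <iostream>

int main() {
    unsigned char var = 65;
    std::cout << var << '\n';  // prints: A  (character overload)
    std::cout << +var << '\n'; // prints: 65 (promoted to int, integer overload)
}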
Note that integer overflow is only defined for unsigned integer types. If you repeat this with char or signed char, the behavior is undefined by the standard. SOMETHING will happen, for sure, because we live in reality, but that overflow behavior may differ from compiler to compiler.
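For instance, a minimal sketch of the well-defined unsigned wraparound:

#include <iostream>

int main() {
    unsigned char u = 255;
    u += 1;                  // wraps around to 0: well-defined for unsigned types
    std::cout << +u << '\n'; // prints: 0
}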
Why doesn't it repeat the 0..9 characters
I tested this using g++ to compile, and bash on Ubuntu 20.04. My non-printable characters are shown as explicit symbols in some cases, and nothing is printed in other cases. The non-repeating behavior must be due to how your shell handles these non-printable characters; we can't answer that without more information.
Unsigned chars aren't treated as numbers in this case. This data type is literally a byte:
1 byte = 8 bits, so 0000 0000 means 0.
What cout prints is the character whose code is the byte value you keep changing by adding +1 to it.
Note that the digit characters don't sit at codes 0-9; '0' through '9' live at codes 48-57:
'0' = 0011 0000 (48)
'1' = 0011 0001 (49)
.
.
.
'9' = 0011 1001 (57)
The codes outside that range are characters that aren't related to digits at all, many of them unprintable.
So, if you cast it to int, cout prints the numeric value of that byte instead, giving you a 0-255 output.
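A tiny sketch of the difference between a byte's numeric value and the digit character of the same shape:

#include <iostream>

int main() {
    unsigned char value = 5;    // the byte 0000 0101: an unprintable control character
    unsigned char digit = '5';  // the byte 0011 0101: character code 53
    std::cout << +value << ' ' << +digit << '\n'; // prints: 5 53
    std::cout << digit << '\n';                   // prints: 5
}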
Hope this clarifies!
I have a weird input file with all kinds of control characters like nulls. I want to remove all control characters from this Windows-1252 encoded text file, but if you do this:
std::string test = "tést";
for (int i = 0; i < test.length(); i++)
{
    if (test[i] < 32) test[i] = 32; // change all control characters into spaces
}
It will change the é into a space as well.
So if you have a string like this, encoded in Windows-1252:
std::string test="tést";
The hex values would be:
t  é  s  t
74 E9 73 74
See https://en.wikipedia.org/wiki/ASCII and https://en.wikipedia.org/wiki/Windows-1252
test[0] equals decimal 116 (= 0x74), but apparently with é/0xE9, test[1] does not equal the decimal value 233.
So how can you recognize that é properly?
32 is a signed integer literal, so the compiler performs the comparison as signed: E9 read back as a signed char is -23, and -23 < 32 returns true.
Using an unsigned literal of 32, that is 32u, makes the comparison be performed on unsigned values: the -23 converts to a huge unsigned value (0xFFFFFFE9), which is not less than 32, so the comparison returns false.
Replace:
if (test[i]<32) test[i]=32;
By:
if (test[i]<32u) test[i]=32u;
And you should get the expected result.
Test this here:
https://onlinegdb.com/BJ8tj0kbd
Note: you can check that char is signed with the following code:
#include <limits>
...
std::cout << std::numeric_limits<char>::is_signed << std::endl;
Change
if (test[i]<32)
to
if (test[i] >= 0 && test[i] < 32)
char is often a signed type, and 0xE9 is a negative value in a signed eight-bit integer.
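Putting that check into the original loop, a minimal sketch (the function name is just for illustration):

#include <cstddef>
#include <string>

// hypothetical helper wrapping the fixed check
void strip_controls(std::string & test) {
    for (std::size_t i = 0; i < test.length(); i++)
    {
        // bytes >= 0x80 (like é in Windows-1252) come out negative when char is
        // signed, so the >= 0 test keeps them from being replaced
        if (test[i] >= 0 && test[i] < 32) test[i] = 32;
    }
}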
I need to read the value of a character as a number and find the corresponding hexadecimal value for it.
#include <iostream>
#include <iomanip>
using namespace std;

int main() {
    char c = 197;
    cout << hex << uppercase << setw(2) << setfill('0') << (short)c << endl;
}
Output:
FFC5
Expected output:
C5
The problem is that when you write char c = 197 you are overflowing the (signed) char type, producing a negative number (-59). From there on it doesn't matter what larger type you convert to; it will remain a negative number.
To fully understand why you must know how two's complement works.
Basically, -59 and 197 have the same binary representation: 1100 0101; depending on the data type, it is interpreted one way or the other. When you print it using hexadecimal format, the binary representation (the actual value stored in memory) is the one used, producing C5.
When the char is converted to a short/unsigned short, the -59 is converted to its short/unsigned short representation, which is 1111 1111 1100 0101 (FFC5) in both cases.
The correct way to do it is to store the initial value (197) in a variable whose data type can represent it (unsigned char, short, unsigned short, ...) from the very beginning.
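For example, a minimal sketch of that fix:

#include <iostream>
#include <iomanip>

int main() {
    unsigned char c = 197; // a type that can actually represent 197
    std::cout << std::hex << std::uppercase << std::setw(2) << std::setfill('0')
              << static_cast<int>(c) << std::endl; // prints: C5
}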
When I compile and run the following code:
int x = 8;
int y = 17;
char final[2];
final[0] = (char) x;
final[1] = (char) y%10;
cout << final[0] << final[1] << endl;
It shows nothing, and I don't know why. So how can I successfully convert it?
(char)8 is not '8' but the ASCII value 8 (the backspace character). To display the digit 8 you can add it to '0':
int x = 8;
int y = 17;
char final[2];
final[0] = '0' + x;
final[1] = '0' + (y%10);
cout << final[0] << final[1] << endl;
As per your program, you are printing char 8 and char 7.
They are not printable; in fact they are the Backspace and BELL characters respectively.
Just run your program and redirect the output to a file.
Then do a hexdump; you will see what is actually printed.
./a.out > /tmp/b
hd /tmp/b
00000000  08 07 0a                                          |...|
00000003
What you need to understand is that in C++, chars are numbers, just like ints, only in a smaller range. When you cast the integer 8 to char, C++ thinks you want a char holding the number 8. If we look at our handy ASCII table, we can see that 8 is BS (backspace) and 7 is BEL (which sometimes rings a bell when you print it to a terminal). Neither of those are printable, which is why you aren't seeing anything.
If you just want to print the 87 to standard output, then this code will do that:
cout << x << (y%10) << endl;
If you really need to convert it chars first, then fear not, it can still be done. I'm not much of a C++ person, but this is the way to do it in C:
char final[3]; /* two digits plus the terminating NUL */
snprintf(final, sizeof final, "%d%d", x, y % 10);
(See snprintf(3))
This code treats the 3-char array as a string and writes the two numbers you specified into it using a printf format string. Strings in C end with a NUL byte, which is why the array needs a third char to hold the terminator. And because we told snprintf how big the buffer is, it will never write more chars than fit (with only a 2-char array, the output would have been silently truncated to just "8").
This should also work in C++.
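A complete, compilable version of that approach (using std::snprintf from <cstdio>, with the x and y from the question):

#include <cstdio>
#include <iostream>

int main() {
    int x = 8;
    int y = 17;
    char final[3]; // two digits plus the terminating NUL
    std::snprintf(final, sizeof final, "%d%d", x, y % 10);
    std::cout << final << std::endl; // prints: 87
}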
I get a std::string which should contain bytes (an array of chars). I'm trying to display the bytes, but the first byte always contains weird data:
4294967169 - how can this be a byte/char?!
void do_log(const std::string & data) {
    std::stringstream ss;
    ss << "do_log: ";
    int n = std::min<int>(data.length(), 20); // first 20 bytes
    for (int i = 0; i < n; i++)
    {
        ss << setfill('0') << setw(2) << hex << (unsigned int)data.at(i) << " ";
    }
    log(ss.str());
}
Data I log is:
ffffff81 0a 53 3a 30 30 38 31 30 36 43 38
Why and how does this ffffff81 appear, if string::at should return a char?
When you write (unsigned int)data.at(i), data.at(i) is a char which is then subjected to integer promotion. If char is signed on your system, values greater than 127 are interpreted as negative numbers. The sign bit is preserved (sign-extended) through the promotion, giving such strange results.
You can verify if char is signed by looking at numeric_limits<char>::is_signed
You can easily solve the issue by getting rid of the bits added by the integer promotion, by ANDing the integer with 0xff: (static_cast<int>(data.at(i)) & 0xff)
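Applied to the loop above, the output line would then read:

ss << std::setfill('0') << std::setw(2) << std::hex
   << (static_cast<int>(data.at(i)) & 0xff) << " ";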
Another way is to force your compiler to work with unsigned chars. For example, with option -funsigned-char on gcc or /J with MSVC.
std::string contains plain chars, which are signed on most platforms.
So a character 0x81 is interpreted as a negative number, which sign-extends to 0xFFFFFF81.
You cast this character to an unsigned int, so it becomes a very large number.
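A common equivalent fix (a sketch, doing the same job as the 0xff mask above): convert to unsigned char before widening, so no sign extension happens:

ss << (unsigned int)(unsigned char)data.at(i) << " ";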