Unknown behaviour to me about "cout"ing unsigned chars [duplicate] - c++

I have a simple program. Notice that I use an unsigned fixed-width integer 1 byte in size.
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 12;
std::cout << (x << 1) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
My output is the following.
24
-13
I tested larger numbers and operator << always gives me positive numbers, while operator ~ always gives me negative numbers. I then used sizeof() and found...
When I use the left shift bitwise operator(<<), I receive an unsigned 4 byte integer.
When I use the bitwise not operator(~), I receive a signed 4 byte integer.
It seems that the bitwise not operator(~) does a signed integral promotion like the arithmetic operators do. However, the left shift operator(<<) seems to promote to an unsigned integral.
I feel obligated to know when the compiler is changing something behind my back. If I'm correct in my analysis, do all the bitwise operators promote to a 4 byte integer? And why are some signed and some unsigned? I'm so confused!
Edit: My assumption of always getting positive or always getting negative values was wrong. But from being wrong, I understand what was really happening thanks to the great answers below.

[expr.unary.op]
The operand of ~ shall have integral or unscoped enumeration type; the
result is the one’s complement of its operand. Integral promotions are
performed.
[expr.shift]
The shift operators << and >> group left-to-right. [...] The operands shall be of integral or unscoped enumeration type and integral promotions are performed.
What's the integral promotion of uint8_t (which is usually going to be unsigned_char behind the scenes)?
[conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank (4.13) is less than the rank of
int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be
converted to a prvalue of type unsigned int.
So int, because all of the values of a uint8_t can be represented by int.
What is int(12) << 1 ? int(24).
What is ~int(12) ? int(-13).

For performance reasons the C and C++ language consider int to be the "most natural" integer type and instead types that are "smaller" than an int are considered sort of "storage" type.
When you use a storage type in an expression it gets automatically converted to an int or to an unsigned int implicitly. For example:
// Assume a char is 8 bit
unsigned char x = 255;
unsigned char one = 1;
int y = x + one; // result will be 256 (too large for a byte!)
++x; // x is now 0
what happened is that x and one in the first expression have been implicitly converted to integers, the addition has been computed and the result has been stored back in an integer. In other words the computation has NOT been performed using two unsigned chars.
Likewise if you have a float value in an expression the first thing the compiler will do is promoting it to a double (in other words float is a storage type and double is instead the natural size for floating point numbers). This is the reason for which if you use printf to print floats you don't need to say %lf int the format strings and %f is enough (%lf is needed for scanf however because that function stores a result and a float can be smaller than a double).
C++ complicated the matter quite a bit because when passing parameters to functions you can discriminate between ints and smaller types. Thus it's not ALWAYS true that a conversion is performed in every expression... for example you can have:
void foo(unsigned char x);
void foo(int x);
where
unsigned char x = 255, one = 1;
foo(x); // Calls foo(unsigned char), no promotion
foo(x + one); // Calls foo(int), promotion of both x and one to int

I tested larger numbers and operator << always gives me positive
numbers, while operator ~ always gives me negative numbers. I then
used sizeof() and found...
Wrong, test it:
uint8_t v = 1;
for (int i=0; i<32; i++) cout << (v<<i) << endl;
gives:
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
uint8_t is an 8-bit long unsigned integer type, which can represent values in the range [0,255], as that range in included in the range of int it is promoted to int (not unsigned int). Promotion to int has precedence over promotion to unsigned.

Look into two's complement and how computer stores negative integers.
Try this
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 1;
int shiftby=0;
shiftby=8*sizeof(int)-1;
std::cout << (x << shiftby) << '\n'; // or std::cout << (x << 31) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
The output is -2147483648
In general if the first bit of a signed number is 1 it is considered negative. when you take a large number and shift it. If you shift it so that the first bit is 1 it will be negative
** EDIT **
Well I can think of a reason why shift operators would use unsigned int. Consider right shift operation >> if you right shift -12 you will get 122 instead of -6. This is because it adds a zero in the beginning without considering the sign

Related

24-bit to 32-bit conversion in C++

I need to convert a 24-bit integer (2s compliment) to 32-bit integer in C++. I have found a solution here, which is given as
int interpret24bitAsInt32(unsigned char* byteArray)
{
return (
(byteArray[0] << 24)
| (byteArray[1] << 16)
| (byteArray[2] << 8)
) >> 8;
}
Though I found it is working, I have the following concern about the piece of code.
byteArray[0] is only 8-bits, and hence how the operations like byteArray[0] << 24 will be possible?
It will be possible if the compiler up-converts the byteArray to an integer and does the operation. This may be the reason it is working now. But my question is whether this behaviour is guaranteed in all compilers and explicitly mentioned in the standard? It is not trivial to me as we are not explicitly giving the compiler any clue that the target is a 32-bit integer!
Also, please let me know any improvisation like vectorization is possible to improve the speed (may be using C++11), as I need to convert huge amount of 24-bit data to 32-bit.
int32_t interpret24bitAsInt32(unsigned char* byteArray)
{
int32_t number =
(((int32_t)byteArray[0]) << 16)
| (((int32_t)byteArray[1]) << 8)
| byteArray[2];
if (number >= ((int32_t)1) << 23)
//return (uint32_t)number | 0xFF000000u;
return number - 16777216;
return number;
}
this function should do what you want without invoking undefined behavior by shifting a 1 into the sign bit of int.
The int32_t cast is only necessary if sizeof(int) < 4, otherwise the default integer promotion to int happens.
If someone does not like the if: It does not get translated to a conditional jump by the compiler (gcc 9.2): https://godbolt.org/z/JDnJM2
It leaves a cmovg.
[expr.shift]/1 The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand...
[conv.prom] 7.6 Integral promotions
1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
So yes, the standard requires that an argument of a shift operator, that has the type unsigned char, be promoted to int before the evaluation.
That said, the technique in your code relies on int a) being 32 bits large, and b) using two's-complement to represent negative values. Neither of which is guaranteed by the standard, though it's common with modern systems.
A version without branch; but multiplication:
int32_t interpret24bitAsInt32(unsigned char* bytes) {
unsigned char msb = UINT8_C(0xFF) * (bytes[0] >> UINT8_C(7));
uint32_t number =
(msb << UINT32_C(24))
| (bytes[0] << UINT32_C(16)))
| (bytes[1] << UINT32_C(8)))
| bytes[2];
return number;
}
You need to test if omitting the branch really gives you a performance advantage, though!
Adapted from older code of me which did this for 10 bit numbers. Test before use!
Oh, and it still relies upon implementation defined behaviour with regards to the conversion uint32_t to int32_t. If you want to go down that rabbit hole, have fun but be warned.
Or, much more simple: Use the trick from mchs answer. And also use shifts instead of multiplication:
int32_t interpret24bitAsInt32(unsigned char* bytes) {
int32_t const number =
(bytes[0] << INT32_C(16))
| (bytes[1] << INT32_C(8))
| bytes[2];
int32_t const correction =
(bytes[0] >> UINT8_C(7)) << INT32_C(24);
return number - correction;
}
Test case
There is indeed Integral_promotion for type smaller than int for operator_arithmetic
So assuming sizeof(char) < sizeof(int)
in
byteArray[0] << 24
byteArray is promoted in int and you do bit-shift on int.
First issue is that int can only be 16 bits.
Second issue (before C++20), int is signed, and Bitwise shift can easily lead to implementation-defined or UB (And you have both for negative 24 bits numbers).
In C++20, behavior of Bitwise shift has been simplified (behavior defined) and the problematic UB has been removed too.
The leading 1 of negative number are kept in neg >> 8.
So before C++20, you have to do something like:
std::int32_t interpret24bitAsInt32(const unsigned char* byteArray)
{
const std::int32_t res =
(std::int32_t(byteArray[0]) << 16)
| (byteArray[1] << 8)
| byteArray[2];
const std::int32_t int24Max = (std::int32_t(1) << 24) - 1;
return res <= int24Max ?
res : // Positive 24 bit numbers
int24Max - res; // Negative number
}
Integral promotions [conv.prom] are performed on the operands of a shift expression [expr.shift]/1. In your case, that means that your values of type unsigned char will be converted to type int before << is applied [conv.prom]/1. Thus, the C++ standard guarantees that the operands be "up-converted".
However, the standard only guarantees that int has at least 16 Bit. There is also no guarantee that unsigned char has exactly 8 Bit (it may have more). Thus, it is not guaranteed that int is always large enough to represent the result of these left shifts. If int does not happen to be large enough, the resulting signed integer overflow will invoke undefined behavior [expr]/4. Chances are that int has 32 Bit on your target platform and, thus, everything works out in the end.
If you need to work with a guaranteed, fixed number of Bits, I would generally recommend to use fixed-width integer types, for example:
std::int32_t interpret24bitAsInt32(const std::uint8_t* byteArray)
{
return
static_cast<std::int32_t>(
(std::uint32_t(byteArray[0]) << 24) |
(std::uint32_t(byteArray[1]) << 16) |
(std::uint32_t(byteArray[2]) << 8)
) >> 8;
}
Note that right shift of a negative value is currently implementation-defined [expr.shift]/3. Thus, it is not strictly guaranteed that this code will end up performing sign extension on a negative number. However, your compiler is required to document what exactly right-shifting a negative integer does [defns.impl.defined] (i.e., you can go and make sure it does what you need). And I have never heard of a compiler that does not implement right shift of a negative value as an arithmetic shift in practice. Also, it looks like C++20 is going to mandate arithmetic shift behavior…

Bit wise '&' with signed vs unsigned operand

I faced an interesting scenario in which I got different results depending on the right operand type, and I can't really understand the reason for it.
Here is the minimal code:
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0x8123U;
uint64_t new_check = (check & 0xFFFF) << 16;
std::cout << std::hex << new_check << std::endl;
new_check = (check & 0xFFFFU) << 16;
std::cout << std::hex << new_check << std::endl;
return 0;
}
I compiled this code with g++ (gcc version 4.5.2) on Linux 64bit: g++ -std=c++0x -Wall example.cpp -o example
The output was:
ffffffff81230000
81230000
I can't really understand the reason for the output in the first case.
Why at some point would any of the temporal calculation results be promoted to a signed 64bit value (int64_t) resulting in the sign extension?
I would accept a result of '0' in both cases if a 16bit value is shifted 16 bits left in the first place and then promoted to a 64bit value. I also do accept the second output if the compiler first promotes the check to uint64_t and then performs the other operations.
But how come & with 0xFFFF (int32_t) vs. 0xFFFFU (uint32_t) would result in those two different outputs?
That's indeed an interesting corner case. It only occurs here because you use uint16_t for the unsigned type when you architecture use 32 bits for ìnt
Here is a extract from Clause 5 Expressions from draft n4296 for C++14 (emphasize mine):
10 Many binary operators that expect operands of arithmetic or enumeration type cause conversions ...
This pattern is called the usual arithmetic conversions, which are defined as follows:
...(10.5.3) — Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
(10.5.4) — Otherwise, if the type of the operand with signed integer type can represent all of the values of
the type of the operand with unsigned integer type, the operand with unsigned integer type shall
be converted to the type of the operand with signed integer type.
You are in the 10.5.4 case:
uint16_t is only 16 bits while int is 32
int can represent all the values of uint16_t
So the uint16_t check = 0x8123U operand is converted to the signed 0x8123 and result of the bitwise & is still 0x8123.
But the shift (bitwise so it happens at the representation level) causes the result to be the intermediate unsigned 0x81230000 which converted to an int gives a negative value (technically it is implementation defined, but this conversion is a common usage)
5.8 Shift operators [expr.shift]...Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value;...
and
4.7 Integral conversions [conv.integral]...
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
(beware this was true undefined behaviour in C++11...)
So you end with a conversion of the signed int 0x81230000 to an uint64_t which as expected gives 0xFFFFFFFF81230000, because
4.7 Integral conversions [conv.integral]...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2n where n is the number of bits used to represent the unsigned type).
TL/DR: There is no undefined behaviour here, what causes the result is the conversion of signed 32 bits int to unsigned 64 bits int. The only part part that is undefined behaviour is a shift that would cause a sign overflow but all common implementations share this one and it is implementation defined in C++14 standard.
Of course, if you force the second operand to be unsigned everything is unsigned and you get evidently the correct 0x81230000 result.
[EDIT] As explained by MSalters, the result of the shift is only implementation defined since C++14, but was indeed undefined behaviour in C++11. The shift operator paragraph said:
...Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable
in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Let's take a look at
uint64_t new_check = (check & 0xFFFF) << 16;
Here, 0xFFFF is a signed constant, so (check & 0xFFFF) gives us a signed integer by the rules of integer promotion.
In your case, with 32-bit int type, the MSbit for this integer after the left shift is 1, and so the extension to 64-bit unsigned will do a sign extension, filling the bits to the left with 1's. Interpreted as a two's complement representation that gives the same negative value.
In the second case, 0xFFFFU is unsigned, so we get unsigned integers and the left shift operator works as expected.
If your toolchain supports __PRETTY_FUNCTION__, a most-handy feature, you can quickly determine how the compiler perceives expression types:
#include <iostream>
#include <cstdint>
template<typename T>
void typecheck(T const& t)
{
std::cout << __PRETTY_FUNCTION__ << '\n';
std::cout << t << '\n';
}
int main()
{
uint16_t check = 0x8123U;
typecheck(0xFFFF);
typecheck(check & 0xFFFF);
typecheck((check & 0xFFFF) << 16);
typecheck(0xFFFFU);
typecheck(check & 0xFFFFU);
typecheck((check & 0xFFFFU) << 16);
return 0;
}
Output
void typecheck(const T &) [T = int]
65535
void typecheck(const T &) [T = int]
33059
void typecheck(const T &) [T = int]
-2128412672
void typecheck(const T &) [T = unsigned int]
65535
void typecheck(const T &) [T = unsigned int]
33059
void typecheck(const T &) [T = unsigned int]
2166554624
The first thing to realize is that binary operators like a&b for built-in types only work if both sides have the same type. (With user-defined types and overloads, anything goes). This might be realized via implicit conversions.
Now, in your case, there definitely is such a conversion, because there simply isn't a binary operator & that takes a type smaller than int. Both sides are converted to at least int size, but what exact types?
As it happens, on your GCC int is indeed 32 bits. This is important, because it means that all values of uint16_t can be represented as an int. There is no overflow.
Hence, check & 0xFFFF is a simple case. The right side is already an int, the left side promotes to int, so the result is int(0x8123). This is perfectly fine.
Now, the next operation is 0x8123 << 16. Remember, on your system int is 32 bits, and INT_MAX is 0x7FFF'FFFF. In the absence of overflow, 0x8123 << 16 would be 0x81230000, but that clearly is bigger than INT_MAX so there is in fact overflow.
Signed integer overflow in C++11 is Undefined Behavior. Literally any outcome is correct, including purple or no output at all. At least you got a numerical value, but GCC is known to outright eliminate code paths which unavoidably cause overflow.
[edit]
Newer GCC versions support C++14, where this particular form of overflow has become implementation-defined - see Serge's answer.
0xFFFF is a signed int. So after the & operation, we have a 32-bit signed value:
#include <stdint.h>
#include <type_traits>
uint64_t foo(uint16_t a) {
auto x = (a & 0xFFFF);
static_assert(std::is_same<int32_t, decltype(x)>::value, "not an int32_t")
static_assert(std::is_same<uint16_t, decltype(x)>::value, "not a uint16_t");
return x;
}
http://ideone.com/tEQmbP
Your original 16 bits are then left-shifted which results in 32-bit value with the high-bit set (0x80000000U) so it has a negative value. During the 64-bit conversion sign-extension occurs, populating the upper words with 1s.
This is the result of integer promotion. Before the & operation happens, if the operands are "smaller" than an int (for that architecture), compiler will promote both operands to int, because they both fit into a signed int:
This means that the first expression will be equivalent to (on a 32-bit architecture):
// check is uint16_t, but it fits into int32_t.
// the constant is signed, so it's sign-extended into an int
((int32_t)check & (int32_t)0xFFFFFFFF)
while the other one will have the second operand promoted to:
// check is uint16_t, but it fits into int32_t.
// the constant is unsigned, so the upper 16 bits are zero
((int32_t)check & (int32_t)0x0000FFFFU)
If you explicitly cast check to an unsigned int, then the result will be the same in both cases (unsigned * signed will result in unsigned):
((uint32_t)check & 0xFFFF) << 16
will be equal to:
((uint32_t)check & 0xFFFFU) << 16
Your platform has 32-bit int.
Your code is exactly equivalent to
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0x8123U;
auto a1 = (check & 0xFFFF) << 16
uint64_t new_check = a1;
std::cout << std::hex << new_check << std::endl;
auto a2 = (check & 0xFFFFU) << 16;
new_check = a2;
std::cout << std::hex << new_check << std::endl;
return 0;
}
What's the type of a1 and a2?
For a2, the result is promoted to unsigned int.
More interestingly, for a1 the result is promoted to int, and then it gets sign-extended as it's widened to uint64_t.
Here's a shorter demonstration, in decimal so that the difference between signed and unsigned types is apparent:
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0;
std::cout << check
<< " " << (int)(check + 0x80000000)
<< " " << (uint64_t)(int)(check + 0x80000000) << std::endl;
return 0;
}
On my system (also 32-bit int), I get
0 -2147483648 18446744071562067968
showing where the promotion and sign-extension happens.
The & operation has two operands. The first is an unsigned short, which will undergo the usual promotions to become an int. The second is a constant, in one case of type int, in the other case of type unsigned int. The result of the & is therefore int in one case, unsigned int in the other case. That value is shifted to the left, resulting either in an int with the sign bit set, or an unsigned int. Casting a negative int to uint64_t will give a large negative integer.
Of course you should always follow the rule: If you do something, and you don't understand the result, then don't do that!

What is going on with bitwise operators and integer promotion?

I have a simple program. Notice that I use an unsigned fixed-width integer 1 byte in size.
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 12;
std::cout << (x << 1) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
My output is the following.
24
-13
I tested larger numbers and operator << always gives me positive numbers, while operator ~ always gives me negative numbers. I then used sizeof() and found...
When I use the left shift bitwise operator(<<), I receive an unsigned 4 byte integer.
When I use the bitwise not operator(~), I receive a signed 4 byte integer.
It seems that the bitwise not operator(~) does a signed integral promotion like the arithmetic operators do. However, the left shift operator(<<) seems to promote to an unsigned integral.
I feel obligated to know when the compiler is changing something behind my back. If I'm correct in my analysis, do all the bitwise operators promote to a 4 byte integer? And why are some signed and some unsigned? I'm so confused!
Edit: My assumption of always getting positive or always getting negative values was wrong. But from being wrong, I understand what was really happening thanks to the great answers below.
[expr.unary.op]
The operand of ~ shall have integral or unscoped enumeration type; the
result is the one’s complement of its operand. Integral promotions are
performed.
[expr.shift]
The shift operators << and >> group left-to-right. [...] The operands shall be of integral or unscoped enumeration type and integral promotions are performed.
What's the integral promotion of uint8_t (which is usually going to be unsigned_char behind the scenes)?
[conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank (4.13) is less than the rank of
int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be
converted to a prvalue of type unsigned int.
So int, because all of the values of a uint8_t can be represented by int.
What is int(12) << 1 ? int(24).
What is ~int(12) ? int(-13).
For performance reasons the C and C++ language consider int to be the "most natural" integer type and instead types that are "smaller" than an int are considered sort of "storage" type.
When you use a storage type in an expression it gets automatically converted to an int or to an unsigned int implicitly. For example:
// Assume a char is 8 bit
unsigned char x = 255;
unsigned char one = 1;
int y = x + one; // result will be 256 (too large for a byte!)
++x; // x is now 0
what happened is that x and one in the first expression have been implicitly converted to integers, the addition has been computed and the result has been stored back in an integer. In other words the computation has NOT been performed using two unsigned chars.
Likewise if you have a float value in an expression the first thing the compiler will do is promoting it to a double (in other words float is a storage type and double is instead the natural size for floating point numbers). This is the reason for which if you use printf to print floats you don't need to say %lf int the format strings and %f is enough (%lf is needed for scanf however because that function stores a result and a float can be smaller than a double).
C++ complicated the matter quite a bit because when passing parameters to functions you can discriminate between ints and smaller types. Thus it's not ALWAYS true that a conversion is performed in every expression... for example you can have:
void foo(unsigned char x);
void foo(int x);
where
unsigned char x = 255, one = 1;
foo(x); // Calls foo(unsigned char), no promotion
foo(x + one); // Calls foo(int), promotion of both x and one to int
I tested larger numbers and operator << always gives me positive
numbers, while operator ~ always gives me negative numbers. I then
used sizeof() and found...
Wrong, test it:
uint8_t v = 1;
for (int i=0; i<32; i++) cout << (v<<i) << endl;
gives:
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
uint8_t is an 8-bit long unsigned integer type, which can represent values in the range [0,255], as that range in included in the range of int it is promoted to int (not unsigned int). Promotion to int has precedence over promotion to unsigned.
Look into two's complement and how computer stores negative integers.
Try this
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 1;
int shiftby=0;
shiftby=8*sizeof(int)-1;
std::cout << (x << shiftby) << '\n'; // or std::cout << (x << 31) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
The output is -2147483648
In general if the first bit of a signed number is 1 it is considered negative. when you take a large number and shift it. If you shift it so that the first bit is 1 it will be negative
** EDIT **
Well I can think of a reason why shift operators would use unsigned int. Consider right shift operation >> if you right shift -12 you will get 122 instead of -6. This is because it adds a zero in the beginning without considering the sign

Signed arithmetic

I'm running this piece of code, and I'm getting the output value as (converted to hex) 0xFFFFFF93 and 0xFFFFFF94.
#include <iostream>
using namespace std;
int main()
{
char x = 0x91;
char y = 0x02;
unsigned out;
out = x + y;
cout << out << endl;
out = x + y + 1;
cout << out << endl;
}
I'm confused about the arithmetic going on here. Is it because all the higher bits in out are taken to be 1 by default?
When I typecast out to an int, I get the answers as (in int) -109 and -108. Any idea why this is happening?
So there are a couple of things going on here. One, char can be either signed or unsigned, in your case it is signed. Two assignment will covert the right hand side to the type of the left hand side. Using the right warning flags would help, clang with the -Wconversion flags warns:
warning: implicit conversion changes signedness: 'int' to 'unsigned int' [-Wsign-conversion]
out = x + y;
~ ~~^~~
In this case to do this conversion it will basically add or subtract the unsigned max + 1 to the number to be converted.
We can see the same results using the limits header:
#include <limits>
//....
std::cout << std::hex << (std::numeric_limits<unsigned>::max() + 1) + (x+y) << std::endl ;
//...
and the result is:
ffffff93
For reference the draft C++ standard section 5.17 Assignment and compound assignment operators says:
If the left operand is not of class type, the expression is implicitly converted (Clause 4) to the cv-unqualified type of the left operand.
Clause 4 under 4.7 Integral conversions says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
which is equivalent to adding or subtracting UMAX + 1.
A plain char usually also represents a signed type! Since compatibility reasons with C syntax, this isn't specified further, and may be compiler implementation dependent. You always can make it distinct signed arithmetic behavior, by explicitly specifying the signed / unsigned keywords.
Try replacing your char definitions like this
unsigned char x = 0x91;
unsigned char y = 0x02;
to get the results you expect!
See the fully working sample here.
The negative numbers are represented internally as 2's complement and hence, their first bit is a 1. When you work in hex (and print in hex), the significant bits are displayed as 1's leading to numbers like you showed.
C++ doesn't specify whether char is signed or unsigned. Here they are signed, so when they are promoted to int's, the negative value is used which is then converted to unsigned. Use or cast to unsigned char.

what does it mean to bitwise left shift an unsigned char with 16

i am reading a .cpp file containing a unsigned char variable, it's trying the bitwise left shift 16 bits, since an unsigned char is composed of 8 bits, left shift 16 bits will erase all the bits and fill it with eight 0s.
unsigned char byte=0xff; byte << 16;
When you shift a value,
unsigned char x = ...;
int y = x << 16;
The type of x is promoted to int if unsigned char fits in an int (most systems), or to unsigned if unsigned char does not fit in an int (rare1). As long as your int is 25 bits wide or wider, then no data will be discarded2.
Note that this is completely unrelated to the fact that 16 has type int.
/* All three are exactly equivalent */
x << 16;
x << 16u;
x << (unsigned char) 16;
Source: from n1516 (C99 draft):
§6.5.7 paragraph 3: Bitwise Shift Operators
The integer promotions are performed on each of the operands. The type of the result is
that of the promoted left operand.
§6.3.1.1 paragraph 2: Boolean, characters, and integers
If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int; otherwise, it is converted to an unsigned
int. These are called the integer promotions.
Footnotes:
1: Some DSP chips as well as certain Cray supercomputers are known to have sizeof(char) == sizeof(int). This simplifies design of the processor's load-store unit at the cost of additional memory consumption.
2: If your left shift is promoted to int and then overflows the int, this is undefined behavior (demons may fly out your nose). By comparison, overflowing an unsigned is always well-defined, so bit shifts should usually be done on unsigned types.
If char fits inside int, it will be promoted to an int and the result will be as you expect. If that is not the case it is undefined behavior according to the standard, and will probably emit a compile warning. From the standard:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.