I need to convert a 24-bit integer (two's complement) to a 32-bit integer in C++. I have found a solution here, which is given as
int interpret24bitAsInt32(unsigned char* byteArray)
{
return (
(byteArray[0] << 24)
| (byteArray[1] << 16)
| (byteArray[2] << 8)
) >> 8;
}
Though I found that it works, I have the following concern about this piece of code.
byteArray[0] is only 8 bits wide, so how can operations like byteArray[0] << 24 be possible?
It would be possible if the compiler up-converts byteArray[0] to an integer and then does the operation. This may be the reason it is working now. But my question is whether this behaviour is guaranteed in all compilers and explicitly mentioned in the standard. It is not trivial to me, as we are not explicitly giving the compiler any clue that the target is a 32-bit integer!
Also, please let me know whether any improvement, such as vectorization, is possible to improve the speed (maybe using C++11), as I need to convert a huge amount of 24-bit data to 32-bit.
int32_t interpret24bitAsInt32(unsigned char* byteArray)
{
int32_t number =
(((int32_t)byteArray[0]) << 16)
| (((int32_t)byteArray[1]) << 8)
| byteArray[2];
if (number >= ((int32_t)1) << 23)
//return (uint32_t)number | 0xFF000000u;
return number - 16777216;
return number;
}
This function should do what you want without invoking undefined behavior by shifting a 1 into the sign bit of int.
The int32_t cast is only necessary if sizeof(int) < 4; otherwise the default integer promotion to int happens.
If someone does not like the if: it does not get translated to a conditional jump by the compiler (gcc 9.2): https://godbolt.org/z/JDnJM2
It leaves a cmovg.
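On the speed question, here is a minimal sketch of a batch conversion built on the same arithmetic as the function above. The names signExtend24 and convert24to32 are made up for this example, and it assumes the samples are packed back to back with the most significant byte first, as in the question. A plain loop like this at least gives the optimizer the chance to vectorize; whether it actually does has to be checked and measured for your compiler and target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assembles three bytes (most significant first)
// and subtracts 2^24 when bit 23, the 24-bit sign bit, is set.
std::int32_t signExtend24(const unsigned char* p)
{
    std::int32_t number = (std::int32_t(p[0]) << 16)
                        | (std::int32_t(p[1]) << 8)
                        |  std::int32_t(p[2]);
    if (number >= (std::int32_t(1) << 23))
        number -= (std::int32_t(1) << 24);
    return number;
}

// Hypothetical batch routine: src holds 3 * count bytes, dst receives count values.
void convert24to32(const unsigned char* src, std::int32_t* dst, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = signExtend24(src + 3 * i);
}
```

Call it as convert24to32(input, output, sampleCount), where input holds 3 * sampleCount bytes.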
[expr.shift]/1 The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand...
[conv.prom] 7.6 Integral promotions
1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
So yes, the standard requires that an argument of a shift operator, that has the type unsigned char, be promoted to int before the evaluation.
That said, the technique in your code relies on int a) being 32 bits large, and b) using two's-complement to represent negative values. Neither of which is guaranteed by the standard, though it's common with modern systems.
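If you decide to rely on those assumptions anyway, you can at least have the compiler check them. The following is only a sketch: the function name shiftTrickAssumptionsHold is invented for this example, and the two's-complement test is a common idiom rather than anything the question's code requires you to write this way.

```cpp
#include <climits>
#include <limits>

// Hypothetical compile-time guard for the assumptions discussed above.
constexpr bool shiftTrickAssumptionsHold()
{
    return CHAR_BIT == 8                           // unsigned char has exactly 8 bits
        && std::numeric_limits<int>::digits >= 31  // int has at least 31 value bits plus a sign bit
        && (-1 & 3) == 3;                          // negative values are stored in two's complement
}

static_assert(shiftTrickAssumptionsHold(),
              "interpret24bitAsInt32 assumes 8-bit bytes and a 32-bit two's-complement int");
```

On a platform where one of the assumptions does not hold, the build fails with the message instead of silently producing wrong values.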
A version without a branch, but with a multiplication:
int32_t interpret24bitAsInt32(unsigned char* bytes) {
unsigned char msb = UINT8_C(0xFF) * (bytes[0] >> UINT8_C(7));
uint32_t number =
(uint32_t(msb) << 24) // cast first so that the shift is done in uint32_t, not in int
| (uint32_t(bytes[0]) << 16)
| (uint32_t(bytes[1]) << 8)
| bytes[2];
return number;
}
You need to test if omitting the branch really gives you a performance advantage, though!
Adapted from older code of mine which did this for 10-bit numbers. Test before use!
Oh, and it still relies upon implementation-defined behaviour with regard to the conversion from uint32_t to int32_t. If you want to go down that rabbit hole, have fun, but be warned.
Or, much simpler: use the trick from mch's answer, and also use shifts instead of multiplication:
int32_t interpret24bitAsInt32(unsigned char* bytes) {
int32_t const number =
(bytes[0] << INT32_C(16))
| (bytes[1] << INT32_C(8))
| bytes[2];
int32_t const correction =
(bytes[0] >> UINT8_C(7)) << INT32_C(24);
return number - correction;
}
Test case
There is indeed integral promotion for types smaller than int in arithmetic operators.
So, assuming sizeof(char) < sizeof(int),
in
byteArray[0] << 24
byteArray[0] is promoted to int and the bit shift is performed on an int.
The first issue is that int may be only 16 bits wide.
The second issue (before C++20) is that int is signed, and a bitwise shift on a signed value can easily lead to implementation-defined or undefined behavior (and you have both for negative 24-bit numbers).
In C++20, the behavior of bitwise shifts has been simplified (it is now defined) and the problematic UB has been removed too.
The leading 1s of a negative number are kept in neg >> 8.
So before C++20, you have to do something like:
std::int32_t interpret24bitAsInt32(const unsigned char* byteArray)
{
const std::int32_t res =
(std::int32_t(byteArray[0]) << 16)
| (byteArray[1] << 8)
| byteArray[2];
const std::int32_t int24Max = (std::int32_t(1) << 23) - 1; // largest positive 24-bit value
return res <= int24Max ?
res : // Positive 24 bit numbers
res - (std::int32_t(1) << 24); // Negative number: subtract 2^24
}
Integral promotions [conv.prom] are performed on the operands of a shift expression [expr.shift]/1. In your case, that means that your values of type unsigned char will be converted to type int before << is applied [conv.prom]/1. Thus, the C++ standard guarantees that the operands be "up-converted".
However, the standard only guarantees that int has at least 16 bits. There is also no guarantee that unsigned char has exactly 8 bits (it may have more). Thus, it is not guaranteed that int is always large enough to represent the result of these left shifts. If int does not happen to be large enough, the resulting signed integer overflow will invoke undefined behavior [expr]/4. Chances are that int has 32 bits on your target platform and, thus, everything works out in the end.
If you need to work with a guaranteed, fixed number of bits, I would generally recommend using fixed-width integer types, for example:
std::int32_t interpret24bitAsInt32(const std::uint8_t* byteArray)
{
return
static_cast<std::int32_t>(
(std::uint32_t(byteArray[0]) << 24) |
(std::uint32_t(byteArray[1]) << 16) |
(std::uint32_t(byteArray[2]) << 8)
) >> 8;
}
Note that right shift of a negative value is currently implementation-defined [expr.shift]/3. Thus, it is not strictly guaranteed that this code will end up performing sign extension on a negative number. However, your compiler is required to document what exactly right-shifting a negative integer does [defns.impl.defined] (i.e., you can go and make sure it does what you need). And I have never heard of a compiler that does not implement right shift of a negative value as an arithmetic shift in practice. Also, it looks like C++20 is going to mandate arithmetic shift behavior…
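If you would rather not depend on what right-shifting a negative value does, a sketch that avoids it altogether is shown below. It uses the well-known XOR-and-subtract way of sign-extending from bit 23; this is a swapped-in technique, not the code from the question, and the name signExtend24Portable is made up for the example.

```cpp
#include <cstdint>

// Hypothetical alternative: no right shift of a negative value is involved.
std::int32_t signExtend24Portable(const std::uint8_t* byteArray)
{
    // Assemble the 24-bit pattern in an unsigned type; the value is in 0..0xFFFFFF.
    const std::uint32_t raw = (std::uint32_t(byteArray[0]) << 16)
                            | (std::uint32_t(byteArray[1]) << 8)
                            |  std::uint32_t(byteArray[2]);
    // The value fits into int32_t, so this conversion does not change it.
    const std::int32_t value = static_cast<std::int32_t>(raw);
    // Flipping bit 23 and subtracting 2^23 maps 0x800000..0xFFFFFF to -2^23..-1
    // and leaves 0..0x7FFFFF unchanged.
    return (value ^ 0x800000) - 0x800000;
}
```

All intermediate values stay within the range of a 32-bit signed integer, so no overflow or implementation-defined conversion is involved.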
I have a simple program. Notice that I use an unsigned fixed-width integer 1 byte in size.
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 12;
std::cout << (x << 1) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
My output is the following.
24
-13
I tested larger numbers and operator << always gives me positive numbers, while operator ~ always gives me negative numbers. I then used sizeof() and found...
When I use the left shift bitwise operator(<<), I receive an unsigned 4 byte integer.
When I use the bitwise not operator(~), I receive a signed 4 byte integer.
It seems that the bitwise not operator(~) does a signed integral promotion like the arithmetic operators do. However, the left shift operator(<<) seems to promote to an unsigned integral.
I feel obligated to know when the compiler is changing something behind my back. If I'm correct in my analysis, do all the bitwise operators promote to a 4 byte integer? And why are some signed and some unsigned? I'm so confused!
Edit: My assumption of always getting positive or always getting negative values was wrong. But from being wrong, I understand what was really happening thanks to the great answers below.
[expr.unary.op]
The operand of ~ shall have integral or unscoped enumeration type; the
result is the one’s complement of its operand. Integral promotions are
performed.
[expr.shift]
The shift operators << and >> group left-to-right. [...] The operands shall be of integral or unscoped enumeration type and integral promotions are performed.
What's the integral promotion of uint8_t (which is usually going to be unsigned char behind the scenes)?
[conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank (4.13) is less than the rank of
int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be
converted to a prvalue of type unsigned int.
So int, because all of the values of a uint8_t can be represented by int.
What is int(12) << 1 ? int(24).
What is ~int(12) ? int(-13).
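If you want the 8-bit result rather than the promoted int, you have to convert back explicitly. A small sketch, with function names invented for this example and assuming the usual two's-complement int:

```cpp
#include <cstdint>

// ~x is evaluated on the promoted int, so for x == 12 this returns -13.
int invertPromoted(std::uint8_t x)
{
    return ~x;
}

// Converting back to uint8_t keeps only the low 8 bits (the value modulo 256).
std::uint8_t invert8(std::uint8_t x)
{
    return static_cast<std::uint8_t>(~x);
}

// The same applies to shifts; n is assumed to be small (0..7 is the useful range here).
std::uint8_t shiftLeft8(std::uint8_t x, unsigned n)
{
    return static_cast<std::uint8_t>(x << n);
}
```

With x == 12, invertPromoted gives -13 while invert8 gives 243, which is the same bit pattern cut down to 8 bits.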
For performance reasons, the C and C++ languages consider int to be the "most natural" integer type, while types that are "smaller" than an int are considered a sort of "storage" type.
When you use a storage type in an expression it gets automatically converted to an int or to an unsigned int implicitly. For example:
// Assume a char is 8 bit
unsigned char x = 255;
unsigned char one = 1;
int y = x + one; // result will be 256 (too large for a byte!)
++x; // x is now 0
What happened is that x and one in the first expression have been implicitly converted to ints, the addition has been computed and the result has been stored in an int. In other words, the computation has NOT been performed using two unsigned chars.
Likewise, a float passed to a variadic function such as printf is first promoted to a double (in this sense float is a storage type and double is the natural size for floating-point numbers). This is the reason why, if you use printf to print floats, you don't need to say %lf in the format string and %f is enough (%lf is needed for scanf, however, because that function stores a result and a float can be smaller than a double).
C++ complicated the matter quite a bit because when passing parameters to functions you can discriminate between ints and smaller types. Thus it's not ALWAYS true that a conversion is performed in every expression... for example you can have:
void foo(unsigned char x);
void foo(int x);
where
unsigned char x = 255, one = 1;
foo(x); // Calls foo(unsigned char), no promotion
foo(x + one); // Calls foo(int), promotion of both x and one to int
I tested larger numbers and operator << always gives me positive
numbers, while operator ~ always gives me negative numbers. I then
used sizeof() and found...
Wrong, test it:
uint8_t v = 1;
for (int i=0; i<32; i++) cout << (v<<i) << endl;
gives:
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
uint8_t is an 8-bit unsigned integer type, which can represent values in the range [0,255]. As that range is included in the range of int, it is promoted to int (not unsigned int). Promotion to int has precedence over promotion to unsigned.
Look into two's complement and how computer stores negative integers.
Try this
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 1;
int shiftby=0;
shiftby=8*sizeof(int)-1;
std::cout << (x << shiftby) << '\n'; // or std::cout << (x << 31) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
The output is -2147483648
In general, if the first (most significant) bit of a signed number is 1, it is considered negative. When you take a number and shift it so that the first bit becomes 1, the result will be negative.
** EDIT **
Well, I can think of a reason why shift operators would use unsigned int. Consider the right shift operation >>: if you right-shift an 8-bit -12 (bit pattern 0xF4) by one as a logical shift, you will get 122 (0x7A) instead of -6. This is because it adds a zero at the beginning without considering the sign.
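The two kinds of right shift can be compared directly. This is just a sketch with made-up function names. Note that before C++20 the result of right-shifting a negative signed value is implementation-defined, although the compilers I know of perform an arithmetic shift.

```cpp
#include <cstdint>

// The operand is promoted to (signed) int and shifted there.
// On implementations with an arithmetic shift, -12 >> 1 is -6.
int shiftRightSigned(std::int8_t v)
{
    return v >> 1;
}

// Reinterpret the value as an 8-bit unsigned pattern first, then shift:
// -12 becomes 244 (0xF4), and 244 >> 1 is 122 (0x7A).
int shiftRightAsUnsigned8(std::int8_t v)
{
    return static_cast<std::uint8_t>(v) >> 1;
}
```

So the 122 only appears when the value is deliberately treated as an unsigned 8-bit pattern; the shift operator itself works on the promoted int.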
The following code has been compiled on 3 different compilers and 3 different processors and gave 2 different results:
typedef unsigned long int u32;
typedef signed long long s64;
int main ()
{ u32 Operand1,Operand2;
s64 Result;
Operand1=95;
Operand2=100;
Result= (s64)(Operand1-Operand2);}
Result produces 2 results:
either
-5 or 4294967291
I do understand that the operation (Operand1-Operand2) is done as a 32-bit unsigned calculation; then, when cast to s64, sign extension was done correctly in the first case but not in the second case.
My question is whether the sign extension can be controlled via compiler options, or whether it is compiler-dependent or maybe target-dependent.
Your issue is that you assume unsigned long int to be 32 bit wide and signed long long to be 64 bit wide. This assumption is wrong.
We can visualize what's going on by using types that have a guaranteed (by the standard) bit width:
#include <cstdint>
#include <iostream>
int main() {
{
uint32_t large = 100, small = 95;
int64_t result = (small - large);
std::cout << "32 and 64 bits: " << result << std::endl;
} // 4294967291
{
uint32_t large = 100, small = 95;
int32_t result = (small - large);
std::cout << "32 and 32 bits: " << result << std::endl;
} // -5
{
uint64_t large = 100, small = 95;
int64_t result = (small - large);
std::cout << "64 and 64 bits: " << result << std::endl;
} // -5
return 0;
}
In each of these three cases, the expression small - large yields a result of unsigned integer type (of the corresponding width). This result is calculated using modular arithmetic.
In the first case, because that unsigned result can be stored in the wider signed integer, no conversion of the value is performed.
In the other cases the result cannot be stored in the signed integer. Thus an implementation defined conversion is performed, which usually means interpreting the bit pattern of the unsigned value as signed value. Because the result is "large", the highest bits will be set, which when treated as signed value (under two's complement) is equivalent to a "small" negative value.
To highlight the comment from Lưu Vĩnh Phúc:
Operand1-Operand2 is unsigned therefore when casting to s64 it's always zero extension. [..]
An extension is only done in the first case, as only then is there a widening conversion, and it is indeed always zero extension.
Quotes from the standard, emphasis mine. Regarding small - large:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [..]
§ 4.7/2
Regarding the conversion from unsigned to signed:
If the destination type [of the integral conversion] is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
§ 4.7/3
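To make the two outcomes explicit regardless of how wide unsigned long happens to be, here is a small sketch with fixed-width types. The function names are invented for this example, and it assumes that int is no wider than 32 bits, so that uint32_t operands are not promoted to a signed type.

```cpp
#include <cstdint>

// The subtraction is done in uint32_t (modulo 2^32); the unsigned result
// is then widened, which never changes its value (zero extension).
std::int64_t differenceAsUnsigned(std::uint32_t a, std::uint32_t b)
{
    return a - b;
}

// Widening both operands to a signed 64-bit type first gives the
// mathematically expected difference.
std::int64_t differenceAsSigned(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::int64_t>(a) - static_cast<std::int64_t>(b);
}
```

For 95 and 100, the first function returns 4294967291 and the second returns -5, independent of compiler options.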
Sign extension is platform dependent, where platform is a combination of a compiler, target hardware architecture and operating system.
Moreover, as Paul R mentioned, width of built-in types (like unsigned long) is platform-dependent too. Use types from <cstdint> to get fixed-width types. Nevertheless, they are just platform-dependent definitions, so their sign extension behavior still depends on the platform.
Here is a good almost-duplicate question about type sizes. And here is a good table about type size relations.
Type promotions, and the corresponding sign-extensions are specified by the C++ language.
What's not specified, but is platform-dependent, is the range of integer types provided. It's even Standard-compliant for char, short int, int, long int and long long int all to have the same range, provided that range satisfies the C++ Standard requirements for long long int. On such a platform, no widening or narrowing would ever happen, but signed<->unsigned conversion could still alter values.
I faced an interesting scenario in which I got different results depending on the right operand type, and I can't really understand the reason for it.
Here is the minimal code:
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0x8123U;
uint64_t new_check = (check & 0xFFFF) << 16;
std::cout << std::hex << new_check << std::endl;
new_check = (check & 0xFFFFU) << 16;
std::cout << std::hex << new_check << std::endl;
return 0;
}
I compiled this code with g++ (gcc version 4.5.2) on Linux 64bit: g++ -std=c++0x -Wall example.cpp -o example
The output was:
ffffffff81230000
81230000
I can't really understand the reason for the output in the first case.
Why at some point would any of the temporary calculation results be promoted to a signed 64-bit value (int64_t), resulting in the sign extension?
I would accept a result of '0' in both cases if a 16bit value is shifted 16 bits left in the first place and then promoted to a 64bit value. I also do accept the second output if the compiler first promotes the check to uint64_t and then performs the other operations.
But how come & with 0xFFFF (int32_t) vs. 0xFFFFU (uint32_t) would result in those two different outputs?
That's indeed an interesting corner case. It only occurs here because you use uint16_t for the unsigned type when your architecture uses 32 bits for int.
Here is an extract from Clause 5 Expressions from draft n4296 for C++14 (emphasis mine):
10 Many binary operators that expect operands of arithmetic or enumeration type cause conversions ...
This pattern is called the usual arithmetic conversions, which are defined as follows:
...(10.5.3) — Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
(10.5.4) — Otherwise, if the type of the operand with signed integer type can represent all of the values of
the type of the operand with unsigned integer type, the operand with unsigned integer type shall
be converted to the type of the operand with signed integer type.
You are in the 10.5.4 case:
uint16_t is only 16 bits while int is 32
int can represent all the values of uint16_t
So the uint16_t check = 0x8123U operand is converted to the signed 0x8123 and result of the bitwise & is still 0x8123.
But the shift (bitwise so it happens at the representation level) causes the result to be the intermediate unsigned 0x81230000 which converted to an int gives a negative value (technically it is implementation defined, but this conversion is a common usage)
5.8 Shift operators [expr.shift]...Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value;...
and
4.7 Integral conversions [conv.integral]...
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
(beware this was true undefined behaviour in C++11...)
So you end with a conversion of the signed int 0x81230000 to an uint64_t which as expected gives 0xFFFFFFFF81230000, because
4.7 Integral conversions [conv.integral]...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
TL;DR: What causes the result is the conversion of a signed 32-bit int to an unsigned 64-bit integer. The only problematic part is the shift that overflows into the sign bit: it was undefined behaviour in C++11 and is implementation-defined in the C++14 standard, but all common implementations produce the same result.
Of course, if you force the second operand to be unsigned everything is unsigned and you get evidently the correct 0x81230000 result.
[EDIT] As explained by MSalters, the result of the shift is only implementation defined since C++14, but was indeed undefined behaviour in C++11. The shift operator paragraph said:
...Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable
in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Let's take a look at
uint64_t new_check = (check & 0xFFFF) << 16;
Here, 0xFFFF is a signed constant, so (check & 0xFFFF) gives us a signed integer by the rules of integer promotion.
In your case, with 32-bit int type, the MSbit for this integer after the left shift is 1, and so the extension to 64-bit unsigned will do a sign extension, filling the bits to the left with 1's. Interpreted as a two's complement representation that gives the same negative value.
In the second case, 0xFFFFU is unsigned, so we get unsigned integers and the left shift operator works as expected.
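Another way to stay out of trouble, sketched below, is to convert to the wide unsigned type before doing anything else, so that neither the & nor the << is ever evaluated in int. The function name shiftInUint64 is made up for this example.

```cpp
#include <cstdint>

// Hypothetical helper: convert first, so that & and << are both
// evaluated in uint64_t and no signed intermediate value exists.
std::uint64_t shiftInUint64(std::uint16_t check)
{
    return (static_cast<std::uint64_t>(check) & 0xFFFF) << 16;
}
```

With check == 0x8123 this yields 0x81230000, whether the mask is written as 0xFFFF or 0xFFFFU.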
If your toolchain supports __PRETTY_FUNCTION__, a most-handy feature, you can quickly determine how the compiler perceives expression types:
#include <iostream>
#include <cstdint>
template<typename T>
void typecheck(T const& t)
{
std::cout << __PRETTY_FUNCTION__ << '\n';
std::cout << t << '\n';
}
int main()
{
uint16_t check = 0x8123U;
typecheck(0xFFFF);
typecheck(check & 0xFFFF);
typecheck((check & 0xFFFF) << 16);
typecheck(0xFFFFU);
typecheck(check & 0xFFFFU);
typecheck((check & 0xFFFFU) << 16);
return 0;
}
Output
void typecheck(const T &) [T = int]
65535
void typecheck(const T &) [T = int]
33059
void typecheck(const T &) [T = int]
-2128412672
void typecheck(const T &) [T = unsigned int]
65535
void typecheck(const T &) [T = unsigned int]
33059
void typecheck(const T &) [T = unsigned int]
2166554624
The first thing to realize is that binary operators like a&b for built-in types only work if both sides have the same type. (With user-defined types and overloads, anything goes). This might be realized via implicit conversions.
Now, in your case, there definitely is such a conversion, because there simply isn't a binary operator & that takes a type smaller than int. Both sides are converted to at least int size, but what exact types?
As it happens, on your GCC int is indeed 32 bits. This is important, because it means that all values of uint16_t can be represented as an int. There is no overflow.
Hence, check & 0xFFFF is a simple case. The right side is already an int, the left side promotes to int, so the result is int(0x8123). This is perfectly fine.
Now, the next operation is 0x8123 << 16. Remember, on your system int is 32 bits, and INT_MAX is 0x7FFF'FFFF. In the absence of overflow, 0x8123 << 16 would be 0x81230000, but that clearly is bigger than INT_MAX so there is in fact overflow.
Signed integer overflow in C++11 is Undefined Behavior. Literally any outcome is correct, including purple or no output at all. At least you got a numerical value, but GCC is known to outright eliminate code paths which unavoidably cause overflow.
[edit]
Newer GCC versions support C++14, where this particular form of overflow has become implementation-defined - see Serge's answer.
0xFFFF is a signed int. So after the & operation, we have a 32-bit signed value:
#include <stdint.h>
#include <type_traits>
uint64_t foo(uint16_t a) {
auto x = (a & 0xFFFF);
static_assert(std::is_same<int32_t, decltype(x)>::value, "not an int32_t"); // passes where int32_t is int
static_assert(std::is_same<uint16_t, decltype(x)>::value, "not a uint16_t"); // fails on purpose: x is not a uint16_t
return x;
}
http://ideone.com/tEQmbP
Your original 16 bits are then left-shifted which results in 32-bit value with the high-bit set (0x80000000U) so it has a negative value. During the 64-bit conversion sign-extension occurs, populating the upper words with 1s.
This is the result of integer promotion. Before the & operation happens, if the operands are "smaller" than an int (for that architecture), the compiler will promote them to int, as long as all their values fit into a signed int.
This means that the first expression will be equivalent to (on a platform with 32-bit int):
// check is uint16_t, but it fits into int32_t.
// the constant 0xFFFF already has type int (value 65535), so the result is an int
((int32_t)check & (int32_t)0x0000FFFF)
while in the other one the constant is unsigned, so the promoted check is converted to unsigned as well:
// check is uint16_t and is first promoted to int.
// the constant is unsigned, so both operands end up as uint32_t
((uint32_t)check & (uint32_t)0x0000FFFFU)
If you explicitly cast check to an unsigned int, then the result will be the same in both cases (unsigned * signed will result in unsigned):
((uint32_t)check & 0xFFFF) << 16
will be equal to:
((uint32_t)check & 0xFFFFU) << 16
Your platform has 32-bit int.
Your code is exactly equivalent to
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0x8123U;
auto a1 = (check & 0xFFFF) << 16;
uint64_t new_check = a1;
std::cout << std::hex << new_check << std::endl;
auto a2 = (check & 0xFFFFU) << 16;
new_check = a2;
std::cout << std::hex << new_check << std::endl;
return 0;
}
What's the type of a1 and a2?
For a2, the result is promoted to unsigned int.
More interestingly, for a1 the result is promoted to int, and then it gets sign-extended as it's widened to uint64_t.
Here's a shorter demonstration, in decimal so that the difference between signed and unsigned types is apparent:
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0;
std::cout << check
<< " " << (int)(check + 0x80000000)
<< " " << (uint64_t)(int)(check + 0x80000000) << std::endl;
return 0;
}
On my system (also 32-bit int), I get
0 -2147483648 18446744071562067968
showing where the promotion and sign-extension happens.
The & operation has two operands. The first is an unsigned short, which will undergo the usual promotions to become an int. The second is a constant, in one case of type int, in the other case of type unsigned int. The result of the & is therefore int in one case, unsigned int in the other case. That value is shifted to the left, resulting either in an int with the sign bit set, or an unsigned int. Casting a negative int to uint64_t will give a large positive integer.
Of course you should always follow the rule: If you do something, and you don't understand the result, then don't do that!
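The conversion step on its own can be shown in isolation. In this sketch (function names invented for the example), the same negative 32-bit value is widened once directly and once via uint32_t; both conversions are fully defined, they just give different results.

```cpp
#include <cstdint>

// Direct conversion: the value is taken modulo 2^64, so a negative
// int32_t ends up with all upper 32 bits set.
std::uint64_t widenDirect(std::int32_t v)
{
    return static_cast<std::uint64_t>(v);
}

// Going through uint32_t first (modulo 2^32) keeps only the 32-bit
// pattern, and the widening afterwards adds zeros.
std::uint64_t widenViaUint32(std::int32_t v)
{
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(v));
}
```

For -2128412672, the int value with bit pattern 0x81230000 from the question, the first function gives 0xffffffff81230000 and the second gives 0x81230000.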
In the code below, why is the 1-byte anUChar automatically converted to 4 bytes to produce the desired result 0x300 (instead of 0x0, as it would be if anUChar remained 1 byte in size):
unsigned char anUChar = 0xc0; // only the two most significant bits are set
int anInt = anUChar << 2; // 0x300 (correct)
But in this code, aimed at a 64-bit result, no automatic conversion into 8 bytes happens:
unsigned int anUInt = 0xc0000000; // only the two most significant bits are set
long long aLongLong = anUInt << 2; // 0x0 (wrong, means no conversion occurred)
And only placing an explicit type cast works:
unsigned int anUInt = 0xc0000000;
long long aLongLong = (long long)anUInt << 2; // 0x300000000 (correct)
And most importantly, would this behavior be the same in a program that targets 64-bit machines?
By the way, which of the two is most right and portable: (type)var << 1 or ((type)var) << 1?
char always gets promoted to int during arithmetic. I think this is specified behavior in the C standard.
However, int is not automatically promoted to long long.
Under some situations, some compilers (Visual Studio) will actually warn you about this if you try to left-shift a smaller integer and store it into a larger one.
By the way, which of the two is most right and portable: (type)var <<
1 or ((type)var) << 1?
Both are fine and portable. Though I prefer the first one since it's shorter. Casting has higher precedence than shift.
Conversion does happen. The problem is the result of the expression anUInt << 2 is an unsigned int because anUInt is an unsigned int.
Casting anUInt to a long long (actually, this is conversion in this particular case) is the correct thing to do.
Neither (type)var << 1 nor ((type)var) << 1 is more correct or portable, because operator precedence is strictly defined by the Standard. However, the latter is probably better because it's easier to understand for humans looking at the code casually. Others may disagree with this assertion.
EDIT:
Note that in your first example:
unsigned char anUChar = 0xc0;
int anInt = anUChar << 2;
...the result of the expression anUChar << 2 is not an unsigned char as you might expect, but an int because of Integral Promotion.
The operands of operator<< are of integral or enumeration type (see Standard 5.8/1). When a binary operator that expects operands of arithmetic or enumeration type is called, the compiler attempts to convert both operands to the same type, so that the expression may yield a common type. In this case, integral promotion is performed on both operands (5/9). When an unsigned char takes part in integral promotion, it will be converted to an int if your platform can accommodate all possible values of unsigned char in an int, else it will be converted to an unsigned int (4.5/1).
Shorter integral types are promoted to an int type for bitshift operations. This has nothing to do with the type to which you assign the result of the shift.
On 64-bit machines, your second piece of code would be equally problematic since the int types are usually also 32 bit wide. (Between x86 and x64, long long int is typically always 64 and int 32 bits, only long int depends on the platform.)
(In the spirit of C++, I would write the conversion as (unsigned long long int)(anUInt) << 2, evocative of the conversion-constructor syntax. The first set of parentheses is purely because the type name consists of several tokens.)
I would also prefer to do bitshifting exclusively on unsigned types, because only unsigned types can be considered equivalent (in terms of values) to their own bit pattern value.
Because of integer promotions. For most operators (e.g. <<), char operands are promoted to int first.
This has nothing to do with where the result of the calculation is going. In other words, the fact that your second example assigns to a long long does not affect the promotion of the operands.
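Put into two small functions, the difference looks like this. The names are made up for this example, and the comments assume a 32-bit unsigned int, as on the platforms discussed above.

```cpp
// The shift is evaluated in unsigned int; bits shifted past bit 31 are
// lost before the result is converted to the wider return type.
unsigned long long shiftThenWiden(unsigned int value)
{
    return value << 2;
}

// Converting the operand first makes the shift itself happen in
// unsigned long long, so no bits are lost.
unsigned long long widenThenShift(unsigned int value)
{
    return static_cast<unsigned long long>(value) << 2;
}
```

For 0xc0000000, shiftThenWiden returns 0 and widenThenShift returns 0x300000000, whatever type the result is later assigned to.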