Bitshift - Need explanation to understand the code

I was wondering what this function actually performs.
To my understanding it should return pSrc[1].
So why does it bother left-shifting pSrc[0] by 8 bits, which zeroes out those 8 bits?
And when these zeroes are ORed with pSrc[1], pSrc[1] is not affected so you get pSrc[1] anyway as if the bitwise OR had never happened.
/*
* Get 2 big-endian bytes.
*/
INLINE u2 get2BE(unsigned char const* pSrc)
{
return (pSrc[0] << 8) | pSrc[1];
}
This function is from the source code of the dalvik virtual machine.
https://android.googlesource.com/platform/dalvik/+/android-4.4.4_r1/vm/Bits.h
Update:
OK, now I got it thanks to all the answers here.
(1) pSrc[0] is originally an unsigned char (1 byte).
(2) When it is left-shifted (pSrc[0] << 8), pSrc[0] is int-promoted to a signed int (4 bytes). This happens to any operand narrower than int, regardless of the type of the literal 8.
(3) The result of pSrc[0] << 8 is that the 8 bits of interest in pSrc[0] are shifted over to the second byte of the 4 bytes of the signed int, leaving zeroes in the other bytes (1st, 3rd and 4th bytes).
(4) And when it is ORed ( intermediate result from step (3) | pSrc[1]), pSrc[1] is then int-promoted to a signed int (4 bytes).
(5) The result of ( intermediate result from step (3) | pSrc[1]) leaves the first two least significant bytes the way we want with zeroes all in the two most significant bytes.
(6) return only the first two least significant bytes to get the 2 big-endian bytes by returning the result as a u2 type.
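
The six steps above can be traced in a short sketch. It assumes that u2 is a 16-bit unsigned type (std::uint16_t here), and the helper names and the sample bytes are made up for illustration; it is not the Dalvik source itself.

```cpp
#include <cstdint>

// Assumption: u2 is a 16-bit unsigned integer, as its name suggests.
using u2 = std::uint16_t;

// Steps (1)-(3): the byte is promoted to int, then shifted into bits 8..15.
constexpr int high_part(const unsigned char* pSrc) { return pSrc[0] << 8; }

// Steps (4)-(5): the second byte is promoted to int and ORed into bits 0..7.
constexpr int combined(const unsigned char* pSrc) { return high_part(pSrc) | pSrc[1]; }

// Step (6): the int result is converted to the 16-bit return type.
constexpr u2 get2BE(const unsigned char* pSrc) { return static_cast<u2>(combined(pSrc)); }

// Hypothetical sample input: the big-endian encoding of 0x1234.
constexpr unsigned char kSample[] = {0x12, 0x34};

static_assert(high_part(kSample) == 0x1200, "first byte lands in bits 8..15");
static_assert(get2BE(kSample) == 0x1234, "both bytes are preserved");
```

The shift does not discard pSrc[0], because it happens in the wider int, not in the original 8-bit byte.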

For arithmetic operations like this, the unsigned char is converted via a process called integral promotions.
C++11 - N3485 §5.8 [expr.shift]/1:
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand.
And §13.6 [over.built]/17:
For every pair of promoted integral types L and R, there exist candidate operator functions of the form
LR operator%(L , R );
LR operator&(L , R );
LR operator^(L , R );
LR operator|(L , R );
L operator<<(L , R );
L operator>>(L , R );
where LR is the result of the usual arithmetic conversions between types L and R.
When integral promotions are done (§4.5 [conv.prom]/1):
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion
rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned
int.
By integral promotions, the unsigned char will be promoted to int. The other operand is already int, so no changes in type are made to it. The return type then becomes int as well.
Thus, what you have is the first unsigned char's bits shifted left, but still in the now-bigger int, and then the second unsigned char's bits at the end.
You'll notice that the return type of operator| is the result of usual arithmetic conversions between the two operands. At this point, those are the int from the shift and the second unsigned char.
This conversion is defined as follows (§5 [expr]/10):
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
…
Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following
rules shall be applied to the promoted operands:
…
If both operands have the same type, no further conversion is needed.
Since L and R, being promoted before this, are already int, the promotion leaves them the same and the overall return type of the expression is thus int, which is then converted to u2, whatever that happens to be.
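
The types discussed above can be checked by letting the compiler deduce them. This is a minimal sketch; the alias names are made up, and it assumes int can represent every unsigned char value, which holds on mainstream platforms.

```cpp
#include <type_traits>
#include <utility>

// Types of the subexpressions in (pSrc[0] << 8) | pSrc[1], deduced by the compiler.
using Byte = unsigned char;
using ShiftResult = decltype(std::declval<Byte>() << 8);
using OrResult = decltype((std::declval<Byte>() << 8) | std::declval<Byte>());

// Assumption: int can represent every unsigned char value.
static_assert(std::is_same<ShiftResult, int>::value,
              "the shift yields the type of the promoted left operand");
static_assert(std::is_same<OrResult, int>::value,
              "int | promoted unsigned char is int");
```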

There are no operations (other than type conversions) on
unsigned char. Before any operation, integral promotion
occurs, which converts the unsigned char to an int. So the
operation is shifting an int left, not an unsigned char.

C11 6.5.7 Bitwise shift operators
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of
the right operand is negative or is greater than or equal to the
width of the promoted left operand, the behavior is undefined.
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with
zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo
one more than the maximum value representable in the result type. If E1 has a signed
type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined.
So pSrc[0] is integer promoted to an int. The literal 8 is already an int, so no integer promotion takes place. The usual arithmetic conversions do not apply to shift operators: they are a special case.
Since the original variable was an unsigned char that gets left-shifted by 8 bits, we also hit the case where "E1" (our promoted variable) is signed and the result may not be representable in the result type. On a system with a 16-bit int, that happens whenever pSrc[0] is 0x80 or larger, and the behavior is then undefined.
In plain English: if you shift something into the sign bit of a signed variable, anything can happen. In general: relying on implicit type promotions is bad programming and dangerous practice.
You should fix the code to this:
((unsigned int)pSrc[0] << 8) | (unsigned int)pSrc[1]
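
Wrapped into a complete function, the suggested fix might look like the sketch below. The name get2BE_unsigned, the assumption that u2 is std::uint16_t, and the sample bytes are all illustrative.

```cpp
#include <cstdint>

// Assumption: u2 is a 16-bit unsigned integer.
using u2 = std::uint16_t;

// Both bytes are converted to unsigned int before shifting,
// so no bit can ever be shifted into a sign bit.
constexpr u2 get2BE_unsigned(const unsigned char* pSrc) {
    return static_cast<u2>((static_cast<unsigned int>(pSrc[0]) << 8) |
                           static_cast<unsigned int>(pSrc[1]));
}

// Hypothetical input whose first byte has its top bit set.
constexpr unsigned char kHighBytes[] = {0xFF, 0xFE};

static_assert(get2BE_unsigned(kHighBytes) == 0xFFFE,
              "the top bit of the first byte survives");
```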

Related

unsigned int / signed int / long long: inexplicable output

So I was playing around with types and I came up with the weird result below. Debugging it made no sense, and the only remaining option was to check the C++ specs, which didn't help much. I was wondering if you might know what is happening here exactly, and whether it is a 32-bit and/or 64-bit specific issue.
#include <iostream>
using namespace std;
int main() {
unsigned int u = 1;
signed int i = 1;
long long lu = -1 * u;
long long li = -1 * i;
std::cout<<"this is a weird " << lu << " " << li << std::endl;
return 0;
}
Where the output is
this is a weird 4294967295 -1

The key observation is that the expression -1 * u is of type unsigned int. That is because the rules for arithmetic conversions* say that if one operand is unsigned int and the other is signed int, then the latter operand is converted to unsigned int. The arithmetic expressions are ultimately only defined for homogeneous operands, so the conversions happen before the operation proper.
The result of the conversion of -1 to unsigned int is a large, positive number, which is representable as a long long int, and which is the number you see in the output.
* Currently, that's [expr]/(11.5.3).
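
A minimal sketch of this, with made-up function names, assuming long long is wider than unsigned int:

```cpp
#include <climits>
#include <type_traits>

// The product of an int and an unsigned int has type unsigned int.
static_assert(std::is_same<decltype(-1 * 1u), unsigned int>::value,
              "int * unsigned int is unsigned int");

// Mirrors the question: the multiplication wraps before the result is widened.
constexpr long long negate_after_wrap(unsigned int u) { return -1 * u; }

// Widening the operand first keeps the whole calculation signed.
constexpr long long negate_widened(unsigned int u) {
    return -1 * static_cast<long long>(u);
}

// Assumption: long long is wider than unsigned int.
static_assert(negate_after_wrap(1u) == static_cast<long long>(UINT_MAX),
              "the product wraps to UINT_MAX");
static_assert(negate_widened(1u) == -1, "signed arithmetic gives -1");
```

Casting u to long long before multiplying is one way to get -1 for lu as well.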

The type of -1 is signed int. When you perform an arithmetic operation between objects of different fundamental type, one or both of the arguments will be converted so that both have the same type. (For non-fundamental types, there may be operator overloads for mixed operands). In this case, the signed value is converted to unsigned, following the conversion rules †.
So, -1 was converted to unsigned. But negative numbers cannot be represented by unsigned types. What happens is that the resulting value is the smallest non-negative value representable by the unsigned type that is congruent to the original signed value modulo 2ⁿ, where n is the number of bits in the unsigned type (that is, modulo one more than the maximum representable value). On your platform that happens to be 4294967295.
†The rules ([expr], standard draft):
... rules that apply to non-integers ...
Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following
rules shall be applied to the promoted operands:
— If both operands have the same type, no further conversion is needed.
— Otherwise, if both operands have signed integer types or both have unsigned integer types, the
operand with the type of lesser integer conversion rank shall be converted to the type of the
operand with greater rank.
— Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type. (this applies to your case)
— Otherwise, if the type of the operand with signed integer type can represent all of the values of
the type of the operand with unsigned integer type, the operand with unsigned integer type shall
be converted to the type of the operand with signed integer type.
— Otherwise, both operands shall be converted to the unsigned integer type corresponding to the
type of the operand with signed integer type.

The evaluation of
-1 * i
is trivial multiplication of two int types: nothing strange there. And a long long must be capable of holding any int.
First, note that there is no such thing as a negative literal in C++, so
-1 * u
is evaluated as (-1) * u due to operator precedence. The type of (-1) must be int. But this will be converted to unsigned int by the usual arithmetic conversions, as the other operand is an unsigned int. In doing that it is converted modulo UINT_MAX + 1, so you end up with UINT_MAX multiplied by 1, which is the number you observe, albeit converted to a long long type.
As a final note, the behaviour of this conversion is subject to the rules of conversion from an unsigned to a signed type: if unsigned int and long long were both 64 bits on your platform then the behaviour is implementation-defined.

The bit pattern "0xFFFFFFFF" corresponds with "-1" when interpreted as a 32b signed integer and corresponds with "4294967295" when interpreted as a 32b unsigned integer.
If -2 is used, the result is "4294967294"
If -3 is used, the result is "4294967293"
If -4 is used, the result is "4294967292"
....
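
The pattern can be written without assuming a particular width. In this sketch (to_unsigned is a made-up helper), each negative value maps to UINT_MAX + 1 plus that value; with a 32-bit unsigned int these are the numbers listed above.

```cpp
#include <climits>

// Converting a negative int to unsigned int adds UINT_MAX + 1 (that is, 2^n) to it.
constexpr unsigned int to_unsigned(int value) {
    return static_cast<unsigned int>(value);
}

static_assert(to_unsigned(-1) == UINT_MAX, "-1 maps to the largest value");
static_assert(to_unsigned(-2) == UINT_MAX - 1, "-2 maps to one below it");
static_assert(to_unsigned(-4) == UINT_MAX - 3, "-4 maps to three below it");
```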

Subtraction between signed and unsigned followed by division

The following results make me really confused:
int i1 = 20-80u; // -60
int i2 = 20-80; // -60
int i3 =(20-80u)/2; // 2147483618
int i4 =(20-80)/2; // -30
int i5 =i1/2; // -30
i3 seems to be computed as (20u-80u)/2, instead of (20-80u)/2
supposedly i3 is the same as i5.

IIRC, an arithmetic operation between signed and unsigned int will produce an unsigned result.
Thus, 20 - 80u produces the unsigned result equivalent to -60: if unsigned int is a 32-bit type, that result is 4294967236.
Incidentally, assigning that to i1 produces an implementation-defined result because the number is too large to fit. Getting -60 is typical, but not guaranteed.

int i1 = 20-80u; // -60
This has subtle demons! The operands have different types, so a conversion is necessary. Both operands are converted to a common type (an unsigned int, in this case). The result, which will be a large unsigned int value (60 less than UINT_MAX + 1 if my calculations are correct), will be converted to an int before it's stored in i1. Since that value is out of range for int, the result is implementation-defined (in C, an implementation-defined signal may be raised instead). However, in your case it happens to convert to -60.
int i3 =(20-80u)/2; // 2147483618
Continuing on from the first example, my guess was that the result of 20-80u would be 60 less than UINT_MAX + 1. If UINT_MAX is 4294967295 (a common value for UINT_MAX), that would mean 20-80u is 4294967236... and 4294967236 / 2 is 2147483618.
As for i2 and the others, there should be no surprises. They follow conventional mathematical calculations with no conversions, truncations, overflows or other implementation-defined behaviour what-so-ever.
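
The arithmetic above can be checked with a small sketch. The function names are made up, and the last assertion only applies where unsigned int is 32 bits wide.

```cpp
#include <climits>

// a is converted to unsigned int, so the subtraction wraps modulo UINT_MAX + 1.
constexpr unsigned int wrapped_difference(int a, unsigned int b) { return a - b; }

// Dividing the wrapped value: this is what the initializer of i3 computes.
constexpr unsigned int wrapped_half(int a, unsigned int b) { return (a - b) / 2; }

// Converting b to int first keeps the arithmetic signed.
constexpr int signed_half(int a, unsigned int b) {
    return (a - static_cast<int>(b)) / 2;
}

static_assert(wrapped_difference(20, 80u) == UINT_MAX - 59u, "60 less than UINT_MAX + 1");
static_assert(wrapped_half(20, 80u) == (UINT_MAX - 59u) / 2, "half of the wrapped value");
static_assert(signed_half(20, 80u) == -30, "signed arithmetic gives -30");

// Only meaningful where unsigned int is 32 bits wide.
static_assert(UINT_MAX != 4294967295u || wrapped_half(20, 80u) == 2147483618u,
              "matches the value seen for i3");
```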

The binary arithmetic operators will perform the usual arithmetic conversions on their operands to bring them to a common type.
In the case of i1 and i3 the common type will be unsigned int and so the result will also be unsigned int (i5 divides two ints, so no conversion is involved there). Unsigned numbers wrap via modulo arithmetic, so subtracting a slightly larger unsigned value results in a number close to the maximum unsigned int, which cannot be represented by an int.
So in the case of i1 we end up with an implementation-defined conversion since the value cannot be represented. In the case of i3, dividing by 2 brings the unsigned value back into the range of int and so we end up with a large signed int value after conversion.
The relevant sections from the C++ draft standard are as follows. Section 5.7 [expr.add]:
The additive operators + and - group left-to-right. The usual arithmetic conversions are performed for
operands of arithmetic or enumeration type.
The usual arithmetic conversions are covered in section 5 and it says:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
[...]
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
and for the conversion from a value that can not be represented for a signed type, section 4.7 [conv.integral]:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
and for unsigned integers obeys modulo arithmetic section 3.9.1 [basic.fundamental]:
Unsigned integers shall obey the laws of arithmetic modulo 2ⁿ where n is the number of bits in the value
representation of that particular size of integer.

Why is static_cast on an expression acting distributively?

I need to take 2 unsigned 8-bit values and subtract them, then add this value to a 32-bit accumulator. The 8-bit subtraction may underflow, and that's ok (unsigned int underflow is defined behavior, so no problems there).
I would expect that static_cast<uint32_t>(foo - bar) should do what I want (where foo and bar are both uint8_t). But it would appear that this casts them first and then performs a 32-bit subtraction, whereas I need it to underflow as an 8-bit variable. I know I could just mod 256, but I'm trying to figure out why it works this way.
Example here: https://ideone.com/TwOmTO
uint8_t foo = 5;
uint8_t bar = 250;
uint8_t diff8bit = foo - bar;
uint32_t diff1 = static_cast<uint32_t>(diff8bit);
uint32_t diff2 = static_cast<uint32_t>(foo) - static_cast<uint32_t>(bar);
uint32_t diff3 = static_cast<uint32_t>(foo - bar);
printf("diff1 = %u\n", diff1);
printf("diff2 = %u\n", diff2);
printf("diff3 = %u\n", diff3);
Output:
diff1 = 11
diff2 = 4294967051
diff3 = 4294967051
I would expect diff3 to have the same behavior as diff1, but it's actually the same as diff2.
So why does this happen? As far as I can tell the compiler should be subtracting the two 8-bit values and then casting to 32-bit, but that's clearly not the case. Is this something to do with the specification of how static_cast behaves on an expression?

For most of the arithmetic operators (including -), the operands undergo the usual arithmetic conversions. One of these conversions is that any value of type narrower than int is promoted to int. (Standard reference: [expr]/10).
So the expression foo - bar becomes (int)foo - (int)bar giving (int)-245. Then you cast that to uint32_t which will give a large positive number.
To get the result you intend, cast to uint8_t instead of uint32_t. Alternatively, use the modulus operator % on the result of the cast to uint32_t.
It is not possible to do a calculation directly in narrower precision than int.
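
A small sketch of the two casts, using the values from the question (the function names are made up):

```cpp
#include <cstdint>
#include <type_traits>

// The subtraction is computed in int, because both uint8_t operands are promoted.
static_assert(std::is_same<decltype(std::uint8_t{} - std::uint8_t{}), int>::value,
              "uint8_t - uint8_t has type int");

// Narrow to 8 bits first, then widen: wraps modulo 256.
constexpr std::uint32_t diff_wrap8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a - b);
}

// Convert the int result directly: wraps modulo 2^32 instead.
constexpr std::uint32_t diff_wrap32(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint32_t>(a - b);
}

static_assert(diff_wrap8(5, 250) == 11u, "same as diff1");
static_assert(diff_wrap32(5, 250) == 4294967051u, "same as diff2 and diff3");
```

For the accumulator in the question, adding diff_wrap8(foo, bar) would give the 8-bit wraparound.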

The issue is not the static_cast but the subtraction. The operands of additive operators have the usual arithmetic conversions applied to them, which in this case means the integral promotions, so both operands of the subtraction are promoted to int:
static_cast<uint32_t>(foo - bar);
                      ^^^   ^^^
On the other hand:
static_cast<uint8_t>(foo - bar);
would produce the desired result.
The draft C++ standard, section 5.7 [expr.add], says:
The additive operators + and - group left-to-right. The usual arithmetic conversions are performed for
operands of arithmetic or enumeration type.
this results in the integral promotions, section 5 [expr] says:
Otherwise, the integral promotions (4.5) shall be performed on both operands
which results in both operands being converted to int, section 4.5 [conv.prom] says:
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion
rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned
int.
and then the static_cast to uint32_t is applied which results in a conversion which is defined as follows in section 4.7 [conv.integral]:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2ⁿ where n is the number of bits used to represent the unsigned type).
The question Why must a short be converted to an int before arithmetic operations in C and C++? explains why types smaller than int are promoted for arithmetic operations.

unary minus for 0x80000000 (signed and unsigned)

The n3337.pdf draft, 5.3.1.8, states that:
The operand of the unary - operator shall have arithmetic or unscoped enumeration type and the result is the negation of its operand. Integral promotion is performed on integral or enumeration operands. The negative of an unsigned quantity is computed by subtracting its value from 2ⁿ, where n is the number of bits in the promoted operand. The type of the result is the type of the promoted operand.
For some cases it is enough. Suppose unsigned int is 32 bits wide, then (-(0x80000000u)) == 0x80000000u, isn't it?
Still, I can not find anything about unary minus on unsigned 0x80000000. Also, C99 standard draft n1336.pdf, 6.5.3.3 seems to say nothing about it:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
UPDATE2: Let us suppose that unsigned int is 32 bits wide. So, the question is: what about unary minus in C (signed and unsigned), and unary minus in C++ (signed only)?
UPDATE1: both run-time behavior and compile-time behavior (i.e. constant-folding) are interesting.
(related: Why is abs(0x80000000) == 0x80000000?)

For your question, the important part of the quote you've included is this:
The negative of an unsigned quantity is computed by subtracting its
value from 2ⁿ, where n is the number of bits in the promoted operand.
So, to know what the value of -0x80000000u is, we need to know n, the number of bits in the type of 0x80000000u. This is at least 32, but this is all we know (without further information about the sizes of types in your implementation). Given some values of n, we can calculate what the result will be:
n | -0x80000000u
----+--------------
32 | 0x80000000
33 | 0x180000000
34 | 0x380000000
48 | 0xFFFF80000000
64 | 0xFFFFFFFF80000000
(For example, an implementation where unsigned int is 16 bits and unsigned long is 64 bits would have an n of 64).
C99 has equivalent wording hidden away in §6.2.5 Types p9:
A computation involving unsigned operands can never overflow, because
a result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting type.
The result of the unary - operator on an unsigned operand other than zero will always be caught by this rule.
With a 32 bit int, the type of 0x80000000 will be unsigned int, regardless of the lack of a u suffix, so the result will still be the value 0x80000000 with type unsigned int.
If instead you use the decimal constant 2147483648, it will have type long (or long long, if long is only 32 bits wide) and the calculation will be signed. The result will be the value -2147483648 with that signed type.
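
The literal types can be checked at compile time. This sketch uses C++11, where the same literal rules apply; the first check is only meaningful where int is 32 bits wide and is skipped otherwise.

```cpp
#include <type_traits>

// Assumption: int is 32 bits wide; on other platforms this check passes trivially.
static_assert(sizeof(int) != 4 ||
                  std::is_same<decltype(0x80000000), unsigned int>::value,
              "a hex literal that does not fit in int becomes unsigned int");

// An unsuffixed decimal literal always gets a signed type.
static_assert(std::is_signed<decltype(2147483648)>::value,
              "the decimal literal stays signed");
static_assert(-2147483648 < 0, "negating it is ordinary signed arithmetic");
```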

In n1336, 6.3.1.3 Signed and Unsigned Integers, paragraph 2 defines the conversion to an unsigned integer:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
So for 32-bit unsigned int, -0x80000000u==-0x80000000 + 0x100000000==0x80000000u.
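
The same identity as a compile-time check. The helper negate32 is made up; it uses std::uint32_t so that the result is reduced modulo 2³² whatever the width of int.

```cpp
#include <cstdint>

// Unary minus on an unsigned value; the return type reduces the result modulo 2^32.
constexpr std::uint32_t negate32(std::uint32_t x) { return -x; }

static_assert(negate32(0x80000000u) == 0x80000000u, "2^32 - 2^31 == 2^31");
static_assert(negate32(1u) == 0xFFFFFFFFu, "2^32 - 1");
static_assert(negate32(0u) == 0u, "zero is its own negative");
```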

Unsigned and signed comparison

Here is very simple code,
#include <iostream>
using namespace std;
int main() {
unsigned int u=10;
int i;
int count=0;
for (i=-1;i<=u;i++){
count++;
}
cout<<count<<"\n";
return 0;
}
The value of count is 0. Why?

Both operands of <= have to be promoted to the same type.
Evidently they are promoted to unsigned int (I don't have the rule from the standard in front of me, I'll look it up in a second). Since (unsigned int)(-1) <= u is false, the loop never executes.
The rule is found in section 5 (expr) of the standard, paragraph 10, which states (I've highlighted the rule which applies here):
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
If either operand is of scoped enumeration type (7.2), no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
If either operand is of type long double, the other shall be converted to long double.
Otherwise, if either operand is double, the other shall be converted to double.
Otherwise, if either operand is float, the other shall be converted to float.
Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following
rules shall be applied to the promoted operands:
If both operands have the same type, no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type. (This is the rule that applies here.)
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

During the comparison (i <= u), i is upgraded to an unsigned integer, and in the process -1 is converted to UINT_MAX.
A conversion of a negative number to an unsigned int will add (UINT_MAX + 1) to that number, so -1 becomes UINT_MAX, -2 becomes UINT_MAX - 1, etc.
If you think about it, one had to be converted to the other in order for the comparison to even work, and as a rule the compiler converts the signed value to unsigned. In this case, of course, it'd make more sense to convert the unsigned value to signed instead, but the compiler can't just decide to follow a different spec based on what you intend. You should explicitly cast the unsigned int to signed (or just have it as signed all along) here.
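
One possible way to write the comparison so that it works on values is sketched below. All three function names are made up, and count_iterations assumes u is smaller than INT_MAX so that i never overflows.

```cpp
// What i <= u effectively does: the signed operand is converted to unsigned int first.
constexpr bool less_equal_converted(int i, unsigned int u) {
    return static_cast<unsigned int>(i) <= u;
}

// A value-preserving alternative: a negative int is below every unsigned value.
constexpr bool less_equal_by_value(int i, unsigned int u) {
    return i < 0 || static_cast<unsigned int>(i) <= u;
}

// The loop from the question, using the value-preserving comparison.
// Assumption: u is smaller than INT_MAX, so ++i cannot overflow.
inline int count_iterations(unsigned int u) {
    int count = 0;
    for (int i = -1; less_equal_by_value(i, u); ++i) {
        ++count;
    }
    return count;
}

static_assert(!less_equal_converted(-1, 10u), "-1 becomes UINT_MAX, so the test fails");
static_assert(less_equal_by_value(-1, 10u), "-1 is less than 10 by value");
```

With this comparison, count_iterations(10u) returns 12, one iteration for each i from -1 to 10.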

It's because -1 is converted to an unsigned int, so the for loop body is never executed.
Try compiling with -Wall -Wextra to get the corresponding warnings (if you are compiling with g++ and not getting them so far).
http://en.wikipedia.org/wiki/Two's_complement

This is because i is promoted to an unsigned value before comparison. This will set it to the value of UINT_MAX, which on a 32 bit machine equals to 4294967295. So your loop is essentially the same as:
// will never run
for (i = 4294967295; i <= u; i++) {
count++;
}

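On a system where an int is stored in 4 bytes, -1 is not stored as 1 with the MSB set to mark it negative; that sign-magnitude pattern (1000 0000 0000 0000 0000 0000 0000 0001) would read as 2147483649. In two's complement, -1 is all ones (0xFFFFFFFF), and in any case the conversion to unsigned int is defined on the value: -1 becomes UINT_MAX, which is 4294967295 for a 32-bit unsigned int.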
