The following code outputs 0, 1, 32, 33, which is counterintuitive to say the least. But if I replace the literal 1 with the type-annotated constant ONE, the loop runs fine.
This is with gcc 4.6.2 and -std=c++0x.
#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    int64_t bitmask = 3;
    int64_t k;
    const int64_t ONE = 1;
    cout << "bitmask = " << bitmask << endl;
    for (k = 0; k < 64; k++)
    {
        if (bitmask & (1 << k))
        {
            cout << "k=" << k << endl;
        }
    }
    return 0;
}
EDIT
Question: As Ben pointed out, 1 is treated as 32 bits wide by default. Why is it not promoted to 64 bits when its co-operand is 64 bits wide?
SOLUTION
No: << does not require both sides to have the same type. After all, why force the right side to be an int64_t when the maximum shift amount fits in a char? The usual arithmetic conversions, which would bring both operands to a common type, apply only to the arithmetic operators, not to all operators; the shift operators perform only the integral promotions on each operand separately.
Copied from Bill's comments below
This is a problem: (1<<k).
1 is an integer literal that fits in an int.
If int has fewer than 64 bits on your platform, then (1<<k) will have undefined behavior toward the end of the loop, when k is large. In your case, the compiler is using an Intel bitshift instruction, and the undefined behavior comes out the way Intel defines shifts larger than the operand size -- the high bits are ignored.
You probably want (1LL<<k)
What the standard says (section 5.8 expr.shift):
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
This in contrast to the wording "The usual arithmetic conversions are performed for
operands of arithmetic or enumeration type." which is present for e.g. addition and subtraction operators.
This language didn't change between C++03 and C++11.
Related
It seems that GCC and Clang interpret addition between signed and unsigned integers differently, depending on their size. Why is this, and is the conversion consistent across all compilers and platforms?
Take this example:
#include <cstdint>
#include <iostream>
int main()
{
std::cout << "16 bit uint 2 - int 3 = " << uint16_t(2) + int16_t(-3) << std::endl;
std::cout << "32 bit uint 2 - int 3 = " << uint32_t(2) + int32_t(-3) << std::endl;
return 0;
}
Result:
$ ./out.exe
16 bit uint 2 - int 3 = -1
32 bit uint 2 - int 3 = 4294967295
In both cases we got the bit pattern of -1, but in the second case it was interpreted as an unsigned integer and wrapped around. I would have expected both to be converted in the same way.
So again, why do the compilers convert these so differently, and is this guaranteed to be consistent? I tested this with g++ 11.1.0, clang 12.0, and g++ 11.2.0 on Arch Linux and Debian, getting the same results.
When you do uint16_t(2)+int16_t(-3), both operands are types that are smaller than int. Because of this, each operand is promoted to an int and signed + signed results in a signed integer and you get the result of -1 stored in that signed integer.
When you do uint32_t(2)+int32_t(-3), since both operands are the size of an int or larger, no promotion happens and now you are in a case where you have unsigned + signed which results in a conversion of the signed integer into an unsigned integer, and the unsigned value of -1 wraps to being the largest value representable.
So again, why do the compilers convert these so differently,
Standard quotes for [language-lawyer]:
[expr.arith.conv]
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way.
The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
...
Otherwise, the integral promotions ([conv.prom]) shall be performed on both operands.
Then the following rules shall be applied to the promoted operands:
If both operands have the same type, no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
[conv.prom]
A prvalue of an integer type other than bool, char8_t, char16_t, char32_t, or wchar_t whose integer conversion rank ([conv.rank]) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
These conversions are called integral promotions.
std::uint16_t type may have a lower conversion rank than int in which case it will be promoted when used as an operand. int may be able to represent all values of std::uint16_t in which case the promotion will be to int. The common type of two int is int.
std::uint32_t type may have the same or a higher conversion rank than int in which case it won't be promoted. The common type of an unsigned type and a signed of same rank is an unsigned type.
For an explanation of why this conversion behaviour was chosen, see chapter "6.3.1.1 Booleans, characters, and integers" of the "Rationale for International Standard—Programming Languages—C". I won't quote the entire chapter here.
is this guaranteed to be consistent?
The consistency depends on relative sizes of the integer types which are implementation defined.
Why is this,
C (and hence C++) has a rule that effectively says when a type smaller than int is used in an expression it is first promoted to int (the actual rule is a little more complex than that to allow for multiple distinct types of the same size).
Section 6.3.1.1 of the Rationale for International Standard Programming Languages C claims that in early C compilers there were two versions of the promotion rule. "unsigned preserving" and "value preserving" and talks about why they chose the "value preserving" option. To summarise they believed it would produce correct results in a greater proportion of situations.
It does not however explain why the concept of promotion exists in the first place. I would speculate that it existed because on many processors, including the PDP-11 for which C was originally designed, arithmetic operations only operated on words, not on units smaller than words. So it was simpler and more efficient to convert everything smaller than a word to a word at the start of an expression.
On most platforms today int is 32 bits. So both uint16_t and int16_t are promoted to int. The arithmetic proceeds to produce a result of type int with a value of -1.
OTOH uint32_t and int32_t are not smaller than int, so they retain their original size and signedness through the promotion step. The rules for when the operands to an arithmetic operator are of different types come into play and since the operands are the same size the signed operand is converted to unsigned.
The rationale does not seem to talk about this rule, which suggests it goes back to pre-standard C.
and is the conversion consistent on all compilers and platforms?
On an Ansi C or ISO C++ platform it depends on the size of int. With 16 bit int both examples would give large positive values. With 64-bit int both examples would give -1.
On pre-standard implementations it's possible that both expressions might return large positive numbers.
A 16-bit unsigned int can be promoted to a 32-bit int without any lost values due to range differences, so that's what happens. Not so for the 32-bit integers.
This question already has answers here:
Implicit type conversion rules in C++ operators
(9 answers)
Closed 7 years ago.
Why is unsigned short * unsigned short converted to int in C++11?
The int is too small to handle max values as demonstrated by this line of code.
cout << USHRT_MAX * USHRT_MAX << endl;
overflows on MinGW 4.9.2
-131071
because (source)
USHRT_MAX = 65535 (2^16-1) or greater*
INT_MAX = 32767 (2^15-1) or greater*
and (2^16-1)*(2^16-1) ≈ 2^32, which does not fit in a 32-bit int.
Should I expect any problems with this solution?
unsigned u = static_cast<unsigned>(t*t);
This program
#include <iostream>
#include <typeinfo>
using namespace std;

int main()
{
    unsigned short t{};
    cout << typeid(t).name() << endl;
    cout << typeid(t*t).name() << endl;
}
gives output
t
i
on
gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
gcc version 4.8.2 (GCC)
MinGW 4.9.2
with both
g++ p.cpp
g++ -std=c++11 p.cpp
which proves that t*t is converted to int on these compilers.
Useful resources:
Signed to unsigned conversion in C - is it always safe?
Signed & unsigned integer multiplication
https://bytes.com/topic/c-sharp/answers/223883-multiplication-types-smaller-than-int-yields-int
http://www.cplusplus.com/reference/climits
http://en.cppreference.com/w/cpp/language/types
You may want to read about implicit conversions, especially the section about numeric promotions where it says
Prvalues of small integral types (such as char) may be converted to prvalues of larger integral types (such as int). In particular, arithmetic operators do not accept types smaller than int as arguments
What the above says is that if you use something smaller than int (like unsigned short) in an expression that involves arithmetic operators (which of course includes multiplication) then the values will be promoted to int.
It's the usual arithmetic conversions in action.
Commonly called argument promotion, although the standard uses that term in a more restricted way (the eternal conflict between reasonable descriptive terms and standardese).
C++11 §5/9:
” Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions […]
The paragraph goes on to describe the details, which amount to conversions up a ladder of more general types, until all arguments can be represented. The lowest rung on this ladder is integral promotion of both operands of a binary operation, so at least that is performed (but the conversion can start at a higher rung). And integral promotion starts with this:
C++11 §4.5/1:
” A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion
rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int
Crucially, this is about types, not arithmetic expressions. In your case the arguments of the multiplication operator * are converted to int. Then the multiplication is performed as an int multiplication, yielding an int result.
As pointed out by Paolo M in comments, USHRT_MAX has type int (this is specified by 5.2.4.2.1/1: all such macros have a type at least as big as int).
So USHRT_MAX * USHRT_MAX is already an int x int, no promotions occur.
This invokes signed integer overflow on your system, causing undefined behaviour.
Regarding the proposed solution:
unsigned u = static_cast<unsigned>(t*t);
This does not help because t*t itself causes undefined behaviour due to signed integer overflow. As explained by the other answers, t is promoted to int before the multiplication occurs, for historical reasons.
Instead you could use:
auto u = static_cast<unsigned int>(t) * t;
which, after integer promotion, is an unsigned int multiplied by an int; and then according to the rest of the usual arithmetic conversions, the int is promoted to unsigned int, and a well-defined modular multiplication occurs.
With integer promotion rules
USHRT_MAX value is promoted to int.
then we do the multiplication of 2 int (with possible overflow).
It seems that nobody has answered this part of the question yet:
Should I expect any problems with this solution?
u = static_cast<unsigned>(t*t);
Yes, there is a problem here: it first computes t*t and allows it to overflow, then it converts the result to unsigned. Integer overflow causes undefined behavior according to the C++ standard (even though it may always work fine in practice). The correct solution is:
u = static_cast<unsigned>(t)*t;
Note that the second t is promoted to unsigned before the multiplication because the first operand is unsigned.
As it has been pointed out by other answers, this happens due to integer promotion rules.
The simplest way to avoid the conversion from an unsigned type with a smaller rank than a signed type with a larger rank, is to make sure the conversion is done into an unsigned int and not int.
This is done by multiplying by the value 1 that is of type unsigned int. Due to 1 being a multiplicative identity, the result will remain unchanged:
unsigned short c = t * 1U * t;
First the operands t and 1U are considered. The left operand is promoted to int, which is signed and has the same rank as the unsigned right operand, so it is converted to unsigned int, the type of the right operand. Then those operands are multiplied, and the same happens between that unsigned result and the remaining right operand. The last paragraph of the Standard text cited below governs this conversion.
Otherwise, the integer promotions are performed on both operands. Then the
following rules are applied to the promoted operands:
-If both operands have the same type, then no further conversion is needed.
-Otherwise, if both operands have signed integer types or both have unsigned
integer types, the operand with the type of lesser integer conversion rank is
converted to the type of the operand with greater rank.
-Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
The following results make me really confused:
int i1 = 20-80u; // -60
int i2 = 20-80; // -60
int i3 =(20-80u)/2; // 2147483618
int i4 =(20-80)/2; // -30
int i5 =i1/2; // -30
i3 seems to be computed as (20u-80u)/2 instead of (20-80u)/2; I expected i3 to be the same as i5.
IIRC, an arithmetic operation between signed and unsigned int will produce an unsigned result.
Thus, 20 - 80u produces the unsigned result equivalent to -60: if unsigned int is a 32-bit type, that result is 4294967236.
Incidentally, assigning that to i1 produces an implementation-defined result because the number is too large to fit. Getting -60 is typical, but not guaranteed.
int i1 = 20-80u; // -60
This has subtle demons! The operands are different, so a conversion is necessary. Both operands are converted to a common type (an unsigned int, in this case). The result, which will be a large unsigned int value (60 less than UINT_MAX + 1 if my calculations are correct) will be converted to an int before it's stored in i1. Since that value is out of range of int, the result will be implementation defined, might be a trap representation and thus might cause undefined behaviour when you attempt to use it. However, in your case it coincidentally converts to -60.
int i3 =(20-80u)/2; // 2147483618
Continuing on from the first example, my guess was that the result of 20-80u would be 60 less than UINT_MAX + 1. If UINT_MAX is 4294967295 (a common value for UINT_MAX), that would mean 20-80u is 4294967236... and 4294967236 / 2 is 2147483618.
As for i2 and the others, there should be no surprises. They follow conventional mathematical calculations with no conversions, truncations, overflows or other implementation-defined behaviour whatsoever.
The binary arithmetic operators will perform the usual arithmetic conversions on their operands to bring them to a common type.
In the case of i1, i3 and i5 the common type will be unsigned int and so the result will also be unsigned int. Unsigned numbers will wrap via modulo arithmetic and so subtracting a slightly larger unsigned value will result in a number close to unsigned int max which can not be represented by an int.
So in the case of i1 we end up with an implementation defined conversion since the value can not be represented. In the case of i3 dividing by 2 brings the unsigned value back into the range of int and so we end up with a large signed int value after conversion.
The relevant sections from the C++ draft standard are as follows. Section 5.7 [expr.add]:
The additive operators + and - group left-to-right. The usual arithmetic conversions are performed for
operands of arithmetic or enumeration type.
The usual arithmetic conversions are covered in section 5 and it says:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
[...]
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
and for the conversion from a value that can not be represented for a signed type, section 4.7 [conv.integral]:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
and for unsigned integers obeys modulo arithmetic section 3.9.1 [basic.fundamental]:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
I'm reading C++ Primer and I'm slightly confused by a few comments which talk about how Bitwise operators deal with signed types. I'll quote:
Quote #1
(When talking about Bitwise operators) "If the operand is signed and
its value is negative, then the way that the “sign bit” is handled in
a number of the bitwise operations is machine dependent. Moreover,
doing a left shift that changes the value of the sign bit is
undefined"
Quote #2
(When talking about the rightshift operator) "If that operand is
unsigned, then the operator inserts 0-valued bits on the left; if it
is a signed type, the result is implementation defined—either copies
of the sign bit or 0-valued bits are inserted on the left."
The bitwise operators promote small integer types (such as char) to signed int. Isn't there an issue with this promotion to signed int, given that bitwise operators often have undefined or implementation-defined behaviour on signed operand types? Why wouldn't the standard promote char to unsigned int instead?
Edit: Here is the question I took out, but I've placed it back for context with some answers below.
An exercise later asks
"What is the value of ~'q' << 6 on a machine with 32-bit ints and 8 bit chars, that uses Latin-1 character set in which 'q' has the bit pattern 01110001?"
Well, 'q' is a character literal and would be promoted to int, giving
~'q' == ~00000000 00000000 00000000 01110001 == 11111111 11111111 11111111 10001110
The next step is to apply a left shift operator to the bits above, but as quote #1 mentions
"doing a left shift that changes the value of the sign bit is
undefined"
well I don't exactly know which bit is the sign bit, but surely the answer is undefined?
You're quite correct: the expression ~'q' << 6 is undefined behaviour according to the standard. It's even worse than you state, as the ~ operator is defined as computing "the one's complement" of the value, which is meaningless for a signed (two's complement) integer; the term "one's complement" only really means anything for an unsigned integer.
When doing bitwise operations, if you want strictly well-defined (according to the standard) results, you generally have to ensure that the values being operated on are unsigned. You can do that either with explicit casts, or by using explicitly unsigned constants (U-suffix) in binary operations. Doing a binary operation with a signed and unsigned int is done as unsigned (the signed value is converted to unsigned).
C and C++ are subtly different with the integer promotions, so you need to be careful here -- C++ will convert a smaller-than-int unsigned value to int (signed) before comparing with the other operand to see what should be done, while C will compare operands first.
It might be simplest to read the exact text of the Standard, instead of a summary like the one in C++ Primer. (The summary has to leave out detail by virtue of being a summary!)
The relevant portions are:
[expr.shift]
The shift operators << and >> group left-to-right.
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
[expr.unary.op]/10
The operand of ˜ shall have integral or unscoped enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand.
Note that neither of these performs the usual arithmetic conversions (which is the conversion to a common type that is done by most of the binary operators).
The integral promotions:
[conv.prom]/1
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
(There are other entries for the types in the "other than" list, I have omitted them here but you can look it up in a Standard draft).
The thing to remember about the integer promotions is that they are value-preserving: if you have a char of value -30, then after promotion it will be an int of value -30. You don't need to think about things like "sign extension".
Your initial analysis of ~'q' is correct, and the result has type int (because int can represent all the values of char on normal systems).
It turns out that any int whose most significant bit is set represents a negative value (there are rules about this in another part of the standard that I haven't quoted here), so ~'q' is a negative int.
Looking at [expr.shift]/2 we see that this means left-shifting it causes undefined behaviour (it's not covered by any of the earlier cases in that paragraph).
Of course, by editing the question, my answer is now partly answering a different question than the one posed, so here goes an attempt to answer the "new" question:
The promotion rules (what gets converted to what) are well defined in the standard. The type char may be either signed or unsigned - in some compilers you can even give a flag to the compiler to say "I want unsigned char type" or "I want signed char type" - but most compilers just define char as either signed or unsigned.
A constant such as 6 is signed by default. When an operation such as 'q' << 6 is written in the code, the compiler will convert any smaller type to a larger type [or, if you do any arithmetic in general, char is converted to int], so 'q' becomes the integer value of 'q'. If you want to avoid that, you should use 6u, or an explicit cast, such as static_cast<unsigned>('q') << 6 - that way, you are ensured that the operand is converted to unsigned, rather than signed.
The operations are undefined because different hardware behaves differently, and there are architectures with "strange" numbering systems, which means that the standards committee has to choose between "ruling out/making operations extremely inefficient" or "defining the standard in a way that isn't very clear". In a few architectures, overflowing integers may also be a trap, and if you shift such that you change the sign on the number, that typically counts as an overflow - and since trapping typically means "your code no longer runs", that would not be what your average programmer expects -> falls under the umbrella of "undefined behaviour". Most processors don't, and nothing really bad will happen if you do that.
Old answer:
So the solution to avoid this is to always cast your signed values (including char) to unsigned before shifting them (or accept that your code may not work on another compiler, the same compiler with different options, or the next release of the same compiler).
It is also worth noting that the resulting value is "nearly always what you expect" (in that the compiler/processor will just perform the left or right shift on the value, on right shifts using the sign bit to shift down), it's just undefined or implementation defined because SOME machine architectures may not have hardware to "do this right", and C compilers still need to work on those systems.
The sign bit is the highest bit in two's complement, and you are not changing it by shifting this number:
11111111 11111111 11111111 10001110 << 6 =
111111 11111111 11111111 11100011 10000000
^^^^^^--- goes away.
result=11111111 11111111 11100011 10000000
Or as a hex number: 0xffffe380.
If I perform a Bitwise AND between a 8 bit integer (int8_t) and 32 bit integer (int) will the result be a 8 bit integer or a 32 bit integer?
I am using GNU/Linux and GCC compiler
To put the question slightly differently, before performing the bitwise AND, are the first 24 bits of the 32 bit integer discarded, or is the 8 bit integer first typecast to a 32 bit integer ?
EDIT: In this little code
#include <iostream>
#include <stdint.h>

int main()
{
    int i = 34;
    int8_t j = 2;
    // Bitwise AND between a 32-bit integer and an 8-bit integer
    std::cout << sizeof(i & j) << std::endl;
    return 0;
}
I get the output as 4. I would assume that means that the result is a 32 bit integer then. But I don't know if the result depends on the machine, compiler or OS.
For the & operator (and most other operators), any operands smaller than int will be promoted to int before the operation is evaluated.
From the C99 standard (6.5.10 - describing the bitwise AND operator):
The usual arithmetic conversions are performed on the operands.
(6.3.1.8 - describing the usual arithmetic conversions):
the integer promotions are performed on both operands
(6.3.1.1 - describing the integer promotions):
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer
promotions.
Regardless of what the language specifies, the answer to the question is that it does not matter whatsoever if the high 24 bits are dropped before the bitwise and is performed, since they're all-zero bits in one operand and thus in the result.
8 bit integer is typecast to a 32 bit integer. Bitwise operators trigger the usual arithmetic conversions (which means the smaller type is promoted to match the larger). That's defined in chapter 5.1 of the spec.
Integer types smaller than int are promoted to int before any operation is performed on them. You might want to look at what the CERT Secure Coding Standard says about "understanding integer conversion rules".