The following piece of code compiles with g++ and not gcc, and am stuck wondering why?
inline unsigned FloatFlip(unsigned f)
{
unsigned mask = -int(f >> 31) | 0x80000000;
return f ^ mask;
}
I would assume that in C++,
int(f >> 31)
is a constructor but that leaves me wondering why it's included in the code. Is it necessary?
C doesn't support the C++ "function-style" casting. You need to write it like this
unsigned mask = -(int)(f >> 31) | 0x80000000;
See cast operator
You can use C syntax (int)(f>>31), it will work for both as this C-syntax is C++-compliant.
As the other answers discussed, int(f >> 31) is a C++ function-style cast, and it has the same meaning as the C-style (int) (f >> 31).
How is this cast important?
Actually, this code looks like it's trying to be too clever. f >> 31 retains only the highest-order bit in f, so it's either 1 or 0. The code then casts it to an int and performs unary negation on it, giving you either -1 or 0, and finally bitwise-OR's the result with 0x80000000, which sets the highest bit and then stores the result into mask. Assuming a two's complement system, -1's representation is 0xFFFFFFFF and so mask will become 0xFFFFFFFF if the highest-order bit of f is set, and 0x80000000 otherwise.
The problem, however, is that ints do not have to use two-complement. If the system is one's complement, for instance, -1's representation would be 0xFFFFFFFE, and if the system is sign-magnitude, -1 would be 0x80000001, and mask will have the wrong value.
The irony is that unary negation for unsigned operands is well-defined to do what the author of this code presumably wanted to do, in §5.3.1 [expr.unary.op]/p8 of the standard:
The operand of the unary - operator shall have arithmetic or unscoped
enumeration type and the result is the negation of its operand.
Integral promotion is performed on integral or enumeration operands.
The negative of an unsigned quantity is computed by subtracting its
value from 2n, where n is the number of bits in the promoted operand.
The type of the result is the type of the promoted operand.
In other words, assuming 32-bit ints, -1u is defined to be 0xFFFFFFFFu. The cast to int is not just superfluous, it actually causes the code to be nonportable.
Related
Does the C++11 standard guarantee that the unary minus of a zero-valued signed integer is zero?
For example:
int zero = 0;
int n = -zero;
int m = -0;
assert(memcmp(&n, &zero, sizeof(int)) == 0);
assert(memcmp(&m, &zero, sizeof(int)) == 0);
I know that -0 and 0 are identical in two's compliment representation, but I'd like to know if the standard allows for the negation of signed integer zero to be negative-zero for other representations, such as one's compliment or signed-magnitude.
All I could find in the C++11 draft is §5.3.1, paragraph 8:
The operand of the unary-operator shall have arithmetic or unscoped
enumeration type and the result is the negation of its operand.
Integral promotion is performed on integral or enumeration operands.
The negative of an unsigned quantity is computed by subtracting its
value from 2^n, where n is the number of bits in the promoted operand.
The type of the result is the type of the promoted operand.
I can't find a definition of negation within the draft.
Motivation: I'm writing a specialized integer number parser for a library (that may be open-sourced eventually) and I want to know if I should be concerned about the possibility of "-0" being interpreted as a negative-zero signed integer on uncommon architectures.
Note: I already know about negative-zero floating-point numbers.
The standard does not mandate bit patterns for integers, because it is meant to be applicable to the widest possible range of machines. It is entirely possible for a C++ compiler to use ones-complement, where zero and negative zero would be different.
This question is motivated by me implementing cryptographic algorithms (e.g. SHA-1) in C/C++, writing portable platform-agnostic code, and thoroughly avoiding undefined behavior.
Suppose that a standardized crypto algorithm asks you to implement this:
b = (a << 31) & 0xFFFFFFFF
where a and b are unsigned 32-bit integers. Notice that in the result, we discard any bits above the least significant 32 bits.
As a first naive approximation, we might assume that int is 32 bits wide on most platforms, so we would write:
unsigned int a = (...);
unsigned int b = a << 31;
We know this code won't work everywhere because int is 16 bits wide on some systems, 64 bits on others, and possibly even 36 bits. But using stdint.h, we can improve this code with the uint32_t type:
uint32_t a = (...);
uint32_t b = a << 31;
So we are done, right? That's what I thought for years. ... Not quite. Suppose that on a certain platform, we have:
// stdint.h
typedef unsigned short uint32_t;
The rule for performing arithmetic operations in C/C++ is that if the type (such as short) is narrower than int, then it gets widened to int if all values can fit, or unsigned int otherwise.
Let's say that the compiler defines short as 32 bits (signed) and int as 48 bits (signed). Then these lines of code:
uint32_t a = (...);
uint32_t b = a << 31;
will effectively mean:
unsigned short a = (...);
unsigned short b = (unsigned short)((int)a << 31);
Note that a is promoted to int because all of ushort (i.e. uint32) fits into int (i.e. int48).
But now we have a problem: shifting non-zero bits left into the sign bit of a signed integer type is undefined behavior. This problem happened because our uint32 was promoted to int48 - instead of being promoted to uint48 (where left-shifting would be okay).
Here are my questions:
Is my reasoning correct, and is this a legitimate problem in theory?
Is this problem safe to ignore because on every platform the next integer type is double the width?
Is a good idea to correctly defend against this pathological situation by pre-masking the input like this?: b = (a & 1) << 31;. (This will necessarily be correct on every platform. But this could make a speed-critical crypto algorithm slower than necessary.)
Clarifications/edits:
I'll accept answers for C or C++ or both. I want to know the answer for at least one of the languages.
The pre-masking logic may hurt bit rotation. For example, GCC will compile b = (a << 31) | (a >> 1); to a 32-bit bit-rotation instruction in assembly language. But if we pre-mask the left shift, it is possible that the new logic is not translated into bit rotation, which means now 4 operations are performed instead of 1.
Speaking to the C side of the problem,
Is my reasoning correct, and is this a legitimate problem in theory?
It is a problem that I had not considered before, but I agree with your analysis. C defines the behavior of the << operator in terms of the type of the promoted left operand, and it it conceivable that the integer promotions result in that being (signed) int when the original type of that operand is uint32_t. I don't expect to see that in practice on any modern machine, but I'm all for programming to the actual standard as opposed to my personal expectations.
Is this problem safe to ignore because on every platform the next integer type is double the width?
C does not require such a relationship between integer types, though it is ubiquitous in practice. If you are determined to rely only on the standard, however -- that is, if you are taking pains to write strictly conforming code -- then you cannot rely on such a relationship.
Is a good idea to correctly defend against this pathological situation by pre-masking the input like this?: b = (a & 1) << 31;.
(This will necessarily be correct on every platform. But this could
make a speed-critical crypto algorithm slower than necessary.)
The type unsigned long is guaranteed to have at least 32 value bits, and it is not subject to promotion to any other type under the integer promotions. On many common platforms it has exactly the same representation as uint32_t, and may even be the same type. Thus, I would be inclined to write the expression like this:
uint32_t a = (...);
uint32_t b = (unsigned long) a << 31;
Or if you need a only as an intermediate value in the computation of b, then declare it as an unsigned long to begin with.
Q1: Masking before the shift does prevent undefined behavior that OP has concern.
Q2: "... because on every platform the next integer type is double the width?" --> no. The "next" integer type could be less than 2x or even the same size.
The following is well defined for all compliant C compilers that have uint32_t.
uint32_t a;
uint32_t b = (a & 1) << 31;
Q3: uint32_t a; uint32_t b = (a & 1) << 31; is not expected to incur code that performs a mask - it is not needed in the executable - just in the source. If a mask does occur, get a better compiler should speed be an issue.
As suggested, better to emphasize the unsigned-ness with these shifts.
uint32_t b = (a & 1U) << 31;
#John Bollinger good answer well details how to handle OP's specific problem.
The general problem is how to form a number that is of at least n bits, a certain sign-ness and not subject to surprising integer promotions - the core of OP's dilemma. The below fulfills this by invoking an unsigned operation that does not change the value - effective a no-op other than type concerns. The product will be at least the width of unsigned or uint32_t. Casting, in general, may narrow the type. Casting needs to be avoided unless narrowing is certain to not occur. An optimization compiler will not create unnecessary code.
uint32_t a;
uint32_t b = (a + 0u) << 31;
uint32_t b = (a*1u) << 31;
Taking a clue from this question about possible UB in uint32 * uint32 arithmetic, the following simple approach should work in C and C++:
uint32_t a = (...);
uint32_t b = (uint32_t)((a + 0u) << 31);
The integer constant 0u has type unsigned int. This promotes the addition a + 0u to uint32_t or unsigned int, whichever is wider. Because the type has rank int or higher, no more promotion occurs, and the shift can be applied with the left operand being uint32_t or unsigned int.
The final cast back to uint32_t will just suppress potential warnings about a narrowing conversion (say if int is 64 bits).
A decent C compiler should be able to see that adding zero is a no-op, which is less onerous than seeing that a pre-mask has no effect after an unsigned shift.
To avoid unwanted promotion, you may use the greater type with some typedef, as
using my_uint_at_least32 = std::conditional_t<(sizeof(std::uint32_t) < sizeof(unsigned)),
unsigned,
std::uint32_t>;
For this segment of code:
uint32_t a = (...);
uint32_t b = a << 31;
To promote a to a unsigned type instead of signed type, use:
uint32_t b = a << 31u;
When both sides of << operator is an unsigned type, then this line in 6.3.1.8 (C standard draft n1570) applies:
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
The problem you are describing is caused you use 31 which is signed int type so another line in 6.3.1.8
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
forces a to promoted to a signed type
Update:
This answer is not correct because 6.3.1.1(2) (emphasis mine):
...
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.58) All other types are unchanged by the integer
promotions.
and footnote 58 (emphasis mine):
58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.
Since only integer promotion is happening and not common arithmetic conversion, using 31u does not guarantee a to be converted to unsigned int as stated above.
I'm reading C++ Primer and I'm slightly confused by a few comments which talk about how Bitwise operators deal with signed types. I'll quote:
Quote #1
(When talking about Bitwise operators) "If the operand is signed and
its value is negative, then the way that the “sign bit” is handled in
a number of the bitwise operations is machine dependent. Moreover,
doing a left shift that changes the value of the sign bit is
undefined"
Quote #2
(When talking about the rightshift operator) "If that operand is
unsigned, then the operator inserts 0-valued bits on the left; if it
is a signed type, the result is implementation defined—either copies
of the sign bit or 0-valued bits are inserted on the left."
The bitwise operators promote small integers (such as char) to signed ints. Isn't there an issue with this promotion to signed ints when bitwise operators often gives undefined or implementation-defined behaviour on signed operator types? Why wouldn't the standard promote char to unsigned int?
Edit: Here is the question I took out, but I've placed it back for context with some answers below.
An exercise later asks
"What is the value of ~'q' << 6 on a machine with 32-bit ints and 8 bit chars, that uses Latin-1 character set in which 'q' has the bit pattern 01110001?"
Well, 'q' is a character literal and would be promoted to int, giving
~'q' == ~0000000 00000000 00000000 01110001 == 11111111 11111111 11111111 10001110
The next step is to apply a left shift operator to the bits above, but as quote #1 mentions
"doing a left shift that changes the value of the sign bit is
undefined"
well I don't exactly know which bit is the sign bit, but surely the answer is undefined?
You're quite correct -- the expression ~'q' << 6 is undefined behavior according to the standard. Its even worse than you state, as the ~ operator is defined as computing "The one's complement" of the value, which is meaningless for a signed (2s-complement) integer -- the term "one's complement" only really means anything for an unsigned integer.
When doing bitwise operations, if you want strictly well-defined (according to the standard) results, you generally have to ensure that the values being operated on are unsigned. You can do that either with explicit casts, or by using explicitly unsigned constants (U-suffix) in binary operations. Doing a binary operation with a signed and unsigned int is done as unsigned (the signed value is converted to unsigned).
C and C++ are subtley different with the integer promotions, so you need to be careful here -- C++ will convert a smaller-than-int unsigned value to int (signed) before comparing with the other operand to see what should be done, while C will compare operands first.
It might be simplest to read the exact text of the Standard, instead of a summary like in Primer Plus. (The summary has to leave out detail by virtue of being a summary!)
The relevant portions are:
[expr.shift]
The shift operators << and >> group left-to-right.
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2 , reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
[expr.unary.op]/10
The operand of ˜ shall have integral or unscoped enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand.
Note that neither of these performs the usual arithmetic conversions (which is the conversion to a common type that is done by most of the binary operators).
The integral promotions:
[conv.prom]/1
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
(There are other entries for the types in the "other than" list, I have omitted them here but you can look it up in a Standard draft).
The thing to remmeber about the integer promotions is that they are value-preserving , if you have a char of value -30, then after promotion it will be an int of value -30. You don't need to think about things like "sign extension".
Your initial analysis of ~'q' is correct, and the result has type int (because int can represent all the values of char on normal systems).
It turns out that any int whose most significant bit is set represents a negative value (there are rules about this in another part of the standard that I haven't quoted here), so ~'q' is a negative int.
Looking at [expr.shift]/2 we see that this means left-shifting it causes undefined behaviour (it's not covered by any of the earlier cases in that paragraph).
Of course, by editing the question, my answer is now partly answering a different question than the one posed, so here goes an attempt to answer the "new" question:
The promotion rules (what gets converted to what) are well defined in the standard. The type char may be either signed or unsigned - in some compilers you can even give a flag to the compiler to say "I want unsigned char type" or "I want signed char type" - but most compilers just define char as either signed or unsigned.
A constant, such as 6 is signed by default. When an operation, such as 'q' << 6 is written in the code, the compiler will convert any smaller type to any larger type [or if you do any arithmetic in general, char is converted to int], so 'q' becomes the integer value of 'q'. If you want to avoid that, you should use 6u, or an explicit cast, such as static_cast<unsigned>('q') << 6 - that way, you are ensured that the operand is converted to unsigned, rather than signed.
The operations are undefined because different hardware behaves differently, and there are architectures with "strange" numbering systems, which means that the standards committee has to choose between "ruling out/making operations extremely inefficient" or "defining the standard in a way that isn't very clear". In a few architectures, overflowing integers may also be a trap, and if you shift such that you change the sign on the number, that typically counts as an overflow - and since trapping typically means "your code no longer runs", that would not be what your average programmer expects -> falls under the umbrella of "undefined behaviour". Most processors don't, and nothing really bad will happen if you do that.
Old answer:
So the solution to avoid this is to always cast your signed values (including char) to unsigned before shifting them (or accept that your code may not work on another compiler, the same compiler with different options, or the next release of the same compiler).
It is also worth noting that the resulting value is "nearly always what you expect" (in that the compiler/processor will just perform the left or right shift on the value, on right shifts using the sign bit to shift down), it's just undefined or implementation defined because SOME machine architectures may not have hardware to "do this right", and C compilers still need to work on those systems.
The sign bit is the highest bit in a twos-complement, and you are not changing that by shifting that number:
11111111 11111111 11111111 10001110 << 6 =
111111 11111111 11111111 11100011 10000000
^^^^^^--- goes away.
result=11111111 11111111 11100011 10000000
Or as a hex number: 0xffffe380.
The n3337.pdf draft, 5.3.1.8, states that:
The operand of the unary - operator shall have arithmetic or unscoped enumeration type and the result is the negation of its operand. Integral promotion is performed on integral or enumeration operands. The negative of an unsigned quantity is computed by subtracting its value from 2ⁿ, where n is the number of bits in the promoted operand. The type of the result is the type of the promoted operand.
For some cases it is enough. Suppose unsigned int is 32 bits wide, then (-(0x80000000u)) == 0x80000000u, isn't it?
Still, I can not find anything about unary minus on unsigned 0x80000000. Also, C99 standard draft n1336.pdf, 6.5.3.3 seems to say nothing about it:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
UPDATE2: Let us suppose that unsigned int is 32 bits wide. So, the question is: what about unary minus in C (signed and unsigned), and unary minus in C++ (signed only)?
UPDATE1: both run-time behavior and compile-time behavior (i.e. constant-folding) are interesting.
(related: Why is abs(0x80000000) == 0x80000000?)
For your question, the important part of the quote you've included is this:
The negative of an unsigned quantity is computed by subtracting its
value from 2ⁿ, where n is the number of bits in the promoted operand.
So, to know what the value of -0x80000000u is, we need to know n, the number of bits in the type of 0x80000000u. This is at least 32, but this is all we know (without further information about the sizes of types in your implementation). Given some values of n, we can calculate what the result will be:
n | -0x80000000u
----+--------------
32 | 0x80000000
33 | 0x180000000
34 | 0x380000000
48 | 0xFFFF80000000
64 | 0xFFFFFFFF80000000
(For example, an implementation where unsigned int is 16 bits and unsigned long is 64 bits would have an n of 64).
C99 has equivalent wording hidden away in §6.2.5 Types p9:
A computation involving unsigned operands can never overflow, because
a result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting type.
The result of the unary - operator on an unsigned operand other than zero will always be caught by this rule.
With a 32 bit int, the type of 0x80000000 will be unsigned int, regardless of the lack of a u suffix, so the result will still be the value 0x80000000 with type unsigned int.
If instead you use the decimal constant 2147483648, it will have type long and the calculation will be signed. The result will be the value -2147483648 with type long.
In n1336, 6.3.1.3 Signed and Unsigned Integers, paragraph 2 defines the conversion to an unsigned integer:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
So for 32-bit unsigned int, -0x80000000u==-0x80000000 + 0x100000000==0x80000000u.
On my compiler, the following pseudo code (values replaced with binary):
sint32 word = (10000000 00000000 00000000 00000000);
word >>= 16;
produces a word with a bitfield that looks like this:
(11111111 11111111 10000000 00000000)
Can I rely on this behaviour for all platforms and C++ compilers?
From the following link:
INT34-C. Do not shift an expression by a negative number of bits or by greater than or equal to the number of bits that exist in the operand
Noncompliant Code Example (Right Shift)
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2E2. If E1 has a signed type and a negative value, the resulting value is implementation defined and can be either an arithmetic (signed) shift:
Or a logical (unsigned) shift:
This noncompliant code example fails to test whether the right operand is greater than or equal to the width of the promoted left operand, allowing undefined behavior.
unsigned int ui1;
unsigned int ui2;
unsigned int uresult;
/* Initialize ui1 and ui2 */
uresult = ui1 >> ui2;
Making assumptions about whether a right shift is implemented as an arithmetic (signed) shift or a logical (unsigned) shift can also lead to vulnerabilities. See recommendation INT13-C. Use bitwise operators only on unsigned operands.
From the latest C++20 draft:
Right-shift on signed integral types is an arithmetic right shift, which performs sign-extension.
No, you can't rely on this behaviour. Right shifting of negative quantities (which I assume your example is dealing with) is implementation defined.
In C++, no. It is implementation and/or platform dependent.
In some other languages, yes. In Java, for example, the >> operator is precisely defined to always fill using the left most bit (thereby preserving sign). The >>> operator fills using 0s. So if you want reliable behavior, one possible option would be to change to a different language. (Although obviously, this may not be an option depending on your circumstances.)
AFAIK integers may be represented as sign-magnitude in C++, in which case sign extension would fill with 0s. So you can't rely on this.