Is over/underflow an undefined behavior at execution time? - c++

I was reading about undefined behavior, and I'm not sure if it's a compile-time only feature, or if it can occur at execution time.
I understand this example well (this is extracted from the Undefined Behavior page of Wikipedia):
An example for the C language:
int foo(unsigned x)
{
    int value = 5;
    value += x;
    if (value < 5)
        bar();
    return value;
}
The value of x cannot be negative and, given that signed integer overflow is undefined behavior in C, the compiler can assume that at the line of the if check value >= 5. Thus the if and the call to the function bar can be ignored by the compiler since the if has no side effects and its condition will never be satisfied. The code above is therefore semantically equivalent to:
int foo(unsigned x)
{
    int value = 5;
    value += x;
    return value;
}
But this occurs at compilation-time.
What if I write, for example:
void foo(int x) {
    if (x + 150 < 5)
        bar();
}

int main() {
    int x;
    std::cin >> x;
    foo(x);
}
and then the user types in INT_MAX - 100 ("2147483547" with 32-bit ints).
There will be an integer overflow, but AFAIK it is the CPU's arithmetic logic unit that overflows, so the compiler is not involved here.
Is it still undefined behavior?
If yes, how does the compiler detect the overflow?
The best I could imagine is with the overflow flag of the CPU. If this is the case, does it mean that the compiler can do anything it wants if the CPU's overflow flag is set at any point during execution?

Yes, but not necessarily in the way I think you might have meant it. That is, if the machine code contains an addition and at runtime that addition wraps (or otherwise overflows, though on most architectures it would wrap), that is not UB by itself. The UB is solely in the domain of C (or C++). That addition may have been adding unsigned integers, or it may be part of some optimization the compiler can make because it knows the semantics of the target platform and can safely rely on wrapping (but you cannot, unless of course you do it with unsigned types).
Of course that does not at all mean that it is safe to use constructs that "wrap only at runtime", because those code paths are poisoned at compile time as well. For example, taking the code from your example,
extern void bar(void);

void foo(int x) {
    if (x + 150 < 5)
        bar();
}
is compiled by GCC 6.3 targeting x64 to
foo:
        cmp     edi, -145
        jl      .L4
        ret
.L4:
        jmp     bar
Which is the equivalent of
void foo(int x) {
    if (x < -145)
        bar(); // with tail call optimization
}
... which is the same if you assume that signed integer overflow is impossible (in the sense that it puts an implicit precondition on the inputs: they must be such that overflow will not happen).
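For contrast, here is a sketch of a version that uses unsigned arithmetic throughout (foo_unsigned is a hypothetical name); because unsigned arithmetic is defined to wrap modulo 2^N, the compiler may not assume the addition cannot wrap and has to preserve the test's semantics for every input:
extern void bar(void);

void foo_unsigned(unsigned x) {
    // Unsigned arithmetic wraps modulo 2^N by definition, so x + 150u can
    // legitimately land below 5u (for x near UINT_MAX), and the compiler
    // must preserve the semantics of this comparison for every x.
    if (x + 150u < 5u)
        bar();
}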

Your analysis of the first example is incorrect. value += x; is equivalent to:
value = value + x;
In this case value is int and x is unsigned, so the usual arithmetic conversion means that value is first converted to unsigned, so we have an unsigned addition which by definition cannot overflow (it has well-defined semantics in accordance with modular arithmetic).
When the unsigned result is assigned back to value, if it is larger than INT_MAX then this is an out-of-range assignment which has implementation-defined behaviour. This is NOT overflow because it is assignment, not an arithmetic operation.
Which optimizations are possible therefore depends on how the implementation defines the behaviour of out-of-range assignment for integers. Modern systems all take the value which has the same 2's complement representation, but historically other systems have done some different things.
So the original example does not have undefined behaviour in any circumstance, and the suggested optimization is, for most systems, not possible.
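To make that concrete, here is a small sketch of what the first example actually computes, assuming 32-bit int and unsigned (the printed values are what common 2's complement implementations give; since C++20 the out-of-range conversion back to int is pinned down to exactly this modular result, while in C it remains implementation-defined):
#include <iostream>

int foo(unsigned x) {
    int value = 5;
    value += x; // value converts to unsigned, the addition is modular,
                // then the unsigned sum converts back to int
    return value;
}

int main() {
    // 5u + 2147483647u == 2147483652u, which is out of range for int;
    // 2's complement implementations convert it back as -2147483644.
    std::cout << foo(2147483647u) << "\n";
    // 5u + 4294967295u wraps to 4u, which is in range, so foo returns 4.
    std::cout << foo(4294967295u) << "\n";
}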
Your second example has nothing to do with your first example since it does not involve any unsigned arithmetic. If x > INT_MAX - 150 then the expression x + 150 causes undefined behaviour due to signed integer overflow. The language definition does not mention ALUs or CPUs so we can be certain that those things are not related to whether or not the behaviour is undefined.
If yes, how does the compiler detect the overflow?
It doesn't have to. Precisely because the behaviour is undefined, the compiler is not constrained by having to worry about what happens when there is overflow. It only has to emit an executable that exhibits the correct behaviour for the cases which are defined.
In this program those are the inputs in the range [INT_MIN, INT_MAX-150] and so the compiler can transform the comparison to x < -145 because that has the same behaviour for all inputs in the well-defined range, and it doesn't matter about the undefined cases.
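If the code genuinely needs to handle inputs near INT_MAX, the fix is to write the check so that it cannot overflow in the first place; a minimal sketch, assuming long long has a wider range than int (true on the usual platforms with 32-bit int):
void bar();

void foo(int x) {
    // Do the arithmetic in long long, where x + 150 cannot overflow,
    // so the comparison is meaningful for every possible int x.
    if (static_cast<long long>(x) + 150 < 5)
        bar();
}
(For debugging, GCC and Clang can also instrument signed overflow at run time with -fsanitize=undefined, but that is a diagnostic tool, not defined behaviour.)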

Related

Clang 14 and 15 apparently optimizing away code that compiles as expected under Clang 13, ICC, GCC, MSVC

I have the following sample code:
inline float successor(float f, bool const check)
{
    const unsigned long int mask = 0x7f800000U;
    unsigned long int i = *(unsigned long int*)&f;
    if (check)
    {
        if ((i & mask) == mask)
            return f;
    }
    i++;
    return *(float*)&i;
}

float next1(float a)
{
    return successor(a, true);
}

float next2(float a)
{
    return successor(a, false);
}
Under x86-64 clang 13.0.1, the code compiles as expected.
Under x86-64 clang 14.0.0 or 15, the output is merely a ret op for next1(float) and next2(float).
Compiler options: -march=x86-64-v3 -O3
The code and output are here: Godbolt.
The successor(float,bool) function is not a no-op.
As a note, the output is as expected under GCC, ICC, and MSVC. Am I missing something here?
*(unsigned long int*)&f is an immediate aliasing violation. f is a float. You are not allowed to access it through a pointer to unsigned long int. (And the same applies to *(float*)&i.)
So the code has undefined behavior and Clang likes to assume that code with undefined behavior is unreachable.
Compile with -fno-strict-aliasing to force Clang to not consider aliasing violations as undefined behavior that cannot happen (although that is probably not sufficient here, see below) or better do not rely on undefined behavior. Instead use either std::bit_cast (since C++20) or std::memcpy to create a copy of f with the new type but same object representation. That way your program will be valid standard C++ and not rely on the -fno-strict-aliasing compiler extension.
(And if you use std::memcpy add a static_assert to verify that unsigned long int and float have the same size. That is not true on all platforms and also not on all common platforms. std::bit_cast has the test built-in.)
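For concreteness, here is a sketch of successor rewritten with std::bit_cast and a 32-bit integer type (C++20; std::bit_cast refuses to compile if the sizes differ):
#include <bit>
#include <cstdint>

inline float successor(float f, bool const check)
{
    constexpr std::uint32_t mask = 0x7f800000U;
    std::uint32_t i = std::bit_cast<std::uint32_t>(f); // copies the bits, no aliasing violation
    if (check)
    {
        if ((i & mask) == mask) // exponent all ones: infinity or NaN, leave unchanged
            return f;
    }
    i++;
    return std::bit_cast<float>(i);
}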
As noticed by @CarstenS in the other answer, given that you are (at least on compiler explorer) compiling for the SysV ABI, unsigned long int (64 bits) is indeed a different size than float (32 bits). Consequently there is much more direct UB in that you are accessing memory out of bounds in the initialization of i. And as he also noticed, Clang does seem to compile the code as intended when an integer type of matching size is used, even without -fno-strict-aliasing. This does not invalidate what I wrote above in general, though.
Standards and UB aside, on your target platform float is 32 bits and long is 64 bits, so I am surprised by the clang 13 code (indeed I think you will get actual UB with -O0). If you use uint32_t instead of long, the problem goes away.
Some compiler writers interpret the Standard as deprecating "non-portable or erroneous" program constructs, including constructs which implementations for commonplace hardware had to date unanimously processed in a manner consistent with implementation-defined behavioral traits such as numeric representations.
Compilers that are designed for paying customers will look at a construct like:
unsigned long int i = *(unsigned long int*)&f; // f is of type float
and recognize that while converting the address of a float to an unsigned long* is a non-portable construct, it was almost certainly written for the purpose of examining the bits of a float type. This is a very different situation from the one offered in the published Rationale as being the reason for the rule, which was more like:
int x;

int test(double *p)
{
    x = 1;
    *p = 2.0;
    return x;
}
In the latter situation, it would be theoretically possible that *p points to or overlaps x, and that the programmer knows what precedes and/or follows x in memory, and the authors of the Standard recognized that having the function unconditionally return 1 would be incorrect behavior if that were the case, but decided that there was no need to mandate support for such dubious possibilities.
Returning to the original, that represents a completely different situation since any compiler that isn't willfully blind to such things would know that the address being accessed via type unsigned long* was formed from a pointer of type float*. While the Standard wouldn't forbid compilers from being willfully blind to the possibility that a float* might actually hold the address of storage that will be accessed using type float, that's because the Standard saw no need to mandate that compiler writers do things which anyone wanting to sell compilers would do, with or without a mandate.
Probably not coincidentally, the compilers I'm aware of that would require a -fno-strict-aliasing option to usefully process constructs such as yours also require that flag in order to correctly process some constructs whose behavior is unambiguously specified by the Standard. Rather than jumping through hoops to accommodate deficient compiler configurations, a better course of action would be to simply use the "don't make buggy aliasing optimizations" option.

Optimisation and strict aliasing

My question is regarding a code fragment, such as below:
#include <iostream>

int main() {
    double a = -50;
    std::cout << a << "\n";
    uint8_t* b = reinterpret_cast<uint8_t*>(&a);
    b[7] &= 0x7F;
    std::cout << a << "\n";
    return 0;
}
As far as I can tell I am not breaking any rules and everything is well defined (as noted below I forgot that uint8_t is not allowed to alias other types). There is some implementation defined behavior going on, but for the purpose of this question I don't think that is relevant.
I would expect this code to print -50, then 50 on systems where the double follows the IEEE standard, is 8 bytes long and is stored in little-endian format. Now the question is: does the compiler guarantee that this happens? More specifically, with optimisations turned on, can the compiler optimise away the middle b[7] write, either explicitly or implicitly, by simply keeping a in a register through the whole function? The second concern could obviously be addressed by specifying volatile double a, but is that needed?
Edit: As a note, I (mistakenly) remembered that uint8_t was required to be an alias for unsigned char, but indeed the standard does not specify that. I have also written the question in a way that, yes, the compiler can know everything here ahead of time, but modified to
#include <iostream>

int main() {
    double a;
    std::cin >> a;
    std::cout << a << "\n";
    unsigned char* b = reinterpret_cast<unsigned char*>(&a);
    b[7] &= 0x7F;
    std::cout << a << "\n";
    return 0;
}
one can see where the problem might arise. Here the strict aliasing rule is no longer violated, and a is not a compile-time constant. Richard Critten's comment, however, raises the question: if the aliased data can be examined but not written, is there a way one can set individual bytes while still following the standard?
More specifically, with optimisations turned on, can the compiler optimise away the middle b[7] write, either explicitly or implicitly, by simply keeping a in a register through the whole function?
The compiler can generate the double value 50 as a constant and pass that directly to the output function. b can be optimised away completely. Like most optimisations, this is permitted by the as-if rule:
[intro.abstract]
The semantic descriptions in this document define a parameterized nondeterministic abstract machine.
This document places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine.
Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
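Concretely, under the assumptions listed further down (an 8-byte little-endian IEEE-754 double and uint8_t aliasing as unsigned char), the implementation is allowed to behave as if the program had been written like this sketch:
#include <iostream>

int main() {
    // Everything about a and b[7] is known at compile time, so both printed
    // values can be folded into constants and passed straight to operator<<.
    std::cout << -50.0 << "\n";
    std::cout << 50.0 << "\n";
    return 0;
}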
The second one obviously could be solved by specifying volatile double a
That would prevent the optimisation, which would generally be considered to be the opposite of a solution.
Does the compiler guarantee that [50 is printed]?
You didn't mention what compiler you are asking about. I'm going to assume that you mean whether the standard guarantees this. It doesn't guarantee that universally. You are relying on several assumptions about the implementation:
If sizeof(double) < 8, then you access the object outside of its bounds, and behaviour of the program is undefined.
If std::uint8_t is not a type alias of unsigned char, then it isn't allowed to alias double, and the behaviour of the program is undefined.
Given that those assumptions hold, and the behaviour is thus well-defined, the second output will be of a double value that is like -50, but whose most significant bit of the byte at position 7 has been cleared. In the case of a little-endian IEEE-754 representation, that value is 50. volatile is not needed to guarantee this, and it won't add a guarantee in case the behaviour of the program is undefined.
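If you want to sidestep the aliasing and type-alias questions entirely, the byte manipulation can also be done through std::memcpy round-trips; a sketch of the second program, still assuming an 8-byte little-endian IEEE-754 double:
#include <cstring>
#include <iostream>

int main() {
    double a;
    std::cin >> a;
    std::cout << a << "\n";

    static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &a, sizeof a); // copy the object representation out
    bytes[7] &= 0x7F;                 // clear the sign bit (little-endian IEEE-754)
    std::memcpy(&a, bytes, sizeof a); // and copy the modified bytes back

    std::cout << a << "\n";
    return 0;
}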

Does arithmetic overflow overwrite data?

std::uint8_t x = 256; //Implicitly converts to 0
std::uint8_t y = 255;
y++;
For x, I assume everything is handled because 100000000 gets converted to 00000000 using some defined conversion from int to uint8_t. x's memory should be 0 00000000 not 1 00000000.
However with y I believe the overflow stays in memory. y is initially 11111111. After adding 1, it becomes 1 00000000. This wraps around back to 0 because y only looks at the 8 LSB.
Does the 1 after y++; stay in memory, or is it discarded when the addition is done?
If it is there, could it corrupt data before y?
Does arithmetic overflow overwrite data?
The behaviour of signed arithmetic overflow is undefined. It's neither guaranteed to overwrite data, nor guaranteed to not overwrite data.
std::uint8_t y = 255;
y++;
Unsigned overflow is well defined. y will be 0, and there are no other side-effects.
Citation from the C++ standard (latest draft):
[basic.fundamental]
... The range of representable values for the unsigned type is 0 to 2^N − 1 (inclusive); arithmetic for the unsigned type is performed modulo 2^N.
[Note 2: Unsigned arithmetic does not overflow.
Overflow for signed arithmetic yields undefined behavior ([expr.pre]).
— end note]
[expr.pre]
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
Since unsigned arithmetic is modular, the result can never be outside of representable values.
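A small sketch to demonstrate the wrap-around described above: the increment simply discards the carry, and nothing outside y is touched (neighbour is just another local, shown only for illustration):
#include <cstdint>
#include <iostream>

int main() {
    std::uint8_t neighbour = 0xAA; // unrelated object, only here to show it stays intact
    std::uint8_t y = 255;
    y++;                           // 255 + 1 == 0 modulo 256; the carry is discarded
    std::cout << int(y) << " " << int(neighbour) << "\n"; // prints "0 170"
}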
When using gcc with optimizations enabled, unless one uses the -fwrapv compiler option, signed integer overflow may arbitrarily disrupt program behavior, possibly leading to memory corruption, even in cases where the result of the computation that overflowed would never be used. Reading through the published Rationale, it seems unlikely that the authors of the Standard would have expected a general-purpose compiler for a commonplace platform to behave in such fashion, but the Standard makes no attempt to anticipate and forbid all the gratuitously nonsensical ways implementations might process things.

Threshold an absolute value

I have the following function:
char f1( int a, unsigned b ) { return abs(a) <= b; }
For execution speed, I want to rewrite it as follows:
char f2( int a, unsigned b ) { return (unsigned)(a+b) <= 2*b; } // redundant cast
Or alternatively with this signature that could have subtle implications even for non-negative b:
char f3( int a, int b ) { return (unsigned)(a+b) <= 2*b; }
Both of these alternatives work under a simple test on one platform, but I need them to be portable. Assuming non-negative b and no risk of overflow, is this a valid optimization for typical hardware and C compilers? Is it also valid for C++?
Note: As C++ on gcc 4.8 x86_64 with -O3, f1() uses 6 machine instructions and f2() uses 4. The instructions for f3() are identical to those for f2(). Also of interest: if b is given as a literal, both functions compile to 3 instructions that directly map to the operations specified in f2().
Starting with the original code with signature
char f2( int a, unsigned b );
this contains the expression
a + b
Since one of these operands has a signed and the other a (corresponding) unsigned integer type (thus they have the same "integer conversion rank"), then, following the "usual arithmetic conversions" (§ 6.3.1.8), the operand with the signed integer type is converted to the unsigned type of the other operand.
Conversion to an unsigned integer type is well defined, even if the value in question cannot be represented by the new type:
[..] if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60
§ 6.3.1.3/2
Footnote 60 just says that the described arithmetic works with the mathematical value, not the typed one.
Now, with the updated code
char f2_updated( int a, int b ); // called f3 in the question
things would look different. But since b is assumed to be non-negative, and assuming that INT_MAX <= UINT_MAX, you can convert b to unsigned without fear of it having a different mathematical value afterwards. Thus you could write
char f2_updated( int a, int b ) {
    return f2(a, (unsigned)b); // cast unnecessary but to make it clear
}
Looking again at f2 the expression 2*b further limits the allowed range of b to be not larger than UINT_MAX/2 (otherwise the mathematical result would be wrong).
So as long as you stay within these bounds, everything is fine.
Note: Unsigned types do not overflow, they "wrap" according to modular arithmetic.
Quotes from N1570 (a C11 working draft)
A final remark:
IMO the only really reasonable choice to write this function is as
#include <stdbool.h>
#include <assert.h>
#include <limits.h>

bool abs_bounded(int value, unsigned bound) {
    assert(bound <= (UINT_MAX / 2));
    /* NOTE: Casting to unsigned makes the implicit conversion that
       otherwise would happen explicit. */
    return ((unsigned)value + bound) <= (2 * bound);
}
Using a signed type for the bound does not make much sense, because the absolute value of a number can never be less than or equal to a negative number: abs_bounded(value, something_negative) would always be false. If there's the possibility of a negative bound, then I'd catch this outside of this function (otherwise it does "too much"), like:
int some_bound;
// ...
if ((some_bound >= 0) && abs_bounded(my_value, some_bound)) {
    // yeeeha
}
As OP wants fast and portable code (and b is positive), it first makes sense to code safely:
// return abs(a) <= b;
inline bool f1_safe(int a, unsigned b) {
    return (a >= 0 && a <= b) || (a < 0 && 0u - a <= b);
}
This works for all a,b (assuming UINT_MAX > INT_MAX). Next, compare alternatives using an optimized compile (let the compiler do what it does best).
The following slight variation on OP's code will work in C/C++ but risks portability issues unless "Assuming non-negative b and no risk of overflow" can be certain on all target machines.
bool f2(int a, unsigned b) { return a+b <= b*2; }
In the end, OP's goal of fast and portable code may end up with code that works optimally for the selected platform but not for others; such is micro-optimization.
To determine if the 2 expressions are equivalent for your purpose, you must study the domain of definition:
abs(a) <= b is defined for all values of int a and unsigned b, with just one special case for a = INT_MIN. On 2's complement architectures, abs(INT_MIN) is not defined but most likely evaluates to INT_MIN, which, converted to unsigned as required for the <= comparison with an unsigned value, yields the correct value.
(unsigned)(a+b) <= 2*b may produce a different result for b > UINT_MAX/2. For example, it will evaluate to false for a = 1 and b = UINT_MAX/2+1. There might be more cases where your alternate formula gives an incorrect result.
EDIT: OK, the question was edited... and b is now an int.
Note that a+b invokes undefined behavior in case of overflow, and the same goes for 2*b. So you make the assumption that neither a+b nor 2*b overflows. Furthermore, if b is negative, your little trick does not work.
If a is in the range -INT_MAX/2..INT_MAX/2 and b in the range 0..INT_MAX/2, it seems to function as expected. The behavior is identical in C and C++.
Whether it is an optimization depends completely on the compiler, command line options, hardware capabilities, surrounding code, inlining, etc. You already address this part and tell us that you shave one or two instructions... Just remember that this kind of micro-optimization is not absolute. Even counting instructions does not necessarily help find the best performance. Did you perform some benchmarks to measure if this optimization is worthwhile? Is the difference even measurable?
Micro-optimizing such a piece of code is self-defeating: it makes the code less readable and potentially incorrect. b might not be negative in the current version, but if the next maintainer changes that, he/she might not see the potential implications.
Yes, this is portable to compliant platforms. The conversion from signed to unsigned is well defined:
Conversion between signed integer and unsigned integer
int to unsigned int conversion
Signed to unsigned conversion in C - is it always safe?
The description in the C spec is a bit contrived:
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.
The C++ spec addresses the same conversion in a more sensible way:
In a two's complement representation, this conversion is conceptual
and there is no change in the bit pattern
In the question, f2() and f3() achieve the same results in a slightly different way.
In f2() the presence of the unsigned operand causes a conversion of the signed operand as required here for C++. The unsigned addition may-or-may-not then result in a wrap-around past zero, which is also well defined [citation needed].
In f3() the addition occurs in signed representation with no trickiness, and then the result is (explicitly) converted to unsigned. So this is slightly simpler than f2() (and also more clear).
In both cases, you end up with the same unsigned representation of the sum, which can then be compared (as unsigned) to 2*b. And the trick of treating a signed value as an unsigned type allows you to check a two-sided range with only a single comparison. Note also that this is a bit more flexible than using the abs() function since the trick doesn't require that the range be centered around zero.
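A short sketch of that single-comparison range check (in_range is a hypothetical helper; the assumption b <= UINT_MAX/2 from earlier in the thread still applies so that 2*b does not wrap):
#include <cassert>

// abs(a) <= b  <=>  -b <= a <= b  <=>  0 <= a + b <= 2*b, evaluated as unsigned
static bool in_range(int a, unsigned b) {
    return (unsigned)a + b <= 2 * b;
}

int main() {
    assert( in_range(  50, 100));
    assert( in_range(-100, 100));
    assert(!in_range(-101, 100));
    assert(!in_range( 101, 100));
}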
Commentary on the "usual arithmetic conversions"
I think this question demonstrated that using unsigned types is generally a bad idea. Look at the confusion it caused here.
It can be tempting to use unsigned for documentation purposes (or to take advantage of the shifted value range), but due to the conversion rules, this may tend to be a mistake. In my opinion, the "usual arithmetic conversions" are not sensible if you assume that arithmetic is more likely to involve negative values than to overflow signed values.
I asked this followup question to clarify the point: mixed-sign integer math depends on variable size. One new thing that I have learned is that mixed-sign operations are not generally portable because the conversion type will depend on the size relative to that of int.
In summary: Using type declarations or casts to perform unsigned operations is a low-level coding style that should be approached with the requisite caution.

Why is using the value of int i; undefined behaviour but using the value of rand() is not?

If I don't initialise i, I don't know its value, but also I cannot predict the value of rand().
But on the other hand, I know the value of the uninitialised i is between INT_MIN and INT_MAX, also I know the value of rand() is between 0 and RAND_MAX.
Why is using the value of the uninitialised i undefined behaviour but using the value of rand() is not?
The value of an uninitialized variable is undefined. The return value of rand() is well-defined as being the next number in the pseudo-random sequence for the given seed.
You can rely on rand() returning a pseudorandom number. You cannot rely on any characteristic of the value of an uninitialized int i.
I looked up the C99 standard (ISO/IEC 9899:1999), which is the only standard document I have available, but I seriously doubt these things have changed. In chapter 6.2.6 Representations of types, it is stated that integers are allowed to be stored in memory with padding bits, the value of which is unspecified but may include parity bits, which would be set upon initialization and any arithmetic operation on the integer. Certain representations (like e.g. a parity mismatch) could be trap representations, the behaviour of which is undefined (but might well terminate your program).
So, no, you cannot even rely on an uninitialized int i to be INT_MIN <= i <= INT_MAX.
The standard says reading from an uninitialized variable is undefined behaviour, so it is. You cannot say that it is between INT_MIN and INT_MAX. In fact, you cannot really think of that variable as holding any value at all, so you couldn't even check that hypothesis*.
rand() on the other hand is designed to produce a random number within a range. If reading from the result of rand() were undefined, it wouldn't be part of any library because it couldn't be used.
* Usually "undefined behaviour" provides scope for optimization. The optimizer can do whatever it wants with an uninitialized variable, working under the assumption that it isn't read from.
The value of rand() is defined to be (pseudo-)random. The value of an uninitialized variable is not defined to be anything. That is the difference: whether something is defined to be anything meaningful (and rand() is meaningful: it gives (pseudo-)random numbers) or undefined.
The short answer, like others have said, is because the standard says so.
The act of accessing the value of a variable (described in the standard as performing an lvalue to rvalue conversion) that is uninitialised gives undefined behaviour.
rand() is specified as giving a value between 0 and RAND_MAX, where RAND_MAX (a macro declared in <cstdlib> for C++, and <stdlib.h> for C) is specified as having a value which is at least 32767.
Consider variable x in the following code:
uint32_t blah(uint32_t q, uint32_t r)
{
    uint16_t x;
    if (q)
        x = foo(q); // Assume return type of foo() is uint16_t
    if (r)
        x = bar(r); // Assume return type of bar() is uint16_t
    return x;
}
Since the code never assigns x a value outside the range 0-65535, a compiler for a 32-bit processor could legitimately allocate a 32-bit register for it. Indeed, since the value of q is never used after the first time x is written, a compiler could use the same register to hold x as had been used to hold q. Ensuring that the function would always return a value in the range 0-65535 would require adding some extra instructions compared with simply letting it return whatever happened to be in the register allocated to x.
Note that from the point of view of the Standard authors, if the Standard isn't going to require that the compiler make x hold a value from 0-65535, it may as well not specify anything about what may happen if code tries to use x. Implementations that wish to offer any guarantees about behavior in such cases are free to do so, but the Standard imposes no requirements.
rand() uses an algorithm to generate your number, so it is not undefined. If, when you seed the random generation with srand, you choose a fixed value like 42, you will always get the same sequence each time you launch your app.
Try running this example and you will see:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    srand(42);
    printf("%d\n", rand());
    return 0;
}
see also: https://en.wikipedia.org/wiki/List_of_random_number_generators