What if I have something like this:
int a = 20;
int min = INT_MIN;
if(-a - min)
//do something
Assume that the magnitude of INT_MIN is greater than INT_MAX. Would min ever be converted by the compiler to something like -min, as in -INT_MIN, which could be undefined?
You are right that unary minus applied to INT_MIN can be undefined, but this does not happen in your example.
-a - min is parsed as (-a) - min. Variable min is only involved in binary subtraction, and the first operand only needs to be strictly negative for the result to be defined.
If the compiler transforms the subtraction to something else, it is its responsibility to ensure that the new version always computes the same thing as the old version.
The result of x - y is defined as the mathematical result of subtracting y from x. If the mathematical result can be represented in the result type (int in this case), then there is no overflow.
A compiler is free to transform the expression in any way it likes, such as by changing
x - y
to
x + (-y)
but only if the transformation keeps the same behavior in every case where the original behavior is well defined. When y == INT_MIN, the original x - y is well defined, so the compiler may only use x + (-y) if, on the target, evaluating -INT_MIN happens to produce the same end result (which it typically does, e.g. under two's-complement wraparound).
To answer the question in the title:
Is INT_MIN subtracted from any integer considered undefined behavior?
Not in general. INT_MIN - INT_MIN == 0, which cannot overflow. (But, for example, 0 - INT_MIN does overflow whenever -INT_MIN exceeds INT_MAX, so subtracting INT_MIN is not always safe.)
Incidentally, I think you mean int rather than "integer". int is just one of several integer types.
Related
Let's assume the largest number an int variable can hold is 10. Consider the following situation:
int main()
{
    int r1 = 10;
    int r2 = 1;
    int x = r1 + r2;
}
According to my current, limited knowledge, the r1 + r2 expression creates a temporary variable to hold the result before that value is copied to x.
What I want to know is this: since the largest value x can hold is 10, I know (it's a guess, actually) that if I print x, I get 10. But what about r1 + r2? Does the temporary variable that represents the result of r1 + r2 also hold 10?
In other words, does this temporary variable also have a largest value it can hold?
This is probably a noob question, and I apologise.
Please Note:
I asked this question based on what I thought overflowing is. That is, I thought that when a variable reaches a state where (let's say, for an integer) adding one more to its value would overflow it, the maximum value it holds would stay the same regardless of me increasing it. But that's apparently not the case: the behaviour on overflow is undefined for most types. Check #bolov's answer.
Signed integers
Computing a value larger than the maximum value or smaller than the minimum value of a signed integer type is called "overflow" and is Undefined Behavior.
E.g.:
#include <limits>

int a = std::numeric_limits<int>::max();
int b = 1;
a + b; // undefined behavior: the mathematical result does not fit in int
The above program has Undefined Behavior because the type of a + b is int and the value computed would overflow.
§ 8 Expressions [expr]
§ 8.1 Preamble [expr.pre]
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for
its type, the behavior is undefined.
Unsigned integers
Unsigned integers do not overflow because they are always computed in modulo arithmetic.
unsigned a = std::numeric_limits<unsigned>::max();
a + 1; // guaranteed to be 0
unsigned b = 0;
b - 1; // guaranteed to be std::numeric_limits<unsigned>::max();
§6.9.1 Fundamental types [basic.fundamental]
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n
is the number of bits in the value representation of that particular
size of integer. 49)
49) This implies that unsigned arithmetic does not overflow because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting unsigned integer type.
By the way, you cannot even be sure that a temporary is created at all. It all depends on the compiler, compiler options, and so on. For instance, in some circumstances the compiler can simply compute the value of the r-value at compile time (if it is evident and computable at that point) and put the calculated value straight into the variable.
For your example it is obvious that r1 + r2 == 11, so x might be constructed directly with the value 11. And even that doesn't mean x will definitely be constructed and a constructor called for it.
Once, while debugging, I saw that a variable I had declared (and defined) was not created at all, along with some calculations I had. That was because I didn't use the variable in any meaningful way and had set optimization to the highest level.
//Code here
long a = 42;
if(a > INT_MAX + 1)
When I do this comparison, a > INT_MAX + 1 actually returns true, which confuses me.
The reason seems to be that INT_MAX + 1 overflows. But why? INT_MAX should just be a macro defined as a constant like 2^31 - 1, so INT_MAX + 1 should just be another constant, 2^31. And since a is long, the compiler should implicitly convert INT_MAX + 1 to long, which is longer than int and would not overflow.
I cannot understand why it is actually overflowed.
Could anybody help me? Thanks a lot.
therefore INT_MAX + 1 should be just another constant value
It is an arithmetic expression. More specifically, it is an addition operation. The addition overflows and behaviour of the program is undefined.
therefore during compiling the compiler should also implicitly convert the INT_MAX + 1 to long type
It does. But the conversion of the result happens after the operation.
You can fix the expression by using a - 1 > INT_MAX, although that also has a failure case, when a is LONG_MIN (a - 1 would then overflow). Another approach is to convert one of the operands of the addition to a larger type (if a larger type exists on the system).
You can do:
(long long)INT_MAX + 1
in order to treat the values as 64-bit BEFORE the addition takes place, avoiding the overflow.
Keep in mind that long is 32-bit on some compilers (e.g. MSVC). long long is guaranteed to be at least 64 bits.
INT_MAX + 1 is evaluated as an int before the comparison. It overflows and causes undefined behavior. Some implementations evaluate it to be -1 using wrap around logic. In some cases, that can be useful. You can read more about it at https://en.wikipedia.org/wiki/Integer_overflow.
If sizeof(long) is greater than sizeof(int) on your platform, you can get the expected result by using
if(a > INT_MAX + 1L)
Another option is simply to create a variable of a wider type and add 1 after the conversion. Use long long rather than long, since plain long is not wider than int on every platform. Here is the code for that:
long long a = 42;
long long b = INT_MAX;
b = b + 1; // safe: b is long long, so this cannot overflow
if(a > b){
cout<<"long long greater"<<b;
}
I need to calculate at compile-time the number of bits needed to represent a range.
For an unsigned range from 0 to n it is simple:
constexpr unsigned bits_to_represent(uintmax_t n)
{
return n > 0
? 1 + bits_to_represent(n/2)
: 0;
}
For a signed range, I have:
constexpr unsigned bits_in_range(intmax_t min,intmax_t max)
{
return bits_to_represent(max >= 0
? static_cast<uintmax_t>(max) - min
: max - min);
}
However this causes MSVC 2015 (recently updated) to complain:
warning C4308: negative integral constant converted to unsigned type
Can you explain why this happens? As a workaround, I static_cast min to uintmax_t, but I do not like this solution, as it seems less portable than my preferred solution and is probably even undefined behaviour, even though I am sceptical that that can happen at compile time.
I'm not sure exactly why MSVC is giving a warning, but one thing that you are doing that could cause bad behavior is mixing signed and unsigned integers in arithmetic operations and comparisons.
You can read this for examples of problems caused by this: http://blog.regehr.org/archives/268
I would try rewriting your function like this:
constexpr unsigned bits_in_range(intmax_t min,intmax_t max)
{
return bits_to_represent(
static_cast<uintmax_t>(max) - static_cast<uintmax_t>(min));
}
This way is more programmer friendly. When you do arithmetic operations on mismatched integer types, the compiler is going to have to do implicit conversions to make them match. This way, it doesn't have to do that. Even if max and min are negative, this will still give well-defined and correct results, if you are sure that max >= min.
Do it in four cases, according to whether each of min and max is at least zero.
If they share the same sign (counting 0 as positive), two's-complement integers can have their difference represented within their own type.
That leaves the max < min case, and the case where max is positive and min is negative.
If we assume uintmax_t is big enough, arithmetic in that type, and conversion to it, behaves according to math modulo 2^n.
So unsigned(a) - unsigned(b) will actually be the unsigned distance to get from b to a as signed integers.
C = (A - B) mod X
C = A - B + kX
B + C = A + kX
With C non-negative and less than X, and X larger than A - B, C must be the delta.
Thank you for your comments even though they did not explain the Microsoft warning. Clang compiles cleanly, so it might be a bug in the compiler.
Due to the nature of conversion from signed to unsigned values in C++ the correct answer will be obtained by simply casting both values (again assuming that min <= max):
constexpr unsigned bits_in_range(intmax_t min,intmax_t max)
{
return bits_to_represent(static_cast<uintmax_t>(max) -
static_cast<uintmax_t>(min));
}
The validity of the code can be inferred from this part of the draft standard (I looked at the newest draft but am confident that there has not been a change here).
4.7 Integral conversions [conv.integral]
If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2^n where n is
the number of bits used to represent the unsigned type).
How can I portably find out the smallest of INT_MAX and abs(INT_MIN)? (That's the mathematical absolute value of INT_MIN, not a call to the abs function.)
It should be the same as INT_MAX on most systems, but I'm looking for a more portable way.
While the typical value of INT_MIN is -2147483648, and the typical value of INT_MAX is 2147483647, it is not guaranteed by the standard. TL;DR: The value you're searching for is INT_MAX in a conforming implementation. But calculating min(INT_MAX, abs(INT_MIN)) isn't portable.
The possible values of INT_MIN and INT_MAX
INT_MIN and INT_MAX are defined in Annex E (Implementation limits), paragraph 1, of the C standard (C++ inherits this stuff):
The contents of the header <limits.h> are given below, in alphabetical
order. The minimum magnitudes shown shall be replaced by
implementation-defined magnitudes with the same sign. The values shall
all be constant expressions suitable for use in #if preprocessing
directives. The components are described further in 5.2.4.2.1.
[...]
#define INT_MAX +32767
#define INT_MIN -32767
[...]
The standard requires the type int to be an integer type that can represent the range [INT_MIN, INT_MAX] (section 5.2.4.2.1.).
Then, 6.2.6.2. (Integer types, again part of the C standard), comes into play and further restricts this to what we know as two's or ones' complement:
For signed integer types, the bits of the object representation shall be divided into three
groups: value bits, padding bits, and the sign bit. There need not be any padding bits;
signed char shall not have any padding bits. There shall be exactly one sign bit.
Each bit that is a value bit shall have the same value as the same bit in the object
representation of the corresponding unsigned type (if there are M value bits in the signed
type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the
following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value −(2^M) (two's complement);
— the sign bit has the value −(2^M − 1) (ones' complement).
Section 6.2.6.2. is also very important to relate the value representation of the signed integer types with the value representation of its unsigned siblings.
This means, you either get the range [-(2^n - 1), (2^n - 1)] or [-2^n, (2^n - 1)], where n is typically 15 or 31.
Operations on signed integer types
Now for the second thing: for operations on signed integer types that result in a value not within the range [INT_MIN, INT_MAX], the behavior is undefined. This is explicitly mandated in C++ by paragraph 5/4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of
representable values for its type, the behavior is undefined.
For C, 6.5/5 offers a very similar passage:
If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
So what happens if the value of INT_MIN happens to be less than the negative of INT_MAX (e.g. -32768 and 32767 respectively)? Calculating -(INT_MIN) will be undefined, the same as INT_MAX + 1.
So we need to avoid ever calculating a value that may not be in the range [INT_MIN, INT_MAX]. Luckily, INT_MAX + INT_MIN is always in that range, as INT_MAX is strictly positive and INT_MIN strictly negative; hence INT_MIN < INT_MAX + INT_MIN < INT_MAX.
Now we can check whether INT_MAX + INT_MIN is equal to, less than, or greater than 0.
INT_MAX + INT_MIN | value of -INT_MIN   | value of -INT_MAX
------------------+---------------------+--------------------
       < 0        | undefined           | -INT_MAX
       = 0        | INT_MAX = -INT_MIN  | -INT_MAX = INT_MIN
       > 0        | cannot occur according to 6.2.6.2. of the C standard
Hence, to determine the minimum of INT_MAX and -INT_MIN (in the mathematical sense), the following code is sufficient:
if ( INT_MAX + INT_MIN == 0 )
{
return INT_MAX; // or -INT_MIN, it doesn't matter
}
else if ( INT_MAX + INT_MIN < 0 )
{
return INT_MAX; // INT_MAX is smaller, -INT_MIN cannot be represented.
}
else // ( INT_MAX + INT_MIN > 0 )
{
return -INT_MIN; // -INT_MIN is actually smaller than INT_MAX, may not occur in a conforming implementation.
}
Or, to simplify:
return (INT_MAX + INT_MIN <= 0) ? INT_MAX : -INT_MIN;
The values in a ternary operator will only be evaluated if necessary. Hence, -INT_MIN is either left unevaluated (therefore cannot produce UB), or is a well-defined value.
Or, if you want an assertion:
assert(INT_MAX + INT_MIN <= 0);
return INT_MAX;
Or, if you want that at compile time:
static_assert(INT_MAX + INT_MIN <= 0, "non-conforming implementation");
return INT_MAX;
Getting integer operations right (i.e. if correctness matters)
If you're interested in safe integer arithmetic, have a look at my implementation of safe integer operations. If you want to see the patterns (rather than this lengthy text output) on which operations fail and which succeed, choose this demo.
Depending on the architecture, there may be other options to ensure correctness, such as gcc's option -ftrapv.
INT_MAX + INT_MIN < 0 ? INT_MAX : -INT_MIN
Edited to add explanation: Of course the difficulty is that -INT_MIN or abs(INT_MIN) will be undefined if -INT_MIN is too big to fit in an int. So we need some way of checking whether this is the case. The condition INT_MAX + INT_MIN < 0 tests whether -INT_MIN is greater than INT_MAX. If it is, then INT_MAX is the smaller of the two absolute values. If not, then INT_MAX is the larger of the two absolute values, and -INT_MIN is the correct answer.
In C99 and above, INT_MAX.
Quoth the spec:
For signed integer types, the bits of the object representation shall be divided into three
groups: value bits, padding bits, and the sign bit. There need not be any padding bits;
signed char shall not have any padding bits. There shall be exactly one sign bit.
Each bit that is a value bit shall have the same value as the same bit in the object
representation of the corresponding unsigned type (if there are M value bits in the signed
type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect
the resulting value. If the sign bit is one, the value shall be modified in one of the
following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value −(2^M) (two’s complement);
the sign bit has the value −(2^M − 1) (ones’ complement).
(Section 6.2.6.2 of http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf)
On most systems, abs (INT_MIN) is not defined. For example, on typical 32 bit machines, INT_MAX = 2^31 - 1, INT_MIN = - 2^31, and abs (INT_MIN) cannot be 2^31.
-INT_MAX is representable as an int in all C and C++ dialects, as far as I know. Therefore:
-INT_MAX <= INT_MIN ? -INT_MIN : INT_MAX
abs(INT_MIN) will invoke undefined behavior. Standard says
7.22.6.1 The abs, labs and llabs functions:
The abs, labs, and llabs functions compute the absolute value of an integer j. If the result cannot be represented, the behavior is undefined.
Try this instead:
Convert INT_MIN to unsigned int. Since negative numbers can't be represented as an unsigned int, INT_MIN will be converted to UINT_MAX + 1 + INT_MIN.
#include <stdio.h>
#include <limits.h>

unsigned min(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

int main(void)
{
    printf("%u\n", min(INT_MAX, INT_MIN));
}
If I type:
int main() { return 0 % 0; }
I get back an error:
error C2124: divide or mod by zero
What is the reason behind this? Isn't the answer zero?
In mathematics, x mod 0 is undefined, hence the error.
From C++ standard, section 5.5:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [...] Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function.
Since remainder of a division by zero is mathematically undefined regardless of the number being divided, the answer is undefined according to the C++ standard.
The mod function is effectively the same as the integer division function, except that it gives you the remainder, rather than the quotient. You can't divide by zero...
(BTW, as an aside, 0/0 is not even infinity, it's indeterminate.)