Threshold an absolute value - c++

I have the following function:
char f1( int a, unsigned b ) { return abs(a) <= b; }
For execution speed, I want to rewrite it as follows:
char f2( int a, unsigned b ) { return (unsigned)(a+b) <= 2*b; } // redundant cast
Or alternatively with this signature that could have subtle implications even for non-negative b:
char f3( int a, int b ) { return (unsigned)(a+b) <= 2*b; }
Both of these alternatives work under a simple test on one platform, but I need it to portable. Assuming non-negative b and no risk of overflow, is this a valid optimization for typical hardware and C compilers? Is it also valid for C++?
Note: As C++ on gcc 4.8 x86_64 with -O3, f1() uses 6 machine instructions and f2() uses 4. The instructions for f3() are identical to those for f2(). Also of interest: if b is given as a literal, both functions compile to 3 instructions that directly map to the operations specified in f2().

Starting with the original code with signature
char f2( int a, unsigned b );
this contains the expression
a + b
Since one of these operands has a signed and the other an (corresponding) unsigned integer type (thus they have the same "integer conversion rank"), then - following the "Usual arithmetic conversions" (§ 6.3.1.8) - the operand with signed integer type is converted to the unsigned type of the other operand.
Conversion to an unsigned integer type is well defined, even if the value in question cannot be represented by the new type:
[..] if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60
§ 6.3.1.3/2
Footnote 60 just says that the described arithmetic works with the mathematical value, not the typed one.
Now, with the updated code
char f2_updated( int a, int b ); // called f3 in the question
things would look different. But since b is assumed to be non-negative, and assuming that INT_MAX <= UINT_MAX you can convert b to an unsigned without fearing it to have a different mathematical value afterwards. Thus you could write
char f2_updated( int a, int b ) {
return f2(a, (unsigned)b); // cast unnecessary but to make it clear
}
Looking again at f2 the expression 2*b further limits the allowed range of b to be not larger than UINT_MAX/2 (otherwise the mathematical result would be wrong).
So as long as you stay within these bounds, every thing is fine.
Note: Unsigned types do not overflow, they "wrap" according to modular arithmetic.
Quotes from N1570 (a C11 working draft)
A final remark:
IMO the only really reasonable choice to write this function is as
#include <stdbool.h>
#include <assert.h>
bool abs_bounded(int value, unsigned bound) {
assert(bound <= (UINT_MAX / 2));
/* NOTE: Casting to unsigned makes the implicit conversion that
otherwise would happen explicit. */
return ((unsigned)value + bound) <= (2 * bound);
}
Using a signed type for the bound does not make much sense, because the absolute of a value cannot be less than a negative number. abs_bounded(value, something_negative) would be always false. If there's the possibility of a negative bound, then I'd catch this outside of this function (otherwise it does "too much"), like:
int some_bound;
// ...
if ((some_bound >= 0) && abs_bounded(my_value, some_bound)) {
// yeeeha
}

As OP wants fast and portable code (and b is positive), it first makes sense to code safely:
// return abs(a) <= b;
inline bool f1_safe(int a, unsigned b ) {
return (a >= 0 && a <= b) || (a < 0 && 0u - a <= b);
}
This works for all a,b (assuming UINT_MAX > INT_MAX). Next, compare alternatives using an optimized compile (let the compiler do what it does best).
The following slight variation on OP's code will work in C/C++ but risks portability issues unless "Assuming non-negative b and no risk of overflow" can be certain on all target machines.
bool f2(int a, unsigned b) { return a+b <= b*2; }
In the end, OP goal of fast and portable code may find code the works optimally for the select platform, but not with others - such is micro-optimization.

To determine if the 2 expressions are equivalent for your purpose, you must study the domain of definition:
abs(a) <= b is defined for all values of int a and unsigned b, with just one special case for a = INT_MIN;. On 2s complement architectures, abs(INT_MIN) is not defined but most likely evaluates to INT_MIN, which converted to unsigned as required for the <= with an unsigned value, yields the correct value.
(unsigned)(a+b) <= 2*b may produce a different result for b > UINT_MAX/2. For example, it will evaluate to false for a = 1 and b = UINT_MAX/2+1. There might be more cases where you alternate formula gives an incorrect result.
EDIT: OK, the question was edited... and b is now an int.
Note that a+b invokes undefined behavior in case of overflow and the same for 2*b. So you make the assumption that neither a+b nor 2*b overflow. Furthermore, if b is negative, you little trick does not work.
If a is in the range -INT_MAX/2..INT_MAX/2 and b in the range 0..INT_MAX/2, it seems to function as expected. The behavior is identical in C and C++.
Whether it is an optimization depends completely on the compiler, command line options, hardware capabilities, surrounding code, inlining, etc. You already address this part and tell us that you shave one or two instructions... Just remember that this kind of micro-optimization is not absolute. Even counting instructions does not necessarily help find the best performance. Did you perform some benchmarks to measure if this optimization is worthwhile? Is the difference even measurable?
Micro-optimizing such a piece of code is self-defeating: it makes the code less readable and potentially incorrect. b might not be negative in the current version, but if the next maintainer changes that, he/she might not see the potential implications.

Yes, this is portable to compliant platforms. The conversion from signed to unsigned is well defined:
Conversion between signed integer and unsigned integer
int to unsigned int conversion
Signed to unsigned conversion in C - is it always safe?
The description in the C spec is a bit contrived:
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.
The C++ spec addresses the same conversion in a more sensible way:
In a two's complement representation, this conversion is conceptual
and there is no change in the bit pattern
In the question, f2() and f3() achieve the same results in a slightly different way.
In f2() the presence of the unsigned operand causes a conversion of the signed operand as required here for C++. The unsigned addition may-or-may-not then result in a wrap-around past zero, which is also well defined [citation needed].
In f3() the addition occurs in signed representation with no trickiness, and then the result is (explicitly) converted to unsigned. So this is slightly simpler than f2() (and also more clear).
In both cases, the you end up with the same unsigned representation of the sum, which can then be compared (as unsigned) to 2*b. And the trick of treating a signed value as an unsigned type allows you to check a two-sided range with only a single comparison. Note also that this is a bit more flexible than using the abs() function since the trick doesn't require that the range be centered around zero.
Commentary on the "usual arithmetic conversions"
I think this question demonstrated that using unsigned types is generally a bad idea. Look at the confusion it caused here.
It can be tempting to use unsigned for documentation purposes (or to take advantage of the shifted value range), but due to the conversion rules, this may tend to be a mistake. In my opinion, the "usual arithmetic conversions" are not sensible if you assume that arithmetic is more likely to involve negative values than to overflow signed values.
I asked this followup question to clarify the point: mixed-sign integer math depends on variable size. One new thing that I have learned is that mixed-sign operations are not generally portable because the conversion type will depend on the size relative to that of int.
In summary: Using type declarations or casts to perform unsigned operations is a low-level coding style that should be approached with the requisite caution.

Related

Why is the sign different after subtracting unsigned and signed?

unsigned int t = 10;
int d = 16;
float c = t - d;
int e = t - d;
Why is the value of c positive but e negative?
Let's start by analysing the result of t - d.
t is an unsigned int while d is an int, so to do arithmetic on them, the value of d is converted to an unsigned int (C++ rules say unsigned gets preference here). So we get 10u - 16u, which (assuming 32-bit int) wraps around to 4294967290u.
This value is then converted to float in the first declaration, and to int in the second one.
Assuming the typical implementation of float (32-bit single-precision IEEE), its highest representable value is roughly 1e38, so 4294967290u is well within that range. There will be rounding errors, but the conversion to float won't overflow.
For int, the situation's different. 4294967290u is too big to fit into an int, so wrap-around happens and we arrive back at the value -6. Note that such wrap-around is not guaranteed by the standard: the resulting value in this case is implementation-defined(1), which means it's up to the compiler what the result value is, but it must be documented.
(1) C++17 (N4659), [conv.integral] 7.8/3:
If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
First, you have to understand "usual arithmetic conversions" (that link is for C, but the rules are the same in C++). In C++, if you do arithmetic with mixed types (you should avoid that when possible, by the way), there's a set of rules that decides which type the calculation is done in.
In your case, you are subtracting a signed int from an unsigned int. The promotion rules say that the actual calculation is done using unsigned int.
So your calculation is 10 - 16 in unsigned int arithmetic. Unsigned arithmetic is modulo arithmetic, meaning that it wraps around. So, assuming your typical 32-bit int, the result of this calculation is 2^32 - 6.
This is the same for both lines. Note that the subtraction is completely independent from the assignment; the type on the left side has absolutely no influence on how the calculation happens. It is a common beginner mistake to think that the type on the left side somehow influences the calculation; but float f = 5 / 6 is zero, because the division still uses integer arithmetic.
The difference, then, is what happens during the assignment. The result of the subtraction is implicitly converted to float in one case, and int in the other.
The conversion to float tries to find the closest value to the actual one that the type can represent. This will be some very large value; not quite the one the original subtraction yielded though.
The conversion to int says that if the value fits into the range of int, the value will be unchanged. But 2^32 - 6 is far larger than the 2^31 - 1 that a 32-bit int can hold, so you get the other part of the conversion rule, which says that the resulting value is implementation-defined. This is a term in the standard that means "different compilers can do different things, but they have to document what they do".
For all practical purposes, all compilers that you'll likely encounter say that the bit pattern stays the same and is just interpreted as signed. Because of the way 2's complement arithmetic works (the way that almost all computers represent negative numbers), the result is the -6 you would expect from the calculation.
But all this is a very long way of repeating the first point, which is "don't do mixed type arithmetic". Cast the types first, explicitly, to types that you know will do the right thing.

Is masking before unsigned left shift in C/C++ too paranoid?

This question is motivated by me implementing cryptographic algorithms (e.g. SHA-1) in C/C++, writing portable platform-agnostic code, and thoroughly avoiding undefined behavior.
Suppose that a standardized crypto algorithm asks you to implement this:
b = (a << 31) & 0xFFFFFFFF
where a and b are unsigned 32-bit integers. Notice that in the result, we discard any bits above the least significant 32 bits.
As a first naive approximation, we might assume that int is 32 bits wide on most platforms, so we would write:
unsigned int a = (...);
unsigned int b = a << 31;
We know this code won't work everywhere because int is 16 bits wide on some systems, 64 bits on others, and possibly even 36 bits. But using stdint.h, we can improve this code with the uint32_t type:
uint32_t a = (...);
uint32_t b = a << 31;
So we are done, right? That's what I thought for years. ... Not quite. Suppose that on a certain platform, we have:
// stdint.h
typedef unsigned short uint32_t;
The rule for performing arithmetic operations in C/C++ is that if the type (such as short) is narrower than int, then it gets widened to int if all values can fit, or unsigned int otherwise.
Let's say that the compiler defines short as 32 bits (signed) and int as 48 bits (signed). Then these lines of code:
uint32_t a = (...);
uint32_t b = a << 31;
will effectively mean:
unsigned short a = (...);
unsigned short b = (unsigned short)((int)a << 31);
Note that a is promoted to int because all of ushort (i.e. uint32) fits into int (i.e. int48).
But now we have a problem: shifting non-zero bits left into the sign bit of a signed integer type is undefined behavior. This problem happened because our uint32 was promoted to int48 - instead of being promoted to uint48 (where left-shifting would be okay).
Here are my questions:
Is my reasoning correct, and is this a legitimate problem in theory?
Is this problem safe to ignore because on every platform the next integer type is double the width?
Is a good idea to correctly defend against this pathological situation by pre-masking the input like this?: b = (a & 1) << 31;. (This will necessarily be correct on every platform. But this could make a speed-critical crypto algorithm slower than necessary.)
Clarifications/edits:
I'll accept answers for C or C++ or both. I want to know the answer for at least one of the languages.
The pre-masking logic may hurt bit rotation. For example, GCC will compile b = (a << 31) | (a >> 1); to a 32-bit bit-rotation instruction in assembly language. But if we pre-mask the left shift, it is possible that the new logic is not translated into bit rotation, which means now 4 operations are performed instead of 1.
Speaking to the C side of the problem,
Is my reasoning correct, and is this a legitimate problem in theory?
It is a problem that I had not considered before, but I agree with your analysis. C defines the behavior of the << operator in terms of the type of the promoted left operand, and it it conceivable that the integer promotions result in that being (signed) int when the original type of that operand is uint32_t. I don't expect to see that in practice on any modern machine, but I'm all for programming to the actual standard as opposed to my personal expectations.
Is this problem safe to ignore because on every platform the next integer type is double the width?
C does not require such a relationship between integer types, though it is ubiquitous in practice. If you are determined to rely only on the standard, however -- that is, if you are taking pains to write strictly conforming code -- then you cannot rely on such a relationship.
Is a good idea to correctly defend against this pathological situation by pre-masking the input like this?: b = (a & 1) << 31;.
(This will necessarily be correct on every platform. But this could
make a speed-critical crypto algorithm slower than necessary.)
The type unsigned long is guaranteed to have at least 32 value bits, and it is not subject to promotion to any other type under the integer promotions. On many common platforms it has exactly the same representation as uint32_t, and may even be the same type. Thus, I would be inclined to write the expression like this:
uint32_t a = (...);
uint32_t b = (unsigned long) a << 31;
Or if you need a only as an intermediate value in the computation of b, then declare it as an unsigned long to begin with.
Q1: Masking before the shift does prevent undefined behavior that OP has concern.
Q2: "... because on every platform the next integer type is double the width?" --> no. The "next" integer type could be less than 2x or even the same size.
The following is well defined for all compliant C compilers that have uint32_t.
uint32_t a;
uint32_t b = (a & 1) << 31;
Q3: uint32_t a; uint32_t b = (a & 1) << 31; is not expected to incur code that performs a mask - it is not needed in the executable - just in the source. If a mask does occur, get a better compiler should speed be an issue.
As suggested, better to emphasize the unsigned-ness with these shifts.
uint32_t b = (a & 1U) << 31;
#John Bollinger good answer well details how to handle OP's specific problem.
The general problem is how to form a number that is of at least n bits, a certain sign-ness and not subject to surprising integer promotions - the core of OP's dilemma. The below fulfills this by invoking an unsigned operation that does not change the value - effective a no-op other than type concerns. The product will be at least the width of unsigned or uint32_t. Casting, in general, may narrow the type. Casting needs to be avoided unless narrowing is certain to not occur. An optimization compiler will not create unnecessary code.
uint32_t a;
uint32_t b = (a + 0u) << 31;
uint32_t b = (a*1u) << 31;
Taking a clue from this question about possible UB in uint32 * uint32 arithmetic, the following simple approach should work in C and C++:
uint32_t a = (...);
uint32_t b = (uint32_t)((a + 0u) << 31);
The integer constant 0u has type unsigned int. This promotes the addition a + 0u to uint32_t or unsigned int, whichever is wider. Because the type has rank int or higher, no more promotion occurs, and the shift can be applied with the left operand being uint32_t or unsigned int.
The final cast back to uint32_t will just suppress potential warnings about a narrowing conversion (say if int is 64 bits).
A decent C compiler should be able to see that adding zero is a no-op, which is less onerous than seeing that a pre-mask has no effect after an unsigned shift.
To avoid unwanted promotion, you may use the greater type with some typedef, as
using my_uint_at_least32 = std::conditional_t<(sizeof(std::uint32_t) < sizeof(unsigned)),
unsigned,
std::uint32_t>;
For this segment of code:
uint32_t a = (...);
uint32_t b = a << 31;
To promote a to a unsigned type instead of signed type, use:
uint32_t b = a << 31u;
When both sides of << operator is an unsigned type, then this line in 6.3.1.8 (C standard draft n1570) applies:
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
The problem you are describing is caused you use 31 which is signed int type so another line in 6.3.1.8
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
forces a to promoted to a signed type
Update:
This answer is not correct because 6.3.1.1(2) (emphasis mine):
...
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.58) All other types are unchanged by the integer
promotions.
and footnote 58 (emphasis mine):
58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.
Since only integer promotion is happening and not common arithmetic conversion, using 31u does not guarantee a to be converted to unsigned int as stated above.

Does cast change variable in c++, or only tell compiler its ok

If I have this code:
int A;
unsigned int B;
if (A==B) foo();
the compiler will complain about mixed types in comparison. If I cast A like this:
if ((unsigned int) A==B) foo();
does this instruct the compiler to insert code to convert A from int to unsigned int? Or does it just tell the compiler don't worry about, ignore the type mismatch?
UPDATE: If this is unsafe (as pointed out below), how should I handle this comparison? (Wouldn't assigning the contents of an int to an unsigned int for later comparison also be unsafe)
UPDATE: Wow are there some different answers (from people with thousands of posts). I've accepted what seems like the best, but anyone reading this question should read ALL answers carefully.
When casting, at least at the conceptual level, compiler will create a temporary variable of the type specified in the cast expression.
You may test that this expression:
(unsigned int) A = B; // This time assignment is intended
will generate an error pointing modification of a temporary (const) variable.
Of course compiler is free to optimize away any temporary variables created through a cast. Nevertheless a valid method to build a temporary must exist.
The cast implies a conversion, if necessary. But this is problematic for negative values. They are mapped to positive values on the unsigned type. Thus you have to make sure a negative value never compares equal any (positive) unsigned value:
int A;
unsigned int B;
...
if ( (A >= 0) && (static_cast<unsigned int>(A) == B) )
foo();
This works because the unsigned variant of an integer type is guaranteed to hold all positive values (including 0) of the corressponding signed type.
Notice the usage of a static_cast instead of the "classic" C-style cast.
With plain types, in C and C++, == is always done with both operands converted to the same type. In OP's code, A is converted to unsigned first.
If I cast ... does this instruct the compiler to insert code to convert A from int to unsigned int?
Yes, but that code would have occurred anyway. Without the cast, the compiler is simple warning that it is going to do something that the programmer may not have intended.
Or (If I cast ) does it just tell the compiler don't worry about, ignore the type mismatch?
The type mis-match is not ignored. By supplying the cast, there is no type mis-match to warn about.
How should I handle this comparison?
Insure A is not negative, then convert to unsigned with a cast.
int A;
unsigned int B;
// if (A==B) foo();
if (A >= 0 && (unsigned)A == B) foo();
Every non-negative int can be converted to an unsigned with no value change.
The range of nonnegative values of a signed integer type is a subrange of the
corresponding unsigned integer type C11dr §6.2.5 9
So you question is just about a signed/unsigned comparison.
C++ standard says in clause 5 Expressions [expr] § 10:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follow:...
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
and in 4.7 Integral conversions [conv.integral] §2
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s
complement representation, this conversion is conceptual and there is no change in the bit pattern (if there
is no truncation). —end note ]
That means that on a common system using 2-complement for negative numbers and 32 bits for an int or unsigned int, (unsigned int) -1 will end in 4294967295.
It may be what you want or not, the compiler just warn you that it will consider them as equal.
If it is not what you want, just first test whether the signed value is negative. If it is, say that they are not equal an skip the equality comparison.
It depends on type of cast and what you are casting. In your particular case nothing is going to happen, but in other cases the actual code will be performed. Simplest example:
void foo(double d) {};
...
int x;
foo(static_cast<double>(x));
In this example there would be code generated.

Why are unsigned integers error prone?

I was looking at this video. Bjarne Stroustrup says that unsigned ints are error prone and lead to bugs. So, you should only use them when you really need them. I've also read in one of the question on Stack Overflow (but I don't remember which one) that using unsigned ints can lead to security bugs.
How do they lead to security bugs? Can someone clearly explain it by giving an suitable example?
One possible aspect is that unsigned integers can lead to somewhat hard-to-spot problems in loops, because the underflow leads to large numbers. I cannot count (even with an unsigned integer!) how many times I made a variant of this bug
for(size_t i = foo.size(); i >= 0; --i)
...
Note that, by definition, i >= 0 is always true. (What causes this in the first place is that if i is signed, the compiler will warn about a possible overflow with the size_t of size()).
There are other reasons mentioned Danger – unsigned types used here!, the strongest of which, in my opinion, is the implicit type conversion between signed and unsigned.
One big factor is that it makes loop logic harder: Imagine you want to iterate over all but the last element of an array (which does happen in the real world). So you write your function:
void fun (const std::vector<int> &vec) {
for (std::size_t i = 0; i < vec.size() - 1; ++i)
do_something(vec[i]);
}
Looks good, doesn't it? It even compiles cleanly with very high warning levels! (Live) So you put this in your code, all tests run smoothly and you forget about it.
Now, later on, somebody comes along an passes an empty vector to your function. Now with a signed integer, you hopefully would have noticed the sign-compare compiler warning, introduced the appropriate cast and not have published the buggy code in the first place.
But in your implementation with the unsigned integer, you wrap and the loop condition becomes i < SIZE_T_MAX. Disaster, UB and most likely crash!
I want to know how they lead to security bugs?
This is also a security problem, in particular it is a buffer overflow. One way to possibly exploit this would be if do_something would do something that can be observed by the attacker. They might be able to find what input went into do_something, and that way data the attacker should not be able to access would be leaked from your memory. This would be a scenario similar to the Heartbleed bug. (Thanks to ratchet freak for pointing that out in a comment.)
I'm not going to watch a video just to answer a question, but one issue is the confusing conversions which can happen if you mix signed and unsigned values. For example:
#include <iostream>
int main() {
unsigned n = 42;
int i = -42;
if (i < n) {
std::cout << "All is well\n";
} else {
std::cout << "ARITHMETIC IS BROKEN!\n";
}
}
The promotion rules mean that i is converted to unsigned for the comparison, giving a large positive number and a surprising result.
Although it may only be considered as a variant of the existing answers: Referring to "Signed and unsigned types in interfaces," C++ Report, September 1995 by Scott Meyers, it's particularly important to avoid unsigned types in interfaces.
The problem is that it becomes impossible to detect certain errors that clients of the interface could make (and if they could make them, they will make them).
The example given there is:
template <class T>
class Array {
public:
Array(unsigned int size);
...
and a possible instantiation of this class
int f(); // f and g are functions that return
int g(); // ints; what they do is unimportant
Array<double> a(f()-g()); // array size is f()-g()
The difference of the values returned by f() and g() might be negative, for an awful number of reasons. The constructor of the Array class will receive this difference as a value that is implicitly converted to be unsigned. Thus, as the implementor of the Array class, one can not distinguish between an erreonously passed value of -1, and a very large array allocation.
The big problem with unsigned int is that if you subtract 1 from an unsigned int 0, the result isn't a negative number, the result isn't less than the number you started with, but the result is the largest possible unsigned int value.
unsigned int x = 0;
unsigned int y = x - 1;
if (y > x) printf ("What a surprise! \n");
And this is what makes unsigned int error prone. Of course unsigned int works exactly as it is designed to work. It's absolutely safe if you know what you are doing and make no mistakes. But most people make mistakes.
If you are using a good compiler, you turn on all the warnings that the compiler produces, and it will tell you when you do dangerous things that are likely to be mistakes.
The problem with unsigned integer types is that depending upon their size they may represent one of two different things:
Unsigned types smaller than int (e.g. uint8) hold numbers in the range 0..2ⁿ-1, and calculations with them will behave according to the rules of integer arithmetic provided they don't exceed the range of the int type. Under present rules, if such a calculation exceeds the range of an int, a compiler is allowed to do anything it likes with the code, even going so far as to negate the laws of time and causality (some compilers will do precisely that!), and even if the result of the calculation would be assigned back to an unsigned type smaller than int.
Unsigned types unsigned int and larger hold members of the abstract wrapping algebraic ring of integers congruent mod 2ⁿ; this effectively means that if a calculation goes outside the range 0..2ⁿ-1, the system will add or subtract whatever multiple of 2ⁿ would be required to get the value back in range.
Consequently, given uint32_t x=1, y=2; the expression x-y may have one of two meanings depending upon whether int is larger than 32 bits.
If int is larger than 32 bits, the expression will subtract the number 2 from the number 1, yielding the number -1. Note that while a variable of type uint32_t can't hold the value -1 regardless of the size of int, and storing either -1 would cause such a variable to hold 0xFFFFFFFF, but unless or until the value is coerced to an unsigned type it will behave like the signed quantity -1.
If int is 32 bits or smaller, the expression will yield a uint32_t value which, when added to the uint32_t value 2, will yield the uint32_t value 1 (i.e. the uint32_t value 0xFFFFFFFF).
IMHO, this problem could be solved cleanly if C and C++ were to define new unsigned types [e.g. unum32_t and uwrap32_t] such that a unum32_t would always behave as a number, regardless of the size of int (possibly requiring the right-hand operation of a subtraction or unary minus to be promoted to the next larger signed type if int is 32 bits or smaller), while a wrap32_t would always behave as a member of an algebraic ring (blocking promotions even if int were larger than 32 bits). In the absence of such types, however, it's often impossible to write code which is both portable and clean, since portable code will often require type coercions all over the place.
Numeric conversion rules in C and C++ are a byzantine mess. Using unsigned types exposes yourself to that mess to a much greater extent than using purely signed types.
Take for example the simple case of a comparison between two variables, one signed and the other unsigned.
If both operands are smaller than int then they will both be converted to int and the comparison will give numerically correct results.
If the unsigned operand is smaller than the signed operand then both will be converted to the type of the signed operand and the comparison will give numerically correct results.
If the unsigned operand is greater than or equal in size to the signed operand and also greater than or equal in size to int then both will be converted to the type of the unsigned operand. If the value of the signed operand is less than zero this will lead to numerically incorrect results.
To take another example consider multiplying two unsigned integers of the same size.
If the operand size is greater than or equal to the size of int then the multiplication will have defined wraparound semantics.
If the operand size is smaller than int but greater than or equal to half the size of int then there is the potential for undefined behaviour.
If the operand size is less than half the size of int then the multiplication will produce numerically correct results. Assigning this result back to a variable of the original unsigned type will produce defined wraparound semantics.
In addition to range/warp issue with unsigned types. Using mix of unsigned and signed integer types impact significant performance issue for processor. Less then floating point cast, but quite a lot to ignore that. Additionally compiler may place range check for the value and change the behavior of further checks.

Why does this if condition fail for comparison of negative and positive integers [duplicate]

This question already has answers here:
sizeof() operator in if-statement
(5 answers)
Closed 4 years ago.
#include <stdio.h>
int arr[] = {1,2,3,4,5,6,7,8};
#define SIZE (sizeof(arr)/sizeof(int))
int main()
{
printf("SIZE = %d\n", SIZE);
if ((-1) < SIZE)
printf("less");
else
printf("more");
}
The output after compiling with gcc is "more". Why the if condition fails even when -1 < 8?
The problem is in your comparison:
if ((-1) < SIZE)
sizeof typically returns an unsigned long, so SIZE will be unsigned long, whereas -1 is just an int. The rules for promotion in C and related languages mean that -1 will be converted to size_t before the comparison, so -1 will become a very large positive value (the maximum value of an unsigned long).
One way to fix this is to change the comparison to:
if (-1 < (long long)SIZE)
although it's actually a pointless comparison, since an unsigned value will always be >= 0 by definition, and the compiler may well warn you about this.
As subsequently noted by #Nobilis, you should always enable compiler warnings and take notice of them: if you had compiled with e.g. gcc -Wall ... the compiler would have warned you of your bug.
TL;DR
Be careful with mixed signed/unsigned operations (use -Wall compiler warnings). The Standard has a long section about it. In particular, it is often but not always true that signed is value-converted to unsigned (although it does in your particular example). See this explanation below (taken from this Q&A)
Relevant quote from the C++ Standard:
5 Expressions [expr]
10 Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
[2 clauses about equal types or types of equal sign omitted]
— Otherwise, if the operand that has unsigned integer type has rank
greater than or equal to the rank of the type of the other operand,
the operand with signed integer type shall be converted to the type of
the operand with unsigned integer type.
— Otherwise, if the type of
the operand with signed integer type can represent all of the values
of the type of the operand with unsigned integer type, the operand
with unsigned integer type shall be converted to the type of the
operand with signed integer type.
— Otherwise, both operands shall be
converted to the unsigned integer type corresponding to the type of
the operand with signed integer type.
Your actual example
To see into which of the 3 cases your program falls, modify it slightly to this
#include <stdio.h>
int arr[] = {1,2,3,4,5,6,7,8};
#define SIZE (sizeof(arr)/sizeof(int))
int main()
{
printf("SIZE = %zu, sizeof(-1) = %zu, sizeof(SIZE) = %zu \n", SIZE, sizeof(-1), sizeof(SIZE));
if ((-1) < SIZE)
printf("less");
else
printf("more");
}
On the Coliru online compiler, this prints 4 and 8 for the sizeof() of -1 and SIZE, respectively, and selects the "more" branch (live example).
The reason is that the unsigned type is of greater rank than the signed type. Hence, clause 1 applies and the signed type is value-converted to the unsigned type (on most implementation, typically by preserving the bit-representation, so wrapping around to a very large unsigned number), and the comparison then proceeds to select the "more" branch.
Variations on a theme
Rewriting the condition to if ((long long)(-1) < (unsigned)SIZE) would take the "less" branch (live example).
The reason is that the signed type is of greater rank than the unsigned type and can also accomodate all the unsigned values. Hence, clause 2 applies and the unsigned type is converted to the signed type, and the comparison then proceeds to select the "less" branch.
Of course, you would never write such a contrived if() statement with explicit casts, but the same effect could happen if you compare variables with types long long and unsigned. So it illustrates the point that mixed signed/unsigned arithmetic is very subtle and depends on the relative sizes ("ranking" in the words of the Standard). In particular, there is no fixed rules saying that signed will always be converted to unsigned.
When you do comparison between signed and unsigned where unsigned has at least an equal rank to that of the signed type (see TemplateRex's answer for the exact rules), the signed is converted to the type of the unsigned.
With regards to your case, on a 32bit machine the binary representation of -1 as unsigned is 4294967295. So in effect you are comparing if 4294967295 is smaller than 8 (it isn't).
If you had enabled warnings, you would have been warned by the compiler that something fishy is going on:
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Since the discussion has shifted a bit on how appropriate the use of unsigned is, let me put a quote by James Gosling with regards to the lack of unsigned types in Java (and I will shamelessly link to another post of mine on the subject):
Gosling: For me as a language designer, which I don't really count
myself as these days, what "simple" really ended up meaning was could
I expect J. Random Developer to hold the spec in his head. That
definition says that, for instance, Java isn't -- and in fact a lot of
these languages end up with a lot of corner cases, things that nobody
really understands. Quiz any C developer about unsigned, and pretty
soon you discover that almost no C developers actually understand what
goes on with unsigned, what unsigned arithmetic is. Things like that
made C complex. The language part of Java is, I think, pretty simple.
The libraries you have to look up.
This is an historical design bug of C that was also repeated in C++.
It dates back to 16-bit computers and the error was deciding to use all 16 bits to represent sizes up to 65536 giving up the possibility to represent negative sizes.
This in se wouldn't have been an error if unsigned meaning was "non-negative integer" (a size cannot logically be negative) but it's a problem with the conversion rules of the language.
Given the conversion rules of the language the unsigned type in C doesn't represent a non-negative number, but it's instead more like a bitmask (the mathematical term is actually "a member of the ℤ/n ring"). To see why consider that for the C and C++ language
unsigned - unsigned gives an unsigned result
signed + unsigned gives and unsigned result
both of them clearly make no sense at all if you read unsigned as "non-negative number".
Of course saying that the size of an object is a member of ℤ/n ring doesn't make any sense at all and here it's where the error resides.
Practical implications:
Every time you deal with the size of an object be careful because the value is unsigned and that type in C/C++ has a lot of properties that are illogical for a number. Please always remember that unsigned doesn't mean "non-negative integer" but "member of ℤ/n algebraic ring" and that, most dangerous, in case of a mixed operation an int is converted to unsigned int and not the opposite.
For example:
void drawPolyline(const std::vector<P2d>& pts) {
for (int i=0; i<pts.size()-1; i++) {
drawLine(pts[i], pts[i+1]);
}
}
is buggy, because if passed an empty vector of points it will do illegal (UB) operations. The reason is that pts.size() is an unsigned.
The rules of the language will convert 1 (an integer) to 1{mod n}, will perform the subtraction in ℤ/n resulting in (size-1){mod n}, will convert i also to a {mod n} representation and will do the comparison in ℤ/n.
C/C++ actually defines a < operator in ℤ/n (rarely done in math) and you will end up accessing pts[0], pts[1] ... and so on until huge numbers even if the input vector was empty.
A correct loop could be
void drawPolyline(const std::vector<P2d>& pts) {
for (int i=1; i<pts.size(); i++) {
drawLine(pts[i-1], pts[i]);
}
}
but I normally prefer
void drawPolyline(const std::vector<P2d>& pts) {
for (int i=0,n=pts.size(); i<n-1; i++) {
drawLine(pts[i], pts[i+1]);
}
}
in other words getting rid of unsigned as soon as possible, and just working with regular ints.
Never use unsigned to represent size of containers or counters because unsigned means "member of ℤ/n" and the size of a container is not one of those things. Unsigned types are useful, but NOT to represent size of objects.
The standard C/C++ library unfortunately made this wrong choice, and it's too late to fix it. You are not forced to do the same mistake however.
In the words of Bjarne Stroustrup:
Using an unsigned instead of an int to gain one more bit to represent
positive integers is almost never a good idea. Attempts to ensure that
some values are positive by declaring variables unsigned will
typically be defeated by the implicit conversion rules
well, i'm not going to repeat the strong words Paul R said, but when you are comparing unsigned and integers you are going to experience dome bad things.
do if ((-1) < (int)SIZE)
instead of your if condition
Convert the unsigned type returned from sizeof operator to signed
when you compare two unsigned and signed number compiler implicitly converts signed to unsigned.
-1 signed representation in 4 byte int is 11111111 11111111 11111111 11111111 when converted to unsigned this representation would refer to 2^16-1
So basically your are comparing that 2^16-1>SIZE, which would be true.
You have to override that by explicitly casting the unsigned value to signed.
Since sizeof operator returns unsigned long long you should cast it to signed long long
if((-1)<(signed long long)SIZE)
use this if condition in your code