Why are unsigned integers error prone? - c++

I was looking at this video. Bjarne Stroustrup says that unsigned ints are error prone and lead to bugs. So, you should only use them when you really need them. I've also read in one of the questions on Stack Overflow (but I don't remember which one) that using unsigned ints can lead to security bugs.
How do they lead to security bugs? Can someone clearly explain it by giving a suitable example?

One possible aspect is that unsigned integers can lead to somewhat hard-to-spot problems in loops, because the underflow leads to large numbers. I cannot count (even with an unsigned integer!) how many times I made a variant of this bug
for(size_t i = foo.size(); i >= 0; --i)
...
Note that, by definition, i >= 0 is always true. (What causes this in the first place is that if i is signed, the compiler will warn about a possible overflow with the size_t of size()).
There are other reasons mentioned in "Danger – unsigned types used here!", the strongest of which, in my opinion, is the implicit type conversion between signed and unsigned.
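As a minimal sketch of a reverse loop that does not depend on i >= 0, the counter can be tested before it is decremented. The function name reversed_copy is made up for illustration and is not part of the original answer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the elements of v in reverse order.
std::vector<int> reversed_copy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    // "i-- > 0" compares the old value of i with 0 and then decrements,
    // so the body runs with i = size()-1, ..., 1, 0. The final decrement
    // does wrap i around, but i is not used after that point.
    // For an empty vector the body is skipped entirely.
    for (std::size_t i = v.size(); i-- > 0; ) {
        out.push_back(v[i]);
    }
    return out;
}
```

With this form, reversed_copy({1, 2, 3}) yields 3, 2, 1, and an empty input simply yields an empty result.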

One big factor is that it makes loop logic harder: Imagine you want to iterate over all but the last element of an array (which does happen in the real world). So you write your function:
void fun (const std::vector<int> &vec) {
for (std::size_t i = 0; i < vec.size() - 1; ++i)
do_something(vec[i]);
}
Looks good, doesn't it? It even compiles cleanly with very high warning levels! (Live) So you put this in your code, all tests run smoothly and you forget about it.
Now, later on, somebody comes along and passes an empty vector to your function. Now with a signed integer, you hopefully would have noticed the sign-compare compiler warning, introduced the appropriate cast and not have published the buggy code in the first place.
But in your implementation with the unsigned integer, vec.size() - 1 wraps and the loop condition becomes i < SIZE_MAX. Disaster, UB and most likely crash!
I want to know how they lead to security bugs?
This is also a security problem, in particular it is a buffer overflow. One way to possibly exploit this would be if do_something would do something that can be observed by the attacker. They might be able to find what input went into do_something, and that way data the attacker should not be able to access would be leaked from your memory. This would be a scenario similar to the Heartbleed bug. (Thanks to ratchet freak for pointing that out in a comment.)
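One possible way to keep the unsigned index and still survive an empty vector is to move the 1 to the other side of the comparison. The sketch below is only an illustration: visit_all_but_last is a hypothetical name, and it counts visits instead of calling do_something so that the behavior is easy to check.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical variant of fun(): returns how many elements it visited.
std::size_t visit_all_but_last(const std::vector<int>& vec) {
    std::size_t visited = 0;
    // "i + 1 < vec.size()" never subtracts from an unsigned zero,
    // so an empty vector gives 1 < 0, which is false.
    for (std::size_t i = 0; i + 1 < vec.size(); ++i) {
        ++visited;  // stand-in for do_something(vec[i])
    }
    return visited;
}
```

For a vector of three elements this visits two of them; for an empty or one-element vector it visits none.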

I'm not going to watch a video just to answer a question, but one issue is the confusing conversions which can happen if you mix signed and unsigned values. For example:
#include <iostream>
int main() {
unsigned n = 42;
int i = -42;
if (i < n) {
std::cout << "All is well\n";
} else {
std::cout << "ARITHMETIC IS BROKEN!\n";
}
}
The promotion rules mean that i is converted to unsigned for the comparison, giving a large positive number and a surprising result.
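If a comparison by mathematical value is what is wanted, one option is to handle the negative case before converting. This is a sketch, and less_by_value is a name invented here for illustration.

```cpp
// Hypothetical helper: compares an int with an unsigned by value.
bool less_by_value(int i, unsigned n) {
    // A negative int is smaller than every unsigned value. Otherwise
    // i is non-negative and converts to unsigned without changing value.
    return i < 0 || static_cast<unsigned>(i) < n;
}
```

With this helper, less_by_value(-42, 42u) is true. C++20 also provides std::cmp_less in <utility> for the same purpose.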

Although it may only be considered as a variant of the existing answers: Referring to "Signed and unsigned types in interfaces," C++ Report, September 1995 by Scott Meyers, it's particularly important to avoid unsigned types in interfaces.
The problem is that it becomes impossible to detect certain errors that clients of the interface could make (and if they could make them, they will make them).
The example given there is:
template <class T>
class Array {
public:
Array(unsigned int size);
...
and a possible instantiation of this class
int f(); // f and g are functions that return
int g(); // ints; what they do is unimportant
Array<double> a(f()-g()); // array size is f()-g()
The difference of the values returned by f() and g() might be negative, for an awful number of reasons. The constructor of the Array class will receive this difference as a value that is implicitly converted to be unsigned. Thus, as the implementor of the Array class, one cannot distinguish between an erroneously passed value of -1, and a very large array allocation.
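To illustrate the point, here is a sketch of a size check that takes a signed parameter, so a negative argument is still visible as negative when it arrives. The function checked_array_size and the choice of long long and std::invalid_argument are assumptions made for this example; they are not from the Meyers article.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical check: accepts the requested size as a signed value.
// Note: on platforms where size_t is narrower than long long, very
// large values would still need an upper-bound check (omitted here).
std::size_t checked_array_size(long long requested) {
    if (requested < 0) {
        throw std::invalid_argument("array size must not be negative");
    }
    return static_cast<std::size_t>(requested);
}
```

A call such as checked_array_size(f() - g()) would then throw when the difference is negative instead of silently asking for a huge allocation.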

The big problem with unsigned int is that if you subtract 1 from an unsigned int 0, the result isn't a negative number, the result isn't less than the number you started with, but the result is the largest possible unsigned int value.
unsigned int x = 0;
unsigned int y = x - 1;
if (y > x) printf ("What a surprise! \n");
And this is what makes unsigned int error prone. Of course unsigned int works exactly as it is designed to work. It's absolutely safe if you know what you are doing and make no mistakes. But most people make mistakes.
If you are using a good compiler, you turn on all the warnings that the compiler produces, and it will tell you when you do dangerous things that are likely to be mistakes.
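When the subtraction cannot be avoided, a guard before it is one way to keep the wraparound from going unnoticed. This is a minimal sketch; checked_sub is a hypothetical helper, not a standard function.

```cpp
// Hypothetical helper: stores a - b in out and returns true when the
// difference is non-negative; otherwise returns false and leaves out
// unchanged.
bool checked_sub(unsigned a, unsigned b, unsigned& out) {
    if (b > a) {
        return false;  // a - b would wrap around
    }
    out = a - b;
    return true;
}
```

Here checked_sub(0, 1, y) reports failure rather than setting y to the largest unsigned value.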

The problem with unsigned integer types is that depending upon their size they may represent one of two different things:
Unsigned types smaller than int (e.g. uint8) hold numbers in the range 0..2ⁿ-1, and calculations with them will behave according to the rules of integer arithmetic provided they don't exceed the range of the int type. Under present rules, if such a calculation exceeds the range of an int, a compiler is allowed to do anything it likes with the code, even going so far as to negate the laws of time and causality (some compilers will do precisely that!), and even if the result of the calculation would be assigned back to an unsigned type smaller than int.
Unsigned types unsigned int and larger hold members of the abstract wrapping algebraic ring of integers congruent mod 2ⁿ; this effectively means that if a calculation goes outside the range 0..2ⁿ-1, the system will add or subtract whatever multiple of 2ⁿ would be required to get the value back in range.
Consequently, given uint32_t x=1, y=2; the expression x-y may have one of two meanings depending upon whether int is larger than 32 bits.
If int is larger than 32 bits, the expression will subtract the number 2 from the number 1, yielding the number -1. Note that a variable of type uint32_t can't hold the value -1 regardless of the size of int, and storing -1 would cause such a variable to hold 0xFFFFFFFF; but unless or until the value is coerced to an unsigned type it will behave like the signed quantity -1.
If int is 32 bits or smaller, the expression will yield a uint32_t value which, when added to the uint32_t value 2, will yield the uint32_t value 1 (i.e. the uint32_t value 0xFFFFFFFF).
IMHO, this problem could be solved cleanly if C and C++ were to define new unsigned types [e.g. unum32_t and uwrap32_t] such that a unum32_t would always behave as a number, regardless of the size of int (possibly requiring the right-hand operand of a subtraction or unary minus to be promoted to the next larger signed type if int is 32 bits or smaller), while a uwrap32_t would always behave as a member of an algebraic ring (blocking promotions even if int were larger than 32 bits). In the absence of such types, however, it's often impossible to write code which is both portable and clean, since portable code will often require type coercions all over the place.
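In today's language, the two meanings can be requested explicitly with casts. The sketch below shows one way to do that; the names wrapping_sub and numeric_sub are invented for this example and only approximate what the proposed uwrap32_t and unum32_t would do for subtraction.

```cpp
#include <cstdint>

// Always wraps: the cast back to uint32_t reduces the result modulo 2^32,
// whether the subtraction itself was done in int or in unsigned arithmetic.
std::uint32_t wrapping_sub(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>(x - y);
}

// Always numeric: int64_t can hold every uint32_t value, so this is the
// ordinary integer difference and may be negative.
std::int64_t numeric_sub(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::int64_t>(x) - static_cast<std::int64_t>(y);
}
```

For x=1 and y=2, wrapping_sub gives 0xFFFFFFFF and numeric_sub gives -1, independent of the size of int.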

Numeric conversion rules in C and C++ are a byzantine mess. Using unsigned types exposes yourself to that mess to a much greater extent than using purely signed types.
Take for example the simple case of a comparison between two variables, one signed and the other unsigned.
If both operands are smaller than int then they will both be converted to int and the comparison will give numerically correct results.
If the unsigned operand is smaller than the signed operand then both will be converted to the type of the signed operand and the comparison will give numerically correct results.
If the unsigned operand is greater than or equal in size to the signed operand and also greater than or equal in size to int then both will be converted to the type of the unsigned operand. If the value of the signed operand is less than zero this will lead to numerically incorrect results.
To take another example consider multiplying two unsigned integers of the same size.
If the operand size is greater than or equal to the size of int then the multiplication will have defined wraparound semantics.
If the operand size is smaller than int but greater than or equal to half the size of int then there is the potential for undefined behaviour.
If the operand size is less than half the size of int then the multiplication will produce numerically correct results. Assigning this result back to a variable of the original unsigned type will produce defined wraparound semantics.
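The middle case is the one that tends to surprise people, so here is a sketch for 16-bit operands on a platform where int is 32 bits (an assumption; the function names are also made up for illustration). Casting one operand to a 32-bit unsigned type before multiplying keeps the arithmetic out of signed int.

```cpp
#include <cstdint>

// Multiplies in 32-bit unsigned arithmetic. Without the cast, both
// uint16_t operands would be promoted to int (where int is 32 bits),
// and 65535 * 65535 would overflow it, which is undefined behaviour.
std::uint32_t mul16_full(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint32_t>(a) * b;
}

// Same product, reduced modulo 2^16 by the final cast.
std::uint16_t mul16_wrap(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(a) * b);
}
```

With both operands equal to 65535, mul16_full returns 4294836225 and mul16_wrap returns 1.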

In addition to the range/wrap issue with unsigned types, mixing unsigned and signed integer types can carry a significant performance cost on the processor: less than a floating-point cast, but too much to ignore. Additionally, the compiler may insert a range check for the value and change the behavior of further checks.

Related

What's the difference between casting a long to int versus using a bitwise AND in order to get the 4 least significant bytes?

I know that in order to get the 4 least significant bytes of a number of type long I can cast it to int/unsigned int or use a bitwise AND (& 0xFFFFFFFF).
This code produces the following output:
#include <stdio.h>
int main()
{
long n = 0x8899AABBCCDDEEFF;
printf("0x%016lX\n", n);
printf("0x%016X\n", (int)n);
printf("0x%016X\n", (unsigned int)n);
printf("0x%016lX\n", n & 0xFFFFFFFF);
}
Output:
0x8899AABBCCDDEEFF
0x00000000CCDDEEFF
0x00000000CCDDEEFF
0x00000000CCDDEEFF
Does that mean that the two methods used are equivalent? If so, do they always produce the same output regardless of the platform/compiler?
Also, is there any catch or pitfall while casting to unsigned int rather than int for the purpose of this question?
Finally, why is the output the same if you change the number n to be an unsigned long instead?
The methods are definitely different.
According to integral conversion rules (cf, for example, this online c++11 standard), a conversion (e.g. through an explicit cast) from one integral type to another depends on whether the destination type is signed or unsigned. If the destination type is unsigned, one can rely on a "modulo 2ⁿ" truncation, whereas with signed destination types one could tap into implementation defined behaviour:
4.7 Integral conversions [conv.integral]
2 If the destination type is unsigned, the resulting value is the
least unsigned integer congruent to the source integer (modulo 2ⁿ
where n is the number of bits used to represent the unsigned type). [
Note: In a two's complement representation, this conversion is
conceptual and there is no change in the bit pattern (if there is no
truncation). — end note ]
3 If the destination type is signed, the value is unchanged if it can
be represented in the destination type (and bit-field width);
otherwise, the value is implementation-defined.
For your first question, as others have pointed out, the size of int and long is dependent on the platform, so the methods are not equivalent. In C data types, check that the types say "at least XX bits in size"
For the second question, it comes down to this: long and int are signed, meaning that one bit is reserved for sign (take a look also at two's complement). If you were the compiler, what can you do with negative values (especially the long ones)? As Stepahn Lechner mentioned, this is implementation defined (that is, it is up to the compiler).
Finally, in the spirit of "your code must do what it says it does", the best thing to do if you need to do masks is to use masks (and, if you use masks, use unsigned types). Don't try to use clever answers. Believe me, they always bite you in the rear. I've dealt with enough legacy code to know that by heart.
What's the difference between casting a long to int versus using a bitwise AND in order to get the 4 least significant bytes?
Type. Casting makes the value an int. And'ing does not change the type.
Range. Depending on int,long range, a cast may not change the value at all.
IDB and UB. Implementation-defined behavior and undefined behavior are present when mixing signedness.
To "get" the 4 LSBytes, use & 0xFFFFFFFFu or cast to uint32_t.
OP's question is unnecessarily convoluted.
long n = 0x8899AABBCCDDEEFF; --> Converting a value outside the range of a signed integer type is implementation-defined.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
C11 §6.3.1.3 3
printf("0x%016lX\n", n); --> Printing a long with a "%lX" outside the common range of long/unsigned long is undefined behavior.
Let's go forward with unsigned long:
unsigned long n = 0x8899AABBCCDDEEFF; // no problem,
printf("0x%016lX\n", n); // no problem,
printf("0x%016X\n", (int)n); // problem, C11 6.3.1.3 3
printf("0x%016X\n", (unsigned int)n); // no problem,
printf("0x%016lX\n", n & 0xFFFFFFFF); // no problem,
The "no problem" lines are OK whether unsigned long is 32-bit or 64-bit. The output will differ, yet is OK.
Recall that int,long are not always 32,64 bit. (16,32), (32,32), (32,64) are common.
int is at least 16 bit.
long is at least that of int and at least 32 bit.
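Putting the advice together, a sketch of extracting the low 4 bytes with fixed-width unsigned types might look like this. The function name low32 is an assumption for the example; the fixed-width types come from <cstdint>.

```cpp
#include <cstdint>

// Returns the low 32 bits of v. Every value involved is unsigned, so
// no implementation-defined signed conversion takes place, and the
// result does not depend on the sizes of int or long.
std::uint32_t low32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v & 0xFFFFFFFFu);
}
```

For the value used in the question, low32(0x8899AABBCCDDEEFF) is 0xCCDDEEFF.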

Threshold an absolute value

I have the following function:
char f1( int a, unsigned b ) { return abs(a) <= b; }
For execution speed, I want to rewrite it as follows:
char f2( int a, unsigned b ) { return (unsigned)(a+b) <= 2*b; } // redundant cast
Or alternatively with this signature that could have subtle implications even for non-negative b:
char f3( int a, int b ) { return (unsigned)(a+b) <= 2*b; }
Both of these alternatives work under a simple test on one platform, but I need it to be portable. Assuming non-negative b and no risk of overflow, is this a valid optimization for typical hardware and C compilers? Is it also valid for C++?
Note: As C++ on gcc 4.8 x86_64 with -O3, f1() uses 6 machine instructions and f2() uses 4. The instructions for f3() are identical to those for f2(). Also of interest: if b is given as a literal, both functions compile to 3 instructions that directly map to the operations specified in f2().
Starting with the original code with signature
char f2( int a, unsigned b );
this contains the expression
a + b
Since one of these operands has a signed and the other a (corresponding) unsigned integer type (thus they have the same "integer conversion rank"), then - following the "Usual arithmetic conversions" (§ 6.3.1.8) - the operand with signed integer type is converted to the unsigned type of the other operand.
Conversion to an unsigned integer type is well defined, even if the value in question cannot be represented by the new type:
[..] if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60
§ 6.3.1.3/2
Footnote 60 just says that the described arithmetic works with the mathematical value, not the typed one.
Now, with the updated code
char f2_updated( int a, int b ); // called f3 in the question
things would look different. But since b is assumed to be non-negative, and assuming that INT_MAX <= UINT_MAX you can convert b to an unsigned without fearing it to have a different mathematical value afterwards. Thus you could write
char f2_updated( int a, int b ) {
return f2(a, (unsigned)b); // cast unnecessary but to make it clear
}
Looking again at f2 the expression 2*b further limits the allowed range of b to be not larger than UINT_MAX/2 (otherwise the mathematical result would be wrong).
So as long as you stay within these bounds, everything is fine.
Note: Unsigned types do not overflow, they "wrap" according to modular arithmetic.
Quotes from N1570 (a C11 working draft)
A final remark:
IMO the only really reasonable choice to write this function is as
#include <stdbool.h>
#include <assert.h>
#include <limits.h> /* UINT_MAX */
bool abs_bounded(int value, unsigned bound) {
assert(bound <= (UINT_MAX / 2));
/* NOTE: Casting to unsigned makes the implicit conversion that
otherwise would happen explicit. */
return ((unsigned)value + bound) <= (2 * bound);
}
Using a signed type for the bound does not make much sense, because the absolute of a value cannot be less than a negative number. abs_bounded(value, something_negative) would be always false. If there's the possibility of a negative bound, then I'd catch this outside of this function (otherwise it does "too much"), like:
int some_bound;
// ...
if ((some_bound >= 0) && abs_bounded(my_value, some_bound)) {
// yeeeha
}
As OP wants fast and portable code (and b is positive), it first makes sense to code safely:
// return abs(a) <= b;
inline bool f1_safe(int a, unsigned b ) {
return (a >= 0 && a <= b) || (a < 0 && 0u - a <= b);
}
This works for all a,b (assuming UINT_MAX > INT_MAX). Next, compare alternatives using an optimized compile (let the compiler do what it does best).
The following slight variation on OP's code will work in C/C++ but risks portability issues unless "Assuming non-negative b and no risk of overflow" can be certain on all target machines.
bool f2(int a, unsigned b) { return a+b <= b*2; }
In the end, OP's goal of fast and portable code may lead to code that works optimally on the selected platform but not on others - such is micro-optimization.
To determine if the 2 expressions are equivalent for your purpose, you must study the domain of definition:
abs(a) <= b is defined for all values of int a and unsigned b, with just one special case for a = INT_MIN. On 2s complement architectures, abs(INT_MIN) is not defined but most likely evaluates to INT_MIN, which converted to unsigned as required for the <= with an unsigned value, yields the correct value.
(unsigned)(a+b) <= 2*b may produce a different result for b > UINT_MAX/2. For example, it will evaluate to false for a = 1 and b = UINT_MAX/2+1. There might be more cases where your alternate formula gives an incorrect result.
EDIT: OK, the question was edited... and b is now an int.
Note that a+b invokes undefined behavior in case of overflow and the same for 2*b. So you make the assumption that neither a+b nor 2*b overflow. Furthermore, if b is negative, your little trick does not work.
If a is in the range -INT_MAX/2..INT_MAX/2 and b in the range 0..INT_MAX/2, it seems to function as expected. The behavior is identical in C and C++.
Whether it is an optimization depends completely on the compiler, command line options, hardware capabilities, surrounding code, inlining, etc. You already address this part and tell us that you shave one or two instructions... Just remember that this kind of micro-optimization is not absolute. Even counting instructions does not necessarily help find the best performance. Did you perform some benchmarks to measure if this optimization is worthwhile? Is the difference even measurable?
Micro-optimizing such a piece of code is self-defeating: it makes the code less readable and potentially incorrect. b might not be negative in the current version, but if the next maintainer changes that, he/she might not see the potential implications.
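The domain restriction can be checked directly. The sketch below compares a straightforward reference with the single-comparison trick; all three function names are invented for this example, and the reference assumes UINT_MAX > INT_MAX as in the other answers.

```cpp
#include <climits>

// Reference: |a| <= b, computed in unsigned arithmetic so that
// a == INT_MIN is handled without calling abs().
bool abs_le_reference(int a, unsigned b) {
    unsigned magnitude = a < 0 ? 0u - static_cast<unsigned>(a)
                               : static_cast<unsigned>(a);
    return magnitude <= b;
}

// The single-comparison trick from the question (f2), with the
// conversion of a written out explicitly.
bool abs_le_trick(int a, unsigned b) {
    return static_cast<unsigned>(a) + b <= 2u * b;
}

// The trick is only meaningful while 2*b does not wrap.
bool trick_is_valid_for(unsigned b) {
    return b <= UINT_MAX / 2;
}
```

Inside the valid range the two functions agree; for a = 1 and b = UINT_MAX/2+1 the reference returns true while the trick returns false, which matches the counterexample above.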
Yes, this is portable to compliant platforms. The conversion from signed to unsigned is well defined:
Conversion between signed integer and unsigned integer
int to unsigned int conversion
Signed to unsigned conversion in C - is it always safe?
The description in the C spec is a bit contrived:
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.
The C++ spec addresses the same conversion in a more sensible way:
In a two's complement representation, this conversion is conceptual
and there is no change in the bit pattern
In the question, f2() and f3() achieve the same results in a slightly different way.
In f2() the presence of the unsigned operand causes a conversion of the signed operand as required here for C++. The unsigned addition may-or-may-not then result in a wrap-around past zero, which is also well defined [citation needed].
In f3() the addition occurs in signed representation with no trickiness, and then the result is (explicitly) converted to unsigned. So this is slightly simpler than f2() (and also more clear).
In both cases, you end up with the same unsigned representation of the sum, which can then be compared (as unsigned) to 2*b. And the trick of treating a signed value as an unsigned type allows you to check a two-sided range with only a single comparison. Note also that this is a bit more flexible than using the abs() function since the trick doesn't require that the range be centered around zero.
Commentary on the "usual arithmetic conversions"
I think this question demonstrated that using unsigned types is generally a bad idea. Look at the confusion it caused here.
It can be tempting to use unsigned for documentation purposes (or to take advantage of the shifted value range), but due to the conversion rules, this may tend to be a mistake. In my opinion, the "usual arithmetic conversions" are not sensible if you assume that arithmetic is more likely to involve negative values than to overflow signed values.
I asked this followup question to clarify the point: mixed-sign integer math depends on variable size. One new thing that I have learned is that mixed-sign operations are not generally portable because the conversion type will depend on the size relative to that of int.
In summary: Using type declarations or casts to perform unsigned operations is a low-level coding style that should be approached with the requisite caution.

Implicit conversion in C++ between main and function

I have the below simple program:
#include <iostream>
#include <stdio.h>
void SomeFunction(int a)
{
std::cout<<"Value in function: a = "<<a<<std::endl;
}
int main(){
size_t a(0);
std::cout<<"Value in main: "<<a-1<<std::endl;
SomeFunction(a-1);
return 0;
}
Upon executing this I get:
Value in main: 18446744073709551615
Value in function: a = -1
I think I roughly understand why the function gets the 'correct' value of -1: there is an implicit conversion from the unsigned type to the signed one i.e. 18446744073709551615(unsigned) = -1(signed).
Is there any situation where the function will not get the 'correct' value?
Since size_t type is unsigned, subtracting 1 is well defined:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
However, the resultant value of 2⁶⁴-1 is out of int's range, so you get implementation-defined behavior:
[when] the new type is signed and the value cannot be represented in it, either the result is implementation-defined or an implementation-defined signal is raised.
Therefore, the answer to your question is "yes": there are platforms where the value of a would be different; there are also platforms where instead of calling SomeFunction the program will raise a signal.
Not on your computer... but technically yes, there is a situation where things can go wrong.
All modern PCs use the "two's complement" system for signed integer arithmetic (read Wikipedia for details). Two's complement has many advantages, but one of the biggest is this: unsaturated addition and subtraction of signed integers is identical to that of unsigned integers. As long as overflow/underflow causes the result to "wrap around" (i.e., 0-1 = UINT_MAX), the computer can add and subtract without even knowing whether you're interpreting the numbers as signed or unsigned.
BUT! C/C++ do not technically require two's complement for signed integers. There are two other permissible systems, known as "sign-magnitude" and "one's complement". These are unusual systems, never found outside antique architectures and embedded processors (and rarely even there). But in those systems, signed and unsigned arithmetic do not match up, and (signed)(a+b) will not necessarily equal (signed)a + (signed) b.
There's another, more mundane caveat when you're also narrowing types, as is the case between size_t and int on x64, because C/C++ don't require compilers to follow a particular rule when narrowing out-of-range values to signed types. This is likewise more a matter of language lawyering than actual unsafeness, though: VC++, GCC, Clang, and all other compilers I'm aware of narrow through truncation, leading to the expected behavior.
It's easier to compare and contrast the signed and unsigned versions of the same basic type, say signed int and unsigned int.
On a typical two's complement system that uses 32 bits for int, the range of unsigned int is [0 - 4294967295] and the range of signed int is [-2147483648 - 2147483647].
Say you have a variable of type unsigned int and its value is greater than 2147483647. If you pass such a variable to SomeFunction, you will see an incorrect value in the function.
Conversely, say you have a variable of type signed int and its value is less than zero. If you pass such a variable to a function that expects an unsigned int, you will see an incorrect value in the function.
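A range check before the conversion makes the out-of-range case explicit instead of leaving it to the implementation. The following is only a sketch, and try_to_int is a hypothetical helper name.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: writes v to out and returns true only when v
// fits in an int; otherwise returns false and leaves out unchanged.
bool try_to_int(std::size_t v, int& out) {
    if (v > static_cast<std::size_t>(INT_MAX)) {
        return false;  // value not representable as int
    }
    out = static_cast<int>(v);
    return true;
}
```

With size_t a(0), passing a-1 to try_to_int reports failure, so the caller can decide what to do rather than receiving -1 by accident.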

Why does this if condition fail for comparison of negative and positive integers [duplicate]

#include <stdio.h>
int arr[] = {1,2,3,4,5,6,7,8};
#define SIZE (sizeof(arr)/sizeof(int))
int main()
{
printf("SIZE = %d\n", SIZE);
if ((-1) < SIZE)
printf("less");
else
printf("more");
}
The output after compiling with gcc is "more". Why does the if condition fail even when -1 < 8?
The problem is in your comparison:
if ((-1) < SIZE)
sizeof yields a value of type size_t, which is typically unsigned long, so SIZE will be unsigned long, whereas -1 is just an int. The conversion rules in C and related languages mean that -1 will be converted to size_t before the comparison, so -1 will become a very large positive value (the maximum value of an unsigned long).
One way to fix this is to change the comparison to:
if (-1 < (long long)SIZE)
although it's actually a pointless comparison, since an unsigned value will always be >= 0 by definition, and the compiler may well warn you about this.
As subsequently noted by @Nobilis, you should always enable compiler warnings and take notice of them: if you had compiled with e.g. gcc -Wall ... the compiler would have warned you of your bug.
TL;DR
Be careful with mixed signed/unsigned operations (use -Wall compiler warnings). The Standard has a long section about it. In particular, it is often but not always true that signed is value-converted to unsigned (although it is in your particular example). See this explanation below (taken from this Q&A)
Relevant quote from the C++ Standard:
5 Expressions [expr]
10 Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
[2 clauses about equal types or types of equal sign omitted]
— Otherwise, if the operand that has unsigned integer type has rank
greater than or equal to the rank of the type of the other operand,
the operand with signed integer type shall be converted to the type of
the operand with unsigned integer type.
— Otherwise, if the type of
the operand with signed integer type can represent all of the values
of the type of the operand with unsigned integer type, the operand
with unsigned integer type shall be converted to the type of the
operand with signed integer type.
— Otherwise, both operands shall be
converted to the unsigned integer type corresponding to the type of
the operand with signed integer type.
Your actual example
To see into which of the 3 cases your program falls, modify it slightly to this
#include <stdio.h>
int arr[] = {1,2,3,4,5,6,7,8};
#define SIZE (sizeof(arr)/sizeof(int))
int main()
{
printf("SIZE = %zu, sizeof(-1) = %zu, sizeof(SIZE) = %zu \n", SIZE, sizeof(-1), sizeof(SIZE));
if ((-1) < SIZE)
printf("less");
else
printf("more");
}
On the Coliru online compiler, this prints 4 and 8 for the sizeof() of -1 and SIZE, respectively, and selects the "more" branch (live example).
The reason is that the unsigned type is of greater rank than the signed type. Hence, clause 1 applies and the signed type is value-converted to the unsigned type (on most implementations, typically by preserving the bit-representation, so wrapping around to a very large unsigned number), and the comparison then proceeds to select the "more" branch.
Variations on a theme
Rewriting the condition to if ((long long)(-1) < (unsigned)SIZE) would take the "less" branch (live example).
The reason is that the signed type is of greater rank than the unsigned type and can also accommodate all the unsigned values. Hence, clause 2 applies and the unsigned type is converted to the signed type, and the comparison then proceeds to select the "less" branch.
Of course, you would never write such a contrived if() statement with explicit casts, but the same effect could happen if you compare variables with types long long and unsigned. So it illustrates the point that mixed signed/unsigned arithmetic is very subtle and depends on the relative sizes ("ranking" in the words of the Standard). In particular, there is no fixed rule saying that signed will always be converted to unsigned.
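The clause that applies can also be inspected at compile time by asking for the type of a mixed expression. The sketch below does this with decltype; arith_common_t and the two function names are made up for this example, and the second check assumes that long long is wider than unsigned, which holds on common platforms but is not guaranteed by the Standard.

```cpp
#include <type_traits>
#include <utility>

// The type that the usual arithmetic conversions choose for A + B.
template <class A, class B>
using arith_common_t = decltype(std::declval<A>() + std::declval<B>());

// Equal rank (clause 1): the int operand is converted to unsigned.
constexpr bool int_with_unsigned_is_unsigned() {
    return std::is_same<arith_common_t<int, unsigned>, unsigned>::value;
}

// Higher-ranked signed type that can hold every unsigned value
// (clause 2): the unsigned operand is converted to long long.
// Assumption: long long is wider than unsigned on this platform.
constexpr bool long_long_with_unsigned_is_signed() {
    return std::is_same<arith_common_t<long long, unsigned>, long long>::value;
}
```

Both functions are expected to return true on typical 32-bit-int platforms, which mirrors the "more" and "less" branches discussed above.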
When you do comparison between signed and unsigned where unsigned has at least an equal rank to that of the signed type (see TemplateRex's answer for the exact rules), the signed is converted to the type of the unsigned.
With regards to your case, on a 32bit machine the binary representation of -1 as unsigned is 4294967295. So in effect you are comparing if 4294967295 is smaller than 8 (it isn't).
If you had enabled warnings, you would have been warned by the compiler that something fishy is going on:
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Since the discussion has shifted a bit on how appropriate the use of unsigned is, let me put a quote by James Gosling with regards to the lack of unsigned types in Java (and I will shamelessly link to another post of mine on the subject):
Gosling: For me as a language designer, which I don't really count
myself as these days, what "simple" really ended up meaning was could
I expect J. Random Developer to hold the spec in his head. That
definition says that, for instance, Java isn't -- and in fact a lot of
these languages end up with a lot of corner cases, things that nobody
really understands. Quiz any C developer about unsigned, and pretty
soon you discover that almost no C developers actually understand what
goes on with unsigned, what unsigned arithmetic is. Things like that
made C complex. The language part of Java is, I think, pretty simple.
The libraries you have to look up.
This is an historical design bug of C that was also repeated in C++.
It dates back to 16-bit computers and the error was deciding to use all 16 bits to represent sizes up to 65535, giving up the possibility to represent negative sizes.
This per se wouldn't have been an error if the meaning of unsigned were "non-negative integer" (a size cannot logically be negative), but it's a problem with the conversion rules of the language.
Given the conversion rules of the language the unsigned type in C doesn't represent a non-negative number, but it's instead more like a bitmask (the mathematical term is actually "a member of the ℤ/n ring"). To see why consider that for the C and C++ language
unsigned - unsigned gives an unsigned result
signed + unsigned gives an unsigned result
both of them clearly make no sense at all if you read unsigned as "non-negative number".
Of course saying that the size of an object is a member of ℤ/n ring doesn't make any sense at all, and this is where the error resides.
Practical implications:
Every time you deal with the size of an object, be careful, because the value is unsigned and that type in C/C++ has a lot of properties that are illogical for a number. Please always remember that unsigned doesn't mean "non-negative integer" but "member of the ℤ/n algebraic ring" and that, most dangerously, in a mixed operation an int is converted to unsigned int and not the other way around.
For example:
void drawPolyline(const std::vector<P2d>& pts) {
    // BUG: pts.size()-1 wraps to a huge value when pts is empty
    for (int i=0; i<pts.size()-1; i++) {
        drawLine(pts[i], pts[i+1]);
    }
}
is buggy, because if it is passed an empty vector of points it will perform illegal (UB) operations. The reason is that pts.size() is unsigned.
The rules of the language will convert 1 (an integer) to 1{mod n}, will perform the subtraction in ℤ/n resulting in (size-1){mod n}, will convert i also to a {mod n} representation and will do the comparison in ℤ/n.
C/C++ actually defines a < operator in ℤ/n (rarely done in math) and you will end up accessing pts[0], pts[1] ... and so on until huge numbers even if the input vector was empty.
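A minimal sketch of what the loop bound evaluates to. The vector of int is just a stand-in for the vector of P2d, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The bound that the condition "i < pts.size() - 1" compares against.
inline std::size_t buggy_bound(const std::vector<int>& v) {
    return v.size() - 1;  // wraps to SIZE_MAX when v is empty
}

// Number of segments in a polyline of v.size() points, without wrapping.
inline std::size_t segment_count(const std::vector<int>& v) {
    return v.empty() ? 0 : v.size() - 1;
}

// Small self-check; call it from main() to run the assertions.
inline void check_bounds() {
    const std::vector<int> empty;
    assert(buggy_bound(empty) == SIZE_MAX);  // the loop would run "forever"
    assert(segment_count(empty) == 0);
}
```

With an empty vector the buggy bound is the largest value a size_t can hold, so the loop body runs and indexes far past the end of the storage.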
A correct loop could be
void drawPolyline(const std::vector<P2d>& pts) {
    // Starting at 1 avoids computing size()-1, so an empty vector is safe
    for (int i=1; i<pts.size(); i++) {
        drawLine(pts[i-1], pts[i]);
    }
}
but I normally prefer
void drawPolyline(const std::vector<P2d>& pts) {
    // Copy the size into a signed int once; n-1 is then -1 for an empty vector
    for (int i=0,n=pts.size(); i<n-1; i++) {
        drawLine(pts[i], pts[i+1]);
    }
}
in other words getting rid of unsigned as soon as possible, and just working with regular ints.
Never use unsigned to represent size of containers or counters because unsigned means "member of ℤ/n" and the size of a container is not one of those things. Unsigned types are useful, but NOT to represent size of objects.
The standard C/C++ library unfortunately made this wrong choice, and it's too late to fix it. You are not forced to make the same mistake, however.
In the words of Bjarne Stroustrup:
Using an unsigned instead of an int to gain one more bit to represent
positive integers is almost never a good idea. Attempts to ensure that
some values are positive by declaring variables unsigned will
typically be defeated by the implicit conversion rules
Well, I'm not going to repeat the strong words Paul R used, but when you compare unsigned and signed integers you are going to run into some bad things.
Do
if ((-1) < (int)SIZE)
instead of your if condition, i.e. convert the unsigned value returned by the sizeof operator to a signed type.
When you compare a signed and an unsigned number, the compiler implicitly converts the signed one to unsigned.
The representation of -1 in a 4-byte int is 11111111 11111111 11111111 11111111; converted to a 32-bit unsigned type, this bit pattern stands for 2^32-1 (and -1 becomes 2^64-1 when the unsigned type is 64 bits wide).
So basically you are comparing 2^32-1 > SIZE, which is true.
You have to override that by explicitly casting the unsigned value to signed.
Since the sizeof operator returns a size_t, an unsigned type that is commonly as wide as unsigned long long on 64-bit platforms, you can cast it to signed long long:
if((-1)<(signed long long)SIZE)
use this if condition in your code
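As a rough sketch of the difference between the two comparisons: sizeof(int) stands in for SIZE here, the function names are made up, and the first function assumes a platform where size_t is at least as wide as int (true on mainstream platforms), so that the explicit cast matches what the compiler does implicitly.

```cpp
#include <cassert>
#include <cstddef>

// What the compiler effectively evaluates for "value < sizeof(int)":
// the int operand is converted to std::size_t before the comparison.
inline bool compare_as_unsigned(int value) {
    return static_cast<std::size_t>(value) < sizeof(int);
}

// The fix from the answer: convert the size to a signed type instead.
inline bool compare_as_signed(int value) {
    return value < static_cast<long long>(sizeof(int));
}

// Small self-check; call it from main() to run the assertions.
inline void check_comparisons() {
    assert(!compare_as_unsigned(-1));  // -1 became a huge unsigned value
    assert(compare_as_signed(-1));     // -1 really is less than the size
}
```

For non-negative values both functions agree; they only differ once a negative number is involved.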

Why int plus uint returns uint?

int plus unsigned int returns an unsigned int. Should it be so?
Consider this code:
#include <boost/static_assert.hpp>
#include <boost/typeof/typeof.hpp>
#include <boost/type_traits/is_same.hpp>
class test
{
    static const int si = 0;
    static const unsigned int ui = 0;
    typedef BOOST_TYPEOF(si + ui) type;
    BOOST_STATIC_ASSERT( ( boost::is_same<type, int>::value ) ); // fails: type is unsigned int
};
int main()
{
    return 0;
}
If by "should it be" you mean "does my compiler behave according to the standard": yes.
C++2003: Clause 5, paragraph 9:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
blah
Otherwise, blah,
Otherwise, blah, ...
Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
If by "should it be" you mean "would the world be a better place if it didn't": I'm not competent to answer that.
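For reference, the same check can be written without Boost. This is only a sketch assuming a C++11 or later compiler; the alias name sum_type is made up here.

```cpp
#include <type_traits>

struct test
{
    static const int si = 0;
    static const unsigned int ui = 0;

    // decltype replaces BOOST_TYPEOF; the operands are not evaluated.
    using sum_type = decltype(si + ui);

    // The usual arithmetic conversions turn si into unsigned int first.
    static_assert(std::is_same<sum_type, unsigned int>::value,
                  "int + unsigned int yields unsigned int");
    static_assert(!std::is_same<sum_type, int>::value,
                  "the result is not int");
};
```

Both assertions hold at compile time, which matches the clause quoted above.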
Unsigned integer types mostly behave as members of a wrapping abstract algebraic ring of values which are equivalent mod 2^N; one might view an N-bit unsigned integer not as representing a particular integer, but rather the set of all integers with a particular value in the bottom N bits. For example, if one adds together two binary numbers whose last 4 digits are ...1001 and ...0101, the result will be ...1110. If one adds ...1111 and ...0001, the result will be ...0000; if one subtracts ...0001 from ...0000 the result will be ...1111. Note that concepts of overflow or underflow don't really mean anything, since the upper-bit values of the operands are unknown and the upper-bit values of the result are of no interest. Note also that adding a signed integer whose upper bits are known to one whose upper bits are "don't know/don't care" should yield a number whose upper bits are "don't know/don't care" (which is what unsigned integer types mostly behave as).
The only places where unsigned integer types fail to behave as members of a wrapping algebraic ring are when they participate in comparisons, are used in numerical division (which implies comparisons), or are promoted to other types. If the only way to convert an unsigned integer type to something larger was to use an operator or function for that purpose, the use of such an operator or function could make clear that it was making assumptions about the upper bits (e.g. turning "some number whose lower bits are ...00010110" into "the number whose lower bits are ...00010110 and whose upper bits are all zeroes"). Unfortunately, C doesn't do that. Adding a signed value to an unsigned value of equal size yields a like-size unsigned value (which makes sense with the interpretation of unsigned values above), but adding a larger signed integer to an unsigned type will cause the compiler to silently assume that all upper bits of the latter are zeroes. This behavior can be especially vexing in cases where, depending upon a compiler's promotion rules, some compilers may deem two expressions as having the same size while others may view them as different sizes.
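A small sketch of the "upper bits silently become zeroes" effect, using fixed-width types so the sizes are explicit. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Same-size arithmetic wraps: 0 - 1 in 32 bits is 0xFFFFFFFF.
inline std::uint32_t wrap_decrement(std::uint32_t x) {
    return x - 1u;
}

// Mixing with a wider signed type: the 32-bit unsigned value is
// zero-extended, i.e. its upper bits are assumed to be all zeroes.
inline std::int64_t add_wide(std::int64_t wide, std::uint32_t narrow) {
    return wide + narrow;
}

// Small self-check; call it from main() to run the assertions.
inline void check_widening() {
    const std::uint32_t all_ones = wrap_decrement(0u);
    assert(all_ones == 0xFFFFFFFFu);
    assert(add_wide(0, all_ones) == 4294967295LL);  // not -1
}
```

Had the subtraction been done in 64 bits from the start, the result would have been -1; doing it in 32 bits first and widening afterwards gives 4294967295.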
It is likely that the behavior stems from the logic behind pointer arithmetic: a memory location (e.g. std::size_t) plus a memory location difference (std::ptrdiff_t) is also a memory location.
In other words, std::size_t = std::size_t + std::ptrdiff_t.
When this logic is translated to the underlying types, it means unsigned long = unsigned long + long, or unsigned = unsigned + int.
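The relation can be checked on a given platform with a sketch like the following. The static_assert assumes that std::size_t and std::ptrdiff_t have the same width and are at least as wide as int, which holds on mainstream platforms but is not guaranteed everywhere; offset_position is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Assumption: size_t and ptrdiff_t have the same rank on this platform.
static_assert(std::is_same<decltype(std::size_t{} + std::ptrdiff_t{}),
                           std::size_t>::value,
              "size_t + ptrdiff_t yields size_t here");

// A position moved by a signed offset; the offset is converted to size_t
// and the addition wraps modulo 2^N, so negative offsets still work.
inline std::size_t offset_position(std::size_t pos, std::ptrdiff_t delta) {
    return pos + delta;
}

// Small self-check; call it from main() to run the assertions.
inline void check_offset_position() {
    assert(offset_position(10, -3) == 7);
    assert(offset_position(10, 3) == 13);
}
```

The negative offset gives the expected position only because the arithmetic is modular, which fits the ring interpretation described in the other answer.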
The "other" explanation from @supercat is also possibly correct.
What is clear is that unsigned integers were not designed to be, and should not be interpreted as, mathematical positive numbers, not even in principle. See https://www.youtube.com/watch?v=wvtFGa6XJDU