From the answers I got from this question, it appears that C++ inherited this requirement for conversion of short into int when performing arithmetic operations from C. May I pick your brains as to why this was introduced in C in the first place? Why not just do these operations as short?
For example (taken from dyp's suggestion in the comments):
short s = 1, t = 2 ;
auto x = s + t ;
x will have type int.
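A minimal way to verify this (a sketch, assuming a C++11 compiler for decltype and static_assert):
#include <type_traits>
int main() {
    short s = 1, t = 2;
    auto x = s + t; // both operands are promoted to int before the addition
    static_assert(std::is_same<decltype(x), int>::value,
                  "s + t has type int, not short");
    (void)x;
}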
If we look at the Rationale for International Standard—Programming Languages—C in section 6.3.1.8 Usual arithmetic conversions it says (emphasis mine going forward):
The rules in the Standard for these conversions are slight
modifications of those in K&R: the modifications accommodate the added
types and the value preserving rules. Explicit license was added to
perform calculations in a “wider” type than absolutely necessary,
since this can sometimes produce smaller and faster code, not to
mention the correct answer more often. Calculations can also be
performed in a “narrower” type by the as if rule so long as the same
end result is obtained. Explicit casting can always be used to obtain
a value in a desired type.
Section 6.3.1.8 of the draft C99 standard covers the Usual arithmetic conversions, which are applied to the operands of arithmetic expressions; for example, section 6.5.6 Additive operators says:
If both operands have arithmetic type, the usual arithmetic
conversions are performed on them.
We find similar text in section 6.5.5 Multiplicative operators as well. In the case of a short operand, first the integer promotions are applied from section 6.3.1.1 Boolean, characters, and integers which says:
If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions.48) All other types are
unchanged by the integer promotions.
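To make the promotion concrete, here is a small sketch (assuming the usual 8-bit unsigned char and a wider int):
#include <iostream>
int main() {
    unsigned char a = 200, b = 100;
    // Both operands are promoted to int before the addition, so the
    // result is 300, not 300 % 256 == 44 as raw 8-bit math would give.
    std::cout << a + b << '\n'; // prints 300
}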
The discussion in section 6.3.1.1 of the Rationale for International Standard—Programming Languages—C on integer promotions is actually more interesting; I am going to quote selectively because it is too long to quote in full:
Implementations fell into two major camps which may be characterized
as unsigned preserving and value preserving.
[...]
The unsigned preserving approach calls for promoting the two smaller
unsigned types to unsigned int. This is a simple rule, and yields a
type which is independent of execution environment.
The value preserving approach calls for promoting those types to
signed int if that type can properly represent all the values of the
original type, and otherwise for promoting those types to unsigned
int. Thus, if the execution environment represents short as something
smaller than int, unsigned short becomes int; otherwise it becomes
unsigned int.
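As a sketch of what value preserving means in practice (assuming a platform where short is narrower than int):
#include <cstdio>
int main() {
    unsigned short us = 1;
    // Value preserving: int can hold every unsigned short value here,
    // so us promotes to a signed int and the result can go negative.
    int d = us - 2; // -1
    // Under the older unsigned preserving rules, us would have promoted
    // to unsigned int and the subtraction would have wrapped to UINT_MAX.
    std::printf("%d\n", d); // prints -1
}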
This can have some rather unexpected results in some cases, as Inconsistent behaviour of implicit conversion between unsigned and bigger signed types demonstrates, and there are plenty more examples like that. In most cases, though, the operations work as expected.
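One classic surprise arises when an unsigned operand meets a signed one of the same rank (a sketch, assuming 32-bit int):
#include <iostream>
int main() {
    unsigned int u = 1;
    int i = -2;
    // Usual arithmetic conversions: i is converted to unsigned int,
    // so the sum is not -1 but wraps around.
    std::cout << u + i << '\n'; // prints 4294967295 with 32-bit int
}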
It's not a feature of the language as much as it is a limitation of the physical processor architectures on which the code runs. The int type in C is usually the size of your standard CPU register. More silicon takes up more space and more power, so in many cases arithmetic can only be done on the "natural size" data types. This is not universally true, but most architectures still have this limitation. In other words, when adding two 8-bit numbers, what actually goes on in the processor is some type of 32-bit arithmetic followed by either a simple bit mask or another appropriate type conversion.
The short and char types are treated by the standard as a sort of "storage type", i.e. sub-ranges that you can use to save some space but that are not going to buy you any speed, because their size is "unnatural" for the CPU.
On certain CPUs this is not true, but good compilers are smart enough to notice that if you, for example, add a constant to an unsigned char and store the result back in an unsigned char, then there's no need to go through the unsigned char -> int conversion.
For example with g++ the code generated for the inner loop of
void incbuf(unsigned char *buf, int size) {
    for (int i = 0; i < size; i++) {
        buf[i] = buf[i] + 1;
    }
}
is just
.L3:
        addb    $1, (%rdi,%rax)
        addq    $1, %rax
        cmpl    %eax, %esi
        jg      .L3
.L1:
where you can see that an unsigned char addition instruction (addb) is used.
The same happens if you're doing your computations between short ints and storing the result in short ints.
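For instance, a hypothetical short analogue of the loop above, where a good optimizer can again drop the intermediate int and emit 16-bit adds:
void incbuf16(short *buf, int size) {
    for (int i = 0; i < size; i++) {
        // buf[i] is promoted to int, 1 is added, and the result is
        // narrowed back to short; the as-if rule lets the compiler
        // do the whole computation in 16 bits.
        buf[i] = buf[i] + 1;
    }
}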
The linked question seems to cover it pretty well: the CPU just doesn't. A 32-bit CPU has its native arithmetic operations set up for 32-bit registers. The processor prefers to work in its favorite size, and for operations like this, copying a small value into a native-size register is cheap. (For the x86 architecture, the 32-bit registers are named as if they are extended versions of the 16-bit registers (eax to ax, ebx to bx, etc); see x86 integer instructions).
For some extremely common operations, particularly vector/float arithmetic, there may be specialized instructions that operate on a different register type or size. For something like a short, padding with (up to) 16 bits of zeroes has very little performance cost and adding specialized instructions is probably not worth the time or space on the die (if you want to get really physical about why; I'm not sure they would take actual space, but it does get way more complex).
Many answers to similar questions point out that it is so due to the standard. But, I cannot understand the reasoning behind this decision by the standard setters.
From my understanding an unsigned char does not store the value in 2's complement form. So, I don't see a situation where let's say XORing two unsigned chars would produce unexpected behavior. Therefore, promoting them to int just seems like a waste of space (in most cases) and CPU cycles.
Moreover, why int? If a variable is being declared as unsigned, clearly the unsignedness is important to the programmer, therefore a promotion to an unsigned int would still make more sense than an int, in my opinion.
[EDIT #1] As pointed out in the comments, promotion to unsigned int will take place if an int cannot sufficiently accommodate the values of an unsigned char.
[EDIT #2] To clarify the question: if this is about the performance benefit of operating on int rather than char, then why is it in the standard? This could have been given as a suggestion to compiler designers for better optimization. As it stands, if someone were to design a compiler which didn't do this, that compiler would not fully adhere to the C/C++ standard, even if, hypothetically, it supported all other required features of the language. In a nutshell, I cannot figure out a reason why I cannot operate directly on unsigned chars, so the requirement to promote them to ints seems unnecessary. Can you give me an example which proves this wrong?
You can find this document on-line: Rationale for International Standard - Programming Languages - C (Revision 5.10, 2003).
Chapter 6.3 (pp. 44-45) is about conversions:
Between the publication of K&R and the development of C89, a serious divergence had occurred among implementations in the evolution of integer promotion rules. Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.
The difference between these approaches centered on the treatment of unsigned char and unsigned short when widened by the integer promotions, but the decision had an impact on the typing of constants as well (see §6.4.4.1).
The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.
The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int.
Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int. Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with two's complement arithmetic and quiet wraparound on signed overflow - that is, in most current implementations. In such implementations, differences between the two only appear when these two conditions are both true:
An expression involving an unsigned char or unsigned short produces an int-wide result in which the sign bit is set, that is, either a unary operation on such a type, or a binary operation in which the other operand is an int or “narrower” type.
The result of the preceding expression is used in a context in which its signedness is significant:
• sizeof(int) < sizeof(long) and it is in a context where it must be widened to a long type, or
• it is the left operand of the right-shift operator in an implementation where this shift is defined as arithmetic, or
• it is either operand of /, %, <, <=, >, or >=.
In such circumstances a genuine ambiguity of interpretation arises. The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation. Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising, or it may be exactly what is desired by a knowledgeable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.
One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.
The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the C89 Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.
QUIET CHANGE IN C89
A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This was considered the most serious semantic change made by the C89 Committee to a widespread current practice.
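A sketch of one such divergence (hypothetical; assumes 16-bit short and 32-bit int):
#include <iostream>
int main() {
    unsigned short x = 0xFFFF;
    // Value preserving (C89): x promotes to int, the subtraction yields
    // the negative int -1, and the comparison is true.
    // Unsigned preserving (the older camp): x would promote to unsigned
    // int, the subtraction would wrap to a huge unsigned value, and the
    // comparison would be false.
    std::cout << (x - 0x10000 < 0) << '\n'; // prints 1 under C89 rules
}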
For reference, you can find more details about those conversions updated to C11 in this answer by Lundin.
What actually happens when we do the following:
1)
int i = -1; // 32 bit
void *p;
p = reinterpret_cast<void*>(i);
on 64 bit architecture, sizeof(void*) == 8
2)
long long i; // 64 bit
void *p = (void *)(unsigned int)(-1);
i = reinterpret_cast<long long>(p);
on 32 bit architecture, sizeof(void*) == 4
I generally understand what will be the result, but I want somebody to describe the mechanism in terms of the C++ standard for better understanding.
In the second case the behaviour is like that described in "integral promotions" (4.5) for the "int - unsigned long" case (i will be -1); so do we generally say that pointers are converted like signed integers?
3)
int i = ...;
unsigned long long L = ...;
i = L;
What rules apply here?
All conversions between pointers and integral types are
implementation defined. The only guarantee the standard makes
is that:
A value of integral type or enumeration type can be explicitly
converted to a pointer. A pointer converted to an integer of
sufficient size (if any such exists on the implementation) and
back to the same pointer type will have its original value.
The standard does, however, make the statement (in
a non-normative note) that:
It is intended to be unsurprising to those who know the
addressing structure of the underlying machine.
From a quality of implementation point of view, if the machine
has linear addressing, casting an int to a pointer should
result in a value that corresponds to the value of the int, if
the word (whatever its size) is viewed as an integral type; in
other words, the bit pattern isn't changed. If the integral
type is smaller, and is negative, it is an open question whether
it should be sign extended, or whether the remaining bits should
be set to 0. I prefer the second, but I think that both can be
considered "unsurprising".
From a pragmatic point of view: there's a significant amount of
software out there which will occasionally cast a small
non-negative integral value to void*, and expects to get it
back with a cast later. Formally, getting it back requires
converting first to an intptr_t (or larger); otherwise the code
shouldn't compile. But I can't imagine a compiler breaking this
otherwise. For negative values, on the other hand, I'd feel
significantly less sure. And I'm not sure how small is small,
either. (I currently use the technique in one special case for
values less than some 40 or 50. It works with MSC, g++ and Sun
CC, at least, and I can't imagine it failing on any of the other
mainstream Unix machines I've used in the past. But I wouldn't
count on it on a 16 bit Intel, or some of the other more exotic
machines I've seen, and certainly not on an embedded system.)
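The formally blessed round trip looks something like this (a sketch; intptr_t is optional in the standard, but present on the mainstream platforms mentioned):
#include <cstdint>
#include <cassert>
int main() {
    int small = 42;
    // int -> intptr_t -> void* and back again; each step is
    // implementation defined, but stable on common linear-address machines.
    void *p = reinterpret_cast<void *>(static_cast<std::intptr_t>(small));
    int back = static_cast<int>(reinterpret_cast<std::intptr_t>(p));
    assert(back == small);
}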
Finally, as for your exact questions: it could vary. I'd try it
and see, counting on the fact that whatever it does, there's
some code somewhere which counts on it doing that, and that the
vendor won't risk changing the behavior. Formally, since it is
implementation defined, the implementor is required to document
it (and could, of course, change it in the next version), but
I've generally found it very, very difficult to find this
documentation.
EDIT:
I just noticed that your final question concerned unsigned long
long to int. This is an integral conversion, and not
a reinterpret_cast, so different rules apply. Or rather, the
rules are specified in a different section: the basic rule is
still "implementation defined":
If the destination type is signed, the value is unchanged if it
can be represented in the destination type (and bit-field
width); otherwise, the value is implementation-defined.
The C standard is somewhat different here, in that it explicitly
allows an implementation defined signal to be raised. (It's
stretching it somewhat to say that the "value" in the C++
standard could be the raising of a signal, even if this would be
the preferred implementation.) In practice: all of the
compilers I know just ignore the extra high order bits.
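So for the unsigned long long to int case, the common (but not guaranteed) behavior is plain truncation; a sketch, assuming 32-bit int:
#include <iostream>
int main() {
    unsigned long long L = 0x1234567890ULL;
    int i = static_cast<int>(L); // implementation defined when the value doesn't fit
    // Typical result: the high-order bits are discarded and the low
    // 32 bits (0x34567890) are kept.
    std::cout << std::hex << i << '\n'; // commonly prints 34567890
}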
The behavior is Undefined. Anything may happen. It's not even guaranteed that the result is consistent. reinterpret_cast<void*>(1) == reinterpret_cast<void*>(1) may be false.
I understand that the 'natural size' is the width of integer that is processed most efficiently by a particular hardware. When using short in an array or in arithmetic operations, the short integer must first be converted into int.
Q: What exactly determines this 'natural size'?
I am not looking for simple answers such as
If it has a 32-bit architecture, its natural size is 32-bit
I want to understand why this is most efficient, and why a short must be converted before doing arithmetic operations on it.
Bonus Q: What happens when arithmetic operations are conducted on a long integer?
Generally speaking, each computer architecture is designed such that certain type sizes provide the most efficient numerical operations. The specific size then depends on the architecture, and the compiler will select an appropriate size. More detailed explanations as to why hardware designers selected certain sizes for particular hardware would be out of scope for Stack Overflow.
A short must be promoted to int before performing integral operations because that's the way it was in C, and C++ inherited that behavior with little or no reason to change it, possibly breaking existing code. I'm not sure why it was originally added in C, but one could speculate that it's related to "default int", where if no type was specified the compiler assumed int.
Bonus A: from 5/9 (expressions) we learn: Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
And then of interest specifically:
floating point rules that don't matter here
Otherwise, the integral promotions (4.5) shall be performed on both operands
Then, if either operand is unsigned long the other shall be converted to unsigned long.
Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent
all the values of an unsigned int, the unsigned int shall be converted to a long int;
otherwise both operands shall be converted to unsigned long int.
Otherwise, if either operand is long, the other shall be converted to long.
In summary the compiler tries to use the "best" type it can to do binary operations, with int being the smallest size used.
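These result types can be checked directly with decltype; a small sketch (the unsigned int/long branch depends on whether long can represent every unsigned int, so both outcomes are allowed for):
#include <type_traits>
int main() {
    short s = 1;
    unsigned int u = 1;
    long l = 1;
    // short + short: integral promotions give int + int -> int.
    static_assert(std::is_same<decltype(s + s), int>::value, "int");
    // unsigned int + long: long if long can represent all unsigned int
    // values (e.g. LP64), otherwise unsigned long.
    static_assert(std::is_same<decltype(u + l), long>::value ||
                  std::is_same<decltype(u + l), unsigned long>::value,
                  "long or unsigned long");
}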
the 'natural size' is the width of integer that is processed most efficiently by a particular hardware.
Not really. Consider the x64 architecture. Arithmetic on any size from 8 to 64 bits will be essentially the same speed. So why have all x64 compilers settled on a 32-bit int? Well, because there was a lot of code out there which was originally written for 32-bit processors, and a lot of it implicitly relied on ints being 32 bits. And given the near-uselessness of a type which can represent values up to nine quintillion, the extra four bytes per integer would have been virtually unused. So we've decided that 32-bit ints are "natural" for this 64-bit platform.
Compare the 80286 architecture. Only 16 bits in a register. Performing 32-bit integer addition on such a platform basically requires splitting it into two 16-bit additions. Doing virtually anything with a 32-bit value involves splitting it up, with an attendant slowdown. The 80286's "natural integer size" is most definitely not 32 bits.
So really, "natural" comes down to considerations like processing efficiency, memory usage, and programmer-friendliness. It is not an acid test. It is very much a matter of subjective judgment on the part of the architecture/compiler designer.
What exactly determines this 'natural size'?
For some processors (e.g. 32-bit ARM, and most DSP-style processors), it's determined by the architecture; the processor registers are a particular size, and arithmetic can only be done on values of that size.
Others (e.g. Intel x64) are more flexible, and there's no single "natural" size; it's up to the compiler designers to choose a size, a compromise between efficiency, range of values, and memory usage.
why this is most efficient
If the processor requires values to be a particular size for arithmetic, then choosing another size will force you to convert the values to the required size - probably for a cost.
why a short must be converted before doing arithmetic operations on it
Presumably, that was a good match for the behaviour of commonly-used processors when C was developed, half a century ago. C++ inherited the promotion rules from C. I can't really comment on exactly why it was deemed a good idea, since I wasn't born then.
What happens when arithmetic operations are conducted on a long integer?
If the processor registers are large enough to hold a long, then the arithmetic will be much the same as for int. Otherwise, the operations will have to be broken down into several operations on values split between multiple registers.
I understand that the 'natural size' is the width of integer that is processed most efficiently by a particular hardware.
That's an excellent start.
Q: What exactly determines this 'natural size'?
The paragraph above is the definition of "natural size". Nothing else determines it.
I want to understand why this is most efficient
By definition.
and why a short must be converted before doing arithmetic operations on it.
It is so because the C language definitions says so. There are no deep architectural reasons (there could have been some when C was invented).
Bonus Q: What happens when arithmetic operations are conducted on a long integer?
A bunch of electrons rushes through dirty sand and meets a bunch of holes. (No, really. Ask a vague question...)
I'd like to understand better why one would choose int over unsigned.
Personally, I've never liked signed values unless there is a valid reason for them. Take the count of items in an array, the length of a string, or the size of a memory block: such things cannot possibly be negative, and a negative value has no possible meaning there. Why prefer int when it is misleading in all such cases?
I ask this because both Bjarne Stroustrup and Chandler Carruth gave the advice to prefer int over unsigned here (approx 12:30').
I can see the argument for using int over short or long - int is the "most natural" data width for the target machine architecture.
But signed over unsigned has always annoyed me. Are signed values genuinely faster on typical modern CPU architectures? What makes them better?
As per requests in comments: I prefer int instead of unsigned because...
it's shorter (I'm serious!)
it's more generic and more intuitive (i.e. I like to be able to assume that 1 - 2 is -1 and not some obscure huge number)
what if I want to signal an error by returning an out-of-range value?
Of course there are counter-arguments, but these are the principal reasons I like to declare my integers as int instead of unsigned. Naturally, this is not always true; in other cases an unsigned is just a better tool for the task. I am just answering the "why would anyone prefer defaulting to signed" question specifically.
Let me paraphrase the video, as the experts said it succinctly.
Andrei Alexandrescu:
No simple guideline.
In systems programming, we need integers of different sizes and signedness.
Many conversions and arcane rules govern arithmetic (like for auto), so we need to be careful.
Chandler Carruth:
Here's some simple guidelines:
Use signed integers unless you need two's complement arithmetic or a bit pattern
Use the smallest integer that will suffice.
Otherwise, use int if you think you could count the items, and a 64-bit integer if it's even more than you would want to count.
Stop worrying and use tools to tell you when you need a different type or size.
Bjarne Stroustrup:
Use int until you have a reason not to.
Use unsigned only for bit patterns.
Never mix signed and unsigned.
Wariness about signedness rules aside, my one-sentence takeaway from the experts:
Use the appropriate type, and when you don't know, use an int until you do know.
Several reasons:
Arithmetic on unsigned always yields unsigned, which can be a problem when subtracting integer quantities that can reasonably result in a negative result — think subtracting money quantities to yield balance, or array indices to yield distance between elements. If the operands are unsigned, you get a perfectly defined, but almost certainly meaningless result, and a result < 0 comparison will always be false (of which modern compilers will fortunately warn you).
unsigned has the nasty property of contaminating the arithmetic where it gets mixed with signed integers. So, if you add a signed and unsigned and ask whether the result is greater than zero, you can get bitten, especially when the unsigned integral type is hidden behind a typedef.
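Both effects in one short sketch (hypothetical index arithmetic; assumes 32-bit unsigned int):
#include <iostream>
int main() {
    unsigned int first = 2, last = 5;
    // Subtracting indices the "wrong way round" wraps instead of
    // giving -3, and the guard below can never fire (compilers
    // typically warn that the comparison is always false).
    unsigned int distance = first - last; // 4294967293, not -3
    if (distance < 0) {
        std::cout << "negative distance\n"; // unreachable
    }
    std::cout << distance << '\n';
}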
There are no reasons to prefer signed over unsigned, aside from purely sociological ones, i.e. some people believe that average programmers are not competent and/or attentive enough to write proper code in terms of unsigned types. This is often the main reasoning used by various "speakers", regardless of how respected those speakers might be.
In reality, competent programmers quickly develop and/or learn the basic set of programming idioms and skills that allow them to write proper code in terms of unsigned integral types.
Note also that the fundamental differences between signed and unsigned semantics are always present (in superficially different form) in other parts of the C and C++ languages, like pointer arithmetic and iterator arithmetic. This means that in the general case the programmer does not really have the option of avoiding the issues specific to unsigned semantics and the "problems" it brings with it. I.e. whether you want it or not, you have to learn to work with ranges that terminate abruptly at their left end, right at zero rather than somewhere in the distance, even if you adamantly avoid unsigned integers.
Also, as you probably know, many parts of the standard library already rely on unsigned integer types quite heavily. Forcing signed arithmetic into the mix, instead of learning to work with unsigned arithmetic, will only result in disastrously bad code.
The only real reason to prefer signed in some contexts that comes to mind is that in mixed integer/floating-point code, signed integer formats are typically directly supported by the FPU instruction set, while unsigned formats are not supported at all, forcing the compiler to generate extra code for conversions between floating-point values and unsigned values. In such code signed types might perform better.
But at the same time in purely integer code unsigned types might perform better than signed types. For example, integer division often requires additional corrective code in order to satisfy the requirements of the language spec. The correction is only necessary in case of negative operands, so it wastes CPU cycles in situations when negative operands are not really used.
In my practice I devotedly stick to unsigned wherever I can, and use signed only if I really have to.
The integral types in C and many languages which derive from it have two general usage cases: to represent numbers, or represent members of an abstract algebraic ring. For those unfamiliar with abstract algebra, the primary notion behind a ring is that adding, subtracting, or multiplying two items of a ring should yield another item of that ring--it shouldn't crash or yield a value outside the ring. On a 32-bit machine, adding unsigned 0x12345678 to unsigned 0xFFFFFFFF doesn't "overflow"--it simply yields the result 0x12345677 which is defined for the ring of integers congruent mod 2^32 (because the arithmetic result of adding 0x12345678 to 0xFFFFFFFF, i.e. 0x112345677, is congruent to 0x12345677 mod 2^32).
Conceptually, both purposes (representing numbers, or representing members of the ring of integers congruent mod 2^n) may be served by both signed and unsigned types, and many operations are the same for both usage cases, but there are some differences. Among other things, an attempt to add two numbers should not be expected to yield anything other than the correct arithmetic sum. While it's debatable whether a language should be required to generate the code necessary to guarantee that it won't (e.g. that an exception would be thrown instead), one could argue that for code which uses integral types to represent numbers such behavior would be preferable to yielding an arithmetically-incorrect value and compilers shouldn't be forbidden from behaving that way.
The implementers of the C standards decided to use signed integer types to represent numbers and unsigned types to represent members of the algebraic ring of integers congruent mod 2^n. By contrast, Java uses signed integers to represent members of such rings (though they're interpreted differently in some contexts; conversions among differently-sized signed types, for example, behave differently from among unsigned ones) and Java has neither unsigned integers nor any primitive integral types which behave as numbers in all non-exceptional cases.
If a language provided a choice of signed and unsigned representations for both numbers and algebraic-ring numbers, it might make sense to use unsigned numbers to represent quantities that will always be positive. If, however, the only unsigned types represent members of an algebraic ring, and the only types that represent numbers are the signed ones, then even if a value will always be positive it should be represented using a type designed to represent numbers.
Incidentally, the reason that (uint32_t)-1 is 0xFFFFFFFF stems from the fact that casting a signed value to unsigned is equivalent to adding unsigned zero, and adding an integer to an unsigned value is defined as adding or subtracting its magnitude to/from the unsigned value according to the rules of the algebraic ring which specify that if X=Y-Z, then X is the one and only member of that ring such X+Z=Y. In unsigned math, 0xFFFFFFFF is the only number which, when added to unsigned 1, yields unsigned zero.
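That definition is easy to check with the fixed-width types (a minimal sketch):
#include <cstdint>
#include <cassert>
int main() {
    // -1 converted to uint32_t is the unique ring member that yields
    // zero when 1 is added: 0xFFFFFFFF.
    std::uint32_t x = static_cast<std::uint32_t>(-1);
    assert(x == 0xFFFFFFFFu);
    assert(static_cast<std::uint32_t>(x + 1u) == 0u);
}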
Speed is the same on modern architectures. The problem with unsigned int is that it can sometimes generate unexpected behavior. This can create bugs that wouldn't show up otherwise.
Normally when you subtract 1 from a value, the value gets smaller. Now, with both signed and unsigned int variables, there will be a time that subtracting 1 creates a value that is MUCH LARGER. The key difference between unsigned int and int is that with unsigned int the value that generates the paradoxical result is a commonly used value --- 0 --- whereas with signed the number is safely far away from normal operations.
As far as returning -1 for an error value --- modern thinking is that it's better to throw an exception than to test for return values.
It's true that if you properly defend your code you won't have this problem, and if you use unsigned religiously everywhere you will be okay (provided that you are only adding, and never subtracting, and that you never get near UINT_MAX). I use unsigned int everywhere. But it takes a lot of discipline. For a lot of programs, you can get by with using int and spend your time on other bugs.
Use int by default: it plays nicer with the rest of the language
most common domain usage is regular arithmetic, not modular arithmetic
int main() {} // see an unsigned?
auto i = 0; // i is of type int
Only use unsigned for modulo arithmetic and bit-twiddling (in particular shifting)
has different semantics than regular arithmetic, make sure it is what you want
bit-shifting signed types is subtle (see comments by @ChristianRau)
if you need a > 2Gb vector on a 32-bit machine, upgrade your OS / hardware
Never mix signed and unsigned arithmetic
the rules for that are complicated and surprising (either one can be converted to the other, depending on the relative type sizes)
turn on -Wconversion -Wsign-conversion -Wsign-promo (gcc is better than Clang here)
the Standard Library got it wrong with std::size_t (quote from the GoingNative 2013 video)
use range-for if you can,
for(auto i = 0; i < static_cast<int>(v.size()); ++i) if you must
Don't use short or large types unless you actually need them
current architectures' data flow caters well to 32-bit non-pointer data (but note the comment by @BenVoigt about cache effects for smaller types)
char and short save space but suffer from integral promotions
are you really going to count over the full range of an int64_t?
To answer the actual question: for the vast number of things, it doesn't really matter. int can be a little easier to deal with for things like subtraction where the second operand is larger than the first, and you still get the "expected" result.
There is absolutely no speed difference in 99.9% of cases, because the ONLY instructions that are different for signed and unsigned numbers are:
Making the number longer (fill with the sign for signed or zero for unsigned) - it takes the same effort to do both.
Comparisons - for a signed number, the processor has to take into account whether either number is negative. But again, it's the same speed to make a compare with signed or unsigned numbers - it's just a different instruction code to say "numbers that have the highest bit set are smaller than numbers with the highest bit not set" (essentially). [Pedantically, it's nearly always the operation using the RESULT of a comparison that is different - the most common case being a conditional jump or branch instruction - but either way, it's the same effort, just that the inputs are taken to mean slightly different things].
Multiply and divide. Obviously, the sign of the result needs to be handled for a signed multiplication, whereas an unsigned one should not change the sign of the result when the highest bit of one of the inputs is set. And again, the effort is (as near as we care) identical.
(I think there are one or two other cases, but the result is the same - it really doesn't matter if it's signed or unsigned, the effort to perform the operation is the same for both).
The int type more closely resembles the behavior of mathematical integers than the unsigned type.
It is naive to prefer the unsigned type simply because a situation does not require negative values to be represented.
The problem is that the unsigned type has a discontinuous behavior right next to zero. Any operation that tries to compute a small negative value, instead produces some large positive value. (Worse: one that is implementation-defined.)
Algebraic relationships such as that a < b implies that a - b < 0 are wrecked in the unsigned domain, even for small values like a = 3 and b = 4.
A descending loop like for (i = max - 1; i >= 0; i--) fails to terminate if i is made unsigned.
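A sketch of that non-terminating loop (hypothetical helper; max is assumed positive):
#include <cstdio>
void count_down(unsigned int max) {
    // i >= 0 is always true for an unsigned type: when i reaches 0,
    // i-- wraps around to UINT_MAX instead of going negative, so this
    // loop never terminates (compilers typically warn about the test).
    for (unsigned int i = max - 1; i >= 0; i--) {
        std::printf("%u\n", i);
    }
}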
Unsigned quirks can cause a problem which will affect code regardless of whether that code expects to be representing only positive quantities.
The virtue of the unsigned types is that certain operations that are not portably defined at the bit level for the signed types are that way for the unsigned types. The unsigned types lack a sign bit, and so shifting and masking through the sign bit isn't a problem. The unsigned types are good for bitmasks, and for code that implements precise arithmetic in a platform-independent way. Unsigned operations will simulate two's complement semantics even on a non two's complement machine. Writing a multi-precision (bignum) library practically requires arrays of unsigned types to be used for the representation, rather than signed types.
The unsigned types are also suitable in situations in which numbers behave like identifiers and not as arithmetic types. For instance, an IPv4 address can be represented in a 32 bit unsigned type. You wouldn't add together IPv4 addresses.
int is preferred because it's most commonly used. unsigned is usually associated with bit operations. Whenever I see an unsigned, I assume it's used for bit twiddling.
If you need a bigger range, use a 64-bit integer.
If you're iterating over stuff using indexes, types usually have size_type, and you shouldn't care whether it's signed or unsigned.
Speed is not an issue.
For me, in addition to all the integers in the range of 0..+2,147,483,647 contained within the set of signed and unsigned integers on 32 bit architectures, there is a higher probability that I will need to use -1 (or smaller) than need to use +2,147,483,648 (or larger).
One good reason that I can think of is detecting overflow.
For use cases such as the count of items in an array, the length of a string, or the size of a memory block, you can overflow an unsigned int and you may not notice a difference even when you take a look at the variable. If it is a signed int, the variable will be less than zero and clearly wrong.
You can then simply check whether the variable is less than zero when you want to use it. This way, you do not have to check for overflow after every arithmetic operation, as is the case for unsigned ints.
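A sketch of that style of check (hypothetical helper; note that signed overflow is formally undefined behavior, so this relies on the common wraparound result the answer describes):
#include <cstdio>
void use_count(int count) {
    // A single sign check before use replaces per-operation overflow
    // checks, since a wrapped signed count shows up as a negative value.
    if (count < 0) {
        std::fprintf(stderr, "count overflowed\n");
        return;
    }
    std::printf("processing %d items\n", count);
}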
It gives an unexpected result when doing simple arithmetic operations:
unsigned int i;
i = 1 - 2;
// i is now 4294967295, i.e. UINT_MAX on a platform with 32-bit unsigned int
It gives an unexpected result when doing a simple comparison:
unsigned int j = 1;
std::cout << (j > -1) << std::endl;
// outputs 0 (false), even though 1 is greater than -1
This is because when doing the operations above, the signed int is converted to unsigned, where it wraps around to a really big number.