Why do arithmetic operations on unsigned chars promote them to signed integers? - c++

Many answers to similar questions point out that it is so due to the standard. But, I cannot understand the reasoning behind this decision by the standard setters.
From my understanding an unsigned char does not store the value in 2's complement form. So, I don't see a situation where let's say XORing two unsigned chars would produce unexpected behavior. Therefore, promoting them to int just seems like a waste of space (in most cases) and CPU cycles.
Moreover, why int? If a variable is being declared as unsigned, clearly the unsignedness is important to the programmer, therefore a promotion to an unsigned int would still make more sense than an int, in my opinion.
[EDIT #1] As pointed out in the comments, promotion to unsigned int will take place if an int cannot sufficiently accommodate the value in the unsigned char.
[EDIT #2] To clarify the question: if it is about the performance benefit of operating on int rather than char, then why is it in the standard? This could have been given as a suggestion to compiler designers for better optimization. Now, if someone were to design a compiler which didn't do this, that would make their compiler one that does not fully adhere to the C/C++ standard, even though, hypothetically, this compiler did support all other required features of the language. In a nutshell, I cannot figure out a reason why I cannot operate directly on unsigned chars, so the requirement to promote them to ints seems unnecessary. Can you give me an example which proves this wrong?

You can find this document on-line: Rationale for International Standard - Programming Languages - C (Revision 5.10, 2003).
Chapter 6.3 (p. 44 - 45) is about conversions
Between the publication of K&R and the development of C89, a serious divergence had occurred among implementations in the evolution of integer promotion rules. Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.
The difference between these approaches centered on the treatment of unsigned char and unsigned short when widened by the integer promotions, but the decision had an impact on the typing of constants as well (see §6.4.4.1).
The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.
The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int.
Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int. Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with two's complement arithmetic and quiet wraparound on signed overflow - that is, in most current implementations. In such implementations, differences between the two only appear when these two conditions are both true:
An expression involving an unsigned char or unsigned short produces an int-wide result in which the sign bit is set, that is, either a unary operation on such a type, or a binary operation in which the other operand is an int or “narrower” type.
The result of the preceding expression is used in a context in which its signedness is significant:
• sizeof(int) < sizeof(long) and it is in a context where it must be widened to a long type, or
• it is the left operand of the right-shift operator in an implementation where this shift is defined as arithmetic, or
• it is either operand of /, %, <, <=, >, or >=.
In such circumstances a genuine ambiguity of interpretation arises. The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation. Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising, or it may be exactly what is desired by a knowledgeable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.
One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.
The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the C89 Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.
QUIET CHANGE IN C89
A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This was considered the most serious semantic change made by the C89 Committee to a widespread current practice.
For reference, you can find more details about those conversions updated to C11 in this answer by Lundin.
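To see the value-preserving rule in action, here is a minimal sketch, assuming the ubiquitous case where int can represent every unsigned char value (so the operands are promoted to int rather than unsigned int):

#include <type_traits>

int main() {
    unsigned char a = 0xFF, b = 0x0F;

    // Both operands are promoted to int before the XOR, so the result of the
    // expression is an int, not an unsigned char.
    static_assert(std::is_same<decltype(a ^ b), int>::value,
                  "unsigned char operands are promoted to int");

    // Narrowing the result back to unsigned char gives the expected 0xF0.
    unsigned char r = static_cast<unsigned char>(a ^ b);
    return r == 0xF0 ? 0 : 1;
}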

Related

How to use bitwise operator with unsigned char data type? [duplicate]

In the following C snippet that checks if the first two bits of a 16-bit sequence are set:
bool is_pointer(unsigned short int sequence) {
return (sequence >> 14) == 3;
}
CLion's Clang-Tidy is giving me a "Use of a signed integer operand with a binary bitwise operator" warning, and I can't understand why. Is unsigned short not unsigned enough?
The code for this warning checks if either operand to the bitwise operator is signed. It is not sequence causing the warning, but 14, and you can alleviate the problem by making 14 unsigned by appending a u to the end.
(sequence >> 14u)
This warning is bad. As Roland's answer describes, CLion is fixing this.
There is a check in clang-tidy that is called hicpp-signed-bitwise. This check follows the wording of the HIC++ standard. That standard is freely available and says:
5.6.1. Do not use bitwise operators with signed operands
Use of signed operands with bitwise operators is in some cases subject to undefined or implementation defined behavior. Therefore, bitwise operators should only be used with operands of unsigned integral types.
The authors of the HIC++ coding standard misinterpreted the intention of the C and C++ standards and either accidentally or intentionally focused on the type of the operands instead of the value of the operands.
The check in clang-tidy implements exactly this wording, in order to conform to that standard. That check is not intended to be generally useful, its only purpose is to help the poor souls whose programs have to conform to that one stupid rule from the HIC++ standard.
The crucial point is that by definition integer literals without any suffix are of type int, and that type is defined as being a signed type. HIC++ now wrongly concludes that positive integer literals might be negative and thus could invoke undefined behavior.
For comparison, the C11 standard says:
6.5.7 Bitwise shift operators
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
This wording is carefully chosen and emphasises that the value of the right operand is important, not its type. It also covers the case of a too large value, while the HIC++ standard simply forgot that case. Therefore, saying 1u << 1000u is ok in HIC++, while 1 << 3 isn't.
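A short sketch of the distinction the quoted C11 wording makes (the undefined line is left commented out so the snippet still compiles):

int main() {
    // Fine: the left operand's value is non-negative and the shift amount (3)
    // is well below the width of int, even though both literals have signed type.
    int a = 1 << 3;   // 8, fully defined

    // Undefined: the shift amount is >= the width of the promoted left operand,
    // even though both operands are unsigned -- the case the HIC++ wording misses.
    // unsigned b = 1u << 1000u;

    return a == 8 ? 0 : 1;
}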
The best strategy is to explicitly disable this single check. There are several bug reports for CLion mentioning this, and it is getting fixed there.
Update 2019-12-16: I asked Perforce what the motivation behind this exact wording was and whether the wording was intentional. Here is their response:
Our C++ team who were involved in creating the HIC++ standard have taken a look at the Stack Overflow question you mentioned.
In short, referring to the object type in the HIC++ rule instead of the value is an intentional choice to allow easier automated checking of the code. The type of an object is always known, while the value is not.
HIC++ rules in general aim to be "decidable". Enforcing against the type ensures that a decidable check is always possible, ie. directly where the operator is used or where a signed type is converted to unsigned.
The rationale explicitly refers to "possible" undefined behavior, therefore a sensible implementation can exclude:
constants unless there is definitely an issue and,
unsigned types that are promoted to signed types.
The best operation is therefore for CLion to limit the checking to non-constant types before promotion.
I think the integer promotion causes the warning here. Operands narrower than an int are widened to int for the arithmetic expression, and that type is signed. So your code is effectively return ((int)sequence >> 14) == 3;, which leads to the warning. Try return ((unsigned)sequence >> 14) == 3; or return (sequence & 0xC000) == 0xC000;.
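For completeness, here is one way the two suggestions could look (the function names are made up; this is a sketch, not the one true fix):

// Shift on an explicitly unsigned value, as suggested above.
bool is_pointer_shift(unsigned short int sequence) {
    return (static_cast<unsigned int>(sequence) >> 14u) == 3u;
}

// Or test the two top bits with a mask instead of a shift.
bool is_pointer_mask(unsigned short int sequence) {
    return (sequence & 0xC000u) == 0xC000u;
}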

Why does C++ standard specify signed integer be cast to unsigned in binary operations with mixed signedness?

The C and C++ standards stipulate that, in binary operations between a signed and an unsigned integer of the same rank, the signed integer is cast to unsigned. There are many questions on SO caused by this... let's call it strange behavior: unsigned to signed conversion, C++ Implicit Conversion (Signed + Unsigned), A warning - comparison between signed and unsigned integer expressions, % (mod) with mixed signedness, etc.
But none of these give any reasons as to why the standard goes this way, rather than casting towards signed ints. I did find a self-proclaimed guru who says it's the obvious right thing to do, but he doesn't give a reasoning either: http://embeddedgurus.com/stack-overflow/2009/08/a-tutorial-on-signed-and-unsigned-integers/.
Looking through my own code, wherever I combine signed and unsigned integers, I always need to cast from unsigned to signed. There are places where it doesn't matter, but I haven't found a single example of code where it makes sense to cast the signed integer to unsigned.
What are cases where casting to unsigned is the correct thing to do? Why is the standard the way it is?
Casting from unsigned to signed results in implementation-defined behaviour if the value cannot be represented. Casting from signed to unsigned is always modulo two to the power of the unsigned's bitsize, so it is always well-defined.
The standard conversion is to the signed type if every possible unsigned value is representable in the signed type. Otherwise, the unsigned type is chosen. This guarantees that the conversion is always well-defined.
Notes
As indicated in comments, the conversion algorithm for C++ was inherited from C to maintain compatibility, which is technically the reason it is so in C++.
When this note was written, the C++ standard allowed three binary representations, including sign-magnitude and ones' complement. That's no longer the case, and there's every reason to believe that it won't be the case for C either in the reasonably near future. I'm leaving the footnote as a historical relic, but it says nothing relevant to the current language.
It has been suggested that the decision in the standard to define signed to unsigned conversions and not unsigned to signed conversion is somehow arbitrary, and that the other possible decision would be symmetric. However, the possible conversions are not symmetric.
In both of the non-2's-complement representations contemplated by the standard, an n-bit signed representation can represent only 2^n − 1 values, whereas an n-bit unsigned representation can represent 2^n values. Consequently, a signed-to-unsigned conversion is lossless and can be reversed (although one unsigned value can never be produced). The unsigned-to-signed conversion, on the other hand, must collapse two different unsigned values onto the same signed result.
In a comment, the formula sint = uint > sint_max ? uint - uint_max : uint is proposed. This coalesces the values uint_max and 0; both are mapped to 0. That's a little weird even for non-2s-complement representations, but for 2's-complement it's unnecessary and, worse, it requires the compiler to emit code to laboriously compute this unnecessary conflation. By contrast the standard's signed-to-unsigned conversion is lossless and in the common case (2's-complement architectures) it is a no-op.
If conversion to signed had been chosen, then even a simple a + 1 would always result in a signed type (unless the constant were typed as 1U).
Assume a is an unsigned int; then this seemingly innocent expression a + 1 could lead to things like undefined overflow or an "index out of bounds" in the case of arr[a + 1].
Thus, converting towards unsigned seems like the safer approach, because people probably don't even expect a conversion to be happening in the first place when simply adding a constant.
This is sort of a half-answer, because I don't really understand the committee's reasoning.
From the C90 committee's rationale document: https://www.lysator.liu.se/c/rat/c2.html#3-2-1-1
Since the publication of K&R, a serious divergence has occurred among implementations of C in the evolution of integral promotion rules. Implementations fall into two major camps, which may be characterized as unsigned preserving and value preserving. The difference between these approaches centers on the treatment of unsigned char and unsigned short, when widened by the integral promotions, but the decision has an impact on the typing of constants as well (see §3.1.3.2).
... and apparently also on the conversions done to match the two operands for any operator. It continues:
Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with twos-complement arithmetic and quiet wraparound on signed overflow --- that is, in most current implementations.
It then specifies a case where ambiguity of interpretation arises, and states:
The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. (Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation.) Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising --- or it may be exactly what is desired by a knowledgable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.
and:
The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.
Thus, they consider the case of int + unsigned an unwanted situation, and chose conversion rules for char and short that yield as few of those situations as possible, even though most compilers at the time followed a different approach. If I understand right, this choice then forced them to follow the current choice of int + unsigned yielding an unsigned operation.
I still find all of this truly bizarre.
Why does C++ standard specify signed integer be cast to unsigned in binary operations with mixed signedness?
I suppose that you mean converted rather than "cast". A cast is an explicit conversion.
As I'm not the author nor have I encountered documentation about this decision, I cannot promise that my explanation is the truth. However, there is a fairly reasonable potential explanation: because that's how C works, and C++ was based on C. Unless there was an opportunity to improve upon the rules, there would be no reason to change what works and what programmers have been used to. I don't know if the committee even deliberated changing this.
I know what you may be thinking: "Why does the C standard specify signed integer...". Well, I'm also not the author of the C standard, but there is at least a fairly extensive document titled "Rationale for American National Standard for Information Systems - Programming Language - C". As extensive as it is, it unfortunately doesn't cover this question (it does cover a very similar question of how to promote integer types narrower than int, in which regard the standard differs from some of the C implementations that pre-date it).
I don't have access to a pre-standard K&R documents, but I did find a passage from book "Expert C Programming: Deep C Secrets" which quotes rules from the pre-standard K&R C (in context of comparing the rule with the standardised ones):
Section 6.6 Arithmetic Conversions
A great many operators cause conversions and yield result types in a similar way. This pattern will be called the "usual arithmetic conversions."
First, any operands of type char or short are converted to int, and any of type float are converted to double. Then if either operand is double, the other is converted to double and that is the type of the result. Otherwise, if either operand is long, the other is converted to long and that is the type of the result. Otherwise, if either operand is unsigned, the other is converted to unsigned and that is the type of the result. Otherwise, both operands must be int, and that is the type of the result.
So, it appears that this has been the rule since before the standardisation of C and was presumably chosen by the language's designer himself. Unless someone can find a written rationale, we may never know the answer.
What are cases where casting to unsigned is the correct thing to do?
Here is an extremely simple case:
unsigned u = INT_MAX;   // INT_MAX from <climits>
u + 42;                 // well defined today; under the proposed rule this would be signed overflow
The type of the literal 42 is signed, so under your proposed (signed-preferring) rule, u + 42 would also be signed. This would be quite surprising and would result in the program shown having undefined behaviour due to signed integer overflow.
Basically, implicit conversion to signed and to unsigned each have their problems.

In C++, what happens when I use static_cast<char> on an integer value outside -128,127 range?

In a code compiled on i386 Linux using g++, I have used static_cast<char>() cast on a value that might exceed the valid range of -128,127 for a char. There were no errors or exceptions and so I used the code in production.
The problem is now I don't know how this code might behave when a value outside this range is thrown at it. There is no problem if data is modified or truncated, I only need to know how this modification behaves on this particular platform.
Also what would happen if C-style cast ((char)value) had been used? would it behave differently?
In your case this would be an explicit type conversion, or, to be more precise, an integral conversion.
The standard says about this (4.7):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
So your problem is implementation-defined. On the other hand I have not yet seen a compiler that does not just truncate the larger value to the smaller one. And I have never seen any compiler that uses the rule mentioned above.
So it should be fairly safe to just cast your integer/short to the char.
I don't know the rules for a C cast by heart and I really try to avoid them because it is not easy to say which rule will kick in.
This is dealt with in §4.7 of the standard (integral conversions).
The answer depends on whether in the implementation in question char is signed or unsigned. If it is unsigned, then modulo arithmetic is applied. §4.7/2 of C++11 states: "If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type)." This means that if the input integer is not negative, the normal bit truncation you expect will arise. If it is negative, the same will apply if negative numbers are represented by 2's complement, otherwise the conversion will be bit altering.
If char is signed, §4.7/3 of C++11 applies: "If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined." So it is up to the documentation for the particular implementation you use. Having said that, on 2's complement systems (i.e. all those in normal use) I have not seen a case where anything other than normal bit truncation occurs for char types: apart from anything else, by virtue of §3.9.1/1 of the C++11 standard all character types (char, unsigned char and signed char) must have the same object representation and alignment.
The effect of a C-style cast, an explicit static_cast and an implicit narrowing conversion is the same.
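As a concrete illustration of the two cases above (a sketch; the signed-destination results shown are simply what every mainstream two's-complement compiler produces, not something the pre-C++20 standard guarantees):

#include <iostream>

int main() {
    int big = 300;

    // Unsigned destination: guaranteed modulo-2^8 reduction (with 8-bit chars),
    // so 300 becomes 300 - 256 = 44.
    unsigned char u = static_cast<unsigned char>(big);

    // Signed destination: implementation-defined before C++20. In practice the
    // value is truncated the same way, giving 44 here and -56 for
    // static_cast<signed char>(200).
    signed char s = static_cast<signed char>(big);

    std::cout << static_cast<int>(u) << ' ' << static_cast<int>(s) << '\n'; // typically "44 44"
}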
Technically, the language spec imposes a plain base-2 (pure binary) representation for unsigned types, and for pure binary it is pretty obvious what extending and truncating do.
For signed types, however, the spec is more "tolerant", allowing different kinds of processors to use different ways of representing signed numbers. And since the same number may have different representations on different platforms, it is practically impossible to describe what happens to it when bits are added or removed.
For this reason, the language specification stays vaguer, saying that "the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined".
In other words, compiler manufacturers are required to do the best they can to keep the numeric value, but when this cannot be done, they are free to do whatever is most efficient for them.

Why prefer signed over unsigned in C++? [closed]

I'd like to understand better why choose int over unsigned?
Personally, I've never liked signed values unless there is a valid reason for them, e.g. the count of items in an array, the length of a string, or the size of a memory block; such things simply cannot be negative, and a negative value there has no possible meaning. Why prefer int when it is misleading in all such cases?
I ask this because both Bjarne Stroustrup and Chandler Carruth gave the advice to prefer int over unsigned here (approx 12:30').
I can see the argument for using int over short or long - int is the "most natural" data width for the target machine architecture.
But signed over unsigned has always annoyed me. Are signed values genuinely faster on typical modern CPU architectures? What makes them better?
As per requests in comments: I prefer int instead of unsigned because...
it's shorter (I'm serious!)
it's more generic and more intuitive (i.e. I like to be able to assume that 1 - 2 is -1 and not some obscure huge number)
what if I want to signal an error by returning an out-of-range value?
Of course there are counter-arguments, but these are the principal reasons I like to declare my integers as int instead of unsigned. Of course, this is not always true, in other cases, an unsigned is just a better tool for a task, I am just answering the "why would anyone prefer defaulting to signed" question specifically.
Let me paraphrase the video, as the experts said it succinctly.
Andrei Alexandrescu:
No simple guideline.
In systems programming, we need integers of different sizes and signedness.
Many conversions and arcane rules govern arithmetic (like for auto), so we need to be careful.
Chandler Carruth:
Here's some simple guidelines:
Use signed integers unless you need two's complement arithmetic or a bit pattern
Use the smallest integer that will suffice.
Otherwise, use int if you think you could count the items, and a 64-bit integer if it's even more than you would want to count.
Stop worrying and use tools to tell you when you need a different type or size.
Bjarne Stroustrup:
Use int until you have a reason not to.
Use unsigned only for bit patterns.
Never mix signed and unsigned
Wariness about signedness rules aside, my one-sentence take away from the experts:
Use the appropriate type, and when you don't know, use an int until you do know.
Several reasons:
Arithmetic on unsigned always yields unsigned, which can be a problem when subtracting integer quantities that can reasonably result in a negative result — think subtracting money quantities to yield balance, or array indices to yield distance between elements. If the operands are unsigned, you get a perfectly defined, but almost certainly meaningless result, and a result < 0 comparison will always be false (of which modern compilers will fortunately warn you).
unsigned has the nasty property of contaminating the arithmetic where it gets mixed with signed integers. So, if you add a signed and unsigned and ask whether the result is greater than zero, you can get bitten, especially when the unsigned integral type is hidden behind a typedef.
There are no reasons to prefer signed over unsigned, aside from purely sociological ones, i.e. some people believe that average programmers are not competent and/or attentive enough to write proper code in terms of unsigned types. This is often the main reasoning used by various "speakers", regardless of how respected those speakers might be.
In reality, competent programmers quickly develop and/or learn the basic set of programming idioms and skills that allow them to write proper code in terms of unsigned integral types.
Note also that the fundamental differences between signed and unsigned semantics are always present (in superficially different form) in other parts of C and C++ language, like pointer arithmetic and iterator arithmetic. Which means that in general case the programmer does not really have the option of avoiding dealing with issues specific to unsigned semantics and the "problems" it brings with it. I.e. whether you want it or not, you have to learn to work with ranges that terminate abruptly at their left end and terminate right here (not somewhere in the distance), even if you adamantly avoid unsigned integers.
Also, as you probably know, many parts of standard library already rely on unsigned integer types quite heavily. Forcing signed arithmetic into the mix, instead of learning to work with unsigned one, will only result in disastrously bad code.
The only real reason to prefer signed in some contexts that comes to mind is that in mixed integer/floating-point code signed integer formats are typically directly supported by the FPU instruction set, while unsigned formats are not supported at all, making the compiler generate extra code for conversions between floating-point values and unsigned values. In such code signed types might perform better.
But at the same time in purely integer code unsigned types might perform better than signed types. For example, integer division often requires additional corrective code in order to satisfy the requirements of the language spec. The correction is only necessary in case of negative operands, so it wastes CPU cycles in situations when negative operands are not really used.
In my practice I devotedly stick to unsigned wherever I can, and use signed only if I really have to.
The integral types in C and many languages which derive from it have two general usage cases: to represent numbers, or represent members of an abstract algebraic ring. For those unfamiliar with abstract algebra, the primary notion behind a ring is that adding, subtracting, or multiplying two items of a ring should yield another item of that ring--it shouldn't crash or yield a value outside the ring. On a 32-bit machine, adding unsigned 0x12345678 to unsigned 0xFFFFFFFF doesn't "overflow"--it simply yields the result 0x12345677 which is defined for the ring of integers congruent mod 2^32 (because the arithmetic result of adding 0x12345678 to 0xFFFFFFFF, i.e. 0x112345677, is congruent to 0x12345677 mod 2^32).
Conceptually, both purposes (representing numbers, or representing members of the ring of integers congruent mod 2^n) may be served by both signed and unsigned types, and many operations are the same for both usage cases, but there are some differences. Among other things, an attempt to add two numbers should not be expected to yield anything other than the correct arithmetic sum. While it's debatable whether a language should be required to generate the code necessary to guarantee that it won't (e.g. that an exception would be thrown instead), one could argue that for code which uses integral types to represent numbers such behavior would be preferable to yielding an arithmetically-incorrect value and compilers shouldn't be forbidden from behaving that way.
The implementers of the C standards decided to use signed integer types to represent numbers and unsigned types to represent members of the algebraic ring of integers congruent mod 2^n. By contrast, Java uses signed integers to represent members of such rings (though they're interpreted differently in some contexts; conversions among differently-sized signed types, for example, behave differently from among unsigned ones) and Java has neither unsigned integers nor any primitive integral types which behave as numbers in all non-exceptional cases.
If a language provided a choice of signed and unsigned representations for both numbers and algebraic-ring numbers, it might make sense to use unsigned numbers to represent quantities that will always be positive. If, however, the only unsigned types represent members of an algebraic ring, and the only types that represent numbers are the signed ones, then even if a value will always be positive it should be represented using a type designed to represent numbers.
Incidentally, the reason that (uint32_t)-1 is 0xFFFFFFFF stems from the fact that casting a signed value to unsigned is equivalent to adding unsigned zero, and adding an integer to an unsigned value is defined as adding or subtracting its magnitude to/from the unsigned value according to the rules of the algebraic ring which specify that if X=Y-Z, then X is the one and only member of that ring such X+Z=Y. In unsigned math, 0xFFFFFFFF is the only number which, when added to unsigned 1, yields unsigned zero.
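A short sketch of that last paragraph, using the fixed-width types for concreteness:

#include <cassert>
#include <cstdint>

int main() {
    // Converting -1 to a 32-bit unsigned type is defined as reduction modulo 2^32:
    // the result is the unique ring member congruent to -1, i.e. 0xFFFFFFFF.
    static_assert(static_cast<uint32_t>(-1) == 0xFFFFFFFFu,
                  "signed-to-unsigned conversion is modular");

    // And 0xFFFFFFFF is the one value that, added to unsigned 1, yields 0.
    uint32_t x = 0xFFFFFFFFu;
    uint32_t y = x + 1u;   // stored back into uint32_t, so the result is reduced mod 2^32
    assert(y == 0u);
}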
Speed is the same on modern architectures. The problem with unsigned int is that it can sometimes generate unexpected behavior. This can create bugs that wouldn't show up otherwise.
Normally when you subtract 1 from a value, the value gets smaller. Now, with both signed and unsigned int variables, there will be a time that subtracting 1 creates a value that is MUCH LARGER. The key difference between unsigned int and int is that with unsigned int the value that generates the paradoxical result is a commonly used value --- 0 --- whereas with signed the number is safely far away from normal operations.
As far as returning -1 for an error value --- modern thinking is that it's better to throw an exception than to test for return values.
It's true that if you properly defend your code you won't have this problem, and if you use unsigned religiously everywhere you will be okay (provided that you are only adding, and never subtracting, and that you never get near UINT_MAX). I use unsigned int everywhere. But it takes a lot of discipline. For a lot of programs, you can get by with using int and spend your time on other bugs.
Use int by default: it plays nicer with the rest of the language
most common domain usage is regular arithmetic, not modular arithmetic
int main() {} // see an unsigned?
auto i = 0; // i is of type int
Only use unsigned for modulo arithmetic and bit-twiddling (in particular shifting)
has different semantics than regular arithmetic, make sure it is what you want
bit-shifting signed types is subtle (see comments by @ChristianRau)
if you need a > 2Gb vector on a 32-bit machine, upgrade your OS / hardware
Never mix signed and unsigned arithmetic
the rules for that are complicated and surprising (either one can be converted to the other, depending on the relative type sizes)
turn on -Wconversion -Wsign-conversion -Wsign-promo (gcc is better than Clang here)
the Standard Library got it wrong with std::size_t (quote from the GN13 video)
use range-for if you can,
for(auto i = 0; i < static_cast<int>(v.size()); ++i) if you must (see the sketch after this list)
Don't use short or large types unless you actually need them
current architectures' data flow caters well to 32-bit non-pointer data (but note the comment by @BenVoigt about cache effects for smaller types)
char and short save space but suffer from integral promotions
are you really going to count past what an int64_t can hold?
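A sketch of the indexing advice from the list above, assuming a std::vector<int> named v:

#include <cstdio>
#include <vector>

void print_all(const std::vector<int>& v) {
    // Preferred: range-for avoids the index (and its signedness) entirely.
    for (int x : v)
        std::printf("%d\n", x);

    // If an index is really needed: cast the size once, so the comparison is
    // signed-vs-signed and -Wsign-compare / -Wsign-conversion stay quiet.
    for (int i = 0; i < static_cast<int>(v.size()); ++i)
        std::printf("%d\n", v[i]);
}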
To answer the actual question: for the vast number of things, it doesn't really matter. int can be a little easier to deal with for things like subtraction where the second operand is larger than the first, since you still get the "expected" (negative) result.
There is absolutely no speed difference in 99.9% of cases, because the ONLY instructions that are different for signed and unsigned numbers are:
Making the number longer (fill with the sign for signed or zero for unsigned) - it takes the same effort to do both.
Comparisons - with signed numbers, the processor has to take into account whether either number is negative. But again, it's the same speed to compare signed or unsigned numbers - it just uses a different instruction code to say "numbers that have the highest bit set are smaller than numbers with the highest bit not set" (essentially). [Pedantically, it's nearly always the operation using the RESULT of a comparison that is different - the most common case being a conditional jump or branch instruction - but either way, it's the same effort; the inputs are just taken to mean slightly different things.]
Multiply and divide. Obviously, the sign of the result needs handling if it's a signed multiplication, whereas an unsigned multiplication must not change the sign of the result just because the highest bit of one of the inputs is set. And again, the effort is (as near as we care) identical.
(I think there are one or two other cases, but the result is the same - it really doesn't matter if it's signed or unsigned, the effort to perform the operation is the same for both).
The int type more closely resembles the behavior of mathematical integers than the unsigned type.
It is naive to prefer the unsigned type simply because a situation does not require negative values to be represented.
The problem is that the unsigned type has a discontinuity in its behavior right next to zero. Any operation that tries to compute a small negative value instead produces some large positive value (worse: exactly which value depends on the implementation-defined width of the type).
Algebraic relationships such as that a < b implies that a - b < 0 are wrecked in the unsigned domain, even for small values like a = 3 and b = 4.
A descending loop like for (i = max - 1; i >= 0; i--) fails to terminate if i is made unsigned.
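A sketch of that loop and one common unsigned-safe rewrite (the array here is just a stand-in):

#include <cstddef>
#include <cstdio>

int main() {
    const char* items[] = {"a", "b", "c"};
    const std::size_t max = sizeof items / sizeof items[0];

    // Signed index: the classic descending loop terminates as expected.
    for (int i = static_cast<int>(max) - 1; i >= 0; i--)
        std::printf("%s\n", items[i]);

    // With an unsigned i, "i >= 0" is always true: decrementing 0 wraps to the
    // huge maximum value instead of going negative, so the loop never ends.
    // A common unsigned-safe rewrite decrements before the body:
    for (std::size_t i = max; i-- > 0; )
        std::printf("%s\n", items[i]);
}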
Unsigned quirks can cause a problem which will affect code regardless of whether that code expects to be representing only positive quantities.
The virtue of the unsigned types is that certain operations that are not portably defined at the bit level for the signed types are that way for the unsigned types. The unsigned types lack a sign bit, and so shifting and masking through the sign bit isn't a problem. The unsigned types are good for bitmasks, and for code that implements precise arithmetic in a platform-independent way. Unsigned operations will simulate two's complement semantics even on a non two's complement machine. Writing a multi-precision (bignum) library practically requires arrays of unsigned types to be used for the representation, rather than signed types.
The unsigned types are also suitable in situations in which numbers behave like identifiers and not as arithmetic types. For instance, an IPv4 address can be represented in a 32 bit unsigned type. You wouldn't add together IPv4 addresses.
int is preferred because it's most commonly used. unsigned is usually associated with bit operations. Whenever I see an unsigned, I assume it's used for bit twiddling.
If you need a bigger range, use a 64-bit integer.
If you're iterating over stuff using indexes, types usually have size_type, and you shouldn't care whether it's signed or unsigned.
Speed is not an issue.
For me, in addition to all the integers in the range of 0..+2,147,483,647 contained within the set of signed and unsigned integers on 32 bit architectures, there is a higher probability that I will need to use -1 (or smaller) than need to use +2,147,483,648 (or larger).
One good reason that I can think of is in case of detecting overflow.
For use cases such as the count of items in an array, the length of a string, or the size of a memory block, you can overflow an unsigned int and you may not notice a difference even when you take a look at the variable. If it is a signed int, the variable will be less than zero and clearly wrong.
You can simply check whether the variable is less than zero when you want to use it. This way, you do not have to check for overflow after every arithmetic operation, as is the case for unsigned ints.
It gives unexpected results when doing simple arithmetic operations:
unsigned int i;
i = 1 - 2;
// i is now 4294967295 (the subtraction wraps modulo 2^32 when unsigned int is 32 bits)
It gives an unexpected result when doing a simple comparison:
unsigned int j = 1;
std::cout << (j>-1) << std::endl;
// outputs 0 (false), even though 1 is greater than -1
This is because when doing the operations above, the signed operands are converted to unsigned and wrap around to a really big number.

Why is unsigned integer overflow defined behavior but signed integer overflow isn't?

Unsigned integer overflow is well defined by both the C and C++ standards. For example, the C99 standard (§6.2.5/9) states
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
However, both standards state that signed integer overflow is undefined behavior. Again, from the C99 standard (§3.4.3/1)
An example of undefined behavior is the behavior on integer overflow
Is there an historical or (even better!) a technical reason for this discrepancy?
The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation it used. C implementations usually used the same representation used by the CPU - so the overflow behavior followed from the integer representation used by the CPU.
In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).
Relevant quotes:
C99 6.2.6.1:3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
C99 6.2.6.2:2:
If the sign bit is one, the value shall be modified in one of the following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value −(2^N) (two's complement);
— the sign bit has the value −(2^N − 1) (ones' complement).
Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.
Aside from Pascal's good answer (which I'm sure is the main motivation), it is also possible that some processors cause an exception on signed integer overflow, which of course would cause problems if the compiler had to "arrange for another behaviour" (e.g. use extra instructions to check for potential overflow and calculate differently in that case).
It is also worth noting that "undefined behaviour" doesn't mean "doesn't work". It means that the implementation is allowed to do whatever it likes in that situation. This includes doing "the right thing" as well as "calling the police" or "crashing". Most compilers, when possible, will choose "do the right thing", assuming that is relatively easy to define (in this case, it is). However, if you are having overflows in the calculations, it is important to understand what that actually results in, and that the compiler MAY do something other than what you expect (and that this may vary depending on compiler version, optimisation settings, etc).
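As a rough sketch of the kind of transformation the posts linked above describe (the function names here are made up, and whether a given compiler actually folds the comparison depends on its version and flags):

// Because signed overflow is undefined, an optimising compiler is allowed to
// assume it never happens and may fold this comparison to a constant "true".
bool always_true(int x) {
    return x + 1 > x;   // would be false for x == INT_MAX if wrapping were required
}

// The unsigned version must honour the defined wraparound, so it cannot be
// folded: for x == UINT_MAX, x + 1 wraps to 0 and the function returns false.
bool sometimes_false(unsigned int x) {
    return x + 1 > x;
}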
First of all, please note that C11 3.4.3, like all examples and footnotes, is not normative text and is therefore not relevant to cite!
The relevant text that states that overflow of integers and floats is undefined behavior is this:
C11 6.5/5
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
A clarification regarding the behavior of unsigned integer types specifically can be found here:
C11 6.2.5/9
The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the representation of
the same value in each type is the same. A computation involving
unsigned operands can never overflow, because a result that cannot be
represented by the resulting unsigned integer type is reduced modulo
the number that is one greater than the largest value that can be
represented by the resulting type.
This makes unsigned integer types a special case.
Also note that there is an exception if any type is converted to a signed type and the old value can no longer be represented. The behavior is then merely implementation-defined, although a signal may be raised.
C11 6.3.1.3
6.3.1.3 Signed and unsigned integers
When a value with integer
type is converted to another integer type other than _Bool, if the
value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Otherwise, the new type is signed and the value
cannot be represented in it; either the result is
implementation-defined or an implementation-defined signal is raised.
In addition to the other issues mentioned, having unsigned math wrap makes the unsigned integer types behave as abstract algebraic groups (meaning that, among other things, for any pair of values X and Y, there will exist some other value Z such that X+Z will, if properly cast, equal Y and Y-Z will, if properly cast, equal X). If unsigned values were merely storage-location types and not intermediate-expression types (e.g. if there were no unsigned equivalent of the largest integer type, and arithmetic operations on unsigned types behaved as though they were first converted to larger signed types), then there wouldn't be as much need for defined wrapping behavior, but it's difficult to do calculations in a type which doesn't have e.g. an additive inverse.
This helps in situations where wrap-around behavior is actually useful - for example with TCP sequence numbers or certain algorithms, such as hash calculation. It may also help in situations where it's necessary to detect overflow, since performing calculations and checking whether they overflowed is often easier than checking in advance whether they would overflow, especially if the calculations involve the largest available integer type.
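As an illustration of the sequence-number case, here is a sketch of RFC 1982-style "serial" comparison built on the defined unsigned wraparound (the helper name seq_before is made up, and the final cast is implementation-defined before C++20, though every two's-complement compiler gives the expected result):

#include <cassert>
#include <cstdint>

// True if sequence number a comes "before" b on the 32-bit circle.
// The subtraction relies on well-defined unsigned wraparound; the cast back to
// int32_t recovers the signed distance (guaranteed modular since C++20).
bool seq_before(uint32_t a, uint32_t b) {
    return static_cast<int32_t>(a - b) < 0;
}

int main() {
    assert(seq_before(0xFFFFFFF0u, 0x00000010u)); // still "before", across the wrap
    assert(!seq_before(5u, 3u));                  // 5 is not before 3
}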
Perhaps another reason for why unsigned arithmetic is defined is because unsigned numbers form integers modulo 2^n, where n is the width of the unsigned number. Unsigned numbers are simply integers represented using binary digits instead of decimal digits. Performing the standard operations in a modulus system is well understood.
The OP's quote refers to this fact, but also highlights the fact that there is only one unambiguous, logical way to represent unsigned integers in binary. By contrast, signed numbers are most often represented using two's complement, but other choices are possible, as described in the standard (section 6.2.6.2).
Two's complement representation allows certain operations to make more sense in binary format. E.g., incrementing negative numbers is the same as for positive numbers (except under overflow conditions). Some operations at the machine level can be the same for signed and unsigned numbers. However, when interpreting the result of those operations, some cases don't make sense - positive and negative overflow. Furthermore, the overflow results differ depending on the underlying signed representation.
The most technical reason of all is simply that trying to capture overflow in an unsigned integer would require more moving parts from you (exception handling) and from the processor (exception throwing).
C and C++ won't make you pay for that unless you ask for it by using a signed integer. This isn't a hard-and-fast rule, as you'll see near the end, but it is just how they proceed for unsigned integers. In my opinion, this makes signed integers the odd one out, not unsigned, but it's fine that they offer this fundamental difference, as the programmer can still perform well-defined signed operations with overflow. But to do so, you must cast for it.
Because:
unsigned integers have well-defined overflow and underflow (wraparound)
conversions from signed -> unsigned int are well defined: [uint's name]_MAX + 1 is conceptually added to negative values, to map them onto the extended positive number range
conversions from unsigned -> signed int are implementation-defined by the standard when the value doesn't fit (though every mainstream two's-complement implementation simply wraps, and C++20 guarantees the wrap): [uint's name]_MAX + 1 is conceptually subtracted from positive values beyond the signed type's max, to map them onto negative numbers
You can always perform arithmetic operations with well-defined overflow and underflow behavior, where signed integers are your starting point, albeit in a round-about way, by casting to unsigned integer first then back once finished.
#include <cstdint>

int32_t x = 10;
int32_t y = -50;
// Round-trip through uint32_t: the subtraction wraps (well defined), and the
// cast back gives -60 (guaranteed since C++20; implementation-defined before,
// but this is what every two's-complement compiler does).
int32_t z = int32_t(uint32_t(y) - uint32_t(x));
Casts between signed and unsigned integer types of the same width are free if the CPU is using 2's complement (nearly all do). If for some reason the platform you're targeting doesn't use 2's complement for signed integers, you will pay a small conversion price when casting between uint32 and int32.
But be wary when using bit widths smaller than int
usually if you are relying on unsigned overflow, you are using a smaller word width, 8-bit or 16-bit. These will promote to signed int at the drop of a hat (C has absolutely insane implicit integer conversion rules, this is one of C's biggest hidden gotchas), consider:
unsigned char a = 0;
unsigned char b = 1;
printf("%i", a - b); // outputs -1, not 255 as you'd expect
To avoid this, you should always cast to the type you want when you are relying on that type's width, even in the middle of an operation where you think it's unnecessary. This will cast the temporary and get you the signedness AND truncate the value so you get what you expected. It's almost always free to cast, and in fact, your compiler might thank you for doing so as it can then optimize on your intentions more aggressively.
unsigned char a = 0;
unsigned char b = 1;
printf("%i", (unsigned char)(a - b)); // cast turns -1 to 255, outputs 255