What does 'Natural Size' really mean in C++? - c++

I understand that the 'natural size' is the width of integer that is processed most efficiently by a particular hardware. When using short in an array or in arithmetic operations, the short integer must first be converted into int.
Q: What exactly determines this 'natural size'?
I am not looking for simple answers such as
If it has a 32-bit architecture, it's natural size is 32-bit
I want to understand why this is most efficient, and why a short must be converted before doing arithmetic operations on it.
Bonus Q: What happens when arithmetic operations are conducted on a long integer?

Generally speaking, each computer architecture is designed such that certain type sizes provide the most efficient numerical operations. The specific size then depends on the architecture, and the compiler will select an appropriate size. More detailed explanations as to why hardware designers selected certain sizes for perticular hardware would be out of scope for stckoverflow.
A short most be promoted to int before performing integral operations because that's the way it was in C, and C++ inherited that behavior with little or no reason to change it, possibly breaking existing code. I'm not sure the reason it was originally added in C, but one could speculate that it's related to "default int" where if no type were specified int was assumed by the compiler.
Bonus A: from 5/9 (expressions) we learn: Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
And then of interest specifically:
floating point rules that don't matter here
Otherwise, the integral promotions (4.5) shall be performed on both operands
Then, if either operand is unsigned long the other shall be converted to unsigned long.
Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent
all the values of an unsigned int, the unsigned int shall be converted to a long int;
otherwise both operands shall be converted to unsigned long int.
Otherwise, if either operand is long, the other shall be converted to long.
In summary the compiler tries to use the "best" type it can to do binary operations, with int being the smallest size used.

the 'natural size' is the width of integer that is processed most efficiently by a particular hardware.
Not really. Consider the x64 architecture. Arithmetic on any size from 8 to 64 bits will be essentially the same speed. So why have all x64 compilers settled on a 32-bit int? Well, because there was a lot of code out there which was originally written for 32-bit processors, and a lot of it implicitly relied on ints being 32-bits. And given the near-uselessness of a type which can represent values up to nine quintillion, the extra four bytes per integer would have been virtually unused. So we've decided that 32-bit ints are "natural" for this 64-bit platform.
Compare the 80286 architecture. Only 16 bits in a register. Performing 32-bit integer addition on such a platform basically requires splitting it into two 16-bit additions. Doing virtually anything with it involves splitting it up, really-- and an attendant slowdown. The 80286's "natural integer size" is most definitely not 32 bits.
So really, "natural" comes down to considerations like processing efficiency, memory usage, and programmer-friendliness. It is not an acid test. It is very much a matter of subjective judgment on the part of the architecture/compiler designer.

What exactly determines this 'natural size'?
For some processors (e.g. 32-bit ARM, and most DSP-style processors), it's determined by the architecture; the processor registers are a particular size, and arithmetic can only be done on values of that size.
Others (e.g. Intel x64) are more flexible, and there's no single "natural" size; it's up to the compiler designers to choose a size, a compromise between efficiency, range of values, and memory usage.
why this is most efficient
If the processor requires values to be a particular size for arithmetic, then choosing another size will force you to convert the values to the required size - probably for a cost.
why a short must be converted before doing arithmetic operations on it
Presumably, that was a good match for the behaviour of commonly-used processors when C was developed, half a century ago. C++ inherited the promotion rules from C. I can't really comment on exactly why it was deemed a good idea, since I wasn't born then.
What happens when arithmetic operations are conducted on a long integer?
If the processor registers are large enough to hold a long, then the arithmetic will be much the same as for int. Otherwise, the operations will have to be broken down into several operations on values split between multiple registers.

I understand that the 'natural size' is the width of integer that is processed most efficiently by a particular hardware.
That's an excellent start.
Q: What exactly determines this 'natural size'?
The paragraph above is the definition of "natural size". Nothing else determines it.
I want to understand why this is most efficient
By definition.
and why a short must be converted before doing arithmetic operations on it.
It is so because the C language definitions says so. There are no deep architectural reasons (there could have been some when C was invented).
Bonus Q: What happens when arithmetic operations are conducted on a long integer?
A bunch of electrons rushes through dirty sand and meets a bunch of holes. (No, really. Ask a vague question...)

Related

Are C++ arithmetic operations between different types of integers cost free?

I have a project that needs more speed and less memory consumption, so I learned about int_fast_t and started to use it.
In the meantime, I have the following question about it: Do C++ arithmetic operations between different types of integers require an implicit cast or the like, or are they cost-free?
Another way to ask, is int_fast8_t + int_fast32_t, for example, slower than int + int? Why?
The rules for arithmetic operations in C++ are rather nuts.
https://en.cppreference.com/w/c/language/conversion contains them.
But conversion up to int/unsigned int is logically done on every operation.
The result is then converted back on assignment.
This logical conversion does not have to happen in actual machine code, so 8 bit ints can be done in 8 bit hardware, so long as the results of converting to int then back to 8 bit is the same; and that can be faster. On other systems, that could be costly, and fast 8 int is 32 bits in size.
The performance is entirely implementation-dependent. In one implementation, converting int_fast8_t to int_fast32_t might involve overhead, and adding an int_fast8_t to an int_fast32_t might be slower than adding two ints in some circumstances.
In another implementation, however, it might be that each of int_fast8_t, int_fast32_t, and int are the same 32-bit signed integers. In this case, there would be no conversion, and the adding of values of any combination of these types would be equally fast.
Perhaps the biggest consideration to keep in mind is that the int_fastN_t types are not intrinsically faster than other integer types. They merely give the compiler the option to use more bits than requested if that could give a performance boost. Perhaps accessing a single byte from memory takes an extra instruction, so a 16-bit value can be processed faster than an 8-bit value can. If that's the case, int_fast8_t might be a typedef of short. You are not getting better performance than a short; the only promise being made is that using int_fast8_t is at least as fast as using unsigned char.
Another example: On today's computers, int_fast64_t is likely a typedef (or equivalent) of long long. Same type, no performance difference. Use the int_fastN_t types if you are more concerned with speed than size, but don't assume you will actually see a speed boost – you might already be at optimal performance.

Why isn't there an endianness modifier in C++ like there is for signedness?

(I guess this question could apply to many typed languages, but I chose to use C++ as an example.)
Why is there no way to just write:
struct foo {
little int x; // little-endian
big long int y; // big-endian
short z; // native endianness
};
to specify the endianness for specific members, variables and parameters?
Comparison to signedness
I understand that the type of a variable not only determines how many bytes are used to store a value but also how those bytes are interpreted when performing computations.
For example, these two declarations each allocate one byte, and for both bytes, every possible 8-bit sequence is a valid value:
signed char s;
unsigned char u;
but the same binary sequence might be interpreted differently, e.g. 11111111 would mean -1 when assigned to s but 255 when assigned to u. When signed and unsigned variables are involved in the same computation, the compiler (mostly) takes care of proper conversions.
In my understanding, endianness is just a variation of the same principle: a different interpretation of a binary pattern based on compile-time information about the memory in which it will be stored.
It seems obvious to have that feature in a typed language that allows low-level programming. However, this is not a part of C, C++ or any other language I know, and I did not find any discussion about this online.
Update
I'll try to summarize some takeaways from the many comments that I got in the first hour after asking:
signedness is strictly binary (either signed or unsigned) and will always be, in contrast to endianness, which also has two well-known variants (big and little), but also lesser-known variants such as mixed/middle endian. New variants might be invented in the future.
endianness matters when accessing multiple-byte values byte-wise. There are many aspects beyond just endianness that affect the memory layout of multi-byte structures, so this kind of access is mostly discouraged.
C++ aims to target an abstract machine and minimize the number of assumptions about the implementation. This abstract machine does not have any endianness.
Also, now I realize that signedness and endianness are not a perfect analogy, because:
endianness only defines how something is represented as a binary sequence, but now what can be represented. Both big int and little int would have the exact same value range.
signedness defines how bits and actual values map to each other, but also affects what can be represented, e.g. -3 can't be represented by an unsigned char and (assuming that char has 8 bits) 130 can't be represented by a signed char.
So that changing the endianness of some variables would never change the behavior of the program (except for byte-wise access), whereas a change of signedness usually would.
What the standard says
[intro.abstract]/1:
The semantic descriptions in this document define a parameterized nondeterministic abstract machine.
This document places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine.
Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
C++ could not define an endianness qualifier since it has no concept of endianness.
Discussion
About the difference between signness and endianness, OP wrote
In my understanding, endianness is just a variation of the same principle [(signness)]: a different interpretation of a binary pattern based on compile-time information about the memory in which it will be stored.
I'd argue signness both have a semantic and a representative aspect1. What [intro.abstract]/1 implies is that C++ only care about semantic, and never addresses the way a signed number should be represented in memory2. Actually, "sign bit" only appears once in the C++ specs and refer to an implementation-defined value.
On the other hand, endianness only have a representative aspect: endianness conveys no meaning.
With C++20, std::endian appears. It is still implementation-defined, but let us test the endian of the host without depending on old tricks based on undefined behaviour.
1) Semantic aspect: an signed integer can represent values below zero; representative aspect: one need to, for example, reserve a bit to convey the positive/negative sign.
2) In the same vein, C++ never describe how a floating point number should be represented, IEEE-754 is often used, but this is a choice made by the implementation, in any case enforced by the standard: [basic.fundamental]/8 "The value representation of floating-point types is implementation-defined".
In addition to YSC's answer, let's take your sample code, and consider what it might aim to achieve
struct foo {
little int x; // little-endian
big long int y; // big-endian
short z; // native endianness
};
You might hope that this would exactly specify layout for architecture-independent data interchange (file, network, whatever)
But this can't possibly work, because several things are still unspecified:
data type size: you'd have to use little int32_t, big int64_t and int16_t respectively, if that's what you want
padding and alignment, which cannot be controlled strictly within the language: use #pragma or __attribute__((packed)) or some other compiler-specific extension
actual format (1s- or 2s-complement signedness, floating-point type layout, trap representations)
Alternatively, you might simply want to reflect the endianness of some specified hardware - but big and little don't cover all the possibilities here (just the two most common).
So, the proposal is incomplete (it doesn't distinguish all reasonable byte-ordering arrangements), ineffective (it doesn't achieve what it sets out to), and has additional drawbacks:
Performance
Changing the endianness of a variable from the native byte ordering should either disable arithmetic, comparisons etc (since the hardware cannot correctly perform them on this type), or must silently inject more code, creating natively-ordered temporaries to work on.
The argument here isn't that manually converting to/from native byte order is faster, it's that controlling it explicitly makes it easier to minimise the number of unnecessary conversions, and much easier to reason about how code will behave, than if the conversions are implicit.
Complexity
Everything overloaded or specialized for integer types now needs twice as many versions, to cope with the rare event that it gets passed a non-native-endianness value. Even if that's just a forwarding wrapper (with a couple of casts to translate to/from native ordering), it's still a lot of code for no discernible benefit.
The final argument against changing the language to support this is that you can easily do it in code. Changing the language syntax is a big deal, and doesn't offer any obvious benefit over something like a type wrapper:
// store T with reversed byte order
template <typename T>
class Reversed {
T val_;
static T reverse(T); // platform-specific implementation
public:
explicit Reversed(T t) : val_(reverse(t)) {}
Reversed(Reversed const &other) : val_(other.val_) {}
// assignment, move, arithmetic, comparison etc. etc.
operator T () const { return reverse(val_); }
};
Integers (as a mathematical concept) have the concept of positive and negative numbers. This abstract concept of sign has a number of different implementations in hardware.
Endianness is not a mathematical concept. Little-endian is a hardware implementation trick to improve the performance of multi-byte twos-complement integer arithmetic on a microprocessor with 16 or 32 bit registers and an 8-bit memory bus. Its creation required using the term big-endian to describe everything else that had the same byte-order in registers and in memory.
The C abstract machine includes the concept of signed and unsigned integers, without details -- without requiring twos-complement arithmetic, 8-bit bytes or how to store a binary number in memory.
PS: I agree that binary data compatibility on the net or in memory/storage is a PIA.
That's a good question and I have often thought something like this would be useful. However you need to remember that C aims for platform independence and endianness is only important when a structure like this is converted into some underlying memory layout. This conversion can happen when you cast a uint8_t buffer into an int for example. While an endianness modifier looks neat the programmer still needs to consider other platform differences such as int sizes and structure alignment and packing.
For defensive programming when you want find grain control over how some variables or structures are represented in a memory buffer then it is best to code explicit conversion functions and then let the compiler optimiser generate the most efficient code for each supported platform.
Endianness is not inherently a part of a data type but rather of its storage layout.
As such, it would not be really akin to signed/unsigned but rather more like bit field widths in structs. Similar to those, they could be used for defining binary APIs.
So you'd have something like
int ip : big 32;
which would define both storage layout and integer size, leaving it to the compiler to do the best job of matching use of the field to its access. It's not obvious to me what the allowed declarations should be.
Short Answer: if it should not be possible to use objects in arithmetic expressions (with no overloaded operators) involving ints, then these objects should not be integer types. And there is no point in allowing addition and multiplication of big-endian and little-endian ints in the same expression.
Longer Answer:
As someone mentioned, endianness is processor-specific. Which really means that this is how numbers are represented when they are used as numbers in the machine language (as addresses and as operands/results of arithmetic operations).
The same is "sort of" true of signage. But not to the same degree. Conversion from language-semantic signage to processor-accepted signage is something that needs to be done to use numbers as numbers. Conversion from big-endian to little-endian and reverse is something that needs to be done to use numbers as data (send them over the network or represent metadata about data sent over the network such as payload lengths).
Having said that, this decision appears to be mostly driven by use cases. The flip side is that there is a good pragmatic reason to ignore certain use cases. The pragmatism arises out of the fact that endianness conversion is more expensive than most arithmetic operations.
If a language had semantics for keeping numbers as little-endian, it would allow developers to shoot themselves in the foot by forcing little-endianness of numbers in a program which does a lot of arithmetic. If developed on a little-endian machine, this enforcing of endianness would be a no-op. But when ported to a big-endian machine, there would a lot of unexpected slowdowns. And if the variables in question were used both for arithmetic and as network data, it would make the code completely non-portable.
Not having these endian semantics or forcing them to be explicitly compiler-specific forces the developers to go through the mental step of thinking of the numbers as being "read" or "written" to/from the network format. This would make the code which converts back and forth between network and host byte order, in the middle of arithmetic operations, cumbersome and less likely to be the preferred way of writing by a lazy developer.
And since development is a human endeavor, making bad choices uncomfortable is a Good Thing(TM).
Edit: here's an example of how this can go badly:
Assume that little_endian_int32 and big_endian_int32 types are introduced. Then little_endian_int32(7) % big_endian_int32(5) is a constant expression. What is its result? Do the numbers get implicitly converted to the native format? If not, what is the type of the result? Worse yet, what is the value of the result (which in this case should probably be the same on every machine)?
Again, if multi-byte numbers are used as plain data, then char arrays are just as good. Even if they are "ports" (which are really lookup values into tables or their hashes), they are just sequences of bytes rather than integer types (on which one can do arithmetic).
Now if you limit the allowed arithmetic operations on explicitly-endian numbers to only those operations allowed for pointer types, then you might have a better case for predictability. Then myPort + 5 actually makes sense even if myPort is declared as something like little_endian_int16 on a big endian machine. Same for lastPortInRange - firstPortInRange + 1. If the arithmetic works as it does for pointer types, then this would do what you'd expect, but firstPort * 10000 would be illegal.
Then, of course, you get into the argument of whether the feature bloat is justified by any possible benefit.
From a pragmatic programmer perspective searching Stack Overflow, it's worth noting that the spirit of this question can be answered with a utility library. Boost has such a library:
http://www.boost.org/doc/libs/1_65_1/libs/endian/doc/index.html
The feature of the library most like the language feature under discussion is a set of arithmetic types such as big_int16_t.
Because nobody has proposed to add it to the standard, and/or because compiler implementer have never felt a need for it.
Maybe you could propose it to the committee. I do not think it is difficult to implement it in a compiler: compilers already propose fundamental types that are not fundamental types for the target machine.
The development of C++ is an affair of all C++ coders.
#Schimmel. Do not listen to people who justify the status quo! All the cited arguments to justify this absence are more than fragile. A student logician could find their inconsistence without knowing anything about computer science. Just propose it, and just don't care about pathological conservatives. (Advise: propose new types rather than a qualifier because the unsigned and signed keywords are considered mistakes).
Endianness is compiler specific as a result of being machine specific, not as a support mechanism for platform independence. The standard -- is an abstraction that has no regard for imposing rules that make things "easy" -- its task is to create similarity between compilers that allows the programmer to create "platform independence" for their code -- if they choose to do so.
Initially, there was a lot of competition between platforms for market share and also -- compilers were most often written as proprietary tools by microprocessor manufacturers and to support operating systems on specific hardware platforms. Intel was likely not very concerned about writing compilers that supported Motorola microprocessors.
C was -- after all -- invented by Bell Labs to rewrite Unix.

Why are signed negative numbers converted to unsigned numbers?

Particularly i have a large unsigned number and a large signed negative number.
Why can't I divide a large number by a negative number C++
When I try to divide the two, I get the result zero simply because something somewhere is converting my signed number to unsigned (meaning that it has to be positive). >.>;
Edit.
Just to be absolutely clear. I know that it is resolved to zero and I'm happy with that. I'm also happy with the previous reason that it is resolved to zero because a number is converted away from being a signed number to an unsigned number.
However that doesn't answer why the number is converted from a signed number to an unsigned number in the first place. There doesn't seem to be any logical reason for doing so.
The operands of the arithmetic operators need to have the same types. That makes the language easier to define and easier to implement, in part because there are fewer cases to handle, and in part because hardware doesn't generally support mixed-type operations, either. Formally, C++ is defined in terms of a virtual machine, but the capabilities expected of that machine are influenced by those of real-world hardware. Operations dictated by the standard but not supported by hardware (such as your hypothetical signed-unsigned division) would need to be implemented by compiler-writers in library code, and that would affect performance.
Either the signed operand needs to be converted to unsigned, or vice versa. The standard committee picked the former.

Why prefer signed over unsigned in C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I'd like to understand better why choose int over unsigned?
Personally, I've never liked signed values unless there is a valid reason for them. e.g. count of items in an array, or length of a string, or size of memory block, etc., so often these things cannot possibly be negative. Such a value has no possible meaning. Why prefer int when it is misleading in all such cases?
I ask this because both Bjarne Stroustrup and Chandler Carruth gave the advice to prefer int over unsigned here (approx 12:30').
I can see the argument for using int over short or long - int is the "most natural" data width for the target machine architecture.
But signed over unsigned has always annoyed me. Are signed values genuinely faster on typical modern CPU architectures? What makes them better?
As per requests in comments: I prefer int instead of unsigned because...
it's shorter (I'm serious!)
it's more generic and more intuitive (i. e. I like to be able to assume that 1 - 2 is -1 and not some obscure huge number)
what if I want to signal an error by returning an out-of-range value?
Of course there are counter-arguments, but these are the principal reasons I like to declare my integers as int instead of unsigned. Of course, this is not always true, in other cases, an unsigned is just a better tool for a task, I am just answering the "why would anyone prefer defaulting to signed" question specifically.
Let me paraphrase the video, as the experts said it succinctly.
Andrei Alexandrescu:
No simple guideline.
In systems programming, we need integers of different sizes and signedness.
Many conversions and arcane rules govern arithmetic (like for auto), so we need to be careful.
Chandler Carruth:
Here's some simple guidelines:
Use signed integers unless you need two's complement arithmetic or a bit pattern
Use the smallest integer that will suffice.
Otherwise, use int if you think you could count the items, and a 64-bit integer if it's even more than you would want to count.
Stop worrying and use tools to tell you when you need a different type or size.
Bjarne Stroustrup:
Use int until you have a reason not to.
Use unsigned only for bit patterns.
Never mix signed and unsigned
Wariness about signedness rules aside, my one-sentence take away from the experts:
Use the appropriate type, and when you don't know, use an int until you do know.
Several reasons:
Arithmetic on unsigned always yields unsigned, which can be a problem when subtracting integer quantities that can reasonably result in a negative result — think subtracting money quantities to yield balance, or array indices to yield distance between elements. If the operands are unsigned, you get a perfectly defined, but almost certainly meaningless result, and a result < 0 comparison will always be false (of which modern compilers will fortunately warn you).
unsigned has the nasty property of contaminating the arithmetic where it gets mixed with signed integers. So, if you add a signed and unsigned and ask whether the result is greater than zero, you can get bitten, especially when the unsigned integral type is hidden behind a typedef.
There are no reasons to prefer signed over unsigned, aside from purely sociological ones, i.e. some people believe that average programmers are not competent and/or attentive enough to write proper code in terms of unsigned types. This is often the main reasoning used by various "speakers", regardless of how respected those speakers might be.
In reality, competent programmers quickly develop and/or learn the basic set of programming idioms and skills that allow them to write proper code in terms of unsigned integral types.
Note also that the fundamental differences between signed and unsigned semantics are always present (in superficially different form) in other parts of C and C++ language, like pointer arithmetic and iterator arithmetic. Which means that in general case the programmer does not really have the option of avoiding dealing with issues specific to unsigned semantics and the "problems" it brings with it. I.e. whether you want it or not, you have to learn to work with ranges that terminate abruptly at their left end and terminate right here (not somewhere in the distance), even if you adamantly avoid unsigned integers.
Also, as you probably know, many parts of standard library already rely on unsigned integer types quite heavily. Forcing signed arithmetic into the mix, instead of learning to work with unsigned one, will only result in disastrously bad code.
The only real reason to prefer signed in some contexts that comes to mind is that in mixed integer/floating-point code signed integer formats are typically directly supported by FPU instruction set, while unsigned formats are not supported at all, making the compiler to generate extra code for conversions between floating-point values and unsigned values. In such code signed types might perform better.
But at the same time in purely integer code unsigned types might perform better than signed types. For example, integer division often requires additional corrective code in order to satisfy the requirements of the language spec. The correction is only necessary in case of negative operands, so it wastes CPU cycles in situations when negative operands are not really used.
In my practice I devotedly stick to unsigned wherever I can, and use signed only if I really have to.
The integral types in C and many languages which derive from it have two general usage cases: to represent numbers, or represent members of an abstract algebraic ring. For those unfamiliar with abstract algebra, the primary notion behind a ring is that adding, subtracting, or multiplying two items of a ring should yield another item of that ring--it shouldn't crash or yield a value outside the ring. On a 32-bit machine, adding unsigned 0x12345678 to unsigned 0xFFFFFFFF doesn't "overflow"--it simply yields the result 0x12345677 which is defined for the ring of integers congruent mod 2^32 (because the arithmetic result of adding 0x12345678 to 0xFFFFFFFF, i.e. 0x112345677, is congruent to 0x12345677 mod 2^32).
Conceptually, both purposes (representing numbers, or representing members of the ring of integers congruent mod 2^n) may be served by both signed and unsigned types, and many operations are the same for both usage cases, but there are some differences. Among other things, an attempt to add two numbers should not be expected to yield anything other than the correct arithmetic sum. While it's debatable whether a language should be required to generate the code necessary to guarantee that it won't (e.g. that an exception would be thrown instead), one could argue that for code which uses integral types to represent numbers such behavior would be preferable to yielding an arithmetically-incorrect value and compilers shouldn't be forbidden from behaving that way.
The implementers of the C standards decided to use signed integer types to represent numbers and unsigned types to represent members of the algebraic ring of integers congruent mod 2^n. By contrast, Java uses signed integers to represent members of such rings (though they're interpreted differently in some contexts; conversions among differently-sized signed types, for example, behave differently from among unsigned ones) and Java has neither unsigned integers nor any primitive integral types which behave as numbers in all non-exceptional cases.
If a language provided a choice of signed and unsigned representations for both numbers and algebraic-ring numbers, it might make sense to use unsigned numbers to represent quantities that will always be positive. If, however, the only unsigned types represent members of an algebraic ring, and the only types that represent numbers are the signed ones, then even if a value will always be positive it should be represented using a type designed to represent numbers.
Incidentally, the reason that (uint32_t)-1 is 0xFFFFFFFF stems from the fact that casting a signed value to unsigned is equivalent to adding unsigned zero, and adding an integer to an unsigned value is defined as adding or subtracting its magnitude to/from the unsigned value according to the rules of the algebraic ring which specify that if X=Y-Z, then X is the one and only member of that ring such X+Z=Y. In unsigned math, 0xFFFFFFFF is the only number which, when added to unsigned 1, yields unsigned zero.
Speed is the same on modern architectures. The problem with unsigned int is that it can sometimes generate unexpected behavior. This can create bugs that wouldn't show up otherwise.
Normally when you subtract 1 from a value, the value gets smaller. Now, with both signed and unsigned int variables, there will be a time that subtracting 1 creates a value that is MUCH LARGER. The key difference between unsigned int and int is that with unsigned int the value that generates the paradoxical result is a commonly used value --- 0 --- whereas with signed the number is safely far away from normal operations.
As far as returning -1 for an error value --- modern thinking is that it's better to throw an exception than to test for return values.
It's true that if you properly defend your code you won't have this problem, and if you use unsigned religiously everywhere you will be okay (provided that you are only adding, and never subtracting, and that you never get near MAX_INT). I use unsigned int everywhere. But it takes a lot of discipline. For a lot of programs, you can get by with using int and spend your time on other bugs.
Use int by default: it plays nicer with the rest of the language
most common domain usage is regular arithmetic, not modular arithmetic
int main() {} // see an unsigned?
auto i = 0; // i is of type int
Only use unsigned for modulo arithmetic and bit-twiddling (in particular shifting)
has different semantics than regular arithmetic, make sure it is what you want
bit-shifting signed types is subtle (see comments by #ChristianRau)
if you need a > 2Gb vector on a 32-bit machine, upgrade your OS / hardware
Never mix signed and unsigned arithmetic
the rules for that are complicated and surprising (either one can be converted to the other, depending on the relative type sizes)
turn on -Wconversion -Wsign-conversion -Wsign-promo (gcc is better than Clang here)
the Standard Library got it wrong with std::size_t (quote from the GN13 video)
use range-for if you can,
for(auto i = 0; i < static_cast<int>(v.size()); ++i) if you must
Don't use short or large types unless you actually need them
current architectures data flow caters well to 32-bit non-pointer data (but note the comment by #BenVoigt about cache effects for smaller types)
char and short save space but suffer from integral promotions
are you really going to count to over all int64_t?
To answer the actual question: For the vast number of things, it doesn't really matter. int can be a little easier to deal with things like subtraction with the second operand larger than the first and you still get a "expected" result.
There is absolutely no speed difference in 99.9% of cases, because the ONLY instructions that are different for signed and unsigned numbers are:
Making the number longer (fill with the sign for signed or zero for unsigned) - it takes the same effort to do both.
Comparisons - a signed number, the processor has to take into account if either number is negative or not. But again, it's the same speed to make a compare with signed or unsigned numbers - it's just using a different instruction code to say "numbers that have the highest bit set are smaller than numbers with the highest bit not set" (essentially). [Pedantically, it's nearly always the operation using the RESULT of a comparison that is different - the most common case being a conditional jump or branch instruction - but either way, it's the same effort, just that the inputs are taken to mean slightly different things].
Multiply and divide. Obviously, sign conversion of the result needs to happen if it's a signed multiplication, where a unsigned should not change the sign of the result if the highest bit of one of the inputs is set. And again, the effort is (as near as we care for) identical.
(I think there are one or two other cases, but the result is the same - it really doesn't matter if it's signed or unsigned, the effort to perform the operation is the same for both).
The int type more closely resembles the behavior of mathematical integers than the unsigned type.
It is naive to prefer the unsigned type simply because a situation does not require negative values to be represented.
The problem is that the unsigned type has a discontinuous behavior right next to zero. Any operation that tries to compute a small negative value, instead produces some large positive value. (Worse: one that is implementation-defined.)
Algebraic relationships such as that a < b implies that a - b < 0 are wrecked in the unsigned domain, even for small values like a = 3 and b = 4.
A descending loop like for (i = max - 1; i >= 0; i--) fails to terminate if i is made unsigned.
Unsigned quirks can cause a problem which will affect code regardless of whether that code expects to be representing only positive quantities.
The virtue of the unsigned types is that certain operations that are not portably defined at the bit level for the signed types are that way for the unsigned types. The unsigned types lack a sign bit, and so shifting and masking through the sign bit isn't a problem. The unsigned types are good for bitmasks, and for code that implements precise arithmetic in a platform-independent way. Unsigned opearations will simulate two's complement semantics even on a non two's complement machine. Writing a multi-precision (bignum) library practically requires arrays of unsigned types to be used for the representation, rather than signed types.
The unsigned types are also suitable in situations in which numbers behave like identifiers and not as arithmetic types. For instance, an IPv4 address can be represented in a 32 bit unsigned type. You wouldn't add together IPv4 addresses.
int is preferred because it's most commonly used. unsigned is usually associated with bit operations. Whenever I see an unsigned, I assume it's used for bit twiddling.
If you need a bigger range, use a 64-bit integer.
If you're iterating over stuff using indexes, types usually have size_type, and you shouldn't care whether it's signed or unsigned.
Speed is not an issue.
For me, in addition to all the integers in the range of 0..+2,147,483,647 contained within the set of signed and unsigned integers on 32 bit architectures, there is a higher probability that I will need to use -1 (or smaller) than need to use +2,147,483,648 (or larger).
One good reason that I can think of is in case of detecting overflow.
For the use cases such as the count of items in an array, length of a string, or size of memory block, you can overflow an unsigned int and you may not notice a difference even when you take a look at the variable. If it is an signed int, the variable will be less than zero and clearly wrong.
You can simply check to see if the variable is zero when you want to use it. This way, you do not have to check for overflow after every arithmetic operation as is the case for unsigned ints.
It gives unexpected result when doing simple arithmetic operation:
unsigned int i;
i = 1 - 2;
//i is now 4294967295 on a 64bit machine
It gives unexpected result when doing simple comparison:
unsigned int j = 1;
std::cout << (j>-1) << std::endl;
//output 0 as false but 1 is greater than -1
This is because when doing the operations above, the signed ints are converted to unsigned, and it overflows and goes to a really big number.

Meaning of data type names in C/C++

In C and C++, the size of the build-in integer types is implementation dependent. But is there any predefined intended meaning of them such that int should represent the machine word size etc?
Historically, int was supposed to mean the most "natural" type for an
integer on the machine hardware; obviously, "natural" is somewhat
subjective, but in the past, it was usually pretty obvious, and there
weren't that many integral types available anyway, so makeing int the
same size as a long or a short was the normal course of things.
For various reasons, most 64 bit platforms make int 32 bits. One
could easily argue that this isn't the most "natural" length, but there
was a desire that 32 bit integers be the default, and int is clearly
the default integral type. Whether it is the most natural for the
architecture or not becomes secondary to whether it is the size wanted
as a default.
With regards to word size: historically, this was the most natural, but
in many ways, it's not clear what is meant by "word size" on a modern
machine: the largest size you can do arithmetic on? the size of bus
transfers to and from memory? etc. Traditionally, "word size" has been
used to mean both the width of internal registers (when the machine had
them), or the size of a basic bus transfer. (The 8088 was usually
referred to as an 8 bit machine, although it had 32 bit registers.) I
wouldn't put too much meaning in it today.
There's some wording on that, but it's not very rigid:
Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set.
There are five standard signed integer types : “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list. (...) Plain ints have the natural size suggested by the architecture of the execution environment, the other signed integer types are provided to meet special needs.
No strict recommendations about float sizes either:
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. (...) The value representation of floating-point types is implementation-defined.
C, unlike Java, was designed as a platform enabler and not a stand-alone platform. Cross platform compatibility took a much lower priority than working with data-type sizes that worked optimally for the given platform. Integer types are therefore not specified by the C standard and are totally platform specific