Should I use int or unsigned int when working with STL container? - c++

Referring to this guide:
https://google.github.io/styleguide/cppguide.html#Integer_Types
Google suggests to use int in the most of time.
I try to follow this guide and the only problem is with STL containers.
Example 1.
void setElement(int index, int value)
{
if (index > someExternalVector.size()) return;
...
}
Comparing index and .size() is generating a warning.
Example 2.
for (int i = 0; i < someExternalVector.size(); ++i)
{
...
}
Same warning between i and .size().
If I declare index or i as unsigned int, the warning is off, but the type declaration will propagate, then I have to declare more variables as unsigned int, then it contradicts the guide and loses consistency.
The best way I can think is to use a cast like:
if (index > static_cast<int>(someExternalVector.size())
or
for (int i = 0; i < static_cast<int>(someExternalVector.size()); ++i)
But I really don't like the casts.
Any suggestion?
Some detailed thoughts below:
To advantage to use only signed integer is like: I can avoid signed/unsigned warnings, castings, and be sure every value can be negative(to be consistent), so -1 could be used to represent invalid values.
There are many cases that the usage of loop counters are mixed with some other constants or struct members. So it would be problematic if signed/unsigned is not consistent. There will be full of warnings and castings.

Unsigned types have three characteristics, one of which is qualitatively 'good' and one of which is qualitatively 'bad':
They can hold twice as many values as the same-sized signed type (good)
The size_t version (that is, 32-bit on a 32-bit machine, 64-bit on a 64-bit machine, etc) is useful for representing memory (addresses, sizes, etc) (neutral)
They wrap below 0, so subtracting 1 in a loop or using -1 to represent an invalid index can cause bugs (bad.) Signed types wrap too.
The STL uses unsigned types because of the first two points above: in order to not limit the potential size of array-like classes such as vector and deque (although you have to question how often you would want 4294967296 elements in a data structure); because a negative value will never be a valid index into most data structures; and because size_t is the correct type to use for representing anything to do with memory, such as the size of a struct, and related things such as the length of a string (see below.) That's not necessarily a good reason to use it for indexes or other non-memory purposes such as a loop variable. The reason it's best practice to do so in C++ is kind of a reverse construction, because it's what's used in the containers as well as other methods, and once used the rest of the code has to match to avoid the same problem you are encountering.
You should use a signed type when the value can become negative.
You should use an unsigned type when the value cannot become negative (possibly different to 'should not'.)
You should use size_t when handling memory sizes (the result of sizeof, often things like string lengths, etc.) It is often chosen as a default unsigned type to use, because it matches the platform the code is compiled for. For example, the length of a string is size_t because a string can only ever have 0 or more elements, and there is no reason to limit a string's length method arbitrarily shorter than what can be represented on the platform, such as a 16-bit length (0-65535) on a 32-bit platform. Note (thanks commenter Morwen) std::intptr_t or std::uintptr_t which are conceptually similar - will always be the right size for your platform - and should be used for memory addresses if you want something that's not a pointer. Note 2 (thanks commenter rubenvb) that a string can only hold size_t-1 elements due to the value of npos. Details below.
This means that if you use -1 to represent an invalid value, you should use signed integers. If you use a loop to iterate backwards over your data, you should consider using a signed integer if you are not certain that the loop construct is correct (and as noted in one of the other answers, they are easy to get wrong.) IMO, you should not resort to tricks to ensure the code works - if code requires tricks, that's often a danger signal. In addition, it will be harder to understand for those following you and reading your code. Both these are reasons not to follow #Jasmin Gray's answer above.
Iterators
However, using integer-based loops to iterate over the contents of a data structure is the wrong way to do it in C++, so in a sense the argument over signed vs unsigned for loops is moot. You should use an iterator instead:
std::vector<foo> bar;
for (std::vector<foo>::const_iterator it = bar.begin(); it != bar.end(); ++it) {
// Access using *it or it->, e.g.:
const foo & a = *it;
When you do this, you don't need to worry about casts, signedness, etc.
Iterators can be forward (as above) or reverse, for iterating backwards. Use the same syntax of it != bar.end(), because end() signals the end of the iteration, not the end of the underlying conceptual array, tree, or other structure.
In other words, the answer to your question 'Should I use int or unsigned int when working with STL containers?' is 'Neither. Use iterators instead.' Read more about:
Why use iterators instead of array indices in C++?
Why again (some more interesting points in the answers to this question)
Iterators in general - the different kinds, how to use them, etc.
What's left?
If you don't use an integer type for loops, what's left? Your own values, which are dependent on your data, but which in your case include using -1 for an invalid value. This is simple. Use signed. Just be consistent.
I am a big believer in using natural types, such as enums, and signed integers fit into this. They match our conceptual expectation more closely. When your mind and the code are aligned, you are less likely to write buggy code and more likely to expressively write correct, clean code.

Use the type that the container returns. In this case, size_t - which is an integer type that is unsigned.
(To be technical, it's std::vector<MyType>::size_type, but that's usually defined to size_t, so you're safe using size_t. unsigned is also fine)
But in general, use the right tool for the right job. Is the 'index' ever supposed to be negative? If not, don't make it signed.
By the by, you don't have to type out 'unsigned int'. 'unsigned' is shorthand for the same variable type:
int myVar1;
unsigned myVar2;
The page linked to in the original question said:
Some people, including some textbook authors, recommend using unsigned
types to represent numbers that are never negative. This is intended
as a form of self-documentation. However, in C, the advantages of such
documentation are outweighed by the real bugs it can introduce.
It's not just self-documentation, it's use the right tool for the right job. Saying that 'unsigned variables can cause bugs so don't use unsigned variables' is silly. Signed variables can also cause bugs. So can floats (more than integers). The only guaranteed bug-free code is code that doesn't exist.
Their example of why unsigned is evil, is this loop:
for (unsigned int i = foo.Length()-1; i >= 0; --i)
I have difficulty iterating backwards over a loop, and I usually make mistakes (with signed or unsigned integers) with it. Do I subtract one from size? Do I make it greater-than-AND-equal-to 0, or just greater than? It's a sloppy situation to begin with.
So what do you do with code that you know you have problems with? You change your coding style to fix the problem, make it simpler, and make it easier to read, and make it easier to remember. There is a bug in the loop they posted. The bug is, they wanted to allow a value below zero, but they chose to make it unsigned. It's their mistake.
But here's a simple trick that makes it easier to read, remember, write, and run. With unsigned variables. Here's the intelligent thing to do (obviously, this is my opinion).
for(unsigned i = myContainer.size(); i--> 0; )
{
std::cout << myContainer[i] << std::endl;
}
It's unsigned. It always works. No negative to the starting size. No worrying about underflows. It just works. It's just smart. Do it right, don't stop using unsigned variables because someone somewhere once said they had a mistake with a for() loop and failed to train themselves to not make the mistake.
The trick to remembering it:
Set 'i' to the size. (don't worry about subtracting one)
Make 'i' point to 0 like an arrow. i --> 0 (it's a combination of post-decrementing (i--) and greater-than comparison (i > 0))
It's better to teach yourself tricks to code right, then to throw away tools because you don't code right.
Which would you want to see in your code?
for(unsigned i = myContainer.size()-1; i >= 0; --i)
Or:
for(unsigned i = myContainer.size(); i--> 0; )
Not because it's less characters to type (that'd be silly), but because it's less mental clutter. It's simpler to mentally parse when skimming through code, and easier to spot mistakes.
Try the code yourself

Related

Why is std::ssize being forced to a minimum size for its signed size type?

In C++20, std::ssize is being introduced to obtain the signed size of a container for generic code. (And the reason for its addition is explained here.)
Somewhat peculiarly, the definition given there (combining with common_type and ptrdiff_t) has the effect of forcing the return value to be "either ptrdiff_t or the signed form of the container's size() return value, whichever is larger".
P1227R1 indirectly offers a justification for this ("it would be a disaster for std::ssize() to turn a size of 60,000 into a size of -5,536").
This seems to me like an odd way to try to "fix" that, however.
Containers which intentionally define a uint16_t size and are known to never exceed 32,767 elements will still be forced to use a larger type than required.
The same thing would occur for containers using a uint8_t size and 127 elements, respectively.
In desktop environments, you probably don't care; but this might be important for embedded or otherwise resource-constrained environments, especially if the resulting type is used for something more persistent than a stack variable.
Containers which use the default size_t size on 32-bit platforms but which nevertheless do contain between 2B and 4B items will hit exactly the same problem as above.
If there still exist platforms for which ptrdiff_t is smaller than 32 bits, they will hit the same problem as well.
Wouldn't it be better to just use the signed type as-is (without extending its size) and to assert that a conversion error has not occurred (eg. that the result is not negative)?
Am I missing something?
To expand on that last suggestion a bit (inspired by Nicol Bolas' answer): if it were implemented the way that I suggested, then this code would Just Work™:
void DoSomething(int16_t i, T const& item);
for (int16_t i = 0, len = std::ssize(rng); i < len; ++i)
{
DoSomething(i, rng[i]);
}
With the current implementation, however, this produces warnings and/or errors unless static_casts are explicitly added to narrow the result of ssize, or to use int i instead and then narrow it in the function call (and the range indexing), neither of which seem like an improvement.
Containers which intentionally define a uint16_t size and are known to never exceed 32,767 elements will still be forced to use a larger type than required.
It's not like the container is storing the size as this type. The conversion happens via accessing the value.
As for embedded systems, embedded systems programmers already know about C++'s propensity to increase the size of small types. So if they expect a type to be an int16_t, they're going to spell that out in the code, because otherwise C++ might just promote it to an int.
Furthermore, there is no standard way to ask about what size a range is "known to never exceed". decltype(size(range)) is something you can ask for; sized ranges are not required to provide a max_size function. Without such an ability, the safest assumption is that a range whose size type is uint16_t can assume any size within that range. So the signed size should be big enough to store that entire range as a signed value.
Your suggestion is basically that any ssize call is potentially unsafe, since half of any size range cannot be validly stored in the return type of ssize.
Containers which use the default size_t size on 32-bit platforms but which nevertheless do contain between 2B and 4B items will hit exactly the same problem as above.
Assuming that it is valid for ptrdiff_t to not be a signed 64-bit integer on such platforms, there isn't really a valid solution to that problem. So yes, there will be cases where ssize is potentially unsafe.
ssize currently is potentially unsafe in cases where it is not possible to be safe. Your proposal would make ssize potentially unsafe in all cases.
That's not an improvement.
And no, merely asserting/contract checking is not a viable solution. The point of ssize is to make for(int i = 0; i < std::ssize(rng); ++i) work without the compiler complaining about signed/unsigned mismatch. To get an assert because of a conversion failure that didn't need to happen (and BTW, cannot be corrected without using std::size, which we are trying to avoid), one which is ultimately irrelevant to your algorithm? That's a terrible idea.
if it were implemented the way that I suggested, then this code would Just Work™:
Let us ignore the question of how often it is that a user would write this code.
The reason your compiler will expect/require you to use a cast there is because you are asking for an inherently dangerous operation: you are potentially losing data. Your code only "Just Works™" if the current size fits into an int16_t; that makes the conversion statically dangerous. This is not something that should implicitly take place, so the compiler suggests/requires you to explicitly ask for it. And users looking at that code get a big, fat eyesore reminding them that a dangerous thing is being done.
That is all to the good.
See, if your suggested implementation were how ssize behaved, then that means we must treat every use of ssize as just as inherently dangerous as the compiler treats your attempted implicit conversion. But unlike static_cast, ssize is small and easily missed.
Dangerous operations should be called out as such. Since ssize is small and difficult to notice by design, it therefore should be as safe as possible. Ideally, it should be as safe as size, but failing that, it should be unsafe only to the extend that it is impossible to make it safe.
Users should not look on ssize usage as something dubious or disconcerting; they should not fear to use it.

Which to use? int32_t vs uint32_t [duplicate]

When is it appropriate to use an unsigned variable over a signed one? What about in a for loop?
I hear a lot of opinions about this and I wanted to see if there was anything resembling a consensus.
for (unsigned int i = 0; i < someThing.length(); i++) {
SomeThing var = someThing.at(i);
// You get the idea.
}
I know Java doesn't have unsigned values, and that must have been a concious decision on Sun Microsystems' part.
I was glad to find a good conversation on this subject, as I hadn't really given it much thought before.
In summary, signed is a good general choice - even when you're dead sure all the numbers are positive - if you're going to do arithmetic on the variable (like in a typical for loop case).
unsigned starts to make more sense when:
You're going to do bitwise things like masks, or
You're desperate to to take advantage of the sign bit for that extra positive range .
Personally, I like signed because I don't trust myself to stay consistent and avoid mixing the two types (like the article warns against).
In your example above, when 'i' will always be positive and a higher range would be beneficial, unsigned would be useful. Like if you're using 'declare' statements, such as:
#declare BIT1 (unsigned int 1)
#declare BIT32 (unsigned int reallybignumber)
Especially when these values will never change.
However, if you're doing an accounting program where the people are irresponsible with their money and are constantly in the red, you will most definitely want to use 'signed'.
I do agree with saint though that a good rule of thumb is to use signed, which C actually defaults to, so you're covered.
I would think that if your business case dictates that a negative number is invalid, you would want to have an error shown or thrown.
With that in mind, I only just recently found out about unsigned integers while working on a project processing data in a binary file and storing the data into a database. I was purposely "corrupting" the binary data, and ended up getting negative values instead of an expected error. I found that even though the value converted, the value was not valid for my business case.
My program did not error, and I ended up getting wrong data into the database. It would have been better if I had used uint and had the program fail.
C and C++ compilers will generate a warning when you compare signed and unsigned types; in your example code, you couldn't make your loop variable unsigned and have the compiler generate code without warnings (assuming said warnings were turned on).
Naturally, you're compiling with warnings turned all the way up, right?
And, have you considered compiling with "treat warnings as errors" to take it that one step further?
The downside with using signed numbers is that there's a temptation to overload them so that, for example, the values 0->n are the menu selection, and -1 means nothing's selected - rather than creating a class that has two variables, one to indicate if something is selected and another to store what that selection is. Before you know it, you're testing for negative one all over the place and the compiler is complaining about how you're wanting to compare the menu selection against the number of menu selections you have - but that's dangerous because they're different types. So don't do that.
size_t is often a good choice for this, or size_type if you're using an STL class.

Advice on unsigned int (Gangnam Style edition)

The video "Gangnam Style" (I'm sure you've heard it) just exceeded 2 billion views on youtube. In fact, Google says that they never expected a video to be greater than a 32-bit integer... which alludes to the fact that Google used int instead of unsigned for their view counter. I think they had to re-write their code a bit to accommodate larger views.
Checking their style guide: https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types
...they advise "don't use an unsigned integer type," and give one good reason why: unsigned could be buggy.
It's a good reason, but could be guarded against. My question is: is it bad coding practice in general to use unsigned int?
The Google rule is widely accepted in professional circles. The problem
is that the unsigned integral types are sort of broken, and have
unexpected and unnatural behavior when used for numeric values; they
don't work well as a cardinal type. For example, an index into an array
may never be negative, but it makes perfect sense to write
abs(i1 - i2) to find the distance between two indices. Which won't work if
i1 and i2 have unsigned types.
As a general rule, this particular rule in the Google style guidelines
corresponds more or less to what the designers of the language intended.
Any time you see something other than int, you can assume a special
reason for it. If it is because of the range, it will be long or
long long, or even int_least64_t. Using unsigned types is generally
a signal that you're dealing with bits, rather than the numeric value of
the variable, or (at least in the case of unsigned char) that you're
dealing with raw memory.
With regards to the "self-documentation" of using an unsigned: this
doesn't hold up, since there are almost always a lot of values that the
variable cannot (or should not) take, including many positive ones. C++
doesn't have sub-range types, and the way unsigned is defined means
that it cannot really be used as one either.
This guideline is extremely misleading. Blindly using int instead of unsigned int won't solve anything. That simply shifts the problems somewhere else. You absolutely must be aware of integer overflow when doing arithmetic on fixed precision integers. If your code is written in a way that it does not handle integer overflow gracefully for some given inputs, then your code is broken regardless of whether you use signed or unsigned ints. With unsigned ints you must be aware of integer underflow as well, and with doubles and floats you must be aware of many additional issues with floating point arithmetic.
Just take this article about a bug in the standard Java binary search algorithm published by none other than Google for why you must be aware of integer overflow. In fact, that very article shows C++ code casting to unsigned int in order to guarantee correct behavior. The article also starts out by presenting a bug in Java where guess what, they don't have unsigned int. However, they still ran into a bug with integer overflow.
Use the right type for the operations which you will perform. float wouldn't make sense for a counter. Nor does signed int. The normal operations on the counter are print and +=1.
Even if you had some unusual operations, such as printing the difference in viewcounts, you wouldn't necessarily have a problem. Sure, other answers mention the incorrect abs(i2-i1) but it's not unreasonable to expect programmers to use the correct max(i2,i1) - min(i2,i1). Which does have range issues for signed int. No uniform solution here; programmers should understand the properties of the types they're working with.
Google states that: "Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation."
I personally use unsigned ints as index parameters.
int foo(unsigned int index, int* myArray){
return myArray[index];
}
Google suggests: "Document that a variable is non-negative using assertions. Don't use an unsigned type."
int foo(int index, int* myArray){
assert(index >= 0);
return myArray[index];
}
Pro for Google: If a negative number is passed in debug mode my code will hopefully return an out of bounds error. Google's code is guaranteed to assert.
Pro for me: My code can support a greater size of myArray.
I think the actual deciding factor comes down to, how clean is your code? If you clean up all warnings, it will be clear when the compiler warns you know when you're trying to assign a signed variable to an unsigned variable. If your code already has a bunch of warnings, the compiler's warning is going to be lost on you.
A final note here: Google says: "Sometimes gcc will notice this bug and warn you, but often it will not." I haven't seen that to be the case on Visual Studio, checks against negative numbers and assignments from signed to unsigned are always warned. But if you use gcc you might have a care.
You specific question is:
"Is it bad practice to use unsigned?" to which the only correct answer can be no. It is not bad practice.
There are many style guides, each with a different focus, and while in some cases, an organisation, given their typical toolchain and deployment platform may choose not to use unsigned for their products, other toolchains and platforms almost demand it's use.
Google seem to get a lot of deference because they have a good business model (and probably employ some smart people like everyone else).
CERT IIRC recommend unsigned for buffer indexes, because if you do overflow, at least you'll still be in your own buffer, some intrinsic security there.
What do the language and standard library designers say (probably the best representation of accepted wisdom). strlen returns a size_t, which is probably unsigned (platform dependent), other answers suggest this is an anachronism because shiny new computers have wide architectures, but this misses the point that C and C++ are general programming languages and should scale well on big and small platforms.
Bottom line is that this is one of many religious questions; certainly not settled, and in these cases, I normally go with my religion for green field developments, and go with the existing convention of the codebase for existing work. Consistency matters.

C++ Picking a type for a constant

So on a fairly regular bases it seems I find the type of some constant I declared (typically integer, but occasionally other things like strings) is not the ideal type in a context it is being used, requiring a cast or resulting in a compiler warning about the implicit cast.
E.g. in one piece of code I had something like the below, and got a signed/unsigned comparison issue.
static const int MAX_FOO = 16;
...
if (container.size() > MAX_FOO) {...}
I have been thinking of just always using the smallest / most basic type allowed for a given constant (e.g. char, unsigned char, const char* etc rather than say int, size_t and std::string), but was wondering if this is really a good idea, and if there are some places where it would potentially be a really bad idea? e.g. code using the 'auto' keyword (or perhaps templates) getting a too small type and overflowing on what appeared to be a safe operation?
Going for the smallest type that can hold the initial value is a bad habit. That invites overflow.
Always code for the most general (which according to Murphy's Law is the worst) case. As templates generalize things, that makes the worst case a lot worse. Be prepared for bizarre kinds of overflows and avoid negative numbers while unsigned types are in the neighborhood.
std::size_t is the best choice for the size or length of anything, for the reason you mentioned. But subtract pointers and you get a std::ptrdiff_t instead. Personally I recommend to cast the result of such a subtraction to std::size_t if it can be guaranteed to be positive.
char * does not own its string in the C++ sense as std::string does, so the latter is the more conservative choice.
This question is so broad that no more specific advice can be made…

Why is size_t unsigned?

Bjarne Stroustrup wrote in The C++ Programming Language:
The unsigned integer types are ideal for uses that treat storage as a
bit array. Using an unsigned instead of an int to gain one more bit to
represent positive integers is almost never a good idea. Attempts to
ensure that some values are positive by declaring variables unsigned
will typically be defeated by the implicit conversion rules.
size_t seems to be unsigned "to gain one more bit to represent positive integers". So was this a mistake (or trade-off), and if so, should we minimize use of it in our own code?
Another relevant article by Scott Meyers is here. To summarize, he recommends not using unsigned in interfaces, regardless of whether the value is always positive or not. In other words, even if negative values make no sense, you shouldn't necessarily use unsigned.
size_t is unsigned for historical reasons.
On an architecture with 16 bit pointers, such as the "small" model DOS programming, it would be impractical to limit strings to 32 KB.
For this reason, the C standard requires (via required ranges) ptrdiff_t, the signed counterpart to size_t and the result type of pointer difference, to be effectively 17 bits.
Those reasons can still apply in parts of the embedded programming world.
However, they do not apply to modern 32-bit or 64-bit programming, where a much more important consideration is that the unfortunate implicit conversion rules of C and C++ make unsigned types into bug attractors, when they're used for numbers (and hence, arithmetical operations and magnitude comparisions). With 20-20 hindsight we can now see that the decision to adopt those particular conversion rules, where e.g. string( "Hi" ).length() < -3 is practically guaranteed, was rather silly and impractical. However, that decision means that in modern programming, adopting unsigned types for numbers has severe disadvantages and no advantages – except for satisfying the feelings of those who find unsigned to be a self-descriptive type name, and fail to think of typedef int MyType.
Summing up, it was not a mistake. It was a decision for then very rational, practical programming reasons. It had nothing to do with transferring expectations from bounds-checked languages like Pascal to C++ (which is a fallacy, but a very very common one, even if some of those who do it have never heard of Pascal).
size_t is unsigned because negative sizes make no sense.
(From the comments:)
It's not so much ensuring, as stating what is. When is the last time you saw a list of size -1? Follow that logic too far and you find that unsigned should not exist at all and bit operations shouldn't be permitted either. – geekosaur
More to the point: addresses, for reasons you should think about, are not signed. Sizes are generated by comparing addresses; treating an address as signed will do very much the wrong thing, and using a signed value for the result will lose data in a way that your reading of the Stroustrup quote evidently thinks is acceptable, but in fact is not. Perhaps you can explain what a negative address should do instead. – geekosaur
A reason for making index types unsigned is for symmetry with C and C++'s preference for half-open intervals. And if your index types are going to be unsigned, then it's convenient to also have your size type unsigned.
In C, you can have a pointer that points into an array. A valid pointer can point to any element of the array or one element past the end of the array. It cannot point to one element before the beginning of the array.
int a[2] = { 0, 1 };
int * p = a; // OK
++p; // OK, points to the second element
++p; // Still OK, but you cannot dereference this one.
++p; // Nope, now you've gone too far.
p = a;
--p; // oops! not allowed
C++ agrees and extends this idea to iterators.
Arguments against unsigned index types often trot out an example of traversing an array from back to front, and the code often looks like this:
// WARNING: Possibly dangerous code.
int a[size] = ...;
for (index_type i = size - 1; i >= 0; --i) { ... }
This code works only if index_type is signed, which is used as an argument that index types should be signed (and that, by extension, sizes should be signed).
That argument is unpersuasive because that code is non-idiomatic. Watch what happens if we try to rewrite this loop with pointers instead of indices:
// WARNING: Bad code.
int a[size] = ...;
for (int * p = a + size - 1; p >= a; --p) { ... }
Yikes, now we have undefined behavior! Ignoring the problem when size is 0, we have a problem at the end of the iteration because we generate an invalid pointer that points to the element before the first. That's undefined behavior even if we never try dereference that pointer.
So you could argue to fix this by changing the language standard to make it legit to have a pointer that points to the element before the first, but that's not likely to happen. The half-open interval is a fundamental building block of these languages, so let's write better code instead.
A correct pointer-based solution is:
int a[size] = ...;
for (int * p = a + size; p != a; ) {
--p;
...
}
Many find this disturbing because the decrement is now in the body of the loop instead of in the header, but that's what happens when your for-syntax is designed primarily for forward loops through half-open intervals. (Reverse iterators solve this asymmetry by postponing the decrement.)
Now, by analogy, the index-based solution becomes:
int a[size] = ...;
for (index_type i = size; i != 0; ) {
--i;
...
}
This works whether index_type is signed or unsigned, but the unsigned choice yields code that maps more directly to the idiomatic pointer and iterator versions. Unsigned also means that, as with pointers and iterators, we'll be able to access every element of the sequence--we don't surrender half of our possible range in order to represent nonsensical values. While that's not a practical concern in a 64-bit world, it can be a very real concern in a 16-bit embedded processor or in building an abstract container type for sparse data over a massive range that can still provide the identical API as a native container.
On the other hand ...
Myth 1: std::size_t is unsigned is because of legacy restrictions that no longer apply.
There are two "historical" reasons commonly referred to here:
sizeof returns std::size_t, which has been unsigned since the days of C.
Processors had smaller word sizes, so it was important to squeeze that extra bit of range out.
But neither of these reasons, despite being very old, are actually relegated to history.
sizeof still returns a std::size_t which is still unsigned. If you want to interoperate with sizeof or the standard library containers, you're going to have to use std::size_t.
The alternatives are all worse: You could disable signed/unsigned comparison warnings and size conversion warnings and hope that the values will always be in the overlapping ranges so that you can ignore the latent bugs using different types couple potentially introduce. Or you could do a lot of range-checking and explicit conversions. Or you could introduce your own size type with clever built-in conversions to centralize the range checking, but no other library is going to use your size type.
And while most mainstream computing is done on 32- and 64-bit processors, C++ is still used on 16-bit microprocessors in embedded systems, even today. On those microprocessors, it's often very useful to have a word-sized value that can represent any value in your memory space.
Our new code still has to interoperate with the standard library. If our new code used signed types while the standard library continues to use unsigned ones, we make it harder for every consumer that has to use both.
Myth 2: You don't need that extra bit. (A.K.A., You're never going to have a string larger than 2GB when your address space is only 4GB.)
Sizes and indexes aren't just for memory. Your address space may be limited, but you might process files that are much larger than your address space. And while you might not have a string with more the 2GB, you could comfortably have a bitset with more than 2Gbits. And don't forget virtual containers designed for sparse data.
Myth 3: You can always use a wider signed type.
Not always. It's true that for a local variable or two, you could use a std::int64_t (assuming your system has one) or a signed long long and probably write perfectly reasonable code. (But you're still going to need some explicit casts and twice as much bounds checking or you'll have to disable some compiler warnings that might've alerted you to bugs elsewhere in your code.)
But what if you're building a large table of indices? Do you really want an extra two or four bytes for every index when you need just one bit? Even if you have plenty of memory and a modern processor, making that table twice as large could have deleterious effects on locality of reference, and all your range checks are now two-steps, reducing the effectiveness of branch prediction. And what if you don't have all that memory?
Myth 4: Unsigned arithmetic is surprising and unnatural.
This implies that signed arithmetic is not surprising or somehow more natural. And, perhaps it is when thinking in terms of mathematics where all the basic arithmetic operations are closed over the set of all integers.
But our computers don't work with integers. They work with an infinitesimal fraction of the integers. Our signed arithmetic is not closed over the set of all integers. We have overflow and underflow. To many, that's so surprising and unnatural, they mostly just ignore it.
This is bug:
auto mid = (min + max) / 2; // BUGGY
If min and max are signed, the sum could overflow, and that yields undefined behavior. Most of us routinely miss this these kinds of bugs because we forget that addition is not closed over the set of signed ints. We get away with it because our compilers typically generate code that does something reasonable (but still surprising).
If min and max are unsigned, the sum could still overflow, but the undefined behavior is gone. You'll still get the wrong answer, so it's still surprising, but not any more surprising than it was with signed ints.
The real unsigned surprise comes with subtraction: If you subtract a larger unsigned int from a smaller one, you're going to end up with a big number. This result isn't any more surprising than if you divided by 0.
Even if you could eliminate unsigned types from all your APIs, you still have to be prepared for these unsigned "surprises" if you deal with the standard containers or file formats or wire protocols. Is it really worth adding friction to your APIs to "solve" only part of the problem?