Is this an unavoidable signed and unsigned integer comparison? - c++

Probably not, but I can't think of a good solution. I'm no expert in C++ yet.
Recently I've converted a lot of ints to unsigned ints in a project. Basically everything that should never be negative is made unsigned. This removed a lot of these warnings by MinGW:
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
I love it. It makes the program more robust and the code more descriptive. However, there is one place where they still occur. It looks like this:
unsigned int subroutine_point_size = settings->get<unsigned int>("subroutine_point_size");
...
for(int dx = -subroutine_point_size;dx <= subroutine_point_size;dx++) //Fill pixels in the point's radius.
{
for(int dy = -subroutine_point_size;dy <= subroutine_point_size;dy++)
{
//Do something with dx and dy here.
}
}
In this case I can't make dx and dy unsigned. They start out negative and depend on comparing which is lesser or greater.
I don't like to make subroutine_point_size signed either, though this is the lesser evil. It indicates a size of a kernel in a pass over an image, and the kernel size can't be negative (it's probably unwise for a user ever to set this kernel size to anything more than 100 but the settings file allows for numbers up to 2^32 - 1).
So it seems there is no way to cast any of the variables to fix this. Is there a way to get rid of this warning and solve this neatly?
We're using C++11, compiling with GCC for Windows, Mac and various Unix distributions.

Cast the variables to a long int or long long int type giving at the same time the range of unsigned int (0..2^32-1) and sign.

You're making a big mistake.
Basically you like the name "unsigned" and you intend it to mean "not negative" but this is not what is the semantic associated to the type.
Consider the statement:
adding a signed integer and an unsigned integer the result is unsigned
Clearly it makes no sense if you consider the term "unsigned" as "not negative", yet this is what the language does: adding -3 to the unsigned value 2 you will get a huge nonsense number instead of the correct answer -1.
Indeed the choice of using an unsigned type for the size of containers is a design mistake of C++, a mistake that is too late to fix now because of backward compatibility. By the way the reason it happened has nothing to do with "non-negativeness", but just with the ability to use the 16th bit when computers were that small (i.e. being able to use 65535 elements instead of 32767). Even back then I don't think the price of wrong semantic was worth the gain (if 32767 is not enough now then 65535 won't be enough quite soon anyway).
Do not repeat the same mistake in your programs... the name is irrelevant, what counts is the semantic and for unsigned in C and C++ it is "member of the Zn modulo ring with n=2k ".
You don't want the size of a container to be the member of a modulo ring. Do you?

Instead of the current
for(int dx = -subroutine_point_size;dx <= subroutine_point_size;dx++) //Fill pixels in the point's radius.
you can do this:
for(int dx = -int(subroutine_point_size);dx <= int(subroutine_point_size);dx++) //Fill pixels in the point's radius.
where the first int cast is (1)technically redundant, but is there for consistency, because the second cast removes the signed/unsigned warning that presumably is the issue here.
However, I strongly advise you to undo the work of converting signed to unsigned types everywhere. A good rule of thumb is to use signed types for numbers, and unsigned types for bit level stuff. That avoids the problems with wrap-around due to implicit conversions, where e.g. std:.string("Bah").length() < -5 is guaranteed (very silly), and because it does away with actual problems, it also reduces spurious warnings.
Note that you can just define a suitable name, where you want to indicate that some value will never be negative.
1) Technically redundant in practice, for two's complement representation of signed integers, with no trapping inserted by the compiler. As far as I know no extant C++ compiler behaves otherwise.

Firstly, without knowing the range of values that will be stored in the variables, your claim that changing signed to unsigned variables is unsubstantiated - there are circumstances where that claim is false.
Second, the compiler is not issuing a warning only as a result of changing variables (and I assume calls of template functions like settings.get()) to be unsigned. It is warning about the fact you have expressions involving both signed and unsigned variables. Compilers typically issue warnings about such expressions because - in practice - they are more likely to indicate a programming error or to potentially involve some behaviour that the programmer may not have anticipated (e.g. instances of undefined behaviour, expressions where a negative result is expected but a large positive result is what will occur, etc).
A rule of thumb is that, if you need to have expressions involving both signed and unsigned types, you are better off making all the relevant variables signed. While there are exceptions where that rule of thumb isn't needed, you wouldn't have asked this question if you understood how to decide that.
On that basis, I suggest the most appropriate action is to unwind your changes.

Related

Is using an unsigned rather than signed int more likely to cause bugs? Why?

In the Google C++ Style Guide, on the topic of "Unsigned Integers", it is suggested that
Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler.
What is wrong with modular arithmetic? Isn't that the expected behaviour of an unsigned int?
What kind of bugs (a significant class) does the guide refer to? Overflowing bugs?
Do not use an unsigned type merely to assert that a variable is non-negative.
One reason that I can think of using signed int over unsigned int, is that if it does overflow (to negative), it is easier to detect.
Some of the answers here mention the surprising promotion rules between signed and unsigned values, but that seems more like a problem relating to mixing signed and unsigned values, and doesn't necessarily explain why signed variables would be preferred over unsigned outside of mixing scenarios.
In my experience, outside of mixed comparisons and promotion rules, there are two primary reasons why unsigned values are bug magnets as follows.
Unsigned values have a discontinuity at zero, the most common value in programming
Both unsigned and signed integers have a discontinuities at their minimum and maximum values, where they wrap around (unsigned) or cause undefined behavior (signed). For unsigned these points are at zero and UINT_MAX. For int they are at INT_MIN and INT_MAX. Typical values of INT_MIN and INT_MAX on system with 4-byte int values are -2^31 and 2^31-1, and on such a system UINT_MAX is typically 2^32-1.
The primary bug-inducing problem with unsigned that doesn't apply to int is that it has a discontinuity at zero. Zero, of course, is a very common value in programs, along with other small values like 1,2,3. It is common to add and subtract small values, especially 1, in various constructs, and if you subtract anything from an unsigned value and it happens to be zero, you just got a massive positive value and an almost certain bug.
Consider code iterates over all values in a vector by index except the last0.5:
for (size_t i = 0; i < v.size() - 1; i++) { // do something }
This works fine until one day you pass in an empty vector. Instead of doing zero iterations, you get v.size() - 1 == a giant number1 and you'll do 4 billion iterations and almost have a buffer overflow vulnerability.
You need to write it like this:
for (size_t i = 0; i + 1 < v.size(); i++) { // do something }
So it can be "fixed" in this case, but only by carefully thinking about the unsigned nature of size_t. Sometimes you can't apply the fix above because instead of a constant one you have some variable offset you want to apply, which may be positive or negative: so which "side" of the comparison you need to put it on depends on the signedness - now the code gets really messy.
There is a similar issue with code that tries to iterate down to and including zero. Something like while (index-- > 0) works fine, but the apparently equivalent while (--index >= 0) will never terminate for an unsigned value. Your compiler might warn you when the right hand side is literal zero, but certainly not if it is a value determined at runtime.
Counterpoint
Some might argue that signed values also have two discontinuities, so why pick on unsigned? The difference is that both discontinuities are very (maximally) far away from zero. I really consider this a separate problem of "overflow", both signed and unsigned values may overflow at very large values. In many cases overflow is impossible due to constraints on the possible range of the values, and overflow of many 64-bit values may be physically impossible). Even if possible, the chance of an overflow related bug is often minuscule compared to an "at zero" bug, and overflow occurs for unsigned values too. So unsigned combines the worst of both worlds: potentially overflow with very large magnitude values, and a discontinuity at zero. Signed only has the former.
Many will argue "you lose a bit" with unsigned. This is often true - but not always (if you need to represent differences between unsigned values you'll lose that bit anyways: so many 32-bit things are limited to 2 GiB anyways, or you'll have a weird grey area where say a file can be 4 GiB, but you can't use certain APIs on the second 2 GiB half).
Even in the cases where unsigned buys you a bit: it doesn't buy you much: if you had to support more than 2 billion "things", you'll probably soon have to support more than 4 billion.
Logically, unsigned values are a subset of signed values
Mathematically, unsigned values (non-negative integers) are a subset of signed integers (just called _integers).2. Yet signed values naturally pop out of operations solely on unsigned values, such as subtraction. We might say that unsigned values aren't closed under subtraction. The same isn't true of signed values.
Want to find the "delta" between two unsigned indexes into a file? Well you better do the subtraction in the right order, or else you'll get the wrong answer. Of course, you often need a runtime check to determine the right order! When dealing with unsigned values as numbers, you'll often find that (logically) signed values keep appearing anyways, so you might as well start of with signed.
Counterpoint
As mentioned in footnote (2) above, signed values in C++ aren't actually a subset of unsigned values of the same size, so unsigned values can represent the same number of results that signed values can.
True, but the range is less useful. Consider subtraction, and unsigned numbers with a range of 0 to 2N, and signed numbers with a range of -N to N. Arbitrary subtractions result in results in the range -2N to 2N in _both cases, and either type of integer can only represent half of it. Well it turns out that the region centered around zero of -N to N is usually way more useful (contains more actual results in real world code) than the range 0 to 2N. Consider any of typical distribution other than uniform (log, zipfian, normal, whatever) and consider subtracting randomly selected values from that distribution: way more values end up in [-N, N] than [0, 2N] (indeed, resulting distribution is always centered at zero).
64-bit closes the door on many of the reasons to use unsigned values as numbers
I think the arguments above were already compelling for 32-bit values, but the overflow cases, which affect both signed and unsigned at different thresholds, do occur for 32-bit values, since "2 billion" is a number that can exceeded by many abstract and physical quantities (billions of dollars, billions of nanoseconds, arrays with billions of elements). So if someone is convinced enough by the doubling of the positive range for unsigned values, they can make the case that overflow does matter and it slightly favors unsigned.
Outside of specialized domains 64-bit values largely remove this concern. Signed 64-bit values have an upper range of 9,223,372,036,854,775,807 - more than nine quintillion. That's a lot of nanoseconds (about 292 years worth), and a lot of money. It's also a larger array than any computer is likely to have RAM in a coherent address space for a long time. So maybe 9 quintillion is enough for everybody (for now)?
When to use unsigned values
Note that the style guide doesn't forbid or even necessarily discourage use of unsigned numbers. It concludes with:
Do not use an unsigned type merely to assert that a variable is non-negative.
Indeed, there are good uses for unsigned variables:
When you want to treat an N-bit quantity not as an integer, but simply a "bag of bits". For example, as a bitmask or bitmap, or N boolean values or whatever. This use often goes hand-in-hand with the fixed width types like uint32_t and uint64_t since you often want to know the exact size of the variable. A hint that a particular variable deserves this treatment is that you only operate on it with with the bitwise operators such as ~, |, &, ^, >> and so on, and not with the arithmetic operations such as +, -, *, / etc.
Unsigned is ideal here because the behavior of the bitwise operators is well-defined and standardized. Signed values have several problems, such as undefined and unspecified behavior when shifting, and an unspecified representation.
When you actually want modular arithmetic. Sometimes you actually want 2^N modular arithmetic. In these cases "overflow" is a feature, not a bug. Unsigned values give you what you want here since they are defined to use modular arithmetic. Signed values cannot be (easily, efficiently) used at all since they have an unspecified representation and overflow is undefined.
0.5 After I wrote this I realized this is nearly identical to Jarod's example, which I hadn't seen - and for good reason, it's a good example!
1 We're talking about size_t here so usually 2^32-1 on a 32-bit system or 2^64-1 on a 64-bit one.
2 In C++ this isn't exactly the case because unsigned values contain more values at the upper end than the corresponding signed type, but the basic problem exists that manipulating unsigned values can result in (logically) signed values, but there is no corresponding issue with signed values (since signed values already include unsigned values).
As stated, mixing unsigned and signed might lead to unexpected behaviour (even if well defined).
Suppose you want to iterate over all elements of vector except for the last five, you might wrongly write:
for (int i = 0; i < v.size() - 5; ++i) { foo(v[i]); } // Incorrect
// for (int i = 0; i + 5 < v.size(); ++i) { foo(v[i]); } // Correct
Suppose v.size() < 5, then, as v.size() is unsigned, s.size() - 5 would be a very large number, and so i < v.size() - 5 would be true for a more expected range of value of i. And UB then happens quickly (out of bound access once i >= v.size())
If v.size() would have return signed value, then s.size() - 5 would have been negative, and in above case, condition would be false immediately.
On the other side, index should be between [0; v.size()[ so unsigned makes sense.
Signed has also its own issue as UB with overflow or implementation-defined behaviour for right shift of a negative signed number, but less frequent source of bug for iteration.
One of the most hair-raising examples of an error is when you MIX signed and unsigned values:
#include <iostream>
int main() {
auto qualifier = -1 < 1u ? "makes" : "does not make";
std::cout << "The world " << qualifier << " sense" << std::endl;
}
The output:
The world does not make sense
Unless you have a trivial application, it's inevitable you'll end up with either dangerous mixes between signed and unsigned values (resulting in runtime errors) or if you crank up warnings and make them compile-time errors, you end up with a lot of static_casts in your code. That's why it's best to strictly use signed integers for types for math or logical comparison. Only use unsigned for bitmasks and types representing bits.
Modeling a type to be unsigned based on the expected domain of the values of your numbers is a Bad Idea. Most numbers are closer to 0 than they are to 2 billion, so with unsigned types, a lot of your values are closer to the edge of the valid range. To make things worse, the final value may be in a known positive range, but while evaluating expressions, intermediate values may underflow and if they are used in intermediate form may be VERY wrong values. Finally, even if your values are expected to always be positive, that doesn't mean that they won't interact with other variables that can be negative, and so you end up with a forced situation of mixing signed and unsigned types, which is the worst place to be.
Why is using an unsigned int more likely to cause bugs than using a signed int?
Using an unsigned type is not more likely to cause bugs than using a signed type with certain classes of tasks.
Use the right tool for the job.
What is wrong with modular arithmetic? Isn't that the expected behaviour of an unsigned int?
Why is using an unsigned int more likely to cause bugs than using a signed int?
If the task if well-matched: nothing wrong. No, not more likely.
Security, encryption, and authentication algorithm count on unsigned modular math.
Compression/decompression algorithms too as well as various graphic formats benefit and are less buggy with unsigned math.
Any time bit-wise operators and shifts are used, the unsigned operations do not get messed up with the sign-extension issues of signed math.
Signed integer math has an intuitive look and feel readily understood by all including learners to coding. C/C++ was not targeted originally nor now should be an intro-language. For rapid coding that employs safety nets concerning overflow, other languages are better suited. For lean fast code, C assumes that coders knows what they are doing (they are experienced).
A pitfall of signed math today is the ubiquitous 32-bit int that with so many problems is well wide enough for the common tasks without range checking. This leads to complacency that overflow is not coded against. Instead, for (int i=0; i < n; i++) int len = strlen(s); is viewed as OK because n is assumed < INT_MAX and strings will never be too long, rather than being full ranged protected in the first case or using size_t, unsigned or even long long in the 2nd.
C/C++ developed in an era that included 16-bit as well as 32-bit int and the extra bit an unsigned 16-bit size_t affords was significant. Attention was needed in regard to overflow issues be it int or unsigned.
With 32-bit (or wider) applications of Google on non-16 bit int/unsigned platforms, affords the lack of attention to +/- overflow of int given its ample range. This makes sense for such applications to encourage int over unsigned. Yet int math is not well protected.
The narrow 16-bit int/unsigned concerns apply today with select embedded applications.
Google's guidelines apply well for code they write today. It is not a definitive guideline for the larger wide scope range of C/C++ code.
One reason that I can think of using signed int over unsigned int, is that if it does overflow (to negative), it is easier to detect.
In C/C++, signed int math overflow is undefined behavior and so not certainly easier to detect than defined behavior of unsigned math.
As #Chris Uzdavinis well commented, mixing signed and unsigned is best avoided by all (especially beginners) and otherwise coded carefully when needed.
I have some experience with Google's style guide, AKA the Hitchhiker's Guide to Insane Directives from Bad Programmers Who Got into the Company a Long Long Time Ago. This particular guideline is just one example of the dozens of nutty rules in that book.
Errors only occur with unsigned types if you try to do arithmetic with them (see Chris Uzdavinis example above), in other words if you use them as numbers. Unsigned types are not intended to be used to store numeric quantities, they are intended to store counts such as the size of containers, which can never be negative, and they can and should be used for that purpose.
The idea of using arithmetical types (like signed integers) to store container sizes is idiotic. Would you use a double to store the size of a list, too? That there are people at Google storing container sizes using arithmetical types and requiring others to do the same thing says something about the company. One thing I notice about such dictates is that the dumber they are, the more they need to be strict do-it-or-you-are-fired rules because otherwise people with common sense would ignore the rule.
Using unsigned types to represent non-negative values...
is more likely to cause bugs involving type promotion, when using signed and unsigned values, as other answer demonstrate and discuss in depth, but
is less likely to cause bugs involving choice of types with domains capable of representing undersirable/disallowed values. In some places you'll assume the value is in the domain, and may get unexpected and potentially hazardous behavior when other value sneak in somehow.
The Google Coding Guidelines puts emphasis on the first kind of consideration. Other guideline sets, such as the C++ Core Guidelines, put more emphasis on the second point. For example, consider Core Guideline I.12:
I.12: Declare a pointer that must not be null as not_null
Reason
To help avoid dereferencing nullptr errors. To improve performance by
avoiding redundant checks for nullptr.
Example
int length(const char* p); // it is not clear whether length(nullptr) is valid
length(nullptr); // OK?
int length(not_null<const char*> p); // better: we can assume that p cannot be nullptr
int length(const char* p); // we must assume that p can be nullptr
By stating the intent in source, implementers and tools can provide
better diagnostics, such as finding some classes of errors through
static analysis, and perform optimizations, such as removing branches
and null tests.
Of course, you could argue for a non_negative wrapper for integers, which avoids both categories of errors, but that would have its own issues...
The google statement is about using unsigned as a size type for containers. In contrast, the question appears to be more general. Please keep that in mind, while you read on.
Since most answers so far reacted to the google statement, less so to the bigger question, I will start my answer about negative container sizes and subsequently try to convince anyone (hopeless, I know...) that unsigned is good.
Signed container sizes
Lets assume someone coded a bug, which results in a negative container index. The result is either undefined behavior or an exception / access violation. Is that really better than getting undefined behavior or an exception / access violation when the index type was unsigned? I think, no.
Now, there is a class of people who love to talk about mathematics and what is "natural" in this context. How can an integral type with negative number be natural to describe something, which is inherently >= 0? Using arrays with negative sizes much? IMHO, especially mathematically inclined people would find this mismatch of semantics (size/index type says negative is possible, while a negative sized array is hard to imagine) irritating.
So, the only question, remaining on this matter is if - as stated in the google comment - a compiler could actually actively assist in finding such bugs. And even better than the alternative, which would be underflow protected unsigned integers (x86-64 assembly and probably other architectures have means to achieve that, only C/C++ does not use those means). The only way I can fathom is if the compiler automatically added run time checks (if (index < 0) throwOrWhatever) or in case of compile time actions produce a lot of potentially false positive warnings/errors "The index for this array access could be negative." I have my doubts, this would be helpful.
Also, people who actually write runtime checks for their array/container indices, it is more work dealing with signed integers. Instead of writing if (index < container.size()) { ... } you now have to write: if (index >= 0 && index < container.size()) { ... }. Looks like forced labor to me and not like an improvement...
Languages without unsigned types suck...
Yes, this is a stab at java. Now, I come from embedded programming background and we worked a lot with field buses, where binary operations (and,or,xor,...) and bit wise composition of values is literally the bread and butter. For one of our products, we - or rather a customer - wanted a java port... and I sat opposite to the luckily very competent guy who did the port (I refused...). He tried to stay composed... and suffer in silence... but the pain was there, he could not stop cursing after a few days of constantly dealing with signed integral values, which SHOULD be unsigned... Even writing unit tests for those scenarios is painful and me, personally I think java would have been better off if they had omitted signed integers and just offered unsigned... at least then, you do not have to care about sign extensions etc... and you can still interpret numbers as 2s complement.
Those are my 5 cents on the matter.

How to deal with “signed/unsigned mismatch” warnings (C4018, no loop)?

There are similar questions here but they related to particular case of usage in loop indexes. This one is more generic case. How to deal with this warning if there is no loops?
How to deal with the warning in this simplified case to outline the problem?
int(-3) >= size_t(31)
It is a case-by-case basis, and you just have to use your knowledge of the problem you're solving to make that call. The goal is to avoid logic errors at runtime, not to stamp out warning messages.
Casting one or the other value to be the same as the other removes the warning, but won't prevent runtime problems due to logic errors.
e.g. if you had a large unsigned value and cast that to be signed, then it flips negative, which would mess up comparisions. Conversely, flipping -3 to unsigned will make it into a very large positive value, which would mess up the comparison you're trying. Sure, explicitly casting the values could avoid the messages but those messages are warning you about possible unexpected behaviour from your program, which you need to consider carefully regarding the possible & likely values that these variables can take.
Cast at least one of the operands so that they are both of the same signedness.
Be careful when casting unsigned to signed: if the value is too large, you will get implementation defined behaviour.
Be careful when casting signed to unsigned - the behaviour when the original value is negative is precisely defined, but it may be surprising. If the expression is rewritten as size_t(-3) >= size_t(31) then it is always true.
Note that the cast to int in the example is pointless - the literal 3 must be of type int, and applying unary - to that will give an int result.
To get answer which value is bigger use signed arbitrary arithmetic types like boost::cpp_int
auto v1 = int(-3);
auto v2 = size_t(31);
boost::cpp_int(v1) >= boost::cpp_int(v2);
If you have access to 80bit MMX registers then consider using them to store both values as signed and compare.

Is the using `int` is more preferably than the using of `unsigned int`? [duplicate]

Should one ever declare a variable as an unsigned int if they don't require the extra range of values? For example, when declaring the variable in a for loop, if you know it's not going to be negative, does it matter? Is one faster than the other? Is it bad to declare an unsigned int just as unsigned in C++?
To reitterate, should it be done even if the extra range is not required? I heard they should be avoided because they cause confusion (IIRC that's why Java doesn't have them).
The reason to use uints is that it gives the compiler a wider variety of optimizations. For example, it may replace an instance of 'abs(x)' with 'x' if it knows that x is positive. It also opens up a variety of bitwise 'strength reductions' that only work for positive numbers. If you always mult/divide an int by a power of two, then the compiler may replace the operation with a bit shift (ie x*8 == x<<3) which tends to perform much faster. Unfortunately, this relation only holds if 'x' is positive because negative numbers are encoded in a way that precludes this. With ints, the compiler may apply this trick if it can prove that the value is always positive (or can be modified earlier in the code to be so). In the case of uints, this attribute is trivial to prove, which greatly increases the odds of it being applied.
Another example might be the equation y = 16 * x + 12. If x can be negative, then a multiply and add would be required. Yet if x is always positive, then not only can the x*16 term be replaced with x<<4, but since the term would always end with four zeros this opens up replacing the '+ 12' with a binary OR (as long as the '12' term is less than 16). The result would be y = (x<<4) | 12.
In general, the 'unsigned' qualifier gives the compiler more information about the variable, which in turn allows it to squeeze in more optimizations.
You should use unsigned integers when it doesn't make sense for them to have negative values. This is completely independent of the range issue. So yes, you should use unsigned integer types even if the extra range is not required, and no, you shouldn't use unsigned ints (or anything else) if not necessary, but you need to revise your definition of what is necessary.
More often than not, you should use unsigned integers.
They are more predictable in terms of undefined behavior on overflow and such.
This is a huge subject of its own, so I won't say much more about it.
It's a very good reason to avoid signed integers unless you actually need signed values.
Also, they are easier to work with when range-checking -- you don't have to check for negative values.
Typical rules of thumb:
If you are writing a forward for loop with an index as the control variable, you almost always want unsigned integers. In fact, you almost always want size_t.
If you're writing a reverse for loop with an index as a the control variable, you should probably use signed integers, for obvious reasons. Probably ptrdiff_t would do.
The one thing to be careful with is when casting between signed and unsigned values of different sizes.
You probably want to double-check (or triple-check) to make sure the cast is working the way you expect.
int is the general purpose integer type. If you need an integer, and int meets your requirements (range [-32767,32767]), then use it.
If you have more specialized purposes, then you can choose something else. If you need an index into an array, then use size_t. If you need an index into a vector, then use std::vector<T>::size_type. If you need specific sizes, then pick something from <cstdint>. If you need something larger than 64 bits, then find a library like gmp.
I can't think of any good reasons to use unsigned int. At least, not directly (size_t and some of the specifically sized types from <cstdint> may be typedefs of unsigned int).
The problem with the systematic use of unsigned when values can't be negative isn't that Java doesn't have unsigned, it is that expressions with unsigned values, especially when mixed with signed one, give sometimes confusing results if you think about unsigned as an integer type with a shifted range. Unsigned is a modular type, not a restriction of integers to positive or zero.
Thus the traditional view is that unsigned should be used when you need a modular type or for bitwise manipulation. That view is implicit in K&R — look how int and unsigned are used —, and more explicit in TC++PL (2nd edition, p. 50):
The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules.
In almost all architectures the cost of signed operation and unsigned operation is the same. So efficiency wise you wont get any advantage for using unsigned over signed. But as you pointed out, if you use unsigned you will have a bigger range
Even if you have variables that should only take non negative values unsigned can be a problem. Here is an example. Suppose a programmer is asked to write a code to print all pairs of integer numbers (a,b) with 0 <= a < b <= n where n is a given input. An incorrect code is
for (unsigned b = 0; b <= n; b++)
for (unsigned a=0; a <=b-1; b++)
cout << a << ',' << b << n ;
This is easy to correct, but thinking with unsigned is a bit less natural than thinking with int.

Should unsigned ints be used if not necessary?

Should one ever declare a variable as an unsigned int if they don't require the extra range of values? For example, when declaring the variable in a for loop, if you know it's not going to be negative, does it matter? Is one faster than the other? Is it bad to declare an unsigned int just as unsigned in C++?
To reitterate, should it be done even if the extra range is not required? I heard they should be avoided because they cause confusion (IIRC that's why Java doesn't have them).
The reason to use uints is that it gives the compiler a wider variety of optimizations. For example, it may replace an instance of 'abs(x)' with 'x' if it knows that x is positive. It also opens up a variety of bitwise 'strength reductions' that only work for positive numbers. If you always mult/divide an int by a power of two, then the compiler may replace the operation with a bit shift (ie x*8 == x<<3) which tends to perform much faster. Unfortunately, this relation only holds if 'x' is positive because negative numbers are encoded in a way that precludes this. With ints, the compiler may apply this trick if it can prove that the value is always positive (or can be modified earlier in the code to be so). In the case of uints, this attribute is trivial to prove, which greatly increases the odds of it being applied.
Another example might be the equation y = 16 * x + 12. If x can be negative, then a multiply and add would be required. Yet if x is always positive, then not only can the x*16 term be replaced with x<<4, but since the term would always end with four zeros this opens up replacing the '+ 12' with a binary OR (as long as the '12' term is less than 16). The result would be y = (x<<4) | 12.
In general, the 'unsigned' qualifier gives the compiler more information about the variable, which in turn allows it to squeeze in more optimizations.
You should use unsigned integers when it doesn't make sense for them to have negative values. This is completely independent of the range issue. So yes, you should use unsigned integer types even if the extra range is not required, and no, you shouldn't use unsigned ints (or anything else) if not necessary, but you need to revise your definition of what is necessary.
More often than not, you should use unsigned integers.
They are more predictable in terms of undefined behavior on overflow and such.
This is a huge subject of its own, so I won't say much more about it.
It's a very good reason to avoid signed integers unless you actually need signed values.
Also, they are easier to work with when range-checking -- you don't have to check for negative values.
Typical rules of thumb:
If you are writing a forward for loop with an index as the control variable, you almost always want unsigned integers. In fact, you almost always want size_t.
If you're writing a reverse for loop with an index as a the control variable, you should probably use signed integers, for obvious reasons. Probably ptrdiff_t would do.
The one thing to be careful with is when casting between signed and unsigned values of different sizes.
You probably want to double-check (or triple-check) to make sure the cast is working the way you expect.
int is the general purpose integer type. If you need an integer, and int meets your requirements (range [-32767,32767]), then use it.
If you have more specialized purposes, then you can choose something else. If you need an index into an array, then use size_t. If you need an index into a vector, then use std::vector<T>::size_type. If you need specific sizes, then pick something from <cstdint>. If you need something larger than 64 bits, then find a library like gmp.
I can't think of any good reasons to use unsigned int. At least, not directly (size_t and some of the specifically sized types from <cstdint> may be typedefs of unsigned int).
The problem with the systematic use of unsigned when values can't be negative isn't that Java doesn't have unsigned, it is that expressions with unsigned values, especially when mixed with signed one, give sometimes confusing results if you think about unsigned as an integer type with a shifted range. Unsigned is a modular type, not a restriction of integers to positive or zero.
Thus the traditional view is that unsigned should be used when you need a modular type or for bitwise manipulation. That view is implicit in K&R — look how int and unsigned are used —, and more explicit in TC++PL (2nd edition, p. 50):
The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules.
In almost all architectures the cost of signed operation and unsigned operation is the same. So efficiency wise you wont get any advantage for using unsigned over signed. But as you pointed out, if you use unsigned you will have a bigger range
Even if you have variables that should only take non negative values unsigned can be a problem. Here is an example. Suppose a programmer is asked to write a code to print all pairs of integer numbers (a,b) with 0 <= a < b <= n where n is a given input. An incorrect code is
for (unsigned b = 0; b <= n; b++)
for (unsigned a=0; a <=b-1; b++)
cout << a << ',' << b << n ;
This is easy to correct, but thinking with unsigned is a bit less natural than thinking with int.

Is it a best practice to use unsigned data types to enforce non-negative and/or valid values?

Recently, during a refactoring session, I was looking over some code I wrote and noticed several things:
I had functions that used unsigned char to enforce values in the interval [0-255].
Other functions used int or long data types with if statements inside the functions to silently clamp the values to valid ranges.
Values contained in classes and/or declared as arguments to functions that had an unknown upper bound but a known and definite non-negative lower bound were declared as an unsigned data type (int or long depending on the possibility that the upper bound went above 4,000,000,000).
The inconsistency is unnerving. Is this a good practice that I should continue? Should I rethink the logic and stick to using int or long with appropriate non-notifying clamping?
A note on the use of "appropriate": There are cases where I use signed data types and throw notifying exceptions when the values go out of range but these are reserved for divde by zero and constructors.
In C and C++, signed and unsigned integer types have certain specific characteristics.
Signed types have bounds far from zero, and operations that exceed those bounds have undefined behavior (or implementation-defined in the case of conversions).
Unsigned types have a lower bound of zero and an upper bound far from zero, and operations that exceed those bounds quietly wrap around.
Often what you really want is a particular range of values with some particular behavior when operations exceed those bounds (saturation, signaling an error, etc.). Neither signed nor unsigned types are entirely suitable for such requirements. And operations that mix signed and unsigned types can be confusing; the rules for such operations are defined by the language, but they're not always obvious.
Unsigned types can be problematic because the lower bound is zero, so operations with reasonable values (nowhere near the upper bound) can behave in unexpected ways. For example, this:
for (unsigned int u = 10; u >= 0; u --) {
// ...
}
is an infinite loop.
One approach is to use signed types for everything that doesn't absolutely require an unsigned representation, choosing a type wide enough to hold the values you need. This avoids problems with signed/unsigned mixed operations. Java, for example, enforces this approach by not having unsigned types at all. (Personally, I think that decision was overkill, but I can see the advantages of it.)
Another approach is to use unsigned types for values that logically cannot be negative, and be very careful with expressions that might underflow or that mix signed and unsigned types.
(Yet another is to define your own types with exactly the behavior you want, but that has costs.)
As John Sallay's answer says, consistency is probably more important than which particular approach you take.
I wish I could give a "this way is right, that way is wrong" answer, but there really isn't one.
The biggest benefit from unsigned is that it documents your code that the values are always positive.
It doesn't really buy you any safety as going outside the range of an unsigned is usually unintentional and can cause just as much frustration as if it were signed.
I had functions that used unsigned char to enforce values in the interval [0-255].
If you're relying on the wraparound then use uint8_t as unsigned char could possibly be more than 8 bits.
Other functions used int or long data types with if statements inside the functions to silently clamp the values to valid ranges.
Is this really the correct behavior?
Values contained in classes and/or declared as arguments to functions that had an unknown upper bound but a known and definite non-negative lower bound were declared as an unsigned data type (int or long depending on the possibility that the upper bound went above 4,000,000,000).
Where did you get an upper bound of 4,000,000,000 from? Your bound is between INT_MAX and INT_MIN (you can also use std::numeric_limits. In C++11 you can use decltype to specify the type which you can wrap into a template/macro:
decltype(4000000000) x; // x can hold at least 4000000000
I would probably argue that consistency is most important. If you pick one way and do it right then it will be easy for someone else to understand what you are doing at a later point in time. On the note of doing it right, there are several issues to think about.
First, it is common when checking if an integer variable n is in a valid range, say 0 to N to write:
if ( n > 0 && n <= N ) ...
This comparison only makes sense if n is signed. If n is unsigned then it will never be less than 0 since negative values will wrap around. You could rewrite the above if as just:
if ( n <= N ) ...
If someone isn't used to seeing this, they might be confused and think you did it wrong.
Second, I would keep in mind that there is no guarantee of type size for integers in c++. Thus, if you want something to be bounded by 255, an unsigned char may not do the trick. If the variable has a specific meaning then it may be valuable to to a typedef to show that. For example, size_t is a value as wide as a memory address. Which means that you can use it with arrays and not have to worry about being on 32 or 64 bit machines. I try to use such typedefs whenever possible because they clearly communicate why I am using the type. (size_t because I'm accessing an array.)
Third, is back on the issue of wrap around. What do you want to happen with an invalid number. In the case of an unsigned char, if you use the type to bound the data, then you won't be able to check if a value over 255 was entered. That may or may not be a problem.
This is a subjective issue but I'll give you my take.
Personally if there isn't type designated to the operation I am trying to carray out, IE std::size_t for sizes and index, uintXX_t for specific bit depths etc... then I default to unsigned unless I need to use negative values.
So it isn't a case of using it to enforce positive values, but rather I have to select signed feature explicitly.
As well as this I if you are worried about boundaries then you need to do your own bounds checking to ensure that you aren't overflowing.
But I said, more often then not your datatype will be decided by your context with the return type of the functions you apply it to.