How to deal with “signed/unsigned mismatch” warnings (C4018, no loop)? - c++

There are similar questions here, but they relate to the particular case of loop indexes. This one is the more generic case: how do you deal with this warning when there are no loops?
How should the warning be handled in this simplified case, which outlines the problem?
int(-3) >= size_t(31)

It is a case-by-case basis, and you just have to use your knowledge of the problem you're solving to make that call. The goal is to avoid logic errors at runtime, not to stamp out warning messages.
Casting one or the other value to be the same as the other removes the warning, but won't prevent runtime problems due to logic errors.
e.g. if you had a large unsigned value and cast that to be signed, then it flips negative, which would mess up comparisons. Conversely, flipping -3 to unsigned will make it into a very large positive value, which would mess up the comparison you're trying. Sure, explicitly casting the values could avoid the messages, but those messages are warning you about possible unexpected behaviour from your program, which you need to consider carefully regarding the possible & likely values that these variables can take.

Cast at least one of the operands so that they are both of the same signedness.
Be careful when casting unsigned to signed: if the value is too large, you will get implementation defined behaviour.
Be careful when casting signed to unsigned - the behaviour when the original value is negative is precisely defined, but it may be surprising. If the expression is rewritten as size_t(-3) >= size_t(31) then it is always true.
Note that the cast to int in the example is pointless - the literal 3 must be of type int, and applying unary - to that will give an int result.
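A minimal sketch of comparing by mathematical value instead of picking a cast at each call site. The helper name value_greater_equal is an assumption made for illustration, not something from the answers above; C++20 provides std::cmp_greater_equal in <utility> for the same purpose.

```cpp
#include <cstddef>

// Hypothetical helper (the name is an assumption): compares a signed and an
// unsigned value by mathematical value rather than by converted bit pattern.
constexpr bool value_greater_equal(long long lhs, std::size_t rhs) {
    // A negative lhs can never be >= an unsigned rhs. Once lhs is known to be
    // non-negative, converting it to an unsigned type preserves its value.
    return lhs >= 0 && static_cast<unsigned long long>(lhs) >= rhs;
}

// The plain cast wraps -3 to a huge value, so this comparison is true.
static_assert(static_cast<std::size_t>(-3) >= std::size_t(31),
              "wrapped comparison is true");
// The value-based comparison gives the mathematically expected answer.
static_assert(!value_greater_equal(-3, 31), "-3 is not >= 31");
```

Whether such a helper is appropriate still depends on what a negative left-hand side should mean in your program.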

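To find out which value is actually bigger, use a signed arbitrary-precision type such as boost::multiprecision::cpp_int (from <boost/multiprecision/cpp_int.hpp>):
auto v1 = int(-3);
auto v2 = size_t(31);
boost::multiprecision::cpp_int(v1) >= boost::multiprecision::cpp_int(v2);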
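If you have access to the 80-bit x87 extended-precision registers (the 64-bit MMX registers alias them), consider storing both values there as signed values and comparing them; the 64-bit mantissa is wide enough to hold either operand exactly.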

Related

Why does Qt implement QFile::size() which returns a qint64 rather than quint64 [duplicate]

The question is clear.
I wonder why they even thought this would be handy, as clearly negative indices are unusable in the containers that would be used with them (see for example QList's docs).
I thought they wanted to allow that for some crazy form of indexing, but it seems unsupported?
It also generates a ton of (correct) compiler warnings about casting to and comparing of signed/unsigned types (on MSVC).
It just seems incompatible with the STL by design for some reason...
Although I am deeply sympathetic to Chris's line of reasoning, I will disagree here (at least in part, I am playing devil's advocate). There is nothing wrong with using unsigned types for sizes, and it can even be beneficial in some circumstances.
Chris's justification for signed size types is that they are naturally used as array indices, and you may want to do arithmetic on array indices, and that arithmetic may create temporary values that are negative.
That's fine, and unsigned arithmetic introduces no problem in doing so, as long as you make sure to interpret your values correctly when you do comparisons. Because the overflow behavior of unsigned integers is fully specified, temporary overflows into the negative range (or into huge positive numbers) do not introduce any error as long as they are corrected before a comparison is performed.
Sometimes, the overflow behavior is even desirable, as the overflow behavior of unsigned arithmetic makes certain range checks expressible as a single comparison that would require two comparisons otherwise. If I want to check if x is in the half-open range [a,b) (with a <= b) and all the values are unsigned, I can simply do:
if (x - a < b - a) {
}
That doesn't work with signed variables; such range checks are pretty common with sizes and array offsets.
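A small sketch of the single-comparison range check, written here for both the half-open range [a, b) and the closed range [a, b]. The function names are hypothetical, and both versions assume a <= b.

```cpp
// Half-open range [a, b): if x < a, then x - a wraps around to a huge value,
// which cannot be smaller than b - a, so the test fails as intended.
constexpr bool in_half_open_range(unsigned x, unsigned a, unsigned b) {
    return x - a < b - a;
}

// Closed range [a, b]: the same idea with <= instead of <.
constexpr bool in_closed_range(unsigned x, unsigned a, unsigned b) {
    return x - a <= b - a;
}

static_assert(in_half_open_range(3u, 3u, 8u), "lower bound is included");
static_assert(!in_half_open_range(8u, 3u, 8u), "upper bound is excluded");
static_assert(!in_half_open_range(2u, 3u, 8u), "values below a are rejected");
static_assert(in_closed_range(8u, 3u, 8u), "upper bound is included");
```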
I mentioned before that a benefit is that overflow arithmetic has defined results. If your index arithmetic overflows a signed type, the behavior is undefined; there is no way to make your program portable. Use an unsigned type and this problem goes away. Admittedly this only applies to huge offsets, but it is a concern for some uses.
Basically, the objections to unsigned types are frequently overstated. The real problem is that most programmers don't really think about the exact semantics of the code they write, and for small integer values, signed types behave more nearly in line with their intuition. However, data sizes grow pretty fast. When we deal with buffers or databases, we're frequently way outside of the range of "small", and signed overflow is far more problematic to handle correctly than is unsigned overflow. The solution is not "don't use unsigned types", it is "think carefully about the code you are writing, and make sure you understand it".
Because, realistically, you usually want to perform arithmetic on indices, which means that you might want to create temporaries that are negative.
This is clearly painful when the underlying indexing type is unsigned.
The only appropriate time to use unsigned numbers is with modulus arithmetic.
Using "unsigned" as some kind of contract specifier "a number in the range [0..." is just clumsy, and too coarse to be useful.
Consider: What type should I use to represent the idea that the number should be a positive integer between 1 and 10? Why is 0...2^x a more special range?

In C/C++, what's the minimum type up-casting required for mixed-type integer math?

I have code that depends on data that is a mixture of uint16_t, int32_t / uint32_t and int64_t values. It also includes some larger bit shifted constants (e.g., 1<<23, even 1<<33).
In calculation of a int64_t value, if I carefully cast each sub-part (e.g., up-casting uint16_t values to int64_t) it works - if I don't, the calculations often go awry.
I end up with code that looks like this:
int64_t sensDT = (int64_t)sensD2-(int64_t)promV[PROM_C5]*(int64_t)(1<<8);
temperatureC = (double)((2000+sensDT*(int64_t)promV[PROM_C6]/(1<<23))/100.0);
I wonder, though, if my sprinkling of type casts here is too cluttered and too generous. I'm not sure the 1<<8 requires the cast (while despite not having one, 1<<23 doesn't lead to erroneous calculations) but perhaps they do too. How much is too much when it comes to up-casting values for a calculation like this?
Edit: So it's clear, I'm asking what the minimum proper amount of casting is - what's necessary for correct functionality (one can add more casts or modifiers for clarity, but from the compiler's perspective what's necessary to ensure correct calculations?)
Edit2: I'm using C-style casts as this is from an Arduino-type embedded code base (which itself used that style of casts already). From the perspective of having the desired effect they appear to be equivalent, thus I used the existing coding style.
Generally you can rely on the usual arithmetic conversions to give you the correct operation, as long as one of the operands of each operator has the correct size. So your first example could be simplified:
int64_t sensDT = sensD2-(int64_t)promV[PROM_C5]*(1<<8);
Be careful to consider the precedence rules to know what order the operators will be applied!
You might run into trouble if you're mixing signed and unsigned types of the same size, although either should promote to a larger signed type.
You need to be careful with constants, because without any decoration those will be the default integer size and signed. 1<<8 won't be a problem, but 1<<35 probably will; you need 1LL<<35.
When in doubt, a few extra casts or parentheses won't hurt.
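The rule above can be sketched as follows. The function and parameter names are hypothetical stand-ins for the question's sensD2 and promV[PROM_C5], and the "no cast" variant only misbehaves on platforms where int is 32 bits or narrower, which is assumed here.

```cpp
#include <cstdint>

// One int64_t operand per operator is enough: the multiplication is done in
// int64_t because of the cast, and the subtraction then converts d2 as well.
constexpr std::int64_t sens_dt(std::uint32_t d2, std::uint16_t c5) {
    return d2 - static_cast<std::int64_t>(c5) * (1 << 8);
}

// No cast: c5 * (1 << 8) is computed as int, and the subtraction is then done
// in the 32-bit unsigned type of d2, so a negative result wraps around before
// it is widened to int64_t (assuming 32-bit int).
constexpr std::int64_t sens_dt_no_cast(std::uint32_t d2, std::uint16_t c5) {
    return d2 - c5 * (1 << 8);
}

// A shift count of 33 does not fit a 32-bit int; start from a 64-bit 1.
constexpr std::int64_t one_shl_33() {
    return std::int64_t{1} << 33;
}

static_assert(sens_dt(100, 1) == -156, "100 - 1 * 256");
static_assert(one_shl_33() == 8589934592LL, "2^33");
```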

Is this an unavoidable signed and unsigned integer comparison?

Probably not, but I can't think of a good solution. I'm no expert in C++ yet.
Recently I've converted a lot of ints to unsigned ints in a project. Basically everything that should never be negative is made unsigned. This removed a lot of these warnings by MinGW:
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
I love it. It makes the program more robust and the code more descriptive. However, there is one place where they still occur. It looks like this:
unsigned int subroutine_point_size = settings->get<unsigned int>("subroutine_point_size");
...
for(int dx = -subroutine_point_size;dx <= subroutine_point_size;dx++) //Fill pixels in the point's radius.
{
for(int dy = -subroutine_point_size;dy <= subroutine_point_size;dy++)
{
//Do something with dx and dy here.
}
}
In this case I can't make dx and dy unsigned. They start out negative and depend on comparing which is lesser or greater.
I don't like to make subroutine_point_size signed either, though this is the lesser evil. It indicates a size of a kernel in a pass over an image, and the kernel size can't be negative (it's probably unwise for a user ever to set this kernel size to anything more than 100 but the settings file allows for numbers up to 2^32 - 1).
So it seems there is no way to cast any of the variables to fix this. Is there a way to get rid of this warning and solve this neatly?
We're using C++11, compiling with GCC for Windows, Mac and various Unix distributions.
Cast the variables to a wider signed type such as long long int, which covers the whole range of unsigned int (0..2^32-1) and also has a sign. (long int is only wide enough on platforms where it is 64 bits.)
You're making a big mistake.
Basically you like the name "unsigned" and you intend it to mean "not negative", but that is not the semantics associated with the type.
Consider the statement:
adding a signed integer and an unsigned integer the result is unsigned
Clearly it makes no sense if you consider the term "unsigned" as "not negative", yet this is what the language does: adding -3 to the unsigned value 2 you will get a huge nonsense number instead of the correct answer -1.
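A minimal sketch of that statement; the function names are hypothetical. The explicit cast spells out the conversion the language applies implicitly when an int meets an unsigned operand.

```cpp
#include <limits>

// The int operand is converted to unsigned before the addition, so the sum is
// computed modulo 2^N, where N is the number of bits in unsigned.
constexpr unsigned add_mixed(unsigned u, int s) {
    return u + static_cast<unsigned>(s);
}

// The all-signed version yields the arithmetic result.
constexpr int add_signed(int a, int b) {
    return a + b;
}

static_assert(add_mixed(2u, -3) == std::numeric_limits<unsigned>::max(),
              "2 + (-3) wraps to the largest unsigned value");
static_assert(add_signed(2, -3) == -1, "signed arithmetic gives -1");
```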
Indeed the choice of using an unsigned type for the size of containers is a design mistake of C++, a mistake that is too late to fix now because of backward compatibility. By the way the reason it happened has nothing to do with "non-negativeness", but just with the ability to use the 16th bit when computers were that small (i.e. being able to use 65535 elements instead of 32767). Even back then I don't think the price of wrong semantic was worth the gain (if 32767 is not enough now then 65535 won't be enough quite soon anyway).
Do not repeat the same mistake in your programs... the name is irrelevant; what counts is the semantics, and for unsigned in C and C++ that is "member of the ring Z_n of integers modulo n, with n = 2^k".
You don't want the size of a container to be the member of a modulo ring. Do you?
Instead of the current
for(int dx = -subroutine_point_size;dx <= subroutine_point_size;dx++) //Fill pixels in the point's radius.
you can do this:
for(int dx = -int(subroutine_point_size);dx <= int(subroutine_point_size);dx++) //Fill pixels in the point's radius.
where the first int cast is technically redundant (1), but is there for consistency, because the second cast removes the signed/unsigned warning that presumably is the issue here.
However, I strongly advise you to undo the work of converting signed to unsigned types everywhere. A good rule of thumb is to use signed types for numbers, and unsigned types for bit level stuff. That avoids the problems with wrap-around due to implicit conversions, where e.g. std::string("Bah").length() < -5 is guaranteed (very silly), and because it does away with actual problems, it also reduces spurious warnings.
Note that you can just define a suitable name, where you want to indicate that some value will never be negative.
1) Technically redundant in practice, for two's complement representation of signed integers, with no trapping inserted by the compiler. As far as I know no extant C++ compiler behaves otherwise.
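A sketch of the loop with the radius converted to int once, next to a version that spells out what the mixed comparison does. The function names are hypothetical, and the conversion assumes the radius fits in an int.

```cpp
// Counts the cells visited when the radius is converted to int up front.
inline int count_kernel_cells(unsigned radius) {
    const int r = static_cast<int>(radius);  // assumes radius <= INT_MAX
    int cells = 0;
    for (int dx = -r; dx <= r; ++dx)
        for (int dy = -r; dy <= r; ++dy)
            ++cells;
    return cells;
}

// Makes explicit the conversion that `dx <= radius` performs implicitly when
// dx is int and radius is unsigned: a negative dx becomes a huge unsigned
// value, so for any radius > 0 the loop body is never entered.
inline int count_kernel_cells_mixed(unsigned radius) {
    int cells = 0;
    for (int dx = -static_cast<int>(radius);
         static_cast<unsigned>(dx) <= radius; ++dx)
        for (int dy = -static_cast<int>(radius);
             static_cast<unsigned>(dy) <= radius; ++dy)
            ++cells;
    return cells;
}
```

With a radius of 1, the first function visits 9 cells and the second visits none, which suggests the warning in the question points at a real logic error rather than a cosmetic one.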
Firstly, without knowing the range of values that will be stored in the variables, your claim that changing signed variables to unsigned makes the program more robust is unsubstantiated - there are circumstances where that claim is false.
Second, the compiler is not issuing a warning only as a result of changing variables (and I assume calls of template functions like settings.get()) to be unsigned. It is warning about the fact you have expressions involving both signed and unsigned variables. Compilers typically issue warnings about such expressions because - in practice - they are more likely to indicate a programming error or to potentially involve some behaviour that the programmer may not have anticipated (e.g. instances of undefined behaviour, expressions where a negative result is expected but a large positive result is what will occur, etc).
A rule of thumb is that, if you need to have expressions involving both signed and unsigned types, you are better off making all the relevant variables signed. While there are exceptions where that rule of thumb isn't needed, you wouldn't have asked this question if you understood how to decide that.
On that basis, I suggest the most appropriate action is to unwind your changes.

Is using `int` preferable to using `unsigned int`? [duplicate]

Should one ever declare a variable as an unsigned int if they don't require the extra range of values? For example, when declaring the variable in a for loop, if you know it's not going to be negative, does it matter? Is one faster than the other? Is it bad to declare an unsigned int just as unsigned in C++?
To reiterate, should it be done even if the extra range is not required? I heard they should be avoided because they cause confusion (IIRC that's why Java doesn't have them).
The reason to use uints is that it gives the compiler a wider variety of optimizations. For example, it may replace an instance of 'abs(x)' with 'x' if it knows that x is positive. It also opens up a variety of bitwise 'strength reductions' that only work for positive numbers. If you always multiply or divide an int by a power of two, then the compiler may replace the operation with a bit shift (i.e. x*8 == x<<3, x/8 == x>>3), which tends to perform much faster. Unfortunately, for division this relation only holds if 'x' is non-negative, because signed division rounds toward zero while an arithmetic right shift rounds toward negative infinity. With ints, the compiler may apply this trick if it can prove that the value is always non-negative (or can be modified earlier in the code to be so). In the case of uints, this attribute is trivial to prove, which greatly increases the odds of it being applied.
Another example might be the equation y = 16 * x + 12. If x can be negative, then a multiply and add would be required. Yet if x is always positive, then not only can the x*16 term be replaced with x<<4, but since the term would always end with four zeros this opens up replacing the '+ 12' with a binary OR (as long as the '12' term is less than 16). The result would be y = (x<<4) | 12.
In general, the 'unsigned' qualifier gives the compiler more information about the variable, which in turn allows it to squeeze in more optimizations.
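The arithmetic identity behind the y = 16 * x + 12 example can be checked directly for unsigned values. This sketch only shows that the two forms agree; whether a given compiler actually performs the rewrite is not something it demonstrates. The function names are hypothetical.

```cpp
#include <limits>

// Multiply-and-add form.
constexpr unsigned with_multiply(unsigned x) {
    return 16u * x + 12u;
}

// Shift-and-OR form: x << 4 always has its low four bits clear, so OR-ing in
// a constant below 16 has the same effect as adding it.
constexpr unsigned with_shift_or(unsigned x) {
    return (x << 4) | 12u;
}

static_assert(with_multiply(5u) == 92u, "16 * 5 + 12");
static_assert(with_shift_or(5u) == 92u, "(5 << 4) | 12");
// Both forms also agree when the multiplication wraps around.
static_assert(with_multiply(std::numeric_limits<unsigned>::max()) ==
              with_shift_or(std::numeric_limits<unsigned>::max()),
              "identity holds modulo 2^N");
```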
You should use unsigned integers when it doesn't make sense for them to have negative values. This is completely independent of the range issue. So yes, you should use unsigned integer types even if the extra range is not required, and no, you shouldn't use unsigned ints (or anything else) if not necessary, but you need to revise your definition of what is necessary.
More often than not, you should use unsigned integers.
They are more predictable in terms of undefined behavior on overflow and such.
This is a huge subject of its own, so I won't say much more about it.
It's a very good reason to avoid signed integers unless you actually need signed values.
Also, they are easier to work with when range-checking -- you don't have to check for negative values.
Typical rules of thumb:
If you are writing a forward for loop with an index as the control variable, you almost always want unsigned integers. In fact, you almost always want size_t.
If you're writing a reverse for loop with an index as the control variable, you should probably use signed integers, for obvious reasons. Probably ptrdiff_t would do.
The one thing to be careful with is when casting between signed and unsigned values of different sizes.
You probably want to double-check (or triple-check) to make sure the cast is working the way you expect.
int is the general purpose integer type. If you need an integer, and int meets your requirements (range [-32767,32767]), then use it.
If you have more specialized purposes, then you can choose something else. If you need an index into an array, then use size_t. If you need an index into a vector, then use std::vector<T>::size_type. If you need specific sizes, then pick something from <cstdint>. If you need something larger than 64 bits, then find a library like gmp.
I can't think of any good reasons to use unsigned int. At least, not directly (size_t and some of the specifically sized types from <cstdint> may be typedefs of unsigned int).
The problem with the systematic use of unsigned when values can't be negative isn't that Java doesn't have unsigned, it is that expressions with unsigned values, especially when mixed with signed one, give sometimes confusing results if you think about unsigned as an integer type with a shifted range. Unsigned is a modular type, not a restriction of integers to positive or zero.
Thus the traditional view is that unsigned should be used when you need a modular type or for bitwise manipulation. That view is implicit in K&R — look how int and unsigned are used —, and more explicit in TC++PL (2nd edition, p. 50):
The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules.
In almost all architectures the cost of signed operation and unsigned operation is the same. So efficiency-wise you won't get any advantage from using unsigned over signed. But as you pointed out, if you use unsigned you will have a bigger range.
Even if you have variables that should only take non-negative values, unsigned can be a problem. Here is an example. Suppose a programmer is asked to write code to print all pairs of integers (a,b) with 0 <= a < b <= n, where n is a given input. An incorrect version is
for (unsigned b = 0; b <= n; b++)
for (unsigned a = 0; a <= b-1; a++) // when b == 0, b-1 wraps around to the largest unsigned value
cout << a << ',' << b << '\n';
This is easy to correct, but thinking with unsigned is a bit less natural than thinking with int.
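One possible correction of the loop above, as a sketch: compare a < b directly so that b - 1 is never computed. The function name is hypothetical, and n is assumed to be smaller than the largest unsigned value so that b <= n can eventually become false.

```cpp
#include <utility>
#include <vector>

// Collects all pairs (a, b) with 0 <= a < b <= n instead of printing them.
inline std::vector<std::pair<unsigned, unsigned>> ordered_pairs(unsigned n) {
    std::vector<std::pair<unsigned, unsigned>> pairs;
    for (unsigned b = 0; b <= n; ++b)
        for (unsigned a = 0; a < b; ++a)  // a < b avoids the wrapping b - 1
            pairs.emplace_back(a, b);
    return pairs;
}
```

For n = 2 this yields (0,1), (0,2) and (1,2), and for n = 0 it yields nothing.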

Is it a best practice to use unsigned data types to enforce non-negative and/or valid values?

Recently, during a refactoring session, I was looking over some code I wrote and noticed several things:
I had functions that used unsigned char to enforce values in the interval [0-255].
Other functions used int or long data types with if statements inside the functions to silently clamp the values to valid ranges.
Values contained in classes and/or declared as arguments to functions that had an unknown upper bound but a known and definite non-negative lower bound were declared as an unsigned data type (int or long depending on the possibility that the upper bound went above 4,000,000,000).
The inconsistency is unnerving. Is this a good practice that I should continue? Should I rethink the logic and stick to using int or long with appropriate non-notifying clamping?
A note on the use of "appropriate": There are cases where I use signed data types and throw notifying exceptions when the values go out of range but these are reserved for divide by zero and constructors.
In C and C++, signed and unsigned integer types have certain specific characteristics.
Signed types have bounds far from zero, and operations that exceed those bounds have undefined behavior (or implementation-defined in the case of conversions).
Unsigned types have a lower bound of zero and an upper bound far from zero, and operations that exceed those bounds quietly wrap around.
Often what you really want is a particular range of values with some particular behavior when operations exceed those bounds (saturation, signaling an error, etc.). Neither signed nor unsigned types are entirely suitable for such requirements. And operations that mix signed and unsigned types can be confusing; the rules for such operations are defined by the language, but they're not always obvious.
Unsigned types can be problematic because the lower bound is zero, so operations with reasonable values (nowhere near the upper bound) can behave in unexpected ways. For example, this:
for (unsigned int u = 10; u >= 0; u --) {
// ...
}
is an infinite loop.
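For comparison, a sketch of a countdown that does terminate with an unsigned counter, by testing for zero inside the body instead of relying on u >= 0. The function name is hypothetical.

```cpp
#include <vector>

// Visits start, start - 1, ..., 0 and returns the visited values in order.
inline std::vector<unsigned> count_down_from(unsigned start) {
    std::vector<unsigned> visited;
    for (unsigned u = start; ; --u) {
        visited.push_back(u);
        if (u == 0) break;  // stop before --u would wrap around
    }
    return visited;
}
```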
One approach is to use signed types for everything that doesn't absolutely require an unsigned representation, choosing a type wide enough to hold the values you need. This avoids problems with signed/unsigned mixed operations. Java, for example, enforces this approach by not having unsigned types at all. (Personally, I think that decision was overkill, but I can see the advantages of it.)
Another approach is to use unsigned types for values that logically cannot be negative, and be very careful with expressions that might underflow or that mix signed and unsigned types.
(Yet another is to define your own types with exactly the behavior you want, but that has costs.)
As John Sallay's answer says, consistency is probably more important than which particular approach you take.
I wish I could give a "this way is right, that way is wrong" answer, but there really isn't one.
The biggest benefit from unsigned is that it documents in your code that the values are never negative.
It doesn't really buy you any safety as going outside the range of an unsigned is usually unintentional and can cause just as much frustration as if it were signed.
I had functions that used unsigned char to enforce values in the interval [0-255].
If you're relying on the wraparound then use uint8_t as unsigned char could possibly be more than 8 bits.
Other functions used int or long data types with if statements inside the functions to silently clamp the values to valid ranges.
Is this really the correct behavior?
Values contained in classes and/or declared as arguments to functions that had an unknown upper bound but a known and definite non-negative lower bound were declared as an unsigned data type (int or long depending on the possibility that the upper bound went above 4,000,000,000).
Where did you get an upper bound of 4,000,000,000 from? Your bound is between INT_MIN and INT_MAX (you can also use std::numeric_limits). In C++11 you can use decltype to specify the type, which you can wrap into a template/macro:
decltype(4000000000) x; // x can hold at least 4000000000
I would probably argue that consistency is most important. If you pick one way and do it right then it will be easy for someone else to understand what you are doing at a later point in time. On the note of doing it right, there are several issues to think about.
First, it is common when checking if an integer variable n is in a valid range, say 0 to N, to write:
if ( n >= 0 && n <= N ) ...
This comparison only makes sense if n is signed. If n is unsigned then it will never be less than 0 since negative values will wrap around. You could rewrite the above if as just:
if ( n <= N ) ...
If someone isn't used to seeing this, they might be confused and think you did it wrong.
Second, I would keep in mind that there is no guarantee of type size for integers in C++. Thus, if you want something to be bounded by 255, an unsigned char may not do the trick. If the variable has a specific meaning then it may be valuable to use a typedef to show that. For example, size_t is a value as wide as a memory address, which means that you can use it with arrays and not have to worry about being on 32 or 64 bit machines. I try to use such typedefs whenever possible because they clearly communicate why I am using the type. (size_t because I'm accessing an array.)
Third, is back on the issue of wrap around. What do you want to happen with an invalid number. In the case of an unsigned char, if you use the type to bound the data, then you won't be able to check if a value over 255 was entered. That may or may not be a problem.
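A sketch of the difference between bounding by type and bounding by an explicit check. The function names are hypothetical, and the exception type is just one possible way to signal an invalid value.

```cpp
#include <cstdint>
#include <stdexcept>

// Relying on the type alone: out-of-range input silently wraps modulo 256,
// so 256 becomes 0 and -1 becomes 255.
inline std::uint8_t unchecked_to_byte(int value) {
    return static_cast<std::uint8_t>(value);
}

// Checking the wider value first keeps the error visible to the caller.
inline std::uint8_t checked_to_byte(int value) {
    if (value < 0 || value > 255)
        throw std::out_of_range("value outside [0, 255]");
    return static_cast<std::uint8_t>(value);
}
```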
This is a subjective issue but I'll give you my take.
Personally, if there isn't a type designated for the operation I am trying to carry out (i.e. std::size_t for sizes and indexes, uintXX_t for specific bit depths, etc.), then I default to unsigned unless I need to use negative values.
So it isn't a case of using it to enforce positive values, but rather that I have to select the signed feature explicitly.
As well as this, if you are worried about boundaries then you need to do your own bounds checking to ensure that you aren't overflowing.
But as I said, more often than not your datatype will be decided by your context, via the return type of the functions you apply it to.