Converting Integer Types - C++

How does one convert from one integer type to another safely and without setting off alarm bells in compilers and static analysis tools?
Different compilers will warn for something like:
int i = get_int();
size_t s = i;
for loss of signedness or
size_t s = get_size();
int i = s;
for narrowing.
Casting can remove the warnings but doesn't solve the safety issue.
Is there a proper way of doing this?

You can try boost::numeric_cast<>.
boost::numeric_cast returns the result of converting a value of type Source to a value of type Target. If an out-of-range condition is detected, an exception is thrown (see bad_numeric_cast, negative_overflow and positive_overflow).
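A minimal sketch of how that looks in practice; the value here is made up to force an overflow on a platform with a 32-bit int:
#include <boost/numeric/conversion/cast.hpp>
#include <cstddef>
#include <iostream>
int main() {
    std::size_t s = 3000000000u;  // too large for a 32-bit int
    try {
        int i = boost::numeric_cast<int>(s);  // throws if out of range
        std::cout << i << '\n';
    } catch (const boost::numeric::bad_numeric_cast& e) {
        std::cout << "conversion failed: " << e.what() << '\n';
    }
}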

How does one convert from one integer type to another safely and without setting off alarm bells in compilers and static analysis tools?
Control when conversion is needed. Where possible, only convert when there is no value change. Sometimes one must step back and code at a higher level. In other words: was a lossy conversion really needed, or can the code be re-worked to avoid the conversion loss?
It is not hard to add an if(). The test just needs to be carefully formed.
Example where a size_t n and an int len need a compare. Note that the positive range of int may exceed that of size_t, or vice versa, or they may be the same. Note that in this case the conversion of int to unsigned only happens with non-negative values, so there is no value change.
int len = snprintf(buf, n, ...);
if (len < 0 || (unsigned)len >= n) {
// Handle_error();
}
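For completeness, a self-contained version of that pattern; the buffer size and format arguments are made up for illustration:
#include <cstdio>
#include <cstddef>
int main() {
    char buf[8];
    const std::size_t n = sizeof buf;
    // Hypothetical format and argument, just to make the pattern concrete.
    int len = std::snprintf(buf, n, "%d", 123456789);  // needs 10 bytes
    if (len < 0 || (unsigned)len >= n) {
        return 1;  // encoding error or truncation detected
    }
    return 0;
}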
An unsigned-to-int example, for when it is known that the unsigned value at this point in the code is less than or equal to INT_MAX:
unsigned n = ...
int i = n & INT_MAX;
Good analysis tools see that n & INT_MAX always converts into int without loss.

There is no built-in safe narrowing conversion between integer types in C++ or the standard library. You could implement one yourself, using Microsoft GSL as an example.
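For instance, here is a minimal sketch modeled on gsl::narrow; the name and exception type below are illustrative, not from any library:
#include <stdexcept>
struct narrowing_error : std::runtime_error {
    narrowing_error() : std::runtime_error("narrowing changed the value") {}
};
template <typename To, typename From>
To narrow(From value) {
    const To result = static_cast<To>(value);
    // The round-trip check catches truncation; the sign check catches a
    // negative value wrapping to a large unsigned one (and vice versa).
    if (static_cast<From>(result) != value ||
        (result < To{}) != (value < From{}))
        throw narrowing_error{};
    return result;
}
With this, int i = narrow<int>(s); (for some size_t s) throws instead of silently truncating.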

Theoretically, if you want perfect safety, you shouldn't be mixing types like this at all. (And you definitely shouldn't be using explicit casts to silence warnings, as you know.) If you've got values of type size_t, it's best to always carry them around in variables of type size_t.
There is one case where I do sometimes decide I can accept less than 100.000% perfect type safety, and that is when I assign sizeof's return value, which is a size_t, to an int. For any machine I am ever going to use, the only time this conversion might lose information is when sizeof returns a value greater than 2147483647. But I am content to assume that no single object in any of my programs will ever be that big. (In particular, I will unhesitatingly write things like printf("sizeof(int) = %d\n", (int)sizeof(int)), explicit cast and all. There is no possible way that the size of a type like int will not fit in an int!)
[Footnote: Yes, it's true, on a 16-bit machine the assumption is the rather less satisfying threshold that sizeof won't return a value greater than 32767. It's more likely that a single object might have a size like that, but probably not in a program that's running on a 16-bitter.]

Related

how does the short(vector.size()) command conversion work in C++?

I don't know any way to get the size of a vector other than the .size() member function, and it works very well, but it returns a value of type long long unsigned int, and while in many cases that's fine, I'm sure my program will never have a vector so big that it needs all that range; short int is more than enough.
I know that for today's computers those few unused bytes are irrelevant, but I don't like to leave these "loose ends", even if they are small, and when I was programming I came across some details that bothered me.
Look at these examples:
for(short int X = 0 ; X < Vector.size() ; X++){
}
compiling this, I receive this warning:
warning: comparison of integer expressions of different signedness: 'short int' and 'std::vector<unsigned char>::size_type' {aka 'long long unsigned int'} [-Wsign-compare]
this is because the .size() return type differs from the type of the short int I'm comparing against: "X" is a short int and Vector.size() returns a long long unsigned int, as expected. So if I do this:
for(size_t X = 0 ; X < Vector.size() ; X++){
}
the problem is gone, but by doing this I'm creating a long long unsigned int in the size_t variable, and .size() is returning another long long unsigned int, so my computer allocates two long long unsigned int variables. What do I do to get a simple short int? I don't need anything more than that; long long unsigned int is overkill, so I did this:
for(short int X = 0 ; X < short(Vector.size()) ; X++){
}
but... how is this working? short int X = 0 allocates a short int, nothing new, but what about short(Vector.size())? Is the computer allocating a long long unsigned int and converting it to a short int? Or is the compiler "changing" the return of the .size() function, making it naturally return a short int and, in this case, not allocating a long long unsigned int at all? I know compilers are responsible for optimizing code too. Is there any "problem" or "detail" when using this method? Since I rarely see anyone using it, what exactly is this short() doing in memory allocation? Where can I read more about it?
(thanks to everyone who responded)
Forget for a moment that this involves a for loop; that's important for the underlying code, but it's a distraction from what's going on with the conversion.
short X = Vector.size();
That line calls Vector.size(), which returns a value of type std::size_t. std::size_t is an unsigned type, large enough to hold the size of any object. So it could be unsigned long, or it could be unsigned long long. In any event, it's definitely not short. So the compiler has to convert that value to short, and that's what it does.
Most compilers these days don't trust you to understand what this actually does, so they warn you. (Yes, I'm rather opinionated about compilers that nag; that doesn't change the analysis here). So if you want to see that warning (i.e., you don't turn it off), you'll see it. If you want to write code that doesn't generate that warning, then you have to change the code to say "yes, I know, and I really mean it". You do that with a cast:
short X = short(Vector.size());
The cast tells the compiler to call Vector.size() and convert the resulting value to short. The code then assigns the result of that conversion to X. So, more briefly, in this case it tells the compiler that you want it to do exactly what it would have done without the cast. The difference is that because you wrote a cast, the compiler won't warn you that you might not know what you're doing.
Some folks prefer to write that cast with a static_cast:
short X = static_cast<short>(Vector.size());
That does the same thing: it tells the compiler to do the conversion to short and, again, the compiler won't warn you that you did it.
In the original for loop, a different conversion occurs:
X < Vector.size()
That bit of code calls Vector.size(), which still returns an unsigned type. In order to compare that value with X, the two sides of the < have to have the same type, and the rules for this kind of expression require that X gets promoted to std::size_t, i.e., that the value of X gets treated as an unsigned type. That's okay as long as the value isn't negative. If it's negative, the conversion to the unsigned type is okay, but it will produce results that probably aren't what was intended. Since we know that X is not negative here, the code works perfectly well.
But we're still in the territory of compiler nags: since X is signed, the compiler warns you that promoting it to an unsigned type might do something that you don't expect. Again, you know that that won't happen, but the compiler doesn't trust you. So you have to insist that you know what you're doing, and again, you do that with a cast:
X < short(Vector.size())
Just like before, that cast converts the result of calling Vector.size() to short. Now both sides of the < are the same type, so the < operation doesn't require a conversion from a signed to an unsigned type, so the compiler has nothing to complain about. There is still a conversion, because the rules say that values of type short get promoted to int in this expression, but don't worry about that for now.
Another possibility is to use an unsigned type for that loop index:
for (unsigned short X = 0; X < Vector.size(); ++X)
But the compiler might still insist on warning you that not all values of type std::size_t can fit in an unsigned short. So, again, you might need a cast. Or change the type of the index to match what the compiler thinks you need:
for (std::size_t X = 0; X < Vector.size(); ++X)
If I were to go this route, I would use unsigned int and if the compiler insisted on telling me that I don't know what I'm doing I'd yell at the compiler (which usually isn't helpful) and then I'd turn off that warning. There's really no point in using short here, because the loop index will always be converted to int (or unsigned int) wherever it's used. It will probably be in a register, so there is no space actually saved by storing it as a short.
Even better, as recommended in other answers, is to use a range-based for loop, which avoids managing that index:
for (auto& value: Vector) ...
In all cases, X has a storage duration of automatic, and the result of Vector.size() does not outlive the full expression where it is created.
I don't need anything more than this, long long unsigned int is overkill
Typically, automatic duration variables are "allocated" either on the stack, or as registers. In either case, there is no performance benefit to decreasing the allocation size, and there can be a performance penalty in narrowing and then widening values.
In the very common case where you are using X solely to index into Vector, you should strongly consider using a different kind of for:
for (auto & value : Vector) {
// replace Vector[X] with value in your loop body
}

What is the real advantage of using unsigned variables in C++? [duplicate]

This question already has answers here:
When to use unsigned values over signed ones?
So I understand that unsigned variables can only hold positive values and signed variables can hold negative and positive. However, it is unclear to me why someone would use unsigned variables. Isn't that risky? I mean, I would personally just stick with signed variables, just in case. Is there a memory/performance advantage in using unsigned variables?
Selecting the right kind of primitive data type for your particular problem is all about correctly expressing your intent. For example, the size of an array could just as well be stored in a signed type (as is the case in Java or C#), but why should it be? An array cannot have a negative size. Doing so anyway only confuses readers of your program and gives you no benefit.
There is no measurable performance gain from avoiding unsigned values; in fact it can even be dangerous to avoid them, since unsigned values can hold bigger positive numbers than signed values of the same memory size, thus risking a narrowing conversion when assigning, for example, array sizes (which are naturally unsigned) to a signed value of the same memory size:
// While unlikely, truncation can happen
int64_t x = sizeof(...);
// ~~~~^~~~~~~ uint64_t on my system
Those bugs are generally hard to track, but compilers have gotten better at warning you about committing them.
Naturally, you should be aware that using unsigned integers can indeed be dangerous in some cases. As an example, consider a simple for loop. We do not expect the value i to ever be negative, so we make the seemingly correct decision to use an unsigned type:
for(unsigned i = 5; i >= 0; --i)
{
}
But in this case the loop will never terminate, since the unsigned computation 0 - 1 (which happens here in the sixth iteration) produces a big positive value (this is called wrap-around), thus defeating the loop termination check.
This can, for example, be solved like this:
for(unsigned i = 5; (i+1) > 0; --i)
{
}
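Another common idiom, shown here as a sketch, is to perform the decrement inside the condition, so the test happens while i is still in range:
for (unsigned i = 6; i-- > 0; )
{
    // the body sees i = 5, 4, 3, 2, 1, 0, then the loop terminates
}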
But this should not deter you from using the right data type. Just exercise caution about things like value ranges and wrap around and you will be fine.
In conclusion, use the type that is most appropriate and seems to show your intent the best.
Unsigned is more appropriate if your value is actually a bit-field, and if you do bit manipulations.
If an operation on a signed type overflows, the behaviour is undefined.
The ranges of signed and unsigned numbers are different.
For example, on a 32 bit system the range of a signed integer would be between -2G and 2G - 1 (i.e. -2147483648 to 2147483647).
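A small sketch to print the actual ranges on your platform:
#include <iostream>
#include <limits>
int main() {
    std::cout << "int:      " << std::numeric_limits<int>::min()
              << " to " << std::numeric_limits<int>::max() << '\n';
    std::cout << "unsigned: 0 to "
              << std::numeric_limits<unsigned>::max() << '\n';
}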

Should I use a bit mask when truncating uint64_t to uint8_t[i]?

If I have a large int, say a uint64_t, and an array of uint8_t, e.g.:
uint64_t large = 12345678901234567890u;
uint8_t small[5];
and I want to copy the 8 least significant bits of the uint64_t into an element of the array of uint8_t, is it safe to just use:
small[3] = large;
or should I use a bit-mask:
small[3] = large & 255;
i.e. Is there any situation where the rest of the large int may somehow overflow into the other elements of the array?
It will most certainly not cause data to be processed incorrectly. However, some compilers may generate a warning message.
There are two options to avoid these.
You can cast your variable:
(uint8_t)large
Or you can disable the warning (with MSVC, the relevant truncation warning is C4244):
#pragma warning(disable:4244)
I would suggest casting the variable, because hiding compiler warnings will potentially keep you from spotting actual problems and is therefore not best practice.
This is perfectly safe:
small[3] = large;
and such a conversion is explicitly described in [conv.integral]:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
That is, these four statements all are guaranteed to end up with the same value in small[3]:
small[3] = large;
small[3] = large % 256;
small[3] = large & 255;
small[3] = static_cast<uint8_t>(large);
There's no functional reason to do the %, &, or cast yourself; though if you want to anyway, I would be surprised if the compiler didn't generate the same code for all four (gcc and clang do).
The one difference would be if you compile with something like -Wconversion, which would cause this to issue a warning (which can sometimes be beneficial). In that case, you'll want to do the cast.
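As a self-contained sanity check of that equivalence (12345678901234567890 mod 256 is 210, so every variant stores 210):
#include <cassert>
#include <cstdint>
int main() {
    uint64_t large = 12345678901234567890u;
    uint8_t small[5] = {};
    small[3] = static_cast<uint8_t>(large);
    assert(small[3] == 210);
    assert(small[3] == (large & 255));
    assert(small[3] == (large % 256));
    return 0;
}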

Detection of overflow

How to detect overflow of unsigned char variable in c++?
Unsigned numbers are always positive, in the range 0 to 2^n - 1 (where n is the number of bits in the type); if char is 8 bits, then unsigned char variables have values between 0 and 255, while signed chars have values between -128 and 127.
unsigned char Test = 260;
Because 260 is an integer literal, your compiler should emit a warning. How to handle that? Do not ignore compiler warnings (or use an alternative syntax to avoid the automatic conversion, or promote this warning to an error). Also note that integer literals are always non-negative: -1 is not an integer literal; it's the integer literal 1 with the unary operator - applied. For gcc I'd suggest using -Wstrict-overflow=2 (or more, according to your code policies) and possibly enabling -Werror=strict-overflow. For MS VC++ you may enable warning C4307 with /we4307, and /W14307 if you keep warnings at level 1 (you may also do it with a #pragma warning directive).
How to detect overflow of unsigned char variable in c++?
At compile time, compiler warnings are your friends, but what about at run time?
There is no portable way to do this (like, for example, checked in C#), and the best technique depends on which type of operation you want to monitor. For a simple assignment (made with values known at run time) you may write something like this:
int32_t bigNumber = 260;
uint8_t smallNumber = static_cast<uint8_t>(bigNumber);
if (static_cast<int32_t>(smallNumber) != bigNumber) {
// Overflow...
}
Alternatively, you may check before assigning:
int32_t bigNumber = 260;
if (bigNumber > UINT8_MAX) {
// Overflow
}
Note that you may also make the compiler's life easier by writing (after the assignment):
if (smallNumber != bigNumber) {
// Overflow
}
It works because the automatic promotions will convert smallNumber to bigNumber's type (unless you're performing a signed/unsigned comparison; in that case you should simply avoid this alternative).
If you need this often, you may write a small helper function to perform the conversion. For some ideas and possible implementations, if you're using the MS compiler, you may take a look at the SafeInt family of functions (note, however, that in that case assignment and casting won't throw).
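If C++20 is available, one way to sketch such a helper is with std::in_range from <utility>; the name checked_cast below is made up for illustration:
#include <optional>
#include <utility>  // std::in_range, C++20
template <typename To, typename From>
std::optional<To> checked_cast(From value) {
    if (!std::in_range<To>(value))
        return std::nullopt;  // value cannot be represented in To
    return static_cast<To>(value);
}
Then if (auto v = checked_cast<uint8_t>(bigNumber)) { /* use *v */ } else { /* overflow */ }.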
You can use braces to initialize your value to force a compile-time error (assuming you use C++11 or later):
unsigned char Test{260};
Brace-initialization doesn't allow narrowing conversions.
Of course, that still wouldn't allow sticking the value 260 into an unsigned char but it would draw attention to the attempt. You'd need a bigger data type, e.g., unsigned short, to represent 260.
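A quick sketch of the three cases:
unsigned char ok{200};      // fine: 200 fits in unsigned char
// unsigned char bad{260};  // ill-formed: narrowing conversion
unsigned short big{260};    // fine: 260 fits in unsigned short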

Will 'comparison between signed and unsigned integer expressions' ever actually result in errors?

Often an object I use will have (signed) int parameters (e.g. int iSize) which eventually store how large something should be. At the same time, I will often initialize them to -1 to signify that the object (etc) hasn't been setup / hasn't been filled / isn't ready for use.
I often end up with the warning comparison between signed and unsigned integer, when I do something like if( iSize >= someVector.size() ) { ... }.
Thus, I nominally don't want to be using an unsigned int. Are there any situations where this will lead to an error or unexpected behavior?
If not: what is the best way to handle this? If I use the compiler flag -Wno-sign-compare I could (hypothetically) miss a situation in which I should be using an unsigned int (or something like that). So should I just use a cast when comparing with an unsigned int, e.g. if( iSize >= (int)someVector.size() ) { ... }?
Yes, there are, and very subtle ones. If you are curious, you can check this interesting presentation by Stephan T. Lavavej about arithmetic conversion and a bug in Microsoft's implementation of STL which was caused just by signed vs unsigned comparison.
In general, the problem is due to the fact that, because of two's complement arithmetic, a very small negative integral value has the same bit representation as a very big unsigned integral value (e.g. -1 = 0xFFFF = 65535 for a 16-bit type).
In the specific case of checking size(), why not use type size_t for iSize in the first place? Unsigned values give you greater expressivity here; use them.
And if you do not want to declare iSize as size_t, just make it clear by using an explicit cast that you are aware of the nature of this comparison. The compiler is trying to do you a favor with those warnings and, as you correctly wrote, there might be situations where ignoring them would cause you a very bad headache.
Thus, if iSize is sometimes negative (and should be evaluated as less than all unsigned int values of size()), use the idiom:
if ((iSize < 0) || ((unsigned)iSize < somevector.size())) ...
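As a side note, if C++20 is available, the comparison helpers in <utility> handle the mixed-sign comparison correctly for you; a sketch with a hypothetical wrapper function:
#include <utility>  // std::cmp_greater_equal, C++20
#include <vector>
bool big_enough(int iSize, const std::vector<int>& v) {
    // A negative iSize compares as smaller than any size() value.
    return std::cmp_greater_equal(iSize, v.size());
}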