How can I safely convert `unsigned long int` to `int`? - c++

I have an app which is creating unique ids in the form of unsigned long ints. The app needs this precision.
However, I have to send these ids in a protocol that only allows for ints. The receiving application (of the protocol) does not need this precision. So my question is: how can I convert an unsigned long int to an int, especially when the unsigned long int is larger than an int?
edit:
The protocol only supports int. It would be good to know how to avoid "roll-over problems".
The application sending the message needs to know the uniqueness for a long period of time, whereas the receiver needs to know the uniqueness only over a short period of time.

Here's one possible approach:
#include <climits>
unsigned long int uid = ...;
int abbreviated_uid = uid & INT_MAX;
If int is 32 bits, for example, this discards all but the low-order 31 bits of the UID. It will only yield non-negative values.
This loses information from the original uid, but you indicated that that's not a problem.
But your question is vague enough that it's hard to tell whether this will suit your purposes.

Boost has numeric_cast:
unsigned long l = ...;
int i = boost::numeric_cast<int>(l);
This will throw an exception if the conversion would overflow, which may or may not be what you want.

Keith Thompson's "& INT_MAX" is only necessary if you need to ensure that abbreviated_uid is non-negative. If that's not an issue, and you can tolerate negative IDs, then a simple cast (C-style or static_cast()) should suffice, with the benefit that if sizeof(unsigned long int)==sizeof(int), then the binary representation will be the same on both ends (and if you cast it back to unsigned long int on the receiving end it will be the same value as on the sending end).
Does the receiver send responses back to the sender regarding the IDs, and does the original sender (now the receiver of the response) need to match this up with the original unsigned long int ID? If so, you'll need some additional logic to match up the response with the original ID. If so, post an edit indicating such requirement and I (or others) can suggest ways of addressing that issue. One possible solution to that issue would be to break up the ID into multiple int pieces and reconstruct it into the exact same unsigned long int value on the other end. If you need help with that, I or someone else can help with that.
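The "break it into multiple int pieces" idea above could be sketched roughly as follows. This is only a minimal sketch, assuming the ID fits in 64 bits and that the protocol can carry two 32-bit signed fields; IdPieces, split_id and join_id are hypothetical names, not part of any protocol or library.

```cpp
#include <cstdint>

// Hypothetical helper type: two 32-bit signed fields that together carry one ID.
struct IdPieces {
    std::int32_t high;
    std::int32_t low;
};

// Split a 64-bit ID into two halves. Each half goes through uint32_t first,
// so only the final unsigned -> signed step can reinterpret the value
// (implementation-defined before C++20, modular since C++20).
inline IdPieces split_id(std::uint64_t id) {
    const std::uint32_t hi = static_cast<std::uint32_t>(id >> 32);
    const std::uint32_t lo = static_cast<std::uint32_t>(id & 0xFFFFFFFFu);
    return IdPieces{static_cast<std::int32_t>(hi), static_cast<std::int32_t>(lo)};
}

// Rebuild the original ID. Signed -> unsigned conversion is always modular,
// so negative pieces map back to their original bit patterns.
inline std::uint64_t join_id(IdPieces p) {
    const std::uint64_t hi = static_cast<std::uint32_t>(p.high);
    const std::uint64_t lo = static_cast<std::uint32_t>(p.low);
    return (hi << 32) | lo;
}
```

The sender would transmit both fields and the receiver would call join_id to get the full value back; either piece may be negative on the wire.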

As you know, one cannot in theory safely convert an unsigned long int to an int in the general case. However, one can indeed do so in many practical cases of interest, in which the integer is not too large.
I would probably define and use this:
struct Exc_out_of_range {};
int make_int(const unsigned long int a) {
const int n = static_cast<int>(a);
const unsigned long int a2 = static_cast<unsigned long int>(n);
if (n < 0 || a2 != a) throw Exc_out_of_range(); // n < 0 catches values that wrap to a negative int yet still round-trip
return n;
}
An equivalent solution using the <limits> header naturally is possible, but I don't know that it is any better than the above. (If the code is in a time-critical loop and portability is not a factor, then you could code it in assembly, testing the bit or bits of interest directly, but except as an exercise in assembly language this would be a bother.)
Regarding performance, it is worth noting that -- unless your compiler is very old -- the throw imposes no runtime burden unless used.
@GManNickG adds the advice to inherit from std::exception. I personally don't have a strong feeling about this, but the advice is well founded and appreciated, and I see little reason not to follow it. You can read more about such inheritance here.
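For comparison, the <limits>-based variant mentioned above might look something like this. It is a sketch rather than a definitive implementation; make_int_checked is a hypothetical name, and std::out_of_range is used because it already derives from std::exception.

```cpp
#include <limits>
#include <stdexcept>

// Range-checked conversion: compare against the largest int before casting,
// so the cast itself never sees an out-of-range value.
inline int make_int_checked(unsigned long a) {
    const unsigned long max_int =
        static_cast<unsigned long>(std::numeric_limits<int>::max());
    if (a > max_int) {
        throw std::out_of_range("value does not fit in int");
    }
    return static_cast<int>(a);
}
```

Because the comparison happens before the cast, this version does not depend on how the implementation converts out-of-range values.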

I came across this because I needed a solution for converting larger integer types to smaller types, even when potentially losing information.
I came up with a pretty neat solution using templates:
template<typename Tout, typename Tin>
Tout toInt(Tin in)
{
Tout retVal = 0;
if (in > 0)
retVal = static_cast<Tout>(in & std::numeric_limits<Tout>::max());
else if (in < 0)
retVal = static_cast<Tout>(in | std::numeric_limits<Tout>::min());
return retVal;
}

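You can try to use std::stringstream and atoi(). Note that atoi() has undefined behavior when the value is out of range for int, so this is only safe if the value is known to fit:
#include <sstream>
#include <string>
#include <stdlib.h>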
unsigned long int a = ...;
std::stringstream ss;
ss << a;
std::string str = ss.str();
int i = atoi(str.c_str());

Related

Converting Integer Types

How does one convert from one integer type to another safely and without setting off alarm bells in compilers and static analysis tools?
Different compilers will warn for something like:
int i = get_int();
size_t s = i;
for loss of signedness or
size_t s = get_size();
int i = s;
for narrowing.
Casting can remove the warnings but doesn't solve the safety issue.
Is there a proper way of doing this?
You can try boost::numeric_cast<>.
boost numeric_cast returns the result of converting a value of type Source to a value of type Target. If out-of-range is detected, an exception is thrown (see bad_numeric_cast, negative_overflow and positive_overflow ).
How does one convert from one integer type to another safely and without setting off alarm bells in compilers and static analysis tools?
Control when conversion is needed. Where possible, only convert when there is no value change. Sometimes one must step back and code at a higher level. In other words: was a lossy conversion really needed, or can the code be reworked to avoid conversion loss?
It is not hard to add an if(). The test just needs to be carefully formed.
Example where size_t n and int len need to be compared. Note that the positive range of int may exceed that of size_t, or vice versa, or be the same. In this case the conversion of int to unsigned only happens for non-negative values, so there is no value change.
int len = snprintf(buf, n, ...);
if (len < 0 || (unsigned)len >= n) {
// Handle_error();
}
unsigned to int example when it is known that the unsigned value at this point of code is less than or equal to INT_MAX.
unsigned n = ...
int i = n & INT_MAX;
Good analysis tools see that n & INT_MAX always converts into int without loss.
There is no built-in safe narrowing conversion between int types in c++ and STL. You could implement it yourself using as an example Microsoft GSL.
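A hand-rolled checked narrowing in the spirit of GSL's narrow could be sketched like this. It is not GSL's actual code; checked_narrow is a hypothetical name, and the choice of std::range_error is an assumption made for this example.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical checked narrowing for integer types: throws if the value
// would change during the conversion.
template <typename To, typename From>
To checked_narrow(From value) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "integer types only");
    const To result = static_cast<To>(value);
    // 1) The value must survive a round trip back to the source type.
    const bool round_trips = static_cast<From>(result) == value;
    // 2) The sign must not flip (catches e.g. -1 -> UINT_MAX).
    const bool same_sign = (result < To{}) == (value < From{});
    if (!round_trips || !same_sign) {
        throw std::range_error("checked_narrow: value changed");
    }
    return result;
}
```

A call such as checked_narrow<int>(some_size) then either returns the same value as an int or throws, instead of silently truncating.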
Theoretically, if you want perfect safety, you shouldn't be mixing types like this at all. (And you definitely shouldn't be using explicit casts to silence warnings, as you know.) If you've got values of type size_t, it's best to always carry them around in variables of type size_t.
There is one case where I do sometimes decide I can accept less than 100.000% perfect type safety, and that is when I assign sizeof's return value, which is a size_t, to an int. For any machine I am ever going to use, the only time this conversion might lose information is when sizeof returns a value greater than 2147483647. But I am content to assume that no single object in any of my programs will ever be that big. (In particular, I will unhesitatingly write things like printf("sizeof(int) = %d\n", (int)sizeof(int)), explicit cast and all. There is no possible way that the size of a type like int will not fit in an int!)
[Footnote: Yes, it's true, on a 16-bit machine the assumption is the rather less satisfying threshold that sizeof won't return a value greater than 32767. It's more likely that a single object might have a size like that, but probably not in a program that's running on a 16-bitter.]

Is there a way to initialize a char using bits?

I'm trying to represent the 52 cards in a deck of playing cards.
I need a total of 6 bits; 2 for the suit and 4 for the rank.
I thought I would use a char and have the first 2 bits be zero since I don't need them. The problem is I don't know if there's a way to initialize a char using bits.
For example, what I'd like to do is:
char aceOfSpades = 00000000;
char queenOfHearts = 00011101;
I know once I've initialized char I can manipulate the bits but it would be easier if I could initialize it from the beginning as shown in my example. Thanks in advance!
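Yes, you can, using binary literals (standard since C++14, and available as a GCC/Clang extension before that). For example: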
char aceOfSpades = 0b00000000;
char queenOfHearts = 0b00011101;
The easier way, as Captain Oblivious said in comments, is to use a bit field
struct SixBits
{
unsigned int suit : 2;
unsigned int rank : 4;
};
int main()
{
struct SixBits card;
card.suit = 0; /* You need to specify what the values mean */
card.rank = 10;
}
You could try using various bit fiddling operations on a char, but that is more difficult to work with. There is also a potential problem that it is implementation-defined whether char is signed or unsigned - and, if it is signed, bitfiddling operations give undefined behaviour in some circumstances (e.g. if operating on a negative value).
Personally, I wouldn't bother with trying to pack everything into a char. I'd make the code comprehensible (e.g. use an enum to represent the suit, an int to represent rank) unless there is demonstrable need (e.g. trying to get the program to work on a machine with extremely limited memory, which is unlikely in practice with hardware less than 20 years old). Otherwise, all you are really achieving is code that is hard to maintain with few real-world advantages.
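To make the enum-plus-int suggestion concrete, here is a rough sketch that also shows how such a card could be packed into six bits if that ever became necessary. The suit ordering, the rank numbering (1 for ace through 13 for king) and the names Card, pack and unpack are all assumptions made for this example.

```cpp
#include <cstdint>

enum class Suit : std::uint8_t { Spades = 0, Hearts = 1, Diamonds = 2, Clubs = 3 };

struct Card {
    Suit suit;
    int rank;  // assumed numbering: 1 (ace) .. 13 (king)
};

// Pack into the low 6 bits of one byte: bits 5-4 hold the suit, bits 3-0 the rank.
// unsigned char sidesteps the signed-char pitfalls mentioned above.
inline unsigned char pack(Card c) {
    return static_cast<unsigned char>(
        (static_cast<unsigned>(c.suit) << 4) | (static_cast<unsigned>(c.rank) & 0x0Fu));
}

// Reverse of pack(): pull the two fields back out of the byte.
inline Card unpack(unsigned char bits) {
    return Card{static_cast<Suit>((bits >> 4) & 0x03u), static_cast<int>(bits & 0x0Fu)};
}
```

The rest of the program works with the readable Card struct, and the packed byte is only produced where storage or a wire format demands it.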

Converting string to int (C++)

I looked everywhere and can't find an answer to this specific question :(
I have a string date, which contains the date with all the special characters stripped away. (i.e : yyyymmddhhmm or 201212031204).
I'm trying to convert this string into an int to be able to sort them later. I tried atoi, which did not work because the value is too high for the function. I tried streams, but it always returns -858993460 and I suspect this is because the string is too large too. I tried atol and atoll and they still don't give the right answer.
I'd rather not use boost since this is for homework; I don't think I'd be allowed.
Am I out of options to convert a large string to an int ?
Thank you!
What i'd like to be able to do :
int dateToInt(string date)
{
date = date.substr(6,4) + date.substr(3,2) + date.substr(0,2) + date.substr(11,2) + date.substr(14,2);
int d;
d = atoi(date.c_str());
return d;
}
You get negative numbers because 201212031204 is too large to fit in an int. Consider using long long.
BTW, You may sort strings as well.
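Sorting the strings directly works because fixed-width, zero-padded yyyymmddhhmm strings compare in chronological order. A minimal sketch, assuming every string has exactly that layout (sort_dates is a hypothetical name):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Fixed-width, zero-padded yyyymmddhhmm strings sort chronologically under
// plain lexicographic comparison, so no integer conversion is needed.
inline std::vector<std::string> sort_dates(std::vector<std::string> dates) {
    std::sort(dates.begin(), dates.end());
    return dates;
}
```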
You're on the right track that the value is too large, but it's not just for those functions. It's too large for an int in general. On most current platforms an int is 32 bits, with a maximum value of 2147483647 (4294967295 if unsigned). A long long is guaranteed to be at least 64 bits, which is large enough for the numbers you're using. On 64-bit Linux and macOS a long is 64 bits too, but on 64-bit Windows it is still 32 bits.
Now, if you use one of these larger integers, a stream should convert properly. Or, if you want to use a function to do it, have a look at atoll for a long long or atol for a long. (Although for better error checking, you should really consider strtoll or strtol.)
Completely alternatively, you could also use a time_t. They're integer types under the hood, so you can compare and sort them. And there's some nice functions for them in <ctime> (have a look at http://www.cplusplus.com/reference/ctime/).
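The strtoll suggestion above could be sketched like this, with the error checks that atoll lacks. parse_date_number is a hypothetical name, and what counts as an error here is an assumption for this example.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Returns true and stores the value in `out` only if strtoll consumed the
// whole string and the result fits in long long. (strtoll itself accepts
// leading whitespace and an optional sign.)
inline bool parse_date_number(const std::string& text, long long& out) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    const long long value = std::strtoll(text.c_str(), &end, 10);
    if (errno == ERANGE) return false;                       // overflow or underflow
    if (end == text.c_str() || *end != '\0') return false;   // no digits, or trailing junk
    out = value;
    return true;
}
```

Unlike atoi or atoll, a failed parse is reported to the caller instead of yielding an unspecified number.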
typedef long long S64;
S64 dateToInt(const char * s) { // const: the string is only read; assumes it contains digits only
S64 retval = 0;
while (*s) {
retval = retval * 10 + (*s - '0');
++s;
}
return retval;
}
Note that as has been stated, the numbers you're working with will not fit into 32 bits.

How to write an unsigned short int literal?

42 as unsigned int is well defined as "42U".
unsigned int foo = 42U; // yeah!
How can I write "23" so that it is clear it is an unsigned short int?
unsigned short bar = 23; // booh! not clear!
EDIT so that the meaning of the question is more clear:
template <class T>
void doSomething(T) {
std::cout << "unknown type" << std::endl;
}
template<>
void doSomething(unsigned int) {
std::cout << "unsigned int" << std::endl;
}
template<>
void doSomething(unsigned short) {
std::cout << "unsigned short" << std::endl;
}
int main(int argc, char* argv[])
{
doSomething(42U);
doSomething((unsigned short)23); // no other option than a cast?
return EXIT_SUCCESS;
}
You can't. Numeric literals cannot have short or unsigned short type.
Of course in order to assign to bar, the value of the literal is implicitly converted to unsigned short. In your first sample code, you could make that conversion explicit with a cast, but I think it's pretty obvious already what conversion will take place. Casting is potentially worse, since with some compilers it will quell any warnings that would be issued if the literal value is outside the range of an unsigned short. Then again, if you want to use such a value for a good reason, then quelling the warnings is good.
In the example in your edit, where it happens to be a template function rather than an overloaded function, you do have an alternative to a cast: doSomething<unsigned short>(23). With an overloaded function, you could still avoid a cast with:
void (*f)(unsigned short) = &doSomething;
f(23);
... but I don't advise it. If nothing else, this only works if the unsigned short version actually exists, whereas a call with the cast performs the usual overload resolution to find the most compatible version available.
unsigned short bar = (unsigned short) 23;
or in new speak....
unsigned short bar = static_cast<unsigned short>(23);
At least in Visual Studio (2013 and newer) you can write
23ui16
to get a constant of type unsigned short.
See the definitions of the INT8_MIN, INT8_MAX, INT16_MIN, INT16_MAX, etc. macros in stdint.h.
As far as I know this suffix is a Microsoft extension and not part of standard C/C++.
There are no modifiers for unsigned short. Integer literals, which have type int by default, are usually implicitly converted to the target type with no problems. But if you really want to indicate the type explicitly, you could write the following:
unsigned short bar = static_cast<unsigned short>(23);
As far as I can see, the only reason for such an indication is to get the template type deduced properly:
func( static_cast<unsigned short>(23) );
But in that case a call like the following would be clearer:
func<unsigned short>( 23 );
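The difference between deducing the type from a cast and naming it explicitly can be seen in a small sketch like the one below. type_name_of is a hypothetical function template used only for illustration; it mirrors the shape of the question's example but returns the text instead of printing it.

```cpp
#include <string>

// Primary template: chosen for any type without a specialization.
template <class T>
std::string type_name_of(T) { return "unknown type"; }

// Explicit specializations for the two types of interest.
template <>
inline std::string type_name_of(unsigned int) { return "unsigned int"; }

template <>
inline std::string type_name_of(unsigned short) { return "unsigned short"; }
```

With these definitions, type_name_of(23) deduces int and falls back to the primary template, while both type_name_of(static_cast<unsigned short>(23)) and type_name_of<unsigned short>(23) select the unsigned short specialization.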
There are multiple answers here, none of which are terribly satisfying. So here is a compilation answer with some added info to help explain things a little more thoroughly.
First, avoid shorts as suggested, but if you find yourself needing them such as when working with indexed mesh data and simply switching to shorts for your index size cuts your index data size in half...then read on...
1 While it is technically true that there is no way to express an unsigned short literal in C or C++, you can easily sidestep this limitation by simply marking your literal as unsigned with a 'u'.
unsigned short myushort = 16u;
This works because it tells the compiler that 16 is an unsigned int; the compiler then looks for a conversion to unsigned short, finds one, and most compilers will check the constant for overflow and do the conversion with no complaints. The "narrowing conversion" error/warning when the "u" is left out is the compiler complaining that the code may be throwing away the sign. If the literal is negative, such as -1, the result is still well defined: the value wraps modulo 2^N, so you get a very large unsigned value (65535 for a 16-bit unsigned short).
2 There are multiple suggestions on how to sidestep this limitation; most seasoned programmers will sum these up with a "don't do that".
unsigned short myshort = (unsigned short)16;
unsigned short myothershort = static_cast<unsigned short>(16);
While both of these work they are undesirable for 2 major reasons. First, they are wordy; programmers get lazy, and typing all that just for a literal is easy to skip, which leads to basic errors that could have been avoided with a better solution. Second, they are not guaranteed to be free: compilers normally fold a cast of a literal at compile time, but it's better to write good-quality code from the start than to rely on the optimizer.
unsigned short myshort = 16ui16;
This solution is undesirable because it limits who can read your code and understand it. It also means you are starting down the slippery slope of compiler-specific code, which can lead to your code suddenly not working because of the whims of some compiler writer, or some company that randomly decides to "make a right hand turn", or goes away and leaves you in the lurch.
unsigned short bar = L'\x17';
This is so unreadable that nobody has upvoted it. And unreadable should be avoided for many good reasons.
unsigned short bar = 0xf;
This too is unreadable. Being able to read, understand, and convert hex is something serious programmers really need to learn, but it is hard to read at a glance. Quick, what number is this: 0xbad? Now convert it to binary... now octal.
3 Lastly if you find all the above solutions undesirable I offer up yet another solution that is available via a user defined operator.
constexpr unsigned short operator""_ushort(unsigned long long x)
{
return (unsigned short)x;
}
and to use it
unsigned short x = 16_ushort;
Admittedly this too isn't perfect. First, it takes an unsigned long long and whacks it all the way down to an unsigned short, suppressing potential compiler warnings along the way, and it uses the C-style cast. But it is constexpr, so it can be evaluated at compile time and costs nothing in an optimized program, yet it can be stepped into during debugging. It is also short and sweet, so programmers are more likely to use it, and it is expressive, so it is easy to read and understand. Unfortunately it requires a recent compiler, as what can legally be done with user-defined operators has changed over the various versions of C++.
So pick your trade off but be careful as you may regret them later. Happy Programming.
Unfortunately, the only method defined for this is
One or two characters in single quotes
('), preceded by the letter L
According to http://cpp.comsci.us/etymology/literals.html
Which means you would have to represent your number as an ASCII escape sequence:
unsigned short bar = L'\x17';
Unfortunately, they can't. But if people just look two words behind the number, they should clearly see it is a short... It's not THAT ambiguous. But it would be nice.
If you express the quantity as a 4-digit hex number, the unsigned shortness might be clearer.
unsigned short bar = 0x0017;
You probably shouldn't use short, unless you have a whole lot of them. It's intended to use less storage than an int, but that int will have the "natural size" for the architecture. Logically it follows that a short probably doesn't. Similar to bitfields, this means that shorts can be considered a space/time tradeoff. That's usually only worth it if it buys you a whole lot of space. You're unlikely to have very many literals in your application, though, so there was no need foreseen to have short literals. The usecases simply didn't overlap.
In C++11 and beyond, if you really want an unsigned short literal conversion then it can be done with a user-defined literal (the assert inside the constexpr function below needs C++14 or later):
#include <cassert>
#include <climits> // USHRT_MAX
using uint16 = unsigned short;
using uint64 = unsigned long long;
constexpr uint16 operator""_u16(uint64 to_short) {
// use your favorite value validation
assert(to_short <= USHRT_MAX); // <= so that the largest value, USHRT_MAX itself, is accepted
return static_cast<uint16>(to_short);
}
int main(void) {
uint16 val = 26_u16;
}

Why or why not should I use 'UL' to specify unsigned long?

ulong foo = 0;
ulong bar = 0UL;//this seems redundant and unnecessary. but I see it a lot.
I also see this in referencing the first element of arrays a good amount
blah = arr[0UL];//this seems silly since I don't expect the compiler to magically
//turn '0' into a signed value
Can someone provide some insight to why I need 'UL' throughout to specify specifically that this is an unsigned long?
void f(unsigned int x)
{
//
}
void f(int x)
{
//
}
...
f(3); // f(int x)
f(3u); // f(unsigned int x)
It is just another tool in C++; if you don't need it don't use it!
In the examples you provide it isn't needed. But suffixes are often used in expressions to prevent loss of precision. For example:
unsigned long x = 5UL * ...
You may get a different answer if you left off the UL suffix, say if your system had 16-bit ints and 32-bit longs.
Here is another example inspired by Richard Corden's comments:
unsigned long x = 1UL << 17;
Again, if you left the suffix off you'd get a different answer depending on whether int is 16 or 32 bits.
The same type of problem will apply with 32 vs 64-bit ints and mixing long and long long in expressions.
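Since platforms with 16-bit int are rare today, the same effect is easier to reproduce one size up. The sketch below assumes unsigned long long is wider than unsigned int, which holds on mainstream platforms; narrow_sum and wide_sum are hypothetical names.

```cpp
#include <climits>

// Both operands are unsigned int, so the addition is done in unsigned int
// and wraps around to 0 (well defined for unsigned types).
inline unsigned long long narrow_sum() { return UINT_MAX + 1u; }

// The ull suffix makes the whole addition happen in unsigned long long,
// so the result is UINT_MAX + 1 as intended.
inline unsigned long long wide_sum() { return UINT_MAX + 1ull; }
```

The declared return type is the same in both cases; only the suffix on the literal decides the width in which the arithmetic is carried out.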
Some compiler may emit a warning I suppose.
The author could be doing this to make sure the code has no warnings?
Sorry, I realize this is a rather old question, but I use this a lot in c++11 code...
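ul and f are useful for initialising auto variables to your intended type (a floating-point literal without a suffix is already a double; there is no d suffix in standard C++), e.g.
auto my_u_long = 0ul;
auto my_float = 0.0f;
auto my_double = 0.0;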
Check out the cpp reference on numeric literals: http://www.cplusplus.com/doc/tutorial/constants/
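A quick way to confirm which type a literal gives an auto variable is a static_assert on decltype. This is a small sketch, and the variable names are only illustrative.

```cpp
#include <type_traits>

auto my_u_long = 0ul;   // unsigned long
auto my_float  = 0.0f;  // float (the f suffix needs a floating-point literal)
auto my_double = 0.0;   // double (no suffix needed)
auto my_int    = 0;     // int: what you get without any suffix

// Each check fails at compile time if the deduced type is not the expected one.
static_assert(std::is_same<decltype(my_u_long), unsigned long>::value, "ul gives unsigned long");
static_assert(std::is_same<decltype(my_float), float>::value, "f gives float");
static_assert(std::is_same<decltype(my_double), double>::value, "no suffix gives double");
static_assert(std::is_same<decltype(my_int), int>::value, "plain 0 gives int");
```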
You don't normally need it, and any tolerable editor will have enough assistance to keep things straight. However, the places I use it in C# are (and you'll see these in C++):
Calling a generic method (template in C++), where the parameter types are implied and you want to make sure and call the one with an unsigned long type. This happens reasonably often, including this one recently:
Tuple<ulong, ulong> = Tuple.Create(someUlongVariable, 0UL);
where without the UL it returns Tuple<ulong, int> and won't compile.
Implicit variable declarations using the var keyword in C# or the auto keyword coming to C++. This is less common for me because I only use var to shorten very long declarations, and ulong is the opposite.
When you feel obligated to write down the type of constant (even when not absolutely necessary) you make sure:
That you always consider how the compiler will translate this constant into bits
Whoever reads your code will always know how you thought the constant looks and that you have taken it into consideration (even you, when you reread the code)
You don't spend time wondering whether you need to write the 'U'/'UL' or not
Also, several software development standards such as MISRA require you to mention the type of constant no matter what (at least write 'U' if unsigned).
In other words, some consider it good practice to write the type of a constant: at worst it is simply ignored, and at best you avoid bugs, avoid the chance that different compilers treat your code differently, and improve readability.