Why are stoi, stol not fixed width integers? - c++

Since ints and longs and other integer types may be different sizes on different systems, why not have stouint8_t(), stoint64_t(), etc. so that portable string to int code could be written?

Because typing that would make me want to chop off my fingers.
Seriously, the basic integer types are int and long and the std::stoX functions are just very simple wrappers around strtol etc. and note that C doesn't provide strtoi32 or strtoi64 or anything that std::stouint32_t could wrap.
If you want something more complicated you can write it yourself.
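Writing it yourself can be quite short. The following is a minimal sketch, assuming a hypothetical helper named sto_signed (not a standard function) that wraps std::stoll and rejects values outside the range of the requested type:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of the standard library.
// Assumes T is a signed integer type no wider than long long.
template <typename T>
T sto_signed(const std::string& str, std::size_t* pos = nullptr, int base = 10) {
    static_assert(std::numeric_limits<T>::is_integer && std::numeric_limits<T>::is_signed,
                  "T must be a signed integer type");
    // std::stoll throws std::invalid_argument or std::out_of_range on bad input.
    const long long value = std::stoll(str, pos, base);
    if (value < std::numeric_limits<T>::min() || value > std::numeric_limits<T>::max()) {
        throw std::out_of_range("sto_signed: value does not fit in target type");
    }
    return static_cast<T>(value);
}
```

With this sketch, sto_signed<std::int32_t>("123") returns 123, while sto_signed<std::int8_t>("300") throws std::out_of_range.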
I could just as well ask "why do people use int and long, instead of int32_t and int64_t everywhere, so the code is portable?" and the answer would be because it's not always necessary.
But the actual reason is probably that no one ever proposed it for the standard. Things don't just magically appear in the standard; someone has to write a proposal, justify adding them, and convince the rest of the committee to add them. So the answer to most "why isn't this thing I just thought of in the standard?" questions is that no one proposed it.

Because it's usually not necessary.
stoll and stoull return results of type long long and unsigned long long respectively. If you want to convert a string to int64_t, you can just call stoll() and store the result in your int64_t object; the value will be implicitly converted.
This assumes that long long is the widest signed integer type. Like C (starting with C99), C++ permits extended integer types, some of which might be wider than [unsigned] long long. C provides conversion functions strtoimax and strtoumax (operating on intmax_t and uintmax_t, respectively) in <inttypes.h>. For whatever reason, C++ doesn't provide wrappers for these functions (the logical names would be stoimax and stoumax).
But that's not going to matter unless you're using a C++ compiler that provides an extended integer type wider than [unsigned] long long, and I'm not aware that any such compilers actually exist. For any types no wider than 64 bits, the existing functions are all you need.
For example:
#include <iostream>
#include <string>
#include <cstdint>
int main() {
const char *s = "0xdeadbeeffeedface";
uint64_t u = std::stoull(s, nullptr, 0);
std::cout << u << "\n";
}
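If a wrapper for strtoimax were wanted anyway, it could be sketched along the lines of the existing std::stoX functions. The name stoimax below is hypothetical (it is only the "logical name" mentioned above, not a standard function), and the error handling is an assumption modelled on how std::stoll reports failures:

```cpp
#include <cerrno>
#include <cinttypes>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around std::strtoimax; not part of the standard library.
inline std::intmax_t stoimax(const std::string& str, std::size_t* pos = nullptr, int base = 10) {
    const char* begin = str.c_str();
    char* end = nullptr;
    errno = 0;  // strtoimax reports overflow through errno
    const std::intmax_t value = std::strtoimax(begin, &end, base);
    if (end == begin) {
        throw std::invalid_argument("stoimax: no conversion performed");
    }
    if (errno == ERANGE) {
        throw std::out_of_range("stoimax: value out of range");
    }
    if (pos != nullptr) {
        *pos = static_cast<std::size_t>(end - begin);  // number of characters consumed
    }
    return value;
}
```

For example, stoimax("0x10", nullptr, 0) yields 16, because base 0 lets the prefix select the base.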

Related

Is there a portable literal suffix for int64_t and similar types?

I was trying to understand std::variant:
#include <cstdint>
#include <variant>
std::variant<int64_t, double> v;
I wanted to assign an int64_t variant: v = 5L;
That compiles on x86_64 because int64_t is long. But it does not compile on arm, because int64_t is long long. The type deduction has now two equal choices between int64_t and double to convert my number to, so it declines. With variant<int64_t, string> I wouldn't even have noticed the conversion, because then there is only one available and the compiler would have accepted it.
Similar issue with: v = 5LL; Now arm / 32 bit is fine, but x86_64 not anymore.
I get this compiling on both platforms but this is (sometimes) a type conversion with potential side-effects I am not able to foresee: v = int64_t(5LL);. Without the LL I wouldn't even be able to express values outside 32bit int.
The INT64_C macro seems to be the most portable and safest way to express this: v = INT64_C(5);
But this is not nice to read and write anymore.
Is there a similar literal suffix like L/LL for int64_t that is portable?
No, there are no standard literals for fixed width integer aliases.
One potential workaround would be to use std::variant<long long, double> v;. Although long long theoretically isn't guaranteed to be exactly 64 bits (it may be wider, but not narrower), it is 64 bits on practically every system that supports long long today. The benefit is that long long of course has a standard literal. The potential drawback is that the size situation may theoretically change in future.
A more general solution is to give up on using a literal suffix, and instead use a cast: v = static_cast<int64_t>(5);.
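The same ambiguity can be reproduced without std::variant. The sketch below uses a hypothetical pair of overloads named which as a stand-in for the variant's two alternatives; it is an illustration of the idea, not the variant machinery itself. Spelling the type explicitly selects the int64_t overload on every platform:

```cpp
#include <cstdint>
#include <string>

// Hypothetical overload pair standing in for variant<int64_t, double>.
inline std::string which(std::int64_t) { return "int64_t"; }
inline std::string which(double) { return "double"; }

// An explicit cast names the type, so the literal's own type no longer matters.
inline std::string pick_with_cast() { return which(static_cast<std::int64_t>(5)); }

// Brace initialization does the same and rejects narrowing of constants that do not fit.
inline std::string pick_with_braces() { return which(std::int64_t{5}); }

// By contrast, which(5L) or which(5LL) compiles only on platforms where that
// literal's type happens to be int64_t; elsewhere the call is ambiguous.
```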
Another solution is to create a user defined literal as shown in this answer linked in comments:
constexpr std::int64_t operator "" _int64(unsigned long long v)
{ return static_cast<std::int64_t>(v); }
There is a proposal to add literals such as this to the standard library: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1280r2
On a related note, there is a proposal to add literals for std::size_t and std::ptrdiff_t. That proposal suggests core language literals instead of standard library literals: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0330r3

Defining '999e999' value without using char type in C++

Is it possible to define 999e999 value without using the char type?
I've tried defining it even with unsigned long long, but the compiler keeps giving me constant too big error.
Thanks in advance.
Is it possible to define 999e999 value without using the char type?
No, that's not possible using intrinsic C++ integer types. That number is far too big to be held in an unsigned long long in C++.
A long double type allows much larger base-10 exponents on many modern FPU architectures, though still only up to a fixed maximum.
What can be achieved with your current CPU architecture can be explored using the std::numeric_limits facilities like this:
#include <iostream>
#include <limits>
int main() {
std::cout<< "max_exponent10: " << std::numeric_limits<long double>::max_exponent10 << std::endl;
}
Output:
max_exponent10: 4932
You have to use a 3rd party library (like GMP) or write your own algorithms to deal with big numbers like that.
In most (if not all) implementations, that constant is just too big to be represented as an unsigned long long or a double, and on implementations where long double is no wider than double it does not fit there either (though some may just have it be floating-point infinity).
You may be interested in std::numeric_limits<T>::infinity() (for float, double or long double) or std::numeric_limits<T>::max() instead.
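Whether a given decimal exponent fits can be checked against std::numeric_limits, and the overflow-to-infinity behaviour can be observed with std::strtod. This is a minimal sketch with hypothetical helper names (exponent_fits, overflows_double), assuming IEEE 754 doubles, where HUGE_VAL is positive infinity:

```cpp
#include <cmath>
#include <cstdlib>
#include <limits>

// Hypothetical helper: true if 10^exp10 is within the finite range of floating type T.
template <typename T>
constexpr bool exponent_fits(int exp10) {
    return exp10 <= std::numeric_limits<T>::max_exponent10;
}

// Hypothetical helper: true if parsing text as a double produces infinity.
// std::strtod returns HUGE_VAL on overflow (infinity on IEEE 754 systems).
inline bool overflows_double(const char* text) {
    return std::isinf(std::strtod(text, nullptr));
}
```

Since 999e999 equals 9.99e1001, exponent_fits<double>(1001) is false, while the same check for long double depends on the platform.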
I've tried defining it even with unsigned long long, but the compiler keeps giving me constant too big error.
Of course it does. A long long is typically 64 bits long, which gives you log10(2^64) ≅ 19 decimal digits of precision. 999e999 ≅ 10^3 × 10^999 = 10^1002, so it is on the order of 1,000 decimal digits long, or roughly 3,300 bits long. So 999e999 isn't just too big for a long long, it's too big by an enormous margin.
Is it possible to define 999e999 value without using the char type?
Sure. You could define an integer-like type based on an array of some sort of integers, like long long. You'd still need to write a set of operators to work with your new giant type, though. Also, most of the time when you're working with numbers that large, you don't need an exact representation, which is why floating point types like float and double are useful.
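As a rough illustration of the array-of-integers idea, here is a minimal sketch of a hypothetical BigUInt class that supports only multiplication by a small factor and conversion to a decimal string. The class, its base-10^9 layout, and the helper make_999e999 are assumptions for this example; a real program would more likely use a library such as GMP:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical minimal big unsigned integer, for illustration only.
// The value is stored little-endian in base 10^9 "limbs".
class BigUInt {
public:
    explicit BigUInt(std::uint32_t value = 0) {
        limbs_.push_back(value % kBase);
        if (value >= kBase) limbs_.push_back(value / kBase);
    }

    // Multiplies this number in place by a small nonzero factor.
    void mul_small(std::uint32_t factor) {
        std::uint64_t carry = 0;
        for (std::uint32_t& limb : limbs_) {
            const std::uint64_t cur = static_cast<std::uint64_t>(limb) * factor + carry;
            limb = static_cast<std::uint32_t>(cur % kBase);
            carry = cur / kBase;
        }
        while (carry != 0) {
            limbs_.push_back(static_cast<std::uint32_t>(carry % kBase));
            carry /= kBase;
        }
    }

    // Decimal text; every limb except the most significant is zero-padded to 9 digits.
    std::string to_string() const {
        std::string out = std::to_string(limbs_.back());
        for (std::size_t i = limbs_.size() - 1; i-- > 0;) {
            const std::string part = std::to_string(limbs_[i]);
            out += std::string(9 - part.size(), '0') + part;
        }
        return out;
    }

private:
    static const std::uint32_t kBase = 1000000000u;
    std::vector<std::uint32_t> limbs_;
};

// Builds 999 * 10^999 exactly and returns its decimal digits.
inline std::string make_999e999() {
    BigUInt n(999);
    for (int i = 0; i < 999; ++i) n.mul_small(10);
    return n.to_string();
}
```

The resulting string is "999" followed by 999 zeros, which is 1002 digits in total.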

How to check if plain chars are signed or unsigned?

Apparently there is a possibility that plain char can be either signed or unsigned by default. Stroustrup writes:
It is implementation-defined whether a plain char is considered signed or unsigned. This opens the
possibility for some nasty surprises and implementation dependencies.
How do I check whether my chars are signed or unsigned? I might want to convert them to int later, and I don't want them to be negative. Should I always use unsigned char explicitly?
From header <limits>
std::numeric_limits<char>::is_signed
http://en.cppreference.com/w/cpp/types/numeric_limits/is_signed
Some alternatives:
const bool char_is_signed = (char)-1 < 0;
#include <climits>
const bool char_is_signed = CHAR_MIN < 0;
And yes, some systems do make plain char an unsigned type. Examples I've encountered: Cray T90, Cray SV1, Cray T3E, SGI MIPS IRIX, IBM PowerPC AIX. And any system that uses EBCDIC pretty much has to make plain char unsigned so that all basic characters have non-negative values. (And some compilers have an option to control the signedness of char, such as gcc's -fsigned-char and -funsigned-char.)
But std::numeric_limits<char>::is_signed, as suggested by Benjamin Lindley's answer, probably expresses the intent more clearly.
(On the other hand, the methods I suggested can also be applied to C.)
Using unsigned char "always" could give you some interesting surprises, as the majority of C-style functions like printf, fopen, will use char, not unsigned char.
edit: Example of "fun" with C-style functions:
const unsigned char *cmd = "grep -r blah *.txt";
FILE *pf = popen(cmd, "r");
will give errors (in fact, I get one for the *cmd = line, and one error for the popen line). Using const char *cmd = ... will work fine. I picked popen because it's a function that isn't trivial to replace with some standard C++ functionality - obviously, printf or fopen can quite easily be replaced with some iostream or fstream type functionality, which generally has alternatives that take unsigned char as well as char.
However, if you are using > or < on characters that are beyond 127, then you will need to use unsigned char (or some other solution, such as casting to int and masking the lower 8 bits). It is probably better to try to avoid direct comparisons (in particular when it comes to non-ASCII characters - they are messy anyway, because there are often several variants depending on locale, character encodings, etc). Comparing for equality should work however.
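The cast mentioned above can be wrapped in small helpers. This is a sketch with hypothetical names (to_code, less_unsigned); converting through unsigned char gives a non-negative value whether or not plain char is signed:

```cpp
#include <limits>

// Reports the implementation's choice for plain char.
constexpr bool char_is_signed() { return std::numeric_limits<char>::is_signed; }

// Hypothetical helper: the character's code as a non-negative int.
constexpr int to_code(char c) { return static_cast<unsigned char>(c); }

// Hypothetical helper: ordering by unsigned code, so characters
// beyond 127 compare greater than the 7-bit ones on every platform.
constexpr bool less_unsigned(char a, char b) {
    return static_cast<unsigned char>(a) < static_cast<unsigned char>(b);
}
```

Here to_code('\xE9') is 233 on systems with 8-bit chars, even where the plain char value itself would be negative.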
Yes, if you want to use a char type and you always want it to be unsigned, use unsigned char. Note that unlike the other fundamental integer types, unsigned char is a different type from char -- even on systems where char is unsigned. Also, conversion from char to int ought to be lossless so if your result is incorrect, your source char value may also be incorrect.
The cleanest way to test whether char is unsigned depends on whether you need it to be a preprocessor test and on which version of C++ you are targeting.
To conditionally compile code using a preprocessor test, the value of CHAR_MIN should work:
#include <climits>
#if (CHAR_MIN==0)
// code that relies on char being unsigned
#endif
In C++17, I would use std::is_signed_v and std::is_unsigned_v:
#include <type_traits>
static_assert(std::is_unsigned_v<char>);
// code that relies on char being unsigned
If you are writing against C++11 or C++14 you need the slightly more verbose std::is_signed and std::is_unsigned:
#include <type_traits>
static_assert(std::is_unsigned<char>::value, "char is signed");
// code that relies on char being unsigned
For all revisions of C++, Benjamin Lindley's solution (std::numeric_limits<char>::is_signed) is a good alternative.
You can use a preprocessor macro:
#define is_type_signed(my_type) (((my_type)-1) < 0)

How to write an unsigned short int literal?

42 as unsigned int is well defined as "42U".
unsigned int foo = 42U; // yeah!
How can I write "23" so that it is clear it is an unsigned short int?
unsigned short bar = 23; // booh! not clear!
EDIT so that the meaning of the question is more clear:
template <class T>
void doSomething(T) {
std::cout << "unknown type" << std::endl;
}
template<>
void doSomething(unsigned int) {
std::cout << "unsigned int" << std::endl;
}
template<>
void doSomething(unsigned short) {
std::cout << "unsigned short" << std::endl;
}
int main(int argc, char* argv[])
{
doSomething(42U);
doSomething((unsigned short)23); // no other option than a cast?
return EXIT_SUCCESS;
}
You can't. Numeric literals cannot have short or unsigned short type.
Of course in order to assign to bar, the value of the literal is implicitly converted to unsigned short. In your first sample code, you could make that conversion explicit with a cast, but I think it's pretty obvious already what conversion will take place. Casting is potentially worse, since with some compilers it will quell any warnings that would be issued if the literal value is outside the range of an unsigned short. Then again, if you want to use such a value for a good reason, then quelling the warnings is good.
In the example in your edit, where it happens to be a template function rather than an overloaded function, you do have an alternative to a cast: doSomething<unsigned short>(23). With an overloaded function, you could still avoid a cast with:
void (*f)(unsigned short) = &doSomething;
f(23);
... but I don't advise it. If nothing else, this only works if the unsigned short version actually exists, whereas a call with the cast performs the usual overload resolution to find the most compatible version available.
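The difference between deduction and an explicit template argument can be seen in a small sketch. The function describe is a hypothetical variant of the question's doSomething that returns the text instead of printing it:

```cpp
#include <string>

// Primary template: chosen when no specialization matches the deduced type.
template <class T>
std::string describe(T) { return "unknown type"; }

template <>
std::string describe(unsigned int) { return "unsigned int"; }

template <>
std::string describe(unsigned short) { return "unsigned short"; }

// describe(23) deduces T = int, so the primary template is used.
// describe<unsigned short>(23) names T explicitly; 23 is then converted implicitly.
```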
unsigned short bar = (unsigned short) 23;
or in new speak....
unsigned short bar = static_cast<unsigned short>(23);
At least in Visual Studio (2013 and newer) you can write
23ui16
to get a constant of type unsigned short.
see definitions of INT8_MIN, INT8_MAX, INT16_MIN, INT16_MAX, etc. macros in stdint.h
I don't know at the moment whether this is part of the standard C/C++
There are no suffixes for unsigned short. Integer literals, which have type int by default, are usually implicitly converted to the target type with no problems. But if you really want to indicate the type explicitly, you could write the following:
unsigned short bar = static_cast<unsigned short>(23);
As far as I can see, the only reason to use such an indication is to deduce the template type properly:
func( static_cast<unsigned short>(23) );
But in that case a call like the following would be clearer:
func<unsigned short>( 23 );
There are multiple answers here, none of which are terribly satisfying. So here is a compilation answer with some added info to help explain things a little more thoroughly.
First, avoid shorts as suggested, but if you find yourself needing them such as when working with indexed mesh data and simply switching to shorts for your index size cuts your index data size in half...then read on...
1 While it is technically true that there is no way to express an unsigned short literal in C or C++, you can easily sidestep this limitation by simply marking your literal as unsigned with a 'u'.
unsigned short myushort = 16u;
This works because it tells the compiler that 16 is an unsigned int; the compiler then looks for a way to convert it to unsigned short and finds one. Most compilers will then check for overflow and do the conversion with no complaints. The "narrowing conversion" error/warning when the "u" is left out is the compiler complaining that the code is throwing away the sign. If the value is negative, such as -1, the conversion wraps around modulo 2^N: you get a very large unsigned value that is then truncated to fit the short.
2 There are multiple suggestions on how to sidestep this limitation; most seasoned programmers will sum these up with a "don't do that".
unsigned short myshort = (unsigned short)16;
unsigned short myothershort = static_cast<unsigned short>(16);
While both of these work, they are undesirable for two major reasons. First, they are wordy; programmers get lazy, and typing all that just for a literal is easy to skip, which leads to basic errors that could have been avoided with a better solution. Second, they are not necessarily free: static_cast in particular may generate a little assembly code to do the conversion, and while an optimizer may (or may not) figure out that it can do the conversion at compile time, it's better to just write good quality code from the start.
unsigned short myshort = 16ui16;
This solution is undesirable because it limits who can read your code and understand it. It also means you are starting down the slippery slope of compiler-specific code, which can lead to your code suddenly not working because of the whims of some compiler writer, or some company that randomly decides to "make a right hand turn", or goes away and leaves you in the lurch.
unsigned short bar = L'\x17';
This is so unreadable that nobody has upvoted it. And unreadable should be avoided for many good reasons.
unsigned short bar = 0xf;
This too is unreadable. While being able to read, understand, and convert hex is something serious programmers really need to learn, it is hard to read quickly. What number is this: 0xbad? Now convert it to binary... now octal.
3 Lastly if you find all the above solutions undesirable I offer up yet another solution that is available via a user defined operator.
constexpr unsigned short operator ""_ushort(unsigned long long x)
{
return (unsigned short)x;
}
and to use it
unsigned short x = 16_ushort;
Admittedly this too isn't perfect. First, it takes an unsigned long long and whacks it all the way down to an unsigned short, suppressing potential compiler warnings along the way, and it uses the C-style cast. But it is constexpr, which guarantees it is free in an optimized program, yet can be stepped into during debug. It is also short and sweet, so programmers are more likely to use it, and it is expressive, so it is easy to read and understand. Unfortunately it requires a recent compiler, as what can legally be done with user defined operators has changed over the various versions of C++.
So pick your trade off but be careful as you may regret them later. Happy Programming.
Unfortunately, the only method defined for this is
One or two characters in single quotes
('), preceded by the letter L
According to http://cpp.comsci.us/etymology/literals.html
Which means you would have to represent your number as an ASCII escape sequence:
unsigned short bar = L'\x17';
Unfortunately, they can't. But if people just look two words behind the number, they should clearly see it is a short... It's not THAT ambiguous. But it would be nice.
If you express the quantity as a 4-digit hex number, the unsigned shortness might be clearer.
unsigned short bar = 0x0017;
You probably shouldn't use short, unless you have a whole lot of them. It's intended to use less storage than an int, but that int will have the "natural size" for the architecture. Logically it follows that a short probably doesn't. Similar to bitfields, this means that shorts can be considered a space/time tradeoff. That's usually only worth it if it buys you a whole lot of space. You're unlikely to have very many literals in your application, though, so there was no need foreseen to have short literals. The usecases simply didn't overlap.
In C++11 and beyond, if you really want an unsigned short literal conversion then it can be done with a user defined literal:
using uint16 = unsigned short;
using uint64 = unsigned long long;
constexpr uint16 operator""_u16(uint64 to_short) {
// use your favorite value validation
assert(to_short <= USHRT_MAX); // USHRT_MAX from <climits>, assert from <cassert>
return static_cast<uint16>(to_short);
}
int main(void) {
uint16 val = 26_u16;
}
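The runtime assert can also be turned into a check that fails at compile time when the literal is used in a constant expression. The following is a sketch under that assumption, using a hypothetical suffix _us; it relies on the C++11 rule that reaching a throw expression is not allowed during constant evaluation:

```cpp
#include <climits>
#include <stdexcept>
#include <type_traits>

// Hypothetical suffix _us; the name is an assumption for this sketch.
constexpr unsigned short operator""_us(unsigned long long value) {
    // In a constant expression, reaching the throw makes compilation fail;
    // at run time it raises std::out_of_range instead.
    return value <= USHRT_MAX
               ? static_cast<unsigned short>(value)
               : throw std::out_of_range("_us: literal does not fit in unsigned short");
}

static_assert(std::is_same<decltype(23_us), unsigned short>::value,
              "the literal has type unsigned short");
static_assert(65535_us == 65535, "65535 always fits in unsigned short");

// constexpr unsigned short too_big = 70000_us;  // would not compile where USHRT_MAX is 65535
```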

Why or why not should I use 'UL' to specify unsigned long?

ulong foo = 0;
ulong bar = 0UL;//this seems redundant and unnecessary. but I see it a lot.
I also see this in referencing the first element of arrays a good amount
blah = arr[0UL];//this seems silly since I don't expect the compiler to magically
//turn '0' into a signed value
Can someone provide some insight to why I need 'UL' throughout to specify specifically that this is an unsigned long?
void f(unsigned int x)
{
//
}
void f(int x)
{
//
}
...
f(3); // f(int x)
f(3u); // f(unsigned int x)
It is just another tool in C++; if you don't need it don't use it!
In the examples you provide it isn't needed. But suffixes are often used in expressions to prevent loss of precision. For example:
unsigned long x = 5UL * ...
You may get a different answer if you left off the UL suffix, say if your system had 16-bit ints and 32-bit longs.
Here is another example inspired by Richard Corden's comments:
unsigned long x = 1UL << 17;
Again, you'd get a different answer if you had 16 or 32-bit integers if you left the suffix off.
The same type of problem will apply with 32 vs 64-bit ints and mixing long and long long in expressions.
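The 32 versus 64-bit case can be sketched with two hypothetical helpers. The arithmetic is done in the type of the operands, so a suffix on one literal is enough to widen the whole expression:

```cpp
// Multiplies in unsigned int; the product wraps modulo 2^N before it is widened,
// where N is the width of unsigned int.
inline unsigned long long narrow_product(unsigned int a, unsigned int b) {
    return a * b;
}

// The 1ULL operand forces the multiplication to happen in unsigned long long.
inline unsigned long long wide_product(unsigned int a, unsigned int b) {
    return 1ULL * a * b;
}

// Shifting 1ULL is well defined for n up to 63; shifting a plain int 1
// by the width of int or more would be undefined behaviour.
inline unsigned long long bit(unsigned int n) {
    return 1ULL << n;
}
```

Assuming a 32-bit unsigned int, narrow_product(100000u, 100000u) gives 1410065408, while wide_product gives the expected 10000000000.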
Some compilers may emit a warning, I suppose.
The author could be doing this to make sure the code has no warnings?
Sorry, I realize this is a rather old question, but I use this a lot in c++11 code...
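ul and f are useful for initialising auto variables to your intended type (C++ has no d suffix; a floating-point literal without a suffix is already a double, and f requires a floating-point literal such as 0.0f), e.g.
auto my_u_long = 0ul;
auto my_float = 0.0f;
auto my_double = 0.0;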
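The type each suffix produces can be checked at compile time. The sketch below uses a hypothetical helper has_type and spells the floating-point literals with a decimal point (0.0f, 0.0), which is what standard C++ requires:

```cpp
#include <type_traits>

// Integer literal suffixes.
static_assert(std::is_same<decltype(0), int>::value, "no suffix: int");
static_assert(std::is_same<decltype(0u), unsigned int>::value, "u: unsigned int");
static_assert(std::is_same<decltype(0ul), unsigned long>::value, "ul: unsigned long");
static_assert(std::is_same<decltype(0ull), unsigned long long>::value, "ull: unsigned long long");

// Floating-point literal suffixes.
static_assert(std::is_same<decltype(0.0f), float>::value, "f: float");
static_assert(std::is_same<decltype(0.0), double>::value, "no suffix: double");
static_assert(std::is_same<decltype(0.0L), long double>::value, "L: long double");

// Hypothetical helper: true if auto-style deduction from the argument yields Expected.
template <typename Expected, typename Actual>
constexpr bool has_type(Actual) {
    return std::is_same<Expected, Actual>::value;
}
```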
Check out the cpp reference on numeric literals: http://www.cplusplus.com/doc/tutorial/constants/
You don't normally need it, and any tolerable editor will have enough assistance to keep things straight. However, the places I use it in C# are (and you'll see these in C++):
Calling a generic method (template in C++), where the parameter types are implied and you want to make sure and call the one with an unsigned long type. This happens reasonably often, including this one recently:
Tuple<ulong, ulong> = Tuple.Create(someUlongVariable, 0UL);
where without the UL it returns Tuple<ulong, int> and won't compile.
Implicit variable declarations using the var keyword in C# or the auto keyword coming to C++. This is less common for me because I only use var to shorten very long declarations, and ulong is the opposite.
When you feel obligated to write down the type of a constant (even when not absolutely necessary), you make sure:
That you always consider how the compiler will translate this constant into bits
That whoever reads your code will always know what you thought the constant looks like and that you have taken it into consideration (even you, when you rescan the code)
That you don't spend time thinking about whether you need to write the 'U'/'UL' or not
Also, several software development standards such as MISRA require you to mention the type of a constant no matter what (at least write 'U' if unsigned).
In other words, some consider it good practice to write the type of a constant because at worst you just ignore it, and at best you avoid bugs, avoid the chance that different compilers will treat your code differently, and improve code readability.