Why isn't there any std::stoui? [duplicate] - c++

C++11 added some new string conversion functions:
http://en.cppreference.com/w/cpp/string/basic_string/stoul
It includes stoi (string to int), stol (string to long), stoll (string to long long), stoul (string to unsigned long), stoull (string to unsigned long long). Notable in its absence is a stou (string to unsigned) function. Is there some reason it is not needed but all of the others are?
related: No "sto{short, unsigned short}" functions in C++11?

The most pat answer would be that the C library has no corresponding “strtou”, and the C++11 string functions are all just thinly veiled wrappers around the C library functions: The std::sto* functions mirror strto*, and the std::to_string functions use sprintf.
Edit: As KennyTM points out, both stoi and stol use strtol as the underlying conversion function, but it is still unclear why stoul, which uses strtoul, exists while a corresponding stou does not.

I've no idea why stoi exists but not stou, but the only difference between stoul and a hypothetical stou would be a check that the result is in the range of unsigned:
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

unsigned stou(std::string const & str, size_t * idx = 0, int base = 10) {
    unsigned long result = std::stoul(str, idx, base);
    if (result > std::numeric_limits<unsigned>::max()) {
        throw std::out_of_range("stou");
    }
    return result;
}
(Likewise, stoi is also similar to stol, just with a different range check; but since it already exists, there's no need to worry about exactly how to implement it.)
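Usage would then look just like the standard functions; a quick sketch of my own, reusing the stou defined above:
#include <cstddef>
#include <iostream>

int main() {
    std::size_t pos = 0;
    unsigned value = stou("4000000000", &pos);   // fits in a 32-bit unsigned
    std::cout << value << " (" << pos << " characters parsed)" << std::endl;
    // stou("9999999999") would throw std::out_of_range where unsigned is 32 bits.
}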

// buf is the std::string holding the digits
unsigned long ulval = std::stoul(buf);
unsigned long mask = ~0xffffffffl;   // bits above the low 32
unsigned int uival;
if ((ulval & mask) == 0)
    uival = (unsigned int)ulval;
else {
    // ...range error...
}
Using a mask whose set bits express the expected value size makes this work both when long is 64 bits and int is 32 bits, and when long and int are both 32 bits.
With 64-bit longs, ~0xffffffffl becomes 0xffffffff00000000, so the test checks whether any of the top 32 bits are set. With 32-bit longs, ~0xffffffffl becomes 0x00000000 and the mask check is always zero, so every value passes.
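Wrapped up into a complete function, the same idea might look like this (stou_masked is my own name, and like the mask itself it assumes unsigned int is 32 bits wide):
#include <stdexcept>
#include <string>

unsigned int stou_masked(const std::string& buf) {
    unsigned long ulval = std::stoul(buf);
    unsigned long mask = ~0xffffffffl;    // non-zero only where long is wider than 32 bits
    if ((ulval & mask) != 0)
        throw std::out_of_range("stou_masked");
    return (unsigned int)ulval;
}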

Related

Understanding typecasting(pointers)

I am reading Beej's Guide to Network Programming and I am having trouble understanding a function. The function expects a char * pointer, but it dereferences the pointer, casts the result to an (unsigned long int) and performs some bitwise operations. Why couldn't we just pass it as an (unsigned int *) instead of an (unsigned char *)? Also, if the parameter were replaced by (void *) and then inside the code we did something like:
*(unsigned long int *)buf[0] << 24
would we get the same result? (Sorry, this is my first time asking a question here, so let me know if any more info is required.)
unsigned long int unpacku32(unsigned char *buf)
{
    return ((unsigned long int)buf[0] << 24) |
           ((unsigned long int)buf[1] << 16) |
           ((unsigned long int)buf[2] <<  8) |
           buf[3];
}
What you're suggesting is not guaranteed to work. Unless buf points to an actual unsigned long, you're attempting to read an object of one type as another which is not allowed (unless you're reading as an unsigned char). There could be further issues if the pointer value you create is not properly aligned for its type.
Then there is also the issue of endianness. Bytes sent over a network are typically sent in big-endian format, i.e. most significant byte first. If your system is little-endian, it will interpret the bytes in the reverse order.
The function you posted demonstrates the proper way of deserializing an unsigned long from a byte buffer in a standard compliant manner.
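A quick illustration of what that buys you (my own example, reusing unpacku32 from the question):
#include <cassert>

int main() {
    unsigned char buf[4] = { 0x12, 0x34, 0x56, 0x78 };   // most significant byte first, as on the wire
    unsigned long int value = unpacku32(buf);
    assert(value == 0x12345678ul);                       // same result on any host, big- or little-endian
}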
That would make it dependent on the endianness of the platform. So we pick out the parts in a defined order to make it platform neutral.
buf[0] is treated as an 8-bit unsigned value. When we write
(unsigned long int)buf[0] << 24, the cast tells the compiler to treat it not as an 8-bit value but as an unsigned long int, so there is room to shift it left.
Only buf[0] is shifted by 24; buf[1] and the other bytes get their own, smaller shifts.
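Going the other way is symmetric; here is a sketch of a matching serializer (packu32 is my own name, not something from your guide), where each byte gets its own shift:
void packu32(unsigned char *buf, unsigned long int value)
{
    buf[0] = (value >> 24) & 0xff;   // most significant byte first
    buf[1] = (value >> 16) & 0xff;
    buf[2] = (value >>  8) & 0xff;
    buf[3] =  value        & 0xff;
}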
If you want to convert a string, say "aabbccd", to an unsigned long and we don't care about endianness, we can do it like this:
char* str = const_cast<char *>("aabbccd\0");
unsigned long value = *(reinterpret_cast<unsigned long *>(str));
std::cout << value << std::endl;
std::cout << reinterpret_cast<char *>(&value) << std::endl;
It should be pointed out that an unsigned long can hold at most 8 chars here, because it is a 64-bit integer on this platform.
However, if many platforms are going to consume the same data, doing it like this may not be enough because of endianness. The approach given in your book is, as someone mentioned, platform neutral.
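If you want the same quick-and-dirty result without the strict-aliasing and alignment problems of the reinterpret_cast, a memcpy-based sketch of the same idea (still endianness-dependent, and still assuming a 64-bit unsigned long):
#include <cstring>
#include <iostream>

int main() {
    const char str[] = "aabbccd";            // 7 characters plus the terminating '\0' = 8 bytes
    unsigned long value = 0;
    std::memcpy(&value, str, sizeof value);  // well-defined, unlike dereferencing a cast pointer
    std::cout << value << std::endl;         // the number still depends on the host byte order
}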
The function expects a char * pointer, but it dereferences the pointer, casts the result to an (unsigned long int) and performs some bitwise operations.
Actually, what the code does is use the array index operator to pull out the first byte from the buffer, cast that to an unsigned long int, and then do some bitwise operations. The pointer that's dereferenced is an unsigned char *, which has nothing to do with long integers.
Why couldn't we just pass it as an (unsigned int *) instead of an (unsigned char *)?
Because it isn't a pointer to any kind of integer. It's a pointer to a buffer of unsigned char, i.e. bytes. Treating a pointer as if it were a pointer to a different type is likely to lead to a violation of the "Strict Aliasing Rule" (which I encourage you to read about).
Also, if the parameter were replaced by (void *) and then inside the code we did something like *(unsigned long int *)buf[0] << 24, would we get the same result?
No. If you define buf as a void*, then buf[0] is a meaningless expression. If buf is defined as, or cast to, an unsigned long int *, then buf[0] is an unsigned long int, not the unsigned char that the algorithm is expecting. There will almost certainly be too many bits set (as many as 64, not 8) and the result of the expression will be invalid.

Is it safe to compare an unsigned int with a std::string::size_type

I am going through the book "Accelerated C++" by Andrew Koenig and Barbara E. Moo and I have some questions about the main example in chapter 2. The code can be summarized as below, and compiles without warnings/errors with g++:
#include <string>
using std::string;

int main()
{
    const string greeting = "Hello, world!";
    // OK
    const int pad = 1;
    // KO
    // int pad = 1;
    // OK
    // unsigned int pad = 1;
    const string::size_type cols = greeting.size() + 2 + pad * 2;
    string::size_type c = 0;
    if (c == 1 + pad)
    {;}
    return 0;
}
However, if I replace const int pad = 1; by int pad = 1;, the g++ compiler will return a warning:
warning: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (c == 1 + pad)
If I replace const int pad = 1; by unsigned int pad = 1;, the g++ compiler will not return a warning.
I understand why g++ return the warning, but I am not sure about the three below points:
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Why is the compiler not giving a warning with the original code const int pad = 1;? Is the compiler automatically converting the variable pad to an unsigned int?
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
From the compiler's point of view:
It is unsafe to compare signed and unsigned variables (non-constants).
It is safe to compare two unsigned variables of different sizes.
It is safe to compare an unsigned variable with a signed constant if the compiler can check that the constant is in the allowed range for the type of the signed variable (e.g. for a 16-bit signed integer it is safe to use a constant in the range [0..32767]).
So the answers to your questions:
Yes, it is safe to compare an unsigned int and a std::string::size_type.
There is no warning because the compiler can perform the safety check (while compiling :)).
There is no problem using different unsigned types in a comparison. Use unsigned int.
Comparing signed and unsigned values is "dangerous" in the sense that you may not get what you expect when the signed value is negative: it may well behave as a very large unsigned value, so a > b gives true when a = -1 and b = 100. (The use of const int works because the compiler knows the value isn't changing and can therefore say "well, this value is always 1, so it works fine here".)
As long as the value you want to compare fits in an unsigned int (on typical machines, a little over 4 billion), you are fine.
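A short demonstration of that trap (my own example; it will itself trigger the -Wsign-compare warning):
#include <iostream>

int main() {
    int a = -1;
    unsigned int b = 100;
    // a is converted to unsigned int before the comparison,
    // so it becomes a huge positive value and the test succeeds.
    std::cout << (a > b) << std::endl;   // prints 1
}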
If you are using std::string with the default allocator (which is likely), then size_type is actually size_t.
[support.types]/6 defines size_t as "an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object."
So it's not technically guaranteed to be an unsigned int, but I believe it is defined that way in most cases.
Now regarding your second question: if you use const int something = 2, the compiler sees that this integer is a) never negative and b) never changes, so it's always safe to compare this variable with size_t. In some cases the compiler may optimize the variable out completely and simply replace all its occurrences with 2.
I would say that it is better to use size_type everywhere you refer to the size of something, since it makes the intent explicit.
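If you want to check the size_type claim on your own implementation, a compile-time assertion will do (my own snippet, assuming C++11 for static_assert and <type_traits>):
#include <cstddef>
#include <string>
#include <type_traits>

static_assert(std::is_same<std::string::size_type, std::size_t>::value,
              "std::string::size_type is not std::size_t on this implementation");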
What the compiler warns about is the comparison of unsigned and signed integer types. This is because a signed integer can be negative, and the result is counter-intuitive: the signed value is converted to unsigned before the comparison, which means a negative number will compare greater than a positive one.
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Yes, they are both unsigned, and then the semantics are what's expected. If their ranges differ, the narrower type is converted to the wider one.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
This is because of how the compiler is constructed. The compiler parses and to some extent optimizes the code before warnings are issued. The important point is that at the point where this warning is considered, the compiler knows that the signed integer is 1, and then it is safe to compare it with an unsigned integer.
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
If you don't want it to be constant, the best solution would probably be to make it at least an unsigned integer type. However, you should be aware that there is no guaranteed relation between the normal integer types and the size types; for example, unsigned int may be narrower than, wider than, or equal to size_t and size_type (and the latter two may also differ).

How to check conversion from C++ string to unsigned int

I need to:
1) Find the maximum unsigned int value on my current system. I didn't find it in limits.h. Is it safe to write unsigned int maxUnsInt = 0 - 1;? I also tried unsigned int maxUnsInt = MAX_INT * 2 + 1, which returns the correct value, but the compiler shows a warning about an int overflow.
2) Once found, check whether a C++ string (that I know is composed only of digits) exceeds the maximum unsigned int value on my system.
My final objective is to convert the string to an unsigned int using atoi if and only if it is a valid unsigned int. I would prefer to use only the standard library.
There should be a #define UINT_MAX in <limits.h>; I'd be very surprised if there wasn't. Otherwise, it's guaranteed that:
unsigned int u = -1;
will result in the maximum value. In C++, you can also use std::numeric_limits<unsigned int>::max(), but until C++11, that wasn't an integral constant expression (which may or may not be a problem).
unsigned int u = 2 * MAX_INT + 1;
is not guaranteed to be anything (on at least one system, MAX_INT == UMAX_INT).
With regards to checking a string, the simplest solution would be to use strtoul, then verify errno and the return value:
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

bool
isLegalUInt( std::string const& input )
{
    char* end;   // strtoul needs a char**, so end must not be pointer-to-const
    errno = 0;
    unsigned long v = strtoul( input.c_str(), &end, 10 );
    return errno == 0 && *end == '\0' && end != input.c_str() && v <= UINT_MAX;
}
If you're using C++11, you could also use std::stoul, which throws an std::out_of_range exception in case of overflow.
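A C++11 variant built on std::stoul, catching the exceptions it throws and adding the UINT_MAX check by hand (isUIntString is my own name, a sketch rather than drop-in code):
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <string>

bool isUIntString(const std::string& input) {
    try {
        std::size_t pos = 0;
        unsigned long v = std::stoul(input, &pos, 10);
        return pos == input.size()   // the whole string was consumed
            && v <= UINT_MAX;        // and the value fits in unsigned int
    } catch (const std::invalid_argument&) {
        return false;                // no digits at all
    } catch (const std::out_of_range&) {
        return false;                // does not even fit in unsigned long
    }
}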
numeric_limits has limits for various numeric types:
unsigned int maxUnsInt = std::numeric_limits<unsigned int>::max();
stringstream can read a string into any type that supports operator>> and tell you whether it failed:
std::stringstream ss("1234567890123456789012345678901234567890");
unsigned int value;
ss >> value;
bool successful = !ss.fail();
According to this you do not need to calculate it, just use the appropriate constant, which in this case should be UINT_MAX.
A few notes.
This is more of a C way in contrast to C++, but since you say you want to use atol I stick with it. The C++ way would be to use numeric_limits as Joachim suggested. However, the C++ standard also defines the C-like macros/definitions, so they should be safe to use.
Also, if you want it to be the C++ way, it would probably be preferable to use stringstream (which is part of the standard C++ library) for the conversion.
Lastly, I deliberately don't post an explicit code solution, because it looks like homework, and you should be good to go from here.

convert double number to (IEEE 754) 64-bit binary string representation in c++

I have a double number that I want to represent as an IEEE 754 64-bit binary string.
Currently I'm using code like this:
double noToConvert;
unsigned long* valueRef = reinterpret_cast<unsigned long*>(&noToConvert);
bitset<64> lessSignificative(*valueRef);
bitset<64> mostSignificative(*(++valueRef));
mostSignificative <<= 32;
mostSignificative |= lessSignificative;
RowVectorXd binArray = RowVectorXd::Zero(mostSignificative.size());
for(unsigned int i = 0; i < mostSignificative.size(); i++)
{
    (mostSignificative[i] == 0) ? (binArray(i) = 0) : (binArray(i) = 1);
}
The above code works fine without any problem. But as you can see, I'm using reinterpret_cast and unsigned long, so this code is very much compiler dependent. Could anyone show me how to write code that is platform independent and doesn't rely on non-standard libraries? I'm OK with the standard library and even bitset, but I don't want any machine- or compiler-dependent code.
Thanks in advance.
If you're willing to assume that double is the IEEE-754 double type:
#include <cstdint>
#include <cstring>

uint64_t getRepresentation(const double number) {
    uint64_t representation;
    memcpy(&representation, &number, sizeof representation);
    return representation;   // without this return the result would be undefined
}
If you don't even want to make that assumption:
#include <cstring>

char *getRepresentation(const double number) {
    char *representation = new char[sizeof number];
    memcpy(representation, &number, sizeof number);
    return representation;   // the caller must delete[] this buffer
}
Why not use a union?
bitset<64> binarize(unsigned long* input){
    union binarizeUnion
    {
        unsigned long intVal;    // the value itself, not the pointer, is what we pun
        bitset<64> bits;
    } binTransfer;
    binTransfer.intVal = *input; // write through one member ...
    return binTransfer.bits;     // ... and read through the other (technically UB in C++, see below)
}
The simplest way to get this is to memcpy the double into an array of char:
char double_as_char[sizeof(double)];
memcpy(double_as_char, &noToConvert, sizeof(double_as_char));
and then extract the bits from double_as_char. Both the C and C++ standards define this as legal.
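A sketch of that extraction (double_bits is my own helper; the byte order within double_as_char depends on the host, so this assumes a little-endian machine and walks the bytes backwards to get the most significant bit first):
#include <cstring>
#include <string>

std::string double_bits(double d) {
    unsigned char bytes[sizeof d];
    std::memcpy(bytes, &d, sizeof d);
    std::string bits;
    for (int i = sizeof d - 1; i >= 0; --i)   // last byte first on little-endian hosts
        for (int b = 7; b >= 0; --b)          // bits within a byte, MSB first
            bits += ((bytes[i] >> b) & 1) ? '1' : '0';
    return bits;
}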
Now, if you want to actually extract the various components of a double, you can use the following:
#include <cmath>

bool sign = std::signbit(noToConvert);   // also reports the sign of -0.0 correctly
int exponent;
double normalized_mantissa = std::frexp(std::fabs(noToConvert), &exponent);
unsigned long long mantissa = normalized_mantissa * (1ull << 53);
Since the value returned by frexp is in [0.5, 1), you need to shift it one extra bit to get all the bits of the mantissa as an integer. Then you just need to map that into the binary representation you want, although you'll have to adjust the exponent to include the implicit bias as well.
The function print_raw_double_binary() in my article Displaying the Raw Fields of a Floating-Point Number should be close to what you want. You'd probably want to replace the casting of double to int with a union, since the former violates "strict aliasing" (although even using a union to access something other than what was stored is technically illegal).