Why doesn't strtoul work as expected? - c++

Hi, I wrote a small test program to check the function I wrote to convert a string (a hexadecimal number) into an unsigned integer, and I found that the code behaves differently depending on the compiler or system I use.
I compiled the code below on:
(1) ideone C++4.3.2 https://ideone.com/LlcNWw
(2) g++ 4.4.7 on a centos6 (64bits)
(3) g++ 4.6.3 on an ubuntu12 (64bits)
(4) g++ 4.9.3 in a cygwin (32bits) environment
As expected, (1) and (4) return the following, which is exactly the correct result, as the first value '0x210000000' is too big for a 32-bit value:
Error while converting Id (0x210000000).
success
but (2) and (3) return
success
success
So the question is: why does the same simple C code, built on different platforms with different compilers, not return the same result... and why doesn't 'strtoul("0x210000000", ....)' set 'errno' to 'ERANGE' to indicate that bits 33 to 37 are out of range?
More trace on platform (3) gives:
Id (0x210000000) as ul = 0x10000000 - str_end - errno 0.
success
Id (0x10000000) as ul = 0x10000000 - str_end - errno 0.
success
/* strtoul example */
#include <stdio.h>      /* printf, NULL */
#include <stdlib.h>     /* strtoul */
#include <errno.h>

signed int GetIdentifier(const char* idString)
{
    char *str_end;
    int id = -1;

    errno = 0;
    id = strtoul(idString, &str_end, 16);
    if (*str_end != '\0' || (errno == ERANGE))
    {
        printf("Error while converting Id (%s).\n", idString);
        return -1;
    }

    // Return an error if the converted Id is more than 29 bits
    if (id > 0x1FFFFFFF)
    {
        printf("Error: Id (%s) should fit on 29 bits (maximum value: 0x1FFFFFFF).\n", idString);
        return -1;
    }

    printf("success\n");
    return id;
}

int main()
{
    GetIdentifier("0x210000000");
    GetIdentifier("0x10000000");
    return 0;
}

The value 0x210000000 needs more than 32 bits, and on 32-bit systems long is usually 32 bits, which means you can't use strtoul to convert the string correctly. You need to use strtoull and unsigned long long, which is guaranteed to be at least 64 bits.
Of course, long long and strtoull were introduced in C99, so you might need to add e.g. -std=c99 (or use a later standard like C11) to have it build correctly.
The problem, it seems, is that you assume that long is always 32 bits, when in fact it's defined to be at least 32 bits. See e.g. this reference for the minimum bit sizes of the standard integer types.
On some platforms and compilers, long can be bigger than 32 bits. Linux on 64-bit hardware is a typical such platform where long is bigger, namely 64 bits, which is of course big enough to fit 0x210000000, so strtoul does not report an error.
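A minimal sketch of the suggested fix, assuming the IDs still have to fit in 29 bits as in the original code and that a C99-or-later compiler is available:

/* Sketch: parse with strtoull so the conversion itself cannot be truncated
   on platforms where long is only 32 bits. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int GetIdentifier(const char *idString)
{
    char *str_end;
    errno = 0;
    unsigned long long id = strtoull(idString, &str_end, 16);

    if (str_end == idString || *str_end != '\0' || errno == ERANGE) {
        printf("Error while converting Id (%s).\n", idString);
        return -1;
    }
    if (id > 0x1FFFFFFF) {   /* reject anything that doesn't fit in 29 bits */
        printf("Error: Id (%s) should fit on 29 bits (maximum value: 0x1FFFFFFF).\n", idString);
        return -1;
    }
    printf("success\n");
    return (int)id;
}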

Your code is also incorrect in assuming a successful call will not change the value of errno. Per the Linux errno man page:
The <errno.h> header file defines the integer variable errno, which is set by system calls and some library functions in the event of an error to indicate what went wrong. Its value is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno.
(POSIX does place greater restrictions on errno modification by successful calls, but Linux doesn't strictly adhere to POSIX in many cases, and after all, GNU's Not Unix...)
The strtoul man page states:
The strtoul() function returns either the result of the conversion or, if there was a leading minus sign, the negation of the result of the conversion represented as an unsigned value, unless the original (nonnegated) value would overflow; in the latter case, strtoul() returns ULONG_MAX and sets errno to ERANGE. Precisely the same holds for strtoull() (with ULLONG_MAX instead of ULONG_MAX).
Unless strtoul returned ULONG_MAX, the value of errno after a call to strtoul is indeterminate.
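A minimal sketch of the checking order that follows from this, if you stay with strtoul:

/* Sketch: consult errno only when the return value (ULONG_MAX) indicates
   that an overflow may have occurred. */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *s = "0x210000000";
    char *end;

    errno = 0;
    unsigned long v = strtoul(s, &end, 16);

    if (v == ULONG_MAX && errno == ERANGE)
        printf("out of range for unsigned long\n");
    else if (end == s || *end != '\0')
        printf("not a valid number: %s\n", s);
    else
        printf("value = %#lx\n", v);
    return 0;
}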

Related

Conversion of float to integer in ARM based system

I have the following piece of code, called main.cpp, that converts an IEEE 754 32-bit hex value to a float and then converts it into an unsigned short.
#include <iostream>
using namespace std;

int main() {
    unsigned int input_val = 0xc5dac022;
    float f;
    *((int*) &f) = input_val;
    unsigned short val = (unsigned short) f;
    cout << "Val = 0x" << std::hex << val << endl;
}
I build and run the code using the following command:
g++ main.cpp -o main
./main
When I run the code on my normal PC, I get the correct answer, which is 0xe4a8. But when I run the same code on an ARM processor, it gives an output of 0x0.
Is this happening because I am building the code with normal gcc instead of aarch64? The code gives correct output for some other test cases on the ARM processor but gives an incorrect output for the given test value. How can I solve this issue?
First, your "type pun" via pointers violates the strict aliasing rule, as mentioned in comments. You can fix that by switching to memcpy.
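For instance, a minimal sketch of the memcpy version of that bit copy (the printed value is approximate):

// Sketch: copy the bits instead of aliasing a float through an int*,
// which keeps the type pun well-defined.
#include <cstring>
#include <iostream>

int main() {
    unsigned int input_val = 0xc5dac022;
    float f;
    static_assert(sizeof f == sizeof input_val, "expects a 32-bit float");
    std::memcpy(&f, &input_val, sizeof f);   // well-defined bit copy
    std::cout << f << '\n';                  // prints roughly -7000
}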
Next, the bit pattern 0xc5dac022 as an IEEE-754 single-precision float corresponds to a value of about -7000, if my test is right. This is truncated to -7000, which, being negative, cannot be represented in an unsigned short. As such, attempting to convert it to unsigned short has undefined behavior, per [7.3.10 p1] in the C++ standard (C++20 N4860). Note this is different from the situation of converting a signed or unsigned integer to unsigned short, which would have well-defined "wrapping" behavior.
So there is no "correct answer" here. Printing 0 is a perfectly legal result, and is also logical in some sense, as 0 is the closest unsigned short value to -7000. But it's also not surprising that the result would vary between platforms / compilers / optimization options, as this is common for UB.
There is actually a difference between ARM64 and x86-64 that explains why this is the particular behavior you see.
When compiling without optimization, in both cases, gcc emits instructions to actually convert the float value to unsigned short at runtime.
ARM64 has a dedicated instruction fcvtzu that converts a float to a 32-bit unsigned int, so gcc emits that instruction, and then extracts the low 16 bits of the integer result. The behavior of fcvtzu with a negative input is to output 0, and so that's the value that you get.
x86-64 doesn't have such an instruction. The nearest thing is cvttss2si, which converts a single-precision float to a signed 32-bit integer. So gcc emits that instruction, then uses the low 16 bits of the result as the unsigned short value. This gives the right answer whenever the input float is in the range [0, 65536), because all those values fit in the range of a 32-bit signed integer. GCC doesn't care what it does in all other cases, because they are UB according to the C++ standard. But it so happens that, since your value -7000 does fit in a signed int, cvttss2si returns the signed integer -7000, which is 0xffffe4a8. Extracting the low 16 bits gives you the 0xe4a8 that you observed.
When optimizing, gcc on both platforms optimizes the value into a constant 0. Which is also perfectly legal.
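If a predictable result on both platforms is the goal, one option (a sketch, not something prescribed by the standard or the answer above) is to clamp the value into range before converting:

// Sketch: saturate to the unsigned short range so the conversion is
// well-defined for any float input, matching the ARM "negative -> 0" result.
#include <climits>

unsigned short to_ushort_saturating(float f)
{
    if (!(f > 0.0f))                  return 0;          // negative, zero, or NaN
    if (f >= (float)USHRT_MAX + 1.0f) return USHRT_MAX;  // clamp at the top
    return static_cast<unsigned short>(f);               // value fits: defined truncation
}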

comparison between signed and unsigned integer expressions and 0x80000000

I have the following code:
#include <iostream>
using namespace std;

int main()
{
    int a = 0x80000000;
    if (a == 0x80000000)
        a = 42;
    cout << "Hello World! :: " << a << endl;
    return 0;
}
The output is
Hello World! :: 42
so the comparison works. But the compiler tells me
g++ -c -pipe -g -Wall -W -fPIE -I../untitled -I. -I../bin/Qt/5.4/gcc_64/mkspecs/linux-g++ -o main.o ../untitled/main.cpp
../untitled/main.cpp: In function 'int main()':
../untitled/main.cpp:8:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if(a == 0x80000000)
^
So the question is: Why is 0x80000000 an unsigned int? Can I make it signed somehow to get rid of the warning?
As far as I understand, 0x80000000 would be INT_MIN, as it's out of range for a positive int. But why is the compiler assuming that I want a positive number?
I'm compiling with gcc version 4.8.1 20130909 on linux.
0x80000000 is an unsigned int because the value is too big to fit in an int and you did not add any L to specify it was a long.
The warning is issued because unsigned in C/C++ has quite weird semantics, and therefore it's very easy to make mistakes in code by mixing up signed and unsigned integers. This mixing is often a source of bugs, especially because the standard library, by historical accident, chose to use an unsigned value for the size of containers (size_t).
As an example I often use to show how subtle the problem is, consider:
// Draw connecting lines between the dots
for (int i = 0; i < pts.size() - 1; i++) {
    draw_line(pts[i], pts[i+1]);
}
This code seems fine but has a bug. If the pts vector is empty, pts.size() is 0 but, and here comes the surprising part, pts.size()-1 is a huge nonsense number (today often 4294967295, but it depends on the platform), and the loop will use invalid indexes (with undefined behavior).
Here changing the variable to size_t i will remove the warning but is not going to help as the very same bug remains...
The core of the problem is that with unsigned values a < b-1 and a+1 < b are not the same thing, even for very commonly used values like zero; this is why using unsigned types for non-negative values like container sizes is a bad idea and a source of bugs.
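A minimal sketch of the usual workaround, rewriting the condition so nothing is ever subtracted from an unsigned zero (Point and draw_line are stand-ins for whatever the real code uses):

// Sketch: i + 1 < pts.size() cannot underflow, so the loop is safe
// even when the vector is empty.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Point { double x, y; };

static void draw_line(const Point& a, const Point& b)
{
    std::printf("line (%g,%g)-(%g,%g)\n", a.x, a.y, b.x, b.y);
}

int main()
{
    std::vector<Point> pts;                      // empty on purpose
    for (std::size_t i = 0; i + 1 < pts.size(); i++)
        draw_line(pts[i], pts[i + 1]);           // body never runs, no underflow
}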
Also note that your code is not correct, portable C++ on platforms where that value doesn't fit in an int: the behavior around overflow is defined for unsigned types but not for regular signed integers, and converting an out-of-range value to a signed type is implementation-defined. C++ code that relies on what happens when a signed integer gets past the limits has undefined behavior.
Even if you know what happens on a specific hardware platform, note that the compiler/optimizer is allowed to assume that signed integer overflow never happens: for example, a test like a < a+1, where a is a regular int, can be considered always true by a C++ compiler.
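A compiled sketch of that claim (nothing here is specific to the question's code):

// Sketch: an optimizing compiler may fold this to "return true",
// because it assumes a + 1 never overflows for a signed int.
bool always_true(int a) { return a < a + 1; }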
It seems you are confusing two different issues: the encoding of something and the meaning of something. Here is an example: you see the number 97. This is a decimal encoding. But the meaning of this number is something completely different: it can denote the ASCII 'a' character, a very hot temperature, a geometric angle in a triangle, etc. You cannot deduce meaning from encoding; someone must supply a context to you (like the ASCII map, a temperature, etc.).
Back to your question: 0x80000000 is encoding, while INT_MIN is meaning. They are not interchangeable and not comparable. On specific hardware, in some contexts, they might be equal, just like 97 and 'a' are equal in the ASCII context.
The compiler warns you about ambiguity in the meaning, not in the encoding. One way to give meaning to a specific encoding is a cast, like (unsigned short)-17 or (student*)ptr.
On a 32-bit system, or a 64-bit system with backward compatibility, int and unsigned int have a 32-bit encoding like 0x80000000, but on a system with a 64-bit int, INT_MIN would not be equal to this number.
Anyway, the answer to your question: in order to remove the warning you must give identical context to both the left and right expressions of the comparison.
You can do it in many ways. For example:
(unsigned int)a == (unsigned int)0x80000000 or (__int64)a == (__int64)0x80000000 or even a crazy (char *)a == (char *)0x80000000 or any other way as long as you maintain the following rules:
You don't demote the encoding (do not reduce the number of bits it requires). For example, (char)a == (char)0x80000000 is incorrect because you demote 32 bits to 8 bits.
You must give both the left side and the right side of the == operator the same context. For example, (char *)a == (unsigned short)0x80000000 is incorrect and will yield an error/warning.
I want to give you another example of how crucial the difference between encoding and meaning is. Look at this code:
char a = -7;
bool b = (a==-7) ? true : false;
What is the result of 'b'? The answer may shock you: it is implementation-defined.
With some compilers (typically Microsoft Visual Studio) b will be true, while with the Android NDK compilers b will be false.
The reason is that the Android NDK treats 'char' as 'unsigned char', while Visual Studio treats 'char' as 'signed char'. So on Android phones the encoding of -7 actually has the meaning 249 and is not equal to the meaning of (int)-7.
The correct way to fix this problem is to specifically define 'a' as signed char:
signed char a = -7;
bool b = (a==-7) ? true : false;
0x80000000 is considered unsigned by default.
You can avoid the warning like this:
if (a == (int)0x80000000)
    a = 42;
Edit after a comment:
Another (perhaps better) way would be
if ((unsigned)a == 0x80000000)
    a = 42;

Why do I get wrong conversion from hex to decimal with strtoul function in Visual Studio compiler?

I am converting a string from hex to decimal. The problem is that with the Visual Studio compiler the conversion returns a wrong value. However, when I compile the same code on a Mac from the terminal using the g++ compiler, the value is returned correctly.
Why is this happening?
#include <string>
#include <iostream>
#include <cstdlib>   // for strtoul
using namespace std;

int main()
{
    string hex = "412ce69800";
    unsigned long n = strtoul( hex.c_str(), nullptr, 16 );

    cout << "The value to convert is: " << hex << " hex\n\n";
    cout << "The converted value is: " << n << " dec\n\n";
    cout << "The converted value should be: " << "279926183936 dec\n\n";

    return 0;
}
Because in Windows long is a 32-bit type, unlike most Unix/Linux implementations, which use the LP64 memory model in which long is 64 bits. The number 0x412ce69800 needs 39 bits, so it inherently can't be stored in a 32-bit type. Read the compiler warnings and you'll see the issue immediately.
The C standard only requires long to have at least 32 bits. C99 added a new long long type with at least 64 bits, and that's guaranteed on all platforms. So if your value is in a 64-bit type's range, use unsigned long long or uint64_t/uint_least64_t and strtoull instead to get the correct value.
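A minimal sketch of that change, assuming the value always fits in 64 bits:

// Sketch: strtoull parses into unsigned long long, which is at least
// 64 bits everywhere, so the result no longer depends on sizeof(long).
#include <cstdlib>
#include <iostream>
#include <string>

int main()
{
    std::string hex = "412ce69800";
    unsigned long long n = std::strtoull(hex.c_str(), nullptr, 16);
    std::cout << "The converted value is: " << n << " dec\n";   // 279926183936
}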

Converting string to int fails

I'm trying to convert a string to an int with stringstream. The code below works, but if I use a number longer than 1234567890, like 12345678901, then it returns 0. I don't know how to fix that; please help me out.
std::string number = "1234567890";
int Result;                                  // number which will contain the result
std::stringstream convert(number.c_str());   // stringstream used for the conversion, initialized with the contents of number
if ( !(convert >> Result) )                  // give the value to Result using the characters in the string
    Result = 0;
printf("%d\n", Result);
The maximum number an int can contain is slightly more than 2 billion (assuming the ubiquitous 32-bit int).
It just doesn't fit in an int!
The largest unsigned int (on a typical platform with 32-bit int) is 2^32 - 1 (4294967295), and your input is larger than that, so the extraction gives up. I'm guessing you can get an error code from it somehow. Maybe check failbit or badbit?
int Result;
std::stringstream convert(number.c_str());
convert >> Result;
if (convert.fail()) {
    std::cout << "Bad things happened";
}
If you're on a 32-bit or LP64 64-bit system then int is 32-bit so the largest number you can store is approximately 2 billion. Try using a long or long long instead, and change "%d" to "%ld" or "%lld" appropriately.
The (usual) maximum value for a signed int is 2,147,483,647, as it is (usually) a 32-bit integer, so the conversion fails for numbers which are bigger.
If you replace int Result; with long Result; it should work for even bigger numbers, but there is still a limit. You can extend that limit by a factor of 2 by using unsigned integer types, but only if you don't need negative numbers.
Hm, lots of misinformation in the existing four or five answers.
An int is a minimum of 16 bits, and with common desktop system compilers it's usually 32 bits (in all Windows versions) or 64 bits. With 32 bits it has a maximum of 2^32 distinct values, which, setting K = 2^10 = 1024, is 4·K^3, i.e. roughly 4 billion. Your nearest calculator or Python prompt can tell you the exact value.
A long is a minimum of 32 bits, but that doesn't help you for the current problem, because in all extant Windows variants, including 64-bit Windows, long is 32 bits…
So, for better range than int, use long long. It's a minimum of 64 bits, and in practice, as of 2012, it's 64 bits with all compilers. Or just use a double, which, although not an integer type, with the most common implementation (IEEE 754 64-bit) can represent integer values exactly up to 2^53, since it has a 53-bit significand.
Anyway, remember to check the stream for conversion failure, which you can do via s.fail() or simply !s (which is equivalent to s.fail(); more precisely, the stream's explicit conversion to bool returns !fail()).
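A minimal sketch combining both suggestions (long long for the range, plus an explicit failure check):

// Sketch: parse into long long so 12345678901 fits, and test the stream
// instead of assuming the extraction succeeded.
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string number = "12345678901";
    std::stringstream convert(number);

    long long result = 0;
    if (!(convert >> result)) {
        std::cout << "conversion failed\n";
        return 1;
    }
    std::cout << result << '\n';   // prints 12345678901
}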

Why do I get a "constant too large" error?

I'm new to Windows development and I'm pretty confused.
When I compile this code with Visual C++ 2010, I get an error "constant too large." Why do I get this error, and how do I fix it?
Thanks!
int _tmain(int argc, _TCHAR* argv[])
{
    unsigned long long foo = 142385141589604466688ULL;
    return 0;
}
The digit sequence you're expressing would take about 67 bits -- maybe your "unsigned long long" type takes only (!) 64 bits, so your digit sequence won't fit in it, etc, etc.
If you regularly need to deal with integers that won't fit in 64 bits you might want to look at languages that smoothly support them, such as Python (maybe with gmpy;-). Or, give up on language support and go for suitable libraries, such as GMP and MPIR!-)
A long long is typically 64 bits, and thus holds a maximum value of 2^63 - 1 (9223372036854775807) as a signed value and 2^64 - 1 (18446744073709551615) as an unsigned value. Your value is bigger, hence it's a constant value that's too large.
Pick a different data type to hold your value.
You get the error because your constant is too large.
From Wikipedia:
An unsigned long long's max value is at least 18,446,744,073,709,551,615
Here is the max value and your value:
18,446,744,073,709,551,615 // Max value
142,385,141,589,604,466,688 // Your value
See why your value is too large?
According to http://msdn.microsoft.com/en-us/library/s3f49ktz%28VS.100%29.aspx, the range of unsigned long long is 0 to 18,446,744,073,709,551,615.
142385141589604466688 > 18446744073709551615
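If the value really has to come from input at run time, one workaround sketch (not taken from the answers above) is to keep it as a string and let strtoull report the overflow:

/* Sketch: strtoull clamps to ULLONG_MAX and sets errno to ERANGE when the
   decimal string does not fit in 64 bits. */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    errno = 0;
    unsigned long long v = strtoull("142385141589604466688", NULL, 10);
    if (v == ULLONG_MAX && errno == ERANGE)
        printf("value does not fit in unsigned long long\n");
    return 0;
}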
You have reached the limit of your hardware to represent integers directly.
It seems that going beyond 64 bits (on your hardware) requires the integer to be simulated by software constructs. There are several projects out there that help.
See BigInt
http://sourceforge.net/projects/cpp-bigint/
Note: others have misconstrued that long long has a limit of 64 bits.
This is not accurate. The only limitations placed by the language are:
(Also note: at the time this was written, C++ did not yet support long long (but C did); it was a compiler extension, standardized in the next version of the standard, C++11.)
sizeof(long) <= sizeof(long long)
sizeof(long long) * CHAR_BIT >= 64   // Not defined explicitly, but deducible from
                                     // the values defined in limits.h
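A tiny sketch of those guarantees written as compile-time checks (needs C++11 or later for static_assert):

#include <climits>
static_assert(sizeof(long) <= sizeof(long long), "long long is at least as wide as long");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long has at least 64 bits");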
For more details, see:
What is the difference between an int and a long in C++?