Portable conversion of long to double in C++

I need to accurately convert a long representing bits to a double, and my solution should be portable across architectures (being standard across compilers such as g++ and clang++ would be great too).
I'm writing a fast approximation of the exp function, as suggested in the answers to this question.
double fast_exp(double val)
{
    double result = 0;
    unsigned long temp = (unsigned long)(1512775 * val + 1072632447);
    /* to convert from long bits to double,
       but must check if they have the same size... */
    temp = temp << 32;
    memcpy(&result, &temp, sizeof(temp));
    return result;
}
and I'm using the suggestion found here to convert the long into a double. The issue I'm facing is that, whereas I get the following results for int values in [-5, 5] under OS X with clang++ and libc++:
0.00675211846828461
0.0183005779981613
0.0504353642463684
0.132078289985657
0.37483024597168
0.971007823944092
2.7694206237793
7.30961990356445
20.3215942382812
54.8094177246094
147.902587890625
I always get 0 under Ubuntu with clang++ (3.4, same version) and libstdc++. The compiler there even tells me (through a warning) that the shift is problematic, since long has a size less than or equal to the shift amount (indicating that long and double probably do not have the same size there).
Am I doing something wrong, and/or is there a better way to solve the problem while staying as portable as possible?

First off, using long isn't portable. Use the fixed-width integer types found in stdint.h; then you'll know exactly how wide the integer is, which removes the need to check sizes.
The reason you are getting a warning is that left-shifting a 32-bit integer by 32 bits is undefined behavior. See: What's bad about shifting a 32-bit variable 32 bits?
Also see this answer: Is it safe to assume sizeof(double) >= sizeof(void*)? It should be safe to assume that a double is 64 bits, and then you can use a uint64_t to store the raw bits. No need to check sizes, and everything is portable.
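Putting both suggestions together, a minimal sketch of the portable version (assuming a 64-bit IEEE double, checked at compile time with a C++11 static_assert):

#include <cstdint>
#include <cstring>

double fast_exp(double val)
{
    // uint64_t makes the 32-bit shift well defined, no matter how wide long is
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this bit trick requires a 64-bit double");
    std::uint64_t bits = (std::uint64_t)(1512775 * val + 1072632447);
    bits <<= 32;  // move the computed pattern into the high word of the double
    double result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}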

Related

Conversion of float to integer in ARM based system

I have the following piece of code, called main.cpp, that converts an IEEE 754 32-bit hex value to float and then converts it to unsigned short.
#include <iostream>
using namespace std;

int main() {
    unsigned int input_val = 0xc5dac022;
    float f;
    *((int*) &f) = input_val;
    unsigned short val = (unsigned short) f;
    cout << "Val = 0x" << std::hex << val << endl;
}
I build and run the code using the following command:
g++ main.cpp -o main
./main
When I run the code on my normal PC, I get the correct answer, which is 0xe4a8. But when I run the same code on an ARM processor, it gives an output of 0x0.
Is this happening because I am building the code with normal gcc instead of aarch64? The code gives correct output for some other test cases on the ARM processor but gives an incorrect output for the given test value. How can I solve this issue?
First, your "type pun" via pointers violates the strict aliasing rule, as mentioned in comments. You can fix that by switching to memcpy.
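For illustration, a minimal sketch of the same program with the pun done through memcpy (assuming, as the rest of this answer does, that unsigned int and float are both 32 bits):

#include <cstring>
#include <iostream>

int main() {
    unsigned int input_val = 0xc5dac022;
    float f;
    std::memcpy(&f, &input_val, sizeof(f));  // well-defined bit copy
    std::cout << f << std::endl;             // prints roughly -7000
}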
Next, the bit pattern 0xc5dac022, interpreted as an IEEE-754 single-precision float, corresponds to a value of about -7000, if my test is right. This is truncated to -7000, which, being negative, cannot be represented in an unsigned short. As such, attempting to convert it to unsigned short has undefined behavior, per [7.3.10 p1] in the C++ standard (C++20 N4860). Note this is different from the situation of converting a signed or unsigned integer to unsigned short, which would have well-defined "wrapping" behavior.
So there is no "correct answer" here. Printing 0 is a perfectly legal result, and is also logical in some sense, as 0 is the closest unsigned short value to -7000. But it's also not surprising that the result would vary between platforms / compilers / optimization options, as this is common for UB.
There is actually a difference between ARM64 and x86-64 that explains why this is the particular behavior you see.
When compiling without optimization, in both cases, gcc emits instructions to actually convert the float value to unsigned short at runtime.
ARM64 has a dedicated instruction fcvtzu that converts a float to a 32-bit unsigned int, so gcc emits that instruction, and then extracts the low 16 bits of the integer result. The behavior of fcvtzu with a negative input is to output 0, and so that's the value that you get.
x86-64 doesn't have such an instruction. The nearest thing is cvttss2si, which converts a single-precision float to a signed 32-bit integer. So gcc emits that instruction, then uses the low 16 bits of the result as the unsigned short value. This gives the right answer whenever the input float is in the range [0, 65536), because all such values fit in the range of a 32-bit signed integer. GCC doesn't care what it does in all other cases, because they are UB according to the C++ standard. But it so happens that, since your value -7000 does fit in a signed int, cvttss2si returns the signed integer -7000, which is 0xffffe4a8. Extracting the low 16 bits gives you the 0xe4a8 that you observed.
When optimizing, gcc on both platforms optimizes the value into a constant 0. Which is also perfectly legal.
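If the clamp-to-zero behavior is what you actually want on every platform, one way to get it portably is to range-check before converting. The helper below is hypothetical (not part of the original code) and assumes a 16-bit unsigned short:

#include <cmath>

unsigned short to_ushort_saturating(float f)
{
    // clamp out-of-range and NaN inputs so the cast itself is always defined
    if (std::isnan(f) || f <= 0.0f)
        return 0;
    if (f >= 65535.0f)
        return 65535;
    return static_cast<unsigned short>(f);
}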

Converting string to int fails

I'm trying to convert a string to an int with stringstream. The code below works, but if I use a number larger than 1234567890, like 12345678901, then it returns 0. I don't know how to fix that; please help me out.
std::string number = "1234567890";
int Result; // number which will contain the result
std::stringstream convert(number.c_str()); // stringstream used for the conversion, initialized with the contents of number
if (!(convert >> Result)) // give the value to Result using the characters in the string
    Result = 0;
printf("%d\n", Result);
The maximum number an int can contain is slightly more than 2 billion (assuming ubiquitous 32-bit ints).
It just doesn't fit in an int!
The largest unsigned int (on a 32-bit platform) is 2^32 - 1 (4294967295), and your input is larger than that, so the extraction gives up. I'm guessing you can get an error indication from it somehow; maybe check failbit or badbit?
int Result;
std::stringstream convert(number.c_str());
convert >> Result;
if (convert.fail()) {
    std::cout << "Bad things happened";
}
If you're on a 32-bit or LP64 64-bit system then int is 32-bit so the largest number you can store is approximately 2 billion. Try using a long or long long instead, and change "%d" to "%ld" or "%lld" appropriately.
The (usual) maximum value for a signed int is 2,147,483,647, as it is (usually) a 32-bit integer, so the conversion fails for numbers which are bigger.
If you replace int Result; with long Result; it should work for bigger numbers on platforms where long is 64 bits, but there is still a limit. You can extend that limit by a factor of 2 by using unsigned integer types, but only if you don't need negative numbers.
Hm, lots of misinformation in the existing four or five answers.
An int is minimum 16 bits, and with common desktop-system compilers it's usually 32 bits (in all Windows versions) or 64 bits. With 32 bits it has at most 2^32 distinct values, which, setting K = 2^10 = 1024, is 4·K^3, i.e. roughly 4 billion. Your nearest calculator or Python prompt can tell you the exact value.
A long is minimum 32 bits, but that doesn’t help you for the current problem, because in all extant Windows variants, including 64-bit Windows, long is 32 bits…
So, for better range than int, use long long. It's minimum 64 bits, and in practice, as of 2012, it's 64 bits with all compilers. Or, just use a double, which, although not an integer type, can with the most common implementation (IEEE 754 64-bit) represent integer values exactly up to 2^53, since the significand has 52 explicitly stored bits plus an implicit leading bit.
Anyway, remember to check the stream for conversion failure, which you can do by s.fail() or simply !s (which is equivalent to fail(), more precisely, the stream’s explicit conversion to bool returns !fail()).
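Putting that together, a minimal sketch using long long and an explicit failure check:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string number = "12345678901";  // too big for a 32-bit int
    std::istringstream convert(number);
    long long result = 0;                // at least 64 bits
    if (!(convert >> result))            // same test as convert.fail()
        std::cout << "Bad things happened\n";
    else
        std::cout << result << "\n";     // prints 12345678901
}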

What can go wrong in following code - and compile time requirements?

First let me say I know the following code will be considered "bad" practice... but I'm limited by the environment a "little" bit:
In a dynamic library I wish to use "pointers" (to point to classes); however, the program that will use this DLL can only pass and receive doubles. So I need to "fit" the pointer in a double. The following code tries to achieve this, which I hope works in a 64-bit environment:
EXPORT double InitializeClass() {
    SampleClass* pNewObj = new SampleClass;
    double ret;
    unsigned long long tlong(reinterpret_cast<unsigned long long>(pNewObj));
    memcpy(&ret, &tlong, sizeof(tlong));
    return ret;
}

EXPORT double DeleteClass(double i) {
    unsigned long long tlong;
    memcpy(&tlong, &i, sizeof(i));
    SampleClass* ind = reinterpret_cast<SampleClass*>(tlong);
    delete ind;
    return 0;
}
Now once again, I realize I might have been better off using a vector and storing the pointers inside it. However, I really wish to do this using pointers (as an alternative). So can anyone tell me possible failures/better versions?
The obvious failure is if double and unsigned long long don't have the same size (or if pointers are longer than 64 bits). Is there a method to check this at compile time, and give a compile error if the sizes aren't the same?
In theory, at least, a 64-bit pointer, type-punned to a 64-bit IEEE double, could result in a trapping NaN, which would in turn trap. In practice, this might not be a problem; my attempts to get trapping NaNs to actually do something other than be ignored have not been very successful.

Another possible problem is that the values might not be normalized (and in fact, probably won't be). What the hardware does with non-normalized values depends: it could just pass them on transparently, silently normalize them (changing the value of the "pointer"), or trigger some sort of runtime error.

There's also the issue of aliasing. Accessing a pointer through an lvalue of type double is undefined behavior, and many compilers will take advantage of this when optimizing, assuming that changes through a double* or a double& reference cannot affect any pointers (moving the load of the pointer before the write of the double, or not reloading the pointer after a modification of the double).

In practice, if you're working in an Intel environment, I think all "64-bit" pointers will in fact have the upper 16 bits 0. This is where the exponent lives in an IEEE double, and an exponent of 0 is a gradual underflow, which won't trap (at least with the default modes), and won't be changed. So your code might actually seem to work, as long as the compiler doesn't optimize too much.
assert(sizeof(SampleClass*) <= sizeof(unsigned long long));
assert(sizeof(unsigned long long) <= sizeof(double));
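Since the question explicitly asks for a compile-time error rather than a runtime check, the same conditions can be written with C++11 static_assert (a sketch, assuming a C++11 compiler):

static_assert(sizeof(SampleClass*) <= sizeof(unsigned long long),
              "pointer does not fit in unsigned long long");
static_assert(sizeof(unsigned long long) <= sizeof(double),
              "unsigned long long does not fit in double");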
I would say that you'll have to test it on both 64-bit and 32-bit to make sure it works. Say it does have different behaviour on 64-bit systems; then you could use this approach to get around the problem (since you've mentioned that you're using VS2010):
EXPORT double InitializeClass64() {
    // Assert that the pointer size is the same as the data type being used
    assert(sizeof(void*) == sizeof(double));
    // 64-bit specific code
    return ret;
}

EXPORT double DeleteClass64(double i) {
    // Assert that the pointer size is the same as the data type being used
    assert(sizeof(void*) == sizeof(double));
    // 64-bit specific code
    return 0;
}

EXPORT double InitializeClass32() {
    // Assert that the pointer fits in the data type being used
    assert(sizeof(void*) <= sizeof(double));
    // 32-bit specific code
    return ret;
}

EXPORT double DeleteClass32(double i) {
    // Assert that the pointer fits in the data type being used
    assert(sizeof(void*) <= sizeof(double));
    // 32-bit specific code
    return 0;
}
#if defined(_M_X64) || defined(_M_IA64)
// If it's 64-bit
# define InitializeClass InitializeClass64
# define DeleteClass DeleteClass64
#else
// If it's 32-bit
# define InitializeClass InitializeClass32
# define DeleteClass DeleteClass32
#endif // _M_X64 || _M_IA64
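Alternatively, as the question itself hints, you can sidestep the bit-punning entirely by handing out an index instead of a pointer: small integers round-trip through a double exactly. A hypothetical sketch (g_objects and g_nextId are names invented for illustration):

#include <map>

static std::map<long long, SampleClass*> g_objects;
static long long g_nextId = 1;

EXPORT double InitializeClass() {
    g_objects[g_nextId] = new SampleClass;
    return static_cast<double>(g_nextId++);  // exact for IDs below 2^53
}

EXPORT double DeleteClass(double id) {
    std::map<long long, SampleClass*>::iterator it =
        g_objects.find(static_cast<long long>(id));
    if (it != g_objects.end()) {
        delete it->second;
        g_objects.erase(it);
    }
    return 0;
}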

char* to double and back to char* again (64-bit application)

I am trying to convert a char* to a double and back to a char* again. The following code works fine if the application is 32-bit, but doesn't work for a 64-bit application. The problem occurs when you try to convert back to char* from the int. For example, if hello is 0x000000013fcf7888, then converted is 0x000000003fcf7888; only the last 32 bits are right.
#include <iostream>
#include <stdlib.h>
#include <tchar.h>
using namespace std;

int _tmain(int argc, _TCHAR* argv[]) {
    char* hello = "hello";
    unsigned int hello_to_int = (unsigned int)hello;
    double hello_to_double = (double)hello_to_int;
    cout << hello << endl;
    cout << hello_to_int << "\n" << hello_to_double << endl;
    unsigned int converted_int = (unsigned int)hello_to_double;
    char* converted = reinterpret_cast<char*>(converted_int);
    cout << converted_int << "\n" << converted << endl;
    getchar();
    return 0;
}
On 64-bit Windows, pointers are 64-bit while int is 32-bit. This is why you're losing data in the upper 32 bits when casting. Instead of int, use long long to hold the intermediate result.
char* hello = "hello";
unsigned long long hello_to_int = (unsigned long long)hello;
Make similar changes for the reverse conversion. But this is not guaranteed to make the conversions work correctly, because a double can represent the entire 32-bit integer range without loss of precision, but the same is not true for 64-bit integers.
Also, this isn't going to work
unsigned int converted_int = (unsigned int)hello_to_double;
That conversion will simply truncate any digits after the decimal point in the floating-point value: it converts the value, not the bits. The problem exists even if you change the data type to unsigned long long. You'll need a bit-level reinterpretation (e.g. memcpy into an unsigned long long) to make it work.
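For illustration, here is a sketch of the full round trip done with a 64-bit integer and memcpy, assuming a 64-bit platform (checked with the static_assert):

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    static_assert(sizeof(std::uintptr_t) == sizeof(double),
                  "assumes 64-bit pointers");
    const char* hello = "hello";

    // pointer -> integer -> double: a bit copy, not a value conversion
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(hello);
    double d;
    std::memcpy(&d, &bits, sizeof(d));

    // double -> integer -> pointer
    std::uintptr_t back;
    std::memcpy(&back, &d, sizeof(back));
    const char* converted = reinterpret_cast<const char*>(back);

    std::cout << converted << std::endl;  // "hello" (but see the caveat below)
}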
Even after all that you may still run into trouble, depending on the value of the pointer. The conversion to double may cause the value to be a signalling NaN, for instance, in which case your code might throw an exception.
Simple answer is, unless you're trying this out for fun, don't do conversions like these.
You can't cast a char* to int on 64-bit Windows because an int is 32 bits, while a char* is 64 bits because it's a pointer. Since a double is always 64 bits, you might be able to get away with casting between a double and char*.
A couple of issues with encoding any integer (specifically, a collection of bits) into a floating point value:
Conversions from 64-bit integers to doubles can be lossy. A double has 53 bits of actual precision, so integers above 2^53 will not necessarily be represented precisely.
If you decide to reinterpret the bits of a pointer as a double instead (via union or reinterpret_cast), you will still have issues if you happen to encode a pointer as a set of bits that is not a valid double representation. Unless you can guarantee that the double value never gets written back by the FPU, the FPU can silently transform an invalid double into another invalid double (see NaN), i.e., a double value that represents the same value but has different bits. (See this for issues related to using floating-point formats as bits.)
You can probably safely get away with encoding a 32-bit pointer in a double, as that will definitely fit within the 53-bit precision range.
only the last 32 bits are right.
That's because an int on your platform is only 32 bits long. Note that reinterpret_cast only guarantees that you can convert a pointer to an integer of sufficient size (not your case), and back.
If it works on any system, anywhere, just call yourself lucky and move on. Converting a pointer to an integer is one thing (as long as the integer is large enough, you can get away with it), but a double is a floating-point number. What you are doing simply doesn't make sense, because a double is NOT necessarily capable of representing any arbitrary number; a double has range and precision limitations, and limits on how it represents things. It can represent numbers across a wide range of values, but it can't represent EVERY number in that range.
Remember that a double has two components: the mantissa and the exponent. Together, these allow you to represent either very big or very small numbers, but the mantissa has a limited number of bits. If you run out of bits in the mantissa, you're going to lose some bits in the number you are trying to represent.
Apparently you got away with it under certain circumstances, but you're asking it to do something it wasn't made for, and for which it is manifestly inappropriate.
Just don't do that - it's not supposed to work.
This is as expected.
Typically a char* is going to be 32 bits on a 32-bit system, 64 bits on a 64-bit system; double is typically 64 bits on both systems. (These sizes are typical, and probably correct for Windows; the language permits a lot more variations.)
Conversion from a pointer to a floating-point type is, as far as I know, undefined. That doesn't just mean that the result of the conversion is undefined; the behavior of a program that attempts to perform such a conversion is undefined. If you're lucky, the program will crash or fail to compile.
But you're converting from a pointer to an integer (which is permitted, but implementation-defined) and then from an integer to a double (which is permitted and meaningful for meaningful numeric values -- but converted pointer values are not numerically meaningful). You're losing information because not all of the 64 bits of a double are used to represent the magnitude of the number; typically 11 or so bits are used to represent the exponent.
What you're doing quite simply makes no sense.
What exactly are you trying to accomplish? Whatever it is, there's surely a better way to do it.

Why do I get a "constant too large" error?

I'm new to Windows development and I'm pretty confused.
When I compile this code with Visual C++ 2010, I get an error "constant too large." Why do I get this error, and how do I fix it?
Thanks!
int _tmain(int argc, _TCHAR* argv[])
{
    unsigned long long foo = 142385141589604466688ULL;
    return 0;
}
The digit sequence you're expressing would take about 67 bits, but your "unsigned long long" type takes only (!) 64 bits, so your digit sequence won't fit in it.
If you regularly need to deal with integers that won't fit in 64 bits, you might want to look at languages that smoothly support them, such as Python (maybe with gmpy;-). Or, give up on language support and go for suitable libraries, such as GMP and MPIR!-)
A long long is 64 bits, and thus holds a maximum value of 2^63 - 1 (9223372036854775807) as a signed value, or 2^64 - 1 (18446744073709551615) as an unsigned value. Your value is bigger, hence it's a constant that's too large.
Pick a different data type to hold your value.
You get the error because your constant is too large.
From Wikipedia:
An unsigned long long's max value is at least 18,446,744,073,709,551,615
Here is the max value and your value:
18,446,744,073,709,551,615 // Max value
142,385,141,589,604,466,688 // Your value
See why your value is too long?
According to http://msdn.microsoft.com/en-us/library/s3f49ktz%28VS.100%29.aspx, the range of unsigned long long is 0 to 18,446,744,073,709,551,615.
142385141589604466688 > 18446744073709551615
You have reached the limit of your hardware's ability to represent integers directly. Beyond 64 bits (on your hardware), integers have to be simulated by software constructs. There are several projects out there that help.
See BigInt
http://sourceforge.net/projects/cpp-bigint/
Note: others have misconstrued long long as having a limit of 64 bits. This is not accurate. (Also note: currently C++ does not officially support long long, but C does; it is an extension provided by your compiler, coming in the next version of the standard.) The only limitations placed by the language are:
sizeof(long) <= sizeof(long long)
sizeof(long long) * CHAR_BIT >= 64 // Not defined explicitly, but deducible from
                                   // the values defined in limits.h
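With a C++11 compiler, those deductions can be turned into compile-time checks; a small sketch that also prints the actual limits on your platform:

#include <climits>   // CHAR_BIT, ULLONG_MAX
#include <iostream>

int main()
{
    static_assert(sizeof(long) <= sizeof(long long),
                  "guaranteed by the standard");
    static_assert(sizeof(long long) * CHAR_BIT >= 64,
                  "long long is at least 64 bits");
    std::cout << "long long is " << sizeof(long long) * CHAR_BIT << " bits\n"
              << "max unsigned long long: " << ULLONG_MAX << "\n";
}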
For more details, see:
What is the difference between an int and a long in C++?