How to convert a decimal number into a 64-bit binary float number? - C++

I need to convert a decimal number into a 64-bit binary float value.
If you know any algorithm or anything about it, please help.

Use boost::lexical_cast:
double num = boost::lexical_cast<double>(decimal);
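For completeness, a minimal self-contained sketch (the variable names are illustrative); note that boost::lexical_cast throws boost::bad_lexical_cast when the string is not a valid number:
#include <boost/lexical_cast.hpp>
#include <iostream>
#include <string>

int main() {
    std::string decimal = "3.14159";
    try {
        double num = boost::lexical_cast<double>(decimal);
        std::cout << num << '\n';
    } catch (const boost::bad_lexical_cast&) {
        std::cerr << "not a valid number\n";
    }
}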

Assuming you mean a decimal number stored in a string, atof is a standard solution for converting it to a double value (which is a 64-bit floating point number on the x86 architecture).
std::string s = "0.4";
double convertedValue = atof(s.c_str());
Or, similarly, for C strings:
const char *s = "0.4";
double convertedValue = atof(s);
But if by "decimal number" you mean an integer, then just write
int yourNumber = 100;
double convertedValue = yourNumber;
and the value will automatically be converted.

Value conversion from a string to a double can be implemented with boost::lexical_cast.
Type casting from int to double is part of C++:
double d = (double)i;
It was already mentioned in the previous replies.
If you are interested in how this conversion is implemented, you can look at the sources of the C standard library your compiler uses, provided the sources are available and no floating point co-processor is involved. Many compilers for embedded targets do this work "manually" when no floating point co-processor is available.
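To give a feel for what such a "manual" conversion involves, here is a minimal sketch, assuming a positive 32-bit integer input and the IEEE 754 double format (real soft-float routines also handle sign, zero, and rounding of values wider than the 52-bit mantissa):
#include <cstdint>
#include <cstdio>

uint64_t int_to_double_bits(uint32_t v) {
    if (v == 0) return 0;                           // +0.0 is the all-zero pattern
    int msb = 31;
    while (!(v & (1u << msb))) --msb;               // find the leading 1 bit
    uint64_t mantissa = (uint64_t)v << (52 - msb);  // normalize: leading 1 lands on bit 52
    mantissa &= (1ULL << 52) - 1;                   // drop the implicit leading bit
    uint64_t exponent = 1023 + (uint64_t)msb;       // biased exponent
    return (exponent << 52) | mantissa;
}

int main() {
    // prints 4059000000000000, the IEEE 754 bit pattern of 100.0
    std::printf("%016llx\n", (unsigned long long)int_to_double_bits(100));
}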
For the binary format description, please see Victor's reply.

int decimalNumber = 1234;
double binaryFloatValue = decimalNumber; // implicit conversion to a 64-bit float

Related

What is the correct type in C/C++ to store a COM's VT_DECIMAL?

I'm trying to write a wrapper for ADO.
A DECIMAL is one of the types a COM VARIANT can hold, when the VARIANT type is VT_DECIMAL.
I'm trying to put it into a native C data type while keeping the variable's value.
It seems that the correct type is long double, but I get a "no suitable conversion" error.
For example:
_variant_t v;
...
if (v.vt == VT_DECIMAL)
{
    double d = (double)v; // this works, but I'm afraid there can be loss of data...
    long double ld1 = (long double)v; // error: more than one conversion from variant to long double applies
    long double ld2 = (long double)v.decVal; // error: no suitable conversion function from decimal to long double exists
}
So my questions are:
1. Is it totally safe to use double to store all possible decimal values?
2. If not, how can I convert the decimal to a long double?
3. How do I convert a decimal to a string? (using the << operator; sprintf is also good for me)
The internal representation of DECIMAL is not a double precision floating point value; it is an integer with separate sign and scale fields. If you are going to initialize DECIMAL parts, you need to fill in these fields (the 96-bit integer value, the scale, and the sign) to get a valid decimal VARIANT value.
DECIMAL on MSDN:
scale - The number of decimal places for the number. Valid values are from 0 to 28. So 12.345 is represented as 12345 with a scale of 3.
sign - Indicates the sign; 0 for positive numbers or DECIMAL_NEG for negative numbers. So -1 is represented as 1 with the DECIMAL_NEG bit set.
Hi32 - The high 32 bits of the number.
Lo64 - The low 64 bits of the number. This is an _int64.
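For example, a small sketch that builds the decimal value 12.345 by filling in those fields directly (matching the scale example above):
#include <windows.h>

VARIANT makeDecimal12345(void)
{
    VARIANT v;
    VariantInit(&v);
    v.vt = VT_DECIMAL;
    v.decVal.Lo64 = 12345;  // low 64 bits of the 96-bit integer
    v.decVal.Hi32 = 0;      // high 32 bits
    v.decVal.scale = 3;     // 12345 scaled by 10^-3 gives 12.345
    v.decVal.sign = 0;      // 0 = positive, DECIMAL_NEG = negative
    return v;
}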
Your questions:
is it totally safe to use double to store all possible decimal values?
You cannot initialize a DECIMAL as a double directly, but you can initialize a double variant (VT_R8) and use the variant conversion API to convert it to VT_DECIMAL. Some rounding may be applied to the value.
if not, how can I convert the decimal to a long double?
How to convert a decimal to string? (using the << operator, sprintf is also good for me)
VariantChangeType can convert a decimal variant to a variant of another type, including integer, double, and string; you provide the type to convert to. Vice versa, you can also convert something else to a decimal.
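A small sketch of that conversion API (assuming v holds a VT_DECIMAL value, e.g. the 12.345 built above):
VARIANT asDouble, asString;
VariantInit(&asDouble);
VariantInit(&asString);
if (SUCCEEDED(VariantChangeType(&asDouble, &v, 0, VT_R8)))
{
    double d = asDouble.dblVal;   // 12.345 as a double (possibly rounded)
}
if (SUCCEEDED(VariantChangeType(&asString, &v, 0, VT_BSTR)))
{
    // asString.bstrVal now holds the string L"12.345"
}
VariantClear(&asString);          // frees the BSTR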
"Safe" isn't exactly the correct word, the point of DECIMAL is to not introduce rounding errors due to base conversions. Calculations are done in base 10 instead of base 2. That makes them slow but accurate, the kind of accuracy that an accountant likes. He won't have to chase a billionth-of-a-penny mismatches.
Use _variant_t::ChangeType() to make conversions. Pass VT_R8 to convert to a double precision value. Pass VT_BSTR to convert to a string, the kind that the accountant likes. There is no point in chasing long double; that 10-byte FPU type is history.
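A sketch of that suggestion (comutil.h provides _variant_t; ChangeType throws _com_error on failure):
#include <comutil.h>

double toDouble(_variant_t v)   // by value, so the caller's variant is untouched
{
    v.ChangeType(VT_R8);        // decimal -> double
    return v.dblVal;
}

_bstr_t toString(_variant_t v)
{
    v.ChangeType(VT_BSTR);      // decimal -> string, the kind the accountant likes
    return v.bstrVal;
}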
This snippet is taken from http://hackage.haskell.org/package/com-1.2.1/src/cbits/AutoPrimSrc.c
Hackage.org says:
Hackage is the Haskell community's central package archive of open source software.
but please check the author's permissions.
void writeVarWord64( unsigned int hi, unsigned int lo, VARIANT* v )
{
    ULONGLONG r;
    r = (ULONGLONG)hi;
    r <<= 32;                 // the high word belongs in the upper 32 bits
    r += (ULONGLONG)lo;
    if (!v) return;
    VariantInit(v);
    v->vt = VT_DECIMAL;
    v->decVal.Lo64 = r;
    v->decVal.Hi32 = 0;
    v->decVal.sign = 0;
    v->decVal.scale = 0;
}
If I understood Microsoft's documentation (https://msdn.microsoft.com/en-us/library/cc234586.aspx) correctly, VT_DECIMAL is an exact 96-bit integer value with a fixed scale and precision. In that case you can't store it without loss of information in a float, a double, or a 64-bit integer variable.
Your best bet would be to store it in a 128-bit integer like __int128, but I don't know the level of compiler support for it. I'm also not sure you would be able to just cast one to the other without resorting to some bit manipulation.
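As a sketch of the bit manipulation involved, assuming GCC or Clang (which provide __int128; MSVC does not):
#include <windows.h>

// Combine the DECIMAL's 96-bit integer parts into one signed 128-bit value.
__int128 decimalToInt128(const DECIMAL& dec)
{
    __int128 value = ((__int128)dec.Hi32 << 64) | dec.Lo64;
    if (dec.sign == DECIMAL_NEG)
        value = -value;
    return value;   // the represented number is value / 10^dec.scale
}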
Is it totally safe to use double to store all possible decimal values?
It actually depends on what you mean by safe. If you mean "is there any risk of introducing some degree of conversion imprecision?", then yes, there is a risk. The internal representations are far too different to guarantee perfect conversion, and conversion noise is likely to be introduced.
How can I convert the decimal to a long double / a string?
It depends (again) on what you want to do with the object:
For floating-point computation, see #Gread.And.Powerful.Oz's link to the following answer: C++ converting Variant Decimal to Double Value
For display, see MSDN documentation on string conversion
For storage without any conversion imprecision, you should probably store the decimal as a scaled integer of the form pair<long long, short>, where first holds the unscaled integer value and second holds the number of digits to the right of the decimal point. (Note that a long long only covers 64 of the 96 mantissa bits; values that actually use the high 32 bits need a wider type.) This representation is as close as possible to the decimal's internal representation, will not introduce any conversion imprecision, and won't waste CPU resources on integer-to-string formatting.
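A tiny sketch of that scaled-integer idea (names are illustrative):
#include <cmath>
#include <cstdio>
#include <utility>

int main() {
    // {unscaled value, digits right of the decimal point}: represents 12.345
    std::pair<long long, short> stored{12345, 3};

    // Convert to double only at the edges, accepting the usual imprecision.
    double approx = stored.first / std::pow(10.0, stored.second);
    std::printf("%f\n", approx);   // 12.345000
}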

C++ floating point representation

I am trying to create a float from a hexadecimal representation I got from here. For the representation of 32.50002, the site shows the IEEE 754 hexadecimal representation as 0x42020005.
In my code, I have this: float f = 0x42020005;. However, when I print the value, I get 1.10E+9 instead of 32.50002. Why is this?
I am using Microsoft Visual C++ 2010.
When you assign a value to a float variable via =, you don’t assign its internal representation, you assign its value. 0x42020005 in decimal is 1107427333, and that’s the value you are assigning.
The underlying representation of a float cannot be retrieved in a platform-independent way. However, making some assumptions (namely, that the float in fact uses the IEEE 754 format), we can play a trick:
#include <cstdint>
#include <cstring>

float f;
uint32_t rep = 0x42020005;
std::memcpy(&f, &rep, sizeof f);
This will give the desired result.
0x42020005 actually is the int value 1107427333.
You can try out this code; it should work. Use a union:
#include <cstdint>
#include <cstdio>

union IntFloat {
    uint32_t i;
    float f;
};
and call it when you need to convert the value:
union IntFloat val;
val.i = 0x42020005;
printf("%f\n", val.f);
0x42020005 is an int with the value 1107427333.
float f = 0x42020005; is equivalent to
float f = 1107427333;

boost lexical cast string to double

I am facing a conversion issue for which I'd like your help. I'm using the gcc4 compiler and am quite restricted to gcc4.
I want to convert std::string to double.
std::string aQuantity = aRate.getQuantity();
std::string aAmount = aRate.getAmount();
// aAmount = "22.05"
double dQuantity = boost::lexical_cast<double>(aQuantity);
double dAmount = boost::lexical_cast<double> (aAmount);
// dAmount = 22.050000000000001
By the way, I also tried atof and I still have the same issue. Is there any way to use istringstream with setprecision(2) to get the correct value shown by aAmount?
Due to the nature of floating point values, 22.050000000000001 is the closest value to 22.05 that a double can store. The same would occur if you simply stored 22.05 in a double and then printed it.
You should set the precision on the output stream if you want to print 22.05. Alternatively, you could investigate a rational number library (for example, Boost.Rational); it would be able to store the value 22.05 exactly, unlike a double (or float).
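For example, a minimal sketch of setting the stream precision:
#include <iomanip>
#include <iostream>

int main() {
    double dAmount = 22.05;          // stored as the nearest representable double
    std::cout << std::fixed << std::setprecision(2)
              << dAmount << '\n';    // prints 22.05
}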

Converting double to array of bits for genetic algorithm in C(++) [duplicate]

Possible Duplicate:
Floating Point to Binary Value(C++)
Currently I'm working on a genetic algorithm for my thesis, and I'm trying to optimize a problem whose genome consists of three doubles. For the breeding of these doubles I would like to use a binary representation, and for this I'll have to convert the doubles to their binary representation. I've searched for this but unfortunately can't find a clear solution.
How do I do this? Is there a library function for it, as there is in Java? Any help is greatly appreciated.
What about:
double d = 1234;
unsigned char *b = (unsigned char *)&d;
Assuming a double consists of 8 bytes, you could use b[0] ... b[7].
Another possibility is:
long long x = *(long long *)&d;
Since you tagged the question C++, I would use a reinterpret_cast. (Note that dereferencing the casted pointer formally violates strict aliasing; memcpy into a long long is the well-defined alternative.)
For the genetic algorithm, what you probably really want is to treat the mantissa, exponent, and sign of your doubles independently. See "how can I extract the mantissa of a double".
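A small sketch of one standard-library way to do that, using std::frexp and std::signbit:
#include <cmath>
#include <cstdio>

int main() {
    double d = -12.25;
    int exponent;
    double mantissa = std::frexp(d, &exponent); // d == mantissa * 2^exponent, |mantissa| in [0.5, 1)
    std::printf("%g = %g * 2^%d (negative: %d)\n",
                d, mantissa, exponent, (int)std::signbit(d));
}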
Why do you want to use a binary representation? Just because something is more popular does not mean that it is the solution to your specific problem.
There is a well-known genome representation called real coding that you can use to solve your problem without being subject to several issues of the binary representation, such as Hamming cliffs and the wildly different effects of single-bit mutations.
Please notice that I am not talking about cutting-edge, experimental stuff. This 1991 paper already describes the issue I am talking about. If you are a Spanish or Portuguese speaker, I could point you to my own book on GAs, but there are beautiful references in English, such as Melanie Mitchell's or Eiben's books, that describe this issue more deeply.
The important thing to keep in mind is that you need to tailor the genetic algorithm to your problem, not modify your needs in order to be able to use a specific type of GA.
I wouldn't convert it into an array. If you're doing genetic stuff it should be performant. If I were you I would use an integer type (as irrelephant suggested) and then do the mutation and crossover with integer operations.
If you don't do that, you're converting back and forth all the time. And for crossover you would have to iterate through the 64 array elements.
Here an example for crossover:
__int64 crossover(__int64 a, __int64 b, int x) {
    // assumes 0 < x < 64 (a shift by the full width would be undefined)
    unsigned __int64 mask1 = ~0ULL << (64 - x); // leftmost x bits
    unsigned __int64 mask2 = ~mask1;            // rightmost 64-x bits
    return (a & mask1) | (b & mask2);
}
And for selection, you can just turn the bits back into a double.
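Note that a plain value cast would convert the integer numerically; to recover the double whose bit pattern the genome holds, reinterpret the bytes instead. A sketch, assuming 64-bit doubles:
#include <cstring>

double decode(__int64 genome)
{
    double d;
    std::memcpy(&d, &genome, sizeof d);   // reinterpret the 64 genome bits
    return d;
}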
You could do it like this:
// Assuming a double is 64 bits
double d = 42.0;        // just a random double
char* bits = (char*)&d; // access the double byte by byte
int array[64];          // result: one int per bit
for (int i = 0, k = 63; i < 8; ++i)    // for each byte of the double
    for (char j = 0; j < 8; ++j, --k)  // for each bit of each byte
        array[k] = (bits[i] >> j) & 1; // is the j-th bit of the current byte set?
Good luck
Either start with a binary representation of the genome and use one-point or two-point crossover operators, or, if you want to use a real encoding for your GA, use the simulated binary crossover (SBX) operator for crossover. Most modern GA implementations use a real-coded representation and a corresponding crossover and mutation operator.
You could use an int (or variant thereof).
The trick is to encode a float of 12.34 as an int of 1234.
Therefore you just need to cast to a float and divide by 100 during the fitness function, and do all your mutation and crossover on an integer (see the sketch after the gotchas below).
Gotchas:
Beware the loss of precision if you actually need the nth bit.
Beware the sign bit.
Beware the difference in range between floats & ints.
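A minimal sketch of that encoding (the two-decimal scale and the rounding are illustrative choices):
// Fixed-point genome: two decimal places stored in an int.
int encode(double x) { return (int)(x * 100.0 + (x >= 0 ? 0.5 : -0.5)); }
double decode(int g) { return g / 100.0; }
// encode(12.34) == 1234; decode(1234) == 12.34 (to within double precision)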

what is the workaround for floating point inaccuracy?

Here's the code snippet:
float pNum = 9.2f;
char* l_tmpCh = new char[255];
sprintf_s(l_tmpCh, 255, "%.9f", pNum);
cout << l_tmpCh << endl;
delete[] l_tmpCh;
the output is: 9.199999809
What can I do so the result is 9.200000000?
Note: I need every float number printed with 9 decimals of precision, so I don't want just 9.2.
The workaround is to not use floating point numbers.
Not every number can be represented accurately in the floating point format; 9.2 is one example, 0.1 is another.
If you want all the decimals shown, then you get 9.199999809, because that's the floating point value closest to 9.2.
If you use floating point numbers you have to accept this inaccuracy. Otherwise, your only option is to store the number in another format.
Required reading
There is no way a 32-bit binary float can hold 9 decimal digits of precision (it has only about 7). You could fake it by appending three zeroes:
sprintf_s(l_tmpCh, 255, "%.6f000", pNum);
This won't work if the integer part has already used up a lot of the precision, e.g. 9222.2f will give 9222.200195000.
What you're asking for is not possible in the general case, since floating point numbers are by definition approximations which may or may not have an exact decimal representation. Read the famous Goldberg paper: http://docs.sun.com/source/806-3568/ncg_goldberg.html
Use a double literal rather than a float literal.
double pNum = 9.2;
char* l_tmpCh = new char[255];
sprintf_s(l_tmpCh, 255, "%.9f", pNum);
cout << l_tmpCh << endl;
delete[] l_tmpCh;
That f was making the literal a float; without it, the value is a double literal (more precise, at about 15 significant decimal digits).
Of course, if 15 digits isn't enough, you're welcome to create your own class to represent values.
It's important to understand that native floating point numbers are seldom "accurate" because of the way they are represented in the computer; most of the time you only get an approximation. With printf you also specify the precision to which that approximation is rounded for output: e.g. "%.20f" gives you a representation rounded to 20 digits after the decimal point.
This should do it:
double pNum = 9.2;
The f suffix makes the literal a float, which has only about 7 decimal digits of precision and of course suffers from representation errors. Assigning it to a double variable does not fix this. Of course, this assumes that float and double correspond to the IEEE 754 single and double precision types.
EDIT: If you want to use float, then this problem cannot be solved at all. Read The Floating-Point Guide to understand why.