How can a hexadecimal floating point constant, as specified in C99, be printed from an array of bytes representing the machine representation of a floating point value? E.g. given
union u_double
{
double dbl;
char data[sizeof(double)];
};
An example hexadecimal floating point constant is a string of the form
0x1.FFFFFEp127f
A syntax specification for this form of literal can be found on the IBM site, and a brief description of the syntax is here on the GCC site.
The printf function can be used to do this on platforms with access to C99 features in the standard library, but I would like to be able to perform the printing in MSVC, which does not support C99, using standard C89 or C++98.
The printf manual says:
a,A
(C99; not in SUSv2) For a conversion, the double argument is converted to hexadecimal notation (using the letters abcdef) in the style [-]0xh.hhhhp±d; for A conversion the prefix 0X, the letters ABCDEF, and the exponent separator P is used. There is one hexadecimal digit before the decimal point, and the number of digits after it is equal to the precision. The default precision suffices for an exact representation of the value if an exact representation in base 2 exists and otherwise is sufficiently large to distinguish values of type double. The digit before the decimal point is unspecified for non-normalized numbers, and non-zero but otherwise unspecified for normalized numbers.
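Where a C99-capable library is available, the %a output can be sanity-checked by round-tripping it through strtod, since a correct hex-float string must reproduce the value exactly (a quick check, not an MSVC workaround; the helper name here is mine):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Format v with %a, then parse the hex-float string back with strtod.
// An exact hexadecimal representation must round-trip to the same bits.
bool hex_roundtrips(double v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", v);
    return std::strtod(buf, nullptr) == v;
}
```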
You can use frexp(), which has been in math.h since at least C90, and then do the conversion yourself. Something like this (not tested, and not designed to handle boundary cases like NaN, infinities, buffer limits, and so on):
#include <math.h>   /* frexp */
#include <stdio.h>  /* sprintf */
#include <string.h> /* strcpy */
void hexfloat(double d, char* ptr)
{
double fract;
int exp = 0;
if (d < 0) {
*ptr++ = '-';
d = -d;
}
fract = frexp(d, &exp);
if (fract == 0.0) {
strcpy(ptr, "0x0.0");
} else {
fract *= 2.0;
--exp;
*ptr++ = '0';
*ptr++ = 'x';
*ptr++ = '1';
fract -= 1.0;
fract *= 16.0;
*ptr++ = '.';
do {
char const hexdigits[] = "0123456789ABCDEF";
*ptr++ = hexdigits[(int)fract]; // truncate
fract -= (int)fract;
fract *= 16;
} while (fract != 0.0);
if (exp != 0) {
sprintf(ptr, "p%d", exp);
} else {
*ptr++ = '\0';
}
}
}
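As a quick check of the frexp contract the sketch above relies on (a nonzero d is split into fract * 2^exp with fract in [0.5, 1), which the code then doubles so the leading hex digit is 1; the helper name is mine):

```cpp
#include <cassert>
#include <cmath>

// Verify that frexp returns a fraction in [0.5, 1) and an exponent
// such that ldexp(fract, exp) reassembles the original value exactly.
bool frexp_contract_holds(double d) {
    int e = 0;
    double fract = std::frexp(d, &e);
    return fract >= 0.5 && fract < 1.0 && std::ldexp(fract, e) == d;
}
```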
#include <stdint.h>
#include <stdio.h>
int main(void)
{
union { double d; uint64_t u; } value;
value.d = -1.234e-5;
// see http://en.wikipedia.org/wiki/Double_precision
_Bool sign_bit = value.u >> 63;
uint16_t exp_bits = (value.u >> 52) & 0x7FF;
uint64_t frac_bits = value.u & 0xFFFFFFFFFFFFFull;
if(exp_bits == 0)
{
if(frac_bits == 0)
printf("%s0x0p+0\n", sign_bit ? "-" : "");
else puts("subnormal, too lazy to parse");
}
else if(exp_bits == 0x7FF)
puts("infinity or nan, too lazy to parse");
else printf("%s0x1.%013llxp%+i\n", /* pad to 13 hex digits so leading zeros in the fraction survive */
    sign_bit ? "-" : "",
    (unsigned long long)frac_bits,
    (int)exp_bits - 1023);
// check against libc implementation
printf("%a\n", value.d);
}
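A hedged sanity check of the same bit decomposition (struct and function names here are mine, not from the answer; memcpy is used instead of the union to stay well-defined in C++): for 1.5 the biased exponent is 0x3FF and the fraction field holds only its leading bit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decompose a double into IEEE 754 sign / biased-exponent / fraction fields.
struct DoubleFields { bool sign; unsigned exp; std::uint64_t frac; };

DoubleFields fields_of(double d) {
    std::uint64_t u = 0;
    std::memcpy(&u, &d, sizeof u);   // same bytes, viewed as a 64-bit integer
    return { (u >> 63) != 0,
             static_cast<unsigned>((u >> 52) & 0x7FF),
             u & 0xFFFFFFFFFFFFFull };
}
```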
This might be an "outside the box" answer, but why not convert the double to a string using sprintf, then parse the string for the mantissa and exponent, convert those to integers, and print them back out in hex?
e.g., something like:
char str[256];
long long a, b, c;
sprintf(str, "%e", dbl);
sscanf(str, "%lld.%llde%lld", &a, &b, &c);
printf("0x%llx.%llxp%llx", a, b, c);
I'm sure you'd have to modify the formats for sprintf and sscanf. And you'd never get a first hex digit between A and F. But in general, I think this idea should work. And it's simple.
A better way would be to find an open source library that implements this format for printf (e.g., newlib, uClibc?) and copy what they do.
I want to stringify a fraction of unsigned integers in C++ with variable precision. So 1/3 would be printed as 0.33 using a precision of 2. I know that float and std::ostream::precision could be used for a quick and dirty solution:
std::string stringifyFraction(unsigned numerator,
unsigned denominator,
unsigned precision)
{
std::stringstream output;
output.precision(precision);
output << static_cast<float>(numerator) / denominator;
return output.str();
}
However, this is not good enough because float has limited precision and can't actually represent decimal numbers accurately. What other options do I have? Even a double would fail if I wanted 100 digits or so, or in case of a recurring fraction.
It's always possible to just perform long division to stringify digit-by-digit. Note that the result consists of an integral part and a fractional part. We can get the integral part by simply dividing using the / operator and calling std::to_string. For the rest, we need the following function:
#include <string>
std::string stringifyFraction(const unsigned num,
const unsigned den,
const unsigned precision)
{
constexpr unsigned base = 10;
// prevent division by zero if necessary
if (den == 0) {
return "inf";
}
// integral part can be computed using regular division
std::string result = std::to_string(num / den);
// perform first step of long division
// also cancel early if there is no fractional part
unsigned tmp = num % den;
if (tmp == 0 || precision == 0) {
return result;
}
// reserve characters to avoid unnecessary re-allocation
result.reserve(result.size() + precision + 1);
// fractional part can be computed using long division
result += '.';
for (size_t i = 0; i < precision; ++i) {
tmp *= base;
char nextDigit = '0' + static_cast<char>(tmp / den);
result.push_back(nextDigit);
tmp %= den;
}
return result;
}
You could easily extend this to work with other bases as well, by just making base a template parameter, but then you couldn't just use std::to_string anymore.
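For instance, the function pairs up with a quick check like the following (the body is repeated in condensed form so the snippet stands alone):

```cpp
#include <cassert>
#include <string>

// Same long-division idea as above, condensed for a self-contained example.
std::string stringifyFraction(unsigned num, unsigned den, unsigned precision) {
    if (den == 0) return "inf";                       // guard division by zero
    std::string result = std::to_string(num / den);   // integral part
    unsigned rem = num % den;
    if (rem == 0 || precision == 0) return result;    // no fractional part
    result += '.';
    for (unsigned i = 0; i < precision; ++i) {        // long division, digit by digit
        rem *= 10;
        result += static_cast<char>('0' + rem / den);
        rem %= den;
    }
    return result;
}
```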
I am currently doing a project involving the conversion of decimal numbers into IEEE 754 Floating Point representation. I am making use of a code provided by GeeksforGeeks for the conversion process and edited it to suit my project.
I am unsure as to how to modify the code so that it returns the appropriate Floating Point representation into the variables as the variables currently stores the decimal values instead.
I ran into the issue where the function being used to convert said decimal values is just simply printing the Floating Point representation. I would need to be able to return the printed Floating Point representation value into a variable. Is that possible? Else, is there an alternative method?
// C program to convert a real value
// to IEEE 754 floating point representation
#include <stdio.h>
int modifyBit(int n, int p, int b)
{
int mask = 1 << p;
return (n & ~mask) | ((b << p) & mask);
}
void printBinary(int n, int i)
{
// Prints the binary representation
// of a number n up to i-bits.
int k;
for (k = i - 1; k >= 0; k--) {
if ((n >> k) & 1)
printf("1");
else
printf("0");
}
}
typedef union {
float f;
struct
{
// Order is important.
// Here the members of the union data structure
// use the same memory (32 bits).
// The ordering is taken
// from the LSB to the MSB.
unsigned int mantissa : 23;
unsigned int exponent : 8;
unsigned int sign : 1;
} raw;
} myfloat;
// Function to print the IEEE 754
// floating point representation of a float value
void printIEEE(myfloat var)
{
// Prints the IEEE 754 representation
// of a float value (32 bits)
printf("%d | ", var.raw.sign);
printBinary(var.raw.exponent, 8);
printf(" | ");
printBinary(var.raw.mantissa, 23);
printf("\n");
}
// Driver Code
int main()
{
// Instantiate the union
myfloat var, var2;
int sub, sub2, mant, pos, finalMant;
// Get the real value
var.f = 1.25;
var2.f = 23.5;
// Get the IEEE floating point representation
printf("IEEE 754 representation of %f is : \n",
var.f);
printIEEE(var);
printf("No 2: %f : \n", var2.f);
printIEEE(var2);
printf("\n");
//Get the exponent value for the respective variable
//Need to compare which exponent is bigger
printBinary(var2.raw.exponent, 8);
printf("\n");
printf("Exponent of Var in Decimal: %d: \n", var.raw.exponent);
printf("Exponent of Var2 in Decimal: %d: \n", var2.raw.exponent);
if (var.raw.exponent > var2.raw.exponent)
{
printf("1st is bigger than 2 \n");
//Difference in exponent
sub = var.raw.exponent - var2.raw.exponent;
printf("Difference in exponent: %d \n", sub);
//New mantissa with the new right shift
mant = var2.raw.mantissa>>sub;
//Modifying the hidden bit to be included into the mantissa
pos = 23 - sub;
finalMant = modifyBit(mant, pos, 1);
//Print the final mantissa
printf("New Binary : \n");
printBinary(finalMant, 23);
}
else
{
printf("2nd bigger than 1 \n");
//Difference in exponent
sub = var2.raw.exponent - var.raw.exponent;
printf("Difference in exponent: %d \n", sub);
//New mantissa with the new right shift
mant = var.raw.mantissa>>sub;
//Modifying the hidden bit to be included into the mantissa
pos = 23 - sub;
finalMant = modifyBit(mant, pos, 1);
//Print the final mantissa
printf("New Binary : \n");
printBinary(finalMant, 23);
}
return 0;
}
This code correctly converts the decimal value into the intended floating point representation. However, if I print the variable directly with printf (e.g. printf("%d", finalMant)) instead of printBinary, it displays as a decimal value instead of the floating point representation. This might be due to the lack of any return values.
Based on clarification in comments you don't need to "return" anything. Just declare variable of type myfloat, put float in f member, and you can get sign, exponent and mantissa elements. I.e. something like this:
myfloat cnv;
cnv.f = someFloatVal;
unsigned int mantissa = cnv.raw.mantissa;
Just remember that this is valid only for C, not for C++. (Although most compilers support it, probably to stay compatible with C, it is not guaranteed by the standard and is discouraged.)
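A C++-safe alternative (a sketch of my own, not from the answer) is to memcpy the float's bytes into an integer and mask out the fields, which is well-defined in both languages:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Extract the IEEE 754 single-precision fields without union type punning.
struct FloatFields { unsigned sign, exponent; std::uint32_t mantissa; };

FloatFields fields_of(float f) {
    std::uint32_t u = 0;
    std::memcpy(&u, &f, sizeof u);   // copy the object representation
    return { u >> 31, (u >> 23) & 0xFF, u & 0x7FFFFF };
}
```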
There was once this little function
string format_dollars_for_screen(float d)
{
char ch[50];
sprintf(ch,"%1.2f",d);
return ch;
}
who liked to return -0.00.
I modified it to
string format_dollars_for_screen(float d)
{
char ch[50];
float value;
sprintf(ch,"%1.2f",d);
value = stof(ch);
if(value==0.0f) sprintf(ch,"0.00");
return ch;
}
And it started returning 0.00 as desired. My question is, why doesn't this other solution work?
string format_dollars_for_screen(float d)
{
char ch[50];
float value;
sprintf(ch,"%1.2f",d);
value = stof(ch);
if(value==0.0f) sprintf(ch,"%1.2f", value);
return ch;
}
And/or is there a more efficient way to do this? This is just off the top of my head, so critiques are welcome. =)
Floating point numbers have both a plus zero and a minus zero. They compare equal with the == operator, but produce different results in other arithmetic expressions: 1/+0 == +inf but 1/-0 == -inf.
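That behavior is easy to observe (a small illustration; std::signbit is the standard way to tell the two zeros apart even though == cannot):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// +0.0 and -0.0 compare equal, but the sign bit is still observable.
bool is_negative_zero(double d) {
    return d == 0.0 && std::signbit(d);
}
```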
As for your case, you shall not use floating point numbers for monetary quantities. Instead use integers for counting cents (or other decimal fractions of cents), and format them accordingly:
string format_dollars_for_screen(int cents)
{
bool neg = cents < 0;
if(neg) cents = -cents;
char ch[50];
// "-"+!neg points at "-" when neg is true, and at the empty string (one past the '-') otherwise
sprintf(ch, "%s%d.%.2d", "-"+!neg, cents/100, cents%100);
return ch;
}
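Used like this (the function is repeated so the example is self-contained), the formatter never produces a "-0.00":

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Integer-cents formatter from the answer above.
// "-"+!neg selects "-" when negative, or the empty string otherwise.
std::string format_dollars_for_screen(int cents) {
    bool neg = cents < 0;
    if (neg) cents = -cents;
    char ch[50];
    std::sprintf(ch, "%s%d.%.2d", "-" + !neg, cents / 100, cents % 100);
    return ch;
}
```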
Given the representation of decimal I have (you can find it here, for instance), I tried to convert a double this way:
explicit Decimal(double n)
{
DoubleAsQWord doubleAsQWord;
doubleAsQWord.doubleValue = n;
uint64 val = doubleAsQWord.qWord;
const uint64 topBitMask = (int64)(0x1 << 31) << 32;
//grab the 63th bit
bool isNegative = (val & topBitMask) != 0;
//bias is 1023=2^(k-1)-1, where k is 11 for double
uint32 exponent = (((uint64)(val >> 31) >> 21) & 0x7FF) - 1023;
//exclude both sign and exponent (<<12, >>12) and normalize mantissa
uint64 mantissa = ((uint64)(0x1 << 31) << 21) | (val << 12) >> 12;
// normalized mantissa is 53 bits long,
// the exponent does not care about normalizing bit
uint8 scale = exponent + 11;
if (scale > 11)
scale = 11;
else if (scale < 0)
scale = 0;
lo_ = ((isNegative ? -1 : 1) * n) * std::pow(10., scale);
signScale_ = (isNegative ? 0x1 : 0x0) | (scale << 1);
// will always be 0 since we cannot reach
// a 128 bits precision with a 64 bits double
hi_ = 0;
}
The DoubleAsQWord type is used to "cast" from double to its uint64 representation:
union DoubleAsQWord
{
double doubleValue;
uint64 qWord;
};
My Decimal type has these fields:
uint64 lo_;
uint32 hi_;
int32 signScale_;
All this stuff is encapsulated in my Decimal class. You may notice I extract the mantissa even though I'm not using it. I'm still thinking of a way to guess the scale accurately.
This is purely practical, and seems to work in the case of a stress test:
BOOST_AUTO_TEST_CASE( convertion_random_stress )
{
const double EPSILON = 0.000001f;
srand(time(0));
for (int i = 0; i < 10000; ++i)
{
double d1 = ((rand() % 10) % 2 == 0 ? -1 : 1)
* (double)(rand() % 1000 + 1000.) / (double)(rand() % 42 + 2.);
Decimal d(d1);
double d2 = d.toDouble();
double absError = fabs(d1 - d2);
BOOST_CHECK_MESSAGE(
absError <= EPSILON,
"absError=" << absError << " with " << d1 << " - " << d2
);
}
}
Anyway, how would you convert from double to this decimal representation?
I think you guys will be interested in an implementation of a C++ wrapper to the Intel Decimal Floating-Point Math Library:
C++ Decimal Wrapper Class
Intel DFP
What about using the VarR8FromDec function?
EDIT: This function is declared on Windows systems only. However, an equivalent C implementation is available in WINE, here: http://source.winehq.org/source/dlls/oleaut32/vartype.c
Perhaps you are looking for System::Convert::ToDecimal()
http://msdn.microsoft.com/en-us/library/a69w9ca0%28v=vs.80%29.aspx
Alternatively you could try recasting the Double as a Decimal.
An example from the MSDN.
http://msdn.microsoft.com/en-us/library/aa326763%28v=vs.71%29.aspx
// Convert the double argument; catch exceptions that are thrown.
void DecimalFromDouble( double argument )
{
Object* decValue;
// Convert the double argument to a Decimal value.
try
{
decValue = __box( (Decimal)argument );
}
catch( Exception* ex )
{
decValue = GetExceptionType( ex );
}
Console::WriteLine( formatter, __box( argument ), decValue );
}
If you do not have access to the .Net routines then this is tricky. I have done this myself for my hex editor (so that users can display and edit C# Decimal values using the Properties dialog) - see http://www.hexedit.com for more information. Also the source for HexEdit is freely available - see my article at http://www.codeproject.com/KB/cpp/HexEdit.aspx.
Actually my routines convert between Decimal and strings, but you can of course use sprintf to convert the double to a string first. (Also, when you talk about double I assume you explicitly mean the IEEE 64-bit floating point format, though this is what most compilers/systems use nowadays.)
Note that there are a few gotchas if you want to handle precisely all valid Decimal values and return an error for any value that cannot be converted, since the format is not well documented. (The Decimal format is awful really; e.g., the same number can have many representations.)
Here is my code that converts a string to a Decimal. Note that it uses the GNU Multiple Precision Arithmetic Library (functions that start with mpz_). The String2Decimal function returns false if it fails for some reason, such as the value being too big. The parameter 'presult' must point to a buffer of at least 16 bytes, to store the result.
bool String2Decimal(const char *ss, void *presult)
{
bool retval = false;
// View the decimal (result) as four 32 bit integers
unsigned __int32 *dd = (unsigned __int32 *)presult;
mpz_t mant, max_mant;
mpz_inits(mant, max_mant, NULL);
int exp = 0; // Exponent
bool dpseen = false; // decimal point seen yet?
bool neg = false; // minus sign seen?
// Scan the characters of the value
const char *pp;
for (pp = ss; *pp != '\0'; ++pp)
{
if (*pp == '-')
{
if (pp != ss)
goto exit_func; // minus sign not at start
neg = true;
}
else if (isdigit(*pp))
{
mpz_mul_si(mant, mant, 10);
mpz_add_ui(mant, mant, unsigned(*pp - '0'));
if (dpseen) ++exp; // Keep track of digits after decimal pt
}
else if (*pp == '.')
{
if (dpseen)
goto exit_func; // more than one decimal point
dpseen = true;
}
else if (*pp == 'e' || *pp == 'E')
{
char *end;
exp -= strtol(pp+1, &end, 10);
pp = end;
break;
}
else
goto exit_func; // unexpected character
}
if (*pp != '\0')
goto exit_func; // extra characters after end
if (exp < -28 || exp > 28)
goto exit_func; // exponent outside valid range
// Adjust mantissa for -ve exponent
if (exp < 0)
{
mpz_t tmp;
mpz_init_set_si(tmp, 10);
mpz_pow_ui(tmp, tmp, -exp);
mpz_mul(mant, mant, tmp);
mpz_clear(tmp);
exp = 0;
}
// Get max_mant = size of largest mantissa (2^96 - 1)
//mpz_set_str(max_mant, "79228162514264337593543950335", 10); // 2^96 - 1
static unsigned __int32 ffs[3] = { 0xFFFFffffUL, 0xFFFFffffUL, 0xFFFFffffUL };
mpz_import(max_mant, 3, -1, sizeof(ffs[0]), 0, 0, ffs);
// Check for mantissa too big.
if (mpz_cmp(mant, max_mant) > 0)
goto exit_func; // value too big
else if (mpz_sgn(mant) == 0)
exp = 0; // if mantissa is zero make everything zero
// Set integer part
dd[2] = mpz_getlimbn(mant, 2);
dd[1] = mpz_getlimbn(mant, 1);
dd[0] = mpz_getlimbn(mant, 0);
// Set exponent and sign
dd[3] = exp << 16;
if (neg && mpz_sgn(mant) > 0)
dd[3] |= 0x80000000;
retval = true; // indicate success
exit_func:
mpz_clears(mant, max_mant, NULL);
return retval;
}
How about this:
1) sprintf number into s
2) find decimal point (strchr), store in idx
3) atoi = obtain integer part easily, use union to separate high/lo
4) use strlen - idx to obtain number of digits after point
sprintf may be slow, but you'll get the solution in under 2 minutes of typing...
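The steps above can be sketched as follows (a rough illustration only; the struct and helper names are mine, and %f formatting loses digits for very large or very small values):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// 1) sprintf the number, 2) find the decimal point with strchr,
// 3) atoi the integer part, 4) count the digits after the point.
struct SplitNumber { int int_part; int frac_digits; };

SplitNumber split_double(double d, int precision) {
    char s[64];
    std::snprintf(s, sizeof s, "%.*f", precision, d);
    const char* dot = std::strchr(s, '.');
    SplitNumber r;
    r.int_part = std::atoi(s);
    r.frac_digits = dot ? static_cast<int>(std::strlen(dot + 1)) : 0;
    return r;
}
```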
I was asked to get the internal binary representation of different types in C. My program currently works fine with 'int' but I would like to use it with "double" and "float". My code looks like this:
template <typename T>
string findBin(T x) {
string binary;
for(int i = 4096 ; i >= 1; i/=2) {
if((x & i) != 0) binary += "1";
else binary += "0";
}
return binary;
}
The program fails when I try to instantiate the template using a "double" or a "float".
Succinctly, you don't.
The bitwise operators do not make sense when applied to double or float, and the standard says that the bitwise operators (~, &, |, ^, >>, <<, and the assignment variants) do not accept double or float operands.
Both double and float have 3 sections - a sign bit, an exponent, and the mantissa. Suppose for a moment that you could shift a double right. The exponent, in particular, means that there is no simple translation to shifting a bit pattern right - the sign bit would move into the exponent, and the least significant bit of the exponent would shift into the mantissa, with completely non-obvious sets of meanings. In IEEE 754, there's an implied 1 bit in front of the actual mantissa bits, which also complicates the interpretation.
Similar comments apply to any of the other bit operators.
So, because there is no sane or useful interpretation of the bit operators to double values, they are not allowed by the standard.
From the comments:
I'm only interested in the binary representation. I just want to print it, not do anything useful with it.
This code was written several years ago for SPARC (big-endian) architecture.
#include <stdio.h>
union u_double
{
double dbl;
char data[sizeof(double)];
};
union u_float
{
float flt;
char data[sizeof(float)];
};
static void dump_float(union u_float f)
{
int exp;
long mant;
printf("32-bit float: sign: %d, ", (f.data[0] & 0x80) >> 7);
exp = ((f.data[0] & 0x7F) << 1) | ((f.data[1] & 0x80) >> 7);
printf("expt: %4d (unbiassed %5d), ", exp, exp - 127);
mant = ((((f.data[1] & 0x7F) << 8) | (f.data[2] & 0xFF)) << 8) | (f.data[3] & 0xFF);
printf("mant: %16ld (0x%06lX)\n", mant, mant);
}
static void dump_double(union u_double d)
{
int exp;
long long mant;
printf("64-bit float: sign: %d, ", (d.data[0] & 0x80) >> 7);
exp = ((d.data[0] & 0x7F) << 4) | ((d.data[1] & 0xF0) >> 4);
printf("expt: %4d (unbiassed %5d), ", exp, exp - 1023);
mant = ((((d.data[1] & 0x0F) << 8) | (d.data[2] & 0xFF)) << 8) | (d.data[3] & 0xFF);
mant = (mant << 32) | ((((((d.data[4] & 0xFF) << 8) | (d.data[5] & 0xFF)) << 8) | (d.data[6] & 0xFF)) << 8) | (d.data[7] & 0xFF);
printf("mant: %16lld (0x%013llX)\n", mant, mant);
}
static void print_value(double v)
{
union u_double d;
union u_float f;
f.flt = v;
d.dbl = v;
printf("SPARC: float/double of %g\n", v);
// image_print(stdout, 0, f.data, sizeof(f.data));
// image_print(stdout, 0, d.data, sizeof(d.data));
dump_float(f);
dump_double(d);
}
int main(void)
{
print_value(+1.0);
print_value(+2.0);
print_value(+3.0);
print_value( 0.0);
print_value(-3.0);
print_value(+3.1415926535897932);
print_value(+1e126);
return(0);
}
The commented-out `image_print()` function prints an arbitrary set of bytes in hex, with various minor tweaks. Contact me if you want the code (see my profile).
If you're using Intel (little-endian), you'll probably need to tweak the code to deal with the reverse bit order. But it shows how you can do it - using a union.
You cannot directly apply bitwise operators to float or double, but you can still access the bits indirectly by putting the variable in a union with a character array of the appropriate size, then reading the bits from those characters. For example:
string BitsFromDouble(double value) {
union {
double doubleValue;
char asChars[sizeof(double)];
};
doubleValue = value; // Write to the union
/* Extract the bits. */
string result;
for (size_t i = 0; i < sizeof(double); ++i)
result += CharToBits(asChars[i]);
return result;
}
You may need to adjust your routine to work on chars, which usually don't range up to 4096, and there may also be some weirdness with endianness here, but the basic idea should work. It won't be cross-platform compatible, since machines use different endianness and representations of doubles, so be careful how you use this.
Bitwise operators don't generally work with "binary representation" (also called object representation) of any type. Bitwise operators work with value representation of the type, which is generally different from object representation. That applies to int as well as to double.
If you really want to get to the internal binary representation of an object of any type, as you stated in your question, you need to reinterpret the object of that type as an array of unsigned char objects and then use the bitwise operators on these unsigned chars
For example
double d = 12.34;
const unsigned char *c = reinterpret_cast<unsigned char *>(&d);
Now by accessing elements c[0] through c[sizeof(double) - 1] you will see the internal representation of type double. You can use bitwise operations on these unsigned char values, if you want to.
Note, again, that in general case in order to access internal representation of type int you have to do the same thing. It generally applies to any type other than char types.
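For instance (a small check of my own; assumes 64-bit IEEE doubles), reassembling those unsigned char bytes into one word shows that 1.0 has the well-known pattern 0x3FF0000000000000:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// View a double's object representation as bytes, then as a 64-bit word.
std::uint64_t bits_of(double d) {
    const unsigned char* c = reinterpret_cast<const unsigned char*>(&d);
    std::uint64_t u = 0;
    std::memcpy(&u, c, sizeof u);   // same bytes, reinterpreted as an integer
    return u;
}
```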
Do a bit-wise cast of a pointer to the double to long long * and dereference.
Example:
inline double bit_and_d(double* d, long long mask) {
long long t = (*(long long*)d) & mask;
return *(double*)&t;
}
Edit: This is almost certainly going to run afoul of gcc's enforcement of strict aliasing. Use one of the various workarounds for that. (memcpy, unions, __attribute__((__may_alias__)), etc)
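A memcpy-based variant of the same helper, written to stay clear of the strict-aliasing rule, might look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Apply a bit mask to a double's representation via memcpy instead of
// pointer casts, avoiding the strict-aliasing problem entirely.
double bit_and_d(double d, std::uint64_t mask) {
    std::uint64_t t = 0;
    std::memcpy(&t, &d, sizeof t);   // double -> integer bits
    t &= mask;
    std::memcpy(&d, &t, sizeof d);   // integer bits -> double
    return d;
}
```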
Another solution is to get a pointer to the floating point variable, cast it to a pointer to an integer type of the same size, and then get the value of the integer this pointer points to. Now you have an integer variable with the same binary representation as the floating point one, and you can use your bitwise operator.
string findBin(float f) {
    string binary;
    long x = *(long*)&f;   // view the float's bits through a long (see the size caveat below)
    for (long i = 4096; i >= 1; i /= 2) {
        if ((x & i) != 0) binary += "1";
        else binary += "0";
    }
    return binary;
}
But remember: you have to cast to a type with same size. Otherwise unpredictable things may happen (like buffer overflow, access violation etc.).
As others have said, you can use a bitwise operator on a double by casting double* to long long* (or sometimes just long*).
int main(){
double * x = (double*)malloc(sizeof(double));
*x = -5.12345;
printf("%f\n", *x);
*((long*)x) &= 0x7FFFFFFFFFFFFFFF;
printf("%f\n", *x);
return 0;
}
On my computer, this code prints:
-5.123450
5.123450