How to convert a double to a C# decimal in C++?

Given the representation of Decimal I have --you can find it here, for instance--, I tried to convert a double this way:
explicit Decimal(double n)
{
    DoubleAsQWord doubleAsQWord;
    doubleAsQWord.doubleValue = n;
    uint64 val = doubleAsQWord.qWord;
    const uint64 topBitMask = (int64)(0x1 << 31) << 32;
    // grab the 63rd bit
    bool isNegative = (val & topBitMask) != 0;
    // bias is 1023 = 2^(k-1) - 1, where k is 11 for double
    uint32 exponent = (((uint64)(val >> 31) >> 21) & 0x7FF) - 1023;
    // exclude both sign and exponent (<<12, >>12) and normalize mantissa
    uint64 mantissa = ((uint64)(0x1 << 31) << 21) | (val << 12) >> 12;
    // normalized mantissa is 53 bits long,
    // the exponent does not care about the normalizing bit
    uint8 scale = exponent + 11;
    if (scale > 11)
        scale = 11;
    else if (scale < 0)
        scale = 0;
    lo_ = ((isNegative ? -1 : 1) * n) * std::pow(10., scale);
    signScale_ = (isNegative ? 0x1 : 0x0) | (scale << 1);
    // will always be 0 since we cannot reach
    // a 128 bits precision with a 64 bits double
    hi_ = 0;
}
The DoubleAsQWord type is used to "cast" from double to its uint64 representation:
union DoubleAsQWord
{
    double doubleValue;
    uint64 qWord;
};
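Note that reading from a union member other than the one last written is technically undefined behaviour in C++ (it is only sanctioned in C), even though most compilers support it. A well-defined alternative, as a minimal sketch using your uint64 typedef, is std::memcpy:

#include <cstring>  // std::memcpy

uint64 doubleBitsToQWord(double d)
{
    uint64 qWord;
    std::memcpy(&qWord, &d, sizeof qWord);  // well-defined bitwise copy
    return qWord;
}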
My Decimal type has these fields:
uint64 lo_;
uint32 hi_;
int32 signScale_;
All this stuff is encapsulated in my Decimal class. You may notice I extract the mantissa even though I'm not using it. I'm still thinking of a way to guess the scale accurately.
This is purely practical, and seems to work in the case of a stress test:
BOOST_AUTO_TEST_CASE( convertion_random_stress )
{
    const double EPSILON = 0.000001;
    srand(time(0));
    for (int i = 0; i < 10000; ++i)
    {
        double d1 = ((rand() % 10) % 2 == 0 ? -1 : 1)
            * (double)(rand() % 1000 + 1000.) / (double)(rand() % 42 + 2.);
        Decimal d(d1);
        double d2 = d.toDouble();
        double absError = fabs(d1 - d2);
        BOOST_CHECK_MESSAGE(
            absError <= EPSILON,
            "absError=" << absError << " with " << d1 << " - " << d2
        );
    }
}
Anyway, how would you convert from double to this decimal representation?

I think you guys will be interested in an implementation of a C++ wrapper to the Intel Decimal Floating-Point Math Library:
C++ Decimal Wrapper Class
Intel DFP

What about using the VarR8FromDec function?
EDIT: This function is declared on Windows systems only. However, an equivalent C implementation is available with WINE, here: http://source.winehq.org/source/dlls/oleaut32/vartype.c
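Note that VarR8FromDec converts in the DECIMAL-to-double direction; its counterpart for this question's direction is VarDecFromR8. A minimal sketch, assuming a Windows target (link against oleaut32.lib; WINE provides the same functions):

#include <oleauto.h>
#include <stdexcept>

DECIMAL decimalFromDouble(double d)
{
    DECIMAL dec;
    HRESULT hr = VarDecFromR8(d, &dec);
    if (FAILED(hr))
        throw std::runtime_error("double out of DECIMAL range");
    return dec;  // the fields (scale, sign, Hi32, Lo32/Mid32) hold the parts
}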

Perhaps you are looking for System::Convert::ToDecimal()
http://msdn.microsoft.com/en-us/library/a69w9ca0%28v=vs.80%29.aspx
Alternatively you could try recasting the Double as a Decimal.
An example from the MSDN.
http://msdn.microsoft.com/en-us/library/aa326763%28v=vs.71%29.aspx
// Convert the double argument; catch exceptions that are thrown.
void DecimalFromDouble( double argument )
{
    Object* decValue;
    // Convert the double argument to a Decimal value.
    try
    {
        decValue = __box( (Decimal)argument );
    }
    catch( Exception* ex )
    {
        decValue = GetExceptionType( ex );
    }
    Console::WriteLine( formatter, __box( argument ), decValue );
}
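For reference, that sample uses the old Managed Extensions syntax (__box); in current C++/CLI the same conversion is, to my knowledge, just a cast, which throws OverflowException for out-of-range values:

double argument = 1.0625;
System::Decimal decValue = static_cast<System::Decimal>(argument);  // may throw OverflowException
System::Console::WriteLine(decValue);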

If you do not have access to the .Net routines then this is tricky. I have done this myself for my hex editor (so that users can display and edit C# Decimal values using the Properties dialog) - see http://www.hexedit.com for more information. Also the source for HexEdit is freely available - see my article at http://www.codeproject.com/KB/cpp/HexEdit.aspx.
Actually my routines convert between Decimal and strings, but you can of course use sprintf to convert the double to a string first. (Also, when you talk about double I assume you mean the IEEE 64-bit floating point format, which is what most compilers/systems use nowadays.)
Note that there are a few gotchas if you want to handle precisely all valid Decimal values and return an error for any value that cannot be converted, since the format is not well documented. (The Decimal format is awful really; e.g. the same number can have many representations, so 1 can be stored as mantissa 1 with scale 0 or as mantissa 10 with scale 1.)
Here is my code that converts a string to a Decimal. Note that it uses the GNU Multiple Precision Arithmetic Library (functions that start with mpz_). The String2Decimal function obviously returns false if it fails for some reason, such as the value being too big. The parameter 'presult' must point to a buffer of at least 16 bytes, to store the result.
bool String2Decimal(const char *ss, void *presult)
{
    bool retval = false;
    // View the decimal (result) as four 32 bit integers
    unsigned __int32 *dd = (unsigned __int32 *)presult;
    mpz_t mant, max_mant;
    mpz_inits(mant, max_mant, NULL);
    int exp = 0;          // Exponent
    bool dpseen = false;  // decimal point seen yet?
    bool neg = false;     // minus sign seen?

    // Scan the characters of the value
    const char *pp;
    for (pp = ss; *pp != '\0'; ++pp)
    {
        if (*pp == '-')
        {
            if (pp != ss)
                goto exit_func;        // minus sign not at start
            neg = true;
        }
        else if (isdigit(*pp))
        {
            mpz_mul_si(mant, mant, 10);
            mpz_add_ui(mant, mant, unsigned(*pp - '0'));
            if (dpseen) ++exp;         // Keep track of digits after decimal pt
        }
        else if (*pp == '.')
        {
            if (dpseen)
                goto exit_func;        // more than one decimal point
            dpseen = true;
        }
        else if (*pp == 'e' || *pp == 'E')
        {
            char *end;
            exp -= strtol(pp+1, &end, 10);
            pp = end;
            break;
        }
        else
            goto exit_func;            // unexpected character
    }
    if (*pp != '\0')
        goto exit_func;                // extra characters after end

    if (exp < -28 || exp > 28)
        goto exit_func;                // exponent outside valid range

    // Adjust mantissa for -ve exponent
    if (exp < 0)
    {
        mpz_t tmp;
        mpz_init_set_si(tmp, 10);
        mpz_pow_ui(tmp, tmp, -exp);
        mpz_mul(mant, mant, tmp);
        mpz_clear(tmp);
        exp = 0;
    }

    // Get max_mant = size of largest mantissa (2^96 - 1)
    //mpz_set_str(max_mant, "79228162514264337593543950335", 10);  // 2^96 - 1
    static unsigned __int32 ffs[3] = { 0xFFFFffffUL, 0xFFFFffffUL, 0xFFFFffffUL };
    mpz_import(max_mant, 3, -1, sizeof(ffs[0]), 0, 0, ffs);

    // Check for mantissa too big.
    if (mpz_cmp(mant, max_mant) > 0)
        goto exit_func;                // value too big
    else if (mpz_sgn(mant) == 0)
        exp = 0;                       // if mantissa is zero make everything zero

    // Set integer part
    dd[2] = mpz_getlimbn(mant, 2);
    dd[1] = mpz_getlimbn(mant, 1);
    dd[0] = mpz_getlimbn(mant, 0);

    // Set exponent and sign
    dd[3] = exp << 16;
    if (neg && mpz_sgn(mant) > 0)
        dd[3] |= 0x80000000;

    retval = true;                     // indicate success
exit_func:
    mpz_clears(mant, max_mant, NULL);
    return retval;
}
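To feed a double into this routine, as suggested above, you can format it to a string first. A minimal sketch (my addition, not part of the original routine; %.17g prints enough significant digits to round-trip an IEEE double, and presult must point to at least 16 bytes as before):

#include <cstdio>

bool Double2Decimal(double d, void *presult)
{
    char buf[64];
    sprintf(buf, "%.17g", d);  // up to 17 significant digits round-trips a double
    return String2Decimal(buf, presult);
}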

How about this:
1) sprintf number into s
2) find decimal point (strchr), store in idx
3) atoi = obtain integer part easily, use union to separate high/lo
4) use strlen - idx to obtain number of digits after point
sprintf may be slow, but you'll get the solution in under 2 minutes of typing...
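A rough sketch of those four steps (a hypothetical helper of my own, untested; it ignores rounding, exponent notation such as 1e+30, buffer sizing for huge values, and the final sign/scale packing discussed above):

#include <cstdio>
#include <cstdlib>
#include <cstring>

void doubleToDecimalParts(double n, long long &mantissa, int &scale)
{
    char s[64];
    sprintf(s, "%.10f", n);              // 1) print the number (fixed notation)
    char *dot = strchr(s, '.');          // 2) find the decimal point
    scale = (int)strlen(dot + 1);        // 4) count of digits after the point
    memmove(dot, dot + 1, strlen(dot));  // remove the point...
    mantissa = strtoll(s, NULL, 10);     // 3) ...and read the digits as one integer
}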

Related

How to represent a floating point number in binary from 32-bit hex value in C++ without using bitset or float variable?

Given a 32-bit hex like 0x7f000002, how do I get the full value of this number printed in binary without using bitset or defining any float variables to use union?
I know that it is supposed to display
+10000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0 for this particular 32-bit hex number.
But I don't know how to get there without using the two aforementioned constructs.
You can appeal directly to what the bits in a floating point number mean: https://en.wikipedia.org/wiki/Single-precision_floating-point_format
The last 23 bits store the mantissa, and the eight bits before store the biased exponent (with the one bit before that being the sign bit). The number is essentially "1.<mantissa> * 2**(<exponent> - bias)", and multiplying by a power of two is essentially shifting the binary radix point, or adding zeros to the left or right in the binary string.
Taking all that into account (+ edge cases for subnormal numbers and inf and NaN), you can make this function:
std::string floatbits_to_binaryfloatstring(std::uint32_t floatbits) {
    bool signbit = floatbits >> 31;
    int exponent = (floatbits >> 23) & 0xff;
    std::uint32_t fraction = floatbits & 0x7fffffu;
    std::string result;
    result += signbit ? '-' : '+';
    if (exponent == 0xff) {
        if (fraction == 0) {
            result += "inf";
        } else {
            result += "NaN";
        }
    } else if (exponent == 0) {
        if (fraction == 0) {
            result += "0.0";
        } else {
            // Subnormal
            result += "0.";
            result.append(125, '0');
            for (int i = 23; i-- > 0;) {
                result += (fraction >> i) & 1 ? '1' : '0';
            }
            // Remove trailing zeroes
            result.erase(result.find_last_of('1') + 1u, result.npos);
        }
    } else {
        fraction |= 0x800000u;  // Make implicit bit explicit
        exponent -= 127 + 23;
        // The number is "fraction * 2**(exponent)" in binary
        if (exponent <= -24) {
            result += "0.";
            result.append(-exponent - 24, '0');
            for (int i = 24; i-- > 0;) {
                result += (fraction >> i) & 1 ? '1' : '0';
            }
        } else if (exponent >= 0) {
            for (int i = 24; i-- > 0;) {
                result += (fraction >> i) & 1 ? '1' : '0';
            }
            result.append(exponent, '0');
            result += '.';
        } else {
            int point = 24 + exponent;
            for (int i = 24; i-- > 0;) {
                result += (fraction >> i) & 1 ? '1' : '0';
                if (--point == 0) result += '.';
            }
        }
        // Remove trailing zeroes
        result.erase(result.find_last_not_of('0') + 1u, result.npos);
        if (result.back() == '.') result += '0';
    }
    return result;
}
Example: https://wandbox.org/permlink/9jtWfFJeEmTl6i1i
I haven't tested this thoroughly, there might be some mistake somewhere.
Since this is a weird format in the first place, there probably isn't a prebuilt solution for it. std::hexfloat is close, but it prints in hex and with binary p exponent notation instead.

Adding positive and negative numbers in IEEE-754 format

My problem seems to be pretty simple: I wrote a program that manually adds floating point numbers together. This program has certain restrictions (such as no iostream and no unary operators), so that is the reason for the lack of those things. As for the problem: the program seems to function correctly when adding two positive floats (1.5 + 1.5 = 3.0, for example), but when adding two negative numbers (10.0 + -5.0) I get very wacky numbers. Here is the code:
#include <cstdio>

#define BIAS32 127

struct Real
{
    //sign bit
    int sign;
    //UNBIASED exponent
    long exponent;
    //Fraction including implied 1. at bit index 23
    unsigned long fraction;
};

Real Decode(int float_value);
int Encode(Real real_value);
Real Normalize(Real value);
Real Add(Real left, Real right);
unsigned long Add(unsigned long leftop, unsigned long rightop);
unsigned long Multiply(unsigned long leftop, unsigned long rightop);
void alignExponents(Real* left, Real* right);
bool is_neg(Real real);
int Twos(int op);

int main(int argc, char* argv[])
{
    int left, right;
    char op;
    int value;
    Real rLeft, rRight, result;
    if (argc < 4) {
        printf("Usage: %s <left> <op> <right>\n", argv[0]);
        return -1;
    }
    sscanf(argv[1], "%f", (float*)&left);
    sscanf(argv[2], "%c", &op);
    sscanf(argv[3], "%f", (float*)&right);
    rLeft = Decode(left);
    rRight = Decode(right);
    if (op == '+') {
        result = Add(rLeft, rRight);
    }
    else {
        printf("Unknown operator '%c'\n", op);
        return -2;
    }
    value = Encode(result);
    printf("%.3f %c %.3f = %.3f (0x%08x)\n",
        *((float*)&left),
        op,
        *((float*)&right),
        *((float*)&value),
        value
    );
    return 0;
}

Real Decode(int float_value)
{
    // Test sign bit of float_value - test exponent bits of float_value & apply bias
    // - test mantissa bits of float_value
    Real result{ float_value >> 31 & 1 ? 1 : 0,
                 (long)Add(float_value >> 23 & 0xFF, -BIAS32),
                 (unsigned long)float_value & 0x7FFFFF };
    return result;
};

int Encode(Real real_value)
{
    int x = 0;
    x |= real_value.fraction;                     // Set the fraction bits of x
    x |= real_value.sign << 31;                   // Set the sign bits of x
    x |= Add(real_value.exponent, BIAS32) << 23;  // Set the exponent bits of x
    return x;
}

Real Normalize(Real value)
{
    if (is_neg(value))
    {
        value.fraction = Twos(value.fraction);
    }
    unsigned int i = 0;
    while (i < 9)
    {
        if ((value.fraction >> Add(23, i)) & 1)      // If there are set bits past the mantissa section
        {
            value.fraction >>= 1;                    // shift mantissa right by 1
            value.exponent = Add(value.exponent, 1); // increment exponent to accommodate for shift
        }
        i = Add(i, 1);
    }
    return value;
}

Real Add(Real left, Real right)
{
    Real a = left, b = right;
    alignExponents(&a, &b);  // Aligns exponents of both operands
    unsigned long sum = Add(a.fraction, b.fraction);
    Real result = Normalize({ a.sign, a.exponent, sum });  // Normalize result if need be
    return result;
}

unsigned long Add(unsigned long leftop, unsigned long rightop)
{
    unsigned long sum = 0, test = 1;  // sum initialized to 0, test created to compare bits
    while (test)                      // while test is not 0
    {
        if (leftop & test)            // if the digit being tested is 1
        {
            if (sum & test) sum ^= test << 1;  // if the sum tests to 1, carry a bit over
            sum ^= test;
        }
        if (rightop & test)
        {
            if (sum & test) sum ^= test << 1;
            sum ^= test;
        }
        test <<= 1;
    }
    return sum;
}

void alignExponents(Real* a, Real* b)
{
    if (a->exponent != b->exponent)  // If the exponents are not equal
    {
        if (a->exponent > b->exponent)
        {
            int disp = a->exponent - b->exponent;  // number of shifts needed based on difference between the two exponents
            b->fraction |= 1 << 23;     // sets the implicit bit for shifting
            b->exponent = a->exponent;  // sets exponents equal to each other
            b->fraction >>= disp;       // mantissa is shifted over to accommodate for the increase in power
            return;
        }
        int disp = b->exponent - a->exponent;
        a->fraction |= 1 << 23;
        a->exponent = b->exponent;
        a->fraction >>= disp;
        return;
    }
    return;
}

bool is_neg(Real real)
{
    if (real.sign) return true;
    return false;
}

int Twos(int op)
{
    return Add(~op, -1);  // NOT the operand and add 1 to it
}
On top of that, I just tested the values 10.5 + 5.5 and got a 24.0, so there appears to be even more wrong with this than I initially thought. I've been working on this for days and would love some help/advice.
Here is some help/advice. Now that you have worked on some of the code, I suggest going back and reworking your data structure. The declaration of such a crucial data structure would benefit from a lot more comments, making sure you know exactly what each field means.
For example, the implicit bit is not always 1. It is zero if the exponent is zero. That should be dealt with in your Encode and Decode functions. For the rest of your code, it is just a significand bit and should not have any special handling.
When you start thinking about rounding, you will find you often need more than 23 bits in an intermediate result.
Making the significand of negative numbers 2's complement will create a problem of having the same information stored two ways. You will have both a sign bit as though doing sign-and-magnitude and have the sign encoded in the signed integer significand. Keeping them consistent will be a mess. Whatever you decide about how Real will store negative numbers, document it and keep it consistent throughout.
If I were implementing this I would start by defining Real very, very carefully. I would then decide what operations I wanted to be able to do on Real, and write functions to do them. If you get those right each function will be relatively simple.
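As an illustration of that advice (my sketch, not the original poster's code), a more carefully documented Real might look like this, choosing sign-and-magnitude and writing the invariants down:

struct Real
{
    int sign;                // 0 = positive, 1 = negative (sign-and-magnitude:
                             //   'fraction' below is ALWAYS an unsigned magnitude)
    long exponent;           // unbiased exponent; value == fraction * 2^(exponent - 23)
    unsigned long fraction;  // 24-bit significand with the implicit leading 1 made
                             //   explicit at bit 23 (0 only when the value is zero)
};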

How can I get numerator and denominator from a fractional number?

How can I get the numerator and denominator from a fractional number? For example, from "1.375" I want to get "1375/1000" or "11/8" as a result. How can I do it with C++?
I have tried separating the digits before the point and after the point, but that didn't lead me to the desired output.
You didn't really specify whether you need to convert a floating point value or a string to a ratio, so I'm going to assume the former.
Instead of trying string or arithmetic-based approaches, you can directly use properties of IEEE-754 encoding.
Floats (called binary32 by the standard) are encoded in memory like this:
S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
^                                 ^
bit 31                        bit 0
where S is the sign bit, the Es are the exponent bits (8 of them), and the Ms are the mantissa bits (23 of them).
The number can be decoded like this:
value = (-1)^S * significand * 2^exponent
where:
significand = 1.MMMMMMMMMMMMMMMMMMMMMMM (as binary)
exponent = EEEEEEEE (as binary) - 127
(note: this is for so-called "normal numbers"; there are also zeroes, subnormals, infinities and NaNs - see the Wikipedia page I linked)
This can be used here. We can rewrite the equation above like this:
(-1)^S * significand * 2^exponent = (-1)^S * (significand * 2^23) * 2^(exponent - 23)
The point is that significand * 2^23 is an integer (equal to 1.MMMMMMMMMMMMMMMMMMMMMMM, binary - by multiplying by 2^23, we moved the point 23 places right). 2^(exponent - 23) is either an integer or the reciprocal of one, depending on the sign of exponent - 23.
In other words: we can write the number as:
(significand * 2^23) / 2^(-(exponent - 23)) (when exponent - 23 < 0)
or
[(significand * 2^23) * 2^(exponent - 23)] / 1 (when exponent - 23 >= 0)
So we have both numerator and denominator - directly from binary representation of the number.
All of the above could be implemented like this in C++:
struct Ratio
{
int64_t numerator; // numerator includes sign
uint64_t denominator;
float toFloat() const
{
return static_cast<float>(numerator) / denominator;
}
static Ratio fromFloat(float v)
{
// First, obtain bitwise representation of the value
const uint32_t bitwiseRepr = *reinterpret_cast<uint32_t*>(&v);
// Extract sign, exponent and mantissa bits (as stored in memory) for convenience:
const uint32_t signBit = bitwiseRepr >> 31u;
const uint32_t expBits = (bitwiseRepr >> 23u) & 0xffu; // 8 bits set
const uint32_t mntsBits = bitwiseRepr & 0x7fffffu; // 23 bits set
// Handle some special cases:
if(expBits == 0 && mntsBits == 0)
{
// special case: +0 and -0
return {0, 1};
}
else if(expBits == 255u && mntsBits == 0)
{
// special case: +inf, -inf
// Let's agree that infinity is always represented as 1/0 in Ratio
return {signBit ? -1 : 1, 0};
}
else if(expBits == 255u)
{
// special case: nan
// Let's agree, that if we get NaN, we returns max int64_t by 0
return {std::numeric_limits<int64_t>::max(), 0};
}
// mask lowest 23 bits (mantissa)
uint32_t significand = (1u << 23u) | mntsBits;
const int64_t signFactor = signBit ? -1 : 1;
const int32_t exp = expBits - 127 - 23;
if(exp < 0)
{
return {signFactor * static_cast<int64_t>(significand), 1u << static_cast<uint32_t>(-exp)};
}
else
{
return {signFactor * static_cast<int64_t>(significand * (1u << static_cast<uint32_t>(exp))), 1};
}
}
};
(hopefully the comments and description above are understandable - let me know if there's something to improve)
I've omitted checks for out of range values for simplicity.
We can use it like this:
float fv = 1.375f;
Ratio rv = Ratio::fromFloat(fv);
std::cout << "fv = " << fv << ", rv = " << rv << ", rv.toFloat() = " << rv.toFloat() << "\n";
And the output is:
fv = 1.375, rv = 11534336/8388608, rv.toFloat() = 1.375
As you can see, exactly the same values on both ends.
The problem is that the numerators and denominators are big. This is because the code always multiplies the significand by 2^23, even if a smaller value would be enough to make it an integer (this is equivalent to writing 0.2 as 2000000/10000000 instead of 2/10 - it's the same thing, only written differently).
This can be solved by shifting the significand right as far as possible and compensating in the exponent, so the smallest sufficient multiplier is used, like this (the ellipses stand for parts which are the same as above):
// counts number of subsequent least significant bits equal to 0
// example: for 1001000 (binary) returns 3
uint32_t countTrailingZeroes(uint32_t v)
{
    uint32_t counter = 0;
    while (counter < 32 && (v & 1u) == 0)
    {
        v >>= 1u;
        ++counter;
    }
    return counter;
}

struct Ratio
{
    ...

    static Ratio fromFloat(float v)
    {
        ...
        uint32_t significand = (1u << 23u) | mntsBits;
        const uint32_t nTrailingZeroes = countTrailingZeroes(significand);
        significand >>= nTrailingZeroes;
        const int64_t signFactor = signBit ? -1 : 1;
        const int32_t exp = expBits - 127 - 23 + nTrailingZeroes;
        if (exp < 0)
        {
            return {signFactor * static_cast<int64_t>(significand),
                    1u << static_cast<uint32_t>(-exp)};
        }
        else
        {
            return {signFactor * static_cast<int64_t>(significand * (1u << static_cast<uint32_t>(exp))),
                    1};
        }
    }
};
And now, for the following code:
float fv = 1.375f;
Ratio rv = Ratio::fromFloat(fv);
std::cout << "fv = " << fv << ", rv = " << rv << ", rv.toFloat() = " << rv.toFloat() << "\n";
We get:
fv = 1.375, rv = 11/8, rv.toFloat() = 1.375
In C++ you can use the Boost rational class, but you need to give it a numerator and denominator.
For this you need to find out the number of digits in the input string after the decimal point. You can do this with string manipulation functions: read the input character by character and count the characters after the '.'.
char inputstr[30];
int noint = 0, nodec = 0;
char intstr[30], dec[30];
int decimalfound = 0;
int denominator = 1;
int numerator;

scanf("%s", inputstr);
int len = strlen(inputstr);
for (int i = 0; i < len; i++)
{
    if (decimalfound == 0)
    {
        if (inputstr[i] == '.')
        {
            decimalfound = 1;
        }
        else
        {
            intstr[noint++] = inputstr[i];
        }
    }
    else
    {
        dec[nodec++] = inputstr[i];
        denominator *= 10;
    }
}
dec[nodec] = '\0';
intstr[noint] = '\0';
numerator = atoi(dec) + (atoi(intstr) * denominator);  // scale by 10^(digits after the point)
// You can now use the numerator and denominator as the fraction,
// either in the Rational class or you can find gcd and divide by
// gcd.
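For reference, a minimal sketch of the Boost side mentioned above, assuming Boost is available; boost::rational reduces by the gcd automatically:

#include <boost/rational.hpp>

boost::rational<long> r(1375, 1000);  // from the "1.375" example
// r.numerator() == 11, r.denominator() == 8 after automatic reduction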
What about this simple code:
#include <cmath>  // std::fabs

double n = 1.375;
int num = 1, den = 1;
double frac = (num * 1.0 / den);
double margin = 0.000001;
while (std::fabs(frac - n) > margin) {
    if (frac > n) {
        den++;
    }
    else {
        num++;
    }
    frac = (num * 1.0 / den);
}
I haven't really tested it much; it's only an idea.
I hope I'll be forgiven for posting an answer which uses "only the C language". I know you tagged the question with C++ - but I couldn't turn down the bait, sorry. This is still valid C++ at least (although it does, admittedly, use mainly C string-processing techniques).
int num_string_float_to_rat(char *input, long *num, long *den) {
    char *tok = NULL, *end = NULL;
    char buf[128] = {'\0'};
    long a = 0, b = 0;
    int den_power = 1;

    strncpy(buf, input, sizeof(buf) - 1);

    tok = strtok(buf, ".");
    if (!tok) return 1;
    a = strtol(tok, &end, 10);
    if (*end != '\0') return 2;

    tok = strtok(NULL, ".");
    if (!tok) return 1;
    den_power = strlen(tok);  // Denominator power of 10
    b = strtol(tok, &end, 10);
    if (*end != '\0') return 2;

    *den = static_cast<int>(pow(10.00, den_power));
    *num = a * *den + b;

    num_simple_fraction(num, den);
    return 0;
}
Sample usage:
int rc = num_string_float_to_rat("0015.0235", &num, &den);
// Check return code -> should be 0!
printf("%ld/%ld\n", num, den);
Output:
30047/2000
Full example at http://codepad.org/CFQQEZkc .
Notes:
strtok() is used to parse the input into tokens (no need to reinvent the wheel in that regard). strtok() modifies its input - so a temporary buffer is used for safety
it checks for invalid characters - and will return a non-zero return code if found
strtol() has been used instead of atoi() - as it can detect non-numeric characters in the input
scanf() has not been used to slurp the input - due to rounding issues with floating point numbers
the base for strtol() has been explicitly set to 10 to avoid problems with leading zeros (otherwise a leading zero will cause the number to be interpreted as octal)
it uses a num_simple_fraction() helper (not shown) - which in turn uses a gcd() helper (also not shown) - to convert the result to a simple fraction
log10() of the numerator is determined by calculating the length of the token after the decimal point
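The num_simple_fraction() helper is not shown; a hypothetical reconstruction (my sketch, assuming C++17 for std::gcd) might look like:

#include <numeric>  // std::gcd (C++17)

void num_simple_fraction(long *num, long *den)
{
    long g = std::gcd(*num, *den);
    if (g > 1) { *num /= g; *den /= g; }
}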
I'd do this in three steps.
1) find the decimal point, so that you know how large the denominator has to be.
2) get the numerator. That's just the original text with the decimal point removed.
3) get the denominator. If there was no decimal point, the denominator is 1. Otherwise, the denominator is 10^n, where n is the number of digits to the right of the (now-removed) decimal point.
#include <string>

struct fraction {
    std::string num, den;
};

fraction parse(std::string input) {
    // 1:
    std::size_t dec_point = input.find('.');
    // 2:
    if (dec_point == std::string::npos)
        dec_point = 0;  // no decimal point: nothing to remove, denominator is 1
    else {
        input.erase(input.begin() + dec_point);  // remove the decimal point
        dec_point = input.length() - dec_point;  // digits that were right of it
    }
    // 3:
    int denom = 1;
    for (std::size_t i = 0; i < dec_point; ++i)
        denom *= 10;
    fraction result = { input, std::to_string(denom) };
    return result;
}

How to store bytes of a float value in a string and retrieve the value afterwards?

I'm trying to figure out a way to send a sequence of float values over the network. I've seen various answers for this, and this is my current attempt:
#include <iostream>
#include <cstring>
#include <string>

union floatBytes
{
    float value;
    char bytes[sizeof (float)];
};

int main()
{
    floatBytes data1;
    data1.value = 3.1;
    std::string string(data1.bytes);

    floatBytes data2;
    strncpy(data2.bytes, string.c_str(), sizeof (float));
    std::cout << data2.value << std::endl;  // <-- prints "3.1"

    return 0;
}
Which works nicely (though I suspect I might run into problems when sending this string to other systems, please comment).
However, if the float value is a round number (like 3.0 instead of 3.1) then this doesn't work.
data1.value = 3;
std::string string(data1.bytes);
floatBytes data2;
strncpy(data2.bytes, string.c_str(), sizeof (float));
std::cout << data2.value << std::endl; // <-- prints "0"
So what is the preferred way of storing the bytes of a float value, send it, and parse it "back" to a float value?
Never use str* functions this way. They are intended for C strings, and the byte representation of a float is certainly not a valid C string. What you need is to send/receive your data in a common representation. A lot of them exist, but there are basically two kinds: a textual representation or a byte encoding.
Textual representation: this mostly consists of converting your float value to a string using a stringstream, then extracting the string and sending it over the connection.
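A minimal sketch of the textual round trip (max_digits10 ensures the value survives the conversion exactly):

#include <limits>
#include <sstream>
#include <string>

std::string floatToText(float f)
{
    std::ostringstream out;
    out.precision(std::numeric_limits<float>::max_digits10);
    out << f;
    return out.str();  // send this string over the connection
}

float textToFloat(const std::string &s)
{
    std::istringstream in(s);
    float f = 0.0f;
    in >> f;
    return f;
}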
Byte representation: this is much more problematic, because if the two machines do not use the same byte ordering, float encoding, etc., then you can't send the raw bytes as-is. But there exists (at least) one standard, known as XDR (RFC 4506), that specifies how to encode the bytes of a float/double value natively encoded in IEEE 754.
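For illustration, on POSIX systems that still ship the SunRPC headers, the XDR routines can be used directly (a sketch, assuming <rpc/xdr.h> is available on your platform):

#include <rpc/xdr.h>

char buf[4];
XDR xdr;
xdrmem_create(&xdr, buf, sizeof buf, XDR_ENCODE);
float f = 3.1f;
xdr_float(&xdr, &f);  // buf now holds the 4 bytes in XDR (big-endian IEEE 754) form
xdr_destroy(&xdr);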
You can reconstitute a float portably with rather involved code, which I maintain in my IEEE 754 GitHub repository. If you break the float into bytes using those functions, and reconstitute it using the other function, you will obtain the same value in the receiver as you sent, regardless of float encoding, up to the precision of the format.
https://github.com/MalcolmMcLean/ieee754
float freadieee754f(FILE *fp, int bigendian)
{
    unsigned long buff = 0;
    unsigned long buff2 = 0;
    unsigned long mask;
    int sign;
    int exponent;
    int shift;
    int i;
    int significandbits = 23;
    int expbits = 8;
    double fnorm = 0.0;
    double bitval;
    double answer;

    for (i = 0; i < 4; i++)
        buff = (buff << 8) | fgetc(fp);
    if (!bigendian)
    {
        for (i = 0; i < 4; i++)
        {
            buff2 <<= 8;
            buff2 |= (buff & 0xFF);
            buff >>= 8;
        }
        buff = buff2;
    }

    sign = (buff & 0x80000000) ? -1 : 1;
    mask = 0x00400000;
    exponent = (buff & 0x7F800000) >> 23;
    bitval = 0.5;
    for (i = 0; i < significandbits; i++)
    {
        if (buff & mask)
            fnorm += bitval;
        bitval /= 2;
        mask >>= 1;
    }
    if (exponent == 0 && fnorm == 0.0)
        return 0.0f;
    shift = exponent - ((1 << (expbits - 1)) - 1);  /* exponent = shift + bias */

    if (shift == 128 && fnorm != 0.0)
        return (float) sqrt(-1.0);  /* NaN */
    if (shift == 128 && fnorm == 0.0)
    {
#ifdef INFINITY
        return sign == 1 ? INFINITY : -INFINITY;
#endif
        return (sign * 1.0f) / 0.0f;
    }
    if (shift > -127)
    {
        answer = ldexp(fnorm + 1.0, shift);
        return (float) answer * sign;
    }
    else
    {
        if (fnorm == 0.0)
        {
            return 0.0f;
        }
        shift = -126;
        while (fnorm < 1.0)
        {
            fnorm *= 2;
            shift--;
        }
        answer = ldexp(fnorm, shift);
        return (float) answer * sign;
    }
}
int fwriteieee754f(float x, FILE *fp, int bigendian)
{
    int shift;
    unsigned long sign, exp, hibits, buff;
    double fnorm, significand;
    int expbits = 8;
    int significandbits = 23;

    /* zero (can't handle signed zero) */
    if (x == 0)
    {
        buff = 0;
        goto writedata;
    }
    /* infinity */
    if (x > FLT_MAX)
    {
        buff = 128 + ((1 << (expbits - 1)) - 1);
        buff <<= (31 - expbits);
        goto writedata;
    }
    /* -infinity */
    if (x < -FLT_MAX)
    {
        buff = 128 + ((1 << (expbits - 1)) - 1);
        buff <<= (31 - expbits);
        buff |= (1 << 31);
        goto writedata;
    }
    /* NaN - dodgy because many compilers optimise out this test, but
     * there is no portable isnan() */
    if (x != x)
    {
        buff = 128 + ((1 << (expbits - 1)) - 1);
        buff <<= (31 - expbits);
        buff |= 1234;
        goto writedata;
    }

    /* get the sign */
    if (x < 0) { sign = 1; fnorm = -x; }
    else { sign = 0; fnorm = x; }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while (fnorm < 1.0) { fnorm *= 2.0; shift--; }

    /* check for denormalized numbers */
    if (shift < -126)
    {
        while (shift < -126) { fnorm /= 2.0; shift++; }
        shift = -127;  /* so the stored exponent (shift + bias) becomes 0 */
    }
    /* out of range. Set to infinity */
    else if (shift > 128)
    {
        buff = 128 + ((1 << (expbits - 1)) - 1);
        buff <<= (31 - expbits);
        buff |= (sign << 31);
        goto writedata;
    }
    else
        fnorm = fnorm - 1.0;  /* take the significant bit off mantissa */

    /* calculate the integer form of the significand */
    /* hold it in a double for now */
    significand = fnorm * ((1LL << significandbits) + 0.5f);

    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1);  /* shift + bias */

    hibits = (long)(significand);
    buff = (sign << 31) | (exp << (31 - expbits)) | hibits;

writedata:
    /* write the bytes out to the stream */
    if (bigendian)
    {
        fputc((buff >> 24) & 0xFF, fp);
        fputc((buff >> 16) & 0xFF, fp);
        fputc((buff >> 8) & 0xFF, fp);
        fputc(buff & 0xFF, fp);
    }
    else
    {
        fputc(buff & 0xFF, fp);
        fputc((buff >> 8) & 0xFF, fp);
        fputc((buff >> 16) & 0xFF, fp);
        fputc((buff >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}
Let me first clear up the issue with your code.
You are using strncpy, which stops copying the moment it sees '\0'. This simply means that it is not copying all your data, and thus the 0 is expected.
Using memcpy instead of strncpy should do the trick.
I just tried this C++ code
#include <cstdio>
#include <cstring>

int main() {
    float f = 3.34;
    printf("before = %f\n", f);

    char a[10];
    memcpy(a, (char*) &f, sizeof(float));
    a[sizeof(float)] = '\0';  // For sending over network

    float f1 = 1.99;
    memcpy((char*) &f1, a, sizeof(float));
    printf("after = %f\n", f1);
    return 0;
}
I get the correct output as expected.
Now coming to the correctness: I am not sure if this classifies as undefined behaviour. It could also be called a case of type punning, in which case it would be implementation defined (and I assume any sane compiler would not muck this up).
This is all okay as long as I am doing it within the same program.
Now for your problem of sending it over the network: I don't think this would be the correct way of doing it. As @Jean-Baptiste Yunès mentioned, the two systems could be using different representations for float, or even different byte orderings.
In that case you need to use a library to convert it to some standard representation like IEEE 754.
The main problem is that C++ does not enforce IEEE 754, so the representation of your float may work between 2 computers and fail with another.
The problem has to be divided in two:
How to encode and decode a float to shared format
How to serialize the value to a char array for transmission.
How to encode/decode a float to a common format
C++ does not impose a specific bit format, which means a computer might transfer a float and the value on the other machine could be different.
Example of 1.0f:

Machine 1: sign + 8-bit exponent + 23-bit mantissa:
0-01111111-00000000000000000000000

Machine 2: sign + 7-bit exponent + 24-bit mantissa:
0-0111111-000000000000000000000000

Sending from machine 1 to machine 2 without a shared format would result in machine 2 receiving 0-0111111-100000000000000000000000 = 1.5
This is a complex topic and may be difficult to solve completely cross-platform. C++ includes some convenience properties that help somewhat with this:
bool isIeee754 = std::numeric_limits<float>::is_iec559;
The main problem is that the compiler may not know the exact CPU architecture on which its output will run, so this is only half reliable. Fortunately, the bit format is correct in most cases. Additionally, if the format is not known, it may be very difficult to normalize it.
We might design some code to detect the float format, or we might decide to skip those cases as "unsupported platforms".
In the case of IEEE 754 32-bit, we may easily extract the mantissa, sign and exponent with bitwise operations, once the float's bits have been copied into an integer (a float cannot be shifted directly):
float input = 1.0f;
uint32_t bits;
std::memcpy(&bits, &input, sizeof bits);  // reinterpret the float's storage
uint8_t exponent = (bits >> 23) & 0xFF;
uint32_t mantissa = bits & 0x7FFFFF;
bool sign = bits >> 31;
A standard format for transmission could well be 32-bit IEEE 754, so most of the time it would work without any encoding at all:
bool isStandard32BitIeee754(float f)
{
    // TODO: improve for "special" targets.
    return std::numeric_limits<decltype(f)>::is_iec559 && sizeof(f) == 4;
}
Finally, and especially for those non-standard platforms, special values for NaN and infinity must be kept.
Serialization of a float for transmission
The second issue is much simpler: the standardized binary just needs to be transformed into a char array. However, not all characters may be acceptable on the network, especially if it is used with HTTP or an equivalent protocol.
For this example, I will convert the stream to hexadecimal encoding (an alternative could be Base64, etc.).
Note: I know there are functions which could help here; I deliberately use simple C++ to show the steps at as low a level as possible.
void toHex(uint8_t &out1, uint8_t &out2, uint8_t in)
{
    out1 = in >> 4;
    out1 = out1 > 9 ? out1 - 10 + 'A' : out1 + '0';
    out2 = in & 0xF;
    out2 = out2 > 9 ? out2 - 10 + 'A' : out2 + '0';
}

void standardFloatToHex(float in, std::string &out)
{
    union Aux
    {
        uint8_t c[4];
        float f;
    };
    out.resize(8);
    Aux converter;
    converter.f = in;
    for (int i = 0; i < 4; i++)
    {
        // Could use std::stringstream as an alternative.
        uint8_t c1, c2, c = converter.c[i];
        toHex(c1, c2, c);
        out[i*2] = c1;
        out[i*2+1] = c2;
    }
}
Finally, the equivalent decoding is required on the other side.
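The decoding side is symmetric; a sketch of the inverse (hypothetical names of mine, mirroring the functions above):

#include <cstdint>
#include <string>

uint8_t fromHexDigit(char c)
{
    return c >= 'A' ? c - 'A' + 10 : c - '0';
}

void hexToStandardFloat(const std::string &in, float &out)
{
    union Aux
    {
        uint8_t c[4];
        float f;
    };
    Aux converter;
    for (int i = 0; i < 4; i++)
        converter.c[i] = (fromHexDigit(in[i*2]) << 4) | fromHexDigit(in[i*2+1]);
    out = converter.f;
}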
Conclusion
The standardization of the float value into a shared bit format has been explained. Some implementation-dependent conversions may be required.
The serialization for most common network protocols is shown.

Going crazy, why are my variables changing on me?

Okay, I've had this happen to me before, where variables randomly change values because of memory allocation issues or wrong addressing, etc., such as when you go out of bounds with an array. However, I'm not using arrays, pointers or addresses, so I have no idea why, after executing this loop, it suddenly decides that "exponent", after being set to 0, is equal to 288 inside the loop:
EDIT: It breaks specifically on 0x80800000.
This does not break in a single test; we have a testing client which iterates through several test cases. Each time the function is called again, the values should be reset to their original values.
/*
 * float_i2f - Return bit-level equivalent of expression (float) x
 *   Result is returned as unsigned int, but
 *   it is to be interpreted as the bit-level representation of a
 *   single-precision floating point value.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_i2f(int x) {
    int sign = 0;
    int a = 0;
    int exponent = 0;
    int crash_test = 0;
    int exp = 0;
    int fraction = 0;
    int counter = 0;
    if (x == 0) return 0;
    if (!(x ^ (0x01 << 31)))
    {
        return 0xCF << 24;
    }
    if (x >> 31)
    {
        sign = 0xFF << 31;
        x = (~x) + 1;
    }
    else
    {
        sign = 0x00;
    }
    //printf(" After : %x ", x);
    a = 1;
    exponent = 0;
    crash_test = 0;
    while ((a*2) <= x)
    {
        if (a == 0) a = 1;
        if (a == 1) crash_test = exponent;
        /*
        if (exponent == 288)
        {
            exponent = 0;
            counter++;
            if (counter <= 2)
                printf("WENT OVERBOARD WTF %d ORIGINAL %d", a, crash_test);
        }
        */
        if (exponent > 300) break;
        exponent++;
        a *= 2;
    }
    exp = (exponent + 0x7F) << 23;
    fraction = (~(((0x01) << 31) >> 7)) & (x << (25 - (exponent + 1)));
    return sign | exp | fraction;
}
Use a debugger or IDE; set a watch/breakpoint/assert on the value of exponent (e.g. exponent > 100).
What was the offending value of x that float_i2f() was called with? Did exponent blow up for all x, or some range?
(Did you just say x = 0x80800000? Did you set a watch on exponent and step through in a debugger for that value? That should answer your question. Did you check that 0x807FFFFF works, for example?)
I tried it myself with Visual Studio, and an input of "10", and it seemed to work OK.
Q: Can you give me an input value of "x" where it fails?
Q: What compiler are you using? What platform are you running on?
You have a line that increments exponent at the end of your while loop.
while ((a*2) <= x)
{
    if (a == 0) a = 1;
    if (a == 1) crash_test = exponent;
    /*
    if (exponent == 288)
    {
        exponent = 0;
        counter++;
        if (counter <= 2)
            printf("WENT OVERBOARD WTF %d ORIGINAL %d", a, crash_test);
    }
    */
    if (exponent > 300) break;
    exponent++;
    a *= 2;
}
The variable exponent isn't doing anything mysterious. You are incrementing exponent each time through the loop, so it eventually hits any number you like. The real question is why doesn't your loop exit when you think it should?
Your loop condition depends on a. Try printing out the successive values of a as your loop repeats. Do you notice anything funny happening after a reaches 1073741824? Have you heard about integer overflow in your classes yet?
Just handle the case where "a" goes negative (or better, validate your input so it never goes negative in the first place), and you should be fine :)
There were many useless attempts at optimization in there; I've removed them so the code is easier to read. Also, I used <stdint.h> types as appropriate.
There was signed integer overflow in a *= 2 in the loop, but the main problem was the lack of constants and the weird computation of magic numbers.
This still isn't exemplary, because the constants should all be named, but this seems to work reliably.
#include <stdio.h>
#include <stdint.h>

uint32_t float_i2f(int32_t x) {
    uint32_t sign = 0;
    uint32_t exponent = 0;
    uint32_t fraction = 0;

    if (x == 0) return 0;
    if (x == 0x80000000)
    {
        return 0xCF000000u;
    }
    if (x < 0)
    {
        sign = 0x80000000u;
        x = -x;
    }
    else
    {
        sign = 0;
    }

    /* Count order of magnitude, this will be excessive by 1. */
    for (exponent = 1; (1u << exponent) <= x; ++exponent) ;

    if (exponent < 24) {
        fraction = 0x007FFFFF & (x << (24 - exponent));  /* strip leading 1-bit */
    } else {
        fraction = 0x007FFFFF & (x >> (exponent - 24));
    }
    exponent = (exponent + 0x7E) << 23;
    return sign | exponent | fraction;
}
a overflows: a*2 == 0 when a == 1<<31, so every time exponent % 32 == 0, a == 0 and you loop until exponent == 300.
There are a few other issues as well:
Your fraction calculation is off when exponent>=24. Negative left shifts do not automatically turn into positive right shifts.
The mask to generate the fraction is also slightly wrong. The leading bit is always assumed to be 1, and the mantissa is only 23 bits, so fraction for x<2^23 should be:
fraction = (~(((0x01)<< 31) >> 8)) & (x << (24 - (exponent + 1)));
The loop to calculate the exponent fails when abs(x)>=1<<31 (and incidentally results in precision loss if you don't round appropriately); a loop that takes the implicit 1 into account would be better here.