Checksum of floats with roundtrip through text file - c++

I need to write a couple of floats to a text file and store a CRC32 checksum with them. Then when I read the floats back from the text file, I want to recompute the checksum and compare it to the one that was previously computed when saving the file. My problem is that the checksum sometimes fails. This is due to the fact that equal floating point numbers can be represented by different bit patterns. For completeness' sake, I will summarize the code in the next paragraphs.
I have adapted this CRC32 algorithm which I found after reading this question. Here's what it looks like:
uint32_t updC32(uint32_t octet, uint32_t crc) {
    return CRC32Tab[(crc ^ octet) & 0xFF] ^ (crc >> 8);
}

template <typename T>
uint32_t updateCRC32(T s, uint32_t crc) {
    const char* buf = reinterpret_cast<const char*>(&s);
    size_t len = sizeof(T);
    for (; len; --len, ++buf)
        crc = updC32(static_cast<uint32_t>(*buf), crc);
    return crc;
}
CRC32Tab contains exactly the same values as the large array in the file linked above.
This is an abbreviated version of how I write the floats to a file and compute the checksum:
float x, y, z;
// set them to some values
uint32_t crc = 0xFFFFFFFF;
crc = Utility::updateCRC32(x, crc);
crc = Utility::updateCRC32(y, crc);
crc = Utility::updateCRC32(z, crc);
const uint32_t actualCrc = ~crc;
// stream is a FILE pointer, and I don't mind the scientific representation
fprintf(stream, " ( %g %g %g )", x, y, z);
fprintf(stream, " CRC %u\n", actualCrc);
I read the values back from the file as follows. There is actually a lot more involved as the file has a more complex syntax and has to be parsed, but let's assume that getNextFloat() returns the textual representation of each float written before.
float x = std::atof(getNextFloat());
float y = std::atof(getNextFloat());
float z = std::atof(getNextFloat());
uint32_t crc = 0xFFFFFFFF;
crc = Utility::updateCRC32(x, crc);
crc = Utility::updateCRC32(y, crc);
crc = Utility::updateCRC32(z, crc);
const uint32_t actualCrc = ~crc;
const uint32_t fileCrc = // read the CRC from the file
assert(fileCrc == actualCrc); // fails often, but not always
The source of this problem seems to be that std::atof returns a float whose bit representation differs from the bit representation of the float that was used to write that string to the file.
So, my question is: Is there another way to achieve my goal of checksumming floats which are roundtripped through a textual representation other than to checksum the strings themselves?
Thanks for reading!

The source of the issue is apparent from your comment:
If I'm not completely mistaken, there is no rounding happening here. The %g specifier chooses the shortest string representation that exactly represents the number.
This is incorrect. If no precision is specified, it defaults to 6, and rounding will definitely occur for most floating-point inputs.
If you need a human-readable round-trippable format, %a is by far the best choice. Failing that, you will need to specify a precision of at least 9 (assuming that float on your system is IEEE-754 single precision).
You may still be tripped up by NaN encodings, since the standard does not specify how or if they must be printed.
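For illustration, here is a minimal sketch of the writing side with enough significant digits for a lossless float -> text -> float round trip (assuming IEEE-754 single precision; the helper name writeFloats is made up, and the %a alternative is shown as a comment):
#include <cstdio>

// sketch: 9 significant decimal digits are enough to round-trip a binary32 float
void writeFloats(FILE* stream, float x, float y, float z) {
    fprintf(stream, " ( %.9g %.9g %.9g )", x, y, z);
    // exact hexadecimal alternative:
    // fprintf(stream, " ( %a %a %a )", x, y, z);
}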

If the text file doesn't have to be human-readable, use hexadecimal float literals instead; they are exact, so you won't have this problem of differences between the textual and in-memory values.
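A rough sketch of what that looks like in practice (assuming C99/C++11-style %a support in printf and strtof; the helper names are illustrative only):
#include <cstdio>
#include <cstdlib>

// write the float exactly as a hexadecimal float literal, e.g. "0x1.99999ap-4" for 0.1f
void writeHexFloat(FILE* stream, float v) {
    fprintf(stream, "%a\n", v);
}

// read it back without any rounding; strtof understands hexadecimal float literals
float readHexFloat(const char* text) {
    return strtof(text, nullptr);
}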

If your standard library's float-to-text and text-to-float conversions do proper rounding, you just need enough significant digits for the float -> text -> float round trip to be lossless. If you also have Infs and NaNs, the round trip should still be "value-preserving", but not necessarily bit-pattern-preserving, since there are multiple representations for infinity or NaN, I think. For an IEEE-754 64-bit double, 17 significant digits are just enough to make the round trip lossless with respect to the actual value.
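If you would rather not hard-code those digit counts, a small sketch (assuming a C++11 standard library) can query them through std::numeric_limits; max_digits10 is 9 for an IEEE-754 float and 17 for a double:
#include <cstdio>
#include <limits>

int main() {
    std::printf("float: %d, double: %d\n",
                std::numeric_limits<float>::max_digits10,    // 9 on IEEE-754 systems
                std::numeric_limits<double>::max_digits10);  // 17 on IEEE-754 systems
    double d = 0.1;
    // value-preserving output: print with exactly max_digits10 significant digits
    std::printf("%.*g\n", std::numeric_limits<double>::max_digits10, d);
}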

Your CRC algorithm is flawed for any type which has multiple binary representations for a single value. IEEE 754 has two representations for 0.0, to wit +0.0 and -0.0. Other, non-finite values such as NaN are potentially troublesome too.
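A small illustration of that point (a sketch assuming IEEE-754 floats): the two zeros compare equal, yet their byte patterns differ, so a byte-wise CRC sees two different values.
#include <cstdio>
#include <cstring>

int main() {
    float pz = +0.0f, nz = -0.0f;
    std::printf("equal: %d, same bytes: %d\n",
                pz == nz,                                    // prints 1: values compare equal
                std::memcmp(&pz, &nz, sizeof(float)) == 0);  // prints 0: bit patterns differ
}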

Would it be acceptable to canonicalize your numbers before you update the CRC? So while saving, you would get a temporary string version of your number (with sprintf or whatever matches your serialization's format), then convert this string back to a numeric value, and then use this result to update the CRC. This way, you know that the CRC will match the deserialized value.
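That could look roughly like the sketch below; the helper name canonicalize is made up, it assumes the same %g format used when the file is written, and it reuses the question's updateCRC32:
#include <cstdio>
#include <cstdlib>

// run the value through the same text round trip the file will see,
// and feed the CRC with the result instead of the original float
float canonicalize(float v) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%g", v);
    return static_cast<float>(std::atof(buf));
}

// while saving:
// crc = Utility::updateCRC32(canonicalize(x), crc);
// crc = Utility::updateCRC32(canonicalize(y), crc);
// crc = Utility::updateCRC32(canonicalize(z), crc);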

Related

float number to string converting implementation in STD

I have run into a curious issue. Look at this simple code:
int main(int argc, char **argv) {
    char buf[1000];
    snprintf_l(buf, sizeof(buf), _LIBCPP_GET_C_LOCALE, "%.17f", 0.123e30f);
    std::cout << "WTF?: " << buf << std::endl;
}
The output looks quite weird:
123000004117574256822262431744.00000000000000000
My question is: how is this implemented? Can someone show me the original code? I did not find it, or maybe it's just too complicated for me.
I've tried to reimplement the same double-to-string conversion in Java but failed. Even when I extracted the exponent and fraction parts separately and summed the fraction bits in a loop, I always got zeros instead of these digits "...822262431744". When I tried to keep summing fraction bits beyond the 23 bits of a float, I ran into another issue: how many fraction bits do I need to collect? And why does the original code stop where it does instead of continuing until the full scale is reached?
So, I really do not understand the basic logic of how it is implemented. I've also tried defining really big numbers (e.g. 0.123e127f), and it generates a huge number in decimal format. The number seems to have much higher precision than a float can hold. That looks like a problem to me, because the string representation contains something which a float cannot.
Please read documentation:
printf, fprintf, sprintf, snprintf, printf_s, fprintf_s, sprintf_s, snprintf_s - cppreference.com
The format string consists of ordinary multibyte characters (except %), which are copied unchanged into the output stream, and conversion specifications. Each conversion specification has the following format:
introductory % character
...
(optional) . followed by integer number or *, or neither that specifies precision of the conversion. In the case when * is used, the precision is specified by an additional argument of type int, which appears before the argument to be converted, but after the argument supplying minimum field width if one is supplied. If the value of this argument is negative, it is ignored. If neither a number nor * is used, the precision is taken as zero. See the table below for exact effects of precision.
....
Conversion specifier: f, F
Explanation: converts a floating-point number to decimal notation in the style [-]ddd.ddd. The precision specifies the exact number of digits to appear after the decimal point character. The default precision is 6. In the alternative implementation the decimal point character is written even if no digits follow it. For infinity and not-a-number conversion style see notes.
Expected argument type: double
So with f you force the ddd.ddd form (no exponent), and with .17 you force 17 digits after the decimal separator. With such a big value, the printed outcome looks this odd.
I finally found out what the difference is between the Java float -> decimal -> string conversion and the C++ float -> string (decimal) conversion. I did not find the original source code, but I replicated the same logic in Java to make it clear. I think the code explains everything:
// the context size might be calculated properly by getting the maximum
// float number (including exponent value) - it's 40 + scale, where scale is 17 for me
MathContext context = new MathContext(57, RoundingMode.HALF_UP);
BigDecimal divisor = BigDecimal.valueOf(2);
int tmp = Float.floatToRawIntBits(1.23e30f);
boolean sign = tmp < 0;
tmp <<= 1;
// there might be a NaN value; this code does not support it
int exponent = (tmp >>> 24) - 127;
tmp <<= 8;
int mask = 1 << 23;
int fraction = mask | (tmp >>> 9);
// at this point we have all parts of the float: sign, exponent and fraction. Let's build the mantissa
BigDecimal mantissa = BigDecimal.ZERO;
for (int i = 0; i < 24; i++) {
    if ((fraction & mask) == mask) {
        // I'm not sure about speed; maybe division at each iteration might be faster than pow
        mantissa = mantissa.add(divisor.pow(-i, context));
    }
    mask >>>= 1;
}
// this was the core line where I was losing accuracy, because of the context
BigDecimal decimal = mantissa.multiply(divisor.pow(exponent, context), context);
String str = decimal.setScale(17, RoundingMode.HALF_UP).toPlainString();
// add the minus sign manually, because Java loses it if the value becomes 0 after scaling; the C++ code keeps it
if (sign) {
    str = "-" + str;
}
return str;
Maybe this topic is useless; who really needs to have the same implementation as C++? But at least this code keeps all the precision of the float number, compared to the most popular way of converting a float to a decimal string:
return BigDecimal.valueOf(1.23e30f).setScale(17, RoundingMode.HALF_UP).toPlainString();
The C++ implementation you are using uses the IEEE-754 binary32 format for float. In this format, the closest representable value to 0.123•10^30 is 123,000,004,117,574,256,822,262,431,744, which is represented in the binary32 format as +13,023,132•2^73. So 0.123e30f in the source code yields the number 123,000,004,117,574,256,822,262,431,744. (Because the number is represented as +13,023,132•2^73, we know its value is exactly that, which is 123,000,004,117,574,256,822,262,431,744, even though the digits “123000004117574256822262431744” are not stored directly.)
Then, when you format it with %.17f, your C++ implementation prints the exact value faithfully, yielding “123000004117574256822262431744.00000000000000000”. This accuracy is not required by the C++ standard, and some C++ implementations will not do the conversion exactly.
The Java specification also does not require formatting of floating-point values to be exact, at least in some formatting operations. (I am going from memory and some supposition here; I do not have a citation at hand.) It allows, perhaps even requires, that only a certain number of correct digits be produced, after which zeros are used if needed for positioning relative to the decimal point or for the requested format.
The number has much higher precision than float can be.
For any value represented in the float format, that value has infinite precision. The number +13,023,132•2^73 is exactly +13,023,132•2^73, which is exactly 123,000,004,117,574,256,822,262,431,744, to infinite precision. The precision the format has for representing numbers affects only which numbers it can represent, not how precisely it represents the numbers that it does represent.

How to convert a float into uint8_t?

I am trying to send multiple float values from an Arduino using the LMIC LoRa library. The LMIC function only takes uint8_t as its transmission argument type.
temp contains my temperature value as a float and I can print the measured temperature as such without problem:
Serial.println((String)"Temp C: " + temp);
There is an example that shows this code being used to do the conversion:
uint16_t payloadTemp = LMIC_f2sflt16(temp);
// int -> bytes
byte tempLow = lowByte(payloadTemp);
byte tempHigh = highByte(payloadTemp);
payload[0] = tempLow;
payload[1] = tempHigh;
I am not sure if this should work; it doesn't seem to. The resulting data that gets sent is: FF 7F
I don't believe this is what I am looking for.
I have also tried the following conversion procedure:
uint8_t *array;
array = (unit8_t*)(&f);
Using the Arduino toolchain, this will not even compile.
Something that does work, but creates a much too long result, is:
String toSend = String(temp);
toSend.toCharArray(payload, toSend.length());
payloadActualLength = toSend.length();
Serial.print("the payload is: ");
Serial.println(payload);
but the resulting hex is far too long once I add the other values that I want to send.
So how do I convert a float into uint8_t values, and why doesn't the conversion shown above work the way I expect it to?
Sounds like you are trying to figure out a minimally sized representation for these numbers that you can transmit in some very small packet format. If the range is suitably limited, this can often best be done by using an appropriate fixed-point representation.
For example, if your temperatures are always in the range 0..63, you could use a 6.2 fixed point format in a single byte:
if (value < 0.0 || value > 63.75) {
    // out of range for 6.2 fixed point, so do something else.
} else {
    uint8_t bval = (uint8_t)(value * 4 + 0.5);
    // output this byte value
}
When you read the byte back, you just multiply it by 0.25 to get the (approximate) float value back.
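For completeness, the decode direction is just that inverse scaling (a sketch; the function name is made up):
// sketch: turn a 6.2 fixed-point byte back into a float
float decodeFixedPoint62(uint8_t bval) {
    return bval * 0.25f;   // each step is 1/4
}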
Of course, since 8 bits is pretty limited for precision (about 2 digits), it will get rounded a bit to fit -- your 23.24 value will be rounded to 23.25. If you need more precision, you'll need to use more bits.
If you only need a little precision but a wider range, you can use a custom floating point format. IEEE 16-bit floats (S5.10) are pretty good (they give you 3 digits of precision and around 10 orders of magnitude of range), but you can go even smaller, particularly if you don't need negative values. A U4.4 float format gives you 1 digit of precision and 5 orders of magnitude of range in 8 bits (positive only).
If you know that both sender and receiver use the same fp binary representation and both use the same endianness then you can just memcpy:
float a = 23.24;
uint8_t buffer[sizeof(float)];
::memcpy(buffer, &a, sizeof(float));
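On the receiving side, the same assumptions let you copy the bytes straight back (a sketch mirroring the snippet above; it likewise needs <cstring>):
float b;
::memcpy(&b, buffer, sizeof(float));
// b now holds 23.24 again, bit for bit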
In Arduino one can convert the float into a String
float ds_temp=sensors.getTempCByIndex(0); // DS18b20 Temp sensor
then convert the String into a char array:
String ds_str = String(ds_temp);
char ds_char[ds_str.length() + 1];
ds_str.toCharArray(ds_char, ds_str.length() + 1);
uint8_t* data = (uint8_t*)ds_char;
The uint8_t bytes are then accessible through data; the number of bytes is ds_str.length().
A variable of type uint8_t can carry only 256 values. If you actually want to squeeze a temperature into a single byte, you have to use a fixed-point approach or a least-significant-bit (LSB) value approach:
Define a working range, T0 to T1.
Divide the range T1 - T0 by 256 (2^8, the number of possible values).
The resulting value is a float constant, the LSB (working with a flexible LSB value is also possible), by which you divide the original float value X: R = (X - T0)/LSB. You can round the result; it fits into a byte.
On the receiving side you multiply the integer value by the same constant and add the offset back: X = R*LSB + T0. A sketch of this encode/decode pair is shown below.
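A minimal sketch of that scheme, with a hypothetical working range of -40..+85 °C (the names T0, T1, LSB, encodeTemp and decodeTemp follow the steps above and are otherwise made up):
#include <stdint.h>

const float T0 = -40.0f;               // lower end of the working range
const float T1 = 85.0f;                // upper end of the working range
const float LSB = (T1 - T0) / 256.0f;  // value of one step

// caller should make sure T0 <= x < T1
uint8_t encodeTemp(float x) {
    return (uint8_t)((x - T0) / LSB + 0.5f);   // R = (X - T0) / LSB, rounded
}

float decodeTemp(uint8_t r) {
    return r * LSB + T0;                       // X = R * LSB + T0
}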

Conversion between 4-byte IBM floating-point and IEEE [duplicate]

I need to read values from a binary file. The data format is IBM single Precision Floating Point (4-byte Hexadecimal Exponent Data). I have C++ code that reads from the file and takes out each byte and stores it like so
unsigned char buf[BUF_LEN];
for (long position = 0; position < fileLength; position += BUF_LEN) {
    file.read((char*)(&buf[0]), BUF_LEN);
    printf("\n%8ld: ", position);
    for (int byte = 0; byte < BUF_LEN; byte++) {
        printf(" 0x%-2x", buf[byte]);
    }
}
This prints out the hexadecimal values of each byte.
A picture in the original question (not reproduced here) specifies the IBM single precision floating point format.
How do I convert the buffer into floating point values?
The format is actually quite simple, and not particularly different than IEEE 754 binary32 format (it's actually simpler, not supporting any of the "magic" NaN/Inf values, and having no subnormal numbers, because the mantissa here has an implicit 0 on the left instead of an implicit 1).
As Wikipedia puts it,
The number is represented as the following formula: (−1)^sign × 0.significand × 16^(exponent−64).
If we imagine that the bytes you read are in a uint8_t b[4], then the resulting value should be something like:
uint32_t mantissa = (b[1]<<16) | (b[2]<<8) | b[3];
int exponent = (b[0] & 127) - 64;
double ret = mantissa * exp2(-24 + 4*exponent);
if(b[0] & 128) ret *= -1.;
Notice that here I calculated the result in a double, as the range of an IEEE 754 float is not enough to represent the same-sized IBM single precision value (and the opposite also holds). Also, keep in mind that, due to endianness issues, you may have to reverse the indices in my code above.
Edit: @Eric Postpischil correctly points out that, if you have C99 or POSIX 2001 available, instead of mantissa * exp2(-24 + 4*exponent) you should use ldexp(mantissa, -24 + 4*exponent), which should be more precise (and possibly faster) across implementations.
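Putting those pieces together, a self-contained version might look like the sketch below (the function name is made up; it assumes the four bytes arrive most significant byte first):
#include <cmath>
#include <cstdint>

// convert one 4-byte IBM single-precision value (big-endian) to a double
double ibm32ToDouble(const uint8_t b[4]) {
    uint32_t mantissa = (uint32_t(b[1]) << 16) | (uint32_t(b[2]) << 8) | b[3];
    int exponent = (b[0] & 127) - 64;                       // base-16 exponent, bias 64
    double ret = std::ldexp(double(mantissa), -24 + 4 * exponent);
    return (b[0] & 128) ? -ret : ret;                       // top bit is the sign
}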

Rounding error of binary32

As part of a homework assignment, I'm writing a program that takes a decimal number as input entered from the terminal, returns the IEEE 754 binary32 form of that number, AND returns 1 if the binary exactly represents the number, 0 otherwise. We are only allowed to use iostream and cmath.
I already wrote the part that returns binary32 format, but I don't understand how to see if there's rounding to that format.
My idea to see the rounding was to calculate the decimal number back from binary32 form and compare it with the original number. But I am having difficulty with saving the returned binary32 as some type of data, since I can't use the vector header. I've tried using for loops and pow, but I still get the indices wrong.
Also, I'm having trouble understanding what exactly df or *df is. I wrote the code myself, but I only know that I needed to convert the address of a float into an address treated as pointing to char.
My other idea was to compare binary32 and binary64, which gives more precision. And again, I don't know how to do this without using vector.
#include <iostream>
#include <cstdlib>   // for atof
using namespace std;

int main(int argc, char* argv[]) {
    int i, j;
    float num;
    num = atof(argv[1]);
    char* numf = (char*)(&num);
    for (i = sizeof(float) - 1; i >= 0; i--) {
        for (j = 7; j >= 0; j--) {
            if (numf[i] & (1 << j)) {
                cout << "1";
            } else {
                cout << "0";
            }
        }
    }
    cout << endl;
}
//////
Update:
Since there's no way around it without using additional header files, I hard-coded for loops to convert the binary32 form back to decimal.
Since x = 1.b31b30...b0 * 2^p, I use one for loop for finding the exponent and one for loop for finding the significand.
Basic idea: Convert your number d back to a string (e.g. with to_string) and compare it to the input. If the strings are different, there was some loss because of the limitations of float.
Of course, this means your input always has to be in the same string format that to_string uses. No additional unneeded 0's, no whitespaces, etc.
...
That said, doing the float conversion without a cast (but with manually parsing the input and calculating the IEEE 754 bits) is more work initially, but in return it solves this problem automatically. And, as noted in the comments, your cast might not work the way you want.
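A rough sketch of that comparison (it assumes, as noted above, that the input text is in exactly the format std::to_string produces; the function name is made up):
#include <cstdlib>
#include <string>

// returns 1 if the text survives the float round trip unchanged, 0 otherwise
int representsExactly(const char* input) {
    float f = static_cast<float>(std::atof(input));
    return std::to_string(f) == input ? 1 : 0;
}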

How to use negative number with openssl's BIGNUM?

I want a C++ version of the following Java code.
BigInteger x = new BigInteger("00afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d", 16);
BigInteger y = x.multiply(BigInteger.valueOf(-1));
//prints y = ff5028d4a7ca52dd15a297d860053f49ad83e54f04ce0e19b908d728a342c519a3
System.out.println("y = " + new String(Hex.encode(y.toByteArray())));
And here is my attempt at a solution.
BIGNUM* x = BN_new();
BN_CTX* ctx = BN_CTX_new();
std::vector<unsigned char> xBytes = hexStringToBytes("00afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d");
BN_bin2bn(&xBytes[0], xBytes.size(), x);
BIGNUM* negative1 = BN_new();
std::vector<unsigned char> negative1Bytes = hexStringToBytes("ff");
BN_bin2bn(&negative1Bytes[0], negative1Bytes.size(), negative1);
BIGNUM* y = BN_new();
BN_mul(y, x, negative1, ctx);
char* yHex = BN_bn2hex(y);
std::string yStr(yHex);
//prints y = AF27542CDD7775C7730ABF785AC5F59C299E964A36BFF460B031AE85607DAB76A3
std::cout <<"y = " << yStr << std::endl;
(Ignore the difference in letter case.) What am I doing wrong? How do I get my C++ code to output the correct value "ff5028d4a7ca52dd15a297d860053f49ad83e54f04ce0e19b908d728a342c519a3"? I also tried setting negative1 by doing BN_set_word(negative1, -1), but that gives me the wrong answer too.
The BN_set_negative function sets a negative number.
The negative of afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d is actually -afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d, in the same way as -2 is the negative of 2.
ff5028d4a7ca52dd15a297d860053f49ad83e54f04ce0e19b908d728a342c519a3 is a large positive number.
The reason you are seeing this number in Java is the toByteArray call. According to its documentation, it selects the minimum field width that is a whole number of bytes and is also capable of holding the two's complement representation of the negative number.
In other words, by using the toByteArray function on a number that currently has 1 sign bit and 256 value bits, you end up with a field width of 264 bits. However, if your negative number's first nibble were 7, for example, rather than a, then (according to this documentation - I haven't actually tried it) you would get a 256-bit field width out (i.e. 8028d4..., not ff8028d4).
The leading 00 you have used in your code is insignificant in OpenSSL BN. I'm not sure if it is significant in BigInteger although the documentation for that constructor says "The String representation consists of an optional minus or plus sign followed by a sequence of one or more digits in the specified radix. "; so the fact that it accepts a minus sign suggests that if the minus sign is not present then the input is treated as a large positive number, even if its MSB is set. (Hopefully a Java programmer can clear this paragraph up for me).
Make sure you keep clear in your mind the distinction between a large negative value, and a large positive number obtained by modular arithmetic on that negative value, such as is the output of toByteArray.
So your question is really: does Openssl BN have a function that emulates the behaviour of BigInteger.toByteArray() ?
I don't know if such a function exists (the BN library has fairly bad documentation IMHO, and I've never heard of it being used outside of OpenSSL, especially not in a C++ program). I would expect it doesn't, since toByteArray's behaviour is kind of weird; and in any case, all of the BN output functions appear to output using a sign-magnitude format, rather than a two's complement format.
But to replicate that output, you could add either 2^256 or 2^264 to the large negative number, and then do BN_bn2hex. In this particular case, add 2^264. In general you would have to measure the current bit length of the number being stored and round the exponent up to the nearest multiple of 8.
Or you could even output in sign-magnitude format (using BN_bn2hex or BN_bn2mpi) and then iterate through inverting each nibble and fixing up the start!
NB. Is there any particular reason you want to use OpenSSL BN? There are many alternatives.
Although this is a question from 2014 (more than five years ago), I would like to solve your problem / clarify the situation, which might help others.
a) Sign-magnitude and two's complement
In binary arithmetic there are, among others, the "sign-magnitude" and "two's complement" representations of numbers. A sign-magnitude representation stores the absolute (positive) value only and does not itself carry a sign. If you want a sign for a number stored this way, you have to store it separately, e.g. in one bit (0 = positive, 1 = negative). This is exactly the situation of floating point numbers (IEEE 754): the mantissa is stored as a magnitude together with the exponent and one additional sign bit. Numbers in sign-magnitude form have two zeros, -0 and +0, because the sign is treated independently of the absolute value itself.
In two's complement, the most significant bit is used as the sign bit. There is no '-0', because negating a value in two's complement means performing the bitwise NOT (in C: the ~ operator) followed by adding one.
As an example, one byte (in two's complement) can hold, among others, the three values 0xFF, 0x00, 0x01, meaning -1, 0 and 1. There is no room for a -0. If you have, e.g., 0xFF (-1) and want to negate it, the bitwise NOT operation computes 0xFF => 0x00, and adding one yields 0x01, which is 1.
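A one-line check of that negation rule (a sketch; the fixed-width types in <cstdint> are guaranteed to use two's complement):
#include <cstdint>
#include <cstdio>

int main() {
    int8_t v = 1;
    int8_t neg = int8_t(~v + 1);   // bitwise NOT, then add one
    std::printf("%d, 0x%02X\n", neg, static_cast<unsigned>(uint8_t(neg)));   // prints: -1, 0xFF
}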
b) OpenSSL BIGNUM and Java BigInteger
OpenSSL's BIGNUM implementation represents numbers in sign-magnitude form: it stores the absolute value plus a separate negative flag. Java's BigInteger byte-array interface treats the bytes as two's complement. That was your disaster. Your big integer (in hex) is 00afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d. This is a positive 256-bit integer. It consists of 33 bytes because there is a leading zero byte 0x00, which is absolutely correct for an integer stored as two's complement, because the most significant bit (omitting the initial 0x00) is set (in 0xAF), which would otherwise make the number negative.
c) Solution you were looking for
OpenSSL's function bin2bn works with absolute values only. For OpenSSL, you can keep the initial zero byte or cut it off; it does not make any difference, because OpenSSL canonicalizes the input data anyway, which means cutting off all leading zero bytes. The next problem in your code is the way you try to make this integer negative: you want to multiply it by -1. Using 0xFF as the only input byte to bin2bn makes this 255, not -1. In fact, you multiply your big integer by 255, yielding the overall result AF27542CDD7775C7730ABF785AC5F59C299E964A36BFF460B031AE85607DAB76A3, which is still positive.
Multiplication with -1 works like this (snippet, no error checking):
BIGNUM* x = BN_bin2bn(&xBytes[0], (int)xBytes.size(), NULL);
BIGNUM* negative1 = BN_new();
BN_one(negative1); /* negative1 is +1 */
BN_set_negative(negative1, 1); /* negative1 is now -1 */
BN_CTX* ctx = BN_CTX_new();
BIGNUM* y = BN_new();
BN_mul(y, x, negative1, ctx);
Easier is:
BIGNUM* x = BN_bin2bn(&xBytes[0], (int)xBytes.size(), NULL);
BN_set_negative(x,1);
This does not solve your problem because as M.M said, this just makes -afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d from afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d.
You are looking for the two's complement of your big integer, which is
int i;
for (i = 0; i < (int)sizeof(value); i++)
    value[i] = ~value[i];
for (i = ((int)sizeof(value)) - 1; i >= 0; i--)
{
    value[i]++;
    if (0x00 != value[i])
        break;
}
This is an unoptimized version of the two's complement, where 'value' is your 33-byte input array containing your big integer prefixed by the byte 0x00. The result of this operation is the 33 bytes ff5028d4a7ca52dd15a297d860053f49ad83e54f04ce0e19b908d728a342c519a3.
d) Working with two's complement and OpenSSL BIGNUM
The whole sequence is like this (a code sketch of steps 5 to 8 follows below):
1. Prologue: If the input is negative (check the most significant bit), compute the two's complement of the input.
2. Convert to a BIGNUM using BN_bin2bn.
3. If the input was negative, call BN_set_negative(x, 1).
4. Main part: Carry out all arithmetic operations using the OpenSSL BIGNUM package.
5. Call BN_is_negative to check for a negative result.
6. Convert to raw binary bytes using BN_bn2bin.
7. If the result was negative, compute the two's complement of the result.
8. Epilogue: If the result was positive and the most significant bit of the raw result bytes (output of step 7) is set, prepend a byte 0x00. If the result was negative and the most significant bit of the raw result bytes is clear, prepend a byte 0xFF.
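Here is a hedged sketch of steps 5 to 8, i.e. an approximation of Java's BigInteger.toByteArray() on top of the BN API (the helper name bnToTwosComplement is made up; error handling and OpenSSL-version differences around BN_is_negative are ignored):
#include <openssl/bn.h>
#include <vector>

// produce a minimal big-endian two's-complement encoding of a BIGNUM,
// roughly what Java's BigInteger.toByteArray() returns
std::vector<unsigned char> bnToTwosComplement(const BIGNUM* bn) {
    std::vector<unsigned char> bytes(BN_num_bytes(bn));
    if (bytes.empty())
        return { 0x00 };                        // zero: a single 0x00 byte
    BN_bn2bin(bn, bytes.data());                // magnitude only, big-endian (step 6)
    if (!BN_is_negative(bn)) {
        if (bytes[0] & 0x80)
            bytes.insert(bytes.begin(), 0x00);  // step 8: keep the sign bit clear
        return bytes;
    }
    for (auto& b : bytes)                       // step 7: two's complement of the magnitude
        b = static_cast<unsigned char>(~b);
    for (int i = static_cast<int>(bytes.size()) - 1; i >= 0; --i)
        if (++bytes[i] != 0)                    // propagate the +1 carry
            break;
    if (!(bytes[0] & 0x80))
        bytes.insert(bytes.begin(), 0xFF);      // step 8: keep the sign bit set
    return bytes;
}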