How to convert a float into uint8_t? - C++

I am trying to send multiple float values from an Arduino using the LMIC LoRa library. The LMIC function only takes a uint8_t as its transmission argument type.
temp contains my temperature value as a float, and I can print the measured temperature without a problem:
Serial.println((String)"Temp C: " + temp);
There is an example that shows this code being used to do the conversion:
uint16_t payloadTemp = LMIC_f2sflt16(temp);
// int -> bytes
byte tempLow = lowByte(payloadTemp);
byte tempHigh = highByte(payloadTemp);
payload[0] = tempLow;
payload[1] = tempHigh;
I am not sure if this should work; it doesn't seem to. The resulting data that gets sent is: FF 7F
I don't believe this is what I am looking for.
I have also tried the following conversion procedure:
uint8_t *array;
array = (unit8_t*)(&f);
Using Arduino, this will not even compile.
Something that does work, but creates a much too long result, is:
String toSend = String(temp);
toSend.toCharArray(payload, toSend.length());
payloadActualLength = toSend.length();
Serial.print("the payload is: ");
Serial.println(payload);
but the resulting hex is far too long, especially once I add the other values that I want to send.
So how do I convert a float into uint8_t values, and why doesn't the original conversion above work the way I expect it to?

Sounds like you are trying to figure out a minimally sized representation for these numbers that you can transmit in some very small packet format. If the range is suitably limited, this can often best be done by using an appropriate fixed-point representation.
For example, if your temperatures are always in the range 0..63, you could use a 6.2 fixed point format in a single byte:
if (value < 0.0 || value > 63.75) {
    // out of range for 6.2 fixed point, so do something else.
} else {
    uint8_t bval = (uint8_t)(value * 4 + 0.5);
    // output this byte value
}
When you read the byte back, you just multiply it by 0.25 to get the (approximate) float value back.
Of course, since 8 bits is pretty limited for precision (about 2 digits), it will get rounded a bit to fit -- your 23.24 value will be rounded to 23.25. If you need more precision, you'll need to use more bits.
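For illustration, a minimal encode/decode pair for this 6.2 format might look like the following (the helper names are mine, not from any library):
uint8_t encode62(float value) {
    // scale by 4 and round to nearest; caller has already range-checked 0..63.75
    return (uint8_t)(value * 4 + 0.5);
}

float decode62(uint8_t bval) {
    // undo the scaling on the receiving side
    return bval * 0.25;
}

// Example: encode62(23.24) == 93, and decode62(93) == 23.25 (rounded to the nearest 0.25).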
If you only need a little precision but a wider range, you can use a custom floating point format. IEEE 16-bit floats (S5.10) are pretty good (they give you 3 digits of precision and around 10 orders of magnitude of range), but you can go even smaller, particularly if you don't need negative values. A U4.4 float format gives you 1 digit of precision and 5 orders of magnitude of range in 8 bits (positive only).

If you know that both sender and receiver use the same floating-point binary representation and the same endianness, then you can just memcpy:
#include <cstring>

float a = 23.24;
uint8_t buffer[sizeof(float)];
::memcpy(buffer, &a, sizeof(float));
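On the receiving end, under the same assumptions (same float representation and endianness, and buffer holding the bytes from the snippet above), the copy just goes the other way:
float received;
::memcpy(&received, buffer, sizeof(float));
// received now compares equal to the original 23.24 value.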

In Arduino one can convert the float into a String:
float ds_temp=sensors.getTempCByIndex(0); // DS18b20 Temp sensor
then convert the String into a char array:
String ds_str = String(ds_temp);
char ds_char[ds_str.length() + 1];
ds_str.toCharArray(ds_char, ds_str.length() + 1);
uint8_t* data = (uint8_t*)ds_char;
The bytes are now available through data; the payload length is ds_str.length() (note that sizeof(data) would only give the size of a pointer).

A uint8_t variable can carry only 256 distinct values. If you actually want to squeeze a temperature into a single byte, you have to use a fixed-point approach or a least-significant-bit (LSB) value approach:
Define the working range, T0 to T1.
Divide the range (T1 - T0) by 256 (2^8, the number of possible values).
The resulting value is a float constant, the LSB (working with a flexible LSB value is also possible), by which you divide the original float value X: R = (X - T0)/LSB. You can round the result; it fits into a byte.
On the receiving side you multiply the integer value by the same constant and add the offset back: X = R*LSB + T0.
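A small sketch of this approach, assuming (my own choice of range) temperatures between -40 and +85 degrees mapped onto one byte:
#include <math.h>
#include <stdint.h>

const float T0  = -40.0f;                 // lower end of the working range (assumed)
const float T1  = 85.0f;                  // upper end of the working range (assumed)
const float LSB = (T1 - T0) / 256.0f;     // value of one step, ~0.49 degrees here

uint8_t encodeTemp(float x) {
    float r = roundf((x - T0) / LSB);     // R = (X - T0) / LSB, rounded
    if (r < 0.0f)   r = 0.0f;             // clamp to the byte range
    if (r > 255.0f) r = 255.0f;
    return (uint8_t)r;
}

float decodeTemp(uint8_t r) {
    return r * LSB + T0;                  // X = R * LSB + T0
}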

Related

How to convert an arbitrary length unsigned int array to a base 10 string representation?

I am currently working on an arbitrary size integer library for learning purposes.
Each number is represented as uint32_t *number_segments.
I have functional arithmetic operations, and the ability to print the raw bits of my number.
However, I have struggled to find any information on how I could convert my arbitrarily long array of uint32 into the correct, and also arbitrarily long base 10 representation as a string.
Essentially I need a function along the lines of:
std::string uint32_array_to_string(uint32_t *n, size_t n_length);
Any pointers in the right direction would be greatly appreciated, thank you.
You do it the same way as you do with a single uint64_t, except on a larger scale (bringing this into modern C++ is left for the reader):
char * to_str(uint64_t x) {
    static char buf[23] = {0}; // leave space for a minus sign added by the caller
    char *p = &buf[22];
    do {
        *--p = '0' + (x % 10);
        x /= 10;
    } while(x > 0);
    return p;
}
The function fills a buffer from the end with the lowest digits, dividing the number by 10 in each step, and then returns a pointer to the first digit.
Now with bignums you can't use a static buffer; you have to adjust the buffer size to the size of your number. You probably want to return a std::string: creating the digits in reverse and then copying them into a result string is the way to go. You also have to deal with negative numbers.
Since a long division of a big number is expensive, you probably don't want to divide by 10 in the loop. Rather, divide by 1'000'000'000 and convert each remainder into 9 digits. This is the largest power of 10 you can long-divide by with a single integer divisor (bignum / int rather than bignum / bignum). It might be that you can only do 10'000 if you don't use uint64_t in the division.
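As a sketch of that idea (my own helper, assuming the number is stored as a little-endian array of uint32_t limbs): repeatedly long-divide the whole array by 1'000'000'000 and turn each remainder into a block of 9 decimal digits.
#include <cstdint>
#include <string>
#include <vector>

// Divide a little-endian bignum by 1e9 in place and return the remainder.
static uint32_t div_by_1e9(std::vector<uint32_t>& n) {
    uint64_t rem = 0;
    for (size_t i = n.size(); i-- > 0; ) {              // long division, most significant limb first
        uint64_t cur = (rem << 32) | n[i];
        n[i] = static_cast<uint32_t>(cur / 1000000000u);
        rem  = cur % 1000000000u;
    }
    while (!n.empty() && n.back() == 0) n.pop_back();   // drop leading zero limbs
    return static_cast<uint32_t>(rem);
}

std::string uint32_array_to_string(const uint32_t* n, size_t n_length) {
    std::vector<uint32_t> num(n, n + n_length);
    while (!num.empty() && num.back() == 0) num.pop_back();
    if (num.empty()) return "0";

    std::string digits;                                 // collected least significant digit first
    while (!num.empty()) {
        uint32_t chunk = div_by_1e9(num);
        if (num.empty()) {                              // most significant chunk: no zero padding
            while (chunk > 0) { digits.push_back(char('0' + chunk % 10)); chunk /= 10; }
        } else {                                        // inner chunks are exactly 9 digits
            for (int d = 0; d < 9; ++d) { digits.push_back(char('0' + chunk % 10)); chunk /= 10; }
        }
    }
    return std::string(digits.rbegin(), digits.rend());
}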

Algorithm for converting large hex numbers into decimal form (base 10 form)

I have an array of bytes and length of that array. The goal is to output the string containing that number represented as base-10 number.
My array is little endian. It means that the first (arr[0]) byte is the least significant byte. This is an example:
#include <iostream>
using namespace std;
typedef unsigned char Byte;
int main(){
    int len = 5;
    Byte *arr = new Byte[5];
    int i = 0;
    arr[i++] = 0x12;
    arr[i++] = 0x34;
    arr[i++] = 0x56;
    arr[i++] = 0x78;
    arr[i++] = 0x9A;
    cout << hexToDec(arr, len) << endl;
}
The array consists of [0x12, 0x34, 0x56, 0x78, 0x9A]. The function hexToDec which I want to implement should return 663443878930 which is that number in decimal.
But the problem is that my machine is 32-bit, so it instead outputs 2018915346 (notice that this number results from integer overflow). The cause is that I am using the naive way: iterating over the array, raising 256 to the power of the position in the array, multiplying by the byte at that position and adding to the sum. This of course yields integer overflow.
I also tried with long long int, but at some point integer overflow occurs, of course.
The arrays I want to represent as decimal numbers can be very long (more than 1000 bytes), which definitely requires a cleverer algorithm than my naive one.
Question
What would be a good algorithm to achieve that? Also, what is the optimal complexity of such an algorithm? Can it be done in linear complexity O(n), where n is the length of the array? I really cannot come up with a good idea; implementation is not the problem, my lack of ideas is.
Advice or an idea of how to do it will be enough, but if it is easier to explain using some code, feel free to write it in C++.
You can and cannot achieve this in O(n); it all depends on the internal representation of your number.
For a truly binary form (a power-of-2 base like 256)
This is not solvable in O(n). The hex print of such a number is O(n), however, and you can convert a hex string to decimal and back like this:
How to convert a gi-normous integer (in string format) to hex format?
Creating the hex string does not require bignum math: you just print the array consecutively from MSW to LSW in hex. That is O(n), but the conversion to decimal is not.
To print a bigint in decimal you need to repeatedly mod/div it by 10, obtaining digits from LSD to MSD until the sub-result is zero, and then print them in reverse order. The division and modulus can be done at once, as they are the same operation. So if your number has N decimal digits, you need N bigint divisions. Each bigint division can be done, for example, by binary division, so we need log2(n) bit shifts and subtractions, which are all O(n), so the complexity of naive bigint printing is:
O(N·n·log2(n))
We can compute N from n by logarithms so for BYTEs:
N = log10(base^n)
  = log10(2^(8·n))
  = log2(2^(8·n)) / log2(10)
  = 8·n / log2(10)
  = 8·n · 0.30103
  ≈ 2.408·n
So the complexity will be:
O(2.408·n·n·log2(n)) = O(n²·log2(n))
which is insane for really big numbers.
Power-of-10 base binary form
To do this in O(n) you need to slightly change the base of your number. It will still be represented in binary form, but the base will be a power of 10.
For example, if your number is represented by 16-bit WORDs, you can use the highest power-of-10 base that still fits in one, which is 10000 (a WORD can hold 65536 distinct values). Now you can print in decimal easily: just print each word in your array consecutively from MSW to LSW.
Example:
Let's have the big number 1234567890 stored as BYTEs in base 100 with the MSW first, so the number will be stored as follows:
BYTE x[] = { 12, 34, 56, 78, 90 };
But as you can see, while using BYTEs and base 100 we are wasting space, as only 100/256 ≈ 39% of the full BYTE range is used. The operations on such numbers are slightly different than in raw binary form, as we need to handle overflow/underflow and the carry flag differently.
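Printing such a representation is then trivial and O(n). A small sketch with the base-100 BYTE limbs from the example above (my own code): print the leading limb as-is and every following limb zero-padded to two digits.
#include <cstdint>
#include <cstdio>

int main() {
    const uint8_t x[] = { 12, 34, 56, 78, 90 };         // 1234567890 in base 100, MSW first
    const size_t  n   = sizeof x / sizeof x[0];

    std::printf("%u", (unsigned)x[0]);                  // leading limb: no padding
    for (size_t i = 1; i < n; ++i)
        std::printf("%02u", (unsigned)x[i]);            // inner limbs: always two digits
    std::printf("\n");                                  // prints 1234567890
    return 0;
}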
BCD (binary coded decimal)
There is also another option, which is to use BCD (binary coded decimal). It is almost the same as the previous option, but base 10 is used for a single digit of the number: each nibble (4 bits) contains exactly one digit. Processors usually have an instruction set for this number representation. The usage is like for binary encoded numbers, but after each arithmetic operation a BCD recovery instruction such as DAA is called, which uses the Carry and Auxiliary Carry flag states to recover the BCD encoding of the result. To print a BCD value in decimal you just print the value as hex. Our number from the previous example would be encoded in BCD like this:
BYTE x[] = { 0x12, 0x34, 0x56, 0x78, 0x90 }
Of course, both #2 and #3 make an O(n) hex print of your number impossible.
The number you posted, 0x9a78563412, which you have represented in little-endian format, can be converted to a proper uint64_t with the following code:
#include <iostream>
#include <stdint.h>

int main()
{
    uint64_t my_number = 0;
    const int base = 0x100; /* base 256 */
    uint8_t array[] = { 0x12, 0x34, 0x56, 0x78, 0x9a };
    /* go from right to left, as it is little endian */
    for (int i = sizeof array; i > 0;) {
        my_number *= base;
        my_number += array[--i];
    }
    std::cout << my_number << std::endl; /* conversion uses 10 base by default */
}
A sample run gives:
$ num
663443878930
As we are in a base that is an exact power of 2, we can optimize the code by using
my_number <<= 8; /* left shift by 8 */
my_number |= array[--i]; /* bit or */
As these operations are simpler than integer multiplication and addition, some (but not much) efficiency improvement can be expected by doing it that way. It may be more expressive to leave it as in the first example, though, as it better represents an arbitrary base conversion.
You'll need to brush up your elementary school skills and implement long division.
I think you'd be better off implementing the long division in base 16 (divide the number by 0x0A each iteration). Take the remainder of each division - these are your decimal digits (the first one is the least significant digit).
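A rough sketch of that long-division idea for the little-endian byte array from the question (my own code, and deliberately simple: it is O(n) work per digit, so O(n·N) overall):
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

typedef unsigned char Byte;

// Convert a little-endian byte array to its base-10 string representation.
std::string hexToDec(const Byte* arr, int len) {
    std::vector<Byte> num(arr, arr + len);   // working copy, least significant byte first
    while (!num.empty() && num.back() == 0) num.pop_back();

    std::string digits;                      // decimal digits, least significant first
    while (!num.empty()) {
        unsigned rem = 0;
        for (int i = (int)num.size() - 1; i >= 0; --i) {  // long division by 10, MSB first
            unsigned cur = rem * 256 + num[i];
            num[i] = (Byte)(cur / 10);
            rem    = cur % 10;
        }
        digits.push_back((char)('0' + rem));
        while (!num.empty() && num.back() == 0) num.pop_back();  // drop leading zero bytes
    }
    if (digits.empty()) digits = "0";
    std::reverse(digits.begin(), digits.end());
    return digits;                           // e.g. "663443878930" for the example array
}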

Arduino code not adding numbers properly

I am trying to create a function to find the average of sensor values on the Arduino in order to calibrate it; however, the summation is not working properly and therefore the average is not correct. The table below shows a sample of the output. The left column should be the rolling sum of the values displayed in the right column (how are negatives getting in there?).
-10782 17112
6334 17116
23642 17308
-24802 17092
-7706 17096
9326 17032
26422 17096
-21986 17128
The calibrateSensors() function, which is supposed to do this, is shown below:
void calibrateSensors(int16_t * accelOffsets){
    int16_t rawAccel[3];
    int sampleSize = 2000;
    Serial.println("Accelerometer calibration in progress...");
    for (int i=0; i<sampleSize; i ++){
        readAccelData(rawAccel);         // get raw accelerometer data
        accelOffsets[0] += rawAccel[0];  // add x accelerometer values
        accelOffsets[1] += rawAccel[1];  // add y accelerometer values
        accelOffsets[2] += rawAccel[2];  // add z accelerometer values
        Serial.print(accelOffsets[2]);
        Serial.print("\t\t");
        Serial.print(rawAccel[2]);
        Serial.print(" \t FIRST I:");
        Serial.println(i);
    }
    for (int i=0; i<3; i++)
    {
        accelOffsets[i] = accelOffsets[i] / sampleSize;
        Serial.print("Second I:");
        Serial.println(i);
    }
    Serial.println("Accelerometer calibration complete");
    Serial.println("Accelerometer Offsets:");
    Serial.print("Offsets (x,y,z): ");
    Serial.print(accelOffsets[0]);
    Serial.print(", ");
    Serial.print(accelOffsets[1]);
    Serial.print(", ");
    Serial.println(accelOffsets[2]);
}
and the readAccelData() function is as follows:
void readAccelData(int16_t * destination){
    // x/y/z accel register data stored here
    uint8_t rawData[6];
    // Read the six raw data registers into data array
    I2Cdev::readBytes(MPU6050_ADDRESS, ACCEL_XOUT_H, 6, &rawData[0]);
    // Turn the MSB and LSB into a signed 16-bit value
    destination[0] = (int16_t)((rawData[0] << 8) | rawData[1]);
    destination[1] = (int16_t)((rawData[2] << 8) | rawData[3]);
    destination[2] = (int16_t)((rawData[4] << 8) | rawData[5]);
}
Any idea where I am going wrong?
You have two problems:
You do not initialise your arrays. They start with garbage data in them (space is allocated, but not cleared). You can initialise an array to be all zeros by doing:
int array[5] = {};
This will result in an array that initially looks like [0,0,0,0,0].
Your second problem is that you are creating an array of 16-bit signed integers.
A 16-bit integer can store 65536 different values. The problem is that because you are using a signed type, there are only 32767 positive values you can use. You are overflowing and getting negative numbers when you try to store a value larger than that.
I believe the Arduino supports 32-bit ints. If you only want positive numbers, then use an unsigned type.
To use an explicit 32-bit integer:
#include <stdint.h>
int32_t my_int = 0;
Some info on standard variable sizes (note that they can be different sizes based on the Arduino model the code is built for):
https://www.arduino.cc/en/Reference/Int
On the Arduino Uno (and other ATMega based boards) an int stores a
16-bit (2-byte) value. This yields a range of -32,768 to 32,767
(minimum value of -2^15 and a maximum value of (2^15) - 1). On the
Arduino Due and SAMD based boards (like MKR1000 and Zero), an int
stores a 32-bit (4-byte) value. This yields a range of -2,147,483,648
to 2,147,483,647 (minimum value of -2^31 and a maximum value of (2^31)
- 1).
https://www.arduino.cc/en/Reference/UnsignedInt
On the Uno and other ATMEGA based boards, unsigned ints (unsigned
integers) are the same as ints in that they store a 2 byte value.
Instead of storing negative numbers however they only store positive
values, yielding a useful range of 0 to 65,535 ((2^16) - 1).
The Due stores a 4 byte (32-bit) value, ranging from 0 to
4,294,967,295 (2^32 - 1).
https://www.arduino.cc/en/Reference/UnsignedLong
Unsigned long variables are extended size variables for number
storage, and store 32 bits (4 bytes). Unlike standard longs unsigned
longs won't store negative numbers, making their range from 0 to
4,294,967,295 (2^32 - 1).
With this code:
void calibrateSensors(int16_t * accelOffsets){
    int16_t rawAccel[3];
    // [...]
    accelOffsets[0] += rawAccel[0];
There's an obvious problem: you are adding two 16-bit signed integers here. The maximum value for a 16-bit signed integer is 0x7fff (the first bit is used as the sign bit), which is 32767 in decimal.
Given your first two sample numbers, 17112 + 17116 is already 34228, so you're overflowing your integer type.
Overflowing a signed integer is undefined behavior in C, because different implementations could use different representations for negative numbers. For a program with undefined behavior, you can't expect any particular result. A very likely result is that the value will "wrap around" into the negative range.
As you already use types from stdint.h, the solution is simple: Use uint32_t for your sums, this type has enough bits for values up to 4294967295.
As a general rule: if you never need a negative value, just stick to an unsigned type. I don't see a reason why you use int16_t here; just use uint16_t.
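Putting both fixes together, the accumulation part might look like this sketch (not a drop-in for the original function: it keeps the raw readings as int16_t but sums into zero-initialised int32_t accumulators, which also copes with negative axis values):
void calibrateSensors(int16_t* accelOffsets) {
    int16_t rawAccel[3];
    int32_t sums[3] = {};                     // zero-initialised 32-bit accumulators
    const int sampleSize = 2000;

    for (int i = 0; i < sampleSize; i++) {
        readAccelData(rawAccel);              // get raw accelerometer data, as before
        for (int axis = 0; axis < 3; axis++)
            sums[axis] += rawAccel[axis];     // 2000 * 32767 fits comfortably in 32 bits
    }
    for (int axis = 0; axis < 3; axis++)
        accelOffsets[axis] = (int16_t)(sums[axis] / sampleSize);  // the average fits back into 16 bits
}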

Checksum of floats with roundtrip through text file

I need to write a couple of floats to a text file and store a CRC32 checksum with them. Then when I read the floats back from the text file, I want to recompute the checksum and compare it to the one that was previously computed when saving the file. My problem is that the checksum sometimes fails. This is due to the fact that equal floating point numbers can be represented by different bit patterns. For completeness' sake, I will summarize the code in the next paragraphs.
I have adapted this CRC32 algorithm which I found after reading this question. Here's what it looks like:
uint32_t updC32(uint32_t octet, uint32_t crc) {
    return CRC32Tab[(crc ^ octet) & 0xFF] ^ (crc >> 8);
}

template <typename T>
uint32_t updateCRC32(T s, uint32_t crc) {
    const char* buf = reinterpret_cast<const char*>(&s);
    size_t len = sizeof(T);
    for (; len; --len, ++buf)
        crc = updC32(static_cast<uint32_t>(*buf), crc);
    return crc;
}
CRC32Tab contains exactly the same values as the large array in the file linked above.
This is an abbreviated version of how I write the floats to a file and compute the checksum:
float x, y, z;
// set them to some values
uint32_t crc = 0xFFFFFFFF;
crc = Utility::updateCRC32(x, crc);
crc = Utility::updateCRC32(y, crc);
crc = Utility::updateCRC32(z, crc);
const uint32_t actualCrc = ~crc;
// stream is a FILE pointer, and I don't mind the scientific representation
fprintf(stream, " ( %g %g %g )", x, y, z);
fprintf(stream, " CRC %u\n", actualCrc);
I read the values back from the file as follows. There is actually a lot more involved as the file has a more complex syntax and has to be parsed, but let's assume that getNextFloat() returns the textual representation of each float written before.
float x = std::atof(getNextFloat());
float y = std::atof(getNextFloat());
float z = std::atof(getNextFloat());
uint32_t crc = 0xFFFFFFFF;
crc = Utility::updateCRC32(x, crc);
crc = Utility::updateCRC32(y, crc);
crc = Utility::updateCRC32(z, crc);
const uint32_t actualCrc = ~crc;
const uint32_t fileCrc = // read the CRC from the file
assert(fileCrc == actualCrc); // fails often, but not always
The source of this problem seems to be that std::atof returns a different bit representation of the float encoded in the string read from the file than the bit representation of the float that was used to write that string to the file.
So, my question is: Is there another way to achieve my goal of checksumming floats which are roundtripped through a textual representation other than to checksum the strings themselves?
Thanks for reading!
The source of the issue is apparent from your comment:
If I'm not completely mistaken, there is no rounding happening here. The %g specifier chooses the shortest string representation that exactly represents the number.
This is incorrect. If no precision is specified, it defaults to 6, and rounding will definitely occur for most floating-point inputs.
If you need a human-readable round-trippable format, %a is by far the best choice. Failing that, you will need to specify a precision of at least 9 (assuming that float on your system is IEEE-754 single precision).
You may still be tripped up by NaN encodings, since the standard does not specify how or if they must be printed.
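A small sketch of the write/read round trip with enough precision (assuming float is IEEE-754 single precision; 9 significant digits, or %a for an exact hexadecimal form):
#include <cassert>
#include <cstdio>
#include <cstdlib>

int main() {
    float x = 23.24f;
    char buf[64];

    std::snprintf(buf, sizeof buf, "%.9g", x);   // 9 significant digits round-trip a float
    // std::snprintf(buf, sizeof buf, "%a", x);  // alternative: exact hexadecimal representation

    float back = std::strtof(buf, nullptr);
    assert(back == x);                           // exact round trip for finite values
    return 0;
}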
If the text file doesn't have to be human-readable, use hexadecimal float literals instead; they are exact, so you won't have this problem of differences between textual and in-memory values.
If your standard library's float-to-text and text-to-float conversions do proper rounding, you just need enough significant digits for the float -> text -> float roundtrip to be lossless. If you also have Infs and NaNs, it should still be "value-preserving", though not necessarily bit-pattern-preserving, since there are multiple representations for infinity and NaN, I think. For an IEEE-754 64-bit double, 17 significant digits are just enough to make the roundtrip lossless with respect to the actual value.
Your CRC algorithm is flawed for any type which has multiple binary representations for a single value. IEEE 754 has two representations for 0.0, to wit +0.0 and -0.0. Other, non-finite values such as NaN are potentially troublesome too.
Would it be acceptable to canonicalize your numbers before you update the CRC? So while saving, you would get a temporary string version of your number (with sprintf or whatever matches your serialization's format), then convert this string back to a numeric value, and then use this result to update the CRC. This way, you know that the CRC will match the deserialized value.
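A sketch of that canonicalisation step (it must use the same format string as the save path, here the %g from the question):
#include <cstdio>
#include <cstdlib>

// Round-trip a float through its serialised text form before checksumming,
// so the CRC is computed over exactly the value the reader will parse back.
static float canonicalize(float v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%g", v);   // must match the fprintf format used when saving
    return static_cast<float>(std::atof(buf));
}

// On the save side this would be used as, e.g.:
//   crc = Utility::updateCRC32(canonicalize(x), crc);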

Heuristic to identify if a series of 4 bytes chunks of data are integers or floats

What's the best heuristic I can use to identify whether a chunk of X 4-bytes are integers or floats? A human can do this easily, but I wanted to do it programmatically.
I realize that since every combination of bits will result in a valid integer and (almost?) all of them will also result in a valid float, there is no way to know for sure. But I still would like to identify the most likely candidate (which will virtually always be correct; or at least, a human can do it).
For example, let's take a series of 4-byte chunks of raw data and print them first as integers and then as floats:
1 1.4013e-45
10 1.4013e-44
44 6.16571e-44
5000 7.00649e-42
1024 1.43493e-42
0 0
0 0
-5 -nan
11 1.54143e-44
Obviously they will be integers.
Now, another example:
1065353216 1
1084227584 5
1085276160 5.5
1068149391 1.33333
1083179008 4.5
1120403456 100
0 0
-1110651699 -0.1
1195593728 50000
These will obviously be floats.
PS: I'm using C++ but you can answer in any language, pseudo code or just in english.
The "common sense" heuristic from your example seems to basically amount to a range check. If one interpretation is very large (or a tiny fraction, close to zero), that is probably wrong. Check the exponent of the float interpretation and compare it to the exponent that results from a proper static cast of the integer interpretation to a float.
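A sketch of that range/exponent comparison (my own helper; it assumes IEEE-754 single precision and simply prefers whichever interpretation has the smaller binary exponent):
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Heuristic: true if the 4 bytes look more plausible as a float than as an int.
bool looksLikeFloat(uint32_t bits) {
    float asFloat;
    std::memcpy(&asFloat, &bits, sizeof asFloat);        // reinterpret the raw bytes as a float
    int32_t asInt;
    std::memcpy(&asInt, &bits, sizeof asInt);            // ... and as a signed integer

    if (std::isnan(asFloat) || std::isinf(asFloat))
        return false;                                    // aberrant float -> probably an int

    int floatExp = 0, intExp = 0;
    std::frexp(asFloat, &floatExp);                      // binary exponent of the float view
    std::frexp(static_cast<float>(asInt), &intExp);      // binary exponent of the int view cast to float

    // Prefer the interpretation with the smaller ("human-sized") magnitude.
    return std::abs(floatExp) <= std::abs(intExp);
}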
Looks like a Kolmogorov complexity issue. Basically, from what you show as examples, the shorter number (when printed as a string to be read by a human), be it integer or float, is the right answer for your heuristic.
Also, obviously if the value is an incorrect float, it is an integer :-)
Seems direct enough to implement.
You can probably "detect" it by looking at the high bits: with floats they'd generally be non-zero; with integers they'd be zero unless you're dealing with a very large number. So... you could try and see whether (2^30) & number returns 0 or not.
If both numbers are positive, your floats are reasonably large (greater than 10^-42), and your ints are reasonably small (less than 8*10^6), then the check is pretty simple. Treat the data as a float and compare to the least normalized float.
union float_or_int {
    float f;
    int32_t i;
};

bool is_positive_normalized_float( float_or_int &u ) {
    return u.f >= numeric_limits<float>::min();
}
This assumes IEEE float and the same endianness between the CPU and the FPU.
A human can do this easily
A human can't do it at all. Ergo neither can a computer. There are 2^32 valid int values. A large number of them are also valid float values. There is no way of distinguishing the intent of the data other than by tagging it or by not getting into such a mess in the first place.
Don't attempt this.
You are going to be looking at the upper 8 or 9 bits. That's where the sign and exponent of a floating point value are. Values of 0x00, 0x80 and 0xFF here are pretty uncommon for valid float data.
In particular, if the upper 9 bits are all 0, then this is likely to be a valid floating point value only if all 32 bits are 0. Another way to say this is that if the exponent is 0, the mantissa should also be zero. If the upper bit is 1 and the next 8 bits are 0, this is legal, but also not likely to be valid: it represents -0.0, which is a legal floating point value, but a meaningless one.
To put this into numerical terms: if the upper byte is 0x00 (or 0x80), then the value has a magnitude of at most 2.35e-38. Planck's constant is 6.62e-34 m²kg/s; that's 4 orders of magnitude larger. The estimated diameter of a proton is much, much larger than that (about 1.6e-15 meters). The smallest non-zero value for audio data is about 2.3e-10. You aren't likely to see floating point values that are legitimate measurements of anything real that are smaller than 2.35e-38 but not zero.
Going in the other direction, if the upper byte is 0xFF then this value is either infinite, a NaN, or larger in magnitude than 3.4e+38. The age of the universe is estimated to be 1.3e+10 years (roughly 4e+32 femtoseconds). The observable universe has roughly e+23 stars, and Avogadro's number is 6.02e+23. Once again, float values larger than e+38 rarely show up in legitimate measurements.
This is not to say that the FPU can't load or produce such values, and you will certainly see them in intermediate values of calculations if you are working with modern FPUs. A modern FPU will load a floating point value that has an exponent of 0 but whose other bits are not 0. These are called denormalized values. This is why you are seeing small positive integers show up as float values in the range of e-42 even though the normal range of a float only goes down to e-38.
An exponent of all 1s represents infinity. You probably won't find infinities in your data, but you would know better than I. -Infinity is 0xFF800000, +Infinity is 0x7F800000; any value other than 0 in the mantissa of an infinity is malformed, and such malformed values are used as NaNs.
Loading a NaN into a float register can cause it to throw an exception, so you want to use integer math to do your guessing about whether your data is float or int until you are fairly certain it is int.
If you know that your floats are all going to be actual values (no NaNs, INFs, denormals or other aberrant values), then you can use this as a criterion. In general, an array of ints will have a high probability of containing "bad" float values.
I assume the following:
that you mean IEEE 754 single precision floating point numbers.
that the sign bit of the float is saved in the MSB of an int.
So here we go:
#include <cassert>
#include <stdint.h>

static bool probablyFloat(uint32_t bits) {
    bool sign = (bits & 0x80000000U) != 0;
    int exp = ((bits & 0x7f800000U) >> 23) - 127;
    uint32_t mant = bits & 0x007fffff;
    (void)sign; // the sign does not affect this heuristic
    // +- 0.0
    if (exp == -127 && mant == 0)
        return true;
    // +- 1 billionth to 1 billion
    if (-30 <= exp && exp <= 30)
        return true;
    // some value with only a few binary digits
    if ((mant & 0x0000ffff) == 0)
        return true;
    return false;
}

int main() {
    assert(probablyFloat(1065353216));
    assert(probablyFloat(1084227584));
    assert(probablyFloat(1085276160));
    assert(probablyFloat(1068149391));
    assert(probablyFloat(1083179008));
    assert(probablyFloat(1120403456));
    assert(probablyFloat(0));
    assert(probablyFloat(-1110651699));
    assert(probablyFloat(1195593728));
    return 0;
}
Simplifying what Alan said, I'd ONLY look at the integer form and say: if the number is bigger than 99999999, then it's almost definitely a float.
This has the advantage that it's fast, easy, and avoids NaN issues.
It has the disadvantage that it's pretty much full of crap... I didn't actually look at what floats these would represent or anything, but it looks reasonable from your examples...
In any case, this is a heuristic, so it's GONNA be full of crap, and not always work anyway...
Measure with a micrometer, mark with chalk, cut with an axe.
Here is a heuristic I came up with, based on #kriss' idea. After a brief look at some of my data, it seems to work fairly well.
I am using it in a disassembler to detect if a 32-bit value was likely originally an integer or float literal.
public class FloatUtil {
private static final int canonicalFloatNaN = Float.floatToRawIntBits(Float.NaN);
private static final int maxFloat = Float.floatToRawIntBits(Float.MAX_VALUE);
private static final int piFloat = Float.floatToRawIntBits((float)Math.PI);
private static final int eFloat = Float.floatToRawIntBits((float)Math.E);
private static final DecimalFormat format = new DecimalFormat("0.####################E0");
public static boolean isLikelyFloat(int value) {
// Check for some common named float values
if (value == canonicalFloatNaN ||
value == maxFloat ||
value == piFloat ||
value == eFloat) {
return true;
}
// Check for some named integer values
if (value == Integer.MAX_VALUE || value == Integer.MIN_VALUE) {
return false;
}
// a non-canonical NaN is more likely to be an integer
float floatValue = Float.intBitsToFloat(value);
if (Float.isNaN(floatValue)) {
return false;
}
// Otherwise, whichever has a shorter scientific notation representation is more likely.
// Integer wins the tie
String asInt = format.format(value);
String asFloat = format.format(floatValue);
// try to strip off any small imprecision near the end of the mantissa
int decimalPoint = asFloat.indexOf('.');
int exponent = asFloat.indexOf("E");
int zeros = asFloat.indexOf("000");
if (zeros > decimalPoint && zeros < exponent) {
asFloat = asFloat.substring(0, zeros) + asFloat.substring(exponent);
} else {
int nines = asFloat.indexOf("999");
if (nines > decimalPoint && nines < exponent) {
asFloat = asFloat.substring(0, nines) + asFloat.substring(exponent);
}
}
return asFloat.length() < asInt.length();
}
}
And here are some of the values it works for (and a couple it doesn't)
#Test
public void isLikelyFloatTest() {
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1.23f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1.0f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.NaN)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.NEGATIVE_INFINITY)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.POSITIVE_INFINITY)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1e-30f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1000f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(-1f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(-5f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1.3333f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(4.5f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(.1f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(50000f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.MAX_VALUE)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits((float)Math.PI)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits((float)Math.E)));
// Float.MIN_VALUE is equivalent to integer value 1. this should be detected as an integer
// Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.MIN_VALUE)));
// This one doesn't quite work. It has a series of 2 0's, but we only strip 3 0's or more
// Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1.33333f)));
Assert.assertFalse(FloatUtil.isLikelyFloat(0));
Assert.assertFalse(FloatUtil.isLikelyFloat(1));
Assert.assertFalse(FloatUtil.isLikelyFloat(10));
Assert.assertFalse(FloatUtil.isLikelyFloat(100));
Assert.assertFalse(FloatUtil.isLikelyFloat(1000));
Assert.assertFalse(FloatUtil.isLikelyFloat(1024));
Assert.assertFalse(FloatUtil.isLikelyFloat(1234));
Assert.assertFalse(FloatUtil.isLikelyFloat(-5));
Assert.assertFalse(FloatUtil.isLikelyFloat(-13));
Assert.assertFalse(FloatUtil.isLikelyFloat(-123));
Assert.assertFalse(FloatUtil.isLikelyFloat(20000000));
Assert.assertFalse(FloatUtil.isLikelyFloat(2000000000));
Assert.assertFalse(FloatUtil.isLikelyFloat(-2000000000));
Assert.assertFalse(FloatUtil.isLikelyFloat(Integer.MAX_VALUE));
Assert.assertFalse(FloatUtil.isLikelyFloat(Integer.MIN_VALUE));
Assert.assertFalse(FloatUtil.isLikelyFloat(Short.MIN_VALUE));
Assert.assertFalse(FloatUtil.isLikelyFloat(Short.MAX_VALUE));
}