"Symmetrical difference" for unsigned ints - assumed rollover - c++

I made a simple function that I called symmetricDelta() which calculates the delta between value and previous "symmetrically". What I mean by that is: consider a number-line from e.g. 0 to ULLONG_MAX, where you connected the left and right ends of the number-line... To determine a "symmetric" delta, assume the change is positive if value - previous is less than half of the span, otherwise assume the change is negative, and we wrapped-around the number-line.
See a simple version of this for uint64_ts below:
int64_t symmetricDelta(uint64_t value, uint64_t previous) {
if (value-previous < (1ULL << 63)) {
uint64_t result = value - previous;
return result;
} else {
uint64_t negativeResult = previous - value;
return -1 * negativeResult;
}
}
Usage:
uint64_t value = ULLONG_MAX;
uint64_t previous = 0;
// Result: -1, not ULLONG_MAX
cout << symmetricDelta(value, previous) << endl;
Demo: https://onlinegdb.com/BJ8FFZgrP
Other value examples, assume a uint8_t version for simplicity:
symmetricalDifference(1, 0) == 1
symmetricalDifference(0, 1) == -1
symmetricalDifference(0, 255) == 1
symmetricalDifference(255, 0) == -1
symmetricalDifference(227, 100) == 127
symmetricalDifference(228, 100) == -128
My question is: Is there an "official" name for what I'm calling "symmetrical subtraction"? This feels like the kind of thing that might already be implemented in the C++ STL, but I wouldn't even know what to search for...

Yes. The name is subtraction modulo 2^64, and it's identical to what your machine does when you write:
int64_t symmetricDelta(uint64_t value, uint64_t previous) {
return (int64_t)(value-previous);
}
In C and C++, unsigned arithmetic is defined to wrap around, effectively joining the ends of the representable number range into a circle. This is the basis for the two's complement representation of signed integers: your CPU simply declares half the number circle to be interpreted as negative. That half is the upper half of the unsigned range, with -1 corresponding to the maximum representable unsigned integer, simply because 0 is next on the circle.
Side note:
This allows the CPU to use the exact same circuitry for signed and unsigned arithmetic. The CPU only provides an add instruction that is used irrespective of whether the numbers should be interpreted as signed or unsigned. This is true for addition, subtraction and multiplication, they all exist as sign-ignorant instructions. Only the division is implemented in a signed and an unsigned variant, as are the comparison instructions / the flag bits that the CPU provides.
Side note 2:
The above is not fully true, as modern CPUs implement saturating arithmetic as part of their vector units (AVX etc.). Because saturating arithmetic means clipping the result to the ends of the representable range instead of wrapping around, this clipping depends on where the circle of numbers is assumed to be broken. As such, saturating arithmetic instructions typically exist in signed and unsigned variants.
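To make the signed/unsigned split concrete, here is a minimal scalar sketch of 8-bit saturating addition. The names satAddU8 and satAddS8 are made up for illustration, and real vector units do this in hardware rather than with branches; the point is only that the same bit patterns clip differently depending on where the number circle is assumed to be broken.

```cpp
#include <cstdint>

// Unsigned saturating add: clip the result to [0, 255].
std::uint8_t satAddU8(std::uint8_t a, std::uint8_t b) {
    int sum = int(a) + int(b);  // computed in int, so it cannot overflow
    return static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
}

// Signed saturating add: clip the result to [-128, 127].
std::int8_t satAddS8(std::int8_t a, std::int8_t b) {
    int sum = int(a) + int(b);
    if (sum > 127) sum = 127;
    if (sum < -128) sum = -128;
    return static_cast<std::int8_t>(sum);
}
```

Adding the bit patterns 0x7F and 0x01 gives 128 with satAddU8 but stays at 127 with satAddS8, which is why the two variants cannot share one instruction.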
End of the needless background rambling...
So, when you subtract two numbers in unsigned representation, the result is the unsigned number of steps that you have to take to reach the minuend from the subtrahend. And by reinterpreting the result as a signed integer, you are interpreting a long route (that goes more than half around the circle) as the corresponding short route in the opposite direction.
There is one pitfall: +(1 << 63) is not representable as an int64_t. The bit pattern 1 << 63 sits exactly on the opposite side of the number circle from zero, and since its sign bit is set, it's interpreted as -(1 << 63). If you try to negate it, the bit pattern does not change one bit (just like -0 == 0), so your computer happily declares that - -(1 << 63) == -(1 << 63). This is probably not a problem for you, but it's better to be aware of it, because it might bite you.
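As a quick check of the reasoning above, here is a sketch of the same cast-based approach for 8-bit values, matching the uint8_t examples in the question. The names symmetricDelta8 and halfCircleIsItsOwnNegation are hypothetical. The conversion to a signed type is guaranteed to be modular only since C++20; the sketch assumes the usual two's complement behaviour on older compilers.

```cpp
#include <cstdint>

// 8-bit analogue of the one-line answer: subtract modulo 2^8, then
// reinterpret the result as a signed value.
std::int8_t symmetricDelta8(std::uint8_t value, std::uint8_t previous) {
    // The operands are promoted to int, so truncate back to 8 bits first.
    std::uint8_t diff = static_cast<std::uint8_t>(value - previous);
    // Modular conversion (guaranteed since C++20, assumed before that).
    return static_cast<std::int8_t>(diff);
}

// The pitfall, shown with well-defined unsigned arithmetic:
// negating 2^63 modulo 2^64 yields 2^63 again.
bool halfCircleIsItsOwnNegation() {
    std::uint64_t half = std::uint64_t(1) << 63;
    return (std::uint64_t(0) - half) == half;
}
```

With these definitions, symmetricDelta8(227, 100) is 127 and symmetricDelta8(228, 100) is -128, the same values listed in the question.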

Related

How long long is represented in memory?

I am not an advanced C++ programmer, but I have been using C++ for a long time now, so I love playing with it. Lately I was thinking about ways to maximize a variable programmatically, so I tried bitwise operators to fill a variable with 1's. Then there's the signed and unsigned issue. My knowledge of memory representation is not very good. However, I ended up writing the following code, which works for both signed and unsigned short, int and long (although int and long are basically the same). Unfortunately, for long long, the program fails.
So, what is going on behind the scenes for long long? How is it represented in memory? Besides, is there any better way to achieve the same thing in C++?
#include <bits/stdc++.h>
using namespace std;
template<typename T>
void Maximize(T &val, bool isSigned)
{
int length = sizeof(T) * 8;
cout << "\nlength = " << length << "\n";
// clearing
for(int i=0; i<length; i++)
{
val &= 0 << i;
}
if(isSigned)
{
length--;
}
val = 1 << 0;
for(int i=1; i<length; i++)
{
val |= 1 << i;
cout << "\ni = " << i << "\nval = " << val << "\n";
}
}
int main()
{
long long i;
Maximize(i, true);
cout << "\n\nsizeof(i) = " << sizeof(i) << " bytes" << "\n";
cout << "i = " << i << "\n";
return 0;
}
The basic issue with your code is in the statements
val &= 0 << i;
and
val |= 1 << i;
in the case that val is longer than an int.
In the first expression, 0 << i is (most likely) always 0, regardless of i (technically, it suffers from the same undefined behaviour described below, but you will not likely encounter the problem.) So there was no need for the loop at all; all of the statements do the same thing, which is to zero out val. Of course, val = 0; would have been a simpler way of writing that.
The issue with 1 << i is that the constant literal 1 is an int (because it is small enough to be represented as an int, and int is the narrowest representation used for integer constants). Since 1 is an int, so is 1 << i. If i is greater than or equal to the number of value bits in an int, that expression has undefined behaviour, so in theory the result could be anything. In practice, however, the result is likely to be the same width as an int, so only the low-order bits will be affected.
It is certainly possible to convert the 1 to type T (although in general, you might need to be cautious about corner cases when T is signed), but it is easier to convert the 1 to an unsigned type at least as wide as T by using the maximum-width unsigned integer type defined in cstdint, uintmax_t:
val |= std::uintmax_t(1) << i;
In real-world code, it is common to see the assumption that the widest integer type is long long:
val |= 1ULL << i;
which will work fine if the program never attempts to instantiate the template with an extended integer type.
Of course, this is not the way to find the largest value for an integer type. The correct solution is to #include <limits> and then use the appropriate specialization of std::numeric_limits<T>::max()
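For comparison, here is a sketch of how the bit-filling loop could be written without the shift problem, checked against std::numeric_limits. The function name maxByBits is made up; it assumes T is a built-in integer type and uses std::numeric_limits<T>::digits (the number of value bits) instead of 8*sizeof(T).

```cpp
#include <limits>
#include <type_traits>

// Build the largest value of integer type T by setting every value bit.
template <typename T>
T maxByBits() {
    using U = std::make_unsigned_t<T>;                   // shift in an unsigned type
    const int valueBits = std::numeric_limits<T>::digits; // excludes the sign bit
    U result = 0;
    for (int i = 0; i < valueBits; ++i) {
        // U(1) is at least as wide as T, so the shift never exceeds its width.
        result = static_cast<U>(result | (U(1) << i));
    }
    return static_cast<T>(result);
}
```

For every standard integer type this should agree with std::numeric_limits<T>::max(), which remains the simpler choice in real code.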
C++ allows only one representation for positive (and unsigned) integers, and three possible representations for negative signed integers. Positive and unsigned integers are simply represented as a sequence of bits in binary notation. There may be padding bits as well, and signed integers have a single sign bit which must be 0 in the case of positive integers, so there is no guarantee that there are 8*sizeof(T) useful bits in the representation, even if the number of bits in a byte is known to be 8 (and, in theory, it could be larger). [Note 1]
The sign bit for negative signed integers is always 1, but there are three different formats for the value bits. The most common is "two's complement", where the value bits interpreted as a positive number would be exactly 2^k more than the actual value of the number, where k is the number of value bits. (This is equivalent to assigning a weight of -2^k to the sign bit, which is why it is called two's complement.)
Another alternative is "one's complement", in which the value bits are all inverted individually. This differs by exactly one from two's-complement representation.
The third allowable alternative is "sign-magnitude", in which the value bits are precisely the absolute value of the negative number. This representation is frequently used for floating point values, but only rarely used in integer values.
Both sign-magnitude and one's complement suffer from the disadvantage that there is a bit pattern which represents "negative 0". On the other hand, two's complement representation has the feature that the magnitude of the most negative representable value is one larger than the magnitude of the most positive representable value, with the result that both -x and x/-1 can overflow, leading to undefined behaviour.
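The three formats can be compared by computing, with plain unsigned arithmetic, the 8-bit pattern each one would use for a negative number. This is only an illustrative sketch with made-up function names; it does not depend on which format the host machine actually uses.

```cpp
#include <cstdint>

// 8-bit pattern each scheme uses for -magnitude (0 <= magnitude <= 127).

// Sign-magnitude: sign bit set, value bits hold the absolute value.
std::uint8_t signMagnitude(std::uint8_t magnitude) {
    return static_cast<std::uint8_t>(0x80u | magnitude);
}

// One's complement: every bit of the positive value inverted.
std::uint8_t onesComplement(std::uint8_t magnitude) {
    return static_cast<std::uint8_t>(~magnitude);
}

// Two's complement: one's complement plus one.
std::uint8_t twosComplement(std::uint8_t magnitude) {
    return static_cast<std::uint8_t>(~magnitude + 1);
}
```

For -5 the patterns are 0x85, 0xFA and 0xFB respectively. Passing 0 shows the "negative zero" patterns 0x80 and 0xFF for the first two schemes, while two's complement gives plain 0x00.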
Notes
I believe that it is theoretically possible for padding to be inserted between the value bits and the sign bits, but I certainly do not know of any real-world implementation with that feature. However, the fact that attempting to shift a 1 into the sign bit position is undefined behaviour makes it incorrect to assume that the sign bit is contiguous with the value bits.
I was thinking about ways to maximize a variable programmatically.
You are trying to reinvent the wheel. The C++ standard library already has this functionality: std::numeric_limits<T>::max()
// x any kind of numeric type: any integer or any floating point value
x = std::numeric_limits<decltype(x)>::max();
This is also better since you will not rely on undefined behavior.
As harold commented, the solution is to use T(1) << i instead of 1 << i. Also as Some programmer dude mentioned, long long is represented as consecutive bytes (typically 8 bytes) with sign bit at the MSB if it is signed.

What is the purpose of "int mask = ~0;"?

I saw the following line of code here in C.
int mask = ~0;
I have printed the value of mask in C and C++. It always prints -1.
So I do have some questions:
Why assigning value ~0 to the mask variable?
What is the purpose of ~0?
Can we use -1 instead of ~0?
It's a portable way to set all the binary bits in an integer to 1 bits without having to know how many bits are in the integer on the current architecture.
C and C++ allow 3 different signed integer formats: sign-magnitude, one's complement and two's complement
~0 will produce all-one bits regardless of the sign format the system uses. So it's more portable than -1
You can add the U suffix (i.e. -1U) to generate an all-one bit pattern portably [1]. However, ~0 indicates the intention more clearly: invert all the bits in the value 0, whereas -1 suggests that a value of minus one is needed, not its binary representation.
[1] Because unsigned operations are always reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
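A small sketch of the two portable spellings for unsigned values, plus the reason an all-ones mask is useful as a starting point. The function names are hypothetical.

```cpp
#include <climits>

// Invert every bit of an unsigned zero.
unsigned allOnesByInversion() { return ~0u; }

// Negate an unsigned one; the result is reduced modulo UINT_MAX + 1.
unsigned allOnesByNegation() { return -1U; }

// ANDing with an all-ones mask leaves the value unchanged.
unsigned applyMask(unsigned value, unsigned mask) { return value & mask; }
```

Both helper functions return UINT_MAX regardless of how the platform represents negative signed integers, because only unsigned arithmetic is involved.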
That on a 2's complement platform (that is assumed) gives you -1, but writing -1 directly is forbidden by the rules (only integers 0..255, unary !, ~ and binary &, ^, |, +, << and >> are allowed).
You are studying a coding challenge with a number of restrictions on operators and language constructions to perform given tasks.
The first problem is to return the value -1 without the use of the - operator.
On machines that represent negative numbers with two's complement, the value -1 is represented with all bits set to 1, so ~0 evaluates to -1:
/*
* minusOne - return a value of -1
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 2
* Rating: 1
*/
int minusOne(void) {
// ~0 = 111...111 = -1
return ~0;
}
Other problems in the file are not always implemented correctly. The second problem, returning a boolean value indicating whether an int value would fit in a 16-bit signed short, has a flaw:
/*
* fitsShort - return 1 if x can be represented as a
* 16-bit, two's complement integer.
* Examples: fitsShort(33000) = 0, fitsShort(-32768) = 1
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 8
* Rating: 1
*/
int fitsShort(int x) {
/*
* after left shift 16 and right shift 16, the left 16 of x is 00000..00 or 111...1111
* so after shift, if x remains the same, then it means that x can be represent as 16-bit
*/
return !(((x << 16) >> 16) ^ x);
}
Left shifting a negative value or a number whose shifted value is beyond the range of int has undefined behavior, right shifting a negative value is implementation defined, so the above solution is incorrect (although it is probably the expected solution).
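For contrast, a version with fully defined behaviour is easy to write once the puzzle's operator restrictions are set aside. The sketch below deliberately uses comparisons and &&, which the challenge does not allow, so it is not a valid puzzle answer; the name fitsShortPortable is made up.

```cpp
#include <cstdint>

// Returns 1 if x can be represented as a 16-bit two's complement integer,
// 0 otherwise. No shifts, so no undefined or implementation-defined steps.
int fitsShortPortable(int x) {
    return x >= INT16_MIN && x <= INT16_MAX;
}
```

It agrees with the examples in the problem statement: 33000 does not fit, -32768 does.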
Loooong ago this was how you saved memory on extremely limited equipment such as the 1K ZX 80 or ZX 81 computer. In BASIC, you would
Let X = NOT PI
rather than
LET X = 0
Since numbers were stored as 4 byte floating points, the latter takes 2 bytes more than the first NOT PI alternative, where each of NOT and PI takes up a single byte.
There are multiple ways of encoding numbers across all computer architectures. When using 2's complement this will always be true: ~0 == -1. On the other hand, some computers use 1's complement for encoding negative numbers, for which the above example is untrue, because ~0 == -0. Yup, 1's complement has negative zero, and that is why it is not very intuitive.
So to your questions
the ~0 is assigned to mask so all the bits in mask are equal 1 -> making mask & sth == sth
the ~0 is used to make all bits equal to 1 regardless of the platform used
you can use -1 instead of ~0 if you are sure that your computer platform uses 2's complement number encoding
My personal thought - make your code as much platform-independent as you can. The cost is relatively small and the code becomes fail proof

why declare "score[11] = {};" and "grade" as "unsigned" instead of "int'

I'm new to C++ and am trying to learn the concept of arrays. I saw this code snippet online. For the sample code below, does it make any difference to declare:
unsigned scores[11] = {};
unsigned grade;
as:
int scores[11] = {};
int grade;
I guess there must be a reason why score[11] = {}; and grade is declared as unsigned, but what is the reason behind it?
int main() {
unsigned scores[11] = {};
unsigned grade;
while (cin >> grade) {
if (0 <= grade <= 100) {
++scores[grade / 10];
}
}
for (int i = 0; i < 11; i++) {
cout << scores[i] << endl;
}
}
unsigned means that the variable will not hold negative values (or, more accurately, it does not carry a sign). It seems obvious that scores and grades are signless values (no one scores -25), so it is natural to use unsigned.
But note that the lower bound in if (0 <= grade <= 100) is redundant: if (grade <= 100) is enough since no negative values are allowed.
As Blastfurnace commented, if (0 <= grade <= 100) is not even right. If you want to keep both bounds, you should write it as:
if (0 <= grade && grade <= 100)
Unsigned variables
Declaring a variable as unsigned int instead of int has 2 consequences:
It can't be negative. It provides you a guarantee that it never will be and therefore you don't need to check for it and handle special cases when writing code that only works with positive integers
As you have a limited size, it allows you to represent bigger numbers. On 32 bits, the biggest unsigned int is 4294967295 (2^32-1) whereas the biggest int is 2147483647 (2^31-1)
One consequence of using unsigned int is that arithmetic will be done in the set of unsigned int. So 9 - 10 = 4294967295 instead of -1 as no negative number can be encoded on unsigned int type. You will also have issues if you compare them to negative int.
More info on how negative integers are encoded.
Array initialization
For the array definition, if you just write:
unsigned int scores[11];
Then you have 11 uninitialized unsigned int that have potentially values different than 0.
If you write:
unsigned int scores[11] = {};
Then all elements are initialized with their default value, which is 0.
Note that if you write:
unsigned int scores[11] = { 1, 2 };
You will have the first element initialized to 1, the second to 2 and all the others to 0.
You can easily play a little bit with all these syntax to gain a better understanding of it.
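One way to experiment with these forms is a pair of tiny helpers that return a chosen element. The function names are made up for this sketch, and index is assumed to be in the range 0 to 10.

```cpp
// Element `index` of an array declared with empty braces.
unsigned emptyBracesElement(int index) {
    unsigned scores[11] = {};       // all 11 elements are zero
    return scores[index];
}

// Element `index` of an array with only the first two values listed.
unsigned partialInitElement(int index) {
    unsigned scores[11] = {1, 2};   // the remaining nine elements are zero
    return scores[index];
}
```

The uninitialized form is left out on purpose, since reading its elements before assigning them is undefined behaviour.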
Comparison
About the code:
if(0 <= grade <= 100)
as stated in the comments, this does not do what you expect. In fact, this will always evaluate to true and therefore execute the code in the if. Which means if you enter a grade of, say, 20000, you should have a core dump. The reason is that this:
0 <= grade <= 100
is equivalent to:
(0 <= grade) <= 100
And the first part is either true (implicitly converted to 1) or false (implicitly converted to 0). As both values are lower than 100, the second comparison is always true.
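The two steps can be spelled out in code. This sketch uses an int parameter so that a negative input can be shown too, and the function names are hypothetical.

```cpp
// What `0 <= grade <= 100` actually computes, one step at a time.
bool chainedComparison(int grade) {
    int first = (0 <= grade);   // true becomes 1, false becomes 0
    return first <= 100;        // 0 <= 100 and 1 <= 100 both hold
}

// What was intended: both bounds checked separately.
bool inRange(int grade) {
    return 0 <= grade && grade <= 100;
}
```

chainedComparison returns true for 20000 and for -5, while inRange rejects both.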
unsigned integers have some strange properties and you should avoid them unless you have a good reason. Gaining 1 extra bit of positive size, or expressing a constraint that a value may not be negative, are not good reasons.
unsigned integers implement arithmetic modulo UINT_MAX+1. By contrast, operations on signed integers represent the natural arithmetic that we are familiar with from school.
Overflow semantics
unsigned has well defined overflow; signed does not:
unsigned u = UINT_MAX;
u++; // u becomes 0
int i = INT_MAX;
i++; // undefined behaviour
This has the consequence that signed integer overflow can be caught during testing, while an unsigned overflow may silently do the wrong thing. So use unsigned only if you are sure you want to legalize overflow.
If you have a constraint that a value may not be negative, then you need a way to detect and reject negative values; int is perfect for this. An unsigned will accept a negative value and silently overflow it into a positive value.
Bit shift semantics
Bit shift of unsigned by an amount less than the number of bits in the data type is always well defined. Until C++20, bit shift of signed was undefined if it would cause a 1 in the sign bit to be shifted left, or implementation-defined if it would cause a 1 in the sign bit to be shifted right. Since C++20, signed right shift always preserves the sign, but signed left shift does not. So use unsigned for some kinds of bit twiddling operations.
Mixed sign operations
The built-in arithmetic operations always operate on operands of the same type. If they are supplied operands of different types, the "usual arithmetic conversions" coerce them into the same type, sometimes with surprising results:
unsigned u = 42;
std::cout << (u * -1); // 4294967254
std::cout << std::boolalpha << (u >= -1); // false
What's the difference?
Subtracting an unsigned from another unsigned yields an unsigned result, which means that 1u - 2u is 4294967295 (with a 32-bit unsigned), not -1.
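A short sketch of the wrap-around, together with one common way to get the distance between two unsigned values. The function names are made up, and UINT_MAX is used instead of a hard-coded 32-bit constant.

```cpp
#include <climits>

// Plain unsigned subtraction wraps around instead of going negative.
unsigned wrappedDifference(unsigned a, unsigned b) {
    return a - b;
}

// Distance between two values: always subtract the smaller from the larger.
unsigned absoluteDifference(unsigned a, unsigned b) {
    return a > b ? a - b : b - a;
}
```

wrappedDifference(1, 2) is UINT_MAX, while absoluteDifference(1, 2) is 1.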
Double the max value
int uses one bit to represent the sign of the value. unsigned uses this bit as just another numerical bit. So typically, int has 31 numerical bits and unsigned has 32. This extra bit is often cited as a reason to use unsigned. But if 31 bits are insufficient for a particular purpose, then most likely 32 bits will also be insufficient, and you should be considering 64 bits or more.
Function overloading
The implicit conversion from int to unsigned has the same rank as the conversion from int to double, so the following example is ill formed:
void f(unsigned);
void f(double);
f(42); // error: ambiguous call to overloaded function
Interoperability
Many APIs (including the standard library) use unsigned types, often for misguided reasons. It is sensible to use unsigned to avoid mixed-sign operations when interacting with these APIs.
Appendix
The quoted snippet includes the expression 0 <= grade <= 100. This will first evaluate 0 <= grade, which is always true, because grade can't be negative. Then it will evaluate true <= 100, which is always true, because true is converted to the integer 1, and 1 <= 100 is true.
Yes, it does make a difference. In the first case you declare an array of 11 elements and a variable of type "unsigned int". In the second case you declare them as ints.
When the int is on 32 bits you can have values from the following ranges
-2,147,483,648 to 2,147,483,647 for plain int
0 to 4,294,967,295 for unsigned int
You normally declare something unsigned when you don't need negative numbers and you need that extra range given by unsigned. In your case I assume that by declaring the variables unsigned, the developer doesn't accept negative scores and grades. You basically do a statistic of how many grades between 0 and 10 were introduced at the command line. So it looks like something to simulate a school grading system, therefore you don't have negative grades. But this is my opinion after reading the code.
Take a look at this post which explains what unsigned is:
what is the unsigned datatype?
As the name suggests, signed integers can be negative and unsigned cannot be. If we represent an integer with N bits then for unsigned the minimum value is 0 and the maximum value is 2^N - 1. If it is a signed integer of N bits then it can take the values from -2^(N-1) to 2^(N-1) - 1. This is because we need 1 bit to represent the sign +/-.
Ex: signed 3-bit integer (yes there are such things)
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
But, for unsigned it just represents the values [0,7]. The most significant bit (MSB) in the example signifies a negative value. That is, all values where the MSB is set are negative. Hence the apparent loss of a bit in its absolute values.
It also behaves as one might expect. If you increment -1 (111) we get (1 000) but since we don't have a fourth bit it simply "falls off the end" and we are left with 000.
The same applies to subtracting 1 from 0. First take the two's complement
111 = twos_complement(001)
and add it to 000 which yields 111 = -1 (from the table) which is what one might expect. What happens when you increment 011(=3) yielding 100(=-4) is perhaps not what one might expect and is at odds with our normal expectations. These overflows are troublesome with fixed point arithmetic and have to be dealt with.
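The 3-bit table and the wrap-around cases can be reproduced with a small decoder. This is a sketch with made-up names (decode3, increment3) that models the 3-bit register using ordinary unsigned arithmetic.

```cpp
// Interpret the low 3 bits of `bits` as a two's complement number.
int decode3(unsigned bits) {
    bits &= 7u;  // keep 3 bits; any carry "falls off the end"
    return (bits & 4u) ? static_cast<int>(bits) - 8   // MSB set: negative
                       : static_cast<int>(bits);
}

// Add one within 3 bits and decode the result.
int increment3(unsigned bits) {
    return decode3(bits + 1u);
}
```

increment3(0b111) gives 0 (the -1 to 0 case), and increment3(0b011) gives -4, the surprising overflow from 3.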
One other thing worth pointing out is that a signed integer can take one more negative value than positive, which has a consequence for rounding (when using integers to represent fixed point numbers, for example), but I'm sure that's better covered in the DSP or signal processing forums.

Shift left/right adding zeroes/ones and dropping first bits

I've got to program a function that receives
a binary number like 10001, and
a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ operator <<, the zeroes are pushed from behind but the first numbers aren't dropped... For example
shifLeftAddingZeroes(10001,1)
returns 100010 instead of 00010 that is what I want.
I hope I've made myself clear =P
I assume you are storing that information in an int. Take into consideration that this number actually has more leading zeroes than what you see; your number is most likely at least 16 bits wide, meaning 00000000 00010001. Maybe try AND-ing it with a number having as many 1s as the number of bits you want to keep after shifting? (Assuming you want to stick to bitwise operations.)
What you want is to bit shift and then limit the number of output bits which can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the bitshifted value with that mask. Below is a code sample for doing that; just replace int_type with the type of value you're using, or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int_type numbits=some_default) {
int_type mask = 0;
for (unsigned int bit=0; bit < numbits; bit++) {
mask += int_type(1) << bit;
}
return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.
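A concrete variant of the same idea for 32-bit unsigned values might look like the sketch below. The name shiftLeftKeepBits is made up, the mask is computed without a loop, and shift counts of 32 or more are handled explicitly because they would otherwise be undefined.

```cpp
#include <cstdint>

// Shift `value` left by `shift` bits and keep only the lowest `numbits` bits.
std::uint32_t shiftLeftKeepBits(std::uint32_t value, unsigned shift, unsigned numbits) {
    // (1 << numbits) - 1 has exactly `numbits` low bits set.
    std::uint32_t mask = numbits >= 32 ? ~std::uint32_t(0)
                                       : (std::uint32_t(1) << numbits) - 1;
    if (shift >= 32) return 0;  // every bit has been shifted out
    return (value << shift) & mask;
}
```

shiftLeftKeepBits(0b10001, 1, 5) yields 0b00010, the result asked for in the question.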

Heuristic to identify if a series of 4 bytes chunks of data are integers or floats

What's the best heuristic I can use to identify whether a chunk of X 4-bytes are integers or floats? A human can do this easily, but I wanted to do it programmatically.
I realize that since every combination of bits will result in a valid integer and (almost?) all of them will also result in a valid float, there is no way to know for sure. But I still would like to identify the most likely candidate (which will virtually always be correct; or at least, a human can do it).
For example, let's take a series of 4-bytes raw data and print them as integers first and then as floats:
1 1.4013e-45
10 1.4013e-44
44 6.16571e-44
5000 7.00649e-42
1024 1.43493e-42
0 0
0 0
-5 -nan
11 1.54143e-44
Obviously they will be integers.
Now, another example:
1065353216 1
1084227584 5
1085276160 5.5
1068149391 1.33333
1083179008 4.5
1120403456 100
0 0
-1110651699 -0.1
1195593728 50000
These will obviously be floats.
PS: I'm using C++ but you can answer in any language, pseudo code or just in english.
The "common sense" heuristic from your example seems to basically amount to a range check. If one interpretation is very large (or a tiny fraction, close to zero), that is probably wrong. Check the exponent of the float interpretation and compare it to the exponent that results from a proper static cast of the integer interpretation to a float.
Looks like a Kolmogorov complexity issue. Basically, from what you show as example, the shorter number (when printed as string to be read by a human), be it integer or float, is the right answer for your heuristic.
Also, obviously if the value is an incorrect float, it is an integer :-)
Seems direct enough to implement.
You can probably "detect" it by looking at the high bits, with floats they'd generally be non-zero, with integers, they would be unless you're dealing with a very large number. So... you could try and see if (2^30) & number returns 0 or not.
If both numbers are positive, your floats are reasonably large (at least about 1.2*10^-38, the smallest normalized float), and your ints are reasonably small (less than 8*10^6), then the check is pretty simple. Treat the data as a float and compare to the least normalized float.
union float_or_int {
float f;
int32_t i;
};
bool is_positive_normalized_float( float_or_int &u ) {
return u.f >= numeric_limits<float>::min();
}
This assumes IEEE float and the same endianness between the CPU and the FPU.
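Reading the inactive member of a union is not guaranteed to work in C++, so a variant of the same check that copies the bytes with std::memcpy may be safer. This is a sketch under the same IEEE 754 assumption, with made-up function names; it is a swapped-in technique, not the code from the answer above.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret 32 raw bits as a float by copying the bytes.
float bitsToFloat(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// True when the bits, read as a float, are at least the smallest
// positive normalized value. NaNs and negatives compare false.
bool isPositiveNormalizedFloat(std::uint32_t bits) {
    return bitsToFloat(bits) >= std::numeric_limits<float>::min();
}
```

With the sample data from the question, 1065353216 (1.0f) passes the check, while 5000 (a denormal when read as a float) does not.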
A human can do this easily
A human can't do it at all. Ergo neither can a computer. There are 2^32 valid int values. A large number of them are also valid float values. There is no way of distinguishing the intent of the data other than by tagging it or by not getting into such a mess in the first place.
Don't attempt this.
You are going to be looking at the upper 8 or 9 bits. That's where the sign and exponent of a floating point value are. Values of 0x00, 0x80 and 0xFF here are pretty uncommon for valid float data.
In particular if the upper 9 bits are all 0 then this likely to be a valid floating point value only if all 32 bits are 0. Another way to say this is that if the exponent is 0, the mantissa should also be zero. If the upper bit is 1 and the next 8 bits are 0, this is legal, but also not likely to be valid. It represents -0.0 which is a legal floating point value, but a meaningless one.
To put this into numerical terms: if the upper byte is 0x00 (or 0x80), then the value has a magnitude of at most 2.35e-38. Planck's constant is 6.62e-34 m^2 kg/s; that's 4 orders of magnitude larger. The estimated diameter of a proton is much, much larger than that (estimated at 1.6e-15 meters). The smallest non-zero value for audio data is about 2.3e-10. You aren't likely to see floating point values that are legitimate measurements of anything real that are smaller than 2.35e-38 but not zero.
Going the other direction, if the upper byte is 0xFF (or 0x7F) then this value is either infinite, a NaN, or at least 1.7e+38 in magnitude. The age of the universe is estimated to be 1.3e+10 years (about 4e+32 femtoseconds). The observable universe has roughly e+23 stars; Avogadro's number is 6.02e+23. Once again, float values larger than e+38 rarely show up in legitimate measurements.
This is not to say that the FPU can't load or produce such values, and you will certainly see them in intermediate values of calculations if you are working with modern FPUs. A modern FPU will load a floating point value that has an exponent of 0 but whose other bits are not 0. These are called denormalized values. This is why you are seeing small positive integers show up as float values in the range of e-42 even though the normal range of a float only goes down to e-38.
An exponent of all 1s represents Infinity. You probably won't find infinities in your data, but you would know better than I. -Infinity is 0xFF800000, +Infinity is 0x7F800000, and any value other than 0 in the mantissa of Infinity is malformed. Malformed infinities are used as NaNs.
Loading a NaN into a float register can cause it to throw an exception, so you want to use integer math to do your guessing about whether your data is float or int until you are fairly certain it is int.
If you know that your floats are all going to be actual values (no NaNs, INFs, denormals or other aberrant values) then you can use this as a criterion. In general an array of ints will have a high probability of containing "bad" float values.
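The bit-level cases described above can be sketched as a classifier that uses integer math only, so no suspicious value is ever loaded into a floating point register. The enum and function names are hypothetical, and the layout assumed is IEEE 754 single precision (1 sign bit, 8 exponent bits, 23 mantissa bits).

```cpp
#include <cstdint>

enum class FloatClass { Zero, Denormal, Normal, Infinity, NaN };

// Classify a 32-bit pattern as it would be seen through a float lens.
FloatClass classifyBits(std::uint32_t bits) {
    std::uint32_t exponent = (bits >> 23) & 0xFFu;  // 8 exponent bits
    std::uint32_t mantissa = bits & 0x007FFFFFu;    // 23 mantissa bits
    if (exponent == 0) {
        return mantissa == 0 ? FloatClass::Zero : FloatClass::Denormal;
    }
    if (exponent == 0xFFu) {
        return mantissa == 0 ? FloatClass::Infinity : FloatClass::NaN;
    }
    return FloatClass::Normal;
}
```

A block of data where many words classify as Denormal or NaN is then more likely to be integers; small positive ints such as 5000 come out as Denormal and small negative ints such as -5 come out as NaN.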
I assume the following:
that you mean IEEE 754 single precision floating point numbers.
that the sign bit of the float is saved in the MSB of an int.
So here we go:
#include <cassert>
#include <cstdint>
static bool probablyFloat(uint32_t bits) {
bool sign = (bits & 0x80000000U) != 0;
int exp = ((bits & 0x7f800000U) >> 23) - 127;
uint32_t mant = bits & 0x007fffff;
// +- 0.0
if (exp == -127 && mant == 0)
return true;
// +- 1 billionth to 1 billion
if (-30 <= exp && exp <= 30)
return true;
// some value with only a few binary digits
if ((mant & 0x0000ffff) == 0)
return true;
return false;
}
int main() {
assert(probablyFloat(1065353216));
assert(probablyFloat(1084227584));
assert(probablyFloat(1085276160));
assert(probablyFloat(1068149391));
assert(probablyFloat(1083179008));
assert(probablyFloat(1120403456));
assert(probablyFloat(0));
assert(probablyFloat(-1110651699));
assert(probablyFloat(1195593728));
return 0;
}
simplifying what Alan said, I'd ONLY look at the integer form. and say, if the number is bigger than 99999999 then it's almost definitely a float.
This has the advantage that it's fast, easy, and avoids nan issues.
It has the disadvantage that it's pretty much full of crap... I didn't actually look at what floats these will represent or anything, but it looks reasonable from your examples...
In any case, this is a heuristic, so it's GONNA be full of crap, and not always work anyway...
Measure with a micrometer, mark with chalk, cut with an axe.
Here is a heuristic I came up with, based on @kriss's idea. After a brief look at some of my data, it seems to work fairly well.
I am using it in a disassembler to detect if a 32-bit value was likely originally an integer or float literal.
public class FloatUtil {
private static final int canonicalFloatNaN = Float.floatToRawIntBits(Float.NaN);
private static final int maxFloat = Float.floatToRawIntBits(Float.MAX_VALUE);
private static final int piFloat = Float.floatToRawIntBits((float)Math.PI);
private static final int eFloat = Float.floatToRawIntBits((float)Math.E);
private static final DecimalFormat format = new DecimalFormat("0.####################E0");
public static boolean isLikelyFloat(int value) {
// Check for some common named float values
if (value == canonicalFloatNaN ||
value == maxFloat ||
value == piFloat ||
value == eFloat) {
return true;
}
// Check for some named integer values
if (value == Integer.MAX_VALUE || value == Integer.MIN_VALUE) {
return false;
}
// a non-canonical NaN is more likely to be an integer
float floatValue = Float.intBitsToFloat(value);
if (Float.isNaN(floatValue)) {
return false;
}
// Otherwise, whichever has a shorter scientific notation representation is more likely.
// Integer wins the tie
String asInt = format.format(value);
String asFloat = format.format(floatValue);
// try to strip off any small imprecision near the end of the mantissa
int decimalPoint = asFloat.indexOf('.');
int exponent = asFloat.indexOf("E");
int zeros = asFloat.indexOf("000");
if (zeros > decimalPoint && zeros < exponent) {
asFloat = asFloat.substring(0, zeros) + asFloat.substring(exponent);
} else {
int nines = asFloat.indexOf("999");
if (nines > decimalPoint && nines < exponent) {
asFloat = asFloat.substring(0, nines) + asFloat.substring(exponent);
}
}
return asFloat.length() < asInt.length();
}
}
And here are some of the values it works for (and a couple it doesn't)
@Test
public void isLikelyFloatTest() {
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1.23f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1.0f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.NaN)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.NEGATIVE_INFINITY)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.POSITIVE_INFINITY)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1e-30f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1000f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(-1f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(-5f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1.3333f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(4.5f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(.1f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(50000f)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.MAX_VALUE)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits((float)Math.PI)));
Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits((float)Math.E)));
// Float.MIN_VALUE is equivalent to integer value 1. this should be detected as an integer
// Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(Float.MIN_VALUE)));
// This one doesn't quite work. It has a series of 2 0's, but we only strip 3 0's or more
// Assert.assertTrue(FloatUtil.isLikelyFloat(Float.floatToRawIntBits(1.33333f)));
Assert.assertFalse(FloatUtil.isLikelyFloat(0));
Assert.assertFalse(FloatUtil.isLikelyFloat(1));
Assert.assertFalse(FloatUtil.isLikelyFloat(10));
Assert.assertFalse(FloatUtil.isLikelyFloat(100));
Assert.assertFalse(FloatUtil.isLikelyFloat(1000));
Assert.assertFalse(FloatUtil.isLikelyFloat(1024));
Assert.assertFalse(FloatUtil.isLikelyFloat(1234));
Assert.assertFalse(FloatUtil.isLikelyFloat(-5));
Assert.assertFalse(FloatUtil.isLikelyFloat(-13));
Assert.assertFalse(FloatUtil.isLikelyFloat(-123));
Assert.assertFalse(FloatUtil.isLikelyFloat(20000000));
Assert.assertFalse(FloatUtil.isLikelyFloat(2000000000));
Assert.assertFalse(FloatUtil.isLikelyFloat(-2000000000));
Assert.assertFalse(FloatUtil.isLikelyFloat(Integer.MAX_VALUE));
Assert.assertFalse(FloatUtil.isLikelyFloat(Integer.MIN_VALUE));
Assert.assertFalse(FloatUtil.isLikelyFloat(Short.MIN_VALUE));
Assert.assertFalse(FloatUtil.isLikelyFloat(Short.MAX_VALUE));
}