What is a standard way to compare float with zero? - c++

May question is: What is a standard way to compare float with zero?
As far as I know direct comparison:
if ( x == 0 ) {
// x is zero?
} else {
// x is not zero??
can fail with floating points variables.
I used to use
float x = ...
...
if ( std::abs(x) <= 1e-7f ) {
// x is zero, do the job1
} else {
// x is not zero, do the job2
...
Same approach I find here. But I see two problems:
Random magic number 1e-7f ( or 0.00005 at the link above ).
The code harder to read
This is such a common comparison, I wonder whether there is a standard short way to do this. Like
x.is_zero();

To compare a floating-point value with 0, just compare it:
if (f == 0)
// whatever
There is nothing wrong with this comparison. If it doesn't do what you expect it's because the value of f is not what you thought it was. Its essentially the same problem as this:
int i = 1/3;
i *= 3;
if (i == 1)
// whatever
There's nothing wrong with that comparison, but the value of i is not 1. Almost all programmers understand the loss of precision with integer values; many don't understand it with floating-point values.
Using "nearly equal" instead of == is an advanced technique; it often leads to unexpected problems. For example, it is not transitive; that is, a nearly equals b and b nearly equals c does not mean that a nearly equals c.

There is no standard way, because whether or not you want to treat a small number as if it were zero depends on how you computed the number and what it's for. This in turn depends on the expected size of any errors introduced by your computations, and perhaps on errors of physical measurement that determined your original inputs.
For example, suppose that your value represents the length of a journey in miles in some mapping software. Then you are happy to treat 1e-7 as equal to zero because in that context it is a very small number: it has come about because of a rounding error or other reason for slight inexactness.
On the other hand, suppose that your value represents the size of a molecule in metres in some electron microscopy software. Then you certainly don't want to treat 1e-7 as equal to zero because in that context it's a very large number.
You should first consider what would be a suitable accuracy to present your value: what's the error bar, or how many significant figures can you reasonably display. This will give you some idea with what tolerance it would be appropriate to test against zero, although it still might not settle the case. For the mapping software, you can probably treat a journey as zero if it's less than some fixed value, although the value itself might depend on the resolution of your maps. For the microscopy software, if the difference between two sizes is such that zero lies with the 95% error range on those measurements, that still might not be sufficient to describe them as being the same size.

I don't know whether my answer useful, I've found this in irrlicht's irrmath.h and still using it in engine's mathlib till nowdays:
const float ROUNDING_ERROR_f32 = 0.000001f;
//! returns if a equals b, taking possible rounding errors into account
inline bool equals(const float a, const float b, const float tolerance = ROUNDING_ERROR_f32)
{
return (a + tolerance >= b) && (a - tolerance <= b);
}
The author has explained this approach by "after many rotations, which are trigonometric operations the coordinate spoils and the direct comparsion may cause fault".

Related

Compensating for double/float inaccuracy

I've written a math calculator that takes in a string from the user and parses it. It uses doubles to hold all values involved when calculating. Once solved, I then print it, and use std::setprecision() to make sure it is output correctly (for instance 0.9999999 will become 1 on the print out.
Returning the string that will be output:
//return true or false if this is in the returnstring.
if (returnString.compare("True") == 0 || returnString.compare("False") == 0) return returnString;
//create stringstream and put the answer into the returnString.
std::stringstream stream;
returnString = std::to_string(temp_answer.answer);
//write into the stream with precision set correctly.
stream << std::fixed << std::setprecision(5) << temp_answer.answer;
return stream.str();
I am aware of the accuracy issues when using doubles and floats. Today I started working on code so that the user can compare the two mathematical strings. For instance, 1=1 will evaluate to true, 2>3 false...etc. This works by running my math expression parser for each side of the comparison operator, then comparing the answers.
The issue i'm facing right now is when the user enters something like 1/3*3=1. Of course because i'm using doubles the parser will return 0.999999as the answer. Usually when just solving a non-comparison problem this is compensated for at printing time with std::setprecision() as mentioned before. However, when comparing two doubles it's going to return false as 0.99999!=1. How can I get it so when comparing the doubles this inaccuracy is compensated for, and the answer returned correctly? Here's the code that I use to compare the numbers themselves.
bool math_comparisons::equal_to(std::string lhs, std::string rhs)
{
auto lhs_ret = std::async(process_question, lhs);
auto rhs_ret = std::async(process_question, rhs);
bool ReturnVal = false;
if (lhs_ret.get().answer == rhs_ret.get().answer)
{
ReturnVal = true;
}
return ReturnVal;
}
I'm thinking some kind of rounding needs to occur, but i'm not 100% sure how to accomplish it properly. Please forgive me if this has already been addressed - I couldn't find much with a search. Thanks!
Assuming that answer is a double, replace this
lhs_ret.get().answer == rhs_ret.get().answer
with
abs(lhs_ret.get().answer - rhs_ret.get().answer) < TOL
where TOL is an appropriate tolerance value.
Floating point numbers should never be compared with == but by checking if the absolute difference is less than a given tolerance.
There is one difficulty that needs to be mentioned: The accuracy of doubles is about 16 decimals. So you might set TOL=1.0e-16. This will only work if your numbers are less than 1. For a number with 16 digits, it means that the tolerance has to be as large as 1.
So either you assume that your numbers are smaller than say 10e8 and use a relatively large tolerance like 10e-8 or you need to do something much more complicated.
First consider:
As a basic rule of thumb, a double will be a value with roughly 16dp where dp is decimal places, or 1.0e-16. You need to be aware that this only applies to numbers that are less than one(1) IE for 10.n you'll have to operate around that fact you can only have 15dp EG: 10.0e-15 and so on... Due to computers counting in base 2, and people counting in base 10 some values can never be properly expressed in the bit ranges that "most" modern OS use.
This can be highlighted by the fact that expressing 0.1 in binary or base 2 is infinitely long.
Therefore you should never compare a rational number via the == operator. Instead what the "go to" solution conventionally used is:
You implement a "close enough" solution. IE: you define epsilon as a value eg: epsilon = 0.000001 and you state that if (value a - value b) < epsilon == true. What we are saying is that if a - b is within e, for all intents and purposes for our program, its close enough to be regarded as true.
Now for choosing a value for epsilon, that all depends on how accurate you need to be for your purposes. For example, one can assume you need a high level of accuracy for structural engineering compared to a 2D side scrolling platform game.
The solution in your case you be to replace line 7 of you code:
(lhs_ret.get().answer == rhs_ret.get().answer)
with
abs(lhs_ret.get().answer - rhs_ret.get().answer) < epsilon where abs is the absolute value. IE ignoring the sign of the value lhs.
For more context i highly recommend this lecture on MIT open courseware which explains it in an easy to digest manner.
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-00sc-introduction-to-computer-science-and-programming-spring-2011/unit-1/lecture-7-debugging/

Prevent underflow when adding two logarithms

I am using the addition in log space equation described on the Wikipedia log probability article, but I am getting underflow when computing the exp of very large, negative, logarithms. As a result, my program crashes.
Example inputs are a = -2 and b = -1033.4391885529124.
My code, implemented straight from the Wikipedia article, looks like this:
double log_sum(double a, double b)
{
double min_ab = std::min(a, b);
a = std::max(a, b);
b = min_ab;
if (isinf(a) && isinf(b)) {
return -std::numeric_limits<double>::infinity();
} else if (isinf(a)) {
return b;
} else if (isinf(b)) {
return a;
} else {
return a + log2(1 + exp2(b - a));
}
}
I've come up with the following ideas, but can't decide which is best:
Check for out-of-range inputs before evaluation.
Disable (somehow) the exception, and flush or clamp the output after evaluation
Implement custom log and exp functions that do not throw exceptions and automatically flush or clamp the results.
Some other ways?
Additionally, I'd be interested to know what effect the choice of the logarithm base has on the computation. I chose base two because I believed that other log bases would be calculated from log_n(x) = log_2(x) / log_2(n), and would suffer from precision loss due to the division. Is that correct?
According to http://en.cppreference.com/w/cpp/numeric/math/exp:
For IEEE-compatible type double, overflow is guaranteed if 709.8 < arg, and underflow is guaranteed if arg < -708.4
So you can't prevent an underflow. However:
If a range error occurs due to underflow, the correct result (after rounding) is returned.
So there shouldn't be any program crash - "just" a loss of precision.
However, notice that
1 + exp(n)
will loose precision much sooner, i.e. already at n = -53. This is because the next representable number after 1.0 is 1.0 + 2^-52.
So loss of precision due to exp is far less than the precision lost when adding 1.0 + exp(...)
The problem here is accurately computing the expression log(1+exp(x)) without intermediate under/overflow. Fortunately, Martin Maechler (one of the R core developers) details how to do it in section 3 of this vignette.
He uses natural base functions: it should be possible to translate it to base-2 by appropriately scaling the functions, but it uses the log1p function in one part, and I'm not aware of any math library which supplies a base-2 variant.
The choice of base is unlikely to have any effect on accuracy (or performance), and most reasonable math libraries are able to give sub 1-ulp guarantees for both functions (i.e. you will have one of the two floating point values closest to the exact answer). A pretty common approach is to break up the floating point number into its base-2 exponent k and significand 1+f, such that 1/sqrt(2) < 1+f < sqrt(2), and then use a polynomial approximation to compute log(1+f): due to some mathematical quirks (basically, the fact that the 2nd term of the Taylor series can be represented exactly) it turns out to be more accurate to do this in the natural base rather than base-2, so a typical implementation will look like:
log(x) = k*log2 + p(f)
log2(x) = k + p(f)*invlog2
(e.g. see log and log2 in openlibm), so there is no real benefit to using one over the other.

Validity of checking exact value of preset double

I sometimes do the following in my code when I am being lazy:
double d = 0;
...
if( condition1 ) d = 1.0;
...
if (d == 0) { //not sure if this is valid
}
can I always count on the fact that if I set a double to be 0 I can later check in the code for it with d == 0 reliably?
I understand that if I do computations on d in the above code, I can never expect d to be exactly 0 but if I set it, I would think I can. All my testing so far on various systems seem to indicate that this is legitimate.
Yes, for most cases (i.e. excluding NaN): After the assignment a = b; the condition a == b is true. So as long as you're only assigning and comparing to the thing you assign from, this should be reliable, even for floating-point types.
I would avoid it. While the comparison would almost certainly work in the example you gave, it could fall apart unexpectedly when you use values that cannot be exactly expressed using a floating point variable. Where a value cannot be exactly expressed in a floating point variable some precision may be lost when copying from the coprocessor to memory, which would not match a value that stayed in the coprocessor the whole time.
My preference would be to add a range e.g. if (fabs(d)<0.0001) or keep the information in a boolean.
Edit: From wikipedia: "For example, the decimal number 0.1 is not representable in binary floating-point of any finite precision; the exact binary representation would have a "1100" sequence continuing endlessly:"
So if you had if (d==0.1) that has the potential to create a big mess - are you sure you will never have to change the code in this way?

Is floating-point == ever OK?

Just today I came across third-party software we're using and in their sample code there was something along these lines:
// Defined in somewhere.h
static const double BAR = 3.14;
// Code elsewhere.cpp
void foo(double d)
{
if (d == BAR)
...
}
I'm aware of the problem with floating-points and their representation, but it made me wonder if there are cases where float == float would be fine? I'm not asking for when it could work, but when it makes sense and works.
Also, what about a call like foo(BAR)? Will this always compare equal as they both use the same static const BAR?
Yes, you are guaranteed that whole numbers, including 0.0, compare with ==
Of course you have to be a little careful with how you got the whole number in the first place, assignment is safe but the result of any calculation is suspect
ps there are a set of real numbers that do have a perfect reproduction as a float (think of 1/2, 1/4 1/8 etc) but you probably don't know in advance that you have one of these.
Just to clarify. It is guaranteed by IEEE 754 that float representions of integers (whole numbers) within range, are exact.
float a=1.0;
float b=1.0;
a==b // true
But you have to be careful how you get the whole numbers
float a=1.0/3.0;
a*3.0 == 1.0 // not true !!
There are two ways to answer this question:
Are there cases where float == float gives the correct result?
Are there cases where float == float is acceptable coding?
The answer to (1) is: Yes, sometimes. But it's going to be fragile, which leads to the answer to (2): No. Don't do that. You're begging for bizarre bugs in the future.
As for a call of the form foo(BAR): In that particular case the comparison will return true, but when you are writing foo you don't know (and shouldn't depend on) how it is called. For example, calling foo(BAR) will be fine but foo(BAR * 2.0 / 2.0) (or even maybe foo(BAR * 1.0) depending on how much the compiler optimises things away) will break. You shouldn't be relying on the caller not performing any arithmetic!
Long story short, even though a == b will work in some cases you really shouldn't rely on it. Even if you can guarantee the calling semantics today maybe you won't be able to guarantee them next week so save yourself some pain and don't use ==.
To my mind, float == float is never* OK because it's pretty much unmaintainable.
*For small values of never.
The other answers explain quite well why using == for floating point numbers is dangerous. I just found one example that illustrates these dangers quite well, I believe.
On the x86 platform, you can get weird floating point results for some calculations, which are not due to rounding problems inherent to the calculations you perform. This simple C program will sometimes print "error":
#include <stdio.h>
void test(double x, double y)
{
const double y2 = x + 1.0;
if (y != y2)
printf("error\n");
}
void main()
{
const double x = .012;
const double y = x + 1.0;
test(x, y);
}
The program essentially just calculates
x = 0.012 + 1.0;
y = 0.012 + 1.0;
(only spread across two functions and with intermediate variables), but the comparison can still yield false!
The reason is that on the x86 platform, programs usually use the x87 FPU for floating point calculations. The x87 internally calculates with a higher precision than regular double, so double values need to be rounded when they are stored in memory. That means that a roundtrip x87 -> RAM -> x87 loses precision, and thus calculation results differ depending on whether intermediate results passed via RAM or whether they all stayed in FPU registers. This is of course a compiler decision, so the bug only manifests for certain compilers and optimization settings :-(.
For details see the GCC bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
Rather scary...
Additional note:
Bugs of this kind will generally be quite tricky to debug, because the different values become the same once they hit RAM.
So if for example you extend the above program to actually print out the bit patterns of y and y2 right after comparing them, you will get the exact same value. To print the value, it has to be loaded into RAM to be passed to some print function like printf, and that will make the difference disappear...
I'll provide more-or-less real example of legitimate, meaningful and useful testing for float equality.
#include <stdio.h>
#include <math.h>
/* let's try to numerically solve a simple equation F(x)=0 */
double F(double x) {
return 2 * cos(x) - pow(1.2, x);
}
/* a well-known, simple & slow but extremely smart method to do this */
double bisection(double range_start, double range_end) {
double a = range_start;
double d = range_end - range_start;
int counter = 0;
while (a != a + d) // <-- WHOA!!
{
d /= 2.0;
if (F(a) * F(a + d) > 0) /* test for same sign */
a = a + d;
++counter;
}
printf("%d iterations done\n", counter);
return a;
}
int main() {
/* we must be sure that the root can be found in [0.0, 2.0] */
printf("F(0.0)=%.17f, F(2.0)=%.17f\n", F(0.0), F(2.0));
double x = bisection(0.0, 2.0);
printf("the root is near %.17f, F(%.17f)=%.17f\n", x, x, F(x));
}
I'd rather not explain the bisection method used itself, but emphasize on the stopping condition. It has exactly the discussed form: (a == a+d) where both sides are floats: a is our current approximation of the equation's root, and d is our current precision. Given the precondition of the algorithm — that there must be a root between range_start and range_end — we guarantee on every iteration that the root stays between a and a+d while d is halved every step, shrinking the bounds.
And then, after a number of iterations, d becomes so small that during addition with a it gets rounded to zero! That is, a+d turns out to be closer to a then to any other float; and so the FPU rounds it to the closest representable value: to a itself. Calculation on a hypothetical machine can illustrate; let it have 4-digit decimal mantissa and some large exponent range. Then what result should the machine give to 2.131e+02 + 7.000e-3? The exact answer is 213.107, but our machine can't represent such number; it has to round it. And 213.107 is much closer to 213.1 than to 213.2 — so the rounded result becomes 2.131e+02 — the little summand vanished, rounded up to zero. Exactly the same is guaranteed to happen at some iteration of our algorithm — and at that point we can't continue anymore. We have found the root to maximum possible precision.
Addendum
No you can't just use "some small number" in the stopping condition. For any choice of the number, some inputs will deem your choice too large, causing loss of precision, and there will be inputs which will deem your choiсe too small, causing excess iterations or even entering infinite loop. Imagine that our F can change — and suddenly the solutions can be both huge 1.0042e+50 and tiny 1.0098e-70. Detailed discussion follows.
Calculus has no notion of a "small number": for any real number, you can find infinitely many even smaller ones. The problem is, among those "even smaller" ones might be a root of our equation. Even worse, some equations will have distinct roots (e.g. 2.51e-8 and 1.38e-8) — both of which will get approximated by the same answer if our stopping condition looks like d < 1e-6. Whichever "small number" you choose, many roots which would've been found correctly to the maximum precision with a == a+d — will get spoiled by the "epsilon" being too large.
It's true however that floats' exponent has finite limited range, so one actually can find the smallest nonzero positive FP number; in IEEE 754 single precision, it's the 1e-45 denorm. But it's useless! while (d >= 1e-45) {…} will loop forever with single-precision (positive nonzero) d.
At the same time, any choice of the "small number" in d < eps stopping condition will be too small for many equations. Where the root has high enough exponent, the result of subtraction of two neighboring mantissas will easily exceed our "epsilon". For example, 7.00023e+8 - 7.00022e+8 = 0.00001e+8 = 1.00000e+3 = 1000 — meaning that the smallest possible difference between numbers with exponent +8 and 6-digit mantissa is... 1000! It will never fit into, say, 1e-4. For numbers with relatively high exponent we simply have not enough precision to ever see a difference of 1e-4. This means eps = 1e-4 will be too small!
My implementation above took this last problem into account; you can see that d is halved each step — instead of getting recalculated as difference of (possibly huge in exponent) a and b. For reals, it doesn't matter; for floats it does! The algorithm will get into infinite loops with (b-a) < eps on equations with huge enough roots. The previous paragraph shows why. d < eps won't get stuck, but even then — needless iterations will be performed during shrinking d way down below the precision of a — still showing the choice of eps as too small. But a == a+d will stop exactly at precision.
Thus as shown: any choice of eps in while (d < eps) {…} will be both too large and too small, if we allow F to vary.
... This kind of reasoning may seem overly theoretical and needlessly deep, but it's to illustrate again the trickiness of floats. One should be aware of their finite precision when writing arithmetic operators around.
Perfect for integral values even in floating point formats
But the short answer is: "No, don't use ==."
Ironically, the floating point format works "perfectly", i.e., with exact precision, when operating on integral values within the range of the format. This means that you if you stick with double values, you get perfectly good integers with a little more than 50 bits, giving you about +- 4,500,000,000,000,000, or 4.5 quadrillion.
In fact, this is how JavaScript works internally, and it's why JavaScript can do things like + and - on really big numbers, but can only << and >> on 32-bit ones.
Strictly speaking, you can exactly compare sums and products of numbers with precise representations. Those would be all the integers, plus fractions composed of 1 / 2n terms. So, a loop incrementing by n + 0.25, n + 0.50, or n + 0.75 would be fine, but not any of the other 96 decimal fractions with 2 digits.
So the answer is: while exact equality can in theory make sense in narrow cases, it is best avoided.
The only case where I ever use == (or !=) for floats is in the following:
if (x != x)
{
// Here x is guaranteed to be Not a Number
}
and I must admit I am guilty of using Not A Number as a magic floating point constant (using numeric_limits<double>::quiet_NaN() in C++).
There is no point in comparing floating point numbers for strict equality. Floating point numbers have been designed with predictable relative accuracy limits. You are responsible for knowing what precision to expect from them and your algorithms.
It's probably ok if you're never going to calculate the value before you compare it. If you are testing if a floating point number is exactly pi, or -1, or 1 and you know that's the limited values being passed in...
I also used it a few times when rewriting few algorithms to multithreaded versions. I used a test that compared results for single- and multithreaded version to be sure, that both of them give exactly the same result.
Let's say you have a function that scales an array of floats by a constant factor:
void scale(float factor, float *vector, int extent) {
int i;
for (i = 0; i < extent; ++i) {
vector[i] *= factor;
}
}
I'll assume that your floating point implementation can represent 1.0 and 0.0 exactly, and that 0.0 is represented by all 0 bits.
If factor is exactly 1.0 then this function is a no-op, and you can return without doing any work. If factor is exactly 0.0 then this can be implemented with a call to memset, which will likely be faster than performing the floating point multiplications individually.
The reference implementation of BLAS functions at netlib uses such techniques extensively.
In my opinion, comparing for equality (or some equivalence) is a requirement in most situations: standard C++ containers or algorithms with an implied equality comparison functor, like std::unordered_set for example, requires that this comparator be an equivalence relation (see C++ named requirements: UnorderedAssociativeContainer).
Unfortunately, comparing with an epsilon as in abs(a - b) < epsilon does not yield an equivalence relation since it loses transitivity. This is most probably undefined behavior, specifically two 'almost equal' floating point numbers could yield different hashes; this can put the unordered_set in an invalid state.
Personally, I would use == for floating points most of the time, unless any kind of FPU computation would be involved on any operands. With containers and container algorithms, where only read/writes are involved, == (or any equivalence relation) is the safest.
abs(a - b) < epsilon is more or less a convergence criteria similar to a limit. I find this relation useful if I need to verify that a mathematical identity holds between two computations (for example PV = nRT, or distance = time * speed).
In short, use == if and only if no floating point computation occur;
never use abs(a-b) < e as an equality predicate;
Yes. 1/x will be valid unless x==0. You don't need an imprecise test here. 1/0.00000001 is perfectly fine. I can't think of any other case - you can't even check tan(x) for x==PI/2
The other posts show where it is appropriate. I think using bit-exact compares to avoid needless calculation is also okay..
Example:
float someFunction (float argument)
{
// I really want bit-exact comparison here!
if (argument != lastargument)
{
lastargument = argument;
cachedValue = very_expensive_calculation (argument);
}
return cachedValue;
}
I would say that comparing floats for equality would be OK if a false-negative answer is acceptable.
Assume for example, that you have a program that prints out floating points values to the screen and that if the floating point value happens to be exactly equal to M_PI, then you would like it to print out "pi" instead. If the value happens to deviate a tiny bit from the exact double representation of M_PI, it will print out a double value instead, which is equally valid, but a little less readable to the user.
I have a drawing program that fundamentally uses a floating point for its coordinate system since the user is allowed to work at any granularity/zoom. The thing they are drawing contains lines that can be bent at points created by them. When they drag one point on top of another they're merged.
In order to do "proper" floating point comparison I'd have to come up with some range within which to consider the points the same. Since the user can zoom in to infinity and work within that range and since I couldn't get anyone to commit to some sort of range, we just use '==' to see if the points are the same. Occasionally there'll be an issue where points that are supposed to be exactly the same are off by .000000000001 or something (especially around 0,0) but usually it works just fine. It's supposed to be hard to merge points without the snap turned on anyway...or at least that's how the original version worked.
It throws of the testing group occasionally but that's their problem :p
So anyway, there's an example of a possibly reasonable time to use '=='. The thing to note is that the decision is less about technical accuracy than about client wishes (or lack thereof) and convenience. It's not something that needs to be all that accurate anyway. So what if two points won't merge when you expect them to? It's not the end of the world and won't effect 'calculations'.

Hash function for floats

I'm currently implementing a hash table in C++ and I'm trying to make a hash function for floats...
I was going to treat floats as integers by padding the decimal numbers, but then I realized that I would probably reach the overflow with big numbers...
Is there a good way to hash floats?
You don't have to give me the function directly, but I'd like to see/understand different concepts...
Notes:
I don't need it to be really fast, just evenly distributed if possible.
I've read that floats should not be hashed because of the speed of computation, can someone confirm/explain this and give me other reasons why floats should not be hashed? I don't really understand why (besides the speed)
It depends on the application but most of time floats should not be hashed because hashing is used for fast lookup for exact matches and most floats are the result of calculations that produce a float which is only an approximation to the correct answer. The usually way to check for floating equality is to check if it is within some delta (in absolute value) of the correct answer. This type of check does not lend itself to hashed lookup tables.
EDIT:
Normally, because of rounding errors and inherent limitations of floating point arithmetic, if you expect that floating point numbers a and b should be equal to each other because the math says so, you need to pick some relatively small delta > 0, and then you declare a and b to be equal if abs(a-b) < delta, where abs is the absolute value function. For more detail, see this article.
Here is a small example that demonstrates the problem:
float x = 1.0f;
x = x / 41;
x = x * 41;
if (x != 1.0f)
{
std::cout << "ooops...\n";
}
Depending on your platform, compiler and optimization levels, this may print ooops... to your screen, meaning that the mathematical equation x / y * y = x does not necessarily hold on your computer.
There are cases where floating point arithmetic produces exact results, e.g. reasonably sized integers and rationals with power-of-2 denominators.
If your hash function did the following you'd get some degree of fuzziness on the hash lookup
unsigned int Hash( float f )
{
unsigned int ui;
memcpy( &ui, &f, sizeof( float ) );
return ui & 0xfffff000;
}
This way you'll mask off the 12 least significant bits allowing for a degree of uncertainty ... It really depends on yout application however.
You can use the std hash, it's not bad:
std::size_t myHash = std::cout << std::hash<float>{}(myFloat);
unsigned hash(float x)
{
union
{
float f;
unsigned u;
};
f = x;
return u;
}
Technically undefined behavior, but most compilers support this. Alternative solution:
unsigned hash(float x)
{
return (unsigned&)x;
}
Both solutions depend on the endianness of your machine, so for example on x86 and SPARC, they will produce different results. If that doesn't bother you, just use one of these solutions.
You can of course represent a float as an int type of the same size to hash it, however this naive approach has some pitfalls you need to be careful of...
Simply converting to a binary representation is error prone since values which are equal wont necessarily have the same binary representation.
An obvious case: -0.0 wont match 0.0 for example. *
Further, simply converting to an int of the same size wont give very even distribution, which is often important (implementing a hash/set that uses buckets for example).
Suggested steps for implementation:
filter out non-finite cases (nan, inf) and (0.0, -0.0 whether you need to do this explicitly or not depends on the method used).
convert to an int of the same size(that is - use a union for example to represent the float as an int, not simply cast to an int).
re-distribute the bits, (intentionally vague here!), this is basically a speed vs quality tradeoff. But if you have many values in a small range you probably don't want them to in a similar range too.
*: You may wan't to check for (nan and -nan) too. How to handle those exactly depends on your use case (you may want to ignore sign for all nan's as CPython does).
Python's _Py_HashDouble is a good reference for how you might hash a float, in production code (ignore the -1 check at the end, since that's a special value for Python).
If you're interested, I just made a hash function that uses floating point and can hash floats. It also passes SMHasher ( which is the main bias-test for non-crypto hash functions ). It's a lot slower than normal non-cryptographic hash functions due to the float calculations.
I'm not sure if tifuhash will become useful for all applications, but it's interesting to see a simple floating point function pass both PractRand and SMHasher.
The main state update function is very simple, and looks like:
function q( state, val, numerator, denominator ) {
// Continued Fraction mixed with Egyptian fraction "Continued Egyptian Fraction"
// with denominator = val + pos / state[1]
state[0] += numerator / denominator;
state[0] = 1.0 / state[0];
// Standard Continued Fraction with a_i = val, b_i = (a_i-1) + i + 1
state[1] += val;
state[1] = numerator / state[1];
}
Anyway, you can get it on npm
Or you can check out the github
Using is simple:
const tifu = require('tifuhash');
const message = 'The medium is the message.';
const number = 333333333;
const float = Math.PI;
console.log( tifu.hash( message ),
tifu.hash( number ),
tifu.hash( float ),
tifu.hash( ) );
There's a demo of some hashes on runkit here https://runkit.com/593a239c56ebfd0012d15fc9/593e4d7014d66100120ecdb9
Side note: I think that in future using floating point,possibly big arrays of floating point calculations, could be a useful way to make more computationally-demanding hash functions in future. A weird side effect I discovered of using floating point is that the hashes are target dependent, and I surmise maybe they could be use to fingerprint the platforms they were calculated on.
Because of the IEEE byte ordering the Java Float.hashCode() and Double.hashCode() do not give good results. This problem is wellknown and can be adressed by this scrambler:
class HashScrambler {
/**
* https://sites.google.com/site/murmurhash/
*/
static int murmur(int x) {
x ^= x >> 13;
x *= 0x5bd1e995;
return x ^ (x >> 15);
}
}
You then get a good hash function, which also allows you to use Float and Double in hash tables. But you need to write your own hash table that allows a custom hash function.
Since in a hash table you need also test for equality, you need an exact equality to make it work. Maybe the later is what President James K. Polk intends to adress?