What is an efficient method to calculate the integer part of the base 2 logarithm of a floating point number? Something like
N = ceil( log2( f ))
or
N = floor( log2( f ))
for floating point f. I guess this is possible to realize very efficiently somehow as one probably only needs access to the floating point exponent.
EDIT2: I am not primarily interested in exactness. I could tolerate an error of ±1. I listed the two variants just as an example because one might be computationally cheaper than the other (but I don't know which).
I need this for accuracy control of an algorithm where the parameter f is some tolerance and the log is needed to control the number of terms. Accurate calculation of the log is not important.
EDIT: this is not a duplicate of the many other questions asking for the log2 of an integer argument (e.g. How to do an integer log2() in C++?). This question is about a floating point argument, which is a completely different story. Specifically, I need it for f < 1, which the integer methods cannot handle at all.
The standard library function frexp does exactly that: it decomposes a double into an integer exponent and a normalized mantissa.
If you are content with the floor of the logarithm, rather than rounding the logarithm to the nearest integer, you are probably better off with the newer standard library function ilogb.
Note that these two functions treat zeros and infinities differently, so they are not quite interchangeable.
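For illustration, a minimal sketch of how the two standard functions behave (plain <cmath> calls, nothing else assumed):

#include <cmath>
#include <cstdio>

int main() {
    int exp;
    double mant = std::frexp(0.125, &exp);          // 0.125 = 0.5 * 2^-2
    std::printf("frexp: %g * 2^%d\n", mant, exp);
    std::printf("ilogb: %d\n", std::ilogb(0.125));  // -3, i.e. floor(log2(0.125))
    return 0;
}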
Inspired by rici, who pointed me to frexp, I think I found the answer. In C99 and recent C++ we have the function ilogb which does exactly what I need
int ilogb( float arg );
int ilogb( double arg );
and is equivalent to
(int)logb( arg )
It returns one less than the exponent produced by frexp. So the frexp exponent corresponds to
floor(log2(arg)) + 1
and ilogb(arg) to
floor(log2(arg))
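A small check of that relation for a tolerance-like f < 1 (the value 1e-6 is just an example, not from the original question):

#include <cmath>
#include <cstdio>

int main() {
    double f = 1e-6;
    int e_frexp;
    std::frexp(f, &e_frexp);
    std::printf("ilogb(f)       = %d\n", std::ilogb(f));                  // -20
    std::printf("frexp exponent = %d\n", e_frexp);                        // -19
    std::printf("floor(log2(f)) = %d\n", (int)std::floor(std::log2(f)));  // -20
    return 0;
}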
This is a horrible hack which extracts the exponent field directly from a little-endian float's bit pattern; I make no guarantees about portability, strict aliasing, etc.
#include <stdio.h>

int main(void) {
    float f;
    unsigned i;
    unsigned *ip = (unsigned *)&f;  /* type punning; ignores strict aliasing */

    printf("Enter a float: ");
    scanf("%f", &f);

    i = *ip;
    i = (i >> 23) & 0xFF;  /* extract the 8-bit biased exponent field */
    i -= 127;              /* remove the bias; wraps as unsigned, the (int) cast below recovers the negative value */

    printf("%f %d\n", f, (int)i);
    return 0;
}
Program output:
Enter a float: 0.125
0.125000 -3
Related
Today in my C++ programming lessons, my prof told me that one should never compare two floating point values directly.
So I tried this piece of code and found out the reason for his statement.
double l_Value=94.9;
printf("%.20lf", l_Value);
And I found the result to be 94.89999999... (some relative error).
I understand that floating point numbers are not stored in the way one writes them in the code. Squeezing them into a binary form of ones and zeros involves some relative rounding error.
I am looking for solutions to two problems.
1. Efficient way to compare two floating values.
2. How to add a floating value to another one. Example. Add 0.1111 to 94.4345 to get the exact value as 94.5456
Thanks in advance.
Efficient way to compare two floating values.
A simple double a,b; if (a == b) is an efficient way to compare two floating values. Yet as OP noticed, this may not meet the overall coding goal. Better ways depend on the context of the compare, something not supplied by OP. See far below.
How to add a floating value to another one. Example. Add 0.1111 to 94.4345 to get the exact value as 94.5456
Floating point values as source code text have effectively unlimited range and precision, such as 1.23456789012345678901234567890e1234567. Conversion of this text to a double is typically limited to one of 2^64 different values. The closest is selected, but that may not be an exact match.
None of 0.1111, 94.4345, 94.5456 can be represented exactly as a typical double.
OP has choices:
1) Use a type other than double or float. Various libraries offer decimal floating point types.
2) Limit the code to the rare platforms where double uses a base 10 representation, i.e. FLT_RADIX == 10.
3) Write your own code to parse user input like "0.1111" into a structure/string and perform the needed operations on that.
4) Treat user input as strings and convert to some integer type, again with supporting routines to read/compute/write.
5) Accept that floating point operations are not mathematically exact and handle round-off error.
double a = 0.1111;
printf("a: %.*e\n", DBL_DECIMAL_DIG -1 , a);
double b = 94.4345;
printf("b: %.*e\n", DBL_DECIMAL_DIG -1 , b);
double sum = a + b;
printf("sum: %.*e\n", DBL_DECIMAL_DIG -1 , sum);
printf("%.4f\n", sum);
Output
a: 1.1110000000000000e-01
b: 9.4434500000000000e+01
sum: 9.4545599999999993e+01
94.5456 // Desired textual output based on a rounded `sum` to the nearest 0.0001
More on #1
If an exact compare is not sought but some sort of "are the two values close enough?", a definition of "close enough" is needed - of which there are many.
The following "close enough" compares the distance by examining the ULP of the two numbers. It is a linear difference when the values are in the same power-of-two and becomes logarithmic other wise. Of course, change of sign is an issue.
float example:
Consider all finite float values ordered from most negative to most positive. The following somewhat-portable code returns an integer for each float, preserving that same order.
#include <assert.h>
#include <stdint.h>

uint32_t sequence_f(float x) {
    union {
        float f;
        uint32_t u32;
    } u;
    assert(sizeof(float) == sizeof(uint32_t));
    u.f = x;
    if (u.u32 & 0x80000000) {
        u.u32 ^= 0x80000000;
        return 0x80000000 - u.u32;
    }
    return u.u32;
}
Now, to determine whether two float values are "close enough", simply compare the two integers.
#include <stdbool.h>

static bool close_enough(float x, float y, uint32_t ULP_delta) {
    uint32_t ullx = sequence_f(x);
    uint32_t ully = sequence_f(y);
    if (ullx > ully) return (ullx - ully) <= ULP_delta;
    return (ully - ullx) <= ULP_delta;
}
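For what it's worth, a hypothetical usage sketch of close_enough(), assuming the two functions above are in scope; the second value is exactly one representable step away from the first:

#include <cmath>
#include <cstdio>

int main() {
    float a = 1.0f;
    float b = std::nextafterf(a, 2.0f);            // the very next representable float
    std::printf("%d\n", close_enough(a, b, 1));    // 1: within one ULP
    std::printf("%d\n", close_enough(a, b, 0));    // 0: not the same bit pattern
    return 0;
}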
The way I've usually done this is to have a custom equality comparison function. The basic idea is that you have a certain tolerance, say 0.0001 or something. Then you subtract your two numbers and take the absolute value, and if it is less than your tolerance you treat them as equal. There are other strategies that may be more appropriate for certain situations, of course.
Define for yourself a tolerance level e (for example, e=.0001) and check if abs(a-b) <= e
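A minimal sketch of that idea (the epsilon value here is only an example, not a universal choice):

#include <cmath>

bool nearly_equal(double a, double b, double eps = 1e-4) {
    return std::fabs(a - b) <= eps;   // "close enough" within the chosen tolerance
}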
You aren't going to get an "exact" value with floating point. Ever. If you know in advance that you are using four decimals, and you want "exact", then you need to internally treat your numbers as integers and only display them as decimals. 944345 + 1111 = 945456
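A small sketch of that fixed-point idea, scaling by 10^4 so four decimal places stay exact (the scale factor is the assumption here):

#include <cstdio>

int main() {
    long a = 944345;     // 94.4345 stored as an integer count of 1/10000ths
    long b = 1111;       // 0.1111
    long sum = a + b;    // 945456, exact integer arithmetic
    std::printf("%ld.%04ld\n", sum / 10000, sum % 10000);  // prints 94.5456
    return 0;
}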
So I've been looking at IEEE754 floating point double. (My C++ compiler uses that type for a double).
Consider this snippet:
// 9007199254740992 is the 53rd power of 2.
// 590295810358705700000 is approximately the 69th power of 2.
for (double f = 9007199254740992; f <= 590295810358705700000; ++f){
/* what is f?*/
}
Presumably f increments in even steps up to the 54th power of 2, due to rounding up?
Then after that, nothing happens due to rounding down?
Is that correct? Is it even well-defined?
++f is essentially the same as f = f + 1, ignoring the fact that ++f is an expression that yields a value.
Now, for floating point values, the issue of representability comes into play. It may be that f + 1 is not representable. In which case, f + 1 will evaluate to the nearest representable value to the true value of f + 1. In case there are two equally near candidates for nearest representable value, round to even is used.
This is covered in the Operations section of What Every Computer Scientist Should Know About Floating-Point Arithmetic:
The IEEE standard requires that the result of addition, subtraction, multiplication and division be exactly rounded. That is, the result must be computed exactly and then rounded to the nearest floating-point number (using round to even).
So, in your example, for sufficiently large values of f, you will find that f == f + 1.
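A quick way to see this, assuming IEEE 754 binary64 doubles:

#include <cstdio>

int main() {
    double f = 9007199254740992.0;      // 2^53
    std::printf("%d\n", f == f + 1.0);  // prints 1: f + 1 rounds back to f
    std::printf("%.0f\n", f + 1.0);     // prints 9007199254740992
    return 0;
}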
Yes, this loop will never end, because of a rounding problem. I hope the reason is clear to you (since you are familiar with https://en.wikipedia.org/wiki/IEEE_floating_point), but let me describe it briefly for the impatient audience.
We can think of floating point as a special representation of a number imposed by the compiler/FPU/standard. As a simple example, let's review:
20000
2e4
0.2e5
All three forms represent the same number. The last two are called "scientific" form, but which is the best? IEEE 754 answers: the last one, because we can save space by omitting the leading 0 and just writing .2e5. This decimal analogy is very close to the binary representation, where there is space for a mantissa (.2) and an exponent (5).
Now let's do the same for 20000.00000000001
0.2000000000000001e5
As we can see, the mantissa grows, and there is a limit at which the fixed amount of memory for it would overflow. Instead of raising an exception, we sacrifice precision, which (just as an example) gives us 0.2e5 again.
For bigger numbers (as in the question) we lose precision too.
9007199254740992 may be represented as roughly 0.9e16, and when 1 is added, nothing happens.
So f = f + 1 creates an infinite loop.
Since f++ is the same as f = f + 1, as pointed out in the comments, and as I tested myself, f == f + 1 (!!) holds for sufficiently large f, depending on the platform. An explanation is here (for small numbers, but the principle is the same): http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BinMath/addFloat.html
Here's how to add floating point numbers.
First, convert the two representations to scientific notation. Thus, we
explicitly represent the hidden 1. In order to add, we need the
exponents of the two numbers to be the same. We do this by rewriting
Y. This will result in Y being not normalized, but its value is equivalent
to the normalized Y. Add x - y to Y's exponent. Shift the radix point
of the mantissa (significand) of Y left by x - y to compensate for the
change in exponent. Add the two mantissas of X and the adjusted Y
together. If the sum in the previous step does not have a single bit
of value 1 left of the radix point, then adjust the radix point and
exponent until it does. Convert back to the one-byte floating point
representation.
In the process of converting the number to the same exponent, due to precision, 1 is rounded to 0, and hence f == f + 1.
According to IEEE754, after the sum the number is rounded to match the double format, and due to the rounding operation, f==f+1.
I don't know whether there are problems where looping over large floating point values in increments of 1 is a meaningful solution, but people may be stumbling on this question looking for a workaround for their never-ending loop. Therefore, even though the question only asks how the addition is defined by the standard, I'll propose a workaround.
Indeed, for large values of f, f++ == f is true, and using that as the increment in a loop means the loop will never terminate.
Assuming it's acceptable for f to be incremented by the smallest representable step that actually increases it (which may be larger than 1) whenever f + 1 would otherwise equal f, the following workaround, in which the loop always terminates, could be OK:
// use a template, or overloads for different floating point types
#include <algorithm>
#include <cmath>
#include <limits>

template<class T>
T add_s(T l, T r) {
    T result = l + r;
    T greater = std::max(l, r);
    if (result == greater)
        return std::nextafter(greater, std::numeric_limits<T>::max());
    return result;
}
// ...
for (double f = /*...*/; f < /*...*/; f = add_s(f, 1.0))
That said, adding tiny floats to huge floats will result in an uncontrollable accumulation of errors. If that's not OK for you, then you need arbitrary precision math, not floating point.
When reading Lua's source code, I noticed that Lua uses a macro to round double values to 32-bit int values. The macro is defined in the llimits.h header file and reads as follows:
union i_cast { double d; int i[2]; };
#define double2int(i, d, t) \
{volatile union i_cast u; u.d = (d) + 6755399441055744.0; \
(i) = (t)u.i[ENDIANLOC];}
Here ENDIANLOC is defined according to endianness: 0 for little endian, 1 for big endian architectures; Lua carefully handles endianness. The t argument is substituted with an integer type like int or unsigned int.
I did a little research and found that there is a simpler format of that macro which uses the same technique:
#define double2int(i, d) \
{double t = ((d) + 6755399441055744.0); i = *((int *)(&t));}
Or, in a C++-style:
inline int double2int(double d)
{
d += 6755399441055744.0;
return reinterpret_cast<int&>(d);
}
This trick can work on any machine using IEEE 754 (which means pretty much every machine today). It works for both positive and negative numbers, and the rounding follows Banker’s Rule. (This is not surprising, since it follows IEEE 754.)
I wrote a little program to test it:
int main()
{
double d = -12345678.9;
int i;
double2int(i, d)
printf("%d\n", i);
return 0;
}
And it outputs -12345679, as expected.
I would like to understand how this tricky macro works in detail. The magic number 6755399441055744.0 is actually 2^51 + 2^52, or 1.5 × 2^52, and 1.5 in binary can be represented as 1.1. When any 32-bit integer is added to this magic number...
Well, I’m lost from here. How does this trick work?
Update
As #Mysticial points out, this method does not limit itself to a 32-bit int, it can also be expanded to a 64-bit int as long as the number is in the range of 2^52. (Although the macro needs some modification.)
Some materials say this method cannot be used in Direct3D.
When working with Microsoft assembler for x86, there is an even faster macro written in assembly code (the following is also extracted from Lua source):
#define double2int(i,n) __asm {__asm fld n __asm fistp i}
There is a similar magic number for single precision numbers: 1.5 × 2^23.
A value of the double floating-point type is represented like so:
[figure: IEEE 754 double layout, 1 sign bit, 11 exponent bits, 52 mantissa bits, drawn as two 32-bit halves]
and it can be seen as two 32-bit integers; now, the int taken in all the versions of your code (supposing it's a 32-bit int) is the one on the right in the figure, so what you are doing in the end is just taking the lowest 32 bits of the mantissa.
Now, to the magic number; as you correctly stated, 6755399441055744 is 2^51 + 2^52; adding such a number forces the double to go into the "sweet range" between 2^52 and 2^53, which, as explained by Wikipedia, has an interesting property:
Between 2^52 = 4,503,599,627,370,496 and 2^53 = 9,007,199,254,740,992, the representable numbers are exactly the integers.
This follows from the fact that the mantissa is 52 bits wide.
The other interesting fact about adding 2^51 + 2^52 is that it affects the mantissa only in the two highest bits, which are discarded anyway, since we are taking only its lowest 32 bits.
Last but not least: the sign.
IEEE 754 floating point uses a magnitude and sign representation, while integers on “normal” machines use 2’s complement arithmetic; how is this handled here?
We talked only about positive integers; now suppose we are dealing with a negative number in the range representable by a 32-bit int, so one whose absolute value is at most 2^31; call it −a. Such a number is obviously made positive by adding the magic number, and the resulting value is 2^52 + 2^51 + (−a).
Now, what do we get if we interpret the mantissa in 2's complement representation? It must be the result of the 2's complement sum of (2^52 + 2^51) and (−a). Again, the first term affects only the upper two bits; what remains in bits 0–50 is the 2's complement representation of (−a) (again, minus the upper two bits).
Since reduction of a 2’s complement number to a smaller width is done just by cutting away the extra bits on the left, taking the lower 32 bits gives us correctly (−a) in 32-bit, 2’s complement arithmetic.
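To make the mechanism concrete, here is a hedged sketch (the helper name double2int_sketch is mine, not Lua's); it assumes an IEEE 754 double whose byte order matches that of uint64_t, which holds on common platforms:

#include <cstdint>
#include <cstdio>
#include <cstring>

static std::int32_t double2int_sketch(double d) {
    d += 6755399441055744.0;               // 2^52 + 2^51: forces the exponent to 52
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // raw IEEE 754 bit pattern of the sum
    std::uint32_t low = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);  // low 32 mantissa bits
    std::int32_t result;
    std::memcpy(&result, &low, sizeof result);  // reinterpret as 2's complement
    return result;
}

int main() {
    std::printf("%d\n", double2int_sketch(-12345678.9));  // -12345679
    std::printf("%d\n", double2int_sketch(2.5));          // 2 (round half to even)
    return 0;
}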
This kind of "trick" comes from older x86 processors, using the 8087 instructions/interface for floating point. On these machines, there's an instruction for converting floating point to integer, "fist", but it uses the current FP rounding mode. Unfortunately, the C spec requires that fp->int conversions truncate towards zero, while all other FP operations round to nearest, so doing an
fp->int conversion requires first changing the fp rounding mode, then doing a fist, then restoring the fp rounding mode.
Now on the original 8086/8087, this wasn't too bad, but on later processors that started to get super-scalar and out-of-order execution, altering the fp rounding mode generally serializes the CPU core and is quite expensive. So on a CPU like a Pentium-III or Pentium-IV, this overall cost is quite high -- a normal fp->int conversion is 10x or more expensive than this add+store+load trick.
On x86-64, however, floating point is done with the xmm instructions, and the cost of converting
fp->int is pretty small, so this "optimization" is likely slower than a normal conversion.
If this helps with visualization, that Lua magic value (2^52 + 2^51, or in base 2: 110 followed by 50 zeros) is:
hex: 0x0018 0000 0000 0000 (0x18 followed by 12 hex zeros)
octal: 0 300 00000 00000 00000 (3 followed by 17 octal zeros)
Here is a simpler implementation of the above Lua trick:
/**
* Round to the nearest integer.
* for tie-breaks: round half to even (bankers' rounding)
* Only works for inputs in the range: [-2^51, 2^51]
*/
inline double rint(double d)
{
    double x = 6755399441055744.0; // 2^51 + 2^52
    return d + x - x;
}
The trick works for numbers with absolute value < 2 ^ 51.
This is a little program to test it: ideone.com
#include <cstdio>
int main()
{
    // round to nearest integer
    printf("%.1f, %.1f\n", rint(-12345678.3), rint(-12345678.9));
    // test tie-breaking rule
    printf("%.1f, %.1f, %.1f, %.1f\n", rint(-24.5), rint(-23.5), rint(23.5), rint(24.5));
    return 0;
}
// output:
// -12345678.0, -12345679.0
// -24.0, -24.0, 24.0, 24.0
Consider the following piece of code:
#include <iostream>
#include <cmath>
int main() {
    int i = 23;
    int j = 1;
    int base = 10;
    int k = 2;
    i += j * pow(base, k);
    std::cout << i << std::endl;
}
It outputs "122" instead of "123". Is it a bug in g++ 4.7.2 (MinGW, Windows XP)?
std::pow() works with floating point numbers, which do not have infinite precision, and probably the implementation of the Standard Library you are using implements pow() in a (poor) way that makes this lack of infinite precision become relevant.
However, you could easily define your own version that works with integers. In C++11, you can even make it constexpr (so that the result could be computed at compile-time when possible):
constexpr int int_pow(int b, int e)
{
return (e == 0) ? 1 : b * int_pow(b, e - 1);
}
Here is a live example.
Tail-recursive form (credits to Dan Nissenbaum):
constexpr int int_pow(int b, int e, int res = 1)
{
return (e == 0) ? res : int_pow(b, e - 1, b * res);
}
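For completeness, a sketch of plugging int_pow into the original snippet; the arithmetic stays entirely in integers, so the result is exactly 123:

#include <iostream>

constexpr int int_pow(int b, int e, int res = 1)
{
    return (e == 0) ? res : int_pow(b, e - 1, b * res);
}

int main() {
    int i = 23;
    int j = 1;
    int base = 10;
    int k = 2;
    i += j * int_pow(base, k);   // exact integer power, no rounding involved
    std::cout << i << std::endl; // 123
}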
All the other answers so far miss or dance around the one and only problem in the question:
The pow in your C++ implementation is poor quality. It returns an inaccurate answer when there is no need to.
Get a better C++ implementation, or at least replace the math functions in it. The one pointed to by Pascal Cuoq is good.
Not with mine at least:
$ g++ --version | head -1
g++ (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
$ ./a.out
123
IDEone is also running version 4.7.2 and gives 123.
Signatures of pow() from http://www.cplusplus.com/reference/cmath/pow/
double pow ( double base, double exponent );
long double pow ( long double base, long double exponent );
float pow ( float base, float exponent );
double pow ( double base, int exponent );
long double pow ( long double base, int exponent );
You should set double base = 10.0; and double i = 23.0.
If you simply write
#include <iostream>
#include <cmath>
int main() {
int i = 23;
int j = 1;
int base = 10;
int k = 2;
i += j * pow(base, k);
std::cout << i << std::endl;
}
what do you think pow is supposed to refer to? The C++ standard does not even guarantee that after including cmath you'll have a pow function at global scope.
Keep in mind that all the overloads are at least in the std namespace. There are pow functions that take an integer exponent and there are pow functions that take floating point exponents. It is quite possible that your C++ implementation only declares the C pow function at global scope. This function takes a floating point exponent. The thing is that this function is likely to have a couple of approximation and rounding errors. For example, one possible way of implementing that function is:
double pow(double base, double power)
{
return exp(log(base)*power);
}
It's quite possible that pow(10.0, 2.0) yields something like 99.99999999992543453265 due to rounding and approximation errors. Combined with the fact that floating point to integer conversion yields the number before the decimal point, this explains your result of 122, because 99 + 23 = 122.
Try using an overload of pow which takes an integer exponent and/or do some proper rounding from float to int. The overload taking an integer exponent might give you the exact result for 10 to the 2nd power.
Edit:
As you pointed out, trying to use the std::pow(double, int) overload also seems to yield a value slightly less than 100. I took the time to check the ISO standards and the libstdc++ implementation to see that starting with C++11 the overloads taking integer exponents have been dropped as a result of resolving defect report 550. Enabling C++0x/C++11 support actually removes the overloads in the libstdc++ implementation, which could explain why you did not see any improvement.
Anyhow, it is probably a bad idea to rely on the accuracy of such a function, especially if a conversion to integer is involved. A slight error towards zero will obviously make a big difference if you expect a floating point value that is an integer (like 100) and then convert it to an int-type value. So my suggestion would be to write your own pow function that takes all integers, or take special care with respect to the double->int conversion using your own rounding function, so that a slight error towards zero does not change the result.
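As one possible way to follow that suggestion, a small sketch that rounds the pow result before the integer conversion (std::lround from <cmath>):

#include <cmath>
#include <iostream>

int main() {
    int i = 23;
    int j = 1;
    int base = 10;
    int k = 2;
    // round to the nearest integer instead of truncating, so 99.999... becomes 100
    i += j * static_cast<int>(std::lround(std::pow(base, k)));
    std::cout << i << std::endl;   // 123 even with a slightly inaccurate pow()
    return 0;
}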
Your problem is not a bug in gcc, that's absolutely certain. It may be a bug in the implementation of pow, but I think your problem is really simply the fact that you are using pow which gives an imprecise floating point result (because it is implemented as something like exp(power * log(base)); and log(base) is never going to be absolutely accurate [unless base is a power of e].
I'm currently implementing a hash table in C++ and I'm trying to make a hash function for floats...
I was going to treat floats as integers by padding the decimal numbers, but then I realized that I would probably reach the overflow with big numbers...
Is there a good way to hash floats?
You don't have to give me the function directly, but I'd like to see/understand different concepts...
Notes:
I don't need it to be really fast, just evenly distributed if possible.
I've read that floats should not be hashed because of the speed of computation, can someone confirm/explain this and give me other reasons why floats should not be hashed? I don't really understand why (besides the speed)
It depends on the application, but most of the time floats should not be hashed, because hashing is used for fast lookup of exact matches, and most floats are the result of calculations that produce a value which is only an approximation to the correct answer. The usual way to check for floating point equality is to check whether it is within some delta (in absolute value) of the correct answer. This type of check does not lend itself to hashed lookup tables.
EDIT:
Normally, because of rounding errors and inherent limitations of floating point arithmetic, if you expect that floating point numbers a and b should be equal to each other because the math says so, you need to pick some relatively small delta > 0, and then you declare a and b to be equal if abs(a-b) < delta, where abs is the absolute value function. For more detail, see this article.
Here is a small example that demonstrates the problem:
float x = 1.0f;
x = x / 41;
x = x * 41;
if (x != 1.0f)
{
std::cout << "ooops...\n";
}
Depending on your platform, compiler and optimization levels, this may print ooops... to your screen, meaning that the mathematical equation x / y * y = x does not necessarily hold on your computer.
There are cases where floating point arithmetic produces exact results, e.g. reasonably sized integers and rationals with power-of-2 denominators.
If your hash function did the following you'd get some degree of fuzziness on the hash lookup
unsigned int Hash( float f )
{
    unsigned int ui;
    memcpy( &ui, &f, sizeof( float ) );
    return ui & 0xfffff000;
}
This way you'll mask off the 12 least significant bits, allowing for a degree of uncertainty... It really depends on your application, however.
You can use the std hash, it's not bad:
std::size_t myHash = std::hash<float>{}(myFloat);
unsigned hash(float x)
{
    union
    {
        float f;
        unsigned u;
    };
    f = x;
    return u;
}
Technically undefined behavior, but most compilers support this. Alternative solution:
unsigned hash(float x)
{
return (unsigned&)x;
}
Both solutions depend on the endianness of your machine, so for example on x86 and SPARC, they will produce different results. If that doesn't bother you, just use one of these solutions.
You can of course represent a float as an int type of the same size to hash it, however this naive approach has some pitfalls you need to be careful of...
Simply converting to a binary representation is error prone since values which are equal won't necessarily have the same binary representation.
An obvious case: -0.0 won't match 0.0, for example. *
Further, simply converting to an int of the same size won't give a very even distribution, which is often important (implementing a hash/set that uses buckets, for example).
Suggested steps for implementation (a rough sketch follows below):
filter out non-finite cases (nan, inf) and (0.0, -0.0) (whether you need to do this explicitly or not depends on the method used).
convert to an int of the same size (that is, use a union for example to represent the float as an int, not simply cast to an int).
re-distribute the bits (intentionally vague here!); this is basically a speed vs quality tradeoff. But if you have many values in a small range you probably don't want them to end up in a similarly small range too.
*: You may want to check for (nan and -nan) too. How to handle those exactly depends on your use case (you may want to ignore sign for all nan's as CPython does).
Python's _Py_HashDouble is a good reference for how you might hash a float, in production code (ignore the -1 check at the end, since that's a special value for Python).
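Putting the steps above together, here is a rough, hypothetical sketch (the helper name float_hash and the mixing constant are mine, not from any particular library):

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

std::size_t float_hash(float x) {
    if (x == 0.0f) x = 0.0f;                                     // fold -0.0 and +0.0 together
    if (x != x)    x = std::numeric_limits<float>::quiet_NaN();  // collapse all NaNs to one pattern
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);   // bit-cast without union/pointer-cast type punning
    u ^= u >> 16;                    // cheap redistribution of the bits
    u *= 0x45d9f3b1u;
    u ^= u >> 16;
    return u;
}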
If you're interested, I just made a hash function that uses floating point and can hash floats. It also passes SMHasher ( which is the main bias-test for non-crypto hash functions ). It's a lot slower than normal non-cryptographic hash functions due to the float calculations.
I'm not sure if tifuhash will become useful for all applications, but it's interesting to see a simple floating point function pass both PractRand and SMHasher.
The main state update function is very simple, and looks like:
function q( state, val, numerator, denominator ) {
// Continued Fraction mixed with Egyptian fraction "Continued Egyptian Fraction"
// with denominator = val + pos / state[1]
state[0] += numerator / denominator;
state[0] = 1.0 / state[0];
// Standard Continued Fraction with a_i = val, b_i = (a_i-1) + i + 1
state[1] += val;
state[1] = numerator / state[1];
}
Anyway, you can get it on npm
Or you can check out the github
Using is simple:
const tifu = require('tifuhash');
const message = 'The medium is the message.';
const number = 333333333;
const float = Math.PI;
console.log( tifu.hash( message ),
tifu.hash( number ),
tifu.hash( float ),
tifu.hash( ) );
There's a demo of some hashes on runkit here https://runkit.com/593a239c56ebfd0012d15fc9/593e4d7014d66100120ecdb9
Side note: I think that in the future, floating point (possibly big arrays of floating point calculations) could be a useful way to make more computationally demanding hash functions. A weird side effect I discovered of using floating point is that the hashes are target dependent, and I surmise they could be used to fingerprint the platforms they were calculated on.
Because of the IEEE byte ordering, the Java Float.hashCode() and Double.hashCode() do not give good results. This problem is well known and can be addressed by this scrambler:
class HashScrambler {
    /**
     * https://sites.google.com/site/murmurhash/
     */
    static int murmur(int x) {
        x ^= x >> 13;
        x *= 0x5bd1e995;
        return x ^ (x >> 15);
    }
}
You then get a good hash function, which also allows you to use Float and Double in hash tables. But you need to write your own hash table that allows a custom hash function.
Since in a hash table you also need to test for equality, you need exact equality to make it work. Maybe the latter is what President James K. Polk intends to address?