I am trying to understand the below code snippet taken from here
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // ???
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}
What I don't understand is the conversion from float to long pointer and back to float pointer. Why can't we simply do i = y instead of first taking the address and then dereferencing it?
I am new to pointer conversions, so please bear with me.
This code snippet is obviously the fast inverse square root. The pointer semantics there are not really used to do pointer things, but to reinterpret the bits at a certain memory location as a different type.
If you were to assign i = y, this would be turned into a truncating conversion from floating point to integer. That, however, is not what's desired here. What you actually want is raw access to the bits, which is not straightforwardly possible on a variable of floating-point type.
Let's break this statement down:
i = * ( long * ) &y;
&y: address of y. The type of this expression is float*.
(long *): cast to type. Applied to &y, it steamrolls over the information that this is the address of a float-typed object.
*: dereference, which means "read out" whatever is located at the given address and interpret it as the base type of the pointer being dereferenced. We've overridden that to be long* and are essentially lying to the compiler.
For all intents and purposes this breaks pointer aliasing rules and invokes undefined behaviour. You should not do this (caveats apply¹).
The somewhat well defined way (at least it doesn't break pointer aliasing rules) to do such trickery is by means of a union.
float Q_rsqrt( float number )
{
union {
float y;
long i;
} fl;
float x2;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
fl.y = number;
fl.i = 0x5f3759df - ( fl.i >> 1 ); // ???
fl.y = fl.y * ( threehalfs - ( x2 * fl.y * fl.y ) ); // 1st iteration
// fl.y = fl.y * ( threehalfs - ( x2 * fl.y * fl.y ) ); // 2nd iteration, this can be removed
return fl.y;
}
EDIT:
It should be noted that type punning via a union, as illustrated above, is not fully sanctioned by the language standards either (in C++ it is undefined behaviour). However, unlike plain undefined behaviour, the C standard so far treats the details of this kind of union access as implementation-defined behaviour. Since type punning is required for certain tasks, a few proposals have been made to make it well defined in an upcoming revision of the C standard.
For all intents and purposes, practically all compilers support the above scheme, whereas type punning via pointer casts will lead to weird things happening once all optimization paths are enabled.
1: Some compilers (old, or custom written for specific language extensions – I'm looking at you, CUDA nvcc) are severely broken, and you actually have to coerce them with this into doing what you want.
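For completeness: the approach that is well defined in both modern C and C++ is to copy the bytes with memcpy, which compilers optimize down to the same register move. A sketch of the same trick (the function name is mine, not from the original code):

```c
#include <stdint.h>
#include <string.h>

float Q_rsqrt_memcpy(float number)
{
    float x2 = number * 0.5F, y = number;
    uint32_t i;                       /* fixed-width type instead of long */
    memcpy(&i, &y, sizeof i);         /* well-defined bit reinterpretation */
    i = 0x5f3759df - (i >> 1);        /* the same magic constant */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5F - x2 * y * y);      /* one Newton iteration */
    return y;
}
```

Both memcpy calls compile to nothing on mainstream compilers; the generated code is identical to the pointer-cast version without breaking aliasing rules.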
OK, so you are looking at some ancient hackery from the time when floating-point processors were either slow or non-existent. I doubt the original author would defend continuing to use it. It also doesn't meet modern language transparency requirements (i.e. it is "undefined behaviour"), so it may not be portable to all compilers or interpreters, or be handled correctly by quality tools such as lint and valgrind, etc., but it was the way fast code was written in the '80s and '90s.
At the bit level, everything is stored as bytes. On the platforms this code targeted, a long is stored in 4 bytes, and a float is also stored in 4 bytes. However, the bits are treated very differently. In an integer/long, each bit simply ranks as a power of 2 and can be used as a bit field. In a float, some bits represent an exponent that is applied to the rest of the number. For more info, read up on IEEE 754.
This trick takes the float value and looks at its bytes as if they were an integer bit field, so it can apply the magic. Then it looks at the resulting bytes as if they were a float again.
I have no idea what that magic is exactly. No one else does, probably not even the guy who wrote it, as it isn't commented. On the other hand, the Doom and Quake source used to be cult code reading, so perhaps someone remembers the details?
There used to be many such tricks in the "good old days", but they are relatively unnecessary now, as floating point is built into the main processor and is as fast as, and sometimes faster than, the integer operations. Originally, even moving small ints to and from the co-processor could be done more quickly with such hacks than with the built-in methods.
Related
If I am trying to multiply a float by a whole number, is it faster to multiply it by the whole number represented as an integer
int x;
...
float y = 0.5784f * x; //Where x contains a dynamically chosen whole number
or by another float (provided there is no loss in accuracy)
float x;
...
float y = 0.5784f * x; //Where x contains a dynamically chosen and floating point representable whole number
or does it vary greatly between hardware? Is there a common circuit (found in most floating-point units) that handles both float and integer multiplication, or is the general practice for the hardware to first convert the integer into a float and then use a circuit that performs float * float? What if the whole number is extremely small, such as a value of 0 or 1 determined dynamically and used to decide, without branching, whether the float is added to a sum?
int x;
...
float y = 0.5784f + 0.3412f * x; //Where x contains either 0 or 1 (determined dynamically).
Thanks for the help in advance.
Is it faster to multiply a float by an integer or another float
In general, float * float is faster, yet I suspect little or no difference. The speed of a program is the result of the entire code, not just this line. A gain here may cost more in other places.
Trust your compiler – or get a better compiler – to emit code that performs 0.5784f * some_int well.
In the 0.5784f * some_int case, the language obliges some_int to act as if it were converted to float first*1, before the multiplication. But a sharp compiler may know implementation-specific tricks to perform the multiplication better/faster directly, without a separate explicit conversion – as long as it gets an allowable result.
In the float y = 0.5784f + 0.3412f * x; // where x contains either 0 or 1 case, a compiler might see that too and take advantage of it to emit efficient code.
Only in select cases and with experience will you out-guess the compiler. Code for clarity first.
You could always profile different codes/compiler options and compare.
Tip: In my experience, I find more performance gains with a larger view of code than the posted concern - which verges on micro-optimization.
*1 See FLT_EVAL_METHOD for other possibilities.
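To illustrate the 0-or-1 case from the question: since 0 and 1 convert to float exactly, multiplying by the converted flag adds the term branchlessly with an exact result. A minimal sketch (select_add is a hypothetical name):

```c
/* flag is 0 or 1; the int is converted to float and used as a
   mask-like multiplier, so no branch is needed */
float select_add(float base, float addend, int flag)
{
    return base + addend * (float)flag;
}
```

Because 0.0f and 1.0f are exact and multiplying by them is exact, this gives bit-identical results to the branching version.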
For a program that needs to be deterministic and produce the same result on different platforms (compilers), the built-in trigonometric functions can't be used, since the algorithms that compute them differ between systems. Testing confirmed that the resulting values are different.
(Edit: the results need to be exactly the same down to the last bit, as they are used in a game simulation run on all the clients. These clients need to have exactly the same simulation state for this to work. Any small error could grow into a bigger and bigger error over time, and the CRC of the game state is also used as a synchronisation check.)
So the only solution I came up with was to use our own custom code to calculate these values. The problem is that (surprisingly) it is very hard to find any easy-to-use source code for the whole set of trigonometric functions.
This is my modification of code I got (https://codereview.stackexchange.com/questions/5211/sine-function-in-c-c) for the sin function. It is deterministic across platforms, and its value is almost the same as that of the standard sin (both tested).
#define M_1_2_PI 0.159154943091895335769 // 1 / (2 * pi)
double Math::sin(double x)
{
// Normalize the x to be in [-pi, pi]
x += M_PI;
x *= M_1_2_PI;
double notUsed;
x = modf(modf(x, &notUsed) + 1, &notUsed);
x *= M_PI * 2;
x -= M_PI;
// the algorithm works for [-pi/2, pi/2], so we change the values of x, to fit in the interval,
// while having the same value of sin(x)
if (x < -M_PI_2)
x = -M_PI - x;
else if (x > M_PI_2)
x = M_PI - x;
// useful to pre-calculate
double x2 = x*x;
double x4 = x2*x2;
// Calculate the terms
// As long as abs(x) < sqrt(6), which is 2.45, all terms will be positive.
// Values outside this range should be reduced to [-pi/2, pi/2] anyway for accuracy.
// Some care has to be given to the factorials.
// They can be pre-calculated by the compiler,
// but the value for the higher ones will exceed the storage capacity of int.
// so force the compiler to use unsigned long longs (if available) or doubles.
double t1 = x * (1.0 - x2 / (2*3));
double x5 = x * x4;
double t2 = x5 * (1.0 - x2 / (6*7)) / (1.0* 2*3*4*5);
double x9 = x5 * x4;
double t3 = x9 * (1.0 - x2 / (10*11)) / (1.0* 2*3*4*5*6*7*8*9);
double x13 = x9 * x4;
double t4 = x13 * (1.0 - x2 / (14*15)) / (1.0* 2*3*4*5*6*7*8*9*10*11*12*13);
// add some more if your accuracy requires them.
// But remember that x is smaller than 2, and the factorial grows very fast
// so I doubt that 2^17 / 17! will add anything.
// Even t4 might already be too small to matter when compared with t1.
// Sum backwards
double result = t4;
result += t3;
result += t2;
result += t1;
return result;
}
But I didn't find anything suitable for the other functions, like asin, atan, tan, etc. (other than sin/cos).
These functions don't have to be as precise as the standard ones, but at least 8 significant figures would be nice.
"It was tested, that the result values are different."
How different is different enough to matter? You claim to want 8 significant (decimal?) digits of agreement. I don't believe that you've found less than that in any implementation that conforms to ISO/IEC 10967-3:2006 §5.3.2.
Do you understand how trivial a trigonometric error of one part per billion is? It would be under 3 kilometers on a circle the size of the Earth's orbit. Unless you are planning voyages to Mars, and using a sub-standard implementation, your claimed "difference" ain't going to matter.
added in response to comment:
What Every Programmer Should Know About Floating-Point Arithmetic. Read it. Seriously.
Since you claim that:
precision isn't as important as bit for bit equality
you need only 8 significant digits
then you should truncate your values to 8 significant digits.
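One way to sketch such truncation is a round trip through a fixed-precision decimal string (round8 is a hypothetical helper; this assumes printf and strtod round correctly, which conforming C libraries do):

```c
#include <stdio.h>
#include <stdlib.h>

/* round to 8 significant decimal digits by printing and re-parsing;
   %.7e emits one digit before the point plus 7 after it */
double round8(double v)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%.7e", v);
    return strtod(buf, NULL);
}
```

Whether this is fast enough for a per-frame simulation is another question; it is meant only to show what "agreeing to 8 digits" means operationally.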
I guess the easiest would be to pick a liberal runtime library which implements the required math functions:
FreeBSD
go, will need transliteration, but I think all functions have a non-assembly implementation
MinGW-w64
...
And just use their implementations. Note the ones listed above are either public domain or BSD licensed or some other liberal license. Make sure to abide by the licenses if you use the code.
You can use Taylor series (actually, it seems that is what you are already using, maybe without knowing it).
Take a look at Wikipedia (or anywhere else):
https://en.wikipedia.org/wiki/Taylor_series
There you have the list for the most common functions (exp, log, cos, sin, etc.): https://en.wikipedia.org/wiki/Taylor_series#List_of_Maclaurin_series_of_some_common_functions
but with some mathematical knowledge you can find/calculate almost anything (OK, clearly not everything, but ...).
Some examples (there are many others) are given in the list linked above.
Notes:
The more terms you add, the more precision you get.
I don't think it's the most efficient way to calculate what you need, but it's a fairly "simple" one (the idea, I mean).
A factorial(n) function could be really useful if you decide to use this approach.
I hope it will help.
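To make the idea concrete, here is a sketch of a Maclaurin-series exp (my_exp is a hypothetical name). Building each term incrementally avoids computing large factorials explicitly; for arguments within a few units of zero, 20 terms are close to machine precision:

```c
/* exp(x) via its Maclaurin series: sum of x^n / n! */
double my_exp(double x)
{
    double term = 1.0;   /* current term x^n / n!, built incrementally */
    double sum = 1.0;    /* n = 0 term */
    for (int n = 1; n < 20; ++n) {
        term *= x / n;
        sum += term;
    }
    return sum;
}
```

The same incremental-term pattern works for sin, cos, sinh, etc.; only the update factor and the starting term change.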
I'd suggest looking into lookup tables and linear/bicubic interpolation.
That way you control exactly the values at each point, and you don't have to perform an awful lot of multiplications.
Taylor expansions for the sin/cos functions are a poor fit here anyway.
Spring RTS fought for ages against this kind of desync error: try posting on their forum; not many old developers remain, but those that do should still remember the issues and the fixes.
In this thread http://springrts.com/phpbb/viewtopic.php?f=1&t=8265 they talk specifically about libm determinism (different OSes might ship different libc builds with subtle optimization differences, so you need to take their approach and bundle the library yourself).
I see a lot of C++ code that has lines like:
float a = 2;
float x = a + 1.0f;
float b = 3.0f * a;
float c = 2.0f * (1.0f - a);
Are these .0f suffixes after the literals really necessary? Would you lose numeric accuracy if you omitted them?
I thought you only need them if you have a line like this:
float a = 10;
float x = 1 / a;
where you should use 1.0f, right?
You would need to use it in the following case:
float x = 1/3;
either 1 or 3 needs to have a .0, or else the division is performed in integer arithmetic and x will always be 0.
If a is an int, these two lines are definitely not equivalent:
float b = 3.0f * a;
float b = 3 * a;
The second will silently overflow if a is too large, because the right-hand side is evaluated using integer arithmetic. But the first is perfectly safe.
But if a is a float, as in your examples, then the two expressions are equivalent. Which one you use is a question of personal preference; the first is probably more hygienic.
It somewhat depends on what you are doing with the numbers. A floating-point literal with an f or F suffix has type float; a floating-point literal without a suffix has type double. As a result, there may be subtle differences between using an f suffix and not using it.
As long as a subexpression involves at least one object of floating-point type, it probably doesn't matter much. It is more important to use suffixes on integer literals that are meant to be interpreted as floating point: if no floating-point value is involved in a subexpression, integer arithmetic is used. This can have major effects, because the result will be an integer.
float b = 3.0f * a;
Sometimes this is done because you want to make sure 3.0 is created as a float and not as a double.
I'm currently implementing a hash table in C++ and I'm trying to make a hash function for floats...
I was going to treat floats as integers by padding out the decimal digits, but then I realized that I would probably hit overflow with big numbers...
Is there a good way to hash floats?
You don't have to give me the function directly, but I'd like to see/understand different concepts...
Notes:
I don't need it to be really fast, just evenly distributed if possible.
I've read that floats should not be hashed because of the cost of computation; can someone confirm/explain this and give me other reasons why floats should not be hashed? I don't really understand why (besides the speed).
It depends on the application, but most of the time floats should not be hashed, because hashing is used for fast lookup of exact matches, and most floats are the result of calculations that produce only an approximation to the correct answer. The usual way to check floating-point equality is to check whether the value is within some delta (in absolute value) of the correct answer. This type of check does not lend itself to hashed lookup tables.
EDIT:
Normally, because of rounding errors and inherent limitations of floating point arithmetic, if you expect that floating point numbers a and b should be equal to each other because the math says so, you need to pick some relatively small delta > 0, and then you declare a and b to be equal if abs(a-b) < delta, where abs is the absolute value function. For more detail, see this article.
Here is a small example that demonstrates the problem:
float x = 1.0f;
x = x / 41;
x = x * 41;
if (x != 1.0f)
{
std::cout << "ooops...\n";
}
Depending on your platform, compiler and optimization levels, this may print ooops... to your screen, meaning that the mathematical equation x / y * y = x does not necessarily hold on your computer.
There are cases where floating point arithmetic produces exact results, e.g. reasonably sized integers and rationals with power-of-2 denominators.
If your hash function did the following you'd get some degree of fuzziness on the hash lookup
unsigned int Hash( float f )
{
unsigned int ui;
memcpy( &ui, &f, sizeof( float ) );
return ui & 0xfffff000;
}
This way you mask off the 12 least significant bits, allowing for a degree of uncertainty... It really depends on your application, however.
You can use the std hash; it's not bad:
std::size_t myHash = std::hash<float>{}(myFloat);
unsigned hash(float x)
{
union
{
float f;
unsigned u;
};
f = x;
return u;
}
Technically undefined behavior, but most compilers support this. Alternative solution:
unsigned hash(float x)
{
return (unsigned&)x;
}
Both solutions depend on the endianness of your machine, so for example on x86 and SPARC, they will produce different results. If that doesn't bother you, just use one of these solutions.
You can of course represent a float as an int type of the same size to hash it, however this naive approach has some pitfalls you need to be careful of...
Simply converting to a binary representation is error prone, since values which compare equal won't necessarily have the same binary representation.
An obvious case: -0.0 won't match 0.0, for example. *
Further, simply converting to an int of the same size won't give a very even distribution, which is often important (e.g. when implementing a hash/set that uses buckets).
Suggested steps for implementation:
Filter out non-finite cases (NaN, inf) and 0.0/-0.0 (whether you need to do this explicitly depends on the method used).
Convert to an int of the same size (that is, use a union, for example, to represent the float as an int – don't simply cast it to an int).
Redistribute the bits (intentionally vague here!); this is basically a speed-vs-quality tradeoff. If you have many values in a small range, you probably don't want them to map to a similar range too.
*: You may want to check for NaN and -NaN too. How to handle those exactly depends on your use case (you may want to ignore the sign for all NaNs, as CPython does).
Python's _Py_HashDouble is a good reference for how you might hash a float, in production code (ignore the -1 check at the end, since that's a special value for Python).
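A sketch following those steps (hash_float, the zero/NaN policy, and the mixer choice are illustrative; the finalizer constants are MurmurHash3's fmix32, which is bijective, so distinct bit patterns stay distinct):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

uint32_t hash_float(float f)
{
    if (f == 0.0f)
        f = 0.0f;              /* collapse -0.0 and +0.0 to one bit pattern */
    if (isnan(f))
        f = NAN;               /* collapse all NaN payloads (one possible policy) */

    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* well-defined bit reinterpretation */

    /* redistribute the bits; constants are MurmurHash3's fmix32 finalizer */
    u ^= u >> 16;
    u *= 0x85ebca6bu;
    u ^= u >> 13;
    u *= 0xc2b2ae35u;
    u ^= u >> 16;
    return u;
}
```

The normalization steps guarantee that values which compare equal hash equally, and the finalizer spreads clustered inputs across buckets.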
If you're interested, I just made a hash function that uses floating point and can hash floats. It also passes SMHasher (the main bias test for non-crypto hash functions). It's a lot slower than normal non-cryptographic hash functions due to the float calculations.
I'm not sure whether tifuhash will become useful for all applications, but it's interesting to see a simple floating-point function pass both PractRand and SMHasher.
The main state update function is very simple, and looks like:
function q( state, val, numerator, denominator ) {
// Continued Fraction mixed with Egyptian fraction "Continued Egyptian Fraction"
// with denominator = val + pos / state[1]
state[0] += numerator / denominator;
state[0] = 1.0 / state[0];
// Standard Continued Fraction with a_i = val, b_i = (a_i-1) + i + 1
state[1] += val;
state[1] = numerator / state[1];
}
Anyway, you can get it on npm
Or you can check out the github
Using is simple:
const tifu = require('tifuhash');
const message = 'The medium is the message.';
const number = 333333333;
const float = Math.PI;
console.log( tifu.hash( message ),
tifu.hash( number ),
tifu.hash( float ),
tifu.hash( ) );
There's a demo of some hashes on runkit here https://runkit.com/593a239c56ebfd0012d15fc9/593e4d7014d66100120ecdb9
Side note: I think that in the future, floating point – possibly big arrays of floating-point calculations – could be a useful way to make more computationally demanding hash functions. A weird side effect I discovered of using floating point is that the hashes are target dependent, and I surmise they could be used to fingerprint the platforms they were calculated on.
Because of the IEEE byte ordering, the Java Float.hashCode() and Double.hashCode() do not give good results. This problem is well known and can be addressed by this scrambler:
class HashScrambler {
/**
* https://sites.google.com/site/murmurhash/
*/
static int murmur(int x) {
x ^= x >> 13;
x *= 0x5bd1e995;
return x ^ (x >> 15);
}
}
You then get a good hash function, which also allows you to use Float and Double in hash tables. But you need to write your own hash table that allows a custom hash function.
Since in a hash table you also need to test for equality, you need exact equality to make it work. Maybe the latter is what President James K. Polk intends to address?
I had a small WTF moment this morning. The WTF can be summarized with this:
float x = 0.2f;
float y = 0.1f;
float z = x + y;
assert(z == x + y); // This assert is triggered! (At least with Visual Studio 2008)
The reason seems to be that the expression x + y is promoted to double and compared with the truncated version in z. (If i change z to double the assert isn't triggered).
I can see that for precision reasons it would make sense to perform all floating point arithmetics in double precision before converting the result to single precision. I found the following paragraph in the standard (which I guess I sort of already knew, but not in this context):
4.6.1.
"An rvalue of type float can be converted to an rvalue of type double. The value is unchanged"
My question is, is x + y guaranteed to be promoted to double or is at the compiler's discretion?
UPDATE: Since many people have claimed that one shouldn't use == for floating point, I just want to state that in the specific case I'm working with, an exact comparison is justified.
Floating-point comparison is tricky; here's an interesting link on the subject which I think hasn't been mentioned.
You can't generally assume that == will work as expected for floating point types. Compare rounded values or use constructs like abs(a-b) < tolerance instead.
Promotion is entirely at the compiler's discretion (and will depend on target hardware, optimisation level, etc).
What's going on in this particular case is almost certainly that values are stored in FPU registers at a higher precision than in memory - in general, modern FPU hardware works with double or higher precision internally whatever precision the programmer asked for, with the compiler generating code to make the appropriate conversions when values are stored to memory; in an unoptimised build, the result of x+y is still in a register at the point the comparison is made but z will have been stored out to memory and fetched back, and thus truncated to float precision.
The Working draft for the next standard C++0x section 5 point 11 says
The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby
So at the compiler's discretion.
Using gcc 4.3.2, the assertion is not triggered, and indeed, the rvalue returned from x + y is a float, rather than a double.
So it's up to the compiler. This is why it's never wise to rely on exact equality between two floating point values.
The C++ FAQ lite has some further discussion on the topic:
Why is cos(x) != cos(y) even though x == y?
The problem is that the conversion of a decimal float literal to binary does not preserve the value exactly.
Within sizeof(float) bytes the type can't accommodate the precise value of every number, so arithmetic operations may introduce approximation, and hence the equality fails.
See below e.g.
float x = 0.25f; // both values are exactly representable in 4 bytes
float y = 0.50f;
float z = x + y;
assert(z == x + y); // works fine; the assert is not triggered
I would think it would be at the compiler's discretion, but you could always force it with a cast if that was your intention.
Yet another reason to never directly compare floats.
if (fabs(result - expectedResult) < 0.00001)
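Wrapped up as a helper, a sketch (the name and epsilon are illustrative; an absolute epsilon only suits values near magnitude 1, and a relative comparison is more robust in general):

```c
#include <math.h>
#include <stdbool.h>

/* absolute-tolerance comparison; suitable when magnitudes are near 1 */
bool nearly_equal(float a, float b, float eps)
{
    return fabsf(a - b) < eps;
}
```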