Need pow(-1,1.2) to be 1 - c++

I am using math.h with GCC and GSL. I was wondering how to get this to evaluate?
I was hoping that the pow function would recognize pow(-1,1.2) as ((-1)^6)^(1/5). But it doesn't.
Does anybody know of a c++ library that will recognize these? Perhaps somebody has a decomposition routine they could share.

Mathematically, pow(-1, 1.2) is simply not defined. There are no powers with fractional exponents of negative numbers, and I hope there is no library that will simply return some arbitray value for such an expression. Would you also expect things like
pow(-1, 0.5) = ((-1)^2)^(1/4) = 1
which obviously isn't desirable.
Moreover, the floating point number 1.2 isn't even exactly equal to 6/5. The closest double precision number to 1.2 is
1.1999999999999999555910790149937383830547332763671875
Given this, what result would you expect now for pow(-1, 1.2)?

If you want to raise negative numbers to powers -- especially fractional powers -- use the cpow() method. You'll need to include <complex> to use it.

It seems like you're looking for pow(abs(x), y).
Explanation: you seem to be thinking in terms of
xy = (xN)(y/N)
If we choose that N === 2, then you have
(x2)y/2 = ((x2)1/2)y
But
(x2)1/2 = |x|
Substituting gives
|x|y
This is a stretch, because the above manipulations only work for non-negative x, but you're the one who chose to use that assumption.

Sounds like you want to perform a complex power (cpow()) and then take the magnitude (abs()) of that after.
>>> abs(cmath.exp(1.2*cmath.log(-1)))
1.0
>>> abs(cmath.exp(1.2*cmath.log(-293.2834)))
913.57662451612202

pow(a,b) is often thought of, defined as, and implemented as exp(log(a)*b) where log(a) is natural logarithm of a. log(a) is not defined for a<=0 in real numbers. So you need to either write a function with special case for negative a and integer b and/or b=1/(some_integer). It's easy to special-case for integer b, but for b=1/(some_integer) it's prone to round-off problems, like Sven Marnach pointed out.
Maybe for your domain pow(-a,b) should always be -pow(a,b)? But then you'd just implement such function, so I assume the question warrants more explanation .
Like duskwuff suggested, a much more robust and "mathematical" solution is to use complex functions log and exp, but it's much more "complex" (excuse my pun) than it seems on the surface (even though there's cpow function). And it'll be much slower if you have to compute a lot of pow()s.
Now there's an important catch with complex numbers that may or may not be relevant to your problem domain: when done right, the result of pow(a,b) is not one, but often a few complex numbers, but in the cases you care about, one of them will be complex number with nearly-zero imaginary part (it'll be non-zero due to roundoff errors) which you can simply ignore and/or not compute in your code.
To demonstrate it, consider what pow(-1,.5) is. It's a number X such that X^2==-1. Guess what? There are 2 such numbers: i and -i. Generally, pow(-1, 1/N) has exactly N solutions, although you're interested in only one of them.
If the imaginary part of all results of pow(a,b) is significant, it means you are passing wrong values. For single-precision floating point values in the range you describe, 1e-6*max(abs(a),abs(b)) would be a good starting point for defining the "significant enough" threshold. The extreme "wrong values" would be pow(-1,0.5) which would return 0 + 1i (0 in real part, 1 in imaginary part). Here the imaginary part is huge relative to the input and real part, so you know you screwed up your input values.
In any reasonable single-return-result implementation of cpow() , cpow(-1,0.3333) will probably return something like -1+0.000001i and ignore two other values with significant imaginary parts. So you can just take that real value and that's your answer.

Use std::complex. Without that, the roots of unity don't make much sense. With it they make a whole lot of sense.

Related

Printing bits as IEEE-754 float

Is there some clever and reliable way to print series of bits as an IEEE-754 without actually using a float type?
I have found a way to print fractions, which allows me to represent the float as a a fraction. However, I then came to realize that the exponent may range from -127 to 128 (after adjusting with bias), which may result in the multiplication mantissa * 2^128. The fraction method relies on representing the numerator as an integer, and I would require a really large integer to do this multiplication. I mean, I could use "custom" type to represent this large value (i.e. https://gmplib.org/), but I would prefer if to avoid this. If we multiplied by 10^x, I could simply adjust the decimal point and add some zeros, but sadly that's not the case either.
I have not been able to find anything that mentions any solution for this. Probably due to the fact that googling stuff like "print from
Why am I actually trying to do this?
I'm only doing this to get a better understanding of how floats (IEEE-754 in particular) work, and I find that it always help to do some practical example. So I thought "Hey, why not try to code it?". This has no practical application (that I know of)!
So, almost immediately after posting this, I finally succeded in finding the resources I've been looking for.
https://www.ryanjuckett.com/printing-floating-point-numbers/ talks about it, and references other relevant sources.

Bracketing algorithm when root finding. Single root in "quadratic" function

I am trying to implement a root finding algorithm. I am using the hybrid Newton-Raphson algorithm found in numerical recipes that works pretty nicely. But I have a problem in bracketing the root.
While implementing the root finding algorithm I realised that in several cases my functions have 1 real root and all the other imaginary (several of them, usually 6 or 9). The only root I am interested is in the real one so the problem is not there. The thing is that the function approaches the root like a cubic function, touching with the point the y=0 axis...
Newton-Rapson method needs some brackets of different sign and all the bracketing methods I found don't work for this specific case.
What can I do? It is pretty important to find that root in my program...
EDIT: more problems: sometimes due to reaaaaaally small numerical errors, say a variation of 1e-6 in some value the "cubic" function does NOT have that real root, it is just imaginary with a neglectable imaginary part... (checked with matlab)
EDIT 2: Much more information about the problem.
Ok, I need root finding algorithm.
Info I have:
The root I need to find is between [0-1] , if there are more roots outside that part I am not interested in them.
The root is real, there may be imaginary roots, but I don't want them.
Probably all the rest of the roots will be imaginary
The root may be double in that point, but I think that actually doesn't mater in numerical analysis problems
I need to use the root finding algorithm several times during the overall calculations, but the function will always be a polynomial
In one of the particular cases of the root finding, my polynomial will be similar to a quadratic function that touches Y=0 with the point. Example of a real case:
The coefficient may not be 100% precise and that really slight imprecision may make the function not to touch the Y=0 axis.
I cannot solve for this specific case because in other cases it may be that the polynomial is pretty normal and doesn't make any "strange" thing.
The method I am actually using is NewtonRaphson hybrid, where if the derivative is really small it makes a bisection instead of NewRaph (found in numerical recipes).
Matlab's answer to the function on the image:
roots:
0.853553390593276 + 0.353553390593278i
0.853553390593276 - 0.353553390593278i
0.146446609406726 + 0.353553390593273i
0.146446609406726 - 0.353553390593273i
0.499999999999996 + 0.000000040142134i
0.499999999999996 - 0.000000040142134i
The function is a real example I prepared where I know that the answer I want is 0.5
Note:
I still haven't check completely some of the answers I you people have give me (Thank you!), I am just trying to give al the information I already have to complete the question.
Assuming you have a one-dimensional polynomial problem (which I assume from the imaginary solutions) you can use Sturm sequences to bracket all real roots. See Sturm's theorem.
Welcome to the wonderful world of numerical methods. Watch your hairline; it might start receding as you pull your hair out in frustration.
First off, with numerical root finding, you are toast if you can't bracket the problem. Newton Raphson is nice for polishing off a solution once you get close, and it only works if the derivative near the root is well away from zero. You always need to have some slower technique at hand as a backup because Newton Raphson can send you off to never-never land (i.e., somewhere well outside the bracket). If your function is not a polynomial, the first thing to try is Brent's method. If your function is a polynomial, try Laguerre's method or Jenkins-Traub.
BTW, it sounds like you have a pathological problem. You shouldn't expect particularly good performance. Pathological problems are, well, pathological.
Addendum
If you are having problems with things that appear to be roots, but aren't, you need to take care how you evaluate your function. If you do have a polynomial, form each term of the polynomial, sort by absolute value, and add smallest to largest. This produces better accuracy most of the time, but fails if you have large terms whose sum is nearly zero. If that's the case, you might want to add those canceling terms separately, add the rest smallest to largest, and then compute a grand total -- and your still kinda screwed. That big addition that nearly cancels loses a lot of precision. There's no escape other than extended precision arithmetic.
Ander, thanks for responding to my question (about the interval); sorry for the delay in following up - I have very busy work. Also - before I found the additional information you've provided - I had in mind to explain quite a few things how to handle this and was contemplating how to present that. However, I now believe your case is not too difficult and we can get at it without too much additional stuff, since you apparently have an explicit polynomial expression (coefficients to the various powers).
Let's start with a simple case, to pinpoint the approach.
Step 1.
If you have a 2nd degree polynomial, its derivative is first order and has a simple zero (which you can find by bracketing or simply by explicitly solving the equation). (Yes, I know there's a closed formula for the roots of a 2nd degree polynomial also, but for the sake of the current argument, let us forget that).
The zero's of the 2nd degree polynomial are then located one at the left side and one at the right side of the zero of the derivative. So, if you also have the interval where the roots of the original function (the 2nd degree polynomial) are to be found, you now have two intervals - left and right of the derivative-zero, each with one zero.
It is important to realize that the original function is MONOTONIC on each subinterval (decreasing on one of them, increasing on the other). Therefore, simply by checking the function values at the ends of the (sub)interval you can determine whether or not they actually bracket a zero. If not, there's a multiple zero (double, in this case) exactly at the zero of the derivative IF the function is zero there (otherwise, it is a double imaginary root of which you've now found the real part).
In case the zero of the derivative lies OUTSIDE the total interval, you will have at most one root inside your interval and you need to check only that particular (sub)interval.
Step 2.
Consider now a 3rd order polynomial.
Its derivative is 2nd order.
The derivative of THAT 2nd order polynomial is again 1st order and you proceed as before to get two subintervals to find the roots of the derivative of the original function. These two roots give you THREE (at most) intervals where you will find the 3 roots of the original (3rd order) function.
And also here, you will have intervals (3) where the original function is monotonic (alternatingly increasing/decreasing), making the analysis per subinterval quite easy.
Again, zeros may coincide (2 or even all 3) and may in addition turn out to be complex-valued (i.e. having non-zero imaginary parts). The analysis of the cases is straightforward: check function values at the borders of the intervals to assess whether not there's a sign-change (function is monotonic on each subinterval) and/or whether the function is zero at one of the subinterval-borders.
Step 4.
Generalize this with the known polynomial. Let's say - your example - it is 6th order:
a) construct the 5th derivative (i.e. reducing the original to a 1st order polynomial). Compute it's zero (it is at precisely 0.5 in your example). In this case you're already done, but suppose you don't realize that. So you have now 2 intervals 0..0.5 and 0.5..1
b) construct the 4th derivative. Inspect its values at the subinterval-boundaries (0, 0.5, 1)
For each subinterval determine if it has a real zero inside. If so, you re-partition your original interval in 3 subintervals, using the two found zeros (you forget about the zero of the 5th derivative). If they coincide (at the previous cut, 0.5) you stick with that 0.5 (don't care whether you've found a true double zero of your 4th derivative there or a "double imaginary") and still have only 2 intervals, but for the sake of the argument let's say you now have 3.
c) construct the 3rd derivative and do likewise as before. You will then have 4 (at most) intervals.
d) And so on. After having processed the 2nd derivative in this fashion you have 5 (at most) intervals, and after processing the 1st derivative you have 6 intervals (or less...) and knowing the function is monotonic on each subinterval, you'll quickly determine in each of them if there's a real root, as always using the know monotonicity of the function in each of the final subintervals.
Adding a note on numerical accuracy at evaluating a function:
A first (probably sufficient, in this case) method to reduce noise is NOT to evaluate your function in the way suggested by the original form (i.e. a6 x*6 + a5 x*5 +..), but to rewrite it as:
a0 + x*(a1 + x*(a2 + x*(a3 + x*(a4 + x*(a5 + x*a6)))))
So, in evaluating you proceed:
tmp = a6
tmp = x*tmp + a5
tmp = x*tmp + a4
etcetera.
In case this little rewriting is not sufficient for numerical stability, you should rewrite your polynomial in (for instance) a chebyshev-polynomial expansion and evaluate that one with its recurrence relations. Both (getting the expansion and applying the recurrence relations for evaluation) are rather simple. I can explain, if you need help, but I guess it won't be necessary here.
In all cases, you HAVE to allow for some inaccuracy, i.e. accept that a computation will, generally speaking, NEVER give you the mathematically exact function value. So the assessment whether the function is presumably zero at some point must include some "tolerance", there's no way around this, unfortunately; the best you can aim for is to minimize the noise.
Well, if your function touches zero but never crosses it, you seem to be looking for a minimum (or a maximum). In which case, you're better off telling computer to do exactly that --- either find the root of a derivative (if you can calculate it analytically), or use a minimization routine. Then check that the function value at the minimum is 'close enough' to zero.
Just to reiterate what was already said by other people:
don't start with Newton-Raphson method; it's almost always better to start with Brent or even a straightforward bisection (provided you can bracket the root).
An instability where 'small numerical errors' of the order of 1e-6 have bad effects is worth investigating. Immediate suspects: mixing floats and doubles, loss of precision somewhere etc.
EDIT: So, depending on some parameters, your function has either a zero crossing, or a minimum with zero value, is this correct? In this case, what I'd do is this: use a simple and robust bracketing strategy (e.g. start from [-1, 1], multiply the endpoints by 1.1, check the signs, keep multiplying, something like this). If that succeeds, there's a zero crossing, use a root finding routine. If bracketing fails, use minimization.
Using Newton-Raphson is an act of desperation. You are much better off finding the continued fraction that represents your function and calculating that. A CF will converge much faster and will produce the real root(s). Also, because the CF produces a ratio of two integers you have tight control over numeric precision and don't have to worry about accumulation of rounding errors and other similar hair-pulling-out problems.
To find the real roots of any polynomial function refer to "A Continued Fraction Algorithm for Approximating All Real Polynomial Roots" by David Rosen (1978).
------------ ADDENDUM 1 --- 11 OCT-----------------
Ok, you are solving a sextic. You have several options. The simplest is to use a Taylor approximation (say to the 3rd degree) in conjunction with Halley's method. This is much superior to Newton because it has cubic convergence and you can detect imaginary solutions. The disadvantage is that you will have rounding problems which may result in an incorrect answer.
The ideal option is to find the continued fraction that represents the monic root, because this CF will be computable as an integer ratio of any desired precision, thus elminating the problem of rounding.
One approach to computing this CF is via the Jacobi-Perron algorithm. See the paper Hendy and Jeans: http://www.ams.org/mcom/1981-36-154/S0025-5718-1981-0606514-X/S0025-5718-1981-0606514-X.pdf. This paper shows the exact algorithm for computing cubic and quartic roots via CF approximation.
Note that if the sextic is reducible then it can converted into a quartic and quadratic: http://elib.mi.sanu.ac.rs/files/journals/tm/21/tm1124.pdf. The quartic is then solvable by the algorithm in the Hendy paper.
The general solution to generate a CF for a sextic can be done via the Rogers-Ramunajan CF. See the following paper for the method: http://arxiv.org/pdf/1111.6023v2. This will generate the CF for any sextic.
As in your case, you are interested in the real factorization of a real polynomial. One may see that all complex roots come in conjugate pairs which correspond to a real quadratic factor. By finding this real quadratic and completing the square to get the form (x-r)^2 + s you will be able to see the "real" even order root r with an "error" given by s. If s > 0 is too large, you may discard it as probably being complex. If s < 0 is also large, then you have two faraway real roots given by x = r ± √(-s). If s is very small then you might suspect r is a real double root and keep it.
Finding such a quadratic factor may be done using Bairstow's method, which actually applies a two-dimensional Newton method. This gives x^2 + ux + v and r = -u/2; s = v - r^2.

how to take the root of a very large number?

given x=4 and y=1296;
we need to solve for z in z^x=y;
we can calculate z=6 in various ways;
Question is how do I find z if y is a very large number greater than 10^100? I obviously can't store that number as int, so how would I go about calculating z?
C++ implementation would be nice, if not, any solution will work.
It depends on the accuracy required. Since 1e100 cannot be exactly represented by a double, you have a problem.
This works, if you are willing to accept that it does not yield an exact solution. But then, I just said that 1e100 is not represented exactly as a double anyway. Thus, in MATLAB,
exp(log(1e100)/4)
ans =
1e+25
Ok, so it looks like 1e25 is the answer, but is it really? In fact, the number we really get, in terms of a double, is: 10000000000000026675773440.
One problem is the original number was not represented exactly anyway. So 1e100, when stored in the IEEE format, is more accurately stored as something like this:
1.00000000000000001590289110975991804683608085639452813897813e100
To solve this exactly, you would best be served by a big integer form, but a big decimal form would do reasonably well too.
Thus, in MATLAB, using my big decimal (HPF) form we see that 1e100 is exactly represented in 100 digits of precision.
x = hpf('1e100',100)
x =
1.e100
And, to 100 digits of precision, the root is correct.
exp(log(x)/4)
ans =
10000000000000000000000000
Actually though, be careful, as any floating point form cannot represent real numbers exactly. To more precision, we see that the number computed was actually slightly in error:
9999999999999999999999999.9999999999999999999999999999999999999999999999999999999999999999999999999999999999800
A big integer form will yield an exact result, if one exists. Thus, using a big integer form, we see the expected result:
vpi(10)^100
ans =
10000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000
nthroot(vpi(10)^100,4)
ans =
10000000000000000000000000
The point is, to do the computation you desire, you need to use tools that can do the computation. There are many such big decimal or big integer tools to be had. For example, Java has a BigDecimal and a BigInteger form that I have used on occasion (though I've written my own tools anyway, thus in MATLAB, HPF and VPI.)
Maybe you can do something evil with logarithms
maybe there is a library that you can find that lets you deal with big integers
You can try to use Newton's method. In this case you need to use arbitrary-precision arithmetic.
I.e. you need to write class for arbitrary-precision number. It would be composition of mantissa, which is represented by array of digits and exponent, which is represented by integer. You should realize basic operations on numbers similar to pencil-and-paper methods. Then you should realize Newton's algoriithm as described in wiki.

How can I get consistent program behavior when using floats?

I am writing a simulation program that proceeds in discrete steps. The simulation consists of many nodes, each of which has a floating-point value associated with it that is re-calculated on every step. The result can be positive, negative or zero.
In the case where the result is zero or less something happens. So far this seems straightforward - I can just do something like this for each node:
if (value <= 0.0f) something_happens();
A problem has arisen, however, after some recent changes I made to the program in which I re-arranged the order in which certain calculations are done. In a perfect world the values would still come out the same after this re-arrangement, but because of the imprecision of floating point representation they come out very slightly different. Since the calculations for each step depend on the results of the previous step, these slight variations in the results can accumulate into larger variations as the simulation proceeds.
Here's a simple example program that demonstrates the phenomena I'm describing:
float f1 = 0.000001f, f2 = 0.000002f;
f1 += 0.000004f; // This part happens first here
f1 += (f2 * 0.000003f);
printf("%.16f\n", f1);
f1 = 0.000001f, f2 = 0.000002f;
f1 += (f2 * 0.000003f);
f1 += 0.000004f; // This time this happens second
printf("%.16f\n", f1);
The output of this program is
0.0000050000057854
0.0000050000062402
even though addition is commutative so both results should be the same. Note: I understand perfectly well why this is happening - that's not the issue. The problem is that these variations can mean that sometimes a value that used to come out negative on step N, triggering something_happens(), now may come out negative a step or two earlier or later, which can lead to very different overall simulation results because something_happens() has a large effect.
What I want to know is whether there is a good way to decide when something_happens() should be triggered that is not going to be affected by the tiny variations in calculation results that result from re-ordering operations so that the behavior of newer versions of my program will be consistent with the older versions.
The only solution I've so far been able to think of is to use some value epsilon like this:
if (value < epsilon) something_happens();
but because the tiny variations in the results accumulate over time I need to make epsilon quite large (relatively speaking) to ensure that the variations don't result in something_happens() being triggered on a different step. Is there a better way?
I've read this excellent article on floating point comparison, but I don't see how any of the comparison methods described could help me in this situation.
Note: Using integer values instead is not an option.
Edit the possibility of using doubles instead of floats has been raised. This wouldn't solve my problem since the variations would still be there, they'd just be of a smaller magnitude.
I've worked with simulation models for 2 years and the epsilon approach is the sanest way to compare your floats.
Generally, using suitable epsilon values is the way to go if you need to use floating point numbers. Here are a few things which may help:
If your values are in a known range you and you don't need divisions you may be able to scale the problem and use exact operations on integers. In general, the conditions don't apply.
A variation is to use rational numbers to do exact computations. This still has restrictions on the operations available and it typically has severe performance implications: you trade performance for accuracy.
The rounding mode can be changed. This can be use to compute an interval rather than an individual value (possibly with 3 values resulting from round up, round down, and round closest). Again, it won't work for everything but you may get an error estimate out of this.
Keeping track of the value and a number of operations (possible multiple counters) may also be used to estimate the current size of the error.
To possibly experiment with different numeric representations (float, double, interval, etc.) you might want to implement your simulation as templates parameterized for the numeric type.
There are many books written on estimating and minimizing errors when using floating point arithmetic. This is the topic of numerical mathematics.
Most cases I'm aware of experiment briefly with some of the methods mentioned above and conclude that the model is imprecise anyway and don't bother with the effort. Also, doing something else than using float may yield better result but is just too slow, even using double due to the doubled memory footprint and the smaller opportunity of using SIMD operations.
I recommend that you single step - preferably in assembly mode - through the calculations while doing the same arithmetic on a calculator. You should be able to determine which calculation orderings yield results of lesser quality than you expect and which that work. You will learn from this and probably write better-ordered calculations in the future.
In the end - given the examples of numbers you use - you will probably need to accept the fact that you won't be able to do equality comparisons.
As to the epsilon approach you usually need one epsilon for every possible exponent. For the single-precision floating point format you would need 256 single precision floating point values as the exponent is 8 bits wide. Some exponents will be the result of exceptions but for simplicity it is better to have a 256 member vector than to do a lot of testing as well.
One way to do this could be to determine your base epsilon in the case where the exponent is 0 i e the value to be compared against is in the range 1.0 <= x < 2.0. Preferably the epsilon should be chosen to be base 2 adapted i e a value that can be exactly represented in a single precision floating point format - that way you know exactly what you are testing against and won't have to think about rounding problems in the epsilon as well. For exponent -1 you would use your base epsilon divided by two, for -2 divided by 4 and so on. As you approach the lowest and the highest parts of the exponent range you gradually run out of precision - bit by bit - so you need to be aware that extreme values can cause the epsilon method to fail.
If it absolutely has to be floats then using an epsilon value may help but may not eliminate all problems. I would recommend using doubles for the spots in the code you know for sure will have variation.
Another way is to use floats to emulate doubles, there are many techniques out there and the most basic one is to use 2 floats and do a little bit of math to save most of the number in one float and the remainder in the other (saw a great guide on this, if I find it I'll link it).
Certainly you should be using doubles instead of floats. This will probably reduce the number of flipped nodes significantly.
Generally, using an epsilon threshold is only useful when you are comparing two floating-point number for equality, not when you are comparing them to see which is bigger. So (for most models, at least) using epsilon won't gain you anything at all -- it will just change the set of flipped nodes, it wont make that set smaller. If your model itself is chaotic, then it's chaotic.

Why would I use 2's complement to compare two doubles instead of comparing their differences against an epsilon value?

Referenced here and here...Why would I use two's complement over an epsilon method? It seems like the epsilon method would be good enough for most cases.
Update: I'm purely looking for a theoretical reason why you'd use one over the other. I've always used the epsilon method.
Has anyone used the 2's complement comparison successfully? Why? Why Not?
the second link you reference mentions an article that has quite a long description of the issue:
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
but unless you are tweaking performance I would stick with epsilon so people can debug your code
The bits method might be faster. I say might because on modern (multicore, highly pipelined) processors it is often impossible to guess what is really faster.
Code the simplest most obviously correct implementation, then measure, then optomise.
In short, when comparing two floats with unknown origins, picking an epsilon that is valid is almost impossible.
For example:
What is a good epsilon when comparing distance in miles between Atlanta GA, Dallas TX and some place in Ohio?
What is a good epsilon when comparing distance in miles between my left foot, my right foot and the computer under my desk?
EDIT:
Ok, I'm getting a fair number of people not understanding why you wouldn't know what your epsilon is.
Back in the old days of lore, I wrote two programs that worked with NeverWinter Nights (a game made by BioWare). One of the programs took a binary model and converted it to ASCII. The other program took an ASCII model and compiled it into binary. One of the tests I wrote was to take all of BioWare's binary models, decompile them to ASCII and then back to binary. Then I compared my binary version with original one from BioWare. One of the problems during the comparison was dealing with some of the slight variances in floating point values. So instead of coming up with a bunch of different EPSILONS for each type of floating point number (vertex, normal, etc), I wanted to use something such as this twos compliment compare. Thus avoiding the whole multiple EPSILON issue.
The same type of issue can apply to any type of software that processes 3rd party data and then needs to validate their results with the original. In these cases you might not even know what the floating point values represent, you just have to compare them. We ran into this issue with our industrial automation software.
EDIT:
LOL, this has been voted up and down by different people.
I'll boil the problem down to this, given two arbitrary floating point numbers, how do you decide what epsilon to use? You can't.
How can you compare 1e23 and 1.0001e23 with an epsilon and still compare 1e-23 and 5.2e-23 using the same epsilon? Sure, you can do some dynamic epsilon tricks, but that is the whole point to the integer compare (which does NOT require the integers be exact).
The integer compare is able to compare two floats using an epsilon relative to the magnitude of the numbers.
EDIT
Steve, lets look at what you said in the comments:
"But you know what equality means to you... Hence, you should be able to find an appropriate epsilon".
Turn this statement around to say:
"If you know what equality means to you, then you should be able to find an appropriate epsilon."
The whole point to what I am trying to say is that there are applications where we don't know what equality means in the absolute sense, thus we have to resort to a relative compare which is what the integer version is trying to do.
When it comes to speed, follow these rules:
If you're not a very experienced developer, don't optimize.
If you are an experienced developer, don't optimize yet.
Do the easiest method.
Alex
Oskar's right. Don't screw with this unless you really, really need that performance.
And you don't. If you were in the situation that did, you wouldn't have needed to ask the question -- you'd already know. If you think you do, then you don't. Your performance problems lie elsewhere. Just use the readable version.
Using any method that compares bitwise will result in trouble when fractions are represented by approximations. All floating point numbers with fractions that are not denominated in powers of two (1/2, 1/4, 1/8, 1/65536, &c) are approximated. So, of course, are all irrational numbers.
float third = 1/3;
float two=2.0;
float another_two=third*6.0;
if(two != another_two)
print ("Approximation!\n");
The only time comparing bitwise would work is when you derive the floating point numbers exactly the same way or they are exact representations (whole numbers, fraction powers of two). Even then, there can be multiple representations of some numbers, though I have never seen this in a working system.