I was wondering what method is used to multiply numbers in C++. Is it traditional schoolbook long multiplication? Fürer's algorithm? Toom-Cook?
I was wondering because I am going to need to multiply extremely large numbers and need a high degree of efficiency. Traditional schoolbook long multiplication, at O(n^2), might be too inefficient, and I would need to resort to another method of multiplication.
So what kind of multiplication does C++ use?
You seem to be missing several crucial things here:
There's a difference between native arithmetic and bignum arithmetic.
You seem to be interested in bignum arithmetic.
C++ doesn't support bignum arithmetic. The primitive data types generally map to arithmetic that is native to the processor.
To get bignum (arbitrary-precision) arithmetic, you need to implement it yourself or use a library (such as GMP). Unlike Java and C# (among others), C++ does not come with a built-in library for arbitrary-precision arithmetic.
All of those fancy algorithms:
Karatsuba: O(n^1.585)
Toom-Cook: < O(n^1.465)
FFT-based: ~ O(n log(n))
are applicable only to bignum arithmetic, and they are implemented in bignum libraries. What the processor uses for its native arithmetic operations is somewhat irrelevant, as it's usually constant time.
In any case, I don't recommend that you try to implement a bignum library. I've done it before and it's quite demanding (especially the math). So you're better off using a library.
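As an illustration, here is a minimal sketch using GMP's C++ interface (mpz_class from gmpxx.h; link with -lgmpxx -lgmp). The operand values are arbitrary:

#include <gmpxx.h>
#include <iostream>

int main() {
    // Construct two arbitrary-precision integers from decimal strings.
    mpz_class a("123456789012345678901234567890");
    mpz_class b("987654321098765432109876543210");

    // GMP chooses the multiplication algorithm (schoolbook, Karatsuba,
    // Toom-Cook, FFT-based) internally, depending on operand size.
    mpz_class c = a * b;

    std::cout << c << '\n';
    return 0;
}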
What do you mean by "extremely large numbers"?
C++, like most other programming languages, uses the multiplication hardware built into the processor. Exactly how that works is not specified by the C++ language. But for normal integers and floating-point numbers, you will not be able to write something faster in software.
The largest numbers that can be represented by the various data types can vary between different implementations, but some typical values are 2147483647 for int, 9223372036854775807 for long, and 1.79769e+308 for double.
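If you want to see the exact limits on your own implementation, std::numeric_limits will report them; for example:

#include <iostream>
#include <limits>

int main() {
    std::cout << std::numeric_limits<int>::max() << '\n';     // e.g. 2147483647
    std::cout << std::numeric_limits<long>::max() << '\n';    // e.g. 9223372036854775807
    std::cout << std::numeric_limits<double>::max() << '\n';  // e.g. 1.79769e+308
    return 0;
}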
In C++ integer multiplication is handled by the chip. There is no equivalent of Perl's BigNum in the standard language, although I'm certain such libraries do exist.
That all depends on the library and compiler used.
It is performed in hardware, which is also why huge numbers won't work. The largest unsigned integer C++ can represent on 64-bit hardware is 18446744073709551615 (2^64 - 1). If you need larger numbers, you need an arbitrary-precision library.
If you work with large numbers, the standard integer multiplication in C++ will no longer work, and you should use a library providing arbitrary-precision multiplication, like GMP: http://gmplib.org/
Also, you should not worry about performance before writing your application (premature optimization). These multiplications will be fast, and most likely other components of your software will cause much more slowdown.
Plain C++ uses the CPU's multiply instruction (or schoolbook multiplication using bit shifts and additions if your CPU does not have one).
If you need fast multiplication for large numbers, I would suggest looking at GMP (http://gmplib.org) and using the C++ interface from gmpxx.h.
Just how big are these numbers going to be? Even languages like Python can do 1e100*1e100 with arbitrary-precision integers over 3 million times a second on a standard processor. That's multiplication to 100 significant places taking less than a millionth of a second. To put that into context, there are only about 10^80 atoms in the observable universe.
Write what you want to achieve first, and optimise later if necessary.
I'm implementing a compression algorithm. The thing is, it is taking a second for a 20 KiB file, which is not acceptable. I think it's slow because of the calculations.
I need suggestions on how to make it faster. I have some tips already, like shifting bits instead of multiplying, but I really want to be sure of which changes actually help because of the complexity of the program. I also accept suggestions concerning compiler options, I've heard there is a way to make the program do faster mathematical calculations.
Common operations are:
pow(...) function of math library
large number % 2
multiplying large numbers
Edit: the program has no floating point numbers
The question of how to make things faster should not be asked here to other people, but rather in your environment, to a profiler. Use the profiler to determine where most of the time is spent; that will hint at which operations need to be improved, and then, if you don't know how to improve them, ask about those specific operations. It is almost impossible to say what you need to change without knowing your original code, and the question does not provide enough information:
pow(...) function: what are the arguments to the function? Is the exponent fixed? How much precision do you need? Can you replace the function with something that yields a similar result?
large number: how large is "large"? What is "number" in this context: integers? floating point?
Your question is very broad; without enough information to give you concrete advice, we have to make do with a general roadmap.
What platform, what compiler? What is "large number"? What have you done already, what do you know about optimization?
Test a release build with optimization (/Ox /LTCG in Visual C++, -O3 IIRC for gcc)
Measure where time is spent - disk access, or your actual compression routine?
Is there a better algorithm, and code flow? The fastest operation is the one not executed.
for 20K files, memory working set should not be an issue (unless your compression requires large data structures), so code optimization is indeed the next step
a modern compiler already implements a lot of optimizations, e.g. replacing a division by a power-of-two constant with a bit shift.
pow is very slow for native integers
if your code is well written, you may try to post it, maybe someone's up to the challenge.
Hints :-
1) Modulo 2 only looks at the last bit.
2) Power functions can be implemented in O(log n) time, where n is the power (the math library should be fast enough, though). Also, for fast power you may check this out (there is also a sketch after these hints).
If nothing works, just check if there exists some fast algorithm.
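As a sketch of both hints (assuming the exponent is a non-negative integer and the result fits in 64 bits):

#include <cstdint>

// Hint 1: n % 2 only depends on the last bit, so a mask does the same job.
bool is_odd(std::uint64_t n) {
    return (n & 1u) != 0;
}

// Hint 2: exponentiation by squaring, O(log exp) multiplications.
std::uint64_t ipow(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t result = 1;
    while (exp > 0) {
        if (exp & 1u)          // current bit of the exponent is set
            result *= base;
        base *= base;          // square the base for the next bit
        exp >>= 1;
    }
    return result;
}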
I want to compute 10 raised to the power minus m. Apart from using the math function pow(10, -m), is there any fast and efficient way to do that?
The reason I ask the C++ gurus on SO such a simple question is that, as you know, just like base 2, 10 is also a special base. Multiplying some value n by 10 to the power minus m is equivalent to moving n's decimal point m places to the left. I think there must be a fast and efficient way to handle this.
For floating point m, so long as your standard library implementation is well written, then pow will be efficient.
If m is an integer, and you hinted that it is, then you could use an array of pre calculated values.
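A minimal sketch of that table-based approach (assuming m is a small non-negative integer within the table's range):

#include <cassert>
#include <cstddef>

// Precomputed negative powers of ten; extend as far as you need
// (a double cannot represent anything much below 1e-308 anyway).
static const double kNegPow10[] = {
    1.0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9
};

double pow10_neg(std::size_t m) {
    assert(m < sizeof(kNegPow10) / sizeof(kNegPow10[0]));
    return kNegPow10[m];
}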
You should only be worrying about this kind of thing if that routine is a bottleneck in your code. That is if the calls to that routine take a significant proportion of the total running time.
Ten is not a special value on a binary machine, only two is. Use pow or exponentiation by squaring.
Unfortunately there is no fast and efficient way to calculate it using IEEE 754 floating point representation. The fastest way to get the result is to build a table for every value of m that you care about, and then just perform a lookup.
If there's a fast and efficient way to do it then I'm sure your CPU supports it, unless you're running on an embedded system in which case I'd hope that the pow(...) implementation is well written.
10 is special to us as most of us have ten fingers. Computers only have two digits, so 2 is special to them. :)
Use a lookup table; there can't be more than about a thousand distinct values, especially if m is an integer.
If you could operate with log n instead of n for a significant part of the computation, you could save time, because instead of
n = pow(10*n,-m)
you now have to calculate (using the definition l = log10(n))
l = -m*(l+1)
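A small sketch of what that looks like in code (the function names are just for illustration; l holds log10(n) throughout, and you convert back only when the final value is needed):

#include <cmath>

// Instead of repeatedly computing n = pow(10 * n, -m), keep l = log10(n):
double step_in_log_space(double l, double m) {
    return -m * (l + 1.0);           // log10((10*n)^-m) = -m * (log10(n) + 1)
}

// Convert back to n only when the final value is actually needed.
double to_linear(double l) {
    return std::pow(10.0, l);
}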
Just some more ideas which may lead you to further solutions...
If you are interested in optimization at the algorithm level, you might look for a parallelized approach.
You may speed up at the system/architecture level by using IPP (for Intel processors) or, e.g., the AMD Core Math Library (ACML) for AMD.
Using the power of your graphics card may be another way (e.g. CUDA for NVIDIA cards).
I think it's also worth looking at OpenCL.
IEEE 754 specifies a bunch of floating-point formats. Those that are in widespread use are binary, which means that base 10 isn't in any way special. This is contrary to your assumption that "10 is also a special base".
Interestingly, IEEE 754-2008 does add decimal floating-point formats (decimal32 and friends). However, I have yet to come across hardware implementations of those.
In any case, you shouldn't be micro-optimizing your code before you've profiled it and established that this is indeed the bottleneck.
I'm working on an application that does a lot of floating-point calculations. We use VC++ on Intel x86 with double precision floating-point values. We make claims that our calculations are accurate to n decimal digits (right now 7, but trying to claim 15).
We go to a lot of effort validating our results against other sources when our results change slightly (due to code refactoring, cleanup, etc.). I know that very many factors play into the overall precision, such as the FPU control state, the compiler/optimizer, the floating-point model, and the overall order of operations (i.e., the algorithm itself), but given the inherent uncertainty in FP calculations (e.g., 0.1 cannot be represented exactly), it seems invalid to claim any specific degree of precision for all calculations.
My question is this: is it valid to make any claims about the accuracy of FP calculations in general without doing any sort of analysis (such as interval analysis)? If so, what claims can be made and why?
EDIT:
So given that the input data is accurate to, say, n decimal places, can any guarantee be made about the result of any arbitrary calculations, given that double precision is being used? E.g., if the input data has 8 significant decimal digits, the output will have at least 5 significant decimal digits... ?
We are using math libraries and are unaware of any guarantees they may or may not make. The algorithms we use are not necessarily analyzed for precision in any way. But even given a specific algorithm, the implementation will affect the results (just changing the order of two addition operations, for example). Is there any inherent guarantee whatsoever when using, say, double precision?
ANOTHER EDIT:
We do empirically validate our results against other sources. So are we just getting lucky when we achieve, say, 10-digit accuracy?
As with all such questions, I have to just simply answer with the article What Every Computer Scientist Should Know About Floating-Point Arithmetic. It's absolutely indispensable for the type of work you are talking about.
Short answer: No.
Reason: Have you proved (yes proved) that you aren't losing any precision as you go along? Are you sure? Do you understand the intrinsic precision of any library functions you're using for transcendental functions? Have you computed the limits of additive errors? If you are using an iterative algorithm, do you know how well it has converged when you quit? This stuff is hard.
Unless your code uses only the basic operations specified in IEEE 754 (+, -, *, / and square root), you do not even know how much precision loss each call to library functions outside your control (trigonometric functions, exp/log, ...) introduces. Functions outside the basic five are not guaranteed to be, and usually are not, accurate to 1 ULP.
You can do empirical checks, but that's what they remain... empirical. Don't forget the part about there being no warranty in the EULA of your software!
If your software was safety-critical, and did not call library-implemented mathematical functions, you could consider http://www-list.cea.fr/labos/gb/LSL/fluctuat/index.html . But only critical software is worth the effort and has a chance to fit in the analysis constraints of this tool.
You seem, after your edit, mostly concerned about your compiler doing things behind your back. It is a natural fear to have (because, as with the mathematical functions, you are not in control). But it's rather unlikely to be the problem. Your compiler may compute at a higher precision than you asked for (80-bit extended when you asked for 64-bit doubles, or 64-bit doubles when you asked for 32-bit floats). This is allowed by the C99 standard. In round-to-nearest, this may introduce double-rounding errors. But it's only 1 ULP you are losing, and so infrequently that you needn't worry. This can cause puzzling behaviors, as in:
float x = 1.0f;
float y = 7.0f;
float z = x / y;          /* x/y is rounded to float when stored in z */
if (z == x / y)           /* here x/y may be recomputed at a higher precision */
    ...
else
    ...                   /* the else branch is taken */
but you were looking for trouble when you used == between floating-point numbers.
When you have code that does cancellations on purpose, such as in Kahan's summation algorithm:
d = (a+b)-a-b;
and the compiler optimizes that into d=0;, you have a problem. And yes, this optimization "as if floating-point operations were associative" has been seen in general compilers. It is not allowed by C99. But the situation has gotten better, I think. Compiler authors have become more aware of the dangers of floating point and no longer try to optimize so aggressively. Plus, if you were doing this in your code, you would not be asking this question.
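For reference, here is a minimal sketch of compensated (Kahan) summation; the (t - sum) - y line is exactly the kind of deliberate cancellation that an "associative" optimization would fold away to zero:

#include <vector>

// Compensated (Kahan) summation: c accumulates the low-order bits that
// plain addition would otherwise discard.
double kahan_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double c   = 0.0;                 // running compensation
    for (double x : values) {
        double y = x - c;             // apply the correction to the next term
        double t = sum + y;           // low-order bits of y are lost here...
        c = (t - sum) - y;            // ...and recovered by deliberate cancellation
        sum = t;
    }
    return sum;
}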
Given that your vendors of machines, compilers, run-time libraries, and operating systems don't make any such claim about floating-point accuracy, you should take that as a warning that your group should be leery of making claims that could come under harsh scrutiny if clients ever took you to court.
Without doing formal verification of the entire system, I would avoid such claims. I work on scientific software that has indirect human-safety implications, so we have considered such things in the past, and we do not make these sorts of claims.
You could make claims about the precision of double-length floating-point calculations in isolation, but they would be basically worthless.
Ref: The pitfalls of verifying floating-point computations from ACM Transactions on Programming Languages and Systems 30, 3 (2008) 12
No, you cannot make any such claim. If you wanted to do so, you would need to do the following:
Hire an expert in numerical computing to analyze your algorithms.
Either get your library and compiler vendors to open their sources to said expert for analysis, or get them to sign off on hard semantics and error bounds.
Double-precision floating point typically carries about 15 decimal digits of accuracy, but there are far too many ways for some or all of that accuracy to be lost, ways far too subtle for a non-expert to diagnose, to make the kind of claim you would like to make.
There are relatively simpler ways to keep running error bounds that would let you make accuracy claims about any specific computation, but making claims about the accuracy of all computations performed with your software is a much taller order.
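As one illustration of how quietly those digits can disappear, here is a tiny catastrophic-cancellation example (the constants are arbitrary; the printed value is what a typical IEEE-754 double produces):

#include <cstdio>

int main() {
    double x = 1.0 + 1e-15;      // the 1e-15 is already rounded away from its true value here
    double d = x - 1.0;          // subtracting nearby values exposes that rounding error
    std::printf("%.17g\n", d);   // prints about 1.1102230246251565e-15, not 1e-15:
                                 // roughly 10% relative error from a single add/subtract
    return 0;
}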
A double precision number on an Intel CPU has slightly better than 15 significant digits (decimal).
The potential error for a simple computation is in the ballpark of n/1.0e15, where n is the order of magnitude of the number(s) you are working with. I suspect that Intel has specs for the accuracy of CPU-based FP computations.
The potential error for library functions (like cos and log) is usually documented. If not, you can look at the source code (e.g. the GNU source) and calculate it.
You would calculate error bars for your calculations just as you would for manual calculations.
Once you do that, you may be able to reduce the error by judicious ordering of the computations.
Since you seem to be concerned about accuracy of arbitrary calculations, here is an approach you can try: run your code with different rounding modes for floating-point calculations. If the results are pretty close to each other, you are probably okay. If the results are not close, you need to start worrying.
The maximum difference in the results will give you a lower bound on the accuracy of the calculations.
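A minimal sketch of that approach using <cfenv> (compute() is a hypothetical stand-in for your real calculation; strictly conforming code should also enable #pragma STDC FENV_ACCESS ON where the compiler supports it):

#include <cfenv>
#include <cstdio>

// Hypothetical stand-in for the real computation under test.
double compute() {
    double s = 0.0;
    for (int i = 1; i <= 1000000; ++i)
        s += 1.0 / i;
    return s;
}

int main() {
    const int modes[] = { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
    for (int mode : modes) {
        std::fesetround(mode);                 // change the rounding mode...
        std::printf("%.17g\n", compute());     // ...and compare the results
    }
    std::fesetround(FE_TONEAREST);             // restore the default
    return 0;
}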
I am writing an application in which, in a certain block, I need to exponentiate reals around 3*500*500 times. When I use the exp(y*log(x)) algorithm, the program noticeably lags. It is significantly faster if I use another algorithm based on playing with data types, but that algorithm isn't very precise; although it provides decent results for the simulation, it's still not perfect in terms of speed.
Is there any precise exponentiation algorithm for real powers faster than exp(y*log(x))?
Thank you in advance.
If you need good accuracy, and you don't know anything about the distribution of bases (x values) a priori, then pow(x, y) is the best portable answer (on many -- not all -- platforms, this will be faster than exp(y*log(x)), and is also better behaved numerically). If you do know something about what ranges x and y can lie in, and with what distribution, that would be a big help for people trying to offer advice.
The usual way to do it faster while keeping good accuracy is to use a library routine designed to do many of these computations simultaneously for an array of x values and an array of y values. The catch is that such library implementations tend to cost money (like Intel's MKL) or be platform-specific (vvpowf in the Accelerate.framework on OS X, for example). I don't know much about MinGW, so someone else will need to tell you what's available there. The GSL may have something along these lines.
Depending on your algorithm (in particular if you have few to no additions), sometimes you can get away with working (at least partially) in log-space. You've probably already considered this, but if your intermediate representation is log_x and log_y then log(x^y) = exp(log_y) * log_x, which will be faster. If you can even be selective about it, then obviously computing log(x^y) as y * log_x is even cheaper. If you can avoid even just a few exponentiations, you may win a lot of performance. If there's any way to rewrite whatever loops you have to get the exponentiation operations outside of the innermost loop, that's a fairly certain performance win.
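For instance, if the same base x is raised to many different exponents in the inner loop, hoisting log(x) out of that loop is a cheap win. A sketch with a hypothetical row/column layout (rows*cols would correspond to the 3*500*500 figure):

#include <cmath>

// out[i*cols + j] = x[i] ^ y[i*cols + j], computing log(x[i]) once per row
// instead of once per exponentiation.
void exponentiate_rows(const double* x, const double* y,
                       double* out, int rows, int cols) {
    for (int i = 0; i < rows; ++i) {
        const double lx = std::log(x[i]);        // hoisted out of the inner loop
        for (int j = 0; j < cols; ++j)
            out[i * cols + j] = std::exp(y[i * cols + j] * lx);
    }
}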