I am writing code for an ARM target which uses a lot of floating point operations and trigonometric functions. AFAIK floating point calculations are MUCH slower than int (especially on ARM). Accuracy is not crucial.
I thought about implementing my own trigonometric functions using a scaling factor (e.g. the range 0 to 2*pi becomes int 0 to 1024) and lookup tables. Is that a good approach?
Are there any alternatives?
Target platform is an Odroid U2 (Exynos 4412) running Ubuntu and lots of other stuff (web server etc.).
(C++11 and Boost/other libraries allowed)
If your target platform has a math library, use it. If it is any good, it was written by experts who were considering speed. You should not base code design on guesses about what is fast or slow. If you do not have actual measurements or processor specifications, and you do not know that the trigonometric functions in your application are consuming a lot of time, then you do not have a good reason to replace the math library.
Floating-point instructions typically have longer latencies than integer instructions, but they are pipelined so that throughput may be comparable. (E.g., a floating-point unit might have four stages to do the work, so an instruction takes four cycles to work through all the stages, but you can push a new instruction into the first stage in each cycle.) Whether the pipelining is sufficient to provide performance on a par with an integer implementation depends greatly on the target processor, the algorithm being used, and the skill of the implementor.
If it is beneficial in your case to use custom implementations of the math routines, then how they should be designed is hugely dependent on circumstances. Proper advice depends on the domain to support (Just 0 to 2π? –2π to +2π? Possibly larger values, which have to be folded to -π to π?), which special cases need to be supported (Propagate NaNs?), the accuracy required, what else is happening in the processor (Is a lot of memory in use or can we rely on a lookup table remaining in cache?), and more.
A significant part of the trigonometric routines is handling various cases (NaNs, infinities, small values) and reducing arguments modulo 2π. It may be possible to implement stripped-down routines that do not handle special cases or perform argument reduction but still use floating-point.
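For illustration, here is a minimal sketch of such a stripped-down routine (my own example, not taken from any particular library): a truncated Taylor polynomial for sine that skips NaN/infinity handling and assumes the caller has already reduced the argument to [-pi/2, pi/2].

// Truncated Taylor series x - x^3/6 + x^5/120, evaluated in Horner form.
// Assumes |x| <= pi/2 (the caller must reduce the argument); no NaN/Inf handling.
// Accurate to roughly three decimal digits at the edge of the range, better near zero.
static inline float fast_sin(float x)
{
    const float x2 = x * x;
    return x * (1.0f - x2 * (1.0f / 6.0f - x2 * (1.0f / 120.0f)));
}

At x = pi/2 the error is around 5e-4, which may be acceptable given that accuracy is stated not to be crucial; widening the domain requires either more terms or an explicit argument-reduction step.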
The Exynos 4412 uses the Cortex-A9 core [1], which has fully pipelined single- and double-precision floating-point. There is no reason to resort to integer operations, as there was with some older ARM cores.
Depending on your specific accuracy requirements (and especially if you can guarantee that the inputs fall into a limited range), you may be able to use approximations that are significantly faster than the implementations available in the standard library. More information about your exact usage would be necessary to give sound advice.
[1] http://en.wikipedia.org/wiki/Exynos_(system_on_chip)
One possible alternative is trigint:
trigint download
trigint doxygen
You should use "fixed point" math rather than floating point.
Most ARM processors (ARM7 and above) give you 32 bits of resolution for fixed-point math, so you could get to 1E-3 radians quite easily. But the real question is how much accuracy you need in the results.
Whether to use lookup tables, lookup tables with interpolation, or functions depends on how much data space you have on your system. Lookup tables give the fastest execution but use the most data space. Functions use the least data but require the most execution time. Interpolation is a middle ground: smaller tables at the cost of some extra processing.
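As a rough illustration of the table-plus-interpolation variant (all names, table sizes, and scale factors below are assumptions of mine, not an established API), here is a sketch in the spirit of the question's 0..1023 angle scaling: angles are 1/1024ths of a full turn and results are Q15 fixed point.

#include <cstdint>
#include <cmath>

static const int TABLE_SIZE = 256;          // entries covering one full turn
static int16_t sin_table[TABLE_SIZE + 1];   // extra entry simplifies interpolation

// Call once at startup to fill the table (floating point is used only here).
void init_sin_table()
{
    const double pi = 3.14159265358979323846;
    for (int i = 0; i <= TABLE_SIZE; ++i)
        sin_table[i] = (int16_t)std::lround(32767.0 * std::sin(2.0 * pi * i / TABLE_SIZE));
}

// angle: 0..1023 represents 0..2*pi; returns sine scaled to Q15 (-32767..32767).
int16_t fixed_sin(uint16_t angle)
{
    uint32_t a    = angle & 1023u;      // fold into one turn
    uint32_t idx  = a >> 2;             // top 8 bits index the table
    int32_t  frac = (int32_t)(a & 3u);  // low 2 bits interpolate between entries
    int32_t  s0   = sin_table[idx];
    int32_t  s1   = sin_table[idx + 1];
    return (int16_t)(s0 + (s1 - s0) * frac / 4);
}

A 256-entry int16_t table costs about half a kilobyte; trading the table size against the interpolation width is exactly the data-space versus execution-time trade-off described above.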
I've coded a number of integer multiplication routines for Atmel's AVR architecture. I found it useful, if not fully convincing, to follow a simple pattern for the multiplier (and a similar one for the multiplicand): start at zero and step by a one in every byte (in addition to eventual carries).
There seems to be quite a bit written about testing hardware multiplier implementations, but:
What can be recommended for testing software implementations of integer multiplication? Exhaustive testing gets out of hand - if not at 16×16 bits, then certainly beyond.
Most approaches use Generate & Test:
Generate test operands
The safest option is to use all combinations of operands, but with bignums, or with limited computing power or memory, that is not possible or practical. In that case the usual approach is to pick test cases that are likely to stress the tested algorithm (carry propagation, values near the limits, ...) and use only those. Another option is to use pseudo-random operands and hope for a valid test sample, or to combine all of these in some way.
Compute the multiplication
Just apply the tested algorithm to the generated input data.
Assess the results
You need to compare the algorithm's (multiplication) result against something. Usually a different algorithm for the same operation is used as the reference, or inverse functions (like c=a*b; if (a!=c/b) ...). The problem is that you cannot distinguish an error in the tested algorithm from an error in the reference, unless the reference is known to be 100% correct or you compare against more than one of them. There is also the possibility of a precomputed result table (computed, bug free, on a different platform) to compare against. A sketch of such a harness appears after the hints below.
The AVR tag makes this a tough question, so some hints:
many MCUs have a lot of Flash program memory that can be used to store precomputed values to compare against
sometimes running in an emulator is faster and more comfortable, but at the risk of missing some HW-related issues.
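To tie the three steps together, here is a minimal sketch of such a harness, meant to run on a host PC or in an emulator. soft_mul16 is a hypothetical name for the routine under test; the placeholder body only exists so the sketch links, and the reference result simply comes from the compiler's native multiplication.

#include <cstdint>
#include <cstdio>
#include <random>

// Routine under test: 16x16 -> 32 bit software multiply.
// Placeholder body so the sketch compiles; replace with the real implementation.
uint32_t soft_mul16(uint16_t a, uint16_t b) { return (uint32_t)a * b; }

int main()
{
    const uint16_t edges[] = { 0, 1, 2, 0x00FF, 0x0100, 0x7FFF, 0x8000, 0xFFFF };
    std::mt19937 rng(12345);
    std::uniform_int_distribution<uint32_t> dist(0, 0xFFFF);
    long failures = 0;

    // Edge cases: all pairs of "interesting" operands (carry propagation, limits).
    for (uint16_t a : edges)
        for (uint16_t b : edges)
            if (soft_mul16(a, b) != (uint32_t)a * b) {
                ++failures;
                std::printf("FAIL %u * %u\n", (unsigned)a, (unsigned)b);
            }

    // Pseudo-random sample of the rest of the operand space.
    for (int i = 0; i < 1000000; ++i) {
        uint16_t a = (uint16_t)dist(rng);
        uint16_t b = (uint16_t)dist(rng);
        if (soft_mul16(a, b) != (uint32_t)a * b) {
            ++failures;
            std::printf("FAIL %u * %u\n", (unsigned)a, (unsigned)b);
        }
    }
    std::printf("%ld failures\n", failures);
    return failures != 0;
}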
I've heard that there are many problems with floats/doubles on different CPUs.
If I want to make a game that uses floats for everything, how can I be sure the float calculations are exactly the same on every machine, so that my simulation will look exactly the same on every machine?
I am also concerned about writing/reading files or sending/receiving the float values to different computers. What conversions must be done, if any?
I need to be 100% sure that my float values are computed exactly the same, because even a slight difference in the calculations will result in a totally different future. Is this even possible?
Standard C++ does not prescribe any details about floating point types other than range constraints, and possibly that some of the maths functions (like sine and exponential) have to be correct up to a certain level of accuracy.
Other than that, at that level of generality, there's really nothing else you can rely on!
That said, it is quite possible that you will not actually require binarily identical computations on every platform, and that the precision and accuracy guarantees of the float or double types will in fact be sufficient for simulation purposes.
Note that you cannot even produce a reliable result of an algebraic expression inside your own program when you modify the order of evaluation of subexpressions, so asking for the sort of reproducibility that you want may be a bit unrealistic anyway. If you need real floating point precision and accuracy guarantees, you might be better off with an arbitrary precision library with correct rounding, like MPFR - but that seems unrealistic for a game.
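To make the order-of-evaluation point concrete, here is a tiny (contrived) example of my own: summing the same three doubles in two different orders produces two different results.

#include <cstdio>

int main()
{
    double a = 1e16, b = -1e16, c = 1.0;
    std::printf("%g\n", (a + b) + c);   // prints 1: a and b cancel first, then c is added
    std::printf("%g\n", a + (b + c));   // prints 0: c is absorbed by the huge magnitude of b
    return 0;
}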
Serializing floats is an entirely different story, and you'll have to have some idea of the representations used by your target platforms. If all platforms were in fact to use IEEE 754 floats of 32 or 64 bit size, you could probably just exchange the binary representation directly (modulo endianness). If you have other platforms, you'll have to think up your own serialization scheme.
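For the case where all platforms do use 32-bit IEEE 754 floats, a sketch of the "exchange the bit pattern directly" idea might look like this (the function names are my own); the value is shipped as an explicitly little-endian byte sequence, so endianness never depends on the host:

#include <cstdint>
#include <cstring>

// Serialize a float as 4 little-endian bytes (assumes 32-bit IEEE 754 floats).
void write_float_le(float f, uint8_t out[4])
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // reinterpret the bits, no conversion
    out[0] = (uint8_t)(bits);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)(bits >> 16);
    out[3] = (uint8_t)(bits >> 24);
}

float read_float_le(const uint8_t in[4])
{
    uint32_t bits = (uint32_t)in[0]
                  | (uint32_t)in[1] << 8
                  | (uint32_t)in[2] << 16
                  | (uint32_t)in[3] << 24;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}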
What every programmer should know: http://docs.sun.com/source/806-3568/ncg_goldberg.html
I've got some image processing code in C++ which calculates gradients and finds straight lines in them with the hough transformation algorithm. The program does most of the calculations with floats.
When I run this code on the same image on two different computers, one a Pentium IV running the latest Fedora, the other a Core i5 running the latest Ubuntu, both 32 bit, I get slightly different results. E.g., after some lengthy calculation I end up with 1.3456f for some variable on one machine and 1.3457f on the other. Is this expected behavior or should I search for errors in my program?
My first guess was that I was accessing some uninitialized or out-of-bounds memory, but I ran the program through valgrind and it didn't find any errors; also, running it multiple times on the same machine always gives the same results.
This is not uncommon and it will depend on your compiler, optimisation settings, math libraries, CPU, and of course the numerical stability of the algorithms that you are using.
You need to have a good idea of your accuracy requirements and if you are not meeting these then you may need to look at your algorithms and e.g. consider using double rather than float where needed.
For background on why given source code might not result in the same output on different computers, see What Every Computer Scientist Should Know About Floating-Point Arithmetic. I doubt this is due to any deficiency of your code unless it performs aggregation in a non-deterministic way, e.g. by centrally collating calculation results from multiple threads.
Floating point behaviour is often tunable per compiler options, even to the level of different CPUs. Check your compiler docs to see if you can reduce or eliminate the discrepancy. On Visual C++ (for example) this is done via /fp.
Is it due to a phenomenon called machine epsilon?
http://en.wikipedia.org/wiki/Machine_epsilon
There are limitations on floating-point numbers. The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
Basically, the same C++ instructions can be compiled to different machine instructions (even on the same CPU and certainly on different CPUs) depending on a large number of factors, and the same machine instructions can lead to different low-level CPU actions depending on a large number of factors. In theory, these are supposed to be semantically equivalent, but with floating-point numbers, there are edge cases where they aren't.
Read "The pitfalls of verifying floating-point computations" by David Monniaux for details.
I will also say that this is very common, and probably not your fault.
I spent a lot of time in the past trying to figure out the same problem.
I would suggest using a decimal type instead of float and double, as long as your numbers do not come from scientific calculations but are values like prices, quantities, exchange rates, etc.
This is totally normal, unfortunately.
There are libraries which can produce identical results everywhere (see http://www.mpfr.org/ for an example), but the performance cost is substantial and it's probably not worth it unless exact identical results are the most important criterion.
I've actually written a closed-source library which implemented floating-point math in the integer unit, in order to make floats provide identical results on multiple platforms (Intel, AMD, PowerPC) across different compilers. We had an app which simply could not function if floating-point results varied. It was quite a challenge, though. If we could do it again we'd have just designed the original app in fixed-point, but at the time it was too much code to rewrite.
Either this is a difference in the internal representation of the float, producing slightly different results, or perhaps it is a difference in the way the float is printed to the screen. I doubt that it is your fault...
I'm working on an application that does a lot of floating-point calculations. We use VC++ on Intel x86 with double precision floating-point values. We make claims that our calculations are accurate to n decimal digits (right now 7, but trying to claim 15).
We go to a lot of effort validating our results against other sources when our results change slightly (due to code refactoring, cleanup, etc.). I know that many, many factors play into the overall precision, such as the FPU control state, the compiler/optimizer, the floating-point model, and the overall order of operations (i.e., the algorithm itself), but given the inherent uncertainty in FP calculations (e.g., 0.1 cannot be represented exactly), it seems invalid to claim any specific degree of precision for all calculations.
My question is this: is it valid to make any claims about the accuracy of FP calculations in general without doing any sort of analysis (such as interval analysis)? If so, what claims can be made and why?
EDIT:
So given that the input data is accurate to, say, n decimal places, can any guarantee be made about the result of any arbitrary calculations, given that double precision is being used? E.g., if the input data has 8 significant decimal digits, the output will have at least 5 significant decimal digits... ?
We are using math libraries and are unaware of any guarantees they may or may not make. The algorithms we use are not necessarily analyzed for precision in any way. But even given a specific algorithm, the implementation will affect the results (just changing the order of two addition operations, for example). Is there any inherent guarantee whatsoever when using, say, double precision?
ANOTHER EDIT:
We do empirically validate our results against other sources. So are we just getting lucky when we achieve, say, 10-digit accuracy?
As with all such questions, I have to just simply answer with the article What Every Computer Scientist Should Know About Floating-Point Arithmetic. It's absolutely indispensable for the type of work you are talking about.
Short answer: No.
Reason: Have you proved (yes proved) that you aren't losing any precision as you go along? Are you sure? Do you understand the intrinsic precision of any library functions you're using for transcendental functions? Have you computed the limits of additive errors? If you are using an iterative algorithm, do you know how well it has converged when you quit? This stuff is hard.
Unless your code uses only the basic operations specified in IEEE 754 (+, -, *, / and square root), you do not even know how much precision loss each call to library functions outside your control (trigonometric functions, exp/log, ...) introduces. Functions outside the basic 5 are not guaranteed to be, and usually are not, accurate to 1 ULP.
You can do empirical checks, but that's what they remain... empirical. Don't forget the part about there being no warranty in the EULA of your software!
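One way to run such an empirical check (a sketch of my own, with all names invented here): compare the single-precision library function against a higher-precision reference over a sweep of inputs and record the worst difference in units of the last place.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Distance in ULPs between two finite floats of the same sign,
// measured via their bit patterns.
int32_t ulp_distance(float a, float b)
{
    int32_t ia, ib;
    std::memcpy(&ia, &a, sizeof ia);
    std::memcpy(&ib, &b, sizeof ib);
    return std::abs(ia - ib);
}

int main()
{
    int32_t worst = 0;
    // Sweep a range where sin stays well away from zero so the
    // same-sign assumption in ulp_distance holds.
    for (float x = 0.5f; x < 2.5f; x += 1e-5f) {
        float fast = std::sin(x);                 // single-precision result
        float ref  = (float)std::sin((double)x);  // double-precision reference, rounded back
        worst = std::max(worst, ulp_distance(fast, ref));
    }
    std::printf("worst observed difference: %d ulp\n", (int)worst);
    return 0;
}

This only samples the domain, so it remains exactly what the answer says: an empirical observation, not a guarantee.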
If your software was safety-critical, and did not call library-implemented mathematical functions, you could consider http://www-list.cea.fr/labos/gb/LSL/fluctuat/index.html . But only critical software is worth the effort and has a chance to fit in the analysis constraints of this tool.
You seem, after your edit, mostly concerned about your compiler doing things behind your back. It is a natural fear to have (because, as with the mathematical functions, you are not in control). But it's rather unlikely to be the problem. Your compiler may compute with a higher precision than you asked for (80-bit extended when you asked for 64-bit doubles, or 64-bit doubles when you asked for 32-bit floats). This is allowed by the C99 standard. In round-to-nearest, this may introduce double-rounding errors. But it's only 1 ULP you are losing, and so infrequently that you needn't worry. This can cause puzzling behaviors, as in:
float x=1.0;
float y=7.0;
float z=x/y;
if (z == x/y)
...
else
... /* the else branch is taken */
but you were looking for trouble when you used == between floating-point numbers.
When you have code that does cancellations on purpose, such as in Kahan's summation algorithm:
d = (a+b)-a-b;
and the compiler optimizes that into d=0;, you have a problem. And yes, this optimization "as if float operations were associative" has been seen in general compilers. It is not allowed by C99. But the situation has gotten better, I think. Compiler authors have become more aware of the dangers of floating-point and no longer try to optimize so aggressively. Plus, if you were doing this in your code you would not be asking this question.
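For reference, a minimal sketch of Kahan's compensated summation (my own rendition, not tied to any particular codebase); the whole point is that the cancellation in the compensation step must not be optimized away, so it has to be compiled without "fast math"-style flags:

#include <vector>

double kahan_sum(const std::vector<double>& v)
{
    double sum = 0.0;
    double c   = 0.0;            // running compensation for lost low-order bits
    for (double x : v) {
        double y = x - c;        // apply the correction captured so far
        double t = sum + y;      // big + small: low-order bits of y are lost here...
        c = (t - sum) - y;       // ...and recovered here (algebraically zero, numerically not)
        sum = t;
    }
    return sum;
}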
Given that your vendors of machines, compilers, run-time libraries, and operating systems don't make any such claim about floating point accuracy, you should take that as a warning that your group should be leery of making claims that could come under harsh scrutiny if clients ever took you to court.
Without doing formal verification of the entire system, I would avoid such claims. I work on scientific software that has indirect human safety implications, so we have considered such things in the past, and we do not make these sorts of claims.
You could make claims about the precision of double-length (64-bit) floating-point calculations in the abstract, but they would be basically worthless.
Ref: "The pitfalls of verifying floating-point computations", ACM Transactions on Programming Languages and Systems 30, 3 (2008), article 12.
No, you cannot make any such claim. If you wanted to do so, you would need to do the following:
Hire an expert in numerical computing to analyze your algorithms.
Either get your library and compiler vendors to open their sources to said expert for analysis, or get them to sign off on hard semantics and error bounds.
Double-precision floating-point typically carries about 15 decimal digits of accuracy, but there are far too many ways for some or all of that accuracy to be lost, in ways far too subtle for a non-expert to diagnose, to make the kind of claim you would like to make.
There are relatively simpler ways to keep running error bounds that would let you make accuracy claims about any specific computation, but making claims about the accuracy of all computations performed with your software is a much taller order.
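As an illustration of what a running error bound can look like (a sketch under my own assumptions, loosely following the standard first-order analysis for recursive summation, and ignoring higher-order terms): after each addition, accumulate the magnitude of the partial sum, then scale the total by the unit roundoff at the end.

#include <cmath>
#include <limits>
#include <vector>

struct SumWithBound {
    double value;        // the computed sum
    double error_bound;  // first-order bound on |computed - exact|
};

SumWithBound sum_with_running_bound(const std::vector<double>& v)
{
    const double u = std::numeric_limits<double>::epsilon() / 2;  // unit roundoff
    double s = 0.0, t = 0.0;
    for (double x : v) {
        s += x;                 // each addition may err by up to u * |partial sum|
        t += std::fabs(s);
    }
    return { s, u * t };
}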
A double precision number on an Intel CPU has slightly better than 15 significant digits (decimal).
The potential error for a simple computation is in the ballpark of n/1.0e15, where n is the order of magnitude of the number(s) you are working with. I suspect that Intel has specs for the accuracy of CPU-based FP computations.
The potential error for library functions (like cos and log) is usually documented. If not, you can look at the source code (e.g. the GNU source) and calculate it.
You would calculate error bars for your calculations just as you would for manual calculations.
Once you do that, you may be able to reduce the error by judicious ordering of the computations.
Since you seem to be concerned about accuracy of arbitrary calculations, here is an approach you can try: run your code with different rounding modes for floating-point calculations. If the results are pretty close to each other, you are probably okay. If the results are not close, you need to start worrying.
The maximum difference in the results will give you a lower bound on the accuracy of the calculations.
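A minimal sketch of that rounding-mode experiment (the compute() function here is just a stand-in for the real calculation, and without FENV_ACCESS support a compiler is technically allowed to ignore the mode changes):

#include <cfenv>
#include <cmath>
#include <cstdio>

// Stand-in for the real calculation whose sensitivity we want to probe.
double compute()
{
    double s = 0.0;
    for (int i = 1; i <= 1000000; ++i)
        s += 1.0 / ((double)i * i);
    return s;
}

int main()
{
    const int modes[] = { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
    double lo = HUGE_VAL, hi = -HUGE_VAL;
    for (int m : modes) {
        std::fesetround(m);                 // change the rounding direction
        double r = compute();
        if (r < lo) lo = r;
        if (r > hi) hi = r;
    }
    std::fesetround(FE_TONEAREST);          // restore the default
    std::printf("spread across rounding modes: %g\n", hi - lo);
    return 0;
}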
I am writing an application where, in a certain block, I need to exponentiate reals around 3*500*500 times. When I use the exp(y*log(x)) algorithm, the program noticeably lags. It is significantly faster if I use another algorithm based on playing with data types, but that algorithm isn't very precise; although it provides decent results for the simulation, it's still not perfect in terms of speed.
Is there any precise exponentiation algorithm for real powers faster than exp(y*log(x))?
Thank you in advance.
If you need good accuracy, and you don't know anything about the distribution of bases (x values) a priori, then pow(x, y) is the best portable answer (on many -- not all -- platforms, this will be faster than exp(y*log(x)), and is also better behaved numerically). If you do know something about what ranges x and y can lie in, and with what distribution, that would be a big help for people trying to offer advice.
The usual way to do it faster while keeping good accuracy is to use a library routine designed to do many of these computations simultaneously for an array of x values and an array of y values. The catch is that such library implementations tend to cost money (like Intel's MKL) or be platform-specific (vvpowf in the Accelerate.framework on OS X, for example). I don't know much about MinGW, so someone else will need to tell you what's available there. The GSL may have something along these lines.
Depending on your algorithm (in particular if you have few to no additions), sometimes you can get away with working (at least partially) in log-space. You've probably already considered this, but if your intermediate representation is log_x and log_y then log(x^y) = exp(log_y) * log_x, which will be faster. If you can even be selective about it, then obviously computing log(x^y) as y * log_x is even cheaper. If you can avoid even just a few exponentiations, you may win a lot of performance. If there's any way to rewrite whatever loops you have to get the exponentiation operations outside of the innermost loop, that's a fairly certain performance win.
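If one of the operands happens to be fixed across the 500×500 block (an assumption on my part about the access pattern), then even without full log-space intermediates you can hoist the logarithm out of the inner loop, which already removes half of the transcendental calls:

#include <cmath>
#include <cstddef>
#include <vector>

// Raise a fixed base x to many exponents: one log per base,
// one exp and one multiply per element.
void pow_fixed_base(double x, const std::vector<double>& ys, std::vector<double>& out)
{
    const double lx = std::log(x);
    out.resize(ys.size());
    for (std::size_t i = 0; i < ys.size(); ++i)
        out[i] = std::exp(ys[i] * lx);
}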