double precision functions within the compute shader [duplicate] - c++

In GLSL there's rudimentary support for double precision variables and operations, which can be found here. However, they also mention: "Double-precision versions of angle, trigonometry, and exponential functions are not supported."
Is there a simple workaround for this, or do I have to write my own functions from scratch?

This link seems to be the best answer:
So yes, you'll need to make your own implementation for those functions.
glibc source may be your friend.
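Since the series-based implementations in glibc use only arithmetic operations, the same approach ports directly to a GLSL compute shader that has double support. Here is a minimal C++ sketch of a double-precision sin() built from +, -, *, floor and polynomial evaluation only (floor does support doubles in GLSL, unlike the trig built-ins); the range reduction and coefficients are illustrative, not glibc's actual code:

```cpp
#include <cassert>
#include <cmath>

// Sketch: double-precision sin() using only arithmetic ops and floor(),
// so the same logic can be transcribed into a GLSL compute shader.
// Accurate to roughly 1e-11 for moderate arguments.
double my_sin(double x) {
    const double PI     = 3.14159265358979323846;
    const double TWO_PI = 6.28318530717958647692;

    // Reduce x into [-pi, pi].
    double r = x - TWO_PI * std::floor(x / TWO_PI + 0.5);

    // Fold into [-pi/2, pi/2], where the Taylor series converges quickly.
    if (r >  PI / 2) r =  PI - r;
    if (r < -PI / 2) r = -PI - r;

    // Taylor series of sin around 0, terms through x^15.
    double x2 = r * r;
    return r * (1.0 + x2 * (-1.0/6 + x2 * (1.0/120 + x2 * (-1.0/5040
             + x2 * (1.0/362880 + x2 * (-1.0/39916800
             + x2 * (1.0/6227020800.0 + x2 * (-1.0/1307674368000.0))))))));
}
```

The floor()-based reduction loses accuracy for very large arguments; a production version would use an extended-precision reduction there.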

Related

Is there a way to enable fixed-point calculations in Box2D?

I am using Box2D for a game that communicates to a server and I need complete determinism. I would simply like to use integer math/fixed-point math to achieve this and I was wondering if there was a way to enable that in Box2D.
Yes. Albeit with a fixed-point implementation and with modifications to the Box2D library code.
The C++ library code for Box2D 2.3.2 uses the float32 type for its implementation of real-number-like values. As float32 is defined in b2Settings.h via a typedef (to the C++ float type), it can be changed on that one line to use a different underlying implementation of real-number-like values.
Unfortunately some of the code (like b2Max) is used or written in ways that break if float32 is not defined to be a float. So then those errors have to be chased down and the errant code rewritten such that the new type can be used.
I have done this sort of work myself including writing my own fixed-point implementation. The short of this is that I'd recommend using a 64-bit implementation with between 14 to 24 bits for the fractional portion of values (at least to make it through most of the Testbed tests without unusable amounts of underflow/overflow issues). You can take a look at my fork to see how I've done this but it's not presently code that's ready for release (not as of 2/11/2017).
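To illustrate the kind of type float32 could be re-typedef'd to, here is a minimal sketch of a 64-bit fixed-point value with 24 fractional bits (in the 14-24-bit range recommended above). The struct name and layout are illustrative, not Box2D's or the fork's actual code, and the multiply/divide widen through GCC/Clang's non-standard __int128:

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Sketch: 64-bit fixed-point number with 24 fractional bits (Q40.24).
// All arithmetic is integer arithmetic, so results are bit-identical
// across machines -- the property needed for lockstep determinism.
struct Fixed {
    static const int FRAC = 24;
    std::int64_t raw;

    Fixed() : raw(0) {}
    explicit Fixed(double v)
        : raw(static_cast<std::int64_t>(v * (1LL << FRAC))) {}
    double toDouble() const {
        return static_cast<double>(raw) / (1LL << FRAC);
    }

    Fixed operator+(Fixed o) const { Fixed r; r.raw = raw + o.raw; return r; }
    Fixed operator-(Fixed o) const { Fixed r; r.raw = raw - o.raw; return r; }
    Fixed operator*(Fixed o) const {
        // Widen to 128 bits so the intermediate product cannot overflow.
        Fixed r;
        r.raw = static_cast<std::int64_t>(
            (static_cast<__int128>(raw) * o.raw) >> FRAC);
        return r;
    }
    Fixed operator/(Fixed o) const {
        // Pre-shift the dividend to keep the fractional bits.
        Fixed r;
        r.raw = static_cast<std::int64_t>(
            (static_cast<__int128>(raw) << FRAC) / o.raw);
        return r;
    }
};
```

A real replacement also needs comparison operators, unary minus, and fixed-point versions of sqrt and the trig functions that Box2D calls.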
The only way to achieve determinism in a physics engine is to use a fixed-timestep update. You can read more in this link: http://saltares.com/blog/games/fixing-your-timestep-in-libgdx-and-box2d/

GLSL Double Precision Angle, Trig and Exponential Functions Workaround

In GLSL there's rudimentary support for double precision variables and operations, which can be found here. However, they also mention: "Double-precision versions of angle, trigonometry, and exponential functions are not supported."
Is there a simple workaround for this, or do I have to write my own functions from scratch?
This link seems to be the best answer:
So yes, you'll need to make your own implementation for those functions.
glibc source may be your friend.

Matlab - Cumulative distribution function (CDF)

I'm in the middle of a code translation from Matlab to C++, and for some important reasons I must obtain the cumulative distribution function of a normal distribution (in Matlab, 'norm') with mean=0 and variance=1.
The implementation in Matlab is something like this:
map.c = cdf( 'norm', map.c, 0,1 );
which is supposed to perform the histogram equalization of map.c.
The problem comes at the moment of translating it into C++, because of the limited precision I get. I tried several typical CDF implementations, such as the C++ code I found here (Cumulative Normal Distribution Function in C/C++), but the results lacked too many decimals, so I tried the Boost implementation:
#include "boost/math/distributions.hpp"
boost::math::normal_distribution<> d(0,1);
but it still doesn't match Matlab's output (if anything, it seems to be even more precise!)
Does anyone know where I could find the original Matlab source for such a process, or which is the correct amount of decimals I should consider?
Thanks in advance!
The Gaussian CDF is an interesting function. I do not know whether my answer will interest you, but it is likely to interest others who look up your question later, so here it is.
One can compute the CDF by integrating the Taylor series of the PDF term by term. This approach works well in the body of the Gaussian bell curve, only it fails numerically in the tails. In the tails, it wants special-function techniques. The best source I have read for this is N. N. Lebedev's Special Functions and Their Applications, Ch. 2, Dover, 1972.
Octave is an open-source Matlab clone. Here's the source code for Octave's implementation of normcdf: http://octave-nan.sourcearchive.com/documentation/1.0.6/normcdf_8m-source.html
It should be (almost) the same as Matlab's, if it helps you.
C and C++ support long double for a more precise floating point type. You could try using that in your implementation. You can check your compiler documentation to see if it provides an even higher precision floating point type. GCC 4.3 and higher provide __float128 which has even more precision.

SSE2: Double precision log function

I need open source (no restriction on license) implementation of log function, something with signature
__m128d _mm_log_pd(__m128d);
It is available in Intel Short Vector Math Library (part of ICC), but ICC is neither free nor open source. I am looking for implementation using intrinsics only.
It should use special rational function approximations. I need something almost as accurate as cmath log, say 9-10 decimal digits, but faster.
I believe log2 is easier to compute. You can multiply/divide your number by a power of two (very quick) such that it lies in (0.5, 2], and then use a Pade approximant (take M close to N), which is easy to derive once and for all and whose order you can choose according to your needs. You only need arithmetic operations, which you can do with SSE intrinsics. Don't forget to add/remove a constant according to the above scaling factor.
If you want natural log, divide by log2(e), that you can compute once and for all.
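Here is a scalar C++ sketch of that range-reduction scheme, using the atanh-style series in z = (m-1)/(m+1) in place of a tuned Pade fit (deriving actual Pade coefficients is left out); every operation below maps onto an _mm_*_pd intrinsic, and frexp would become exponent-bit manipulation with integer SSE2 ops in a vectorized version:

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of a fast log: split x = m * 2^e with m in
// [sqrt(0.5), sqrt(2)), approximate log(m) with the rapidly converging
// series in z = (m-1)/(m+1), then add back e * ln(2).
// Accurate to roughly 1e-12 -- beyond the 9-10 digits asked for.
double my_log(double x) {
    int e;
    double m = std::frexp(x, &e);          // x = m * 2^e, m in [0.5, 1)
    if (m < 0.70710678118654752) {         // re-center m around 1
        m *= 2.0;
        e -= 1;
    }
    double z = (m - 1.0) / (m + 1.0);      // |z| <= 0.1716 after reduction
    double z2 = z * z;
    // log(m) = 2*(z + z^3/3 + z^5/5 + ...), truncated after z^13.
    double s = 2.0 * z * (1.0 + z2 * (1.0/3 + z2 * (1.0/5 + z2 * (1.0/7
             + z2 * (1.0/9 + z2 * (1.0/11 + z2 * (1.0/13)))))));
    const double LN2 = 0.69314718055994530942;
    return s + e * LN2;
}
```

The series needs only multiplies, adds, and one divide, exactly the operations SSE2 provides for __m128d; a minimax or Pade fit of the same degree would squeeze out a little more accuracy per term.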
It is not rare to see custom log functions in some specific projects. Standard library functions address the general case, but you need something more specific. I sincerely think it is not that hard to do it yourself.
Take a look at AMD LibM. It isn't open source, but free. AFAIK, it works on Intel CPUs. On the same web page you find a link to ACML, another free math lib from AMD. It has everything from AMD LibM + Matrix algos, FF and distributions.
I don't know any open source implementation of double precision vectorized math functions. I guess the Intel and AMD libs are hand-optimised by the CPU manufacturer, and everyone uses them when speed is important. IIRC, there was an attempt to implement intrinsics for vectorized math functions in GCC. I don't know how far they managed to get. Obviously, it isn't a trivial task.
Framewave project is Apache 2.0 licensed and aims to be the open source equivalent of Intel IPP. It has implementations that are close to what you are looking for.
Check the fixed accuracy arithmetic functions in the documentation.
Here's the counterpart for __m256d: https://stackoverflow.com/a/45898937/1915854 . It should be pretty trivial to cut it to __m128d. Let me know if you encounter any problems with this.
Or you can view my implementation as something obtaining two __m128d numbers at once.
If you cannot find an existing open source implementation it is relatively easy to create your own using the standard method of a Taylor series. See Wikipedia for this and a variety of other methods.

Matrix classes in c++

I'm doing some linear algebra math, and was looking for some really lightweight and simple to use matrix class that could handle different dimensions: 2x2, 2x1, 3x1 and 1x2 basically.
I presume such class could be implemented with templates and using some specialization in some cases, for performance.
Anybody know of any simple implementation available for use? I don't want "bloated" implementations, as I'll be running this in an embedded environment where memory is constrained.
Thanks
You could try Blitz++ -- or Boost's uBLAS
I've recently looked at a variety of C++ matrix libraries, and my vote goes to Armadillo.
The library is heavily templated and header-only.
Armadillo also leverages templates to implement a delayed evaluation framework (resolved at compile time) to minimize temporaries in the generated code (resulting in reduced memory usage and increased performance).
However, these advanced features are only a burden to the compiler and not your implementation running in the embedded environment, because most Armadillo code 'evaporates' during compilation due to its design approach based on templates.
And despite all that, one of its main design goals has been ease of use - the API is deliberately similar in style to Matlab syntax (see the comparison table on the site).
Additionally, although Armadillo can work standalone, you might want to consider using it with LAPACK (and BLAS) implementations available to improve performance. A good option would be for instance OpenBLAS (or ATLAS). Check Armadillo's FAQ, it covers some important topics.
A quick search on Google dug up this presentation showing that Armadillo has already been used in embedded systems.
std::valarray is pretty lightweight.
I use the Newmat libraries for matrix computations. It's open source and easy to use, although I'm not sure it fits your definition of lightweight (it includes over 50 source files, which Visual Studio compiles into a 1.8 MB static library).
CML matrix is pretty good, but may not be lightweight enough for an embedded environment. Check it out anyway: http://cmldev.net/?p=418
Another option, although it may be too late, is:
https://launchpad.net/lwmatrix
I for one wasn't able to find simple enough library so I wrote it myself: http://koti.welho.com/aarpikar/lib/
I think it should be able to handle different matrix dimensions (2x2, 3x3, 3x1, etc.) by simply setting some rows or columns to zero. It won't be the fastest approach, since internally all operations are done with 4x4 matrices. Although in theory there might exist processors that can handle a 4x4 operation in one tick. At least I would rather count on the existence of such processors than go optimizing those low-level matrix calculations. :)
How about just store the matrix in an array, like
2x3 matrix = {2,3,val1,val2,...,val6}
This is really simple, and addition operations are trivial. However, you need to write your own multiplication function.
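The template-based approach suggested in the question can be sketched in a few lines: with the dimensions as compile-time constants, storage is a plain stack array with no overhead, and mismatched products are rejected at compile time. A minimal illustrative sketch, not a full library:

```cpp
#include <cassert>
#include <cstddef>

// Minimal fixed-size matrix template. Dimensions are template
// parameters, so storage is a plain R*C stack array -- no heap use,
// suitable for memory-constrained embedded targets.
template <std::size_t R, std::size_t C>
struct Mat {
    double v[R][C];

    double& operator()(std::size_t r, std::size_t c)       { return v[r][c]; }
    double  operator()(std::size_t r, std::size_t c) const { return v[r][c]; }
};

// Multiplication is only defined when the inner dimensions match,
// so e.g. a 2x3 * 2x3 product fails to compile.
template <std::size_t R, std::size_t K, std::size_t C>
Mat<R, C> operator*(const Mat<R, K>& a, const Mat<K, C>& b) {
    Mat<R, C> out = {};
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j)
            for (std::size_t k = 0; k < K; ++k)
                out(i, j) += a(i, k) * b(k, j);
    return out;
}
```

Usage: `Mat<2,2> a = {{{1,2},{3,4}}}; Mat<2,1> b = {{{5},{6}}}; Mat<2,1> c = a * b;` gives the 2x1 result, covering the 2x2, 2x1, 3x1 and 1x2 shapes from the question with one template.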