I am trying to write a short C++ routine to calculate the following function F(i,j,z) for given integers j > i (typically they lie between 0 and 100) and complex number z (bounded by |z| < 100), where L are the associated Laguerre Polynomials:
The issue is that I want this function to be callable from within a CUDA kernel (i.e. with a __device__ attribute). Standard library/Boost/etc functions are therefore out of the questions, unless they are simple enough to re-implement on my own - this especially relates to the Laguerre polynomials which exist in Boost and C++17. Regardless if I manage to wrap any standard function for Laguerre polynomials, I still have a similar pre-factor to calculate of the form (z^j/j!).
Question: How can I do a relatively simple implementation of such a function, without introducing significant numerical instability?
My idea so far is to calculate L and its pre-factor independently. The pre-factor I will calculate by first looping from 0 to j-i and calculate (z^1 * z^2/2 * ... * z^(j-1)/(j-i)!). I will then calculate the remaining factor exp(-|z|^2/2) *(j-i)! * sqrt(i!/j!) (either in a similar way, or through the Gamma-function, which is implemented in CUDA math). The idea is then to find a minimal algorithm to calculate the associated Laguerre polynomial, unless I manage to wrap an implementation from e.g. Boost or GNU C++.
Edit/side note: The expression for F actually blows up numerically for some values of i/j. It was derived wrong in the source where I got it, and the indices of the associated Laguerre polynomials should instead be L_i^(j-i). That does not invalidate the approaches suggested in the answers/comments.
I recommend finding a recurrence relation for the coefficients of the Laguerre Polynomial:
C(k+1) = g(k)C(k)
g(k) = C(k+1) / C(k)
g(k) = -z * (j - k) / ((j - i + k + 1) * (k + 1)) //Verify this yourself :)
This allows you to avoid most of factorials in computing the polynomial.
After that I would follow Severin's idea of doing the calculations in logarithms
so as to not overload the double floating point range:
log(F) = log(sqrt(i!/j!)) - |z|^2 + (j-i) * log(-z) + log(L(|z|^2))
log(L) = log((2*j - i)!) + log(sum) // where the summation is computed using the recurrence relation above
and using the fact that:
log(a!) = sum(k=1..a, log(k))
and also:
log(z) = log(|z|) + I * arg(z) for complex z
log(-z) = log(|z|) + I * arg(-z)
log(-z) = log(|z|) - I * arg(z)
for the log(sqrt(i!/j!)) part I would do (assuming that j >= i):
log(sqrt(i!/j!))
= 0.5 * (log(i!) - log(j!))
= -0.5 * sum(k==i+1..j, log(k))
I haven't tried this out so there could definitely be little mistakes here and there. This answer is more about the technique rather than a copy-paste-ready answer
Well, what you should do is to logarithm it
Assuming natural logarithm,
q = log(z^j/j!) = log(z^j) - log(j!) = j*log(z) - log(Gamma(j+1))
First term is simple, second term is standard C++ function lgamma(x) (or you could use GSL).
compute value of q and return cexp(q)
You could fold exponent in this method as well
I am trying to use PPVAL function from the IMSL library, to evaluate a piecewise polynomial, similar to MATLAB's ppval function.
Sadly, I could not understand well the documentation on how to properly use the function.
I tried to evaluate a simple polynomial.
The polynomials p(x) = 1 and p(x) = x were ok. What I don't understand is when I try to get higher degree polynomials.
The program below tries to evaluate the polynomial p(x)=x^4:
program main
include 'LINK_FNL_SHARED.h'
USE PPVAL_INT
implicit none
double precision :: x(11), y(11)
double precision :: BREAK(2),PPCOEF(5,1)
integer :: ii
break = [0.0d0,10.0d0] ! breakpoints in increasing order
PPCOEF(:,1) = [0.0d0,0.0d0,0.0d0,0.0d0,1.0d0] ! polynomial coefficients
x = dble([0,1,2,3,4,5,6,7,8,9,10])
do ii=1,11
y(ii) = PPVAL(x(ii),break,PPCOEF)
end do
print *, 'x = ', x
print *, 'y = ', y
pause
end program main
But the function returns the polynomial p(x) = x^4/4!.
For degrees 2,3 and 5, the return is always, p(x) = x^2/2!, p(x)=x^3/3!, p(x)=x^5/5!. Why does this factor appear in the ppval function? Why can't I just supply the polynomial coefficients and evaluate it?
Is there another simpler function to evaluate polynomials in Fortran, like MATLAB polyval?
I'm attempting to write a program that calculates the value of sin(0.75) using the Taylor series, until the absolute difference between the calculated value and FORTRAN'S intrinsic sin function is less then 1E-6, printing each iteration to the screen. (it also needs to print the absolute difference at each step, but I haven't gotten round to implementing that yet!). Here's my code:
program taylor
implicit none
real :: x = 0.75
if (abs(x - sin(0.75)) > 10.00**(-7)) then
print *, x
x = x - ((x**3)/6)
x = x + ((x**5)/120)
x = x - ((x**7)/5040)
end if
end program taylor
This just prints out 0.75 to the terminal, why is this? How can I fix it so that it does what I specified above?
Thanks, any help is appreciated.
I realized that the resolution of the input signal dramatically affects the results of the convolution. I'm wondering if there is a way to compensate somehow for this. Let me give you an example:
Lets take the Sersic equation:
Sersic
with, for example, parameters.
Now we solve this equation both for a R step of 0.1 and 0.01. For example for the 1st point (R=0) we get \mu(0) = 9.82.
The next step is to convolve the data, after converting it into counts (to convert it to counts we can use this simple equation: Data(R) = 10^((\mu(R)-25)/(-2.5)). I'm using the bellow mentioned subroutine that I wrote but I tried with others and I get the same result (the PSF is Moffat with FWHM = 0.5 arcsec and its constructed in a way that its total area equals 1):
sum1 = 0
DO i = 1,n
sum1 = 0
g = i
DO f = 1,i
sum1(f) = Data(f)*PSF(g)
g = i - f
ENDDO
convData(i) = sum(sum1)
ENDDO
convData = convData(n:1:-1)
So, for this example, for the data with 0.1 resolution after convolution (and after reconverting the counts to \mu) I get for \mu(0)* = 13.52. For the data with 0.01 resolution I get \mu(0)* = 15.52. This is 2 magnitudes difference!! What am I doing wrong or how can I somehow compensate for this effect?
Thank you so much for the help!
I have the following code :
cv::Mat temp0 = R.t();
cv::Mat temp1 = R * temp0;
cv::Mat temp2(960, 960, CV_32FC1);
temp2 = temp1.inv();
cv::Size s = temp1.size();
std::cout<<s.height<<" "<<s.width<<std::endl;
std::cout<<cv::format(temp1, "numpy" ) << std::endl;
std::cout<<cv::format(temp2, "numpy" ) << std::endl;
The Transpose works correctly, so does the matrix multiplication. Thus the Mat temp1 has a size of 960x960. However, when I do temp2 =temp1.inv(), I recieve all zeroes in temp2. I mean zeroes is all of the 960x960 cells. Also, R is of type CV_32FC1 only. So it is probably not a datatype issue. I cannot understand the issue here. I googled so much. Can you please help.
EDIT
I am copying below the gdb output for the Mat::inv() function. I am having a hard time figuring it all out, but if someone is more familiar with OpenCV, maybe it will be of help :)
Breakpoint 1, CreateShares::ConstructShares (this=0x80556d0, channel=..., k=2, n=4) at CreateShares.cpp:165
165 temp2 = temp1.inv();
(gdb) step
cv::Mat::operator= (this=0xbffff294, e=...) at /usr/include/opencv2/core/mat.hpp:1373
1373 e.op->assign(e, *this);
(gdb)
1374 return *this;
(gdb) step
1375 }
(gdb) step
cv::MatExpr::~MatExpr (this=0xbfffef64, __in_chrg=<optimized out>) at /usr/include/opencv2/core/mat.hpp:1167
1167 class CV_EXPORTS MatExpr
(gdb) step
cv::Mat::~Mat (this=0xbfffefdc, __in_chrg=<optimized out>) at /usr/include/opencv2/core/mat.hpp:295
295 release();
(gdb) step
cv::Mat::release (this=0xbfffefdc) at /usr/include/opencv2/core/mat.hpp:381
381 if( refcount && CV_XADD(refcount, -1) == 1 )
(gdb) step
383 data = datastart = dataend = datalimit = 0;
(gdb) step
384 size.p[0] = 0;
(gdb) step
385 refcount = 0;
(gdb) step
386 }
(gdb) step
cv::Mat::~Mat (this=0xbfffefdc, __in_chrg=<optimized out>) at /usr/include/opencv2/core/mat.hpp:296
296 if( step.p != step.buf )
(gdb) step
298 }
(gdb) step
cv::Mat::~Mat (this=0xbfffefa4, __in_chrg=<optimized out>) at /usr/include/opencv2/core/mat.hpp:295
295 release();
(gdb) step
cv::Mat::release (this=0xbfffefa4) at /usr/include/opencv2/core/mat.hpp:381
381 if( refcount && CV_XADD(refcount, -1) == 1 )
(gdb) step
383 data = datastart = dataend = datalimit = 0;
(gdb) step
384 size.p[0] = 0;
(gdb) step
385 refcount = 0;
(gdb) step
386 }
(gdb) step
cv::Mat::~Mat (this=0xbfffefa4, __in_chrg=<optimized out>) at /usr/include/opencv2/core/mat.hpp:296
296 if( step.p != step.buf )
(gdb) step
298 }
(gdb) step
cv::Mat::~Mat (this=0xbfffef6c, __in_chrg=<optimized out>) at /usr/include/opencv2/core/mat.hpp:295
295 release();
(gdb) step
cv::Mat::release (this=0xbfffef6c) at /usr/include/opencv2/core/mat.hpp:381
381 if( refcount && CV_XADD(refcount, -1) == 1 )
(gdb) step
383 data = datastart = dataend = datalimit = 0;
(gdb) step
384 size.p[0] = 0;
(gdb) step
385 refcount = 0;
(gdb) step
386 }
(gdb) step
cv::Mat::~Mat (this=0xbfffef6c, __in_chrg=<optimized out>) at /usr/include/opencv2/core/mat.hpp:296
296 if( step.p != step.buf )
(gdb) step
298 }
(gdb) step
CreateShares::ConstructShares (this=0x80556d0, channel=..., k=2, n=4) at CreateShares.cpp:167
167 cv::Size s = temp1.size();
(gdb) step
cv::Mat::MSize::operator() (this=0xbffff284) at /usr/include/opencv2/core/mat.hpp:705
705 return Size(p[1], p[0]);
(gdb) step
cv::Size_<int>::Size_ (this=0xbffff2f8, _width=960, _height=960) at /usr/include/opencv2/core/operations.hpp:1624
1624 : width(_width), height(_height) {}
(gdb) step
cv::Mat::MSize::operator() (this=0xbffff284) at /usr/include/opencv2/core/mat.hpp:706
706 }
(gdb) step
Most likely, the determinant is zero.
From Wikipedia:
A square matrix that is not invertible is called singular or
degenerate. A square matrix is singular if and only if its determinant
is 0.
You can display the determinant like so...
std::cout<<"determinant(temp1)="<<cv::determinant(temp1)<<"\n";
From the documentation for Mat::inv(), there are three methods to choose from:
DECOMP_LU (default) is the LU decomposition. The matrix must be non-singular.
DECOMP_CHOLESKY is the Cholesky decomposition for symmetrical positively defined matrices only. This type is about twice faster than LU on big matrices.
DECOMP_SVD is the SVD decomposition. If the matrix is singular or even non-square, the pseudo inversion is computed.
From the documentation for invert(), which is presumably used internally by Mat::inv():
In the case of DECOMP_LU method, the function returns the src
determinant ( src must be square). If it is 0, the matrix is not
inverted and dst is filled with zeros.
This agrees with the results that you are seeing.
notes about the math
I'm no mathematician, but I get the impression that inverting a matrix can be a messy business -- all the more so if your matrix is very large. In fact, it may be true that these inverses exist in principle, but are practically impossible to calculate with any accuracy. In running some experiments with your code, I found that in many cases I would get determinants that were not exactly zero, but were very close to zero -- perhaps indicating that numerical precision may be the limiting factor. I tried specifying the matrices using 64-bit values instead of 32, and got different, but not necessarily better answers.
It may be useful to recognize that, based on the way you are calculating the temp1 matrix, it will always be symmetric. The DECOMP_CHOLESKY method is specifically designed to work on symmetric positive definite matrices, so using that might provide some advantages.
Experimentally, I found that normalizing (as suggested by #cedrou) makes it more likely that the inverse function returns a non-zero matrix (with DECOMP_LU but not with DECOMP_CHOLESKY). However, based on my guesses of how you might be initializing the R matrix, the resulting matrices never seemed to satisfy the definition of an inverse: A*inverse(A)=Identity. But you don't necessarily care about that -- which is perhaps why the SVD method computes a pseudo inverse.
Finally, it seems that this deeper question of why inversion is failing might be a math question rather than a programming question. Based on that I did some searching on the math site, and it turns out that someone has already asked how to do this very thing: https://math.stackexchange.com/questions/182662
notes about debugging
Based on your debug trace, I am inclined to think that the part that you are interested in was compiled into a non-traceable library and skipped over when you ran step. In other words, that mysterious blank line after your first step represents the part where it actually ran the inv() function. After that it is assigning the result to temp2 and destructing temporary objects. So your debug trace doesn't tell us anything about what is happening inside of inv().
165 temp2 = temp1.inv();
(gdb) step
cv::Mat::operator= (this=0xbffff294, e=...) at /usr/include/opencv2/core/mat.hpp:1373
1373 e.op->assign(e, *this);
I ran a debugger on this myself and was able to trace through the inner call to invert() and watch it decide to fail based on an internal analysis of the matrix (determining that it was not invertible) -- and therefore return a matrix filled with zeros, matching what you have reported.
The invert() function is defined in cxlapack.cpp, in case you are interested in taking a look at the source code.
For random matrix R product R^T*R may be singular. So any kind of LU-decomposition will stop prematurely, resulting in zero output.
To overcome this, one may invert matrix R^T*R+alpha*I. Here I is identity matrix, alpha - some positive number. If alpha is close zero and R^T*R is not singular, inverse of R^T*R+alpha*I is close to inverse of R^T*R. For details, see Tikhonov regularization
Another case is matrix R^T*R being not singular but ill-conditioned. Condition number for large unstructured matrix may be tremendous, resulting in weird behavior of matrix inversion (LU-decomposition works correctly only for relatively small condition numbers).
About normalizing
Normalizing matrix will improve inversion behavior because it decreases condition number.
I've came to the same conclusion than #berak. I hope the following experiments will help you.
I tried your code with a matrix filled with some random values (normal distribution centered on 0 with a standard dev of sigma. As sigma is increasing, the values of the final matrix decrease.
Here is the code I used:
cv::Mat R(960,960, CV_32FC1);
double sigma[] = { 1.0, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, 10000000.0 };
cv::Scalar mean[_countof(sigma)] = {0};
cv::Scalar stdv[_countof(sigma)] = {0};
for (int i = 0; i < _countof(sigma); i++)
{
cv::randn(R, cv::Scalar::all(0.0), cv::Scalar::all(sigma[i]));
cv::Mat temp2 = (R * R.t()).inv();
cv::meanStdDev(temp2, mean[i], stdv[i]);
}
Here are the mean and standard dev of the output matrix for increasing sigmas:
sigma mean stddev
1.0 3.94e-004 1.32
10.0 1.25e-004 3.82e-002
100.0 3.32e-007 1.09e-004
1000.0 2.40e-009 2.23e-006
10000.0 9.82e-012 1.05e-008
100000.0 2.23e-013 1.73e-010
1000000.0 1.44e-015 2.88e-012
10000000.0 9.61e-017 2.77e-014
So, a solution for you would be to normalize your input matrix so that all the values fit into [0;1] or [-1;1].