CUDA convolutionFFT2D example - I can't understand it

CUDA convolutionFFT2D example - I can't understand it - c++

I studied the Cooley Tukey algorithm and I understood it. I got everything in the CUDA convolutionFFT2D example till these kernels:
spProcess2D calls -> spProcess2D_kernel which calls a lot of -> spPostprocessC2C, mulAndScale and spPreprocessC2C
Here's the complete code:
http://nopaste.info/30c13e44fe.html (convolutionFFT2D.cu, here is the spProcess2D function)
http://nopaste.info/78d22afac2.html (convolutionFFT2D.cuh, here are the other functions)
I already read all the nvidia sdk papers but I can't still figure out what these function do (they use twiddles, but nothing seems like a Cooley Tukey algorithm there)
Please help me if you can, or at least point me out where to solve my problem
Update: I found this link: http://cnx.org/content/m16336/latest/#uid38
Maybe these functions are performing a breadth-first algorithm? I still can't say that but the shape seems the same

It looks like the algorithm is doing something similar to the algorithm mentioned here. The preprocess step looks to be re-ordering the Real input of size N (after padding) to complex input of size N/2. The postprocess step is re-ordering the data to get back the FFT of the original
input array.

spPostprocessC2C looks like a single FFT butterfly. The complexity in the calling routines just comes from fitting the FFT algorithm into a SIMT model for CUDA.
Perhaps if you explained what it is that you are trying to achieve (beyond just understanding how this particular FFT implementation works) then you might get some more specific answers.

Related

equivalent of wavedec (matlab function) in opencv

I am trying to rewrite a matlab code to cpp and I still blocked with this line :
[c, l]=wavedec(S,4,'Dmey');
Is there something like that in opencv ?
if someone have an idea about it try to share it with and thanks in advance.

Maybe if you would integrate your codes with Python, then PyWavelet might be an option.
I'm seeing the function that you're looking for is in there. That function is called Discrete Meyer (Dmey). Not sure what you're planning to do with that though, maybe you're processing some images or something, but Dmey is OK, not very widely used. You might want to just find some GitHub codes and integrate to whatever you're doing to see if it would work first, and based on those you can also change the details of your currently posted function (might find something more efficient).
1-D wavelet decomposition (wavedec)
In your code, c and l stand for coefficients and level. You're passing level four with a Dmey function. If you'd have one dimensional data, the following map is how your decomposition would look like, roughly I guess:
There are usually two types of decomposition models that are being used in Wavelets, one is called packet which is similar to a Full Binary Tree, from architecture standpoint:
The other one, which is the one you're most likely using, is less computationally expensive, because it does not decompose both branches of a tree, if you will. It'd just do the mathematical decomposition in one branch of the tree. Maybe, these images would shed some lights:
1 D
2 D
Notes:
If you have a working model in MatLab, you might want to see the C/C++ Code Generation in MatLab, will automatically convert MatLab codes to C++.
References:
Images are from Wikipedia or mathworks.com
Wiki
mathworks
Wavelet 2D

FastMarching method from MATLAB alternative in OpenCV C++

I am currently working on image processing software from MATLAB code to C++ with OpenCV.
I came upon the imseggmm function in MATLAB (Link) wich use the internal MEX fastMarchingMex wich return a distance map.
There seems to be no equivalent at this day in OpenCV, and implementations using OpenCV are not very common (in fact, inexistants on the web)
My question is : does others functions that do the same exact job exists even with Superior computation time ? (computation time is not an issue in my case)
EDIT : as pointed by comment, my question is not very clear.
Let's say that for me, performance is not important, I only search a function that output the same exact results as the FastMarching method.
I tested the DistanceTransform function wich did not returned the same results at all with my test sample.
There seems to be something I don't understand in the process of the FastMarching method, that differ from others distance calculation methods.
My MATLAB code use fastMarching providing it a binary image of zeros and ones, and an index of those non-zero points (in linear array coordinates). It return a distance map.
Please consider also this is the first time I post a question on StackOverflow, I hope I have not been anoying and that my question, if it got an answer, will help other people to find an alternative to this function.

Least Median of Squares robust regression C++

I have a set of data z(0), z(1), z(2)...,z(n) that I am currently fitting with a 2 variables polynomial of the kind p(x,y) = a(1)*x^2+a(2)*y^2+a(3)*x*y+a(4). I have i=1,...,n (x(i),y(i)) coordinates that I impose to be p(x(i),y(i))=z(i). In this way I have a Overdetermined System that I can solve using Eigen SVD . I am looking for a more sophisticated method that can take care of outliers, like a Least Median of Squares robust regression (as described here) but I haven't found a C++ implementation for 2 variables. I looked in GSL but it seems there is nothing for 2 variable functions. The only other solution I can think of is using a TGraph2D in ROOT. Do you know any other solution? Numerical recipes maybe? Since I am writing C++ code I would prefer C or C++ implementations.

Since non answer has been given yet, but I am still working on this problem, I will share my progresses here.
The class TLinearFitter has a fit method that allows you to select Robust fitting - Least Trimmed Squares regression (LTS):
https://root.cern.ch/root/html532/TLinearFitter.html
Another possible solution, more time consuming maybe, but maybe more efficient on the long run is to write my own function to be minimized, and the use:
https://projects.coin-or.org/Ipopt to minimize it. Although in this approach there is a bigger "step". I don't know how to use the library and I haven't (yet?) found a nice tutorial to understand it.
here: https://wis.kuleuven.be/stat/robust/software there is a Fortran implementation of the LMedS algorithm called PROGRESS. So another possible solution could be to port this software to C/C++ and make a library out of it.

What is a fast simple solver for a large Laplacian matrix?

I need to solve some large (N~1e6) Laplacian matrices that arise in the study of resistor networks. The rest of the network analysis is being handled with boost graph and I would like to stay in C++ if possible. I know there are lots and lots of C++ matrix libraries but no one seems to be a clear leader in speed or usability. Also, the many questions on the subject, here and elsewhere seem to rapidly devolve into laundry lists which are of limited utility. In an attempt to help myself and others, I will try to keep the question concise and answerable:
What is the best library that can effectively handle the following requirements?
Matrix type: Symmetric Diagonal Dominant/Laplacian
Size: Very large (N~1e6), no dynamic resizing needed
Sparsity: Extreme (maximum 5 nonzero terms per row/column)
Operations needed: Solve for x in A*x=b and mat/vec multiply
Language: C++ (C ok)
Priority: Speed and simplicity to code. I would really rather avoid having to learn a whole new framework for this one problem or have to manually write too much helper code.
Extra love to answers with a minimal working example...

If you want to write your own solver, in terms of simplicity, it's hard to beat Gauss-Seidel iteration. The update step is one line, and it can be parallelized easily. Successive over-relaxation (SOR) is only slightly more complicated and converges much faster.
Conjugate gradient is also straightforward to code, and should converge much faster than the other iterative methods. The important thing to note is that you don't need to form the full matrix A, just compute matrix-vector products A*b. Once that's working, you can improve the convergance rate again by adding a preconditioner like SSOR (Symmetric SOR).
Probably the fastest solution method that's reasonable to write yourself is a Fourier-based solver. It essentially involves taking an FFT of the right-hand side, multiplying each value by a function of its coordinate, and taking the inverse FFT. You can use an FFT library like FFTW, or roll your own.
A good reference for all of these is A First Course in the Numerical Analysis of Differential Equations by Arieh Iserles.

Eigen is quite nice to use and one of the fastest libraries I know:
http://eigen.tuxfamily.org/dox/group__TutorialSparse.html

There is a lot of related post, you could have look.
I would recommend C++ and Boost::ublas as used in UMFPACK and BOOST's uBLAS Sparse Matrix

C++ Newton-Raphson algo?

I have a huge problem. I need to solve a non linear sistem of 3 equations in 3 variables with a C++ function or class. I thought about using Newton-Raphson method to perform the solution. Unlukily I didn't find a source code that can do that for me. There would be someone that knows a program like that? I'm near deciding to build it myself. Thanks

A 3x3 system is not huge; it's actually a very small problem. People routinely solve nonlinear systems of equations with thousands (and more) of variables and constraints.
Given that your system is 3x3 and possibly nasty, a more appropriate choice of method would be a line search method. You get global convergence to a local minimum of the residual this way; it's very easy to make straight Newton's method diverge.
Steepest descent with backtracking line search is the simplest line search method possible. You might try implementing it first.

First, see related questions What good libraries are there for solving a system of non-linear equations in C++? and https://stackoverflow.com/questions/4914967/could-you-explain-how-newton-raphson-for-a-set-of-equations-works-code-inside. Also, try to use boost.

Consider this cozy C++ library

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

CUDA convolutionFFT2D example - I can't understand it - c++

It looks like the algorithm is doing something similar to the algorithm mentioned here. The preprocess step looks to be re-ordering the Real input of size N (after padding) to complex input of size N/2. The postprocess step is re-ordering the data to get back the FFT of the original input array.

Related

equivalent of wavedec (matlab function) in opencv

FastMarching method from MATLAB alternative in OpenCV C++

Least Median of Squares robust regression C++

What is a fast simple solver for a large Laplacian matrix?

C++ Newton-Raphson algo?

Categories

Resources