Questions regarding algorithm used by matchTemplate() in OpenCV - c++

At the link there is the documentation on how OpenCV matches images with cv::matchTemplate. I am interested in equation 5.
In the literature I have found that this equation is usually computed via a truncated Fourier transform, or with another truncated expansion on a basis set, to improve performance. Obviously, this is an approximation.
I have implemented the formula reported in the link exactly, yet the results differ slightly from OpenCV's. I would like to know how OpenCV implements this metric.
I have read the source code, but it makes opaque calls to kernel functions. Does it use a Fourier transform, or some other basis set such as rectangular basis functions?
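For reference, one way to pin down where the discrepancy comes from is to compare cv::matchTemplate against a direct evaluation of the metric at a single offset. A minimal sketch (TM_CCORR_NORMED is used here purely as an example and may not be the method equation 5 refers to; the image and template sizes are arbitrary):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <cmath>
#include <cstdio>

int main() {
    // Random test data; any float image/template pair will do.
    cv::Mat img(64, 64, CV_32F), tpl(16, 16, CV_32F);
    cv::randu(img, 0.0f, 1.0f);
    cv::randu(tpl, 0.0f, 1.0f);

    cv::Mat result;
    cv::matchTemplate(img, tpl, result, cv::TM_CCORR_NORMED);

    // Direct evaluation of the same metric at one offset (x, y):
    // sum(T*I) / sqrt(sum(T^2) * sum(I^2))
    int x = 10, y = 20;
    cv::Mat patch = img(cv::Rect(x, y, tpl.cols, tpl.rows));
    double direct = patch.dot(tpl) / std::sqrt(patch.dot(patch) * tpl.dot(tpl));

    std::printf("matchTemplate: %f  direct: %f\n", result.at<float>(y, x), direct);
    return 0;
}
```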

Related

Proper way to do a 2D deconvolution, assuming no noise (just a convolution has been applied to the input)

I would like to be able to do a convolution on a matrix, save the kernel, and then later use the output of the convolution together with the kernel to recover the original matrix via deconvolution. I am stuck on the deconvolution function, as most of the examples I have found online are for MATLAB or Python, and I am using C++. In addition, the examples online are for larger images, whereas I am not going to have images at all, just matrices. Also, most of those examples, because they are for image processing, assume noise is built into the original image. Since I am using matrices and applying just a convolution (no added noise), this won't apply. I have found the correct formula for a deconvolution, but it is somewhat over my head as I am not that skilled in advanced math. I would appreciate any help, even just basic pseudocode.
I have tried looking at libraries on the internet, and what I have found is either in MATLAB, uses a lot of libraries I don't have in C++, or is in Python, again using libraries I don't have in C++. They also apply to larger images, whereas in my case the matrices will be small. This means a more brute-force approach will be fine, because the inputs will never be more than 10x10 integers.
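For concreteness, here is a minimal frequency-domain sketch of the kind of brute-force approach described above. It assumes the forward operation was a circular convolution with the kernel zero-padded to the input size, and that none of the kernel's DFT coefficients is (near) zero; for matrices up to 10x10 a naive DFT is plenty fast. All names are illustrative:

```cpp
#include <cmath>
#include <complex>
#include <vector>

using CMat = std::vector<std::vector<std::complex<double>>>;

// Naive 2D DFT (inverse when `inverse` is true); fine for matrices up to ~10x10.
CMat dft2(const CMat& in, bool inverse) {
    const int R = in.size(), C = in[0].size();
    const double PI = std::acos(-1.0);
    const double sign = inverse ? 2.0 * PI : -2.0 * PI;
    CMat out(R, std::vector<std::complex<double>>(C));
    for (int u = 0; u < R; ++u)
        for (int v = 0; v < C; ++v) {
            std::complex<double> s = 0.0;
            for (int r = 0; r < R; ++r)
                for (int c = 0; c < C; ++c)
                    s += in[r][c] * std::polar(1.0, sign * (double(u) * r / R + double(v) * c / C));
            out[u][v] = inverse ? s / double(R * C) : s;
        }
    return out;
}

// Deconvolution by pointwise division in the frequency domain.
// `kernelPadded` is the kernel zero-padded to the same size as `blurred`.
// The real parts of the returned matrix are the recovered input (the imaginary
// parts should be ~0 up to rounding error).
CMat deconvolve(const CMat& blurred, const CMat& kernelPadded) {
    CMat B = dft2(blurred, false), K = dft2(kernelPadded, false);
    const int R = B.size(), C = B[0].size();
    CMat Q(R, std::vector<std::complex<double>>(C));
    for (int r = 0; r < R; ++r)
        for (int c = 0; c < C; ++c)
            Q[r][c] = B[r][c] / K[r][c];  // blows up if a kernel coefficient is ~0
    return dft2(Q, true);
}
```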

batch CUDA solution of sparse banded Ax=b for various b's

I have a sparse banded matrix A and I'd like to (direct) solve Ax=b. I have about 500 vectors b, so I'd like to solve for the corresponding 500 x's.
I'm brand new to CUDA, so I'm a little confused as to what options I have available.
cuSOLVER has a batch direct solver cuSolverSP for sparse A_i x_i = b_i using QR here. (I'd be fine with LU too since A is decently conditioned.) However, as far as I can tell, I can't exploit the fact that all my A_i's are the same.
Would an alternative option be to first compute a sparse LU (or QR) factorization on the CPU or GPU, then perform the back-substitution (respectively, back-substitution and matrix multiplication) in parallel on the GPU? If cusolverSp<t>csrlsvlu() is for a single b_i, is there a standard way to batch-perform this operation for multiple b_i's?
Finally, since I don't have intuition for this, should I expect a speedup on a GPU for either of these options, given the necessary overhead? x has length ~10000-100000. Thanks.
I'm currently working on something similar myself. I decided to basically wrap the conjugate gradient and level-0 incomplete-Cholesky preconditioned conjugate gradient solver utility samples that come with the CUDA SDK into a small class.
You can find them in your CUDA_HOME directory under the path:
samples/7_CUDALibraries/conjugateGradient and /Developer/NVIDIA/CUDA-samples/7_CUDALibraries/conjugateGradientPrecond
Basically, you would load the matrix into device memory once (and, for ICCG, compute the corresponding preconditioner / matrix analysis), then call the solve kernel with different b vectors.
I don't know what you anticipate your matrix band structure to look like, but if it is symmetric and either diagonally dominant (the off-diagonal bands along each row and column have the opposite sign of the diagonal and their sum is smaller in magnitude than the diagonal entry) or positive definite (all eigenvalues strictly positive), then CG and ICCG should be useful. Alternatively, the various multigrid algorithms are another option if you are willing to go through coding them up.
If your matrix is only positive semi-definite (i.e. it has at least one eigenvector with an eigenvalue of zero), you can still often get away with using CG or ICCG as long as you ensure that:
1) The right hand side (b vectors) are orthogonal to the null space (null space meaning eigenvectors with an eigenvalue of zero).
2) The solution you obtain is orthogonal to the null space.
It is interesting to note that if you do have a non-trivial null space, then different numerical solvers can give you different answers for the exact same system. The solutions will differ by a linear combination of the null-space vectors. That problem caused me many, many man-hours of debugging and frustration before I finally caught on, so it's good to be aware of it.
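For reference, the iteration those samples implement is short. Here is a plain CPU sketch of unpreconditioned CG for a dense symmetric positive-definite matrix (the CUDA samples do the same thing with cuSPARSE/cuBLAS kernels, which is why A, and the preconditioner for ICCG, only need to be set up once while b changes between solves):

```cpp
#include <cmath>
#include <vector>

// Unpreconditioned conjugate gradient for a dense SPD matrix A (row-major).
std::vector<double> conjugateGradient(const std::vector<std::vector<double>>& A,
                                      const std::vector<double>& b,
                                      double tol = 1e-10, int maxIter = 1000) {
    const int n = static_cast<int>(b.size());
    std::vector<double> x(n, 0.0), r = b, p = b, Ap(n);
    auto dot = [](const std::vector<double>& u, const std::vector<double>& v) {
        double s = 0.0;
        for (std::size_t i = 0; i < u.size(); ++i) s += u[i] * v[i];
        return s;
    };
    double rsOld = dot(r, r);
    for (int it = 0; it < maxIter && std::sqrt(rsOld) > tol; ++it) {
        // Ap = A * p
        for (int i = 0; i < n; ++i) {
            Ap[i] = 0.0;
            for (int j = 0; j < n; ++j) Ap[i] += A[i][j] * p[j];
        }
        const double alpha = rsOld / dot(p, Ap);
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        const double rsNew = dot(r, r);
        for (int i = 0; i < n; ++i) p[i] = r[i] + (rsNew / rsOld) * p[i];
        rsOld = rsNew;
    }
    return x;
}
```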
Lastly, if your matrix has a circulant band structure, you might consider using a fast Fourier transform (FFT) based solver. FFT-based numerical solvers can often yield superior performance in cases where they are applicable.
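For what it's worth, the circulant case really does reduce to a pointwise division in the frequency domain: a circulant matrix is diagonalised by the DFT, so Ax = b becomes x = IFFT(FFT(b) / FFT(c)), where c is the first column of A. A minimal sketch using FFTW (the function name is mine; it assumes no eigenvalue of A is zero and that A is fully circulant, i.e. the bands wrap around periodically):

```cpp
#include <fftw3.h>
#include <complex>
#include <vector>

// Solve A x = b where A is circulant with first column c.
std::vector<double> solveCirculant(const std::vector<double>& c,
                                   const std::vector<double>& b) {
    const int n = static_cast<int>(c.size());
    std::vector<std::complex<double>> C(n), B(n), X(n);
    for (int i = 0; i < n; ++i) { C[i] = c[i]; B[i] = b[i]; }

    auto fft = [n](std::vector<std::complex<double>>& v, int dir) {
        fftw_plan p = fftw_plan_dft_1d(n, reinterpret_cast<fftw_complex*>(v.data()),
                                       reinterpret_cast<fftw_complex*>(v.data()),
                                       dir, FFTW_ESTIMATE);
        fftw_execute(p);
        fftw_destroy_plan(p);
    };
    fft(C, FFTW_FORWARD);
    fft(B, FFTW_FORWARD);
    for (int k = 0; k < n; ++k) X[k] = B[k] / C[k];  // fails if an eigenvalue is zero
    fft(X, FFTW_BACKWARD);

    std::vector<double> x(n);
    for (int i = 0; i < n; ++i) x[i] = X[i].real() / n;  // FFTW's inverse is unnormalised
    return x;
}
```

For many right-hand sides, FFT(c) and the FFTW plans can of course be computed once and reused for every b.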
is there a standard way to batch perform this operation for multiple b_i's?
One option is to use the batched refactorization module in CUDA's cuSOLVER, but I am not sure if it counts as standard.
The batched refactorization module in cuSOLVER provides an efficient way to solve batches of linear systems with a fixed left-hand-side sparse matrix (or matrices with a fixed sparsity pattern but varying coefficients) and varying right-hand sides, based on LU decomposition. As of CUDA 10.1, the official documentation only contains partially complete code snippets related to it; a complete example can be found here.
If you don't mind going with an open-source library, you could also check out CUSP:
CUSP Quick Start Page
It has a fairly decent suite of solvers, including a few preconditioned methods:
CUSP Preconditioner Examples
The smoothed aggregation preconditioner (a variant of algebraic multigrid) seems to work very well as long as your GPU has enough onboard memory for it.

Smoothing discrete data

I'm trying to write a program to smooth discrete digitized data for use in a motion simulator. The data will be provided as a set of t, x(t) points and is intended to be used to create cyclic motion; thus the smoothed data must be not only continuous over the range of t values but also between the two endpoints. In addition, data provided will most likely be of significantly lower resolution than required and thus significant interpolation will take place.
I've looked at various techniques such as Gauss-Newton and Levenberg–Marquardt curve fits, but these assume that an objective function is known beforehand (and it will not be). Unfortunately, the users of said motion simulator may not be able to choose an appropriate function (due to their differing backgrounds). Finally, the code must be written in a non-proprietary, cross-platform (and preferably compiled) language that can run on embedded platforms (most likely Linux on ARM) - this precludes the use of Maple (which provides a generic 'fit' routine that selects an appropriate objective function), Matlab (similar, IIRC) or other math-oriented languages. I should say that I'm predisposed to C++ due to experience.
Some typical data can be found on pages here.
What technique would be useful for this?
It would likely be simpler, and more adaptable to different data sets, to apply digital signal processing (DSP) techniques for rate conversion: upsample the data and interpolate between samples. The C++ SPUC library may help you here - it supports several interpolation filters.
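Since the motion is cyclic, even a simple periodic interpolation scheme can get you quite far. A minimal sketch of uniform Catmull-Rom interpolation over cyclic samples, assuming the input points are evenly spaced in t (standalone C++, not SPUC; the function names are illustrative):

```cpp
#include <cmath>
#include <vector>

// Evaluate a periodic, uniform Catmull-Rom spline through samples x[0..n-1]
// at parameter t in [0, 1). Wrapping the indices makes the curve cyclic, so
// the interpolated motion is smooth across the endpoints as well.
double cyclicCatmullRom(const std::vector<double>& x, double t) {
    const int n = static_cast<int>(x.size());
    const double s = t * n;
    const int i1 = static_cast<int>(std::floor(s)) % n;
    const double f = s - std::floor(s);              // fractional position in segment
    const int i0 = (i1 + n - 1) % n;
    const int i2 = (i1 + 1) % n;
    const int i3 = (i1 + 2) % n;
    const double p0 = x[i0], p1 = x[i1], p2 = x[i2], p3 = x[i3];
    return p1 + 0.5 * f * (p2 - p0 +
                 f * (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3 +
                 f * (3.0 * (p1 - p2) + p3 - p0)));
}

// Upsampling is then just evaluating the spline at as many t values as needed.
std::vector<double> upsampleCyclic(const std::vector<double>& x, int outSamples) {
    std::vector<double> out(outSamples);
    for (int i = 0; i < outSamples; ++i)
        out[i] = cyclicCatmullRom(x, static_cast<double>(i) / outSamples);
    return out;
}
```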
I implemented a generic cubic spline fitting function that can be applied to Euclidean and quaternion data of any dimension, which might fit (no pun intended) your purpose. I don't know how well the fitting compares to other algorithms, though, since only the input data keys are considered as potential spline key placements, but you can have a look at it here: http://sourceforge.net/p/spinxengine/code/HEAD/tree/sxp_src/core/math/parametric.inl (fit_spline function).
For creating cyclic motion, you should be able to replicate the keys before and after the fitted data sequence and modify the function so that it forces creation of a key at the end of the cycle. It's a relatively simple function, so it shouldn't be a big effort to modify it like that. You also need to set up a spline-fitting constraint to define under what conditions new splines are created (e.g. the distance tolerance from the fitted spline to the input data, etc.)
Cheers, Jarkko

Forward and inverse Gabor transform library in C/C++

I am wondering if there exists a highly-optimized C/C++ library for both the forward and inverse Gabor transforms (Wikipedia link). This is not the same as the Gabor filter, which is normally applied to images. The library can either be closed or FOSS/open-source, but I would prefer the latter since I am working on a research application.
I am implementing inverse Q filtering algorithms from the Seismic Inverse Q filtering book (see pg. 125). The author appears to be fond of using the Gabor transform.
The forward and inverse transforms are required, since some operations are computed on the frequency-domain signals, and the inverse transform is used to compute a discrete time-domain signal.
To my knowledge, there is no specialized library to compute a Gabor transform (GT). As with the continuous wavelet transform (CWT), the GT can only be approximated to a certain degree, as it is defined as a function continuous in both time and frequency.
However, standard tools can be used to get a decent approximation of a GT. The usual way, similar to the CWT, is to implement the transform in Fourier space. The GT, like the CWT, is essentially just a filter bank. For a GT you would compute an FFT, multiply by the Fourier transform of the GT kernel, which is a Gaussian centered at the desired frequency band, and then compute the inverse FFT for each band.
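To make that filter-bank view concrete, here is a rough sketch of a single band using FFTW (whose complex format is bit-compatible with std::complex<double>). The centre frequency f0 and the bandwidth sigma are given in FFT bins, and the function name is my own; a full GT would repeat this for every band of interest:

```cpp
#include <fftw3.h>
#include <cmath>
#include <complex>
#include <vector>

// One band of a Gabor filter bank: FFT the signal, multiply by a Gaussian
// window centred on bin f0 with width sigma (in bins), then inverse FFT.
std::vector<std::complex<double>> gaborBand(const std::vector<double>& x,
                                            double f0, double sigma) {
    const int n = static_cast<int>(x.size());
    std::vector<std::complex<double>> in(n), out(n);
    for (int i = 0; i < n; ++i) in[i] = x[i];

    fftw_plan fwd = fftw_plan_dft_1d(n, reinterpret_cast<fftw_complex*>(in.data()),
                                     reinterpret_cast<fftw_complex*>(out.data()),
                                     FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(fwd);
    fftw_destroy_plan(fwd);

    // Gaussian window in the frequency domain, centred at the desired band.
    for (int k = 0; k < n; ++k) {
        const double d = k - f0;
        out[k] *= std::exp(-0.5 * d * d / (sigma * sigma));
    }

    fftw_plan bwd = fftw_plan_dft_1d(n, reinterpret_cast<fftw_complex*>(out.data()),
                                     reinterpret_cast<fftw_complex*>(in.data()),
                                     FFTW_BACKWARD, FFTW_ESTIMATE);
    fftw_execute(bwd);
    fftw_destroy_plan(bwd);

    for (auto& v : in) v /= n;  // FFTW's inverse transform is unnormalised
    return in;
}
```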
Another good approximation of the GT, based on an IIR filter bank, is described in this article. This method can also be implemented with standard tools (MATLAB, SciPy, etc.).
I'm curious, what kind of scientific application are you aiming at? Usually a CWT is the better choice, since it respects the natural scaling behavior and has a higher degree of symmetry; in particular, it is invariant under dilations.
As of 2022, there is the LTFAT toolbox, with MATLAB/Octave/Python interfaces and a C backend, which supports 1D/2D forward and inverse Gabor transforms.

How are FFTs different from DFTs and how would one go about implementing them in C++?

After some studying, I created a small app that calculates DFTs (Discrete Fourier Transformations) from some input. It works well enough, but it is quite slow.
I read that FFTs (Fast Fourier Transformations) allow quicker calculations, but how are they different? And more importantly, how would I go about implementing them in C++?
If you don't need to implement the algorithm manually, you could take a look at the Fastest Fourier Transform in the West.
Even though it's developed in C, it officially works in C++ (from the FAQ):
Question 2.9. Can I call FFTW from C++?
Most definitely. FFTW should compile and/or link under any C++ compiler. Moreover, it is likely that the C++ <complex> template class is bit-compatible with FFTW's complex-number format (see the FFTW manual for more details).
The FFT has O(n log n) complexity, compared to O(n^2) for the naive DFT.
There is a lot of literature on this, and I strongly advise that you check it first, because such a wide topic cannot be fully explained here.
http://en.wikipedia.org/wiki/Fast_Fourier_transform (check the external links)
If you need a library, I advise you to use an existing one, for instance
http://www.fftw.org/
This library has an efficient implementation of the FFT and is also used in proprietary software (MATLAB, for instance).
Steven Smith's book The Scientist and Engineer's Guide to Digital Signal Processing, specifically Chapter 8 on the DFT and Chapter 12 on the FFT, does a much better job of explaining the two transforms than I ever could.
By the way, the whole book is available for free (link above) and it's a very good introduction to signal processing.
Regarding the C++ code request, I've only used the Fastest Fourier Transform in the West (already cited by superexsl) or DSP libraries such as those from TI or Analog Devices.
The results of a correctly implemented DFT are essentially identical to the results of a correctly implemented FFT (they differ only by rounding errors). As others have pointed out here, the major difference is performance: the DFT takes O(n^2) operations, while the FFT takes O(n log n).
The best, most readable publication I have ever found (the one I still refer to) is The Fast Fourier Transform and its Applications by E. Oran Brigham. The first few chapters provide a very thorough overview of the continuous and discrete forms of the Fourier transform. He then uses that to develop the fast version of the DFT based on the Cooley-Tukey algorithm for the radix-2 case (n is a power of 2) and the mixed-radix case (though the latter receives a somewhat shallower treatment than the former).
The basic approach in the radix-2 algorithm is to perform a linear-time operation on the input X, recursively split the result in half, and perform a similar linear-time operation on the two halves. The mixed-radix case is similar, though you need to divide X into equal portions each time, so it helps if n doesn't have any large prime factors.
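To give a flavour of what that recursion looks like in C++ (a textbook radix-2 Cooley-Tukey sketch, not optimised; it requires the length to be a power of two):

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// In-place recursive radix-2 Cooley-Tukey FFT; x.size() must be a power of two.
void fft(std::vector<cd>& x) {
    const std::size_t n = x.size();
    if (n <= 1) return;

    // Split into even- and odd-indexed samples and transform each half.
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i]  = x[2 * i + 1];
    }
    fft(even);
    fft(odd);

    // Combine the two half-length transforms (the linear-time "butterfly" step).
    const double PI = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd t = std::polar(1.0, -2.0 * PI * double(k) / double(n)) * odd[k];
        x[k]         = even[k] + t;
        x[k + n / 2] = even[k] - t;
    }
}
```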
I've found this nice explanation with some algorithms described.
FastFourierTransform
About implementation:
First, I'd make sure your implementation returns correct results (compare the output against MATLAB or Octave, which have built-in Fourier transforms).
Optimize only when necessary, and use a profiler.
Don't use unnecessary for loops.