I'm studying the code behind the convolutionFFT2D example of the NVIDIA CUDA SDK, but I don't get the point of this line:
cufftPlan2d(&fftPlan, fftH, fftW/2, CUFFT_C2C);
Apparently this initializes a complex plan for the FFT to run in, but I don't see the point of dividing the plan width by 2.
Just to be precise: fftH and fftW are rounded-up values for the imageX+kernelX+1 and imageY+kernelY+1 dimensions (just for speed reasons). I know that in the frequency domain you usually have a positive component and a symmetric negative component of the same frequency, but this sounds like cutting half of my image data away.
Can someone explain this to me a little better? I've never used an FFT (I just know the theory behind a Fourier transform).
When you perform a real-to-complex FFT, half the frequency-domain data is redundant due to symmetry. This is only the case in one axis of a 2D FFT though. You can think of a 2D FFT as two 1D FFT operations: the first operates on all the rows, and for a real-valued image this will give you complex row values. In the second stage you apply a 1D FFT to every column, but since the row values are now complex this will be a complex-to-complex FFT with no redundancy in the output. Hence you only need width / 2 points in the horizontal axis, but you still need height points in the vertical axis.
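For intuition, here is a hedged NumPy sketch (this is not what cuFFT or the SDK example does internally; it just illustrates the redundancy of a real-to-complex transform):

import numpy as np

# Real-to-complex vs complex-to-complex 2D FFT of an 8x8 real image.
img = np.random.rand(8, 8)

full = np.fft.fft2(img)     # complex-to-complex: 8 x 8 complex values
half = np.fft.rfft2(img)    # real-to-complex:    8 x 5 complex values (W//2 + 1)

print(full.shape, half.shape)   # (8, 8) (8, 5)
# The dropped columns are redundant: they can be rebuilt from the kept ones by
# complex conjugation and index reversal, so no image information is lost.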
We have pictures taken from a plane flying over an area with 50% overlap and are using the OpenCV stitching algorithm to stitch them together. This works fine for our version 1. In our next iteration we want to look into a few extra things that I could use a few comments on.
Currently the stitching algorithm estimates the camera parameters. We do have the camera parameters and a lot of information available from the plane about camera angle, position (GPS), etc. Would we be able to benefit from this information, in contrast to just letting the algorithm estimate everything based on matched feature points?
These images are taken in high resolution and the algorithm takes up quite an amount of RAM at this point. That is not a big problem, as we just spin up large machines in the cloud, but in our next iteration I would like to compute the homography from downsampled images and apply it to the large images later. This will also give us more options to manipulate and visualize other information on the original images and to go back and forth between the original and stitched images.
If we, as in question 1, are going to take apart the stitching algorithm to put in the known information, is it just a matter of using the findHomography method to get the transform, or are there better alternatives for creating the homography when we actually know the plane's position and angles and the camera parameters?
I have a basic understanding of OpenCV and am fine with C++ programming, so it's not a problem to write our own customized stitcher, but the theory is a bit rusty here.
Since you are using homographies to warp your imagery, I assume you are capturing areas small enough that you don't have to worry about Earth curvature effects. Also, I assume you don't use an elevation model.
Generally speaking, you will always want to tighten your (homography) model using matched image points, since your final output is a stitched image. If you have the RAM and CPU budget, you could refine your linear model using a max likelihood estimator.
Having a prior motion model (e.g. from GPS + IMU) could be used to initialize the feature search and match. With a good enough initial estimate of the apparent feature motion, you could dispense with expensive feature descriptor computation and storage, and just go with normalized cross-correlation.
If I understand correctly, the images are taken vertically and overlap by a known number of pixels. In that case calculating a homography is a bit overkill: you're just talking about a translation matrix, and using more powerful algorithms can only give you badly conditioned matrices.
In 2D, if H is a generalised homography matrix representing a perspective transformation,

H = [[a1 a2 a3]
     [a4 a5 a6]
     [a7 a8 a9]]

then the submatrices R and T represent rotation and translation, respectively, if a9 == 1:

R = [[a1 a2]      T = [[a3]
     [a4 a5]],         [a6]]

while [a7 a8] are the perspective terms, which introduce a projective (keystone) distortion along each axis. (All of this is a bit approximate, since when all the effects are present they influence each other.)
So, if you know the lateral displacement, you can create a 3x3 matrix that is the identity (a1 = a5 = a9 = 1) with a3 and a6 set to the displacement, and pass it to cv::warpPerspective or cv::warpAffine (see the sketch at the end of this answer).
As a criterion of matching correctness you can, for example, calculate a normalized difference between the overlapping pixels.
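If it helps, here is a rough Python/OpenCV sketch of that translation-only warp; the file name and the (tx, ty) offsets are placeholders, not values from your setup:

import numpy as np
import cv2

img = cv2.imread("tile_2.png")              # placeholder input image
tx, ty = 850.0, -12.0                       # known displacement in pixels (assumed)

# Pure translation expressed as a 3x3 homography: identity plus a3, a6.
H = np.array([[1.0, 0.0, tx],
              [0.0, 1.0, ty],
              [0.0, 0.0, 1.0]])

h, w = img.shape[:2]
out_size = (w + int(abs(tx)), h + int(abs(ty)))   # rough canvas size
warped = cv2.warpPerspective(img, H, out_size)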
I have 2D data (zero-mean, normalized). I know its covariance matrix, eigenvalues, and eigenvectors. I want to decide whether to reduce the dimension to 1 or not (I use principal component analysis, PCA). How can I decide? Is there any methodology for it?
I am looking for something like: if you look at this ratio and the ratio is high, then it is logical to go on with dimensionality reduction.
PS 1: Does PoV (Proportion of Variation) stand for this?
PS 2: Here is an answer: https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained (is it a criterion I can use to test this?)
PoV (proportion of variation) represents how much of the data's information remains relative to using all of the components. It can be used for that purpose: if the PoV is high, then less information is lost.
You want to sort your eigenvalues by magnitude and then pick the highest 1 or 2 values. Eigenvalues with a very small relative value can be considered for exclusion. You can then project the data onto the top 1 or 2 eigenvectors to get dimensions for plotting results. This will give a visual representation of the PCA split. Also check out scikit-learn for more on PCA. Precision, recall, and F1 scores on a downstream task will tell you how well it works.
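As a rough illustration (the data here is synthetic, and the 95% cutoff is just one common rule of thumb, not a fixed criterion):

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 0.5]], size=500)  # placeholder 2D data

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: covariance is symmetric
order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pov = eigvals[0] / eigvals.sum()              # proportion of variation of PC1
print(f"PoV of the first component: {pov:.3f}")

if pov >= 0.95:                               # assumed threshold, tune to your needs
    X_1d = X @ eigvecs[:, :1]                 # project onto the first principal component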
from http://sebastianraschka.com/Articles/2014_pca_step_by_step.html...
Step 1: 3D Example
"For our simple example, where we are reducing a 3-dimensional feature space to a 2-dimensional feature subspace, we are combining the two eigenvectors with the highest eigenvalues to construct our d×kd×k-dimensional eigenvector matrix WW.
matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1),
                      eig_pairs[1][1].reshape(3,1)))
print('Matrix W:\n', matrix_w)
>>>Matrix W:
[[-0.49210223 -0.64670286]
[-0.47927902 -0.35756937]
[-0.72672348 0.67373552]]"
Step 2: 3D Example
"
In the last step, we use the 2×3-dimensional matrix W that we just computed to transform our samples onto the new subspace via the equation
y=W^T×x
transformed = matrix_w.T.dot(all_samples)
assert transformed.shape == (2,40), "The matrix is not 2x40 dimensional."
I am trying to extract the curvature of a pulse along its profile (see the picture below). The pulse is calculated on a grid of 150 x 100 cells (length x height) using finite differences, implemented in C++.
I extracted all the points with the same value (contour/ level set) and marked them as the red continuous line in the picture below. The other colors are negligible.
Then I tried to find the curvature from this already noisy (due to grid discretization) contour line by the following means:
(moving average already applied)
1) Curvature via Tangents
The curvature of the line at point P is defined by:
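(The figure that originally followed is missing here; judging from the next sentence, the definition it showed is presumably κ = lim_{Δs→0} Δφ / Δs, the turning angle Δφ divided by the arc length Δs between P and N, in the limit of vanishing arc length.)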
So the curvature is the limit of the angle delta over the arc length between P and N. Since my points have a certain distance between them, I could not approximate the limit well enough, so the curvature was not calculated correctly. I tested it with a circle, which naturally has a constant curvature, but I could not reproduce this (only 1 significant digit was correct).
2) Second derivative of the line parametrized by arclength
I calculated the first derivative of the line with respect to arclength, smoothed with a moving average and then took the derivative again (2nd derivative). But here I also got only 1 significant digit correct.
Unfortunately, taking a derivative amplifies the noise that is already present.
3) Approximating the line locally with a circle
Since the reciprocal of the circle radius is the curvature I used the following approach:
This worked best so far (2 correct significant digits), but I need to refine even further. So my new idea is the following:
Instead of using the values at the discrete points to determine the curvature, I want to approximate the pulse profile with a 3-dimensional spline surface. Then I extract the level set of a certain value from it to gain a smooth line of points, from which I can find a nice curvature.
So far I could not find a C++ library which can generate such a Bezier spline surface. Could you maybe point me to one?
Also do you think this approach is worth giving a shot, or will I lose too much accuracy in my curvature?
Do you know of any other approach?
With very kind regards,
Jan
edit: It seems I can not post pictures as a new user, so I removed all of them from my question, even though I find them important to explain my issue. Is there any way I can still show them?
edit2: ok, done :)
There is ALGLIB, which supports various flavours of interpolation:
Polynomial interpolation
Rational interpolation
Spline interpolation
Least squares fitting (linear/nonlinear)
Bilinear and bicubic spline interpolation
Fast RBF interpolation/fitting
I don't know whether it meets all of your requirements. I personally have not worked with this library yet, but I believe cubic spline interpolation could be what you are looking for (twice differentiable).
In order to prevent overfitting to your noisy input points you should apply some sort of smoothing mechanism, e.g. you could check whether things like moving-average, Gaussian, or FIR filters are applicable. Also have a look at (cubic) smoothing splines.
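Not ALGLIB, but here is a rough sketch of the smoothing-spline idea in SciPy; the noisy circle stands in for your extracted contour, and the smoothing factor s is something you would have to tune for your data:

import numpy as np
from scipy.interpolate import splprep, splev

t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
x = np.cos(t) + 0.01 * np.random.randn(200)   # noisy unit circle as a test contour
y = np.sin(t) + 0.01 * np.random.randn(200)

# Fit a smoothing spline through the contour; s = 0 interpolates, larger s smooths more.
tck, u = splprep([x, y], s=0.05, per=True)    # per=True because this contour is closed
dx, dy = splev(u, tck, der=1)
ddx, ddy = splev(u, tck, der=2)

# Curvature of a parametric curve: k = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2)
curvature = (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5
print(curvature.mean())                       # should come out close to 1 for the unit circle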
I'm a student, and I've been tasked to optimize bilinear interpolation of images by invoking parallelism from CUDA.
The image is given as a 24-bit .bmp format. I already have a reader for the .bmp and have stored the pixels in an array.
Now I need to perform bilinear interpolation on the array. I do not understand the math behind it (even after going through the wiki article and other Google results). Because of this I'm unable to come up with an algorithm.
Is there anyone who can help me with a link to an existing bilinear interpolation algorithm on a 1-D array? Or perhaps link to an open source image processing library that utilizes bilinear and bicubic interpolation for scaling images?
The easiest way to understand bilinear interpolation is to understand linear interpolation in 1D.
This first figure should give you flashbacks to middle school math. Given some location a at which we want to know f(a), we take the neighboring "known" values and fit a line between them.
So we just used the old middle-school equations y=mx+b and y-y1=m(x-x1). Nothing fancy.
We basically carry over this concept to 2-D in order to get bilinear interpolation. We can attack the problem of finding f(a,b) for any a,b by doing three interpolations. Study the next figure carefully. Don't get intimidated by all the labels. It is actually pretty simple.
For a bilinear interpolation, we again use the neighboring points. Now there are four of them, since we are in 2D. The trick is to attack the problem one dimension at a time.
We project our (a,b) to the sides and first compute two (one dimensional!) interpolating lines.
f(a,yj) where yj is held constant
f(a,yj+1) where yj+1 is held constant.
Now there is just one last step. You take the two points you calculated, f(a,yj) and f(a,yj+1), and fit a line between them. That's the blue one going left to right in the diagram, passing through f(a,b). Interpolating along this last line gives you the final answer.
I'll leave the math for the 2-D case for you. It's not hard if you work from the diagram. And going through it yourself will help you really learn what's going on.
One last little note: it doesn't matter which sides you pick for the first two interpolations. You could have picked the top and bottom, and then done the third interpolation line between those two instead. The answer would have been the same.
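To make that concrete, here is a hedged NumPy sketch of the three interpolations (grayscale only; a 24-bit image would just repeat this per channel, and a CUDA version would run it once per output pixel):

import numpy as np

def bilinear_sample(img, a, b):
    """Sample img at the fractional position (a, b) = (row, col)."""
    i, j = int(np.floor(a)), int(np.floor(b))       # top-left neighbor
    i1 = min(i + 1, img.shape[0] - 1)               # clamp at the border
    j1 = min(j + 1, img.shape[1] - 1)
    da, db = a - i, b - j                           # fractional offsets in [0, 1)

    # Two 1D interpolations along one axis...
    top = (1 - db) * img[i, j] + db * img[i, j1]
    bottom = (1 - db) * img[i1, j] + db * img[i1, j1]
    # ...then one 1D interpolation between those two results.
    return (1 - da) * top + da * bottom

img = np.arange(16, dtype=float).reshape(4, 4)
print(bilinear_sample(img, 1.4, 2.8))               # 8.4 for this toy ramp image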
When you enlarge an image by scaling the sides by an integral factor, you may treat the result as the original image with extra pixels inserted between the original pixels.
See the pictures in IMAGE RESIZE EXAMPLE.
The f(x,y)=... formula in this article in Wikipedia gives you a method to compute the color f of an inserted pixel:
For every inserted pixel you combine the colors of the 4 original pixels (Q11, Q12, Q21, Q22) surrounding it. The combination depends on the distance between the inserted pixel and the surrounding original pixels, the closer it is to one of them, the closer their colors are:
The original pixels are shown as red. The inserted pixel is shown as green.
That's the idea.
If you scale the sides by a non-integral factor, the formulas still hold, but now you need to recalculate all pixel colors as you can't just take the original pixels and simply insert extra pixels between them.
Don't get hung up on the fact that 2D arrays in C are really 1D arrays. It's an implementation detail. Mathematically, you'll still need to think in terms of 2D arrays.
Think about linear interpolation on a 1D array. You know the value at 0, 1, 2, 3, ... Now suppose I ask you for the value at 1.4. You'd give me a weighted mix of the values at 1 and 2: (1 - 0.4)*A[1] + 0.4*A[2]. Simple, right?
Now you need to extend to 2D. No problem. 2D interpolation can be decomposed into two 1D interpolations, in the x-axis and then y-axis. Say you want (1.4, 2.8). Get the 1D interpolants between (1, 2)<->(2,2) and (1,3)<->(2,3). That's your x-axis step. Now 1D interpolate between them with the appropriate weights for y = 2.8.
This should be simple to make massively parallel. Just calculate each interpolated pixel separately. With shared memory access to the original image, you'll only be doing reads, so there are no synchronization issues.
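Spelled out for the (1.4, 2.8) example above (plain Python; A is just a toy array, not your image data):

import numpy as np

A = np.arange(25, dtype=float).reshape(5, 5)   # toy 2D array, indexed A[x, y]

x, y = 1.4, 2.8
x0, y0 = int(x), int(y)                        # 1 and 2
fx, fy = x - x0, y - y0                        # 0.4 and 0.8

# x-axis step: interpolate (1,2)<->(2,2) and (1,3)<->(2,3)
v0 = (1 - fx) * A[x0, y0] + fx * A[x0 + 1, y0]
v1 = (1 - fx) * A[x0, y0 + 1] + fx * A[x0 + 1, y0 + 1]

# y-axis step: interpolate between those two with weight fy = 0.8
value = (1 - fy) * v0 + fy * v1
print(value)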
I have a current implementation of Gaussian blur using regular convolution. It is efficient enough for small kernels, but once the kernel size gets a little bigger, the performance takes a hit. So I am thinking of implementing the convolution using an FFT. I've never had any experience with FFT-related image processing, so I have a few questions.
Is a 2D FFT-based convolution also separable into two 1D convolutions?
If true, does it go like this: 1D FFT on every row, then 1D FFT on every column, then multiply with the 2D kernel, and then inverse-transform every column and inverse-transform every row? Or do I have to multiply with a 1D kernel after each 1D FFT?
Now I understand that the kernel size should be the same size as the image (a row, in the 1D case). But how will it affect the edges? Do I have to pad the image edges with zeros? If so, should the kernel size be equal to the image size before or after padding?
Also, this is a C++ project, and I plan on using kissFFT, since this is a commercial project. You are welcome to suggest any better alternatives. Thank you.
EDIT: Thanks for the responses, but I have a few more questions.
I see that the imaginary part of the input image will be all zeros. But will the output imaginary part also be zero? Do I have to multiply the Gaussian kernel with both the real and imaginary parts?
I have instances of the same image to be blurred at different scales, i.e. the same image is scaled to different sizes and blurred with different kernel sizes. Do I have to perform an FFT every time I scale the image, or can I reuse the same FFT?
Lastly, if I wanted to visualize the FFT, I understand that a log scale has to be applied to it. But I am really lost on which part should be used to visualize the FFT: the real part or the imaginary part?
Also, for an image of size 512x512, what will be the sizes of the real and imaginary parts? Will they be the same length?
Thank you again for your detailed replies.
The 2-D FFT is separable, and you are correct about how to perform it, except that you must multiply by the 2-D FFT of the 2-D kernel. If you are using kissfft, an easier way to perform the 2-D FFT is to just use kiss_fftnd in the tools directory of the kissfft package. This will do multi-dimensional FFTs.
The kernel size does not have to be any particular size. If the kernel is smaller than the image, you just need to zero-pad it up to the image size before performing the 2-D FFT. You should also zero-pad the image edges, since the convolution being performed by multiplication in the frequency domain is actually circular convolution, and the results wrap around at the edges.
So to summarize (given that the image size is M x N):
come up with a 2-D kernel of any size (U x V)
zero-pad the kernel up to (M+U-1) x (N+V-1)
take the 2-D fft of the kernel
zero-pad the image up to (M+U-1) x (N+V-1)
take the 2-D FFT of the image
multiply FFT of kernel by FFT of image
take inverse 2-D FFT of result
trim off garbage at edges
If you are performing the same filter multiple times on different images, you don't have to perform steps 1-3 every time.
Note: The kernel size will have to be rather large for this to be faster than direct computation of the convolution. Also, did you implement your direct convolution taking advantage of the fact that a 2-D Gaussian filter is separable (see this a few paragraphs into the "Mechanics" section)? That is, you can perform the 2-D convolution as 1-D convolutions on the rows and then the columns. I have found this to be faster than most FFT-based approaches unless the kernels are quite large.
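A hedged NumPy sketch of the steps listed above (the bookkeeping carries over to kissFFT/kiss_fftnd in C++, but this is not that code; the image and kernel are placeholders):

import numpy as np

def fft_convolve2d(image, kernel):
    M, N = image.shape
    U, V = kernel.shape
    P, Q = M + U - 1, N + V - 1                  # padded size

    K = np.fft.fft2(kernel, s=(P, Q))            # zero-pad kernel + 2-D FFT
    I = np.fft.fft2(image, s=(P, Q))             # zero-pad image + 2-D FFT
    full = np.real(np.fft.ifft2(I * K))          # multiply, inverse 2-D FFT

    # Trim back to the image size, centered like a direct 'same' convolution.
    r0, c0 = (U - 1) // 2, (V - 1) // 2
    return full[r0:r0 + M, c0:c0 + N]

image = np.random.rand(64, 64)                   # placeholder image
kernel = np.ones((5, 5)) / 25.0                  # box filter standing in for a Gaussian
blurred = fft_convolve2d(image, kernel)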
Response to Edit
If the input is real, the output will still be complex except for rare circumstances. The FFT of your gaussian kernel will also be complex, so the multiply must be a complex multiplication. When you perform the inverse FFT, the output should be real since your input image and kernel are real. The output will be returned in a complex array, but the imaginary components should be zero or very small (floating point error) and can be discarded.
If you are using the same image, you can reuse the image FFT, but you will need to zero pad based on your biggest kernel size. You will have to compute the FFTs of all of the different kernels.
For visualization, the magnitude of the complex output should be used. The log scale just helps to visualize smaller components of the output when larger components would drown them out in a linear scale. The Decibel scale is often used and is given by either 20*log10(abs(x)) or 10*log10(x*x') which are equivalent. (x is the complex fft output and x' is the complex conjugate of x).
The input and output of the FFT will be the same size. Also the real and imaginary parts will be the same size since one real and one imaginary value form a single sample.
Remember that convolution in space is equivalent to multiplication in frequency domain. This means that once you perform FFT of both image and mask (kernel), you only have to do point-by-point multiplication, and then IFFT of the result. Having said that, here are a few words of caution.
You probably know that in digital signal processing we often use circular convolution, not linear convolution. This happens because of the periodicity assumed by the transform. What this means in simple terms is that the DFT (and the FFT, which is its computationally efficient variant) assumes that your signal is periodic, so when you filter your signal this way -- suppose your image is N x M pixels -- the pixel at (1, m) is treated as the neighbor of the pixel at (N, m) for any m < M. Your signal virtually wraps around onto itself. This means that your Gaussian mask will be averaging pixels on the far right with pixels on the far left, and the same goes for top and bottom. This might or might not be desired, but in general one has to deal with edge artifacts anyway. It is, however, much easier to forget about this issue when dealing with FFT multiplication, because the problem stops being apparent. There are many ways to take care of this problem. The best way is to simply pad your image with zeros and remove the extra pixels later.
A very neat thing about using a Gaussian filter in the frequency domain is that you never really have to take its FFT. It is a well-known fact that the Fourier transform of a Gaussian is a Gaussian (some technical details here). All you would have to do then is pad your image with zeros (in both dimensions), generate a Gaussian in the frequency domain, multiply them together and take the IFFT. Then you're done.
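A hedged NumPy sketch of that idea; sigma, the padding width, and the test image are placeholders:

import numpy as np

def gaussian_blur_freq(image, sigma, pad):
    padded = np.pad(image, pad)                  # zero-pad on all sides
    P, Q = padded.shape

    # Frequency grid in cycles per sample, laid out the way np.fft.fft2 expects.
    fy = np.fft.fftfreq(P)[:, None]
    fx = np.fft.fftfreq(Q)[None, :]
    # The Fourier transform of a spatial Gaussian with std sigma is another Gaussian.
    H = np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * (fy ** 2 + fx ** 2))

    blurred = np.real(np.fft.ifft2(np.fft.fft2(padded) * H))
    return blurred[pad:pad + image.shape[0], pad:pad + image.shape[1]]

image = np.random.rand(128, 128)                 # placeholder image
out = gaussian_blur_freq(image, sigma=3.0, pad=16)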
Hope this helps.