I want to use a GPU-accelerated algorithm to perform a fast and memory-saving DFT. But when I perform gpu::dft, the destination matrix is scaled as explained in the documentation. How can I avoid this scaling of the width to dft_size.width / 2 + 1? Also, why is it scaled like this? My code for the DFT is this:
cv::gpu::GpuMat d_in, d_out;
d_in = in;
d_out.create(d_in.size(), CV_32FC2);
cv::gpu::dft(d_in, d_out, d_in.size());
where in is a CV_32FC1 matrix, which is 512x512.
The best solution would be a destination matrix with size d_in.size() and type CV_32FC2.
This is due to the complex conjugate symmetry that is present in the output of an FFT of real input. Intel IPP has a good description of this packing (the same packing is used by OpenCV). The OpenCV dft function documentation also describes this packing.
So, from the gpu::dft documentation we have:
If the source matrix is complex and the output is not specified as real, the destination matrix is complex and has the dft_size size and CV_32FC2 type.
So, make sure you pass a complex matrix to the gpu::dft function if you don't want the output to be packed. You will need to set the second channel to all zeros:
Mat realData;
// ... get your real data...
Mat cplxData = Mat::zeros(realData.size(), realData.type());
vector<Mat> channels;
channels.push_back(realData);
channels.push_back(cplxData);
Mat fftInput;
merge(channels, fftInput);
GpuMat fftGpu(fftInput.size(), fftInput.type());
fftGpu.upload(fftInput);
// do the gpu::dft here...
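For completeness, a minimal sketch of the remaining GPU call, assuming the OpenCV 2.4 gpu module and the fftGpu matrix from above (since the input is CV_32FC2, the documentation quoted above says the output is full-size CV_32FC2):
cv::gpu::GpuMat fftOutput;
cv::gpu::dft(fftGpu, fftOutput, fftGpu.size());   // complex-to-complex: no CCS packing
cv::Mat spectrum;
fftOutput.download(spectrum);                     // spectrum has fftInput.size() and type CV_32FC2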
There is a caveat, though: you get about a 30-40% performance boost when using CCS-packed data, so you will lose some performance by using the full complex output.
Hope that helps!
Scaling is done to obtain the result within the range of +/- 1.0. This is the most useful form for most applications that need to deal with a frequency representation of the data. To retrieve a result that is not scaled, just don't enable the DFT_SCALE flag.
Edit
The width of the result is reduced because the spectrum is symmetric. So all you have to do is append the former values in a symmetric fashion.
The spectrum is symmetric because, at half of the width, the sampling-theorem limit is reached. For example, a 2048-point DFT of a signal sampled at 48 kHz can only represent frequencies up to 24 kHz, and that frequency sits at half of the width.
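In DFT terms: for a real N-point input x[n], the output satisfies X[N - k] = conj(X[k]), so only the first N/2 + 1 complex bins carry independent information; the remaining bins can be reconstructed by conjugate mirroring, which is exactly why the packed output has width dft_size.width / 2 + 1.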
Also for reference take a look at Spectrum Analysis Using the Discrete Fourier Transform.
Related
I read the paper explaining Yolact and Yolact++. I'm confused about the mask size and the prototype masks. There is an illustration of the protonet, and the output from the protonet is of size 138 * 138 * 32. Is this the size of the proto-masks? I have read in the paper that the algorithm produces an image-sized mask, so please clarify the size of the mask produced.
Take for example an input with the following size:
(H,W,C) = (512,512,3)
The protonet will give you the following output size (a.k.a. proto-masks): (128,128,32), where 32 = number of protos. It is 1/4 of the input size.
The protos are used to obtain the mask via a linear combination, with the corresponding coefficients predicted by the prediction module.
Therefore you will have a mask of size (128,128). Then a crop is done on this mask; the cropping is done according to the bbox prediction (after NMS).
The bbox values can be treated as relative to the image size; therefore (0.5, 0.5, 1., 1.), which corresponds to (256., 256., 512., 512.) in the input image, becomes (64., 64., 128., 128.) in the mask created by the protos.
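As a rough sketch of the linear-combination step in OpenCV-style C++ (shapes, names and the final sigmoid handling here are illustrative, not taken from the Yolact code):
#include <opencv2/core.hpp>
#include <vector>

// protos: 32 single-channel proto-masks, each 128x128 (CV_32FC1)
// coeffs: 32 mask coefficients predicted by the prediction head for one detection
cv::Mat assembleMask(const std::vector<cv::Mat>& protos, const std::vector<float>& coeffs)
{
    cv::Mat combined = cv::Mat::zeros(protos[0].size(), CV_32FC1);
    for (size_t k = 0; k < protos.size(); ++k)
        combined += coeffs[k] * protos[k];   // linear combination of the protos

    // Sigmoid squashes the combination into [0, 1]: mask = 1 / (1 + exp(-combined))
    cv::Mat negCombined = -combined;
    cv::Mat expNeg;
    cv::exp(negCombined, expNeg);
    cv::Mat denom = 1.0 + expNeg;
    cv::Mat mask;
    cv::divide(1.0, denom, mask);

    // The resulting (128,128) mask is then cropped with the predicted bbox
    // (image coordinates divided by 4) and thresholded.
    return mask;
}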
I have two videos, one of a background and one of that same background with a person sitting in the frame. I generated two images from the video of just the background: the mean image of the background video (by accumulating the frames and dividing by the number of frames) and an image of standard deviations from the mean per pixel, taken over the frames. In other words, I have two images representing the Gaussian distribution of the background video. Now, I want to threshold an image, not using one fixed threshold value for all pixels, but using the standard deviations from the image (a different threshold per pixel). However, as far as I understand, OpenCV's threshold() function only allows for one fixed threshold. Are there functions I'm missing, or is there a workaround?
A cv::Mat provides the tools to accomplish this.
The setTo() method takes an optional InputArray as a mask.
Assuming the following:
mean is the per-pixel background mean cv::Mat, std is your standard-deviations cv::Mat, in is the cv::Mat you want to threshold, and thresh is the factor for your standard deviations.
Using these values the custom thresholding could be done like this:
// Compute per-pixel threshold values from the background model
cv::Mat mask = mean + (std * thresh);   // or mean - (std * thresh), depending on the direction
// Set in.at<>(x,y) to 0 if its value is lower than mask.at<>(x,y)
in.setTo(0, in < mask);
The in < mask expression creates a new MatExpr object, a matrix which is 0 at every pixel where the predicate is false, 255 otherwise.
This is a handy way to implement custom thresholding.
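Put together, a self-contained sketch under the question's setup might look like this (variable and function names are illustrative):
#include <opencv2/core.hpp>

// frame:  CV_32FC1 image to threshold
// mean:   per-pixel mean of the background video (CV_32FC1)
// stddev: per-pixel standard deviation of the background video (CV_32FC1)
// k:      how many standard deviations from the mean still counts as background
void suppressBackground(cv::Mat& frame, const cv::Mat& mean, const cv::Mat& stddev, double k)
{
    cv::Mat upper = mean + k * stddev;
    cv::Mat lower = mean - k * stddev;
    // Zero every pixel that stays within +/- k standard deviations of the background mean
    frame.setTo(0, (frame < upper) & (frame > lower));
}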
I want to find if there is a utility function or variable that outputs the maximum value a specific Mat type can take. For example, the maximum possible value of a CV_8U is 255.
Example case
Matlab has a couple of built-in functions which can take an image of an arbitrary image type and convert it (with scaling if necessary) to another image type.
For example, Matlab has the function im2double. Running help im2double shows:
Class Support
-------------
Intensity and truecolor images can be uint8, uint16, double, logical,
single, or int16. Indexed images can be uint8, uint16, double or
logical. Binary input images must be logical. The output image is double.
So it will run on 10 different image types, and outputs a double image with the same number of color channels, scaled by dividing by the max allowable value of the original data type.
Thus the OpenCV functions convertTo() and normalize() would be able to do the same thing if one were able to get the max value of the input data type and pass it to those functions.
In particular convertTo(dst, type, scale) would work identically if one could use scale = 1/<max_value_of_input_type> and normalize(src, dst, alpha, beta, NORM_MINMAX) would work with alpha = <src_min>/<max_value_of_input_type> and beta = <src_max>/<max_value_of_input_type>.
The utility function saturate_cast() clips a value to the min and max of a target type. To divide by the max possible value of an arbitrary type, take the biggest value any OpenCV image type can hold and saturate_cast it to the same type as the image; the result is that type's maximum. This works for unsigned images. For images with signed values, saturate on both the positive and negative sides, and then shift and scale.
See the OpenCV docs for saturate_cast here: http://docs.opencv.org/3.1.0/db/de0/group__core__utils.html#gab93126370b85fda2c8bfaf8c811faeaf
Edit: The obvious solution is to just write the seven if statements for the different available Mat types: CV_8U, CV_8S, CV_16U, CV_16S, CV_32S, CV_32F, CV_64F, which I suppose would not be too annoying and would be much clearer to a reader.
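A minimal sketch of that approach (the helper name maxValueOfDepth is mine; treating CV_32F and CV_64F as [0, 1] is a convention rather than a true type maximum):
#include <opencv2/core.hpp>
#include <limits>
#include <stdexcept>

// Map an OpenCV depth constant to the maximum value of its underlying type.
double maxValueOfDepth(int depth)
{
    switch (depth)
    {
    case CV_8U:  return std::numeric_limits<uchar>::max();   // 255
    case CV_8S:  return std::numeric_limits<schar>::max();   // 127
    case CV_16U: return std::numeric_limits<ushort>::max();  // 65535
    case CV_16S: return std::numeric_limits<short>::max();   // 32767
    case CV_32S: return std::numeric_limits<int>::max();
    case CV_32F:                                              // floating-point images are
    case CV_64F: return 1.0;                                  // conventionally kept in [0, 1]
    default:     throw std::invalid_argument("unsupported depth");
    }
}

// Emulating Matlab's im2double for an integer-typed cv::Mat src would then be:
//   cv::Mat dst;
//   src.convertTo(dst, CV_64F, 1.0 / maxValueOfDepth(src.depth()));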
I am using opencv to compute a Butterworth filter of an image. The image in question is a physical parameter, i.e. the pressure, in some units, at every nodal point. It is not just gray scale or color values.
I have followed the examples here: http://docs.opencv.org/2.4/doc/tutorials/core/discrete_fourier_transform/discrete_fourier_transform.html
http://breckon.eu/toby/teaching/dip/opencv/lecture_demos/c++/butterworth_lowpass.cpp
I have successfully implemented this filter, i.e. I can DFT, create the filter kernel, apply it, and inverse Fourier transform back.
However, the magnitude of the values after the idft are completely off.
In particular, I replicate lines of code that can be found in both the above links:
// Perform Inverse Fourier Transform
idft(complexImg, complexImg);
split(complexImg, planes);
imgOutput = planes[0].clone();
In the above code segment,
1.) I compute the idft of complexImg and save it to complexImg.
2.) I split complexImg into real and imaginary parts (which are saved in planes[0] and planes[1], respectively).
3.) I save the real part to imgOutput, as my original image was real.
However, while the original image, i.e. imgInput, has a mean value on the order of 10^-1, imgOutput has a mean value on the order of 10^4 to 10^5. It seems some type of normalization is needed? In the above example links, the values are normalized between 0 and 1 for viewing purposes, but that is not what I need.
Any help will be appreciated.
Thank you.
The problem was solved by normalizing by 2*N, where N is the number of pixels in the image.
i.e.
imgOutput = imgOutput/imgOutput.cols/imgOutput.rows/2;
According to the documentation: https://docs.opencv.org/2.4/modules/core/doc/operations_on_arrays.html#idft
Note
None of dft and idft scales the result by default. So, you should pass DFT_SCALE to one of dft or idft explicitly to make these transforms mutually inverse.
Therefore something like this would fix it:
icvdft = cv.idft(dft_array, flags=cv.DFT_SCALE)
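Since the code in the question is C++, the same change there would look roughly like this (only the DFT_SCALE flag is new; variable names are from the question):
// Perform the inverse transform with scaling, so that dft followed by idft
// restores the original magnitudes
idft(complexImg, complexImg, DFT_SCALE);
split(complexImg, planes);
imgOutput = planes[0].clone();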
My book says this about the image kernel concept in OpenCV:
When a computation is done over a pixel neighborhood, it is common to represent this with a kernel matrix. This kernel describes how the pixels involved in the computation are combined in order to obtain the desired result.
In image-blurring functions, we pass a kernel size:
cv::GaussianBlur(inputImage, outputImage, Size(1,1), 0, 0);
So, if I say the kernel size is Size(1,1) does that mean the kernel got only 1 pixel?
Please have a look at the following image
Here, what's the kernel size? Size(3,3)? If I say Size(1,1) for this image, does that mean the kernel has only 1 pixel and its value is 0 (the first value in the image)?
The kernel size in the example image you gave is 3-by-3 (Size(3,3)), yes. A kernel size of 1-by-1 is valid, although it wouldn't be very interesting.
The generic name for the operation being performed by GaussianBlur is a convolution.
The GaussianBlur function is creating a Gaussian kernel, which is basically a matrix that represents how you should combine a window of n-by-n pixels to get a single pixel value (using a Gaussian-shaped blurring pattern in this case).
A kernel of size 1-by-1 can't do anything other than scalar multiplication of an image; that is, convolution by the 1-by-1 matrix [c] is just c * inputImage.
Typically, you'll want to choose an n-by-n Gaussian kernel that satisfies:
a spread of the Gaussian (i.e. standard deviation or variance) such that it blurs the amount you want
a larger number means more blurring; a smaller number means less blurring
n chosen sufficiently large so as not to truncate the Gaussian too close to the mode (see the sketch below for what such a kernel looks like)
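If you want to see the actual kernel that GaussianBlur builds, OpenCV exposes it through getGaussianKernel; a small sketch (the size and sigma here are arbitrary examples):
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>

int main()
{
    // The 1-D Gaussian kernel for a given size and sigma; the 2-D kernel used for
    // blurring is the outer product of this vector with itself (the filter is separable).
    cv::Mat k1d = cv::getGaussianKernel(3, 0.8, CV_64F);   // 3 taps, sigma = 0.8
    cv::Mat k2d = k1d * k1d.t();                           // the corresponding 3x3 kernel matrix
    std::cout << k1d << std::endl;                         // roughly [0.24; 0.52; 0.24]
    std::cout << k2d << std::endl;
    return 0;
}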
Links:
Convolution (Wikipedia)
Gaussian blur (Wikipedia)
this section in particular
The image you posted is a 3x3 kernel, which would be specified by cv::Size(3,3). You are correct in saying that cv::Size(1,1) corresponds to a single pixel, but saying "cv::Size(1,1)" in reference to that image is not meaningful. A 1x1 kernel would simply have the value [1].
This image is a kernel and its size is 3x3. Kernels are applied to an image by multiplying corresponding pixel values and summing the 9 results. This is called convolution / filtering in the literature. You can look at the following resources for more information:
http://en.wikipedia.org/wiki/Kernel_(image_processing)
http://homepages.inf.ed.ac.uk/rbf/HIPR2/filtops.htm
http://www.cse.usf.edu/~r1k/MachineVisionBook/MachineVision.files/MachineVision_Chapter4.pdf
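As a concrete illustration of applying a kernel, here is a small self-contained sketch using cv::filter2D with a 3x3 averaging kernel (the input values are arbitrary):
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>

int main()
{
    // A small test image and a 3x3 averaging kernel:
    // every pixel in the neighborhood contributes equally to the output pixel.
    cv::Mat src = (cv::Mat_<float>(3, 3) << 0, 0, 0,
                                            0, 9, 0,
                                            0, 0, 0);
    cv::Mat kernel = cv::Mat::ones(3, 3, CV_32F) / 9.0f;

    cv::Mat dst;
    cv::filter2D(src, dst, -1, kernel);   // each output pixel = weighted sum of its 3x3 neighborhood

    std::cout << dst << std::endl;        // centre pixel becomes 9/9 = 1; edge values
                                          // depend on the default border handling
    return 0;
}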