OpenCV: Understanding Kernel - c++

My book says this about the Image Kernel concept in OpenCV
When a computation is done over a pixel neighborhood, it is common to
represent this with a kernel matrix. This kernel describes how the
pixels involved in the computation are combined in order to obtain the
desired result.
In image blur techniques, we use the kernel size.
cv::GaussianBlur(inputImage,outputImage,Size(1,1),0,0)
So, if I say the kernel size is Size(1,1) does that mean the kernel got only 1 pixel?
Please have a look at the following image
Here, what's the kernel size? Size(3,3)? If I say Size(1,1) for this image, does that mean the kernel has only 1 pixel, and that the pixel value is 0 (the first value in the image)?

The kernel size in the example image you gave is 3-by-3 (Size(3,3)), yes. A kernel size of 1-by-1 is valid, although it wouldn't be very interesting.
The generic name for the operation being performed by GaussianBlur is a convolution.
The GaussianBlur function is creating a Gaussian kernel, which is basically a matrix that represents how you should combine a window of n-by-n pixels to get a single pixel value (using a Gaussian-shaped blurring pattern in this case).
A kernel of size 1-by-1 can't do anything other than scalar multiplication of an image; that is, convolution by the 1-by-1 matrix [c] is just c * inputImage.
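To see this concretely, here's a minimal numpy sketch of the windowed multiply-and-sum (not filter2D itself), showing that convolving with the 1-by-1 kernel [c] just scales the image:

```python
import numpy as np

def convolve2d(img, kernel):
    """Naive 'same'-size filtering with zero padding: at each pixel,
    overlap the kernel and sum the weighted window."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(9, dtype=float).reshape(3, 3)
c = 0.5
out = convolve2d(img, np.array([[c]]))
# convolving with the 1x1 kernel [c] is just c * img
print(np.allclose(out, c * img))  # True
```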
Typically, you'll want to choose a n-by-n Gaussian kernel that satisfies:
spread of Gaussian (i.e. standard deviation or variance) such that it blurs the amount you want
larger number means more blurring; smaller number means less blurring
choose n sufficiently large as to not truncate the Gaussian too close to the mode
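As a rough sketch of how those two choices interact, here's one way to compute an n-by-n Gaussian kernel from a given sigma with numpy (the sigma values below are just examples):

```python
import numpy as np

def gaussian_kernel(n, sigma):
    """n-by-n Gaussian kernel, normalized so the weights sum to 1 (n odd)."""
    ax = np.arange(n) - n // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

k = gaussian_kernel(5, 1.0)
print(k[2, 2] == k.max())  # True: the mode sits at the centre

# too small an n for a given sigma truncates the Gaussian: with n=3 and
# sigma=3, the corner weight is ~0.9 of the peak, so it's barely bell-shaped
print(gaussian_kernel(3, 3.0)[0, 0] / gaussian_kernel(3, 3.0).max())
```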
Links:
Convolution (Wikipedia)
Gaussian blur (Wikipedia)
this section in particular

The image you post is a 3x3 kernel, which would be specified by cv::Size(3,3). You are correct in saying that cv::Size(1,1) corresponds to a single pixel, but saying "cv::Size(1,1)" in reference to the image is not meaningful. A 1x1 kernel would simply have the value [1].

This image is a kernel, and its size is 3x3. Kernels are applied to an image by multiplying each kernel value with the corresponding pixel value and summing the 9 results. This is called convolution / filtering in the literature. You can look at the following resources for more information:
http://en.wikipedia.org/wiki/Kernel_(image_processing)
http://homepages.inf.ed.ac.uk/rbf/HIPR2/filtops.htm
http://www.cse.usf.edu/~r1k/MachineVisionBook/MachineVision.files/MachineVision_Chapter4.pdf
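The multiply-and-sum step can be sketched in a few lines of numpy (the kernel here is a plain 3x3 box blur, just for illustration):

```python
import numpy as np

# a 3x3 image patch and a 3x3 averaging kernel (illustrative values)
patch = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.full((3, 3), 1.0 / 9.0)  # box-blur weights

# the filtered value at the patch's centre is the sum of the 9 products
value = np.sum(patch * kernel)
print(value)  # 5.0, the average of the 9 pixels
```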

Related

Resize a kernel

For learning purposes I am implementing a blur function. I have it working, but I want to resize my kernel to achieve a more blurred effect.
If I scale up my kernel, will that indeed create a more blurred effect? And how can I resize my kernel?
I have tried to resize the kernel using resize but that results in a white image.
// create blur kernel
float kdata[] = { 0.0625f, 0.125f, 0.0625f, 0.125f, 0.25f, 0.125f, 0.0625f, 0.125f, 0.0625f };
Mat kernel(3, 3, CV_32F, kdata);
// resize kernel to 9x9 to create a more blurred effect
resize(kernel, kernel, {9,9});
// output is white, whats going wrong?
filter2D(src, output, -1, kernel);
Going back to basics a bit: a kernel is a matrix that is convolved with your image.
The convolution operation visits each pixel of the image, overlaps the kernel on it (aligned at the kernel's anchor point, usually the middle), and sums all the image values weighted by the values in the kernel.
For example, imagine you had the kernel:
1 0 -1
0 0 0
-1 0 1
(for demonstration purposes only - the values are arbitrary)
With the anchor point at the center. Then, filter2D would take all the pixels in the image and overlap the kernel. At each pixel, it would add the upper left and the lower right pixels and subtract the upper right and the lower left pixels, as indicated by the weights in the kernel.
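Sketching that overlap-and-sum with numpy (using the demonstration kernel above and an arbitrary image patch):

```python
import numpy as np

kernel = np.array([[ 1, 0, -1],
                   [ 0, 0,  0],
                   [-1, 0,  1]], dtype=float)

patch = np.array([[10, 0, 2],
                  [ 0, 0, 0],
                  [ 3, 0, 7]], dtype=float)

# add the upper-left and lower-right pixels, subtract the upper-right
# and lower-left ones, exactly as the kernel weights indicate:
value = np.sum(patch * kernel)
print(value)  # 10 + 7 - 2 - 3 = 12.0
```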
Now, to achieve a greater blur, you need to have a bigger kernel. You cannot simply resize the kernel - the resize function is to change the size of the images. For the kernel, you need to compute the values of the bigger kernel - keep in mind that the kernel is a matrix with special values, not an image.
What a kernel for Gaussian blur does is have its values carefully chosen (according to a Gaussian distribution) such that the center pixel (the initial value) has the biggest contribution to the final pixel, while the surrounding pixels get added with smaller and smaller weights. The weights of the surrounding pixels are tuned by the sigma parameter of the Gaussian, which indicates how fast the Gaussian's values drop off.
In the end, you need to calculate the values for your kernel, considering the sigma and the kernel size you want. This is done either manually (pen and paper) or with a calculator such as this one: http://dev.theomader.com/gaussian-kernel-calculator/.
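For example, rather than resizing, you could recompute the weights for a 9x9 window; a numpy sketch (the sigma of 2.0 is just a plausible choice for that window size):

```python
import numpy as np

def gaussian_kernel_2d(ksize, sigma):
    """Build a ksize-by-ksize Gaussian kernel whose weights sum to 1."""
    ax = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)  # a 2D Gaussian separates into two 1D ones

# a 9x9 kernel for a stronger blur than the hard-coded 3x3 one
big = gaussian_kernel_2d(9, 2.0)
print(big.shape)            # (9, 9)
print(round(big.sum(), 6))  # 1.0 -- so the output image keeps its brightness
# 'big' can now be passed to filter2D in place of the 3x3 kernel
```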

Normalizing output of opencv idft call

I am using OpenCV to compute a Butterworth filter of an image. The image in question is a physical parameter, i.e. the pressure, in some units, at every nodal point. It is not just grayscale or color values.
I have followed the examples here: http://docs.opencv.org/2.4/doc/tutorials/core/discrete_fourier_transform/discrete_fourier_transform.html
http://breckon.eu/toby/teaching/dip/opencv/lecture_demos/c++/butterworth_lowpass.cpp
I have successfully implemented this filter, i.e. I can DFT, create the filter kernel, apply it, and inverse Fourier transform back.
However, the magnitude of the values after the idft are completely off.
In particular, I replicate lines of code that can be found in both the above links:
// Perform Inverse Fourier Transform
idft(complexImg, complexImg);
split(complexImg, planes);
imgOutput = planes[0].clone();
In the above code segment,
1.) I compute the idft of complexImg and save it to complexImg.
2.) I split complexImg into real and imaginary parts (which is saved in planes[0] and planes[1], respectively)
3.) I save the real part to imgOutput, as my original image was real.
However, if the original image, i.e. imgInput had a mean value of the order of O(10^-1), imgOutput has a mean value of the order of O(10^4 to 10^5). It seems some type of normalization is needed? In the above example links, the values are normalized between 0 and 1 for viewing purposes, but that is not what I need.
Any help will be appreciated.
Thank you.
The problem was solved by normalizing by 2*N, where N is the number of pixels in the image.
i.e.
imgOutput = imgOutput/imgOutput.cols/imgOutput.rows/2;
According to the documentation: https://docs.opencv.org/2.4/modules/core/doc/operations_on_arrays.html#idft
Note
None of dft and idft scales the result by default. So, you should pass DFT_SCALE to one of dft or idft explicitly to make these transforms mutually inverse.
Therefore something like this would fix it:
icvdft=cv.idft(dft_array,flags=cv.DFT_SCALE)
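The effect of the missing scale factor is easy to reproduce with plain numpy (a sketch, independent of OpenCV): an unscaled inverse transform returns the original values multiplied by the number of pixels, which is exactly the note above about dft/idft not scaling by default:

```python
import numpy as np

x = np.random.rand(8, 8)  # stand-in for the input image
X = np.fft.fft2(x)

# numpy's ifft2 applies the 1/(M*N) factor itself...
assert np.allclose(np.fft.ifft2(X).real, x)

# ...but an inverse DFT without that factor (cv::idft's default behaviour)
# returns M*N times the original values:
unscaled = np.conj(np.fft.fft2(np.conj(X)))  # inverse DFT, no 1/(M*N)
print(np.allclose(unscaled.real, x * x.size))  # True
```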

Understanding OpenCV image smoothing

This question is about this tutorial http://docs.opencv.org/doc/tutorials/imgproc/gausian_median_blur_bilateral_filter/gausian_median_blur_bilateral_filter.html#smoothing
In that code, all the smoothing methods are running inside a loop for MAX_KERNEL_LENGTH times. What is this kernel?
To calculate a smoothing, an average (for example) is computed over the closest pixels. Which pixels, and how many of them, is given by the kernel. The kernel also contains information about the weighting of the pixels.
The kernel is most often represented as a matrix (as it is in this case too), centered in turn on each pixel that the average is calculated for. The calculation looks like this in pseudo C++ code:
for (int i = 0; i < src.rows; i++) {
    for (int j = 0; j < src.cols; j++) {
        dst[i][j] = 0;
        for (int kernel_i = 0; kernel_i < kernel.rows; kernel_i++) {
            for (int kernel_j = 0; kernel_j < kernel.cols; kernel_j++) {
                // centre the kernel on (i, j); boundary handling omitted
                dst[i][j] +=
                    src[i - kernel.rows / 2 + kernel_i][j - kernel.cols / 2 + kernel_j] *
                    kernel[kernel_i][kernel_j];
            }
        }
    }
}
The variable mentioned as MAX_KERNEL_LENGTH is simply the biggest size of the matrix creating one such kernel.
The MAX_KERNEL_LENGTH is defined as a constant (31) in the code. It is used to change the kernel size from 1x1 to 31x31 to show the effect of different kernel sizes in different blurring algorithms used in the tutorial.
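The loop in the tutorial steps the kernel size through the odd values 1, 3, ..., 31. A rough numpy stand-in (a simple mean filter instead of the tutorial's specific smoothers) shows what varying the size does:

```python
import numpy as np

MAX_KERNEL_LENGTH = 31

def box_blur(img, k):
    """Mean filter with a k-by-k kernel (edge padding), k odd."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

img = np.random.rand(16, 16)
# the tutorial's loop: kernel sizes 1, 3, 5, ..., 31
results = [box_blur(img, k) for k in range(1, MAX_KERNEL_LENGTH + 1, 2)]
print(len(results))  # 16 kernel sizes
# a 1x1 kernel leaves the image unchanged; larger kernels flatten it more
print(np.allclose(results[0], img))  # True
```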

openCV filter image - replace kernel with local maximum

Some details about my problem:
I'm trying to implement a corner detector in OpenCV (a different algorithm from the built-in ones: Canny, Harris, etc.).
I've got a matrix filled with response values; the bigger the response value, the higher the probability that a corner was detected there.
The problem is that several corners get detected in the neighborhood of a point where there is really only one. I need to reduce the number of falsely detected corners.
Exact problem:
I need to walk through the matrix with a kernel, find the maximum value inside each kernel window, keep that maximum, and set all the other values in the window to zero.
Are there built-in OpenCV functions to do this?
This is how I would do it:
Create a kernel; it defines a pixel's neighbourhood.
Create a new image by dilating your image using this kernel. This dilated image contains the maximum neighbourhood value for every point.
Do an equality comparison between these two arrays. Wherever they are equal is a valid neighbourhood maximum, and is set to 255 in the comparison array.
Multiply the comparison array, and the original array together (scaling appropriately).
This is your final array, containing only neighbourhood maxima.
This is illustrated by these zoomed in images:
9 pixel by 9 pixel original image:
After processing with a 5 by 5 pixel kernel, only the local neighbourhood maxima remain (i.e. maxima separated by more than 2 pixels from a pixel with a greater value):
There is one caveat. If two nearby maxima have the same value then they will both be present in the final image.
Here is some Python code that does it, it should be very easy to convert to c++:
import cv
im = cv.LoadImage('fish2.png',cv.CV_LOAD_IMAGE_GRAYSCALE)
maxed = cv.CreateImage((im.width, im.height), cv.IPL_DEPTH_8U, 1)
comp = cv.CreateImage((im.width, im.height), cv.IPL_DEPTH_8U, 1)
#Create a 5*5 kernel anchored at 2,2
kernel = cv.CreateStructuringElementEx(5, 5, 2, 2, cv.CV_SHAPE_RECT)
cv.Dilate(im, maxed, element=kernel, iterations=1)
cv.Cmp(im, maxed, comp, cv.CV_CMP_EQ)
cv.Mul(im, comp, im, 1/255.0)
cv.ShowImage("local max only", im)
cv.WaitKey(0)
I didn't realise until now, but this is what @sansuiso suggested in his/her answer.
This is possibly better illustrated with this image, before:
after processing with a 5 by 5 kernel:
solid regions are due to the shared local maxima values.
I would suggest an original 2-step procedure (there may exist more efficient approaches), that uses opencv built-in functions :
Step 1 : morphological dilation with a square kernel (corresponding to your neighborhood). This step gives you another image, after replacing each pixel value by the maximum value inside the kernel.
Step 2 : test if the cornerness value of each pixel of the original response image is equal to the max value given by the dilation step. If not, then obviously there exists a better corner in the neighborhood.
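A minimal numpy sketch of that two-step procedure (dilation as a sliding-window max, then the equality test; in OpenCV proper, cv::dilate and a comparison would do the same):

```python
import numpy as np

def local_max_suppress(resp, k):
    """Keep a response value only where it equals the max of its
    k-by-k neighbourhood; zero everywhere else."""
    p = k // 2
    padded = np.pad(resp, p, mode="constant", constant_values=-np.inf)
    dilated = np.empty_like(resp)
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            dilated[i, j] = padded[i:i + k, j:j + k].max()  # step 1: dilation
    return np.where(resp == dilated, resp, 0)               # step 2: equality test

resp = np.array([[1., 5., 2.],
                 [3., 4., 1.],
                 [9., 2., 8.]])
print(local_max_suppress(resp, 3))
# only 5, 9 and 8 survive: each is the max of its own 3x3 neighbourhood
```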
If you are looking for some built-in functionality, FilterEngine will help you make a custom filter (kernel).
http://docs.opencv.org/modules/imgproc/doc/filtering.html#filterengine
Also, I would recommend some kind of noise reduction, usually blur, before all processing. That is unless you really want the image raw.

Scaling performed by gpu::dft with OpenCV in C++

I want to use a GPU-accelerated algorithm to perform a fast and memory-saving DFT. But when I perform gpu::dft, the destination matrix is scaled, as explained in the documentation. How can I avoid this problem of the width being scaled to dft_size.width / 2 + 1? Also, why is it scaled like this? My code for the DFT is this:
cv::gpu::GpuMat d_in, d_out;
d_in = in;
d_out.create(d_in.size(), CV_32FC2 );
cv::gpu::dft( d_in, d_out, d_in.size() );
where in is a CV_32FC1 matrix, which is 512x512.
The best solution would be a destination matrix which has the size d_in.size and the type CV_32FC2.
This is due to complex conjugate symmetry that is present in the output of an FFT. Intel IPP has a good description of this packing (the same packing is used by OpenCV). The OpenCV dft function also describes this packing.
So, from the gpu::dft documentation we have:
If the source matrix is complex and the output is not specified as real, the destination matrix is complex and has the dft_size size and CV_32FC2 type.
So, make sure you pass a complex matrix to the gpu::dft function if you don't want it to be packed. You will need to set the second channel to all zeros:
Mat realData;
// ... get your real data...
Mat cplxData = Mat::zeros(realData.size(), realData.type());
vector<Mat> channels;
channels.push_back(realData);
channels.push_back(cplxData);
Mat fftInput;
merge(channels, fftInput);
GpuMat fftGpu(fftInput.size(), fftInput.type());
fftGpu.upload(fftInput);
// do the gpu::dft here...
There is a caveat, though: you get about a 30-40% performance boost when using CCS-packed data, so you will lose some performance by using the full complex output.
Hope that helps!
Scaling is done to keep the result within the range of +/- 1.0, which is the most useful form for most applications that deal with the frequency representation of the data. To retrieve a result that is not scaled, just don't enable the DFT_SCALE flag.
Edit
The width of the result is halved because the spectrum is symmetric, so all you have to do is append the former values in a symmetric fashion.
The spectrum is symmetric because at half the width the sampling theorem is fulfilled. For example, a 2048-point DFT of a signal with a sample rate of 48 kHz can only represent values up to 24 kHz, and that value is represented at half of the width.
Also for reference take a look at Spectrum Analysis Using the Discrete Fourier Transform.
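The symmetry is easy to check with numpy (a sketch, independent of OpenCV's CCS packing):

```python
import numpy as np

N = 2048
x = np.random.rand(N)  # a real-valued signal
X = np.fft.fft(x)

# conjugate symmetry: X[k] == conj(X[N-k]) for a real input...
k = np.arange(1, N)
print(np.allclose(X[k], np.conj(X[N - k])))  # True

# ...so only N/2 + 1 complex values are unique, which is exactly what
# the packed real-input transforms store:
print(np.fft.rfft(x).shape)  # (1025,)
```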